
A Simple Introduction to Vega-Lite

Build an interactive plot with ease using Vega-Lite
Created on August 9|Last edited on February 2
By building new ways to view and interact with your data, you can develop intuitions and gain insights that are almost impossible to learn without visualisations. With all the great visualisation tools available, it's easier than ever to create interactive plots and share your insights with the world.
One such tool is Vega-Lite. It allows you to declaratively create visualisations of your data, meaning that you tell it what you want and it'll sort out the how. This is in contrast to imperative visualisation libraries, in which you have to worry about how each element gets drawn.
In this tutorial, we'll make an interactive plot of Big Mac prices over time, grouped by country. We'll go through what a "grammar of graphics" is, how to connect data fields to visual properties with Vega-Lite, and finally how to add some interactivity to our plot.
We will use the Big Mac dataset from calmcode. You can see the data and an interactive plot below. The plot is zoomable and has a tooltip that displays info about each data point when you hover over it.

bigmac3
| date | currency_code | country_name | local_price | dollar_ex | dollar_price |
|---|---|---|---|---|---|
| 2000-04-01 | ARS | Argentina | 2.5 | 1 | 2.5 |
| 2000-04-01 | AUD | Australia | 2.59 | 1.68 | 1.5416666666666698 |
| 2000-04-01 | BRL | Brazil | 2.95 | 1.79 | 1.64804469273743 |
| 2000-04-01 | CAD | Canada | 2.85 | 1.47 | 1.93877551020408 |
| 2000-04-01 | CHF | Switzerland | 5.9 | 1.7 | 3.4705882352941204 |
| 2000-04-01 | CLP | Chile | 1260 | 514 | 2.45136186770428 |
| 2000-04-01 | CNY | China | 9.9 | 8.28 | 1.19565217391304 |
| 2000-04-01 | CZK | Czech Republic | 54.37 | 39.1 | 1.39053708439898 |
| 2000-04-01 | DKK | Denmark | 24.75 | 8.04 | 3.07835820895522 |
| 2000-04-01 | EUR | Euro area | 2.56 | 1.075268817 | 2.38080000045235 |
| 2000-04-01 | GBP | Britain | 1.9 | 0.632911392 | 3.00200000192128 |
| 2000-04-01 | HKD | Hong Kong | 10.2 | 7.79 | 1.30937098844673 |
| 2000-04-01 | HUF | Hungary | 339 | 279 | 1.21505376344086 |
| 2000-04-01 | IDR | Indonesia | 14500 | 7945 | 1.82504719949654 |
| 2000-04-01 | ILS | Israel | 14.5 | 4.05 | 3.58024691358025 |
| 2001-04-01 | ARS | Argentina | 2.5 | 1 | 2.5 |
| 2001-04-01 | AUD | Australia | 3 | 1.98 | 1.51515151515152 |
| 2001-04-01 | BRL | Brazil | 3.6 | 2.19 | 1.6438356164383603 |
| 2001-04-01 | CAD | Canada | 3.33 | 1.56 | 2.13461538461538 |
| 2001-04-01 | CHF | Switzerland | 6.3 | 1.73 | 3.64161849710983 |
| 2001-04-01 | CLP | Chile | 1260 | 601 | 2.09650582362729 |
| 2001-04-01 | CNY | China | 9.9 | 8.28 | 1.19565217391304 |
| 2001-04-01 | CZK | Czech Republic | 56 | 39 | 1.43589743589744 |
| 2001-04-01 | DKK | Denmark | 24.75 | 8.46 | 2.92553191489362 |
| 2001-04-01 | EUR | Euro area | 2.57 | 1.136363636 | 2.26160000072371 |
| 2001-04-01 | GBP | Britain | 1.99 | 0.699300699 | 2.84570000122365 |
| 2001-04-01 | HKD | Hong Kong | 10.7 | 7.8 | 1.3717948717948705 |
| 2001-04-01 | HUF | Hungary | 399 | 303 | 1.31683168316832 |
| 2001-04-01 | IDR | Indonesia | 14700 | 10855 | 1.35421464762782 |
| 2002-04-01 | ARS | Argentina | 2.5 | 3.13 | 0.7987220447284351 |
| 2002-04-01 | AUD | Australia | 3 | 1.86 | 1.61290322580645 |
| 2002-04-01 | BRL | Brazil | 3.6 | 2.34 | 1.53846153846154 |
| 2002-04-01 | CAD | Canada | 3.33 | 1.57 | 2.12101910828025 |
| 2002-04-01 | CHF | Switzerland | 6.3 | 1.66 | 3.79518072289157 |
| 2002-04-01 | CLP | Chile | 1400 | 655 | 2.13740458015267 |
| 2002-04-01 | CNY | China | 10.5 | 8.28 | 1.26811594202899 |
| 2002-04-01 | CZK | Czech Republic | 56.28 | 34 | 1.65529411764706 |
| 2002-04-01 | DKK | Denmark | 24.75 | 8.38 | 2.9534606205250604 |
| 2002-04-01 | EUR | Euro area | 2.67 | 1.123595506 | 2.37629999919206 |
| 2002-04-01 | GBP | Britain | 1.99 | 0.689655172 | 2.8855000017313 |
| 2002-04-01 | HKD | Hong Kong | 11.2 | 8 | 1.4 |
| 2002-04-01 | HUF | Hungary | 459 | 272 | 1.6875 |
| 2002-04-01 | IDR | Indonesia | 16000 | 9430 | 1.69671261930011 |
| 2002-04-01 | ILS | Israel | 12 | 4.79 | 2.50521920668058 |
| 2003-04-01 | ARS | Argentina | 4.1 | 2.88 | 1.42361111111111 |
| 2003-04-01 | AUD | Australia | 3 | 1.61 | 1.86335403726708 |
| 2003-04-01 | BRL | Brazil | 4.55 | 3.07 | 1.48208469055375 |
| 2003-04-01 | CAD | Canada | 3.2 | 1.45 | 2.20689655172414 |
| 2003-04-01 | CHF | Switzerland | 6.3 | 1.37 | 4.5985401459854005 |
| 2003-04-01 | CLP | Chile | 1400 | 716 | 1.9553072625698305 |
[Interactive plot: Big Mac prices over time, grouped by country]

Vega-Lite is a visualisation grammar, an implementation of a "grammar of graphics".
From Hadley Wickham's "A layered grammar of graphics":
A grammar of graphics is a tool that enables us to concisely describe the components of a graphic.
A grammar of graphics aims to define a minimal common set of object interfaces which can create as many types of plots as possible
The most famous, widely used implementation of a grammar of graphics is ggplot2. Vega-Lite is heavily inspired by ggplot2 so if you're interested in reading about the origins of its layered design, ggplot2's author, Hadley Wickham, describes it in "A layered grammar of graphics".
For a comprehensive look at the origins of the development of a grammar of graphics in general, "The Grammar of Graphics" by Leland Wilkinson goes into great detail about each component used in a graphic grammar.
What are the different parts of a grammar, and how does Vega-Lite represent them?

Data

Vega-Lite expects a column for each variable and a row for each observation. In our data, the columns (fields) are date, currency_code, country_name, local_price, dollar_ex and dollar_price, and each row is one observation of the price of a Big Mac at a given location and date.
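To make this layout concrete, here is a minimal sketch that embeds two rows of the dataset shown earlier directly in a spec through the `values` property of `data`, instead of loading a file. Each JSON object is one row and each key is one column. The bar mark and the two encodings are only there to make the spec renderable; they are an illustrative choice, not part of the tutorial's plot.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Illustrative sketch: two rows of the Big Mac data supplied inline.",
  "data": {
    "values": [
      {"date": "2000-04-01", "currency_code": "ARS", "country_name": "Argentina", "local_price": 2.5, "dollar_ex": 1, "dollar_price": 2.5},
      {"date": "2000-04-01", "currency_code": "AUD", "country_name": "Australia", "local_price": 2.59, "dollar_ex": 1.68, "dollar_price": 1.5416666666666698}
    ]
  },
  "mark": "bar",
  "encoding": {
    "x": {"field": "country_name", "type": "nominal"},
    "y": {"field": "dollar_price", "type": "quantitative"}
  }
}
```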

Marks

Marks are the geometric shapes (points, lines, bars etc.) you want to use to visually represent the data.
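As a small sketch, a spec with nothing but data and a mark is already valid. Because no encoding channels are set yet, Vega-Lite has no positions to work with, so it draws one point per row and they all land on top of each other. The file path is the one used in the examples later in this tutorial.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Illustrative sketch: a mark with no encodings, so all points overlap.",
  "data": {"url": "data/bigmac.csv"},
  "mark": "point"
}
```

Swapping "point" for another mark type such as "bar" or "line" changes the shape used, but the marks only become informative once fields are bound to channels.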

Encoding Channels

This is where Vega-Lite gets interesting! The somewhat dry name makes it sound less interesting than it actually is. It took me a bit of mental energy to get this to click, but I think it's a really powerful way to think about visualisation in general.
When first thinking of a visualisation, we can describe it to ourselves like this:
I want my x-axis to be the foo and my y-axis to be bar ... oh, and I want the color to be baz!
where foo, bar and baz are fields in your data.
Every mark has attributes like position, size, colour etc. which present the underlying values in a visual way. In other words, these attributes are channels in which we can encode the data. This mapping from data to visual properties is set using encoding channels.
These bind fields of our data to visual ways to represent them.
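The sentence above translates almost word for word into a spec. The following is a sketch only: foo, bar and baz are placeholder field names and the three rows are made-up values, chosen purely to show the shape of an encoding block. The type entries are explained in the sections that follow.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Illustrative sketch: foo, bar and baz are placeholder fields with made-up values.",
  "data": {
    "values": [
      {"foo": 1, "bar": 4, "baz": "a"},
      {"foo": 2, "bar": 7, "baz": "b"},
      {"foo": 3, "bar": 5, "baz": "a"}
    ]
  },
  "mark": "point",
  "encoding": {
    "x": {"field": "foo", "type": "quantitative"},
    "y": {"field": "bar", "type": "quantitative"},
    "color": {"field": "baz", "type": "nominal"}
  }
}
```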

Fields

To bind a channel to your data, its channel definition needs a field definition. This tells Vega-Lite which column in your data you want to represent. In this tutorial, each field definition has a field name and a data type.

Types

For Vega-Lite to know how to display and compare your data, you need to tell it what type of data you have. It supports nominal, ordinal, quantitative, temporal and geojson types. More information on these types can be found in the Vega-Lite documentation on encoding types.
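To see why the type matters, here is a sketch in which the date field is treated as ordinal instead of temporal. It uses the `timeUnit` property, which this tutorial does not otherwise cover, to reduce each date to its year. With an ordinal type, the x-axis becomes a row of discrete, evenly spaced year categories rather than a continuous timeline.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Illustrative sketch: date as an ordinal field, one discrete step per year.",
  "data": {"url": "data/bigmac.csv"},
  "mark": "point",
  "encoding": {
    "x": {"field": "date", "timeUnit": "year", "type": "ordinal"},
    "y": {"field": "dollar_price", "type": "quantitative"}
  }
}
```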

Example

Let's start nice and simple by plotting all the data with points, using point marks.
Here, we'll bind the 'x-position' encoding channel to the date column, with type set to temporal, which means that the x-axis will display a timeline of the data. We'll bind the 'y-position' encoding channel to the dollar_price column with type set to quantitative.
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {"url": "data/bigmac.csv"},
  "title": "All Points",
  "mark": "point",
  "encoding": {
    "x": {
      "field": "date",
      "type": "temporal"
    },
    "y": {
      "field": "dollar_price",
      "type": "quantitative"
    }
  }
}
We get this plot:

[Interactive plot: "All Points"]

What if we wanted to colour each point based on the country in which the dollar_price was recorded?
To do this, let's bind the color encoding channel to the country_name field, which is of type nominal.
Let's also update the mark type to be line rather than point and add a tooltip when we mouse over the lines.
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {"url": "data/bigmac.csv"},
  "title": "Lines with Colour and Tooltip",
  "mark": {"type": "line", "size": 1, "tooltip": true},
  "encoding": {
    "x": {
      "field": "date",
      "type": "temporal"
    },
    "y": {
      "field": "dollar_price",
      "type": "quantitative"
    },
    "color": {
      "field": "country_name",
      "type": "nominal"
    }
  }
}
Here is the result:

[Interactive plot: "Lines with Colour and Tooltip"]

That's starting to look like what we want, but it's still not quite right. It's a little difficult to trigger the tooltip when we hover because the lines are so thin. It might be nice to have little points like before, with lines connecting them. To do this, we'll need some way to compose two plots on top of each other: one for the points (and the tooltip) and another for the lines.

Layer

Another very nice thing about Vega-Lite is that it has many ways to compose different views together.
For this plot, we just need to layer one view on top of another, which we can do using the layer operator and passing in multiple views as an array.
Below, we move the x, color and y encodings to the top level of the spec because the lines and points share them. We then use the layer operator to give two separate view descriptions: one using the line mark and the other using the point mark. Finally, we add the tooltip to the layer with point marks so we get a little bit of interactivity.
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {
    "url": "data/bigmac.csv"
  },
  "title": "All Points with Colour and Tooltip",
  "encoding": {
    "x": {"field": "date", "type": "temporal"},
    "color": {"field": "country_name", "type": "nominal"},
    "y": {"field": "dollar_price", "type": "quantitative"}
  },
  "layer": [
    {"mark": "line"},
    {
      "mark": "point",
      "encoding": {
        "tooltip": [
          {"field": "country_name", "type": "nominal"},
          {"field": "dollar_price", "type": "quantitative"}
        ]
      }
    }
  ]
}
That's getting very close to what we want, but it's still a little difficult to read when all the points are close together. It would be nice to be able to zoom into the y-axis to get more detail. Adding this additional interactivity is left as an exercise for the reader (or maybe a follow-up post with fancier features).
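For readers who want a starting point for that exercise, one possible approach in Vega-Lite v5 is an interval selection parameter bound to the scales, which turns scrolling and dragging into zooming and panning. This is a sketch rather than the spec behind the plot in this post: the parameter name "grid" is arbitrary, and placing the parameter inside the point layer and restricting it to the y encoding are assumptions made to match the description above.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Illustrative sketch: layered plot with y-axis zoom via an interval selection bound to scales.",
  "data": {"url": "data/bigmac.csv"},
  "title": "All Points with Colour and Tooltip",
  "encoding": {
    "x": {"field": "date", "type": "temporal"},
    "color": {"field": "country_name", "type": "nominal"},
    "y": {"field": "dollar_price", "type": "quantitative"}
  },
  "layer": [
    {"mark": "line"},
    {
      "mark": "point",
      "params": [
        {
          "name": "grid",
          "select": {"type": "interval", "encodings": ["y"]},
          "bind": "scales"
        }
      ],
      "encoding": {
        "tooltip": [
          {"field": "country_name", "type": "nominal"},
          {"field": "dollar_price", "type": "quantitative"}
        ]
      }
    }
  ]
}
```

Replacing the `select` object with the plain string "interval" should allow zooming and panning on both axes instead of only the y-axis.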
Here's the final zoomable plot with a tooltip, points and lines:

[Interactive plot: final zoomable plot with tooltip, points and lines]


Conclusion

Vega-Lite is great for creating plots of your data.
By just declaring:
  • the types of your data
  • the graphical marks you want to use to display them
  • the data fields you want to map to visual encodings
you get a lovely plot which you can share and quite easily add interactivity to.
If you'd like to learn more about Vega-Lite, its docs are great. Also, check out the UW Interactive Data Lab's Visualization Curriculum, which uses the Vega-Lite JavaScript API. There are lots of other features we didn't touch on in this tutorial, like transformations, parameter binding and the other view composition methods, which are worth exploring too.
Thanks for reading! 🚀