A Simple Introduction to Vega-Lite
Build an interactive plot with ease using Vega-Lite
Created on August 9|Last edited on February 2
By building new ways to view and interact with your data, you can develop intuitions and gain insights that are almost impossible to learn without visualisations. With all the great visualisation tools available, it's easier than ever to create interactive plots and share your insights with the world.
One such tool is Vega-Lite. It allows you to declaratively create visualisations of your data, meaning that you tell it what you want and it'll sort out the how. This is in contrast to imperative visualisation libraries, in which you have to spell out each drawing step yourself.
In this tutorial, we'll make an interactive plot of Big Mac prices over time, grouped by country. We'll go through what a "grammar of graphics" is, how to connect data fields to visual properties with Vega-Lite, and finally how to add some interactivity to our plot.
We will use the Bigmac dataset from calmcode. You can see the data and an interactive plot below. It's zoomable and has a tooltip that displays info about each data point when you hover over it.
[Interactive plot: bigmac3]
Vega-Lite is a visualisation grammar, an implementation of a "grammar of graphics".
A grammar of graphics is a tool that enables us to concisely describe the components of a graphic.
A grammar of graphics aims to define a minimal common set of object interfaces which can create as many types of plots as possible.
The most famous and widely used implementation of a grammar of graphics is ggplot2. Vega-Lite is heavily inspired by ggplot2, so if you're interested in reading about the origins of its layered design, ggplot2's author, Hadley Wickham, describes it in "A layered grammar of graphics".
For a comprehensive look at the origins of the development of a grammar of graphics in general, "The Grammar of Graphics" by Leland Wilkinson goes into great detail about each component used in a graphic grammar.
What are the different parts of a grammar, and how does Vega-Lite represent them?
Data
Vega-Lite expects a column for each variable and a row for each observation. In our data, the columns (fields) are date, currency_code, name, local_price, dollar_ex and dollar_price, and each row is a sample of the price of a Big Mac at a given location and date.
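As a rough illustration of that layout, here is a minimal sketch that supplies a few rows inline through the values property of data instead of loading a file. The column names follow the list above; the numbers and country names are made up for illustration and are not taken from the real dataset. The mark and encoding parts are explained in the sections that follow.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch only: the rows below are made-up placeholders, not real Big Mac prices.",
  "data": {
    "values": [
      {"date": "2020-01-01", "currency_code": "AAA", "name": "Country A", "local_price": 10.0, "dollar_ex": 2.0, "dollar_price": 5.0},
      {"date": "2020-07-01", "currency_code": "AAA", "name": "Country A", "local_price": 11.0, "dollar_ex": 2.0, "dollar_price": 5.5},
      {"date": "2020-01-01", "currency_code": "BBB", "name": "Country B", "local_price": 30.0, "dollar_ex": 10.0, "dollar_price": 3.0}
    ]
  },
  "mark": "point",
  "encoding": {
    "x": {"field": "date", "type": "temporal"},
    "y": {"field": "dollar_price", "type": "quantitative"}
  }
}
```

Each object in values is one observation (one row), and each key is one variable (one column).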
Marks
Marks are the geometric shapes (points, lines, bars etc.) you want to use to visually represent the data.
Encoding Channels
This is where Vega-Lite gets interesting! I think because of the somewhat dry name, it seems less interesting than it actually is. It took me a bit of mental energy to get this to click, but I think it's a really powerful way to think about visualisation in general.
When first thinking of a visualisation, we can describe it to ourselves like:
I want my x-axis to be the foo and my y-axis to be bar ... oh, and I want the color to be baz!
where foo, bar and baz are fields in your data.
Every mark has attributes like position, size, colour etc. which present the underlying values in a visual way. In other words, these attributes are channels in which we can encode the data. This mapping from data to visual properties is set using encoding channels.
They bind fields of our data to the visual properties that represent them.
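The sentence about foo, bar and baz translates almost word for word into a spec. The sketch below assumes a hypothetical file data/my_data.csv with columns foo, bar and baz, and guesses that foo and bar are numeric while baz is a category; the type property used here is covered in the Fields and Types sections below.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch only: data/my_data.csv and the fields foo, bar and baz are hypothetical.",
  "data": {"url": "data/my_data.csv"},
  "mark": "point",
  "encoding": {
    "x": {"field": "foo", "type": "quantitative"},
    "y": {"field": "bar", "type": "quantitative"},
    "color": {"field": "baz", "type": "nominal"}
  }
}
```

Each key inside encoding is a channel, and each value says which field feeds that channel.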
Fields
A channel that is bound to data needs a field definition. This tells Vega-Lite which column in your data you want to represent. Each field definition must have a field name and a data type.
Types
For Vega-Lite to know how to display and compare your data, you need to tell it what type of data you have. It supports Nominal, Ordinal, Quantitative, Temporal and GeoJSON. More information on these types can be found in the Vega-Lite documentation.
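To illustrate why the type matters, here is a sketch that binds the colour channel to a quantitative field. It assumes the same data/bigmac.csv file used in this tutorial's examples. With a quantitative type, Vega-Lite should draw a continuous colour gradient; a nominal field on the same channel would instead get a palette of distinct colours, one per category.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch only: colour is bound to a quantitative field, so the legend is a continuous gradient.",
  "data": {"url": "data/bigmac.csv"},
  "mark": "point",
  "encoding": {
    "x": {"field": "date", "type": "temporal"},
    "y": {"field": "dollar_price", "type": "quantitative"},
    "color": {"field": "dollar_price", "type": "quantitative"}
  }
}
```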
Example
Here, we'll bind the 'x-position' encoding channel to the date column, with type set to temporal, which means that the x-axis will display a timeline of the data. We'll bind the 'y-position' encoding channel to the dollar_price column with type set to quantitative.
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {"url": "data/bigmac.csv"},
  "title": "All Points",
  "mark": "point",
  "encoding": {
    "x": {"field": "date", "type": "temporal"},
    "y": {"field": "dollar_price", "type": "quantitative"}
  }
}
We get this plot:
[Interactive plot]
What if we wanted to colour each point based on the country the dollar_price was recorded in?
To do this, let's bind the color encoding channel to the country_name field, which is of type nominal.
Let's also update the mark type to be line rather than point and add a tooltip when we mouse over the lines.
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {"url": "data/bigmac.csv"},
  "title": "Lines with Colour and Tooltip",
  "mark": {"type": "line", "size": 1, "tooltip": true},
  "encoding": {
    "x": {"field": "date", "type": "temporal"},
    "y": {"field": "dollar_price", "type": "quantitative"},
    "color": {"field": "country_name", "type": "nominal"}
  }
}
Here is the result:
[Interactive plot]
That's starting to look like what we want but it's still not quite right. It's a little difficult to see the tooltip when we hover because the points are so small. It might be nice to have little points like before and have lines connecting them. To do this, we'll need some way to compose two plots on top of each other, one for the points (and the tooltip) and another for the lines.
Layer
Another very nice thing about Vega-Lite is that it has many ways to compose different views together.
For this plot, we just need to layer one view on top of another, which we can do using the layer operator and passing in multiple views as an array.
Below, we pull our x, color and y encodings up to the top level because our lines and points share these same encodings. We then use the layer operator to have two separate view descriptions: one using a line mark and the other using a point mark. Finally, we add the tooltip to the layer using point marks so we can add a little bit of interactivity.
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {"url": "data/bigmac.csv"},
  "title": "All Points with Colour and Tooltip",
  "encoding": {
    "x": {"field": "date", "type": "temporal"},
    "color": {"field": "country_name", "type": "nominal"},
    "y": {"field": "dollar_price", "type": "quantitative"}
  },
  "layer": [
    {"mark": "line"},
    {
      "mark": "point",
      "encoding": {
        "tooltip": [
          {"field": "country_name", "type": "nominal"},
          {"field": "dollar_price", "type": "quantitative"}
        ]
      }
    }
  ]
}
That's getting very close to what we want, but it's still a little difficult to see when all the points are close together. It would be nice to be able to zoom into the y-axis to get more detail. Adding this additional interactivity is left as an exercise for the reader (or maybe a follow-up post with more fancy features).
Here's the final zoomable plot with a tooltip, points and lines:
[Interactive plot]
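For readers who want a starting point for the zoom exercise mentioned above, one possible approach in Vega-Lite v5 is an interval selection parameter bound to the scales and restricted to the y encoding. This is a sketch, not necessarily how the plot above was built; the parameter name y_zoom is an arbitrary choice, and the parameter is placed on the point layer on the assumption that both layers share the same y scale.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch only: the parameter name y_zoom is arbitrary; scroll to zoom and drag to pan along the y-axis.",
  "data": {"url": "data/bigmac.csv"},
  "title": "Points and Lines with Tooltip and Y-Axis Zoom",
  "encoding": {
    "x": {"field": "date", "type": "temporal"},
    "color": {"field": "country_name", "type": "nominal"},
    "y": {"field": "dollar_price", "type": "quantitative"}
  },
  "layer": [
    {"mark": "line"},
    {
      "mark": "point",
      "params": [
        {
          "name": "y_zoom",
          "select": {"type": "interval", "encodings": ["y"]},
          "bind": "scales"
        }
      ],
      "encoding": {
        "tooltip": [
          {"field": "country_name", "type": "nominal"},
          {"field": "dollar_price", "type": "quantitative"}
        ]
      }
    }
  ]
}
```

Dropping the encodings entry from select should let the same parameter zoom and pan both axes at once.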
Conclusion
Vega-Lite is great for creating plots of your data.
By just declaring:
- the types of your data
- the graphical marks you want to use to display them
- the data fields you want to map to visual encodings
you get a lovely plot which you can share and quite easily add interactivity to.
If you'd like to learn more about Vega-Lite, its docs are great. Also, check out the UW Interactive Data Lab's Visualization Curriculum, which uses the Vega-Lite JavaScript API. There are lots of other features we didn't touch on in this tutorial, like transformations, parameter binding and the other view composition methods, which are worth exploring too.
Thanks for reading! 🚀