I'm trying to build a visualization for histograms of numerical data using Vega-Lite. Right now I am prototyping the visualization using a very simple mock dataset (also available here):
{
"data": {
"fill": [
{"count": 30000, "level": "filled"},
{"count": 50000, "level": "missing"}
],
"histogram": [
{"bin_end": 20, "bin_start": 0, "count": 1000},
{"bin_end": 30, "bin_start": 20, "count": 20000}
]
},
"metadata": {}
}
The data format above is predetermined, and unfortunately I am not able to change it because it comes from an API. I'm trying to use the histogram section of the data to plot, well, a histogram, and the fill section of the data to plot a simple bar chart. Something like this:
I understand that I can use the "property" option to access nested data like this, as documented in this section of the Vega documentation, and this works as long as I am only plotting one of the charts, as shown by the examples below:
Example 1 in Vega Editor: Histogram only
Example 2 in Vega Editor: Barplot only
However, when I try to put both of them together it simply does not work. I get the weird chart below, where it seems that the data for the barplot is completely absent.
Link to vega editor for weird chart
And when inspecting the data using the Vega Editor's built-in Data Viewer, it seems that only the histogram data is being read.
Furthermore, this behavior seems to be order dependent, as switching the order of the charts in the HConcat block changes which chart gets messed up:
Inverted Chart
Am I missing something here? Is this some sort of limitation of Vega-Lite?
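You're missing the name property on each data definition. Both charts load the same URL, so without distinct names it looks like the data was simply overwritten by whatever was retrieved last. Here you go, with the two datasets named "a" and "b":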
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.2.0.json",
"config": {"view": {"continuousHeight": 300, "continuousWidth": 400}},
"hconcat": [
{
"data": {"name": "a",
"format": {"type": "json", "property": "data.histogram"},
"url": "https://gist.githubusercontent.com/hemagso/f7b4381be43b34ece4d8aa78c936c7d5/raw/0bae0177b8a2a5d33e23c0d164d4439d248aa9ff/mock,json"
},
"encoding": {
"x": {
"bin": {"binned": true},
"field": "bin_start",
"scale": {"type": "linear"}
},
"x2": {"field": "bin_end"},
"y": {"field": "count", "type": "quantitative"}
},
"mark": "bar"
},
{
"data": {"name": "b",
"format": {"type": "json", "property": "data.fill"},
"url": "https://gist.githubusercontent.com/hemagso/f7b4381be43b34ece4d8aa78c936c7d5/raw/0bae0177b8a2a5d33e23c0d164d4439d248aa9ff/mock,json"
},
"encoding": {
"color": {"field": "level", "type": "nominal"},
"x": {"field": "level", "type": "nominal"},
"y": {"field": "count", "type": "quantitative"}
},
"mark": "bar"
}
]
}
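To make the two "property" paths concrete, here is a minimal Python sketch of what each one selects from the mock payload in the question. It is not how Vega-Lite loads data internally; the helper name get_property is hypothetical and only illustrates the dotted-path lookup.

```python
import json

# Mock payload copied from the question.
PAYLOAD = json.loads("""
{
  "data": {
    "fill": [
      {"count": 30000, "level": "filled"},
      {"count": 50000, "level": "missing"}
    ],
    "histogram": [
      {"bin_end": 20, "bin_start": 0, "count": 1000},
      {"bin_end": 30, "bin_start": 20, "count": 20000}
    ]
  },
  "metadata": {}
}
""")


def get_property(obj, path):
    """Follow a dotted path such as "data.fill" into nested dicts (hypothetical helper)."""
    for key in path.split("."):
        obj = obj[key]
    return obj


# Each chart needs its own row set, which is why the two datasets get distinct names.
histogram_rows = get_property(PAYLOAD, "data.histogram")
fill_rows = get_property(PAYLOAD, "data.fill")

print(histogram_rows)
print(fill_rows)
```

The two results are different row sets taken from the same file, which matches the answer's point that each one needs its own dataset name.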
As a learning exercise I decided to try plotting a density plot of continuously-compounded daily returns of the Nasdaq 100 index for calendar year 2020. I am unable to get vega-lite to produce any visualization, and yet there are no errors in the online editor. I'm just inexplicably given an empty plot.
Because of the embedded data, the plot spec is some 2500 lines long, so I've saved it as a gist: https://gist.github.com/nathanvy/2c080ee0b7e93b11e544c5275d31f2b1
What am I missing?
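In the x encoding, change the field from logreturn to value. The density transform replaces the input rows with new rows whose fields are named value and density, so logreturn no longer exists once the transform has run (embedded data omitted here):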
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"width": 300,
"height": 300,
"title": "Nasdaq 100 (NDX) Log Returns, 2020",
"mark": "area",
"transform": [{"density": "logreturn"}],
"encoding": {
"x": {
"field": "value",
"title": "Logarithmic Daily Return",
"type": "quantitative"
},
"y": {
"field": "density",
"title": "Probability of Return",
"type": "quantitative"
}
}
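As a rough illustration of why the encoding must reference value rather than logreturn, here is a minimal Python sketch of a Gaussian kernel density estimate that returns rows shaped like the density transform's output. It is an assumption-laden stand-in, not Vega's actual implementation: the sample returns, the fixed bandwidth, and the number of steps are all hypothetical.

```python
import math

# Hypothetical log returns; the real data lives in the gist linked in the question.
ROWS = [{"logreturn": x} for x in (-0.02, -0.01, 0.0, 0.01, 0.03)]


def density(rows, field, bandwidth, steps=5):
    """Gaussian KDE sampled on an even grid; output rows only have value/density."""
    xs = [row[field] for row in rows]
    lo, hi = min(xs), max(xs)
    norm = len(xs) * bandwidth * math.sqrt(2 * math.pi)
    out = []
    for i in range(steps):
        v = lo + (hi - lo) * i / (steps - 1)
        d = sum(math.exp(-0.5 * ((v - x) / bandwidth) ** 2) for x in xs) / norm
        out.append({"value": v, "density": d})
    return out


curve = density(ROWS, "logreturn", bandwidth=0.01)
for point in curve:
    print(point)
```

None of the output rows carries a logreturn key, so an x encoding that still points at logreturn has nothing to draw.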
So I am working with Vega-Lite v4 (that's the version our business's Airtable extension uses), and the answer to my previous post was that I need to use the pivot transform.
But any time I try to use it as explained in the v4 documentation (https://vega.github.io/vega-lite-v4/docs/pivot.html), it throws an error as if the pivoted field does not exist.
I've used the following test data:
Airtable test data
With the following test code:
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.json",
"title": "Table 2",
"transform": [{
"pivot": "type",
"value": "calls",
"groupby": ["Month"]
}],
"mark": "bar",
"encoding": {
"x": {"field": "Month", "type": "nominal"},
"y": {"field": "Total", "type": "quantitative"}
}
}
And I still get the same error:
Total is not a valid transform name, inline data name, or field name from the Table 2 table:
"y": {
"field": "Total",
------------^
"type": "quantitative"
}
Even when I copy and paste the examples from the above documentation into the widget, it comes up with this error, as if pivot isn't creating these fields.
Can anyone help me figure out why this isn't working, or what to use instead?
EDIT:
So, a weird solution/workaround I found is to calculate the field as itself:
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.json",
"title": "Table 2",
"transform": [{
"pivot": "type",
"value": "calls",
"groupby": ["Month"]
},
{"calculate" : "datum.Total", "as" : "newTotal"}
],
"mark": "bar",
"encoding": {
"x": {"field": "Month", "type": "nominal"},
"y": {"field": "newTotal", "type": "quantitative"}
}
}
This makes the graph behave completely as normal. I can use this for now, but it means I have to hard-code each field name with a calculate transform. Does this help anyone understand what's going on with this transform?
First of all, Vega and Vega-lite field names are case-sensitive, so "Month" is not the same as "month".
In your first code sample, "month" is incorrect and should be "Month":
"x": {"field": "month", "type": "nominal"},
but in the second code sample that was changed to "Month" which is correct:
"x": {"field": "Month", "type": "nominal"},
Try just correcting field name "Month" in the first code sample without calculating "newTotal":
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.json",
"title": "Table 2",
"transform": [{
"pivot": "type",
"value": "calls",
"groupby": ["Month"]
}],
"mark": "bar",
"encoding": {
"x": {"field": "Month", "type": "nominal"},
"y": {"field": "Total", "type": "quantitative"}
}
}
[EDIT: added following]
Here is a working example using your example data with pivot transform and rendered as bar chart by Vega-lite v5.2.0 with no errors.
Try using Vega-lite v5.2.0 instead of v4.
View in Vega on-line editor
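For readers unsure where a field like "Total" comes from, here is a minimal Python sketch of what a pivot on type, with value calls and groupby Month, computes. The rows are hypothetical (the Airtable test data is not reproduced in this thread), the function is only an illustration of the reshaping, and it sums values per cell, which is the pivot transform's default aggregation.

```python
from collections import defaultdict

# Hypothetical rows standing in for the Airtable test data.
ROWS = [
    {"Month": "Jan", "type": "Total", "calls": 10},
    {"Month": "Jan", "type": "Missed", "calls": 2},
    {"Month": "Feb", "type": "Total", "calls": 7},
    {"Month": "Feb", "type": "Missed", "calls": 1},
]


def pivot(rows, pivot_field, value_field, groupby):
    """Turn each distinct value of pivot_field into its own column, summing value_field."""
    grouped = defaultdict(dict)
    for row in rows:
        key = tuple(row[g] for g in groupby)
        cell = grouped[key]
        column = row[pivot_field]
        cell[column] = cell.get(column, 0) + row[value_field]
    # One output row per group: the groupby fields plus one field per pivoted value.
    return [dict(zip(groupby, key), **columns) for key, columns in grouped.items()]


pivoted = pivot(ROWS, "type", "calls", ["Month"])
for row in pivoted:
    print(row)
```

The new field names are the values found in the type column, so "Total" only exists after the pivot has seen the data. That may be why a validator that checks field names against the original table rejects it.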
Is there a Vega/Vega-Lite transform which I can use to select the first n rows in data set?
Suppose I get a dataset from a URL such as:
Person  Height
Jeremy  6.2
Alice   6.0
Walter  5.8
Amy     5.6
Joe     5.5
and I want to create a bar chart showing the height of only the three tallest people. Assume that we know for certain that the dataset from the URL is already sorted. Assume that we cannot change the data as returned by the URL.
I want to do something like this:
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"data": {
"url": "heights.csv"
},
"transform": [
{"head": 3}
],
"mark": "bar",
"encoding": {
"x": {"field": "Person", "type": "nominal"},
"y": {"field": "Height", "type": "quantitative"}
}
}
except that the head transform does not actually exist. Is there something else I can do to get the same effect?
The Vega-Lite documentation has an example along these lines in filtering top-k items.
Your case is a bit more specialized: you do not want to order based on rank, but rather based on the original ordering of the data. You can do this using a count-based window transform followed by an appropriate filter. For example (view in editor):
{
"data": {
"values": [
{"Person": "Jeremy", "Height": 6.2},
{"Person": "Alice", "Height": 6.0},
{"Person": "Walter", "Height": 5.8},
{"Person": "Amy", "Height": 5.6},
{"Person": "Joe", "Height": 5.5}
]
},
"transform": [
{"window": [{"op": "count", "as": "count"}]},
{"filter": "datum.count <= 3"}
],
"mark": "bar",
"encoding": {
"x": {"field": "Height", "type": "quantitative"},
"y": {"field": "Person", "type": "nominal", "sort": null}
}
}
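The same two steps, a running count followed by a filter, can be sketched in plain Python to show what the transforms compute. This is only an illustration of the logic, not Vega-Lite code; the function name head is hypothetical and the rows are the ones from the example above.

```python
ROWS = [
    {"Person": "Jeremy", "Height": 6.2},
    {"Person": "Alice", "Height": 6.0},
    {"Person": "Walter", "Height": 5.8},
    {"Person": "Amy", "Height": 5.6},
    {"Person": "Joe", "Height": 5.5},
]


def head(rows, n):
    """Keep the first n rows by attaching a running count, then filtering on it."""
    # Step 1 (window): add a 1-based running count in the original row order.
    counted = [dict(row, count=i) for i, row in enumerate(rows, start=1)]
    # Step 2 (filter): keep rows whose count is at most n.
    return [row for row in counted if row["count"] <= n]


top3 = head(ROWS, 3)
for row in top3:
    print(row)
```

Because the count follows the incoming row order, this relies on the data already being sorted, as the question assumes.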
The example below uses the mean aggregation. How can I instead aggregate by multiplying all the elements together?
And is it possible to use a custom JS function, such as const myfn = (list) => list.length? (I know there's a built-in count; it's just to illustrate the idea.)
Playground
{
"data": {"url": "data/cars.json"},
"mark": "bar",
"encoding": {
"x": {"field": "Cylinders", "type": "ordinal"},
"y": {"aggregate": "mean", "field": "Acceleration", "type": "quantitative"}
}
}
Unfortunately, product is not one of the built-in aggregations in Vega-Lite, and by design the schema does not support injecting arbitrary JavaScript functions (it supports a limited Vega Expression syntax). Unless you preprocess your data before injecting it into the Vega-Lite specification, you're limited to building your custom computation from the operations available there.
For your specific question, since the log of a product equals the sum of logs, one way you could compute the product within the specification is via a series of transforms like this (playground):
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.json",
"data": {"url": "data/cars.json"},
"transform": [
{"calculate": "log(datum.Acceleration)", "as": "logA"},
{"aggregate": [{"op": "sum", "field": "logA", "as": "log_prod_A"}], "groupby": ["Cylinders"]},
{"calculate": "exp(datum.log_prod_A)", "as": "prod_A"}
],
"mark": "bar",
"encoding": {
"x": {"field": "Cylinders", "type": "ordinal"},
"y": {"field": "prod_A", "type": "quantitative", "title": "prod(A)"}
}
}
A single bar dominates because there are many more entries with 4 Cylinders than with other numbers.
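The identity behind this trick, that the product equals exp of the sum of logs, can be checked with a minimal Python sketch. The rows are hypothetical stand-ins for cars.json, and the function name is an illustration only; like the spec above, the approach assumes every value is strictly positive, since log is undefined otherwise.

```python
import math
from collections import defaultdict

# Hypothetical rows standing in for data/cars.json.
ROWS = [
    {"Cylinders": 4, "Acceleration": 15.0},
    {"Cylinders": 4, "Acceleration": 16.5},
    {"Cylinders": 4, "Acceleration": 14.0},
    {"Cylinders": 8, "Acceleration": 12.0},
    {"Cylinders": 8, "Acceleration": 11.5},
]


def product_by_group(rows, group_field, value_field):
    """Per-group product computed as exp(sum(log(x))), mirroring the three transforms."""
    log_sums = defaultdict(float)
    for row in rows:
        # calculate: log of each value; aggregate: sum of logs per group
        log_sums[row[group_field]] += math.log(row[value_field])
    # calculate: exponentiate the summed logs to recover the product
    return {group: math.exp(total) for group, total in log_sums.items()}


products = product_by_group(ROWS, "Cylinders", "Acceleration")
print(products)
```

Groups with more rows multiply more factors together, which is the same effect that makes one bar dominate in the chart.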
I am trying to make a sorted bar chart with labels and a fill encoding. But when I add the fill encoding, it breaks the sort. Judging by the GitHub issues, it seems there are ways to get around this, but I can't seem to find a solution.
With the spec below, which does not use the fill encoding, the sorting works as expected.
{
"$schema": "https://vega.github.io/schema/vega-lite/v3.json",
"data": {
"values": [
{
"a": "A",
"b": 28,
"color": "black"
},
{
"a": "B",
"b": 55,
"color": "grey"
},
{
"a": "C",
"b": 43,
"color": "red"
}
]
},
"encoding": {
"y": {
"field": "a",
"type": "ordinal",
"sort": {
"encoding": "x",
"order": "descending"
}
},
"x": {
"field": "b",
"type": "quantitative"
}
},
"layer": [
{
"mark": "bar"
},
{
"mark": {
"type": "text",
"align": "left",
"baseline": "middle",
"dx": 3
},
"encoding": {
"text": {
"field": "b",
"type": "quantitative"
}
}
}
]
}
When you add the following fill encoding to the top-level encoding object, it breaks the sort and produces the warning shown after it:
"fill": {
"field": "color",
"type": "ordinal",
"scale": null
}
[Warning] Domains that should be unioned has conflicting sort properties. Sort will be set to true.
Full vega-editor here
Is there a workaround for this?
It appears to relate to these issues (maybe): #2536, #5408
Yep, the underlying issue is https://github.com/vega/vega-lite/issues/5048. In this particular case, adding color to one layer adds a stack transform to one part of the dataflow but not the other, so we cannot merge it. This is a great test case. Can you add this example to a new GitHub issue so we can try to resolve it?
You can manually fix this example by disabling stacking on the x encoding, that is, by adding the following to the x channel definition:
"stack": null
See this spec.