Using Vega Lite to display already-aggregated data - vega-lite

I'm trying to show a stacked bar chart of sums over time. The data looks something like this:
[
{
"date": 12345,
"sumA": 100,
"sumB": 150
},
...
]
I'm encoding the x axis to the field "date". I need the bar at date 12345 to be stacked with one part being 100 high, and the other, shown in another color, being 150 high.
Vega Lite seems to expect the raw data, but that would be too slow, so I do the aggregation on the server side to save time. Can I spoon-feed Vega Lite the aggregates, as in my example above?

You can use the fold transform to fold your two columns into one, and then the channel encodings take care of the rest. For example (vega editor):
{
"data": {
"values": [
{"date": 1, "sumA": 100, "sumB": 150},
{"date": 2, "sumA": 200, "sumB": 50},
{"date": 3, "sumA": 80, "sumB": 120},
{"date": 4, "sumA": 120, "sumB": 30},
{"date": 5, "sumA": 150, "sumB": 110}
]
},
"transform": [
{"fold": ["sumA", "sumB"], "as": ["column", "value"]}
],
"mark": {"type": "bar"},
"encoding": {
"x": {"type": "ordinal", "field": "date"},
"y": {"type": "quantitative", "field": "value"},
"color": {"type": "nominal", "field": "column"}
}
}
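To see what the fold transform does to the rows before they reach the encodings, here is a minimal Python sketch of the same reshaping. The function name `fold` and its parameters are made up for illustration; Vega-Lite performs this step for you inside the spec.

```python
def fold(rows, fields, as_=("key", "value")):
    """Reshape wide rows into long rows, one output row per folded field."""
    key_name, value_name = as_
    out = []
    for row in rows:
        for field in fields:
            long_row = dict(row)  # the original fields are kept alongside the new ones
            long_row[key_name] = field
            long_row[value_name] = row[field]
            out.append(long_row)
    return out


rows = [
    {"date": 1, "sumA": 100, "sumB": 150},
    {"date": 2, "sumA": 200, "sumB": 50},
]
folded = fold(rows, ["sumA", "sumB"], as_=("column", "value"))
for r in folded:
    print(r["date"], r["column"], r["value"])
```

Each date now has two rows, one per original column, which is what lets the bar mark stack them and the color channel tell them apart.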

Related

Fill missing data with 0 in specified date range with Vega-Lite

I have to render a line-chart that consumes data from an API and it only returns values for the days that do have some data. For the days where there is no data, it does not return an entry with 0 as it'd be expected.
This means that the chart doesn't represent values with 0, which is an issue.
I can't modify this API, so my question would be if there is a way I can tell vega-lite to render data within a date range and, if there is no data for some day, show it as 0.
I guess I'd be able to transform the data before sending it to my react-vega component, but if this can be done by vega-lite, it'd be much better.
You can use impute (you have to supply the dates converted to numbers in the impute, though; I have raised a bug here).
The spec below imputes a zero value for 2012-01-05:
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"description": "Using utc scale with local time input.",
"data": {
"values": [
{"date": "2012-01-01", "price": 150},
{"date": "2012-01-02", "price": 100},
{"date": "2012-01-03", "price": 170},
{"date": "2012-01-04", "price": 165},
{"date": "2012-01-06", "price": 200}
]
},
"transform": [
{
"impute": "price",
"key": "date",
"value": 0,
"keyvals": [
1325376000000,
1325462400000,
1325548800000,
1325635200000,
1325721600000,
1325808000000
]
},
{"timeUnit": "day", "field": "date", "as": "dateTU"}
],
"mark": "line",
"encoding": {
"x": {"field": "date", "timeUnit": "date"},
"y": {"field": "price", "type": "quantitative"}
}
}
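Since the keyvals have to be given as numbers, writing them out by hand is error-prone. Here is a minimal sketch for generating them, assuming the dates are meant as UTC midnights; the helper name `utc_day_keyvals` is my own.

```python
from datetime import date, datetime, timedelta, timezone


def utc_day_keyvals(start, end):
    """Return epoch milliseconds for UTC midnight of each day in [start, end]."""
    keyvals = []
    day = start
    while day <= end:
        midnight = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
        keyvals.append(int(midnight.timestamp()) * 1000)
        day += timedelta(days=1)
    return keyvals


keyvals = utc_day_keyvals(date(2012, 1, 1), date(2012, 1, 6))
print(keyvals)
```

The resulting list can be pasted into the "keyvals" array of the impute transform.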

Vega-lite: skip invalid values instead of treating as 0 for aggregate sum

When doing an aggregate sum on a column and charting it using Vega-Lite, is it possible to skip invalid values instead of treating them as 0 when doing the addition? When there is missing/invalid data, I want to show it as such, rather than as 0.
For example, this graph is what I expect when aggregating on date to get the sums for x and y.
Whereas in this example, the y value for both rows where date=2022-01-20 are NaN, so I would want there to be no data point for the sum of column y and show it as missing data, instead of as 0.
Is there a way to do that? I’ve looked through the documentation but may have missed something. I've tried using filter like so, but that filters out an entire row, rather than just the invalid value of a particular column for the row when doing the sum.
I’m thinking of something like pandas GroupBy.sum(min_count=1), so that if there isn't at least 1 non-NaN value, the result will be presented as NaN.
OK, try this, which removes NaN and null but leaves zero (Editor).
Or this, which removes a load of useless transforms (Editor):
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"description": "Google's stock price over time.",
"data": {
"values": [
{"date": "2022-01-20", "g": "apples", "x": "NaN", "y": "NaN"},
{"date": "2022-01-20", "g": "oranges", "x": "10", "y": "20"},
{"date": "2022-01-21", "g": "oranges", "x": "30", "y": "NaN"},
{"date": "2022-01-21", "g": "grapes", "x": "40", "y": "20"},
{"date": "2022-01-22", "g": "apples", "x": "NaN", "y": "NaN"},
{"date": "2022-01-22", "g": "grapes", "x": "10", "y": "NaN"}
]
},
"transform": [
{"calculate": "parseFloat(datum['x'])", "as": "x"},
{"calculate": "parseFloat(datum['y'])", "as": "y"},
{"fold": ["x", "y"]},
{"filter": {"field": "value", "valid": true}},
{
"aggregate": [{"op": "sum", "field": "value", "as": "value"}],
"groupby": ["date", "key"]
}
],
"encoding": {"x": {"field": "date", "type": "temporal"}},
"layer": [
{
"encoding": {
"y": {"field": "value", "type": "quantitative"},
"color": {"field": "key", "type": "nominal"}
},
"mark": "line"
},
{
"encoding": {
"y": {"field": "value", "type": "quantitative"},
"color": {"field": "key", "type": "nominal"}
},
"mark": {"type": "point", "tooltip": {"content": "encoding"}}
}
]
}
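The order of the transforms matters here: the invalid values are dropped after folding, so only the bad cell disappears rather than the whole row, and a date/key group with no valid values produces no row at all. A rough Python sketch of that pipeline on the same data follows. The function name is mine, and it only mimics the transforms; it is not how Vega-Lite implements them.

```python
import math
from collections import defaultdict


def sum_valid(rows, fields):
    """Fold `fields`, drop None/NaN cells, then sum per (date, key)."""
    totals = defaultdict(float)
    for row in rows:
        for key in fields:
            raw = row.get(key)
            if raw is None:
                continue
            value = float(raw)  # float("NaN") parses to nan
            if math.isnan(value):
                continue  # skip only this cell, not the whole row
            totals[(row["date"], key)] += value
    return dict(totals)


rows = [
    {"date": "2022-01-20", "g": "apples", "x": "NaN", "y": "NaN"},
    {"date": "2022-01-20", "g": "oranges", "x": "10", "y": "20"},
    {"date": "2022-01-21", "g": "oranges", "x": "30", "y": "NaN"},
    {"date": "2022-01-21", "g": "grapes", "x": "40", "y": "20"},
    {"date": "2022-01-22", "g": "apples", "x": "NaN", "y": "NaN"},
    {"date": "2022-01-22", "g": "grapes", "x": "10", "y": "NaN"},
]
totals = sum_valid(rows, ["x", "y"])
for (day, key), total in sorted(totals.items()):
    print(day, key, total)
```

Note that there is no entry for y on 2022-01-22, which is why the y line simply has no point there instead of dropping to 0.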

How to set different colors area chart depending on value Vega Lite

I want to create an area chart in which the area above y=0 is one color and the area below is another. I have a problem with setting a conditional color in Vega-Lite. (I'm using the guide from https://vega.github.io/vega-lite/docs/condition.html)
"color": {
"condition": {
"test": "datum['y'] < 0",
"value": {
"x1": 1,
"y1": 1,
"x2": 1,
"y2": 0,
"gradient": "linear",
"stops": [
{
"offset": 0,
"color": "white"
},
{
"offset": 1,
"color": "orange"
}
]
}
},
"value": {
// otherValue
}
}
Full code here:
https://codesandbox.io/s/interactive-vega-lite-bar-chart-forked-b0hnh?file=/index.html
The condition does not work on the area mark. I tried the same with the bar and point marks and it worked. I still managed to get a gradient effect with the colors red, white and orange. You can refer to the config below or to the link:
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"description": "Google's stock price over time.",
"data": {
"values": [
{"date": "2021-03-29", "value": -10, "rValue": 0},
{"date": "2021-03-28", "value": -20, "rValue": 0},
{"date": "2021-03-27", "value": 20, "rValue": 0},
{"date": "2021-03-26", "value": 40, "rValue": 0},
{"date": "2021-03-25", "value": 35, "rValue": 0}
]
},
"width": 400,
"height": 200,
"mark": {"type": "area"},
"encoding": {
"x": {
"field": "date",
"type": "temporal",
"axis": {"format": "%d/%m/%Y", "labelPadding": 10}
},
"y": {"field": "value", "type": "quantitative"},
"fill": {
"value": {
"gradient": "linear",
"stops": [
{"offset": 0, "color": "red"},
{"offset": 0.5, "color": "white"},
{"offset": 1, "color": "orange"}
]
}
}
},
"config": {}
}
Also, I tried your area mark with a simple configuration and that didn't work. I have created a config with a bar mark that works properly with the condition, but if the mark is changed to area, then it does not work.
One way to do this is to overlay a separate layer.
However, I'm not sure how to color the bit before the first negative value.
Open the Chart in the Vega Editor
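For reference, here is one possible layering, which is not necessarily what the linked chart does: clamp the values into a positive and a negative field with calculate transforms and draw one area per field. The sketch builds the spec as a Python dict; the helper name, the field names "pos" and "neg", and the colors are my own choices. Because each clamped series only reaches zero at the next data point, both layers draw a small wedge around a zero crossing, so the color boundary is only approximate unless the data is sampled densely.

```python
import json


def signed_area_spec(values):
    """Build a layered Vega-Lite spec: one area for values >= 0, one for values < 0."""

    def area_layer(expr, field, color):
        return {
            "transform": [{"calculate": expr, "as": field}],
            "mark": {"type": "area", "color": color},
            "encoding": {
                "y": {"field": field, "type": "quantitative", "title": "value"}
            },
        }

    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"values": values},
        "encoding": {"x": {"field": "date", "type": "temporal"}},
        "layer": [
            area_layer("max(datum.value, 0)", "pos", "orange"),  # part above zero
            area_layer("min(datum.value, 0)", "neg", "red"),     # part below zero
        ],
    }


spec = signed_area_spec([
    {"date": "2021-03-25", "value": 35},
    {"date": "2021-03-26", "value": 40},
    {"date": "2021-03-27", "value": 20},
    {"date": "2021-03-28", "value": -20},
    {"date": "2021-03-29", "value": -10},
])
print(json.dumps(spec, indent=2))
```

The printed JSON can be pasted into the Vega Editor to check how it renders with your data.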

How do I create a legend for a layered line plot

Basically, what I have is a line graph that is layered from several line graphs. Since each graph has only one line, no legend is generated automatically, so what is the best way to get a legend for the chart? I have been considering transforming my dataset. It contains weekly death totals from the CDC from 2019 to June 2020. The CSV is arranged so that each date for each state has a record, with each disease type as its own column and integers as the column values. So there isn't one field to chart, there are many, hence the layering. Any insights into how to solve this problem would be much appreciated! Here is my work so far:
https://observablehq.com/#justin-krohn/covid-excess-deaths
You can create a legend for a layered chart by setting the color encoding for each layer to a datum specifying what label you would like it to have. For example (vega editor):
{
"data": {
"values": [
{"x": 1, "y1": 1, "y2": 2},
{"x": 2, "y1": 3, "y2": 1},
{"x": 3, "y1": 2, "y2": 4},
{"x": 4, "y1": 4, "y2": 3},
{"x": 5, "y1": 3, "y2": 5}
]
},
"encoding": {"x": {"field": "x", "type": "quantitative"}},
"layer": [
{
"mark": "line",
"encoding": {
"y": {"field": "y1", "type": "quantitative"},
"color": {"datum": "y1"}
}
},
{
"mark": "line",
"encoding": {
"y": {"field": "y2", "type": "quantitative"},
"color": {"datum": "y2"}
}
}
]
}
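With many columns, writing one layer per column by hand gets tedious. Here is a small sketch that generates the layers from a list of column names. The helper name is mine, and the output follows the same pattern as the spec above.

```python
import json


def line_layers(fields):
    """One line layer per column; the datum string becomes the legend label."""
    return [
        {
            "mark": "line",
            "encoding": {
                "y": {"field": field, "type": "quantitative"},
                "color": {"datum": field},
            },
        }
        for field in fields
    ]


layers = line_layers(["y1", "y2"])
print(json.dumps(layers, indent=2))
```

The printed list can be used as the value of "layer" in the spec.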
Alternatively, you can use a Fold Transform to pivot your data so that instead of manual layers, you can plot the multiple lines with a simple color encoding. For example (vega editor):
{
"data": {
"values": [
{"x": 1, "y1": 1, "y2": 2},
{"x": 2, "y1": 3, "y2": 1},
{"x": 3, "y1": 2, "y2": 4},
{"x": 4, "y1": 4, "y2": 3},
{"x": 5, "y1": 3, "y2": 5}
]
},
"transform": [{"fold": ["y1", "y2"], "as": ["name", "y"]}],
"mark": "line",
"encoding": {
"x": {"field": "x", "type": "quantitative"},
"y": {"field": "y", "type": "quantitative"},
"color": {"field": "name", "type": "nominal"}
}
}

how to define json for multi-line graph data

I have been trying to find out how to define the data for a multi-line graph in vega-lite but I can't get it to work. The examples show data for a csv file at a URL endpoint ( https://vega.github.io/vega-editor/?mode=vega-lite&spec=line_color&showEditor=1 ), but I want to view the data I define in a simple json.
Here is what I have for a single line graph:
var LineSpec = {
"description": "variation over time for",
"data": {
"values":
[
{"date": "2012-04-23T18:25:43.511Z","price": 10},
{"date": "2012-04-25T18:25:43.511Z","price": 7},
{"date": "2012-04-27T18:25:43.511Z","price": 4},
{"date": "2012-05-01T18:25:43.511Z","price": 1},
{"date": "2012-05-03T18:25:43.511Z","price": 2},
{"date": "2012-05-05T18:25:43.511Z","price": 6},
{"date": "2012-05-07T18:25:43.511Z","price": 8},
{"date": "2012-05-09T18:25:43.511Z","price": 4},
{"date": "2012-05-11T18:25:43.511Z","price": 7}
]
},
"mark": "line",
"encoding": {
"x": {"field": "date", "type": "temporal"},
"y": {"field": "price", "type": "quantitative"},
"color": {"field": "symbol", "type": "nominal"}
}
};
How do I modify "data" so as to display a multi-line graph? (And, if possible, display more useful information than undefined in the symbol table.) Here is what I see right now:
Line graph with undefined symbol
Thank you!
You will have to add the symbol field to your data. I added the symbol field and symbols A and B. This data should render a multi-line graph with the two symbols in the legend:
{
"description": "variation over time for",
"data": {
"values": [
{"date": "2012-04-23T18:25:43.511Z","price": 10, "symbol": "A"},
{"date": "2012-04-25T18:25:43.511Z","price": 7, "symbol": "B"},
{"date": "2012-04-27T18:25:43.511Z","price": 4, "symbol": "A"},
{"date": "2012-05-01T18:25:43.511Z","price": 1, "symbol": "B"},
{"date": "2012-05-03T18:25:43.511Z","price": 2, "symbol": "A"},
{"date": "2012-05-05T18:25:43.511Z","price": 6, "symbol": "B"},
{"date": "2012-05-07T18:25:43.511Z","price": 8, "symbol": "A"},
{"date": "2012-05-09T18:25:43.511Z","price": 4, "symbol": "B"},
{"date": "2012-05-11T18:25:43.511Z","price": 7, "symbol": "A"}
]
},
"mark": "line",
"encoding": {
"x": {"field": "date", "type": "temporal"},
"y": {"field": "price", "type": "quantitative"},
"color": {"field": "symbol", "type": "nominal"}
}
}
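If your data starts out as one list per series, it has to be flattened into this one-row-per-point shape first. Here is a minimal sketch, assuming a dict that maps each symbol to its (date, price) pairs; that input layout and the function name are assumptions for illustration.

```python
def to_long(series):
    """Flatten {symbol: [(date, price), ...]} into one row per point."""
    return [
        {"date": date, "price": price, "symbol": symbol}
        for symbol, points in series.items()
        for date, price in points
    ]


values = to_long({
    "A": [("2012-04-23T18:25:43.511Z", 10), ("2012-04-27T18:25:43.511Z", 4)],
    "B": [("2012-04-25T18:25:43.511Z", 7), ("2012-05-01T18:25:43.511Z", 1)],
})
for row in values:
    print(row)
```

The resulting list goes into "data": {"values": ...}, and the color encoding on "symbol" then draws one line per symbol.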