Vega-Lite: symlog scale with more tick marks

Is there a way to use the symlog scale, but with tick marks placed on many orders of magnitude?
To explain, the log scale has tick marks on many orders of magnitude, but cannot show values ≤ 1 (the last bar is hidden in this example):
On the other hand, the symlog scale can represent negative, zero, and one-valued data, but by default it only has tick marks for the largest order of magnitude:

You can specify a list of desired tick values using the axis "values" property. For example:
{
"mark": "point",
"data": {
"values": [
{"x": 0, "y": 1},
{"x": 1, "y": 10},
{"x": 2, "y": 100},
{"x": 3, "y": 1000},
{"x": 4, "y": 10000},
{"x": 5, "y": 100000},
{"x": 6, "y": 1000000},
{"x": 7, "y": 10000000}
]
},
"encoding": {
"x": {"field": "x", "type": "quantitative"},
"y": {
"field": "y",
"type": "quantitative",
"axis": {"values": [10, 100, 1000, 10000, 100000, 1000000, 10000000]},
"scale": {"type": "symlog", "constant": 1}
}
}
}
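If the data range changes, typing the tick list by hand gets tedious. The sketch below assumes the spec is generated from Python as a plain dict and serialized with the standard json module; the helper name power_of_ten_ticks is hypothetical, and it assumes non-negative data (negative values would need mirrored ticks such as -10, -100).

```python
import json


def power_of_ten_ticks(max_value, include_zero=True):
    """Return 1, 10, 100, ... up to the largest power of ten <= max_value.

    Hypothetical helper; assumes non-negative data.
    """
    ticks = [0] if include_zero else []  # symlog can show 0, so optionally tick it
    value = 1
    while value <= max_value:
        ticks.append(value)
        value *= 10  # integer arithmetic avoids log10 rounding surprises
    return ticks


values = [{"x": i, "y": 10 ** i} for i in range(8)]
spec = {
    "mark": "point",
    "data": {"values": values},
    "encoding": {
        "x": {"field": "x", "type": "quantitative"},
        "y": {
            "field": "y",
            "type": "quantitative",
            # Explicit tick positions, one per order of magnitude.
            "axis": {"values": power_of_ten_ticks(max(d["y"] for d in values))},
            "scale": {"type": "symlog", "constant": 1},
        },
    },
}
print(json.dumps(spec, indent=2))
```

The printed JSON can be pasted into the Vega editor as-is.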

Related

Vega-lite: skip invalid values instead of treating as 0 for aggregate sum

When doing an aggregate sum on a column and charting it using Vega-Lite, is it possible to skip invalid values instead of treating them as 0 when doing the addition? When there is missing/invalid data, I want to show it as such, rather than as 0.
For example, this graph is what I expect when aggregating on date to get the sums for x and y.
Whereas in this example, the y values for both rows where date=2022-01-20 are NaN, so I would want there to be no data point for the sum of column y, showing it as missing data instead of as 0.
Is there a way to do that? I’ve looked through the documentation but may have missed something. I've tried using filter like so, but that filters out an entire row, rather than just the invalid value of a particular column for the row when doing the sum.
I’m thinking of something like pandas GroupBy.sum(min_count=1), so that if there isn't at least 1 non-NaN value, the result is presented as NaN.
OK, try this, which removes NaN and null values but keeps zeros.
Or this version, which drops a load of unnecessary transforms:
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"description": "Sums of x and y per date, skipping invalid values.",
"data": {
"values": [
{"date": "2022-01-20", "g": "apples", "x": "NaN", "y": "NaN"},
{"date": "2022-01-20", "g": "oranges", "x": "10", "y": "20"},
{"date": "2022-01-21", "g": "oranges", "x": "30", "y": "NaN"},
{"date": "2022-01-21", "g": "grapes", "x": "40", "y": "20"},
{"date": "2022-01-22", "g": "apples", "x": "NaN", "y": "NaN"},
{"date": "2022-01-22", "g": "grapes", "x": "10", "y": "NaN"}
]
},
"transform": [
{"calculate": "parseFloat(datum['x'])", "as": "x"},
{"calculate": "parseFloat(datum['y'])", "as": "y"},
{"fold": ["x", "y"]},
{"filter": {"field": "value", "valid": true}},
{
"aggregate": [{"op": "sum", "field": "value", "as": "value"}],
"groupby": ["date", "key"]
}
],
"encoding": {"x": {"field": "date", "type": "temporal"}},
"layer": [
{
"encoding": {
"y": {"field": "value", "type": "quantitative"},
"color": {"field": "key", "type": "nominal"}
},
"mark": "line"
},
{
"encoding": {
"y": {"field": "value", "type": "quantitative"},
"color": {"field": "key", "type": "nominal"}
},
"mark": {"type": "point", "tooltip": {"content": "encoding"}}
}
]
}
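To see why this gives the min_count=1 behaviour, here is a rough plain-Python model of the fold, valid-filter, and sum steps applied to the same data. It is only an illustration of the logic, not Vega-Lite code: a (date, key) group that has no valid values never gets a sum at all, so it is missing rather than 0.

```python
import math
from collections import defaultdict

rows = [
    {"date": "2022-01-20", "g": "apples", "x": "NaN", "y": "NaN"},
    {"date": "2022-01-20", "g": "oranges", "x": "10", "y": "20"},
    {"date": "2022-01-21", "g": "oranges", "x": "30", "y": "NaN"},
    {"date": "2022-01-21", "g": "grapes", "x": "40", "y": "20"},
    {"date": "2022-01-22", "g": "apples", "x": "NaN", "y": "NaN"},
    {"date": "2022-01-22", "g": "grapes", "x": "10", "y": "NaN"},
]

sums = defaultdict(float)
for row in rows:
    for key in ("x", "y"):          # fold: one (key, value) pair per column
        value = float(row[key])     # like parseFloat; float("NaN") gives nan
        if value is None or math.isnan(value):
            continue                # filter {"valid": true}: drop null/NaN, keep 0
        sums[(row["date"], key)] += value  # sum grouped by date and key

for (date, key), total in sorted(sums.items()):
    print(date, key, total)
```

There is no entry for ("2022-01-22", "y"), which is exactly the gap the line chart should show.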

Adding Axis padding/inner margin to Vega-Lite chart

I have created a Vega-Lite scatterplot. The data for this chart will never be negative, but it is often zero. In this application, it would help the user if points whose x or y is zero did not overlap the axis lines.
The straightforward solution is to try and manually adjust the domain and range to start before 0 and after the maximum value. However, I'd like to know if there is a way to do this in the configuration instead. I have read through the documentation and, to my knowledge and ability, I have not yet found such a solution.
If you want to ensure that the lowest points do not overlap the axis, one way to do so is to use the axis "offset" property, which displaces the axis from the edge of the plot by a given number of pixels (for the y-axis, this is a horizontal shift). For example:
{
"data": {
"values": [
{"x": 0, "y": 2},
{"x": 1, "y": 4},
{"x": 2, "y": 3},
{"x": 3, "y": 5},
{"x": 4, "y": 4}
]
},
"mark": "point",
"encoding": {
"x": {"field": "x", "type": "quantitative"},
"y": {"field": "y", "type": "quantitative", "axis": {"offset": 20}}
}
}
In addition to adjusting the domain or adding an axis offset, you can use the scale config properties to add padding to all continuous x/y scales. The "continuousPadding" setting adds some whitespace, in pixels, between the axis lines and the closest data points:
{
"config": {
"scale": {"continuousPadding": 5}
},
"data": {
"values": [
{"x": 0, "y": 2},
{"x": 1, "y": 4},
{"x": 2, "y": 3},
{"x": 3, "y": 5},
{"x": 4, "y": 4}
]
},
"mark": "point",
"encoding": {
"x": {"field": "x", "type": "quantitative"},
"y": {"field": "y", "type": "quantitative"}
}
}
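If you generate the spec programmatically, the "straightforward" manual approach from the question can also be automated. This is a sketch under stated assumptions: the helper padded_domain and the 10% padding fraction are hypothetical choices, and it relies on the scale "domain" property; with an explicit domain, Vega-Lite should not force the scale to include zero or round it to nice values by default, but check the scale documentation for your version.

```python
def padded_domain(values, fraction=0.1):
    """Hypothetical helper: [min, max] widened by `fraction` of the range on each side."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid a zero-width domain when all values are equal
    pad = span * fraction
    return [lo - pad, hi + pad]


data = [{"x": 0, "y": 2}, {"x": 1, "y": 4}, {"x": 2, "y": 3},
        {"x": 3, "y": 5}, {"x": 4, "y": 4}]

spec = {
    "data": {"values": data},
    "mark": "point",
    "encoding": {
        "x": {"field": "x", "type": "quantitative",
              "scale": {"domain": padded_domain([d["x"] for d in data])}},
        "y": {"field": "y", "type": "quantitative",
              "scale": {"domain": padded_domain([d["y"] for d in data])}},
    },
}
```

Compared with continuousPadding, this pads in data units rather than pixels, so the gap scales with the data range.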

How do I create a legend for a layered line plot

Basically, what I have is a line graph layered from several line graphs. Since each layer has only one line, no legend is generated automatically, so what is the best way to get a legend for the chart? I have been considering transforming my dataset. The data is the CDC's weekly total deaths from 2019 to June 2020. In the CSV, each date for each state has one record, with each disease type as its own column and integer counts as the column values. So there isn't one field to chart, there are many, hence the layering. Any insights into how to solve this problem would be much appreciated! Here is my work so far:
https://observablehq.com/#justin-krohn/covid-excess-deaths
You can create a legend for a layered chart by setting the color encoding of each layer to a datum giving the label you would like that layer to have. For example:
{
"data": {
"values": [
{"x": 1, "y1": 1, "y2": 2},
{"x": 2, "y1": 3, "y2": 1},
{"x": 3, "y1": 2, "y2": 4},
{"x": 4, "y1": 4, "y2": 3},
{"x": 5, "y1": 3, "y2": 5}
]
},
"encoding": {"x": {"field": "x", "type": "quantitative"}},
"layer": [
{
"mark": "line",
"encoding": {
"y": {"field": "y1", "type": "quantitative"},
"color": {"datum": "y1"}
}
},
{
"mark": "line",
"encoding": {
"y": {"field": "y2", "type": "quantitative"},
"color": {"datum": "y2"}
}
}
]
}
Alternatively, you can use a fold transform to reshape your data from wide to long format, so that instead of manual layers you can plot the multiple lines with a single color encoding. For example:
{
"data": {
"values": [
{"x": 1, "y1": 1, "y2": 2},
{"x": 2, "y1": 3, "y2": 1},
{"x": 3, "y1": 2, "y2": 4},
{"x": 4, "y1": 4, "y2": 3},
{"x": 5, "y1": 3, "y2": 5}
]
},
"transform": [{"fold": ["y1", "y2"], "as": ["name", "y"]}],
"mark": "line",
"encoding": {
"x": {"field": "x", "type": "quantitative"},
"y": {"field": "y", "type": "quantitative"},
"color": {"field": "name", "type": "nominal"}
}
}
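For intuition about what the fold does to wide data like the CDC file, here is a rough plain-Python model of it (not Vega-Lite's implementation; the function name fold is just illustrative). Each input row becomes one output row per folded column, tagged with the column name; for the CDC data, fields would be the list of disease columns.

```python
def fold(rows, fields, as_=("key", "value")):
    """Plain-Python model of a fold: one output row per (input row, field)."""
    key_name, value_name = as_
    out = []
    for row in rows:
        for field in fields:
            folded = dict(row)               # copy the original columns
            folded[key_name] = field         # which column the value came from
            folded[value_name] = row[field]  # the value itself
            out.append(folded)
    return out


wide = [{"x": 1, "y1": 1, "y2": 2}, {"x": 2, "y1": 3, "y2": 1}]
long_rows = fold(wide, ["y1", "y2"], as_=("name", "y"))
for r in long_rows:
    print(r["x"], r["name"], r["y"])
```

After folding, "name" is a single nominal field, which is why one color encoding is enough to get a legend.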

Force order of painting, in non-stacked area chart

While working with Vega Lite, I'm unable to force the same order for painting non-stacked areas.
For example, see what happens here:
In the rows of the picture above, I expect light-blue to always be behind dark-blue. However, this is not true for the 3rd, 4th, and 6th rows.
I've tried several combinations of order, sort, and zrank — with no success. Any idea on how to force this?
See the sample full viz in the editor — I can't get India, Israel, or Japan to display the dark-blue area on top of the light-blue.
I don't think there is currently a way to control the z-order of area marks from the chart spec, but you can control it via the order of the data source: the areas are layered in the order in which their colors appear in the data. For example, here in row 0, color 1 comes first, and in row 1, color 0 comes first:
{
"data": {
"values": [
{"x": 0, "y": 2, "row": 0, "color": 1},
{"x": 1, "y": 1, "row": 0, "color": 1},
{"x": 0, "y": 1, "row": 0, "color": 0},
{"x": 1, "y": 2, "row": 0, "color": 0},
{"x": 0, "y": 2, "row": 1, "color": 0},
{"x": 1, "y": 1, "row": 1, "color": 0},
{"x": 0, "y": 1, "row": 1, "color": 1},
{"x": 1, "y": 2, "row": 1, "color": 1}
]
},
"mark": "area",
"encoding": {
"x": {"field": "x", "type": "temporal"},
"y": {"field": "y", "type": "quantitative", "stack": false},
"color": {"field": "color", "type": "ordinal"},
"row": {"field": "row", "type": "ordinal"}
},
"height": 50
}
If you rearrange the data so that color 0 appears before color 1 in both facet rows, the layering order will be consistent across the chart:
{
"data": {
"values": [
{"x": 0, "y": 1, "row": 0, "color": 0},
{"x": 1, "y": 2, "row": 0, "color": 0},
{"x": 0, "y": 2, "row": 0, "color": 1},
{"x": 1, "y": 1, "row": 0, "color": 1},
{"x": 0, "y": 2, "row": 1, "color": 0},
{"x": 1, "y": 1, "row": 1, "color": 0},
{"x": 0, "y": 1, "row": 1, "color": 1},
{"x": 1, "y": 2, "row": 1, "color": 1}
]
},
"mark": "area",
"encoding": {
"x": {"field": "x", "type": "temporal"},
"y": {"field": "y", "type": "quantitative", "stack": false},
"color": {"field": "color", "type": "ordinal"},
"row": {"field": "row", "type": "ordinal"}
},
"height": 50
}
If you re-order the rows of your input data by year (so all 2019 entries come before all 2020 entries), the layering order should be the same in each panel.
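If the data is prepared in Python before it reaches the chart, the re-ordering can be a one-line sort. This sketch uses the toy data from the first example; Python's sort is stable, so the x order within each (row, color) group is preserved. For the original chart you would sort by the year field instead (its exact name is an assumption here).

```python
values = [
    {"x": 0, "y": 2, "row": 0, "color": 1},
    {"x": 1, "y": 1, "row": 0, "color": 1},
    {"x": 0, "y": 1, "row": 0, "color": 0},
    {"x": 1, "y": 2, "row": 0, "color": 0},
    {"x": 0, "y": 2, "row": 1, "color": 0},
    {"x": 1, "y": 1, "row": 1, "color": 0},
    {"x": 0, "y": 1, "row": 1, "color": 1},
    {"x": 1, "y": 2, "row": 1, "color": 1},
]

# Within every facet row, all color-0 points now come before color-1 points,
# so color 0 is drawn first (behind) in every panel.
ordered = sorted(values, key=lambda d: (d["row"], d["color"]))
```

The result matches the data in the second spec above.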

Using Vega Lite to display already-aggregated data

I'm trying to show a stacked bar chart of sums over time. The data looks something like this:
[
{
"date": 12345,
"sumA": 100,
"sumB": 150
},
...
]
I'm encoding the x axis to the field "date". I need the bar at date 12345 to be stacked with one part being 100 high, and the other, shown in another color, being 150 high.
Vega Lite seems to expect the raw data, but this would be too slow. I do this aggregate on the server side to save time. Can I spoon-feed Vega Lite the aggregates like in my example above?
You can use the fold transform to fold your two columns into one, and the channel encodings take care of the rest. For example:
{
"data": {
"values": [
{"date": 1, "sumA": 100, "sumB": 150},
{"date": 2, "sumA": 200, "sumB": 50},
{"date": 3, "sumA": 80, "sumB": 120},
{"date": 4, "sumA": 120, "sumB": 30},
{"date": 5, "sumA": 150, "sumB": 110}
]
},
"transform": [
{"fold": ["sumA", "sumB"], "as": ["column", "value"]}
],
"mark": {"type": "bar"},
"encoding": {
"x": {"type": "ordinal", "field": "date"},
"y": {"type": "quantitative", "field": "value"},
"color": {"type": "nominal", "field": "column"}
}
}