I have created a Vega-Lite scatterplot. The data for this chart is always non-negative, and it is often zero. In this application, it would help the user if points whose x or y value is zero did not overlap the axis lines.
The straightforward solution is to manually adjust the scale domain so that it starts below 0 and ends above the maximum value. However, I'd like to know whether there is a way to do this through configuration instead. I have read through the documentation and, as far as I can tell, have not found such a solution.
If you want to ensure that points at zero do not overlap the axis, one way is to use the axis "offset" property. It displaces the axis from the edge of the plot by the given number of pixels; for the y-axis, this is a horizontal shift. For example:
{
"data": {
"values": [
{"x": 0, "y": 2},
{"x": 1, "y": 4},
{"x": 2, "y": 3},
{"x": 3, "y": 5},
{"x": 4, "y": 4}
]
},
"mark": "point",
"encoding": {
"x": {"field": "x", "type": "quantitative"},
"y": {"field": "y", "type": "quantitative", "axis": {"offset": 20}}
}
}
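The question is about points at zero on either axis, so the same property can be set on both axes. The following is a minimal sketch with made-up sample data that includes zeros. The 10-pixel offset is an arbitrary choice:
{
  "description": "Sketch: offset both axes so points at zero do not sit on the axis lines",
  "data": {
    "values": [
      {"x": 0, "y": 0},
      {"x": 1, "y": 4},
      {"x": 2, "y": 0},
      {"x": 0, "y": 3},
      {"x": 4, "y": 5}
    ]
  },
  "mark": "point",
  "encoding": {
    "x": {"field": "x", "type": "quantitative", "axis": {"offset": 10}},
    "y": {"field": "y", "type": "quantitative", "axis": {"offset": 10}}
  }
}
The offset only moves the axes. The points themselves still sit at the edge of the plotting area.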
Besides adjusting the domain or adding an axis offset, you can use the "continuousPadding" scale config property. It adds padding, in pixels, to all continuous x and y scales, which leaves some whitespace between the axis lines and the closest data points:
{
"config": {
"scale": {"continuousPadding": 5}
},
"data": {
"values": [
{"x": 0, "y": 2},
{"x": 1, "y": 4},
{"x": 2, "y": 3},
{"x": 3, "y": 5},
{"x": 4, "y": 4}
]
},
"mark": "point",
"encoding": {
"x": {"field": "x", "type": "quantitative"},
"y": {"field": "y", "type": "quantitative"}
}
}
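If only some encodings need the extra space, the padding can also be set per encoding through the scale "padding" property. For continuous scales, this property expands the domain to leave the given number of pixels at each end. Below is a sketch with illustrative values; 20 pixels is arbitrary:
{
  "description": "Sketch: per-encoding scale padding instead of a global config",
  "data": {
    "values": [
      {"x": 0, "y": 0},
      {"x": 1, "y": 4},
      {"x": 2, "y": 0},
      {"x": 0, "y": 3},
      {"x": 4, "y": 5}
    ]
  },
  "mark": "point",
  "encoding": {
    "x": {"field": "x", "type": "quantitative", "scale": {"padding": 20}},
    "y": {"field": "y", "type": "quantitative", "scale": {"padding": 20}}
  }
}
Because this expands the domain itself, the axis may end up showing a tick below zero, depending on how the domain is rounded.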
Related
When doing an aggregate sum on a column and charting it using Vega-Lite, is it possible to skip invalid values instead of treating them as 0 when doing the addition? When there is missing/invalid data, I want to show it as such, rather than as 0.
For example, this graph is what I expect when aggregating on date to get the sums for x and y.
Whereas in this example, the y value for both rows where date=2022-01-20 are NaN, so I would want there to be no data point for the sum of column y and show it as missing data, instead of as 0.
Is there a way to do that? I’ve looked through the documentation but may have missed something. I've tried using filter like so, but that filters out an entire row, rather than just the invalid value of a particular column for the row when doing the sum.
I’m thinking of something like pandas GroupBy.sum(min_count=1): if a group doesn't have at least one non-NaN value, the result is NaN.
OK, try this. It removes NaN and null values but leaves zeros in place, and it also drops a number of unnecessary transforms from an earlier attempt:
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"description": "Sums of x and y per date, skipping invalid values.",
"data": {
"values": [
{"date": "2022-01-20", "g": "apples", "x": "NaN", "y": "NaN"},
{"date": "2022-01-20", "g": "oranges", "x": "10", "y": "20"},
{"date": "2022-01-21", "g": "oranges", "x": "30", "y": "NaN"},
{"date": "2022-01-21", "g": "grapes", "x": "40", "y": "20"},
{"date": "2022-01-22", "g": "apples", "x": "NaN", "y": "NaN"},
{"date": "2022-01-22", "g": "grapes", "x": "10", "y": "NaN"}
]
},
"transform": [
{"calculate": "parseFloat(datum['x'])", "as": "x"},
{"calculate": "parseFloat(datum['y'])", "as": "y"},
{"fold": ["x", "y"]},
{"filter": {"field": "value", "valid": true}},
{
"aggregate": [{"op": "sum", "field": "value", "as": "value"}],
"groupby": ["date", "key"]
}
],
"encoding": {"x": {"field": "date", "type": "temporal"}},
"layer": [
{
"encoding": {
"y": {"field": "value", "type": "quantitative"},
"color": {"field": "key", "type": "nominal"}
},
"mark": "line"
},
{
"encoding": {
"y": {"field": "value", "type": "quantitative"},
"color": {"field": "key", "type": "nominal"}
},
"mark": {"type": "point", "tooltip": {"content": "encoding"}}
}
]
}
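The pandas min_count=1 behaviour can also be mirrored more literally. Count the valid values in each group with the "valid" aggregate op, then replace the sum with null when that count is zero. The sketch below shows this alternative with the same sample data. The "total" and "n_valid" field names are arbitrary:
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch: sum per date/key, null when a group has no valid values (like min_count=1)",
  "data": {
    "values": [
      {"date": "2022-01-20", "g": "apples", "x": "NaN", "y": "NaN"},
      {"date": "2022-01-20", "g": "oranges", "x": "10", "y": "20"},
      {"date": "2022-01-21", "g": "oranges", "x": "30", "y": "NaN"},
      {"date": "2022-01-21", "g": "grapes", "x": "40", "y": "20"},
      {"date": "2022-01-22", "g": "apples", "x": "NaN", "y": "NaN"},
      {"date": "2022-01-22", "g": "grapes", "x": "10", "y": "NaN"}
    ]
  },
  "transform": [
    {"calculate": "parseFloat(datum['x'])", "as": "x"},
    {"calculate": "parseFloat(datum['y'])", "as": "y"},
    {"fold": ["x", "y"]},
    {
      "aggregate": [
        {"op": "sum", "field": "value", "as": "total"},
        {"op": "valid", "field": "value", "as": "n_valid"}
      ],
      "groupby": ["date", "key"]
    },
    {"calculate": "datum.n_valid > 0 ? datum.total : null", "as": "value"}
  ],
  "mark": {"type": "line", "point": true},
  "encoding": {
    "x": {"field": "date", "type": "temporal"},
    "y": {"field": "value", "type": "quantitative"},
    "color": {"field": "key", "type": "nominal"}
  }
}
With this data, the y series gets a null for 2022-01-22, because both y entries on that date are NaN. The point for that date is dropped by the default handling of invalid values, but the row stays in the data.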
I want to use images as my axis labels. I tried the image mark, but that did not work, which is somewhat expected. I also tried a label expression, but that does not work when the label should be an image. What else could I try, or is it possible at all?
Line chart example
Using the same technique as in another answer, an image axis can be added via an extra layer:
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"data": {
"values": [
{"x": 0.5, "y": 0.5, "img": "data/ffox.png"},
{"x": 1.5, "y": 1.5, "img": "data/gimp.png"},
{"x": 2.5, "y": 2.5, "img": "data/7zip.png"}
]
},
"layer": [{
"mark": {"type": "image", "width": 50, "height": 50},
"encoding": {
"x": {"field": "x", "type": "quantitative"},
"y": {"field": "y", "type": "quantitative"},
"url": {"field": "img", "type": "nominal"}
}
}, {
"transform": [{"calculate": "-.2", "as": "axis"}],
"mark": {"type": "image", "width": 25, "height": 25},
"encoding": {
"x": {"field": "x", "type": "quantitative", "axis":{"labelExpr": "", "title": null}},
"y": {"field": "axis", "type": "quantitative", "scale": {"domain": [0, 2.5]}},
"url": {"field": "img", "type": "nominal"}
}
}],
"resolve": {"scale": {"y": "shared"}}
}
===== 2021-07-20 =====
Your bar chart's x encoding uses the x field, whose values are not evenly spaced, so the image axis is misaligned.
If you have the a field shown in your editor example, the simplest fix is to change the encoding to "x": {"field": "a", "type": "ordinal", "axis": null}
Even if the a field were not there, you might want to add such a field to align, or even to order, the image axis.
The last resort I can think of is a window transform. It is overkill, but it does not require adding any values to the data.
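Here is a minimal sketch of the window-transform approach. It assumes a bar chart with an image row underneath; the original chart exists only in the linked editor, so the layout, the sample values, and the index field name "a" are illustrative. The "row_number" op assigns 1, 2, 3, ... in data order. Both views use that index as a shared ordinal x field, so each image sits under its bar however the original x values are spaced:
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch: image axis aligned to bars through a row_number index",
  "data": {
    "values": [
      {"x": 0.5, "y": 3, "img": "data/ffox.png"},
      {"x": 1.7, "y": 5, "img": "data/gimp.png"},
      {"x": 2.2, "y": 2, "img": "data/7zip.png"}
    ]
  },
  "transform": [{"window": [{"op": "row_number", "as": "a"}]}],
  "vconcat": [
    {
      "mark": "bar",
      "encoding": {
        "x": {"field": "a", "type": "ordinal", "axis": null},
        "y": {"field": "y", "type": "quantitative"}
      }
    },
    {
      "height": 30,
      "mark": {"type": "image", "width": 25, "height": 25},
      "encoding": {
        "x": {"field": "a", "type": "ordinal", "axis": null},
        "y": {"value": 15},
        "url": {"field": "img", "type": "nominal"}
      }
    }
  ],
  "resolve": {"scale": {"x": "shared"}}
}
To order the images by something other than data order, add a "sort" to the window transform, and row_number will follow it.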
Is there a way to use the symlog scale, but with tick marks placed on many orders of magnitude?
To explain, the log scale has tick marks on many orders of magnitude, but cannot show ≤ 1 values (last bar is hidden in this example):
On the other hand, symlog scale can represent negative, zero, and one-valued data, but by default only has tick marks for the largest order of magnitude:
You can specify a list of desired tick values using the axis.values specification. For example:
{
"mark": "point",
"data": {
"values": [
{"x": 0, "y": 1},
{"x": 1, "y": 10},
{"x": 2, "y": 100},
{"x": 3, "y": 1000},
{"x": 4, "y": 10000},
{"x": 5, "y": 100000},
{"x": 6, "y": 1000000},
{"x": 7, "y": 10000000}
]
},
"encoding": {
"x": {"field": "x", "type": "quantitative"},
"y": {
"field": "y",
"axis": {"values": [10, 100, 1000, 10000, 100000, 1000000, 10000000]},
"scale": {"type": "symlog", "constant": 1}
}
}
}
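Symlog is defined for zero and negative values, so the same approach works for data that crosses zero: list tick values on both sides. The sketch below uses made-up data. The "~s" d3 format string, which gives SI-prefix labels such as 10k, is an optional extra that the answer above did not use:
{
  "description": "Sketch: symlog axis with explicit ticks on both sides of zero",
  "mark": "point",
  "data": {
    "values": [
      {"x": 0, "y": -1000},
      {"x": 1, "y": -10},
      {"x": 2, "y": 0},
      {"x": 3, "y": 1},
      {"x": 4, "y": 100},
      {"x": 5, "y": 10000},
      {"x": 6, "y": 100000}
    ]
  },
  "encoding": {
    "x": {"field": "x", "type": "quantitative"},
    "y": {
      "field": "y",
      "type": "quantitative",
      "axis": {"values": [-1000, -100, -10, 0, 10, 100, 1000, 10000, 100000], "format": "~s"},
      "scale": {"type": "symlog", "constant": 1}
    }
  }
}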
Basically, what I have is a line chart layered from several line charts. Each layer has only one line, so no legend is generated automatically. What is the best way to get a legend for the chart? I have been considering transforming my dataset. The data is weekly death totals from the CDC from 2019 to June 2020. In the CSV, each date for each state has a record, with each disease type as its own column and integer counts as the column values. So there isn't one field to chart; there are many, hence the layering. Any insights into how to solve this problem would be much appreciated! Here is my work so far:
https://observablehq.com/#justin-krohn/covid-excess-deaths
You can create a legend for a layered chart by setting the color encoding for each layer to a datum specifying what label you would like it to have. For example:
{
"data": {
"values": [
{"x": 1, "y1": 1, "y2": 2},
{"x": 2, "y1": 3, "y2": 1},
{"x": 3, "y1": 2, "y2": 4},
{"x": 4, "y1": 4, "y2": 3},
{"x": 5, "y1": 3, "y2": 5}
]
},
"encoding": {"x": {"field": "x", "type": "quantitative"}},
"layer": [
{
"mark": "line",
"encoding": {
"y": {"field": "y1", "type": "quantitative"},
"color": {"datum": "y1"}
}
},
{
"mark": "line",
"encoding": {
"y": {"field": "y2", "type": "quantitative"},
"color": {"datum": "y2"}
}
}
]
}
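The datum values become the legend labels, so they can be any readable string. If you also want to pick the colors and a legend title, you can set a color "scale" with an explicit domain and range, plus a "legend" title, on one layer. Both layers then share them through the default scale resolution. The label strings and colors below are arbitrary:
{
  "description": "Sketch: datum-based legend with custom labels, colors, and title",
  "data": {
    "values": [
      {"x": 1, "y1": 1, "y2": 2},
      {"x": 2, "y1": 3, "y2": 1},
      {"x": 3, "y1": 2, "y2": 4},
      {"x": 4, "y1": 4, "y2": 3},
      {"x": 5, "y1": 3, "y2": 5}
    ]
  },
  "encoding": {"x": {"field": "x", "type": "quantitative"}},
  "layer": [
    {
      "mark": "line",
      "encoding": {
        "y": {"field": "y1", "type": "quantitative"},
        "color": {
          "datum": "Series 1",
          "scale": {"domain": ["Series 1", "Series 2"], "range": ["steelblue", "orange"]},
          "legend": {"title": "Series"}
        }
      }
    },
    {
      "mark": "line",
      "encoding": {
        "y": {"field": "y2", "type": "quantitative"},
        "color": {"datum": "Series 2"}
      }
    }
  ]
}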
Alternatively, you can use a fold transform to reshape your data from wide to long format. Instead of manual layers, you can then plot the multiple lines with a simple color encoding. For example:
{
"data": {
"values": [
{"x": 1, "y1": 1, "y2": 2},
{"x": 2, "y1": 3, "y2": 1},
{"x": 3, "y1": 2, "y2": 4},
{"x": 4, "y1": 4, "y2": 3},
{"x": 5, "y1": 3, "y2": 5}
]
},
"transform": [{"fold": ["y1", "y2"], "as": ["name", "y"]}],
"mark": "line",
"encoding": {
"x": {"field": "x", "type": "quantitative"},
"y": {"field": "y", "type": "quantitative"},
"color": {"field": "name", "type": "nominal"}
}
}
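Applied to a wide table like the CDC one, where each cause of death is its own column, the fold lists those columns, and an aggregate in the encoding sums over states. In the sketch below, the column names ("week", "state", "diabetes", "influenza") and the numbers are hypothetical stand-ins for the real CSV:
{
  "description": "Sketch: fold hypothetical cause-of-death columns and sum over states",
  "data": {
    "values": [
      {"week": "2020-01-04", "state": "State A", "diabetes": 10, "influenza": 5},
      {"week": "2020-01-04", "state": "State B", "diabetes": 7, "influenza": 3},
      {"week": "2020-01-11", "state": "State A", "diabetes": 12, "influenza": 6},
      {"week": "2020-01-11", "state": "State B", "diabetes": 8, "influenza": 4}
    ]
  },
  "transform": [{"fold": ["diabetes", "influenza"], "as": ["cause", "deaths"]}],
  "mark": "line",
  "encoding": {
    "x": {"field": "week", "type": "temporal"},
    "y": {"aggregate": "sum", "field": "deaths", "type": "quantitative"},
    "color": {"field": "cause", "type": "nominal"}
  }
}
The aggregate groups by the other encoded fields (week and cause), so each line shows the total across all states.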
While working with Vega Lite, I'm unable to force the same order for painting non-stacked areas.
For example, see what happens here:
In the rows picture above, I expect light-blue to always be behind dark-blue. However this is not true for the 3rd, 4th, and 6th row.
I've tried several combinations of order, sort, and zrank — with no success. Any idea on how to force this?
See the sample full viz in the editor — I can't get India, Israel, or Japan to display the dark-blue area on top of the light-blue.
I don't think there is currently a way to control the z order of area marks from the chart spec. You can, however, control it via the order of the data source: the areas are drawn in the order in which their colors appear in the data. For example, here in row 0, color 1 comes first, and in row 1, color 0 comes first:
{
"data": {
"values": [
{"x": 0, "y": 2, "row": 0, "color": 1},
{"x": 1, "y": 1, "row": 0, "color": 1},
{"x": 0, "y": 1, "row": 0, "color": 0},
{"x": 1, "y": 2, "row": 0, "color": 0},
{"x": 0, "y": 2, "row": 1, "color": 0},
{"x": 1, "y": 1, "row": 1, "color": 0},
{"x": 0, "y": 1, "row": 1, "color": 1},
{"x": 1, "y": 2, "row": 1, "color": 1}
]
},
"mark": "area",
"encoding": {
"x": {"field": "x", "type": "temporal"},
"y": {"field": "y", "type": "quantitative", "stack": false},
"color": {"field": "color", "type": "ordinal"},
"row": {"field": "row", "type": "ordinal"}
},
"height": 50
}
If you rearrange the data so that color 0 appears before color 1 in both rows, the drawing order will be consistent across the chart:
{
"data": {
"values": [
{"x": 0, "y": 1, "row": 0, "color": 0},
{"x": 1, "y": 2, "row": 0, "color": 0},
{"x": 0, "y": 2, "row": 0, "color": 1},
{"x": 1, "y": 1, "row": 0, "color": 1},
{"x": 0, "y": 2, "row": 1, "color": 0},
{"x": 1, "y": 1, "row": 1, "color": 0},
{"x": 0, "y": 1, "row": 1, "color": 1},
{"x": 1, "y": 2, "row": 1, "color": 1}
]
},
"mark": "area",
"encoding": {
"x": {"field": "x", "type": "temporal"},
"y": {"field": "y", "type": "quantitative", "stack": false},
"color": {"field": "color", "type": "ordinal"},
"row": {"field": "row", "type": "ordinal"}
},
"height": 50
}
Likewise, if you reorder the rows of your input data by year (so that all 2019 entries come before all 2020 entries), the drawing order should be the same in each panel.
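If reordering the input data is not practical, you can instead draw each color in its own layer. This workaround does not depend on data order, but it assumes the set of colors is known in advance. It works because later layers are painted on top of earlier ones. Here is a sketch using the same toy data as the first example above, where the data order is the inconsistent one:
{
  "description": "Sketch: one layer per color so color 1 is always drawn on top",
  "data": {
    "values": [
      {"x": 0, "y": 2, "row": 0, "color": 1},
      {"x": 1, "y": 1, "row": 0, "color": 1},
      {"x": 0, "y": 1, "row": 0, "color": 0},
      {"x": 1, "y": 2, "row": 0, "color": 0},
      {"x": 0, "y": 2, "row": 1, "color": 0},
      {"x": 1, "y": 1, "row": 1, "color": 0},
      {"x": 0, "y": 1, "row": 1, "color": 1},
      {"x": 1, "y": 2, "row": 1, "color": 1}
    ]
  },
  "facet": {"row": {"field": "row", "type": "ordinal"}},
  "spec": {
    "height": 50,
    "encoding": {
      "x": {"field": "x", "type": "temporal"},
      "y": {"field": "y", "type": "quantitative", "stack": false},
      "color": {"field": "color", "type": "ordinal"}
    },
    "layer": [
      {"transform": [{"filter": "datum.color == 0"}], "mark": "area"},
      {"transform": [{"filter": "datum.color == 1"}], "mark": "area"}
    ]
  }
}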