Fill missing data with 0 in specified date range with Vega-Lite - vega-lite

I have to render a line chart that consumes data from an API, and the API only returns values for days that have data. For days with no data, it does not return an entry with 0, as would be expected.
This means that the chart doesn't represent values with 0, which is an issue.
I can't modify this API, so my question would be if there is a way I can tell vega-lite to render data within a date range and, if there is no data for some day, show it as 0.
I guess I'd be able to transform the data before sending it to my react-vega component, but if this can be done by vega-lite, it'd be much better.

You can use the impute transform. Note that the dates in keyvals have to be supplied as numbers (timestamps in milliseconds); I have raised a bug about this here.
The spec below imputes a zero value for 2012-01-05:
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"description": "Using utc scale with local time input.",
"data": {
"values": [
{"date": "2012-01-01", "price": 150},
{"date": "2012-01-02", "price": 100},
{"date": "2012-01-03", "price": 170},
{"date": "2012-01-04", "price": 165},
{"date": "2012-01-06", "price": 200}
]
},
"transform": [
{
"impute": "price",
"key": "date",
"value": 0,
"keyvals": [
1325376000000,
1325462400000,
1325548800000,
1325635200000,
1325721600000,
1325808000000
]
},
{"timeUnit": "day", "field": "date", "as": "dateTU"}
],
"mark": "line",
"encoding": {
"x": {"field": "date", "timeUnit": "date"},
"y": {"field": "price", "type": "quantitative"}
}
}
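Rather than typing the keyvals by hand, they could be generated before the spec is passed to the chart component. A minimal sketch, assuming the dates should resolve to UTC midnight (as the numbers in the spec above do); dailyKeyvals is a hypothetical helper name, not part of Vega-Lite:

```javascript
// Hypothetical helper: one UTC-midnight timestamp (in ms) per day, inclusive.
function dailyKeyvals(startIso, endIso) {
  const DAY_MS = 24 * 60 * 60 * 1000;
  const start = Date.parse(startIso + "T00:00:00Z");
  const end = Date.parse(endIso + "T00:00:00Z");
  const out = [];
  for (let t = start; t <= end; t += DAY_MS) out.push(t);
  return out;
}

// Same impute transform as in the spec above, with generated keyvals.
const imputeTransform = {
  impute: "price",
  key: "date",
  value: 0,
  keyvals: dailyKeyvals("2012-01-01", "2012-01-06"),
};
```

The resulting object can replace the hand-written impute entry in the transform array.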

Related

VegaLite Split Slices and aggregate by ranges

I'm trying to create a similar dashboard using VegaLite:
My example is in this link
Is there a way to configure the ranges in the dashboard and show it in a similar way as in the screenshot?
I need to divide the pie chart into two ranges:
0 <= x < 1
x >= 1
You could use a bin transform or, if you want just two discrete categories, a calculated field works just fine and can also be used in the legend. The spec below splits at 3; change the threshold to 1 for the ranges in the question.
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"description": "A simple pie chart with embedded data.",
"data": {
"values": [
{"username": "client1", "value": 4},
{"username": "client2", "value": 0.6},
{"username": "client3", "value": 0},
{"username": "client4", "value": 3},
{"username": "client5", "value": 7},
{"username": "client6", "value": 8}
]
},
"transform": [
{"calculate": "datum.value >= 3 ? '>=3' : '<3'", "as": "binned"}
],
"mark": "arc",
"encoding": {
"theta": {"field": "value", "type": "quantitative", "aggregate": "count"},
"color": {"field": "binned", "type": "nominal"}
}
}
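If the split point needs to change, the calculate transform could be built in code instead of being hard-coded. A small sketch; rangeTransform is a hypothetical helper, and the field and output names simply mirror the spec above:

```javascript
// Hypothetical helper: builds a calculate transform that labels each row
// as below or at/above `threshold`.
function rangeTransform(threshold, field = "value", as = "binned") {
  return {
    calculate: `datum['${field}'] >= ${threshold} ? '>=${threshold}' : '<${threshold}'`,
    as,
  };
}

// The two ranges from the question: 0 <= x < 1 and x >= 1.
const transform = [rangeTransform(1)];
```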

Vega-lite: skip invalid values instead of treating as 0 for aggregate sum

When doing an aggregate sum on a column and charting it using Vega-Lite, is it possible to skip invalid values instead of treating them as 0 when doing the addition? When there is missing/invalid data, I want to show it as such, rather than as 0.
For example, this graph is what I expect when aggregating on date to get the sums for x and y.
Whereas in this example, the y value for both rows where date=2022-01-20 are NaN, so I would want there to be no data point for the sum of column y and show it as missing data, instead of as 0.
Is there a way to do that? I’ve looked through the documentation but may have missed something. I've tried using filter like so, but that filters out an entire row, rather than just the invalid value of a particular column for the row when doing the sum.
I’m thinking of something like pandas GroupBy.sum(min_count=1), so that if there isn't at least 1 non-NaN value, the result will be presented as NaN.
OK, try this, which removes NaN and null but leaves zero (Editor).
Or this, which removes a load of useless transforms (Editor):
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"description": "Google's stock price over time.",
"data": {
"values": [
{"date": "2022-01-20", "g": "apples", "x": "NaN", "y": "NaN"},
{"date": "2022-01-20", "g": "oranges", "x": "10", "y": "20"},
{"date": "2022-01-21", "g": "oranges", "x": "30", "y": "NaN"},
{"date": "2022-01-21", "g": "grapes", "x": "40", "y": "20"},
{"date": "2022-01-22", "g": "apples", "x": "NaN", "y": "NaN"},
{"date": "2022-01-22", "g": "grapes", "x": "10", "y": "NaN"}
]
},
"transform": [
{"calculate": "parseFloat(datum['x'])", "as": "x"},
{"calculate": "parseFloat(datum['y'])", "as": "y"},
{"fold": ["x", "y"]},
{"filter": {"field": "value", "valid": true}},
{
"aggregate": [{"op": "sum", "field": "value", "as": "value"}],
"groupby": ["date", "key"]
}
],
"encoding": {"x": {"field": "date", "type": "temporal"}},
"layer": [
{
"encoding": {
"y": {"field": "value", "type": "quantitative"},
"color": {"field": "key", "type": "nominal"}
},
"mark": "line"
},
{
"encoding": {
"y": {"field": "value", "type": "quantitative"},
"color": {"field": "key", "type": "nominal"}
},
"mark": {"type": "point", "tooltip": {"content": "encoding"}}
}
]
}
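To see which points survive, the fold, filter, and aggregate steps can be mirrored in plain JavaScript. This is only a sketch of the same logic on the sample data, not how Vega-Lite runs it internally; sumValidByDate is a hypothetical name:

```javascript
const rows = [
  { date: "2022-01-20", g: "apples", x: "NaN", y: "NaN" },
  { date: "2022-01-20", g: "oranges", x: "10", y: "20" },
  { date: "2022-01-21", g: "oranges", x: "30", y: "NaN" },
  { date: "2022-01-21", g: "grapes", x: "40", y: "20" },
  { date: "2022-01-22", g: "apples", x: "NaN", y: "NaN" },
  { date: "2022-01-22", g: "grapes", x: "10", y: "NaN" },
];

// Fold the listed fields, drop NaN values, then sum per (date, key).
function sumValidByDate(data, fields) {
  const sums = new Map();
  for (const row of data) {
    for (const key of fields) {
      const value = parseFloat(row[key]);
      if (Number.isNaN(value)) continue; // same role as the "valid" filter
      const id = `${row.date}|${key}`;
      sums.set(id, (sums.get(id) || 0) + value);
    }
  }
  return [...sums].map(([id, value]) => {
    const [date, key] = id.split("|");
    return { date, key, value };
  });
}

const result = sumValidByDate(rows, ["x", "y"]);
```

A (date, key) group with no valid value produces no row at all, so y on 2022-01-22 is simply absent instead of being summed to 0.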

visualize the duration of events

I want to visualize the durations of events as bars. My input is a decimal value where the integer part represents days and the fractional part a fraction of a day. I can convert the input to any form needed.
An event can span multiple days.
The code below contains data for two events: event a lasts 36 hours and event b lasts 12 hours. Of course, an event could also be over after just a few minutes, or take 3 hours 14 minutes 24 seconds.
I want the x-axis to have ticks every 30 minutes (the sample data needs 36 hours), and an axis label could look like 0d 0:00.
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"height": "container",
"width": "container",
"data": {
"values": [
{
"event": "a",
"durationdecimal": 1.5
},
{
"event": "b",
"durationdecimal": 0.5
}
]
},
"mark": {"type": "bar"},
"encoding": {
"x": {
"field": "durationdecimal",
"type": "temporal",
"axis": {"grid": false},
"timeUnit": "utchoursminutes"
},
"y": {"field": "event", "type": "nominal", "title": null},
"tooltip": [{"field": "durationdecimal"}]
}
}
I appreciate any help.
I don't think durationdecimal should be temporal, as no date, month, or year is provided. I recreated your sample using the quantitative type and converted the labels using labelExpr and a few expressions. It covers most of your requirements; the only remaining part is the ticks every 30 minutes.
Below is the spec (or see the editor):
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"height": "container",
"width": "container",
"data": {
"values": [
{"event": "a", "durationdecimal": 1.5},
{"event": "c", "durationdecimal": 2.1},
{"event": "b", "durationdecimal": 0.5}
]
},
"mark": {"type": "bar"},
"transform": [
{
"calculate": "split(toString(datum.durationdecimal),'.')[0] + 'd ' + (split(toString(datum.durationdecimal),'.')[1] ? floor(('0.'+split(toString(datum.durationdecimal),'.')[1])*24) + ':00': '0:00')",
"as": "x_dateLabelTooltip"
}
],
"encoding": {
"x": {
"field": "durationdecimal",
"type": "quantitative",
"axis": {
"grid": false,
"labelExpr": "split(toString(datum.label),'.')[0] + 'd ' + (split(toString(datum.label),'.')[1] ? floor(('0.'+split(toString(datum.label),'.')[1])*24) + ':00': '0:00')"
}
},
"y": {"field": "event", "type": "nominal", "title": null},
"tooltip": [{"field": "x_dateLabelTooltip"}]
}
}
Let me know if this works for you.
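For the remaining parts (minutes in the label and a tick every 30 minutes), the values could be prepared in JavaScript before building the spec. A rough sketch, with hypothetical helper names; it assumes the tick list is then supplied through the axis "values" property and the formatted string is added to each row for the tooltip:

```javascript
const MINUTES_PER_DAY = 24 * 60;

// Format decimal days as "Dd H:MM", rounded to the nearest minute.
function formatDuration(days) {
  const totalMinutes = Math.round(days * MINUTES_PER_DAY);
  const d = Math.floor(totalMinutes / MINUTES_PER_DAY);
  const h = Math.floor((totalMinutes % MINUTES_PER_DAY) / 60);
  const m = totalMinutes % 60;
  return `${d}d ${h}:${String(m).padStart(2, "0")}`;
}

// Tick positions every 30 minutes (1/48 of a day), from 0 up to maxDays.
function halfHourTicks(maxDays) {
  const steps = Math.ceil(maxDays * 48);
  return Array.from({ length: steps + 1 }, (_, i) => i / 48);
}

const ticks = halfHourTicks(1.5);
const label = formatDuration(2.1);
```

Seconds are dropped by the rounding, so 3 hours 14 minutes 24 seconds would be shown as 0d 3:14.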

Vega transform to select the first n rows

Is there a Vega/Vega-Lite transform which I can use to select the first n rows in data set?
Suppose I get a dataset from a URL such as:
Person  Height
Jeremy  6.2
Alice   6.0
Walter  5.8
Amy     5.6
Joe     5.5
and I want to create a bar chart showing the height of only the three tallest people. Assume that we know for certain that the dataset from the URL is already sorted. Assume that we cannot change the data as returned by the URL.
I want to do something like this:
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"data": {
"url": "heights.csv"
},
"transform": [
{"head": 3}
],
"mark": "bar",
"encoding": {
"x": {"field": "Person", "type": "nominal"},
"y": {"field": "Height", "type": "quantitative"}
}
}
However, the head transform does not actually exist. Is there something else I can do to get the same effect?
The Vega-Lite documentation has an example along these lines in filtering top-k items.
Your case is a bit more specialized: you do not want to order based on rank, but rather based on the original ordering of the data. You can do this using a count-based window transform followed by an appropriate filter. For example (view in editor):
{
"data": {
"values": [
{"Person": "Jeremy", "Height": 6.2},
{"Person": "Alice", "Height": 6.0},
{"Person": "Walter", "Height": 5.8},
{"Person": "Amy", "Height": 5.6},
{"Person": "Joe", "Height": 5.5}
]
},
"transform": [
{"window": [{"op": "count", "as": "count"}]},
{"filter": "datum.count <= 3"}
],
"mark": "bar",
"encoding": {
"x": {"field": "Height", "type": "quantitative"},
"y": {"field": "Person", "type": "nominal", "sort": null}
}
}
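For intuition, the window and filter pair behaves like a running row number followed by a cut-off. A plain JavaScript sketch of the same idea on the sample data (this is an illustration, not Vega's implementation):

```javascript
const people = [
  { Person: "Jeremy", Height: 6.2 },
  { Person: "Alice", Height: 6.0 },
  { Person: "Walter", Height: 5.8 },
  { Person: "Amy", Height: 5.6 },
  { Person: "Joe", Height: 5.5 },
];

// Attach a running count (1, 2, 3, ...) in the original order,
// then keep only the rows whose count is at most 3.
const firstThree = people
  .map((row, i) => ({ ...row, count: i + 1 }))
  .filter((row) => row.count <= 3);
```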

Dynamic scale domain in vega-lite

I would like to define my x-axis:
minimum value should be now()
maximum value should be automatically determined (just as if the domain of the scale would have not been defined)
"encoding": {
"y": {
"field": "Reference",
"type": "nominal",
},
"x":{
"field": "Date",
"type": "temporal",
"scale": {"domain": [now(), 1618000000000]}}
I also tried to use an expression to set-up now(), to no success:
"scale": {"domain": ["expr":"now()", 1618000000000]
You were quite close with the second attempt; you just need to put braces around the expression statement:
"scale": {"domain": [{"expr": "now()"}, "2021-05-01T00:00:00"]}
Here's a full example (open in editor):
{
"data": {
"values": [
{"date": "2021-03-01T00:00:00", "value": 1},
{"date": "2021-04-01T00:00:00", "value": 3},
{"date": "2021-05-01T00:00:00", "value": 2}
]
},
"mark": "line",
"encoding": {
"x": {
"field": "date",
"type": "temporal",
"scale": {"domain": [{"expr": "now()"}, "2021-05-01T00:00:00"]}
},
"y": {"field": "value", "type": "quantitative"}
}
}
If instead of setting the domain limits, you just want to ensure that now() appears as part of the domain, you can use a domain unionWith statement:
"scale": {"domain": {"unionWith": [{"expr": "now()"}]}}
This will create an automatically calculated domain that contains the current date.
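As a rough mental model for a continuous scale, the effect of unionWith can be thought of as widening the data extent just enough to include the extra values. A sketch under that assumption; unionDomain is a hypothetical helper and a fixed timestamp stands in for now():

```javascript
// Hypothetical model: widen [min, max] so that it also covers extraValues.
function unionDomain(dataExtent, extraValues) {
  const all = [...dataExtent, ...extraValues];
  return [Math.min(...all), Math.max(...all)];
}

const dataExtent = [
  Date.parse("2021-03-01T00:00:00Z"),
  Date.parse("2021-05-01T00:00:00Z"),
];
const fakeNow = Date.parse("2021-06-15T00:00:00Z"); // stand-in for now()

const domain = unionDomain(dataExtent, [fakeNow]);
```

If the current date already lies inside the data extent, the domain stays as it was.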