Vega-Lite: skip invalid values instead of treating them as 0 for aggregate sum

When doing an aggregate sum on a column and charting it using Vega-Lite, is it possible to skip invalid values instead of treating them as 0 when doing the addition? When there is missing/invalid data, I want to show it as such, rather than as 0.
For example, this graph is what I expect when aggregating on date to get the sums for x and y.
In this other example, however, the y values for both rows where date=2022-01-20 are NaN, so I would want no data point for the sum of column y on that date, showing it as missing data instead of as 0.
Is there a way to do that? I've looked through the documentation but may have missed something. I've tried using a filter transform, but that removes the entire row, rather than just the invalid value in one particular column of that row, when doing the sum.
I'm thinking of something like pandas GroupBy.sum(min_count=1), where the result is NaN unless there is at least one non-NaN value in the group.
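For reference, the pandas behaviour being asked for can be sketched in plain Python without pandas. `sum_min_count` is a hypothetical helper written for illustration, not part of any library; it treats both `None` and NaN as invalid.

```python
import math


def sum_min_count(values, min_count=1):
    """Sum the valid (non-None, non-NaN) values, mimicking pandas' min_count.

    Returns NaN when fewer than `min_count` valid values are present,
    instead of silently returning 0.
    """
    valid = [v for v in values if v is not None and not math.isnan(v)]
    if len(valid) < min_count:
        return math.nan
    return sum(valid)


nan = math.nan
print(sum_min_count([nan, 10.0]))              # 10.0
print(sum_min_count([nan, nan]))               # nan (no valid values)
print(sum_min_count([nan, nan], min_count=0))  # 0 (the "treat as 0" behaviour)
```

With `min_count=0` the all-invalid group collapses to 0, which is the behaviour the question wants to avoid.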

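OK, try this, which removes NaN and null values but keeps zeros (open it in the Vega Editor), or this version, which removes a number of unnecessary transforms (also in the Vega Editor). The simplified version is below. The key step is to fold x and y into key/value rows and then filter with "valid": true, so only the invalid value is dropped rather than the whole row. A date whose y values are all invalid then produces no y group at all, so no point is drawn for it: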
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"description": "Sum of x and y per date, skipping NaN and null values.",
"data": {
"values": [
{"date": "2022-01-20", "g": "apples", "x": "NaN", "y": "NaN"},
{"date": "2022-01-20", "g": "oranges", "x": "10", "y": "20"},
{"date": "2022-01-21", "g": "oranges", "x": "30", "y": "NaN"},
{"date": "2022-01-21", "g": "grapes", "x": "40", "y": "20"},
{"date": "2022-01-22", "g": "apples", "x": "NaN", "y": "NaN"},
{"date": "2022-01-22", "g": "grapes", "x": "10", "y": "NaN"}
]
},
"transform": [
{"calculate": "parseFloat(datum['x'])", "as": "x"},
{"calculate": "parseFloat(datum['y'])", "as": "y"},
{"fold": ["x", "y"]},
{"filter": {"field": "value", "valid": true}},
{
"aggregate": [{"op": "sum", "field": "value", "as": "value"}],
"groupby": ["date", "key"]
}
],
"encoding": {"x": {"field": "date", "type": "temporal"}},
"layer": [
{
"encoding": {
"y": {"field": "value", "type": "quantitative"},
"color": {"field": "key", "type": "nominal"}
},
"mark": "line"
},
{
"encoding": {
"y": {"field": "value", "type": "quantitative"},
"color": {"field": "key", "type": "nominal"}
},
"mark": {"type": "point", "tooltip": {"content": "encoding"}}
}
]
}
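To make the fold, filter, and aggregate steps concrete, here is a rough Python re-creation of that transform chain on the same sample rows. It is not Vega code: `parse_float` is only an approximate stand-in for JavaScript's `parseFloat`, and `fold_filter_sum` is a hypothetical helper. The point it shows is that a (date, key) group with no valid values is never created, rather than being summed to 0.

```python
import math
from collections import defaultdict

# Same sample data as the spec above.
rows = [
    {"date": "2022-01-20", "g": "apples", "x": "NaN", "y": "NaN"},
    {"date": "2022-01-20", "g": "oranges", "x": "10", "y": "20"},
    {"date": "2022-01-21", "g": "oranges", "x": "30", "y": "NaN"},
    {"date": "2022-01-21", "g": "grapes", "x": "40", "y": "20"},
    {"date": "2022-01-22", "g": "apples", "x": "NaN", "y": "NaN"},
    {"date": "2022-01-22", "g": "grapes", "x": "10", "y": "NaN"},
]


def parse_float(s):
    """Rough stand-in for parseFloat: unparseable input becomes NaN."""
    try:
        return float(s)
    except (TypeError, ValueError):
        return math.nan


def fold_filter_sum(rows, fields=("x", "y")):
    sums = defaultdict(float)
    for row in rows:
        for key in fields:                 # fold: one (key, value) per field
            value = parse_float(row[key])  # calculate: parseFloat
            if math.isnan(value):          # filter: valid == true
                continue
            sums[(row["date"], key)] += value  # aggregate: sum by date, key
    return dict(sums)


for group, total in sorted(fold_filter_sum(rows).items()):
    print(group, total)
```

The output contains a y sum for 2022-01-20 and 2022-01-21 but no entry at all for ("2022-01-22", "y"), which is the `min_count=1` behaviour from the question.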

Related

How to Annotate a Line in a Line Chart in Vega-Lite?

For the code below (https://vega.github.io/editor/#/):
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"layer": [
{
"data": {"url": "data/stocks.csv"},
"mark": "line",
"encoding": {
"x": {"field": "date", "type": "temporal"},
"y": {"field": "price", "type": "quantitative"},
"color": {"field": "symbol", "type": "nominal"}
}
},
{
"data": {"values": [{}]},
"mark": {"type": "rule", "strokeDash": [2, 2], "size": 2},
"encoding": {"x": {"datum": {"year": 2006}}}
}
]
}
This produces a line chart with a dashed vertical rule at 2006.
I want to annotate the line at a specific position, such as (2004, 400).
I tried the following, and it works, but I don't want to pass hard-coded values like "a": 2004, "b": 400:
{
"data": {
"values": [
{"a": 2004, "b": 400}
]
},
"mark": {"type": "text", "fontSize" : 16, "fontWeight":"bold", "align" : "left"},
"encoding": {
"text": {"value": "Optimum"},
"x": {"field": "a", "type": "quantitative", "title":""},
"y": {"field": "b", "type": "quantitative", "title":""}
}
},
How can I pass values derived from the data, such as the average date (say, 2004) and the average price (say, 400)?
Or, alternatively, how can I place the label just next to the line, in the middle of the y-axis?
Using a transform with an aggregate worked. I only needed the average position on the y-axis, so the code below worked well. The same transform can be used for the other axis.
{
"mark": {"type": "text", "fontSize" : 16, "fontWeight":"bold", "align" : "left"},
"transform": [
{
"aggregate": [{
"op": "mean",
"field": "price",
"as": "mean_y_axis"
}],
"groupby": ["date"]
}
],
"encoding": {
"text": {"value": "Optimum"},
"x": {"field": "date",
"type": "temporal"},
"y": {"field": "mean_y_axis",
"type": "quantitative"}
}
}
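As a rough illustration of what the aggregate transform computes: with "groupby": ["date"] there is one mean price per date (so one text mark per date), while an empty groupby yields a single overall mean and therefore a single label. The rows below are made up, and `mean_by` is a hypothetical helper that only mirrors the grouping semantics.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows shaped like data/stocks.csv (symbol, date, price).
rows = [
    {"symbol": "A", "date": "2004-01-01", "price": 100.0},
    {"symbol": "B", "date": "2004-01-01", "price": 300.0},
    {"symbol": "A", "date": "2005-01-01", "price": 200.0},
    {"symbol": "B", "date": "2005-01-01", "price": 600.0},
]


def mean_by(rows, field, groupby):
    """Mean of `field` for each combination of the `groupby` fields."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[g] for g in groupby)].append(row[field])
    return {key: mean(vals) for key, vals in groups.items()}


print(mean_by(rows, "price", ["date"]))  # one mean per date
print(mean_by(rows, "price", []))        # a single overall mean
```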

How to dynamically set domainMax

How do I dynamically set the y-axis domainMax? I want the top of the y-axis to be twice the maximum value, but domMax is a fixed value and I need it to resolve to a number. Any advice would be appreciated.
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"data": {
"values": [
{"date": "2021-03-01T00:00:00", "value": 1},
{"date": "2021-04-01T00:00:00", "value": 3},
{"date": "2021-05-01T00:00:00", "value": 2}
]
},
"transform": [{"calculate": "max(datum.value)+1", "as": "domMax"}],
"params": [{"name": "domMax", "value": "domMax"}],
"mark": "line",
"encoding": {
"x": {"field": "date", "type": "temporal"},
"y": {
"field": "value",
"type": "quantitative",
"aggregate":"max",
"scale": {"domainMax": {"expr": "max(domMax)*2"}}
}
}
}

Set domainMin to 6 months before max date in data

I have a Vega-Lite chart, which can be opened in the Vega Editor.
Currently, I have the scale set as follows:
"scale": {"domainMin": "2021-06-01"}
However, what I really want is for the domainMin to be automatically calculated to be 6 months before the latest date in the notification_date field in the data.
I've looked at aggregate and expressions, but it's not clear how to combine them for this.
How can I get the maximum value of notification_date and subtract 6 months from it, and use that in "domainMin"?
Edit: To clarify, I don't want to filter the data. I want the user to be able to zoom out or pan to see the data outside the initial 6-month window. I get exactly what I want with "scale": {"domainMin": "2021-06-01"}, but this becomes out-of-date very quickly.
I have tried giving params and expr to domainMin, but I was unable to use the data fields in expr through datum.
The second approach I tried should work for you. It uses joinaggregate, calculate, and filter transforms: you gather the maximum year and the maximum month yourself and then use them to filter your data.
Below is the modified spec (you can also open it in the Vega Editor):
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"width": "container",
"height": "container",
"config": {
"group": {"fill": "#e5e5e5"},
"arc": {"fill": "#2b2c39"},
"area": {"fill": "#2b2c39"},
"line": {"stroke": "#2b2c39"},
"path": {"stroke": "#2b2c39"},
"rect": {"fill": "#2b2c39"},
"shape": {"stroke": "#2b2c39"},
"symbol": {"fill": "#2b2c39"},
"range": {
"category": [
"#2283a2",
"#003e6a",
"#a1ce5e",
"#FDBE13",
"#F2727E",
"#EA3F3F",
"#25A9E0",
"#F97A08",
"#41BFB8",
"#518DCA",
"#9460A8",
"#6F7D84",
"#D1DCA5"
]
}
},
"title": "South Western Sydney Cumulative and Daily COVID-19 Cases by LGA",
"data": {
"url": "https://davidwales.github.io/nsw-covid-19-data/confirmed_cases_table1_location.csv"
},
"transform": [
{
"filter": {
"and": [
{"field": "lhd_2010_name", "equal": "South Western Sydney"},
{"not": {"field": "lga_name19", "equal": "Penrith"}}
]
}
},
{"calculate": "utcyear(datum.notification_date)", "as": "yearNumber"},
{"calculate": "utcmonth(datum.notification_date)", "as": "monthNumber"},
{
"window": [
{"op": "count", "field": "notification_date", "as": "cumulative_count"}
],
"frame": [null, 0]
},
{
"joinaggregate": [
{"field": "monthNumber", "op": "max", "as": "max_month_count"},
{"field": "yearNumber", "op": "max", "as": "max_year"}
]
},
{"calculate": "abs(datum.max_month_count-6)", "as": "min_month_count"},
{
"filter": "datum.min_month_count < datum.monthNumber && datum.max_year === datum.yearNumber"
}
],
"layer": [
{
"selection": {
"date": {"type": "interval", "bind": "scales", "encodings": ["x"]}
},
"mark": {"type": "bar", "tooltip": true},
"encoding": {
"x": {
"timeUnit": "yearmonthdate",
"field": "notification_date",
"type": "temporal",
"title": "Date"
},
"color": {
"field": "lga_name19",
"type": "nominal",
"title": "LGA",
"legend": {"orient": "top", "columns": 4}
},
"y": {
"aggregate": "count",
"field": "lga_name19",
"type": "quantitative",
"title": "Cases",
"axis": {"title": "Daily Cases by SWS LGA"}
}
}
},
{
"mark": "line",
"encoding": {
"x": {
"timeUnit": "yearmonthdate",
"field": "notification_date",
"title": "Date",
"type": "temporal"
},
"y": {
"aggregate": "max",
"field": "cumulative_count",
"type": "quantitative",
"axis": {"title": "Cumulative Cases"}
}
}
}
],
"resolve": {"scale": {"y": "independent"}}
}
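To see how the year/month filter above behaves, here is a small Python re-creation of those transforms (not Vega code; `month_year_filter` is a hypothetical helper, and Python dates stand in for UTC dates). Two details matter: utcmonth() returns a 0-based month, and the maximum month and maximum year are computed independently over all rows, so data that spans a year boundary can be filtered down to nothing.

```python
from datetime import date


def month_year_filter(dates):
    """Re-create the calculate/joinaggregate/filter chain from the spec above."""
    year_number = [d.year for d in dates]        # utcyear(...)
    month_number = [d.month - 1 for d in dates]  # utcmonth(...) is 0-based
    max_month_count = max(month_number)          # joinaggregate max over ALL rows
    max_year = max(year_number)                  # joinaggregate max over ALL rows
    min_month_count = abs(max_month_count - 6)   # calculate
    return [
        d
        for d, y, m in zip(dates, year_number, month_number)
        if min_month_count < m and max_year == y  # filter
    ]


# Data within a single year: July to December are kept.
in_year = [date(2021, m, 15) for m in range(1, 13)]
print(month_year_filter(in_year))

# Data spanning a year boundary: December 2021 sets max_month_count = 11,
# but only 2022 rows pass the year test, and none has a month index above 5.
spanning = [date(2021, m, 15) for m in range(6, 13)]
spanning += [date(2022, 1, 15), date(2022, 2, 15)]
print(month_year_filter(spanning))  # []
```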
A slightly simpler approach to filtering approximately the last six months of data might look like this:
"transform": [
...,
{"joinaggregate": [{"op": "max", "field": "notification_date", "as": "last_date"}]},
{"filter": "datum.notification_date > datum.last_date - 6 * 30 * 24 * 60 * 60 * 1000"}
]
It makes use of the fact that dates are stored as millisecond time-stamps, and has the benefit that it will work across year boundaries.
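The millisecond arithmetic can be checked in Python. This sketch assumes UTC datetimes as stand-ins for Vega's parsed dates; the sample dates and the `keep_recent` helper are hypothetical. Six "months" of 30 days is 180 days, so the cutoff is only approximately six calendar months back, but it is computed on a single timeline and so is unaffected by the year change.

```python
from datetime import datetime, timezone

# 6 * 30 days expressed in milliseconds, as in the filter expression above.
SIX_MONTHS_MS = 6 * 30 * 24 * 60 * 60 * 1000


def to_ms(d):
    """Milliseconds since the Unix epoch."""
    return int(d.timestamp() * 1000)


def keep_recent(dates):
    """Mimic joinaggregate(max) followed by the millisecond filter."""
    last_ms = max(to_ms(d) for d in dates)
    cutoff_ms = last_ms - SIX_MONTHS_MS
    return [d for d in dates if to_ms(d) > cutoff_ms]


utc = timezone.utc
dates = [datetime(2021, m, 15, tzinfo=utc) for m in range(6, 13)]
dates += [datetime(2022, 1, 15, tzinfo=utc), datetime(2022, 2, 15, tzinfo=utc)]

print(SIX_MONTHS_MS)  # 15552000000
for d in keep_recent(dates):
    print(d.date())   # 2021-09-15 through 2022-02-15
```

Here the latest date is 2022-02-15, so the cutoff falls on 2021-08-19 and rows from both 2021 and 2022 are kept.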
This is not quite an answer to the question you asked, but if your data is reasonably up-to-date (that is, the most recent data point is close to the current date), you can do something like this:
"scale": { "domainMin": { "expr": "timeOffset('month', now(), -6)" } }

Highest Value Wrong Colour

I just made a simple bar chart, but for some reason the final value is the wrong colour. Why?
Code:
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"width": 800,
"height": 600,
"title": "Death Rates Amongst Ages",
"data": {"url": "https://raw.githubusercontent.com/githubuser0099/Repo55/main/AgeBracket_DeathRate.csv"},
"transform": [
{"calculate":"parseInt(datum.Death_Rate)", "as": "Death_Rate"}
],
"mark": "bar",
"encoding": {
"x": {"field": "Death_Rate", "type": "quantitative", "title": ""},
"y": {"field": "Age", "type": "nominal", "title": "", "sort": "-x"},
"color": {
"field": "Age",
"type": "nominal",
"scale": {"scheme": "reds"}
}
}
}
The problem with your colour scale is that "Age" is encoded as a string: you define the type of "Age" as "nominal" but use a sequential colour scheme ("reds"). Your data also has some issues: there are leading spaces before 5-9 and 10-14.
In string comparison, whitespace < "0" < "100" < "15", so the age brackets are not ordered numerically.
To solve the issue, we can extract the first number from each range, encode it in another channel (opacity here, with its legend hidden), and then sort the colour channel by this additional channel.
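The ordering problem and the fix can be reproduced outside Vega-Lite. The age labels below are hypothetical (the real CSV is not reproduced here), and `first_number` is a hypothetical helper that roughly mimics JavaScript's parseInt by skipping leading whitespace and reading the leading digits.

```python
import re

# Hypothetical age-bracket labels, including leading spaces like those in the CSV.
labels = ["0-4", " 5-9", " 10-14", "15-19", "100+"]


def first_number(label):
    """Leading integer of a label, ignoring leading whitespace (parseInt-like)."""
    match = re.match(r"\s*(\d+)", label)
    return int(match.group(1)) if match else None


print(sorted(labels))                    # plain string order
print(sorted(labels, key=first_number))  # numeric order by lower bound
```

String order puts " 10-14" first and "15-19" last, while sorting by the extracted number gives 0-4, 5-9, 10-14, 15-19, 100+.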
Check the result and the code below.
I have kept each intermediate calculation as its own field (Age_new to Age_new_4), so you can inspect the data and see how the calculation works.
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"width": 800,
"height": 600,
"title": "Death Rates Amongst Ages",
"data": {"url": "https://raw.githubusercontent.com/githubuser0099/Repo55/main/AgeBracket_DeathRate.csv"},
"transform": [
{"calculate":"parseInt(datum.Death_Rate)", "as": "Death_Rate"},
{"calculate": "split(datum['Age'], '-')[0]", "as": "Age_new"},
{"calculate": "replace(datum['Age_new'], ' ', '')", "as": "Age_new_2"},
{"calculate": "replace(datum['Age_new_2'], ' ', '')", "as": "Age_new_3"},
{"calculate": "parseInt(datum['Age_new_3'])", "as": "Age_new_4"}
],
"mark": "bar",
"encoding": {
"x": {"field": "Death_Rate", "type": "quantitative", "title": ""},
"y": {"field": "Age", "type": "nominal", "title": "", "sort": "-x"},
"opacity":{"field": "Age_new_4", "legend": null},
"color": {
"field": "Age",
"type": "ordinal",
"sort": "opacity",
"scale": {"scheme": "reds"}
}
}
}
Cheers,
KL

Vega-Lite: How do I include image marks in a doughnut chart?

I would like to have image marks surrounding my doughnut chart instead of text labels. The image mark example uses x and y for its coordinates. How should I adapt that for a doughnut chart, which works with radius and theta?
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"description": "A simple pie chart with labels.",
"data": {
"values": [
{"category": "a", "value": 4, "image": url},
{"category": "b", "value": 6, "image": url},
{"category": "c", "value": 10, "image": url},
{"category": "d", "value": 3, "image": url},
{"category": "e", "value": 7, "image": url},
{"category": "f", "value": 8, "image": url}
]
},
"encoding": {
"theta": {"field": "value", "type": "quantitative", "stack": true},
"color": {"field": "category", "type": "nominal", "legend": null}
},
"layer": [{
"mark": {"type": "arc", "outerRadius": 80}
}, {
"mark": {"type": "text", "radius": 90},
"encoding": {
"text": {"field": "category", "type": "nominal"}
}
}],
"view": {"stroke": null}
}
Updated version: open the chart in the Vega Editor.
After some trial and error and reading the documentation, it seems the image mark cannot be positioned with a theta encoding, but the image mark example shows that x and y encodings are supported.
Therefore, I computed the positions with simple trigonometry and added an extra layer to place the images around the doughnut:
{
"transform": [
{"joinaggregate": [{"op":"sum", "field": "value", "as": "total"}]},
{
"window": [{"op": "sum", "field": "value", "as": "cum"}],
"frame": [null, 0]
},
{"calculate": "cos(2*PI*(datum.cum-datum.value/2)/datum.total)", "as": "y"},
{"calculate": "sin(2*PI*(datum.cum-datum.value/2)/datum.total)", "as": "x"}
],
"mark": {"type": "image", "width": 20, "height": 20},
"encoding": {
"url": {"field": "image"},
"x": {"field": "x", "type": "quantitative", "scale": {"domain": [-2, 2]}, "axis": null},
"y": {"field": "y", "type": "quantitative", "scale": {"domain": [-2, 2]}, "axis": null}
}
}
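The trigonometry in this layer can be checked in Python. Vega arcs start at 12 o'clock and run clockwise, so the middle of each slice sits at angle 2π·(cum − value/2)/total, which maps to x = sin(angle) and y = cos(angle). That is why the fix uses sin for x and cos for y. `image_positions` is a hypothetical helper that mirrors the joinaggregate, window, and calculate steps; it assumes the rows are in the same order as the stacked slices.

```python
import math


def image_positions(values):
    """Unit-circle (x, y) at the middle of each slice, clockwise from the top."""
    total = sum(values)  # joinaggregate: sum of value
    positions = []
    cum = 0
    for v in values:
        cum += v  # window: running sum, frame [null, 0]
        angle = 2 * math.pi * (cum - v / 2) / total
        positions.append((math.sin(angle), math.cos(angle)))
    return positions


# The "value" field from the question's data.
for x, y in image_positions([4, 6, 10, 3, 7, 8]):
    print(f"x={x:+.3f}  y={y:+.3f}")
```

With four equal slices the points land top-right, bottom-right, bottom-left, and top-left, i.e. clockwise from the top, matching the arc layout.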
Another version is available in the Vega Editor.
Because the color encoding changes the order of the slices, as mentioned in the comments, a new window transform is added to generate an extra ordering field, which is passed to the color field. The renewed version is also in the Vega Editor.
Three changes were made (2021-07-16):
Used cos in the calculation of y
Used sin in the calculation of x
Scrambled the data values to check that it still works
The old (incorrect) version is also available in the Vega Editor.