Vega transform to select the first n rows - vega-lite

Is there a Vega/Vega-Lite transform which I can use to select the first n rows in data set?
Suppose I get a dataset from a URL such as:
Person   Height
Jeremy   6.2
Alice    6.0
Walter   5.8
Amy      5.6
Joe      5.5
and I want to create a bar chart showing the height of only the three tallest people. Assume that we know for certain that the dataset from the URL is already sorted. Assume that we cannot change the data as returned by the URL.
I want to do something like this:
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {"url": "heights.csv"},
  "transform": [
    {"head": 3}
  ],
  "mark": "bar",
  "encoding": {
    "x": {"field": "Person", "type": "nominal"},
    "y": {"field": "Height", "type": "quantitative"}
  }
}
The only problem is that a head transform does not actually exist. Is there something else I can do to get the same effect?

The Vega-Lite documentation has an example along these lines in filtering top-k items.
Your case is a bit more specialized: you do not want to order based on rank, but rather based on the original ordering of the data. You can do this using a count-based window transform followed by an appropriate filter. For example (view in editor):
{
  "data": {
    "values": [
      {"Person": "Jeremy", "Height": 6.2},
      {"Person": "Alice", "Height": 6.0},
      {"Person": "Walter", "Height": 5.8},
      {"Person": "Amy", "Height": 5.6},
      {"Person": "Joe", "Height": 5.5}
    ]
  },
  "transform": [
    {"window": [{"op": "count", "as": "count"}]},
    {"filter": "datum.count <= 3"}
  ],
  "mark": "bar",
  "encoding": {
    "x": {"field": "Height", "type": "quantitative"},
    "y": {"field": "Person", "type": "nominal", "sort": null}
  }
}
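Since the data is already in the desired order, an equivalent variant (just a sketch, not part of the original answer) is to use the window transform's row_number operation, which makes the intent a bit more explicit; the field name "row" is only an illustrative choice:
"transform": [
  {"window": [{"op": "row_number", "as": "row"}]},
  {"filter": "datum.row <= 3"}
]
With no sort specified, the window transform processes rows in the order they appear in the data, so filtering on the running count or row number keeps exactly the first three rows.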

Vega-Lite v4 - Pivot example (and my code) throwing errors

So I am working with Vega-Lite v4 (that's the version our business's Airtable extension uses), and the answer to my previous post was that I need to use the pivot transform.
But any time I try to use it as explained in the v4 documentation (https://vega.github.io/vega-lite-v4/docs/pivot.html), it throws an error as if the pivoted field does not exist.
I've used the following test data:
Airtable test data
With the following test code:
{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
  "title": "Table 2",
  "transform": [
    {"pivot": "type", "value": "calls", "groupby": ["Month"]}
  ],
  "mark": "bar",
  "encoding": {
    "x": {"field": "Month", "type": "nominal"},
    "y": {"field": "Total", "type": "quantitative"}
  }
}
And I still get the same error:
Total is not a valid transform name, inline data name, or field name from the Table 2 table:
"y": {
"field": "Total",
------------^
"type": "quantitative"
}
Even when I copy and paste the examples from the documentation above into the widget, it comes up with the same kind of error, as if pivot isn't creating these fields.
Can anyone help me figure out why this isn't working, or what to use instead?
EDIT:
So, a weird solution/workaround I found is to calculate the field as itself:
{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
  "title": "Table 2",
  "transform": [
    {"pivot": "type", "value": "calls", "groupby": ["Month"]},
    {"calculate": "datum.Total", "as": "newTotal"}
  ],
  "mark": "bar",
  "encoding": {
    "x": {"field": "Month", "type": "nominal"},
    "y": {"field": "newTotal", "type": "quantitative"}
  }
}
This makes the graph behave completely as normal. I can use this for now, but it means I have to hard-code each field name with a calculate transform. Does this help anyone understand what's going on with this transform?
First of all, Vega and Vega-Lite field names are case-sensitive, so "Month" is not the same as "month".
In your first code sample, "month" is incorrect and should be "Month":
"x": {"field": "month", "type": "nominal"},
but in the second code sample that was changed to "Month", which is correct:
"x": {"field": "Month", "type": "nominal"},
Try just correcting the field name to "Month" in the first code sample, without calculating "newTotal":
{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
  "title": "Table 2",
  "transform": [
    {"pivot": "type", "value": "calls", "groupby": ["Month"]}
  ],
  "mark": "bar",
  "encoding": {
    "x": {"field": "Month", "type": "nominal"},
    "y": {"field": "Total", "type": "quantitative"}
  }
}
[EDIT: added the following]
Here is a working example using your example data with the pivot transform, rendered as a bar chart by Vega-Lite v5.2.0 with no errors.
Try using Vega-Lite v5.2.0 instead of v4.
View in the Vega online editor
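The linked editor spec is not reproduced above, so here is a minimal sketch of what such a v5 spec could look like; the inline Month/type/calls rows are made-up illustrative data (not the original Airtable data), chosen so that "Total" is one of the values of the pivoted "type" field:
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "title": "Table 2",
  "data": {
    "values": [
      {"Month": "January", "type": "Total", "calls": 120},
      {"Month": "January", "type": "Missed", "calls": 15},
      {"Month": "February", "type": "Total", "calls": 90},
      {"Month": "February", "type": "Missed", "calls": 8}
    ]
  },
  "transform": [
    {"pivot": "type", "value": "calls", "groupby": ["Month"]}
  ],
  "mark": "bar",
  "encoding": {
    "x": {"field": "Month", "type": "nominal"},
    "y": {"field": "Total", "type": "quantitative"}
  }
}
After the pivot, each distinct value of "type" becomes its own field on the grouped rows, so "Total" can be referenced directly in the y encoding without any extra calculate transform.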

theta and theta2 channel encoding by field

I want to create arcs using Vega-Lite. Here you will find a working example: (link)
But this does not work: (link)
The only difference is that I use this in the working example:
"encoding": {
"theta": {"value": {"expr": "datum.thta"}},
"theta2": {"value": {"expr": "datum.thta2"}} }
And this is the code that does not work:
"encoding": {
"theta": {"field": "thta"},
"theta2": {"field": "thta2"}} }
Can someone explain why using "field" creates a half circle (not wanted), while "value" creates the quarter circle (wanted)?
Many thanks in advance.
When you specify values via encodings, the scale is automatically adjusted based on the content of the data. You can fix this by specifying that the scale should be from 0 to 2pi (open in editor):
{
  "width": 80,
  "height": 80,
  "params": [{"name": "radius", "value": 0}, {"name": "radius2", "value": 50}],
  "data": {
    "values": [{"name": "arc1", "quadrant": "TopRight", "ring": "Hold"}]
  },
  "transform": [
    {
      "calculate": "if(datum.quadrant === 'TopRight', PI*0.5, null)",
      "as": "thta2"
    },
    {"calculate": "if(datum.quadrant === 'TopRight', 0, null)", "as": "thta"}
  ],
  "mark": {
    "type": "arc",
    "radius": {"expr": "radius"},
    "radius2": {"expr": "radius2"}
  },
  "encoding": {
    "theta": {"field": "thta", "scale": {"domain": [0, {"expr": "2*PI"}]}},
    "theta2": {"field": "thta2"}
  }
}

Vega-lite: skip invalid values instead of treating as 0 for aggregate sum

When doing an aggregate sum on a column and charting it using Vega-Lite, is it possible to skip invalid values instead of treating them as 0 when doing the addition? When there is missing/invalid data, I want to show it as such, rather than as 0.
For example, this graph is what I expect when aggregating on date to get the sums for x and y.
Whereas in this example, the y values for both rows where date=2022-01-20 are NaN, so I would want there to be no data point for the sum of column y, showing it as missing data instead of as 0.
Is there a way to do that? I've looked through the documentation but may have missed something. I've tried using filter like so, but that filters out an entire row, rather than just the invalid value of a particular column when doing the sum.
I'm thinking of something like pandas GroupBy.sum(min_count=1), so that if there isn't at least one non-NaN value, the result will be presented as NaN.
OK, try this, which removes NaN and null but leaves zero (Editor), or this version, which removes a load of unnecessary transforms (Editor). Because the valid filter drops NaN and null values before the aggregation, a group with no valid values produces no row at all, so nothing is plotted for it instead of a zero.
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Google's stock price over time.",
  "data": {
    "values": [
      {"date": "2022-01-20", "g": "apples", "x": "NaN", "y": "NaN"},
      {"date": "2022-01-20", "g": "oranges", "x": "10", "y": "20"},
      {"date": "2022-01-21", "g": "oranges", "x": "30", "y": "NaN"},
      {"date": "2022-01-21", "g": "grapes", "x": "40", "y": "20"},
      {"date": "2022-01-22", "g": "apples", "x": "NaN", "y": "NaN"},
      {"date": "2022-01-22", "g": "grapes", "x": "10", "y": "NaN"}
    ]
  },
  "transform": [
    {"calculate": "parseFloat(datum['x'])", "as": "x"},
    {"calculate": "parseFloat(datum['y'])", "as": "y"},
    {"fold": ["x", "y"]},
    {"filter": {"field": "value", "valid": true}},
    {
      "aggregate": [{"op": "sum", "field": "value", "as": "value"}],
      "groupby": ["date", "key"]
    }
  ],
  "encoding": {"x": {"field": "date", "type": "temporal"}},
  "layer": [
    {
      "encoding": {
        "y": {"field": "value", "type": "quantitative"},
        "color": {"field": "key", "type": "nominal"}
      },
      "mark": "line"
    },
    {
      "encoding": {
        "y": {"field": "value", "type": "quantitative"},
        "color": {"field": "key", "type": "nominal"}
      },
      "mark": {"type": "point", "tooltip": {"content": "encoding"}}
    }
  ]
}
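As a side note (not from the original answer), the same filtering step should also be expressible with Vega's isValid expression function, which returns false for null, undefined, and NaN values:
{"filter": "isValid(datum.value)"}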

Set domainMin to 6 months before max date in data

I have the following Vega-Lite chart:
Open the Chart in the Vega Editor
Currently, I have the scale set as follows:
"scale": {"domainMin": "2021-06-01"}
However, what I really want is for the domainMin to be automatically calculated to be 6 months before the latest date in the notification_date field in the data.
I've looked at aggregate and expressions, but it's not exactly clear.
How can I get the maximum value of notification_date and subtract 6 months from it, and use that in "domainMin"?
Edit: To clarify, I don't want to filter the data. I want the user to be able to zoom out or pan to see the data outside the initial 6-month window. I get exactly what I want with "scale": {"domainMin": "2021-06-01"}, but this becomes out-of-date very quickly.
I have tried giving params and expr to domainMin, but I was unable to use the data fields in expr through datum.
The second approach I tried will work for you. It makes use of joinaggregate, calculate, and filter transforms: you manually gather the max year and max month and then use them to filter your data.
Below is the modified config, or refer to the editor URL:
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "width": "container",
  "height": "container",
  "config": {
    "group": {"fill": "#e5e5e5"},
    "arc": {"fill": "#2b2c39"},
    "area": {"fill": "#2b2c39"},
    "line": {"stroke": "#2b2c39"},
    "path": {"stroke": "#2b2c39"},
    "rect": {"fill": "#2b2c39"},
    "shape": {"stroke": "#2b2c39"},
    "symbol": {"fill": "#2b2c39"},
    "range": {
      "category": [
        "#2283a2", "#003e6a", "#a1ce5e", "#FDBE13", "#F2727E", "#EA3F3F", "#25A9E0",
        "#F97A08", "#41BFB8", "#518DCA", "#9460A8", "#6F7D84", "#D1DCA5"
      ]
    }
  },
  "title": "South Western Sydney Cumulative and Daily COVID-19 Cases by LGA",
  "data": {
    "url": "https://davidwales.github.io/nsw-covid-19-data/confirmed_cases_table1_location.csv"
  },
  "transform": [
    {
      "filter": {
        "and": [
          {"field": "lhd_2010_name", "equal": "South Western Sydney"},
          {"not": {"field": "lga_name19", "equal": "Penrith"}}
        ]
      }
    },
    {"calculate": "utcyear(datum.notification_date)", "as": "yearNumber"},
    {"calculate": "utcmonth(datum.notification_date)", "as": "monthNumber"},
    {
      "window": [
        {"op": "count", "field": "notification_date", "as": "cumulative_count"}
      ],
      "frame": [null, 0]
    },
    {
      "joinaggregate": [
        {"field": "monthNumber", "op": "max", "as": "max_month_count"},
        {"field": "yearNumber", "op": "max", "as": "max_year"}
      ]
    },
    {"calculate": "abs(datum.max_month_count-6)", "as": "min_month_count"},
    {
      "filter": "datum.min_month_count < datum.monthNumber && datum.max_year === datum.yearNumber"
    }
  ],
  "layer": [
    {
      "selection": {
        "date": {"type": "interval", "bind": "scales", "encodings": ["x"]}
      },
      "mark": {"type": "bar", "tooltip": true},
      "encoding": {
        "x": {
          "timeUnit": "yearmonthdate",
          "field": "notification_date",
          "type": "temporal",
          "title": "Date"
        },
        "color": {
          "field": "lga_name19",
          "type": "nominal",
          "title": "LGA",
          "legend": {"orient": "top", "columns": 4}
        },
        "y": {
          "aggregate": "count",
          "field": "lga_name19",
          "type": "quantitative",
          "title": "Cases",
          "axis": {"title": "Daily Cases by SWS LGA"}
        }
      }
    },
    {
      "mark": "line",
      "encoding": {
        "x": {
          "timeUnit": "yearmonthdate",
          "field": "notification_date",
          "title": "Date",
          "type": "temporal"
        },
        "y": {
          "aggregate": "max",
          "field": "cumulative_count",
          "type": "quantitative",
          "axis": {"title": "Cumulative Cases"}
        }
      }
    }
  ],
  "resolve": {"scale": {"y": "independent"}}
}
A bit simpler approach to filter approximately the last six months of data might look like this:
"transform": [
...,
{"joinaggregate": [{"op": "max", "field": "notification_date", "as": "last_date"}]},
{"filter": "datum.notification_date > datum.last_date - 6 * 30 * 24 * 60 * 60 * 1000"}
]
It makes use of the fact that dates are stored as millisecond time-stamps, and has the benefit that it will work across year boundaries.
This is not quite an answer to the question you asked, but if your data is reasonably up-to-date (that is, the most recent data point is close to the current date), you can do something like this:
"scale": { "domainMin": { "expr": "timeOffset('month', now(), -6)" } }

how to control the order of groups in a vega-lite stacked bar chart

Given a stacked bar chart such as in this example: https://vega.github.io/editor/?#/examples/vega-lite/stacked_bar_weather
I want to control the order of the items in the aggregation so that, for example, 'fog' comes at the bottom, with 'sun' next etc. Is that possible?
The reason for this is I have one type that is much larger than the others. I want this to appear at the bottom, then control the domain to 'cut off' most of that section.
Thanks
You can control the stack order via the order encoding: see https://vega.github.io/vega-lite/docs/stack.html#sorting-stack-order
Unfortunately, this only allows sorting by field value, rather than by an explicit order as you want here. The workaround is to use a calculate transform to turn your explicit order into a field (view in editor):
{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
  "data": {"url": "data/seattle-weather.csv"},
  "transform": [
    {
      "calculate": "indexof(['sun', 'fog', 'drizzle', 'rain', 'snow'], datum.weather)",
      "as": "order"
    }
  ],
  "mark": "bar",
  "encoding": {
    "x": {
      "timeUnit": "month",
      "field": "date",
      "type": "ordinal",
      "axis": {"title": "Month of the year"}
    },
    "y": {"aggregate": "count", "type": "quantitative"},
    "color": {
      "field": "weather",
      "type": "nominal",
      "scale": {
        "domain": ["sun", "fog", "drizzle", "rain", "snow"],
        "range": ["#e7ba52", "#c7c7c7", "#aec7e8", "#1f77b4", "#9467bd"]
      },
      "legend": {"title": "Weather type"}
    },
    "order": {"field": "order", "type": "ordinal"}
  }
}
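For the second part of the question, cutting off most of the large section, one option (a sketch, not part of the original answer; the cutoff value 300 is an arbitrary illustration) is to restrict the y scale's domain and clip the bar marks. The relevant changes to the spec above would look like this:
"mark": {"type": "bar", "clip": true},
"encoding": {
  "y": {
    "aggregate": "count",
    "type": "quantitative",
    "scale": {"domain": [0, 300]}
  }
}
Without "clip": true on the mark, bars that extend past the restricted domain would be drawn outside the plotting area.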