vega-lite: how to aggregate by week - vega-lite

I have seen that it's possible to aggregate using several time units, for example by month, but not by week.
I have also seen that in Vega it's possible to customize the time unit: https://vega.github.io/vega/docs/transforms/timeunit/#chronological-time-units
Is it possible to use this in Vega-Lite to aggregate by week, for example to change this aggregation from month to week?
Thank you

You can group by week using a yearmonthdate timeUnit with a step size of 7:
"timeUnit": {"unit": "yearmonthdate", "step": 7}
For example:
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.json",
"data": {"url": "data/seattle-temps.csv"},
"mark": "line",
"encoding": {
"x": {"timeUnit": {"unit": "yearmonthdate", "step": 7}, "field": "date", "type": "temporal"},
"y": {"aggregate": "mean", "field": "temp", "type": "quantitative"}
}
}
Note, however, that this starts a new week at the beginning of each month, which means if you do a heatmap by day of week and week there are gaps:
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.json",
"data": {"url": "data/seattle-temps.csv"},
"mark": "rect",
"encoding": {
"y": {"timeUnit": "day", "field": "date", "type": "ordinal"},
"x": {"timeUnit": {"unit": "yearmonthdate", "step": 7}, "field": "date", "type": "ordinal"},
"color": {"aggregate": "mean", "field": "temp", "type": "quantitative"}
}
}
If you want more fine-grained control over where weeks start, that's unfortunately not expressible as a timeUnit, but you can take advantage of Vega-Lite's full transform syntax to make more customized aggregates. For example, here we compute the week-of-year by counting Sundays in the data:
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.json",
"data": {"url": "data/seattle-temps.csv"},
"transform": [
{"timeUnit": "yearmonthdate", "field": "date", "as": "date"},
{
"aggregate": [{"op": "mean", "field": "temp", "as": "temp"}],
"groupby": ["date"]
},
{"calculate": "day(datum.date) == 0", "as": "sundays"},
{
"window": [{"op": "sum", "field": "sundays", "as": "week"}],
"sort": "date"
}
],
"mark": "rect",
"encoding": {
"y": {"timeUnit": "day", "field": "date", "type": "ordinal", "title": "Day of Week"},
"x": {"field": "week", "type": "ordinal", "title": "Week of year"},
"color": {"aggregate": "mean", "field": "temp", "type": "quantitative"}
}
}
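Newer Vega-Lite releases also appear to provide calendar-week time units. The following is a minimal sketch, assuming the v5 schema and that the "yearweek" unit is available in the version you run; it is not part of the answer above, which targeted v4:

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {"url": "data/seattle-temps.csv"},
  "mark": "line",
  "encoding": {
    "x": {"timeUnit": "yearweek", "field": "date", "type": "temporal"},
    "y": {"aggregate": "mean", "field": "temp", "type": "quantitative"}
  }
}
```

Unlike the step-of-7 approach, this should not restart the week at each month boundary. Check the time unit documentation for which day a week starts on.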

Related

How to Annotate a Line in line Chart in Vega-Lite?

For the below code https://vega.github.io/editor/#/
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"layer": [
{
"data": {"url": "data/stocks.csv"},
"mark": "line",
"encoding": {
"x": {"field": "date", "type": "temporal"},
"y": {"field": "price", "type": "quantitative"},
"color": {"field": "symbol", "type": "nominal"}
}
},
{
"data": {"values": [{}]},
"mark": {"type": "rule", "strokeDash": [2, 2], "size": 2},
"encoding": {"x": {"datum": {"year": 2006}}}
}
]
}
This renders a multi-series line chart with a dashed vertical rule at 2006.
I want to annotate the line at a specific position, such as (2004, 400).
I tried the following layer, which works, but I don't want to pass hardcoded values like "a": 2004, "b": 400:
{
"data": {
"values": [
{"a": 2004, "b": 400}
]
},
"mark": {"type": "text", "fontSize" : 16, "fontWeight":"bold", "align" : "left"},
"encoding": {
"text": {"value": "Optimum"},
"x": {"field": "a", "type": "quantitative", "title":""},
"y": {"field": "b", "type": "quantitative", "title":""}
}
},
How can I pass values derived from the data instead, such as the average of date (say 2004) and the average of price (say 400)?
Alternatively, how can I place the label just next to the line, in the middle of the y-axis?
A transform with an aggregate worked. I only needed the average position on the y-axis, so the code below was enough. The same transform can be used for the other axis.
{
"mark": {"type": "text", "fontSize" : 16, "fontWeight":"bold", "align" : "left"},
"transform": [
{
"aggregate": [{
"op": "mean",
"field": "price",
"as": "mean_y_axis"
}],
"groupby": ["date"]
}
],
"encoding": {
"text": {"value": "Optimum"},
"x": {"field": "date",
"type": "quantitative"},
"y": {"field": "mean_y_axis",
"type": "quantitative"}
}
}
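If a single label at the mean date and mean price is wanted, one option is to aggregate without a groupby, which collapses the data to one row. This is a minimal sketch rather than the poster's code: the names date_ms, mean_date_ms, and mean_price are hypothetical, and it assumes the date column is parsed explicitly so that time() can convert it to a millisecond timestamp before averaging:

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {"url": "data/stocks.csv", "format": {"parse": {"date": "date"}}},
  "layer": [
    {
      "mark": "line",
      "encoding": {
        "x": {"field": "date", "type": "temporal"},
        "y": {"field": "price", "type": "quantitative"},
        "color": {"field": "symbol", "type": "nominal"}
      }
    },
    {
      "transform": [
        {"calculate": "time(datum.date)", "as": "date_ms"},
        {
          "aggregate": [
            {"op": "mean", "field": "date_ms", "as": "mean_date_ms"},
            {"op": "mean", "field": "price", "as": "mean_price"}
          ]
        }
      ],
      "mark": {"type": "text", "fontSize": 16, "fontWeight": "bold", "align": "left"},
      "encoding": {
        "text": {"value": "Optimum"},
        "x": {"field": "mean_date_ms", "type": "temporal"},
        "y": {"field": "mean_price", "type": "quantitative"}
      }
    }
  ]
}
```

With "groupby": ["date"], by contrast, the aggregate yields one row (and so one label) per date.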

Set domainMin to 6 months before max date in data

I have the following Vega-Lite chart:
Open the Chart in the Vega Editor
Currently, I have the scale set as follows:
"scale": {"domainMin": "2021-06-01"}
However, what I really want is for the domainMin to be automatically calculated to be 6 months before the latest date in the notification_date field in the data.
I've looked at aggregate and expressions, but it's not exactly clear.
How can I get the maximum value of notification_date and subtract 6 months from it, and use that in "domainMin"?
Edit: To clarify, I don't want to filter the data. I want the user to be able to zoom out or pan to see the data outside the initial 6-month window. I get exactly what I want with "scale": {"domainMin": "2021-06-01"}, but this becomes out-of-date very quickly.
I have tried giving params and expr to domainMin, but I was unable to use the data fields in expr through datum.
The second approach I tried should work for you. It uses the joinaggregate, calculate, and filter transforms: you compute the maximum year and maximum month in the data, then use them to filter the data.
Below is the modified spec (or refer to the editor URL):
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"width": "container",
"height": "container",
"config": {
"group": {"fill": "#e5e5e5"},
"arc": {"fill": "#2b2c39"},
"area": {"fill": "#2b2c39"},
"line": {"stroke": "#2b2c39"},
"path": {"stroke": "#2b2c39"},
"rect": {"fill": "#2b2c39"},
"shape": {"stroke": "#2b2c39"},
"symbol": {"fill": "#2b2c39"},
"range": {
"category": [
"#2283a2",
"#003e6a",
"#a1ce5e",
"#FDBE13",
"#F2727E",
"#EA3F3F",
"#25A9E0",
"#F97A08",
"#41BFB8",
"#518DCA",
"#9460A8",
"#6F7D84",
"#D1DCA5"
]
}
},
"title": "South Western Sydney Cumulative and Daily COVID-19 Cases by LGA",
"data": {
"url": "https://davidwales.github.io/nsw-covid-19-data/confirmed_cases_table1_location.csv"
},
"transform": [
{
"filter": {
"and": [
{"field": "lhd_2010_name", "equal": "South Western Sydney"},
{"not": {"field": "lga_name19", "equal": "Penrith"}}
]
}
},
{"calculate": "utcyear(datum.notification_date)", "as": "yearNumber"},
{"calculate": "utcmonth(datum.notification_date)", "as": "monthNumber"},
{
"window": [
{"op": "count", "field": "notification_date", "as": "cumulative_count"}
],
"frame": [null, 0]
},
{
"joinaggregate": [
{"field": "monthNumber", "op": "max", "as": "max_month_count"},
{"field": "yearNumber", "op": "max", "as": "max_year"}
]
},
{"calculate": "abs(datum.max_month_count-6)", "as": "min_month_count"},
{
"filter": "datum.min_month_count < datum.monthNumber && datum.max_year === datum.yearNumber"
}
],
"layer": [
{
"selection": {
"date": {"type": "interval", "bind": "scales", "encodings": ["x"]}
},
"mark": {"type": "bar", "tooltip": true},
"encoding": {
"x": {
"timeUnit": "yearmonthdate",
"field": "notification_date",
"type": "temporal",
"title": "Date"
},
"color": {
"field": "lga_name19",
"type": "nominal",
"title": "LGA",
"legend": {"orient": "top", "columns": 4}
},
"y": {
"aggregate": "count",
"field": "lga_name19",
"type": "quantitative",
"title": "Cases",
"axis": {"title": "Daily Cases by SWS LGA"}
}
}
},
{
"mark": "line",
"encoding": {
"x": {
"timeUnit": "yearmonthdate",
"field": "notification_date",
"title": "Date",
"type": "temporal"
},
"y": {
"aggregate": "max",
"field": "cumulative_count",
"type": "quantitative",
"axis": {"title": "Cumulative Cases"}
}
}
}
],
"resolve": {"scale": {"y": "independent"}}
}
A somewhat simpler approach that filters approximately the last six months of data might look like this:
"transform": [
...,
{"joinaggregate": [{"op": "max", "field": "notification_date", "as": "last_date"}]},
{"filter": "datum.notification_date > datum.last_date - 6 * 30 * 24 * 60 * 60 * 1000"}
]
It makes use of the fact that dates are stored as millisecond time-stamps, and has the benefit that it will work across year boundaries.
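The 30-day approximation can probably be replaced with a calendar-month offset. This is a minimal sketch, assuming the Vega expression function timeOffset accepts the joined last_date value, and assuming an explicit date parse for notification_date; the chart is cut down to a single bar layer for illustration:

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {
    "url": "https://davidwales.github.io/nsw-covid-19-data/confirmed_cases_table1_location.csv",
    "format": {"parse": {"notification_date": "date"}}
  },
  "transform": [
    {"joinaggregate": [{"op": "max", "field": "notification_date", "as": "last_date"}]},
    {"filter": "datum.notification_date > timeOffset('month', datum.last_date, -6)"}
  ],
  "mark": "bar",
  "encoding": {
    "x": {
      "timeUnit": "yearmonthdate",
      "field": "notification_date",
      "type": "temporal",
      "title": "Date"
    },
    "y": {"aggregate": "count", "type": "quantitative", "title": "Cases"}
  }
}
```

Like the other transform-based answers, this still removes the older rows, so panning back past the six-month window will show no data.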
This is not quite an answer to the question you asked, but if your data is reasonably up-to-date (that is, the most recent data point is close to the current date), you can do something like this:
"scale": { "domainMin": { "expr": "timeOffset('month', now(), -6)" } }

Highest Value Wrong Colour

Just made a simple bar chart, but for some reason, the final value is the wrong colour?
Code:
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"width": 800,
"height": 600,
"title": "Death Rates Amongst Ages",
"data": {"url": "https://raw.githubusercontent.com/githubuser0099/Repo55/main/AgeBracket_DeathRate.csv"},
"transform": [
{"calculate":"parseInt(datum.Death_Rate)", "as": "Death_Rate"}
],
"mark": "bar",
"encoding": {
"x": {"field": "Death_Rate", "type": "quantitative", "title": ""},
"y": {"field": "Age", "type": "nominal", "title": "", "sort": "-x"},
"color": {
"field": "Age",
"type": "nominal",
"scale": {"scheme": "reds"}
}
}
}
The problem with your colour scale is that "Age" is encoded as a string: you declare its type as "nominal" but use a sequential colour scheme ("reds"). Your data also has some issues: there are leading spaces before 5-9 and 10-14.
In string comparison, white space < "0" < "100" < "15".
To solve the issue, extract the first number from each age range and encode it in an additional channel (with its legend hidden). The colour channel can then define its order based on that additional channel.
See the code below. The chain of calculate transforms shows each step of how the value is derived.
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"width": 800,
"height": 600,
"title": "Death Rates Amongst Ages",
"data": {"url": "https://raw.githubusercontent.com/githubuser0099/Repo55/main/AgeBracket_DeathRate.csv"},
"transform": [
{"calculate":"parseInt(datum.Death_Rate)", "as": "Death_Rate"},
{"calculate": "split(datum['Age'], '-')[0]", "as": "Age_new"},
{"calculate": "replace(datum['Age_new'], ' ', '')", "as": "Age_new_2"},
{"calculate": "replace(datum['Age_new_2'], ' ', '')", "as": "Age_new_3"},
{"calculate": "parseInt(datum['Age_new_3'])", "as": "Age_new_4"}
],
"mark": "bar",
"encoding": {
"x": {"field": "Death_Rate", "type": "quantitative", "title": ""},
"y": {"field": "Age", "type": "nominal", "title": "", "sort": "-x"},
"opacity":{"field": "Age_new_4", "legend": null},
"color": {
"field": "Age",
"type": "ordinal",
"sort": "opacity",
"scale": {"scheme": "reds"}
}
}
}
Cheers,
KL
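A possibly shorter variant of the same idea sorts the colour domain by a derived field instead of routing it through the opacity channel. This is a sketch under assumptions: Age_start is a hypothetical field name, trim is used to drop the stray spaces, and the sort-by-field form is assumed to be accepted on the colour channel in your Vega-Lite version:

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "width": 800,
  "height": 600,
  "title": "Death Rates Amongst Ages",
  "data": {"url": "https://raw.githubusercontent.com/githubuser0099/Repo55/main/AgeBracket_DeathRate.csv"},
  "transform": [
    {"calculate": "parseInt(datum.Death_Rate)", "as": "Death_Rate"},
    {"calculate": "parseInt(trim(split(datum.Age, '-')[0]))", "as": "Age_start"}
  ],
  "mark": "bar",
  "encoding": {
    "x": {"field": "Death_Rate", "type": "quantitative", "title": ""},
    "y": {"field": "Age", "type": "nominal", "title": "", "sort": "-x"},
    "color": {
      "field": "Age",
      "type": "ordinal",
      "sort": {"field": "Age_start", "op": "min"},
      "scale": {"scheme": "reds"}
    }
  }
}
```

Each age bracket maps to a single Age_start value, so the choice of "min" as the sort operation is arbitrary here.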

vega: filter nth of each group

If I were to group by date, how would I filter the nth entry of each group?
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.json",
"data": {"url": "https://cdn.jsdelivr.net/npm/vega-datasets@v1.29.0/data/seattle-temps.csv"},
"mark": "point",
"encoding": {
"x": {"field": "date", "type": "temporal"},
"y": {"field": "temp", "type": "quantitative"}
}
}
edit
Let's keep this data-agnostic as my data has many columns and I would like rows in their entirety.
Transforms overview:
Convert times to dates for grouping.
Group by date and number each row within the groups.
Filter on nth row.
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.json",
"data": {"url": "https://cdn.jsdelivr.net/npm/vega-datasets@v1.29.0/data/seattle-temps.csv"},
"transform": [
{"timeUnit": "yearmonthdate", "field": "date", "as": "date"},
{
"window": [{"op": "row_number", "as": "row"}],
"groupby": ["date"]
},
{"filter": "datum['row'] == 1"}
],
"mark": "point",
"encoding": {
"x": {"field": "date", "type": "temporal"},
"y": {"field": "temp", "type": "quantitative"}
}
}
The downside is that the vega editor becomes very slow after adding the window transform.
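A variation on the steps above, offered as a sketch rather than a tested fix for the slowdown: it writes the truncated date to a separate, hypothetically named field day so the original timestamp survives, sorts within each group so that "nth" is well defined, and filters on n = 3 as an arbitrary example:

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
  "data": {"url": "data/seattle-temps.csv"},
  "transform": [
    {"timeUnit": "yearmonthdate", "field": "date", "as": "day"},
    {
      "window": [{"op": "row_number", "as": "row"}],
      "groupby": ["day"],
      "sort": [{"field": "date", "order": "ascending"}]
    },
    {"filter": "datum.row == 3"}
  ],
  "mark": "point",
  "encoding": {
    "x": {"field": "date", "type": "temporal"},
    "y": {"field": "temp", "type": "quantitative"}
  }
}
```

Because no aggregate is involved, every surviving row keeps all of its original columns.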

Stacked bar chart is created even though aggregation function is non-summative

I have a concatenated graph which features a main line graph which has a brush selection tool allowing the user to pan across the lines and points and change the data on 4 other graphs. For one of the other graphs, I have attempted to take the average of line graph data but it doesn't work. Instead of giving me a singular bar, I get stacked bars and the error: "Stacking is applied even though the aggregate function is non-summative ("mean")".
Here is my code:
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.json",
"title": "This is kinda sick yo",
"data": {
"url": "data/test3.csv"
},
"hconcat": [
{
"encoding": {
"color": {
"condition": {
"selection": "brush",
"title": "Species",
"field": "Species",
"type": "nominal",
"scale": {"range": ["green", "#FFFF00", "red"]}
},
"value": "lightgray"
},
"x": {
"field": "Variable",
"type": "nominal",
"axis": {"labelAngle": -45, "title": "Element",
"grid": false}
},
"y": {
"title": "Total",
"field": "Total",
"type": "quantitative"
},
"tooltip": [
{"field": "Variable", "type": "nominal"},
{"field": "Total", "type": "quantitative"}
]
},
"width": 550,
"height": 300,
"mark": {"type": "line", "point": true},
"selection": {"brush": {"encodings": ["x"], "type": "interval"}},
"transform": [{"filter": {"selection": "click"}}]
},
{
"encoding": {
"color": {
"condition": {
"selection": "click",
"field": "Total",
"type": "quantitative",
"scale": {"range": ["green", "#FFFF00", "red"]}
},
"value": "lightgray"
},
"y": {"field": "Total", "aggregate": "average"},
"x": {"title": "Species", "field": "Species", "type": "nominal"},
"tooltip": [
{"field": "Species", "type": "nominal"},
{"field": "Total", "type": "quantitative", "aggregate": "average"},
{"field": "Variable", "type": "nominal"}
]
},
"height": 300,
"width": 80,
"mark": "bar",
"selection": {"click": {"encodings": ["color"], "type": "multi"}},
"transform": [{"filter": {"selection": "brush"}}]
},
{
"encoding": {
"color": {
"condition": {
"selection": "click",
"field": "Sex",
"type": "nominal",
"scale": {"range": ["#993162", "#75b0a2", "grey"]},
"legend": null
},
"value": "lightgray"
},
"y": {"field": "Fisher Sex Value", "type": "quantitative", "aggregate": "mean"},
"x": {"title": "Sex", "field": "Sex", "type": "nominal"},
"tooltip": [
{"field": "Sex", "type": "nominal"},
{"field": "Fisher Sex Value", "type": "quantitative", "aggregate": "mean"}
]
},
"height": 300,
"width": 75,
"mark": "bar",
"selection": {"click": {"encodings": ["color"], "type": "multi"}},
"transform": [{"filter": {"selection": "brush"}}]
},
{
"encoding": {
"color": {
"condition": {
"selection": "click",
"field": "Sex",
"type": "nominal",
"scale": {"range": ["#993162", "#75b0a2", "grey"]},
"legend": null
},
"value": "lightgray"
},
"y": {"field": "Mink Sex Value", "type": "quantitative", "aggregate": "mean"},
"x": {"title": "Sex", "field": "Sex", "type": "nominal"},
"tooltip": [
{"field": "Sex", "type": "nominal"},
{"field": "Mink Sex Value", "type": "quantitative", "aggregate": "mean"}
]
},
"height": 300,
"width": 75,
"mark": "bar",
"selection": {"click": {"encodings": ["color"], "type": "multi"}},
"transform": [{"filter": {"selection": "brush"}}]
},
{
"encoding": {
"color": {
"condition": {
"selection": "click",
"field": "Sex",
"type": "nominal",
"scale": {"range": ["#993162", "#75b0a2", "grey"]}
},
"value": "lightgray"
},
"y": {"field": "Otter Sex Value", "type": "quantitative", "aggregate": "mean"},
"x": {"title": "Sex", "field": "Sex", "type": "nominal"},
"tooltip": [
{"field": "Sex", "type": "nominal"},
{"field": "Otter Sex Value", "type": "quantitative", "aggregate": "mean"}
]
},
"height": 300,
"width": 75,
"mark": "bar",
"selection": {"click": {"encodings": ["color"], "type": "multi"}},
"transform": [{"filter": {"selection": "brush"}}]
}
]
}
The first graph is the line graph, and the second graph is the one where aggregation fails and I get stacks. Any help would be much appreciated.
Vega-Lite's encoding aggregations will implicitly group by unaggregated fields you specify in a set of encodings. A simplified version of the second chart's encoding looks like this:
{
"encoding": {
"color": {"field": "Total"},
"y": {"field": "Total", "aggregate": "average"},
"x": {"field": "Species"},
"tooltip": [
{"field": "Species"},
{"field": "Total", "aggregate": "average"},
{"field": "Variable"}
]
The unaggregated encodings are ["Total", "Species", "Variable"], so the operation will group-by these before computing the average of Total within each group. Grouping by unique values of Total before taking the mean of Total in each group is probably not what you were hoping for.
Perhaps removing the color encoding from that chart will give you more meaningful results.
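A stripped-down sketch of what the second chart might look like once every non-x encoding is aggregated. It assumes the same data/test3.csv columns, omits the selections for brevity, and drops the Variable tooltip because it is another unaggregated field:

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
  "data": {"url": "data/test3.csv"},
  "mark": "bar",
  "width": 80,
  "height": 300,
  "encoding": {
    "x": {"field": "Species", "type": "nominal", "title": "Species"},
    "y": {"field": "Total", "type": "quantitative", "aggregate": "average"},
    "color": {"field": "Total", "type": "quantitative", "aggregate": "average"},
    "tooltip": [
      {"field": "Species", "type": "nominal"},
      {"field": "Total", "type": "quantitative", "aggregate": "average"}
    ]
  }
}
```

Here the only unaggregated field is Species, so the implicit group-by should produce one bar per species.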