In Vega-lite, how do I filter by time? - vega-lite

I have a CSV with a row per 15-minute interval and columns for metrics like airflow and temperature. I'd like to filter the data so I can plot only the most recent day or week.
How do I add a filter for a date (Mar 9) or date range (Mar 6–12)? Is it more common to do filtering and aggregation (to hourly or daily averages) before handing the data to Vega-Lite?
Here's my code without a filter:
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"title": "Average Airflow",
"description": "Average Airflow in Rooms",
"data": {"url": "hvac_data-wide.csv"},
"mark": "line",
"width": 608,
"height": 342,
"transform": [{
"timeUnit":"dayhours",
"field": "Date",
"as": "hours"
}],
"encoding": {
"x": {"field": "hours", "type": "temporal", "title": "Time"},
"y": {"field": "SaFl", "type": "quantitative", "aggregate": "mean", "title": "Source Air CFM"},
"color": {"field": "Floor", "type": "nominal"}
}
}
Here's some sample data:
Date,Category,Floor,Name,HVACModeStatus,RmTmp,SaFl,EaFl,CO2,NPW,ChW,wh,kwh
2022-03-07 08:10,zone,1,Lab1115,4,70.88,374.717,1109.641,,,,,
2022-03-07 08:10,zone,1,Lab1121,4,70.16,1700.559,1897.229,,,,,
2022-03-07 08:10,zone,1,Lab1126,2,73.22,1061.672,1572.01,,,,,
2022-03-07 08:15,zone,1,Lab1115,4,70.88,349.848,1170.564,,,,,
2022-03-07 08:15,zone,1,Lab1121,4,70.16,1699.6,1870.382,,,,,
2022-03-07 08:15,zone,1,Lab1126,2,73.22,1092.875,1606.451,,,,,
2022-03-07 08:20,zone,1,Lab1115,4,70.88,376.867,1156.398,,,,,
2022-03-07 08:20,zone,1,Lab1121,4,70.16,1692.929,1875.636,,,,,
2022-03-07 08:20,zone,1,Lab1126,2,73.22,1148.222,1580.696,,,,,
Thank you in advance.

Short answer: Filters can be added within the transform.
Here's code to show a date range filter:
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"title": "Average Airflow",
"description": "Average Airflow in Rooms",
"data": {"url": "hvac_data-wide.csv"},
"mark": "line",
"width": 608,
"height": 342,
"transform": [
{
"timeUnit":"dayhours",
"field": "Date",
"as": "hours"
},
{"filter": {"field": "Date",
"range": [ {"year": 2022, "month": 3, "date": 8, "hours": 0},
{"year": 2022, "month": 3, "date": 8, "hours": 23, "minutes": 59}]}}
],
"encoding": {
"x": {"field": "hours", "type": "temporal", "title": "Time"},
"y": {"field": "SaFl", "type": "quantitative", "aggregate": "mean", "title": "Source Air CFM"},
"color": {"field": "Floor", "type": "nominal"}
}
}
A filter can also be a simple expression string. To use one, substitute this for the filter block above: {"filter": "datum.Floor == 3"}
Or a field predicate: {"filter": {"field": "Floor", "equal": 3}}
See the Predicate page of the Vega-Lite docs for more about filters.
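The question also asks about a date range and about hourly averages. As a minimal sketch (not part of the original answer), the spec below filters to March 6 through 12 and lets Vega-Lite average SaFl per hour by putting a yearmonthdatehours timeUnit on the x encoding. The explicit parse format for the Date column is an assumption based on the sample rows; drop it if the column is already parsed as a date.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "title": "Average Airflow",
  "data": {
    "url": "hvac_data-wide.csv",
    "format": {"type": "csv", "parse": {"Date": "date:'%Y-%m-%d %H:%M'"}}
  },
  "transform": [
    {
      "filter": {
        "field": "Date",
        "range": [
          {"year": 2022, "month": 3, "date": 6},
          {"year": 2022, "month": 3, "date": 12, "hours": 23, "minutes": 59}
        ]
      }
    }
  ],
  "mark": "line",
  "encoding": {
    "x": {"field": "Date", "timeUnit": "yearmonthdatehours", "type": "temporal", "title": "Time"},
    "y": {"field": "SaFl", "aggregate": "mean", "type": "quantitative", "title": "Source Air CFM"},
    "color": {"field": "Floor", "type": "nominal"}
  }
}
```

Swapping the timeUnit for yearmonthdate should give daily averages instead. Aggregating before handing the data to Vega-Lite is also reasonable when the CSV is large, since otherwise every row is loaded and processed in the browser.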

Related

How to Annotate a Line in line Chart in Vega-Lite?

Take the code below (it can be pasted into the Vega editor at https://vega.github.io/editor/#/):
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"layer": [
{
"data": {"url": "data/stocks.csv"},
"mark": "line",
"encoding": {
"x": {"field": "date", "type": "temporal"},
"y": {"field": "price", "type": "quantitative"},
"color": {"field": "symbol", "type": "nominal"}
}
},
{
"data": {"values": [{}]},
"mark": {"type": "rule", "strokeDash": [2, 2], "size": 2},
"encoding": {"x": {"datum": {"year": 2006}}}
}
]
}
This produces a line chart with a dashed vertical rule at 2006.
I want to annotate the line at a specific position, such as (2004, 400).
I tried the following, and it works, but I don't want to pass hardcoded values like "a": 2004, "b": 400:
{
"data": {
"values": [
{"a": 2004, "b": 400}
]
},
"mark": {"type": "text", "fontSize" : 16, "fontWeight":"bold", "align" : "left"},
"encoding": {
"text": {"value": "Optimum"},
"x": {"field": "a", "type": "quantitative", "title":""},
"y": {"field": "b", "type": "quantitative", "title":""}
}
},
How can I pass values computed from the data instead, such as the average date (say, 2004) and the average price (say, 400), or place the label just next to the line at the middle of the y-axis?
Transform and aggregate worked. I only needed the average position on the y-axis, so the code below was enough; the same transform can be used for the other axis.
{
"mark": {"type": "text", "fontSize" : 16, "fontWeight":"bold", "align" : "left"},
"transform": [
{
"aggregate": [{
"op": "mean",
"field": "price",
"as": "mean_y_axis"
}],
"groupby": ["date"]
}
],
"encoding": {
"text": {"value": "Optimum"},
"x": {"field": "date",
"type": "quantitative"},
"y": {"field": "mean_y_axis",
"type": "quantitative"}
}
}

Vega-Lite "table" using text marks

I am trying to replicate the example shown here:
https://observablehq.com/@mdeagen/vega-lite-table-using-text-marks#count
However, when I paste the Vega-Lite code into the online editor, I get an error because of the following line:
{"filter": {"field": "row_num", "lte": count}},
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"data": {
"url": "https://raw.githubusercontent.com/vega/vega-datasets/next/data/penguins.json"
},
"transform": [
{"window": [{"op": "row_number", "as": "row_num"}]},
{"filter": {"field": "row_num", "lte": count}},
{"fold": ["Beak Length (mm)", "Beak Depth (mm)", "Flipper Length (mm)", "Body Mass (g)", "Species", "Island", "Sex"]}
],
"mark": "text",
"encoding": {
"y": {"field": "row_num", "type": "ordinal", "axis": null},
"text": {"field": "value", "type": "nominal"},
"x": {"field": "key", "type": "nominal", "axis": {"orient": "top", "labelAngle": 0, "title": null, "domain": false, "ticks": false}, "scale": {"padding": 15}}
}, "config": {"view": {"stroke": null}}
}
Does anyone know a simple fix for this?
Thanks.
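One likely cause, offered here as an assumption rather than a confirmed answer: in the Observable notebook, count is a JavaScript variable that is interpolated into the spec, so once pasted into the online editor it is a bare identifier and the spec is no longer valid JSON. A minimal sketch of a fix is to declare count as a Vega-Lite parameter (the value 10 is arbitrary) and reference it from an expression filter:

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {
    "url": "https://raw.githubusercontent.com/vega/vega-datasets/next/data/penguins.json"
  },
  "params": [{"name": "count", "value": 10}],
  "transform": [
    {"window": [{"op": "row_number", "as": "row_num"}]},
    {"filter": "datum.row_num <= count"},
    {"fold": ["Beak Length (mm)", "Beak Depth (mm)", "Flipper Length (mm)", "Body Mass (g)", "Species", "Island", "Sex"]}
  ],
  "mark": "text",
  "encoding": {
    "y": {"field": "row_num", "type": "ordinal", "axis": null},
    "text": {"field": "value", "type": "nominal"},
    "x": {
      "field": "key",
      "type": "nominal",
      "axis": {"orient": "top", "labelAngle": 0, "title": null, "domain": false, "ticks": false},
      "scale": {"padding": 15}
    }
  },
  "config": {"view": {"stroke": null}}
}
```

Replacing count with a literal number in the original predicate, as in {"filter": {"field": "row_num", "lte": 10}}, should work as well.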

Set domainMin to 6 months before max date in data

I have the following Vega-Lite chart:
Open the Chart in the Vega Editor
Currently, I have the scale set as follows:
"scale": {"domainMin": "2021-06-01"}
However, what I really want is for the domainMin to be automatically calculated to be 6 months before the latest date in the notification_date field in the data.
I've looked at aggregate and expressions, but it's not exactly clear.
How can I get the maximum value of notification_date and subtract 6 months from it, and use that in "domainMin"?
Edit: To clarify, I don't want to filter the data. I want the user to be able to zoom out or pan to see the data outside the initial 6-month window. I get exactly what I want with "scale": {"domainMin": "2021-06-01"}, but this becomes out-of-date very quickly.
I have tried giving params and expr to domainMin, but I was unable to use the data fields in expr through datum.
The second approach I tried should work for you. It uses the joinaggregate, calculate, and filter transforms: you compute the maximum year and month from the data and then use them to filter it.
Below is the modified spec (or refer to the editor URL):
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"width": "container",
"height": "container",
"config": {
"group": {"fill": "#e5e5e5"},
"arc": {"fill": "#2b2c39"},
"area": {"fill": "#2b2c39"},
"line": {"stroke": "#2b2c39"},
"path": {"stroke": "#2b2c39"},
"rect": {"fill": "#2b2c39"},
"shape": {"stroke": "#2b2c39"},
"symbol": {"fill": "#2b2c39"},
"range": {
"category": [
"#2283a2",
"#003e6a",
"#a1ce5e",
"#FDBE13",
"#F2727E",
"#EA3F3F",
"#25A9E0",
"#F97A08",
"#41BFB8",
"#518DCA",
"#9460A8",
"#6F7D84",
"#D1DCA5"
]
}
},
"title": "South Western Sydney Cumulative and Daily COVID-19 Cases by LGA",
"data": {
"url": "https://davidwales.github.io/nsw-covid-19-data/confirmed_cases_table1_location.csv"
},
"transform": [
{
"filter": {
"and": [
{"field": "lhd_2010_name", "equal": "South Western Sydney"},
{"not": {"field": "lga_name19", "equal": "Penrith"}}
]
}
},
{"calculate": "utcyear(datum.notification_date)", "as": "yearNumber"},
{"calculate": "utcmonth(datum.notification_date)", "as": "monthNumber"},
{
"window": [
{"op": "count", "field": "notification_date", "as": "cumulative_count"}
],
"frame": [null, 0]
},
{
"joinaggregate": [
{"field": "monthNumber", "op": "max", "as": "max_month_count"},
{"field": "yearNumber", "op": "max", "as": "max_year"}
]
},
{"calculate": "abs(datum.max_month_count-6)", "as": "min_month_count"},
{
"filter": "datum.min_month_count < datum.monthNumber && datum.max_year === datum.yearNumber"
}
],
"layer": [
{
"selection": {
"date": {"type": "interval", "bind": "scales", "encodings": ["x"]}
},
"mark": {"type": "bar", "tooltip": true},
"encoding": {
"x": {
"timeUnit": "yearmonthdate",
"field": "notification_date",
"type": "temporal",
"title": "Date"
},
"color": {
"field": "lga_name19",
"type": "nominal",
"title": "LGA",
"legend": {"orient": "top", "columns": 4}
},
"y": {
"aggregate": "count",
"field": "lga_name19",
"type": "quantitative",
"title": "Cases",
"axis": {"title": "Daily Cases by SWS LGA"}
}
}
},
{
"mark": "line",
"encoding": {
"x": {
"timeUnit": "yearmonthdate",
"field": "notification_date",
"title": "Date",
"type": "temporal"
},
"y": {
"aggregate": "max",
"field": "cumulative_count",
"type": "quantitative",
"axis": {"title": "Cumulative Cases"}
}
}
}
],
"resolve": {"scale": {"y": "independent"}}
}
A somewhat simpler way to filter to approximately the last six months of data might look like this:
"transform": [
...,
{"joinaggregate": [{"op": "max", "field": "notification_date", "as": "last_date"}]},
{"filter": "datum.notification_date > datum.last_date - 6 * 30 * 24 * 60 * 60 * 1000"}
]
It makes use of the fact that dates are stored as millisecond time-stamps, and has the benefit that it will work across year boundaries.
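As a variation on the same idea, and only a sketch: Vega's timeOffset expression function can subtract six calendar months from the latest date instead of approximating a month as 30 days. The last_date name follows the snippet above; these two transforms are assumed to be appended to the chart's existing transform array.

```json
{
  "transform": [
    {"joinaggregate": [{"op": "max", "field": "notification_date", "as": "last_date"}]},
    {"filter": "datum.notification_date > timeOffset('month', datum.last_date, -6)"}
  ]
}
```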
This is not quite an answer to the question you asked, but if your data is reasonably up-to-date (that is, the most recent data point is close to the current date), you can do something like this:
"scale": { "domainMin": { "expr": "timeOffset('month', now(), -6)" } }

Vega Lite - Bar Chart - Incorrectly Sorted

I just made a simple bar chart in Vega Lite, which works perfectly here:
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"width": 800,
"height": 600,
"title": "Biggest Killers",
"data": {"url": "https://raw.githubusercontent.com/githubuser0099/Assignment2.1/main/Cause_Of_Death_v2.csv"},
"mark": "bar",
"encoding": {
"x": {"field": "Toll", "type": "quantitative", "title": ""},
"y": {"field": "Cause Of Death", "type": "nominal", "title": "", "sort": "-x"}
}
}
However, when I try and add a colour scheme, with the longest bars in darkest red, and shortest bars with lightest red, for some reason part of my sorting breaks:
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"width": 800,
"height": 600,
"title": "Biggest Killers",
"data": {"url": "https://raw.githubusercontent.com/githubuser0099/Assignment2.1/main/Cause_Of_Death_v2.csv"},
"mark": "bar",
"encoding": {
"x": {"field": "Toll", "type": "quantitative", "title": ""},
"y": {"field": "Cause Of Death", "type": "nominal", "title": "", "sort": "-x"},
"color": {
"field": "Toll",
"type": "quantitative",
"scale": {"scheme": "reds"}
}
}
}
Any ideas? Any help would be sincerely appreciated.
Your sorting is probably breaking because the values of the Toll field are strings, so you can simply convert that field to a number, as done below:
"transform": [{"calculate": "toNumber(datum.Toll)", "as": "Toll"}],
Alternatively, setting the sort order of the y-axis to descending also seems to work:
"y": {
"field": "Cause Of Death",
"type": "nominal",
"title": "",
"sort": {"order": "descending"}
},
Below are the full specs for approaches 1 and 2:
Approach 1:
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"width": 800,
"height": 600,
"title": "Biggest Killers",
"data": {
"url": "https://raw.githubusercontent.com/githubuser0099/Assignment2.1/main/Cause_Of_Death_v2.csv"
},
"mark": "bar",
"transform": [{"calculate": "toNumber(datum.Toll)", "as": "Toll"}],
"encoding": {
"x": {"field": "Toll", "type": "quantitative", "title": ""},
"y": {
"field": "Cause Of Death",
"type": "nominal",
"title": "",
"sort": "-x"
},
"color": {
"field": "Toll",
"type": "quantitative",
"scale": {"scheme": "reds"}
}
}
}
Approach 2:
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"width": 800,
"height": 600,
"title": "Biggest Killers",
"data": {
"url": "https://raw.githubusercontent.com/githubuser0099/Assignment2.1/main/Cause_Of_Death_v2.csv"
},
"mark": "bar",
"encoding": {
"x": {"field": "Toll", "type": "quantitative", "title": ""},
"y": {
"field": "Cause Of Death",
"type": "nominal",
"title": "",
"sort": {"order": "descending"}
},
"color": {
"field": "Toll",
"type": "quantitative",
"scale": {"scheme": "reds"}
}
}
}

vega-lite: how to aggregate by week

I have seen that it's possible to aggregate using several time units, for example by month, but not by week.
I have also seen that Vega lets you customize the time unit: https://vega.github.io/vega/docs/transforms/timeunit/#chronological-time-units
Is it possible to use this in Vega-Lite to aggregate by week, for example to change an aggregation from monthly to weekly?
Thank you
You can group by week using a monthdate timeUnit (or yearmonthdate, as in the examples below, to keep the year) with a step size of 7:
"timeUnit": {"unit": "monthdate", "step": 7}
For example:
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.json",
"data": {"url": "data/seattle-temps.csv"},
"mark": "line",
"encoding": {
"x": {"timeUnit": {"unit": "yearmonthdate", "step": 7}, "field": "date", "type": "temporal"},
"y": {"aggregate": "mean", "field": "temp", "type": "quantitative"}
}
}
Note, however, that this starts a new week at the beginning of each month, which means if you do a heatmap by day of week and week there are gaps:
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.json",
"data": {"url": "data/seattle-temps.csv"},
"mark": "rect",
"encoding": {
"y": {"timeUnit": "day", "field": "date", "type": "ordinal"},
"x": {"timeUnit": {"unit": "yearmonthdate", "step": 7}, "field": "date", "type": "ordinal"},
"color": {"aggregate": "mean", "field": "temp", "type": "quantitative"}
}
}
If you want more fine-grained control over where weeks start, that's unfortunately not expressible as a timeUnit, but you can take advantage of Vega-Lite's full transform syntax to make more customized aggregates. For example, here we compute the week-of-year by counting Sundays in the data:
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.json",
"data": {"url": "data/seattle-temps.csv"},
"transform": [
{"timeUnit": "yearmonthdate", "field": "date", "as": "date"},
{
"aggregate": [{"op": "mean", "field": "temp", "as": "temp"}],
"groupby": ["date"]
},
{"calculate": "day(datum.date) == 0", "as": "sundays"},
{
"window": [{"op": "sum", "field": "sundays", "as": "week"}],
"sort": "date"
}
],
"mark": "rect",
"encoding": {
"y": {"timeUnit": "day", "field": "date", "type": "ordinal", "title": "Day of Week"},
"x": {"field": "week", "type": "ordinal", "title": "Week of year"},
"color": {"aggregate": "mean", "field": "temp", "type": "quantitative"}
}
}
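Newer Vega-Lite releases also provide week-based time units. As a sketch that is not part of the original answer (which targeted the v4 schema), and assuming the v5 schema with the same seattle-temps.csv example data, a yearweek timeUnit groups by week of year directly:

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {"url": "data/seattle-temps.csv"},
  "mark": "line",
  "encoding": {
    "x": {"timeUnit": "yearweek", "field": "date", "type": "temporal"},
    "y": {"aggregate": "mean", "field": "temp", "type": "quantitative"}
  }
}
```

Unlike the step-of-7 approach, this should not restart the week at the beginning of each month.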