vega: filter nth of each group - unique

If I were to group by date, how would I filter the nth entry of each group?
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.json",
"data": {"url": "https://cdn.jsdelivr.net/npm/vega-datasets@v1.29.0/data/seattle-temps.csv"},
"mark": "point",
"encoding": {
"x": {"field": "date", "type": "temporal"},
"y": {"field": "temp", "type": "quantitative"}
}
}
Edit: Let's keep this data-agnostic, as my data has many columns and I would like to keep each row in its entirety.

Transforms overview:
1. Convert times to dates for grouping.
2. Group by date and number each row within the groups.
3. Filter on the nth row.
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.json",
"data": {"url": "https://cdn.jsdelivr.net/npm/vega-datasets@v1.29.0/data/seattle-temps.csv"},
"transform": [
{"timeUnit": "yearmonthdate", "field": "date", "as": "date"},
{
"window": [{"op": "row_number", "as": "row"}],
"groupby": ["date"]
},
{"filter": "datum['row'] == 1"}
],
"mark": "point",
"encoding": {
"x": {"field": "date", "type": "temporal"},
"y": {"field": "temp", "type": "quantitative"}
}
}
The downside is that the Vega editor becomes very slow once the window transform is added.
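The window transform above numbers rows in whatever order they arrive, so "nth" depends on the order of the source file. A minimal sketch of a variant that makes the order explicit: it writes the truncated date to a separate field (here called "day", a name chosen only for illustration) so that the original timestamp stays available for sorting, and it keeps the third row of each day. The relative data URL is an assumption that the spec runs in the Vega editor.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
  "description": "Sketch: keep the 3rd chronological row of each day. The field name 'day' and the value 3 are illustrative.",
  "data": {"url": "data/seattle-temps.csv"},
  "transform": [
    {"timeUnit": "yearmonthdate", "field": "date", "as": "day"},
    {
      "window": [{"op": "row_number", "as": "row"}],
      "groupby": ["day"],
      "sort": [{"field": "date", "order": "ascending"}]
    },
    {"filter": "datum.row == 3"}
  ],
  "mark": "point",
  "encoding": {
    "x": {"field": "date", "type": "temporal"},
    "y": {"field": "temp", "type": "quantitative"}
  }
}
```

Because the filter only drops rows, every other column of the surviving rows is left intact.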

Related

How to Annotate a Line in line Chart in Vega-Lite?

For the code below (run in the Vega editor, https://vega.github.io/editor/#/):
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"layer": [
{
"data": {"url": "data/stocks.csv"},
"mark": "line",
"encoding": {
"x": {"field": "date", "type": "temporal"},
"y": {"field": "price", "type": "quantitative"},
"color": {"field": "symbol", "type": "nominal"}
}
},
{
"data": {"values": [{}]},
"mark": {"type": "rule", "strokeDash": [2, 2], "size": 2},
"encoding": {"x": {"datum": {"year": 2006}}}
}
]
}
This renders the line chart with a dashed vertical rule at 2006.
I want to annotate the line at a specific position, such as (2004, 400).
I tried the following layer. It works, but I don't want to pass hardcoded values like "a": 2004, "b": 400:
{
"data": {
"values": [
{"a": 2004, "b": 400}
]
},
"mark": {"type": "text", "fontSize" : 16, "fontWeight":"bold", "align" : "left"},
"encoding": {
"text": {"value": "Optimum"},
"x": {"field": "a", "type": "quantitative", "title":""},
"y": {"field": "b", "type": "quantitative", "title":""}
}
},
How can I pass values computed from the data instead, such as the average of date (say, 2004) and the average of price (say, 400)?
Alternatively, how can I place the label just next to the line, in the middle of the y-axis?
A transform with an aggregate worked. I only needed the average position on the y-axis, so the code below was enough. The same transform can be used for the other axis.
{
"mark": {"type": "text", "fontSize" : 16, "fontWeight":"bold", "align" : "left"},
"transform": [
{
"aggregate": [{
"op": "mean",
"field": "price",
"as": "mean_y_axis"
}],
"groupby": ["date"]
}
],
"encoding": {
"text": {"value": "Optimum"},
"x": {"field": "date",
"type": "quantitative"},
"y": {"field": "mean_y_axis",
"type": "quantitative"}
}
}
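For the second variant of the question (a label next to the rule, at the middle of the y-axis), a complete layered spec might look like the sketch below. It assumes the data is moved to the top level so that the text layer can aggregate it, reuses the year 2006 from the rule layer for the x position, and adds a small "dx" offset; those choices are illustrative and not part of the original answer.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch: one label at x = 2006 and y = mean(price). The dx offset is an illustrative choice.",
  "data": {"url": "data/stocks.csv"},
  "layer": [
    {
      "mark": "line",
      "encoding": {
        "x": {"field": "date", "type": "temporal"},
        "y": {"field": "price", "type": "quantitative"},
        "color": {"field": "symbol", "type": "nominal"}
      }
    },
    {
      "data": {"values": [{}]},
      "mark": {"type": "rule", "strokeDash": [2, 2], "size": 2},
      "encoding": {"x": {"datum": {"year": 2006}}}
    },
    {
      "mark": {"type": "text", "fontSize": 16, "fontWeight": "bold", "align": "left", "dx": 4},
      "encoding": {
        "text": {"value": "Optimum"},
        "x": {"datum": {"year": 2006}},
        "y": {"aggregate": "mean", "field": "price", "type": "quantitative"}
      }
    }
  ]
}
```

Since the text layer has no grouping field, the aggregate collapses the data to a single row, so only one label is drawn.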

Sort bars of Stacked Bar Chart

For a Stacked Bar Chart, can you sort the bars by the size of one of the segments?
E.g., take this Stacked Bar Chart from the examples (Open Editor):
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"data": {"url": "data/barley.json"},
"mark": "bar",
"encoding": {
"x": {"aggregate": "sum", "field": "yield"},
"y": {"field": "variety"},
"color": {"field": "site"}
}
}
Now I would like to sort the y-axis based on the yield in Crookston. Is that possible?
Sorting by the total of another field is relatively easy; you can do so with the "sort" entry of the desired encoding (sort docs):
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"data": {"url": "data/barley.json"},
"mark": "bar",
"encoding": {
"x": {"aggregate": "sum", "field": "yield"},
"y": {"field": "variety", "sort": {"op": "sum", "field": "yield"}},
"color": {"field": "site"}
}
}
If you want to sort just by the value when site == "Crookston", you can do so by first applying a calculate transform to select just that value:
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"data": {"url": "data/barley.json"},
"transform": [
{
"calculate": "datum.site == 'Crookston' ? datum.yield : 0",
"as": "crookston"
}
],
"mark": "bar",
"encoding": {
"x": {"aggregate": "sum", "field": "yield"},
"y": {"field": "variety", "sort": {"op": "sum", "field": "crookston"}},
"color": {"field": "site"}
}
}
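The sort definition also accepts an "order" entry, so the same approach can put the variety with the largest Crookston yield at the top instead of the bottom. A minimal sketch reusing the spec above; only the "order" entry and the "description" text are new:

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch: same chart, sorted by the Crookston yield in descending order.",
  "data": {"url": "data/barley.json"},
  "transform": [
    {
      "calculate": "datum.site == 'Crookston' ? datum.yield : 0",
      "as": "crookston"
    }
  ],
  "mark": "bar",
  "encoding": {
    "x": {"aggregate": "sum", "field": "yield"},
    "y": {
      "field": "variety",
      "sort": {"op": "sum", "field": "crookston", "order": "descending"}
    },
    "color": {"field": "site"}
  }
}
```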

Highest Value Wrong Colour

Just made a simple bar chart, but for some reason, the final value is the wrong colour?
Code:
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"width": 800,
"height": 600,
"title": "Death Rates Amongst Ages",
"data": {"url": "https://raw.githubusercontent.com/githubuser0099/Repo55/main/AgeBracket_DeathRate.csv"},
"transform": [
{"calculate":"parseInt(datum.Death_Rate)", "as": "Death_Rate"}
],
"mark": "bar",
"encoding": {
"x": {"field": "Death_Rate", "type": "quantitative", "title": ""},
"y": {"field": "Age", "type": "nominal", "title": "", "sort": "-x"},
"color": {
"field": "Age",
"type": "nominal",
"scale": {"scheme": "reds"}
}
}
}
The problem with your colour scale is that "Age" is encoded as a string (a nominal variable) while you use a sequential colour scheme ("reds"), so the colours follow string order rather than numeric order. Your data also has some issues: there are stray spaces before 5-9 and 10-14.
In string comparison, whitespace < "0" < "100" < "15".
To solve this, extract the first number from each age range and encode it in an additional channel (with its legend hidden). The colour channel can then be sorted by that additional channel.
See the code below; each calculate step cleans up the value produced by the previous one.
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"width": 800,
"height": 600,
"title": "Death Rates Amongst Ages",
"data": {"url": "https://raw.githubusercontent.com/githubuser0099/Repo55/main/AgeBracket_DeathRate.csv"},
"transform": [
{"calculate":"parseInt(datum.Death_Rate)", "as": "Death_Rate"},
{"calculate": "split(datum['Age'], '-')[0]", "as": "Age_new"},
{"calculate": "replace(datum['Age_new'], ' ', '')", "as": "Age_new_2"},
{"calculate": "replace(datum['Age_new_2'], ' ', '')", "as": "Age_new_3"},
{"calculate": "parseInt(datum['Age_new_3'])", "as": "Age_new_4"}
],
"mark": "bar",
"encoding": {
"x": {"field": "Death_Rate", "type": "quantitative", "title": ""},
"y": {"field": "Age", "type": "nominal", "title": "", "sort": "-x"},
"opacity":{"field": "Age_new_4", "legend": null},
"color": {
"field": "Age",
"type": "ordinal",
"sort": "opacity",
"scale": {"scheme": "reds"}
}
}
}
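A more compact variant might sort the colour scale by the derived field directly, without going through the opacity channel. This is a sketch under assumptions: "Age_start" is a name made up for illustration, and "trim" is assumed to remove whatever whitespace precedes the brackets in the CSV.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch: colour order taken from the lower bound of each age bracket. Age_start is a hypothetical field name.",
  "width": 800,
  "height": 600,
  "title": "Death Rates Amongst Ages",
  "data": {"url": "https://raw.githubusercontent.com/githubuser0099/Repo55/main/AgeBracket_DeathRate.csv"},
  "transform": [
    {"calculate": "parseInt(datum.Death_Rate)", "as": "Death_Rate"},
    {"calculate": "parseInt(trim(split(datum.Age, '-')[0]))", "as": "Age_start"}
  ],
  "mark": "bar",
  "encoding": {
    "x": {"field": "Death_Rate", "type": "quantitative", "title": ""},
    "y": {"field": "Age", "type": "nominal", "title": "", "sort": "-x"},
    "color": {
      "field": "Age",
      "type": "ordinal",
      "sort": {"field": "Age_start", "op": "min"},
      "scale": {"scheme": "reds"}
    }
  }
}
```

Unlike the opacity-based version, this leaves every bar fully opaque, since no opacity encoding is involved.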
Cheers,
KL

Points only in the central line

I am using the example named "Line Chart with Point Markers" as a reference, but I have not found any other example or clue about conditional points, or points selected by symbol.
The illustration shows a typical case (see also SPC) where I need dots only on the blue central line.
You can do this by layering filtered versions of the dataset. Modifying the example you linked to, it might look something like this (vega editor):
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.json",
"description": "Stock prices of 5 Tech Companies over Time.",
"data": {"url": "data/stocks.csv"},
"encoding": {
"x": {"timeUnit": "year", "field": "date", "type": "temporal"},
"y": {"aggregate": "mean", "field": "price", "type": "quantitative"},
"color": {"field": "symbol", "type": "nominal"}
},
"layer": [
{
"mark": {"type": "line", "point": true},
"transform": [{"filter": "datum.symbol == 'GOOG'"}]
},
{
"mark": {"type": "line"},
"transform": [{"filter": "datum.symbol != 'GOOG'"}]
}
]
}
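An equivalent layout, sketched here as an alternative rather than as the answer's own method, draws a plain line for every symbol and adds a second layer that contains only the points of the selected symbol. The "filled" setting is an illustrative choice.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
  "description": "Sketch: one line layer for every symbol, plus a point layer restricted to GOOG.",
  "data": {"url": "data/stocks.csv"},
  "encoding": {
    "x": {"timeUnit": "year", "field": "date", "type": "temporal"},
    "y": {"aggregate": "mean", "field": "price", "type": "quantitative"},
    "color": {"field": "symbol", "type": "nominal"}
  },
  "layer": [
    {"mark": "line"},
    {
      "mark": {"type": "point", "filled": true},
      "transform": [{"filter": "datum.symbol == 'GOOG'"}]
    }
  ]
}
```

Both layers share the colour scale, so the points take the same colour as the GOOG line.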

vega-lite: how to aggregate by week

I have seen that it's possible to aggregate using several time units, for example by month, but not by week.
I have also seen that Vega lets you customize the time unit: https://vega.github.io/vega/docs/transforms/timeunit/#chronological-time-units
Is it possible to use this in Vega-Lite to aggregate by week, for example to change this aggregation from month to week?
Thank you
You can group by week using a yearmonthdate timeUnit with a step size of 7:
"timeUnit": {"unit": "yearmonthdate", "step": 7}
For example:
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.json",
"data": {"url": "data/seattle-temps.csv"},
"mark": "line",
"encoding": {
"x": {"timeUnit": {"unit": "yearmonthdate", "step": 7}, "field": "date", "type": "temporal"},
"y": {"aggregate": "mean", "field": "temp", "type": "quantitative"}
}
}
Note, however, that this starts a new week at the beginning of each month, which means if you do a heatmap by day of week and week there are gaps:
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.json",
"data": {"url": "data/seattle-temps.csv"},
"mark": "rect",
"encoding": {
"y": {"timeUnit": "day", "field": "date", "type": "ordinal"},
"x": {"timeUnit": {"unit": "yearmonthdate", "step": 7}, "field": "date", "type": "ordinal"},
"color": {"aggregate": "mean", "field": "temp", "type": "quantitative"}
}
}
If you want more fine-grained control over where weeks start, that's unfortunately not expressible as a timeUnit, but you can take advantage of Vega-Lite's full transform syntax to make more customized aggregates. For example, here we compute the week-of-year by counting Sundays in the data:
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.json",
"data": {"url": "data/seattle-temps.csv"},
"transform": [
{"timeUnit": "yearmonthdate", "field": "date", "as": "date"},
{
"aggregate": [{"op": "mean", "field": "temp", "as": "temp"}],
"groupby": ["date"]
},
{"calculate": "day(datum.date) == 0", "as": "sundays"},
{
"window": [{"op": "sum", "field": "sundays", "as": "week"}],
"sort": "date"
}
],
"mark": "rect",
"encoding": {
"y": {"timeUnit": "day", "field": "date", "type": "ordinal", "title": "Day of Week"},
"x": {"field": "week", "type": "ordinal", "title": "Week of year"},
"color": {"aggregate": "mean", "field": "temp", "type": "quantitative"}
}
}
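Later Vega-Lite releases document week-based time units such as "week" and "yearweek", which, as far as I know, use Sunday-based weeks. If your version includes them, the weekly aggregate might reduce to the sketch below; the v5 schema URL is an assumption about which release is in use.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch: weekly mean temperature using the yearweek time unit; assumes a Vega-Lite release that includes the week-based units.",
  "data": {"url": "data/seattle-temps.csv"},
  "mark": "line",
  "encoding": {
    "x": {"timeUnit": "yearweek", "field": "date", "type": "temporal"},
    "y": {"aggregate": "mean", "field": "temp", "type": "quantitative"}
  }
}
```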