Just made a simple bar chart, but for some reason, the final value is the wrong colour?
Code:
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"width": 800,
"height": 600,
"title": "Death Rates Amongst Ages",
"data": {"url": "https://raw.githubusercontent.com/githubuser0099/Repo55/main/AgeBracket_DeathRate.csv"},
"transform": [
{"calculate":"parseInt(datum.Death_Rate)", "as": "Death_Rate"}
],
"mark": "bar",
"encoding": {
"x": {"field": "Death_Rate", "type": "quantitative", "title": ""},
"y": {"field": "Age", "type": "nominal", "title": "", "sort": "-x"},
"color": {
"field": "Age",
"type": "nominal",
"scale": {"scheme": "reds"}
}
}
}
The problem with your colour scale is: "Age" is currently encoded as a string (nominal variable). You define the type of "Age" as "nominal", but use a sequential colour scale ("reds"). Your data also has some issues - there are some empty spaces before 5-9, and 10-14.
In string comparison, white space < "0" < "100" < "15".
To solve the issue, we can get the first number from the range, and then define another channel to encode this first number (but hide the legend), then in the colour channel, you can define the colour order based on this additional channel.
Check the result and the codes below.
I have printed out the data and let you know how the calculation works.
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"width": 800,
"height": 600,
"title": "Death Rates Amongst Ages",
"data": {"url": "https://raw.githubusercontent.com/githubuser0099/Repo55/main/AgeBracket_DeathRate.csv"},
"transform": [
{"calculate":"parseInt(datum.Death_Rate)", "as": "Death_Rate"},
{"calculate": "split(datum['Age'], '-')[0]", "as": "Age_new"},
{"calculate": "replace(datum['Age_new'], ' ', '')", "as": "Age_new_2"},
{"calculate": "replace(datum['Age_new_2'], ' ', '')", "as": "Age_new_3"},
{"calculate": "parseInt(datum['Age_new_3'])", "as": "Age_new_4"}
],
"mark": "bar",
"encoding": {
"x": {"field": "Death_Rate", "type": "quantitative", "title": ""},
"y": {"field": "Age", "type": "nominal", "title": "", "sort": "-x"},
"opacity":{"field": "Age_new_4", "legend": null},
"color": {
"field": "Age",
"type": "ordinal",
"sort": "opacity",
"scale": {"scheme": "reds"}
}
}
}
Cheers,
KL
Related
I'm trying to build a line chart with error area in vega lite.
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"data": {"url": "https://raw.githubusercontent.com/holtzy/D3-graph-gallery/master/DATA/data_IC.csv"},
"transform": [
{"calculate": "toNumber(datum.x)", "as": "x2"},
{"calculate": "toNumber(datum.y)", "as": "y2"},
{"calculate": "toNumber(datum.CI_left)", "as": "l"},
{"calculate": "toNumber(datum.CI_right)", "as": "r"}
],
"params": [
{ "name": "scaleDomain", "expr": "[0, 10]"}
],
"encoding": {
"y": {
"field": "x2",
"type": "ordinal",
"sort": "descending"
}
},
"layer": [
{
"mark": {"type": "line", "interpolate": "cardinal"},
"encoding": {
"x": {
"field": "y",
"type": "quantitative",
"title": "Mean of Miles per Gallon (95% CIs)",
"scale": {"type": "linear", "domain": {"expr": "scaleDomain"}},
"axis": {
"orient": "top"
}
}
}
},
{
"mark": {"type": "area", "interpolate": "cardinal"},
"encoding": {
"x": {
"field": "l",
"scale": {"type": "linear", "domain": {"expr": "scaleDomain"}},
"axis": {
"orient": "top"
}
},
"x2": {
"field": "r"
},
"opacity": { "value": 0.3 }
}
}
]
}
So far, it's nice looking. But there's a problem: to get this to work I have had to manually constrain the scale domain for the two marks by setting a param called scaleDomain. This is a problem, because if ever the data changes I need to manually update the domain :/
However, look what would happen if I didn't manually set the scale to the same domain for the area plot and a line plot:
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"data": {"url": "https://raw.githubusercontent.com/holtzy/D3-graph-gallery/master/DATA/data_IC.csv"},
"transform": [
{"calculate": "toNumber(datum.x)", "as": "x2"},
{"calculate": "toNumber(datum.y)", "as": "y2"},
{"calculate": "toNumber(datum.CI_left)", "as": "l"},
{"calculate": "toNumber(datum.CI_right)", "as": "r"}
],
"params": [
{ "name": "scaleDomain", "expr": "[0, 10]"}
],
"encoding": {
"y": {
"field": "x2",
"type": "ordinal",
"sort": "descending"
}
},
"layer": [
{
"mark": {"type": "line", "interpolate": "cardinal"},
"encoding": {
"x": {
"field": "y",
"type": "quantitative",
"title": "Mean of Miles per Gallon (95% CIs)",
// "scale": {"type": "linear", "domain": {"expr": "scaleDomain"}},
"axis": {
"orient": "top"
}
}
}
},
{
"mark": {"type": "area", "interpolate": "cardinal"},
"encoding": {
"x": {
"field": "l",
// "scale": {"type": "linear", "domain": {"expr": "scaleDomain"}},
"axis": {
"orient": "top"
}
},
"x2": {
"field": "r"
},
"opacity": { "value": 0.3 }
}
}
]
}
Yikes! The area plot gets a bit lost and doesn't track the line.
I can see one of two solutions to this problem:
Shared Scale: Coax the two mark layers to share the same scale
Manually Calculate Scale Domain: Use a parameter or a signal to store the desired domain.
I don't know how to do #1, but it seems like the correct approach. One imagined solution is something like:
"scale": {"align": "shared"},
I tried adding an aggregation to transform, but that of course results in summarizing the whole data set.
"transform": [
{"calculate": "toNumber(datum.x)", "as": "x2"},
{"calculate": "toNumber(datum.y)", "as": "y2"},
{"calculate": "toNumber(datum.CI_left)", "as": "l"},
{"calculate": "toNumber(datum.CI_right)", "as": "r"},
{ "aggregate": [
{
"field": "l",
"op": "min",
"as": "min"
},
{
"field": "r",
"op": "max",
"as": "max"
}
]}
],
It seems like I'd want to somehow put the transform directly into the layer or the params, but it's not clear how to do that.
I have seen these answers (finding max and min from dataset in vega and Post aggregation calculation & filter ##) but I don't know how to use them to achieve this.
You don't need any transforms and scales are automatically shared. Try this:
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"width":500,
"height":500,
"data": {
"url": "https://raw.githubusercontent.com/holtzy/D3-graph-gallery/master/DATA/data_IC.csv"
},
"encoding": {"y": {"field": "x", "type": "quantitative", "sort": "ascending"}},
"layer": [
{
"mark": {"type": "line", "interpolate": "cardinal"},
"encoding": {
"x": {
"field": "y",
"sort": null,
"type": "quantitative",
"title": "Mean of Miles per Gallon (95% CIs)",
"axis": {"orient": "top"}
}
}
},
{
"mark": {"type": "area", "interpolate": "cardinal"},
"encoding": {
"x": {"field": "CI_left", "type": "quantitative"},
"x2": {"field": "CI_right"},
"opacity": {"value": 0.3}
}
}
]
}
I have the following Vega-Lite chart:
Open the Chart in the Vega Editor
Currently, I have the scale set as follows:
"scale": {"domainMin": "2021-06-01"}
However, what I really want is for the domainMin to be automatically calculated to be 6 months before the latest date in the notification_date field in the data.
I've looked at aggregate and expressions, but it's not exactly clear.
How can I get the maximum value of notification_date and subtract 6 months from it, and use that in "domainMin"?
Edit: To clarify, I don't want to filter the data. I want the user to be able to zoom out or pan to see the data outside the initial 6-month window. I get exactly what I want with "scale": {"domainMin": "2021-06-01"}, but this becomes out-of-date very quickly.
I have tried giving params and expr to domainMin, but I was unable to use the data fields in expr through datum.
The 2nd approach I tried will work for you, in this you will need to make use of joinaggregate/calculate/filter transforms. You will manually gather the max year and max months and then use it to filter your data.
Below is the modified config or refer the editor url:
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"width": "container",
"height": "container",
"config": {
"group": {"fill": "#e5e5e5"},
"arc": {"fill": "#2b2c39"},
"area": {"fill": "#2b2c39"},
"line": {"stroke": "#2b2c39"},
"path": {"stroke": "#2b2c39"},
"rect": {"fill": "#2b2c39"},
"shape": {"stroke": "#2b2c39"},
"symbol": {"fill": "#2b2c39"},
"range": {
"category": [
"#2283a2",
"#003e6a",
"#a1ce5e",
"#FDBE13",
"#F2727E",
"#EA3F3F",
"#25A9E0",
"#F97A08",
"#41BFB8",
"#518DCA",
"#9460A8",
"#6F7D84",
"#D1DCA5"
]
}
},
"title": "South Western Sydney Cumulative and Daily COVID-19 Cases by LGA",
"data": {
"url": "https://davidwales.github.io/nsw-covid-19-data/confirmed_cases_table1_location.csv"
},
"transform": [
{
"filter": {
"and": [
{"field": "lhd_2010_name", "equal": "South Western Sydney"},
{"not": {"field": "lga_name19", "equal": "Penrith"}}
]
}
},
{"calculate": "utcyear(datum.notification_date)", "as": "yearNumber"},
{"calculate": "utcmonth(datum.notification_date)", "as": "monthNumber"},
{
"window": [
{"op": "count", "field": "notification_date", "as": "cumulative_count"}
],
"frame": [null, 0]
},
{
"joinaggregate": [
{"field": "monthNumber", "op": "max", "as": "max_month_count"},
{"field": "yearNumber", "op": "max", "as": "max_year"}
]
},
{"calculate": "abs(datum.max_month_count-6)", "as": "min_month_count"},
{
"filter": "datum.min_month_count < datum.monthNumber && datum.max_year === datum.yearNumber"
}
],
"layer": [
{
"selection": {
"date": {"type": "interval", "bind": "scales", "encodings": ["x"]}
},
"mark": {"type": "bar", "tooltip": true},
"encoding": {
"x": {
"timeUnit": "yearmonthdate",
"field": "notification_date",
"type": "temporal",
"title": "Date"
},
"color": {
"field": "lga_name19",
"type": "nominal",
"title": "LGA",
"legend": {"orient": "top", "columns": 4}
},
"y": {
"aggregate": "count",
"field": "lga_name19",
"type": "quantitative",
"title": "Cases",
"axis": {"title": "Daily Cases by SWS LGA"}
}
}
},
{
"mark": "line",
"encoding": {
"x": {
"timeUnit": "yearmonthdate",
"field": "notification_date",
"title": "Date",
"type": "temporal"
},
"y": {
"aggregate": "max",
"field": "cumulative_count",
"type": "quantitative",
"axis": {"title": "Cumulative Cases"}
}
}
}
],
"resolve": {"scale": {"y": "independent"}}
}
A bit simpler approach to filter approximately the last six months of data might look like this:
"transform": [
...,
{"joinaggregate": [{"op": "max", "field": "notification_date", "as": "last_date"}]},
{"filter": "datum.notification_date > datum.last_date - 6 * 30 * 24 * 60 * 60 * 1000"}
]
It makes use of the fact that dates are stored as millisecond time-stamps, and has the benefit that it will work across year boundaries.
This is not quite an answer to the question you asked, but if your data is reasonably up-to-date (that is, the most recent data point is close to the current date), you can do something like this:
"scale": { "domainMin": { "expr": "timeOffset('month', now(), -6)" } }
I have a stacked normalized bar chart similar to this:
https://vega.github.io/editor/#/examples/vega-lite/stacked_bar_normalize
I'm trying to show the related percentages (per bar segment) as text on the bars similar to: https://gist.github.com/pratapvardhan/00800a4981d43a84efdba0c4cf8ee2e1
I tried adding a transform field to calculate the percentages, but still couldn't get it to work after hours of trying.
I'm lost help 🥺
My best try:
{
"description":
"A bar chart showing the US population distribution of age groups and gender in 2000.",
"data": {
"url": "data/population.json"
},
"transform": [
{"filter": "datum.year == 2000"},
{"calculate": "datum.sex == 2 ? 'Female' : 'Male'", "as": "gender"},
{
"stack": "people",
"offset": "normalize",
"as": ["v1", "v2"],
"groupby": ["age"],
"sort": [{"field": "gender", "order": "descending"}]
}
],
"encoding": {
"y": {
"field": "v1",
"type": "quantitative",
"title": "population"
},
"y2": {"field": "v2"},
"x": {
"field": "age",
"type": "ordinal"
},
"color": {
"field": "gender",
"type": "nominal",
"scale": {
"range": ["#675193", "#ca8861"]
}
}
},
"layer":[
{ "mark": "bar"},
{"mark": {"type": "text", "dx": 0, "dy": 0},
"encoding": {
"color":{"value":"black"},
"text": { "field": "v1", "type": "quantitative", "format": ".1f"}}
}
]
}
You can use a joinaggregate transform to normalize each group, and then use "format": ".1%" to display fractions as percents. Using this, there is no need to manually compute the stack transform; it is simpler to specify the stack via the encoding, as in the example you linked to.
Here is the result (open in editor):
{
"description": "A bar chart showing the US population distribution of age groups and gender in 2000.",
"data": {"url": "data/population.json"},
"transform": [
{"filter": "datum.year == 2000"},
{"calculate": "datum.sex == 2 ? 'Female' : 'Male'", "as": "gender"},
{
"joinaggregate": [{"op": "sum", "field": "people", "as": "total"}],
"groupby": ["age"]
},
{"calculate": "datum.people / datum.total", "as": "fraction"}
],
"encoding": {
"y": {
"aggregate": "sum",
"field": "people",
"title": "population",
"stack": "normalize"
},
"order": {"field": "gender", "sort": "descending"},
"x": {"field": "age", "type": "ordinal"},
"color": {
"field": "gender",
"type": "nominal",
"scale": {"range": ["#675193", "#ca8861"]}
}
},
"layer": [
{"mark": "bar"},
{
"mark": {"type": "text", "dx": 20, "dy": 0, "angle": 90},
"encoding": {
"color": {"value": "white"},
"text": {"field": "fraction", "type": "quantitative", "format": ".1%"}
}
}
]
}
I have seen that it's possible aggregate using several time units, in example by month, but not by week.
And I have seen that in vega it's possible to customize the time unit https://vega.github.io/vega/docs/transforms/timeunit/#chronological-time-units
Is it possible to use it in vega-lite and aggregate by week, and transform in example this aggregation from month to week?
Thank you
You can group by week using a monthdate timeUnit with a step size of 7:
"timeUnit": {"unit": "monthdate", "step": 7}
For example:
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.json",
"data": {"url": "data/seattle-temps.csv"},
"mark": "line",
"encoding": {
"x": {"timeUnit": {"unit": "yearmonthdate", "step": 7}, "field": "date", "type": "temporal"},
"y": {"aggregate": "mean", "field": "temp", "type": "quantitative"}
}
}
Note, however, that this starts a new week at the beginning of each month, which means if you do a heatmap by day of week and week there are gaps:
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.json",
"data": {"url": "data/seattle-temps.csv"},
"mark": "rect",
"encoding": {
"y": {"timeUnit": "day", "field": "date", "type": "ordinal"},
"x": {"timeUnit": {"unit": "yearmonthdate", "step": 7}, "field": "date", "type": "ordinal"},
"color": {"aggregate": "mean", "field": "temp", "type": "quantitative"}
}
}
If you want more fine-grained control over where weeks start, that's unfortunately not expressible as a timeUnit, but you can take advantage of Vega-Lite's full transform syntax to make more customized aggregates. For example, here we compute the week-of-year by counting Sundays in the data:
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.json",
"data": {"url": "data/seattle-temps.csv"},
"transform": [
{"timeUnit": "yearmonthdate", "field": "date", "as": "date"},
{
"aggregate": [{"op": "mean", "field": "temp", "as": "temp"}],
"groupby": ["date"]
},
{"calculate": "day(datum.date) == 0", "as": "sundays"},
{
"window": [{"op": "sum", "field": "sundays", "as": "week"}],
"sort": "date"
}
],
"mark": "rect",
"encoding": {
"y": {"timeUnit": "day", "field": "date", "type": "ordinal", "title": "Day of Week"},
"x": {"field": "week", "type": "ordinal", "title": "Week of year"},
"color": {"aggregate": "mean", "field": "temp", "type": "quantitative"}
}
}
VegaLite assigns colors automatically. The Gold prices are Blue, and Silver prices are Orange, which feels wrong.
How can I assign explicit colours - #F1C40F for Gold and #95A5A6 for Silver?
I also would like to keep the data.values as in the example code below - as a set of separate arrays.
Code and Playground
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.json",
"description": "Stock prices of 5 Tech Companies over Time.",
"data": {
"values": [
{
"dates": ["2000-01", "2000-02", "2000-03"],
"gold": [1, 2, 1],
"silver": [1.5, 1, 2]
}
]
},
"transform": [
{"flatten": ["dates", "gold", "silver"]},
{"fold": ["gold", "silver"], "as": ["symbol", "price"]},
{"calculate": "datetime(datum.dates)", "as": "dates"}
],
"mark": {"type": "line", "point": {"filled": false, "fill": "white"}},
"encoding": {
"x": {"field": "dates", "type": "temporal", "timeUnit": "yearmonth"},
"y": {"field": "price", "type": "quantitative"},
"color": {"field": "symbol", "type": "nominal"}
}
}
You can set a custom Color Scheme using the scale.domain and scale.range arguments:
"color": {
"field": "symbol",
"type": "nominal",
"scale": {"domain": ["gold", "silver"], "range": ["#F1C40F", "#95A5A6"]}
}
This works regardless of how the data source is specified.
Here is the result when applied to your chart (Vega Editor):