I am trying to display some data in a line graph. However, my "Harvest_Year" data, which holds years such as 2017 or 2018, is being displayed as what I believe is a string.
I imported my data from a .csv file, and the following are the steps I took to change the string to a date format. I tried:
"Harvest_Year": "year"
But that did not work, as it made all my values null. So I thought I would first parse it as a number and then transform it into a year. However, although all my years are displayed correctly in the Vega-Lite data table, the line graph shows only a single year, 1970, which I am sure is not in the dataset.
Whereas in the image below, you can see that I have all the years in my data:
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.json",
"data": {
"url": "https://raw.githubusercontent.com/DanStein91/Info-vis/master/CoffeeRobN.csv",
"format": {
"type": "csv",
"parse": {
"Number_of_Bags": "number",
"Bag_weight": "number",
"Harvest_Year": "number"
}
}
},
"transform": [
{
"timeUnit": "year",
"field": "Harvest_Year",
"as": "Year"
},
{
"calculate": "datum.Number_of_Bags * datum.Bag_Weight ",
"as": "Total_Export"
}
],
"width": 300,
"height": 200,
"mark": "line",
"encoding": {
"y": {
"field": "Total_Export",
"type": "quantitative"
},
"x": {
"field": "Harvest_Year",
"type": "temporal"
}
},
"config": {}
}
When you tell Vega-Lite to interpret numbers as dates, it treats them as Unix timestamps, i.e. milliseconds after January 1, 1970. A value such as 2017 is therefore only about two seconds after that moment, so each of your resulting dates is in the year 1970, which leads to the chart you are seeing.
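To make the timestamp interpretation concrete, here is a minimal sketch with a single inline row (the inline data and the field names as_timestamp and as_year_start are made up for illustration). The first calculate formats the raw number as a timestamp, the second builds a real date with datetime(), whose month argument is zero-based:

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {"values": [{"Harvest_Year": 2017}]},
  "transform": [
    {
      "calculate": "utcFormat(datum.Harvest_Year, '%Y-%m-%d %H:%M:%S.%L')",
      "as": "as_timestamp"
    },
    {
      "calculate": "timeFormat(datetime(datum.Harvest_Year, 0, 1), '%Y-%m-%d')",
      "as": "as_year_start"
    }
  ],
  "mark": "text",
  "encoding": {
    "text": {"field": "as_timestamp", "type": "nominal"},
    "tooltip": [
      {"field": "as_timestamp", "type": "nominal"},
      {"field": "as_year_start", "type": "nominal"}
    ]
  }
}
```

The text mark should read 1970-01-01 00:00:02.017, i.e. 2017 milliseconds after the epoch, while as_year_start should read 2017-01-01.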
Your dates appear to be in a non-standard format (e.g. "2017.0" means the year 2017), so you'll have to use Vega expressions to manually parse them into date objects. Here is an example of this (view in editor):
{
"data": {
"url": "https://raw.githubusercontent.com/DanStein91/Info-vis/master/CoffeeRobN.csv",
"format": {
"type": "csv",
"parse": {
"Number_of_Bags": "number",
"Bag_weight": "number",
"Harvest_Year": "number"
}
}
},
"transform": [
{"filter": "isValid(datum.Harvest_Year)"},
{"calculate": "datetime(datum.Harvest_Year, 1)", "as": "Harvest_Year"},
{
"calculate": "datum.Number_of_Bags * datum.Bag_Weight ",
"as": "Total_Export"
}
],
"mark": "point",
"encoding": {
"y": {"field": "Total_Export", "type": "quantitative"},
"x": {"field": "Harvest_Year", "type": "ordinal", "timeUnit": "year"}
},
"width": 300,
"height": 200
}
Another option is to avoid datetime and timeUnit logic altogether (since your data does not actually contain any dates), and just use the year numbers directly in your encoding; e.g.
{
"data": {
"url": "https://raw.githubusercontent.com/DanStein91/Info-vis/master/CoffeeRobN.csv",
"format": {
"type": "csv",
"parse": {
"Number_of_Bags": "number",
"Bag_weight": "number",
"Harvest_Year": "number"
}
}
},
"transform": [
{"filter": "isValid(datum.Harvest_Year)"},
{
"calculate": "datum.Number_of_Bags * datum.Bag_Weight ",
"as": "Total_Export"
}
],
"mark": "point",
"encoding": {
"y": {"field": "Total_Export", "type": "quantitative"},
"x": {"field": "Harvest_Year", "type": "ordinal"}
},
"width": 300,
"height": 200
}
I want to visualize durations of events as bars. My input value is a decimal number where the integer part represents days and the decimal part a fraction of a day. I can convert the input value to any form needed.
An event can span multiple days.
The code below contains data for two events: the duration of event a is 36 hours and the duration of event b is 12 hours. Of course, an event may be over after just a few minutes, or take 3 hours 14 minutes 24 seconds.
I want the x-axis to have a tick every 30 minutes; for the sample data it needs to cover 36 hours. An axis label could look like 0d 0:00.
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"height": "container",
"width": "container",
"data": {
"values": [
{
"event": "a",
"durationdecimal": 1.5
},
{
"event": "b",
"durationdecimal": 0.5
}
]
},
"mark": {"type": "bar"},
"encoding": {
"x": {
"field": "durationdecimal",
"type": "temporal",
"axis": {"grid": false},
"timeUnit": "utchoursminutes"
},
"y": {"field": "event", "type": "nominal", "title": null},
"tooltip": [{"field": "durationdecimal"}]
}
}
I appreciate any help.
I don't think your durationdecimal should be temporal, as no date, month, or year is provided. I recreated your sample using the quantitative type and converted the labels using labelExpr and a few expressions. This covers most of the requirements you mentioned. The only remaining part is the ticks every 30 minutes.
Below is the config (or refer to the editor):
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"height": "container",
"width": "container",
"data": {
"values": [
{"event": "a", "durationdecimal": 1.5},
{"event": "c", "durationdecimal": 2.1},
{"event": "b", "durationdecimal": 0.5}
]
},
"mark": {"type": "bar"},
"transform": [
{
"calculate": "split(toString(datum.durationdecimal),'.')[0] + 'd ' + (split(toString(datum.durationdecimal),'.')[1] ? floor(('0.'+split(toString(datum.durationdecimal),'.')[1])*24) + ':00': '0:00')",
"as": "x_dateLabelTooltip"
}
],
"encoding": {
"x": {
"field": "durationdecimal",
"type": "quantitative",
"axis": {
"grid": false,
"labelExpr": "split(toString(datum.label),'.')[0] + 'd ' + (split(toString(datum.label),'.')[1] ? floor(('0.'+split(toString(datum.label),'.')[1])*24) + ':00': '0:00')"
}
},
"y": {"field": "event", "type": "nominal", "title": null},
"tooltip": [{"field": "x_dateLabelTooltip"}]
}
}
Let me know if this works for you.
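As a possible variation on the spec above, the label can be computed arithmetically from the numeric tick value (datum.value) instead of splitting its string form, which also yields minutes. This is only a sketch under assumptions: the tick list is hard-coded to the sample's 36-hour range via sequence(), with 1/48 of a day as the 30-minute step, and labelOverlap is enabled because that many labels rarely fit.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {
    "values": [
      {"event": "a", "durationdecimal": 1.5},
      {"event": "b", "durationdecimal": 0.5}
    ]
  },
  "mark": {"type": "bar"},
  "encoding": {
    "x": {
      "field": "durationdecimal",
      "type": "quantitative",
      "axis": {
        "grid": false,
        "values": {"expr": "sequence(0, 1.5 + 1/48, 1/48)"},
        "labelOverlap": true,
        "labelExpr": "floor(round(datum.value * 1440) / 1440) + 'd ' + floor((round(datum.value * 1440) % 1440) / 60) + ':' + pad(toString(round(datum.value * 1440) % 60), 2, '0', 'left')"
      }
    },
    "y": {"field": "event", "type": "nominal", "title": null}
  }
}
```

The expression first rounds the value to whole minutes (1440 per day) and then derives days, hours, and minutes from that total, so a tick at 1.5 is labelled 1d 12:00.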
I have the following Vega-Lite chart:
Open the Chart in the Vega Editor
Currently, I have the scale set as follows:
"scale": {"domainMin": "2021-06-01"}
However, what I really want is for the domainMin to be automatically calculated to be 6 months before the latest date in the notification_date field in the data.
I've looked at aggregate and expressions, but it's not exactly clear.
How can I get the maximum value of notification_date and subtract 6 months from it, and use that in "domainMin"?
Edit: To clarify, I don't want to filter the data. I want the user to be able to zoom out or pan to see the data outside the initial 6-month window. I get exactly what I want with "scale": {"domainMin": "2021-06-01"}, but this becomes out-of-date very quickly.
I have tried giving params and expr to domainMin, but I was unable to use the data fields in expr through datum.
The second approach I tried should work for you. It uses the joinaggregate, calculate, and filter transforms: you gather the maximum year and maximum month manually and then use them to filter your data.
Below is the modified config (or refer to the editor URL):
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"width": "container",
"height": "container",
"config": {
"group": {"fill": "#e5e5e5"},
"arc": {"fill": "#2b2c39"},
"area": {"fill": "#2b2c39"},
"line": {"stroke": "#2b2c39"},
"path": {"stroke": "#2b2c39"},
"rect": {"fill": "#2b2c39"},
"shape": {"stroke": "#2b2c39"},
"symbol": {"fill": "#2b2c39"},
"range": {
"category": [
"#2283a2",
"#003e6a",
"#a1ce5e",
"#FDBE13",
"#F2727E",
"#EA3F3F",
"#25A9E0",
"#F97A08",
"#41BFB8",
"#518DCA",
"#9460A8",
"#6F7D84",
"#D1DCA5"
]
}
},
"title": "South Western Sydney Cumulative and Daily COVID-19 Cases by LGA",
"data": {
"url": "https://davidwales.github.io/nsw-covid-19-data/confirmed_cases_table1_location.csv"
},
"transform": [
{
"filter": {
"and": [
{"field": "lhd_2010_name", "equal": "South Western Sydney"},
{"not": {"field": "lga_name19", "equal": "Penrith"}}
]
}
},
{"calculate": "utcyear(datum.notification_date)", "as": "yearNumber"},
{"calculate": "utcmonth(datum.notification_date)", "as": "monthNumber"},
{
"window": [
{"op": "count", "field": "notification_date", "as": "cumulative_count"}
],
"frame": [null, 0]
},
{
"joinaggregate": [
{"field": "monthNumber", "op": "max", "as": "max_month_count"},
{"field": "yearNumber", "op": "max", "as": "max_year"}
]
},
{"calculate": "abs(datum.max_month_count-6)", "as": "min_month_count"},
{
"filter": "datum.min_month_count < datum.monthNumber && datum.max_year === datum.yearNumber"
}
],
"layer": [
{
"selection": {
"date": {"type": "interval", "bind": "scales", "encodings": ["x"]}
},
"mark": {"type": "bar", "tooltip": true},
"encoding": {
"x": {
"timeUnit": "yearmonthdate",
"field": "notification_date",
"type": "temporal",
"title": "Date"
},
"color": {
"field": "lga_name19",
"type": "nominal",
"title": "LGA",
"legend": {"orient": "top", "columns": 4}
},
"y": {
"aggregate": "count",
"field": "lga_name19",
"type": "quantitative",
"title": "Cases",
"axis": {"title": "Daily Cases by SWS LGA"}
}
}
},
{
"mark": "line",
"encoding": {
"x": {
"timeUnit": "yearmonthdate",
"field": "notification_date",
"title": "Date",
"type": "temporal"
},
"y": {
"aggregate": "max",
"field": "cumulative_count",
"type": "quantitative",
"axis": {"title": "Cumulative Cases"}
}
}
}
],
"resolve": {"scale": {"y": "independent"}}
}
A bit simpler approach to filter approximately the last six months of data might look like this:
"transform": [
...,
{"joinaggregate": [{"op": "max", "field": "notification_date", "as": "last_date"}]},
{"filter": "datum.notification_date > datum.last_date - 6 * 30 * 24 * 60 * 60 * 1000"}
]
It makes use of the fact that dates are stored as millisecond time-stamps, and has the benefit that it will work across year boundaries.
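If exact calendar months matter more than the 6 * 30 day approximation, the same filter can presumably be written with Vega's timeOffset expression function. The spec below is a self-contained sketch with made-up inline dates rather than the COVID dataset:

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {
    "values": [
      {"notification_date": "2021-01-15"},
      {"notification_date": "2021-06-20"},
      {"notification_date": "2021-12-01"}
    ],
    "format": {"parse": {"notification_date": "date"}}
  },
  "transform": [
    {
      "joinaggregate": [
        {"op": "max", "field": "notification_date", "as": "last_date"}
      ]
    },
    {
      "filter": "datum.notification_date > timeOffset('month', datum.last_date, -6)"
    }
  ],
  "mark": "bar",
  "encoding": {
    "x": {"field": "notification_date", "type": "temporal", "title": "Date"},
    "y": {"aggregate": "count", "type": "quantitative", "title": "Cases"}
  }
}
```

With these sample rows the latest date is 1 December 2021, so the January row should be dropped and the June and December rows kept.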
This is not quite an answer to the question you asked, but if your data is reasonably up-to-date (that is, the most recent data point is close to the current date), you can do something like this:
"scale": { "domainMin": { "expr": "timeOffset('month', now(), -6)" } }
I'm looking for a vega-lite configuration to show small multiples (using the facet operator row or column) with all other data points greyed out in the background.
Here is an example plot using the facet-operator:
facet plot
in vega-editor
"facet": {
"row": {
"field": "group",
"type": "nominal"
}
},
And here is an example using multiple charts with the concat operator and color channel to grey out other groups:
concat-plot
in vega-editor
"color": {"condition": {"test": "datum['group'] != 1", "value": "grey"}, "value": "red"}
I was wondering if there is a combination of transforms and repeat commands to achieve this for an unknown number of groups.
Here is one possible solution:
create an additional layer with new data
"facet": {"row": {"field": "group"}},
"spec": {
"layer": [
{
"data": {"name": "main"},
"mark": "circle",
"encoding": {
"y": {
"field": "y",
"type": "ordinal"
},
"x": {
"field": "x",
"type": "ordinal"
},
"color": {"value": "grey"}
},
"params": []
},
{
"mark": {"type": "circle", "opacity": 1, "color": "red"},
"encoding": {
"y": {
"field": "y",
"type": "ordinal"
},
"x": {
"field": "x",
"type": "ordinal"
}
}
}
]
}
full example in vega editor
I want to add vertical Rule lines to my chart as date milestone indicators (like the red line in the image).
The x-axis is dates (temporal), and the y-axis values are numbers.
The image shows the closest I could get, using explicit values for the data property in the Rule layer:
{
"mark": "rule",
"data": {
"values": [
"{\"x\":\"2020/04/10\"}"
]
},
"encoding": {
"x": {
"field": "x",
"type": "ordinal",
},
"color": {
"value": "red"
},
"size": {
"value": 1
}
}
}
I have also tried types: "type": "temportal", and "type": "quantitative", "aggregate": "distinct" with no luck.
My goal is to be able to add multiple red vertical Rule lines with explicit/constant x values to the chart.
Datum is meant for specifying literal fixed values. You can add several rules by layering them together with your main data. This approach works with quantitative data encoded in the x channel:
"layer": [
{
"mark": {"type": "line"},
"encoding": {"y": {...}}
},
{
"mark": {"type": "rule", "color": "red", "size": 1},
"encoding": {"x": {"datum": 42}}
},
{
"mark": {"type": "rule", "color": "blue", "size": 1},
"encoding": {"x": {"datum": 100}}
}
]
For dealing with temporal data, you additionally have to specify how it should be parsed. This approach works for me:
"layer": [
{
// First layer: spec of your main linear plot.
},
{
// Second layer: spec of the vertical rulers.
"mark": {"type": "rule", "color": "red", "size": 2},
"encoding": {
"x": {"field": "date", "type": "temporal"}
},
"data": {
"values": [
{"date": "25 May 2020 14:15:00"},
{"date": "25 May 2020 14:20:59"}
],
"format": {
"parse": {"date": "utc:'%d %b %Y %H:%M:%S'"}
}
}
}
]
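For a single fixed date, another option that may be worth trying is a datum given as a Vega-Lite date-time object, which avoids a separate data block and parse format. The following is a minimal self-contained sketch; the line data and field names are invented for illustration, and the milestone is placed at 10 April 2020 (months in date-time objects are 1-based):

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {
    "values": [
      {"date": "2020-04-01", "value": 3},
      {"date": "2020-04-08", "value": 7},
      {"date": "2020-04-15", "value": 5},
      {"date": "2020-04-22", "value": 9}
    ]
  },
  "layer": [
    {
      "mark": "line",
      "encoding": {
        "x": {"field": "date", "type": "temporal"},
        "y": {"field": "value", "type": "quantitative"}
      }
    },
    {
      "mark": {"type": "rule", "color": "red"},
      "encoding": {
        "x": {
          "datum": {"year": 2020, "month": 4, "date": 10},
          "type": "temporal"
        }
      }
    }
  ]
}
```

Additional milestones would each get their own rule layer in this form; for many dates, the inline data approach shown above is probably more convenient.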
I just started using Vega-Lite and was wondering how to cut out everything after my 10th object (I have thousands of rows and am only interested in the top 10).
This is what I have so far:
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.json",
"data": {
"url": "https://raw.githubusercontent.com/DanStein91/Info-vis/master/anage.csv",
"format": {
"type": "csv"
}
},
"transform": [
{
"filter": {
"field": "Female_maturity_(days)",
"gt": 0
}
}
],
"title": {
"text": "",
"anchor": "middle"
},
"mark": "bar",
"encoding": {
"y": {
"field": "Common_name",
"type": "nominal",
"sort": {
"op": "mean",
"field": "Female_maturity_(days)",
"order": "descending"
}
},
"x": {
"field": "Female_maturity_(days)",
"type": "quantitative"
}
},
"config": {}
}
You can follow the Filtering Top K Items example from the documentation. The result looks something like this (view in vega editor):
{
"data": {
"url": "https://raw.githubusercontent.com/DanStein91/Info-vis/master/anage.csv",
"format": {"type": "csv", "parse": {"Female_maturity_(days)": "number"}}
},
"transform": [
{
"window": [{"op": "rank", "as": "rank"}],
"sort": [{"field": "Female_maturity_(days)", "order": "descending"}]
},
{"filter": "datum.rank <= 10"}
],
"mark": "bar",
"encoding": {
"y": {
"field": "Common_name",
"type": "nominal",
"sort": {
"op": "mean",
"field": "Female_maturity_(days)",
"order": "descending"
}
},
"x": {"field": "Female_maturity_(days)", "type": "quantitative"}
},
"title": {"text": "", "anchor": "middle"}
}
One note: when doing transforms on CSV data (as opposed to JSON data), it's important to use format.parse to specify the desired data type for the columns: by default, CSV columns are interpreted as strings, which can cause sorting-based operations to behave in unexpected ways.