How to filter data in vega-lite?

I have the following code for a line plot, and I am not sure how to use the filter transform. The mark and encoding are inside a layer so that I can use the tooltip for the plot.
{
"$schema": "https://vega.github.io/schema/vega-lite/v2.4.json",
"title": "Dashboard",
"data": {
"url" : {
"%context%": true,
"index": "paytrans",
"body": {
"size":10000,
"_source": ["Metrics","Value","ModelName"]
}
},
"format": {"property": "hits.hits"}
},
"layer": [
{
"mark": {
"type": "line",
"point": true
},
"encoding": {
"x": {"field": "_source.ModelName",
"type": "ordinal",
"title":"Models",
"axis": {
"labelAngle": 0
}
},
"y": {"field": "_source.Value", "type": "quantitative", "title":"Metric Score",
"scale": { "domain": [0.0, 1.0] }},
"color": {"field": "_source.Metrics", "type": "nominal", "title":"Metrics"},
"tooltip": [
{"field": "_source.Metrics", "type": "nominal", "title":"Metric"},
{"field": "_source.Value", "type": "quantitative", "title":"Value"}
]
}
}
]
}
If I add
"transform": [
{
"filter": "datum.Value <= 0.5"
}
],
it does not work. How can I filter on the Value field?

It appears that you don't have a field named Value; you have a field named _source.Value. So the correct way to filter would be:
"transform": [
{
"filter": "datum._source.Value <= 0.5"
}
],
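To make the nested-field point concrete, here is a minimal, self-contained sketch. The inline rows are made up for illustration and only mimic the shape of Elasticsearch hits (each record wraps its fields in a _source object); the v5 schema URL is also an assumption, chosen so the spec can be pasted into the online editor. The transform sits at the top level of the spec, next to data.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Illustrative sketch only: the inline values are invented to mimic the hits.hits record shape.",
  "data": {
    "values": [
      {"_source": {"ModelName": "A", "Metrics": "precision", "Value": 0.42}},
      {"_source": {"ModelName": "B", "Metrics": "precision", "Value": 0.81}},
      {"_source": {"ModelName": "C", "Metrics": "precision", "Value": 0.37}}
    ]
  },
  "transform": [
    {"filter": "datum._source.Value <= 0.5"}
  ],
  "mark": {"type": "line", "point": true},
  "encoding": {
    "x": {"field": "_source.ModelName", "type": "ordinal", "title": "Models"},
    "y": {"field": "_source.Value", "type": "quantitative", "title": "Metric Score"}
  }
}
```

With these rows, only models A and C should remain after the filter, because B's value is above 0.5. The same dotted path that works in "field" has to be spelled out after datum in the filter expression.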

Related

Cannot impute missing values

I have this chart (image not shown), produced by this Vega-Lite spec:
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"data": {
"values": [
{
"timestamp": "2011-04-01T17:06:21.000Z",
"value": 0.44777528325189986
},
{
"timestamp": "2011-04-02T17:06:21.000Z",
"value": 0.44390285331388984
},
{
"timestamp": "2011-04-03T17:06:21.000Z",
"value": 0.44813958999449255
},
{
"timestamp": "2011-04-04T17:06:21.000Z",
"value": 0.4440416510172272
},
{
"timestamp": "2011-04-05T17:06:21.000Z",
"missing": "NO value KEY HERE!"
},
{
"timestamp": "2011-04-06T17:06:21.000Z",
"value": 0.3797480270068858
},
{
"timestamp": "2011-04-07T17:06:21.000Z",
"value": 0.31955288375970203
},
{
"timestamp": "2011-04-08T17:06:21.000Z",
"value": 0.3171368880067786
},
{
"timestamp": "2011-04-10T17:06:21.000Z",
"value": 0.30021395605134893
},
{
"timestamp": "2011-04-11T17:06:21.000Z",
"value": 0.3130485242947531
}
]
},
"encoding": {"y": {"field": "timestamp", "type": "temporal", "sort": "ascending"}},
"layer": [
{
"mark": {"type": "line", "interpolate": "cardinal"},
"encoding": {
"x": {
"field": "value",
"sort": null,
"type": "quantitative",
"axis": {"orient": "top"},
"impute": {"keyvals": ["value"], "method": "mean", "frame": [-5, 5]}
}
}
}
]
}
But I thought the impute line would cause it to fill that gap in the data:
"impute": {"keyvals": ["value"], "method": "mean", "frame": [-5, 5]}
I have tried many permutations of this, including:
1. Changing keyvals to ["timestamp"]
2. Moving the impute line inside the "encoding": {"y": ... definition
3. Same as #2, but also switching keyvals to ["value"]
None of these seem to work.
Update
Also tried an impute in transform, and that doesn't work either:
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"data": {
"values": [
...
]
},
"transform": [
{
"impute": "value",
"key": "timestamp",
"frame": [-1, 1],
"method": "mean"
}
],
"encoding": {"y": {"field": "timestamp", "type": "temporal", "sort": "ascending"}},
"layer": [
{
"mark": {"type": "line", "interpolate": "cardinal"},
"encoding": {
"x": {
"field": "value",
"sort": null,
"type": "quantitative",
"axis": {"orient": "top"}
}
}
}
]
}
Update 2
Here's something that almost feels like progress, but doesn't behave how I would expect. This is the exact same data with the "transform" : [ "impute" : { ... approach, but now it's displaying imputed_value_value (which by the way is never mentioned in the docs) instead of value:
It does successfully impute, but it imputes (averages) everything, when I only want it to impute places with missing data. Is this how impute is supposed to work?
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"data": {
"values": [
...
]
},
"transform": [
{
"impute": "value",
"key": "timestamp",
"frame": [-5, 5],
"method": "mean"
}
],
"encoding": {"y": {"field": "timestamp", "type": "temporal", "sort": "ascending"}},
"layer": [
{
"mark": {"type": "line", "interpolate": "cardinal"},
"encoding": {
"x": {
"field": "imputed_value_value",
"sort": null,
"type": "quantitative",
"axis": {"orient": "top"}
}
}
}
]
}

Is it possible to apply the same condition as color encoding for legend

My source code is as follows:
"transform": [
{
"window": [
{
"op": "rank", "field": "Value", "as": "_rank"
}
],
"sort": [
{
"field": "Value",
"order": "descending"
}
]
}
],
"encoding": {
"color": {
"field": "_rank",
"condition": {
"test": "datum._rank>5",
"value": "grey"
}
},
"x": {
"field": "Week",
"type": "nominal",
"axis": {
"labelAngle": 0
}
},
"y": {
"field": "Value",
"type": "quantitative",
"axis": {
"grid": false
}
}
},
"layer": [
{
"mark": {
"type": "bar",
"tooltip": true
}
},
{
"mark": {
"type": "text",
"align": "center",
"baseline": "middle",
"dx": 0,
"dy": -5,
"tooltip": true
},
"encoding": {
"text": {
"field": "Value"
}
}
}
]
I put a condition on the color encoding so that the top 5 are shown in different colors and any values outside the top 5 are grey.
"color": {
"field": "_rank",
"condition": {
"test": "datum._rank>5",
"value": "grey"
}
}
This works for the bars, but the legend is not generated with the same condition.
Is it possible to extend the same top-5 logic to the legend's colors as well? That is, every entry outside the top 5 would be grey in the legend, and every other entry would keep its own color (currently only this second part is generated).
The legend colors will reflect the color scale that you specify, and not reflect conditions.
The easiest way to do what you want is likely by setting the range for your color scheme; for example:
{
"data": {"url": "data/cars.json"},
"mark": "point",
"encoding": {
"x": {"field": "Horsepower", "type": "quantitative"},
"y": {"field": "Miles_per_Gallon", "type": "quantitative"},
"color": {
"field": "Origin", "type": "nominal",
"scale": {"range": ["purple", "#ff0000", "teal"]}
}
}
}
You'll have to modify the specified colors based on how many color categories you have in your data.
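Applied to the top-5 question, the range approach might look like the following sketch. The seven inline rows, the Week labels, and the specific color values are made up for illustration; the idea is simply that the rank becomes the color field, and the scale's range lists five distinct colors followed by grey for every remaining rank, so the legend and the bars share one scale.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Illustrative sketch only: data and colors are invented. Ranks 6 and above map to grey via the scale range.",
  "data": {
    "values": [
      {"Week": "W1", "Value": 12},
      {"Week": "W2", "Value": 45},
      {"Week": "W3", "Value": 30},
      {"Week": "W4", "Value": 8},
      {"Week": "W5", "Value": 27},
      {"Week": "W6", "Value": 51},
      {"Week": "W7", "Value": 19}
    ]
  },
  "transform": [
    {
      "window": [{"op": "rank", "as": "_rank"}],
      "sort": [{"field": "Value", "order": "descending"}]
    }
  ],
  "mark": "bar",
  "encoding": {
    "x": {"field": "Week", "type": "nominal", "axis": {"labelAngle": 0}},
    "y": {"field": "Value", "type": "quantitative"},
    "color": {
      "field": "_rank",
      "type": "ordinal",
      "scale": {
        "domain": [1, 2, 3, 4, 5, 6, 7],
        "range": ["#4c78a8", "#f58518", "#e45756", "#72b7b2", "#54a24b", "grey", "grey"]
      }
    }
  }
}
```

The domain and range have to be as long as the number of ranks in the data, which is the manual adjustment mentioned above.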

How do you fix the rendered text in a hconcat pyramid chart?

I am trying to create a concatenated pyramid chart, but the text in the middle does not render properly. Changing the field for the text mark to a numeric one does not have this rendering problem. This is the example I followed and modified: Population Pyramid.
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"spacing": 0,
"hconcat": [
{
"transform": [
{ "filter": { "field": "sentiment", "equal": "negative" } }
],
"encoding": {
"y": { "field": "type", "title": null, "axis": null },
"x": {
"field": "sentiment",
"aggregate": "count",
"axis": null,
"sort": "descending"
}
},
"layer": [
{ "mark": "bar", "encoding": { "color": { "field": "channel" } } }
]
},
{
"width": 100,
"view": { "stroke": null },
"mark": { "type": "text", "align": "center" },
"encoding": {
"y": { "field": "type", "axis": null },
"text": { "field": "type" }
}
},
{
"mark": "bar",
"transform": [
{ "filter": { "field": "sentiment", "equal": "positive" } }
],
"encoding": {
"color": { "field": "channel" },
"y": { "field": "type", "axis": null },
"x": { "field": "sentiment", "aggregate": "count", "axis": null }
}
}
],
"config": { "view": { "stroke": null }, "axis": { "grid": false } },
"data": {
"values": [
{
"id": 1,
"type": "shops",
"channel": "line man",
"sentiment": "negative"
}
]
}
}
Since you have not done any aggregation in your text chart, each text mark is drawn multiple times – once per corresponding row in the data. This stacking of multiple text marks is what makes it appear as if it's rendered poorly.
To ensure that each text mark is only drawn once, you'll need to aggregate the data. There are a few ways to do this, but the easiest here is to use the argmin or argmax of an associated numerical column:
"encoding": {
"y": {"field": "type", "axis": null},
"text": {"field": "type", "aggregate": {"argmin": "id"}}
}
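As one of the other ways to aggregate, the middle view could collapse the data to a single row per type with an explicit aggregate transform before drawing the labels. This is a sketch rather than the approach shown above: the four inline rows and the row_count field name are invented for illustration, and inside the hconcat the transform would go in the middle view only.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Illustrative sketch only: inline rows and the row_count name are invented. One text mark is drawn per type.",
  "data": {
    "values": [
      {"id": 1, "type": "shops", "sentiment": "negative"},
      {"id": 2, "type": "shops", "sentiment": "positive"},
      {"id": 3, "type": "food", "sentiment": "negative"},
      {"id": 4, "type": "food", "sentiment": "positive"}
    ]
  },
  "transform": [
    {"aggregate": [{"op": "count", "as": "row_count"}], "groupby": ["type"]}
  ],
  "width": 100,
  "view": {"stroke": null},
  "mark": {"type": "text", "align": "center"},
  "encoding": {
    "y": {"field": "type", "type": "nominal", "axis": null},
    "text": {"field": "type", "type": "nominal"}
  }
}
```

After the transform there are two rows (one per type) instead of four, so each label is drawn exactly once.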

Vega lite highlighting data

I have an hconcat in Vega-Lite displaying countries and a score. I want to highlight some countries (changing the color of some and leaving the rest as they are), but whenever I use a color encoding, it either gives me an error if it is outside of the hconcat, or it highlights the countries I want in one chart but does not display the rest:
With nothing: (image not shown)
With color in hconcat: (image not shown)
How can I apply a highlight outside of the hconcat (or inside it, if I have to repeat it across all charts), while leaving all the other data in the same color?
My code:
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.json",
"data": {
"url": "https://raw.githubusercontent.com/DanStein91/Info-vis/master/coffee.csv",
"format": {
"type": "csv",
"parse": {
"Aroma": "number",
"Flavor": "number",
"Aftertaste": "number",
"Acidity": "number",
"Clean_Cup": "number",
"Body": "number",
"Balance": "number",
"Uniformity": "number",
"Cupper_Points": "number",
"Sweetness": "number"
}
}
},
"transform": [
{
"filter": "datum.Country_of_Origin"
},
{
"calculate": "datum.Aroma + datum.Flavor + datum.Aftertaste + datum.Acidity + datum.Sweetness + datum.Balance ",
"as": "Taste_Points"
},
{
"calculate": "datum.Cupper_Points + datum.Clean_Cup + datum.Uniformity",
"as": "Cup_Points"
}
],
"hconcat": [
{
"mark": "bar",
"encoding": {
"y": {
"field": "Country_of_Origin",
"type": "nominal",
"sort": "-x"
},
"x": {
"field": "Taste_Points",
"type": "quantitative",
"aggregate": "mean"
}
}
},
{
"mark": "bar",
"encoding": {
"y": {
"field": "Country_of_Origin",
"type": "nominal",
"sort": "-x"
},
"x": {
"field": "Cup_Points",
"type": "quantitative",
"aggregate": "mean"
}
}
},
{
"mark": "bar",
"encoding": {
"y": {
"field": "Country_of_Origin",
"type": "nominal",
"sort": "-x"
},
"x": {
"field": "Total_Cup_Points",
"type": "quantitative",
"aggregate": "mean"
},
"color": {
"field": "Country_of_Origin",
"type": "nominal",
"scale": {
"domain": [
"Papua New Guinea",
"Mauritius"
],
"range": [
"#8101FA",
"#00C7A9"
]
}
}
}
}
],
"config": {}
}
Thanks.
You can do this using a Repeat Chart along with a Condition in the color encoding. The result might look something like this:
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.json",
"data": {
"url": "https://raw.githubusercontent.com/DanStein91/Info-vis/master/coffee.csv",
"format": {
"type": "csv",
"parse": {
"Aroma": "number",
"Flavor": "number",
"Aftertaste": "number",
"Acidity": "number",
"Clean_Cup": "number",
"Body": "number",
"Balance": "number",
"Uniformity": "number",
"Cupper_Points": "number",
"Sweetness": "number"
}
}
},
"transform": [
{"filter": "datum.Country_of_Origin"},
{
"calculate": "datum.Aroma + datum.Flavor + datum.Aftertaste + datum.Acidity + datum.Sweetness + datum.Balance ",
"as": "Taste_Points"
},
{
"calculate": "datum.Cupper_Points + datum.Clean_Cup + datum.Uniformity",
"as": "Cup_Points"
}
],
"repeat": ["Taste_Points", "Cup_Points", "Total_Cup_Points"],
"spec": {
"mark": "bar",
"encoding": {
"y": {"field": "Country_of_Origin", "type": "nominal", "sort": "-x"},
"x": {
"field": {"repeat": "repeat"},
"type": "quantitative",
"aggregate": "mean"
},
"color": {
"value": "steelblue",
"condition": {
"test": {
"field": "Country_of_Origin",
"oneOf": ["Papua New Guinea", "Mauritius"]
},
"field": "Country_of_Origin",
"type": "nominal",
"scale": {
"domain": ["Papua New Guinea", "Mauritius"],
"range": ["#8101FA", "#00C7A9"]
}
}
}
}
}
}
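If a legend is not needed, a simpler variant of the same condition idea uses fixed values on both branches. The sketch below uses made-up inline rows and hypothetical field names (country, score) so that it runs without the CSV; because neither branch maps a field through a scale, no color legend should be drawn.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Illustrative sketch only: inline rows and the country/score field names are invented.",
  "data": {
    "values": [
      {"country": "Mauritius", "score": 8.1},
      {"country": "Kenya", "score": 7.9},
      {"country": "Peru", "score": 7.4}
    ]
  },
  "mark": "bar",
  "encoding": {
    "y": {"field": "country", "type": "nominal", "sort": "-x"},
    "x": {"field": "score", "type": "quantitative"},
    "color": {
      "condition": {
        "test": {"field": "country", "oneOf": ["Mauritius"]},
        "value": "#00C7A9"
      },
      "value": "steelblue"
    }
  }
}
```

Rows matching the test get the highlight color and every other bar falls back to steelblue.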

Vega lite select N number of objects (count)

I just started using Vega lite and was wondering how to cut out everything after my 10th object (I have thousands of rows and am just interested in the top 10).
This is what I have so far:
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.json",
"data": {
"url": "https://raw.githubusercontent.com/DanStein91/Info-vis/master/anage.csv",
"format": {
"type": "csv"
}
},
"transform": [
{
"filter": {
"field": "Female_maturity_(days)",
"gt": 0
}
}
],
"title": {
"text": "",
"anchor": "middle"
},
"mark": "bar",
"encoding": {
"y": {
"field": "Common_name",
"type": "nominal",
"sort": {
"op": "mean",
"field": "Female_maturity_(days)",
"order": "descending"
}
},
"x": {
"field": "Female_maturity_(days)",
"type": "quantitative"
}
},
"config": {}
}
You can follow the Filtering Top K Items example from the documentation. The result looks something like this:
{
"data": {
"url": "https://raw.githubusercontent.com/DanStein91/Info-vis/master/anage.csv",
"format": {"type": "csv", "parse": {"Female_maturity_(days)": "number"}}
},
"transform": [
{
"window": [{"op": "rank", "as": "rank"}],
"sort": [{"field": "Female_maturity_(days)", "order": "descending"}]
},
{"filter": "datum.rank <= 10"}
],
"mark": "bar",
"encoding": {
"y": {
"field": "Common_name",
"type": "nominal",
"sort": {
"op": "mean",
"field": "Female_maturity_(days)",
"order": "descending"
}
},
"x": {"field": "Female_maturity_(days)", "type": "quantitative"}
},
"title": {"text": "", "anchor": "middle"}
}
One note: when doing transforms on CSV data (as opposed to JSON data), it's important to use format.parse to specify the desired data type for the columns: by default, CSV columns are interpreted as strings, which can cause sorting-based operations to behave in unexpected ways.
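A related detail: the rank operation gives tied values the same rank, so a filter on rank can let through more than N rows when there is a tie at the cutoff. If exactly N rows are wanted, the row_number window operation is one option. The sketch below uses made-up inline rows with hypothetical field names (name, days), including a deliberate tie; inline JSON numbers need no format.parse.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Illustrative sketch only: inline rows and the name/days field names are invented. Keeps exactly three rows.",
  "data": {
    "values": [
      {"name": "a", "days": 900},
      {"name": "b", "days": 1200},
      {"name": "c", "days": 1200},
      {"name": "d", "days": 300},
      {"name": "e", "days": 2500}
    ]
  },
  "transform": [
    {
      "window": [{"op": "row_number", "as": "row"}],
      "sort": [{"field": "days", "order": "descending"}]
    },
    {"filter": "datum.row <= 3"}
  ],
  "mark": "bar",
  "encoding": {
    "y": {"field": "name", "type": "nominal", "sort": "-x"},
    "x": {"field": "days", "type": "quantitative"}
  }
}
```

Whether ties should be kept together (rank) or cut off strictly (row_number) depends on what "top N" is meant to show.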