Vega-Lite: select top N objects (count)

I just started using Vega-Lite and was wondering how to cut out everything after my 10th object (I have thousands of rows and am only interested in the top 10).
This is what I have so far:
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.json",
"data": {
"url": "https://raw.githubusercontent.com/DanStein91/Info-vis/master/anage.csv",
"format": {
"type": "csv"
}
},
"transform": [
{
"filter": {
"field": "Female_maturity_(days)",
"gt": 0
}
}
],
"title": {
"text": "",
"anchor": "middle"
},
"mark": "bar",
"encoding": {
"y": {
"field": "Common_name",
"type": "nominal",
"sort": {
"op": "mean",
"field": "Female_maturity_(days)",
"order": "descending"
}
},
"x": {
"field": "Female_maturity_(days)",
"type": "quantitative"
}
},
"config": {}
}

You can follow the Filtering Top K Items example from the documentation. The result looks something like this (view in the Vega editor):
{
"data": {
"url": "https://raw.githubusercontent.com/DanStein91/Info-vis/master/anage.csv",
"format": {"type": "csv", "parse": {"Female_maturity_(days)": "number"}}
},
"transform": [
{
"window": [{"op": "rank", "as": "rank"}],
"sort": [{"field": "Female_maturity_(days)", "order": "descending"}]
},
{"filter": "datum.rank <= 10"}
],
"mark": "bar",
"encoding": {
"y": {
"field": "Common_name",
"type": "nominal",
"sort": {
"op": "mean",
"field": "Female_maturity_(days)",
"order": "descending"
}
},
"x": {"field": "Female_maturity_(days)", "type": "quantitative"}
},
"title": {"text": "", "anchor": "middle"}
}
One note: when doing transforms on CSV data (as opposed to JSON data), it's important to use format.parse to specify the desired data type for the columns: by default, CSV columns are interpreted as strings, which can cause sorting-based operations to behave in unexpected ways.
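If you prefer not to touch the format block, another option is to coerce the column inside the transform pipeline instead. This is only a sketch, not part of the answer above: it uses Vega's toNumber() expression function and an invented output field name, maturity_days, which the window sort (and the encodings) would then reference in place of the raw column:
"transform": [
  {"calculate": "toNumber(datum['Female_maturity_(days)'])", "as": "maturity_days"},
  {
    "window": [{"op": "rank", "as": "rank"}],
    "sort": [{"field": "maturity_days", "order": "descending"}]
  },
  {"filter": "datum.rank <= 10"}
]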

Related

Cannot impute missing values

Have this image
Given this vega-lite
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"data": {
"values": [
{
"timestamp": "2011-04-01T17:06:21.000Z",
"value": 0.44777528325189986
},
{
"timestamp": "2011-04-02T17:06:21.000Z",
"value": 0.44390285331388984
},
{
"timestamp": "2011-04-03T17:06:21.000Z",
"value": 0.44813958999449255
},
{
"timestamp": "2011-04-04T17:06:21.000Z",
"value": 0.4440416510172272
},
{
"timestamp": "2011-04-05T17:06:21.000Z",
"missing": "NO value KEY HERE!"
},
{
"timestamp": "2011-04-06T17:06:21.000Z",
"value": 0.3797480270068858
},
{
"timestamp": "2011-04-07T17:06:21.000Z",
"value": 0.31955288375970203
},
{
"timestamp": "2011-04-08T17:06:21.000Z",
"value": 0.3171368880067786
},
{
"timestamp": "2011-04-10T17:06:21.000Z",
"value": 0.30021395605134893
},
{
"timestamp": "2011-04-11T17:06:21.000Z",
"value": 0.3130485242947531
}
]
},
"encoding": {"y": {"field": "timestamp", "type": "temporal", "sort": "ascending"}},
"layer": [
{
"mark": {"type": "line", "interpolate": "cardinal"},
"encoding": {
"x": {
"field": "value",
"sort": null,
"type": "quantitative",
"axis": {"orient": "top"},
"impute": {"keyvals": ["value"], "method": "mean", "frame": [-5, 5]}
}
}
}
]
}
But I thought the impute line would cause it to fill that gap in the data:
"impute": {"keyvals": ["value"], "method": "mean", "frame": [-5, 5]}
I have tried many permutations of this, including:
changing keyvals to ["timestamp"]
moving the impute line inside the "encoding": {"y": ... definition
#2, but also switching keyvals to ["value"]
None of those seems to work.
Update
I also tried an impute in the transform block, and that doesn't work either:
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"data": {
"values": [
...
]
},
"transform": [
{
"impute": "value",
"key": "timestamp",
"frame": [-1, 1],
"method": "mean"
}
],
"encoding": {"y": {"field": "timestamp", "type": "temporal", "sort": "ascending"}},
"layer": [
{
"mark": {"type": "line", "interpolate": "cardinal"},
"encoding": {
"x": {
"field": "value",
"sort": null,
"type": "quantitative",
"axis": {"orient": "top"}
}
}
}
]
}
Update 2
Here's something that almost feels like progress, but it doesn't behave how I would expect. This is the exact same data with the "transform": [{"impute": ...}] approach, but now it's displaying imputed_value_value (which, by the way, is never mentioned in the docs) instead of value:
It does successfully impute, but it imputes (averages) everything, when I only want it to impute places with missing data. Is this how impute is supposed to work?
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"data": {
"values": [
...
]
},
"transform": [
{
"impute": "value",
"key": "timestamp",
"frame": [-5, 5],
"method": "mean"
}
],
"encoding": {"y": {"field": "timestamp", "type": "temporal", "sort": "ascending"}},
"layer": [
{
"mark": {"type": "line", "interpolate": "cardinal"},
"encoding": {
"x": {
"field": "imputed_value_value",
"sort": null,
"type": "quantitative",
"axis": {"orient": "top"},
}
}
}
]
}

How do you fix the rendered text in a hconcat pyramid chart?

I am trying to create a concat pyramid chart, but the text in the middle seems to have a problem rendering properly. Changing the field for the text mark to something that is a number does not have this rendering problem. This is the example I followed and modified from: Population Pyramid
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"spacing": 0,
"hconcat": [
{
"transform": [
{ "filter": { "field": "sentiment", "equal": "negative" } }
],
"encoding": {
"y": { "field": "type", "title": null, "axis": null },
"x": {
"field": "sentiment",
"aggregate": "count",
"axis": null,
"sort": "descending"
}
},
"layer": [
{ "mark": "bar", "encoding": { "color": { "field": "channel" } } }
]
},
{
"width": 100,
"view": { "stroke": null },
"mark": { "type": "text", "align": "center" },
"encoding": {
"y": { "field": "type", "axis": null },
"text": { "field": "type" }
}
},
{
"mark": "bar",
"transform": [
{ "filter": { "field": "sentiment", "equal": "positive" } }
],
"encoding": {
"color": { "field": "channel" },
"y": { "field": "type", "axis": null },
"x": { "field": "sentiment", "aggregate": "count", "axis": null }
}
}
],
"config": { "view": { "stroke": null }, "axis": { "grid": false } },
"data": {
"values": [
{
"id": 1,
"type": "shops",
"channel": "line man",
"sentiment": "negative"
}
]
}
}
Since you have not done any aggregation in your text chart, each text mark is drawn multiple times – once per corresponding row in the data. This stacking of multiple text marks is what makes it appear as if it's rendered poorly.
To ensure that each text mark is only drawn once, you'll need to aggregate the data. There are a few ways to do this, but the easiest here is to use the argmin or argmax of an associated numerical column:
"encoding": {
"y": {"field": "type", "axis": null},
"text": {"field": "type", "aggregate": {"argmin": "id"}}
}
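If you would rather leave the text encoding as-is, an alternative sketch (not from the answer above) is to pre-aggregate the middle chart down to one row per type, for example by keeping the minimum id per type:
{
  "width": 100,
  "view": {"stroke": null},
  "transform": [
    {"aggregate": [{"op": "min", "field": "id", "as": "id"}], "groupby": ["type"]}
  ],
  "mark": {"type": "text", "align": "center"},
  "encoding": {
    "y": {"field": "type", "axis": null},
    "text": {"field": "type"}
  }
}
Either way, the point is the same: the text layer must see exactly one row per label.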

Vega Lite: Normalized Stacked Bar Chart + Overlay percentages as text

I have a stacked normalized bar chart similar to this:
https://vega.github.io/editor/#/examples/vega-lite/stacked_bar_normalize
I'm trying to show the related percentages (per bar segment) as text on the bars similar to: https://gist.github.com/pratapvardhan/00800a4981d43a84efdba0c4cf8ee2e1
I tried adding a transform field to calculate the percentages, but still couldn't get it to work after hours of trying.
I'm lost, help 🥺
My best try:
{
"description":
"A bar chart showing the US population distribution of age groups and gender in 2000.",
"data": {
"url": "data/population.json"
},
"transform": [
{"filter": "datum.year == 2000"},
{"calculate": "datum.sex == 2 ? 'Female' : 'Male'", "as": "gender"},
{
"stack": "people",
"offset": "normalize",
"as": ["v1", "v2"],
"groupby": ["age"],
"sort": [{"field": "gender", "order": "descending"}]
}
],
"encoding": {
"y": {
"field": "v1",
"type": "quantitative",
"title": "population"
},
"y2": {"field": "v2"},
"x": {
"field": "age",
"type": "ordinal"
},
"color": {
"field": "gender",
"type": "nominal",
"scale": {
"range": ["#675193", "#ca8861"]
}
}
},
"layer":[
{ "mark": "bar"},
{"mark": {"type": "text", "dx": 0, "dy": 0},
"encoding": {
"color":{"value":"black"},
"text": { "field": "v1", "type": "quantitative", "format": ".1f"}}
}
]
}
You can use a joinaggregate transform to normalize each group, and then use "format": ".1%" to display fractions as percents. Using this, there is no need to manually compute the stack transform; it is simpler to specify the stack via the encoding, as in the example you linked to.
Here is the result (open in editor):
{
"description": "A bar chart showing the US population distribution of age groups and gender in 2000.",
"data": {"url": "data/population.json"},
"transform": [
{"filter": "datum.year == 2000"},
{"calculate": "datum.sex == 2 ? 'Female' : 'Male'", "as": "gender"},
{
"joinaggregate": [{"op": "sum", "field": "people", "as": "total"}],
"groupby": ["age"]
},
{"calculate": "datum.people / datum.total", "as": "fraction"}
],
"encoding": {
"y": {
"aggregate": "sum",
"field": "people",
"title": "population",
"stack": "normalize"
},
"order": {"field": "gender", "sort": "descending"},
"x": {"field": "age", "type": "ordinal"},
"color": {
"field": "gender",
"type": "nominal",
"scale": {"range": ["#675193", "#ca8861"]}
}
},
"layer": [
{"mark": "bar"},
{
"mark": {"type": "text", "dx": 20, "dy": 0, "angle": 90},
"encoding": {
"color": {"value": "white"},
"text": {"field": "fraction", "type": "quantitative", "format": ".1%"}
}
}
]
}

How to filter data in vega-lite?

I have the following code for a line plot. I am not sure how to use the filter transform; I have the mark and encoding inside a layer in order to use the tooltip for the plot.
{
"$schema": "https://vega.github.io/schema/vega-lite/v2.4.json",
"title": "Dashboard",
"data": {
"url" : {
"%context%": true,
"index": "paytrans",
"body": {
"size":10000,
"_source": ["Metrics","Value","ModelName"],
}
}
"format": {"property": "hits.hits"},
},
"layer": [
{
"mark": {
"type": "line",
"point": true
},
"encoding": {
"x": {"field": "_source.ModelName",
"type": "ordinal",
"title":"Models"
"axis": {
"labelAngle": 0
}
},
"y": {"field": "_source.Value", "type": "quantitative", "title":"Metric Score"
"scale": { "domain": [0.0, 1.0] }},
"color": {"field": "_source.Metrics", "type": "nominal", "title":"Metrics"},
"tooltip": [
{"field": "_source.Metrics", "type": "nominal", "title":"Metric"},
{"field": "_source.Value", "type": "quantitative", "title":"Value"}
]
}
}
]
}
If I add
"transform": [
{
"filter": "datum.Value <= 0.5"
}
],
It's not working. How can I filter on the Value field?
It appears that you don't have a field named Value; you have a field named _source.Value. So the correct way to filter would be:
"transform": [
{
"filter": "datum._source.Value <= 0.5"
}
],
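For reference, the transform block belongs at the top level of the spec (alongside data and layer), so it is applied before any of the layers are drawn. A trimmed-down sketch based on the spec above:
{
  "$schema": "https://vega.github.io/schema/vega-lite/v2.4.json",
  "data": {
    "url": {
      "%context%": true,
      "index": "paytrans",
      "body": {"size": 10000, "_source": ["Metrics", "Value", "ModelName"]}
    },
    "format": {"property": "hits.hits"}
  },
  "transform": [
    {"filter": "datum._source.Value <= 0.5"}
  ],
  "layer": [
    {
      "mark": {"type": "line", "point": true},
      "encoding": {
        "x": {"field": "_source.ModelName", "type": "ordinal", "title": "Models"},
        "y": {"field": "_source.Value", "type": "quantitative", "title": "Metric Score"}
      }
    }
  ]
}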

Groupby aggregations and missing combinations of values

I recently started tinkering with Vega-Lite templates to make a confusion matrix for DVC, an open-source data science tool. You can see the template in my PR here, but I'll also repeat a simplified version below:
{
...
"data": {
"values": [
{"actual": "Wake", "predicted": "Wake", "rev": "HEAD"},
{"actual": "Wake", "predicted": "Deep", "rev": "HEAD"},
{"actual": "Light", "predicted": "Wake", "rev": "HEAD"},
{"actual": "REM", "predicted": "Light", "rev": "HEAD"},
....
]
},
"spec": {
"transform": [
{
"aggregate": [{"op": "count", "as": "xy_count"}],
"groupby": ["actual", "predicted"],
},
{
"joinaggregate": [
{"op": "max", "field": "xy_count", "as": "max_count"}
],
"groupby": [],
},
{
"calculate": "datum.xy_count / datum.max_count",
"as": "percent_of_max",
},
],
"encoding": {
"x": {"field": "predicted", "type": "nominal", "sort": "ascending"},
"y": {"field": "actual", "type": "nominal", "sort": "ascending"},
},
"layer": [
{
"mark": "rect",
"width": 300,
"height": 300,
"encoding": {
"color": {
"field": "xy_count",
"type": "quantitative",
"title": "",
"scale": {"domainMin": 0, "nice": True},
}
},
},
{
"mark": "text",
"encoding": {
"text": {
"field": "xy_count",
"type": "quantitative"
},
"color": {
"condition": {
"test": "datum.xy_count / datum.max_count > 0.5",
"value": "white"
},
"value": "black"
}
}
}
]
}
}
So, since I'm doing a groupby aggregation, it's possible for there to be cells in the confusion matrix with no entries. Here's an example output: link
How can I fill in these cells with a "fallback" value or something? I also looked at using pivot and impute, but couldn't quite figure it out. Help much appreciated :)
You can do this by adding two Impute transforms to the end of your sequence of transforms:
{"impute": "xy_count", "groupby": ["actual"], "key": "predicted", "keyvals": ["Deep", "Light", "Wake", "REM"], "value": 0},
{"impute": "xy_count", "groupby": ["predicted"], "key": "actual", "keyvals": ["Deep", "Light", "Wake", "REM"], "value": 0}
The keyvals specify which missing key values you would like to have imputed on each axis; you can leave them out if at least one group is present for each key value.
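In the context of the simplified spec above, the transform array would then look something like this (the four keyvals come from the sample data; adjust them to your actual class labels):
"transform": [
  {
    "aggregate": [{"op": "count", "as": "xy_count"}],
    "groupby": ["actual", "predicted"]
  },
  {
    "joinaggregate": [{"op": "max", "field": "xy_count", "as": "max_count"}],
    "groupby": []
  },
  {"calculate": "datum.xy_count / datum.max_count", "as": "percent_of_max"},
  {"impute": "xy_count", "groupby": ["actual"], "key": "predicted", "keyvals": ["Deep", "Light", "Wake", "REM"], "value": 0},
  {"impute": "xy_count", "groupby": ["predicted"], "key": "actual", "keyvals": ["Deep", "Light", "Wake", "REM"], "value": 0}
]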