How can I rename legend labels in Vega-Lite?

I've been trying for the last few days to rename the legend labels on my vega-lite chart.
Normally these labels match their respective data field names. I have a case where I'd like to give them a more descriptive name, but without renaming the original data names.
A simplified example:
vl.markLine()
.data([
{ t:1, v:5, c:'a' }, { t:2, v:3, c:'a' }, { t:3, v:7, c:'a' },
{ t:1, v:6, c:'b' }, { t:2, v:8, c:'b' }, { t:3, v:2, c:'b' }
])
.encode(
vl.x().fieldQ('t'),
vl.y().fieldQ('v'),
vl.color().fieldN('c')
)
.render()
How can I rename 'a' and 'b' in the legend, without changing the original data?
(I'm using the javascript API but will be happy with a JSON solution too).
I'd like to find a way that doesn't involve just copying and mapping all the data to another variable name just for the sake of the legend labels.
I've yet to find a way of manually entering the legend labels as something like "labels": ['long name for a', 'long name for b'].

There are two possible approaches to this. You can either use a calculate transform to modify the values within the data stream:
{
"data": {
"values": [
{"t": 1, "v": 5, "c": "a"},
{"t": 2, "v": 3, "c": "a"},
{"t": 3, "v": 7, "c": "a"},
{"t": 1, "v": 6, "c": "b"},
{"t": 2, "v": 8, "c": "b"},
{"t": 3, "v": 2, "c": "b"}
]
},
"transform": [
{"calculate": "{'a': 'Label A', 'b': 'Label B'}[datum.c]", "as": "c"}
],
"mark": "line",
"encoding": {
"x": {"field": "t", "type": "quantitative"},
"y": {"field": "v", "type": "quantitative"},
"color": {"field": "c", "type": "nominal"}
}
}
or you can use a labelExpr to define new labels, referencing the original label as datum.label:
{
"data": {
"values": [
{"t": 1, "v": 5, "c": "a"},
{"t": 2, "v": 3, "c": "a"},
{"t": 3, "v": 7, "c": "a"},
{"t": 1, "v": 6, "c": "b"},
{"t": 2, "v": 8, "c": "b"},
{"t": 3, "v": 2, "c": "b"}
]
},
"mark": "line",
"encoding": {
"x": {"field": "t", "type": "quantitative"},
"y": {"field": "v", "type": "quantitative"},
"color": {
"field": "c",
"type": "nominal",
"legend": {"labelExpr": "{'a': 'Label A', 'b': 'Label B'}[datum.label]"}
}
}
}
The benefit of the former approach is that it changes the values everywhere (including in, e.g. tooltips). The benefit of the latter approach is that it's more efficient, because it only does the transform once per unique label.
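If there are many labels, the labelExpr string can be generated from a plain lookup object instead of being typed by hand. The following is a minimal sketch in plain JavaScript; labelExprFor is a hypothetical helper (not part of Vega-Lite), and the "|| datum.label" fallback for unmapped values is an addition of mine, assuming you want such labels to be shown unchanged.

```javascript
// Hypothetical helper: turn a { rawValue: displayName } object into a
// Vega expression string suitable for a legend's labelExpr.
function labelExprFor(labels) {
  // JSON.stringify produces an object literal that the Vega expression
  // language can also parse; labels missing from the map fall back to
  // the original text via "|| datum.label".
  return JSON.stringify(labels) + "[datum.label] || datum.label";
}

const legendSpec = {
  data: {
    values: [
      { t: 1, v: 5, c: "a" }, { t: 2, v: 3, c: "a" }, { t: 3, v: 7, c: "a" },
      { t: 1, v: 6, c: "b" }, { t: 2, v: 8, c: "b" }, { t: 3, v: 2, c: "b" }
    ]
  },
  mark: "line",
  encoding: {
    x: { field: "t", type: "quantitative" },
    y: { field: "v", type: "quantitative" },
    color: {
      field: "c",
      type: "nominal",
      legend: { labelExpr: labelExprFor({ a: "Label A", b: "Label B" }) }
    }
  }
};

console.log(legendSpec.encoding.color.legend.labelExpr);
// prints: {"a":"Label A","b":"Label B"}[datum.label] || datum.label
```

The resulting object is an ordinary Vega-Lite JSON spec. With the JavaScript API from the question, the same legend object would presumably be passed to the color channel's legend setting, but I have not verified that variant here.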

Related

Column facet: how to align header title to the bottom?

I would like to align to the bottom all the header titles of my facet columns.
This is my code
And I get the below result, the titles are not aligned downwards, each one has a different distance from the graphic bottom.
How to align these titles to the bottom?
Thank you
Looks like a bug. Can you raise it on GitHub? Here is a workaround.
{
"data": {"name": "data-eb2eb4918524c908955f7797d7245a00"},
"facet": {
"column": {
"field": "f",
"header": {"labelOrient": "bottom", "labelFontSize": 20, "title": ""}
}
},
"spec": {
"layer": [
{
"mark": {"type": "arc", "outerRadius": 100},
"encoding": {
"color": {
"field": "k",
"legend": {"title": ""},
"sort": {"field": "n"},
"type": "nominal"
},
"theta": {"field": "v", "stack": true, "type": "quantitative"}
}
},
{
"mark": {"type": "text", "fill": "black", "radius": 115},
"encoding": {
"color": {
"field": "k",
"legend": {"title": ""},
"sort": {"field": "n"},
"type": "nominal"
},
"text": {"field": "v", "format": ",.1f", "type": "quantitative"},
"theta": {"field": "v", "stack": true, "type": "quantitative"}
}
}
]
},
"resolve": {"scale": {"theta": "independent"}},
"$schema": "https://vega.github.io/schema/vega-lite/v5.2.0.json",
"datasets": {
"data-eb2eb4918524c908955f7797d7245a00": [
{"n": 1, "f": "Non Forest", "k": "N", "v": 32.20114689016321},
{"n": 2, "f": "Non Forest", "k": "E", "v": 22.30554330245552},
{"n": 3, "f": "Non Forest", "k": "S", "v": 14.350830760182326},
{"n": 4, "f": "Non Forest", "k": "W", "v": 31.14247904719894},
{"n": 5, "f": "Forest", "k": "N", "v": 24.525745257452574},
{"n": 6, "f": "Forest", "k": "E", "v": 20.460704607046072},
{"n": 7, "f": "Forest", "k": "S", "v": 21.00271002710027},
{"n": 8, "f": "Forest", "k": "W", "v": 34.010840108401084},
{"n": 9, "f": "Unclassified", "k": "N", "v": 29.437706725468576},
{"n": 10, "f": "Unclassified", "k": "E", "v": 32.08379272326351},
{"n": 11, "f": "Unclassified", "k": "S", "v": 16.427783902976845},
{"n": 12, "f": "Unclassified", "k": "W", "v": 22.05071664829107}
]
}
}

How do I select only unique values from a list for coloring?

The vega-lite code at the end considers every instance of the column "c" as a unique value and adds a corresponding separate entry to the legend like so:
I need to have only 3 colors in this case: red, blue and yellow - no combinations such as "blue, red". The decision logic would be 50-50, for example: if "blue" has a value of 3 and "blue, red" has a value of 4, the latter would be split into 2 for blue and 2 for red, totalling 5 (3+2) "blue" and 2 "red". If "blue, red" were 5 it would have 2.5 and 2.5 etc.
Here is the code:
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"data": {
"values": [
{"a": "A", "b": 2, "c": "red, blue"},
{"a": "A", "b": 7, "c": "yellow, blue"},
{"a": "A", "b": 4, "c": "blue, red"},
{"a": "B", "b": 1, "c": "blue"},
{"a": "B", "b": 2, "c": "red"}
]
},
"mark": "bar",
"encoding": {
"x": {"field": "a", "type": "nominal"},
"y": {"aggregate": "average", "field": "b", "type": "quantitative"},
"color": {"field": "c", "type": "nominal"}
}
}
You can achieve the desired result with a few transforms (fold, calculate, filter and joinaggregate), as in the spec below:
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"width": 200,
"data": {
"values": [
{"a": "A", "b": 2, "c": "red, blue"},
{"a": "A", "b": 7, "c": "yellow, blue"},
{"a": "A", "b": 4, "c": "blue, red"},
{"a": "B", "b": 1, "c": "blue"},
{"a": "B", "b": 2, "c": "red"}
]
},
"transform": [
{"fold": ["red", "blue", "yellow"]},
{
"calculate": "indexof(datum.c, ',') > -1 ? datum.b / 2 : datum.b",
"as": "value"
},
{"filter": "indexof(datum.c,datum.key) > -1"},
{
"joinaggregate": [{"field": "value", "op": "sum", "as": "value_sum"}],
"groupby": ["key", "a"]
}
],
"mark": "bar",
"encoding": {
"tooltip": [{"field": "value_sum"}],
"x": {"field": "a", "type": "nominal"},
"y": {"field": "value", "type": "quantitative"},
"color": {
"field": "key",
"type": "nominal",
"scale": {"range": ["blue", "red", "yellow"]}
}
}
}
Let me know if this works
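To check the numbers the chart should show, the 50-50 rule from the question can be reproduced outside Vega-Lite. This is a rough sketch in plain JavaScript, not what Vega-Lite runs internally; splitByColor is a hypothetical helper, and it assumes the colors in c are separated by commas and that each row's value is shared equally among them.

```javascript
// Hypothetical helper: share each row's value equally among the colors
// listed in `c`, then sum the shares per (a, color) pair.
function splitByColor(rows) {
  const totals = {};
  for (const { a, b, c } of rows) {
    const colors = c.split(",").map((s) => s.trim());
    for (const color of colors) {
      const key = `${a}/${color}`;
      totals[key] = (totals[key] || 0) + b / colors.length;
    }
  }
  return totals;
}

const colorRows = [
  { a: "A", b: 2, c: "red, blue" },
  { a: "A", b: 7, c: "yellow, blue" },
  { a: "A", b: 4, c: "blue, red" },
  { a: "B", b: 1, c: "blue" },
  { a: "B", b: 2, c: "red" }
];

console.log(splitByColor(colorRows));
// { 'A/red': 3, 'A/blue': 6.5, 'A/yellow': 3.5, 'B/blue': 1, 'B/red': 2 }
```

These are sums per bar segment, which is what a per-key total such as value_sum in the tooltip should report for each category and color.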

How can I filter values with multiple conditions?

How can I filter values with multiple conditions based on other columns? In this post, the answer shows how to filter with a single condition using filter transform:
{
"data": {
"values": [
{"a": "A", "b": 2, "c": "red"},
{"a": "A", "b": 7, "c": "yellow"},
{"a": "A", "b": 4, "c": "blue"},
{"a": "B", "b": 1, "c": "blue"},
{"a": "B", "b": 2, "c": "red"}
]
},
"transform": [{"filter": "datum.c == 'red'"}],
"mark": "bar",
"encoding": {
"x": {"field": "a", "type": "nominal"},
"y": {"aggregate": "average", "field": "b", "type": "quantitative"}
}
}
I want to filter all values with "red" and "blue" in the column c, so no "yellow". I believe the OneOf logical operator needs to be used as shown in the vega documentation, but I can't figure out how. I changed the transform part to:
"transform": [{"field": "c", "oneOf": ["red", "blue"]}],
but that doesn't work in the online vega editor.
The filter transform accepts either an expression string or a predicate object. With a string, you can write the condition as:
"transform": [{"filter": "datum.c == 'red' || datum.c == 'blue'"}],
With a predicate object, it becomes:
"transform": [{"filter": {"field": "c", "oneOf": ["red", "blue"]}}],
Your attempt did not work because the field/oneOf object has to be nested under a "filter" key rather than placed directly in the transform array. Use whichever form suits the complexity of the condition. The complete spec for your example:
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"data": {
"values": [
{"a": "A", "b": 2, "c": "red"},
{"a": "A", "b": 7, "c": "yellow"},
{"a": "A", "b": 4, "c": "blue"},
{"a": "B", "b": 1, "c": "blue"},
{"a": "B", "b": 2, "c": "red"}
]
},
"transform": [{"filter": {"field": "c", "oneOf": ["red", "blue"]}}],
"mark": "bar",
"encoding": {
"x": {"field": "a", "type": "nominal"},
"y": {"aggregate": "average", "field": "b", "type": "quantitative"}
}
}
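If the list of allowed values is only known at runtime, both filter forms can be generated from it. Below is a small sketch in plain JavaScript; oneOfPredicate and oneOfExpression are hypothetical helper names, and the string form uses double-quoted literals and bracket access (datum["c"]), which I am assuming is acceptable since the Vega expression language follows JavaScript syntax for these.

```javascript
// Hypothetical helper: the predicate-object form of the filter.
function oneOfPredicate(field, values) {
  return { filter: { field: field, oneOf: values } };
}

// Hypothetical helper: the equivalent expression-string form.
// JSON.stringify takes care of quoting and escaping the literals.
function oneOfExpression(field, values) {
  const access = `datum[${JSON.stringify(field)}]`;
  const tests = values.map((v) => `${access} == ${JSON.stringify(v)}`);
  return { filter: tests.join(" || ") };
}

const allowedColors = ["red", "blue"];

console.log(JSON.stringify(oneOfPredicate("c", allowedColors)));
// {"filter":{"field":"c","oneOf":["red","blue"]}}

console.log(oneOfExpression("c", allowedColors).filter);
// datum["c"] == "red" || datum["c"] == "blue"
```

Either object can be placed directly in the "transform" array of the spec.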

How do I use factorial (!) in a vega formula transform

I am trying to create a histogram of a binomial distribution PMF using a vega js specification.
How is this usually done? The Vega expression language does not include functions for choose or factorial, nor does it include a binomial distribution among its statistical functions.
I also cannot seem to reference other functions within the vega specification (i.e. for yval below).
"data":[
{"name": "dataset",
"transform": [
{"type":"sequence", "start": 1, "stop": 50, "step": 1, "as": "seq" },
{"type": "formula", "as": "xval", "expr": "if(datum.seq<nval,datum.seq,NaN)"},
{"type": "formula", "as": "yval", "expr": "math.factorial(datum.xval)"}
]}],
Thanks.
There is no factorial operation available, but one suitable option might be to approximate it with Stirling's approximation, or perhaps a Stirling series if more accuracy is required.
For example, in Vega-Lite:
{
"data": {
"values": [
{"n": 0, "factorial": 1},
{"n": 1, "factorial": 1},
{"n": 2, "factorial": 2},
{"n": 3, "factorial": 6},
{"n": 4, "factorial": 24},
{"n": 5, "factorial": 120},
{"n": 6, "factorial": 720},
{"n": 7, "factorial": 5040},
{"n": 8, "factorial": 40320},
{"n": 9, "factorial": 362880},
{"n": 10, "factorial": 3628800}
]
},
"transform": [
{
"calculate": "datum.n == 0 ? 1 : sqrt(2 * PI * datum.n) * pow(datum.n / E, datum.n)",
"as": "stirling"
},
{"fold": ["factorial", "stirling"]}
],
"mark": "point",
"encoding": {
"x": {"field": "n", "type": "quantitative"},
"y": {"field": "value", "type": "quantitative", "scale": {"type": "log"}},
"color": {"field": "key", "type": "nominal"}
}
}
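To get a feel for how accurate the approximation is before relying on it, you can compare it with the exact factorial in plain JavaScript. This is only a sketch: the function names are mine, and stirlingSeries adds just the first correction term of the Stirling series, the factor (1 + 1/(12n)).

```javascript
// Exact factorial by repeated multiplication (fine for small n).
function factorial(n) {
  let result = 1;
  for (let i = 2; i <= n; i++) result *= i;
  return result;
}

// Stirling's approximation, with the same n == 0 special case as the
// calculate transform in the spec above.
function stirling(n) {
  return n === 0 ? 1 : Math.sqrt(2 * Math.PI * n) * Math.pow(n / Math.E, n);
}

// Stirling's approximation with the first series correction term.
function stirlingSeries(n) {
  return n === 0 ? 1 : stirling(n) * (1 + 1 / (12 * n));
}

// Relative error of an approximation against the exact value.
function relativeError(approx, exact) {
  return Math.abs(approx - exact) / exact;
}

for (const n of [1, 5, 10]) {
  const exact = factorial(n);
  console.log(
    n,
    relativeError(stirling(n), exact),
    relativeError(stirlingSeries(n), exact)
  );
}
```

At n = 10 the plain approximation is off by a little under 1%, while the corrected version is far closer. If that matters for your PMF, the same extra factor could be appended to the calculate expression, for example "* (1 + 1 / (12 * datum.n))" in the non-zero branch.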

When to nest mark property in Layer versus Top-Level Vega-lite spec?

I am wondering how Vega-lite works with respect to tying Marks to associated Encodings.
In the below example, both the encoding and the mark are at the "top-level" of the spec:
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.json",
"description": "A simple bar chart with embedded data.",
"data": {
"values": [
{"a": "A", "b": 28}, {"a": "B", "b": 55}, {"a": "C", "b": 43},
{"a": "D", "b": 91}, {"a": "E", "b": 81}, {"a": "F", "b": 53},
{"a": "G", "b": 19}, {"a": "H", "b": 87}, {"a": "I", "b": 52}
]
},
"mark": "bar",
"encoding": {
"x": {"field": "a", "type": "nominal", "axis": {"labelAngle": 0}},
"y": {"field": "b", "type": "quantitative"}
}
}
And with the simplest layer example, both the bar mark and the text mark are nested in the Layer property
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.json",
"description": "Bar chart with text labels. Apply scale padding to make the frame cover the labels.",
"data": {
"values": [
{"a": "A", "b": 28},
{"a": "B", "b": 55},
{"a": "C", "b": 43}
]
},
"encoding": {
"y": {"field": "a", "type": "nominal"},
"x": {"field": "b", "type": "quantitative", "scale": {"padding": 10}}
},
"layer": [{
"mark": "bar"
}, {
"mark": {
"type": "text",
"align": "left",
"baseline": "middle",
"dx": 3
},
"encoding": {
"text": {"field": "b", "type": "quantitative"}
}
}]
}
In this case, am I correct to assume that any Mark in the Layer property automatically inherits the encodings at the top-level?
Further, I notice that I cannot move the bar Mark outside of the Layer property (Vega Editor prompts that this is not an allowed property and bars fail to render if placed in top-level).
Finally, in a more complicated example (see: https://vega.github.io/vega-lite/examples/layer_line_mean_point_raw.html), the encodings are repeated in each layer, even though the x encoding is redundant. In this case, when is it appropriate to place an encoding at the top level versus in the layer?
The Vega-lite docs go into a fair bit of detail about the configuration of these properties but I have not been able to find a conceptual answer to these 3 questions.
Thank you
Vega-Lite provides a hierarchical chart model, where each level in the hierarchy can override various properties declared in the parent level. In terms of layer specifications, the relevant concepts are these:
A UnitSpec is what you think of as a single chart: in it, you can specify data, mark, encodings, transforms, and other properties.
A LayerSpec is a container that can hold a number of UnitSpec or LayerSpec specifications in the layer property. Additionally, you can specify data, encodings, transforms, and other properties (but not mark).
A UnitSpec that is within a LayerSpec or other top-level object will inherit any properties specified there (such as data, encodings, transforms, etc.), and is also able to override them by specifying its own data, encodings, or transforms.
Similar hierarchical concepts apply to other compound chart types, such as ConcatSpec, VConcatSpec, HConcatSpec, FacetSpec, etc.
More concretely, in your example, the data and some encodings are defined in the top-level layer:
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.json",
"description": "Bar chart with text labels. Apply scale padding to make the frame cover the labels.",
"data": {
"values": [
{"a": "A", "b": 28},
{"a": "B", "b": 55},
{"a": "C", "b": 43}
]
},
"encoding": {
"y": {"field": "a", "type": "nominal"},
"x": {"field": "b", "type": "quantitative", "scale": {"padding": 10}}
},
"layer": [{
"mark": "bar"
}, {
"mark": {
"type": "text",
"align": "left",
"baseline": "middle",
"dx": 3
},
"encoding": {
"text": {"field": "b", "type": "quantitative"}
}
}]
}
In terms of the inheritance from parent, this is functionally equivalent to the following, where I have moved data and encodings from the top-level into each contained UnitSpec:
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.json",
"description": "Bar chart with text labels. Apply scale padding to make the frame cover the labels.",
"layer": [{
"data": {
"values": [
{"a": "A", "b": 28},
{"a": "B", "b": 55},
{"a": "C", "b": 43}
]
},
"mark": "bar",
"encoding": {
"y": {"field": "a", "type": "nominal"},
"x": {"field": "b", "type": "quantitative", "scale": {"padding": 10}}
}
}, {
"data": {
"values": [
{"a": "A", "b": 28},
{"a": "B", "b": 55},
{"a": "C", "b": 43}
]
},
"mark": {
"type": "text",
"align": "left",
"baseline": "middle",
"dx": 3
},
"encoding": {
"y": {"field": "a", "type": "nominal"},
"x": {"field": "b", "type": "quantitative", "scale": {"padding": 10}},
"text": {"field": "b", "type": "quantitative"}
}
}]
}
Specifying shared properties at the top level is a way to make chart specifications more concise and understandable.
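The equivalence between the two specs can be illustrated with a few lines of plain JavaScript. This is a deliberately simplified sketch of the idea, not Vega-Lite's actual implementation: pushDownToLayers is a hypothetical helper that only handles data and encoding, and it ignores transforms, nested layers and scale resolution.

```javascript
// Hypothetical helper: copy top-level data and encodings into each layer.
// A layer's own data or encoding channels take priority over inherited ones.
function pushDownToLayers(spec) {
  const { data, encoding, layer, ...rest } = spec;
  return {
    ...rest,
    layer: layer.map((unit) => ({
      data,                                        // inherited unless the unit sets its own
      ...unit,
      encoding: { ...encoding, ...unit.encoding }  // merged per channel
    }))
  };
}

const layeredSpec = {
  data: {
    values: [{ a: "A", b: 28 }, { a: "B", b: 55 }, { a: "C", b: 43 }]
  },
  encoding: {
    y: { field: "a", type: "nominal" },
    x: { field: "b", type: "quantitative", scale: { padding: 10 } }
  },
  layer: [
    { mark: "bar" },
    {
      mark: { type: "text", align: "left", baseline: "middle", dx: 3 },
      encoding: { text: { field: "b", type: "quantitative" } }
    }
  ]
};

console.log(JSON.stringify(pushDownToLayers(layeredSpec), null, 2));
```

In the printed result, both layers carry the x and y encodings and the data, and the text layer additionally keeps its own text channel, matching the expanded spec above.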