How do I use factorial (!) in a vega formula transform - vega-lite

I am trying to create a histogram of a binomial distribution PMF using a Vega specification.
How is this usually done? Vega expressions do not include functions for choose or factorial, nor is there a binomial distribution among the statistical functions.
I also cannot seem to reference other functions within the Vega specification (i.e. for yval below).
"data":[
{"name": "dataset",
"transform": [
{"type":"sequence", "start": 1, "stop": 50, "step": 1, "as": "seq" },
{"type": "formula", "as": "xval", "expr": "if(datum.seq<nval,datum.seq,NaN)"},
{"type": "formula", "as": "yval", "expr": "math.factorial(datum.xval)"}
]}],
Thanks.

The Vega expression language has no factorial function, but one suitable option is to approximate it with Stirling's approximation, or with additional terms of the Stirling series if more accuracy is required.
For example, in Vega-Lite (view in editor):
{
"data": {
"values": [
{"n": 0, "factorial": 1},
{"n": 1, "factorial": 1},
{"n": 2, "factorial": 2},
{"n": 3, "factorial": 6},
{"n": 4, "factorial": 24},
{"n": 5, "factorial": 120},
{"n": 6, "factorial": 720},
{"n": 7, "factorial": 5040},
{"n": 8, "factorial": 40320},
{"n": 9, "factorial": 362880},
{"n": 10, "factorial": 3628800}
]
},
"transform": [
{
"calculate": "datum.n == 0 ? 1 : sqrt(2 * PI * datum.n) * pow(datum.n / E, datum.n)",
"as": "stirling"
},
{"fold": ["factorial", "stirling"]}
],
"mark": "point",
"encoding": {
"x": {"field": "n", "type": "quantitative"},
"y": {"field": "value", "type": "quantitative", "scale": {"type": "log"}},
"color": {"field": "key", "type": "nominal"}
}
}
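To get a feel for how much accuracy the approximation gives up, the same formula can be checked outside Vega. The following is a minimal JavaScript sketch, not part of the original answer. The function names (factorial, stirling, stirlingSeries, binomialPmf) are made up for illustration, and stirlingSeries keeps only the first correction term, 1 + 1/(12n), of the Stirling series.

```javascript
// Exact factorial by repeated multiplication (fine for small n).
function factorial(n) {
  let result = 1;
  for (let i = 2; i <= n; i++) result *= i;
  return result;
}

// Stirling's approximation, the same formula as the "calculate" expression above.
function stirling(n) {
  return n === 0 ? 1 : Math.sqrt(2 * Math.PI * n) * Math.pow(n / Math.E, n);
}

// Stirling's approximation with the first series correction, (1 + 1/(12 n)).
function stirlingSeries(n) {
  return n === 0 ? 1 : stirling(n) * (1 + 1 / (12 * n));
}

// Binomial PMF built from an approximate factorial (illustrative helper).
function binomialPmf(k, n, p, fact = stirlingSeries) {
  const choose = fact(n) / (fact(k) * fact(n - k));
  return choose * Math.pow(p, k) * Math.pow(1 - p, n - k);
}

// Print exact and approximate values side by side.
for (const n of [1, 5, 10]) {
  console.log(n, factorial(n), stirling(n), stirlingSeries(n));
}
```

The relative error of the plain formula shrinks roughly like 1/(12n), so it is already below 1% at n = 10. In a Vega or Vega-Lite expression the same arithmetic would be written with sqrt, pow, PI and E, as in the spec above.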

Related

Create array for x axis with known min, max and step

I am trying to create values for x axis for each y data point and I can't create a formula.
This is what I am trying to achieve:
Here is my code:
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"data":{
"values":[
{
"days": {
"min" : 0,
"max" : 10,
"step" : 2,
"count" : [0.2,0.6,0.4,0.3,0.1]
}}
]},
"transform": [
{"calculate": "datum.days.count", "as": "y"},
{"flatten": ["y"]},
{"calculate": "datum.max/datum.step", "as": "x"}
],
"mark": "line",
"encoding": {
"x": { "scale": {"type": "linear", "domain":[0,10], "exponent": 2},"field": "x",
"type": "quantitative"
},
"y": {
"field": "y",
"type": "quantitative"
}
}}
Here count is on the y axis, and I need to generate an x-axis value for each y point from the "min", "max" and "step" data (something like "x":[1,2,3,4,5]).
I tried the scale function, but it didn't work.
I need the x axis to have labels 0,2,4,6,8,10, because my data max is 10 and the step is 2 (10/2).
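I'm not sure I fully understand what you're trying to achieve, but does the following help? It numbers the flattened rows with a window transform and then multiplies the zero-based row index by the step.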
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"data": {
"values": [
{
"days": {
"min": 0,
"max": 10,
"step": 2,
"count": [0.2, 0.6, 0.4, 0.3, 0.1]
}
}
]
},
"transform": [
{"calculate": "datum.days.count", "as": "y"},
{"flatten": ["y"]},
{
"window": [{"op": "count", "field": "count", "as": "i"}],
"frame": [null, 0]
},
{"calculate": "datum.days.min + (datum.i - 1) * datum.days.step", "as": "x"}
],
"mark": "line",
"encoding": {
"x": {
"scale": {"type": "linear", "domain": [0, 10], "exponent": 2},
"field": "x",
"type": "quantitative"
},
"y": {"field": "y", "type": "quantitative"}
}
}
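The flatten, window and calculate steps above amount to pairing each count with an x value derived from its position. As a rough sketch in plain JavaScript (the helper name toPoints is hypothetical, and it assumes x should start at "min" and advance by "step"):

```javascript
// Hypothetical helper: pair each count with an x value derived from min and step.
function toPoints(days) {
  return days.count.map((y, i) => ({ x: days.min + i * days.step, y }));
}

const days = { min: 0, max: 10, step: 2, count: [0.2, 0.6, 0.4, 0.3, 0.1] };
console.log(toPoints(days));
```

With five counts this yields x values 0, 2, 4, 6 and 8, so a point at the max of 10 would need a sixth count.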

Vega-Lite: How do I include image marks in a doughnut chart?

I would like to have image marks surrounding my doughnut chart instead of text labels. The image mark example uses x and y for its coordinates. How should I adjust that for a doughnut chart, where we work with radius and theta?
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"description": "A simple pie chart with labels.",
"data": {
"values": [
{"category": "a", "value": 4, "image": url},
{"category": "b", "value": 6, "image": url},
{"category": "c", "value": 10, "image": url},
{"category": "d", "value": 3, "image": url},
{"category": "e", "value": 7, "image": url},
{"category": "f", "value": 8, "image": url}
]
},
"encoding": {
"theta": {"field": "value", "type": "quantitative", "stack": true},
"color": {"field": "category", "type": "nominal", "legend": null}
},
"layer": [{
"mark": {"type": "arc", "outerRadius": 80}
}, {
"mark": {"type": "text", "radius": 90},
"encoding": {
"text": {"field": "category", "type": "nominal"}
}
}],
"view": {"stroke": null}
}
New vega version:
After some trial and error and reading through the docs, it seems the image mark cannot be positioned by a theta encoding, but the example shows that x and y encodings are supported.
Therefore, I worked out the positions with simple trigonometry and added an extra layer to place the images in the doughnut:
{
"transform": [
{"joinaggregate": [{"op":"sum", "field": "value", "as": "total"}]},
{
"window": [{"op": "sum", "field": "value", "as": "cum"}],
"frame": [null, 0]
},
{"calculate": "cos(2*PI*(datum.cum-datum.value/2)/datum.total)", "as": "y"},
{"calculate": "sin(2*PI*(datum.cum-datum.value/2)/datum.total)", "as": "x"}
],
"mark": {"type": "image", "width": 20, "height": 20},
"encoding": {
"url": {"field": "image"},
"x": {"field": "x", "type": "quantitative", "scale": {"domain": [-2, 2]}, "axis": null},
"y": {"field": "y", "type": "quantitative", "scale": {"domain": [-2, 2]}, "axis": null}
}
}
Because the color encoding changes the order of the arcs (as mentioned in the comments below), a new window transform was added to generate an extra ordering field, which is passed to the color field.
Three changes were made (2021-07-16):
Using cos in the calculate of y
Using sin in the calculate of x
Changing the data values to check that the positioning still works
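The trigonometry in this answer can be sanity-checked outside Vega. The sketch below is plain JavaScript written for illustration only; arcMidpoints is a hypothetical name, and it mirrors the joinaggregate, window and calculate steps by placing each image at the mid-angle of its arc on a unit circle.

```javascript
// Hypothetical helper mirroring the joinaggregate, window and calculate steps.
function arcMidpoints(values) {
  const total = values.reduce((sum, v) => sum + v, 0); // joinaggregate: sum
  let cum = 0;
  return values.map((value) => {
    cum += value; // running sum, as in the window transform
    const angle = (2 * Math.PI * (cum - value / 2)) / total; // mid-angle of the arc
    // Arcs start at 12 o'clock and run clockwise, so x uses sin and y uses cos.
    return { x: Math.sin(angle), y: Math.cos(angle) };
  });
}

console.log(arcMidpoints([4, 6, 10, 3, 7, 8]));
```

For two equal slices the midpoints land at 3 o'clock and 9 o'clock, which is a quick way to confirm that sin belongs on x and cos on y.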

How can I rename legend labels in Vega Lite?

I've been trying for the last few days to rename the legend labels on my vega-lite chart.
Normally these labels match their respective data field names. I have a case where I'd like to give them a more descriptive name, but without renaming the original data names.
A simplified example:
vl.markLine()
.data([
{ t:1, v:5, c:'a' }, { t:2, v:3, c:'a' }, { t:3, v:7, c:'a' },
{ t:1, v:6, c:'b' }, { t:2, v:8, c:'b' }, { t:3, v:2, c:'b' }
])
.encode(
vl.x().fieldQ('t'),
vl.y().fieldQ('v'),
vl.color().fieldN('c')
)
.render()
How can I rename 'a' and 'b' in the legend, without changing the original data?
(I'm using the JavaScript API but will be happy with a JSON solution too.)
I'd like to find a way that doesn't involve copying and mapping all the data to another variable name just for the sake of the legend labels.
I've yet to find a way of manually entering the legend labels as something like "labels": ['long name for a', 'long name for b'].
There are two possible approaches to this. You can either use a calculate transform to modify the values within the data stream (open in editor):
{
"data": {
"values": [
{"t": 1, "v": 5, "c": "a"},
{"t": 2, "v": 3, "c": "a"},
{"t": 3, "v": 7, "c": "a"},
{"t": 1, "v": 6, "c": "b"},
{"t": 2, "v": 8, "c": "b"},
{"t": 3, "v": 2, "c": "b"}
]
},
"transform": [
{"calculate": "{'a': 'Label A', 'b': 'Label B'}[datum.c]", "as": "c"}
],
"mark": "line",
"encoding": {
"x": {"field": "t", "type": "quantitative"},
"y": {"field": "v", "type": "quantitative"},
"color": {"field": "c", "type": "nominal"}
}
}
or you can use a labelExpr to define new labels, referencing the original label as datum.label (open in editor):
{
"data": {
"values": [
{"t": 1, "v": 5, "c": "a"},
{"t": 2, "v": 3, "c": "a"},
{"t": 3, "v": 7, "c": "a"},
{"t": 1, "v": 6, "c": "b"},
{"t": 2, "v": 8, "c": "b"},
{"t": 3, "v": 2, "c": "b"}
]
},
"mark": "line",
"encoding": {
"x": {"field": "t", "type": "quantitative"},
"y": {"field": "v", "type": "quantitative"},
"color": {
"field": "c",
"type": "nominal",
"legend": {"labelExpr": "{'a': 'Label A', 'b': 'Label B'}[datum.label]"}
}
}
}
The benefit of the former approach is that it changes the values everywhere (including, for example, in tooltips). The benefit of the latter approach is that it's more efficient, because it only does the transform once per unique label.
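If the lookup table lives in JavaScript, the labelExpr string can be generated rather than typed by hand. This is only a sketch: labelExprFor is a hypothetical helper, the "|| datum.label" fallback for unmapped values is my own addition, and it assumes the expression parser accepts a double-quoted object literal the same way it accepts the single-quoted one above.

```javascript
// Hypothetical helper: build a labelExpr string from a plain lookup object.
function labelExprFor(labels) {
  // JSON.stringify takes care of quoting and escaping the keys and values.
  return `${JSON.stringify(labels)}[datum.label] || datum.label`;
}

// The same chart as above, expressed as a plain spec object.
const spec = {
  data: {
    values: [
      { t: 1, v: 5, c: "a" }, { t: 2, v: 3, c: "a" }, { t: 3, v: 7, c: "a" },
      { t: 1, v: 6, c: "b" }, { t: 2, v: 8, c: "b" }, { t: 3, v: 2, c: "b" },
    ],
  },
  mark: "line",
  encoding: {
    x: { field: "t", type: "quantitative" },
    y: { field: "v", type: "quantitative" },
    color: {
      field: "c",
      type: "nominal",
      legend: { labelExpr: labelExprFor({ a: "Label A", b: "Label B" }) },
    },
  },
};

console.log(JSON.stringify(spec, null, 2));
```

With the JavaScript API from the question, the same legend object should be usable as vl.color().fieldN('c').legend({labelExpr: ...}), though I have not verified that against a specific version.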

How do I create a legend for a layered line plot

Basically, what I have is a line graph layered from several line graphs. Since each graph has only one line, no legend is generated automatically, so what is the best way to get a legend for the chart? I have been considering transforming my dataset. The data are weekly death totals from the CDC from 2019 to June 2020. The CSV is arranged so that each date for each state has a record, with each disease type as its own column and integers as the column values. So there isn't one field to chart, there are many, hence the layering. Any insights into how to solve this problem would be much appreciated! Here is my work so far:
https://observablehq.com/#justin-krohn/covid-excess-deaths
You can create a legend for a layered chart by setting the color encoding for each layer to a datum specifying what label you would like it to have. For example (vega editor):
{
"data": {
"values": [
{"x": 1, "y1": 1, "y2": 2},
{"x": 2, "y1": 3, "y2": 1},
{"x": 3, "y1": 2, "y2": 4},
{"x": 4, "y1": 4, "y2": 3},
{"x": 5, "y1": 3, "y2": 5}
]
},
"encoding": {"x": {"field": "x", "type": "quantitative"}},
"layer": [
{
"mark": "line",
"encoding": {
"y": {"field": "y1", "type": "quantitative"},
"color": {"datum": "y1"}
}
},
{
"mark": "line",
"encoding": {
"y": {"field": "y2", "type": "quantitative"},
"color": {"datum": "y2"}
}
}
]
}
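When there are many columns, as in the question, writing one layer per column by hand gets repetitive. A small sketch of generating the layers in JavaScript follows; lineLayers is a hypothetical helper and the field names are the ones from the toy data above.

```javascript
// Hypothetical helper: one line layer per column, labelled via a color datum.
function lineLayers(fields) {
  return fields.map((field) => ({
    mark: "line",
    encoding: {
      y: { field, type: "quantitative" },
      color: { datum: field }, // constant value that becomes the legend entry
    },
  }));
}

const layeredSpec = {
  data: {
    values: [
      { x: 1, y1: 1, y2: 2 },
      { x: 2, y1: 3, y2: 1 },
      { x: 3, y1: 2, y2: 4 },
    ],
  },
  encoding: { x: { field: "x", type: "quantitative" } },
  layer: lineLayers(["y1", "y2"]),
};

console.log(JSON.stringify(layeredSpec, null, 2));
```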
Alternatively, you can use a Fold Transform to pivot your data so that instead of manual layers, you can plot the multiple lines with a simple color encoding. For example (vega editor):
{
"data": {
"values": [
{"x": 1, "y1": 1, "y2": 2},
{"x": 2, "y1": 3, "y2": 1},
{"x": 3, "y1": 2, "y2": 4},
{"x": 4, "y1": 4, "y2": 3},
{"x": 5, "y1": 3, "y2": 5}
]
},
"transform": [{"fold": ["y1", "y2"], "as": ["name", "y"]}],
"mark": "line",
"encoding": {
"x": {"field": "x", "type": "quantitative"},
"y": {"field": "y", "type": "quantitative"},
"color": {"field": "name", "type": "nominal"}
}
}
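To make the reshaping concrete, here is roughly what the fold transform does to the rows, written as plain JavaScript. The fold function below is an illustrative re-implementation, not Vega's own code; like the transform, it keeps the original fields and adds a key field and a value field for each folded column.

```javascript
// Illustrative re-implementation of what a fold transform does to the rows.
function fold(rows, fields, [keyName, valueName] = ["key", "value"]) {
  return rows.flatMap((row) =>
    fields.map((field) => ({
      ...row, // original fields are kept
      [keyName]: field, // which column this value came from
      [valueName]: row[field], // the value of that column
    }))
  );
}

const wide = [
  { x: 1, y1: 1, y2: 2 },
  { x: 2, y1: 3, y2: 1 },
];
console.log(fold(wide, ["y1", "y2"], ["name", "y"]));
```

Each input row becomes one output row per folded field, which is why a single color encoding on "name" is then enough to separate the lines.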

How to create scatter matrix in Vega-Lite where row/columns are identified by value (not column name)

Can I create a scatter matrix like https://vega.github.io/editor/#/examples/vega-lite/interactive_splom but where the column/rows are created from categorical values and not column names?
The following example tries to make a scatter matrix based on 3 values from a bivariate Gaussian distribution, but it only displays one row:
{
"mark": "point",
"encoding": {
"x": {
"field": "value"
},
"y": {
"field": "value"
},
"row": {
"field": "coordinate"
},
"column": {
"field": "coordinate"
}
},
"data": {
"values": [
{
"value": -0.5600273,
"coordinate": 1
},
{
"value": -0.31220084,
"coordinate": 2
},
{
"value": -0.37932342,
"coordinate": 1
},
{
"value": -0.799277,
"coordinate": 2
},
{
"value": -1.8596855,
"coordinate": 1
},
{
"value": -3.100046,
"coordinate": 2
}
]
}
}
It's not clear from your data what you expect to appear in the other two panels. Which data rows would be associated with each other when plotting coordinate 1 vs coordinate 2?
I'm going to assume you can modify your data such that it has a third field that specifies which points go together; if that's the case, you can use a pivot transform to turn your values into columns, and then use the repeat operator as in the example you linked to (vega editor):
{
"data": {
"values": [
{"value": -0.5600273, "coordinate": 1, "point": 1},
{"value": -0.31220084, "coordinate": 2, "point": 1},
{"value": -0.37932342, "coordinate": 1, "point": 2},
{"value": -0.799277, "coordinate": 2, "point": 2},
{"value": -1.8596855, "coordinate": 1, "point": 3},
{"value": -3.100046, "coordinate": 2, "point": 3}
]
},
"transform": [
{"pivot": "coordinate", "value": "value", "groupby": ["point"]}
],
"repeat": {"row": ["1", "2"], "column": ["1", "2"]},
"spec": {
"mark": "point",
"encoding": {
"x": {"field": {"repeat": "column"}, "type": "quantitative"},
"y": {"field": {"repeat": "row"}, "type": "quantitative"}
}
}
}
If you can't add that to your dataset and you just want adjacent values to be treated as part of the same point, you can specify this using a series of transforms to construct the "point" field and recover the same chart (vega editor):
{
"data": {
"values": [
{"value": -0.5600273, "coordinate": 1},
{"value": -0.31220084, "coordinate": 2},
{"value": -0.37932342, "coordinate": 1},
{"value": -0.799277, "coordinate": 2},
{"value": -1.8596855, "coordinate": 1},
{"value": -3.100046, "coordinate": 2}
]
},
"transform": [
{"window": [{"op": "row_number", "as": "point"}]},
{"calculate": "ceil(datum.point / 2)", "as": "point"},
{"pivot": "coordinate", "value": "value", "groupby": ["point"]}
],
"repeat": {"row": ["1", "2"], "column": ["1", "2"]},
"spec": {
"mark": "point",
"encoding": {
"x": {"field": {"repeat": "column"}, "type": "quantitative"},
"y": {"field": {"repeat": "row"}, "type": "quantitative"}
}
}
}
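The effect of the three transforms can be traced by hand with a small JavaScript sketch. The pairAndPivot helper is hypothetical; it assumes, as the answer does, that each consecutive pair of rows belongs to the same point.

```javascript
// Hypothetical helper mirroring row_number, ceil(point / 2) and pivot.
function pairAndPivot(rows, groupSize = 2) {
  const byPoint = new Map();
  rows.forEach((row, index) => {
    const point = Math.ceil((index + 1) / groupSize); // row_number starts at 1
    if (!byPoint.has(point)) byPoint.set(point, { point });
    // pivot: each coordinate value becomes a column named "1", "2", ...
    byPoint.get(point)[String(row.coordinate)] = row.value;
  });
  return [...byPoint.values()];
}

const rows = [
  { value: -0.5600273, coordinate: 1 },
  { value: -0.31220084, coordinate: 2 },
  { value: -0.37932342, coordinate: 1 },
  { value: -0.799277, coordinate: 2 },
  { value: -1.8596855, coordinate: 1 },
  { value: -3.100046, coordinate: 2 },
];
console.log(pairAndPivot(rows));
```

The six input rows collapse into three records, each with a "1" and a "2" column, which are the field names the repeat operator refers to.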