Data-layout, layers and legends in vega-lite


I have a very simple situation, but I believe my solution is too complicated, and there's a good chance I'm missing something. Say I have measurements of time, position (x, y, z), angles (roll, pitch, yaw) and speed. I want a simple visualization like the one I currently have, where the speed plot can be used as a "brush" to zoom dynamically into the first two graphs.
A small example of my plot in the vega-editor can be found here.
1. Can I use a different data-layout?
Right now, each point is an object
{
  "pitch": -0.006149084584096612,
  "roll": 0.0007914191778949736,
  "speed": 4.747345444390669,
  "time": 0.519741,
  "x": -0.01731604791076788,
  "y": 0.020068310429957575,
  "yaw": 0.0038123065311157552,
  "z": -0.016005977140476142
}
With many data-points, this is a lot of memory just for repeating column names. Much better would be to have the data in the form
{
  "time": [t1, t2, t3, ...],
  "x": [...],
  ...
}
but vega's "row first" representation doesn't allow for that. I already asked on Slack where someone suggested to use Fold and Pivot, but I'm not sure how to implement this. Is it possible to use data that are stored as arrays? I'm creating the data myself from a C++ program and I'm free to export a different representation easily. The only question is how do I make vega-lite understand?
2. Layers and legends.
If I had time-series data with an "indicator column", I could create plots that combine several graphs easily. Unfortunately, I don't have that and the only solution I found is to use layers. With this, I have to set the colours for different graphs explicitly (instead of using schemes) and I don't get a legend.
If layers are really the only option here to combine, e.g., x, y, z into one "Movement" plot, how can I get a legend for this plot that tells me red -> x, green -> y, and blue -> z?

The answer is "yes" to both of your questions.
The key to the first question is to pass the data in a dense format and use the Flatten Transform to expand it.
The key to the second question is to use a Fold Transform to turn multiple columns into an indicator plus a value.
Here is a demonstration of this for a single chart (open in editor):
{
  "data": {
    "values": [
      {
        "time": [1, 2, 3, 4],
        "x": [5, 4, 5, 2],
        "y": [2, 3, 2, 4],
        "z": [1, 2, 1, 0]
      }
    ]
  },
  "transform": [
    {"flatten": ["time", "x", "y", "z"]},
    {"fold": ["x", "y", "z"], "as": ["column", "value"]}
  ],
  "mark": "line",
  "encoding": {
    "x": {"field": "time", "type": "quantitative"},
    "y": {"field": "value", "type": "quantitative"},
    "color": {"field": "column", "type": "nominal"}
  }
}
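The same flatten and fold idea can probably be extended to the brushing setup described in the question. The following is only a sketch, assuming Vega-Lite v5: the "speed" values, the "Movement" title and the param name "brush" are made up for illustration. The flatten transform is applied once at the top level, the fold is applied only in the upper chart, and the interval selection defined on the speed chart drives the x-domain of the upper chart.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch only: the speed values are made-up illustration data.",
  "data": {
    "values": [
      {
        "time": [1, 2, 3, 4],
        "x": [5, 4, 5, 2],
        "y": [2, 3, 2, 4],
        "z": [1, 2, 1, 0],
        "speed": [4, 5, 3, 6]
      }
    ]
  },
  "transform": [{"flatten": ["time", "x", "y", "z", "speed"]}],
  "vconcat": [
    {
      "title": "Movement",
      "transform": [{"fold": ["x", "y", "z"], "as": ["column", "value"]}],
      "mark": "line",
      "encoding": {
        "x": {
          "field": "time",
          "type": "quantitative",
          "scale": {"domain": {"param": "brush"}}
        },
        "y": {"field": "value", "type": "quantitative"},
        "color": {"field": "column", "type": "nominal"}
      }
    },
    {
      "params": [
        {"name": "brush", "select": {"type": "interval", "encodings": ["x"]}}
      ],
      "mark": "line",
      "encoding": {
        "x": {"field": "time", "type": "quantitative"},
        "y": {"field": "speed", "type": "quantitative"}
      }
    }
  ]
}
```

Because the colour is now a regular encoding on the folded "column" field, the legend and the default colour scheme come for free; a second folded chart for roll, pitch and yaw could be added to the vconcat in the same way.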

Related

Vega-Lite Calculated Scale domainMax

I'm trying to calculate a value for domainMax on the Y-axis scale. I tried the following example where I want the Y-axis domainMax to be one greater than the maximum value in the dataset field named "value". The example produces the error 'Unrecognized signal name: "domMax"'. How can I get it to work?
{
  "data": {
    "values": [
      {"date": "2021-03-01T00:00:00", "value": 1},
      {"date": "2021-04-01T00:00:00", "value": 3},
      {"date": "2021-05-01T00:00:00", "value": 2}
    ]
  },
  "transform": [
    {"calculate": "max(datum.value)+1", "as": "domMax"}
  ],
  "mark": "line",
  "encoding": {
    "x": {
      "field": "date",
      "type": "temporal"
    },
    "y": {
      "field": "value",
      "type": "quantitative",
      "scale": {"domainMax": {"expr": "domMax"}}
    }
  }
}

This transform
"transform": [
  {"calculate": "max(datum.value)+1", "as": "domMax"}
]
adds a new column to your data set - it does not create a new signal. You can check that in the editor. Go to the DataViewer tab and select data_0 from the drop down. Can you see the new domMax column?
Signals are a different thing entirely - have a look here in the documentation. Note that the link points to Vega, not Vega-Lite. (Vega-Lite specifications are compiled to Vega.)
Vega-Lite does not let you declare signals; you declare parameters instead. Here is another example using the domMax parameter. Vega-Lite parameters are translated to Vega signals.
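A minimal sketch of what declaring such a parameter might look like, assuming Vega-Lite v5. The value 4 is hard-coded by hand (the data maximum plus one); it is not computed from the data, which is exactly the limitation discussed here.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch only: domMax is hard-coded to 4, not derived from the data.",
  "params": [{"name": "domMax", "value": 4}],
  "data": {
    "values": [
      {"date": "2021-03-01T00:00:00", "value": 1},
      {"date": "2021-04-01T00:00:00", "value": 3},
      {"date": "2021-05-01T00:00:00", "value": 2}
    ]
  },
  "mark": "line",
  "encoding": {
    "x": {"field": "date", "type": "temporal"},
    "y": {
      "field": "value",
      "type": "quantitative",
      "scale": {"domainMax": {"expr": "domMax"}}
    }
  }
}
```

With the parameter declared, the expression "domMax" resolves to a signal in the compiled Vega, so the 'Unrecognized signal name' error should no longer occur.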
It looks like you are trying to derive the value of your parameter/signal from the data. I am not sure you can do that in Vega-Lite.
On the other hand it's very easy in Vega. For example you could use the extent transform:
https://vega.github.io/vega/docs/transforms/extent/
Side comment - while Vega specifications are more verbose you can sometimes find their primitives simpler and a good way to understand how the visualisation works. (You can see compiled Vega in the editor.)
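For the Vega route, a sketch along the following lines might work. It assumes Vega v5; the data set name "table" and the signal name "valueExtent" are illustrative choices, not taken from the question. The extent transform writes the [min, max] pair of the "value" field into a signal, and the y scale then uses the maximum plus one as its upper domain bound.

```json
{
  "$schema": "https://vega.github.io/schema/vega/v5.json",
  "description": "Sketch only: names 'table' and 'valueExtent' are assumptions.",
  "width": 300,
  "height": 200,
  "data": [
    {
      "name": "table",
      "values": [
        {"date": "2021-03-01T00:00:00", "value": 1},
        {"date": "2021-04-01T00:00:00", "value": 3},
        {"date": "2021-05-01T00:00:00", "value": 2}
      ],
      "format": {"parse": {"date": "date"}},
      "transform": [
        {"type": "extent", "field": "value", "signal": "valueExtent"}
      ]
    }
  ],
  "scales": [
    {
      "name": "x",
      "type": "time",
      "range": "width",
      "domain": {"data": "table", "field": "date"}
    },
    {
      "name": "y",
      "type": "linear",
      "range": "height",
      "domain": [0, {"signal": "valueExtent[1] + 1"}]
    }
  ],
  "axes": [
    {"orient": "bottom", "scale": "x"},
    {"orient": "left", "scale": "y"}
  ],
  "marks": [
    {
      "type": "line",
      "from": {"data": "table"},
      "encode": {
        "enter": {
          "x": {"scale": "x", "field": "date"},
          "y": {"scale": "y", "field": "value"}
        }
      }
    }
  ]
}
```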

I tried to get a custom domain based on the data but hit the same limitations as you did.
In my case, I update the data from the outside, a bit like the streaming example. I compute the domain bounds externally and update them in the visualization through params. This is quite easy because Vega-Lite params are exposed as Vega signals.
This is the gist of the layout:
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"params": [
{
"name": "lowBound",
"value": -10
},
{
"name": "highBound",
"value": 100
}
],
../..
"vconcat": [
{
"name": "detailed",
../..
"layer": [
{
../..
"mark": "line",
"encoding": {
"y": {
"field": "value",
"title": "Temperature",
"type": "quantitative",
"scale": {
"domainMin": {
"expr": "lowBound"
},
"domainMax": {
"expr": "highBound"
}
}
},
...
The lowBound and highBound are dynamically changed through vega signals. I change them with the regular JS API.

You can add a param to pan and zoom in case your hard coded values are less than ideal.
"params": [{"name": "grid", "select": "interval", "bind": "scales"}],
Open the Chart in the Vega Editor

How to draw vertical lines based on a different quantity than that for the x channel in charts by row channel?

I'd like to have multiple time series drawn by row channel ("field": "PLATFORM"), x channel: ("field": "estimating-date-time"), and y channel ("field": "eta-variance").
Besides the time-series lines, I'd like to draw a vertical line at x = arrival-time, which is another field whose value depends on "PLATFORM".
The following is a working example of the charts except the desirable vertical line in each chart:
vega-lite for multiple time series
Below is the desired effect with manual illustration:
My question is how to add the vertical line for each chart to the specifications?
The challenge for me is that the field "arrival-time", which supplies the value for the vertical line, is not the same as the chart's x channel "estimating-date-time". I've only found examples that draw such a line using a value from the same field as the x channel.

You can do this by nesting a layer specification within a facet operator; something like this (open in editor):
{
  "facet": {"row": {"field": "PLATFORM"}},
  "spec": {
    "height": 80,
    "width": 300,
    "layer": [
      {
        "mark": "line",
        "encoding": {
          "x": {"field": "estimating-date-time", "type": "temporal"},
          "y": {"field": "ETA-variance", "type": "quantitative"}
        }
      },
      {
        "mark": "rule",
        "encoding": {"x": {"field": "arrival-time", "type": "temporal"}}
      }
    ]
  },
  "data": {...}
}

Vega-lite default bar width strange

I'm seeing the following oddly styled chart. I understand I can explicitly change the padding etc., but the default vega-lite layout is usually pretty good. I'm confused what I'm doing that's leading to this sub-normal behavior. Thanks! Here is the code in the vega-lite editor
I understand that I can also change x's type to ordinal to improve the styling, though I still don't understand why it makes the difference I see. I need the type to be quantitative so that the brush gives me min/max bounds rather than a set of values.
Also I actually do not even know how to manually set the bar width after reading the documentation here https://vega.github.io/vega-lite/docs/scale.html. If anyone might have a working example that would be great.
Thanks.

As @marcprux mentioned, there is support for pre-binned data, so you don't have to repeat the bin transform here. However, the pre-binned support currently requires both bin_start and bin_end.
For now you could modify the spec to derive a new bin_end field and use it with x2.
{
  "data": ...
  "transform": [{
    "calculate": "datum.ShareWomen_bin+0.1",
    "as": "ShareWomen_bin_end"
  }],
  "mark": "bar",
  "encoding": {
    "x": {"bin": {"binned": true, "step": 0.1}, "field": "ShareWomen_bin", "type": "quantitative", "title": "ShareWomen_bin"},
    "x2": {"field": "ShareWomen_bin_end"},
    "y": {"field": "count", "type": "quantitative"}
  }
}
like this spec.
I can see that we shouldn't require deriving bin_end and thus have created an issue to track this feature request: https://github.com/vega/vega-lite/issues/6086.
Btw, the quantitative scale only affects the bar position.
To set the bar size directly, you can use size property in a mark definition:
"mark": {"type": "bar", "size": 5}

Since you declare "x" as a quantitative field, there's no assumption that the points along the axis are evenly distributed. E.g., you could add in some data points in between the others:
{"ShareWomen_bin": 0.83, "count": 40, "is_overview": true},
{"ShareWomen_bin": 0.87, "count": 70, "is_overview": true},
and you would see them rendered in between the other bars:
As you mention, you can specify that the bars should be encoded as ordinal values. Another solution is to leave it as quantitative, but specify that it is binned, in which case the bars will also be rendered as if they were ordinal:
"x": {"field": "ShareWomen_bin", "type": "quantitative", "bin": true},
Since it appears that your data is already binned, you should read about how vega-lite supports pre-binned data: https://vega.github.io/vega-lite/docs/bin.html#binned

How to build pre-calculated histogram in Vega-Lite?

Vega-Lite can bin and aggregate by itself, but I have a complex calculation and build the histogram separately.
The resulting data is as follows:
bins = [1, 2, 3, 4] // 4 edges
// |1-2|2-3|3-4| // 3 bars
counts = [1, 2, 1]
The problem is how to display the bar edges properly: there are 3 bars, but 4 edges.

You can specify bin start and endpoints using the x and x2 encodings. It's also helpful to specify bin='binned' which tells Vega-Lite that the data is pre-binned & triggers the same display defaults used when a bin operation appears in the specification. For example (editor link):
{
  "data": {
    "values": [
      {"bin1": 1, "bin2": 2, "counts": 1},
      {"bin1": 2, "bin2": 3, "counts": 2},
      {"bin1": 3, "bin2": 4, "counts": 1}
    ]
  },
  "mark": "bar",
  "encoding": {
    "x": {"field": "bin1", "type": "quantitative", "bin": "binned"},
    "x2": {"field": "bin2"},
    "y": {"field": "counts", "type": "quantitative"}
  }
}
For more information, see Using Vega-Lite with Binned data.
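If the data has to stay in the array form from the question (4 edges, 3 counts), the start and end fields can probably be derived inside the spec instead of being prepared beforehand. This is a sketch under assumptions: it relies on the Vega expression function slice accepting a negative end index for arrays, and the field names "bin1" and "bin2" are simply reused from the example above. The first calculate drops the last edge, the second drops the first edge, and flatten then expands the three equal-length arrays into one row per bar.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch only: derives bin start/end from a single array of edges.",
  "data": {"values": [{"bins": [1, 2, 3, 4], "counts": [1, 2, 1]}]},
  "transform": [
    {"calculate": "slice(datum.bins, 0, -1)", "as": "bin1"},
    {"calculate": "slice(datum.bins, 1)", "as": "bin2"},
    {"flatten": ["bin1", "bin2", "counts"]}
  ],
  "mark": "bar",
  "encoding": {
    "x": {"field": "bin1", "type": "quantitative", "bin": "binned"},
    "x2": {"field": "bin2"},
    "y": {"field": "counts", "type": "quantitative"}
  }
}
```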

Is it possible to use facets and repeat operator for histograms?

I want to combine facet operators (row, column) along with the repeat operator to create 'small multiple' charts that display different data variables. This works for some types of charts (e.g. simple bar charts) but not others (i.e. histograms). For example, below I have modified the 'Horizontally repeated charts' example (https://vega.github.io/vega-lite/examples/repeat_histogram.html).
{
  "$schema": "https://vega.github.io/schema/vega-lite/v3.json",
  "repeat": {"column": ["Horsepower", "Miles_per_Gallon", "Acceleration"]},
  "spec": {
    "data": {"url": "data/cars.json"},
    "mark": "bar",
    "encoding": {
      "row": {"field": "Origin", "type": "nominal"},
      "x": {
        "field": {"repeat": "column"},
        "bin": true,
        "type": "quantitative"
      },
      "y": {"aggregate": "count", "type": "quantitative"}
    }
  }
}
I expect three rows, with each row showing histograms of cars from a different country of origin. However, this code results in the error:
'Error: Undefined data set name: "scale_child_Miles_per_Gallon_child_main"'
I'm reasonably sure that this worked with Vega-Lite v2. Is there some reason that the aggregate / bin operator can't work with a combination of facets and repeats?
