Labeled bar chart with fill encoding does not respect sorting - vega-lite

I am trying to make a sorted bar chart with labels and a fill encoding, but when I add the fill encoding it breaks the sort. From the GitHub issues it seems like there are ways to get around this, but I can't seem to find a solution.
Without the fill encoding, the sorting works as expected with this spec:
{
  "$schema": "https://vega.github.io/schema/vega-lite/v3.json",
  "data": {
    "values": [
      {
        "a": "A",
        "b": 28,
        "color": "black"
      },
      {
        "a": "B",
        "b": 55,
        "color": "grey"
      },
      {
        "a": "C",
        "b": 43,
        "color": "red"
      }
    ]
  },
  "encoding": {
    "y": {
      "field": "a",
      "type": "ordinal",
      "sort": {
        "encoding": "x",
        "order": "descending"
      }
    },
    "x": {
      "field": "b",
      "type": "quantitative"
    }
  },
  "layer": [
    {
      "mark": "bar"
    },
    {
      "mark": {
        "type": "text",
        "align": "left",
        "baseline": "middle",
        "dx": 3
      },
      "encoding": {
        "text": {
          "field": "b",
          "type": "quantitative"
        }
      }
    }
  ]
}
When you add the following fill encoding to the top-level encoding object, it breaks the sort with the warning below:
"fill": {
  "field": "color",
  "type": "ordinal",
  "scale": null
}
[Warning] Domains that should be unioned has conflicting sort properties. Sort will be set to true.
Full example in the Vega Editor here.
Is there a workaround for this?
It appears to relate to these issues (maybe): #2536, #5408.

Yep, the underlying issue is https://github.com/vega/vega-lite/issues/5048. In this particular case, adding color to one layer adds a stack transform to one part of the dataflow but not the other, so we cannot merge it. This is a great test case. Can you add this example to a new GitHub issue so we can try to resolve it?
You can manually fix this example by disabling stacking on the x encoding:
"stack": null
See this spec.
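Since the linked spec is not reproduced here, this is a sketch of what it likely amounts to: the question's spec with the fill encoding in the top-level encoding and "stack": null added to x, as the answer suggests. The v3 schema is kept from the question; the spec has not been checked against every Vega-Lite release.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v3.json",
  "data": {
    "values": [
      {"a": "A", "b": 28, "color": "black"},
      {"a": "B", "b": 55, "color": "grey"},
      {"a": "C", "b": 43, "color": "red"}
    ]
  },
  "encoding": {
    "y": {
      "field": "a",
      "type": "ordinal",
      "sort": {"encoding": "x", "order": "descending"}
    },
    "x": {
      "field": "b",
      "type": "quantitative",
      "stack": null
    },
    "fill": {
      "field": "color",
      "type": "ordinal",
      "scale": null
    }
  },
  "layer": [
    {"mark": "bar"},
    {
      "mark": {"type": "text", "align": "left", "baseline": "middle", "dx": 3},
      "encoding": {
        "text": {"field": "b", "type": "quantitative"}
      }
    }
  ]
}
```

With stacking disabled, neither layer gets a stack transform, so both layers' y domains can be unioned with the same sort.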

Related

Force vega-lite to show label when number is 0

I'm still very much a beginner in Vega-Lite, but I'm trying to create a stacked bar chart with different sales channels. Sometimes a sales channel has a 0 and doesn't show up. How can I still show the label?
{
  "layer": [
    {
      "mark": {
        "type": "bar",
        "cornerRadius": 50,
        "color": "#90C290",
        "tooltip": true
      },
      "encoding": {
        "x": {
          "field": "Number of customers"
        }
      }
    },
    {
      "mark": {
        "type": "text",
        "tooltip": true,
        "align": "left",
        "baseline": "middle",
        "x": 10,
        "color": "white"
      },
      "encoding": {
        "text": {
          "field": "Number of customers",
          "type": "text"
        }
      }
    }
  ],
  "encoding": {
    "y": {
      "field": "Sales channel",
      "type": "nominal",
      "sort": "descending",
      "title": null
    },
    "x": {
      "type": "quantitative",
      "title": null,
      "axis": null
    }
  }
}
I tried the code above and looked through the documentation, but couldn't find exactly what I was looking for.
I added sample data to your spec, and there are a few changes I would make.
Around line 29 you have "type": "text", which should be "type": "quantitative".
I think your problem is that the text color is white and the background color is white, so the text is there but you can't see it. When a channel's value is 0, its bar has no width, so the white label sits directly on the white background instead of on the green bar. A simple fix would be to set the text color to black, or change the background color to something other than white (add "background": "lightgray", before "layer").
It's also possible you don't see the channel at all, depending on how you are passing data from Power BI. Check the data tab in the Deneb window to make sure the channel is there.
If the channel is not there, you'll have to adjust something on the Power BI side. A good practice is to put the data in a table in Power BI first so you know what you are sending into Deneb. If you use an aggregation like SUM on the data field in Power BI, nulls will drop out, but zeros should stay. If you use "Don't summarize", nulls or errors (text in a number field) will pass into Deneb as nulls, but you may need to add "aggregate": "sum" to your encoding.
In any case, here's the spec the way I would write it.
{
  "data": {"name": "dataset"},
  "layer": [
    {
      "mark": {
        "type": "bar",
        "cornerRadius": 50,
        "color": "#90C290",
        "tooltip": true
      }
    },
    {
      "mark": {
        "type": "text",
        "tooltip": true,
        "align": "left",
        "baseline": "middle",
        "dx": 5,
        "color": "black"
      },
      "encoding": {
        "text": {
          "field": "Number of customers",
          "type": "quantitative"
        }
      }
    }
  ],
  "encoding": {
    "y": {
      "field": "Sales channel",
      "type": "nominal",
      "sort": "descending",
      "title": null
    },
    "x": {
      "field": "Number of customers",
      "type": "quantitative",
      "title": null,
      "axis": null
    }
  }
}
Link to sample data in Vega Editor
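To try the label behaviour outside Power BI, here is a sketch of the same spec with inline data in place of Deneb's "dataset". The channel names and counts are hypothetical, invented for illustration: Wholesale has 0 customers, and Retail appears twice to exercise the "aggregate": "sum" mentioned above. Because both x and text are aggregated, Vega-Lite groups by Sales channel and draws one bar and one label per channel, including a "0" label for Wholesale.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {
    "values": [
      {"Sales channel": "Online", "Number of customers": 12},
      {"Sales channel": "Retail", "Number of customers": 7},
      {"Sales channel": "Retail", "Number of customers": 3},
      {"Sales channel": "Wholesale", "Number of customers": 0}
    ]
  },
  "layer": [
    {
      "mark": {"type": "bar", "cornerRadius": 50, "color": "#90C290", "tooltip": true}
    },
    {
      "mark": {
        "type": "text",
        "tooltip": true,
        "align": "left",
        "baseline": "middle",
        "dx": 5,
        "color": "black"
      },
      "encoding": {
        "text": {
          "field": "Number of customers",
          "type": "quantitative",
          "aggregate": "sum"
        }
      }
    }
  ],
  "encoding": {
    "y": {"field": "Sales channel", "type": "nominal", "sort": "descending", "title": null},
    "x": {
      "field": "Number of customers",
      "type": "quantitative",
      "aggregate": "sum",
      "title": null,
      "axis": null
    }
  }
}
```

Inside Deneb, swap the "values" block back to {"name": "dataset"}.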

how to layer multiple regression lines without repeating code?

I'm using Vega-Lite to chart the times at which I eat my meals, and I want a regression (loess) line showing the general time for each meal. By default, a regression uses the entire dataset and only shows one line; I want three lines, one for each meal (stored in the field extra_data.content).
I've achieved what I want by repeating the loess layer three times, but I'm trying to find a solution in which the same layer is written once and repeated for each meal.
Edit after solving! Thanks very much to @jakevdp for the answer! Here is my working code; note that there is both a groupby on the loess and a color channel.
{
  "mark": "line",
  "transform": [
    {
      "loess": "hm",
      "on": "ymd",
      "groupby": ["extra_data.content"]
    }
  ],
  "encoding": {
    "x": {
      "field": "ymd",
      "type": "temporal"
    },
    "y": {
      "field": "hm",
      "type": "temporal"
    },
    "color": {
      "field": "extra_data.content",
      "type": "nominal"
    }
  }
}
It sounds like you want the groupby argument of the loess transform, along with a color encoding. It might look something like this:
{
  "mark": "line",
  "transform": [
    {
      "loess": "hm",
      "on": "ymd",
      "groupby": ["extra_data.content"]
    }
  ],
  "encoding": {
    "x": {
      "field": "ymd",
      "type": "temporal"
    },
    "y": {
      "field": "hm",
      "type": "temporal"
    },
    "color": {
      "field": "extra_data.content",
      "type": "nominal"
    }
  }
}
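The groupby approach also combines with layering if you want the raw points under the trend lines. A sketch, assuming a hypothetical data/meals.json containing the ymd, hm and extra_data.content fields used above: both layers share one top-level encoding (so points and lines get matching colors), and only the line layer carries the loess transform.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {"url": "data/meals.json"},
  "encoding": {
    "x": {"field": "ymd", "type": "temporal"},
    "y": {"field": "hm", "type": "temporal"},
    "color": {"field": "extra_data.content", "type": "nominal"}
  },
  "layer": [
    {"mark": {"type": "point", "opacity": 0.4}},
    {
      "mark": "line",
      "transform": [
        {"loess": "hm", "on": "ymd", "groupby": ["extra_data.content"]}
      ]
    }
  ]
}
```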

plot small multiples in vega-lite with gray background

I'm looking for a Vega-Lite configuration to show small multiples (using the facet operator row or column) with all other data points greyed out in the background.
Here is an example plot using the facet operator (facet plot, in the Vega Editor):
"facet": {
  "row": {
    "field": "group",
    "type": "nominal"
  }
},
And here is an example using multiple charts with the concat operator and the color channel to grey out the other groups (concat plot, in the Vega Editor):
"color": {"condition": {"test": "datum['group'] != 1", "value": "grey"}, "value": "red"}
I was wondering if there is a combination of transforms and repeat commands to achieve this for an unknown number of groups.
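Here is one possible solution: create an additional layer with its own data source. The facet only partitions the data that its layers inherit, so a layer that declares its own data (here the named dataset main) shows all points, in grey, in every panel: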
"facet": {"row": {"field": "group"}},
"spec": {
  "layer": [
    {
      "data": {"name": "main"},
      "mark": "circle",
      "encoding": {
        "y": {
          "field": "y",
          "type": "ordinal"
        },
        "x": {
          "field": "x",
          "type": "ordinal"
        },
        "color": {"value": "grey"}
      },
      "params": []
    },
    {
      "mark": {"type": "circle", "opacity": 1, "color": "red"},
      "encoding": {
        "y": {
          "field": "y",
          "type": "ordinal"
        },
        "x": {
          "field": "x",
          "type": "ordinal"
        }
      }
    }
  ]
}
Full example in the Vega Editor.
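A self-contained sketch of the same idea, using a small hypothetical inline dataset (values invented for illustration) registered under the name main through the top-level "datasets" property. The grey layer names main itself, so it is not split by the facet; the red layer inherits the faceted subset for its own row. This works for any number of groups, since nothing refers to a specific group value.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "datasets": {
    "main": [
      {"group": 1, "x": 1, "y": 2},
      {"group": 1, "x": 2, "y": 3},
      {"group": 2, "x": 3, "y": 1},
      {"group": 2, "x": 1, "y": 3},
      {"group": 3, "x": 2, "y": 1},
      {"group": 3, "x": 3, "y": 2}
    ]
  },
  "data": {"name": "main"},
  "facet": {"row": {"field": "group", "type": "nominal"}},
  "spec": {
    "layer": [
      {
        "data": {"name": "main"},
        "mark": "circle",
        "encoding": {
          "x": {"field": "x", "type": "ordinal"},
          "y": {"field": "y", "type": "ordinal"},
          "color": {"value": "grey"}
        }
      },
      {
        "mark": {"type": "circle", "opacity": 1, "color": "red"},
        "encoding": {
          "x": {"field": "x", "type": "ordinal"},
          "y": {"field": "y", "type": "ordinal"}
        }
      }
    ]
  }
}
```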

Can you have facets & layers in single Vegalite plot?

I am struggling to understand why a layer spec like the one below:
"layer": [
  {
    "encoding": {
      "facet": {"field": "FEATURE_VALUE"},
      "x": {
        "field": "DATE",
        "type": "temporal"
      },
      "y": {
        "field": "VALUE",
        "type": "quantitative"
      }
    },
    "mark": {
      "type": "line"
    }
  }
]
throws an error to the effect of: Cannot read property 'push' of undefined
Meanwhile, the unit spec:
{
  "encoding": {
    "facet": {"field": "FEATURE_VALUE"},
    "x": {
      "field": "DATE",
      "type": "temporal"
    },
    "y": {
      "field": "VALUE",
      "type": "quantitative"
    }
  },
  "mark": {
    "type": "line"
  }
}
works just fine.
I can tell this has something to do with "Altair: Can't facet layered plots".
However, I can't quite answer the principal question: can I have a trellis plot using facet and also have layers on top of it (for, say, tooltips, rulers, etc.)?
Thank you!
Vega-Lite provides two ways to specify facets: as an encoding (See Facet, Row, and Column Encoding Channels) and as an operator (See Facet Operator).
A layer chart is not allowed to contain a facet encoding; however, a facet operator can contain a layer chart (the reason for this is that the semantics of layers containing incompatible facets is unclear).
So, instead of something like this:
"layer": [
  {
    "encoding": {
      "facet": {"field": "FEATURE_VALUE"},
      "x": {
        "field": "DATE",
        "type": "temporal"
      },
      "y": {
        "field": "VALUE",
        "type": "quantitative"
      }
    },
    "mark": {
      "type": "line"
    }
  }
]
you can do something like this:
"facet": {"field": "FEATURE_VALUE"},
"spec": {
  "layer": [
    {
      "encoding": {
        "x": {
          "field": "DATE",
          "type": "temporal"
        },
        "y": {
          "field": "VALUE",
          "type": "quantitative"
        }
      },
      "mark": {
        "type": "line"
      }
    }
  ]
}
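A fuller sketch of the facet-operator form, showing that the layered spec can hold the extra marks the question asks about. The inline rows are hypothetical, using the question's field names: the inner spec layers a line, points with tooltips, and a dashed rule at each panel's mean VALUE (the aggregate is computed per facet cell), and "columns" wraps the panels into a grid.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {
    "values": [
      {"FEATURE_VALUE": "a", "DATE": "2020-01-01", "VALUE": 3},
      {"FEATURE_VALUE": "a", "DATE": "2020-02-01", "VALUE": 5},
      {"FEATURE_VALUE": "a", "DATE": "2020-03-01", "VALUE": 4},
      {"FEATURE_VALUE": "b", "DATE": "2020-01-01", "VALUE": 2},
      {"FEATURE_VALUE": "b", "DATE": "2020-02-01", "VALUE": 6},
      {"FEATURE_VALUE": "b", "DATE": "2020-03-01", "VALUE": 7}
    ]
  },
  "facet": {"field": "FEATURE_VALUE", "type": "nominal"},
  "columns": 2,
  "spec": {
    "layer": [
      {
        "mark": "line",
        "encoding": {
          "x": {"field": "DATE", "type": "temporal"},
          "y": {"field": "VALUE", "type": "quantitative"}
        }
      },
      {
        "mark": {"type": "point", "tooltip": true},
        "encoding": {
          "x": {"field": "DATE", "type": "temporal"},
          "y": {"field": "VALUE", "type": "quantitative"}
        }
      },
      {
        "mark": {"type": "rule", "strokeDash": [4, 4]},
        "encoding": {
          "y": {"field": "VALUE", "type": "quantitative", "aggregate": "mean"}
        }
      }
    ]
  }
}
```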

Build bar chart in Vega with different colors for negative and positive values

So I need to build something like this using the Vega library: a bar chart with negative values in red and positive values in green.
Now, I'm a super n00b, so please have mercy.
First solution: use some sort of conditional formatting (like in Excel): if the bar value is < 0, make it red; if it is > 0, make it green. I could find some conditional syntax for Vega-Lite, which gave me hope, but I have no clue how to translate that syntax to plain Vega.
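For reference, the Vega-Lite conditional syntax mentioned above looks roughly like this, using the two sample rows of data shown further down. This is a sketch, not the asker's spec. When a Vega-Lite spec is open in the Vega Editor, the compiled Vega can be inspected there, which is one way to see how the condition translates to plain Vega.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {
    "values": [
      {"date": "2018-12-10", "difference": 20},
      {"date": "2018-10-21", "difference": -10}
    ]
  },
  "mark": "bar",
  "encoding": {
    "x": {"field": "date", "type": "ordinal"},
    "y": {"field": "difference", "type": "quantitative"},
    "color": {
      "condition": {"test": "datum.difference < 0", "value": "red"},
      "value": "green"
    }
  }
}
```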
Second, I thought about using some color scheme for ranges, like the ones that have a threshold. But I got completely confused about what scale range type to use, and noticed there is a relationship between scale range type and color schemes, so yeah. Confused.
Then, my colleague suggested this: https://vega.github.io/editor/#/examples/vega-lite/layer_bar_annotations
So in the example, we can see that the portion of each bar above the threshold has conditional formatting. So I tried filtering the data into 2 subsets, values_lower_than_0 and values_higher_than_0, and using them as sources for marks. But it seems like I don't know how to filter. My data looks like this:
[
  { "date": "2018-12-10", "difference": 20 },
  { "date": "2018-10-21", "difference": -10 }
  ...
]
So then I apply a transform on it:
...
{
  'name': 'values_lower_than_0',
  'source': 'temps',
  'transform': [{ 'type': 'filter', 'expr': 'datum.difference.Value < 0' }]
}
But when I use values_lower_than_0 in the marks, nothing seems to happen.
So, I have 2 questions:
1. Is this the best approach to build such a chart? (Tbh, it seems pretty convoluted to me.)
2. If yes, then how am I supposed to get the two data sets and use them to obtain the correct colors?
A better approach is one that doesn't apply any transforms to the dataset.
Taking the example from here:
The idea is to set y2 to the pixel position of zero on the y scale (scale('yscale', 0)). Each bar's y then lies above that baseline for positive values and below it for negative values, and the fill is chosen by a signal expression that tests the sign of amount. Please refer to the rect mark config below.
{
  "$schema": "https://vega.github.io/schema/vega/v4.json",
  "width": 600,
  "height": 360,
  "autosize": "fit",
  "data": [
    {
      "name": "table",
      "url": "https://uat.gramener.com/vega/chart/data/pos-neg-items.json"
    }
  ],
  "scales": [
    {
      "name": "xscale",
      "type": "band",
      "domain": {
        "data": "table",
        "field": "category"
      },
      "range": "width",
      "padding": 0.2,
      "round": true
    },
    {
      "name": "yscale",
      "domain": {
        "data": "table",
        "field": "amount"
      },
      "nice": true,
      "range": "height"
    }
  ],
  "marks": [
    {
      "name": "bars",
      "type": "rect",
      "from": {
        "data": "table"
      },
      "encode": {
        "enter": {
          "x": {
            "scale": "xscale",
            "field": "category"
          },
          "width": {
            "scale": "xscale",
            "band": 1
          },
          "y": {
            "scale": "yscale",
            "field": "amount"
          },
          "y2": {
            "signal": "scale('yscale', 0)"
          },
          "fill": {
            "signal": "datum['amount'] > 0 ? '#5CB38B' : '#E6685C'"
          },
          "tooltip": {
            "signal": "datum"
          }
        }
      }
    },
    {
      "name": "item_score",
      "type": "text",
      "from": {
        "data": "table"
      },
      "encode": {
        "enter": {
          "x": {
            "scale": "xscale",
            "field": "category"
          },
          "y": {
            "scale": "yscale",
            "field": "amount"
          },
          "dy": {
            "signal": "datum['amount'] > 0 ? -4 : 14"
          },
          "dx": {
            "signal": "bandwidth('xscale')/2"
          },
          "align": {
            "value": "center"
          },
          "fill": {
            "value": "black"
          },
          "text": {
            "field": "amount"
          },
          "fontSize": {
            "value": 12
          }
        }
      }
    },
    {
      "name": "item_name",
      "type": "text",
      "from": {
        "data": "table"
      },
      "encode": {
        "enter": {
          "x": {
            "scale": "xscale",
            "field": "category"
          },
          "dx": {
            "value": 20
          },
          "dy": {
            "signal": "datum['amount'] > 0 ? height/2 + 14 : height/2 - 6"
          },
          "align": {
            "value": "center"
          },
          "fill": {
            "value": "#000000"
          },
          "text": {
            "field": "category"
          },
          "fontSize": {
            "value": 12
          }
        }
      }
    }
  ]
}
I'm so tired, I'm making stupid mistakes today! If anybody wants to use the third approach described above: my approach was correct; I was just passing the wrong source name to the mark.
Another remark: you only need to calculate one subset, for the negative values (for example values_lower_than_0).
Once you do that, you have a mark called bars for all bars (a default, with a green fill), whose data source is the full, unfiltered data. On top of that mark you add a second mark, called negative_bars (for example), whose source is values_lower_than_0, and you give it a red fill.
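A minimal Vega sketch of that overlay approach. Assumptions: the source data set is named temps (as in the question's transform), the two inline rows are the question's sample data, and difference is a plain number, so the filter tests datum.difference rather than datum.difference.Value. The bars mark draws every bar in green from temps, and negative_bars redraws only the filtered negative rows in red on top of it.

```json
{
  "$schema": "https://vega.github.io/schema/vega/v5.json",
  "width": 400,
  "height": 200,
  "data": [
    {
      "name": "temps",
      "values": [
        {"date": "2018-12-10", "difference": 20},
        {"date": "2018-10-21", "difference": -10}
      ]
    },
    {
      "name": "values_lower_than_0",
      "source": "temps",
      "transform": [{"type": "filter", "expr": "datum.difference < 0"}]
    }
  ],
  "scales": [
    {"name": "xscale", "type": "band", "domain": {"data": "temps", "field": "date"}, "range": "width", "padding": 0.2},
    {"name": "yscale", "type": "linear", "domain": {"data": "temps", "field": "difference"}, "nice": true, "zero": true, "range": "height"}
  ],
  "axes": [
    {"orient": "bottom", "scale": "xscale"},
    {"orient": "left", "scale": "yscale"}
  ],
  "marks": [
    {
      "name": "bars",
      "type": "rect",
      "from": {"data": "temps"},
      "encode": {
        "enter": {
          "x": {"scale": "xscale", "field": "date"},
          "width": {"scale": "xscale", "band": 1},
          "y": {"scale": "yscale", "field": "difference"},
          "y2": {"scale": "yscale", "value": 0},
          "fill": {"value": "green"}
        }
      }
    },
    {
      "name": "negative_bars",
      "type": "rect",
      "from": {"data": "values_lower_than_0"},
      "encode": {
        "enter": {
          "x": {"scale": "xscale", "field": "date"},
          "width": {"scale": "xscale", "band": 1},
          "y": {"scale": "yscale", "field": "difference"},
          "y2": {"scale": "yscale", "value": 0},
          "fill": {"value": "red"}
        }
      }
    }
  ]
}
```

Both marks share the same scales, so the red rectangles land exactly on top of the green ones they replace. The single-mark signal approach in the answer above avoids the second data set entirely.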
My question regarding the best approach still stands.