Vega: transform array of values into accepted format - vega-lite

I'm using Vega for data visualization. Vega requires data which is shaped as an array of objects:
const accepted = [
  { num: 1, char: 'A' },
  { num: 2, char: 'B' },
  { num: 3, char: 'C' },
]
However, for sending data (e.g. via JSON) this is not efficient, as the keys num and char are repeated for every record. A more efficient way is to send the data as follows:
const efficient = {
  num: [1, 2, 3],
  char: ['A', 'B', 'C']
}
Right now I'm manually transforming the efficient structure above into the accepted structure before passing it to Vega. Is there a Vega transform I can use so that I can pass efficient without having to transform it into accepted?

Use the flatten transform, which should do what you need.
https://vega.github.io/vega/docs/transforms/flatten/
https://vega.github.io/vega-lite/docs/flatten.html
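For the example above, a minimal Vega-Lite sketch (the mark and encodings are placeholders, not from the question): pass the efficient object as a single datum and let the flatten transform expand the parallel num and char arrays into one row per array index.

const spec = {
  // one datum holding the parallel arrays
  data: { values: [{ num: [1, 2, 3], char: ['A', 'B', 'C'] }] },
  // flatten both fields in parallel: one output row per array index
  transform: [{ flatten: ['num', 'char'] }],
  mark: 'point',
  encoding: {
    x: { field: 'num', type: 'quantitative' },
    y: { field: 'char', type: 'nominal' }
  }
};

After the transform, each row looks like { num: 1, char: 'A' }, i.e. the accepted shape from the question.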

Related

Karate: Matching data between nested arrays

Is there a way in Karate to match API response data that contains a nested array for a key, where the key-value pairs inside the nested array are in a different order?
Scenario: Verify original data contains expected data
def original = [{ a:1, b: [{c:2},{d:3}]}]
def expected = [{ b: [{d:3},{c:2}], a:1 }]
Using the contains deep method will solve the issue, but the original data comes from an API response, so if at some point one more field gets added to the response, my scenario will still pass.
Don't try to do everything in one line. Split your matches; there is more explanation in the docs:
* def inner = [{ c: 2 }, { d: 3 }]
* def response = [{ a: 1, b: [{ d: 3 }, { c: 2 }]}]
* match each response contains { b: '#(^^inner)' }
* match each response == { a: 1, b: '#(^^inner)' }
* match response[0] == { a: 1, b: '#(^^inner)' }
* match response == [{ a: 1, b: '#(^^inner)' }]
You don't need to use all of these, I'm showing the possible options.

How to access data object from a nested array

I am using ObservableHQ and the vega-lite API to do data visualizations and have run into a problem I can't figure out: I would like to access a data object from the following data structure,
Array
  Array
  Array
    Item
    Item
  Array
As you can see in my bad drawing, I have a multidimensional array and would like to access a specific array from the main array. How can I do that using the Vega-Lite API?
vl.markCircle({
    thickness: 4,
    bandSize: 2
  })
  .data(diff[0])
  .encode(
    vl.x().fieldQ("mins").scale({ domain: [-60, 60] }),
    vl.color().fieldN('type').scale({ range: ['#636363', '#f03b20'] }),
  )
  .config({ bandSize: 10 })
  .width(600)
  .height(40)
  .render()
Thank you,
Based on your comments, I’m assuming that you’re trying to automatically chart all of the nested arrays (separately), not just one of them. And based on your chart code, I’m assuming that your data looks sorta like this:
const diff = [
  [
    { mins: 38, type: "Type B" },
    { mins: 30, type: "Type B" },
    { mins: 28, type: "Type A" },
    …
  ],
  [
    { mins: 20, type: "Type B" },
    { mins: 17, type: "Type A" },
    { mins: 19, type: "Type A" },
    …
  ],
  …
];
First, flatten all of the arrays into one big array, using flatMap to record which array each item came from in a new array property on the item object. If each child array represents, say, a different city, or a different year, or a different person collecting the data, you could replace array: i with something more meaningful about the data.
const flat = diff.flatMap((arr, i) => arr.map((d) => ({ ...d, array: i })));
Then use Vega-Lite’s “faceting” (documentation, Observable tutorial and examples) to split the chart into sections, one for each value of array: i, with shared scales. This just adds one line to your example:
vl.markCircle({
    thickness: 4,
    bandSize: 2
  })
  .data(flat)
  .encode(
    vl.row().fieldN("array"), // this line is new
    vl.x().fieldQ("mins").scale({ domain: [-60, 60] }),
    vl.color().fieldN("type").scale({ range: ["#636363", "#f03b20"] })
  )
  .config({ bandSize: 10 })
  .width(600)
  .height(40)
  .render()
Here’s an Observable notebook with examples of this working. As I show there at the bottom, you can also map over your array to make a totally separate chart for each nested array.
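A sketch of that per-array alternative, reusing the chart options from above (each render() call in the vega-lite API returns a promise for a chart element):

// One small, fully separate chart per nested array.
const charts = diff.map((arr) =>
  vl.markCircle({ thickness: 4, bandSize: 2 })
    .data(arr)
    .encode(
      vl.x().fieldQ("mins").scale({ domain: [-60, 60] }),
      vl.color().fieldN("type").scale({ range: ["#636363", "#f03b20"] })
    )
    .width(600)
    .height(40)
    .render()
);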

What JSON format does STRIP_OUTER_ARRAY support?

I have a file composed of a single array containing multiple records.
{
  "Client": [
    {
      "ClientNo": 1,
      "ClientName": "Alpha",
      "ClientBusiness": [
        {
          "BusinessNo": 1,
          "IndustryCode": "12345"
        },
        {
          "BusinessNo": 2,
          "IndustryCode": "23456"
        }
      ]
    },
    {
      "ClientNo": 2,
      "ClientName": "Bravo",
      "ClientBusiness": [
        {
          "BusinessNo": 1,
          "IndustryCode": "34567"
        },
        {
          "BusinessNo": 2,
          "IndustryCode": "45678"
        }
      ]
    }
  ]
}
I load it with the following code:
create or replace stage stage.test
  url='azure://xxx/xxx'
  credentials=(azure_sas_token='xxx');

create table if not exists stage.client (json_data variant not null);

copy into stage.client
  from @stage.test/client_test.json
  file_format = (type = 'JSON' strip_outer_array = true);
Snowflake imports the entire file as one row.
I would like the COPY INTO command to remove the outer array structure and load the records into separate table rows.
When I load larger files, I hit the size limit for variant and get the error Error parsing JSON: document is too large, max size 16777216 bytes.
If you can import the file into Snowflake into a single row, then you can use LATERAL FLATTEN on the Client field to generate one row per element in the array.
Here's a blog post on LATERAL and FLATTEN (or you could look them up in the Snowflake docs):
https://support.snowflake.net/s/article/How-To-Lateral-Join-Tutorial
If the format of the file is, as specified, a single object with a single property that contains an array with 500 MB worth of elements in it, then perhaps importing it will still work -- if that works, then LATERAL FLATTEN is exactly what you want. But that form is not particularly great for data processing. You might want to use some text processing script to massage the data if that's needed.
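For example, a minimal Node.js sketch of such a massaging script (the file names are hypothetical, and it assumes the file fits in memory): it strips the outer object so the file becomes a bare array, which STRIP_OUTER_ARRAY can then split into one row per Client record.

const fs = require('fs');

// Read the original single-object document and keep only the inner array.
const doc = JSON.parse(fs.readFileSync('client_test.json', 'utf8'));
fs.writeFileSync('client_test_fixed.json', JSON.stringify(doc.Client));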
RECOMMENDATION #1:
The problem with your JSON is that it doesn't have an outer array. It has a single outer object containing a property with an inner array.
If you can fix the JSON, that would be the best solution, and then STRIP_OUTER_ARRAY will work as you expected.
You could also try to recompose the JSON (an ugly business) after reading it line by line with:
CREATE OR REPLACE TABLE X (CLIENT VARCHAR);
COPY INTO X FROM (SELECT $1 CLIENT FROM @My_Stage/Client.json);
User Response to Recommendation #1:
Thank you. So from what I gather, COPY with STRIP_OUTER_ARRAY can handle a file starting and ending with square brackets, and parse the file as if they were not there.
The real files don't have line breaks, so I can't read the file line by line. I will see if the source system can change the export.
RECOMMENDATION #2:
Also, if you would like to see what the JSON parser does, you can experiment using this code; I have parsed JSON in the COPY command using similar code. Working with your JSON data in a small project can help you shape the COPY command to work as intended.
CREATE OR REPLACE TABLE SAMPLE_JSON (
  ID INTEGER,
  DATA VARIANT
);

INSERT INTO SAMPLE_JSON (ID, DATA)
SELECT 1, parse_json('{
  "Client": [
    {
      "ClientNo": 1,
      "ClientName": "Alpha",
      "ClientBusiness": [
        { "BusinessNo": 1, "IndustryCode": "12345" },
        { "BusinessNo": 2, "IndustryCode": "23456" }
      ]
    },
    {
      "ClientNo": 2,
      "ClientName": "Bravo",
      "ClientBusiness": [
        { "BusinessNo": 1, "IndustryCode": "34567" },
        { "BusinessNo": 2, "IndustryCode": "45678" }
      ]
    }
  ]
}');

SELECT
  C.value:ClientNo AS ClientNo
  ,C.value:ClientName::STRING AS ClientName
  ,ClientBusiness.value:BusinessNo::INTEGER AS BusinessNo
  ,ClientBusiness.value:IndustryCode::INTEGER AS IndustryCode
FROM SAMPLE_JSON f
  ,table(flatten(f.DATA, 'Client')) C
  ,table(flatten(C.value:ClientBusiness, '')) ClientBusiness;
User Response to Recommendation #2:
Thank you for the parse_json example!
Trouble is, the real files are sometimes 500 MB, so the parse_json function chokes.
Follow-up on Recommendation #2:
The JSON needs to be in the NDJSON (http://ndjson.org/) format; otherwise JSON files of this size will be impossible to parse.
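For reference, the Client records from the sample above would look like this as NDJSON, one complete JSON object per line (a sketch of the target shape, not an actual export):

{"ClientNo": 1, "ClientName": "Alpha", "ClientBusiness": [{"BusinessNo": 1, "IndustryCode": "12345"}, {"BusinessNo": 2, "IndustryCode": "23456"}]}
{"ClientNo": 2, "ClientName": "Bravo", "ClientBusiness": [{"BusinessNo": 1, "IndustryCode": "34567"}, {"BusinessNo": 2, "IndustryCode": "45678"}]}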
Hope the above helps others running into similar questions!

How do I parse a JSON file to bind data to a d3 choropleth map

I'm trying to take data in from a JSON file and link it to my geoJSON file to create a choropleth map, with the county colours bound to the "amount" value. I would also like a corresponding "comment" value to be bound to a div for when I mouse over that county.
My code at http://bl.ocks.org/eoiny/6244102 will work to generate a choropleth map when my counties.json data is in the form:
"Carlow":3,"Cavan":4,"Clare":5,"Cork":3,
But things get tricky when I try to use the following form:
{
  "id": "Carlow",
  "amount": 11,
  "comment": "The figures for Carlow show a something."
},
I can't get my head around how to join the "id": "Carlow" from counties.json with the "id": "Carlow" path created from ireland.json, while at the same time having access to the other values in counties.json, i.e. "amount" and "comment".
Apologies for my inarticulate question but if anyone could point me to an example or reference I could look up that would be great.
I would preprocess the data when it's loaded to make lookup easier in your quantize function. Basically, replace this: data = json; with this:
data = json.reduce(function(result, county) {
  result[county.id] = county;
  return result;
}, {});
and then in your quantize function, you get at the amounts like this:
function quantize(d) {
  return "q" + Math.min(8, ~~(data[d.id].amount * 9 / 12)) + "-9";
}
What the preprocessing does is turn this array (easily accessed by index):
[{id: 'xyz', ...}, {id: 'pdq', ...}, ...]
into this object with county keys (easily accessed by county id):
{'xyz': {id: 'xyz', ...}, 'pdq': {id: 'pdq', ...}, ...}
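The same lookup covers the mouseover part of the question; a minimal sketch, assuming a counties path selection and a tooltip div like those in the gist (both names here are hypothetical):

// Hypothetical selections; adapt the names to your own code.
var tooltip = d3.select("#tooltip");

counties.on("mouseover", function(d) {
  // d.id is the county id, so the lookup object yields the full record,
  // including the comment, in one property access.
  tooltip.text(data[d.id].comment);
});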
Here's the working gist: http://bl.ocks.org/rwaldin/6244803

What is "compressed JSON"?

I see a lot of references to "compressed JSON" when it comes to different serialization formats. What exactly is it? Is it just gzipped JSON or something else?
Compressed JSON removes the repeated keys from JSON's key:value encoding and stores keys and values in separate parallel arrays:
// uncompressed
JSON = {
  data : [
    { field1 : 'data1', field2 : 'data2', field3 : 'data3' },
    { field1 : 'data4', field2 : 'data5', field3 : 'data6' },
    .....
  ]
};

//compressed
JSON = {
  data : [ 'data1', 'data2', 'data3', 'data4', 'data5', 'data6' ],
  keys : [ 'field1', 'field2', 'field3' ]
};
I found this method of usage here:
Content from link (http://www.nwhite.net/?p=242)
I rarely find myself in a place where I am writing JavaScript applications that use AJAX in its pure form. I have long abandoned the ‘X’ and replaced it with ‘J’ (JSON). When working with JavaScript, it just makes sense to return JSON. A smaller footprint, easier parsing and an easier structure are all advantages I have gained since using JSON.
In a recent project I found myself unhappy with the large size of my result sets. The data I was returning was tabular data, in the form of objects for each row. I was returning a result set of 50, with 19 fields each. What I realized is that if I augmented my result set I could get a form of compression.
I merged all my values into a single array and stored all my fields in a separate array. Returning a key-value pair for each result cost me 8800 bytes (8.6 KB). Ripping the fields out and putting them in a separate array cost me 186 bytes. Total savings: 8.4 KB.
Now I have a much more compressed JSON file, but the structure is different and now harder to work with. So I implement a solution in Mootools to make the decompression transparent.
Request.JSON.extend({
  options : {
    inflate : []
  }
});

Request.JSON.implement({
  success : function(text){
    this.response.json = JSON.decode(text, this.options.secure);
    if(this.options.inflate.length){
      this.options.inflate.each(function(rule){
        // expand the parallel arrays back into row objects, then store
        // them under rule.store if given, otherwise replace rule.data
        var target = $defined(rule.store) ? rule.store : rule.data;
        this.response.json[target] = this.expandData(this.response.json[rule.data], this.response.json[rule.keys]);
      }, this);
    }
    this.onSuccess(this.response.json, text);
  },
  expandData : function(data, keys){
    var arr = [];
    var len = data.length; var klen = keys.length;
    var start = 0; var stop = klen;
    while(stop <= len){ // <=, so the final row is not dropped
      arr.push(data.slice(start, stop).associate(keys));
      start = stop; stop += klen;
    }
    return arr;
  }
});
Request.JSON now has an inflate option. You can inflate multiple segments of your JSON object if you so desire.
Usage:
new Request.JSON({
  url : 'url',
  inflate : [{ 'keys' : 'fields', 'data' : 'data' }],
  onComplete : function(json){}
});
Pass as many inflate objects as you like in the inflate option array. Each has an optional property called ‘store’; if set, the inflated data set will be stored under that key instead.
The ‘keys’ and ‘data’ properties expect strings that match locations in the root of your JSON object.
Based on Paniyar's answer, we can convert a list of objects into the "compressed" JSON format using C# like this:
var JsonString = serializer.Serialize(new
{
  cols = new[] { "field1", "field2", "field3" },
  items = data.Select(x => new object[] { x.field1, x.field2, x.field3 })
});
I used an array of objects because the fields can be int, bool, string...
More Reduction:
If a field's value is repeated very often and it is a string type, you can compress a little more by adding a distinct list of that field's values; for instance, fields like job position or city are excellent candidates for this. You can add a distinct list of these values and, in each item, replace the value with a reference number. That will make your JSON lighter.
Compressed:
[["KeyA", "KeyB", "KeyC", "KeyD", "KeyE", "KeyF"],
["ValA1", "ValB1", "ValC1", "ValD1", "ValE1", "ValF1"],
["ValA2", "ValB2", "ValC2", "ValD2", "ValE2", "ValF2"],
["ValA3", "ValB3", "ValC3", "ValD3", "ValE3", "ValF3"],
["ValA4", "ValB4", "ValC4", "ValD4", "ValE4", "ValF4"]]
Uncompressed:
[{KeyA: "ValA1", KeyB: "ValB1", KeyC: "ValC1", KeyD: "ValD1", KeyE: "ValE1", KeyF: "ValF1"},
{KeyA: "ValA2", KeyB: "ValB2", KeyC: "ValC2", KeyD: "ValD2", KeyE: "ValE2", KeyF: "ValF2"},
{KeyA: "ValA3", KeyB: "ValB3", KeyC: "ValC3", KeyD: "ValD3", KeyE: "ValE3", KeyF: "ValF3"},
{KeyA: "ValA4", KeyB: "ValB4", KeyC: "ValC4", KeyD: "ValD4", KeyE: "ValE4", KeyF: "ValF4"}]
The most likely answer is that it really is just gzipped JSON. There is no other standard meaning to this phrase.
Re-organizing a homogeneous array of JSON objects into a pair of arrays is a very useful technique to make the payload smaller and to speed up encoding and decoding, but it is not commonly called "compressed JSON". I haven't run across it ever in open source or any open API, but we use this technique internally and call it "jsontable".