JSON format with gzip compression - json

My current project sends a lot of data to the browser in JSON via ajax requests.
I've been trying to decide which format I should use. The two I have in mind are
[
"colname1" : "content",
"colname2" : "content",
],
[
"colname1" : "content",
"colname2" : "content",
],
...
and
{
"columns": [
"column name 1",
"column name 2",
],
"rows": [
[
"content",
"content"
],
[
"content",
"content"
]
...
]
}
The first method is better because it is easier to work with. I just have to convert to an object once received. The second will need some post processing to convert it into a format more like the first so it is easier to work with in JavaScript.
The second is better because it is less verbose and therefore takes up less bandwidth and downloads more quickly. Before compression it is usually between 0.75% and 0.85% of the size of the first format.
GZip compression complicates things further. Making the difference in file size nearer 0.85% to 0.95%
Which format should I go with and why?

I'd suggest using RJSON:
RJSON (Recursive JSON) converts any JSON data collection into more compact recursive form. Compressed data is still JSON and can be parsed with JSON.parse. RJSON can compress not only homogeneous collections, but any data sets with free structure.
Example:
JSON:
{
"id": 7,
"tags": ["programming", "javascript"],
"users": [
{"first": "Homer", "last": "Simpson"},
{"first": "Hank", "last": "Hill"},
{"first": "Peter", "last": "Griffin"}
],
"books": [
{"title": "JavaScript", "author": "Flanagan", "year": 2006},
{"title": "Cascading Style Sheets", "author": "Meyer", "year": 2004}
]
}
RJSON:
{
"id": 7,
"tags": ["programming", "javascript"],
"users": [
{"first": "Homer", "last": "Simpson"},
[2, "Hank", "Hill", "Peter", "Griffin"]
],
"books": [
{"title": "JavaScript", "author": "Flanagan", "year": 2006},
[3, "Cascading Style Sheets", "Meyer", 2004]
]
}

Shouldn't the second bit of example 1 be "rowname1"..etc.? I don't really get example 2 so I guess I would aim you towards 1. There is much to be said for having data immediately workable without pre-processing it first. Justification: I once spend too long optimizing array system that turned out to work perfectly but its hell to update it now.

Related

Using Power Query to extract data from nested arrays in JSON

I'm relatively new to Power Query, but I'm pulling in this basic structure of JSON from a web api
{
"report": "Cost History",
"dimensions": [
{
"time": [
{
"name": "2019-11",
"label": "2019-11",
…
},
{
"name": "2019-12",
"label": "2019-12",
…
},
{
"name": "2020-01",
"label": "2020-01",
…
},
…
]
},
{
"Category": [
{
"name": "category1",
"label": "Category 1",
…
},
{
"name": "category2",
"label": "Category 2",
…
},
…
]
}
],
"data": [
[
[
40419.6393798211
],
[
191.44
],
…
],
[
[
2299.652439184997
],
[
0.0
],
…
]
]
}
I actually have 112 categories and 13 "times". I figured out how to do multiple queries to turn the times into column headers and the categories into row labels (I think). But the data section is alluding me. Because each item is a list within a list I'm not sure how to expand it all out. Each object in the date array will have 112 numbers and there will be 13 objects. If that all makes sense.
So ultimately I want to make it look like
2019-11 2019-20 2020-01 ...
Category 1 40419 2299
Category 2 191 0
...
First time asking a question on here, so hopefully this all makes sense and is clear. Thanks in advance for any help!
i am also researching this exact thing and looking for a solution. In PQ, it displays nested arrays as a list and there is a function to extract values choosing a separating characterenter image description here
So this becomes, this
enter image description here
= Table.TransformColumns(#"Filtered Rows", {"aligned_to_ids", each Text.Combine(List.Transform(_, Text.From), ","), type text})
However the problem i'm trying to solve is when the nested json has multiple values like this: enter image description here
And when these LIST are extracted then an error message is caused, = Table.TransformColumns(#"Extracted Values1", {"collaborators", each Text.Combine(List.Transform(_, Text.From), ","), type text})
Expression.Error: We cannot convert a value of type Record to type Text.
Details:
Value=
id=15890
goal_id=323
role_id=15
Type=[Type]
It seems the multiple values are not handled and PQ does not recognise the underlying structure to enable the columns to be expanded.

string to json format powershell

Hey guys I have a singular string output which I need to convert in JSON:
Policy Name: Default_US1 Id: abc123 Buckets: bucket1,bothplaces
Policy Name: Default_CH1 Id: def456 Buckets: support,ch1,ch2
Policy Name: Default_NY2 Id: ghi789 Buckets: demo,bucket1,test1,test
How SHOULD it look like in JSON format?
[
{"Policy Name": "Default_US1"}, {"Id": "abc123"}, {"Buckets":[ "bucket1","bothplaces"]}
{"Policy Name": "Default_CH1"}, {"Id": "def456"}, {"Buckets":[ "support","ch1","ch2"]}
{"Policy Name": "Default_NY2"}, {"Id": "ghi789"}, {"Buckets":[ "demo","bucket1","test1","test"]}
]
above is my current attempt... but other than not working.. I know instinctively it's missing something/s... but I cant figure out what and how to remedy it
Directions on how to do it in Powershell would be a plus, but not necessary
I keep trying but messing up, since I know the best test is making Convertfrom-json show me normal output.
I do not care much how it ends up looking at the end, I just wish to extract all that data, with JSON being the format of choice, any VALID JSON result I can work with and manipulate....but first i need a valid JSON conversion
Ok, so you were correct - your current JSON format is ghastly! The mistake you are making is treating each little bit of data as a separate object when there appears to be a natural hierarchy in your data model.
The following structure more naturally fits your data model. However, this is purely based on a cursory examination of the input data you have posted - I know nothing about the data model itself.
[
{
"Name": "Default_US1",
"Id": "abc123",
"Buckets": [
"bucket1",
"bothplaces"
]
},
{
"Name": "Default_CH1",
"Id": "def456",
"Buckets": [
"support",
"ch1",
"ch2"
]
},
{
"Name": "Default_NY2",
"Id": "ghi789",
"Buckets": [
"demo",
"bucket1",
"test1",
"test2"
]
}
]

Solr create Core which has Array of Json object and Json object similar to elastic search

I want to create a core which has doc as under
{
"dataSet_s": "MYSQL_NEW_W",
"ruleType_s": "IF_EQUALS_THEN_EQUALS",
"enable_s": "true",
"testCaseId_s": "CASE_2",
"condition": {
"conClause_s": "IF",
"conField_s": "ENGINE_RPM",
"conOperator1_s": "GREATER",
"conVal1_s": "5000",
"conVal2_s": "1000"
},
"result": [
{
"resClause1": "THEN",
"resField1": "SPEED",
"resOperator1": "EQUALS",
"resVal1": "100",
"resVal2": "200"
},
{
"resClause1": "THEN",
"resField1": "SPEED",
"resOperator1": "GREATER",
"resVal1": "1000",
"resVal2": "2000"
}
]
}'
In elastic search the document has the same structure after inserting, but in Solr when I tried insert after creating a simple core it looks..
{
"dataSet_s": "MYSQL_NEW_W",
"ruleType_s": "IF_EQUALS_THEN_EQUALS",
"enable_s": "true",
"testCaseId_s": "CASE_2",
"condition.conClause_s": "IF",
"condition.conField_s": "ENGINE_RPM",
"condition.conOperator1_s": "GREATER",
"condition.conVal1_s": "5000",
"condition.conVal2_s": "1000",
"result.resClause1": [
"THEN",
"THEN"
],
"result.resField1": [
"SPEED",
"SPEED"
],
"result.resOperator1": [
"EQUALS",
"GREATER"
],
"result.resVal1": [
100,
1000
],
"result.resVal2": [
200,
2000
],
"id": "ed0af96b-5127-4686-ae4c-26621b941919",
"_version_": 1541533921419722800
}
Can we store json Object and Array of Json object in solr similar to Elastic search.
There are two aspects to this:
Storing original structure
Indexing it for search
Elasticsearch has that built in, but Solr does it through the configuration with some flexibility. It is described in the Reference Guide under indexing custom JSON. Basically, the original structure goes into _src_ field. The indexing can happen in several different ways, but default would just dump all content into the text field, which is effectively what Elasticsearch does.
It is important to remember that both use Lucene underneath which does not support nested structures. So, both engine do mapping between representation and what Lucene can actually do.

Deeply nested JSON documents in Apache Solr

I have a deeply nested document(pseudo structure as shown below):
[{
"id": "1",
"company_id": "1",
"company_name": "company_1",
"departments":[{
"dep1" : [{
"id" : 40,
"name" : xyz
},
{
"id" : 41,
"name" : xyr
}],
"dep2": [{
}]
}]
"employeePrograms" :[{
}]
}]
How can I index these type of documents in Apache Solr?
Documentation gives the idea of immediate child documents alone.
Unfortunatelly i'm don't have huge experience with this technology, but want to help. Here is some official documentation, that might be useful: oficial doc
more specific
If you have some uncommon issue, tell about it, maybe any error, or whatever.. I would try my best to help)
Upd1 :
Solr can only maintain a 'flat' representation of the data. What you weretrying to do is not really possible. There are a number of workarounds, such as using dynamic fields and using a solr join to link multiple data sets.
Speking about a deep nesting ? I've found such an example of work around.
If you had something like that:
"docs": [
{
"name": "Product Name",
"categories": [
{
"name": "Category 1",
"priority": 8
},
{
"name": "Category 2",
"priority": 6
}
...
]
},
You have to modify it like that to make it not deeply nested :
"docs": [
{
name: "Sample Product"
categories: [
{
priority_category: "9_Category 1",
},
{
priority_category: "5_Category 2",
}
...
]
},
So, you've done something similar, check if there are any errors anywhere

How to open a .mongo file and export the content into csv?

EDIT 2014-05-01: I tried fromJSON first (as suggested below), but that only parsed the first line. I found out that there were commas missing between the brackets of each JSON line so I changed that in TextEdit and saved the file. I also added [ at the beginning of the file and ] at the end and then it worked with JSON. Now the next step: from a list (with embedded lists) to a dataframe (or csv).
I get a data package from edX every now and then on the courses we are evaluating. Some of these are just plain .csv files which are quite easy to handle, others are more difficult for me (not having a CS or programming background).
I have 2 files I want to open and parse into csv files for analysis in R. I have tried many many json2csv tools out there, but to no avail. I also tried the simple methods described here to turn json into csv.
The data is confidential, so I cannot share the entire data set, but will share the first two lines of the file, maybe that helps. The problem is that nowhere I find anything about .mongo files, which to me seems quite strange, do they even exist? Or is this just a JSON file that may be corrupted (which could explain the errors)?
Any suggestions are welcome.
The first 2 lines in one of the .mongo files:
{
"_id": {
"$oid": "52d1e62c350e7a3156000009"
},
"votes": {
"up": [
],
"down": [
],
"up_count": 0,
"down_count": 0,
"count": 0,
"point": 0
},
"visible": true,
"abuse_flaggers": [
],
"historical_abuse_flaggers": [
],
"parent_ids": [
],
"at_position_list": [
],
"body": "the delft university accredited course with the scholarship (fundamentals of water treatment) is supposed to start in about a month's time. But have the scholarship list been published? Any tentative date??",
"course_id": "DelftX/CTB3365x/2013_Fall",
"_type": "Comment",
"endorsed": false,
"anonymous": false,
"anonymous_to_peers": false,
"author_id": "269835",
"comment_thread_id": {
"$oid": "52cd40c5ab40cf347e00008d"
},
"author_username": "tachak59",
"sk": "52d1e62c350e7a3156000009",
"updated_at": {
"$date": 1389487660636
},
"created_at": {
"$date": 1389487660636
}
}{
"_id": {
"$oid": "52d0a66bcb3eee318d000012"
},
"votes": {
"up": [
],
"down": [
],
"up_count": 0,
"down_count": 0,
"count": 0,
"point": 0
},
"visible": true,
"abuse_flaggers": [
],
"historical_abuse_flaggers": [
],
"parent_ids": [
{
"$oid": "52c63278100c07c0d1000028"
}
],
"at_position_list": [
],
"body": "I got it. Thank you!",
"course_id": "DelftX/CTB3365x/2013_Fall",
"_type": "Comment",
"endorsed": false,
"anonymous": false,
"anonymous_to_peers": false,
"parent_id": {
"$oid": "52c63278100c07c0d1000028"
},
"author_id": "2655027",
"comment_thread_id": {
"$oid": "52c4f303b03c4aba51000013"
},
"author_username": "dmoronta",
"sk": "52c63278100c07c0d1000028-52d0a66bcb3eee318d000012",
"updated_at": {
"$date": 1389405803386
},
"created_at": {
"$date": 1389405803386
}
}{
"_id": {
"$oid": "52ceea0cada002b72c000059"
},
"votes": {
"up": [
],
"down": [
],
"up_count": 0,
"down_count": 0,
"count": 0,
"point": 0
},
"visible": true,
"abuse_flaggers": [
],
"historical_abuse_flaggers": [
],
"parent_ids": [
{
"$oid": "5287e8d5906c42f5aa000013"
}
],
"at_position_list": [
],
"body": "if u please send by mail \n",
"course_id": "DelftX/CTB3365x/2013_Fall",
"_type": "Comment",
"endorsed": false,
"anonymous": false,
"anonymous_to_peers": false,
"parent_id": {
"$oid": "5287e8d5906c42f5aa000013"
},
"author_id": "2276302",
"comment_thread_id": {
"$oid": "528674d784179607d0000011"
},
"author_username": "totah1993",
"sk": "5287e8d5906c42f5aa000013-52ceea0cada002b72c000059",
"updated_at": {
"$date": 1389292044203
},
"created_at": {
"$date": 1389292044203
}
}
R doesn't have "native" support for these files but there is a JSON parser with the rjson package. So I might load my .mongo file with:
myfile <- "path/to/myfile.mongo"
myJSON <- readLines(myfile)
myNiceData <- fromJSON(myJSON)
Since RJson converts into a data structure that fits the object being read, you'll have to do some additional snooping but once you have an R data type you shouldn't have any trouble working with it from there.
Another package to consider when parsing JSON data is jsonlite. It will make data frames for you so you can write them to a csv format with write.table or some other applicable method for writing objects.
NOTE: if it is easier to connect to the MongoDB and get the data from a request, then RMongo may be a good bet. The R-Bloggers also made a post about using RMongo that has a nice little walkthrough.
I used RJSON as suggested by #theWanderer and with the help of a colleague wrote the following code to parse the data into columns, choosing the specific columns that are needed, and checking each of the instances if they return the right variables.
Entire workflow:
Checked some of the data in jsonlint - corrected the errors → },{ instead of }{ between each line and [ and ] at the beginning and end of the file
Made a smaller file to play with, containing about 11 JSON lines
Used the code below to parse the datafile - however, checking the different listItems first if they are not lists themselves (that gives problems) // as you will see, I also removed things like \n because that gave errors and added an empty value for parent_id if there is none in the data (otherwise it would mix up the data)
The code to import the .mongo file into R and then parse it into CSV:
library(rjson)
###### set working directory to write out the data file
setwd("/your/favourite/dir/json to csv/")
#never ever convert strings to factors
options(stringsAsFactors = FALSE)
#import the .mongo file to R
temp.data = fromJSON(file="temp.mongo", method="C", unexpected.escape="error")
file.remove("temp.csv") ## removes the old datafile if there is one
## (so the data is not appended to the file,
## but a new file is created)
listItem = temp.data[[1]] ## prepare the listItem the first time
for (listItem in temp.data){
parent_id = ""
if (length(listItem$parent_id)>0){
parent_id = listItem$parent_id
}
write.table(t(c(
listItem$votes$up_count, listItem$visible, parent_id,
gsub("\n", "", listItem$body), listItem$course_id, unlist(listItem["_type"]),
listItem$endorsed, listItem$anonymous, listItem$author_id,
unlist(listItem$comment_thread_id), listItem$author_username,
as.POSIXct(unlist(listItem$created_at)/1000, origin="1970-01-01"))), # end t(), c()
file="temp.csv", sep="\t", append=TRUE, row.names=FALSE, col.names=FALSE)
}