JSON Structure Types Naming Conventions

I'm seeing JSON presented in a couple of different formats/styles, and I'm wondering if there are any standard names for these different formats/styles.
My searches haven't turned up any info, so I'd appreciate anything anyone could share.
Format 1:
{
"KEYS": ["first", "last", "middle", "age"],
"VALUES": [
["joe", "smith", "a", 34],
["mary", "morris", "p", 65],
["phillip", "jones", "a", 33]
]
}
Format 2:
[{
"first": "joe",
"last": "smith",
"middle": "a",
"age": 34
}, {
"first": "mary",
"last": "morris",
"middle": "p",
"age": 65
}, {
"first": "phillip",
"last": "jones",
"middle": "a",
"age": 33
}]

The first JSON structure is more suitable for table representation while the second is a classic JSON representation of a list of objects.

The second format seems much more standard, as it's using key-value pairs the way JSON intends. It's the format produced by d3.dsv for instance.
It's a bit hard to be definitive though.

A colleague suggested "tabular" for format 1 (I also like "mirrored arrays") and "standard" for format 2. Unless someone knows of some more formal/common names, I'll stick with these for now.
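To make the relationship between the two layouts concrete, here is a minimal Python sketch that converts Format 1 into Format 2 and back. The function names are hypothetical, not any standard API, and the sketch assumes every row has one value per entry in KEYS and every object has the same keys.

```python
import json

def tabular_to_records(doc):
    """Format 1 -> Format 2: pair KEYS with each row in VALUES."""
    keys = doc["KEYS"]
    return [dict(zip(keys, row)) for row in doc["VALUES"]]

def records_to_tabular(records):
    """Format 2 -> Format 1. Key order is taken from the first object (assumed uniform)."""
    if not records:
        return {"KEYS": [], "VALUES": []}
    keys = list(records[0].keys())
    return {"KEYS": keys, "VALUES": [[rec[k] for k in keys] for rec in records]}

format1 = {
    "KEYS": ["first", "last", "middle", "age"],
    "VALUES": [
        ["joe", "smith", "a", 34],
        ["mary", "morris", "p", 65],
        ["phillip", "jones", "a", 33],
    ],
}

records = tabular_to_records(format1)
print(json.dumps(records[0]))  # {"first": "joe", "last": "smith", "middle": "a", "age": 34}
assert records_to_tabular(records) == format1  # the round trip is lossless
```

Format 1 stores each key once, so it is compact; Format 2 repeats the keys but each element is self-describing, which is why most tools hand you Format 2.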

Related

Splunk not recognizing regex

I'm struggling to make a regex work with Splunk. It works on regex101, but Splunk doesn't seem to recognize it!
Regex: \"([\w]+)\":([^,}]+)
Log entry:
May 20 12:22:21 127.0.0.1 {"rootId": "AXIxikL8ao-yaSvA", "requestId": "f6a873jkjjkjk:-8000:5738",
"details": {"flag": false, "title": "task 1", "status": "Waiting", "group": "", "order": 0},
"operation": "Creation", "objectId": "AXIyCN5Oao-H5aYyaSvd", "startDate": 1589977341890,
"objectType": "case_task", "base": true, "object": {"_routing": "AXIxikL8ao-H5aYyaSvA", "flag":
false, "_type": "case_task", "title": "task 1", "createdAt": 1589977341516,
"_parent": "AXIxikL8ao-H5aYyaSvA", "createdBy": "user", "_id": "AXIyCN5Oao-H5aYyaSvd", "id": "AXIyCN5Oao-H5aYyaSvd",
"_version": 1, "order": 0, "status": "Waiting", "group": ""}}
Regex 101 link:
https://regex101.com/r/XBuz9Y/2/
I suspect Splunk may have a different regex syntax, but I don't really know how to adapt it.
Any help?
Thanks!
You may use
... | rex max_match=0 "\"(?<key>\w+)\":(?<value>[^,}]+)"
Here, max_match=0 enables multiple matching (by default, if you do not use the max_match parameter, only the first match is returned), and the named capturing groups (here, (?<key>...) and (?<value>...)) make rex create fields from the matches.
See more about the Splunk rex command.
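If you want to check the pattern's behaviour outside Splunk, a rough Python equivalent is sketched below. It assumes a shortened, hypothetical fragment of the log line; note that Python's re module writes named groups as (?P<name>...), whereas Splunk's PCRE-based rex accepts (?<name>...).

```python
import re

# Same pattern as the rex command above, with Python's (?P<name>...) group syntax.
PAIR = re.compile(r'"(?P<key>\w+)":(?P<value>[^,}]+)')

# A shortened fragment of the log entry from the question (hypothetical test input).
event = ('May 20 12:22:21 127.0.0.1 {"rootId": "AXIxikL8ao-yaSvA", '
         '"details": {"flag": false, "order": 0}, "operation": "Creation"}')

# finditer yields every non-overlapping match, roughly what max_match=0 asks rex for;
# re.search would stop at the first match, like rex's default of max_match=1.
pairs = [(m["key"], m["value"].strip()) for m in PAIR.finditer(event)]
for key, value in pairs:
    print(key, "=", value)
```

Notice that the "details" match swallows the start of the nested object ({"flag": false), so "flag" never gets its own field: a flat regex cannot follow JSON nesting, which is one reason to prefer the spath approach in the other answer.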
Grab the JSON fragment of your event using rex, and then use spath to do the extraction.
rex field=_raw "^[^{]+(?<json>.*)" | spath input=json
This should extract the JSON fields with the appropriate structure.
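The same two-step idea (cut off the syslog prefix, then parse the remainder as JSON) can be sketched in Python. This is only an approximation of what rex plus spath do; the helper names are hypothetical, and flatten only imitates spath's dotted field names (e.g. details.title) while leaving arrays untouched.

```python
import json

def extract_json(raw_event):
    """Drop everything before the first "{" and parse the rest as JSON."""
    start = raw_event.find("{")
    if start == -1:
        raise ValueError("no JSON object found in event")
    return json.loads(raw_event[start:])

def flatten(obj, prefix=""):
    """Turn nested objects into dotted paths such as "details.title"."""
    fields = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            fields.update(flatten(value, path))  # recurse into nested objects
        else:
            fields[path] = value
    return fields

# Shortened, hypothetical version of the event from the question.
event = ('May 20 12:22:21 127.0.0.1 {"rootId": "AXIxikL8ao-yaSvA", '
         '"details": {"flag": false, "title": "task 1", "order": 0}, '
         '"operation": "Creation"}')

fields = flatten(extract_json(event))
print(fields["details.title"])  # task 1
print(fields["details.flag"])   # False
```

Because a real JSON parser does the work, nested values such as details.flag come out correctly, unlike the flat regex.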

json formats - which one to use?

What is the difference between these two JSON formats? Which format should I use?
[{
"employeeid": "12345",
"firstname": "joe",
"lastname": "smith",
"favoritefruit": "apple"
}, {
"employeeid": "45678",
"firstname": "paul",
"lastname": "johnson",
"favoritefruit": "orange"
}]
OR
[
["employeeid", "firstname", "lastname", "favoritefruit"],
["12345", "joe", "smith", "apple"],
["45678", "paul", "johnson", "orange"]
]
Definitely the first one. It gives you an array of employee objects, while the second gives you an array of arrays (a header row followed by value rows), which is more difficult to work with in most languages.
It depends on the context.
The first is much easier to parse if you want to create employee objects to work with.
The second may be better if you only need to work on the "raw" data. Furthermore, the second is much shorter. That's not important for small or medium datasets, but it could matter, for example, if you need to transfer large sets of employee data.
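Both points in the second answer can be checked with a short Python sketch: converting the header-row format back to objects is a one-liner, and the header-row format gets relatively smaller as the dataset grows. The helper names and the 1,000-row dataset are hypothetical, and the sketch assumes every record has the same keys.

```python
import json

def compact_len(value):
    """Length of the JSON text without optional whitespace."""
    return len(json.dumps(value, separators=(",", ":")))

def to_rows(records):
    """Second format: a header row, then one array of values per record."""
    header = list(records[0].keys())
    return [header] + [[rec[k] for k in header] for rec in records]

def from_rows(rows):
    """Back to the first format: pair the header row with each data row."""
    header, *data = rows
    return [dict(zip(header, row)) for row in data]

employees = [
    {"employeeid": "12345", "firstname": "joe", "lastname": "smith", "favoritefruit": "apple"},
    {"employeeid": "45678", "firstname": "paul", "lastname": "johnson", "favoritefruit": "orange"},
]
assert from_rows(to_rows(employees)) == employees  # nothing is lost either way

# Hypothetical bulk data: 1,000 similar employees, to show how the gap grows.
many = [{"employeeid": str(10000 + i), "firstname": "joe", "lastname": "smith",
         "favoritefruit": "apple"} for i in range(1000)]

for label, data in (("2 employees", employees), ("1000 employees", many)):
    print(label, "objects:", compact_len(data), "rows:", compact_len(to_rows(data)))
```

The object format repeats every key for every employee, while the header-row format pays for the keys once.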

Deeply nested JSON documents in Apache Solr

I have a deeply nested document(pseudo structure as shown below):
[{
"id": "1",
"company_id": "1",
"company_name": "company_1",
"departments":[{
"dep1" : [{
"id" : 40,
"name" : "xyz"
},
{
"id" : 41,
"name" : "xyr"
}],
"dep2": [{
}]
}],
"employeePrograms" :[{
}]
}]
How can I index these types of documents in Apache Solr?
The documentation only describes immediate child documents.
Unfortunately, I don't have much experience with this technology, but I want to help. Here is some official documentation that might be useful: official doc
(more specific)
If you run into a specific issue, such as an error, describe it and I'll try my best to help.
Update 1:
Solr can only maintain a 'flat' representation of the data. What you were trying to do is not really possible. There are a number of workarounds, such as using dynamic fields and using a Solr join to link multiple data sets.
Speaking of deep nesting, I've found an example of such a workaround.
If you had something like this:
"docs": [
{
"name": "Product Name",
"categories": [
{
"name": "Category 1",
"priority": 8
},
{
"name": "Category 2",
"priority": 6
}
...
]
},
You would modify it like this so that it is no longer deeply nested:
"docs": [
{
"name": "Product Name",
"categories": [
{
"priority_category": "8_Category 1"
},
{
"priority_category": "6_Category 2"
}
...
]
},
If you've done something similar, check whether there are any errors anywhere.
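The priority_category workaround can be applied mechanically before sending documents to Solr. Below is a minimal Python sketch of that preprocessing step; the helper names are hypothetical, and it assumes each category object has exactly a "name" and a numeric "priority".

```python
def flatten_categories(doc):
    """Replace each nested category object with a single "<priority>_<name>" string."""
    flat = {k: v for k, v in doc.items() if k != "categories"}
    flat["categories"] = [
        {"priority_category": f"{c['priority']}_{c['name']}"}
        for c in doc.get("categories", [])
    ]
    return flat

def split_priority_category(value):
    """Recover (priority, name); split on the first "_" only, so names may contain underscores."""
    priority, name = value.split("_", 1)
    return int(priority), name

product = {
    "name": "Product Name",
    "categories": [
        {"name": "Category 1", "priority": 8},
        {"name": "Category 2", "priority": 6},
    ],
}
flat = flatten_categories(product)
print(flat["categories"][0]["priority_category"])  # 8_Category 1
print(split_priority_category("6_Category 2"))      # (6, 'Category 2')
```

Encoding two values in one string keeps each category a single indexed token, at the cost of having to split it again on the way out.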

Elasticsearch no results for json query

I am learning elasticsearch and following along with the tutorial. I uploaded three documents into an index. When I supply the following query:
curl 'localhost:9200/vehicles/_search?q=driver.name:Jon'
I get back objects two and three, as expected. However, when I try querying using JSON:
curl localhost:9200/vehicles/_search -d'
{
"query":{
"prefix":{
"driver.name":"Jon"
}}}'
I get no results back. I am following the tutorial very closely, so I don't understand what the issue is. Any help would be really appreciated. The uploaded objects are below.
Thank you!
id:one
'{
"color": "green",
"driver": {
"born":"1989-09-12",
"name": "Ben"
},
"make": "BMW",
"model": "Aztek",
"value": 3000.0,
"year": 2003
}'
id:two
'{
"color": "black",
"driver": {
"born":"1934-09-08",
"name": "Jon"
},
"make": "Mercedes",
"model": "Benz",
"value": 10000.0,
"year": 2012
}'
id:three
'{
"color": "green",
"driver": {
"born":"1934-09-08",
"name": "Jon"
},
"make": "BMW",
"model": "Benz",
"value": 10000.0,
"year": 2012
}'
The prefix-query "matches documents that have fields containing terms with a specified prefix (not analyzed)".
Note the "not analyzed"-part. Lucene is looking for anything starting with "Jon" in the index, but the standard analyzer lowercases terms. That is, "jon" is in the index, but "Jon" is not.
Thus, if you lowercase the text in your prefix-query, it should work. Here is a runnable example: https://www.found.no/play/gist/7629456
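The analyzed/not-analyzed mismatch can be modelled in a few lines of Python. This is a toy model, not Lucene: the hypothetical standard_like_analyzer only splits on word characters and lowercases (the real standard analyzer does more), and the index holds just the driver.name values of the three documents.

```python
import re

def standard_like_analyzer(text):
    """Rough stand-in for the standard analyzer: split into words, then lowercase."""
    return [t.lower() for t in re.findall(r"\w+", text)]

# Terms indexed for driver.name in documents one, two and three.
index = {doc_id: standard_like_analyzer(name)
         for doc_id, name in (("one", "Ben"), ("two", "Jon"), ("three", "Jon"))}

def prefix_query(prefix):
    """A prefix query is not analyzed: the prefix is compared as-is to the indexed terms."""
    return sorted(doc_id for doc_id, terms in index.items()
                  if any(t.startswith(prefix) for t in terms))

print(prefix_query("Jon"))  # []
print(prefix_query("jon"))  # ['three', 'two']
```

The indexed term is "jon", so the unanalyzed prefix "Jon" never matches, while the lowercased prefix finds both documents.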
Try:
curl -XGET "http://localhost:9200/vehicles/_search" -d '
{
"query": {"query_string" : { "query" : "driver.name:Jon" }}
}'
In any case, if you are new to Elasticsearch, I really recommend reading the documentation, because there are many types of queries. Besides, the results of queries also depend on how you index the documents, how you define the mapping, etc.
In order to use the prefix query, you need to hit a non-analyzed field. In your mappings for driver.name, if you set "index" to "not_analyzed", you can use the prefix query. Otherwise, you should use a match query or something similar.

JSON format with gzip compression

My current project sends a lot of data to the browser in JSON via ajax requests.
I've been trying to decide which format I should use. The two I have in mind are:
[
{
"colname1" : "content",
"colname2" : "content"
},
{
"colname1" : "content",
"colname2" : "content"
},
...
]
and
{
"columns": [
"column name 1",
"column name 2"
],
"rows": [
[
"content",
"content"
],
[
"content",
"content"
]
...
]
}
The first method is better because it is easier to work with: I just have to convert it to an object once it is received. The second needs some post-processing to convert it into something more like the first so that it is easier to work with in JavaScript.
The second is better because it is less verbose, so it takes up less bandwidth and downloads more quickly. Before compression it is usually between 75% and 85% of the size of the first format.
gzip compression complicates things further, bringing the size ratio nearer to 85% to 95%.
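A quick way to measure this trade-off for your own data is to serialize both layouts and compress them. The Python sketch below uses a hypothetical 500-row, two-column dataset shaped like the question's examples; the ratios it prints depend entirely on that data, so treat them as illustrative only.

```python
import gzip
import json

def sizes(value):
    """Return (raw, gzipped) byte sizes of the compact JSON encoding of value."""
    raw = json.dumps(value, separators=(",", ":")).encode("utf-8")
    return len(raw), len(gzip.compress(raw))

# Hypothetical data: 500 rows with two columns, in both layouts.
objects = [{"colname1": f"content {i}", "colname2": f"content {i * 7}"} for i in range(500)]
columns = ["colname1", "colname2"]
table = {"columns": columns, "rows": [[o[c] for c in columns] for o in objects]}

obj_raw, obj_gz = sizes(objects)
tab_raw, tab_gz = sizes(table)
print(f"raw:  table is {tab_raw / obj_raw:.0%} of objects")
print(f"gzip: table is {tab_gz / obj_gz:.0%} of objects")
```

gzip is very good at the repeated key strings of the object layout, which is why compression narrows the gap between the two formats.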
Which format should I go with and why?
I'd suggest using RJSON:
RJSON (Recursive JSON) converts any JSON data collection into more compact recursive form. Compressed data is still JSON and can be parsed with JSON.parse. RJSON can compress not only homogeneous collections, but any data sets with free structure.
Example:
JSON:
{
"id": 7,
"tags": ["programming", "javascript"],
"users": [
{"first": "Homer", "last": "Simpson"},
{"first": "Hank", "last": "Hill"},
{"first": "Peter", "last": "Griffin"}
],
"books": [
{"title": "JavaScript", "author": "Flanagan", "year": 2006},
{"title": "Cascading Style Sheets", "author": "Meyer", "year": 2004}
]
}
RJSON:
{
"id": 7,
"tags": ["programming", "javascript"],
"users": [
{"first": "Homer", "last": "Simpson"},
[2, "Hank", "Hill", "Peter", "Griffin"]
],
"books": [
{"title": "JavaScript", "author": "Flanagan", "year": 2006},
[3, "Cascading Style Sheets", "Meyer", 2004]
]
}
Shouldn't the second bit of example 1 be "rowname1", etc.? I don't really get example 2, so I would point you towards 1. There is much to be said for having data immediately workable without pre-processing it first. Justification: I once spent too long optimizing an array system that turned out to work perfectly, but it's hell to update it now.