What is the difference between these two JSON formats? Which format should I use?
[{
"employeeid": "12345",
"firstname": "joe",
"lastname": "smith",
"favoritefruit": "apple"
}, {
"employeeid": "45678",
"firstname": "paul",
"lastname": "johnson",
"favoritefruit": "orange"
}]
OR
[
["employeeid", "firstname", "lastname", "favoritefruit"],
["12345", "joe", "smith", "apple"],
["45678", "paul", "johnson", "orange"]
]
Definitely the first one. It produces an array of employee objects, while the second produces an array of arrays of strings, which is more difficult to parse in most languages.
It depends on the context.
The first is much easier to parse if you want to create employee objects to work with.
The second may be better if you need to work on the "raw" data only. Furthermore the second is much shorter. That's not important for small or medium datasets, but could be important for example if you need to transfer large sets of employee data.
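As a rough illustration of both points, here is a minimal Python sketch using only the standard json module. It turns the header-plus-rows layout back into objects and compares the compact serialized sizes. The variable names are made up for this example, and the size difference will depend on your actual data.

```python
import json

# The two payloads from the question, as JSON text.
objects_json = """[
  {"employeeid": "12345", "firstname": "joe", "lastname": "smith", "favoritefruit": "apple"},
  {"employeeid": "45678", "firstname": "paul", "lastname": "johnson", "favoritefruit": "orange"}
]"""
rows_json = """[
  ["employeeid", "firstname", "lastname", "favoritefruit"],
  ["12345", "joe", "smith", "apple"],
  ["45678", "paul", "johnson", "orange"]
]"""

employees = json.loads(objects_json)   # list of dicts, usable directly
header, *rows = json.loads(rows_json)  # first row holds the field names

# Rebuild employee objects by pairing each row with the header.
rebuilt = [dict(zip(header, row)) for row in rows]

# Compare the compact serialized size of the two layouts.
compact = {"separators": (",", ":")}
size_objects = len(json.dumps(employees, **compact))
size_rows = len(json.dumps([header, *rows], **compact))
print(rebuilt == employees, size_objects, size_rows)
```

The second layout needs the extra zip step before you have objects, but the field names are sent only once instead of once per record.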
Working in Nifi, I have the following json structure in the content of a flow file:
{
"firstname": "fred",
"lastname": "jackson",
"dob": "19550607",
"children": [{
"firstname": "janet",
"lastname": "jackson",
"dob": "20020607"
},
{
"firstname": "michael",
"lastname": "jackson",
"dob": "20010201"
},
{
"firstname": "tito",
"lastname": "jackson",
"dob": "20030707"
}
]
}
I want to split this such that I would have three (3) flowfiles, each containing the top level info, but with just one child. For example one of them would look like this:
{
"firstname": "fred",
"lastname": "jackson",
"dob": "19550607",
"children": {
"firstname": "janet",
"lastname": "jackson",
"dob": "20020607"
}
}
Again, I would have three different flow files, one for each child. The output does not have to look exactly like this. The important thing is that I am able to split the structure, yet maintain the common data in each of the resulting flow files.
I tried using SplitJson with a JSONExpression of "$.children", which does give me the three flow files, but I lose the parent info. I could save the key/values for the common elements in attributes, split, and then add them back, but the parent information can be more complex than my example (dynamic fields, etc.), so I am unsure how I would do this.
Appreciate any ideas or thoughts.
The simplest way would be to use ForkRecord with a JSON Reader/Writer.
Set Include Parent Fields to true to retain the parent fields.
However, this may flatten the JSON in a way that you don't want - give it a try.
Alternatively, look at JoltTransformJSON which gives a lot more flexibility, but is quite complex to work out the appropriate spec. You can use https://jolt-demo.appspot.com/#inception to test your JOLT Specs.
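To make the target shape concrete, here is a minimal sketch of the split in plain Python. It is not NiFi code and not a Jolt spec; split_by_child is a hypothetical helper written for this example. It may still be handy for checking that whatever ForkRecord or Jolt produces matches what you expect.

```python
import copy
import json

def split_by_child(record, child_key="children"):
    """Return one copy of `record` per child, each keeping all parent fields."""
    results = []
    for child in record.get(child_key, []):
        # Copy every field except the child list, so arbitrary
        # (even dynamic) parent fields survive the split.
        part = {k: copy.deepcopy(v) for k, v in record.items() if k != child_key}
        part[child_key] = copy.deepcopy(child)
        results.append(part)
    return results

record = json.loads("""
{
  "firstname": "fred", "lastname": "jackson", "dob": "19550607",
  "children": [
    {"firstname": "janet", "lastname": "jackson", "dob": "20020607"},
    {"firstname": "michael", "lastname": "jackson", "dob": "20010201"},
    {"firstname": "tito", "lastname": "jackson", "dob": "20030707"}
  ]
}
""")

parts = split_by_child(record)
for part in parts:
    print(json.dumps(part))
```

Because the parent fields are copied generically rather than listed by name, the sketch does not need to know them in advance.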
I'm seeing JSON presented in a couple of different formats/styles, and I'm wondering if there are any standard names for these different formats/styles.
My searches haven't turned up any info - I'd appreciate anything anyone could share.
Format 1:
{
"KEYS": ["first", "last", "middle", "age"],
"VALUES": [
["joe", "smith", "a", 34],
["mary", "morris", "p", 65],
["phillip", "jones", "a", 33]
]
}
Format 2:
[{
"first": "joe",
"last": "smith",
"middle": "a",
"age": 34
}, {
"first": "mary",
"last": "morris",
"middle": "p",
"age": 65
}, {
"first": "phillip",
"last": "jones",
"middle": "a",
"age": 33
}]
The first JSON structure is more suitable for table representation while the second is a classic JSON representation of a list of objects.
The second format seems much more standard, as it's using key-value pairs the way JSON intends. It's the format produced by d3.dsv for instance.
It's a bit hard to be definitive though.
A colleague suggested "tabular" for format 1 (I also like "mirrored arrays") and "standard" for format 2. Unless someone knows of some more formal/common names, I'll stick with these for now.
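Whatever you call them, the two formats carry the same information and are easy to convert between. Here is a minimal Python sketch; the function names are made up for this example, and it assumes every record in the "standard" format has the same keys.

```python
def tabular_to_records(table):
    """Format 1 ("tabular") -> Format 2 ("standard")."""
    return [dict(zip(table["KEYS"], row)) for row in table["VALUES"]]

def records_to_tabular(records):
    """Format 2 ("standard") -> Format 1 ("tabular").

    Assumes all records share the keys of the first record.
    """
    keys = list(records[0]) if records else []
    return {"KEYS": keys, "VALUES": [[rec[k] for k in keys] for rec in records]}

table = {
    "KEYS": ["first", "last", "middle", "age"],
    "VALUES": [
        ["joe", "smith", "a", 34],
        ["mary", "morris", "p", 65],
        ["phillip", "jones", "a", 33],
    ],
}

records = tabular_to_records(table)
print(records[0])
print(records_to_tabular(records) == table)
```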
Imagine I am storing a person's phone numbers in JSON format. One such JSON record might look as follows:
{
"firstName": "John",
"lastName": "Smith",
"phoneNumber": [
{
"type": "home",
"number": "212 555-1234"
},
{
"type": "mobile",
"number": "646 555-4567"
}
]
}
One alternative structure to the above is:
{
"firstName": "John",
"lastName": "Smith",
"homePhone": {
"number": "212 555-1234"
},
"mobilePhone": {
"number": "646 555-4567"
}
}
What are the pros and cons of the two modelling approaches? The obvious one I see is that the first approach allows one to retrieve all phones in one go.
To decide what to do in these cases, you should think about your implementation too.
Let's say for example that you will be parsing and using this with Python. If you put it as a list, you will have to loop through the list in order to find a given number which in the worst case scenario might end up as an O(n) task.
If you re-factor it to be a dictionary (hash table), looking up a phone number by accessing the right key would be closer to O(1).
In summary, what you're doing with your data and how you are going to use it should dictate its structure.
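A minimal Python sketch of the two access patterns, using the record from the question. find_number and by_type are names invented for this example, and the dictionary version assumes there is at most one number per type.

```python
record = {
    "firstName": "John",
    "lastName": "Smith",
    "phoneNumber": [
        {"type": "home", "number": "212 555-1234"},
        {"type": "mobile", "number": "646 555-4567"},
    ],
}

def find_number(phones, phone_type):
    """Linear scan over the list: O(n) in the worst case."""
    for phone in phones:
        if phone["type"] == phone_type:
            return phone["number"]
    return None

# Index the list once by type; each later lookup is then a hash lookup.
# Assumes one number per type (a later duplicate would overwrite an earlier one).
by_type = {phone["type"]: phone["number"] for phone in record["phoneNumber"]}

print(find_number(record["phoneNumber"], "mobile"))
print(by_type.get("home"))
```

For a handful of phone numbers the difference is negligible, so readability and flexibility may well matter more than the lookup cost.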
I think your first example is better.
With your first solution the phone numbers are just a collection, and it's easy to add, delete, or filter phone numbers.
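// `user` is the parsed record from the question; its phone list is under `phoneNumber`
// ES6
const allMobilePhones = user.phoneNumber.filter(phone => phone.type === 'mobile');
// With Lodash/Underscore
var allMobilePhones = _.filter(user.phoneNumber, function(phone){
return phone.type === 'mobile';
});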
It's also more readable for documentation: you don't have to tell readers to look at the attributes mobilePhone, homePhone, unusedPhone, and workPhone. Another thing: if you add a new type of phone, nothing else has to change; you just add a new type value.
If you are working to expose your JSON over an API, take a look at micro-api or json-api.
I want to be able to access deeper elements of a JSON document stored in a json field in a PostgreSQL database. For example, I would like to be able to access the elements along the path states->events->time in the JSON provided below. Here is the PostgreSQL query I'm using:
SELECT
data#>> '{userId}' as user,
data#>> '{region}' as region,
data#>>'{priorTimeSpentInApp}' as priorTimeSpentInApp,
data#>>'{userAttributes, "Total Friends"}' as totalFriends
from game_json
WHERE game_name LIKE 'myNewGame'
LIMIT 1000
and here is an example record from the json field
{
"region": "oh",
"deviceModel": "inHouseDevice",
"states": [
{
"events": [
{
"time": 1430247045.176,
"name": "Session Start",
"value": 0,
"parameters": {
"Balance": "40"
},
"info": ""
},
{
"time": 1430247293.501,
"name": "Mission1",
"value": 1,
"parameters": {
"Result": "Win ",
"Replay": "no",
"Attempt Number": "1"
},
"info": ""
}
]
}
],
"priorTimeSpentInApp": 28989.41467999999,
"country": "CA",
"city": "vancouver",
"isDeveloper": true,
"time": 1430247044.414,
"duration": 411.53,
"timezone": "America/Cleveland",
"priorSessions": 47,
"experiments": [],
"systemVersion": "3.8.1",
"appVersion": "14312",
"userId": "ef617d7ad4c6982e2cb7f6902801eb8a",
"isSession": true,
"firstRun": 1429572011.15,
"priorEvents": 69,
"userAttributes": {
"Total Friends": "0",
"Device Type": "Tablet",
"Social Connection": "None",
"Item Slots Owned": "12",
"Total Levels Played": "0",
"Retention Cohort": "Day 0",
"Player Progression": "0",
"Characters Owned": "1"
},
"deviceId": "ef617d7ad4c6982e2cb7f6902801eb8a"
}
That SQL query works, except that it doesn't give me any return values for totalFriends (e.g. data#>>'{userAttributes, "Total Friends"}' as totalFriends). I assume that part of the problem is that events falls within a square bracket (I don't know what that indicates in the json format) as opposed to a curly brace, but I'm also unable to extract values from the userAttributes key.
I would appreciate it if anyone could help me.
I'm sorry if this question has been asked elsewhere. I'm so new to postgresql and even json that I'm having trouble coming up with the proper terminology to find the answers to this (and related) questions.
You should definitely familiarize yourself with the basics of JSON and with the JSON functions and operators in Postgres.
In the second source pay attention to the operators -> and ->>.
General rule: use -> to get a json object, ->> to get a json value as text.
Using these operators you can rewrite your query in the way which returns correct value of 'Total Friends':
select
data->>'userId' as user,
data->>'region' as region,
data->>'priorTimeSpentInApp' as priorTimeSpentInApp,
data->'userAttributes'->>'Total Friends' as totalFriends
from game_json
where game_name like 'myNewGame';
Square brackets in JSON delimit an array; the JSON objects inside them are the elements of that array.
Json arrays may have many elements.
The elements are accessed by an index.
Json arrays are indexed from 0 (the first element of an array has an index 0).
Example:
select
data->'states'->0->'events'->1->>'name'
from game_json
where game_name like 'myNewGame';
-- returns "Mission1"
This did help me
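If it helps to see the same traversal outside SQL, here is a minimal Python sketch on a trimmed copy of the record from the question. Objects become dictionaries accessed by key and arrays become lists accessed by a 0-based index, which mirrors the -> steps above. The variable names are made up for this example.

```python
import json

# Trimmed copy of the record from the question.
doc = json.loads("""
{
  "states": [
    {"events": [
      {"time": 1430247045.176, "name": "Session Start"},
      {"time": 1430247293.501, "name": "Mission1"}
    ]}
  ],
  "userAttributes": {"Total Friends": "0"}
}
""")

# Same path as data->'states'->0->'events'->1->>'name'
second_event_name = doc["states"][0]["events"][1]["name"]

# Same path as data->'userAttributes'->>'Total Friends'
total_friends = doc["userAttributes"]["Total Friends"]

# states->events->time for every event in every state
event_times = [event["time"] for state in doc["states"] for event in state["events"]]

print(second_event_name, total_friends, event_times)
```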
Let's say I have a JSON structure that contains the following:
{
"ROWS": [{
"name": "Greg",
"age": "24"
},
{
"name": "Tom",
"age": "53"
}]
}
The value for the key "ROWS" is a list of dictionaries, right?
Okay, well what if I only have one entry? Is it still appropriate to use list notation, even if that list has a single element?
{
"ROWS": [{
"name": "Greg",
"age": "24"
}]
}
Would there be any reason I could NOT do this?
There is no technical reason why you could not use a list. Your array could be empty and that's perfectly acceptable and valid technically.
For your ROWS property I think the most important thing to consider is how many rows you could possibly have. You want to apply the computer engineering principle of generality to make sure you don't paint yourself into a corner by making ROWS an object. If you can expect to ever have more than one object as a row, even if currently there is only one, then it's absolutely appropriate to use an array.
For example, let's assume you expect a single, unique record, such as in a login system. Then it wouldn't make sense to use an array; in this case you should use an object instead:
{
"LOGIN_ROW": {
"name": "Greg",
"age": "24"
}
}
Again, I say "should" because it's up to you how to format your JSON object graph. But of course, if you have a scenario where you have a list of employees, then it makes sense to use an array:
{
"LIST_OF_ROWS": [{
"name": "Greg",
"age": "24"
}]
}
This is perfectly fine: you have one employee at this time, but you wish to expand your company, so you would expect to get more employees.
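One practical upside of always using an array is that the consuming code does not need a special case for a single entry. A minimal Python sketch follows; names is a hypothetical helper for this example, and the payloads are written without the trailing commas, which strict JSON parsers reject.

```python
import json

def names(payload_text):
    """Collect the name of every row; works for zero, one, or many rows."""
    payload = json.loads(payload_text)
    return [row["name"] for row in payload["ROWS"]]

empty = '{"ROWS": []}'
single = '{"ROWS": [{"name": "Greg", "age": "24"}]}'
several = '{"ROWS": [{"name": "Greg", "age": "24"}, {"name": "Tom", "age": "53"}]}'

for text in (empty, single, several):
    print(names(text))
```

If ROWS were an object for one entry and an array for several, the helper would have to check the type first.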