JSON Modelling Approaches

Imagine I am storing a person's phone numbers in JSON format. One such JSON record might look as follows:
{
"firstName": "John",
"lastName": "Smith",
"phoneNumber": [
{
"type": "home",
"number": "212 555-1234"
},
{
"type": "mobile",
"number": "646 555-4567"
}
]
}
One alternative structure to the above is:
{
"firstName": "John",
"lastName": "Smith",
"homePhone": {
"number": "212 555-1234"
},
"mobilePhone": {
"number": "646 555-4567"
}
}
What are the pros and cons of the two modelling approaches? The obvious one I see is that the first approach allows one to retrieve all phones in one go.

To decide what to do in cases like this, you should also think about your implementation.
Let's say for example that you will be parsing and using this with Python. If you put it as a list, you will have to loop through the list in order to find a given number which in the worst case scenario might end up as an O(n) task.
If you re-factor it to be a dictionary (hash table), looking up a phone number by accessing the right key would be closer to O(1).
In summary, what you're doing with your data and how you are going to use it should dictate its structure.
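As a rough illustration of that trade-off, here is a minimal Python sketch using the record from the question. The helper name find_number and the variable phones_by_type are made up for this example, and the re-keying assumes each phone type appears at most once.

```python
record = {
    "firstName": "John",
    "lastName": "Smith",
    "phoneNumber": [
        {"type": "home", "number": "212 555-1234"},
        {"type": "mobile", "number": "646 555-4567"},
    ],
}


def find_number(phones, phone_type):
    """Linear scan over the list form: O(n) in the number of phones."""
    for phone in phones:
        if phone["type"] == phone_type:
            return phone["number"]
    return None


# Re-key the list by type once; later lookups are O(1) on average.
# Assumption: each type appears at most once, otherwise later entries win.
phones_by_type = {p["type"]: p["number"] for p in record["phoneNumber"]}

mobile_from_list = find_number(record["phoneNumber"], "mobile")
mobile_from_dict = phones_by_type.get("mobile")
```

For a handful of phone numbers per person the difference is negligible; it only starts to matter when the collection is large or looked up very often.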

I think your first example is better.
With your first solution the phone numbers are just a collection, and it's easy to add, delete, or filter phone numbers.
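// ES6
const allMobilePhones = user.phoneNumber.filter(phone => phone.type === 'mobile');
// With Lodash/Underscore (an alternative to the line above, not in addition to it)
var allMobilePhones = _.filter(user.phoneNumber, function (phone) {
  // Compare the entry's type, not the entry itself
  return phone.type === 'mobile';
});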
It's also more readable for documentation: you don't have to say "look at the attributes mobilePhone, homePhone, unusedPhone, workPhone". Another thing: if you add a new type of phone, you just add a new type value and nothing else has to change.
If you are going to expose your JSON over an API, take a look at micro-api or json-api.

Related

json formats - which one to use?

What is the difference between these two JSON formats? Which format should I use?
[{
"employeeid": "12345",
"firstname": "joe",
"lastname": "smith",
"favoritefruit": "apple"
}, {
"employeeid": "45678",
"firstname": "paul",
"lastname": "johnson",
"favoritefruit": "orange"
}]
OR
[
["employeeid", "firstname", "lastname", "favoritefruit"],
["12345", "joe", "smith", "apple"],
["45678", "paul", "johnson", "orange"]
]
Definitely the first one. It will create an array of employee objects, while the second one will create an array of arrays of strings, which is more difficult to parse in most languages.
It depends on the context.
The first is much easier to parse if you want to create employee objects to work with.
The second may be better if you need to work on the "raw" data only. Furthermore the second is much shorter. That's not important for small or medium datasets, but could be important for example if you need to transfer large sets of employee data.
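To make the comparison concrete, here is a small Python sketch that converts between the two layouts and compares their serialized sizes. The function names rows_to_objects and objects_to_rows are invented for this example, and it assumes every object has the same keys as the first one.

```python
import json


def rows_to_objects(table):
    """Header row + value rows -> list of dicts (the first format)."""
    if not table:
        return []
    header, *rows = table
    return [dict(zip(header, row)) for row in rows]


def objects_to_rows(objects):
    """List of dicts -> header row + value rows (the second format)."""
    if not objects:
        return []
    header = list(objects[0].keys())
    return [header] + [[obj[key] for key in header] for obj in objects]


compact = [
    ["employeeid", "firstname", "lastname", "favoritefruit"],
    ["12345", "joe", "smith", "apple"],
    ["45678", "paul", "johnson", "orange"],
]

employees = rows_to_objects(compact)

# The compact form repeats the key names once instead of once per record.
size_objects = len(json.dumps(employees))
size_compact = len(json.dumps(compact))
```

The saving grows with the number of records, although compressing the response (for example with gzip) narrows the gap considerably.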

Update MongoDB document using JSON

Is there a way to update a complex MongoDB document from C# using JSON? For example, suppose I have the following document:
{
"name": "John Smith",
"age": 35,
"readingList":
[{
"title": "Title1",
"ISBN": 6246246426724,
"author":
{
"name": "James Johnson",
"age": 40
}
},
{
"title": "Title2",
"ISBN": 3513531513551,
"author":
{
"name": "Sam Hill",
"age": 20
}
}]
}
Now I want to update the age of the second book's author (Sam Hill) from 20 to 21. Suppose I have the following JSON representation:
{
"readingList":
[
{
"title": "Title2",
"author":
{
"age": 21
}
}]
}
Basically the second JSON string is like the first one, minus all the fields and array elements that don't change, except for one field in any array being looked at that uniquely identifies that index. In this case, the "age" field is included since it is being updated with the given value. The "title" field is given to locate the right array element while searching for the field to update. There may also be even more subdocuments and arrays to go through, and the format is not static (it may change at a later time). This is just a simplified example.
Is it possible to pass in something like this to some function and update the correct field that way? Is there something at least similar to this, so I can just pass in some JSON to do the update?
The reason I am looking to do it this way, rather than through simpler means, is because I want to keep track of a history of changes to documents, and if I want to backtrack to an earlier version, I want an easy way to do so that can handle this level of complexity.
UPDATE:
I have some clarifications to make. In this particular scenario I have no way to predict what kinds of changes would need to be made. A change could be made to any field at any time, and that field could be anywhere in the document, possibly at the top level, or within multiple nested subdocuments/arrays. The data we're dealing with is for a separate party that may use it and modify it at will, so we have no control over what they choose to do with it. In addition, there is no fixed schema. The other party could add new fields, including new subdocuments or arrays, or delete them.
The reason I'm asking this question is because I would like to store a history of changes to documents in such a way that I could revert to an older snapshot of the document by applying the changes in reverse. In this case, changing the age from 20 to 21 would revert the document to an older state (assuming that someone messed with the age beforehand and made it 20, and I wanted to fix it back to 21). Since somebody could make any change they wanted to the system, including to the underlying structure of the data itself, I can't just come up with my own schema, or hardcode a solution that changes specific fields using this specific schema.
In this example, the change in age from 20 to 21 would be from a record in the history whose structure I couldn't predict beforehand. So I am looking for an efficient solution to apply an unpredictable update to a document given a simplified JSON representation of the change to be made.
I am also open to alternatives that don't involve JSON if they are fairly efficient. I brought up JSON because I figured that, given MongoDB's usage of JSON to structure documents, it would make the most sense, and perhaps be superior to something like string manipulation. Another alternative I considered would involve storing the change using some kind of custom dot notation, like this: readingList[ISBN:3513531513551].author.age=21
This would require me to create a custom function to interpret the string and turn it into something useful though, so it doesn't sound like the best solution.
I used the JSON document below:
{
"_id" : ObjectId("56a99c121f25cc3a3c709151"),
"name" : "John Smith",
"age" : 35,
"readingList" : [
{
"title" : "Title1",
"ISBN" : NumberLong(6246246426724),
"author" : {
"name" : "James Johnson",
"age" : 40
}
},
{
"title" : "Title2",
"ISBN" : NumberLong(3513531513551),
"author" : {
"name" : "Sam Hill",
"age" : "25"
}
}
]
}
I just used the condition that the author name is Sam Hill, executed the query below in C#, and it works.
IMongoQuery query = Query.And(Query.EQ("name", "John Smith"), Query.EQ("readingList.author.name", "Sam Hill"));
var result =collection.Update(query,
MongoDB.Driver.Builders.Update.Set("readingList.$.author.age", "21"));
You can query your main document. Let's assume your main collection is named "books" and this is the structure:
{
"id":"123",
"name": "John Smith",
"age": 35,
"readingList":
[{
"title": "Title1",
"ISBN": 6246246426724,
"author":
{
"name": "James Johnson",
"age": 40
}
},
{
"title": "Title2",
"ISBN": 3513531513551,
"author":
{
"name": "Sam Hill",
"age": 20
}
}]
}
You need a query that returns the main document, for example by id. Once you have the main document, find the element of the list you want to modify and assign it to a variable, say readItem. Make the modifications you need, and after that update only that element of the array using Set with the positional "$" operator, something like:
readItem.title = "some new title";
readItem.author.age++;
var update = MongoDB.Driver.Builders.Update.Set("readingList.$", BsonDocumentWrapper.Create(readItem));
Update<Book>(query, update);
Actually I would not advise you to choose this kind of data model because in my experience it will get pretty messy. Still, you might have some very specific requirements which might force you to have this and only this data model.
I would create two collections: persons and readinglists.
persons would look like:
{
"id":"123",
"name": "John Smith",
"age": 35
}
and readinglists would look like (note that it has a compound natural id):
{
"_id": { "personid":"123", "title": "Title1"},
"ISBN": 6246246426724,
"author":
{
"name": "James Johnson",
"age": 40
}
}
Then you can easily update the readinglist:
var query = Query.EQ("_id", new BsonDocument(new BsonElement[] { new BsonElement("personid", "123"), new BsonElement("title", "Title1") }));
readingListCollection.Update(query, Update.Set("author.age", 22));
In your data model you need to know the array index of the second document. It is better to model the readingList attribute as a map. In the following example I used the ISBN as the map key:
{
"id":"123",
"name":"John Smith",
"age":35,
"readingList":{
"6246246426724":{
"title":"Title1",
"ISBN":6246246426724,
"author":{
"name":"James Johnson",
"age":40
}
},
"3513531513551":{
"title":"Title2",
"ISBN":3513531513551,
"author":{
"name":"Sam Hill",
"age":20
}
}
}
}
In this data model you can access the second book directly, for instance with dot notation:
db.authors.update(
{ id: "123" },
{ $set: { "readingList.3513531513551.author.age": 22 } }
)
Unfortunately I do not know the C# notation for that, but it should be straightforward.
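With the map model, a nested change document of the kind described in the question can be turned into dot-notation paths mechanically. The following Python sketch only builds the update document; it does not talk to MongoDB, the helper name flatten is made up for this example, and it assumes that keys contain no dots and that every nested value is a plain dict rather than an array.

```python
def flatten(change, prefix=""):
    """Turn a nested dict into {"dotted.path": value} pairs."""
    paths = {}
    for key, value in change.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict) and value:
            # Descend into sub-documents, extending the dotted path.
            paths.update(flatten(value, path))
        else:
            paths[path] = value
    return paths


# A partial document describing only what changed (map model, keyed by ISBN).
change = {"readingList": {"3513531513551": {"author": {"age": 22}}}}

# The resulting document has the same shape as the $set update shown above.
update = {"$set": flatten(change)}
```

Storing the flattened paths together with the previous values would be one way to keep a reversible history, since the inverse change is another document of the same shape.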

JSON Notation - Lists with Single Member

Let's say I have a JSON structure that contains the following:
{
"ROWS": [{
"name": "Greg",
"age": "24"
},
{
"name": "Tom",
"age": "53"
}]
}
The value for the key "ROWS" is a list of dictionaries, right?
Okay, well what if I only have one entry? Is it still appropriate to use list notation, even if that list has a single element?
{
"ROWS": [{
"name": "Greg",
"age": "24"
}]
}
Would there be any reason I could NOT do this?
There is no technical reason why you could not use a list. Your array could be empty and that's perfectly acceptable and valid technically.
For your ROWS property I think the most important thing to consider is how many rows you could possibly have. Apply the engineering principle of generality so that you don't paint yourself into a corner by making ROWS an object. If you can ever expect to have more than one object as a row, even if there is currently only one, then it's absolutely appropriate to use an array.
For example, suppose you expect to get a single unique record, as in a login system. Then it wouldn't make sense to use an array; in this case you should use an object instead:
{
"LOGIN_ROW": {
"name": "Greg",
"age": "24"
}
}
Again, I said "should" because it's up to you how to format your JSON object graph. But of course, if you have a scenario where you have a list of employees, then it makes sense to use an array:
{
"LIST_OF_ROWS": [{
"name": "Greg",
"age": "24"
}]
}
This is perfectly fine because you have one employee at this time but you wish to expand your company so you would expect to get more employees.
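One practical reason to keep the list even for a single element is that consumers never need a special case. The Python sketch below shows the kind of normalizing helper (here called as_list, a name made up for this example) that client code ends up needing when a producer switches between an object and a list depending on the number of rows.

```python
import json


def as_list(value):
    """Return lists unchanged, wrap a single value, map None to []."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


# The same field delivered once as a bare object and once as a list.
single = json.loads('{"ROWS": {"name": "Greg", "age": "24"}}')
many = json.loads(
    '{"ROWS": [{"name": "Greg", "age": "24"}, {"name": "Tom", "age": "53"}]}'
)

# With the helper, both shapes can be iterated the same way.
names_single = [row["name"] for row in as_list(single["ROWS"])]
names_many = [row["name"] for row in as_list(many["ROWS"])]
```

If the producer always emits a list, the helper is unnecessary and the loop works for zero, one, or many rows.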

Should the name of an array of objects in json be pluralized or not

What is the convention: should the name of an array of objects in JSON be pluralized or not?
e.g.
{
"releases": [
{
"id": "0b405ea7-8785-402f-bcf7-d55f5000dc3e",
"title": "Wintertunes"
},
{
"id": "7eb37a3a-646d-4501-a373-e9071186b88d",
"title": "Adventure Magic Supreme Journey Music"
}
]
}
versus
{
"release": [
{
"id": "0b405ea7-8785-402f-bcf7-d55f5000dc3e",
"title": "Wintertunes"
},
{
"id": "7eb37a3a-646d-4501-a373-e9071186b88d",
"title": "Adventure Magic Supreme Journey Music"
}
]
}
I believe the Google JSON Style Guide is a good source for this kind of question:
Arrays usually contain multiple items, and a plural property name reflects this.
Also, the strongest argument for the plural form is the direct object mapping you get from JSON.parse(). You probably want your JavaScript code to deal with pluralized array names.
That said, if you have a million such entries with no compression, that extra 's' will certainly add some bloat :P
In the absence of any strong opinion either way, I decided to pluralize it.

Identifying Duplicates in CouchDB

I'm new to CouchDB and document-oriented databases in general.
I've been playing around with CouchDB, and was able to get familiar with creating documents (with perl) and using the Map/Reduce functions in Futon to query the data and create views.
One of the things I'm still trying to figure out is how to identify duplicate values across documents using Futon's Map/Reduce.
For example, if I have the following documents:
{
"_id": "123",
"name": "carl",
"timestamp": "2012-01-27T17:06:03Z"
}
{
"_id": "124",
"name": "carl",
"timestamp": "2012-01-27T17:07:03Z"
}
And I wanted to get a list of document id's that had duplicate "name" values, is this something I could do with the Futon Map/Reduce?
The result I was hoping to achieve is as follows:
{
"name": "carl",
"dupes": [ "123", "124" ]
}
..or..
{
"carl": [ "123", "124" ]
}
.. which would be the value, and associated document ids which contain those duplicate values.
I've tried a few different things with Map/Reduce, but so far as I understand, the Map function works with data on a per-document basis, and the Reduce functions only allow you to work with the keys/values from a given document.
I know I could just pull the data I need with Perl, work magic there, and get the result I want, but I'm trying to work only with CouchDB for now in order to better understand its benefits and limitations.
Another way I'm thinking about doing this is to use a single document like an RDBMS table:
{
"_id": "names",
"rec1": {
"_id": "123",
"name": "carl",
"timestamp": "2012-01-27T17:06:03Z"
},
"rec2": {
"_id": "124",
"name": "carl",
"timestamp": "2012-01-27T17:07:03Z"
}
}
.. which should allow me to use the Map/Reduce functions in the way I originally thought. However I'm not sure if this is ideal.
I understand that my mind is still stuck in RDBMS land, so much of what I'm trying to do above may not be necessary. Any insight on this would be much appreciated.
Thanks!
Edit: Fixed JSON syntax in some of the examples.
If you merely want a list of unique values, that's pretty easy. If you wish to identify the duplicates, then it gets less easy.
In both cases, a map function like this should suffice:
function (doc) {
emit(doc.name);
}
For your reduce function, just enter _count.
When queried with group=true, your view output will look like this (based on your 2 documents):
{
"rows": [
{ "key": "carl", "value": 2 }
]
}
From there, you will have a list of names as well as their frequency. You can take that list and filter it yourself, or you can take the "all couch" route and use a _list function to perform that final filtering.
function (head, req) {
var row, duplicates = [];
while (row = getRow()) {
if (row.value > 1) {
duplicates.push(row);
}
}
send(JSON.stringify(duplicates));
}
Read up about _list functions, they're pretty handy and versatile.
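If you do the final step on the client instead, the shape asked for in the question (name mapped to the ids of the documents that share it) can be built from the rows of the map-only view, queried without the reduce. This is a minimal Python sketch over already-fetched rows; the function name find_duplicates and the third sample document are made up for this example, and it assumes each row carries the "id" and "key" fields that CouchDB returns for non-reduced views.

```python
from collections import defaultdict


def find_duplicates(rows):
    """Group document ids by emitted key and keep keys seen more than once."""
    ids_by_key = defaultdict(list)
    for row in rows:
        ids_by_key[row["key"]].append(row["id"])
    return {key: ids for key, ids in ids_by_key.items() if len(ids) > 1}


# Rows as a map-only view emitting doc.name would return them.
rows = [
    {"id": "123", "key": "carl", "value": None},
    {"id": "124", "key": "carl", "value": None},
    {"id": "125", "key": "dana", "value": None},  # hypothetical extra document
]

duplicates = find_duplicates(rows)
```

Here duplicates maps "carl" to the ids "123" and "124", while "dana" is dropped because it occurs only once.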