I am trying to learn MongoDB. Suppose there are two related tables, for example:
1st table has
First name- Fred, last name- Zhang, age- 20, id- s1234
2nd table has
id- s1234, course- COSC2406, semester- 1
id- s1234, course- COSC1127, semester- 1
id- s1234, course- COSC2110, semester- 1
How do I insert this data into MongoDB? I wrote it like this, but I am not sure whether it is correct:
db.users.insert({
given_name: 'Fred',
family_name: 'Zhang',
Age: 20,
student_number: 's1234',
Course: ['COSC2406', 'COSC1127', 'COSC2110'],
Semester: 1
});
Thank you in advance
That would be correct, assuming that what you want to model treats "student_number" and "Semester" together as what is basically a unique identifier for the entries. But there is a way to do this without accumulating the array contents in code.
You can make use of the upsert functionality in the .update() method, with the help of a few other operators in the statement.
I am going to assume you are doing this inside a loop of sorts, so everything on the right side is actually a variable:
db.users.update(
{
"student_number": student_number,
"Semester": semester
},
{
"$setOnInsert": {
"given_name": given_name,
"family_name": family_name,
"Age": age
},
"$addToSet": { "courses": course }
},
{ "upsert": true }
)
What an "upsert" operation does is first look for a document in your collection that matches the given query criteria. In this case, that is a "student_number" with the current "Semester" value.
When that match is found, the document is merely "updated". What is being done here is using the $addToSet operator to add only unique values to the "courses" array. It seems to make sense to have unique courses, but if that is not your case you can simply use the $push operator instead. This is the operation you want to happen every time, whether the document was "matched" or not.
In the case where no "matching" document is found, a new document will then be inserted into the collection. This is where the $setOnInsert operator comes in.
The point of that section is that it is only applied when a new document is created, as there is no need to update those fields with the same information every time. In addition to this, the fields you specified in the query criteria have explicit values, so the behavior of the "upsert" is to automatically create those fields with those values in the newly created document.
After a new document is created, then the next "upsert" statement that uses the same criteria will of course only "update" the now existing document, and as such only your new course information would be added.
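To make the matched / not-matched behaviour concrete, here is a small in-memory model of it in plain Python. It does not talk to MongoDB at all; `upsert_course` is a hypothetical helper and the dict stands in for the collection, so treat it only as a sketch of the semantics described above.

```python
# In-memory model of an upsert that combines $setOnInsert and $addToSet.
# NOT the MongoDB API: `upsert_course` is a hypothetical helper and the
# dict `users` stands in for the collection.

def upsert_course(collection, student_number, semester, names, course):
    """Mimic update(query, {...}, {"upsert": true}) keyed on number + semester."""
    key = (student_number, semester)
    doc = collection.get(key)
    if doc is None:
        # No match: insert. The query fields are created automatically and
        # the $setOnInsert fields are written only this one time.
        doc = {"student_number": student_number, "Semester": semester,
               **names, "courses": []}
        collection[key] = doc
    # This part runs on every call, matched or not.
    # Like $addToSet, it only adds values that are not already present.
    if course not in doc["courses"]:
        doc["courses"].append(course)
    return doc


users = {}
fred = {"given_name": "Fred", "family_name": "Zhang", "Age": 20}
for course in ["COSC2406", "COSC1127", "COSC2110", "COSC2406"]:
    upsert_course(users, "s1234", 1, fred, course)

print(users[("s1234", 1)]["courses"])
# ['COSC2406', 'COSC1127', 'COSC2110']
```

The repeated "COSC2406" is ignored, and only one document exists after four calls.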
Overall working like this allows you to "pre-join" the two tables from your source with an appropriate query. Then you are just looping the results without needing to write code for trying to group the correct entries together and simply letting MongoDB do the accumulation work for you.
Of course you can always just write the code to do this yourself and it would result in fewer "trips" to the database in order to insert your already accumulated records if that would suit your needs.
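If you go the "accumulate it yourself" route, a rough sketch in Python could look like the following. The row layout and the `accumulate` helper are assumptions made for illustration; each resulting document could then be inserted once.

```python
# Group enrolment rows per (student, semester) on the client side, so each
# document is written once. The row layout below is an assumed one.

students = [
    {"id": "s1234", "given_name": "Fred", "family_name": "Zhang", "age": 20},
]
enrolments = [
    {"id": "s1234", "course": "COSC2406", "semester": 1},
    {"id": "s1234", "course": "COSC1127", "semester": 1},
    {"id": "s1234", "course": "COSC2110", "semester": 1},
]


def accumulate(students, enrolments):
    """Return one document per (student_number, Semester) with its courses."""
    by_id = {s["id"]: s for s in students}
    docs = {}
    for row in enrolments:
        key = (row["id"], row["semester"])
        if key not in docs:
            student = by_id[row["id"]]
            docs[key] = {
                "given_name": student["given_name"],
                "family_name": student["family_name"],
                "Age": student["age"],
                "student_number": row["id"],
                "Semester": row["semester"],
                "courses": [],
            }
        if row["course"] not in docs[key]["courses"]:
            docs[key]["courses"].append(row["course"])
    return list(docs.values())


documents = accumulate(students, enrolments)
print(documents[0]["courses"])
# ['COSC2406', 'COSC1127', 'COSC2110']
```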
As a final note, though it does require some additional complexity, you can get better performance out of the operation as shown by using the newly introduced "batch updates" functionality. For this your MongoDB server version will need to be 2.6 or higher. But that is one way of still reducing the logic while maintaining fewer actual "over the wire" writes to the database.
You can either have two separate collections, one with student details and the other with courses, and link them with "id".
Or you can have a single document with the courses as an embedded array of subdocuments, as below:
{
"FirstName": "Fred",
"LastName": "Zhang",
"age": 20,
"id": "s1234",
"Courses": [
{
"courseId": "COSC2406",
"semester": 1
},
{
"courseId": "COSC1127",
"semester": 1
},
{
"courseId": "COSC2110",
"semester": 1
},
{
"courseId": "COSC2110",
"semester": 2
}
]
}
I'm starting to explore the JSON1 library for sqlite and have been so far successful in the basic queries I've created. I'm now looking to create a more complicated query that pulls data from multiple levels.
Here's the example JSON object I'm starting with (and most of the data is very similar).
{
"height": 140.0,
"id": "cp",
"label": {
"bind": "cp_label"
},
"type": "color_picker",
"user_data": {
"my_property": 2
},
"uuid": "948cb959-74df-4af8-9e9c-c3cb53ac9915",
"value": {
"bind": "cp_color"
},
"width": 200.0
}
This json object is buried about seven levels deep in a json structure and I pulled it from the larger json construct using an sql statement like this:
SELECT value FROM forms, json_tree(forms.formJSON, '$.root')
WHERE type = 'object'
AND json_extract(value, '$.id') = #sControlID
-- In this example, #sControlID is a variable that represents the `id` value we're looking for, which is 'cp'
But what I really need to pull from this object are the following:
the value from key type ("color_picker" in this example)
the values from keys bind ("cp_color" and "cp_label" in this example)
the keys value and label (which have values of {"bind":"<string>"} in this example)
For that last item, the key name (value and label in this case) can be any number of keywords, but no matter the keyword, the value will be an object of the form {"bind":"<some_string>"}. Also, there could be multiple keys that have a bind object associated with them, and I'd need to return all of them.
For the first two items, the keywords will always be type and bind.
With the json example above, I'd ideally like to retrieve two rows:
type          key    value
color_picker  value  cp_color
color_picker  label  cp_label
When I use json_extract methods, I end up retrieving the object {"bind":"cp_color"} from the json_tree table, but I also need to retrieve the data from the parent object. I feel like I need to do some kind of union, but my attempts have so far been unsuccessful. Any ideas here?
Note: if the {"bind":"<string>"} object doesn't exist as a child of the parent object, I don't want any rows returned.
Well, I was on the right track and eventually figured it out. I created a separate query for each of the items I was looking for, then INNER JOINed the json_tree tables from each of the queries to have all the required fields available. Then I used json_extract to pull the required data from each of the JSON fields I needed. In the end, it gave me exactly what I was looking for, though I'm sure it could be written more efficiently.
For anyone interested, this is what the final query ended up looking like:
SELECT IFNULL(json_extract(parent.value, '$.type'), '_window_'),
       child.key,
       json_extract(child.value, '$.bind')
FROM (SELECT json_tree.*
      FROM nui_forms, json_tree(nui_forms.formJSON, '$')
      WHERE type = 'object'
        AND json_extract(nui_forms.formJSON, '$.id') = #sWindowID) parent
INNER JOIN (SELECT json_tree.*
            FROM nui_forms, json_tree(nui_forms.formJSON, '$')
            WHERE type = 'object'
              AND json_extract(value, '$.bind') != 'NULL'
              AND json_extract(nui_forms.formJSON, '$.id') = #sWindowID) child
        ON child.parent = parent.id;
If you have any tips on reducing its complexity, feel free to comment!
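One possible simplification, sketched here with Python's built-in sqlite3 module (assuming the bundled SQLite has the JSON functions enabled), is to join json_tree to itself directly instead of through two subqueries. The table name `forms`, the small wrapper document, and the lookup by the control's own `id` are assumptions for this demo; the `IFNULL(..., '_window_')` fallback from the query above is left out.

```python
import json
import sqlite3

# The control object from the question, nested a few levels deep (assumed layout).
control = {
    "height": 140.0, "id": "cp", "label": {"bind": "cp_label"},
    "type": "color_picker", "user_data": {"my_property": 2},
    "uuid": "948cb959-74df-4af8-9e9c-c3cb53ac9915",
    "value": {"bind": "cp_color"}, "width": 200.0,
}
form = {"root": {"children": [control]}}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE forms (formJSON TEXT)")
conn.execute("INSERT INTO forms VALUES (?)", (json.dumps(form),))

# p = candidate parent object, c = one of its direct children.
# The CASE guards make sure json_extract only runs on rows that are objects.
QUERY = """
SELECT json_extract(p.value, '$.type') AS type,
       c.key,
       json_extract(c.value, '$.bind') AS bind
FROM forms,
     json_tree(forms.formJSON) AS p,
     json_tree(forms.formJSON) AS c
WHERE c.parent = p.id
  AND CASE WHEN p.type = 'object'
           THEN json_extract(p.value, '$.id') END = ?
  AND CASE WHEN c.type = 'object'
           THEN json_extract(c.value, '$.bind') END IS NOT NULL
ORDER BY c.key
"""

rows = conn.execute(QUERY, ("cp",)).fetchall()
for row in rows:
    print(row)
```

If the control has no child of the form {"bind": ...}, the join produces no rows, which matches the requirement in the question.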
I've got a couple hundred JSONs in a structure like the following example:
{
"JsonExport": [
{
"entities": [
{
"identity": "ENTITY_001",
"surname": "SMIT",
"entityLocationRelation": [
{
"parentIdentification": "PARENT_ENTITY_001",
"typeRelation": "SEEN_AT",
"locationIdentity": "LOCATION_001"
},
{
"parentIdentification": "PARENT_ENTITY_001",
"typeRelation": "SEEN_AT",
"locationIdentity": "LOCATION_002"
}
],
"entityEntityRelation": [
{
"parentIdentification": "PARENT_ENTITY_001",
"typeRelation": "FRIENDS_WITH",
"childIdentification": "ENTITY_002"
}
]
},
{
"identity": "ENTITY_002",
"surname": "JACKSON",
"entityLocationRelation": [
{
"parentIdentification": "PARENT_ENTITY_002",
"typeRelation": "SEEN_AT",
"locationIdentity": "LOCATION_001"
}
]
},
{
"identity": "ENTITY_003",
"surname": "JOHNSON"
}
],
"identification": "REGISTRATION_001",
"locations": [
{
"city": "LONDON",
"identity": "LOCATION_001"
},
{
"city": "PARIS",
"identity": "LOCATION_002"
}
]
}
]
}
With these JSON's, I want to make a graph consisting of the following nodes: Registration, Entity and Location. This part I've figured out and made the following:
WITH "file:///example.json" AS json_file
CALL apoc.load.json(json_file,"$.JsonExport.*" ) YIELD value AS data
MERGE(r:Registration {id:data.identification})
WITH json_file
CALL apoc.load.json(json_file,"$.JsonExport..locations.*" ) YIELD value AS locations
MERGE(l:Locations{identity:locations.identity, name:locations.city})
WITH json_file
CALL apoc.load.json(json_file,"$.JsonExport..entities.*" ) YIELD value AS entities
MERGE(e:Entities {name:entities.surname, identity:entities.identity})
All the entities and locations should have a relation with the registration. I thought I could do this by using the following code:
MERGE (e)-[:REGISTERED_ON]->(r)
MERGE (l)-[:REGISTERED_ON]->(r)
However, this code doesn't give the desired output. It creates extra "empty" nodes and doesn't connect to the registration node. So the first question is: how do I connect the location and entity nodes to the registration node? And in light of the other JSON files, the entities and locations should only be linked to their specific registration.
Furthermore, I would like to make the entity -> location relation and the entity - entity relation and use the given type of relation (SEEN_AT or FRIENDS_WITH) as label for the given relation. How can this be done? I'm kind of lost at this point and don’t see how to solve this. If someone could guide me into the right direction I would be much obliged.
Variable names (like e and r) are not stored in the DB, and are bound to values only within individual queries. MERGE on a pattern with an unbound variable will just create the entire pattern (including creating an empty node for unbound node variables).
When you MERGE a node, you should only specify the unique identifying property for that node, to avoid duplicates. Any other properties you want to set at the time of creation should be set using ON CREATE SET.
It is inefficient to parse through the JSON data 3 times to get different areas of the data. And it is especially inefficient the way your query was doing it, since each subsequent CALL/MERGE group of clauses would be done multiple times (since every previous CALL produces multiple rows, and the number of rows increases multiplicatively). You can use aggregation to get around that, but it is unnecessary in your case, since you can just do the entire query in a single pass through the JSON data.
This may work for you:
WITH "file:///example.json" AS json_file
CALL apoc.load.json(json_file,"$.JsonExport.*" ) YIELD value AS data
MERGE(r:Registration {id:data.identification})
FOREACH(ent IN data.entities |
MERGE (e:Entities {identity: ent.identity})
ON CREATE SET e.name = ent.surname
MERGE (e)-[:REGISTERED_ON]->(r)
FOREACH(loc1 IN ent.entityLocationRelation |
MERGE (l1:Locations {identity: loc1.locationIdentity})
MERGE (e)-[:SEEN_AT]->(l1))
FOREACH(ent2 IN ent.entityEntityRelation |
MERGE (e2:Entities {identity: ent2.childIdentification})
MERGE (e)-[:FRIENDS_WITH]->(e2))
)
FOREACH(loc IN data.locations |
MERGE (l:Locations{identity:loc.identity})
ON CREATE SET l.name = loc.city
MERGE (l)-[:REGISTERED_ON]->(r)
)
For simplicity, it hard-codes the SEEN_AT and FRIENDS_WITH relationship types instead of reading them from typeRelation, as MERGE only supports hard-coded relationship types.
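To check what a single pass over the data should end up creating, independent of Neo4j, here is a plain-Python sketch that walks the structure once and collects nodes and relationships into sets (the sets loosely mimic the "create only if missing" behaviour of MERGE). The `build_graph` helper and the tuple layout are made up for illustration, and the relationship type is taken from `typeRelation` rather than hard-coded.

```python
# Walk the export once and collect (label, identity) nodes and
# (start, type, end) relationships. Illustration only, no Neo4j involved.

def build_graph(export):
    nodes, rels = set(), set()
    for data in export["JsonExport"]:
        reg = ("Registration", data["identification"])
        nodes.add(reg)
        for loc in data.get("locations", []):
            loc_node = ("Locations", loc["identity"])
            nodes.add(loc_node)
            rels.add((loc_node, "REGISTERED_ON", reg))
        for ent in data.get("entities", []):
            ent_node = ("Entities", ent["identity"])
            nodes.add(ent_node)
            rels.add((ent_node, "REGISTERED_ON", reg))
            for rel in ent.get("entityLocationRelation", []):
                target = ("Locations", rel["locationIdentity"])
                nodes.add(target)
                rels.add((ent_node, rel["typeRelation"], target))
            for rel in ent.get("entityEntityRelation", []):
                target = ("Entities", rel["childIdentification"])
                nodes.add(target)
                rels.add((ent_node, rel["typeRelation"], target))
    return nodes, rels


export = {"JsonExport": [{
    "identification": "REGISTRATION_001",
    "entities": [
        {"identity": "ENTITY_001", "surname": "SMIT",
         "entityLocationRelation": [
             {"typeRelation": "SEEN_AT", "locationIdentity": "LOCATION_001"},
             {"typeRelation": "SEEN_AT", "locationIdentity": "LOCATION_002"}],
         "entityEntityRelation": [
             {"typeRelation": "FRIENDS_WITH", "childIdentification": "ENTITY_002"}]},
        {"identity": "ENTITY_002", "surname": "JACKSON",
         "entityLocationRelation": [
             {"typeRelation": "SEEN_AT", "locationIdentity": "LOCATION_001"}]},
        {"identity": "ENTITY_003", "surname": "JOHNSON"},
    ],
    "locations": [
        {"city": "LONDON", "identity": "LOCATION_001"},
        {"city": "PARIS", "identity": "LOCATION_002"},
    ],
}]}

nodes, rels = build_graph(export)
print(len(nodes), len(rels))
# 6 9
```

Comparing these counts with what ends up in the database is a quick way to spot the extra "empty" nodes mentioned in the question.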
So, playing with Neo4j/Cypher, I've learned some new stuff and came to another solution for the problem. Based on the given example data, the following can create the nodes and edges dynamically.
WITH "file:///example.json" AS json_file
CALL apoc.load.json(json_file,"$.JsonExport.*" ) YIELD value AS data
CALL apoc.merge.node(['Registration'], {id:data.identification}, {},{}) YIELD node AS vReg
UNWIND data.entities AS ent
CALL apoc.merge.node(['Person'], {id:ent.identity}, {}, {id:ent.identity, surname:ent.surname}) YIELD node AS vPer1
UNWIND ent.entityEntityRelation AS entRel
CALL apoc.merge.node(['Person'],{id:entRel.childIdentification},{id:entRel.childIdentification},{}) YIELD node AS vPer2
CALL apoc.merge.relationship(vPer1, entRel.typeRelation, {},{},vPer2) YIELD rel AS ePer
UNWIND data.locations AS loc
CALL apoc.merge.node(['Location'], {id:loc.identity}, {name:loc.city}) YIELD node AS vLoc
UNWIND ent.entityLocationRelation AS locRel
CALL apoc.merge.relationship(vPer1, locRel.typeRelation, {},{},vLoc) YIELD rel AS eLoc
CALL apoc.merge.relationship(vLoc, "REGISTERED_ON", {},{},vReg) YIELD rel AS eReg1
CALL apoc.merge.relationship(vPer1, "REGISTERED_ON", {},{},vReg) YIELD rel AS eReg2
CALL apoc.merge.relationship(vPer2, "REGISTERED_ON", {},{},vReg) YIELD rel AS eReg3
RETURN vPer1,vPer2, vReg, vLoc, eLoc, eReg1, eReg2, eReg3
I have two database tables with a one-to-many relationship between them, plus a parent-child relationship within each table. The reference table (the "one" side) works as a master table and the reference_copies table (the "many" side) holds replicas of the master.
I want to create a UI form in AngularJS that lets the user insert and update this data. As shown in UI_image, the user can add as many levels as he/she wants. I have also attached an image of the database table structure.
Data may already exist in the reference_copies table, as we also upload it through Excel files. The name and type_id columns together form a unique constraint. So when the user tries to add a level, I need to check whether the name already exists for that type. If it exists, fetch the object; otherwise save it and fetch the saved object (with its created id). The value and selected type are the same for all levels, so the user needs to select them only once.
On the final submit, each level of the master form is mapped to the corresponding level of each reference copy, i.e. the parent name of the master is mapped to the parent names of reference_copies 1, 2, and so on. Likewise, level 1 of the master is mapped to level 1 of reference_copies 1 and 2, and so on. If a level has no counterpart in the other form, nothing happens and that level is not mapped. The forms are not required to have the same number of levels. As shown in the example, the master form has two levels, the reference copy 1 form has only one level, and the reference copy 2 form has three levels.
On the final Submit button, I want to build the JSON payload below. Also, when I get the response JSON in the same format with IDs, the form should be filled in as shown above for the update.
{
"name": "Reference Name",
"childs": [
{
"name": "child level1 name",
"childs": [
{
"name": "childlevel2 name",
"childs":[],
"referenceCopies": [
{
"id" : 2004
}
]
}
],
"referenceCopies": [
{
"id": 2001
},
{
"id": 2003
}
]
}
],
"referenceCopies": [
{
"id": 2000
},
{
"id": 2002
}
]
}
I tried a recursive template in AngularJS to achieve this, but it's not working. Can anyone provide a demo or a suggestion for this kind of requirement?
Please let me know if the above description is incomplete or unclear.
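As a language-neutral sketch of the level-by-level mapping, here it is in Python rather than AngularJS. It assumes each form is a single chain with one entry per level, that the master has at least one level, and that the ids of the reference copies are already known; `build_payload` and its arguments are hypothetical names.

```python
# Build the nested payload by pairing each master level with the ids that the
# reference copies have at the same depth. Names here are illustrative only.

def build_payload(master_names, copies):
    """master_names: names on the master form, parent first.
    copies: one list of reference_copies ids per copy form, parent first."""
    def node(depth):
        has_child = depth + 1 < len(master_names)
        return {
            "name": master_names[depth],
            "childs": [node(depth + 1)] if has_child else [],
            # A copy that has no entry at this depth is simply skipped.
            "referenceCopies": [
                {"id": ids[depth]} for ids in copies if depth < len(ids)
            ],
        }
    return node(0)


payload = build_payload(
    ["Reference Name", "child level1 name", "childlevel2 name"],
    [[2000, 2001], [2002, 2003, 2004]],
)
print(payload["referenceCopies"])
# [{'id': 2000}, {'id': 2002}]
```

With these inputs the result has the same shape as the payload shown above. Levels that exist only in a copy, deeper than the master, are ignored.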
Is it possible to tell CB.Lite to reject documents that repeat the value of a certain key?
For instance, if I have the following document already in CB.Lite:
{
"Dog": {
"Name": "Dug",
"Color": "Blue",
"Age": 2
}
}
Is it possible to tell CB.Lite to reject any document that repeats the value of the key "Name", so that if I try to add this one:
{
"Dog": {
"Name": "Dug",
"Color": "Green",
"Age": 5
}
}
it would reject it?
I know it would not be much hassle to implement this functionality myself, but I was wondering whether CB.Lite already has something out of the box.
Currently not at commit time (this is as of 1.4.x). The closest you could get, where Couchbase would do most of the work, would be to create a View emitting the value you don't want repeated, then query it and do the enforcement yourself.
This is assuming the docs themselves have different IDs. If you had what you showed using the same document ID, there are other possibilities. For example, you could trap this and reject it in Sync Gateway.
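The "query, then enforce it yourself" idea can be modelled roughly like this in plain Python. This is not the Couchbase Lite API: `UniqueNameStore` is a made-up toy class whose `by_name` dict plays the role of a view keyed on the emitted name.

```python
# Toy document store that refuses a second document with the same Dog.Name.
# NOT Couchbase Lite code; it only models "look up the value, then decide".

class UniqueNameStore:
    def __init__(self):
        self.docs = {}     # document id -> document
        self.by_name = {}  # emitted value (Dog.Name) -> document id

    def save(self, doc_id, doc):
        name = doc["Dog"]["Name"]
        owner = self.by_name.get(name)
        if owner is not None and owner != doc_id:
            raise ValueError(f"Name {name!r} is already used by {owner}")
        previous = self.docs.get(doc_id)
        if previous is not None:
            # Updating an existing document: drop its old index entry first.
            self.by_name.pop(previous["Dog"]["Name"], None)
        self.docs[doc_id] = doc
        self.by_name[name] = doc_id


store = UniqueNameStore()
store.save("doc1", {"Dog": {"Name": "Dug", "Color": "Blue", "Age": 2}})
try:
    store.save("doc2", {"Dog": {"Name": "Dug", "Color": "Green", "Age": 5}})
    rejected = False
except ValueError:
    rejected = True
print(rejected)
# True
```

Updating "doc1" itself is still allowed, since the check only fires when a different document id already owns the name.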
I am working on a simple app for Android. I am having some trouble using the Firebase database since it uses JSON objects and I am used to relational databases.
My data will consist of two users that share a value. In relational databases this would be represented in a table like this:
**uname1** **uname2** shared_value
In which the usernames are the keys. If I wanted all the values user Bob shares with other users, I could do a simple union statement that would return the rows where:
uname1 == Bob or uname2 == Bob
However, in JSON databases, there seems to be a tree-like hierarchy in the data, which is complicated since I would not be able to search for users at the top level. I am looking for help in how to do this or how to structure my database for best efficiency if my most common search will be one similar to the one above.
In case this is not enough information, I will elaborate: My database would be structured like this:
{
  "Bob": {
    "Alice": {
      "shared_value": 2
    }
  },
  "Cece": {
    "Bob": {
      "shared_value": 4
    }
  }
}
As you can see from the example, Bob is included in two relationships, but looking into Bob's node doesn't show that information. (The relationship is commutative, so who is "first" cannot be predicted.)
The most intuitive way to fix this would be to duplicate all data. For example, when we add Bob->Alice->2, also add Alice->Bob->2. In my experience with relational databases, duplication could be a big problem, which is why I haven't done this already. Also, duplication seems like an inefficient fix.
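For reference, the duplication idea amounts to writing every pair in both directions. A minimal sketch in Python, where a nested dict stands in for the JSON tree and `set_shared_value` is a hypothetical helper:

```python
# Write each relationship under both users so either one can be looked up
# from the top level. The nested dict stands in for the JSON tree.

def set_shared_value(db, user_a, user_b, value):
    db.setdefault(user_a, {})[user_b] = {"shared_value": value}
    db.setdefault(user_b, {})[user_a] = {"shared_value": value}


db = {}
set_shared_value(db, "Bob", "Alice", 2)
set_shared_value(db, "Cece", "Bob", 4)

print(sorted(db["Bob"]))
# ['Alice', 'Cece']
```

The cost is that every change has to update both copies, which is the duplication concern raised above.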
Is there a reason why you don't invert this? How about a collection like:
{ "_id": 2, "usernames":[ "Bob", "Alice"]}
{ "_id": 4, "usernames":[ "Bob", "Cece"]}
If you need all the values for "Bob", then index on "usernames".
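What an index on "usernames" buys you can be sketched in plain Python as an inverted index from each name to the documents that contain it. The `build_index` helper is illustrative only and is not how any particular database implements its indexes.

```python
from collections import defaultdict

# The two documents from above; "_id" holds the shared value in this design.
docs = [
    {"_id": 2, "usernames": ["Bob", "Alice"]},
    {"_id": 4, "usernames": ["Bob", "Cece"]},
]


def build_index(docs):
    """Map every username to the documents whose array contains it."""
    index = defaultdict(list)
    for doc in docs:
        for name in doc["usernames"]:
            index[name].append(doc)
    return index


index = build_index(docs)
bob_values = [doc["_id"] for doc in index["Bob"]]
print(bob_values)
# [2, 4]
```

The position of "Bob" inside the array does not matter, which is what removes the uname1 / uname2 distinction.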
EDIT:
If you need the two usernames to be a unique key, then do something like this:
{ "_id": {"uname1":"Bob", "uname2":"Alice"}, "value": 2 }
But this would still permit the creation of:
{ "_id": {"uname1":"Alice", "uname2":"Bob"}, "value": 78 }
(This issue is also present in your as-is relational model, btw. How do you handle it there?)
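One common way around the (Bob, Alice) versus (Alice, Bob) problem is to put the pair into a canonical order before using it as a key. A minimal sketch in Python, with `pair_key` and `set_value` as hypothetical helpers and a dict standing in for the collection:

```python
# Sort the two names so both orderings produce the same key.

def pair_key(user_a, user_b):
    first, second = sorted((user_a, user_b))
    return (first, second)


def set_value(values, user_a, user_b, value):
    values[pair_key(user_a, user_b)] = value


values = {}
set_value(values, "Bob", "Alice", 2)
set_value(values, "Alice", "Bob", 78)  # overwrites, no second entry is created

print(values)
# {('Alice', 'Bob'): 78}
```

The same trick works for the compound "_id" above: always store the alphabetically smaller name as uname1.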
In general, I think implementing an array by creating multiple columns with names like "attr1", "attr2", "attr3", etc. and then having to search them all for a possible value is an artifact of relational table modeling, which does not support array values. If you are converting to a document-oriented storage, these really should be an embedded list of values, and you should use the document paradigm and model them as such, instead of just reimplementing your table rows as documents.
You can still have old structure:
[
{ username: 'Bob', username2: 'Alice', value: 2 },
{ username: 'Cece', username2: 'Bob', value: 4 },
]
You may want to create indexes on 'username' and 'username2' for performance. And then just do the same union.
To create a tree-like structure, the best way is to create an "ancestors" array that stores all the ancestors of a particular entry. That way you can query for either ancestors or descendants and all documents that are related to a particular value in the tree. Using your example, you would be able to search for all descendants of Bob's, or any of his ancestors (and related documents).
The answer above suggests:
{ "_id": {"uname1":"Bob", "uname2":"Alice"}, "value": 2 }
That is correct. But you don't get to see the relationship between Bob and Cece with this design. My suggestion, which is from Mongo, is to store ancestor keys in an ancestor array.
{ "_id": {"uname1":"Bob", "uname2":"Alice"}, "value": 2 , "ancestors": [{uname: "Cece"}]}
With this design you still get duplicates, which is something that you do not want. I would design it like this:
{"username": "Bob", "ancestors": [{"username": "Cece", "shared_value": 4}]}
{"username": "Alice", "ancestors": [{"username": "Bob", "shared_value": 2}, {"username": "Cece"}]}