Denormalizing JSON for MongoDB

I think that's the word I'm looking for. I'm trying to get the parent set's info into each of the cards. I think that's what I need to do, but chime in if you have any other ideas.
{
"LEA": {
"name": "Limited Edition Alpha",
"code": "LEA",
"releaseDate": "1993-08-05",
"border": "black",
"type": "core",
"cards": [
{"name": "Air Elemental"},
{"name": "Earth Elemental"},
{"name": "Fire Elemental"},
{"name": "Water Elemental"}
]
},
"LEB": {
"name": "Limited Edition Beta",
"code": "LEB",
"releaseDate": "1993-10-01",
"border": "black",
"type": "core",
"cards": [
{"name": "Armageddon"},
{"name": "Fireball"},
{"name": "Swords to Plowshares"},
{"name": "Wrath of God"}
]
}
}
This is a tiny subset of the data, obviously. LEA and LEB are sets of cards, and inside each set there are a bunch of cards. I'm thinking of denormalizing this into just the cards, with the set info added to each card. Something like this...
[
{
"name": "Air Elemental",
"set": {
"name": "Limited Edition Alpha",
"code": "LEA",
"releaseDate": "1993-08-05",
"border": "black",
"type": "core"
}
},
{
"name": "Earth Elemental",
"set": {
"name": "Limited Edition Alpha",
"code": "LEA",
"releaseDate": "1993-08-05",
"border": "black",
"type": "core"
}
},
{
"name": "Armageddon",
"set": {
"name": "Limited Edition Beta",
"code": "LEB",
"releaseDate": "1993-10-01",
"border": "black",
"type": "core"
}
},
{
"name": "Fireball",
"set": {
"name": "Limited Edition Beta",
"code": "LEB",
"releaseDate": "1993-10-01",
"border": "black",
"type": "core"
}
}
]
Is my thinking right, first and foremost? Would I want a giant collection of cards and have the set information flattened into each card? In SQL, I'd do a table for the sets, and the cards would belong_to a set. I'm trying to wrap my head around 'document thinking'.
Second, if my thinking is correct, any ideas on how I could achieve this denormalizing?

Here you go =).
OK, here is where I would start. Since we've said that cards will never change (they're based on physical MTG cards), create one collection with all of your cards in it; this will be used to easily populate a user's deck later on. You can search it by card name or by some sort of card ID (like a physical one, stored on the card).
For the user's array of card objects, you shouldn't just store a card's _id, because that forces you to do a join (an extra query) every time. Since cards will never change, completely denormalize them and just shove the full card objects into that array, so a user object, so far, resembles:
{
name: "Tom Hanks",
skill_level: 0,
decks: [
[
{
card_name: "Balance",
card_description: "LONG_BLOCK_OF_DESCRIP_TEXT",
card_creator: "Sugargirl14",
type: "Normal",
_id: $SOME_MONGO_ID_HERE,
... rest of card data...
}, {
...card 2 complete data...
}
],
[
{ ...another deck here... }
]
]
}
OK, back to set info. I will also assume set info is a constant (based on your SO post, I can't see how it would physically change). So, if that set info is always relevant to the card, I would denormalize and include it, changing our card object to:
{
card_name: "Balance",
card_description: "LONG_BLOCK_OF_DESCRIP_TEXT",
card_creator: "Sugargirl14",
type: "Normal",
_id: $SOME_MONGO_ID_HERE,
set: {
"name": "Limited Edition Alpha",
"code": "LEA",
"releaseDate": "1993-08-05",
"border": "black",
"type": "core",
"_id": $SOME_MONGO_ID_HERE
},
... rest of card data...
}
I imagine that storing the set's other cards in the denormalized object for a given card isn't relevant; if it is, add them. Note that the outer key from your example ("LEA", "LEB") is dropped, since it always seems to equal the "code" field.
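If it helps, here is a minimal sketch of how you could build and load those denormalized card documents from your original nested JSON with pymongo (the file name, database name and collection names are illustrative assumptions, not anything from your post):

import json
from pymongo import MongoClient

# Assumed: the nested { "LEA": {...}, "LEB": {...} } structure lives in sets.json
with open("sets.json") as f:
    sets = json.load(f)

client = MongoClient("mongodb://localhost:27017")
db = client["mtg"]

card_docs = []
for code, set_doc in sets.items():
    # Everything except the "cards" array describes the set itself
    set_info = {k: v for k, v in set_doc.items() if k != "cards"}
    db.sets.insert_one(set_info)  # pymongo adds the generated _id to set_info in place
    for card in set_doc["cards"]:
        # Denormalize: embed the parent set's info (including its _id) in every card
        card_docs.append({**card, "set": set_info})

db.cards.insert_many(card_docs)

That gives you both the sets collection and the denormalized cards collection in one pass.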
OK, now to properly answer your SO question about whether you should embed sets in cards, or vice versa. First off, both collections are relevant. So, even if we embed sets into cards, you'll want those sets in a collection so they can be fetched later and inserted into new cards.
Which gets embedded in which is really determined by business logic: how the data is used and which side gets pulled more often. Are you frequently displaying sets and pulling cards out of them (say, for users to search)? Then you could embed all of the card data, or any relevant subset, in each set's cards array. But with the above data model, each card already stores its set's ID inside its embedded set object. I assume cards belong to only one set, so to get all cards for a set you can query the card collection where set._id equals the Mongo ID of the set you want. Sets need minimal updates (hopefully none at all, per your business logic), your queries are still fast, and you get complete card objects back. Honestly, I'd do that latter option and keep my sets clean of cards. That way a card owns the set it belongs to, as opposed to a set owning its cards. It's a more SQL-y way to think, but it works fine in Mongo because you'll never actually join.
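For example, a quick sketch of that query with pymongo, assuming the model above where every card embeds its set's _id (set_id being the ObjectId of the set you care about):

# Complete card objects for one set, no join needed
cards_in_set = list(db.cards.find({"set._id": set_id}))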
So our final data model resembles:
Collection 1, Set:
//data model
{
"name": "Limited Edition Alpha",
"code": "LEA",
"releaseDate": "1993-08-05",
"border": "black",
"type": "core",
"_id": $SOME_MONGO_ID_HERE
}
Collection 2, cards:
//data model
{
_id: $SOME_MONGO_ID_HERE,
card_name: "Balance",
card_description: "LONG_BLOCK_OF_DESCRIP_TEXT",
card_creator: "Sugargirl14",
type: "Normal",
set: {
"name": "Limited Edition Alpha",
"code": "LEA",
"releaseDate": "1993-08-05",
"border": "black",
"type": "core",
"_id": $SOME_MONGO_ID_HERE
},
... rest of card data...
}
Collection 3, users:
{
_id: $SOME_MONGO_ID_HERE,
name: "Tom Hanks",
skill_level: 0,
decks: [
[
{
card_name: "Balance",
card_description: "LONG_BLOCK_OF_DESCRIP_TEXT",
card_creator: "Sugargirl14",
type: "Normal",
_id: $SOME_MONGO_ID_HERE,
set: {
"name": "Limited Edition Alpha",
"code": "LEA",
"releaseDate": "1993-08-05",
"border": "black",
"type": "core",
"_id": $SOME_MONGO_ID_HERE
},
}, {
...card 2 complete data...
}
],
[
{ ...another deck here... }
]
]
}
This, obviously, assumes the set data for each card is relevant to the user. Now your data is denormalized, and sets and cards rarely (hopefully never) need updates per your business logic, so you'll never need cascading updates or deletes. Manipulating users is easy: when you remove a card from a user's deck you can use Mongo's $pull operator on the relevant decks array, matching the contained item whose _id equals the Mongo ID of the card you want to remove. All other updates are easier too.
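A rough sketch of that removal with pymongo, assuming the array-of-arrays decks layout above and that user_id, the deck's position and card_id are already known (names are illustrative):

# Remove the card whose _id matches card_id from the user's first deck (decks[0])
db.users.update_one(
    {"_id": user_id},
    {"$pull": {"decks.0": {"_id": card_id}}}
)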
In retrospect, you might want to make the user's decks like so:
decks: {
"SOME_ID_HERE": [
{ ...card 1... },
{ ...card 2... }
]
}
This makes identifying the decks MUCH easier and will make your pulls easier (you'll have more data on the frontend and the pull query will be more precise). The key can be a number, a random string, anything really, since it just gets passed back to the frontend. Or use a Mongo ID: when looking at a deck, the user will have its Mongo ID, so when they pull a card out of it, or add one in, you have a direct identifier to grab the deck you need.
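With the keyed layout, the same removal becomes a direct path (deck_id being whatever key you chose; again just a sketch):

# Pull the card from one specific deck, addressed by its key
db.users.update_one(
    {"_id": user_id},
    {"$pull": {"decks." + deck_id: {"_id": card_id}}}
)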
Obviously, all values with text like $SOME_MONGO_ID_HERE should really be ObjectId() values.
Whew, that was intense, 6800 characters. Hope it makes sense to you, and I apologize if any verbiage is confusing or if any of my JSON objects' formatting is messed up (just let me know if any prose is confusing and I'll reword). Does this make sense/solve your problem?

Related

Is there a way to delete user.organizations[1] in Google Admin Console?

I'm making a script to create uniform Gmail signatures and I noticed that only a few users, out of the hundreds, have both user.organizations[0] and user.organizations[1]. This of course is bothering me terribly. On these users it's index 1 that is the primary and visible in the GAC, while for the rest of the users it's index 0 that is visible. Is there a way to delete the extra organizations inside a user?
Yes, this is possible and it is very simple.
We will need to use 2 methods:
Users.get (https://developers.google.com/admin-sdk/directory/reference/rest/v1/users/get) to check every user to confirm they have the correct value in the organizations property.
Users.update (https://developers.google.com/admin-sdk/directory/reference/rest/v1/users/update) to replace the existing value on that property.
Steps:
Feel free to use the API Explorer available in the documentation above to test these API calls.
Use the Users.get method to obtain the data of the user you are interested in updating. You can make the result shorter by specifying a value in the fields parameter like so:
"fields": "primaryEmail,organizations"
This will return the data of the user including their primary email and a list of the organization(s) the user has.
{
"primaryEmail": "user#domain.com",
"organizations": [
{
"title": "Accountant",
"primary": true,
"customType": "",
"department": "Accounting",
"description": "Full Time accountant",
"costCenter": "CompanyTotalPro"
},
{
"title": "Accountant",
"primary": false,
"customType": "",
"department": "Accounting",
"description": "Part Time accountant",
"costCenter": "SecondaryCompany"
}
]
}
This response will contain the data of one or more organizations. You only need to copy the value you want to keep and make an API call with the Users.update method, like so:
Don't worry about the code. What you need to see is that we are essentially taking the list of organizations, removing the one we don't want, and using the new set of information to overwrite the old one. Use this for reference.
gapi.client.directory.users.update({
"userKey": "user#domain.com",
"resource": {
"organizations": [
{
"title": "Accountant",
"primary": true,
"customType": "",
"department": "Accounting",
"description": "Full Time accountant",
"costCenter": "CompanyTotalPro"
}
]
}
})
Use null to clear the value.
"organizations": null

Fiware Upload Image

I want to know how to use NGSI-LD to upload an image, even though these static files are not stored in the Orion Context Broker or Mongo. Is there a way to configure NGSI-LD to forward the images to an AWS S3 bucket or another location?
As you correctly identified, binary files are not a good candidate for context data, and should not be held directly within a context broker. The usual paradigm would be as follows:
Imagine you have a number plate reader library linked to Kurento and wish to store the images of vehicles as they pass. In this case the event from the media stream should cause two separate actions:
Upload the raw image to a storage server
Upsert the context data to the context broker including an attribute holding the URI of the stored image.
Doing things this way means you can confirm that the image is safely stored, and then send the following:
{
"vehicle_registration_number": {
"type": "Property",
"value": "X123RPD"
},
"image_download": {
"type": "Property",
"value": "http://example.com/url/to/image"
}
}
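As a rough sketch of that two-step flow in Python (the bucket name, file name, entity URN and broker URL are all illustrative assumptions; it presumes boto3 credentials are configured and an NGSI-LD broker such as Orion-LD is listening on localhost:1026):

import boto3
import requests

# 1. Upload the raw image to S3 first, so you know it is safely stored
s3 = boto3.client("s3")
s3.upload_file("plate-X123RPD.jpg", "my-image-bucket", "plates/X123RPD.jpg")
image_url = "https://my-image-bucket.s3.amazonaws.com/plates/X123RPD.jpg"

# 2. Then upsert the context data, exposing the image only as a URL attribute
entity = {
    "id": "urn:ngsi-ld:Vehicle:X123RPD",
    "type": "Vehicle",
    "vehicle_registration_number": {"type": "Property", "value": "X123RPD"},
    "image_download": {"type": "Property", "value": image_url},
    "@context": "https://smartdatamodels.org/context.jsonld",
}
requests.post(
    "http://localhost:1026/ngsi-ld/v1/entityOperations/upsert",
    json=[entity],
    headers={"Content-Type": "application/ld+json"},
)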
The alternative would be to simply include some link back to the source file somehow as metadata:
{
"vehicle_registration_number": {
"type": "Property",
"value": "X123RPD",
"origin": {
"type": "Property",
"value": "file://localimage"
}
}
}
Then, if you have a registration on vehicle_registration_number which somehow links back to the server holding the original file, that server could upload the image after the context broker has been updated (and then do another upsert).
Option one is simpler. Option two would make more sense if the registration is narrower. For example, only upload images of VRNs for cars whose speed attribute is greater than 70 km/h.
Ontologically you could say that Device has a relationship to a Photograph which would mean that Device could have an additional latestRecord attribute:
{
"latestRecord": {
"type": "Relationship",
"object": "urn:ngsi-ld:CatalogueRecordDCAT-AP:0001"
},
}
Then create a separate entity holding the details of the Photograph itself, using a standard data model such as CatalogueRecordDCAT-AP. Attributes such as source and sourceMetadata help define the location of the raw file.
{
"id": "urn:ngsi-ld:CatalogueRecordDCAT-AP:0001",
"type": "CatalogueRecordDCAT-AP",
"dateCreated": "2020-11-02T21:25:54Z",
"dateModified": "2021-07-02T18:37:55Z",
"description": "Speeding Ticket",
"dataProvider": "European open data portal",
"location": {
"type": "Point",
"coordinates": [
36.633152,
-85.183315
]
},
"address": {
"streetAddress": "2, rue Mercier",
"addressLocality": "Luxembourg",
"addressRegion": "Luxembourg",
"addressCountry": "Luxembourg",
"postalCode": "2985",
"postOfficeBoxNumber": ""
},
"areaServed": "European Union and beyond",
"primaryTopic": "Public administration",
"modificationDate": "2021-07-02T18:37:55Z",
"applicationProfile": "DCAT Application profile for data portals in Europe",
"changeType": "First version",
"source": "http://example.com/url/to/image"
"sourceMetadata": {"type" :"jpeg", "height" : 100, "width": 100},
"#context": [
"https://smartdatamodels.org/context.jsonld"
]
}

JSON Database doesn't work correctly as per REST technique

I created a JSON server and this is the data that I'm using. However, when I try to query the exam list and relate it to the students (I'd like to receive the students based on their IDs; the REST query I'm using is ?_expand=student), it won't display them. The data validates as correct JSON, but my goal is to have the relations working.
The way my data is organized (the examlist "table") won't display the students, because apparently it cannot read their IDs. This database will be used for HTTP requests, hence I need it fully working.
The full data is included below.
At the moment, instead of my student IDs, the response shows some arbitrary 0,1 indexes, and the student IDs are pushed further down the tree (this happens in the examlist "table" only).
It's an M:M relationship (relational-database style) and this is how I want it structured:
Table "students" that contains information about the students;
I have "table" exams that contains information about the exams;
And then I have another "table" examlist which contains information about the EXAM (ExamID) and the students enrolled in it (basically relates the two abovementioned tables)
When I try querying the students through the "examlist" table, it won't work. However, the other "table" -- exam, does work.
My assumption is that the way I have organized the students in the examlist "table" is not good; however, given my very little experience, I cannot seem to see where the issue is.
I hope I've cleared it up for most of you! Sorry for any inconvenience caused.
{
"students": [
{
"id": 3021,
"nume": "Ionut",
"prenume": "Grigorescu",
"an": 3,
"departament": "IE"
},
{
"id": 3061,
"nume": "Nadina",
"prenume": "Pop",
"an":3,
"departament": "MG"
},
{
"id": 3051,
"nume": "Ionut",
"prenume": "Casca",
"an": 3,
"departament": "IE"
}
],
"exams": [
{
"id": 1,
"subiect": "Web Semantic",
"profesor": {
"Nume": "Robert Buchman"
}
},
{
"id": 2,
"subiect": "Programare Web",
"profesor": {
"Nume": "Mario Cretu"
}
},
{
"id": 3,
"subiect": "Medii de Programare",
"profesor": {
"Nume": "Valentin Stinga"
}
}
],
"listaexamene": [
{
"examId":1,
"Data Examen":"02/06/2022 12:00",
"studentId":
[
{
"id":3021
},
{
"id":3051
}
]
},
{
"examId":2,
"Data Examen":"27/05/2022 10:00",
"studentId":
[
{
"id":3021
},
{
"id":3051
}
]
},
{
"examId":1,
"Data Examen":"04/06/2022 10:00",
"studentId":
[
{
"id":3021
},
{
"id":3051
},
{
"id":3061
}
]
}
]
}
I had to repost with more information after my first one got closed down
I think I finally got the answer. The problem lies in the JSON server itself: apparently it cannot read information from further down the tree, only from the first level.
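As far as I can tell, json-server's _expand only resolves a singular <resource>Id foreign key sitting at the top level of a row. For what it's worth, here is a sketch of how the join "table" could be flattened so that works (one row per exam/student pair; the field names are illustrative):

"examlist": [
{ "id": 1, "examId": 1, "studentId": 3021, "dataExamen": "02/06/2022 12:00" },
{ "id": 2, "examId": 1, "studentId": 3051, "dataExamen": "02/06/2022 12:00" },
{ "id": 3, "examId": 2, "studentId": 3021, "dataExamen": "27/05/2022 10:00" }
]

With that shape, GET /examlist?_expand=exam&_expand=student should return each row with the full exam and student objects embedded.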
Thank you all for your input on the previous post!

replace "key" name in whole JSON python for bulk data in efficient way

Actually, I am pushing data to another system, but before pushing I have to change the keys in the whole JSON. The JSON may contain 200, 10,000 or 250,000 records.
sample JSON:
{
"insert": "table",
"contacts": [
{
"testName": "testname",
"ContactID": 212121
},
{
"testName": "testname",
"ContactID": 2146354564
},
{
"testName": "testname",
"ContactID": 12312
},
{
"testName": "testname",
"ContactID": 211221
},
{
"testName": "testname",
"ContactID": 10218550
}
]
}
I need to change the keys inside the contacts array. These contacts may come in bulk, so I need to handle this efficiently, with minimal complexity.
The above JSON should be converted to the following:
{
"insert": "table",
"contacts": [
{
"name": "testname",
"phone": 212121
},
{
"name": "testname",
"phone": 2146354564
},
{
"name": "testname",
"phone": 12312
},
{
"name": "testname",
"phone": 211221
},
{
"name": "testname",
"phone": 10218550
}
]
}
Here is my code, trying it with a loop:
ini_dict = request.data
contact_data = ini_dict['contacts']
for i in contact_data:
i['name'] = i.pop('testName')
print(contact_data)
Please suggest how I can change the key names efficiently for bulk data, I mean for 50,000 items in contacts. I'm worried the for loop will lead to a performance issue, so please let me know an efficient way to achieve this.
I don't know how fast you need it to be, nor how you are choosing to store your JSON. One simple solution is to just store it as a string and then replace all instances of your attribute names.
# Something like this, using a jsonstring (note that str.replace returns a new string)
jsonstring = jsonstring.replace('"testName":', '"name":')
jsonstring = jsonstring.replace('"ContactID":', '"phone":')
If you want to do this in bulk, you may need to create some batch process that can fetch multiple existing records and make changes at once. I have done this before with the Java equivalent of https://pypi.org/project/JayDeBeApi/, but that was more for modifying existing records in a database.

Handling Incredibly large JSON Document in CouchDB

I'm new to NoSQL databases and I'm having a hard time figuring out how to handle a very large JSON document that could amount to over 20MB on my local drive. This structure will definitely grow over time, and I worry about the speed of queries and about having to search deep through the returned nested JSON object just to get a string out. My JSON is deeply nested, like so for example:
{
"exams": {
"exam1": {
"year": {
"math": {
"questions": [
{
"question_text": "first question",
"options": [
"option1",
"option2",
"option3",
"option4",
"option5"
],
"answer": 1,
"explaination": "explain the answer"
},
{
"question_text": "second question",
"options": [
"option1",
"option2",
"option3",
"option4",
"option5"
],
"answer": 1,
"explaination": "explain the answer"
},
{
"question_text": "third question",
"options": [
"option1",
"option2",
"option3",
"option4",
"option5"
],
"answer": 1,
"explaination": "explain the answer"
}
]
},
"english": {same structure as above}
},
"1961": {}
},
"exam2": {},
"exam3": {},
"exam4": {}
}
}
In the main application, question objects are created and appended based on the type of exam, year, and subject, making the JSON document huge over time. How can I re-model this to avoid slow queries in the future?
Dominic is right. You need to start dividing the document up and storing the pieces as separate documents.
The next question is how to recompose the document after it's been split.
Considering you're using Couch, I would recommend doing this at the application layer. A good starting point would be to create exam documents and store them in their own database. Then have a document (exams) in another database that has pointers to the exam documents.
You can retrieve the exams document and get exams one by one as needed. This could be especially useful with paging since most people will only want to see the most recent exams.
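As a rough sketch of that split (the IDs and field names are purely illustrative), each exam/year/subject combination becomes its own document in the exams database, and a small index document in the other database only holds pointers:

//an individual exam document
{
"_id": "exam1:2021:math",
"type": "exam",
"exam": "exam1",
"year": 2021,
"subject": "math",
"questions": [
{
"question_text": "first question",
"options": ["option1", "option2", "option3", "option4", "option5"],
"answer": 1,
"explanation": "explain the answer"
}
]
}

//the "exams" index document, holding only pointers
{
"_id": "exams",
"exam_ids": ["exam1:2021:math", "exam1:2021:english"]
}

Each document stays small, and you can fetch exams one at a time or page through the index without ever loading the whole 20MB structure.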