Looking for Help Understanding MongoDB Data Organization - mysql

I am trying to understand the concept of document storage and fail to see how it would apply to some situations. For example, in the case of a CMS/blog engine there may be data in the form of:
Posts
Categories
Users
Comments
In something such as MySQL one might have a table for each, then a join table for each set of related data. i.e. posts_table, categories_table, categories_posts_table
In this case, posts_table would contain the post data, categories_table would contain the categories data and categories_posts_table would contain 2 foreign keys used to associate a specific category to a specific post.
How does this translate into something like MongoDB?
The only way I can see this setup being structured in MongoDB is something like:
posts_collection
The output of a single BSON document might look similar to:
{
    "title" : "title",
    "body" : "blah body",
    "categories" : [
        "category1",
        "category2"
    ]
}
That makes sense, but it seems like categories are going to be duplicated all over the place. Without some sort of relation, you would never be able to simply change the category name and have it reflected across all of your blog posts (?).
Additionally, what if these were binary documents that took up a lot of space? Instead of duplicating the same image over and over, it seems like a relationship would work better.
I guess this is a pretty open question, but I was looking for anyone's input on how I should mentally take apart a problem to tell whether it fits in a DB like MongoDB or not. Equally important: how does one structure the data correctly?
I have not touched on Users, but it seems like EVERYTHING here would ultimately end up as an embedded document inside a Users collection, since the User kind of starts everything.
Thanks a lot.

What's interesting about document databases is that you really need to think about how your data is going to be used. Storing the same information in multiple places (denormalization) is fine in a document database. So you're correct when you say you could have a root User document with everything else embedded in it.
From my limited experience, there's not a "right" way to model a particular set of data; it's more about how that data is going to be used in the future.
It IS possible to reference other documents. For example, you could have a Posts collection and have each Post reference a User document in the Users collection. Take a look at this article about Embed vs. Reference.
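As a rough illustration of the two options, here is a minimal pymongo sketch (the collection and field names are only assumptions for this example, not anything MongoDB prescribes): embed the category names directly in the post, but reference the author by _id.

from pymongo import MongoClient

client = MongoClient()              # assumes a local mongod on the default port
db = client["blog"]

author_id = db.users.insert_one({"name": "alice"}).inserted_id

# Embed the category names (denormalized), reference the author by _id
db.posts.insert_one({
    "title": "title",
    "body": "blah body",
    "categories": ["category1", "category2"],   # embedded
    "author_id": author_id                      # reference into the users collection
})

# Resolving a reference takes a second query; there are no joins here
post = db.posts.find_one({"title": "title"})
author = db.users.find_one({"_id": post["author_id"]})

Embedding keeps reads cheap; referencing keeps shared data (like a user profile) in one place at the cost of an extra lookup.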

Related

Firebase Database: how to compare two values

In my Firebase database, I have a data structure similar to this:
The post ID (1a3b3c4d5e) is generated by the ChildByAutoId() function.
The user ID (fn394nf9u3) is the UID of the user.
In my app, I have a UILabel (author) and I would like to update it with the 'full name' of the user who created the post.
Since I have a reference to the post ID in the users part of the database, I assume there must be some code (if statement?) to check if the value exists and if so, update the label.
Can you help with that?
While it is possible to do the query (ref.child("Users").queryOrdered(byChild: "Posts/1a3b3c4d5e").queryEqual(toValue: true)), you would need to define an index for each specific post ID to allow this query to run efficiently, so this is not a feasible strategy.
As usual when working with NoSQL databases: if you need to do something that your current data model doesn't allow, change your data model to allow the use-case.
In this case that can mean either adding the UID of the user to each post, or alternatively adding the user name to each post (as Andre suggests) and deciding if/how you deal with user name changes.
Duplicating such relational data so that you can do efficient lookups in both directions is very common in NoSQL databases such as Firebase and Firestore. In fact, I wrote a separate answer about dealing with many-to-many relations.
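The exact node names of the database in the question aren't shown, so the following is only a hypothetical sketch of what the fanned-out structure could look like, with the relation stored in both directions:

{
    "Posts" : {
        "1a3b3c4d5e" : {
            "title" : "...",
            "uid" : "fn394nf9u3"
        }
    },
    "Users" : {
        "fn394nf9u3" : {
            "full name" : "...",
            "Posts" : {
                "1a3b3c4d5e" : true
            }
        }
    }
}

With the uid stored on each post, looking up the author's full name becomes a single direct read of Users/<uid>/full name instead of a query.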
If you can change the structure, that would be good, because I don't think the current database structure is right for this.
You should add one more key named createdBy inside the Post node, so the structure would actually be:
{ "description" : "Thus the post is here", "title" : "Hello User", "createdBy" : "Javed Multani" }
Once you do this, it will be really easy to get the user's details.
OR
A hackier solution:
You can also achieve this while you are fetching posts from the post node of Firebase. You'll get the auto-generated post ID, like:
1a3b3c4d5e
First get only the posts; then, once the data has been fetched and parsed, get the users and search within each user with a condition like postId == userPostId. If a match is found, take the full name value from there.

Separate get request and database hit for each post to get like status

So I am trying to build a social network with Django. Like any other social network, users have the option to like a post, and each of these likes is stored in a model that is different from the model used for the posts that show up in the news feed. I have tried two approaches to getting the like status on the fly.
1. Fewest database hits:
Make one SQL query and get the like entries for every post ID, if they exist. Then I use a custom Django template tag to check whether the like entry for the current post exists in the QuerySet, by searching an array that contains the like statuses of all posts.
This way I use the database to get all the values and search for a particular value in the list using Python.
2. A separate database query for each post:
Here I use the same custom template tag, but rather than searching through a QuerySet I let the MySQL database do most of the heavy lifting.
I use model.objects.get() for each entry.
Which is the more efficient approach? Also, I was planning on getting another database server; could that change the choice if network latency is only around 0.1 ms?
Is there any way that I can get these like statuses as boolean values along with all the posts in a single DB query?
An example query for the first method could look like:
Let post_list be the post QuerySet
models.likes.objects.filter(user=current_user, post__in=post_list)
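For what it's worth, one way to get a boolean per post in a single query is an Exists subquery annotation. This is a minimal sketch, assuming Django >= 1.11 and the model/field names implied by the question (a likes model with user and post fields, and a Post model for the feed); the app and function names are purely illustrative.

from django.db.models import Exists, OuterRef
from myapp import models              # hypothetical app module; "models.likes" mirrors the question's usage
from myapp.models import Post         # hypothetical name for the news-feed post model

def posts_with_like_status(current_user):
    # One query: each post is annotated with a boolean telling
    # whether a like row exists for this user and this post.
    liked_by_me = models.likes.objects.filter(user=current_user, post=OuterRef("pk"))
    return Post.objects.annotate(is_liked=Exists(liked_by_me))

# In the template, each post then carries a boolean:
# {% if post.is_liked %} ... {% endif %}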
This is not a direct answer to your question, but I hope it is useful nonetheless.
and each of these likes are stored in a model that is different from the model used for news feed
I think you have a design issue here. It would be better to create a model that describes a post and then add a field users_that_liked_it as a many-to-many relationship to your user model. Then you can do something like post.users_that_liked_it and get a QuerySet of all users that liked your post.
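A minimal sketch of that many-to-many design (everything except the users_that_liked_it field name is illustrative, not something your project must use):

from django.conf import settings
from django.db import models

class Post(models.Model):
    body = models.TextField()
    users_that_liked_it = models.ManyToManyField(
        settings.AUTH_USER_MODEL,
        related_name="liked_posts",
        blank=True,
    )

# post.users_that_liked_it.all()    -> users who liked this post
# request.user.liked_posts.all()    -> posts this user has liked
# post.users_that_liked_it.filter(pk=request.user.pk).exists()  -> boolean like status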
In my eyes you should also avoid putting logic in templates as much as possible. They are simply not made for it. Logic belongs in the model class or, if it depends on the page visited, in the view (as a rule of thumb).
Lastly, if performance is your main worry, you probably shouldn't be using Django anyway. It is just not that fast. What Django gives you is the ability to write clean, concise code. This is much more important for a new project than performance. Ask yourself: How many (personal) projects fail because their performance is bad? And how many fail because the creator gets caught in messy code?
Here is my advice: Favor clarity over performance. Especially in a young project.

Is this a reasonable use case for storing JSON in MySQL?

I understand that it is generally considered a "bad idea" to store JSON in a MySQL column, because it becomes difficult to maintain and is not easily searched or otherwise queried. However, I feel that the scenario I have encountered in my application is a reasonable use case for storing JSON data in my MySQL table. I am indeed looking for an answer, particularly one that points out any difficulties I may have overlooked, or any good reason to avoid what I have planned and, if so, an alternative approach.
The application at hand provides resource and inventory management, and supports building Assemblies, which may contain an unlimited number of nested sub-assemblies.
I have a table which holds all of the metadata for items, such as their name, sku, retail price, dimensions, and most importantly to this question: the item type. An item can either be a part or an assembly. For items defined as assemblies, their contents are stored in another table, item_assembly_contents whose structure is rather expected, using a parent_id column to link the children to the parent. As you may expect, at any time, a user may decide to add or remove an item from an assembly, or otherwise modify the assembly contents or delete it entirely.
Here is a visual representation of the above table description, populated with data that when composed, creates an assembly containing another assembly.
With the above structure, any item that is deleted from the items table will also be automatically deleted in the item_assembly_contents table via InnoDB ON DELETE CASCADE.
Here is a very simple example Assembly in JSON format, demonstrating a single Sub Assembly structure.
{
    "id" : 1,
    "name" : "Fruit Basket",
    "type" : "assembly",
    "contents" : [
        {
            "id" : 10,
            "parent_id" : 1,
            "name" : "Apple",
            "type" : "part",
            "quantity" : 1
        },
        {
            "id" : 11,
            "parent_id" : 1,
            "name" : "Orange",
            "type" : "part",
            "quantity" : 1
        },
        {
            "id" : 12,
            "parent_id" : 1,
            "name" : "Bag-o-Grapes",
            "type" : "assembly",
            "quantity" : 1,
            "contents" : [
                {
                    "id" : 100,
                    "parent_id" : 12,
                    "name" : "Green Grape",
                    "quantity" : 10,
                    "type" : "part"
                },
                {
                    "id" : 101,
                    "parent_id" : 12,
                    "name" : "Purple Grape",
                    "quantity" : 10,
                    "type" : "part"
                }
            ]
        }
    ]
}
The Fruit Basket is an Assembly, which contains a Sub-Assembly named "Bag o Grapes". This all works wonderfully, until orders and shipments come into consideration.
Take for example, an outbound shipment containing an assembly. At any time, the user must be able to see the contents of the assembly, as they were defined at the time of shipment, which rules out simply retrieving the data from the items and item_assembly_contents table, as these tables may have been modified since the shipment was created. Therefore, assembly contents must be explicitly saved with the shipment so that they may be viewed at a later date, independent of the state or mere existence of the assembly in the user's defined inventory (that being, the items table).
It is storing the assembly contents alongside the shipment contents that has me a bit confused, and where it seems to me that storing the data as JSON is a viable solution. It is critical to understand the following points about this data:
It will NOT be used for Searches
Any UPDATES will simply overwrite the contents of the row
It will MOST OFTEN be used to populate a Tree View on the Client, which will accept the JSON as it exists in the table, without any need for modification.
See this image for a (hopefully) more clear visualization of the data:
Questions
Is this a reasonable use case? Are my concerns meaningless?
Is there anything that I have looked over that may come back to bite me?
Can you provide an explanation why I should NOT proceed with my proposed schema, and if so....
Can you provide an alternative approach?
As always, thank you so much for your time and please do not hesitate to request clarification or additional information.
UPDATE
As per @Rowland Shaw's suggestion (below), I've come up with another proposed table structure using a reflexive or "bunny ear" relationship to the order_assembly_contents table. Please see the following image:
I feel this is a lot more elegant than storing the JSON directly, as the data is more relational and database friendly. Retrieving the data and forming it for the client should be easy-peasy as well! Please provide any input on the above structure!
Typically, for an ordering system I'd expect something like
Product -< OrderLine >- Order
In your case, you could add a "bunny ear" relation on your Product to refer to itself. So your outbound_shipment_contents loses name, type to the new product. You can then recursively build up the tree of items to pick as required.
This appears to be a standard bill of materials problem and there are lots of good articles on SQL and bill of materials patterns. I would avoid the JSON storage as it really complicates any reporting and detailed joining functionality that relies on the native SQL. For your application you can construct the JSON for the UI within a data access layer.
IMHO: keep the data clean, highly accessible, and relational in the relational DB, and repurpose it for the application at the data access / low-level business layer.
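As a rough sketch of that data-access-layer idea (the function name and row shape below are only illustrative, assuming rows that mirror the items/item_assembly_contents columns above), the nested JSON for the tree view can be rebuilt from the flat parent_id rows:

import json

# Flat rows as they might come back from joining items and
# item_assembly_contents (illustrative data matching the example above).
rows = [
    {"id": 1,   "parent_id": None, "name": "Fruit Basket", "type": "assembly", "quantity": 1},
    {"id": 10,  "parent_id": 1,    "name": "Apple",        "type": "part",     "quantity": 1},
    {"id": 12,  "parent_id": 1,    "name": "Bag-o-Grapes", "type": "assembly", "quantity": 1},
    {"id": 100, "parent_id": 12,   "name": "Green Grape",  "type": "part",     "quantity": 10},
]

def build_tree(rows):
    # Turn flat parent_id rows into the nested "contents" structure.
    nodes = {row["id"]: dict(row, contents=[]) for row in rows}
    roots = []
    for node in nodes.values():
        parent = nodes.get(node["parent_id"])
        if parent is not None:
            parent["contents"].append(node)
        else:
            roots.append(node)
    return roots

print(json.dumps(build_tree(rows), indent=2))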

Migrating from MySQL to MongoDB - best practices

So, it may be best to just try it out and see through some trial and error, but I'm trying to figure out the best way to migrate a pretty simple structure from MySQL to MongoDB. Let's say that I have a main table in MySQL called 'articles' and two other tables, one called 'categories' and the other 'category_linkage'. The categories all have an ID and a name. The articles all have an ID and other data. The linkage table relates articles to categories, so that you can have unlimited categories related to each article.
From a MongoDB approach, would it make sense to just store the article data and the category IDs that belong to that article in the same collection, thus having just two data collections (one for the articles and one for the categories)? My thinking is that to add/remove categories from an article, you would just update ($pull/$push) on that particular article document, no?
In my opinion, a good model would look like this:
{"article_name": "name",
"category": ["category1_name", "category2_name", ...],
"other_data": "other data value"
}
So, to embed the category names directly to the article document. Updating article categories is easy, but removing a category altogether requires modifying all articles belonging to the category. If removing categories is frequent, then keeping them separate might be a good idea performance-wise.
This approach also makes it easy to run queries on the category name (no need to map the name to an ID with a separate query).
Thus, the "correct" way to model the data depends on the assumed use case, as is typically the case with mongodb and other nosql databases.
If you have access to a Mac computer, you could give the MongoHub GUI a try. It has an "Import from MySQL" feature.

Proper way to store requests in Mysql (or any) database

What is the "proper" (most normalized?) way to store requests in the database? For example, a user submits an article. This article must be reviewed and approved before it is posted to the site.
Which is the more proper way:
A) Store it in the Articles table with an "Approved" field which is either 0, 1, or 2 (denied, approved, pending)
OR
B) Have an ArticleRequests table which has the same fields as Articles, and upon approval, move the row data from ArticleRequests to Articles.
Thanks!
Since every article is going to have an approval status, and each time an article is requested you're very likely going to need to know that status - keep it inline with the table.
Do consider calling the field ApprovalStatus, though. You may want to add a related table to contain each of the statuses unless they aren't going to change very often (or ever).
EDIT: Reasons to keep fields in related tables are:
If the related field is not always applicable, or may frequently be null.
If the related field is only needed in rare scenarios and is better described by using a foreign key into a related table of associated attributes.
In your case those above reasons don't apply.
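As an application-side sketch of option A (the column name ApprovalStatus comes from the answer above; the enum and query below are just illustrative Python, not anything MySQL requires):

from enum import IntEnum

class ApprovalStatus(IntEnum):
    DENIED = 0
    APPROVED = 1
    PENDING = 2

# e.g. with any DB-API cursor, so magic numbers don't leak into queries:
# cursor.execute("SELECT * FROM Articles WHERE ApprovalStatus = %s",
#                (int(ApprovalStatus.APPROVED),))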
Definitely do 'A'.
If you do B, you'll be creating a new table with the same fields as the other one and that means you're doing something wrong. You're repeating yourself.
I think it's better to store the data in the main table with a specific status, because then it's not necessary to move data between tables when an article is approved, and the article appears on the site at the same time. If you don't want to keep disapproved articles, you can create a cron script that removes the unnecessary data or moves it to an archive table. That way you put less load on your DB, because you can schedule the removal of old articles for a suitable time, for example at night.
Regarding the concern about using the approval status in every query: if you are planning a very popular site with a high load for searching or listing articles, you will likely use a standalone search server like Sphinx or Solr (MySQL is not a good solution for these purposes), and you will only push data with status = 'Approved' into it. Using delta indexing helps you keep that data up to date.