algorithm for analyzing description before saving into db: mongodb - json

The idea of the application is to show all "E-number" ingredients within a product (E100, E200, etc.).
Imagine we have a list of products coming into our database (JSON, scraped or received via APIs). Each product contains a description that lists the ingredients in the product.
Sometimes those ingredients already come with numbers (like E100), sometimes there are only ingredient names (Octyl gallate), and sometimes both.
We are going to store all this data in MongoDB (collection products).
The question: when the application queries a given product, it has to show all E-numbers the product contains. How would you solve the problem that descriptions take different forms (sometimes with direct E-numbers, sometimes with E-descriptions, sometimes with both, etc.)? Moreover, in some products' descriptions the E-descriptions are written incorrectly (with missing letters).
I do not think it would be good to do this on the fly; it would be better if all the data were already stored in the DB (but I am not sure). So my general solution could be like this:
do preprocessing of the description field when receiving product data and before saving the product into the DB (this could be done in any programming language, Node.js for instance)
during preprocessing, analyse the description field by searching within an existing e-collection (e-id, e-name, e-category, array of e-different-names); for instance, if the description contains E100, greens, Octyl gallate, then preprocessing would produce the array "E100, E140, E311"
then create an "e-list" field for the product in the products collection (JSON)
save the product in the DB
Does this seem logical? I have never worked with MongoDB.

Yes, it makes sense to process the description while inserting and to prepare the data for quick queries. The ingredients could be normalized into a separate collection, with the ingredient ids then added to each product.
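As a rough illustration of that preprocessing step, here is a minimal Python sketch. Everything in it is an assumption for illustration only: the E_ALIASES table stands in for the e-collection, the function name extract_e_numbers and the 0.85 similarity cutoff are made up, and difflib from the standard library is just one simple way to tolerate names with missing letters.

```python
import re
from difflib import get_close_matches

# Hypothetical stand-in for the e-collection: lowercase alias -> E-number.
E_ALIASES = {
    "curcumin": "E100",
    "chlorophyll": "E140",
    "greens": "E140",
    "octyl gallate": "E311",
}

# Matches forms such as "E100", "e-140", "E 160a".
E_NUMBER_RE = re.compile(r"\bE\s?-?(\d{3,4}[a-z]?)\b", re.IGNORECASE)


def extract_e_numbers(description, cutoff=0.85):
    """Return the sorted E-numbers found in a free-text ingredient list."""
    found = set()
    # 1. E-numbers written out directly.
    for match in E_NUMBER_RE.finditer(description):
        found.add("E" + match.group(1).lower())
    # 2. Ingredient names, compared fuzzily so small misspellings still match.
    for part in re.split(r"[,;()]", description):
        name = part.strip().lower()
        if not name:
            continue
        close = get_close_matches(name, E_ALIASES.keys(), n=1, cutoff=cutoff)
        if close:
            found.add(E_ALIASES[close[0]])
    return sorted(found)


product = {"name": "Sample product", "description": "E100, greens, Octyl gallate"}
# Computed once, before the product document is saved.
product["e_list"] = extract_e_numbers(product["description"])
```

The resulting e_list array is stored with the product, so a later query only reads the precomputed list instead of parsing the description again.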

Related

Cache, Database, Over 400k Listing

In my MySQL database I have a table of products which contains almost 625k rows. The table has 162 columns.
Now there is a search box on my home page where you can search for anything and, if your search term matches any of my product titles, it gives you a list of 15 products. This is similar to Amazon and other e-commerce websites.
What I did so far was to create a JSON file with all the product IDs and titles. When the user types at least 3 characters into the search field, an AJAX request is made and fetches the list. But my issue is that the JSON file is almost 12 MB in size, and the AJAX call fetches it whenever the user types or removes a character. It worked fine on my local machine, but as soon as I made it live it stopped working for users with an internet connection slower than 5 MBPS. So I am looking for advice on how to make it as fast as Amazon's, that is, search with auto-suggestion across 625k products.
I am really sorry, but there is no more advice to give here than "go do some reading on database design and schema normalization".
If you have 162 columns in a table you will never be able to do an efficient search. The database (especially MySQL) will not hold the table in memory and indexes will not help either. Yes, you can throw it all into an ElasticSearch instance and it will fix some of your problems. But, honestly, this solution does not clean up the mess you have.
You should have a separate table with the relevant information (titles, names, etc.) in one column (and possibly a numeric column for prices, etc.). This metadata table should reference the main table, and the text column should be fulltext-indexed. This way you ask for matches, filter the results, and JOIN the relevant rows from the main table. This works quickly and uses very few resources.
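To make the idea concrete, here is a minimal sketch of a narrow, fulltext-indexed search table joined back to the main table. It uses Python's built-in sqlite3 with an FTS5 virtual table purely as a stand-in for a MySQL FULLTEXT index, and it assumes the SQLite build includes FTS5. The table names, sample rows, and the suggest function are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The wide main table (only three of its many columns shown).
    CREATE TABLE products (id INTEGER PRIMARY KEY, title TEXT, price REAL);
    -- Narrow search table: just the searchable text, fulltext-indexed.
    CREATE VIRTUAL TABLE product_search USING fts5(title);
""")

rows = [(1, "Laptop Stand", 29.0), (2, "Gaming Laptop", 999.0), (3, "Desk Lamp", 15.0)]
conn.executemany("INSERT INTO products VALUES (?, ?, ?)", rows)
# The search table's rowid references the product id in the main table.
conn.executemany(
    "INSERT INTO product_search (rowid, title) VALUES (?, ?)",
    [(r[0], r[1]) for r in rows],
)


def suggest(term, limit=15):
    """Return up to `limit` (id, title) pairs with a word starting with `term`."""
    if len(term) < 3:  # same 3-character minimum as in the question
        return []
    # Quote the user input and add * for prefix matching.
    query = '"' + term.replace('"', '""') + '"*'
    return conn.execute(
        "SELECT products.id, products.title FROM product_search "
        "JOIN products ON products.id = product_search.rowid "
        "WHERE product_search MATCH ? ORDER BY products.id LIMIT ?",
        (query, limit),
    ).fetchall()
```

With this shape the browser sends only the typed term and receives at most 15 rows, instead of downloading the whole 12 MB list on every keystroke.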

Database design for ecommerce site with many product categories

I have a requirement to design a database for an ecommerce app that has a vast range of product categories, from pins to planes. All products have different kinds of features. For example, a mobile phone has specific features like memory, camera megapixels, screen size, etc., whilst a house has land size, number of storeys and rooms, garage size, etc. Such specific features go on and on, as many as we have products. Whilst all products share some common features, most features are very different and specific to each product. So designing the database has become a bit confusing. I'm doing this for the first time.
My query is about database design. Here is what I'm planning to do:
Create a master table with all fields, that tells if a field is common or specific and map them with respective category of the product. All products will have "common" fields but "specific" will be shown only for one category.
table: ALL_COLUMNS
columns:
id,
name,
type(common or specific),
category(phone, car, laptop etc.)
Fetch respective fields from all_columns table while showing the fields on the front.
Store the user data in another table along with mapped fields
table: ALL_USER_DATA
columns:
id,
columnid,
value
I don't know what the right way is or how it is done in established apps and sites. So I would appreciate it if someone could tell me whether this is the right database architecture for an ecommerce app with a highly comprehensive and sparse set of categories and features.
Thank you all.
There are many possible answers to this question - see the "related" questions alongside this one.
The design for your ALL_USER_DATA table is commonly known as "entity/attribute/value" (EAV). It's widely considered horrible (search SO for why) - it's theoretically flexible, but imagine finding "airplanes made by Boeing with a wingspan of at least 20 metres suitable for pilots with a new qualification" - your queries become almost unintelligible really fast.
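To make that concrete, here is a minimal sketch of just the first half of that search ("made by Boeing with a wingspan of at least 20 metres") against an EAV layout, using Python's built-in sqlite3. The product_id column is an assumption added so that values can be grouped per product (the table in the question does not show one), and the attribute names and sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE all_columns (id INTEGER PRIMARY KEY, name TEXT);
    -- product_id is an assumed column; every value is stored as text.
    CREATE TABLE all_user_data (product_id INTEGER, columnid INTEGER, value TEXT);
    INSERT INTO all_columns VALUES (1, 'manufacturer'), (2, 'wingspan_m');
    INSERT INTO all_user_data VALUES
        (10, 1, 'Boeing'), (10, 2, '35'),
        (11, 1, 'Boeing'), (11, 2, '12'),
        (12, 1, 'Cessna'), (12, 2, '25');
""")

# Every attribute in the filter costs two more joins and a cast from text.
EAV_QUERY = """
    SELECT m.product_id
    FROM all_user_data AS m
    JOIN all_columns   AS mc ON mc.id = m.columnid AND mc.name = 'manufacturer'
    JOIN all_user_data AS w  ON w.product_id = m.product_id
    JOIN all_columns   AS wc ON wc.id = w.columnid AND wc.name = 'wingspan_m'
    WHERE m.value = 'Boeing' AND CAST(w.value AS REAL) >= 20
    ORDER BY m.product_id
"""
boeing_wide = [row[0] for row in conn.execute(EAV_QUERY)]
```

Adding the pilot qualification condition would mean yet another pair of joins, which is how these queries get out of hand.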
The alternative is to create a schema that can store polymorphic data types - again, look on Stack Overflow for how that might work.
The simple answer is that the relational model is not a good fit for this - you don't want to make a schema change for each new product type your store uses, and you don't want to have hundreds of different tables/columns.
My recommendation is to store the core, common information, and all the relationships in SQL, and to store the extended information as XML or JSON. MySQL is pretty good at querying JSON, and it's a native data type.
Your data model would be something like:
Categories
---------
category_id
parent_category_id
name
Products
--------
product_id
price
valid_for_sale
added_date
extended_properties (JSON/XML)
Category_products
-----------------
category_id
product_id
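Here is a minimal sketch of how that model could be queried, using Python's built-in sqlite3 and its json_extract function as a stand-in for MySQL's JSON support (this assumes the SQLite build includes the JSON functions). The property names memory_gb, storeys and rooms and all sample rows are made up for illustration.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (
        category_id INTEGER PRIMARY KEY, parent_category_id INTEGER, name TEXT);
    CREATE TABLE products (
        product_id INTEGER PRIMARY KEY, price REAL, valid_for_sale INTEGER,
        added_date TEXT, extended_properties TEXT);  -- JSON stored as text
    CREATE TABLE category_products (category_id INTEGER, product_id INTEGER);
    INSERT INTO categories VALUES (1, NULL, 'phone'), (2, NULL, 'house');
    INSERT INTO category_products VALUES (1, 1), (1, 2), (2, 3);
""")

products = [
    (1, 499.0, 1, "2024-01-01", {"memory_gb": 128, "screen_in": 6.1}),
    (2, 199.0, 1, "2024-01-02", {"memory_gb": 32, "screen_in": 5.5}),
    (3, 250000.0, 1, "2024-01-03", {"storeys": 2, "rooms": 5}),
]
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?, ?)",
    [(pid, price, ok, added, json.dumps(props)) for pid, price, ok, added, props in products],
)

# Common fields and relationships are plain SQL; the category-specific
# feature is read out of the JSON column.
big_phones = [row[0] for row in conn.execute("""
    SELECT p.product_id
    FROM products AS p
    JOIN category_products AS cp ON cp.product_id = p.product_id
    JOIN categories AS c ON c.category_id = cp.category_id
    WHERE c.name = 'phone'
      AND json_extract(p.extended_properties, '$.memory_gb') >= 64
    ORDER BY p.product_id
""")]
```

A new product type then only needs new keys in extended_properties, not a schema change.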

Relationship database design - object specific many to many, do I solve with self join table or new table

Being new to relational database design, I am trying to clarify one piece of information to properly design this database. Although I am using Filemaker as the platform, I believe this is a universal question.
My understanding is that relationships should ideally all be one-to-many, with separate tables or join tables used to achieve this.
I have a database with multiple products, made by multiple brands, in multiple product categories. I also want this to be as scalable as possible when it comes to reporting, so that the data can be sliced and diced in as many ways as possible, since the needs of the users are constantly changing.
So when I ask the question "Does each Brand have multiple products" I get a yes, and "Does each product have multiple brands" the answer is no. So this is a one to many relationship, but it also seems that a self-join table might give me everything that I need.
This methodology also seems to go down a rabbit hole for other "product related" information such as product category, each product is tied to one product category, but only one product category is related to a product.
So I see 2 possibilities, make three tables and join them with primary and foreign keys, one for Brand, one for Product Category, and one for Products.
Or the second possibility is to create one table that has the brand and product category and product info all in one table (since they are all product related) and simply do self-joins and other query based tables to give me the future reporting requirements that will be changing over time.
I am looking for input from experience that might point me in the right direction.
Thanks in advance!
Could you ever want to store additional information about a brand (company URL, phone number, etc.) or about a product category (description, etc.)?
If the answer is yes, you definitely want to use three tables. If you don't, you'll be repeating all that information for every single item that belongs to the same brand or same category.
If the answer is no, there is still an advantage to using three tables: it will prevent typos or other spelling inconsistencies from getting into your database. For example, it would prevent you from writing a brand as "Coca Cola" for some items and as "Coca-Cola" for other items. These inconsistencies get harder and harder to find and correct as your database grows. By having each brand listed only once in its own table, it will always be written the same way.
The disadvantage of multiple tables is the SQL for your queries is more complicated. There's definitely a tradeoff, but when in doubt, normalize into multiple tables. You'll learn when it's better to de-normalize with more experience.
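For what it's worth, here is a minimal sketch of the three-table layout in plain SQL, run through Python's built-in sqlite3 rather than Filemaker. The table and column names and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE brands (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE products (
        id INTEGER PRIMARY KEY,
        name TEXT,
        brand_id INTEGER NOT NULL REFERENCES brands(id),
        category_id INTEGER NOT NULL REFERENCES categories(id)
    );
    INSERT INTO brands VALUES (1, 'Coca-Cola'), (2, 'Acme');
    INSERT INTO categories VALUES (1, 'Soda'), (2, 'Water');
    INSERT INTO products VALUES
        (1, 'Cola 330ml', 1, 1), (2, 'Water 500ml', 1, 2), (3, 'Acme Cola', 2, 1);
""")

# Each brand name is stored once, so it cannot be spelled two different ways.
# The price is the more verbose query: two joins for a simple report.
report = conn.execute("""
    SELECT b.name, c.name, COUNT(*)
    FROM products AS p
    JOIN brands AS b ON b.id = p.brand_id
    JOIN categories AS c ON c.id = p.category_id
    GROUP BY b.name, c.name
    ORDER BY b.name, c.name
""").fetchall()
```

Because brand_id is a foreign key, inserting a product that points to a brand that does not exist is rejected by the database instead of quietly creating a second spelling.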
I am not sure where you see room for a self-join here. It seems to me you are saying: I have a table of products; each product has one brand and one (?) category. If that's the case then you need either three tables:
Brands -< Products >- Categories
or - in Filemaker only - you can replace either or both the Brands and the Categories tables with a value list (assuming you won't be renaming brands/categories and at the expense of some reporting capabilities). So really it depends on what type of information you want to get out in the end.
If you truly want your solution to be scalable, you need to parse and partition your data now. Otherwise you will face restructuring the solution down the road as it grows, along with parsing and relocating the data to new tables. Since you've also included the SQL and MySQL tags: if you plan on connecting Filemaker to an external data source, you will definitely need to up your game structurally.
Building everything in one table is essentially using Filemaker to do Excel work and it won't cut it if you are connecting to SQL, MySQL, etc.
Self join tables are a great tool. However, they should really only be used for calculating small data points and should not be used as pivot points or foundations for your reporting features. It can grow out of control as time goes on and you need to keep your backend clean.
Use summary and sub-summary reporting features to slice product based data.
For retail and general product management solutions, whether in Filemaker, SQL, or anything else, the "Brand" or "Vendor" is its own table. You would then have a "Products" table (the match key being the "Brand ID").
The "Product Category" field should be a field in the "Products" table. You can manage the category values by building a standard value list or building a value list based on a "Product Category" table. The second scenario is better for long term administration.

Most Efficient Method of Storing a List in MySQL

I'm relatively new to databases and MySQL, but I'm using it to connect a database to a program I've made in VB.NET. Along with many programming languages, I understand SQL, but I have very little experience with databases. Also, I'm using MySQL Workbench (if it helps to know).
I am creating a program which retrieves information from the database. This program in particular is a guide for cooking.
The Database
My database consists of one table named "recipes". Within the table are four columns, each named (in order): ID, Recipe Name, Origin, Ingredients.
My only problem is I plan on storing around 80 or so recipes within the database; however, this will not be a difficult task because I'm simply copy-and-pasting from a Wikia page.
The Problem
The ingredients on the Wikia page I'm copying from are in a numbered list, so I cannot simply copy nearly ten steps and paste them into my ingredients column, because it will not let me (and typing them would take ages as well). This is an issue because I need to retrieve all the ingredients as a list, and I thought it would be inefficient to create over ten different columns.
Conclusion
Is there a more efficient way to store a list of items rather than creating multiple columns? How can I combat this issue?
Have multiple tables. Have a table of recipes with the columns ID, Recipe Name and Origin, and another table of ingredients which contains ID, Recipe ID and Ingredient (i.e., one row per recipe per ingredient).
Your initial ideas (i.e., either all ingredients in one column, or many columns, one for each ingredient) would be inefficient and also difficult to query. For example, finding which recipes contain a particular ingredient would be difficult.
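A minimal sketch of that two-table layout, using Python's built-in sqlite3 in place of MySQL (the SQL itself is nearly identical). The sample recipes, the placeholder origins, and the two helper functions are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE recipes (id INTEGER PRIMARY KEY, recipe_name TEXT, origin TEXT);
    -- One row per recipe per ingredient.
    CREATE TABLE ingredients (
        id INTEGER PRIMARY KEY,
        recipe_id INTEGER REFERENCES recipes(id),
        ingredient TEXT
    );
    INSERT INTO recipes VALUES
        (1, 'Pancakes', 'Origin A'), (2, 'Omelette', 'Origin B'), (3, 'Toast', 'Origin C');
    INSERT INTO ingredients (recipe_id, ingredient) VALUES
        (1, 'flour'), (1, 'egg'), (1, 'milk'),
        (2, 'egg'), (2, 'butter'),
        (3, 'bread'), (3, 'butter');
""")


def ingredients_of(recipe_id):
    """All ingredients of one recipe, in the order they were entered."""
    rows = conn.execute(
        "SELECT ingredient FROM ingredients WHERE recipe_id = ? ORDER BY id",
        (recipe_id,),
    )
    return [row[0] for row in rows]


def recipes_with(ingredient):
    """Names of all recipes that contain the given ingredient."""
    rows = conn.execute(
        "SELECT r.recipe_name FROM recipes AS r "
        "JOIN ingredients AS i ON i.recipe_id = r.id "
        "WHERE i.ingredient = ? ORDER BY r.recipe_name",
        (ingredient,),
    )
    return [row[0] for row in rows]
```

The second function is the kind of lookup that would be awkward with all ingredients crammed into one column or spread across ten columns.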

Migrating from MySQL to MongoDB - best practices

So, it may be best to just try it out and see through some trial and error, but I'm trying to figure out the best way to migrate a pretty simple structure from MySQL to MongoDB. Let's say that I have a main table in MySQL called 'articles' and I have two other tables, one called 'categories' and the other 'category_linkage'. The categories all have an ID and a name. The articles all have an ID and other data. The linkage table relates articles to categories, so that you can have unlimited categories related to each article.
From a MongoDB approach, would it make sense to just store the article data and the category IDs that belong to that article in the same collection, thus having just 2 data collections (one for the articles and one for the categories)? My thinking is that to add/remove categories from an article, you would just update ($pull/$push) that particular article document, no?
In my opinion, a good model would look like this:
{"article_name": "name",
"category": ["category1_name", "category2_name", ...],
"other_data": "other data value"
}
That is, embed the category names directly in the article document. Updating an article's categories is easy, but removing a category altogether requires modifying every article that belongs to that category. If removing categories is frequent, then keeping them separate might be a good idea performance-wise.
This approach also makes it easy to query on the category name (no need to map a name to an id with a separate query).
Thus, the "correct" way to model the data depends on the assumed use case, as is typically the case with MongoDB and other NoSQL databases.
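As a small sketch of what those operations look like with the embedded model, the Python functions below only build the filter and update documents; nothing here talks to a server. The function names are made up, identifying an article by _id is an assumption, and the returned pairs are meant to be passed to a driver call such as pymongo's update_one or update_many.

```python
def add_category(article_id, category):
    """Filter/update pair that appends a category to one article."""
    return {"_id": article_id}, {"$push": {"category": category}}


def remove_category(article_id, category):
    """Filter/update pair that removes a category from one article."""
    return {"_id": article_id}, {"$pull": {"category": category}}


def delete_category_everywhere(category):
    """Filter/update pair for update_many: touches every article in the category.

    This is the expensive case mentioned above.
    """
    return {"category": category}, {"$pull": {"category": category}}


def find_by_category(category):
    """Filter matching articles whose category array contains the name."""
    return {"category": category}
```

Note that the first two target a single document, while deleting a category has to match and rewrite every article that carries it.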
If you have access to a Mac computer, you could give the MongoHub GUI a try. It has an "Import from MySQL" feature.