MySQL - Break into more tables?

I'm creating an order system to keep track of orders. There are about 60 products or so, each with its own price. The system isn't very complicated, though; I just need to be able to record how many of each product a person orders.
My question is: is it more efficient to have an 'orders' table with one column per product, where each numeric value is how many of that product was ordered? For example:
orders
    id
    product_a
    product_b
    product_c
    etc...
OR
should I break it into different tables, with a many-to-many table to join them? Something like this, maybe:
customers
    id
    name
    email
orders
    id
    customer_id
products
    id
    product
orders_products
    order_id
    product_id

I would break it apart as you show in your second sample. This will make your application much more scalable and will still be quite efficient.
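A minimal sketch of that second layout, in case it helps. SQLite (via Python's standard library) stands in for MySQL here so the example runs anywhere; the `quantity` and `price` columns and the sample rows are my assumptions, not part of the question's schema.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id)
);
-- price is an assumed column (the question only says products have prices)
CREATE TABLE products (id INTEGER PRIMARY KEY, product TEXT, price REAL);
-- quantity is an assumed column: how many of the product this order contains
CREATE TABLE orders_products (
    order_id INTEGER REFERENCES orders(id),
    product_id INTEGER REFERENCES products(id),
    quantity INTEGER NOT NULL,
    PRIMARY KEY (order_id, product_id)
);
INSERT INTO customers VALUES (1, 'Ann', 'ann@example.com');
INSERT INTO products VALUES (1, 'product_a', 2.50), (2, 'product_b', 4.00);
INSERT INTO orders VALUES (1, 1);
INSERT INTO orders_products VALUES (1, 1, 3), (1, 2, 1);
""")

# One join lists every line of an order together with its line total.
order_lines = conn.execute(
    """
    SELECT p.product, op.quantity, p.price * op.quantity
    FROM orders_products AS op
    JOIN products AS p ON p.id = op.product_id
    WHERE op.order_id = ?
    ORDER BY p.id
    """,
    (1,),
).fetchall()
```

Adding a 61st product is then just another row in products, not another column on orders.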

Always build with future features and expansion in mind. A shortcut here or there always seems to bite you later, when you have to re-architect and refactor the whole thing. Look up normalization and why you want to separate every independent element in a relational DB.
I am often asked, “Why make it a separate table, when this way is simpler?” The people asking insist that “there are no other things of this type we will use”, then later ask for a feature that requires a many-to-many relationship, not realizing they painted you into a corner by not considering future features. People who do not understand data structures tend not to see this coming and are pretty bad at specifying system requirements. It usually surfaces when the DB starts getting big and they realize they want to look at only a subset of the data. A flat DB means adding columns to handle a ton of different requests, while a many-to-many join table can do it with a few lines of code.

I'd also use the second way. If the db is as simple as you say, the difference in speed might not be much. But the second way is more efficient and easier to reuse or enhance if you get new ideas and add to your application.

Should you go for the first case, how will you keep track of the prices and discounts you gave to your customers for each product? Even if you have no plans to track this now, it is a quite common requirement, so you might get a request for such a change.
With normalized schema all you have to do is add a couple of fields.
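As a rough illustration of "a couple of fields": the sketch below adds two hypothetical columns, `unit_price` and `discount`, to the join table. The column names, the `quantity` column, and the sample values are assumptions of mine, and SQLite is used in place of MySQL only so the snippet is runnable as-is.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders_products (order_id INTEGER, product_id INTEGER, quantity INTEGER);
INSERT INTO orders_products VALUES (1, 1, 3);

-- The schema change: record the price charged and any discount per order line.
ALTER TABLE orders_products ADD COLUMN unit_price REAL;
ALTER TABLE orders_products ADD COLUMN discount REAL DEFAULT 0;

-- Backfill the existing line, then insert a new discounted one.
UPDATE orders_products SET unit_price = 2.50 WHERE order_id = 1 AND product_id = 1;
INSERT INTO orders_products VALUES (2, 1, 10, 2.50, 0.5);
""")

# Amount charged per order line: quantity * unit_price minus the flat discount.
line_totals = conn.execute(
    "SELECT order_id, quantity * unit_price - discount "
    "FROM orders_products ORDER BY order_id"
).fetchall()
```

With the one-column-per-product layout, the same change would need extra price and discount columns for each of the 60 products.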

Related

MySQL - Partitioning vs multiple table suggestion for a use case

We have around 30,000 customers, and each customer has multiple products. We currently store all the products in a single table partitioned by KEY(customerid). I would like your suggestions on whether separate tables for each customer would be more beneficial than partitioning, or whether we should continue to use partitioning with the current (HASH) type or a different one.
The number of products per customer varies: a few customers have > 1M products, while some have as few as a few hundred. This may result in unevenly sized partitions.
If a customer account is deleted, all products of that customer are deleted as well. With separate tables, this would be quite convenient.
All customers are disjoint, so there is no query that accesses products across customers.
The number of customers is quite large (around 30k), and I am not sure it's a good idea to have so many tables.
Is any other partitioning scheme better than the one we are currently using?
Thank you for your inputs.
Generally I would go with the single-table solution that you already have; it's the simple, straightforward way to go.
You don't mention your motivation for wanting to change your setup.
How many entries do you have in your products table?
Are you experiencing performance issues with your current setup? If not I might be inclined to call this a case of "premature optimization".
If you ARE experiencing performance issues I would start by analyzing those first (profiling) to determine whether they are caused by your single products table design being a bottleneck.
Practical advice I can offer: make sure you are using the InnoDB storage engine and not MyISAM, since that will allow for row-level locks.
The downside to your proposal of having one table for each customer is maintenance and complexity. If you ever want to change the schema of the product tables, it will be a much more complicated and error-prone task than before. You might have to write a script to batch the changes across all those tables, and what if the script crashes halfway? Then half of your customers have a changed table schema and the other half don't. As I mentioned, if you do not currently have a performance problem, you would be adding this complexity and maintenance without gaining anything.
You state that "All customers are disjointed. So there is no query to access cross-customer products." However, it might not stay that way forever. Imagine that in 2 months you need to extract a list of all customers who own a specific product of type x. That would be a simple SQL query in your current setup; in the multi-table setup you would have to write a script or small program that iterates over all customers and runs a product query for each one. So what was 1 query before is now 30,000 queries.
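To make the "1 query" case concrete, here is a small sketch against a single products table. The `product_type` column and the sample rows are assumptions for illustration, and SQLite stands in for MySQL so it can be run directly.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; product_type is a hypothetical column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (
    id INTEGER PRIMARY KEY,
    customerid INTEGER NOT NULL,
    product_type TEXT NOT NULL
);
INSERT INTO products (customerid, product_type)
VALUES (1, 'x'), (1, 'y'), (2, 'y'), (3, 'x');
""")

# Single table: one statement answers "which customers own a product of type x?"
owners_of_x = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT customerid FROM products "
        "WHERE product_type = ? ORDER BY customerid",
        ("x",),
    )
]
```

With one table per customer, the same question means looping over every customer's table and merging the results in application code.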
What you propose is a simple form of sharding. If you decide to go that way, you may want to look into sharding, since there are other approaches than the somewhat aggressive one of giving every customer a dedicated table. E.g. use a hash of each customer id as the sharding key, so every customer is part of either group A or group B. Products owned by A-customers are in ProductTableA, products owned by B-customers are in ProductTableB. (In a real implementation you may want to hash to a value between 0 and 255 and then keep a reference list saying that 0-127 map to table A and 128-255 map to table B. That way, if you ever decide to scale up and add one more table, you don't have to recalculate all your hashes; you just update your reference list.)
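The bucket-plus-reference-list idea could look roughly like the sketch below. The choice of SHA-256 as the hash and the function names are my own assumptions; any stable hash that yields 0-255 would do.

```python
import hashlib

# Reference list: which bucket range lives in which table.
# Table names follow the example above; splitting further later only
# means editing this list, not rehashing customers.
BUCKET_RANGES = [
    (range(0, 128), "ProductTableA"),
    (range(128, 256), "ProductTableB"),
]


def bucket_for(customer_id: int) -> int:
    """Map a customer id to a stable bucket in 0..255 (first byte of SHA-256)."""
    digest = hashlib.sha256(str(customer_id).encode("utf-8")).digest()
    return digest[0]


def table_for(customer_id: int) -> str:
    """Look up the table that currently holds this customer's products."""
    bucket = bucket_for(customer_id)
    for buckets, table in BUCKET_RANGES:
        if bucket in buckets:
            return table
    raise ValueError(f"no table configured for bucket {bucket}")
```

Note that moving a bucket range to a new table still requires copying the rows for the affected customers; the reference list only spares you from recomputing which bucket each customer belongs to.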

Same table or different table?

My application tracks purchases and sales of inventory. I can't decide if I should use separate tables with auto-increment or the same table with a distinguishing type field with a manual auto-increment id. The tables would store close to identical data. I'm worried that combining the two tables would make it harder to visualize inventory movement in the future. I understand that this is purely for human comfort but I'm not sure if there are other performance related issues as well. I would like to hear opinions from both ends.
Suppose I decide to combine my sales invoice and purchase order tables into the same table, there is a single difference in the columns required - purchase orders store the tax paid while sales invoices use a bool on whether the order is taxed. I have two options:
Use two fields - one bool and one decimal
Use the same field and type cast the values on the application level
Does anyone know if the second would cause more problems?
Thanks
If you're using a single-table design you probably need two columns, one for each purpose, where some records may use column A, and some column B, but the rest are shared.
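A sketch of what the two-column variant might look like, with a constraint so each row uses only the column that matches its type. The table name `documents`, the column names, and the `doc_type` values are hypothetical; SQLite stands in for MySQL here (MySQL only enforces CHECK constraints from version 8.0.16 on).

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; all names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE documents (
    id INTEGER PRIMARY KEY,
    doc_type TEXT NOT NULL CHECK (doc_type IN ('purchase_order', 'sales_invoice')),
    tax_paid NUMERIC,                             -- used by purchase orders only
    is_taxed INTEGER CHECK (is_taxed IN (0, 1)),  -- used by sales invoices only
    CHECK (
        (doc_type = 'purchase_order' AND tax_paid IS NOT NULL AND is_taxed IS NULL)
        OR
        (doc_type = 'sales_invoice' AND is_taxed IS NOT NULL AND tax_paid IS NULL)
    )
);
""")

conn.execute("INSERT INTO documents (doc_type, tax_paid) VALUES ('purchase_order', 12.5)")
conn.execute("INSERT INTO documents (doc_type, is_taxed) VALUES ('sales_invoice', 1)")

# A purchase order that fills the invoice-only column violates the constraint.
try:
    conn.execute("INSERT INTO documents (doc_type, is_taxed) VALUES ('purchase_order', 1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM documents").fetchone()[0]
```

Keeping two typed columns lets the database do this checking; with one shared field cast in the application, nothing stops a tax amount from being read as a flag.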
The real question is whether a purchase order and an invoice are really the same thing in terms of data and relationships. Normally a purchase order is related to an invoice, but an invoice may not have an associated purchase order. In some systems you will have a complex arrangement between multiple purchase orders and multiple invoices; it depends on the nature of what's being sold and how it's packaged.
If you're dealing with fairly granular things, like large, expensive items that can be tracked individually, then your system can get into a lot of detail. If it's tracking inexpensive items sold in large quantities, it gets pretty hard to manage that. Things get even more complicated if you're dealing with things that are made-to-order.
I'd do a lot more research about the types of situations you're likely to encounter, order those by probability, and then test your schema against the ugliest cases you're likely to encounter.

Should I have client and contractor billable items in the same table or separate?

I would like to store client and contractor billable items into my DB. It does not appear that I would be doing any queries to search both client and contractor billable items in a single query, or used in an interchangeable way where inheritance would help. However, because they would share many columns, it makes me feel like I should use single table inheritance and have them both in the same table. I'm pretty sure I should just keep them separate, but the fact that the objects are so similar makes me unsure, especially if things change in the future and they are looked at in an interchangeable way.
Presumably Clients and Contractors are held in separate tables? Assuming that that is so, then I'd go with separate tables. If at some time you need to query both at once, then given their similarity it should be simple enough to create a view over both tables, but remember YAGNI!
Cheers -
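For what it's worth, the "view over both tables" fallback could be sketched like this. Table names, column names, and sample rows are assumptions for illustration, and SQLite stands in for whatever database is actually in use.

```python
import sqlite3

# Assumption: SQLite as a stand-in; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE client_billable_items (
    id INTEGER PRIMARY KEY, client_id INTEGER, description TEXT, amount REAL
);
CREATE TABLE contractor_billable_items (
    id INTEGER PRIMARY KEY, contractor_id INTEGER, description TEXT, amount REAL
);
INSERT INTO client_billable_items VALUES (1, 10, 'Design work', 500.0);
INSERT INTO contractor_billable_items VALUES (1, 20, 'Subcontracted build', 300.0);

-- The tables stay separate; the view presents them as one set when needed.
CREATE VIEW billable_items AS
    SELECT 'client' AS party_type, client_id AS party_id, description, amount
    FROM client_billable_items
    UNION ALL
    SELECT 'contractor', contractor_id, description, amount
    FROM contractor_billable_items;
""")

total = conn.execute("SELECT SUM(amount) FROM billable_items").fetchone()[0]
party_types = sorted(
    row[0] for row in conn.execute("SELECT party_type FROM billable_items")
)
```

The view can be added on the day a combined query is actually needed, which is the YAGNI point.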

Two different ways to approach storing multiple pieces of data in a database. Which should I choose?

I want to store product images.
Each product will have more than one image
Approach 1:
table products
table productImages
with contents:
productId productPhoto
1 - 28374628
1 - 12731283
2 - 23498723
3 - 23849723
Approach 2:
table products
In it, there is a field (column) productPhotos
that holds one long string containing all the product image ids, separated by commas. If I want to add one more image, I have to concatenate it.
I have been debating this with some friends and opinions are divided.
I consider the first approach much better because it is more object-oriented and I think normalisation is at a better level this way.
Opponents of my approach say that it will not perform as well as the string/separators approach when there is more activity in my application.
Never, never, never store multiple values in one column!
Go with Approach 1. You can easily join tables with that design, and you don't have to rack your brain over separating the content of that column or searching for a specific id in the list.
Choose approach one. It's normalised and will enable you to work with your data in a standard, understandable way.
It will also be easier to extend it (what if you need to add more columns?)
It will be much easier and quicker to add and remove additional photos as well as query them. Database systems are designed to work with normalised data and are efficient at this - you don't need to try to improve it.
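A short sketch of Approach 1 using the sample rows from the question. SQLite stands in for the real database, and the composite primary key and the extra photo id 99999999 are my own assumptions.

```python
import sqlite3

# Assumption: SQLite as a stand-in; the composite primary key is my addition.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY);
CREATE TABLE productImages (
    productId INTEGER NOT NULL REFERENCES products(id),
    productPhoto INTEGER NOT NULL,
    PRIMARY KEY (productId, productPhoto)
);
INSERT INTO products VALUES (1), (2), (3);
INSERT INTO productImages VALUES
    (1, 28374628), (1, 12731283), (2, 23498723), (3, 23849723);
""")

# Adding a photo is one INSERT; removing one is one DELETE.
# No string splitting or re-concatenation is involved.
conn.execute("INSERT INTO productImages VALUES (?, ?)", (2, 99999999))
conn.execute(
    "DELETE FROM productImages WHERE productId = ? AND productPhoto = ?",
    (1, 12731283),
)


def photos_for(product_id: int) -> list[int]:
    """Return all photo ids for one product, in ascending order."""
    rows = conn.execute(
        "SELECT productPhoto FROM productImages "
        "WHERE productId = ? ORDER BY productPhoto",
        (product_id,),
    )
    return [row[0] for row in rows]
```

With the comma-separated string, each of these operations would mean reading the whole value, editing it in application code, and writing it back.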

Should I combine two similar tables into one?

In my database I currently have two tables that are almost identical except for one field.
For a quick explanation: with my project, each year businesses submit to me a list of the businesses they sell to and the suppliers they purchase things from. Since this is done on an annual basis, I have a table called sales and one called purchases.
So in the sales table, I would have the fields like: BusinessID, year, PurchaserID, etc. And the complete opposite would be in the purchases table, except that there would be a SellerID.
So basically both tables are exactly the same field-wise except for the PurchaserID/SellerID. I inherited this system, so I did not design the DB this way. I'm debating combining the two tables into one table called suppliers and just adding a type field to distinguish whether they are selling to or purchasing from.
Does this sound like a good idea? Is there something I'm missing in regards to why this wouldn't be a good idea?
Do what works for you.
The textbook answer is to normalize. If you normalized, you would probably have 2 tables: one with both your buyers and sellers as companies, and a transactions table recording who bought what from whom.
If it ain't broke, don't fix it. Leave them separate.
Since the system is already built, I would only consider this if you find yourself doing a lot of queries across the two tables, like big nasty UNION queries. Joining the two tables in one makes queries like "show me all sellers or purchasers who sold/bought between these dates..." much easier.
But it sounds like these two groups are treated very differently from the business rule perspective, so it's probably not worth the trouble to make application changes at this point. (Every query would have to have a "WHERE Type = 1" or something like that.)
If you'd have asked this during the db design phase, my answer might be different.
Normalization would say "yes".
How many applications are affected by this change? That would affect the decision.
Definitely one table. And I wouldn't call it supplier, since this does not reflect the meaning of the table. Something like business_partner, or something better than that, might be more appropriate. Instead of purchase_id and seller_id, be more generic, like business_partner_id, and yes, add a field to distinguish.
Not one table. They are different entities that have a similar structure. There's nothing to be gained by consolidating them. (Nothing lost, either, except lucidity; but that's critical IMHO).
"Normalization" doesn't include looking for tables with similar schemas, and merging them.
A database is always a limited model of your business objective. If it doesn't make sense for your business, ignore those who say you should add complexity to your data model by creating a new companies table (though you probably already have something similar). If you really want to get into the "perfect model" game, just start abstracting everything away into an "entities" table and pretty soon you will have a completely unmanageable database.
Normalization would dictate that you NOT combine the two fields, unless the foreign keys actually point to the same table. A key rule to keep in mind is that each column in a table should only mean one thing. Adding a second field that explains what the first field means breaks this rule.
If your queries are getting to be a mess because you are always joining the two tables, you could create a view.
Also, the number of records in the table is almost completely irrelevant. Always optimize for performance after you have the system in place. If it is killing your application to have all the records in one table, set a clustered index on a column that partitions your table in a meaningful way.
You must take into consideration the number of records in both tables. If they are too big, it could have a big impact on queries that have multiple joins to customers and suppliers.
Example: Who sold computers to us, and to whom did we sell them?
From a completely different point of view: I tend to consider logic over technology. To me the decision is not whether the data is similar in shape or fields, but whether it makes sense to mix them. That is to say, whatever the technical answer (normalize) might be, my question would be: does it make sense to you (business logic) to have both together?
Another answer talks about merging both and changing naming conventions. To me that is a logic decision: you are saying that you don't work with buyers and sellers, but with business partners. If that is your case, then do it.
You might also consider what your use of the tables would be. If they are of one single logical type (business partner), you will surely have queries that need to access both buyers and sellers. Otherwise, if all your queries are separate, that might be an indication that they are not the same and should not be held together. Pushing them together will imply a lot of extra checks and CPU time spent telling apart what were separate entities.
There is a long-used metaphor about interfaces that might apply here. Just because a gun and a camera both shoot, that does not mean they share an interface, unless you like playing Russian roulette.
From a logical view, there seems to be no difference between the reported transactions, it is just a difference in who reports it to you. It should be a single table with SellerID, BuyerID, and (if you need it) ReporterID(s) (and perhaps additional transaction information).
This is how it should be. Now, how to make the transition? Making a script that uses the two old tables to fill a new table should be an easy exercise, but then you also need to change all the queries that use the information. This is likely a lot of work, and might not be worth the effort.
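A rough sketch of such a fill script, under my reading of the question: a row in sales means the reporting business sold to PurchaserID, and a row in purchases means it bought from SellerID. The `transactions` table name and the sample rows are assumptions, and SQLite stands in for the real database.

```python
import sqlite3

# Assumption: SQLite as a stand-in; `transactions` and the sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (BusinessID INTEGER, year INTEGER, PurchaserID INTEGER);
CREATE TABLE purchases (BusinessID INTEGER, year INTEGER, SellerID INTEGER);
INSERT INTO sales VALUES (1, 2020, 2);
INSERT INTO purchases VALUES (2, 2020, 1), (3, 2020, 1);

CREATE TABLE transactions (
    SellerID INTEGER, BuyerID INTEGER, ReporterID INTEGER, year INTEGER
);

-- In sales, the reporting business is the seller.
INSERT INTO transactions (SellerID, BuyerID, ReporterID, year)
    SELECT BusinessID, PurchaserID, BusinessID, year FROM sales;

-- In purchases, the reporting business is the buyer.
INSERT INTO transactions (SellerID, BuyerID, ReporterID, year)
    SELECT SellerID, BusinessID, BusinessID, year FROM purchases;
""")

migrated = conn.execute(
    "SELECT SellerID, BuyerID, ReporterID, year "
    "FROM transactions ORDER BY ReporterID"
).fetchall()
```

In this sample, businesses 1 and 2 both report the same sale, so it shows up twice with different ReporterID values; whether to merge such pairs is a separate decision.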
Since none of the experts reporting in are willing to answer your question, the simple answer is: query1 UNION query2
Example:
SELECT * FROM table1 UNION SELECT * FROM table2
This assumes table1 and table2 have the same structure/column headings. Note that UNION removes duplicate rows; use UNION ALL if you want to keep them.