I am creating a database for a publishing company. The company has around 1300 books and 6-7 offices. I have created a table that displays the stock of each item in all locations. The table should look like the following to the user:
Book Name    Location1   Location2   Location3   ...
History      20000       3000        4354
Computers    4000        688         344
Maths        3046        300         0
...
I already have a Books table, which stores all the details of the books, and an Office table, which holds the office information. If I create a stock management table to hold the information above, I will end up with a huge table with a lot of repetition if I store my data in the following way:
Book_ID   Location_ID   Quantity
1         1             20000
1         2             3000
1         3             4354
2         1             4000
2         2             688
...
So I think this isn't the best way to store the data, as it would end up with 1300 (books) x 7 (locations) = 9100 rows. Is there a better way of storing the data? I could add 7 columns to the Books table instead, but then every new location would require adding another column to the Books table.
I would appreciate any advice on whether the above method is suitable.
Nope, that's the best way to do it.
What you have is a many-to-many relationship between Books and Locations. This is, in almost all cases, stored in the database as an "associative" table between the two main entities. In your case, you also have additional information about that association, namely its "stock" or "quantity" (or, if you think about it like a graph, the magnitude of the connection, or edge weight).
So, it might seem like you have a lot of "duplication", but you don't really. If you were to try to do it any other way, it would be much less flexible. For example, with the design you have now, it doesn't require any database schema change to add another thousand different books or another 20 locations.
If you were to try to put the book quantities inside the Locations table, or the Locations inside the Books table, it would require you to change the layout of the database, and then re-test any code that might use it.
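A minimal sketch of the associative table described above, using Python's built-in sqlite3 so it runs anywhere. The table and column names (books, locations, stock) and the quantity for the fourth location are assumptions for illustration, not the asker's real schema. It shows that a new location is just new rows, with no schema change:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE books     (book_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE locations (location_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Associative table: one row per (book, location) pair, plus the quantity.
CREATE TABLE stock (
    book_id     INTEGER NOT NULL REFERENCES books(book_id),
    location_id INTEGER NOT NULL REFERENCES locations(location_id),
    quantity    INTEGER NOT NULL,
    PRIMARY KEY (book_id, location_id)
);
INSERT INTO books VALUES (1, 'History'), (2, 'Computers');
INSERT INTO locations VALUES (1, 'Location1'), (2, 'Location2'), (3, 'Location3');
INSERT INTO stock VALUES (1, 1, 20000), (1, 2, 3000), (1, 3, 4354),
                         (2, 1, 4000), (2, 2, 688);
""")

# Opening a new office is plain data: two INSERTs and no ALTER TABLE.
# (Location4 and its quantity of 150 are made-up values.)
conn.execute("INSERT INTO locations VALUES (4, 'Location4')")
conn.execute("INSERT INTO stock VALUES (1, 4, 150)")

history_stock = conn.execute(
    "SELECT l.name, s.quantity FROM stock s "
    "JOIN locations l ON l.location_id = s.location_id "
    "WHERE s.book_id = 1 ORDER BY l.location_id"
).fetchall()
print(history_stock)
```

The composite primary key (book_id, location_id) also guarantees at most one quantity per book per location.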
That's the most common (and effective) solution. Most frameworks like Django, Modx and several others implement many-to-many relations via an intermediate table only, using foreign key relations.
Make sure you index your table properly.
ALTER TABLE stock_management ADD INDEX (Book_ID), ADD INDEX (Location_ID);
That's really the best way to do it; you have 9100 independent values to store, so you really do need 9100 rows (fewer, really; the rows where the quantity is 0 can be omitted). Any other way of arranging the data would require the structure of the table to change when a location is added.
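To get from the row-per-pair storage back to the one-column-per-location display the question asks for, something like the following could work. It is a sketch with assumed table names (books, stock) and only three locations hard-coded; rows with quantity 0 are deliberately not stored and show up as 0 through COALESCE:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE books (book_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE stock (
    book_id     INTEGER NOT NULL,
    location_id INTEGER NOT NULL,
    quantity    INTEGER NOT NULL,
    PRIMARY KEY (book_id, location_id)
);
INSERT INTO books VALUES (1, 'History'), (2, 'Computers'), (3, 'Maths');
-- No row is stored for Maths at location 3, where the quantity is 0.
INSERT INTO stock VALUES (1, 1, 20000), (1, 2, 3000), (1, 3, 4354),
                         (2, 1, 4000),  (2, 2, 688),  (2, 3, 344),
                         (3, 1, 3046),  (3, 2, 300);
""")

# Conditional aggregation turns rows into columns; the LEFT JOIN keeps
# books that have no stock rows at all.
report = conn.execute("""
    SELECT b.name,
           COALESCE(SUM(CASE WHEN s.location_id = 1 THEN s.quantity END), 0) AS location1,
           COALESCE(SUM(CASE WHEN s.location_id = 2 THEN s.quantity END), 0) AS location2,
           COALESCE(SUM(CASE WHEN s.location_id = 3 THEN s.quantity END), 0) AS location3
    FROM books b
    LEFT JOIN stock s ON s.book_id = b.book_id
    GROUP BY b.book_id, b.name
    ORDER BY b.book_id
""").fetchall()
for row in report:
    print(row)
```

With a changing number of locations, the column list would have to be generated by the application; the stored data stays the same either way.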
There are two tables - users and orders:
table users
id   first_name   orders_amount_total
-------------------------------------
1    Jone         5634200
2    Mike         3982830
table orders
id   user_id   order_amount
---------------------------
1    1         200
2    1         150
3    2         70
4    1         320
5    2         20
6    2         10
7    2         85
8    1         25
The tables are linked by user id. The task is to show each user the sum of all his orders. A user can have thousands of orders, maybe tens of thousands, and hundreds or thousands of users can be making the request simultaneously. There are two options:
1. With each new order, in addition to writing to the orders table, increase the orders_amount_total counter, and then simply show it to the user.
2. Remove the orders_amount_total field and compute the sum of all orders of a particular user with a JOIN between the tables and the SUM operator.
Which option is better to use? Why? Why is the other option bad?
P.S. I believe that the second option is concise and correct, given that the database is relational, but I have strong doubts about the load on the server, because the set of rows scanned when calculating the amount is large even for one user, and there are many users.
Option 2. is the correct one for the vast majority of cases.
Option 1. would cause data redundancy that may lead to inconsistencies. With option 2. you're on the safe side to always get the right values.
Yes, denormalizing tables can improve performance. But that's a last resort and great care needs to be taken. "Tens of thousands" of rows isn't a particularly large set for an RDBMS. They are built to handle millions of rows and more pretty well. So you seem to be far away from the last resort and should go with option 2. and proper indexes.
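For what it's worth, the JOIN/SUM approach (option 2) with an index on the foreign key could look roughly like this. It is a sketch run against in-memory SQLite with the sample rows from the question; the index name idx_orders_user_id is made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users  (id INTEGER PRIMARY KEY, first_name TEXT NOT NULL);
CREATE TABLE orders (
    id           INTEGER PRIMARY KEY,
    user_id      INTEGER NOT NULL REFERENCES users(id),
    order_amount INTEGER NOT NULL
);
-- The index lets the engine find one user's orders without a full scan.
CREATE INDEX idx_orders_user_id ON orders(user_id);
INSERT INTO users VALUES (1, 'Jone'), (2, 'Mike');
INSERT INTO orders VALUES (1, 1, 200), (2, 1, 150), (3, 2, 70), (4, 1, 320),
                          (5, 2, 20),  (6, 2, 10),  (7, 2, 85), (8, 1, 25);
""")

# The total is derived from the orders themselves, so it cannot drift
# out of sync the way a separately maintained counter can.
totals = conn.execute("""
    SELECT u.id, u.first_name, COALESCE(SUM(o.order_amount), 0) AS orders_amount_total
    FROM users u
    LEFT JOIN orders o ON o.user_id = u.id
    GROUP BY u.id, u.first_name
    ORDER BY u.id
""").fetchall()
print(totals)
```

For a single user's page, adding WHERE u.id = ? restricts the work to that user's indexed rows.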
I agree with @sticky_bit that Option 2. is better than 1. There's another possibility:
Create a VIEW that's a pre-defined invocation of the JOIN/SUM query. A smart DBMS should be able to infer that each time the orders table is updated, it also needs to adjust orders_amount_total for the user_id.
BTW re your schema design: don't name columns id; don't use the same column name in two different tables except if they mean the same thing.
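A rough sketch of the VIEW idea, again on in-memory SQLite. The view name user_order_totals and the helper total_for are invented for the example. Note that in SQLite a plain view stores no data: the SUM is re-evaluated on each read, so the view is always current but is not a cached counter. Whether a given DBMS can maintain such a total incrementally (for example through materialized views) depends on the product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    id           INTEGER PRIMARY KEY,
    user_id      INTEGER NOT NULL,
    order_amount INTEGER NOT NULL
);
CREATE INDEX idx_orders_user_id ON orders(user_id);
-- The view is just a named query over the orders table.
CREATE VIEW user_order_totals AS
    SELECT user_id, SUM(order_amount) AS orders_amount_total
    FROM orders
    GROUP BY user_id;
INSERT INTO orders (user_id, order_amount) VALUES (1, 200), (1, 150), (2, 70);
""")


def total_for(user_id):
    """Return the user's order total as seen through the view (0 if no orders)."""
    row = conn.execute(
        "SELECT orders_amount_total FROM user_order_totals WHERE user_id = ?",
        (user_id,),
    ).fetchone()
    return row[0] if row else 0


before = total_for(1)
conn.execute("INSERT INTO orders (user_id, order_amount) VALUES (1, 320)")
after = total_for(1)  # reflects the new order without any extra bookkeeping
print(before, after)
```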
I have two sets of data that are near identical, one set for books, the other for movies.
So we have things such as:
Title
Price
Image
Release Date
Published
etc.
The only difference between the two sets of data is that Books have an ISBN field and Movies has a Budget field.
My question is, even though the data is similar should both be combined into one table or should they be two separate tables?
I've looked on SO at similar questions but am asking because most of the time my application will need to get a single list of both books and movies. It would be rare to get either books or movies. So I would need to lookup two tables for most queries if the data is split into two tables.
Doing this -- cataloging books and movies -- perfectly is the work of several lifetimes. Don't strive for perfection, because you'll likely never get there. Take a look at Worldcat.org for excellent cataloging examples. Just two:
https://www.worldcat.org/title/coco/oclc/1149151811
https://www.worldcat.org/title/designing-data-intensive-applications-the-big-ideas-behind-reliable-scalable-and-maintainable-systems/oclc/1042165662
My suggestion: Add a table called metadata. your titles table should have a one-to-many relationship with your metadata table.
Then, for example, titles might contain
title_id  title                                  price  release
103       Designing Data-Intensive Applications  34.96  2017
104       Coco                                   34.12  2017
Then metadata might contain
metadata_id title_id key value
1 103 ISBN-13 978-1449373320
2 103 ISBN-10 1449373320
3 104 budget USD175000000
4 104 EIDR 10.5240/EB14-C407-C74B-C870-B5B6-C
5 104 Sound Designer Barney Jones
Then, if you want to get items with their ISBN-13 values (I'm not familiar with IBAN, but I guess that's the same sort of thing) you do this
-- One row per title; isbn13 is NULL for titles without an ISBN-13 entry.
SELECT titles.*, isbn13.value AS isbn13
FROM titles
LEFT JOIN metadata isbn13 ON titles.title_id = isbn13.title_id
AND isbn13.key = 'ISBN-13'
This is a good way to go because it's future-proof. If somebody turns up tomorrow and wants, let's say, the name of the most important character in the book or movie, you can add it easily.
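Here is a runnable sketch of the titles/metadata layout on in-memory SQLite. The titles table is cut down to two columns for brevity, and the key and value column names are quoted because they can collide with SQL keywords. It joins through the alias to pick out ISBN-13 and then adds a new kind of attribute without touching the schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE titles (title_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE metadata (
    metadata_id INTEGER PRIMARY KEY,
    title_id    INTEGER NOT NULL REFERENCES titles(title_id),
    "key"       TEXT NOT NULL,
    "value"     TEXT NOT NULL
);
INSERT INTO titles VALUES (103, 'Designing Data-Intensive Applications'), (104, 'Coco');
INSERT INTO metadata VALUES
    (1, 103, 'ISBN-13', '978-1449373320'),
    (2, 103, 'ISBN-10', '1449373320'),
    (3, 104, 'budget',  'USD175000000');
""")

# Both join conditions go through the alias isbn13, so titles without an
# ISBN-13 row still appear, with NULL (None in Python) in that column.
with_isbn = conn.execute("""
    SELECT t.title, isbn13."value" AS isbn13
    FROM titles t
    LEFT JOIN metadata isbn13
           ON isbn13.title_id = t.title_id AND isbn13."key" = 'ISBN-13'
    ORDER BY t.title_id
""").fetchall()
print(with_isbn)

# A brand-new attribute is one more row, not a new column.
conn.execute(
    'INSERT INTO metadata (title_id, "key", "value") VALUES (?, ?, ?)',
    (104, "Sound Designer", "Barney Jones"),
)
coco_keys = [r[0] for r in conn.execute(
    'SELECT "key" FROM metadata WHERE title_id = 104 ORDER BY metadata_id')]
print(coco_keys)
```

The trade-off is that every value is stored as text, so type checks and constraints on individual attributes move into the application.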
"The only difference between the two sets of data is that Books have an IBAN field and Movies has a Budget field."
Are you sure that this difference that you have now will not be extended to other differences that you may have to take into account in the future?
Are you sure that you will not have to deal with any other type of entities (other than books and movies) in the future which will complicate things?
If the answer to both questions is "Yes" then you could use 1 table.
But if I had to design this, I would keep a separate table for each entity.
If needed, it's easy to combine their data in a View.
What is not easy, is to add or modify columns in a table, even naming them, just to match requirements of 2 or more entities.
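As an illustration of combining two separate tables in a view, here is a sketch on in-memory SQLite. The view name catalog, the kind column and the reduced column set are assumptions for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE books (
    book_id INTEGER PRIMARY KEY, title TEXT NOT NULL, price REAL NOT NULL,
    isbn TEXT NOT NULL
);
CREATE TABLE movies (
    movie_id INTEGER PRIMARY KEY, title TEXT NOT NULL, price REAL NOT NULL,
    budget INTEGER
);
-- Only the shared columns go into the combined list; 'kind' tells the
-- caller which table a row came from.
CREATE VIEW catalog AS
    SELECT 'book'  AS kind, book_id  AS item_id, title, price FROM books
    UNION ALL
    SELECT 'movie' AS kind, movie_id AS item_id, title, price FROM movies;
INSERT INTO books  VALUES (1, 'Designing Data-Intensive Applications', 34.96, '978-1449373320');
INSERT INTO movies VALUES (1, 'Coco', 34.12, 175000000);
""")

catalog = conn.execute(
    "SELECT kind, title, price FROM catalog ORDER BY title"
).fetchall()
for row in catalog:
    print(row)
```

The application then reads a single list from catalog, while books and movies remain free to grow their own columns.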
You must be very sure about future requests/features for your application.
I can't imagine what kind of book-to-movie links you store, since a lot of movies have different titles than the books they are based on. Example: 25 films that changed the name.
If you are sure that your data will stay consistent and always be the same for books and movies, then you can create a new table, for example Productions, and store the attributes Title, Price, Image, Release Date and Published there. Then you can store foreign keys to the Production entity in your Books and Movies tables.
But if anything changes in the future, you will need to rebuild the structure or change your assumptions. Even then it will be easier with the Production entity: you just create a new row with the modified values and assign it to the selected Book or Movie.
A solution with one table for both books and movies is the worst, because if one of the attributes diverges you will have to add a new row, and you will end up with data for a first set (real book and non-existing movie) and a second set (non-existing book and real movie).
Of course, all of this assumes there may be changes in the future. If you are 100% sure there won't be, then one table is a sufficient solution, though not a correct one from the database normalization perspective.
I would personally create separate tables for books and movies.
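One possible reading of the Productions idea, sketched on in-memory SQLite: the shared attributes live in productions, and books and movies each hold a foreign key to it plus their own specific column. Using production_id as the primary key of the subtype tables is my assumption, and the shared columns are reduced to title and price for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Shared attributes live once, in productions.
CREATE TABLE productions (
    production_id INTEGER PRIMARY KEY, title TEXT NOT NULL, price REAL NOT NULL
);
-- Type-specific attributes live in their own tables, keyed by production.
CREATE TABLE books (
    production_id INTEGER PRIMARY KEY REFERENCES productions(production_id),
    isbn          TEXT NOT NULL
);
CREATE TABLE movies (
    production_id INTEGER PRIMARY KEY REFERENCES productions(production_id),
    budget        INTEGER NOT NULL
);
INSERT INTO productions VALUES (1, 'Designing Data-Intensive Applications', 34.96),
                               (2, 'Coco', 34.12);
INSERT INTO books  VALUES (1, '978-1449373320');
INSERT INTO movies VALUES (2, 175000000);
""")

# The combined list needs only productions; the LEFT JOINs pull in the
# type-specific columns where they exist.
listing = conn.execute("""
    SELECT p.title,
           CASE WHEN b.production_id IS NOT NULL THEN 'book'
                WHEN m.production_id IS NOT NULL THEN 'movie' END AS kind,
           b.isbn, m.budget
    FROM productions p
    LEFT JOIN books  b ON b.production_id = p.production_id
    LEFT JOIN movies m ON m.production_id = p.production_id
    ORDER BY p.production_id
""").fetchall()
for row in listing:
    print(row)
```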
This is a bit hard to explain.
But I have built an app where users create what I like to call 'raffles' and then other users subscribe to them.
I have a table for the raffles, and I could add a column of type text to it and store all the users in it separated by commas (,),
or I could create a separate table where users are added and associated with the raffle via another field called 'raffle_id' or something like it.
I'm not sure how efficient either of these methods will be in the long run or for scaling.
Some advice would be appreciated.
I would recommend against storing your user information in CSV format. The main reason for this is that CSV will make querying the table by user difficult. It will also make doing updates difficult. SQL databases were designed to handle relational data using tables. So in your case I would design the raffles table to look like this:
raffles (raffle_id, user_id)
And the data might look like this:
1 1
1 3
1 7
2 1
2 2
2 3
2 6
In other words, each record corresponds to a single raffle-user relation. Assuming that you only have a few dozen users and raffles happen every so often, this should scale fine. And if this raffles table ever gets too large at a much later date you can archive a portion of it.
See "What is the best way to add users to multiple groups in a database?"
Raffles are the "Groups". "UserInGroup" becomes UserInRaffle, your join table.
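To make the comparison with the comma-separated column concrete, here is a small sketch of the join table on in-memory SQLite. The table name raffle_entries and the index name are made up; the rows are the sample data from the answer above. Looking up by user, rejecting a duplicate subscription and unsubscribing each become one simple statement:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE raffle_entries (
    raffle_id INTEGER NOT NULL,
    user_id   INTEGER NOT NULL,
    PRIMARY KEY (raffle_id, user_id)   -- a user can enter a raffle only once
);
-- Supports the reverse lookup: all raffles for one user.
CREATE INDEX idx_raffle_entries_user ON raffle_entries(user_id);
INSERT INTO raffle_entries VALUES (1, 1), (1, 3), (1, 7),
                                  (2, 1), (2, 2), (2, 3), (2, 6);
""")

# Which raffles is user 3 in? With a CSV column this would need string matching.
raffles_for_user_3 = [r[0] for r in conn.execute(
    "SELECT raffle_id FROM raffle_entries WHERE user_id = ? ORDER BY raffle_id", (3,))]

# A duplicate subscription is rejected by the primary key.
try:
    conn.execute("INSERT INTO raffle_entries VALUES (1, 3)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Unsubscribing is a plain DELETE of one row.
conn.execute("DELETE FROM raffle_entries WHERE raffle_id = ? AND user_id = ?", (2, 3))
entrants_raffle_2 = [r[0] for r in conn.execute(
    "SELECT user_id FROM raffle_entries WHERE raffle_id = 2 ORDER BY user_id")]

print(raffles_for_user_3, duplicate_rejected, entrants_raffle_2)
```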
I have an existing MySQL DB that manages regulations for 50 states. The current setup is relational, with three tables for EACH of the 50 states:
state_table contains the chapter/sub-chapter headings
item_table contains the end records
department_table contains the IDs to relate the two.
All combined, it handles around 620,000 records.
I'm not a DB design expert and have always used this as-is and gotten by. However, having separate tables for all 50 states limits searching across all states, and I'm wondering if there is a better approach.
I'm wondering if I should consider combining this into either a single set of 3 relational tables for the entire nation or even a single table to handle everything.
I've asked this on other forums and have been told to read various volumes on DB schemas and structures, so if someone can just suggest the direction to go in and the pros and cons of what I have vs. the alternative, that would be great!
Thanks!
Here's the way it is, x 50:
alabama
    ID
    Name
    State
    Parent
    Description
alabama_department
    Department - IDs from "alabama"
    Item - IDs from "alabama_item"
alabama_item
    ID
    Name
    Description
    Keywords
    Doc_ID
    Effective_date
    ...
...
The queries: I step through the hierarchy of chapter/sub-chapter/end-record via links. This works fine, but I'm starting to focus more on search capability, and I'm also thinking that what I have is overkill. It sounds like a couple of you think so too.
If I am correct in thinking you have 150 tables (3 x 50 states), then:
You should have a 'states' table which includes a stateID and stateName. Then use ONE table for chapter/subchapters, ONE for departments, and ONE for end records and use the stateID to relate different records to a state.
You should not have 3 tables for each state, you can use one of each and just relate to a state table. This brings you to four tables instead of 150.
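A sketch of the four-table layout on in-memory SQLite. The table names (states, chapters, items, chapter_items), the cut-down column lists and the sample rows are all made up for illustration. The point is that a search across every state becomes a single query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE states (state_id INTEGER PRIMARY KEY, state_name TEXT NOT NULL UNIQUE);
-- Chapter/sub-chapter headings for all states; parent_id builds the hierarchy.
CREATE TABLE chapters (
    chapter_id INTEGER PRIMARY KEY,
    state_id   INTEGER NOT NULL REFERENCES states(state_id),
    parent_id  INTEGER REFERENCES chapters(chapter_id),
    name       TEXT NOT NULL
);
-- End records for all states.
CREATE TABLE items (
    item_id  INTEGER PRIMARY KEY,
    state_id INTEGER NOT NULL REFERENCES states(state_id),
    name     TEXT NOT NULL,
    keywords TEXT
);
-- Relates headings to end records (the old per-state *_department tables).
CREATE TABLE chapter_items (
    chapter_id INTEGER NOT NULL REFERENCES chapters(chapter_id),
    item_id    INTEGER NOT NULL REFERENCES items(item_id),
    PRIMARY KEY (chapter_id, item_id)
);
-- Hypothetical sample rows.
INSERT INTO states VALUES (1, 'Alabama'), (2, 'Alaska');
INSERT INTO items VALUES (1, 1, 'Boiler inspection', 'boiler safety'),
                         (2, 2, 'Boiler permits',    'boiler permit'),
                         (3, 2, 'Fishing licenses',  'fishing');
""")

# One query searches every state at once; add "AND s.state_name = ?" to
# narrow it down to a single state.
hits = conn.execute("""
    SELECT s.state_name, i.name
    FROM items i
    JOIN states s ON s.state_id = i.state_id
    WHERE i.keywords LIKE ?
    ORDER BY s.state_name, i.name
""", ("%boiler%",)).fetchall()
print(hits)
```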
I have these example tables:
table ORDERS
client orderno cant1 code1 notes1 cant2 code2 notes2 cant[i] code[i] [...]
--------------------------------------------------------------------------------------
1 1 3 AA01 Test 4 BB01 Testing
2 2 10 XX05 Test
table PRODUCTS
code prod price
---------------------
AA01 Engine 100
BB01 Wheel 50
table CLIENTS
client name address telephone
-----------------------------------------
1 Maxwell 24 1st st 0987654321
2 Hammer 77 main st 1234567890
I need to relate them to get the quantity, product name and price for each of the product lines (there are 30 sets of cant[i], code[i] and notes[i]), plus the customer's information (name, address, etc.).
I found this case, but I don't understand how to apply it to mine: SQL query two tables with relation one-to-many
I hope it's not too complex.
Thanks in advance!
EDIT
Thanks to ElectricLlama I realized the problem here is the table where the order is stored. According to his answer, normalizing the database would improve the way I'm able to get the info.
For anyone interested in this solution, I found this great website: http://www.devshed.com/c/a/MySQL/An-Introduction-to-Database-Normalization/
This SO answer clears it ALL! Super clear and understandable!
https://stackoverflow.com/a/1258776/888292
Looking at what's in your link - yes it seems like a lot of nonsense, but it is probably the only way to get what you want.
The problem is that your table is not normalised. Specifically you should not have fields called code1 code2 code3 code4... code30
There are many flaws with this design including what happens when a client has 31 products?
In a normalised database you would have a table with one set of cant, code and notes, and you would have one row per product.
But I guess you are not in a position to normalise it.
So well done for coming up with your own answer, and now you also have first-hand experience of the repercussions of not normalising a database.
What you might want to consider is creating a view that will normalise this for you. It will introduce performance issues, but it will give you an introduction to views and an opportunity to see what the solution would look like against a normalised table.
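A sketch of such a normalising view on in-memory SQLite. Only the first two of the 30 cant/code/notes groups are shown, and the names orders_wide, order_lines and line_no are made up for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The existing wide layout (first two of the repeated column groups).
CREATE TABLE orders_wide (
    client INTEGER, orderno INTEGER,
    cant1 INTEGER, code1 TEXT, notes1 TEXT,
    cant2 INTEGER, code2 TEXT, notes2 TEXT
);
INSERT INTO orders_wide VALUES (1, 1, 3,  'AA01', 'Test', 4,    'BB01', 'Testing'),
                               (2, 2, 10, 'XX05', 'Test', NULL, NULL,   NULL);
-- One SELECT per column group, stacked with UNION ALL; unused groups are skipped.
CREATE VIEW order_lines AS
    SELECT client, orderno, 1 AS line_no, cant1 AS cant, code1 AS code, notes1 AS notes
    FROM orders_wide WHERE code1 IS NOT NULL
    UNION ALL
    SELECT client, orderno, 2, cant2, code2, notes2
    FROM orders_wide WHERE code2 IS NOT NULL;
""")

lines = conn.execute(
    "SELECT orderno, line_no, cant, code FROM order_lines ORDER BY orderno, line_no"
).fetchall()
for row in lines:
    print(row)
```

The real view would need one branch per column group, which is also where the performance cost comes from: every query against the view scans the wide table once per branch.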
Ditto @ElectricLlama & here are a few links that should help you learn SQL:
W3Schools: SQL Tutorial
SQLServerCentral.com Stairway Series
I think that your table should look like this:
Table orders
-------------
orderno
client
code
cant
note
Here, orderno, client and code together make up the composite primary key of the table.
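With one row per product line, the query the question asks for becomes an ordinary join. Below is a sketch on in-memory SQLite using the sample rows from the question; the table name order_lines and the line_total column are assumptions. Code XX05 has no match in the sample PRODUCTS table, so a LEFT JOIN is used to keep that line:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clients  (client INTEGER PRIMARY KEY, name TEXT, address TEXT, telephone TEXT);
CREATE TABLE products (code TEXT PRIMARY KEY, prod TEXT, price INTEGER);
CREATE TABLE order_lines (
    orderno INTEGER NOT NULL,
    client  INTEGER NOT NULL REFERENCES clients(client),
    code    TEXT    NOT NULL,
    cant    INTEGER NOT NULL,
    note    TEXT,
    PRIMARY KEY (orderno, client, code)
);
INSERT INTO clients  VALUES (1, 'Maxwell', '24 1st st', '0987654321'),
                            (2, 'Hammer',  '77 main st', '1234567890');
INSERT INTO products VALUES ('AA01', 'Engine', 100), ('BB01', 'Wheel', 50);
INSERT INTO order_lines VALUES (1, 1, 'AA01', 3,  'Test'),
                               (1, 1, 'BB01', 4,  'Testing'),
                               (2, 2, 'XX05', 10, 'Test');
""")

# One row per product line, with customer and product details joined in.
rows = conn.execute("""
    SELECT c.name, o.orderno, o.cant, p.prod, p.price, o.cant * p.price AS line_total
    FROM order_lines o
    JOIN clients c       ON c.client = o.client
    LEFT JOIN products p ON p.code = o.code
    ORDER BY o.orderno, o.code
""").fetchall()
for row in rows:
    print(row)
```

An order with a 31st product is then just one more row.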