Split similar data into two tables? - mysql

I have two sets of data that are near identical, one set for books, the other for movies.
So we have things such as:
Title
Price
Image
Release Date
Published
etc.
The only difference between the two sets of data is that Books have an ISBN field and Movies have a Budget field.
My question is: even though the data is similar, should both be combined into one table, or should they be two separate tables?
I've looked on SO at similar questions, but I am asking because most of the time my application will need to get a single list of both books and movies. It would be rare to get only books or only movies, so most queries would have to read from two tables if the data were split.

Doing this -- cataloging books and movies -- perfectly is the work of several lifetimes. Don't strive for perfection, because you'll likely never get there. Take a look at Worldcat.org for excellent cataloging examples. Just two:
https://www.worldcat.org/title/coco/oclc/1149151811
https://www.worldcat.org/title/designing-data-intensive-applications-the-big-ideas-behind-reliable-scalable-and-maintainable-systems/oclc/1042165662
My suggestion: Add a table called metadata. Your titles table should have a one-to-many relationship with your metadata table.
Then, for example, titles might contain
title_id  title                                  price  release
103       Designing Data-Intensive Applications  34.96  2017
104       Coco                                   34.12  2017
Then metadata might contain
metadata_id  title_id  key             value
1            103       ISBN-13         978-1449373320
2            103       ISBN-10         1449373320
3            104       budget          USD175000000
4            104       EIDR            10.5240/EB14-C407-C74B-C870-B5B6-C
5            104       Sound Designer  Barney Jones
Then, if you want to get items with their ISBN-13 values (I'm not familiar with IBAN, but I guess that's the same sort of thing), you do this:
SELECT titles.*, isbn13.value AS isbn13
FROM titles
LEFT JOIN metadata AS isbn13
  ON titles.title_id = isbn13.title_id
 AND isbn13.`key` = 'ISBN-13'
This is a good way to go because it's future-proof. If somebody turns up tomorrow and wants, let's say, the name of the most important character in the book or movie, you can add it easily.
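For reference, the two tables above could be declared roughly like this. It is only a sketch: the column types, the index, and the foreign key are my assumptions, not something the question specifies. Note that `key` and `release` are reserved words in MySQL, so they need backticks in the table definitions.

```sql
-- Assumed column types; adjust to your real data.
CREATE TABLE titles (
  title_id  INT UNSIGNED NOT NULL AUTO_INCREMENT,
  title     VARCHAR(255) NOT NULL,
  price     DECIMAL(8,2) NOT NULL,
  `release` YEAR         NOT NULL,
  PRIMARY KEY (title_id)
);

CREATE TABLE metadata (
  metadata_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  title_id    INT UNSIGNED NOT NULL,
  `key`       VARCHAR(64)  NOT NULL,
  `value`     VARCHAR(255) NOT NULL,
  PRIMARY KEY (metadata_id),
  KEY idx_title_key (title_id, `key`),
  FOREIGN KEY (title_id) REFERENCES titles (title_id)
);

-- One list of books and movies, each with its own attribute where present.
SELECT t.title_id,
       t.title,
       isbn.`value`   AS isbn13,
       budget.`value` AS budget
FROM titles AS t
LEFT JOIN metadata AS isbn
  ON isbn.title_id = t.title_id AND isbn.`key` = 'ISBN-13'
LEFT JOIN metadata AS budget
  ON budget.title_id = t.title_id AND budget.`key` = 'budget'
ORDER BY t.title;
```

Each extra attribute you want as a column costs one more LEFT JOIN, which is the price of the flexibility.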

The only difference between the two sets of data is that Books have an IBAN field and Movies has a Budget field.
Are you sure that this difference that you have now will not be extended to other differences that you may have to take into account in the future?
Are you sure that you will not have to deal with any other type of entities (other than books and movies) in the future which will complicate things?
If the answer to both questions is "Yes", then you could use one table.
But if I had to design this, I would keep a separate table for each entity.
If needed, it's easy to combine their data in a View.
What is not easy is to add or modify columns in a table, or even to name them, just to match the requirements of two or more entities.
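A view over two separate tables might look like the sketch below. The table names, column names, and types are hypothetical, chosen only to illustrate the idea.

```sql
-- Hypothetical separate tables; names and types are assumptions.
CREATE TABLE books (
  book_id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  title        VARCHAR(255) NOT NULL,
  price        DECIMAL(8,2) NOT NULL,
  release_date DATE,
  isbn         VARCHAR(17)
);

CREATE TABLE movies (
  movie_id     INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  title        VARCHAR(255) NOT NULL,
  price        DECIMAL(8,2) NOT NULL,
  release_date DATE,
  budget       DECIMAL(14,2)
);

-- One view that presents both tables as a single list.
CREATE VIEW all_items AS
SELECT 'book' AS item_type, book_id AS item_id, title, price, release_date,
       isbn, NULL AS budget
FROM books
UNION ALL
SELECT 'movie', movie_id, title, price, release_date,
       NULL, budget
FROM movies;

-- The application then reads one "table" for the combined list.
SELECT item_type, title, price
FROM all_items
ORDER BY release_date DESC;
```

The item_type column is there so the application can still tell the two kinds of rows apart.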

You must be very sure about future requests/features for your application.
I can't imagine what kind of books linked with movies you store, because a lot of movies have different titles from the books they are based on. Example: 25 films that changed the name.
If you are sure that your data will be stable and always the same for books and movies, then you can create a new table, for example Productions, and store the attributes Title, Price, Image, Release Date, and Published there. Then you can store the foreign key of the Production entity in your Books and Movies tables.
If anything does change in the future, you will need to rebuild the structure or change your assumptions, but it will still be easier with the Production entity: you just create a new row with the modified values and assign it to the selected Book or Movie.
A single table for both books and movies is the worst solution, because if one of the attributes diverges you will have to add a new row, and you will end up with one row for the first set (real book and non-existing movie) and another for the second set (non-existing book and real movie).
Of course, all of this assumes that there may be changes in the future. If you are 100% sure there will be none, then one table is a sufficient solution, though not a correct one from the database normalization perspective.
I would personally create separate tables for books and movies.
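The Productions idea could be sketched like this. The column types are guesses (in particular I am assuming Published is a yes/no flag), and the id column names are made up for the example.

```sql
-- Shared attributes live in one place.
CREATE TABLE productions (
  production_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  title         VARCHAR(255) NOT NULL,
  price         DECIMAL(8,2) NOT NULL,
  image         VARCHAR(255),
  release_date  DATE,
  published     BOOLEAN NOT NULL DEFAULT FALSE  -- assumed to be a flag
);

-- Entity-specific attributes stay in their own tables.
CREATE TABLE books (
  book_id       INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  production_id INT UNSIGNED NOT NULL,
  isbn          VARCHAR(17) NOT NULL,
  FOREIGN KEY (production_id) REFERENCES productions (production_id)
);

CREATE TABLE movies (
  movie_id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  production_id INT UNSIGNED NOT NULL,
  budget        DECIMAL(14,2) NOT NULL,
  FOREIGN KEY (production_id) REFERENCES productions (production_id)
);

-- Single list of everything, with ISBN or budget where it applies.
SELECT p.title, p.price, b.isbn, m.budget
FROM productions AS p
LEFT JOIN books  AS b ON b.production_id = p.production_id
LEFT JOIN movies AS m ON m.production_id = p.production_id
ORDER BY p.title;
```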

Related

Better way to organize lots of columns and data?

I'm creating a real-estate website and I was wondering if there is a better way of organizing my columns or tables. I'm not sure what the best way to go about it would be; I currently have a lot of columns and I'm worried about performance issues.
The columns are as follows
5 for things like property id, add date, duration, owner/user id.
35 columns for things like title, description, price, energy rating, location, etc.
40 columns for features like swimming-pool, central heating, river front, garage, well, etc.
15 for image locations which are stored on server
15 for the image descriptions
Is 110+ columns bad practice in MySQL? Everything is lightning fast, but I'm on localhost at the moment. Won't the monstrous size of the table slow queries down, especially if I have a couple hundred properties?
Am I OK with my current setup? What would best practice be? How do e-commerce websites that have many feature options go about this?
It is not a good practice since the data can be stored in separate tables. What would help you most would be to create an ERD to visualize how you can organize your tables. Even if you do not understand the ins and outs of ERDs, you can still use it to at least organize your thoughts.
It seems that you already have your tables separated based on the bullet points that you made within your question. One thing that I would add to your bullets is maybe breaking down your features into categories and creating a table for each.
For example, swimming pools and riverfront can be placed in a table called LandscapeFeatures or OutdoorFeatures.
Most likely, the property features would be better stored in a separate table, with one row per property feature, rather than as columns in your main table. I understand this as a many-to-many relationship between a property and its features, which suggests two more tables:
properties (property_id (pk), date_added, title, description)
features (feature_id (pk), description)
property_features (property_id (fk), feature_id (fk))
Such structure is much more flexible and easier to query than having one column per feature. As examples:
easy to add features by creating new rows in the features table (while in the old structure you had to create a new column)
easy to aggregate the features, and answer a question like: count how many features each property has
As for images, they should have their own table too. If an image may belong to several properties, then it's a many-to-many relationship, and you can follow the above pattern. If each image belongs to a single property, one more table is enough:
properties (property_id (pk), date_added, title, description)
features (feature_id (pk), description)
property_features (property_id (fk), feature_id (fk))
images (image_id, location, description, property_id (fk))
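In MySQL, that structure could be created roughly as follows. Treat it as a sketch: the column types and lengths are assumptions, and only a few of the property columns are shown.

```sql
CREATE TABLE properties (
  property_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  date_added  DATETIME     NOT NULL,
  title       VARCHAR(255) NOT NULL,
  description TEXT
);

CREATE TABLE features (
  feature_id  INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  description VARCHAR(100) NOT NULL
);

-- One row per (property, feature) pair.
CREATE TABLE property_features (
  property_id INT UNSIGNED NOT NULL,
  feature_id  INT UNSIGNED NOT NULL,
  PRIMARY KEY (property_id, feature_id),
  FOREIGN KEY (property_id) REFERENCES properties (property_id),
  FOREIGN KEY (feature_id)  REFERENCES features (feature_id)
);

CREATE TABLE images (
  image_id    INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  location    VARCHAR(255) NOT NULL,
  description VARCHAR(255),
  property_id INT UNSIGNED NOT NULL,
  FOREIGN KEY (property_id) REFERENCES properties (property_id)
);

-- Count how many features each property has.
SELECT p.property_id, p.title, COUNT(pf.feature_id) AS feature_count
FROM properties AS p
LEFT JOIN property_features AS pf ON pf.property_id = p.property_id
GROUP BY p.property_id, p.title;

-- Find properties that have a given feature.
SELECT p.property_id, p.title
FROM properties AS p
JOIN property_features AS pf ON pf.property_id = p.property_id
JOIN features AS f ON f.feature_id = pf.feature_id
WHERE f.description = 'swimming pool';
```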
One table:
Columns for the dozen or so values that you are most likely to search on.
Devise several composite indexes that involve those columns, starting with the more commonly searched columns.
Devise a TEXT column and put "words" in it for a FULLTEXT index. If this is home sales, consider words like "swimming pool septic tank gazebo Eichler". This will help with certain "boolean" type queries. (If you like this idea, let's discuss how to make use of filtering with indexes and/or fulltext; it gets tricky.)
Put the rest into a JSON (or TEXT) column. Do not plan on searching it; instead, bring the row(s) into your app code for further filtering after searching by the actual INDEXes.
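A rough sketch of that single-table layout follows. The column names, the choice of searchable columns, and the sample search values are all made up for illustration. It assumes InnoDB on a MySQL version that supports both FULLTEXT indexes and the JSON type (5.7 or later).

```sql
CREATE TABLE properties (
  property_id   INT UNSIGNED     NOT NULL AUTO_INCREMENT PRIMARY KEY,
  city          VARCHAR(100)     NOT NULL,
  price         DECIMAL(12,2)    NOT NULL,
  bedrooms      TINYINT UNSIGNED NOT NULL,
  feature_words TEXT,   -- e.g. 'swimming pool septic tank gazebo'
  extras        JSON,   -- everything you do not search on
  KEY idx_city_price (city, price),
  KEY idx_city_bedrooms (city, bedrooms),
  FULLTEXT KEY ft_feature_words (feature_words)
) ENGINE=InnoDB;

-- Search on the indexed columns and the FULLTEXT index,
-- then let the application read the details out of extras.
SELECT property_id, price, extras
FROM properties
WHERE city = 'Dublin'
  AND price <= 500000
  AND MATCH(feature_words) AGAINST('+gazebo +"swimming pool"' IN BOOLEAN MODE);
```

Which of the regular indexes and the FULLTEXT index the optimizer actually uses for a mixed query like this is part of what gets tricky.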

cons of storing comma separated value of ids for custom sort order

We're working on a web application (Ruby/Rails + Backbone, jQuery, JavaScript) where a user can manage a booklist and drag and drop books to rearrange their order within the list, which has to be persisted.
We have books and a custom collection of books called booklist, for which we have two tables: book and booklist. Since a book could belong to multiple booklists, and a booklist consists of multiple books, they have an m x n relationship, and we have an additional table to store the mapping. Let's say we use this for all purposes. Now when the user wants to re-order the books in her bookshelf, we'd need to store that order.
I can totally see why storing ids in a column is evil, no doubt about it. But what if we keep the tables normalized, and for all other cases go through the standard operations?
There are quite a few approaches on storing an additional order column. But still it seems like bad design to store the ids of the books in a booklist in a comma separated list in the booklist table, even assuming that integrity is maintained.
We'd never run into this...
SELECT * FROM users WHERE... OH F#$%CK -
Yes it's bad, you can't order, count, sum (etc) or even do a simple report without depending on a top level language.
because we'd simply be selecting books based on the booklist id using the join table like the standard approach. (In any case, we're only getting the books as an array as part of the backbone booklist model)
So what if we retrieve the booklist and the books for the booklist, and do the sorting programmatically on the client side (in this case JavaScript), based on the CSV column?
It appears to be a simple solution because:
Every time the user reorders a book, we simply store all the ids in this one column freshly again. (A user will have at the most 20 to 30 books in a booklist).
We could of course simply ignore invalid ids, i.e. books that have been deleted after the booklist had been created.
What are the disadvantages of this approach, which seems to be simpler than maintaining the sort order and updating other columns every time an order is changed, or using a float or weight, etc.?
As far as I know, it really violates the rules of an RDBMS, which causes many difficulties when applying a JOIN.
Hope this helps.
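For comparison, here is a sketch of the order-column alternative mentioned in the question, where the order lives in the mapping table so that JOIN and ORDER BY keep working in SQL. The table names book and booklist come from the question; the mapping table name, the column names, and the sample ids are assumptions.

```sql
CREATE TABLE book (
  id    INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  title VARCHAR(255) NOT NULL
);

CREATE TABLE booklist (
  id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(255) NOT NULL
);

-- Mapping table with the user's custom order.
CREATE TABLE booklist_books (
  booklist_id INT UNSIGNED      NOT NULL,
  book_id     INT UNSIGNED      NOT NULL,
  sort_order  SMALLINT UNSIGNED NOT NULL,
  PRIMARY KEY (booklist_id, book_id),
  KEY idx_list_order (booklist_id, sort_order),
  FOREIGN KEY (booklist_id) REFERENCES booklist (id) ON DELETE CASCADE,
  -- A deleted book disappears from every list automatically.
  FOREIGN KEY (book_id) REFERENCES book (id) ON DELETE CASCADE
);

-- Fetch one list in the user's order.
SELECT b.id, b.title
FROM booklist_books AS bb
JOIN book AS b ON b.id = bb.book_id
WHERE bb.booklist_id = 7
ORDER BY bb.sort_order;

-- After a drag and drop, rewrite the order of the list in one statement.
UPDATE booklist_books
SET sort_order = CASE book_id
                   WHEN 12 THEN 1
                   WHEN 5  THEN 2
                   WHEN 9  THEN 3
                 END
WHERE booklist_id = 7
  AND book_id IN (12, 5, 9);
```

With 20 to 30 books per list, rewriting every position on each reorder is still a small update.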

How to design a book library application

I am creating an app for a library. I need to create a database containing details about all the books and all the members of the library. I need to maintain a table that connects a person with the books he has read. Each book has a unique id, as does each member. I need to track each book taken by a person and recommend books based on his interests. I also have to track the time a person takes to complete a book. I have only basic knowledge of databases, and only of MySQL. If I'm right, you cannot have multiple values for a field in MySQL. I thought of creating a table that maps a person to the books he/she has borrowed, but the problem with such an approach is that the size of the table increases uncontrollably. Is there any other approach I'm missing that would make my database simpler? I need to search the database frequently, so the table must be as small as possible.
I'm also ready to learn a database other than MySQL if it cannot meet my requirements.
The standard approach would indeed be to create a "join" table that maps people to books. Such a table may have many rows, but each row should only consist of a few columns -- person id, book id, timestamp loaned, timestamp completed.
It is not uncommon to have MySQL tables with millions of rows. Don't worry about that. Make sure to index the person id and book id, and you'll be fine.
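A minimal sketch of such a join table is below. The table and column names, the types, and the sample member id are assumptions, and the books and members tables are left out.

```sql
CREATE TABLE lendings (
  lending_id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  member_id    INT UNSIGNED NOT NULL,
  book_id      INT UNSIGNED NOT NULL,
  loaned_at    DATETIME     NOT NULL,
  completed_at DATETIME     NULL,     -- NULL while the book is still out
  KEY idx_member (member_id),
  KEY idx_book (book_id)
);

-- How long one member took to finish each book.
SELECT book_id,
       TIMESTAMPDIFF(DAY, loaned_at, completed_at) AS days_to_complete
FROM lendings
WHERE member_id = 42
  AND completed_at IS NOT NULL;
```

The rows are narrow, and the index on member_id means a lookup like this does not have to scan the whole table.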
You can use this approach: http://sqlfiddle.com/#!2/33b45/1
You need three tables: books, members, lendings. The code is in the fiddle.

creating user profiles, each with personal mysql data, using php

I'm trying to figure out the best practices for storing user data on a php/mysql site.
let's say the website will host a service of saving people's input for items they have in their house.
I have set up tables that includes: kitchen, bathroom, bedroom, etc.
Sally adds her 6 kitchen items.
John adds his 3 kitchen items.
etc.
I'm just wondering what the common practice is for storing other user information in the MySQL database. I've taken a class on databases, so I'm thinking of linking relationally by foreign key: John with his items in the lists, and Sally with hers.
Does that sound about right, or is there a better way? I can see the list getting really large quite quickly.
Would it be possible to set up a different table for each user, or would that be silly?
I would not set up a table for each user.
Definitely go relational. I am not sure I follow you completely around "john with his items.." and so on. So I interpret this as
user table
room table
item table
relational user->item (id, user_id, item_id, room_id) OR:
relational item->room
So you can pull a user, list the rooms they have related to them, then list the items in that room. Additionally, like this you do not need a new item entry for common things like tables, stoves, spatulas, etc.
Your list could get large, but if you scale properly and plan a back end based update migration when you absolutely need to (like millions of users) then you should be fine. Consider how many relations sites like facebook and ebay have to maintain. Large relations are normal for databases so I wouldn't let a couple million rows scare you.
I would use three tables:
rooms (id, room), to store values kitchen, bathroom, bedroom, etc.
users
items: assuming you have a common structure for your current kitchen, bathroom, bedroom tables, one table could replace all of them. This table should also contain two foreign keys, user_id and room_id.
With that structure, you can easily retrieve and filter your data.
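The three tables could be sketched like this. The column types and the name columns are assumptions; use whatever your current room tables actually hold.

```sql
CREATE TABLE users (
  id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(100) NOT NULL
);

CREATE TABLE rooms (
  id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  room VARCHAR(50)  NOT NULL   -- kitchen, bathroom, bedroom, ...
);

CREATE TABLE items (
  id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  user_id INT UNSIGNED NOT NULL,
  room_id INT UNSIGNED NOT NULL,
  name    VARCHAR(100) NOT NULL,
  KEY idx_user_room (user_id, room_id),
  FOREIGN KEY (user_id) REFERENCES users (id),
  FOREIGN KEY (room_id) REFERENCES rooms (id)
);

-- Sally's kitchen items.
SELECT i.name
FROM items AS i
JOIN users AS u ON u.id = i.user_id
JOIN rooms AS r ON r.id = i.room_id
WHERE u.name = 'Sally'
  AND r.room = 'kitchen';
```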

Method To Create Database for Tv Shows

This is my first question on Stack Overflow, so if I do something wrong please let me know and I will fix it as soon as possible.
I am trying to make a database for TV shows, and I would like to know the best way to do it and how to make my current database simpler (normalization).
I would like to be able to have the following structure or similar.
Fringe
  Season 1
    Episodes 1 - 10 (however many there are)
  Season 2
    Episodes 1 - 10 (however many there are)
  ... (and so on)
Burn Notice
  Season 1
    Episodes 1 - 10 (however many there are)
  Season 2
    Episodes 1 - 10 (however many there are)
  ... (and so on)
... (more TV shows)
Sorry if this seems unclear. (Please ask for clarification)
But the structure i have right now is 3 tables (tvshow_list, tvshow_episodes, tvshow_link)
//tvshow_list//
TvShow Name | Director | Company_Created | Language | TVDescription | tv_ID
//tvshow_episodes//
tv_ID | EpisodeNum | SeasonNum | EpTitle | EpDescription | Showdate | epid
//tvshow_link//
epid | ep_link
The Director and the company are linked by an id to another table with a list of companies and directors.
I am pretty sure that there is a simpler way of doing this.
Thanks for the help in advance,
Krishanthan Lingeswaran
The basic concept of Normalization is the idea that you should only store one copy of any item of data that you have. It looks like you've got a good start already.
There are two basic ways to model what you're trying to do here, with episodes and shows. In the database world, you might have heard the terms "one to many" and "many to many". Both are useful; it just depends on your specific situation which is the correct one to use. In your case, the big question to ask yourself is whether a single episode can belong to only one show, or whether an episode can belong to multiple shows at once. I'll explain the two forms, and why you need to know the answer to that question.
The first form is simply a foreign key relationship. If you have two tables, 'episodes' and 'shows', in the episodes table, you would have a column named 'show_id' that contains the ID of one (and only one!) show. Can you see how you could never have an episode belong to more than one show this way? This is called a "one to many" relationship, i.e. a show can have many episodes.
The second form is to use an association table, and this is the form you used in your example. This form would allow you to associate an episode with multiple shows and is therefore called a "many to many" relationship.
There is some benefit to using the first form, but it's not really that big of a deal in most cases. Your queries will be a little bit shorter because you only have to join 2 tables to get episodes->shows but the other table is just one more join. It really comes down to figuring out if you need a "one to many" or "many to many" type relationship.
An example of a situation where you would need a many-to-many relationship would be if you were modeling a library and had to keep track of who checked out which book. You'd have a table of books, a table of users, and then a table of "books to users" that would have an id, a book_id, and a user_id and would be a many-to-many relationship.
Hope that helps!
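To make the two forms concrete, here is a small sketch of each. The table and column names follow the ones used above where possible; the types and the name/title columns are assumptions.

```sql
-- Form 1: one to many. Each episode row points at exactly one show.
CREATE TABLE shows (
  id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(255) NOT NULL
);

CREATE TABLE episodes (
  id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  show_id INT UNSIGNED NOT NULL,
  title   VARCHAR(255) NOT NULL,
  FOREIGN KEY (show_id) REFERENCES shows (id)
);

SELECT s.name, e.title
FROM episodes AS e
JOIN shows AS s ON s.id = e.show_id;

-- Form 2: many to many, using an association table (the library example).
CREATE TABLE books (
  id    INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  title VARCHAR(255) NOT NULL
);

CREATE TABLE users (
  id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(100) NOT NULL
);

CREATE TABLE books_to_users (
  id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  book_id INT UNSIGNED NOT NULL,
  user_id INT UNSIGNED NOT NULL,
  FOREIGN KEY (book_id) REFERENCES books (id),
  FOREIGN KEY (user_id) REFERENCES users (id)
);

-- The association table costs one extra join.
SELECT u.name, b.title
FROM books_to_users AS bu
JOIN users AS u ON u.id = bu.user_id
JOIN books AS b ON b.id = bu.book_id;
```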
I am pretty sure that there is a simpler way of doing this.
Not as far as I know. Your schema is close to the simplest you can make for what I presume is the functionality you're asking for. "Improvements" on it really only make it more complicated, and should be added as you judge the need emerges on your side. The following examples come to mind (none of which really simplify your schema).
I would standardize your foreign key and primary key names. An example would be to have the columns shows.id, episodes.id, episodes.show_id, link.id, link.episode_id.
Putting SeasonNum as what I presume will be an int in the Episodes table, in my opinion, violates the normalization constraint. This is not a major violation, but if you really want to stick to it, I would create a separate Seasons table and associate it many-to-one to the Shows table, and then have the Episodes associate only with the Seasons. This gives you the opportunity to, for instance, attach information to each season. Also, it prevents repetition of information (while the type of the season ID foreign key column in the Episodes table would ostensibly still be an INT, a foreign key philosophically stores an association, what you want, versus dumb data, what you have).
You may consider putting language, director, and company in their own tables rather than your TV show list. This is the same concern as above and in your case a minor violation of normalization.
Language, director, and company all have interesting issues attached to them regarding the level of the association. Most TV shows have different directors for different episodes. Many are produced in multiple languages and by several different companies and sometimes networks. So at what level do you plan on storing this information? I'm not a software architect, so someone else can better answer this question than me, but I'd set up a polymorphic many-to-many association for languages, directors, and companies and an inheritance model that allows for these values to be specified on an episode-by-episode, season-by-season, or show-by-show basis, inheriting the value from its parent if none are provided.
Bottom line concerning all these suggestions: Pick what's appropriate for your project. If you don't need the functionality afforded by this level of associations, and you don't mind manually entering in repetitive data (you might end up implementing an auto-complete system to help you), you can gloss over some of the normalization constraints.
Normalization is merely a suggestion. Pick what's right for you and learn from your mistakes.
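If you do go with a separate Seasons table and the standardized key names suggested above, the schema might look something like this. It is a sketch only: the types, the uniqueness constraints, and the non-key column names are my assumptions.

```sql
CREATE TABLE shows (
  id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(255) NOT NULL
);

CREATE TABLE seasons (
  id         INT UNSIGNED     NOT NULL AUTO_INCREMENT PRIMARY KEY,
  show_id    INT UNSIGNED     NOT NULL,
  season_num TINYINT UNSIGNED NOT NULL,
  UNIQUE KEY uq_show_season (show_id, season_num),
  FOREIGN KEY (show_id) REFERENCES shows (id)
);

CREATE TABLE episodes (
  id          INT UNSIGNED      NOT NULL AUTO_INCREMENT PRIMARY KEY,
  season_id   INT UNSIGNED      NOT NULL,
  episode_num SMALLINT UNSIGNED NOT NULL,
  title       VARCHAR(255)      NOT NULL,
  UNIQUE KEY uq_season_episode (season_id, episode_num),
  FOREIGN KEY (season_id) REFERENCES seasons (id)
);

-- Show > season > episode listing, in order.
SELECT s.name, se.season_num, e.episode_num, e.title
FROM shows AS s
JOIN seasons  AS se ON se.show_id  = s.id
JOIN episodes AS e  ON e.season_id = se.id
ORDER BY s.name, se.season_num, e.episode_num;
```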