Database design - "Separate Tables Vs One table" for Select Queries - mysql

I have a MySQL table like the following:
Books Table
book-id  category   author  author_place  book_name     book_price  --------other 50 columns directly related to book-id
1        adventure  tom     USA           skydiving     300
2        spiritual  rom     Germany       what you are  500
3        adventure  som     India         woo woo       700
4        education  kom     Italy         boring        900
5        adventure  lom     Pak           yo yo         90
.
.
4000     spiritual  tom     USA           you are       10
As you can see, there are around 4000 rows and around 55 columns. I use this table mostly for SELECT queries, and maybe add or update a book every 2-3 weeks.
I have doubts about the category and author columns.
Right now, if I need to select from the table by category or author, I can simply do
SELECT * FROM books WHERE author = 'tom';
SELECT * FROM books WHERE category = 'education';
It works fine, but according to standard database design I think I should move the category and author columns into separate tables (especially authors) and use their primary keys as foreign keys in the books table.
Something like this
Books Table
book-id  categ_id  author_id  book_name     book_price  --------other 50 columns directly related to book-id
1        1         1          skydiving     300
2        2         2          what you are  500
3        1         3          woo woo       700
4        3         4          boring        900
5        1         5          yo yo         90
.
.
4000     2         1          you are       10
Category Table
categ_id  category_name
1         adventure
2         spiritual
3         education
.         .
.         .
30        something
Authors Table
author_id  author  country
1          tom     USA
2          rom     Germany
3          som     India
4          kom     Italy
5          lom     Pak
But then I have to join the tables each time I make a select query by author or category, which I think will be inefficient. Something like this:
SELECT * FROM books LEFT JOIN authors ON authors.author_id = books.author_id WHERE books.author_id = 1;
SELECT * FROM books LEFT JOIN categories ON categories.categ_id = books.categ_id WHERE books.categ_id = 1;
So should I split the first table into separate tables, or is the first design better in this case?

This question has its answer from Mr. Edgar F. Codd himself, the inventor of the relational model upon which all RDBMSs are built.
Shortly after releasing the relational model papers, he and his team followed with papers on the so-called normal forms. There are several of them, but the first three (at least) should generally be considered mandatory:
First normal form (1NF)
Second normal form (2NF)
Third normal form (3NF)
When you read them you'll see that your initial design violates 3NF: author_place depends on author rather than directly on book-id. (With a single-column primary key, a table that is in 1NF already satisfies 2NF.) The solution you have come up with more or less respects the normal forms. Go ahead with the NF-compliant design without any doubts.
To elaborate a bit on your concerns about join performance: this is not an issue as long as the following criteria are met:
your database schema is well designed (2NF compliant at least)
you use Foreign keys to link the tables (MySQL's docs)
you join the tables by their FK
you have the hardware resources necessary to run your data efficiently
E.g. on MySQL with InnoDB, on a 2NF-compliant schema using foreign keys, join performance by the FK will be among the last things you'd ever be concerned about.
Historically there was a DB engine in MySQL, MyISAM, that did not support foreign key constraints. Perhaps it's the main source of feedback about poor join performance (along with poor schema designs, of course).
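To make the normalized design concrete, here is a minimal sketch of the three tables with foreign keys and a join by FK. It is an illustration under assumptions, not the asker's actual schema: SQLite (through Python's built-in sqlite3 module) stands in for MySQL/InnoDB so the example runs anywhere, only a handful of the ~55 book columns are shown, and the helper name books_by_author is made up for this example.

```python
import sqlite3

# SQLite (in-memory) stands in for MySQL/InnoDB; column types are simplified.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE categories (
    categ_id      INTEGER PRIMARY KEY,
    category_name TEXT NOT NULL UNIQUE
);
CREATE TABLE authors (
    author_id INTEGER PRIMARY KEY,
    author    TEXT NOT NULL,
    country   TEXT NOT NULL
);
CREATE TABLE books (
    book_id    INTEGER PRIMARY KEY,
    categ_id   INTEGER NOT NULL REFERENCES categories(categ_id),
    author_id  INTEGER NOT NULL REFERENCES authors(author_id),
    book_name  TEXT NOT NULL,
    book_price INTEGER NOT NULL
);
-- Index the FK columns so lookups by author or category can use an index.
-- (InnoDB creates an index on FK columns automatically if none exists.)
CREATE INDEX idx_books_author ON books(author_id);
CREATE INDEX idx_books_categ  ON books(categ_id);

INSERT INTO categories VALUES (1, 'adventure'), (2, 'spiritual'), (3, 'education');
INSERT INTO authors VALUES
    (1, 'tom', 'USA'), (2, 'rom', 'Germany'), (3, 'som', 'India'),
    (4, 'kom', 'Italy'), (5, 'lom', 'Pak');
INSERT INTO books VALUES
    (1, 1, 1, 'skydiving', 300), (2, 2, 2, 'what you are', 500),
    (3, 1, 3, 'woo woo', 700),   (4, 3, 4, 'boring', 900),
    (5, 1, 5, 'yo yo', 90);
""")


def books_by_author(name):
    """Return (book_name, author, country, category_name) rows for one author."""
    return conn.execute(
        """
        SELECT b.book_name, a.author, a.country, c.category_name
        FROM books b
        JOIN authors a    ON a.author_id = b.author_id
        JOIN categories c ON c.categ_id  = b.categ_id
        WHERE a.author = ?
        ORDER BY b.book_id
        """,
        (name,),
    ).fetchall()


print(books_by_author("tom"))
```

With the foreign keys in place, an insert that points at a non-existent author is rejected instead of silently creating an orphan row, which is the integrity benefit the flat table cannot give you.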

Related

MYSQL DB Best method to store keywords and URL index

Which of these methods would be the most efficient way of storing, retrieving, processing and searching a large (millions of records) index of stored URLs along with their keywords?
Example 1: (Using one table)
TABLE_URLs-----------------------------------------------
ID DOMAIN KEYWORDS
1 mysite.com videos,photos,images
2 yoursite.com videos,games
3 hissite.com games,images
4 hersite.com photos,pictures
---------------------------------------------------------
Example 2: (one-to-one Relationship from one table to another)
TABLE_URLs-----------------------------------------------
ID DOMAIN
1 mysite.com
2 yoursite.com
3 hissite.com
4 hersite.com
---------------------------------------------------------
TABLE_URL_KEYWORDS---------------------------------------------
ID DOMAIN_ID KEYWORDS
1 1 videos,photos,images
2 2 videos,games
3 3 games,images
4 4 photos,pictures
---------------------------------------------------------
Example 3: (one-to-one Relationship from one table to another (Using a reference table))
TABLE_URLs-----------------------------------------------
ID DOMAIN
1 mysite.com
2 yoursite.com
3 hissite.com
4 hersite.com
---------------------------------------------------------
TABLE_URL_TO_KEYWORDS------------------------------------
ID DOMAIN_ID KEYWORDS_ID
1 1 1
2 2 2
3 3 3
4 4 4
---------------------------------------------------------
TABLE_KEYWORDS-------------------------------------------
ID KEYWORDS
1 videos,photos,images
2 videos,games
3 games,images
4 photos,pictures
---------------------------------------------------------
Example 4: (many-to-many Relationship from url to keyword ID (using reference table))
TABLE_URLs-----------------------------------------------
ID DOMAIN
1 mysite.com
2 yoursite.com
3 hissite.com
4 hersite.com
---------------------------------------------------------
TABLE_URL_TO_KEYWORDS------------------------------------
ID DOMAIN_ID KEYWORDS_ID
1 1 1
2 1 2
3 1 3
4 2 1
5 2 4
6 3 4
7 3 3
8 4 2
9 4 5
---------------------------------------------------------
TABLE_KEYWORDS-------------------------------------------
ID KEYWORDS
1 videos
2 photos
3 images
4 games
5 pictures
---------------------------------------------------------
My understanding is that Example 1 would take the largest amount of storage space, but searching through this data would be quick (repeated keywords are saved multiple times, but the keywords sit next to the relevant domain),
whereas Example 4 would save a ton of storage space but searching through it would take longer (no duplicate keywords are stored, but referencing multiple keywords for each domain would take longer).
Could anyone give me any insight or thoughts on which method would be best when designing a database that can handle huge amounts of data? Bear in mind that you may want to display a URL with its associated keywords OR search for one or more keywords and bring up the most relevant URLs.
You do have a many-to-many relationship between url and keywords. The canonical way to represent this in a relational database is to use a bridge table, which corresponds to example 4 in your question.
Using the proper data structure, you will find out that the queries will be much easier to write, and as efficient as it gets.
I don't know what drives you to think that searching in a structure like the first one will be faster. It requires you to do pattern matching when searching for each single keyword, which is notably slow. On the other hand, using a junction table lets you search for exact matches, which can take advantage of indexes.
Finally, maintaining such a structure is also much easier: adding or removing keywords can be done with insert and delete statements, while the other structures require you to do string manipulation on a delimited list, which again is tedious, error-prone and inefficient.
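The points above (exact-match search through an index, and maintenance with plain insert and delete statements) can be sketched as follows. This is only an illustration: SQLite via Python's sqlite3 module stands in for MySQL, the table names are lower-cased versions of Example 4, the junction table uses a composite primary key instead of its own ID column, and the helper functions are hypothetical names.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; data is taken from Example 4.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE urls     (id INTEGER PRIMARY KEY, domain  TEXT NOT NULL UNIQUE);
CREATE TABLE keywords (id INTEGER PRIMARY KEY, keyword TEXT NOT NULL UNIQUE);
CREATE TABLE url_to_keywords (
    domain_id  INTEGER NOT NULL REFERENCES urls(id),
    keyword_id INTEGER NOT NULL REFERENCES keywords(id),
    PRIMARY KEY (domain_id, keyword_id)
);
-- Second index so "which domains have keyword X" is also an index lookup.
CREATE INDEX idx_keyword_domain ON url_to_keywords(keyword_id, domain_id);

INSERT INTO urls VALUES
    (1, 'mysite.com'), (2, 'yoursite.com'), (3, 'hissite.com'), (4, 'hersite.com');
INSERT INTO keywords VALUES
    (1, 'videos'), (2, 'photos'), (3, 'images'), (4, 'games'), (5, 'pictures');
INSERT INTO url_to_keywords VALUES
    (1, 1), (1, 2), (1, 3), (2, 1), (2, 4), (3, 4), (3, 3), (4, 2), (4, 5);
""")


def domains_for_keyword(keyword):
    """Exact-match search: no LIKE '%...%' pattern matching is needed."""
    rows = conn.execute(
        """
        SELECT u.domain
        FROM keywords k
        JOIN url_to_keywords uk ON uk.keyword_id = k.id
        JOIN urls u             ON u.id = uk.domain_id
        WHERE k.keyword = ?
        ORDER BY u.domain
        """,
        (keyword,),
    ).fetchall()
    return [row[0] for row in rows]


def remove_keyword(domain, keyword):
    """Removing a keyword from a domain is a single DELETE, no string editing."""
    conn.execute(
        """
        DELETE FROM url_to_keywords
        WHERE domain_id  = (SELECT id FROM urls WHERE domain = ?)
          AND keyword_id = (SELECT id FROM keywords WHERE keyword = ?)
        """,
        (domain, keyword),
    )


print(domains_for_keyword("games"))
```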
None of the above.
Simply have a table with 2 string columns:
CREATE TABLE domain_keywords (
domain VARCHAR(..) NOT NULL,
keyword VARCHAR(..) NOT NULL,
PRIMARY KEY(domain, keyword),
INDEX(keyword, domain)
) ENGINE=InnoDB
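As a rough sketch of how this two-column table can be queried in both directions, the example below loads the question's sample data and ranks domains by how many of the searched keywords they match. Assumptions: SQLite via Python's sqlite3 module stands in for MySQL, TEXT replaces the unspecified VARCHAR lengths, and the function names keywords_of and rank_domains are invented for this illustration.

```python
import sqlite3

# In-memory SQLite stands in for MySQL/InnoDB.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE domain_keywords (
    domain  TEXT NOT NULL,
    keyword TEXT NOT NULL,
    PRIMARY KEY (domain, keyword)
);
-- Mirrors INDEX(keyword, domain) from the CREATE TABLE statement.
CREATE INDEX idx_keyword_domain ON domain_keywords(keyword, domain);

INSERT INTO domain_keywords VALUES
    ('mysite.com', 'videos'), ('mysite.com', 'photos'), ('mysite.com', 'images'),
    ('yoursite.com', 'videos'), ('yoursite.com', 'games'),
    ('hissite.com', 'games'), ('hissite.com', 'images'),
    ('hersite.com', 'photos'), ('hersite.com', 'pictures');
""")


def keywords_of(domain):
    """All keywords for one domain (served by the primary key)."""
    rows = conn.execute(
        "SELECT keyword FROM domain_keywords WHERE domain = ? ORDER BY keyword",
        (domain,),
    ).fetchall()
    return [row[0] for row in rows]


def rank_domains(keywords):
    """Domains matching any of the keywords, most matches first."""
    placeholders = ",".join("?" * len(keywords))
    return conn.execute(
        f"""
        SELECT domain, COUNT(*) AS hits
        FROM domain_keywords
        WHERE keyword IN ({placeholders})
        GROUP BY domain
        ORDER BY hits DESC, domain
        """,
        list(keywords),
    ).fetchall()


print(keywords_of("yoursite.com"))
print(rank_domains(["videos", "images"]))
```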
Notes:
It will be faster.
It will be easier to write code.
Having a plain id is very much a waste.
Normalizing the domain and keyword buys little space savings, but at a big loss in efficiency.
"Huge database"? I predict that this table will be smaller than your Domains table. That is, this table is not your main concern for "huge".

mySql one to many relation table structures

Folks, can you please give your suggestions for my question regarding MySQL joins?
My Table structures:
place table:
place_id place_name city
1 Hotel Golconda Hyderabad
2 Paradise Hotel Hyderabad
3 Hotel Mayuri Hyderabad
place_tags
tag_id tag_name
1 Valet Parking
2 Air Conditioned
3 Buffet
4 Bar
5 Family Dining
places_info Table:
place_id tag_id
1 1
1 2
1 3
2 1
2 5
3 1
3 4
These are all my tables: the place names and addresses are in the place table, all the restaurant facilities are in the place_tags table, and the mapping of facilities to each place is in the places_info table.
Are these table structures correct for getting the places that have both "Valet Parking" and "Buffet"? How can I write a join query to get this type of result?
Most importantly, we have millions of places in the place table and also in the places_info table. How can I achieve maximum performance with this type of table structure? Or do I need to change the table structures?
Please guide me.
This would be the basic structure for "places with valet AND buffet":
SELECT place.place_id, COUNT(places_info.tag_id) AS cnt
FROM place
LEFT JOIN places_info ON place.place_id = places_info.place_id
    AND places_info.tag_id IN (1, 3)   -- two tags: valet (1) + buffet (3)
GROUP BY place.place_id
HAVING cnt = 2;                        -- must have both tags
For places which have NEITHER of the tags, or only one, the count would come back as 0 or 1, and they would be dropped by the HAVING clause.
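The same "must have all of these tags" pattern can be tried end to end with the sample data from the question. The sketch below is an illustration only: SQLite via Python's sqlite3 module stands in for MySQL, it filters by tag name rather than hard-coded tag IDs, and the composite primary key, the extra (tag_id, place_id) index and the function name places_with_all_tags are assumptions that do not come from the original post.

```python
import sqlite3

# In-memory SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE place (
    place_id   INTEGER PRIMARY KEY,
    place_name TEXT NOT NULL,
    city       TEXT NOT NULL
);
CREATE TABLE place_tags (
    tag_id   INTEGER PRIMARY KEY,
    tag_name TEXT NOT NULL UNIQUE
);
CREATE TABLE places_info (
    place_id INTEGER NOT NULL REFERENCES place(place_id),
    tag_id   INTEGER NOT NULL REFERENCES place_tags(tag_id),
    PRIMARY KEY (place_id, tag_id)
);
-- With millions of rows, this index lets the tag filter avoid a full scan.
CREATE INDEX idx_tag_place ON places_info(tag_id, place_id);

INSERT INTO place VALUES
    (1, 'Hotel Golconda', 'Hyderabad'),
    (2, 'Paradise Hotel', 'Hyderabad'),
    (3, 'Hotel Mayuri', 'Hyderabad');
INSERT INTO place_tags VALUES
    (1, 'Valet Parking'), (2, 'Air Conditioned'), (3, 'Buffet'),
    (4, 'Bar'), (5, 'Family Dining');
INSERT INTO places_info VALUES
    (1, 1), (1, 2), (1, 3), (2, 1), (2, 5), (3, 1), (3, 4);
""")


def places_with_all_tags(tag_names):
    """Names of places that carry every tag in tag_names."""
    tags = sorted(set(tag_names))
    placeholders = ",".join("?" * len(tags))
    rows = conn.execute(
        f"""
        SELECT p.place_name
        FROM place p
        JOIN places_info pi ON pi.place_id = p.place_id
        JOIN place_tags t   ON t.tag_id = pi.tag_id
        WHERE t.tag_name IN ({placeholders})
        GROUP BY p.place_id, p.place_name
        HAVING COUNT(DISTINCT t.tag_id) = ?   -- one hit per requested tag
        ORDER BY p.place_id
        """,
        tags + [len(tags)],
    ).fetchall()
    return [row[0] for row in rows]


print(places_with_all_tags(["Valet Parking", "Buffet"]))
```

An inner join is enough here because places with no matching tag can never reach the required count anyway.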

When is it better to flatten out data using comma separated values to improve search query performance?

My question is about search query performance.
I've flattened out data into a read-only Person table (MySQL) that exists purely for search. The table has about 20 columns of data (mostly limited text values, dates and booleans and a few columns containing unlimited text).
Person
=============================================================
id First Last DOB etc (20+ columns)...
1 John Doe 05/02/1969
2 Sara Jones 04/02/1982
3 Dave Moore 10/11/1984
Another two tables support the relationship between Person and Activity.
Activity
===================================
id activity
1 hiking
2 skiing
3 snowboarding
4 bird watching
5 etc...
PersonActivity
===================================
id PersonId ActivityId
1 2 1
2 2 3
3 2 10
4 2 16
5 2 34
6 2 37
7 2 38
8 etc…
Search considerations:
Person table has potentially 200-300k+ rows
Each person potentially has 50+ activities
Search may include Activity filter (e.g., select persons with one and/or more activities)
Returned results are displayed with person details and activities as bulleted list
If the Person table is used only for search, I'm wondering if I should add the activities as comma separated values to the Person table instead of joining to the Activity and PersonActivity tables:
Person
===========================================================================
id First Last DOB Activity
2 Sara Jones 04/02/1982 hiking, snowboarding, golf, etc.
Given the search considerations above, would this help or hurt search performance?
Thanks for the input.
Horrible idea. You will lose the ability to use indexes in querying. Do not under any circumstances store data in a comma-delimited list if you ever want to search on that column. Relational databases are designed to perform well with tables joined together. Your database is relatively small and should have no performance issues at all if you index properly.
You may still want to display the results in a comma-delimited fashion. MySQL has an aggregate function called GROUP_CONCAT for that.
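A small sketch of that approach: keep the join tables for filtering and build the comma-separated list only at display time. Everything here is illustrative. SQLite via Python's sqlite3 module stands in for MySQL (its group_concat takes the separator as a second argument, whereas MySQL writes GROUP_CONCAT(activity SEPARATOR ', ')), the sample rows are made up, and the column names first_name and last_name and the function people_with_activity are assumptions.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the rows below are invented sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (
    id INTEGER PRIMARY KEY, first_name TEXT NOT NULL, last_name TEXT NOT NULL
);
CREATE TABLE Activity (id INTEGER PRIMARY KEY, activity TEXT NOT NULL UNIQUE);
CREATE TABLE PersonActivity (
    PersonId   INTEGER NOT NULL REFERENCES Person(id),
    ActivityId INTEGER NOT NULL REFERENCES Activity(id),
    PRIMARY KEY (PersonId, ActivityId)
);
-- Lets "who does activity X" be answered from an index.
CREATE INDEX idx_activity_person ON PersonActivity(ActivityId, PersonId);

INSERT INTO Person VALUES (1, 'John', 'Doe'), (2, 'Sara', 'Jones'), (3, 'Dave', 'Moore');
INSERT INTO Activity VALUES
    (1, 'hiking'), (2, 'skiing'), (3, 'snowboarding'), (4, 'bird watching');
INSERT INTO PersonActivity VALUES (1, 2), (2, 1), (2, 3), (3, 1), (3, 4);
""")


def people_with_activity(activity):
    """People who do `activity`, each with the full list of their activities."""
    rows = conn.execute(
        """
        SELECT p.first_name, GROUP_CONCAT(a.activity, ', ')
        FROM Person p
        JOIN PersonActivity pa ON pa.PersonId = p.id
        JOIN Activity a        ON a.id = pa.ActivityId
        WHERE p.id IN (
            SELECT pa2.PersonId
            FROM PersonActivity pa2
            JOIN Activity a2 ON a2.id = pa2.ActivityId
            WHERE a2.activity = ?
        )
        GROUP BY p.id
        ORDER BY p.id
        """,
        (activity,),
    ).fetchall()
    # The order inside the concatenated string is not guaranteed, so sort here.
    return [(name, sorted(acts.split(", "))) for name, acts in rows]


print(people_with_activity("hiking"))
```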

how to design this database

I have a table for users. Each user has certain skills they teach. So for example:
Bob can teach karate
Louise can teach piano and knitting
Roger can teach judo, sailing and fencing
This is how I've done it in the database:
Table users
pk: uid, name
1 Bob,
2 Louise,
3 Roger
Table skills
pk: sk_id, skill
1 karate,
2 piano,
3 knitting,
4 judo,
5 sailing,
6 fencing
Table user_skill (relationship between user and skills)
pk:usk_id, fk:uid, sk_id
1 1 1,
2 2 2,
3 2 3,
4 3 4,
5 3 5,
6 3 6,
I want to then display "Roger has these skills: judo, sailing, fencing"
select name, skill
from users, skills, user_skill
where users.uid = user_skill.uid
and users.uid = 3
Is this the right way to go about it - both in terms of designing the tables and querying (mysql)?
Then say I want to update their profile with the areas they teach in:
Bob can teach karate in London
Louise can teach piano in Bolton and knitting in Manchester
Roger can teach judo in London and Manchester, sailing in Liverpool and fencing in Bradford
So I add the following tables:
Table cities
pk: city_id, city
1 London,
2 Manchester,
3 Liverpool,
4 Bolton,
5 Bradford,
But I'm confused as to how to do the relationships. I keep writing it out, realizing it doesn't work and starting again, so I've obviously gone wrong somewhere.
I would say your general DB structure is fine as far as the relations go. To incorporate the cities aspect you could use your proposed cities table, but also add a column to your user_skill table to include a reference to the city table.
Also make sure you use proper join statements in the select queries as this is best practice and helps the queries run as efficiently as possible.
Can users teach skills in more than one location, e.g. "bob teaches judo in london and bolton"? Or is it strictly one skill-one city?
Depending on how you want your tables, you'd either just add a 'city' field to the user_skills table, and have multiple "bob/judo/cityX" "bob/judo/cityY" type records. Or you'll add yet another table "user_city_skills" where it'd be "user_skill_ID, cityID".
Your structure looks fine except for your user_skill table. To incorporate the last part, add an FK city_id to the user_skill table. If a user can teach the same skill in multiple cities, you will need an additional table to avoid multi-valued columns.
Yes, carry on with it. You should also add one more column in table user_skill which will hold city_id.
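One way to read the suggestions above is a user_skill table that carries a city_id, with one row per (user, skill, city) combination, so "judo in London and Manchester" is simply two rows. The sketch below shows that variant with explicit JOINs. It is only an illustration: SQLite via Python's sqlite3 module stands in for MySQL, the composite primary key replaces the usk_id column, and the function names are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; data follows the question's examples.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users  (uid     INTEGER PRIMARY KEY, name  TEXT NOT NULL);
CREATE TABLE skills (sk_id   INTEGER PRIMARY KEY, skill TEXT NOT NULL);
CREATE TABLE cities (city_id INTEGER PRIMARY KEY, city  TEXT NOT NULL);
CREATE TABLE user_skill (
    uid     INTEGER NOT NULL REFERENCES users(uid),
    sk_id   INTEGER NOT NULL REFERENCES skills(sk_id),
    city_id INTEGER NOT NULL REFERENCES cities(city_id),
    PRIMARY KEY (uid, sk_id, city_id)  -- one row per user/skill/city
);

INSERT INTO users  VALUES (1, 'Bob'), (2, 'Louise'), (3, 'Roger');
INSERT INTO skills VALUES
    (1, 'karate'), (2, 'piano'), (3, 'knitting'),
    (4, 'judo'), (5, 'sailing'), (6, 'fencing');
INSERT INTO cities VALUES
    (1, 'London'), (2, 'Manchester'), (3, 'Liverpool'), (4, 'Bolton'), (5, 'Bradford');
INSERT INTO user_skill VALUES
    (1, 1, 1),                                   -- Bob: karate in London
    (2, 2, 4), (2, 3, 2),                        -- Louise: piano/Bolton, knitting/Manchester
    (3, 4, 1), (3, 4, 2), (3, 5, 3), (3, 6, 5);  -- Roger
""")


def teaches(name):
    """(skill, city) pairs for one user, using explicit JOIN ... ON clauses."""
    return conn.execute(
        """
        SELECT s.skill, c.city
        FROM users u
        JOIN user_skill us ON us.uid = u.uid
        JOIN skills s      ON s.sk_id = us.sk_id
        JOIN cities c      ON c.city_id = us.city_id
        WHERE u.name = ?
        ORDER BY s.skill, c.city
        """,
        (name,),
    ).fetchall()


def teachers_in(city):
    """(name, skill) pairs for everyone teaching in a given city."""
    return conn.execute(
        """
        SELECT u.name, s.skill
        FROM cities c
        JOIN user_skill us ON us.city_id = c.city_id
        JOIN users u       ON u.uid = us.uid
        JOIN skills s      ON s.sk_id = us.sk_id
        WHERE c.city = ?
        ORDER BY u.name, s.skill
        """,
        (city,),
    ).fetchall()


print(teaches("Roger"))
```

Every table in the FROM clause has its own join condition here; leaving one out (as with a comma-separated table list and a single WHERE condition) multiplies the rows by every row of the unjoined table.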

SQL "shortcut" identifiers or a long string of joins?

QUESTION: Is it okay to have "shortcut" identifiers in a table so that I don't have to do a long string of joins to get the information I need?
To understand what I'm talking about, I'm going to have to lay out an example here that looks pretty complicated, but I've simplified the problem quite a bit and it should be easily understood (I hope).
The basic setup: A "company" can be an "affiliate", a "client" or both. Each "company" can have multiple "contacts", some of which can be "users" with log in privileges.
`Company` table
----------------------------------------------
ID Company_Name Address
-- ----------------------- -----------------
1 Acme, Inc. 101 Sierra Vista
2 Spacely Space Sprockets East Mars Colony
3 Cogswell Cogs West Mars Colony
4 Stark Industries Los Angeles, CA
We have four companies in our database.
`Affiliates` table
---------------------
ID Company_ID Price Sales
-- ---------- ----- -----
1 1 50 456
2 4 50 222
3 1 75 14
Each company can have multiple affiliate id's so that they can represent the products at different pricing levels to different markets.
Two of our companies are affiliates (Acme, Inc. and Stark Industries), and Acme has two affiliate ID's
`Clients` table
--------------------------------------
ID Company_ID Referring_affiliate_id
-- ---------- ----------------------
1 2 1
2 3 1
3 4 3
Each company can only be a client once.
Three of our companies are clients (Spacely Space Sprockets, Cogswell Cogs, and Stark Industries, who is also an affiliate)
In all three cases, they were referred to us by Acme, Inc., using one of their two affiliate ID's
`Contacts` table
-----------------------------------------
ID Name Email
-- -------------- ---------------------
1 Wylie Coyote wcoyote@acme.com
2 Cosmo Spacely boss@spacely.com
3 H. G. Cogswell ceo@cogs.com
4 Tony Stark tony@stark.com
5 Homer Simpson simpson@burnscorp.com
Each company has at least one contact, but in this table, there is no indication of which company each contact works for, and there's also an extra contact (#5). We'll get to that in a moment.
Each of these contacts may or may not have a login account on the system.
`Contacts_type` table
--------------------------------------
contact_id company_id contact_type
---------- ---------- --------------
1 1 Administrative
2 2 Administrative
3 3 Administrative
4 4 Administrative
5 1 Technical
4 2 Technical
Associates a contact with one or more companies.
Each contact is associated with a company, and in addition, contact 5 (Homer Simpson) is a technical contact for Acme, Inc., and contact 4 (Tony Stark) is both an administrative contact for company 4 (Stark Industries) and a technical contact for company 3 (Cogswell Cogs).
`Users` table
-------------------------------------------------------------------------------------
ID contact_id company_id client_id affiliate_id user_id password access_level
-- ---------- ---------- --------- ------------ -------- -------- ------------
1 1 1 1 1 wylie A03BA951 2
2 2 2 2 NULL cosmo BF16DA77 3
3 3 3 3 NULL cogswell 39F56ACD 3
4 4 4 4 2 ironman DFA9301A 2
The users table is essentially a list of contacts that are allowed to login to the system.
Zero or one user per contact; one contact per user.
Contact 1 (Wylie Coyote) works for company 1 (Acme) and is a customer (1) and also an affiliate (1)
Contact 2 (Cosmo Spacely) works for company 2 (Spacely Space Sprockets) and is a customer (2) but not an affiliate
etc...
NOW finally onto the problem, if there is one...
Do I have a circular reference via the client_id and affiliate_id columns in the Users table? Is this a bad thing? I'm having a hard time wrapping my head around this.
When someone logs in, it checks their credentials against the users table and uses users.contact_id, users.client_id, and users.affiliate_id to do a quick look up rather than having to join together a string of tables to find out the same information. But this causes duplication of data.
Without client_id in the users table, I would have to find the following information out like this:
affiliate_id: join `users`.`contact_id` to `contacts_types`.`company_id` to `affiliates`.`company_id`
client_id: join `users`.`contact_id` to `contacts_types`.`company_id` to `clients`.`company_id`
company_id: join `users`.`contact_id` to `contacts_types`.`company_id` to `company`.`company_id`
user's name: join `users`.`contact_id` to `contacts_types`.`contact_id` to `contacts`.`contact_id` > `name`
In each case, I wouldn't necessarily know if the user even has an entry in the affiliate table or the clients table, because they likely have an entry in only one of those tables and not both.
Is it better to do these kinds of joins and thread through multiple tables to get the information I want, or is it better to have a "shortcut" field to get me the information I want?
I have a feeling that over all, this is overly complicated in some way, but I don't see how.
I'm using MySQL.
It's better to do the joins. You should only denormalize your data when you have timed evidence of a slow response.
Having said that, there are various ways to reduce the amount of typing:
Use "as" to give shorter names to your fields.
Create views. These are "virtual tables" that already have your standard joins built in, so that you don't have to repeat that stuff every time.
Use "with" in SQL. This lets you define something like a view within a single query.
It's possible your MySQL version doesn't support all of the above, so check the docs. [Update: MySQL has supported views since 5.0, and "with" was added in MySQL 8.0; older versions lack it. So you can add views to do the work of affiliate_id, client_id etc. and treat them just like tables in your queries, while keeping the underlying data nicely organised.]
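As a rough sketch of the view idea, the example below drops the shortcut columns from the users table and wraps the joins in a view instead. It is an illustration under assumptions: SQLite via Python's sqlite3 module stands in for MySQL, the tables are trimmed to the columns needed for the joins, and the view name user_overview and the function overview are invented for this example.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; tables are cut down to the join columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE company  (id INTEGER PRIMARY KEY, company_name TEXT NOT NULL);
CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE contacts_type (
    contact_id   INTEGER NOT NULL REFERENCES contacts(id),
    company_id   INTEGER NOT NULL REFERENCES company(id),
    contact_type TEXT NOT NULL,
    PRIMARY KEY (contact_id, company_id)
);
CREATE TABLE clients (
    id         INTEGER PRIMARY KEY,
    company_id INTEGER NOT NULL UNIQUE REFERENCES company(id)
);
CREATE TABLE users (
    id         INTEGER PRIMARY KEY,
    contact_id INTEGER NOT NULL UNIQUE REFERENCES contacts(id),
    user_id    TEXT NOT NULL UNIQUE   -- login name; no shortcut ID columns
);

INSERT INTO company VALUES
    (1, 'Acme, Inc.'), (2, 'Spacely Space Sprockets'),
    (3, 'Cogswell Cogs'), (4, 'Stark Industries');
INSERT INTO contacts VALUES
    (1, 'Wylie Coyote'), (2, 'Cosmo Spacely'), (3, 'H. G. Cogswell'),
    (4, 'Tony Stark'), (5, 'Homer Simpson');
INSERT INTO contacts_type VALUES
    (1, 1, 'Administrative'), (2, 2, 'Administrative'), (3, 3, 'Administrative'),
    (4, 4, 'Administrative'), (5, 1, 'Technical'), (4, 2, 'Technical');
INSERT INTO clients VALUES (1, 2), (2, 3), (3, 4);
INSERT INTO users VALUES (1, 1, 'wylie'), (2, 2, 'cosmo'), (3, 3, 'cogswell'), (4, 4, 'ironman');

-- The view hides the chain of joins. LEFT JOIN keeps users whose company
-- has no row in clients; client_id is simply NULL for them.
CREATE VIEW user_overview AS
SELECT u.user_id, c.name, co.id AS company_id, co.company_name,
       ct.contact_type, cl.id AS client_id
FROM users u
JOIN contacts c       ON c.id = u.contact_id
JOIN contacts_type ct ON ct.contact_id = c.id
JOIN company co       ON co.id = ct.company_id
LEFT JOIN clients cl  ON cl.company_id = co.id;
""")


def overview(login):
    """Query the view like a table: one row per company the user is tied to."""
    return conn.execute(
        """
        SELECT company_name, contact_type, client_id
        FROM user_overview
        WHERE user_id = ?
        ORDER BY company_id
        """,
        (login,),
    ).fetchall()


print(overview("ironman"))
```

Because the view is computed from the base tables, there is no duplicated client_id or affiliate_id to keep in sync, which is the main risk of the shortcut columns.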