I am designing a database application for an award. It has a 75 year history and numerous categories that have changed over time. Right now, the design I am thinking of has two kinds of tables:
entities
people
publishers
categories
novel
movie
author
artist
and such like. Each category has data particular to that category, for example:
NOVEL
title varchar(1024)
author int #FK into people table ID
publisher int #FK into publisher table ID
year year(4)
winner bool
or
ARTIST
name int
year year(4)
winner bool
So far so good. However, there are 38 (!) of these categories that have existed over time (some do not exist anymore) and I really can't imagine doing a query for say, all of the winners from 1963 by doing:
SELECT * from table1,table2,...,table38 WHERE year=1963 and winner=TRUE;
These tables will never be that large (each category usually has at most five nominees, so even after 100 years there would be at most 500 rows per table, and far fewer for the early categories that were discontinued). So this isn't a performance question. It is just that the query feels very, very wrong to me, if only because every query will have to be changed every time a new category is created or an old one removed. That happens every few years or so.
The questions then are:
is this query evidence that I've designed this wrong?
if not, is there a better way to do that query?
I keep thinking there must be some way to create a lookup table which pulls from other tables, but I could be misremembering. Is there some way of doing such a thing?
Many thanks,
Glenn
You could do that with 3 tables.
First one is entities. It contains data about all publishers/artist/etc.
entities
name varchar(1024)
publisher bool
Second is data where all data from all categories is stored.
data
title varchar(1024)
author/name int #FK into people table ID
publisher int #FK into publisher table ID
year year(4)
winner bool
category int #FK into category table ID
Third is category in which you can find all categories names with their IDs.
category
ID int
name varchar(1024)
Now you have to join only three tables.
select * from entities e, data d, category c where d.name=e.name and d.category=c.id and d.winner=TRUE and d.year=1963
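A minimal sketch of the single-table idea above, written with Python's built-in sqlite3 module instead of MySQL so that it runs anywhere. The table name `nomination`, its columns, and the sample rows are all made up for illustration; columns that only apply to some categories (such as `title`) are simply left NULL elsewhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE people   (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Hypothetical unified table: one row per nominee, whatever the category.
CREATE TABLE nomination (
    id          INTEGER PRIMARY KEY,
    category_id INTEGER NOT NULL REFERENCES category(id),
    person_id   INTEGER NOT NULL REFERENCES people(id),
    title       TEXT,                 -- NULL for categories without a title
    year        INTEGER NOT NULL,
    winner      INTEGER NOT NULL DEFAULT 0
);
-- Placeholder rows, not real award data.
INSERT INTO category VALUES (1, 'novel'), (2, 'artist');
INSERT INTO people   VALUES (1, 'Person A'), (2, 'Person B'), (3, 'Person C');
INSERT INTO nomination (category_id, person_id, title, year, winner) VALUES
    (1, 1, 'Example Novel', 1963, 1),
    (1, 2, 'Another Novel', 1963, 0),
    (2, 3, NULL,            1963, 1),
    (2, 3, NULL,            1964, 1);
""")

def winners(year):
    """Return (category, person, title) for every winner in the given year."""
    return conn.execute(
        """
        SELECT c.name, p.name, n.title
        FROM nomination n
        JOIN category c ON c.id = n.category_id
        JOIN people   p ON p.id = n.person_id
        WHERE n.year = ? AND n.winner = 1
        ORDER BY c.name
        """,
        (year,),
    ).fetchall()

winners_1963 = winners(1963)
```

With this layout, adding or retiring a category is an INSERT or UPDATE on `category`, and the winners query does not change.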
You would be better off with a table for categories, either one that stores category key/value pairs or just a normal category table whose row id is all you store in the other tables:
for example,
Table: Category
columns: id, name, slug, status, active_since, inactive_since etc...
In slug, you can keep the slugified form of the category name to make it easy to use in queries and URLs: for example, the Industry Innovations category would be saved as industry-innovations.
In status, keep 0 or 1 to show whether the category is active now. You can also keep the dates when it became active and when it became inactive in the active_since and inactive_since fields.
When you search, you can restrict the search to categories with status 1, for example. I don't think your problem is complex; joining tables like this is simple for MySQL.
There are projects where dozens of tables are joined, and that is fine.
I have a MySQL table with around 4 million+ rows. Let us say the table is as follows:
Columns in table Person:
Id
Name
Age
Marital Status
Education Level
'Location Country'
'Description'
When I run a query based on Age, I also want to have a summary count of people with the same age in different marital status and also with different 'Education Level' and 'Location Country'.
When I run a query based on Age and Education Level, I also want to have a summary count of people with the same age and Education Level in different marital status and also with different 'Location Country'.
For example, the query issued would be SELECT * FROM Person WHERE Age = 27;. I also want results that would be produced by SELECT Education Level, COUNT(*) FROM Person WHERE Age = 27 GROUP BY Education Level; and SELECT Location Country, COUNT(*) FROM Person WHERE Age = 27 GROUP BY Location Country;
Also, this becomes more challenging for me when I have to do a search based on keywords in the description and want a summary count on each of the other columns. The application I am developing is a sort of search engine. This kind of summary can be seen on sites like eBay.
I can possibly run these queries separately. But with 4 million rows, the GROUP BY queries will take a substantial amount of time. This is an internet application and the query should complete within a few seconds.
Any help would be much appreciated.
You can do both in one query
SELECT p.*, count(p2.id)
FROM Person p, Person p2
WHERE p2.Age = p.age and p2.marital != p.marital and p.education != p2.education
GROUP BY p.id
In such a situation, I would suggest saving the results in a cache such as memcached. You can expire the cached entry when new data is inserted into the table, or after some expiration time, to avoid repeated long query executions. Another improvement would be to use a LIMIT to reduce the number of rows returned by the DB, like this:
SELECT p.*, count(p2.id)
FROM Person p, Person p2
WHERE p2.Age = p.age and p2.marital != p.marital and p.education != p2.education
GROUP BY p.id
LIMIT 10
From what you are describing, I would have a separate aggregate table, holding the "roll-up" stats you want, to query directly. How frequently is the "Person" table added to or changed? If you are only storing a person's "Age", what is the basis of the age if there is no date? If you add the person again in the future, they would have multiple records, such that
At age X, so many people were married (or not) and had this level of education.
At age Y, so many people... etc..
I would create a summary table, something like
create table AgeStat (
age int,
married int,
single int,
divorced int,
HighSchool int,
Associates int,
Bachelors int,
Masters int,
Doctorate int )
Then, add a trigger to the person table such that during insert (or inclusive of update/delete as needed), the new record just adds 1 to each respective count applicable.
Then, for your web app, it would be instantaneous to grab one record from this summary table where age = 27 and you have ALL your classification stats.
However, if you specifically wanted to know how many people are married with a Masters degree, you would have to go back to the main Person table.
Alternatively, you could do a similar pre-aggregation but down a level of granularity something like
create table AgeStat (
age int,
maritalstat int, -- but I would actually use an enumerated value for marital status
educationlevel int, -- and education level vs a hard description of each.
peoplecount int )
and likewise have a trigger that updates the count based on the two combination elements per age. Then, if you wanted the total "Married", you can sum(peoplecount) for age = 27 and maritalstat=(enumerator for "married" value)
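The trigger-maintained roll-up described above can be sketched as follows. This is only an illustration under stated assumptions: it uses SQLite through Python's sqlite3 module (MySQL trigger syntax differs), the column names `marital` and `education` and their integer codes are hypothetical, and only the insert case is handled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (
    id        INTEGER PRIMARY KEY,
    name      TEXT,
    age       INTEGER,
    marital   INTEGER,   -- hypothetical codes, e.g. 0 = single, 1 = married
    education INTEGER    -- hypothetical codes, e.g. 0 = high school, 3 = masters
);
CREATE TABLE AgeStat (
    age            INTEGER,
    maritalstat    INTEGER,
    educationlevel INTEGER,
    peoplecount    INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (age, maritalstat, educationlevel)
);
-- Keep the roll-up in step with every insert into Person.
CREATE TRIGGER person_after_insert AFTER INSERT ON Person
BEGIN
    -- Create the counter row if this combination is new ...
    INSERT OR IGNORE INTO AgeStat (age, maritalstat, educationlevel, peoplecount)
    VALUES (NEW.age, NEW.marital, NEW.education, 0);
    -- ... then add one to it.
    UPDATE AgeStat SET peoplecount = peoplecount + 1
    WHERE age = NEW.age
      AND maritalstat = NEW.marital
      AND educationlevel = NEW.education;
END;
""")

# Placeholder people, purely for illustration.
conn.executemany(
    "INSERT INTO Person (name, age, marital, education) VALUES (?, ?, ?, ?)",
    [("a", 27, 1, 3), ("b", 27, 1, 3), ("c", 27, 0, 3), ("d", 30, 1, 0)],
)

# Total married (code 1) people aged 27, read from the small summary table.
married_27 = conn.execute(
    "SELECT SUM(peoplecount) FROM AgeStat WHERE age = 27 AND maritalstat = 1"
).fetchone()[0]

# Per-education breakdown for age 27.
by_education = conn.execute(
    "SELECT educationlevel, SUM(peoplecount) FROM AgeStat "
    "WHERE age = 27 GROUP BY educationlevel ORDER BY educationlevel"
).fetchall()
```

Matching triggers for UPDATE and DELETE would be needed to subtract from the old combination when a person changes or is removed.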
Good luck, and I hope this gives you an alternative solution.
I am new to database structure and design. Currently, I am in the process of creating a course catalog that will match course descriptions according to course name and date. I have sketched one table describing the courses, which includes course_code, name and all other relevant information. Then I sketched another table linking those courses to when they will be taught.
I am missing classes that are classified as all_year. I am also missing a way to label the courses under a major. Since a course can hypothetically belong to several majors, putting the data from one into the other would force you to duplicate data. Any ideas on how I would implement these two things in my table design? Or suggestions on how to restructure my design? If possible, please show me a query to execute in my phpMyAdmin DB.
Example of table courses
id serial
course_code text
description text
Example of table course_dates
id serial
course_id serial
year date
semester
Example of table majors
major_id int
course_id int
So a populated database could contain the following:
Table courses
id  course_code  description
1   INF1000      "Basic programming"
2   INF1001      "More basic programming"
Table course_dates (0 for spring 1 for fall)
id  course_id  year  semester
1   1          2012  0
2   1          2013  1
3   2          2013  1
To link courses to majors - this is a one to many relationship (one course to many majors) - you want to use a linking table that has this type of structure:
table courses_majors
major_id int
course_id int
Remember to index this table as well; it's very important. Then you can populate it so that one course belongs to many majors and many courses belong to one major (a many-to-many relationship).
Then you can run a join on the tables across this table:
select * from courses left join courses_majors on courses.id = courses_majors.course_id left join majors on courses_majors.major_id = majors.id
Of course you can add a where clause, etc.
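As a rough, runnable sketch of the linking-table approach, here is the same join using Python's sqlite3 module (an assumption made for portability; the SQL is close to what you would paste into phpMyAdmin). The courses come from the question; the major names and the index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE courses (id INTEGER PRIMARY KEY, course_code TEXT, description TEXT);
CREATE TABLE majors  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE courses_majors (
    major_id  INTEGER NOT NULL REFERENCES majors(id),
    course_id INTEGER NOT NULL REFERENCES courses(id),
    PRIMARY KEY (major_id, course_id)      -- also indexes lookups by major
);
-- Second index so lookups by course are fast too.
CREATE INDEX idx_courses_majors_course ON courses_majors (course_id);

INSERT INTO courses VALUES (1, 'INF1000', 'Basic programming'),
                           (2, 'INF1001', 'More basic programming');
INSERT INTO majors VALUES (1, 'Major A'), (2, 'Major B');   -- hypothetical majors
-- INF1000 belongs to both majors, INF1001 only to Major A.
INSERT INTO courses_majors (major_id, course_id) VALUES (1, 1), (2, 1), (1, 2);
""")

rows = conn.execute("""
    SELECT c.course_code, m.name
    FROM courses c
    LEFT JOIN courses_majors cm ON c.id = cm.course_id
    LEFT JOIN majors m ON cm.major_id = m.id
    ORDER BY c.course_code, m.name
""").fetchall()
```

Each course row appears once per major it belongs to, and no course or major data is duplicated in the base tables.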
The other way is to create a table of majors:
majors
id int
name varchar
Then add a major_id to your courses table. This gives each course exactly one major (a many-to-one relationship from courses to majors), though many courses can still share a major.
As for yearly courses, I would just add a field in the database to account for this, probably a TINYINT set to 0 or 1.
I'm building a study tool and I'm not sure of the best way to go about structuring my database.
Basically, I have a simple but big table with around 50,000 pieces of information in it.
info (50'000 rows)
id
info_text
user
id
name
email
password
etc
What I want is for the students to be able to mark each item as studied or to be studied (basically on and off), so that they can tick off each item when they have revised it.
I want to build the tool to cope with thousands of users and was wondering what the most efficient/easiest way is of setting up the database and associated queries.
At the moment I lean towards having one huge table with a two-column primary key, the user id and the id of the info they have studied, and then doing some sort of JOIN so I could pull back only the items that they have left to study.
user_info
user_id
info_id
Thanks in advance
Here is one way to model this situation:
The table in the middle (STUDIED) has a composite primary key on USER_ID and ITEM_ID, so a combination of the two must be unique, even though individually they don't have to be.
A user (with given USER_ID) has studied a particular item (with given ITEM_ID) only if there is a corresponding row in the STUDIED table (with these same USER_ID and ITEM_ID values).
Conversely, the user has not studied the item, if and only if the corresponding row in STUDIED is missing. To pull all items a given user hasn't studied, you can do something like this:
SELECT * FROM ITEM
WHERE NOT EXISTS (
SELECT * FROM STUDIED
WHERE
USER_ID = <given_user_id>
AND ITEM.ITEM_ID = STUDIED.ITEM_ID
)
Or, alternatively:
SELECT ITEM.*
FROM ITEM LEFT JOIN STUDIED
ON ITEM.ITEM_ID = STUDIED.ITEM_ID AND STUDIED.USER_ID = <given_user_id>
WHERE STUDIED.ITEM_ID IS NULL
The good thing about this design is that you don't need to care about STUDIED table in advance. When adding a new user or item, just leave the STUDIED alone - you'll gradually fill it later as users progress with their studies.
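A small sketch of this model, assuming SQLite via Python's sqlite3 module in place of MySQL. The table name `USERS`, the helper function names, and the sample rows are hypothetical; the STUDIED table and the NOT EXISTS query follow the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE USERS (USER_ID INTEGER PRIMARY KEY, NAME TEXT);
CREATE TABLE ITEM  (ITEM_ID INTEGER PRIMARY KEY, INFO_TEXT TEXT);
CREATE TABLE STUDIED (
    USER_ID INTEGER NOT NULL REFERENCES USERS(USER_ID),
    ITEM_ID INTEGER NOT NULL REFERENCES ITEM(ITEM_ID),
    PRIMARY KEY (USER_ID, ITEM_ID)   -- each (user, item) pair at most once
);
-- Placeholder rows for illustration.
INSERT INTO USERS VALUES (1, 'user one'), (2, 'user two');
INSERT INTO ITEM  VALUES (1, 'fact one'), (2, 'fact two'), (3, 'fact three');
""")

def mark_studied(user_id, item_id):
    # Ticking an item twice is harmless: the duplicate row is ignored.
    conn.execute("INSERT OR IGNORE INTO STUDIED VALUES (?, ?)", (user_id, item_id))

def unmark_studied(user_id, item_id):
    # Un-ticking an item just removes the row.
    conn.execute(
        "DELETE FROM STUDIED WHERE USER_ID = ? AND ITEM_ID = ?", (user_id, item_id)
    )

def items_left(user_id):
    """Return the ids of all items the user has not studied yet."""
    rows = conn.execute(
        """
        SELECT ITEM_ID FROM ITEM
        WHERE NOT EXISTS (
            SELECT 1 FROM STUDIED
            WHERE STUDIED.USER_ID = ? AND STUDIED.ITEM_ID = ITEM.ITEM_ID
        )
        ORDER BY ITEM_ID
        """,
        (user_id,),
    )
    return [r[0] for r in rows]

mark_studied(1, 1)
mark_studied(1, 1)      # repeated tick, ignored
mark_studied(1, 3)
unmark_studied(1, 3)    # user changes their mind
```

A new user starts with no rows in STUDIED, so `items_left` returns every item for them without any setup.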
I would do something like this:
1) A users table with a uid primary key
2) An enrolled table (this table shows all courses that have enrolled students) with a primary key of (uid, cid)
3) An items (info) table holding all items to study, with a primary key of itemid
Then in the enrolled table just have one attribute (a binary flag): 1 means the item has been studied and 0 means it still needs to be studied.
I had a question about whether or not my implementation idea is easy to work with/write queries for.
I currently have a database with multiple columns. Most of the columns are the same thing (items, but split into item 1, item 2, item 3 etc).
So I have currently in my database ID, Name, Item 1, Item 2 ..... Item 10.
I want to condense this into ID, Name, Item.
But what I want item to have is to store multiple values as different rows. I.e.
ID = One Name = Hello Item = This
That
There
Kind of like the format it looks like. Is this a good idea and how exactly would I go about doing this? I will be using no numbers in the database and all of the information will be static and will never change.
Can I do this using 1 database table (and would it be easy to match items of one ID to another ID), or would I need to create 2 tables and link them?
If so how exactly would I create 2 tables and make them relational?
Any ideas on how to implement this? Thanks!
This is a classical type of denormalized data base. Denormalization sometimes makes certain operations more efficient, but more often leads to inefficiencies. (For example, if one of your write queries was to change the name associated with an id, you would have to change many rows instead of a single one.) Denormalization should only be done for specific reasons after a fully normalized data base has been designed. In your example, a normalized data base design would be:
table_1: ID (key), Name
table_2: ID (foreign key mapped to table_1.ID), Item
You're talking about a denormalized table, which SQL databases have a difficult time dealing with. Your Item field is said to have a many-to-one relationship to the other fields. The correct thing to do is to make two tables. The typical example is an album and songs. Songs have a many-to-one relationship to albums, so you could structure your tables like this:
Table Album
album_id [Primary Key]
Title
Artist
Table Song
song_id [Primary Key]
album_id [Foreign Key album.album_id]
Title
Often this example is given with a third table Artist, and you could substitute the Artist field for an artist_id field which is a Foreign Key to an Artist table's artist_id.
Of course, in reality songs, albums, and artists are more complex. One song can be on multiple albums, multiple artists can be on one album, there are multiple versions of the same song, and there are even some songs which have no album release at all.
Example:
Album
album_id  Title  Artist
1         White  Beatles
2         Black  Metallica
Song
song_id  album_id  Title
1        2         Enter Sandman
2        1         Back in the USSR
3        2         Sad but True
4        2         Nothing Else Matters
5        1         Helter Skelter
To query this you just do a JOIN:
SELECT * FROM Album INNER JOIN Song ON Album.album_id = Song.album_id
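To make the two-table layout concrete, here is a short sketch that loads the example rows above and runs the join. It assumes SQLite through Python's sqlite3 module rather than MySQL, and the variable names are only illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Album (
    album_id INTEGER PRIMARY KEY,
    Title    TEXT,
    Artist   TEXT
);
CREATE TABLE Song (
    song_id  INTEGER PRIMARY KEY,
    album_id INTEGER NOT NULL REFERENCES Album(album_id),
    Title    TEXT
);
INSERT INTO Album VALUES (1, 'White', 'Beatles'), (2, 'Black', 'Metallica');
INSERT INTO Song VALUES
    (1, 2, 'Enter Sandman'),
    (2, 1, 'Back in the USSR'),
    (3, 2, 'Sad but True'),
    (4, 2, 'Nothing Else Matters'),
    (5, 1, 'Helter Skelter');
""")

# Every song paired with the album it belongs to.
songs_with_album = conn.execute("""
    SELECT Album.Title, Song.Title
    FROM Album INNER JOIN Song ON Album.album_id = Song.album_id
    ORDER BY Song.song_id
""").fetchall()

# All items (songs) for one parent row, the equivalent of "all items for one ID".
black_album_songs = [
    row[0]
    for row in conn.execute(
        "SELECT Title FROM Song WHERE album_id = ? ORDER BY song_id", (2,)
    )
]
```

The same shape applies to the ID/Name/Item case in the question: the parent row is stored once, and each item is a separate row pointing back to it.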
I don't think one table really makes sense in this case. Instead you can do:
Main Table:
ID
Name
Item Table:
ID
Item #
Item Value
Main_ID = Main Table.ID
Then when you do queries you can do a simple join
we have three types of data (tables):
Book (id,name,author...) ( about 3 million of rows)
Category (id,name) ( about 2000 rows)
Location (id,name) ( about 10000 rows)
A Book must have at least 1 type of Category (up to 3) AND a Book must have only one Location.
I need to relate these tables so that the following queries run faster:
Select Books where Category = 'cat_id' AND Location = 'loc_id'
Select Books where match(name) against ('name of book') AND Location = 'loc_id'
Please I need some help.
Thanks
Have another table, say bookscategories, which has 'id , bookid, categoryid' as fields.
Use this to map books to categories.
Both your original queries will not be affected since the first query wants books in ONE specific category and location and the second query wants books that match a title and ONE location.
With tables this size, which I would consider medium-sized (not small, not large), it is more likely the indices which will make the difference. Create the tables correctly and join them appropriately (on the indexed primary and foreign keys) and your performance should be fine.
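A minimal sketch of the mapping table and indexes, assuming SQLite via Python's sqlite3 module instead of MySQL. The `location_id` column on Book, the index name, the helper function, and the sample rows are hypothetical; the full-text MATCH ... AGAINST query is MySQL-specific and is not covered here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Location (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Category (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Book (
    id          INTEGER PRIMARY KEY,
    name        TEXT,
    author      TEXT,
    location_id INTEGER NOT NULL REFERENCES Location(id)   -- exactly one location
);
CREATE TABLE bookscategories (
    id         INTEGER PRIMARY KEY,
    bookid     INTEGER NOT NULL REFERENCES Book(id),
    categoryid INTEGER NOT NULL REFERENCES Category(id),
    UNIQUE (categoryid, bookid)   -- doubles as the index for category lookups
);
CREATE INDEX idx_book_location ON Book (location_id);

-- Placeholder rows for illustration.
INSERT INTO Location VALUES (1, 'loc 1'), (2, 'loc 2');
INSERT INTO Category VALUES (1, 'cat 1'), (2, 'cat 2');
INSERT INTO Book VALUES (1, 'book 1', 'a', 1), (2, 'book 2', 'b', 1), (3, 'book 3', 'c', 2);
INSERT INTO bookscategories (bookid, categoryid) VALUES (1, 1), (1, 2), (2, 2), (3, 1);
""")

def books_in(category_id, location_id):
    """Return ids of books in the given category that sit at the given location."""
    rows = conn.execute(
        """
        SELECT b.id
        FROM Book b
        JOIN bookscategories bc ON bc.bookid = b.id
        WHERE bc.categoryid = ? AND b.location_id = ?
        ORDER BY b.id
        """,
        (category_id, location_id),
    )
    return [r[0] for r in rows]
```

The rule that a book has between one and three categories is not enforced by this schema; it would have to be checked in the application or with triggers.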