Suppose I want to create a simple database that lets users create playlists and add multiple songs to them. I just want to be able to find which songs have been added to a particular playlist.
song table:
`song_id` INT AUTO_INCREMENT PRIMARY KEY, `song_title` VARCHAR
playlist table:
`playlist_id` INT AUTO_INCREMENT PRIMARY KEY, `playlist_title` VARCHAR
What would be the best option to pull this off?
Add another column to the playlist table and store a comma-separated list of song ids in it. I don't think this is a proper relational way to do it, but it does the job.
or
Create a separate table just to store song ids along with the id of the playlist each one belongs to, e.g. playlist_id INT, song_id INT, where both columns are foreign keys.
Now, if the second option is better, should I add another auto-increment column as a primary key, even knowing it won't be used anywhere? I ask because several articles I read online suggest that not having a primary key significantly hurts a table's performance.
You should strongly lean towards option two, namely creating a table which relates a playlist ID to a song ID. In this case, you can actually create a primary key which is a composite of the playlist and song ID.
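CREATE TABLE playlist_songs (
song_id INT NOT NULL,
playlist_id INT NOT NULL,
PRIMARY KEY (song_id, playlist_id),
FOREIGN KEY (song_id) REFERENCES song (song_id),
FOREIGN KEY (playlist_id) REFERENCES playlist (playlist_id)
);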
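A minimal sketch of the bridging table in action, using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption: SQLite's types and syntax differ slightly from MySQL's). The sample rows are made up. It shows the composite key rejecting a duplicate (song, playlist) pair and a join answering "which songs are in this playlist?".

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption, not the poster's setup).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE song (song_id INTEGER PRIMARY KEY, song_title TEXT);
CREATE TABLE playlist (playlist_id INTEGER PRIMARY KEY, playlist_title TEXT);
CREATE TABLE playlist_songs (
    song_id INTEGER NOT NULL REFERENCES song (song_id),
    playlist_id INTEGER NOT NULL REFERENCES playlist (playlist_id),
    PRIMARY KEY (song_id, playlist_id)
);
""")
conn.executemany("INSERT INTO song VALUES (?, ?)", [(1, "Intro"), (2, "Outro"), (3, "Bonus")])
conn.execute("INSERT INTO playlist VALUES (10, 'Road trip')")
conn.executemany("INSERT INTO playlist_songs VALUES (?, ?)", [(1, 10), (3, 10)])

# Adding the same song to the same playlist twice violates the composite key.
try:
    conn.execute("INSERT INTO playlist_songs VALUES (1, 10)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Which songs are in playlist 10?
titles = [row[0] for row in conn.execute(
    "SELECT s.song_title FROM song AS s "
    "JOIN playlist_songs AS ps ON ps.song_id = s.song_id "
    "WHERE ps.playlist_id = ? ORDER BY s.song_id", (10,))]
print(duplicate_rejected, titles)  # True ['Intro', 'Bonus']
```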
As to whether you also need an auto increment column on playlist_songs, it would depend on your situation. You might not need it from a business logic point of view since you would likely be manipulating the table using the two columns already there.
There are two aspects to your question - the abstract, philosophical view and practical implications.
Philosophically, the way we decide whether a database design is "good" is to see if it's normalized.
You have two entities in your design, song and playlist. You have two relationships: a song can belong to 0..n playlists, and a playlist can contain 0..n songs.
You want to store those facts individually, rather than bundling them together. This means that the bridging table is "best" as it stores a single fact (song x belongs to playlist y), independently of the existence of song or playlist.
The alternative design stores several facts in a single row - "a playlist exists, and has the following songs".
The second philosophical issue is "how do I uniquely identify facts?". In your bridging table, the unique fact is "song x belongs to playlist y". It can only belong to that playlist once (actually, that's probably not true - you may need a column to indicate in which order the song appears).
This means you have a natural, compound key right there in your business domain. Philosophically, this is what you want to use to identify those records, so this should be your primary key.
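If you do need the ordering column mentioned above, the natural key shifts. A minimal sketch, assuming a hypothetical `position` column (SQLite used as a stand-in): with the key (playlist_id, position), the same song can appear more than once, and the playlist order comes straight from the key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical variant: "position" records where each song sits in the playlist,
# so the natural key becomes (playlist_id, position) and a song may appear twice.
conn.execute("""
CREATE TABLE playlist_songs (
    playlist_id INTEGER NOT NULL,
    position INTEGER NOT NULL,
    song_id INTEGER NOT NULL,
    PRIMARY KEY (playlist_id, position)
)""")
conn.executemany("INSERT INTO playlist_songs VALUES (?, ?, ?)",
                 [(10, 1, 7), (10, 2, 8), (10, 3, 7)])  # song 7 repeats at the end

order = [r[0] for r in conn.execute(
    "SELECT song_id FROM playlist_songs WHERE playlist_id = 10 ORDER BY position")]
print(order)  # [7, 8, 7]
```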
From a practical point of view, the first question (option one or option two) depends on how your application will work and evolve.
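If you ever have to answer the question "in which playlists does this song appear?", option 2 is much better. Option one would require a WHERE clause like `WHERE playlist.songs LIKE '%songid,%'`, which will be very slow because the leading wildcard prevents an ordinary index from being used. It can also produce false matches: searching for song 1 this way also matches song 11.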
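A small sketch contrasting the two designs (SQLite as a stand-in; the `songs` column and the data are hypothetical). The comma-separated search for song 1 also hits the playlist containing song 11, while the bridging table answers exactly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Option one: song ids packed into a comma-separated string (hypothetical column "songs").
conn.execute("CREATE TABLE playlist (playlist_id INTEGER PRIMARY KEY, songs TEXT)")
conn.executemany("INSERT INTO playlist VALUES (?, ?)", [(1, "3,11,"), (2, "1,4,")])
# Option two: one row per (playlist, song) fact, plus an index for song lookups.
conn.execute("CREATE TABLE playlist_songs (playlist_id INTEGER, song_id INTEGER, "
             "PRIMARY KEY (playlist_id, song_id))")
conn.execute("CREATE INDEX idx_ps_song ON playlist_songs (song_id)")
conn.executemany("INSERT INTO playlist_songs VALUES (?, ?)", [(1, 3), (1, 11), (2, 1), (2, 4)])

song_id = 1
# The pattern '%1,%' also matches "11," so playlist 1 is a false hit.
naive = [r[0] for r in conn.execute(
    "SELECT playlist_id FROM playlist WHERE songs LIKE ? ORDER BY playlist_id",
    (f"%{song_id},%",))]
# The bridging table answers the question exactly and can use idx_ps_song.
exact = [r[0] for r in conn.execute(
    "SELECT playlist_id FROM playlist_songs WHERE song_id = ? ORDER BY playlist_id",
    (song_id,))]
print(naive, exact)  # [1, 2] [2]
```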
If you ever have to delete a song, and make sure all the references are deleted too - option 2 is much better. Option one would be slow to find, and the code to update the comma-separated list would be horrible.
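With option 2, the database can do the cleanup for you. A minimal sketch using `ON DELETE CASCADE` on the bridging table's foreign keys (SQLite as a stand-in for MySQL; sample data is made up): deleting a song removes every playlist reference to it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to enforce FKs and cascades
conn.executescript("""
CREATE TABLE song (song_id INTEGER PRIMARY KEY, song_title TEXT);
CREATE TABLE playlist (playlist_id INTEGER PRIMARY KEY, playlist_title TEXT);
CREATE TABLE playlist_songs (
    playlist_id INTEGER NOT NULL REFERENCES playlist (playlist_id) ON DELETE CASCADE,
    song_id INTEGER NOT NULL REFERENCES song (song_id) ON DELETE CASCADE,
    PRIMARY KEY (playlist_id, song_id)
);
INSERT INTO song VALUES (1, 'Intro'), (2, 'Outro');
INSERT INTO playlist VALUES (10, 'Road trip'), (20, 'Gym');
INSERT INTO playlist_songs VALUES (10, 1), (20, 1), (20, 2);
""")

# Deleting the song removes every playlist reference to it in one statement.
conn.execute("DELETE FROM song WHERE song_id = 1")
remaining = conn.execute(
    "SELECT playlist_id, song_id FROM playlist_songs ORDER BY playlist_id").fetchall()
print(remaining)  # [(20, 2)]
```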
If you ever have to insert songs in the middle of the playlist, option 2 is much better.
As for the question "how do I assign my primary key", I think you may have misunderstood the articles. A primary key is a logical concept and doesn't need to be an auto-incrementing integer. As long as you have good indexes (and indexes are different from primary keys), your performance will be fine.
The 2nd option is FAR preferable.
As to the extra primary key, while not necessary I tend to use one even if just to make it easier to process rows from that table.
For example, if you want to delete a dozen rows, you can use IN (comma-separated list of ids) rather than a long WHERE clause that checks each pair of fields.
As an aside, there are many reasons the 2nd option is preferable:
What happens when you want more items in the comma-separated list than will fit in the field?
What happens when you want to search for a value in that list? You cannot index a value halfway through that list.
What happens when you want to hang another value off an item in the playlist? For example, the number of times that track has been played on that playlist?
Etc.
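A minimal sketch of that trade-off (SQLite as a stand-in; data is made up). The surrogate `id` lets one IN list delete several rows, while without it each row is matched on both columns. The UNIQUE constraint is an addition, not part of the answer above: it keeps the (playlist, song) pair from repeating even though `id` is now the primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Bridging table with an extra surrogate key "id", plus a UNIQUE constraint
# (an addition) so the (playlist, song) pair still cannot repeat.
conn.execute("""
CREATE TABLE playlist_songs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    playlist_id INTEGER NOT NULL,
    song_id INTEGER NOT NULL,
    UNIQUE (playlist_id, song_id)
)""")
conn.executemany("INSERT INTO playlist_songs (playlist_id, song_id) VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 1), (2, 3)])  # receive ids 1..4

# With a surrogate key, one IN list is enough.
ids_to_delete = [1, 4]
placeholders = ",".join("?" for _ in ids_to_delete)
conn.execute(f"DELETE FROM playlist_songs WHERE id IN ({placeholders})", ids_to_delete)

# Without it, each row must be matched on both columns.
conn.executemany("DELETE FROM playlist_songs WHERE playlist_id = ? AND song_id = ?",
                 [(1, 2)])
left = conn.execute("SELECT playlist_id, song_id FROM playlist_songs").fetchall()
print(left)  # [(2, 1)]
```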
I would say option two will be the most beneficial to you. You would then have a table such as the following:
playlist_items table
pi_id INT AUTO_INCREMENT PRIMARY KEY
pi_song_id INT
pi_playlist_id INT
With this you could then add functionality in the future if required, such as:
pi_dateadded DATETIME
In InnoDB, keep in mind that the table's rows are stored in the primary key index itself (a clustered index), so you need to ask how you will be looking up rows. Traversing an index is O(log(N)), but a lookup through a secondary index does that twice: once in the secondary index to find the primary key value, and again in the primary key index to fetch the row.
Usually a single-column primary key is better in InnoDB, but there may be exceptions.
playlist_table
`playlist_id` INT AUTO_INCREMENT PRIMARY KEY, `playlist_title` VARCHAR
songs_table
`song_id` INT AUTO_INCREMENT PRIMARY KEY, `song_title` VARCHAR, `playlist_id` INT, FOREIGN KEY (`playlist_id`) REFERENCES playlist_table(`playlist_id`)
To search, use a join to find the songs:
select * from songs_table left join playlist_table on(songs_table.playlist_id=playlist_table.playlist_id)
Related
When creating tables in MySQL with phpMyAdmin, I always run into an issue with primary keys and their auto-increments. When I insert rows into my table, AUTO_INCREMENT works perfectly, adding 1 to the primary key for each new row. But when I delete a row, for example the row where the primary key is 'id = 4', and then add a new row, the new row gets 'id = 5' instead of 'id = 4'. It acts as if the old row was never deleted.
Here is an example of the SQL statement:
CREATE TABLE employe(
id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(30) NOT NULL
)
ENGINE = INNODB;
How do I solve this problem?
Thank you.
I'm pretty sure this is by design. If you had IDs up to 6 in your table and you deleted ID 2, would you want the next input to be an ID of 2? That doesn't seem to follow the ACID properties. Also, if there was a dependence on that data, for example, if it was user data, and the ID determined user IDs, it would invalidate pre-existing information, since if user X was deleted and the same ID was assigned to user Y, that could cause integrity issues in dependent systems.
Also, imagine a table with 50 billion rows. Should the table run an O(n) search for the smallest missing ID every time you're trying to insert a new record? I can see that getting out of hand really quickly.
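A small sketch of both points, using SQLite's AUTOINCREMENT as a stand-in for MySQL's AUTO_INCREMENT (an assumption; the engines differ in details, but neither reuses the deleted id here). It also shows the kind of gap-finding query that reusing ids would require before every insert.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite stand-in for the MySQL table; AUTOINCREMENT here never reuses deleted ids.
conn.execute("CREATE TABLE employe (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT NOT NULL)")
conn.executemany("INSERT INTO employe (name) VALUES (?)", [("a",), ("b",), ("c",), ("d",), ("e",)])
conn.execute("DELETE FROM employe WHERE id = 4")
new_id = conn.execute("INSERT INTO employe (name) VALUES ('f')").lastrowid
print(new_id)  # 6, not 4

# Reusing gaps would mean running a search like this before every insert;
# it compares ids against the rest of the table, so its cost grows with the table.
first_gap = conn.execute(
    "SELECT MIN(id) + 1 FROM employe WHERE id + 1 NOT IN (SELECT id FROM employe)"
).fetchone()[0]
print(first_gap)  # 4
```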
Some links you might like to read:
Principles of Transaction-Oriented Database Recovery (1983)
How can we re-use the deleted id from any MySQL-DB table?
Why do you care?
Primary keys are internal row identifiers that are not supposed to be sexy or good looking. As long as they are able to identify each row uniquely, they serve their purpose.
Now, if you care about its value, then you probably want to expose the primary key value somewhere, and that's a big red flag. If you need an external, visible identifier, you can create a secondary column with any formatting sequence and values you want.
As a side note, the term AUTO_INCREMENT is a bit misleading. It doesn't really mean the values increase one by one all the time; it just means the database will try to produce sequential numbers as long as that is possible. In multi-threaded apps that's usually not possible, since batches of numbers are reserved per thread, so the order in which rows are inserted may not follow the numbering. Row deletions have a similar effect, as do INSERTs that are rolled back.
Primary keys are meant to be used for joining tables together and indexing; they are not meant for human use. Reordering primary key columns could orphan data and wreak havoc on your queries.
Tip: add another column to your table and reorder that column as you wish (show that column to your users instead of the primary key).
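A minimal sketch of that tip (SQLite as a stand-in; `display_no` is a hypothetical column name). The user-facing number is renumbered 1..n after a deletion, while `id`, the key other tables would reference, never changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical "display_no" column: a user-facing number that can be renumbered freely,
# while "id" stays a stable internal key that other tables can reference.
conn.execute("CREATE TABLE employe (id INTEGER PRIMARY KEY AUTOINCREMENT, "
             "name TEXT NOT NULL, display_no INTEGER)")
conn.executemany("INSERT INTO employe (name) VALUES (?)", [("a",), ("b",), ("c",), ("d",)])
conn.execute("DELETE FROM employe WHERE id = 2")

# Renumber display_no as 1..n in id order; the primary key is never touched.
conn.execute("""
UPDATE employe SET display_no = (
    SELECT COUNT(*) FROM employe AS e2 WHERE e2.id <= employe.id
)""")
rows = conn.execute("SELECT id, display_no FROM employe ORDER BY id").fetchall()
print(rows)  # [(1, 1), (3, 2), (4, 3)]
```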
As my title states, I'm curious about the best practices for modifying an existing table in a (MySQL) database. In my scenario, I have a table that is already full of data and has a column named product_id that is currently the table's primary key. I'm working on a feature where product_id no longer needs to be unique or the primary key, since I want to allow multiple records for the same product. Database design isn't a strength of mine yet, but in my head I would run DROP PRIMARY KEY for the product_id column, then add a column called id and make it the new primary key. Then I would need to fill the id column with a unique value for each record so that it is a valid primary key. As far as database design is concerned, is this the best practice, or is it better to create a new table with the updated structure and copy the current records into it?
EDIT:
More about the feature I'm working on: the products are books, and I'm trying to allow multiple sections of each book to be previewed. To do this, I store the page ranges that can be previewed. Right now only one page range per book is allowed; allowing several is why product_id no longer needs to be unique.
A primary key is ALWAYS unique.
Why don't you want it to be unique? It sounds like you are exposing the key outside the database: the PK is visible somehow, and some user(s) think it should behave differently. If this is the case, it is a really bad practice.
This is the typical case of the notorious "natural keys". They are a disaster waiting to happen; I don't like big time bombs. I've been strongly opposed to them for some time now. It's good they teach them in schools so you know what not to use in the real world.
Now for the solution. If product_id is exposed, then it shouldn't be the PK at all. Solution?
Create a new column (id maybe?) that is internal, that is unique, and not exposed to the user, while keeping product_id. This new column could have the exact same value as product_id at first.
Change all FK references from other tables to the new id column.
Then, remove the PK constraint from product_id and do whatever you want to do with it.
Add the PK constraint to the new id column.
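A rough sketch of the end state, using SQLite as a stand-in (assumption: SQLite cannot drop a primary key in place, so this takes the "copy into a new table" route from the question rather than the ALTER TABLE steps above; the table name `previews` is hypothetical, and re-pointing other tables' foreign keys is not shown). The new internal `id` starts out equal to product_id, and a second page range for the same book is then allowed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: product_id is currently the primary key, so one preview per book.
conn.executescript("""
CREATE TABLE previews (product_id INTEGER PRIMARY KEY, start_page INTEGER, end_page INTEGER);
INSERT INTO previews VALUES (100, 1, 10), (200, 5, 15);
""")

# Rebuild with a new internal "id" key; product_id becomes an ordinary indexed column.
conn.executescript("""
BEGIN;
CREATE TABLE previews_new (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    product_id INTEGER NOT NULL,
    start_page INTEGER,
    end_page INTEGER
);
-- id starts out equal to product_id, as in the steps above
INSERT INTO previews_new (id, product_id, start_page, end_page)
    SELECT product_id, product_id, start_page, end_page FROM previews;
DROP TABLE previews;
ALTER TABLE previews_new RENAME TO previews;
CREATE INDEX idx_previews_product ON previews (product_id);
COMMIT;
""")

# A second page range for the same book is now allowed.
conn.execute("INSERT INTO previews (product_id, start_page, end_page) VALUES (100, 50, 60)")
ranges = conn.execute(
    "SELECT start_page, end_page FROM previews WHERE product_id = 100 ORDER BY start_page"
).fetchall()
print(ranges)  # [(1, 10), (50, 60)]
```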
I have two tables: students and courses, assuming that each student can be in more than one course and that each course can have more than one student.
[Table Students]
id (PK)
name
age
etc...

[Table Courses]
id (PK)
name
duration
etc...
What I want to do is relate both tables through another table, for example studying, in which I will store the course or courses each student is taking. Like this:
[Table studying]
idStudent
idCourse
What I have deduced
I think that idStudent and idCourse should be foreign keys, because that information is stored in students and courses respectively, each with a unique primary key, and because this keeps the database consistent. A relation cannot exist without the information of both the student and the course, nor without the information of just one of them.
I also know that some tables have a primary key made of two columns, which allows a value to repeat in either column, but not the same combination of both.
My questions
Should these ids (idStudent, idCourse) be primary keys or foreign keys?
Should the table studying have another column with an ID?
Is my deduction on the right track?
P.S.: I do not need SQL statements; I just need help to clear up my confusion.
Thanks in advance!
Should these ids (idStudent, idCourse) be primary keys or foreign keys?
You want them to be foreign keys, because the existence of each record in your third table depends on the records in the first two: there cannot be a "Student Course" or a "Course with Students" without both the course and the student. Such rows could exist if you don't make those columns foreign keys, but you would break referential integrity.
Having FKs is also usually a good thing because they make sure you don't remove dependent records by mistake (which is what the constraint is for in the first place), unless you set up something like cascading deletes.
Should the table studying have another column with an ID?
No, it does not have to, but sometimes it is good practice, because some software, such as object-relational mappers and diagramming tools, may rely on every table having a conventional single-column primary key. Some don't support composite keys at all, so while it is not mandatory, it can help in the future and it does not hurt. Of course, this all depends on what you are using the database for and how (pure SQL, which engine you use, whether you use it with a framework, etc.).
Is my deduction on the right track?
All is relative, but I think your logic is good. My advice is to always design your data schemas to be as flexible as you can, because as a project grows it becomes harder (and more costly) to make those changes down the road. Invest time thinking about how you may expand your application's functionality and whether the schema will adapt to it.
Your deduction is correct.
In fact, you should have a composite primary key consisting of both (idStudent, idCourse) columns, because this tuple identifies a row in the table. You do not need an additional ID column (you can, of course, take the approach of adding an ID column as the primary key, but you do not need it if a student can be assigned a given course only once).
To respect the integrity, both columns (separately) should be foreign keys - idStudent should be referencing id column of Students table and idCourse should reference id column of Courses table.
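For readers who do want to see it run, a minimal sketch of that design (SQLite as a stand-in; the sample rows are made up). Each column is a foreign key on its own, and together they form the primary key, so the table rejects both an unknown student and a repeated enrollment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
CREATE TABLE courses (id INTEGER PRIMARY KEY, name TEXT, duration INTEGER);
CREATE TABLE studying (
    idStudent INTEGER NOT NULL REFERENCES students (id),
    idCourse INTEGER NOT NULL REFERENCES courses (id),
    PRIMARY KEY (idStudent, idCourse)  -- each column is an FK; together they are the PK
);
INSERT INTO students VALUES (1, 'Ana', 20);
INSERT INTO courses VALUES (1, 'SQL', 10), (2, 'Python', 12);
INSERT INTO studying VALUES (1, 1);
""")

def try_insert(student, course):
    """Return True if the enrollment was accepted, False if a constraint rejected it."""
    try:
        conn.execute("INSERT INTO studying VALUES (?, ?)", (student, course))
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert(99, 1))  # False: student 99 does not exist (foreign key)
print(try_insert(1, 1))   # False: Ana is already in course 1 (primary key)
print(try_insert(1, 2))   # True: a new, valid enrollment
```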
If you like, you can make them the primary key of the studying table. But this is unnecessary, because the relation (the role of the studying table) is many-to-many, and this kind of table doesn't need a primary key. Be aware that when you make the pair (student id, course id) the PK, there can be only one row for each pair, which is equivalent to a unique constraint: a student can take a course only once. In the future you might want to add a start_date column to this table, and this kind of PK could then be a problem; you would need to modify it.
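A small sketch of that start_date problem (SQLite as a stand-in; both table variants and the dates are hypothetical). With the pair as the key, a second enrollment in a later year is rejected; widening the key to include start_date allows it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two hypothetical versions of the bridging table with a start_date column.
conn.executescript("""
CREATE TABLE studying_pair (idStudent INTEGER NOT NULL, idCourse INTEGER NOT NULL,
                            start_date TEXT NOT NULL,
                            PRIMARY KEY (idStudent, idCourse));
CREATE TABLE studying_dated (idStudent INTEGER NOT NULL, idCourse INTEGER NOT NULL,
                             start_date TEXT NOT NULL,
                             PRIMARY KEY (idStudent, idCourse, start_date));
""")

def enroll_twice(table):
    """Enroll student 1 in course 1 in two different years; return how many rows were kept."""
    for date in ("2023-09-01", "2024-09-01"):
        try:
            conn.execute(f"INSERT INTO {table} VALUES (1, 1, ?)", (date,))
        except sqlite3.IntegrityError:
            pass  # the key rejects the second enrollment
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

pair_count = enroll_twice("studying_pair")
dated_count = enroll_twice("studying_dated")
print(pair_count, dated_count)  # 1 2
```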
I have seen it said that I should have a primary key for all tables in an SQL database. However, I've created a database with a number of tables that are just used for storing the ID of another attribute that belongs to the first. In both cases, I will need multiple mentions of each ID, so what, if anything, should I use for the primary key?
Two ways to go, if I understand what you are saying:
Primary keys can be compound
e.g. CustomerContacts could have CustomerID and ContactID, with a primary key on both columns
so you could have
1,1
1,2
2,2
2,3
but you couldn't have another Customer2 linked to Contact2
The other way would be a surrogate key, usually an auto-increment column:
CustomerContacts
CustomerContactID, CustomerID, ContactID, now you could have
1,2,2
2,2,2
which you may or may not want.
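A minimal sketch of the difference between the two options (SQLite as a stand-in; the table suffixes `_compound` and `_surrogate` are hypothetical). The same five pairs, with Customer2/Contact2 repeated, are fed to both tables: the compound key keeps four, the surrogate key keeps all five.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CustomerContacts_compound (
    CustomerID INTEGER NOT NULL, ContactID INTEGER NOT NULL,
    PRIMARY KEY (CustomerID, ContactID));
CREATE TABLE CustomerContacts_surrogate (
    CustomerContactID INTEGER PRIMARY KEY AUTOINCREMENT,
    CustomerID INTEGER NOT NULL, ContactID INTEGER NOT NULL);
""")

def insert_pairs(table, pairs):
    """Insert each (CustomerID, ContactID) pair; return how many were accepted."""
    accepted = 0
    for customer, contact in pairs:
        try:
            conn.execute(f"INSERT INTO {table} (CustomerID, ContactID) VALUES (?, ?)",
                         (customer, contact))
            accepted += 1
        except sqlite3.IntegrityError:
            pass  # duplicate pair rejected by the compound key
    return accepted

pairs = [(1, 1), (1, 2), (2, 2), (2, 3), (2, 2)]  # the last pair repeats Customer2/Contact2
compound_count = insert_pairs("CustomerContacts_compound", pairs)
surrogate_count = insert_pairs("CustomerContacts_surrogate", pairs)
print(compound_count, surrogate_count)  # 4 5
```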
Choose the one you want, because what you've read is very true: without a unique key, aside from potentially suffering a good deal of performance problems, you get the "another user has changed this record" problem.
So say you had a table of Names, and Tony is in it twice.
You want to delete one of them; how does the DBMS identify which Tony has to go?
It will either get rid of both, or it will get really upset and throw its toys out of the pram.
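A tiny sketch of the "get rid of both" outcome (SQLite as a stand-in; the data is made up). With no key, the only way to point at a Tony is by his column values, and those match both rows. SQLite does keep a hidden rowid that could break the tie, but that is itself a surrogate key in disguise.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Names (name TEXT)")  # deliberately no declared key at all
conn.executemany("INSERT INTO Names VALUES (?)", [("Tony",), ("Tony",), ("Ann",)])

# Identifying "one Tony" by value matches both rows, so both are deleted.
deleted = conn.execute("DELETE FROM Names WHERE name = 'Tony'").rowcount
print(deleted)  # 2
```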
If it hasn't got a unique key, it has failed 1st normal form, that means you can kiss goodbye to all the rest.
In general, you can just use a column 'id' as a primary key to keep track of the row number. For the stored attribute, you can have the column named 'stored_attribute_id'
Your question is quite unclear. A primary key defines a combination of columns which, for that table, must be unique. If the entry is contained in another table, you may want to investigate foreign keys.
I've checked everything for errors: primary key, uniqueness, and type. Access just doesn't seem to be able to link the 2 fields I have in my database. Can someone please take a look?
http://www.jpegtown.com/pictures/jf5WKxKRqehz.jpg
Thanks.
Your relationship diagram shows that you've made the ID fields your primary key in all your tables, but you're not using them for your joins. Thus, they serve absolutely no purpose. If you're not going to use "surrogate keys" (i.e., a meaningless ID number that is generated by the database and is unique to each record, but has absolutely no meaning in regard to the data in your table), then eliminate them. But if you're going to use "natural keys" (i.e., a primary key constructed from a set of real data fields that together are going to be unique for each record), you must have a unique compound index on those fields.
However, there are issues with both approaches:
1. Surrogate keys: a surrogate PK makes each record unique. That is, you could have a record for David Fenton with ID 1 and another record for David Fenton with ID 2. If it's the same David Fenton, you've got duplicate data, but as far as your database knows, the records are unique.
2. Natural keys: some types of entities work very well with natural keys. The best are those where a single field identifies the record uniquely. An example would be "employee type," where values might be "associate," "manager," etc. In that case, it's a very good candidate for using the natural key instead of adding a surrogate key. The only argument against the natural key in that case is if the data in the candidate natural key is highly volatile (i.e., it changes frequently). While most modern database engines provide "CASCADE UPDATE" functionality (i.e., if the value in the PK field changes, all the tables where that field is a foreign key are automatically updated), this imposes a certain amount of overhead and can be problematic. For single-column keys, it's unlikely to be an issue.
Now, except for lookup tables, there are very few entities for which a natural key will be a single column. Instead, you have to create a compound index, i.e., an index that spans multiple data fields. In the index dialog in Access table design, you create a compound key by giving it a name in the first column and then adding multiple rows in the second column (from the dropdown list of fields in your table). The drawback is that if any of the fields in your compound unique index are unknown, you won't get uniqueness. That is, if a field is Null in two records and the rest of the fields are identical, this won't be counted as a uniqueness conflict, because Null never equals Null. This is because Null doesn't mean "empty"; it means "unknown."
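A small sketch of the Null caveat, shown in SQLite rather than Access (an assumption: the principle is the same, but check your engine's settings). The three-column natural key and its data are hypothetical. Two identical rows with a Null birth date both get in; the fully specified duplicate is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical natural key (first_name, last_name, birth_date) enforced by a unique index.
conn.execute("CREATE TABLE person (first_name TEXT, last_name TEXT, birth_date TEXT)")
conn.execute("CREATE UNIQUE INDEX ux_person ON person (first_name, last_name, birth_date)")

conn.execute("INSERT INTO person VALUES ('David', 'Fenton', NULL)")
conn.execute("INSERT INTO person VALUES ('David', 'Fenton', NULL)")  # accepted: NULLs are not equal
null_rows = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
print(null_rows)  # 2

try:
    conn.execute("INSERT INTO person VALUES ('Ann', 'Lee', '1990-01-01')")
    conn.execute("INSERT INTO person VALUES ('Ann', 'Lee', '1990-01-01')")
    full_duplicate_rejected = False
except sqlite3.IntegrityError:
    full_duplicate_rejected = True
print(full_duplicate_rejected)  # True
```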
Allen Browne has explained everything you need to know about Nulls:
Nulls: Do I Need Them?
Common Errors with Null
In your graphic, you show that you are trying to link the Company table with the PManager table. The latter table has a CompanyID field, and your Company table has a unique index on its ID field, so all you need is a link from the ID field of the Company table to the CompanyID field of the PManager table. For your example to work (which would be useless, since you already have a unique index on the ID field), you'd need to create a unique compound key spanning both ID and ShortName in the Company table.
Additionally, if ShortName is a field that you want to be unique (i.e., you don't want two company records to have the same ShortName), you should add a unique index to it, whether or not you still use the ID field as your primary key. This brings me back to item #1 above, where I described how a surrogate key could lead you to enter duplicate records, because uniqueness is established by the surrogate key alone. Any time you choose to use a surrogate key, you must also add a unique compound index on any combination of data fields that needs to be unique (with the caveat about Null fields outlined in item #2).
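A minimal sketch of "surrogate key plus unique natural key" (SQLite as a stand-in for Access; the data is made up). ID stays the primary key for joins, while the unique index on ShortName stops a second ACME from slipping in under a new ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Surrogate key ID for joins, plus a unique index on the natural key ShortName.
conn.execute("CREATE TABLE Company (ID INTEGER PRIMARY KEY AUTOINCREMENT, ShortName TEXT NOT NULL)")
conn.execute("CREATE UNIQUE INDEX ux_company_shortname ON Company (ShortName)")

conn.execute("INSERT INTO Company (ShortName) VALUES ('ACME')")
try:
    conn.execute("INSERT INTO Company (ShortName) VALUES ('ACME')")  # new ID, same company
    duplicate_company_blocked = False
except sqlite3.IntegrityError:
    duplicate_company_blocked = True
print(duplicate_company_blocked)  # True
```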
If you're thinking "surrogate keys mean more indexes" you're correct, in that you have two unique indexes on the same table (assuming you don't have the Null problem). But you do get substantial ease of use in joining tables in SQL, as well as substantially less duplication of data. Likewise, you avoid the overhead of CASCADE UPDATE. On the other hand, if you're viewing a child table with a natural foreign key, you don't need to join to the parent table to be able to identify the parent record, because the data that identifies that record is right there in the foreign-key fields. That lack of a need for a join can be a major performance gain in certain scenarios (especially for the case where you'd need an outer join because the foreign key can be Null).
This is actually quite a huge topic, and it's something of a religious argument. I'm firmly in the surrogate key camp, but I use natural keys for lookup tables where the key is a single column. I don't use natural keys for any other purpose. That said, where possible (i.e., no Null problems) I also have a unique index on the natural key.
Hope this helps.
Actually, you need an index on the name fields, on both sides.
However, may I suggest that you have way too many joins? In general there should only be one join from one table to the next. It is rare to have more than one join between tables, and exceedingly rare to have more than two.
Have a look at this link:
http://weblogs.asp.net/scottgu/archive/2006/07/12/Tip_2F00_Trick_3A00_-Online-Database-Schema-Samples-Library.aspx
Notice how all of the tables are joined together by a single relationship?
Each of the fields labeled PK are primary keys. These are AUTONUMBER fields. Each of the fields labeled FK are foreign keys. These are indexed Number fields of type Integer. The Primary Keys are connected to the Foreign Keys in a 1 to many relationship (in most cases).
99% of the time, you won't need any other kind of joins. The trick is to create tables with unique information. There is a lot of repeated information in your database.
A database that is reorganized in this manner is called a "normalized" database. There are lots of good examples of these at http://www.databaseanswers.org/data_models/
Just join on the CompanyID. You could also get rid of the Company field in PManager.
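A minimal sketch of that single relationship (SQLite as a stand-in for Access; the PManager columns other than CompanyID, and all data, are hypothetical). Company.ID is the primary key, PManager.CompanyID is an indexed foreign key, and one join connects them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Company (ID INTEGER PRIMARY KEY, ShortName TEXT);
CREATE TABLE PManager (ID INTEGER PRIMARY KEY, Name TEXT,
                       CompanyID INTEGER REFERENCES Company (ID));
CREATE INDEX idx_pmanager_company ON PManager (CompanyID);  -- index the FK side
INSERT INTO Company VALUES (1, 'ACME'), (2, 'Globex');
INSERT INTO PManager VALUES (1, 'Pat', 1), (2, 'Sam', 1), (3, 'Lee', 2);
""")

# One relationship: Company.ID (PK) to PManager.CompanyID (FK).
rows = conn.execute("""
SELECT c.ShortName, p.Name
FROM Company AS c
JOIN PManager AS p ON p.CompanyID = c.ID
ORDER BY p.ID
""").fetchall()
print(rows)  # [('ACME', 'Pat'), ('ACME', 'Sam'), ('Globex', 'Lee')]
```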
I did the following and the problem was solved (I faced the same referential integrity problem in Access):
I exported the data from both tables in Access to Excel. Table1 contained Cust Code and basic information about the company, with Cust Code as the primary key. Table2 contained all the information about the customers associated with that company.
I removed all duplicates from the Table2 export in Excel.
Using VLOOKUP, I found that there were 11 customer codes not present in Table1.
I added those codes to the Access table, linked the tables with referential integrity enforced, and the problem was solved.
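The VLOOKUP step can also be done with a query. A minimal sketch (SQLite as a stand-in for Access; the column names CustCode, CompanyName and CustomerName and all data are hypothetical): a LEFT JOIN keeps every Table2 row, and rows with no Table1 match come back with NULLs, which the WHERE clause picks out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical column names: CustCode in both tables.
conn.executescript("""
CREATE TABLE Table1 (CustCode TEXT PRIMARY KEY, CompanyName TEXT);
CREATE TABLE Table2 (CustomerName TEXT, CustCode TEXT);
INSERT INTO Table1 VALUES ('C1', 'ACME'), ('C2', 'Globex');
INSERT INTO Table2 VALUES ('Pat', 'C1'), ('Sam', 'C3'), ('Lee', 'C4'), ('Kim', 'C3');
""")

# Codes used in Table2 that have no row in Table1 (these block referential integrity).
orphans = [r[0] for r in conn.execute("""
SELECT DISTINCT t2.CustCode
FROM Table2 AS t2
LEFT JOIN Table1 AS t1 ON t1.CustCode = t2.CustCode
WHERE t1.CustCode IS NULL
ORDER BY t2.CustCode
""")]
print(orphans)  # ['C3', 'C4']
```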
If it still does not work, also check the foreign key.
You need to create an INDEX. Look for a create index option and create an index on CompanyID.