Straight into this one. I have a table for a sort of "like" feature. This table naturally has the following:
Name | Type | Attributes | (Comment)
Post ID | int | index | ID of the post which was "Liked"
Topic ID | int | index | ID of the topic which contains the "Liked" post
Member ID | int | index | ID of the member who "Liked" the post
Date | bigint | index | Date/time of "Like"
As you can see, there's no primary key. This seems natural. The only functions which need performing are the INSERT (for "Like"), DELETE (for "Unlike") and searching for likes in order of most recent by the post or member who gave them.
Each entry will obviously be very 'UNIQUE' - as only one like is needed per person per post. There seems absolutely no need for a unique primary index, as if duplicates occur (somehow) I will want to DELETE them all, not just one with a particular ID. Same with insertion, no one can like the same thing twice. And these "likes" will only ever be selected using the indexes from other tables.
Yet, phpMyAdmin now forbids me from any manual editing, copying or deleting. This is also fine, but it prompted me to look further into the logistics of not having a primary key. When I found a Stack Overflow answer, the general opinion was that it's "very rare" to not need a primary key.
So, either I've found one of these very rare moments, or it's not that rare at all. My scenario seems quite simple and common, so there should be a more definite answer. Everything seems natural this way; I will never need to actually use a primary key, so I'd think it simpler not to have one. Are there any really mysterious (and somewhat magical) ways of MySQL I'm overlooking? Or am I safe to leave out a useless auto-incrementing primary ID key (which could reach its limit well before any of the currently used IDs would, anyway), at least until I find a use for it (never)?
You've said that Post ID and Member ID define the uniqueness of a row (and that Topic ID is secondary, included only for convenience).
So, why not have a primary key on (Post ID, Member ID)? If you already have UNIQUEness constraints on them, then this is not a big leap.
CREATE TABLE `Likes` (
`PostID` INT UNSIGNED NOT NULL,
`TopicID` INT UNSIGNED NOT NULL,
`MemberID` INT UNSIGNED NOT NULL,
`Date` DATETIME NOT NULL,
PRIMARY KEY (`PostID`, `MemberID`),
FOREIGN KEY (`PostID`) REFERENCES `Posts` (`ID`) ON DELETE CASCADE,
FOREIGN KEY (`MemberID`) REFERENCES `Members` (`ID`) ON DELETE CASCADE
) Engine=InnoDB;
(I don't know enough about TopicID to suggest key constraints for it, but you may wish to add some.)
Certainly adding an arbitrary auto-incrementing field is pointless, but that doesn't mean that you can't have a meaningful primary key.
As an aside, I'd consider removing the TopicID field; if you have your foreign keys set up properly then it should be trivial to do post<->topic lookup without it, and in this instance you're duplicating data and violating the relational model!
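As a rough illustration of what the composite key buys you, here is a minimal sketch in Python. It uses SQLite (via the standard `sqlite3` module) as a stand-in for MySQL so that it runs anywhere, and it leaves out TopicID and the foreign keys for brevity. The `like` and `unlike` helpers are hypothetical names, not part of the original schema.

```python
import sqlite3

# SQLite stands in for MySQL here (an assumption made for portability).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Likes (
        PostID   INTEGER NOT NULL,
        MemberID INTEGER NOT NULL,
        Date     TEXT    NOT NULL,
        PRIMARY KEY (PostID, MemberID)
    )
""")

def like(post_id, member_id):
    """Insert a like; return False if this member already liked this post."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO Likes (PostID, MemberID, Date) "
                "VALUES (?, ?, datetime('now'))",
                (post_id, member_id),
            )
        return True
    except sqlite3.IntegrityError:
        # The composite primary key rejected a duplicate (PostID, MemberID).
        return False

def unlike(post_id, member_id):
    """Delete the one row identified by the composite key; return rows removed."""
    with conn:
        cur = conn.execute(
            "DELETE FROM Likes WHERE PostID = ? AND MemberID = ?",
            (post_id, member_id),
        )
    return cur.rowcount

first = like(10, 1)
duplicate = like(10, 1)   # same member, same post: rejected by the key
other = like(10, 2)
removed = unlike(10, 1)
remaining = conn.execute("SELECT COUNT(*) FROM Likes").fetchone()[0]
```

With the key in place, the "duplicates occur (somehow)" case from the question cannot arise, so there is never more than one row to delete per member and post.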
Is there any reason to use id int auto-increment for a primary key while a UNIQUE column (that is not a foreign key or anything) already exists in a table?
I'm reading someone else's thesis about a project that uses freebase data.
Every topic in freebase is uniquely identified by an mid. (example: m.gugkl395).
But instead of using mid as the primary key in the topics table he chose to use an id (int auto-increment). So the topics table looks like this
CREATE TABLE topics (
id INT NOT NULL AUTO_INCREMENT,
mid VARCHAR(254) NOT NULL UNIQUE,
name VARCHAR(254) NOT NULL,
description VARCHAR(2048),
type VARCHAR(254) NOT NULL,
PRIMARY KEY (id)
);
I should mention that there are 3 other tables that use this id as a foreign key and that because it has to do with freebase there will probably be a lot of data in the database. Also in case it matters MySQL version 5.7.15 is being used.
Theoretically (in my opinion, after 26+ years of experience going back to dBASE III+), if you already have a unique key, you can use it as the primary key. An auto-increment field just simplifies things when you are not sure about uniqueness.
Practically, in terms of programming and database performance, without an auto-increment field you need extra lines of code to create the primary key value, and sometimes an extra connection to the server to get the last key value before generating the new one. These, plus the space a VARCHAR key takes up in the table design, may affect the performance of the solution and add programming time.
I hope this may help.
The m.gugkl395 is the RDF encoding of MID which looks like /m/gugkl395 in its native form, but the prefix is constant and the gugkl395 is a radix 32(?) encoding of an integer. The most space efficient and performant schema would be to just store that integer.
The MID encoding is described in an earlier answer of mine here: https://stackoverflow.com/a/56012791/167425
Quite often I encounter situation like this:
table `user_adress`
+----------+-------------+--------------+---------+
|adress_id | user_id | adress_type |adress |
+----------+-------------+--------------+---------+
| 1 | 1 | home |adressXXX|
| 2 | 2 | home |adressXXX|
| 3 | 3 | home |adressXXX|
| 4 | 1 | work |adressXXX|
| 5 | 2 | work |adressXXX|
| 6 | 1 | second_home |adressXXX|
+----------+-------------+--------------+---------+
If I want to use it, I'm using queries like this:
SELECT `adress` FROM `user_adress` WHERE `user_id`=1;
This seems quite normal, but the thing is that I have a "useless" adress_id column that has no purpose other than to be an auto-incrementing primary key, just for the sake of having a primary key in a MySQL table. I never use or need this number. So I figured that I should not use a primary key in my table at all: remove adress_id entirely and set an INDEX (non-unique) on the user_id column. That seems good, or am I wrong?
I have some doubts, because everywhere I read I see advice that every table should have, or even needs to have, a primary key. But why? Perhaps my database is badly designed if I allowed this to happen, but looking at my extremely simple example table, I can't imagine how this could be the case in every situation, especially in such simple cases. I have definitely misunderstood some simple, basic rules about creating tables and indexing them properly. Where is the hole in my thinking?
Purely based on your table structure, I would say that your primary key is incorrect.
Instead, it looks like your primary should be:
PRIMARY KEY (user_id, address_type)
You are correct that every table should have a primary key ideally, but primary keys can be over multiple fields.
It is still sometimes easier to have a simple auto-incrementing id as your primary key. The InnoDB storage engine will actually do this secretly, in a hidden column, when a table has no primary key or suitable unique index.
Maybe in your limited example it's not needed, but in a lot of real-world cases it can just make it easier to work with the data. In that sense I would say that having an artificial auto-incrementing primary key is not a best practice from an academic standpoint, but it can be a good idea from a 'real world, operational, and MySQL admin' perspective.
There are also ORM systems out there that simply require this (bad as that is).
As is evident from your data, the primary key gives direct access to a single row without any problem or ambiguity (especially for DELETE or UPDATE).
This is precisely the purpose of a primary key.
Given that you may also need to join this table to other tables by user_id,
a (non-unique) index on user_id
create index myidx on mytable(user_id)
is really useful for faster joins, because it gives direct access to only the rows related to a single user_id.
It's true that a relational database table needs a primary key.
But it all comes down to the definition of a primary key. A primary key is NOT necessarily a single integer column that auto-increments.
A primary key is any column or set of multiple columns that can uniquely identify every row. In your case, the combination of user_id and address_type can do this (as Evert posted already).
So if you make your table like this:
CREATE TABLE user_address (
user_id INT NOT NULL,
address_type varchar(10) NOT NULL,
address TEXT NOT NULL,
PRIMARY KEY (user_id, address_type)
);
Then you can update or delete one specific row at a time like this:
UPDATE user_address SET ...
WHERE user_id = ? AND address_type = ?;
Some people feel that it's more convenient to enforce a convention that every table should have a single integer column as its primary key. They even may insist that the column must be called id for the sake of consistency.
There's some advantage in consistency, but on the other hand, it's kind of brainless to insist on that convention even when it's not helpful.
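To make the composite-key version concrete, here is a small sketch in Python, again using SQLite through `sqlite3` as a stand-in for MySQL (an assumption for portability; the sample addresses are made up). It shows a single-row UPDATE through the full key, and that a lookup by user_id alone can still use the index behind the primary key, since user_id is its leading column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE user_address (
        user_id      INTEGER NOT NULL,
        address_type TEXT    NOT NULL,
        address      TEXT    NOT NULL,
        PRIMARY KEY (user_id, address_type)
    )
""")
conn.executemany(
    "INSERT INTO user_address VALUES (?, ?, ?)",
    [(1, "home", "A"), (1, "work", "B"), (2, "home", "C")],
)

# Update exactly one row, addressed by the full composite key.
cur = conn.execute(
    "UPDATE user_address SET address = ? "
    "WHERE user_id = ? AND address_type = ?",
    ("B2", 1, "work"),
)
updated = cur.rowcount

# The original query shape: all addresses for one user.
user1 = conn.execute(
    "SELECT address_type, address FROM user_address "
    "WHERE user_id = ? ORDER BY address_type",
    (1,),
).fetchall()

# Ask SQLite how it would run the lookup by user_id alone.
# The human-readable description is the fourth column of each plan row.
plan_rows = conn.execute(
    "EXPLAIN QUERY PLAN SELECT address FROM user_address WHERE user_id = ?",
    (1,),
).fetchall()
plan = " ".join(row[3] for row in plan_rows)
```

MySQL follows the same leftmost-prefix rule for composite indexes, so with this primary key a separate index on user_id alone is usually unnecessary.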
In my web application, the user can define documents and give them a unique name that identifies that document and a friendly name that a human will use to refer to the document. Take the following table schema as an example:
| id | name | friendly_name |
-----------------------------------------------
| 2 | invoice-2 | Invoice 2 |
In this example I've used the id column as the primary key, which is an auto incrementing number. Since there's already a natural ID for documents (name) I could also do this:
| name | friendly_name |
--------------------------------------
| invoice-2 | Invoice 2 |
In this example, name is the primary key of the document. We've eliminated the id field as it's essentially just a duplicate of name, since every document in the table must have a unique name anyway.
This would also mean that when I refer to a document from a foreign key relationship I'd have to call it document_name rather than document_id.
What's the best practice regarding this? Theoretically it's entirely possible for me to use a VARCHAR for the primary key, but does it come with any downsides such as performance overhead?
There are two schools of thought on this topic.
There are some who hold strongly to the belief that using a "natural key" as the primary key for an entity table is desirable, because it has significant advantages over a surrogate key.
There are others who believe that a "surrogate" key can provide some desirable properties which a "natural" key may not.
Let's summarize some of the most important and desirable properties of a primary key:
minimal - fewest possible number of attributes
simple - native datatypes, ideally a single column
available - the value will always be available when the entity is created
unique - absolutely no duplicates, no two rows will ever have the same value
anonymous - carries no hidden "meaningful" information
immutable - once assigned, it will never be modified
(There are some other properties that can be listed, but some of those can be derived from the properties above: not null, can be indexed, etc.)
I break the two schools of thought regarding "natural" and "surrogate" keys as the "best" primary keys into two camps:
1) Those who have been badly burned by an earlier decision to elect a natural key as the primary key, and
2) Those who have not yet been burned by that decision.
Of course you can.
create table sometbl(
`name` varchar(250) NOT NULL PRIMARY KEY,
`friendly_name` varchar(400)
);
The time to access an integer key or a VARCHAR key (unless it's too long) doesn't differ meaningfully. Even if it does, it won't be your main bottleneck. As long as a column is declared as a key, MySQL can access it very fast.
In my view, an auto-incrementing integer is not a real primary key. It's just a serial number for the row. When you look at the real object, you'll see it doesn't have any serial number. So the primary key should be based on the object's real properties.
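One practical consequence of a VARCHAR primary key is what happens to referencing rows if the name ever changes. The sketch below is one way to handle that, with `ON UPDATE CASCADE` on the `document_name` foreign key the question mentions. It runs on SQLite through Python's `sqlite3` as a stand-in for MySQL, and the `revisions` table is a hypothetical child table invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("""
    CREATE TABLE documents (
        name          TEXT NOT NULL PRIMARY KEY,
        friendly_name TEXT
    )
""")
# 'revisions' is a hypothetical table that refers to documents by name.
conn.execute("""
    CREATE TABLE revisions (
        id            INTEGER PRIMARY KEY,
        document_name TEXT NOT NULL
            REFERENCES documents (name) ON UPDATE CASCADE,
        body          TEXT
    )
""")

conn.execute("INSERT INTO documents VALUES ('invoice-2', 'Invoice 2')")
conn.execute(
    "INSERT INTO revisions (document_name, body) VALUES ('invoice-2', 'draft')"
)

# Renaming the natural key propagates to every referencing row.
conn.execute("UPDATE documents SET name = 'invoice-3' WHERE name = 'invoice-2'")
referenced = conn.execute("SELECT document_name FROM revisions").fetchall()
```

The cascade keeps the data consistent, but every rename rewrites all referencing rows, which is part of why the "immutable" property listed earlier matters when choosing a natural key.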
Here is a small design with the common NOT NULL UNIQUE constraints on the natural keys:
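CREATE TABLE users (
id INT NOT NULL AUTO_INCREMENT,
name VARCHAR(255) NOT NULL UNIQUE, -- types were omitted in the original; VARCHAR(255) is an assumption
email VARCHAR(255) NOT NULL UNIQUE,
pass VARCHAR(255) NOT NULL,
PRIMARY KEY (id)
);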
The NOT NULL UNIQUE constraint seems hackish to me. Having disjoint candidate keys seems denormalized to me, and the UNIQUE constraint seems like a bloated O(N) checking feature, so I'm inclined to use a design that has a relation for each natural key that maps the natural key to the surrogate key in the main relation.
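CREATE TABLE users (
id INT NOT NULL AUTO_INCREMENT,
pass VARCHAR(255) NOT NULL, -- types were omitted in the original; VARCHAR(255) is an assumption
PRIMARY KEY (id)
);
CREATE TABLE user_names (
name VARCHAR(255) NOT NULL,
user_id INT NOT NULL, -- refers to users.id
PRIMARY KEY (name)
);
CREATE TABLE user_emails (
email VARCHAR(255) NOT NULL,
user_id INT NOT NULL, -- refers to users.id
PRIMARY KEY (email)
);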
This way, I implicitly enforce the unique constraint on user's emails and usernames while providing the luxury of being able to search for a user's info with their email or name in O(ln N + ln M) time (which I very much desire).
The only way I can see the first, more common design matching the performance of the second is if the UNIQUE constraint implicitly indexed the table, so that selects on the natural keys, and therefore checks for their uniqueness, could be done in O(ln N) time.
I suppose my question is: with regard to the performance of insertions and selections by natural key, what is the best way to handle a table with 3 or more natural keys that is indexed by a surrogate key?
It seems that what you are describing is 6th Normal Form. Assuming your original table is in 5NF then your new schema consisting of 3 tables is in 6NF. Having three candidate keys does not violate 5NF but it would violate 6NF.
From the data integrity point of view however 6NF has significant disadvantages. It is normally the case that some dependencies are lost. For example your original table enforces the constraint that every user has a name and password. Your 6NF version can't do that - at least not in SQL if you want to permit inserts to all the tables. 6NF is useful for some specific situations (temporal data) but in general 5NF is more useful and desirable from a data integrity perspective.
This doesn't answer your performance question but I thought it was worth pointing out.
You are normalizing too much in my opinion. You will hurt performance not only on inserts/updates but also on selects since you are now joining 3 tables instead of doing a straight insert/select/update/delete in one table.
I disagree that the NOT NULL UNIQUE is hackish but I do find strange that there's such a constraint on a name column.
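On the performance question itself: a UNIQUE constraint is backed by an index, so uniqueness checks and lookups by name or email do not scan the whole table. MySQL implements UNIQUE as a unique index, and the sketch below shows the same behaviour on SQLite (used here through Python's `sqlite3` as a runnable stand-in, which is an assumption about what is convenient to test rather than a claim about MySQL internals).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE users (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL UNIQUE,
        email TEXT NOT NULL UNIQUE,
        pass  TEXT NOT NULL
    )
""")

# List the indexes SQLite created on its own for this table.
# Each row is (seq, name, unique, origin, partial).
indexes = conn.execute("PRAGMA index_list('users')").fetchall()
unique_index_names = sorted(row[1] for row in indexes if row[2] == 1)

# Ask how a lookup by email would be executed.
# The human-readable description is the fourth column of each plan row.
plan_rows = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM users WHERE email = ?",
    ("a@example.com",),
).fetchall()
plan_detail = " ".join(row[3] for row in plan_rows)
```

Both UNIQUE columns get their own index, and the lookup by email is an index search rather than a table scan, so the single-table design already provides the logarithmic lookups the three-table design was aiming for.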
I am writing a data warehouse, using MySQL as the back-end. I need to partition a table based on two integer IDs and a name string. I have read (parts of) the MySQL documentation regarding partitioning, and it seems the most appropriate partitioning scheme in this scenario would be either HASH or KEY partitioning.
I have elected for KEY partitioning because I (chickened out and) don't want to be responsible for providing a 'collision free' hashing algorithm for my fields. Instead, I am relying on MySQL's hashing to generate the keys required for hashing.
I have included below, a snippet of the schema of the table that I would like to partition based on the COMPOSITE of the following fields:
school_id, course_id, ssname (student surname).
BTW, before anyone points out that this is not the best way to store school related information, I'll have to point out that I am only using the case below as an analogy to what I am trying to model.
My Current CREATE TABLE statement looks like this:
CREATE TABLE foobar (
id int UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT,
school_id int UNSIGNED NOT NULL,
course_id int UNSIGNED NOT NULL,
ssname varchar(64) NOT NULL,
/* some other fields */
FOREIGN KEY (school_id) REFERENCES school(id) ON DELETE RESTRICT ON UPDATE CASCADE,
FOREIGN KEY (course_id) REFERENCES course(id) ON DELETE RESTRICT ON UPDATE CASCADE,
INDEX idx_fb_si (school_id),
INDEX idx_fb_ci (course_id),
CONSTRAINT UNIQUE INDEX idx_fb_scs (school_id,course_id,ssname(16))
) ENGINE=innodb;
I would like to know how to modify the statement above so that the table is partitioned using the three fields I mentioned at the beginning of this question (namely school_id, course_id and the starting letter of the student's surname).
Another question I would like to ask is this:
What happens in 'edge' situations? For example, if I attempt to insert a record that contains a valid* school_id, course_id or surname for which no underlying partitioned table file exists, will MySQL automatically create the underlying file?
Case in point: I have the following schools: New York Kindergarten, Belfast Elementary; and the following courses: Lie Algebra in Infinitesimal Dimensions, Entangled Entities.
Also assume I have the following students (surnames): Bush, Blair, Hussein
When I add a new school (or course, or student), can I insert them into the foobar table? (Actually, I can't think why not.) The reason I ask is that I foresee adding more schools, courses etc., which means that MySQL will have to create additional tables behind the scenes (as the hash will generate new keys).
I will be grateful if someone with experience in this area can confirm (preferably with links backing their assertion), that my understanding (i.e. no manual administration is required if I add new schools, courses or students to the database), is correct.
I don't know if my second question was well formed (clear) or not. If not, I will be glad to clarify further.
*VALID - by valid, I mean that it is valid in terms of not breaking referential integrity.
I doubt partitioning is as useful as you think. That said, there are a couple of other problems with what you're asking for (note: the entirety of this answer applies to MySQL 5; version 6 might be different):
columns used in KEY partitioning must be part of the primary key. school_id, course_id and ssname are not part of the primary key.
more generally, every UNIQUE key (including the primary key) must include all columns used in the partitioning expression. This means you can only partition on the intersection of the columns in the UNIQUE keys. In your example, the intersection is empty.
most partitioning schemes (other than KEY) require integer or null values. If not NULL, ssname will not be an integer value.
foreign keys and partitioning aren't supported simultaneously. This is a strong argument not to use partitioning.
Fortunately, collision free hashing is one thing you don't need to worry about, because partitioning is going to result in collisions (otherwise, you'd only have a single row in each partition). If you could ignore the above problems as well as the limitations on functions used in partitioning expressions, you could create a HASH partition with:
CREATE TABLE foobar (
...
) ENGINE=innodb
PARTITION BY HASH (school_id + course_id + ORD(ssname))
PARTITIONS 2
;
What should work is:
CREATE TABLE foobar (
id int UNSIGNED NOT NULL AUTO_INCREMENT,
school_id int UNSIGNED NOT NULL,
course_id int UNSIGNED NOT NULL,
ssname varchar(64) NOT NULL,
/* some other fields */
PRIMARY KEY (id, school_id, course_id),
INDEX idx_fb_si (school_id),
INDEX idx_fb_ci (course_id),
CONSTRAINT UNIQUE INDEX idx_fb_scs (school_id,course_id,ssname)
) ENGINE=innodb
PARTITION BY HASH (school_id + course_id)
PARTITIONS 2
;
or:
CREATE TABLE foobar (
id int UNSIGNED NOT NULL AUTO_INCREMENT,
school_id int UNSIGNED NOT NULL,
course_id int UNSIGNED NOT NULL,
ssname varchar(64) NOT NULL,
/* some other fields */
PRIMARY KEY (id, school_id, course_id, ssname),
INDEX idx_fb_si (school_id),
INDEX idx_fb_ci (course_id),
CONSTRAINT UNIQUE INDEX idx_fb_scs (school_id,course_id,ssname)
) ENGINE=innodb
PARTITION BY KEY (school_id, course_id, ssname)
PARTITIONS 2
;
As for the files that store tables, MySQL will create them, though it may do it when you define the table rather than when rows are inserted into it. You don't need to worry about how MySQL manages files. Remember, there are a limited number of partitions, defined when you create the table by the PARTITIONS *n* clause.
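To see why new schools or courses never require new partitions, it may help to model how HASH partitioning picks a partition: MySQL takes the partitioning expression modulo the number of partitions. The Python sketch below mimics that rule for `PARTITION BY HASH (school_id + course_id) PARTITIONS 2`. The function name `hash_partition` and the ID ranges are made up for illustration. KEY partitioning uses MySQL's own internal hash instead of a plain sum, but the result is likewise reduced to one of the fixed partitions.

```python
def hash_partition(school_id: int, course_id: int, partitions: int = 2) -> int:
    """Partition number for PARTITION BY HASH (school_id + course_id).

    Models the documented rule N = MOD(expr, num) for HASH partitioning.
    """
    return (school_id + course_id) % partitions

# A few existing rows.
existing = [hash_partition(1, 1), hash_partition(1, 2), hash_partition(2, 2)]

# Brand-new schools and courses still land in partition 0 or 1;
# the set of partitions is fixed by the PARTITIONS clause.
assigned = {
    hash_partition(school_id, course_id)
    for school_id in range(1, 1000)
    for course_id in range(1, 50)
}
```

However many distinct IDs you insert, every row maps to one of the partitions declared up front, which is why no manual administration is needed as the data grows.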