I was wondering: when you have a parent table and a child table with a foreign key, like:
users
id | username | password |
users_blog
id | id_user | blog_title
Is it OK to also use an auto-increment id on the join table (users_blog), or will I have problems with query speed?
Also, I would like to know which fields to make PRIMARY and which to make INDEX in the users_blog table.
I hope the question is clear. Sorry for my bad English :P
I don't think you actually need the id column in the users_blog table. I would make id_user the primary key of that table, unless you have some other reason to keep id (perhaps the users_blog table actually has more columns and you are just not showing them to us?).
As far as performance, having the id column in the users_blog table shouldn't affect performance by itself but your queries will never use this index since it's very unlikely that you'll ever select data based on that column. Having the id_user column as the primary index will actually be of benefit for you and will speed up your joins and selects.
What's the cardinality between the user and user_blog? If it's 1:1, why do you need an id field in the user_blog table?
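Whether id_user can serve as the primary key depends on exactly this cardinality. The following is a minimal sketch, not a definitive implementation: it uses Python's built-in sqlite3 module as a stand-in for MySQL so it runs without a server, and the helper name second_blog_allowed and the sample rows are made up for illustration. It tries to store a second blog for the same user under three candidate primary keys.

```python
import sqlite3


def second_blog_allowed(primary_key: str) -> bool:
    """Return True if a user can own two rows in users_blog under this key."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        f"""
        CREATE TABLE users_blog (
            id         INTEGER NOT NULL,
            id_user    INTEGER NOT NULL,
            blog_title TEXT    NOT NULL,
            PRIMARY KEY ({primary_key})
        )
        """
    )
    conn.execute("INSERT INTO users_blog VALUES (1, 7, 'first blog')")
    try:
        # Same user (7), different row id (2).
        conn.execute("INSERT INTO users_blog VALUES (2, 7, 'second blog')")
        return True
    except sqlite3.IntegrityError:
        # The primary key rejected the second row.
        return False
    finally:
        conn.close()


for key in ("id_user", "id", "id_user, id"):
    print(f"PRIMARY KEY ({key}): second blog allowed = {second_blog_allowed(key)}")
```

With id_user alone as the key the second insert is rejected, so that design only fits a 1:1 relationship. A key on id, or on (id_user, id), allows several blogs per user.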
is it ok to use id as auto increment also on join table (users_blog)
or will i have problems of query speed?
Whether a field is auto-increment or not has no impact on how quickly you can retrieve data that is already in the database.
also i would like to know which fields to add as PRIMARY and which as
INDEX in users_blog table?
The purpose of PRIMARY KEY (and other constraints) is to enforce the correctness of data. Indexes are "just" for performance.
So what fields will be in PRIMARY KEY depends on what you wish to express with your data model:
If a users_blog row is identified with the id alone (i.e. there is a "non-identifying" relationship between these two tables), put id alone in the PRIMARY KEY.
If it is identified by a combination of id_user and id (aka. "identifying" relationship) then you'll have these two fields together in your PK.
As for indexes, that depends on how you are going to access your data. For example, if you do many JOINs you may consider an index on id_user.
A good tutorial on index performance can be found at:
http://use-the-index-luke.com
I don't see any problem with having an auto increment id column on users_blog.
The primary key can be id_user, id. As for indexing, this heavily depends on your usage.
I doubt you will be having any database related performance issue with a blog engine though, so indexing or not doesn't make much of a difference.
You don't have to use an id column in the users_blog table; you can join id_user with the users table. Also, auto increment is not a performance problem.
It is a good idea to have an identifier column that is auto increment - this guarantees a way of uniquely identifying the row (in case all other columns are the same for two rows)
id is a good name for all table keys and it's the standard
<table>_id is the standard name for foreign keys - in your case use user_id (not id_user as you have)
mysql automatically creates indexes for columns defined as primary or foreign keys - there is no need to do anything here
IMHO, table names should be singular - ie user not users
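Your SQL should look something like:
create table user (
id int not null auto_increment primary key,
...
);
create table user_blog (
id int not null auto_increment primary key,
user_id int not null,
...
-- declared separately because older MySQL versions parse but ignore an inline "references" clause
foreign key (user_id) references user (id)
);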
Imagine we have three tables in a MySQL database:
posts
categories
category_post
There is a one-to-many relationship between posts and categories so that a single post may have many categories.
The category_post table is the pivot table between categories and posts and has the following columns:
id (primary key, auto-incrementing, big integer)
category_id
post_id
Let's also imagine that we have 1,000,000 rows in our category_post table.
My question is:
Is there any performance benefit to having the id column in the category_post table or does it just take up extra space?
Posts and categories is probably many-to-many, not one-to-many.
A many-to-many relationship table is best done something like
CREATE TABLE a_b (
a_id ... NOT NULL,
b_id ... NOT NULL,
PRIMARY KEY (a_id, b_id),
INDEX(b_id, a_id) -- include this if you need to go both directions
) ENGINE = InnoDB;
With that, you automatically get "clustered" lookups both directions, and you avoid the unnecessary artificial id for the table.
(By the way, N.B., an implicit PK is 6 bytes, not 8. There is a lengthy post by Jeremy Cole on the topic.)
A one-to-many relationship does not need this extra table. Instead, have one id inside the other table. For example, a City table will have the id for the Country in it.
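The composite-key link table described above can be sketched as follows. This is an illustration under stated assumptions rather than a MySQL benchmark: it uses Python's built-in sqlite3 module in place of MySQL/InnoDB, the index name idx_post_category and the sample rows are invented, and SQLite's planner output differs from MySQL's EXPLAIN. It only shows that both lookup directions can be served from an index without any surrogate id column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE category_post (
        category_id INTEGER NOT NULL,
        post_id     INTEGER NOT NULL,
        PRIMARY KEY (category_id, post_id)
    );
    -- Reverse index for lookups that start from the post side.
    CREATE INDEX idx_post_category ON category_post (post_id, category_id);
    INSERT INTO category_post VALUES (1, 10), (1, 11), (2, 10);
    """
)


def plan(sql: str) -> str:
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN output."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# category -> posts is answered through the primary key.
posts_in_category_plan = plan(
    "SELECT post_id FROM category_post WHERE category_id = 1"
)
# post -> categories is answered through the reverse index.
categories_of_post_plan = plan(
    "SELECT category_id FROM category_post WHERE post_id = 10"
)
categories_of_post = [
    row[0]
    for row in conn.execute(
        "SELECT category_id FROM category_post "
        "WHERE post_id = 10 ORDER BY category_id"
    )
]

# The composite primary key also rejects a duplicate (category, post) pair.
try:
    conn.execute("INSERT INTO category_post VALUES (1, 10)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(posts_in_category_plan)
print(categories_of_post_plan)
print(categories_of_post, duplicate_rejected)
```

The exact plan text depends on the SQLite version, so the sketch prints it rather than assuming a fixed string.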
Having category_id and post_id as a compound primary key will have better performance than having an extra id as a primary key. This is because making it a primary key will also create an index on it automatically. If you really want an extra Id column you can improve performance by manually defining an index on category_id and post_id. There is no benefit of having an extra key column though and this is generally a bad practice.
Not having id is good, but if you care about ordering by the pivot table, you will need an id or a timestamp column in it.
I have a table with thousands of records. I do a lot of selects like this to find if a person exists.
SELECT * from person WHERE personid='U244A902'
Because the person ID is not pure numerical, I didn't use it as the primary key and went with auto-increment. But now I'm rethinking my strategy, because I think SELECTS are getting slower as the table fills up. I'm thinking the reason behind this slowness is because personid is not the primary key.
So my question, if I were to go through the trouble of restructuring the table and use the personid as the primary key instead, without an auto-increment, would that significantly speed up the selects? I'm talking about a table that has 200,000 records now and will fill up to about 5 million when done.
The slowness is due indirectly to the fact that the personid is not a primary key, in that it isn't indexed because it wasn't defined as a key. The quickest fix is to simply index it:
CREATE UNIQUE INDEX `idx_personid` ON `person` (`personid`);
However, if it is a unique value, it should be the table's primary key. There is no real need for a separate auto_increment key.
ALTER TABLE person DROP the_auto_increment_column;
ALTER TABLE person ADD PRIMARY KEY (personid);
Note however, that if you were also using the_auto_increment_column as a FOREIGN KEY in other tables and dropped it in favor of personid, you would need to modify all your other tables to use personid instead. The difficulty of doing so may not be completely worth the gain for you.
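The effect of the two fixes above can be checked with a query plan. The sketch below is hedged: it runs against SQLite through Python's built-in sqlite3 module rather than MySQL, the name column and the SCHEMAS dictionary are assumptions made for the example, and in MySQL you would run EXPLAIN on the real table instead. It compares the plan for the lookup with no index on personid, with the unique index, and with personid as the primary key.

```python
import sqlite3

# Three candidate layouts for the person table (the "name" column is hypothetical).
SCHEMAS = {
    "no index": """
        CREATE TABLE person (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            personid TEXT NOT NULL,
            name TEXT
        );
    """,
    "unique index": """
        CREATE TABLE person (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            personid TEXT NOT NULL,
            name TEXT
        );
        CREATE UNIQUE INDEX idx_personid ON person (personid);
    """,
    "primary key": """
        CREATE TABLE person (
            personid TEXT PRIMARY KEY,
            name TEXT
        );
    """,
}


def lookup_plan(schema_sql: str) -> str:
    """Build the schema in a fresh database and return the lookup's query plan."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(schema_sql)
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM person WHERE personid = 'U244A902'"
    ).fetchall()
    conn.close()
    return " | ".join(row[-1] for row in rows)


plans = {name: lookup_plan(sql) for name, sql in SCHEMAS.items()}
for name, text in plans.items():
    print(f"{name}: {text}")
```

Without an index the plan is a full table scan, which is why the select gets slower as the table grows. Both the unique index and the primary key turn it into an index search.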
You can create an index on personid:
CREATE INDEX id_index ON person (personid);
ALTER TABLE `person` ADD INDEX `index1` (`personid`);
Try to index the columns that you filter on in the WHERE clause or that you select.
I have roles for users.
User can have multiple roles. I have a table called users_roles.
I have three columns - id,user,role.
id is an auto-increment column.
So,
Is it a good idea to drop the id column since I never use that in code?
If yes, then what column should be the index for this table? Or should it not have an index at all?
I agree that if user is a userid then you don't need the id column; userid can be your indexed PK.
If user is the name of the user then you are going to want to keep the id, or create a user_id so that you can have a valid key to index.
users_roles is a many-to-many link table.
There are at least 2 common approaches to primary keys on many:many tables:
users_roles has its own surrogate Primary key, as in the case here (users_roles.id)
OR, you create a composite key consisting of (user, role), since a user shouldn't be in the same role more than once.
There are many discussions on simple vs composite keys e.g. Why single primary key is better than composite keys?
Note that indexes and primary keys are different concepts. Primary Key is for uniqueness, Index is for performance. (You can have multiple indexes on a table, but only one PK)
If, as you seem to be saying, no other tables reference users_roles, then you don't actually need a primary key.
If your users_roles table gets large, you will likely want to add an index on the user column, and possibly also on the role column, e.g. if you often search for users in a particular role.
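One practical difference between the two approaches is what happens when the same role is assigned to the same user twice. The sketch below is an illustration only: it uses Python's built-in sqlite3 module instead of MySQL, and the SCHEMAS dictionary, the helper name and the sample ids are made up. It inserts the same (user, role) pair twice and counts the rows that end up stored.

```python
import sqlite3

SCHEMAS = {
    # Surrogate key only: nothing stops duplicate (user, role) pairs.
    "surrogate id only": """
        CREATE TABLE users_roles (
            id   INTEGER PRIMARY KEY AUTOINCREMENT,
            user INTEGER NOT NULL,
            role INTEGER NOT NULL
        )
    """,
    # Surrogate key plus an explicit uniqueness constraint on the pair.
    "surrogate id + unique": """
        CREATE TABLE users_roles (
            id   INTEGER PRIMARY KEY AUTOINCREMENT,
            user INTEGER NOT NULL,
            role INTEGER NOT NULL,
            UNIQUE (user, role)
        )
    """,
    # Composite primary key: the pair itself identifies the row.
    "composite key": """
        CREATE TABLE users_roles (
            user INTEGER NOT NULL,
            role INTEGER NOT NULL,
            PRIMARY KEY (user, role)
        )
    """,
}


def rows_after_duplicate_insert(schema_sql: str) -> int:
    """Insert the same (user, role) pair twice and count the stored rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(schema_sql)
    for _ in range(2):
        try:
            conn.execute("INSERT INTO users_roles (user, role) VALUES (1, 2)")
        except sqlite3.IntegrityError:
            pass  # the second insert was rejected as a duplicate
    count = conn.execute("SELECT COUNT(*) FROM users_roles").fetchone()[0]
    conn.close()
    return count


counts = {name: rows_after_duplicate_insert(sql) for name, sql in SCHEMAS.items()}
print(counts)
```

So if the surrogate id is kept, a separate unique constraint on (user, role) is still needed to get the guarantee that the composite key provides by itself.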
If you delete the "id" field, how will you relate the user and users_roles tables?
It is always better to define a primary key. An index is created by default when you define the primary key, and it can improve performance.
Also, when you define a foreign key, a foreign key index is generated as well, so queries on the table become faster.
This is your first answer:
According to your requirements, for the time being you can delete the "id" primary key from the users_roles table, as it is only used as a relationship table between users and roles.
But in most PHP frameworks, it is not good practice to drop the primary key, even in a relationship table.
This is your second answer: If you drop the primary key, you will have to maintain indexes on the "user" and "role" fields as foreign key indexes. If you do not drop the primary key from the users_roles table, three indexes are generated, for the "id", "user" and "role" fields: the first is the primary index and the other two are foreign key indexes.
Explicitly defining more indexes on a table also adds some extra overhead, since every index has to be maintained whenever rows are inserted, updated or deleted.
I have 2 MySQL tables with the following schemas for a web site that's kinda like a magazine.
Article (articleId int auto increment,
title varchar(100),
titleHash guid, -- a hash of the title
articleText varchar(4000),
userId int)
User (userId int auto increment,
userName varchar(30),
email etc...)
The most important query is;
select title,articleText,userName,email
from Article inner join user
on article.userId = user.UserId
where titleHash = <some hash>
I am thinking of using the articleId and titleHash columns together as a clustered primary key for the Article table, and userId and userName as a primary key for the User table.
This is because the searches will be based on the titleHash and userName columns.
Also, titleHash and userName are unique by design and will not normally change.
The articleId and userid columns are not business keys and are not visible to the application, so they'll only be used for joins.
I'm going to use mysql table partitioning on the titlehash column so the selects will be faster as the db will be able to use partition elimination based on that column.
I'm using innoDB as the storage engine;
So here are my questions;
Do I need to create another index on the titlehash column, as the primary key (articleId, titlehash) is not good for searches on the titlehash column because it is the second column of the primary key?
What are the problems with this design?
I need the selects to be very fast and expects the tables to have millions of rows and please note that the int Id columns are not visible to the business layer and can never be used to find a record
I'm from a sql server background and going to use mysql as using the partitioning on sql server will cost me a fortune as it is only available in the Enterprise edition.
So DB gurus, please help me; Many thanks.
As written, your "most important query" doesn't actually appear to involve the User table at all. If there isn't just something missing, the best way to speed this up will be to get the User table out of the picture and create an index on titleHash. Boom, done.
If there's another condition on that query, we'll need to know what it is to give any more specific advice.
Given your changes, all that should be necessary as far as keys go is:
On Article:
PRIMARY KEY (articleId) (no additional columns, don't try to be fancy)
KEY (userId)
UNIQUE KEY (titleHash)
On User:
PRIMARY KEY (userId)
Don't try to get fancy with composite primary keys. Primary keys which just consist of an autoincrementing integer are handled more efficiently by InnoDB, as the key can be used internally as a row ID. In effect, you get one integer primary key "for free".
Above all else, test with real data and look at the results from EXPLAINing your query.
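The key layout recommended above can be tried out as follows. This is a rough sketch under stated assumptions: it uses SQLite through Python's built-in sqlite3 module rather than MySQL/InnoDB, stores titleHash as plain text, and the index names and sample row are invented. It builds the keys listed above and inspects the plan for the question's query. For a real decision, run EXPLAIN on MySQL with realistic data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE User (
        userId   INTEGER PRIMARY KEY AUTOINCREMENT,
        userName TEXT NOT NULL,
        email    TEXT
    );
    CREATE TABLE Article (
        articleId   INTEGER PRIMARY KEY AUTOINCREMENT,
        title       TEXT,
        titleHash   TEXT NOT NULL,
        articleText TEXT,
        userId      INTEGER NOT NULL
    );
    -- KEY (userId) and UNIQUE KEY (titleHash) from the list above.
    CREATE INDEX idx_article_userid ON Article (userId);
    CREATE UNIQUE INDEX idx_article_titlehash ON Article (titleHash);

    INSERT INTO User (userName, email) VALUES ('alice', 'alice@example.com');
    INSERT INTO Article (title, titleHash, articleText, userId)
    VALUES ('Hello', 'abc123', 'Body text', 1);
    """
)

QUERY = """
    SELECT title, articleText, userName, email
    FROM Article INNER JOIN User ON Article.userId = User.userId
    WHERE titleHash = ?
"""

# Plan rows end with a human-readable detail string; keep only that part.
plan_details = [
    row[-1]
    for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("abc123",))
]
result = conn.execute(QUERY, ("abc123",)).fetchall()

for detail in plan_details:
    print(detail)
print(result)
```

The plan should show the article being found through the unique index on titleHash, with the user row then fetched by its integer primary key, so no composite primary key is needed for this query.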
I have two tables A,B. Both tables will have more than 1 million records.
A has two columns - id, photo_id. id is the primary key and photo_id is unique.
A needs to be referenced in B.
3 questions:
Can I ignore A's id and use photo_id to link the two tables?
Is there any benefit of using the primary column as opposed to using a unique column in B?
What difference will it make to have a foreign key? I probably won't use foreign key since it's not supported on the server I'm using. Will this make a significant difference when there are 1+ mil records?
Skip having an id-column. If you have a photo_id that is already unique you should use that instead. A primary key (in MySQL InnoDB) is automatically clustered, which means that the data is stored in the index, making for VERY efficient retrieval of an entire row if you use the primary key as reference.
To answer your questions:
Yes. And remove the id column. It is an artificial key and provides no benefit over using photo_id.
Yes. The primary key index is clustered, which makes for very efficient querying with both exact and range queries (e.g. select * from photos where 2 < id AND id < 10).
A foreign key puts a constraint on your database tables and ensures that the data in the tables is in a consistent state. Without foreign keys, you need some application-level logic to ensure consistency.
I would only remove your id column if you are positive that photo_id values will never change. If multiple rows of your B table reference a specific A row and the photo_id for that row needs to be updated, you will want to be referencing the id column from your B table.
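What a foreign key buys you in terms of consistency can be shown with a small experiment. It is a sketch, not the asker's actual setup: it uses SQLite through Python's built-in sqlite3 module instead of MySQL, where the PRAGMA foreign_keys switch plays the role of a server with or without foreign key support, and the helper name and sample ids are made up. It tries to insert a row into B that points at a photo_id missing from A.

```python
import sqlite3


def orphan_accepted(enforce_foreign_keys: bool) -> bool:
    """Return True if B accepts a row whose photo_id does not exist in A."""
    conn = sqlite3.connect(":memory:")
    # SQLite only checks foreign keys when this pragma is switched on.
    switch = "ON" if enforce_foreign_keys else "OFF"
    conn.execute(f"PRAGMA foreign_keys = {switch}")
    conn.executescript(
        """
        CREATE TABLE A (photo_id INTEGER PRIMARY KEY);
        CREATE TABLE B (
            id       INTEGER PRIMARY KEY,
            photo_id INTEGER NOT NULL REFERENCES A (photo_id)
        );
        INSERT INTO A VALUES (100);
        """
    )
    try:
        # 999 is not present in A, so this row would be an orphan.
        conn.execute("INSERT INTO B (photo_id) VALUES (999)")
        return True
    except sqlite3.IntegrityError:
        return False
    finally:
        conn.close()


print("without enforcement:", orphan_accepted(False))
print("with enforcement:   ", orphan_accepted(True))
```

Without enforcement the orphan row is stored silently, so the application has to check for a matching row in A itself. With enforcement the database rejects it.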