Faster selects when using GUID vs. Select WHERE? - mysql

I have a table with thousands of records. I do a lot of selects like this to find if a person exists.
SELECT * from person WHERE personid='U244A902'
Because the person ID is not purely numerical, I didn't use it as the primary key and went with an auto-increment column instead. But now I'm rethinking my strategy, because I think the SELECTs are getting slower as the table fills up. I suspect the slowness is because personid is not the primary key.
So my question: if I were to go through the trouble of restructuring the table and using personid as the primary key instead, without an auto-increment column, would that significantly speed up the selects? I'm talking about a table that has 200,000 records now and will grow to about 5 million when done.

The slowness is only indirectly due to personid not being the primary key: the real problem is that personid isn't indexed at all, because it was never defined as a key. The quickest fix is simply to index it:
CREATE UNIQUE INDEX `idx_personid` ON `person` (`personid`);
However, if it is a unique value, it should be the table's primary key. There is no real need for a separate auto_increment key.
ALTER TABLE person DROP COLUMN the_auto_increment_column;
ALTER TABLE person ADD PRIMARY KEY (personid);
Note however, that if you were also using the_auto_increment_column as a FOREIGN KEY in other tables and dropped it in favor of personid, you would need to modify all your other tables to use personid instead. The difficulty of doing so may not be completely worth the gain for you.
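For example, repointing one dependent table might look roughly like this (a sketch only; the child table, its columns, the constraint names, and the VARCHAR(16) type are all hypothetical, and this migration has to happen before the old column is dropped from person):
-- Hypothetical child table "orders" that currently references person via the auto-increment column
ALTER TABLE orders ADD COLUMN personid VARCHAR(16);
UPDATE orders o
JOIN person p ON p.the_auto_increment_column = o.person_ref
SET o.personid = p.personid;
ALTER TABLE orders
DROP FOREIGN KEY fk_orders_person,
DROP COLUMN person_ref,
ADD CONSTRAINT fk_orders_personid FOREIGN KEY (personid) REFERENCES person (personid);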

You can create an index on personid:
CREATE INDEX id_index ON person (personid);

ALTER TABLE `person` ADD INDEX `index1` (`personid`);
Try to index the columns that you use in your WHERE clauses or that you select on.


At work we have a big database with unique indexes instead of primary keys, and everything works fine.
I'm designing a new database for a new project and I have a dilemma:
In DB theory, the primary key is a fundamental element. That's fine, but in REAL projects what are the advantages and disadvantages of each?
What do you use in your projects?
EDIT: ...and what about primary keys and replication on MS SQL Server?
What is a unique index?
A unique index on a column is an index on that column that also enforces the constraint that you cannot have two equal values in that column in two different rows. Example:
CREATE TABLE table1 (foo int, bar int);
CREATE UNIQUE INDEX ux_table1_foo ON table1(foo); -- Create unique index on foo.
INSERT INTO table1 (foo, bar) VALUES (1, 2); -- OK
INSERT INTO table1 (foo, bar) VALUES (2, 2); -- OK
INSERT INTO table1 (foo, bar) VALUES (3, 1); -- OK
INSERT INTO table1 (foo, bar) VALUES (1, 4); -- Fails!
Duplicate entry '1' for key 'ux_table1_foo'
The last insert fails because it violates the unique index on column foo when it tries to insert the value 1 into this column for a second time.
In MySQL a unique constraint allows multiple NULLs.
It is possible to make a unique index on multiple columns.
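Continuing the example above, a short sketch of both points (the behavior shown is MySQL/InnoDB):
CREATE UNIQUE INDEX ux_table1_foo_bar ON table1 (foo, bar); -- multi-column unique index
INSERT INTO table1 (foo, bar) VALUES (NULL, 5); -- OK
INSERT INTO table1 (foo, bar) VALUES (NULL, 5); -- also OK: multiple NULLs do not violate the unique indexes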
Primary key versus unique index
Things that are the same:
A primary key implies a unique index.
Things that are different:
A primary key also implies NOT NULL, but a unique index can be nullable.
There can be only one primary key, but there can be multiple unique indexes.
If there is no clustered index defined then the primary key will be the clustered index.
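A minimal sketch of these differences (MySQL syntax; the table and column names are illustrative):
CREATE TABLE account (
account_id INT NOT NULL,
email VARCHAR(255),
tax_number VARCHAR(20),
PRIMARY KEY (account_id), -- exactly one primary key, implicitly NOT NULL
UNIQUE KEY ux_account_email (email), -- several unique indexes are allowed...
UNIQUE KEY ux_account_tax (tax_number) -- ...and their columns may be NULL
);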
You can see it like this:
A Primary Key IS Unique
A unique value doesn't have to be the representation of the element.
Meaning? Well, a primary key is used to identify the element: if you have a "Person", you would like to have a personal identification number (an SSN or similar) which is primary to your person.
On the other hand, the person might have an e-mail address which is unique, but doesn't identify the person.
I always have primary keys, even in relationship tables (the mid-table / connection table). Why? Well, I like to follow a standard when coding: if the "Person" has an identifier and the Car has an identifier, then the Person -> Car table should have an identifier as well.
Foreign keys work with unique constraints as well as primary keys. From Books Online:
A FOREIGN KEY constraint does not have to be linked only to a PRIMARY KEY constraint in another table; it can also be defined to reference the columns of a UNIQUE constraint in another table.
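A sketch of a FOREIGN KEY pointing at a UNIQUE constraint rather than at the primary key (table and column names are illustrative):
CREATE TABLE country (
id INT PRIMARY KEY,
iso_code CHAR(2) NOT NULL,
UNIQUE (iso_code)
);
CREATE TABLE address (
id INT PRIMARY KEY,
country_iso CHAR(2) NOT NULL,
FOREIGN KEY (country_iso) REFERENCES country (iso_code) -- references the UNIQUE column, not the PK
);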
For transactional replication, you need the primary key. From Books Online:
Tables published for transactional replication must have a primary key. If a table is in a transactional replication publication, you cannot disable any indexes that are associated with primary key columns. These indexes are required by replication. To disable an index, you must first drop the table from the publication.
Both answers are for SQL Server 2005.
The choice of when to use a surrogate primary key as opposed to a natural key is tricky. Answers such as "always" or "never" are rarely useful. I find that it depends on the situation.
As an example, I have the following tables:
CREATE TABLE toll_booths (
id INTEGER NOT NULL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
...
UNIQUE(name)
)
CREATE TABLE cars (
vin VARCHAR(17) NOT NULL PRIMARY KEY,
license_plate VARCHAR(10) NOT NULL,
...
UNIQUE(license_plate)
)
CREATE TABLE drive_through (
id INTEGER NOT NULL PRIMARY KEY,
toll_booth_id INTEGER NOT NULL REFERENCES toll_booths(id),
vin VARCHAR(17) NOT NULL REFERENCES cars(vin),
at TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL,
amount NUMERIC(10,4) NOT NULL,
...
UNIQUE(toll_booth_id, vin)
)
We have two entity tables (toll_booths and cars) and a transaction table (drive_through). The toll_booths table uses a surrogate key because it has no natural attribute that is guaranteed not to change (the name can easily be changed). The cars table uses a natural primary key because it has a non-changing unique identifier (vin). The drive_through transaction table uses a surrogate key for easy identification, but also has a unique constraint on the attributes that are guaranteed to be unique at the time the record is inserted.
http://database-programmer.blogspot.com has some great articles on this particular subject.
There are no disadvantages of primary keys.
To add some information to the answers by MrWiggles and Peter Parker: when a table doesn't have a primary key, you won't be able to edit data in some applications (they will end up saying something like "cannot edit / delete data without a primary key"). PostgreSQL allows multiple NULL values in a UNIQUE column; a PRIMARY KEY doesn't allow NULLs. Also, some ORMs that generate code may have problems with tables that lack primary keys.
UPDATE:
As far as I know, it is not possible to replicate tables without primary keys in MSSQL, at least not without problems.
If something is a primary key, depending on your DB engine, the entire table gets sorted by the primary key. This means that lookups are much faster on the primary key because it doesn't have to do any dereferencing as it has to do with any other kind of index. Besides that, it's just theory.
In addition to what the other answers have said, some databases and systems may require a primary key to be present. One situation comes to mind: when using Enterprise Replication with Informix, a PK must be present for a table to participate in replication.
As long as you do not allow NULL for the value, they should be handled the same, but NULL values are handled differently across databases (AFAIK MS SQL does not allow more than one NULL value, while MySQL and Oracle allow several, if the column is UNIQUE).
So you would have to define the column as NOT NULL with a UNIQUE INDEX.
There is no such thing as a primary key in relational data theory, so your question has to be answered on the practical level.
Unique indexes are not part of the SQL standard. The particular implementation of a DBMS will determine what are the consequences of declaring a unique index.
In Oracle, declaring a primary key will result in a unique index being created on your behalf, so the question is almost moot. I can't tell you about other DBMS products.
I favor declaring a primary key. This has the effect of forbidding NULLs in the key column(s) as well as forbidding duplicates. I also favor declaring REFERENCES constraints to enforce entity integrity. In many cases, declaring an index on the column(s) of a foreign key will speed up joins. This kind of index should in general not be unique.
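For instance, a non-unique index supporting a foreign key column might look like this (a sketch; the table and column names are illustrative):
-- order_item.order_id references orders(id); the supporting index is deliberately non-unique
CREATE INDEX ix_order_item_order_id ON order_item (order_id);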
There are some disadvantages of CLUSTERED INDEXES vs UNIQUE INDEXES.
As already stated, a CLUSTERED INDEX physically orders the data in the table.
This means that when you have a lot of inserts or deletes on a table containing a clustered index, every time (well, almost, depending on your fill factor) you change the data, the physical table needs to be updated to stay sorted.
In relatively small tables this is fine, but once you get to tables with GBs worth of data, and the inserts/deletes affect the sorting, you will run into problems.
I almost never create a table without a numeric primary key. If there is also a natural key that should be unique, I also put a unique index on it. Joins are faster on integers than on multi-column natural keys, and data only needs to change in one place (natural keys tend to need updating, which is a bad thing when they participate in primary key - foreign key relationships). If you are going to need replication, use a GUID instead of an integer, but for the most part I prefer a key that is user readable, especially if they need to see it to distinguish between John Smith and John Smith.
The few times I don't create a surrogate key are when I have a joining table that is involved in a many-to-many relationship. In this case I declare both fields as the primary key.
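A sketch of such a join table, with the pair itself as the primary key (table and column names are illustrative):
CREATE TABLE person_car (
person_id INT NOT NULL,
car_id INT NOT NULL,
PRIMARY KEY (person_id, car_id),
FOREIGN KEY (person_id) REFERENCES person (id),
FOREIGN KEY (car_id) REFERENCES car (id)
);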
My understanding is that a primary key and a unique index with a not-null constraint are the same (*); and I suppose one chooses one or the other depending on what the specification explicitly states or implies (a matter of what you want to express and explicitly enforce). If it requires uniqueness and not-null, then make it a primary key. If it just happens that all parts of a unique index are not-null without any requirement for that, then just make it a unique index.
The sole remaining difference is, you may have multiple not‑null unique indexes, while you can't have multiple primary keys.
(*) Excepting a practical difference: a primary key can be the default unique key for some operations, like defining a foreign key. E.g. if one defines a foreign key referencing a table and does not provide the column name, and the referenced table has a primary key, then the primary key will be the referenced column. Otherwise, the referenced column will have to be named explicitly.
Others here have mentioned DB replication, but I don't know about it.
A unique index can have one NULL value and creates a NON-CLUSTERED INDEX by default.
A primary key cannot contain NULL values and creates a CLUSTERED INDEX by default. (These are SQL Server defaults.)
In MSSQL, primary keys should be monotonically increasing for best performance on the clustered index. Therefore an integer IDENTITY column is better than any natural key that might not be monotonically increasing.
If it were up to me...
You need to satisfy the requirements of the database and of your applications.
Adding an auto-incrementing integer or long id column to every table to serve as the primary key takes care of the database requirements.
You would then add at least one other unique index to the table for use by your application. This would be the index on employee_id, or account_id, or customer_id, etc. If possible, this index should not be a composite index.
I would favor indices on several fields individually over composite indices. The database will use the single-field indices whenever the WHERE clause includes those fields, but it will only use a composite index when you supply its fields as a leftmost prefix - meaning it can't use the second field of a composite index unless your WHERE clause also constrains the first.
I am all for using calculated or Function type indices - and would recommend using them over composite indices. It makes it very easy to use the function index by using the same function in your where clause.
This takes care of your application requirements.
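A sketch of both points in MySQL (the functional-index form requires MySQL 8.0.13+; table and column names are illustrative):
-- A composite index is only usable through its leftmost prefix (dept_id, or dept_id + hired_on)
CREATE INDEX ix_emp_dept_hired ON employee (dept_id, hired_on);
-- Individual indexes, each usable on its own
CREATE INDEX ix_emp_dept ON employee (dept_id);
CREATE INDEX ix_emp_hired ON employee (hired_on);
-- Functional index, matched when the same expression appears in the WHERE clause
CREATE INDEX ix_emp_email_lower ON employee ((LOWER(email)));
SELECT * FROM employee WHERE LOWER(email) = 'jane@example.com';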
It is highly likely that other, non-primary indexes are actually mappings from the index's key value to the primary key value, not to rowids. This allows physical sorting operations and deletes to occur without having to rebuild those indexes.

Why do we usually create an index before dropping the primary key?

I think the following is a standard coding "pattern" when we need to change the primary key of a table to better serve our queries via the index:
ALTER TABLE employees
ADD UNIQUE INDEX tmp (employee_id, subsidiary_id);
ALTER TABLE employees
DROP PRIMARY KEY;
ALTER TABLE employees
ADD PRIMARY KEY (subsidiary_id, employee_id);
My understanding is that the tmp index is created before dropping the primary key in order to facilitate queries using the current primary key and not lose performance.
But I don't understand this.
When we execute an ALTER TABLE (I am referring to the ALTER TABLE that drops the primary key), the table will be locked until the operation finishes, right?
So the queries will not be able to run anyway. So why create tmp in the first place?
I haven't seen this pattern. But, I would expect it for a slightly different reason. The purpose is less "facilitating" queries than ensuring that the pair of keys remains unique throughout the entire set of transactions.
In other words, between dropping the primary key and creating the new key, there is a short window of opportunity for someone to insert a duplicate pair of keys. Then the second operation will fail. By creating the unique index first, you prevent this from happening.
If you know there are no other users/queries using the system when you are modifying the table (say you are in single user mode), then there is no need to create the additional index.

How to add a unique key to an existing table (with non-unique rows)

I want to add a composite unique key to an existing table. The key consists of 4 fields (user_id, game_id, date, time).
But the table has non-unique rows.
I understand that I can remove all the duplicate rows and after that add the composite key.
Maybe there is another solution that doesn't require searching for all the duplicate data (something like ADD UNIQUE IGNORE, etc.).
UPD
I searched for how to remove duplicate MySQL rows - I think that's a good solution:
Remove duplicates using only a MySQL query?
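One hedged sketch of that approach (the table name game_log is illustrative, and it assumes a surrogate id column is added first, as several answers below suggest):
ALTER TABLE game_log ADD id INT AUTO_INCREMENT PRIMARY KEY;
DELETE t1 FROM game_log t1
JOIN game_log t2
ON t1.user_id = t2.user_id
AND t1.game_id = t2.game_id
AND t1.date = t2.date
AND t1.time = t2.time
AND t1.id > t2.id; -- keep the row with the smallest id in each duplicate group
ALTER TABLE game_log ADD CONSTRAINT uc_user_game UNIQUE (user_id, game_id, date, time);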
You can do as yAnTar advised
ALTER TABLE TABLE_NAME ADD Id INT AUTO_INCREMENT PRIMARY KEY
OR
You can add a constraint
ALTER TABLE TABLE_NAME ADD CONSTRAINT constr_ID UNIQUE (user_id, game_id, date, time)
But I think that, in order not to lose your existing data, you can add an identity column and then make a composite key.
The proper syntax would be - ALTER TABLE Table_Name ADD UNIQUE (column_name)
Example
ALTER TABLE 0_value_addition_setup ADD UNIQUE (`value_code`)
I had to solve a similar problem. I inherited a large source table from MS Access with nearly 15000 records that did not have a primary key, which I had to normalize and make CakePHP compatible. One convention of CakePHP is that every table has a primary key, that it is the first column and that it is called 'id'. The following simple statement did the trick for me under MySQL 5.5:
ALTER TABLE `database_name`.`table_name`
ADD COLUMN `id` INT NOT NULL AUTO_INCREMENT FIRST,
ADD PRIMARY KEY (`id`);
This added a new column 'id' of type integer in front of the existing data (the "FIRST" keyword). The AUTO_INCREMENT keyword numbers the ids starting with 1, so now every record has a unique numerical id. (Without the AUTO_INCREMENT keyword, all rows would be populated with id = 0.)
To set a multi-column unique key on a table:
ALTER TABLE table_name
ADD CONSTRAINT UC_table_name UNIQUE (field1,field2);
I am providing my solution based on an assumption about your business logic. Basically, in my design I will allow the table to store only one record per user-game combination. So I will add a composite key to the table.
PRIMARY KEY (`user_id`,`game_id`)
Either create an auto-increment id or a UNIQUE id and add it to the natural key you are talking about with the 4 fields. This will make every row in the table unique...
For MySQL:
ALTER TABLE MyTable ADD MyId INT AUTO_INCREMENT PRIMARY KEY;
If yourColumnName has some values that are not unique and you now want to add a unique index for it, try this (note that this is SQL Server filtered-index syntax; MySQL does not support filtered indexes):
CREATE UNIQUE INDEX [IDX_Name] ON yourTableName (yourColumnName) WHERE [id]>1963 --1963 is max(id)-1
Now try inserting some values that already exist, to test it.

Sql Query join suggestions

I was wondering when having a parent table and a child table with foreign key like:
users
id | username | password |
users_blog
id | id_user | blog_title
Is it OK to use id as an auto-increment column also on the join table (users_blog), or will I have problems with query speed?
Also, I would like to know which fields to add as PRIMARY and which as INDEX in the users_blog table.
Hope the question is clear, sorry for my bad English :P
I don't think you actually need the id column in the users_blog table. I would make the id_user the primary index on that table unless you have another reason for doing so (perhaps the users_blog table actually has more columns and you are just not showing it to us?).
As far as performance, having the id column in the users_blog table shouldn't affect performance by itself but your queries will never use this index since it's very unlikely that you'll ever select data based on that column. Having the id_user column as the primary index will actually be of benefit for you and will speed up your joins and selects.
What's the cardinality between the user and user_blog? If it's 1:1, why do you need an id field in the user_blog table?
Is it OK to use id as an auto-increment column also on the join table (users_blog),
or will I have problems with query speed?
Whether a field is auto-increment or not has no impact on how quickly you can retrieve data that is already in the database.
Also, I would like to know which fields to add as PRIMARY and which as
INDEX in the users_blog table?
The purpose of PRIMARY KEY (and other constraints) is to enforce the correctness of data. Indexes are "just" for performance.
So what fields will be in PRIMARY KEY depends on what you wish to express with your data model:
If a users_blog row is identified with the id alone (i.e. there is a "non-identifying" relationship between these two tables), put id alone in the PRIMARY KEY.
If it is identified by a combination of id_user and id (aka. "identifying" relationship) then you'll have these two fields together in your PK.
As of indexes, that depends on how you are going to access your data. For example, if you do many JOINs you may consider an index on id_user.
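For example, a sketch of the non-identifying choice with an index to support joins (the column types are assumptions):
CREATE TABLE users_blog (
id INT NOT NULL AUTO_INCREMENT,
id_user INT NOT NULL,
blog_title VARCHAR(255),
PRIMARY KEY (id),
KEY ix_users_blog_id_user (id_user) -- supports JOINs on id_user
);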
A good tutorial on index performance can be found at:
http://use-the-index-luke.com
I don't see any problem with having an auto increment id column on users_blog.
The primary key can be id_user, id. As for indexing, this heavily depends on your usage.
I doubt you will be having any database related performance issue with a blog engine though, so indexing or not doesn't make much of a difference.
You don't have to use an id column in the users_blog table; you can join on id_user with the users table. Also, auto-increment is not a problem for performance.
It is a good idea to have an identifier column that is auto increment - this guarantees a way of uniquely identifying the row (in case all other columns are the same for two rows)
id is a good name for all table keys and it's the standard
<table>_id is the standard name for foreign keys - in your case use user_id (not id_user as you have)
mysql automatically creates indexes for columns defined as primary or foreign keys - there is no need to do anything here
IMHO, table names should be singular - ie user not users
Your SQL should look something like:
create table user (
id int not null auto_increment primary key,
...
);
create table user_blog (
id int not null auto_increment primary key,
user_id int not null,
...
foreign key (user_id) references user (id)
);

Maintaining a large table of unique values in MySQL

This is probably a common situation, but I couldn't find a specific answer on SO or Google.
I have a large table (>10 million rows) of friend relationships on a MySQL database that is very important and needs to be maintained such that there are no duplicate rows. The table stores the user's uids. The SQL for the table is:
CREATE TABLE possiblefriends(
id INT NOT NULL AUTO_INCREMENT,
PRIMARY KEY(id),
user INT,
possiblefriend INT)
The way the table works is that each user has around 1000 or so "possible friends" that are discovered and need to be stored, but duplicate "possible friends" need to be avoided.
The problem is that, due to the design of the program, over the course of a day I need to add 1 million rows or more to the table, and they may or may not be duplicate row entries. The simple answer would seem to be to check each row to see if it is a duplicate, and if not, insert it into the table. But this technique will probably get very slow as the table size increases to 100 million rows, 1 billion rows or higher (which I expect it to soon).
What is the best (i.e. fastest) way to maintain this unique table?
I don't need to have a table with only unique values always on hand. I just need it once-a-day for batch jobs. In this case, should I create a separate table that just inserts all the possible rows (containing duplicate rows and all), and then at the end of the day, create a second table that calculates all the unique rows in the first table?
If not, what is the best way for this table long-term?
(If indexes are the best long-term solution, please tell me which indexes to use)
Add a unique index on (user, possiblefriend) then use one of:
INSERT ... ON DUPLICATE KEY UPDATE ...
INSERT IGNORE
REPLACE
to ensure that you don't get errors when you try to insert a duplicate row.
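A sketch of that approach, using the column names from the question:
ALTER TABLE possiblefriends ADD UNIQUE INDEX ux_user_possiblefriend (user, possiblefriend);
INSERT IGNORE INTO possiblefriends (user, possiblefriend) VALUES (42, 99); -- silently skipped if the pair already exists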
You might also want to consider if you can drop your auto-incrementing primary key and use (user, possiblefriend) as the primary key. This will decrease the size of your table and also the primary key will function as the index, saving you from having to create an extra index.
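If you do go that route, the change might look roughly like this (a sketch; it assumes user and possiblefriend contain no NULLs, and rebuilding a table this size takes time and locks, so test it on a copy first):
ALTER TABLE possiblefriends
DROP COLUMN id,
ADD PRIMARY KEY (user, possiblefriend);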
See also:
“INSERT IGNORE” vs “INSERT … ON DUPLICATE KEY UPDATE”
A unique index will let you be sure that the combination is indeed unique. You can add one like so:
CREATE TABLE possiblefriends(
id INT NOT NULL AUTO_INCREMENT,
user INT,
possiblefriend INT,
PRIMARY KEY (id),
UNIQUE INDEX DefUserID_UNIQUE (user ASC, possiblefriend ASC))
This will also speed up your table access significantly.
Your other issue with the mass insert is a little more tricky; you could use the built-in ON DUPLICATE KEY UPDATE clause shown below:
INSERT INTO table (a,b,c) VALUES (1,2,3)
ON DUPLICATE KEY UPDATE c=c+1;
When a row with the duplicate key already exists, that statement behaves like:
UPDATE table SET c=c+1 WHERE a=1;