If I have a primary key of say id and I do a simple query for the key such as,
SELECT id FROM myTable WHERE id = X
Will it find one row and then stop looking as it is a primary key, or would it be better to tell mysql to limit its select by using LIMIT 1? For instance:
SELECT id FROM myTable WHERE id = X LIMIT 1
Does using “LIMIT 1” speed up a query on a primary key?
No. It's already as fast as can be without LIMIT 1. LIMIT 1 is effectively implied anyway.
Will it find one row and then stop looking as it is a primary key
Yes.
No table scan should be necessary at all here: it's a key-based lookup. The matching row is found and that's the end of the procedure.
There is no need to worry about this sort of "optimization".
Only one row will be fetched -- per unique index contract -- and the database is able to very quickly find all the (1) rows. It is able to do this because of the underlying structures that back the index (or primary key) support fast-seeking by value. There is no table-scan involved. (Generally a variant of a B-tree, but it might be hash-based, etc, is used. I suppose a smart query optimizer might also be able to pass down additional hints based on the unique constraint in effect, but I don't know enough about this.)
Related
Here is the table_a schema I have:
Field
type
id(PRIMARY)
bigint
status
tinyint
err_code
bigint
...
...
The sql I want to execute will be:
select * from table_a where id > 123456 and status = -1 and err_code = 100001 order by id asc LIMIT 500
I'd like to query this sql above in real time.
My question is what kind of the index should I use here, I ready create a composite index -- idx_id_status_err_code, but it seems that mysql does not choose it.
There are two possible keys reported by explain statement -- PRIMARY and idx_id_status_err_code, but mysql use primary key instead of idx_id_status_err_code.
Another thing, there are some concurrent write operations, so I add row lock(for update not share mode) to target rows. I'm not sure if these write locks will affect the sql I mentioned above.
Any help is appreciated.
where id > 123456 and status = -1 and err_code = 100001 order by id
needs
INDEX(status, error_code, -- 1st because they are tested with "=", either order
id) -- for range test (>) and for ORDER BY
Since that handles all of the WHERE, GROUP BY, and ORDER BY, the Optimizer can even handle the LIMIT 500, thereby stopping after 500 rows.
When you start an INDEX with the column(s) of the PRIMARY KEY (id), there is little reason for the Optimizer to pick the INDEX instead of simply reaching into the data. This is especially true since you are fetching columns that are not in the index (SELECT *).
Avoid "index hints". What helps today may hurt tomorrow (when the data distribution changes).
You mentioned a "row lock"; let's hear more about why you think you need such. If you are afraid that some other thread will change one of the rows this SELECT picked, then that is better fixed by adding a suitable WHERE to the UPDATE -- to make sure the row still has that status and error_code.
Given the following SQL table :
Employee(ssn, name, dept, manager,
salary)
You discover that the following query is significantly slower than
expected. There is an index on salary, and you have verified that
the query plan is using it.
SELECT *
FROM Employee
WHERE salary = 48000
Please give a possible reason why this query is slower than expected, and provide a tuning solution that
addresses that reason.
I have two ideas for why this query is slower than expected. One is that we are trying to SELECT * instead of SELECT Employee.salary which would slow down the query as we must search across all columns instead of one. Another idea is that the index on salary is non-clustered, and we want to use a clustered index, as the company could be very large and it would make sense to organize the table by the salary field.
Would either of those two solutions speed up this query? I.e. either change SELECT * to SELECT Employee.salary or explicitly set the index on salary to be clustered?
What indexes do you have now?
Is it really "slow"? What evidence do you have?
Comments on "SELECT * instead of SELECT Employee.salary" --
* is bad form because tomorrow you might add a column, thereby breaking any code that is expecting a certain number of columns in a certain order.
Dealing with * versus salary does not happen until after the row(s) is located.
Locating the row(s) is the costly part.
On the other hand, if you have INDEX(salary) and only look at salary then the index is "covering". That means that the "data" (the other columns) does not need to be fetched. Hence, faster. But this is probably beyond what your teacher has told you about yet.
Comments on "the index on salary is non-clustered, and we want to use a clustered index" --
In MySQL (not necessarily in other RDBMSs), InnoDB has exactly one PRIMARY KEY and it is always UNIQUE and "clustered".
That is, "clustered" implies "unique", which seems inappropriate for "salary".
In InnoDB a "secondary key" implicitly includes the column(s) of the PK (ssn?), with which it can reach over into the data.
"verified that the query plan" -- Have you learned about EXPLAIN SELECT ...?
More Tips on creating the optimal index for a given SELECT.
I will try to be as simple as I can be ,
You can not simply make salary a clustered index unless you make it a unique or primary which is kind of both stupid and senseless because two person can have same salary.
There can be only one clustered index per table according to MYSQL documentation. Database by default elects primary key for being clustered index .
If you do not define a PRIMARY KEY for your table, MySQL locates the
first UNIQUE index where all the key columns are NOT NULL and InnoDB
uses it as the clustered index.
To speed up your query I have a few suggestions , go for secondary indexes,
If you want to search a salary by direct value then hash based indexes are a better option, if MYSQL supports that already.
If you want to search a value using greater than , less than or some range ,then B-tree indexes are better choice.
The first option is faster than the second one , but is limited to only equality operator.
Hope it helps.
Is there an implementation of SQL database allowing to select from a table the block of n following records -- in respect of the index -- starting from a specified index, with performance O(n + log tables_size)? But also allowing adding a record in O(log tables_size). If so, how to do it?
I'm probably a dreamer but, is it possible with MySQL?
If id is the primary key on a table, then the following will return in order of the time needed to fetch the records, plus the initial index seek:
select t.*
from t
where id >= SOMEID
order by id
limit <n>;
Adding a record consists of two parts. The first part is finding available space and the second part is inserting into the index. A b-tree index should require O(log table_size) for an insert. If the pages are full and you are only inserting at the end of the table, then finding the right page is constant time.
In other words, if I understand your question correctly, primary key clustered indexes do exactly what you are asking for.
I'm still trying to get my head around the best way to use INDEXES in MySQL. How do you know when to merge them together and when to have them separate?
Below are the indexes from the Wordpress posts table. See how post_name, post_parent and post_author are seperate entries? And then they have type_status_date which is a mixture of 4 fields?
http://img215.imageshack.us/img215/5976/screenshot20120426at431.png
I don't understand the logic behind this? Can anyone enlighten me?
Going to be a bit of a long answer but here we go. Please note I am not going to deal with the differences in database engines here(MyISAM and InnoDB have distinct way of implementing what I am trying to describe)
First thing you have to understand about a index is that it is a separate data structure stored on disk. Normally this is a b-tree data structure containing the column(s) that you have indexed and also contain a pointer to the row in the table(this pointer is normally the primary key).
The only index that is stored with the data is the primary key index. Thus a primary key index IS the table.
Lets assume you have following table definition.
CREATE TABLE `Student` (
`StudentNumber` INT NOT NULL ,
`Name` VARCHAR(32) NULL ,
`Surname` VARCHAR(32) NULL ,
`StudentEmail` VARCHAR(32) NULL ,
PRIMARY KEY (`StudentNumber`) );
Since we have a primary key on StudentID there will be a index containing the primary key and the other columns in the index. If you had to look at the data in the index you would probably see something like this.
1 , John ,Doe ,Jdoe#gmail.com
As you can see this is the table data once again showing you that the primary key index IS the table.
The StudentNumber column is indexed which allows your to effectively search on it the rest of the data is stored with the key. Thus if ran the following query:
SELECT * FROM Student WHERE StudentNumber=1
MySQL would use the primary index to quickly find the row and the read the data stored with the indexed column. Since there is a index MySQL can use the index to do a effective binary seek operation on the b-tree.
Also when it comes to retrieving the data after doing the search MySQL can read the data from the index thus we are using 1 operation in the index to retrieve the data. Now if I ran the following query:
SELECT * FROM Student WHERE Name ='Joe'
MySQL would check if there is a index that it could use to speed the query up. However in my case there is no index on name so MySQL would do a sequential read from the table one row at a time from the first row to the last.
At each row it would evaluate the row against the where clause and return matching row. So basically it reads the primary key index from top to bottom. Remember the primary key index is the table.
If I ran the following statement:
ALTER TABLE `TimLog`.`student`
ADD INDEX `ix_name` (`Name` ASC) ;
ALTER TABLE `TimLog`.`student`
ADD INDEX `ix_surname` (`Surname` ASC) ;
MySQL would create new indexes on the Student table. This will be stored away from the table on disk and the data inside would look something like this:
Data in ix_Name
John, 1 <--PRIMARY KEY VALUE
Data in ix_Surname
Doe, 1 <--PRIMARY KEY VALUE
Notice the data in the ix_Name index is the name and the primary key value. Great so if I ran the previous select statement MySQL would then read the ix_name index and get the primary key value for matching items and then use the primary key index to get the rest of the data.
So the number of operations to get the data from the index is 2. The matching rows are found in the index and then a lookup happens on the primary key to get the row data out.
You now have the following query:
SELECT * FROM Student WHERE Name='John' AND surname ='Doe'
Here MySQL cant use both indexes as it would be a waste of operations. If MySQL had to use both indexes in this query the following would happen(this should not happen).
1 Find in the ix_Name the rows with the value John
2 Read the primary key that matches to get the row data
3 Store the matching results
4 Find in the ix Surname the rows with the value Doe
5 Read the primary key that matches to get row data.
6 Store the matching results
7 Take the Name results and Surname results and merge them
8 Return query results.
This is really a waste of IO as MySQL would then read the table twice. Basically using one index would be better than trying to use two(I will explain in a momnet why). MySQL will choose 1 index to use in a this simple query.
So how does MySQL decide on which index to use?
MySQL keeps statistics around indexes internally. These statistics tell MySQL basically how unique a index is. So for the sake of argument lets say the surname index (ix_surname)was more unique than the name index(ix_name) MySQL would use the surname index (ix_surname).
Thus query retrieval would be like this:
1 Use the ix_surname and find rows that match the value Doe
2 Read the primary key and apply the filter for the value John on the actual column data in the row.
3 Return the matched row.
As you can see the number of operations in this search is much less. I have over simplified a lot of the technical detail. Indexing is a interesting thing to master but you have to look at it from the perspective of how do I get the data with the minimal amount of IO.
Hope it is as clear as mud now!
MySQL cannot normally use more than one index at a time. That means, for instance, that when you have a query that filters or sorts on two fields you put them both into the same index.
WordPress likely has a common query that filters and/or sorts on post_type, post_status and post_date. Making an educated guess as to what they stand for, this would likely be the core query for WordPress's Post listing pages. So the three fields are put into the same index.
Assume I have this table:
create table table_a (
id int,
name varchar(25),
address varchar(25),
primary key (id)
) engine = innodb;
When I run this query:
select * from table_a where id >= 'x' and name = 'test';
How will MySQL process it? Will it pull all the id's first (assume 1000 rows) then apply the where clause name = 'test'?
Or while it looks for the ids, it is already applying the where clause at the same time?
As id is the PK (and no index on name) it will load all rows that satisfy the id based criterion into memory after which it will filter the resultset by the name criterion. Adding a composite index containing both fields would mean that it would only load the records that satisfy both criteria. Adding a separate single column index on the name field may not result in an index merge operation, in which case the index would have no effect.
Do you have indexes on either column? That may affect the execution plan. The other thing is one might cast the 'x'::int to ensure a numeric comparison instead of a string comparison.
For the best result, you should have a single index which includes both of the columns id and name.
In your case, I can't answer the affect of the primary index to that query. That depends on DBMS's and versions. If you really don't want to put more index (because more index means slow write and updates) just populate your table with like 10.000.000 random results, try it and see the effect.
you can compare the execution times by executing the query first when the id comes first in the where clause and then interchange and bring the name first. to see an example of mysql performance with indexes check this out http://www.mysqlperformanceblog.com/2006/06/02/indexes-in-mysql/
You can get information on how the query is processed by running EXPLAIN on the query.
If the idea is to optimize that query then you might want to add an index like:
alter table table_a add unique index name_id_idx (name, id);