I have a table in MySQL with two columns
id int(11) unsigned NOT NULL AUTO_INCREMENT,
B varchar(191) CHARACTER SET utf8mb4 DEFAULT NULL,
The id being the PK.
I need to do a lookup in a query using either one of these. id in (:idList) or B in (:bList)
Would this query perform better if, there is a composite index with these two columns in them?
No, it will not.
Indexes can be used to look up values from the leftmost columns in an index:
MySQL can use multiple-column indexes for queries that test all the columns in the index, or queries that test just the first column, the first two columns, the first three columns, and so on. If you specify the columns in the right order in the index definition, a single composite index can speed up several kinds of queries on the same table.
So, if you have a composite index on id, B fields (in this order), then the index can be used to look up values based on their id, or a combination of id and B values. But cannot be used to look up values based on B only. However, in case of an or condition that's what you need to do: look up values based on B only.
If both fields in the or condition are leftmost fields in an index, then MySQL attempts to do an index merge optimisation, so you may actually be better off having separate indexes for these two fields.
Note: if you use innodb table engine, then there is no point in adding the primary key to any multi column index because innodb silently adds the PK to every index.
For OR I dont think so.
Optimizer will try to find a match in the first side, if fail will try the second side. So Individual index for each search will be better.
For AND a composite index will help.
MySQL index TIPS
Of course you can always add the index and compare the explain plan.
MySQL Explain Plan
The trick for optimizing OR is to use UNION. (At least, it works well in some cases.)
( SELECT ... FROM ... WHERE id IN (...) )
UNION DISTINCT
( SELECT ... FROM ... WHERE B IN (...) )
Notes:
Need separate indexes on id and B.
No benefit from any composite index (unless it is also "covering").
Change DISTINCT to ALL if you know that there won't be any rows found by both the id and B tests. (This avoids a de-dup pass.)
If you need ORDER BY, add it after the SQL above.
If you need LIMIT, it gets messier. (This is probably not relevant for IN, but it often is with ORDER BY.)
If the rows are 'wide' and the resultset has very few rows, it may be further beneficial to do
Something like this:
SELECT t...
FROM t
JOIN (
( SELECT id FROM t WHERE id IN (...) )
UNION DISTINCT
( SELECT id FROM t WHERE B IN (...) )
) AS u USING(id);
Notes:
This needs PRIMARY KEY(id) and INDEX(B, id). (Actually there is no diff, as Michael pointed out.)
The UNION is cheaper here because of collecting only id, not the bulky columns.
The SELECTs in the UNION are faster because you should be able to provide "covering" indexes.
ORDER BY would go at the very end.
Related
The problem I have is the following:
I have a table that contains about 100000000 rows
it has 22 fields - some numeric, some text
it has a primary key id (auto-incremented integer)
it has a field another_id of type bigint, and a unique key on it
it has a field called state that can take only 4 integer values (0 to 3)
I need that the queries of the following form are executed as fast as possible:
SELECT COUNT(*)
FROM my_table
WHERE another_id IN ( <about 100 values> )
AND state = ...
for different values of state.
How should the index look like? I was thinking about two options:
KEY another_id:state (another_id, state)
KEY state:another_id (state, another_id)
Is there any difference in performance between those two variants? Is there anything else to consider?
Edit: engine is InnoDB
For the query you show, you should create the index with state, another_id in that order.
Define the index with any columns referenced in equality conditions first, after them add one column referenced in a range condition or ORDER BY or GROUP BY.
You may also like my answer to Does Order of Fields of Multi-Column Index in MySQL Matter or my presentation How to Design Indexes, Really, or the video.
I agree with the answer above. One clarification though is that you want to have ita hash index not btree index. It should work faster. The hash index wouldn't work well with any queries that involve inequality such as <=
Step 1:
I am creating a simple table.
CREATE TABLE `indexs`.`table_one` (
`id` INT NOT NULL AUTO_INCREMENT,
`name` VARCHAR(45) NULL,
PRIMARY KEY (`id`));
Step 2:
I make two inserts into this table.
insert into table_one (name) values ("B");
insert into table_one (name) values ("A");
Step 3:
I make a select, I get a table, the records in which are ordered by id.
SELECT * FROM table_one;
This is the expected result, because in mysql the primary key is a clustered index, therefore the data will be physically ordered by it.
Now the part I don't understand.
Step 4:
I am creating an index on the name column.
CREATE INDEX index_name ON table_one(name)
I repeat step 3 again, but I get a different result. The lines are now ordered according to the name column.
Why is this happening? why the order of the rows in the table changes in accordance with the new index on the name column, because as far as I understand, in mysql, the primary key is the only clustered index, and all indexes created additionally are secondary.
I make a select, I get a table, the records in which are ordered by id. [...] This is the expected result, because in mysql the primary key is a clustered index, therefore the data will be physically ordered by it.
There is some misunderstanding of a concept here.
Table rows have no inherent ordering: they represent unordered set of rows. While the clustered index enforces a physical ordering of data in storage, it does not guarantee the order in which rows are returned by a select query.
If you want the results of the query to be ordered, then use an order by clause. Without such clause, the ordering or the rows is undefined: the database is free to return results in whichever order it likes, and results are not guaranteed to be consistent over consecutive executions of the same query.
select * from table_one order by id;
select * from table_one order by name;
(GMB explains most)
Why is this happening? why the order of the rows in the table changes in accordance with the new index on the name column
Use EXPLAIN SELECT ... -- it might give a clue of what I am about to suggest.
You added INDEX(name). In InnoDB, the PRIMARY KEY column(s) are tacked onto the end of each secondary index. So it is effectively a BTree ordered by (name,id) and containing only those columns.
Now, the Optimizer is free to fetch the data from the index, since it has everything you asked for (id and name). (This index is called "covering".)
Since you did not specify an ORDER BY, the result set ordering is valid (see GMB's discussion).
Moral of the story: If you want an ordering, specify ORDER BY. (The Optimizer is smart enough to "do no extra work" if it can see how to provide the data without doing a sort.
Further experiment: Add another column to the table but don't change the indexes. Now you will find SELECT * FROM t is ordered differently than SELECT id, name FROM t. I think I have given you enough clues to predict this difference, if not, ask.
Suppose the table T has three columns,
id int not null auto_increment,
my_id int,
name varchar(200).
and the query is "select * from T where my_id in (var_1, var_2, ..., var_n) and name = 'name_var'".
Is there any performance difference between below two indices?
Index1: (my_id, name)
Index2: (name, my_id).
•Index1: (my_id, name)
•Index2: (name, my_id).
Yes, above two would slightly differ when it comes to query performance.
Always, the leftmost fields are the most important in determining the efficiency and selectivity of an index.
index should be built on the column(s) which are frequently used in the WHERE, ORDER BY, and GROUP BY clauses.
Hope this helps!
In a composite index, the column to be searched should appear first. So, if you are searching for a set of id values, you'll want id to show up first in the index.
But if id is the primary key and you're using a SELECT * clause to retrieve the whole row, it doesn't make sense to add another index. The way tables are organized, all the data of the row appears clustered with each id value. So just use the index on the primary key.
tl;dr: neither (id,name) nor (name,id) will help this query.
In general, it is best to start the INDEX with the WHERE clauses with col = const. One range can come last. IN is sort of like =, sort like a range. Hence, this is best:
INDEX(name, id)
Think of it this way. The index is an ordered list. With this index, it will start at the name=... and then have to scan or leapfrog through all the ids with that name.
I suspect the PRIMARY KEY is not (id). If it were, why would you be checking the name?
I am new to MySQL, and I need to add indexes on an existing table (which contains roughly 200K rows).
Table mytable: (id:integer, created_time:timestamp, deleted_time:timestamp)
I have 2 queries which need to benefit from the index:
select s.id from mytable s
where s.completed_time is not null
and s.completed_time < ?
and ( s.deleted_time is null
or s.deleted_time >= ? );
and :
select s.id from mytable s
where
s.completed_time is not null
and (
( s.deleted_time is not null
and s.deleted_time >= ?
and s.deleted_time < ? )
or ( s.completed_time >= ?
and s.completed_time < ? ) ) ;
I am considering introducing a multi column index (on completed_time and deleted_time)
However, I am not sure if the condition "s.completed_time is not null" matches the criteria to make these queries use the composite index.
Do you have any thoughts about what is best (composite index or 2 indexes)? I am trying to use "explain" to figure out what's best but I am unsure on how to interpret the results.
And more generally: with a table having a composite index on (column1, column2), I understand that filtering on column2 only will not use the index.
But what if I introduce a dummy condition like (column1 > MIN_VALUE), or (column1 is not null) when it is correct to do so?
Thanks!
Assuming the two queries you mentionned will be "frequently" used, I would advise a composite index versus two distinct indexes on two columns.
As you already know, a query searching on two columns might sometimes use two separate indexes by (roughly) merging these two indexes into one. But this is sub-optimal, and has a cost in terms of performance.
Conversely, a composite index can only be used if the left-most columns are involved in the seach condition, or as the manual puts it:
MySQL can use multiple-column indexes for queries that test all the
columns in the index, or queries that test just the first column, the
first two columns, the first three columns, and so on
With regards to your suggested hack (introducing dummy conditions so as to be able to use the index), this might work, but I would rather advise creating a second index on column2 only (besides the two-column index on (column1, column2)). This comes at a (minor) cost, but is so much more elegant and reusable.
As for the suggestion of getting rid of NULL values, I strongly disagree. It is sematnically incorrect to use 0. 0 means "zero", NULL means "no value". All your tests would need to account for this special value, whereas IS NULL is standard and well understood everywhere. It is also just impractical in some situations (try to insert 0 with SQL_MODE='TRADITIONAL').
On the other hand, the performance gain is dubious (I believe this is mostly based on the false assumption that NULL values are not indexed). It is easy to verify that a query like s.completed_time IS NOT NULL will hit an index if such an index exists.
Provided you get rid of the nulls as mentioned by a user. A composite index might work for the first query.
But the second query the index may not work. - As it has an OR between your indexed columns.
Usually it goes by the Left-Most columns in the index and in that order.
I suggest creating separate indexes. Though it has an overhead (May be) of http://dev.mysql.com/doc/refman/5.0/en/index-merge-optimization.html
I need to retrieve columns from two tables and I have used an INNER JOIN. But its consuming lot of time during loading the page. Is there any better and faster way to achieve the same?
Select P.Col1, P.Col2, P.Col3, P.Col4, P.Col5, C.Col1, C.Col2, C.Col3 from Pyalers P inner join Customers C on C.Col1 = P.Col1 where P.Col2 = 5
Thanks in Advance.
Without knowing your DDL, there's no way to say.
But conceptually this is ok, just be sure you have proper indexs sets.
For instance: (is your table name really 'Pyalers'? Assuming 'players')
CREATE INDEX idx_players ON `players` (col1);
CREATE INDEX idx_customers ON `customers` (col1);
use the columns you need for joinning the 2 tables.
http://dev.mysql.com/doc/refman/5.0/en/create-index.html
You're doing it the right way, but if you don't have indexes on your tables on the correct columns, it's not going to be very fast for tables of any size. Do Pyalers.col1 and Customers.col1 both have indexes on them?
Show us how the tables are defined.
Be sure your table has the needed indexes... as a "thumb rule", every field which is used for search (WHERE) or data joins (INNER JOIN, LEFT JOIN, RIGHT JOIN) should be indexed.
Example: If you are creating a table, you can add your indexes at that time (notice that your tables should always have a primary key):
CREATE TABLE myTable (
myId int unsigned not null,
someField varchar(50),
primary key (myId),
index someIdx(someField)
);
If your table already exists, and you want to add indexes, you need to use the ALTER statement:
ALTER TABLE myTable
ADD INDEX someIdx(someField),
ADD PRIMARY KEY (myId);
Rules:
To define an index you most provide a unique name for it, and specify the fields included in the index: INDEX myIndex(field1, field2, ...)
There are different types of indexes: PRIMARY KEY is used for primary keys (that's obvious, huh?); INDEX is an 'ordinary index', just used to speed up search and join operations; UNIQUE INDEX is an index that prevents duplicate values.
Recomendations:
Whenever you can, index all numeric and date fields that are relevant (ids, birth date, etc.). Avoid creating indexes on fields that contain 'double' values.
Don't abuse of indexes, because abuse can create very large index files.
Tips:
If you want to see how your query will be executed, you can use the EXPLAIN statement:
EXPLAIN SELECT a., b. FROM a INNER JOIN b on a.myId = b.otherId
This instruction will show you the execution plan of the query. If in the last column you see 'file sort' or 'using temporary', you may (just may) need aditional indexes (notice that if you use GROUP BY you will almost always get the 'using temporary' message)
Hope this help you