MySQL multicolumn index

Should I include col3 & col4 in my index on MyTable if this is the only query I intend to run on my database?
Select MyTable.col3, MyTable.col4
From MyTable
Inner Join MyOtherTable
On MyTable.col1 = MyOtherTable.col1
And MyTable.col2 = MyOtherTable.col2;
The tables I'm using have about half a million rows in them. For the purposes of my question, col1 & col2 are a unique set found in both tables.
Here's the example table definition if you really need to know:
CREATE TABLE MyTable
(col1 varchar(10), col2 varchar(10), col3 varchar(10), col4 varchar(10));
CREATE TABLE MyOtherTable
(col1 varchar(10), col2 varchar(10));
So, should it be this?
CREATE INDEX MyIdx ON MyTable (col1,col2);
Or this?
CREATE INDEX MyIdx ON MyTable (col1,col2,col3,col4);

Adding columns col3 and col4 will not help, because you're just pulling those values after finding the rows using columns col1 and col2. The speed would normally come from making sure columns col1 and col2 are indexed.
You should actually split those indexes since you're not using them together:
CREATE INDEX MyIdx1 ON MyTable (col1);
CREATE INDEX MyIdx2 ON MyTable (col2);
I don't think a combined index will help you in this case.
CORRECTION: I think I've misspoken, since you intend to use only that query on the two tables and never have the individual columns joined in isolation. In your case it appears you could get some speed up by putting them together. It would be interesting to benchmark this to see just how much of a speedup you'd see on 1/2 million rows using a combined index versus individual ones. (You should still not use columns col3 and col4 in the index, since you're not joining anything by them.)

A query returning half a million rows joined from two tables is never going to be very fast - because it's returning half a million rows.
An index on col1,col2 seems sufficient (as a secondary index), but depending on what other columns you have, adding (col3,col4) might make it a covering index.
In InnoDB it might be better to make (col1,col2) the primary key; InnoDB will then cluster the table on it, which is something of a win.
But once again, if your query joins 500,000 rows with no other WHERE clause, and returns 500,000 rows, it's not going to be fast, because it needs to fetch all of the rows to return them.
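One way to check whether the wider index really acts as a covering index is to create it and look at the EXPLAIN output. This is only a minimal sketch, not a benchmark: the index name MyCoveringIdx is made up for illustration, and the plan MySQL chooses depends on the server version and on the table statistics.

```sql
-- Hypothetical index name; the table and columns come from the question.
CREATE INDEX MyCoveringIdx ON MyTable (col1, col2, col3, col4);

-- If the Extra column for MyTable shows "Using index", the query is
-- answered from the index alone, without reading the table rows.
EXPLAIN
SELECT MyTable.col3, MyTable.col4
FROM MyTable
INNER JOIN MyOtherTable
    ON MyTable.col1 = MyOtherTable.col1
   AND MyTable.col2 = MyOtherTable.col2;
```

Run the same EXPLAIN with only the (col1,col2) index in place to compare the two plans before deciding which index to keep.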

I don't think anyone else mentioned it, so I'm adding that you should have a compound (col1,col2) index on both tables:
CREATE INDEX MyIdx ON MyTable (col1,col2);
CREATE INDEX MyOtherIdx ON MyOtherTable (col1,col2);
And another point. An index on (col1,col2,col3,col4) will be helpful if you ever need to use a DISTINCT variation of your query:
Select DISTINCT
MyTable.col3, MyTable.col4
From MyTable
Inner Join MyOtherTable
On MyTable.col1 = MyOtherTable.col1
And MyTable.col2 = MyOtherTable.col2;

Related

MySQL update query using indexed column

I have a table table1 and the columns are like this:
id - int
field1 - varchar(10)
field2 - varchar(10)
totalMarks - int
Also, I have created an index using the column field1.
create index myindex1 on table1 (field1);
I need to update the table entries, either using field1 or field2.
UPDATE table1 SET totalMarks = 1000 WHERE field1='somevalue';
or
UPDATE table1 SET totalMarks = 1000 WHERE field2='somevalue';
Which update query will have good performance? Since we have created an index using field1, will it have a good performance if we use field1 in the where clause?
A simple update statement with a where clause with an equality comparison on an indexed column should use the index for the where clause.
This will not always improve performance, but it usually will on larger tables. On very small tables where the data all fits on a single data page, the engine needs to load both the index and the data page into memory, which can actually be a wee bit slower than just looking for the row on a given page. This is an edge case.
I would recommend using the version with the indexed column.
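To see which access path each statement gets, EXPLAIN can be put in front of the UPDATE (supported in MySQL 5.6.3 and later). A minimal sketch using the table and index from the question; the exact output depends on your data and server version.

```sql
-- Expected to list myindex1 in the "key" column, since field1 is indexed.
EXPLAIN UPDATE table1 SET totalMarks = 1000 WHERE field1 = 'somevalue';

-- There is no index on field2, so this one cannot look rows up by
-- field2 and has to scan the table instead.
EXPLAIN UPDATE table1 SET totalMarks = 1000 WHERE field2 = 'somevalue';
```

Comparing the "rows" estimate of the two plans gives a rough idea of how much work each statement does.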

solving latency in mysql update queries

so I have written a query as follows:
UPDATE
table1 latest, table2 previous
SET latest.col1 = previous.col1
WHERE latest.col2 = previous.col2 and previous.col1 is not null;
which copies the value of col1 from table2 to table1 wherever the value of col2 matches. However, due to the context, there can be no primary/foreign key constraints involved, and col2 doesn't contain nulls but col1 does (in both tables).
However, this query takes several minutes to execute! Is there a way to speed it up?
Fixed by adding indexes to both tables. I created indexes on the common column on which the tables were being joined. Wherever lookups are performed, whether through joins or otherwise, the columns being joined or looked up should have indexes.
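The fix described above might look roughly like this. The index names are invented for the example, and the UPDATE is the question's statement rewritten with an explicit JOIN, which should behave the same as the comma-separated form.

```sql
-- Hypothetical index names; col2 is the column both tables are joined on.
CREATE INDEX idx_table1_col2 ON table1 (col2);
CREATE INDEX idx_table2_col2 ON table2 (col2);

-- Same update as in the question, written as an explicit join.
UPDATE table1 AS latest
JOIN table2 AS previous
    ON latest.col2 = previous.col2
SET latest.col1 = previous.col1
WHERE previous.col1 IS NOT NULL;
```

With the indexes in place, MySQL can look up matching rows by col2 instead of comparing every row of one table with every row of the other.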

Unique first column in multi-column index

I have multi-column index for 2 columns. Can I make first column unique without making separate index for that?
If I understand correctly mysql can use only first column in this index for lookups, so can it use it to detect uniqueness?
The short answer is "No". Because it doesn't make much sense.
Indeed, MySQL is able to use a multiple-column index for operations that use only the leftmost "n" columns from the index definition.
Let's say you have an index on columns (col1, col2). MySQL can use it to find records matching conditions on both col1 and col2, GROUP BY col1, col2 or ORDER BY col1, col2. It is important to notice that col1 and col2 need to be used in this order in the GROUP BY or ORDER BY clause. Their order doesn't matter in WHERE or ON clauses as long as both are used.
MySQL can also use the same index for WHERE or ON conditions and GROUP BY or ORDER BY clauses that contain only col1. It cannot, however, use the index if col2 appears without col1.
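The leftmost-prefix rule described above can be tried out with a small sketch like the following. The table t, its columns and the literal values are assumptions made up for the example; load a realistic amount of data before trusting the plans EXPLAIN reports.

```sql
-- Hypothetical table with a two-column index.
CREATE TABLE t (
    id   INT PRIMARY KEY,
    col1 INT,
    col2 INT,
    note VARCHAR(20),
    INDEX idx_col1_col2 (col1, col2)
);

-- Can use the index: both columns, in either order in the WHERE clause.
EXPLAIN SELECT * FROM t WHERE col1 = 1 AND col2 = 2;
EXPLAIN SELECT * FROM t WHERE col2 = 2 AND col1 = 1;

-- Can use the index: col1 alone is a leftmost prefix.
EXPLAIN SELECT * FROM t WHERE col1 = 1;

-- Cannot use the index for a lookup: col2 appears without col1.
EXPLAIN SELECT * FROM t WHERE col2 = 2;

-- Index order matches ORDER BY col1, col2 but not ORDER BY col2, col1.
EXPLAIN SELECT col1, col2 FROM t ORDER BY col1, col2;
EXPLAIN SELECT col1, col2 FROM t ORDER BY col2, col1;
```

In each result, the "key" column shows which index (if any) was chosen, and "Using filesort" in Extra indicates that the index order could not be used for sorting.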
What happens when you have an index on columns (col1, col2) and all the rows have distinct values in column col1?
Let's assume we have a table that has distinct values in column col1 and an index on columns (col1, col2). When MySQL needs to find the rows that match WHERE col1 = val1 AND col2 = val2, it can consult the index to find the row that has col1 = val1. It doesn't need to use the index to refine the list of candidate rows because there is no list: there is at most one row having col1 = val1.
Sure, most of the time MySQL will use the index to check whether col2 = val2, but having col2 in this index doesn't add any useful information to it. The storage space it takes and the processing power it costs on table data updates are too big for the tiny contribution it makes to row lookups.
The whole purpose of having indexes on multiple columns is to help searching by shrinking the list of matching rows for a given set of values when the columns included in a multiple-column index cannot be used individually because they don't contain enough distinct values.
Technically speaking, there is no way to tell MySQL you want to have a multiple-column index on (col1, col2) that must have unique values on col1. Create a UNIQUE INDEX on col1 instead. Then think about the data you have in the table and the queries you run against it, and decide whether another index on col2 alone would be better than the multiple-column index on (col1, col2).
In order to decide, you can create the new indexes (UNIQUE on col1, INDEX on col2), put EXPLAIN in front of the most frequent queries you run on the table and check which index MySQL picks.
You need to have enough data (thousands of rows, at least, more is better) in the table to get accurate results.
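A sketch of that procedure, assuming a table called mytable with numeric col1 and col2 (the table name, index names and literal values are placeholders, not taken from the question):

```sql
-- Enforces uniqueness on col1; a multiple-column index cannot do this.
CREATE UNIQUE INDEX uq_col1 ON mytable (col1);

-- Separate index for queries that filter on col2 only.
CREATE INDEX idx_col2 ON mytable (col2);

-- The "key" column of the output names the index MySQL chose.
EXPLAIN SELECT * FROM mytable WHERE col1 = 1 AND col2 = 2;
EXPLAIN SELECT * FROM mytable WHERE col2 = 2;
```

Note that CREATE UNIQUE INDEX fails if col1 already contains duplicate values, so any duplicates have to be cleaned up first.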
You asked:
I have multi-column index for 2 columns. Can I make first column unique without making separate index for that?
The answer is no. You need a separate unique index on the first column to enforce a uniqueness constraint.

Would WHERE col1 and ORDER BY col2 use a composite key on (col1,col2)?

I have a database table (potentially huge, with hundreds of millions of records in the future) on which I would execute the following query very often:
select *
from table1
where col1 = [some number]
order by col2
Obviously having an index on "col1" would make it run fast. col1 is not unique, so many rows (2000+ I expect) would be returned.
Does it make sense to create an index on (col1, col2)? Would MySQL use it for this query?
Also, if I just query without "order by" part, would this index be used as well for the "where" part?
Yes, it will help. MySQL will use the composite index, with the first part serving the WHERE clause and the second part the ORDER BY. You can read about ORDER BY optimization here: http://dev.mysql.com/doc/refman/5.5/en/order-by-optimization.html
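This can be verified with EXPLAIN. In the sketch below the index name is hypothetical and 42 stands in for the question's [some number]; actual plans depend on the data and the MySQL version.

```sql
-- Hypothetical index name on the table from the question.
CREATE INDEX idx_col1_col2 ON table1 (col1, col2);

-- If Extra does not show "Using filesort", the rows are read from the
-- index already sorted by col2 within the matching col1 value.
EXPLAIN SELECT * FROM table1 WHERE col1 = 42 ORDER BY col2;

-- Without ORDER BY, the same index is still usable for the WHERE part,
-- because col1 is its leftmost column.
EXPLAIN SELECT * FROM table1 WHERE col1 = 42;
```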

Delete many rows in MySQL

I am deleting rows on the order of hundreds of thousands from a remote DB. Each delete has its own target, e.g.:
DELETE FROM tablename
WHERE (col1=c1val1 AND col2=c2val1) OR (col1=c1val2 AND col2=c2val2) OR ...
This has been almost twice as fast for me as individual queries, but I was wondering if there's a way to speed this up more, as I haven't been working with SQL very long.
Create a temporary table and fill it with all your value pairs, one per row. Name the columns the same as the matching columns in your table.
CREATE TEMPORARY TABLE donotwant (
col1 INT NOT NULL,
col2 INT NOT NULL,
PRIMARY KEY (col1, col2)
);
INSERT INTO donotwant VALUES (c1val1, c2val1), (c1val2, c2val2), ...
Then execute a multi-table delete based on the JOIN between these tables:
DELETE t1 FROM `tablename` AS t1 JOIN `donotwant` USING (col1, col2);
The USING clause is shorthand for ON t1.col1=donotwant.col1 AND t1.col2=donotwant.col2, assuming the columns are named the same in both tables, and you want the join condition where both columns are equal to their namesake in the joined table.
Generally speaking, the fastest way to do bulk DELETEs is to put the ids to be deleted into a temp table of some sort, then use that as part of the query:
DELETE FROM tablename
WHERE (col1, col2) IN (SELECT col1, col2
FROM tmp);
Inserting can be done via a standard:
INSERT INTO tmp VALUES (...), (...), ...;
statement, or by using the DB's bulk-load utility.
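In MySQL the bulk-load route is LOAD DATA. A minimal sketch, assuming the value pairs have been written to a comma-separated file; the file name pairs.csv is made up, and LOCAL only works when local_infile is enabled on both client and server.

```sql
-- pairs.csv is a hypothetical file with one "col1,col2" pair per line.
LOAD DATA LOCAL INFILE 'pairs.csv'
INTO TABLE tmp
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(col1, col2);
```

For a remote database this sends the file in one pass instead of building a very long INSERT statement.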
I doubt it makes much difference to performance but you can write that kind of thing this way...
DELETE
FROM tablename
WHERE (col1,col2) IN (('c1val1','c2val1'),('c1val2','c2val2'), ...);