I have a single table with a few million rows. Hundreds of clients access this table simultaneously; each one needs to get 20 unique rows, which then need to be placed last in line.
My setup is:
Table structure:
id | last_access | reserved_id | [Data columns]
id + last_access is indexed
For selecting 20 unique rows I use the following:
UPDATE "table" SET "reserved" = 'client-id_timestamp' WHERE "reserved" = '' ORDER BY "last_access" ASC LIMIT 20
This UPDATE query performs quite badly, which is why I ask:
Is there a better solution for my specific requirements? Another table structure perhaps?
Is last_access a date column? Try expressing it as an integer value (i.e. seconds since 1970-01-01); it might be faster to sort.
A second performance issue might come from the need to update the index after you change the "reserved" field. Performance might improve if you remove the index from that column: although the search will take longer, the more expensive index maintenance is taken out of the equation.
If you are using MySQL 5.6.3 or newer, you can run EXPLAIN on your UPDATE query to see how MySQL plans to execute it, for example which index it uses and how many rows it expects to examine.
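One way to experiment with the reserve-then-requeue idea end to end is the sketch below. It uses Python's built-in sqlite3 module as a stand-in for MySQL, so locking behaviour and query plans will differ from a real MySQL server. The table name queue, the composite index on (reserved, last_access), and the helper names are assumptions made for illustration; they are not part of the original question.

```python
import sqlite3


def make_queue(n_rows=100):
    """Create an in-memory table shaped like the one in the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE queue ("
        " id INTEGER PRIMARY KEY,"
        " last_access INTEGER NOT NULL,"
        " reserved TEXT NOT NULL DEFAULT '')"
    )
    # Assumption: one composite index serving both the filter and the sort.
    conn.execute("CREATE INDEX idx_reserved_access ON queue (reserved, last_access)")
    conn.executemany(
        "INSERT INTO queue (id, last_access) VALUES (?, ?)",
        [(i, i) for i in range(1, n_rows + 1)],
    )
    conn.commit()
    return conn


def claim_rows(conn, token, limit=20):
    """Reserve the `limit` least recently accessed free rows for `token`."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(
            "UPDATE queue SET reserved = ? WHERE id IN ("
            " SELECT id FROM queue WHERE reserved = ''"
            " ORDER BY last_access ASC, id ASC LIMIT ?)",
            (token, limit),
        )
    rows = conn.execute(
        "SELECT id FROM queue WHERE reserved = ? ORDER BY last_access, id",
        (token,),
    )
    return [row[0] for row in rows]


def release_rows(conn, token, now):
    """Free the rows held by `token` and move them to the back of the line."""
    with conn:
        conn.execute(
            "UPDATE queue SET reserved = '', last_access = ? WHERE reserved = ?",
            (now, token),
        )


conn = make_queue(100)
first = claim_rows(conn, "client-a_1")
second = claim_rows(conn, "client-b_1")
print(first[0], first[-1], second[0], second[-1])
```

Each client gets a disjoint set of rows because the reservation token is written in the same statement that picks the rows; releasing with a newer last_access value is what puts the rows last in line.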
I have a table with a large number of records ( > 300,000). The most relevant fields in the table are:
CREATE_DATE
MOD_DATE
Those are updated every time a record is added or updated.
I now need to query this table to find the date of the record that was modified last. I'm currently using
SELECT mod_date FROM table ORDER BY mod_date DESC LIMIT 1;
But I'm wondering if this is the most efficient way to get the answer.
I've tried adding a where clause to limit the date to the last month, but it looks like that's actually slower (and I need the most recent date, which could be older than the last month).
I've also tried the suggestion I read elsewhere to use:
SELECT UPDATE_TIME
FROM information_schema.tables
WHERE TABLE_SCHEMA = 'db'
AND TABLE_NAME = 'table';
But since I might be working on a dump of the original, that query might return NULL. And it looks like this is actually slower than the original query.
I can't resort to last_insert_id() because I'm not updating or inserting.
I just want to make sure I have the most efficient query possible.
The most efficient way for this query would be to use an index for the column MOD_DATE.
From the MySQL manual, section 8.3.1, How MySQL Uses Indexes:
Indexes are used to find rows with specific column values quickly.
Without an index, MySQL must begin with the first row and then read
through the entire table to find the relevant rows. The larger the
table, the more this costs. If the table has an index for the columns
in question, MySQL can quickly determine the position to seek to in
the middle of the data file without having to look at all the data. If
a table has 1,000 rows, this is at least 100 times faster than reading
sequentially.
You can use
SHOW CREATE TABLE tbl_name;
to get the CREATE statement and see whether an index on MOD_DATE is defined.
To add an Index you can use
CREATE INDEX
CREATE [UNIQUE|FULLTEXT|SPATIAL] INDEX index_name
[index_type]
ON tbl_name (index_col_name,...)
[index_option]
[algorithm_option | lock_option] ...
see http://dev.mysql.com/doc/refman/5.6/en/create-index.html
Make sure that both of those fields are indexed.
Then I would just run -
select max(mod_date) from table
or create_date, whichever one.
Make sure to create 2 indexes, one on each date field, not a compound index on both.
As for a discussion of the difference between this and using limit, see MIN/MAX vs ORDER BY and LIMIT
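To compare the two query shapes side by side, here is a small sketch. It runs against SQLite through Python's sqlite3 module rather than MySQL, so the plan text is SQLite's EXPLAIN QUERY PLAN output and not MySQL's EXPLAIN; the table name records, the index names, and the sample dates are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (id INTEGER PRIMARY KEY, create_date TEXT, mod_date TEXT)"
)
# Two separate single-column indexes, as suggested above.
conn.execute("CREATE INDEX idx_create_date ON records (create_date)")
conn.execute("CREATE INDEX idx_mod_date ON records (mod_date)")
conn.executemany(
    "INSERT INTO records (create_date, mod_date) VALUES (?, ?)",
    [(f"2014-01-{d:02d}", f"2014-02-{d:02d}") for d in range(1, 29)],
)


def plan(sql):
    """Return SQLite's query plan for `sql` as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql)
    return " | ".join(row[3] for row in rows)  # column 3 holds the plan detail


q_max = "SELECT MAX(mod_date) FROM records"
q_limit = "SELECT mod_date FROM records ORDER BY mod_date DESC LIMIT 1"

latest_max = conn.execute(q_max).fetchone()[0]
latest_limit = conn.execute(q_limit).fetchone()[0]

print(latest_max, latest_limit)
print(plan(q_max))
print(plan(q_limit))
```

Both queries return the same date, and in this sketch both plans mention idx_mod_date, which is the behaviour to look for when running EXPLAIN on the real MySQL table.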
Use EXPLAIN:
http://dev.mysql.com/doc/refman/5.0/en/explain.html
This tells you how MySQL executes a statement. With that you can figure out the most efficient way, because it depends on your database structure and there is no single universal solution.
I've got a three-column table. It has a unique index, and another two indexes (on two different columns) for faster queries.
+-------------+-------------+----------+
| category_id | related_id | position |
+-------------+-------------+----------+
Sometimes the query is
SELECT * FROM table WHERE category_id = foo
and sometimes it's
SELECT * FROM table WHERE related_id = foo
So I decided to make both category_id and related_id an index for better performance. Is this bad practice? What are the downsides of this approach?
If I already have 100,000 rows in that table and am inserting another 100,000, will it be overkill to have the index refreshed with every new insert? Would that operation then take too long? Thanks
There are no downsides if it's doing exactly what you want. You query on a specific column a lot, so you index that column; that's the whole point. If instead you have a 60-column table and you're adding indexes to columns you never query on, then you are wasting resources, because those indexes need to be maintained on INSERT/UPDATE/DELETE operations.
Since each of your queries filters on a single column, a separate index on each column will benefit both of them.
Don't go for a composite (multiple-column) index here; it would only be used by the query that filters on its leftmost column.
You yourself can see the advantage of index in your query by using EXPLAIN (statement provides information about how MySQL executes statements).
EXAMPLE:
EXPLAIN SELECT * FROM table WHERE category_id = foo;
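As a rough illustration of what to look for in the plan, the sketch below builds a three-column table with one index per lookup column and asks the database which index each query uses. It uses SQLite through Python's sqlite3 module as a stand-in, so the output is SQLite's EXPLAIN QUERY PLAN rather than MySQL's EXPLAIN; the table name links and the index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE links ("
    " category_id INTEGER NOT NULL,"
    " related_id INTEGER NOT NULL,"
    " position INTEGER NOT NULL)"
)
# One single-column index per column that is used in a WHERE clause.
conn.execute("CREATE INDEX idx_category ON links (category_id)")
conn.execute("CREATE INDEX idx_related ON links (related_id)")
conn.executemany(
    "INSERT INTO links (category_id, related_id, position) VALUES (?, ?, ?)",
    [(i % 50, i % 70, i) for i in range(1000)],
)


def plan(sql):
    """Return SQLite's query plan for `sql` as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql)
    return " | ".join(row[3] for row in rows)


plan_category = plan("SELECT * FROM links WHERE category_id = 7")
plan_related = plan("SELECT * FROM links WHERE related_id = 7")
plan_position = plan("SELECT * FROM links WHERE position = 7")  # no index on this column

for text in (plan_category, plan_related, plan_position):
    print(text)
```

The first two plans name the matching index, while the query on the unindexed position column falls back to scanning the whole table.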
Hope this will help.
~K
It's good to have indexes. Just understand that indexes take more disk space in exchange for faster searches.
It is in your best interest to index those fields which have few repeated values. For example, indexing a field that contains a Boolean flag might not be a good idea.
Since in your case you are indexing ID columns, I think you won't have any problem keeping the indexes that you have created.
Also, the inserts will be slower, but since you are saving IDs there won't be much of a difference in the time required to insert. Go ahead and do the insert.
My personal advice :
When you are inserting a large number of rows into a single table in one go, don't insert them using a single query unless it is mandatory. This prevents your table from being locked and inaccessible for a long time.
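A minimal sketch of that advice, assuming a hypothetical links table and a batch size of 1000 (both are illustrative choices, not values from the thread). It uses Python's sqlite3 module instead of MySQL; the point is only the shape of the loop, with one short transaction per batch instead of one huge statement.

```python
import sqlite3
from itertools import islice


def insert_in_batches(conn, rows, batch_size=1000):
    """Insert `rows` in chunks, committing after each chunk.

    Returns the number of batches written.
    """
    iterator = iter(rows)
    batches = 0
    while True:
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            break
        with conn:  # one transaction per batch keeps each write short
            conn.executemany(
                "INSERT INTO links (category_id, related_id, position)"
                " VALUES (?, ?, ?)",
                chunk,
            )
        batches += 1
    return batches


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE links (category_id INTEGER, related_id INTEGER, position INTEGER)"
)
batches = insert_in_batches(conn, ((i % 50, i, i) for i in range(2500)), 1000)
total = conn.execute("SELECT COUNT(*) FROM links").fetchone()[0]
print(batches, total)
```

Other clients get a chance to run between batches, at the cost of the whole load no longer being a single all-or-nothing operation.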
If I SELECT IDs then UPDATE using those IDs, then the UPDATE query is faster than if I would UPDATE using the conditions in the SELECT.
To illustrate:
SELECT id FROM table WHERE a IS NULL LIMIT 10; -- 0.00 sec
UPDATE table SET field = value WHERE id IN (...); -- 0.01 sec
The above is about 100 times faster than an UPDATE with the same conditions:
UPDATE table SET field = value WHERE a IS NULL LIMIT 10; -- 0.91 sec
Why?
Note: the a column is indexed.
Most likely the second UPDATE statement locks many more rows, while the first one uses a unique key and locks only the rows it's going to update.
The two queries are not identical. You only know that the IDs are unique in the table.
UPDATE ... LIMIT 10 will update at most 10 records.
UPDATE ... WHERE id IN (SELECT ... LIMIT 10) may update more than 10 records if there are duplicate ids.
I don't think there can be a one straight-forward answer to your "why?" without doing some sort of analysis and research.
SELECT queries are often cached (when the query cache is enabled), which means that if you run the same SELECT query multiple times, the execution time of the first run is normally greater than that of the following runs. Please note that this behavior is only noticeable where the SELECT is heavy, not in scenarios where even the first SELECT is fast. So, in your example it might be that the SELECT took 0.00s because of caching. The UPDATE queries use different WHERE clauses, and hence it is likely that their execution times differ.
Although column a is indexed, MySQL does not necessarily use the index when doing the SELECT or the UPDATE. Please study the EXPLAIN outputs. Also, see the output of SHOW INDEX and check whether the "Comment" column reads "disabled" for any index. You may read more here - http://dev.mysql.com/doc/refman/5.0/en/show-index.html and http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html.
Also, if we ignore the SELECT for a while and focus only on the UPDATE queries, it is obvious that they aren't both using the same WHERE condition: the first one runs on the id column and the latter on a. Although both columns are indexed, that does not necessarily mean that all the table's indexes perform alike. It is possible that one index is more efficient than the other depending on the size of the index, the datatype of the indexed column, or whether it is a single- or multiple-column index. There might well be other reasons, but I'm not an expert on it.
Also, I think that the second UPDATE is doing more work in the sense that it might be putting more row-level locks compared to the first UPDATE. It is true that both UPDATES are finally updating the same number of rows. But where in the first update, it is 10 rows that are locked, I think in the second UPDATE, all rows with a as NULL (which is more than 10) are locked before doing the UPDATE. Perhaps MySQL first applies the locking and then runs the LIMIT clause to update only limited records.
Hope the above explanation makes sense!
Do you have a composite index or separate indexes?
If it is a composite index on the id and a columns (in that order),
then in the 2nd UPDATE statement the index would not be used for column a. The reason is that only leftmost prefixes of a composite index can be used (unless a is the PRIMARY KEY).
So if you want the index to be used for column a, you need to include id in your WHERE clause as well, with id first and then a.
Also it depends on what storage engine you are using since MySQL does indexes at the engine level, not server.
You can try this:
UPDATE table SET field = value WHERE id IN (...) AND a IS NULL LIMIT 10;
By doing this, id comes first as the leftmost column of the index, followed by a.
Also, from your comments, the lookups are much faster because, if you are using InnoDB, updating indexed columns means that the InnoDB storage engine has to move index entries to a different page, or split a page if the page is already full, since InnoDB stores indexes in sorted order. This process is VERY slow and expensive, and gets even slower if your indexes are fragmented or if your table is very big.
The comment by Michael J.V is the best description. This answer assumes a is a column that is not indexed and 'id' is.
The WHERE clause in the first UPDATE command is working off the primary key of the table, id
The WHERE clause in the second UPDATE command is working off a non-indexed column. This makes finding the rows to be updated significantly slower.
Never underestimate the power of indexes. A table will perform better if the indexes are used correctly than a table a tenth the size with no indexing.
Regarding "MySQL doesn't support updating the same table you're selecting from"
UPDATE table SET field = value
WHERE id IN (SELECT id FROM table WHERE a IS NULL LIMIT 10);
Just do this:
UPDATE table SET field = value
WHERE id IN (SELECT id FROM (SELECT id FROM table WHERE a IS NULL LIMIT 10) AS tmp);
The accepted answer seems right but is incomplete; there are major differences.
As far as I understand (and I'm not a SQL expert):
In the first case you SELECT N rows and UPDATE them using the primary key.
That's very fast, as you have direct access to all rows through the fastest possible index.
In the second case you UPDATE N rows using LIMIT.
That will lock all rows and release them again after the update is finished.
The big difference is that you have a RACE CONDITION in case 1 and an atomic UPDATE in case 2.
If you have two or more simultaneous calls of the case 1 queries, both can select the SAME IDs from the table.
Both calls will then update the same IDs simultaneously, overwriting each other.
This is called a "race condition".
The second case avoids that issue: MySQL will lock all rows during the update.
If a second session runs the same command, it will have to wait until the rows are unlocked.
So no race condition is possible, at the expense of lost time.
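The race described above can be replayed deterministically with a small sketch in which two "clients" are interleaved by hand. It uses Python's sqlite3 module and a hypothetical jobs table with an owner column; none of these names come from the thread. The guarded variant repeats the original condition in the UPDATE, which only helps when the UPDATE changes the very column the condition tests (an assumption of this sketch).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, owner TEXT)")
conn.executemany("INSERT INTO jobs (id) VALUES (?)", [(i,) for i in range(1, 31)])


def select_free_ids(limit=10):
    """Step 1 of the two-step approach: pick IDs of unclaimed rows."""
    rows = conn.execute(
        "SELECT id FROM jobs WHERE owner IS NULL ORDER BY id LIMIT ?", (limit,)
    )
    return [row[0] for row in rows]


def update_by_ids(owner, ids, guarded):
    """Step 2: claim rows by primary key; returns the number of rows changed."""
    marks = ",".join("?" * len(ids))
    sql = f"UPDATE jobs SET owner = ? WHERE id IN ({marks})"
    if guarded:
        sql += " AND owner IS NULL"  # re-check the condition at update time
    return conn.execute(sql, [owner, *ids]).rowcount


# Unguarded: both clients select before either one updates.
ids_a = select_free_ids()
ids_b = select_free_ids()
update_by_ids("A", ids_a, guarded=False)
overwritten = update_by_ids("B", ids_b, guarded=False)
owners_unguarded = {
    row[0] for row in conn.execute("SELECT owner FROM jobs WHERE owner IS NOT NULL")
}

# Guarded: same interleaving, but the second UPDATE finds nothing left to claim.
ids_c = select_free_ids()
ids_d = select_free_ids()
won_c = update_by_ids("C", ids_c, guarded=True)
won_d = update_by_ids("D", ids_d, guarded=True)

print(overwritten, owners_unguarded, won_c, won_d)
```

In the unguarded run client B silently overwrites all of client A's rows; in the guarded run the second client changes zero rows and can tell from the row count that it has to select again.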
I have a MySQL MYISAM table (say tbl) consisting of 2 unsigned int fields, say, f1 and f2. There is an index on f2 and the table is very large (approximately 320,000,000+ rows). I update this table periodically (with approximately 100,000 new rows a week), and, in order to be able to search this table without doing an ORDER BY (which would be very time consuming in real-time queries), I physically ORDER the table according to the way in which I want to retrieve its rows.
So, I perform an ALTER TABLE tbl ORDER BY f1 DESC. (I know I have enough physical space on the server for a copy of the table.) I have read that during this operation, a temporary table is created and SELECT statements are not affected on the current rows.
However, I have experienced that this is not the case, and SELECT statements on the table that occur at the same time with the ALTER table are getting blocked and do not terminate. After the ALTER TABLE tbl completes (about 40 minutes on the production server), the SELECT statements on tbl start executing fine again.
Is there any reason why the "ALTER table tbl ORDER BY f1 DESC" seems to be blocking other clients from querying tbl?
Altering a table will always grab a lock on the table, preventing SELECTs from running.
I'll admit that I didn't even know you could do that with an ALTER TABLE.
What are you trying to get from the table? For example, all records in a given range? 320 million rows is not a trivial number. I'll give you my gut reactions:
1. Switch to InnoDB (allows #2, also gives transactions, but without #2 may hurt performance)
2. Partition the table (makes it act like a number of slightly smaller tables)
3. Consider a redesign, such as having a "working set" table and a "historical" table, basically manually partitioning. If you usually look for recently inserted data, this (along with partitioning) will help a lot. If your lookups are evenly distributed, this probably won't make a difference.
4. Consider adding a new column you could use in conjunction to narrow down selects (so instead of searching on date, search on date and customer ID)
Since I don't know what you're storing, some of these (such as #4) may not apply.
There are some other things you could try. OPTIMIZE TABLE may help and take less time, but I doubt it. I think internally it's implemented as a dump/reload, at least on the InnoDB side.
Right now, I'm debating whether or not to use COUNT(id) or "count" columns. I heard that InnoDB COUNT is very slow without a WHERE clause because it needs to lock the table and do a full index scan. Is that the same behavior when using a WHERE clause?
For example, if I have a table with 1 million records. Doing a COUNT without a WHERE clause will require looking up 1 million records using an index. Will the query become significantly faster if adding a WHERE clause decreases the number of rows that match the criteria from 1 million to 500,000?
Consider the "Badges" page on SO, would adding a column in the badges table called count and incrementing it whenever a user earned that particular badge be faster than doing a SELECT COUNT(id) FROM user_badges WHERE user_id = 111?
Using MyISAM is not an option because I need the features of InnoDB to maintain data integrity.
SELECT COUNT(*) FROM tablename seems to do a full table scan.
SELECT COUNT(*) FROM tablename USE INDEX (colname) seems to be quite fast if the index available is NOT NULL, UNIQUE, and fixed-length. A non-UNIQUE index doesn't help much, if at all. Variable length indices (VARCHAR) seem to be slower, but that may just be because the index is physically larger. Integer UNIQUE NOT NULL indices can be counted quickly. Which makes sense.
MySQL really should perform this optimization automatically.
Performance of COUNT() is fine as long as you have an index that's used.
If you have a million records and the column in question is NOT NULL, then COUNT() will return a million quite easily. If NULL values are allowed, COUNT(column) skips them, so the result is the number of non-NULL entries in the index.
If you're not specifying a WHERE clause, then the worst case is the primary key index will be used.
If you specify a WHERE clause, just make sure the column(s) are indexed.
I wouldn't say avoid, but it depends on what you are trying to do:
If you only need to provide an estimate, you could do SELECT MAX(id) FROM table. This is much cheaper, since it just needs to read the max value in the index.
If we consider the badges example you gave, InnoDB only needs to count up the number of badges that user has (assuming an index on user_id). I'd say in most cases that's not going to be more than 10-20, and it's not much harm at all.
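How rough the MAX(id) estimate is depends on how many rows have been deleted. The sketch below shows the gap on a tiny hypothetical items table, using Python's sqlite3 module in place of MySQL; the table name and the deleted IDs are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)")
conn.executemany(
    "INSERT INTO items (name) VALUES (?)", [(f"item{i}",) for i in range(10)]
)
# Deleted rows leave gaps in the id sequence.
conn.execute("DELETE FROM items WHERE id IN (2, 5, 9)")

estimate = conn.execute("SELECT MAX(id) FROM items").fetchone()[0]
exact = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
print(estimate, exact)
```

The estimate stays at the highest ID ever assigned that still exists, so it only overstates the row count, and by exactly the number of gaps below that ID.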
It really depends on the situation. I probably would keep the count of the number of badges someone has on the main user table as a column (count_badges_awarded) simply because every time an avatar is shown, so is that number. It saves me having to do 2 queries.
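A sketch of the counter-column approach, assuming hypothetical users and user_badges tables (only count_badges_awarded and user_id are names taken from the discussion). It uses Python's sqlite3 module rather than MySQL with InnoDB; the idea it illustrates is that the insert and the counter increment happen in one transaction so the two cannot drift apart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    " id INTEGER PRIMARY KEY,"
    " count_badges_awarded INTEGER NOT NULL DEFAULT 0)"
)
conn.execute(
    "CREATE TABLE user_badges ("
    " id INTEGER PRIMARY KEY,"
    " user_id INTEGER NOT NULL,"
    " badge TEXT NOT NULL)"
)
conn.execute("CREATE INDEX idx_user_badges_user ON user_badges (user_id)")
conn.execute("INSERT INTO users (id) VALUES (111)")
conn.commit()


def award_badge(user_id, badge):
    """Insert the badge and bump the cached counter in one transaction."""
    with conn:
        conn.execute(
            "INSERT INTO user_badges (user_id, badge) VALUES (?, ?)",
            (user_id, badge),
        )
        conn.execute(
            "UPDATE users SET count_badges_awarded = count_badges_awarded + 1"
            " WHERE id = ?",
            (user_id,),
        )


for badge in ("Teacher", "Student", "Editor"):
    award_badge(111, badge)

counted = conn.execute(
    "SELECT COUNT(id) FROM user_badges WHERE user_id = 111"
).fetchone()[0]
stored = conn.execute(
    "SELECT count_badges_awarded FROM users WHERE id = 111"
).fetchone()[0]
print(counted, stored)
```

Reading the stored counter avoids the second query whenever the user row is already being loaded, while the COUNT query remains available as a consistency check.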