MySQL High Concurrency Updates

I have a mysql table:
CREATE TABLE `coupons` (
`id` INT NOT NULL AUTO_INCREMENT,
`code` VARCHAR(255),
`user_id` INT,
UNIQUE KEY `code_idx` (`code`)
) ENGINE=InnoDB;
The table consists of thousands/millions of codes and initially user_id is NULL for everyone.
Now I have a web application which assigns a unique code to thousands of users visiting the application concurrently. I am not sure what is the correct way to handle this considering very high traffic.
The query I have written is:
UPDATE coupons SET user_id = <some_id> where user_id is NULL limit 1;
And the application runs this query with say a concurrency of 1000 req/sec.
What I have observed is the entire table gets locked and this is not scaling well.
What should I do?
Thanks.

As understood, coupons is prepopulated, and a NULL user_id is updated to one that is not NULL.
explain update coupons set user_id = 1 where user_id is null limit 1;
This likely requires an architectural solution, but you may wish to review the EXPLAIN after ensuring that the table has indexes on the columns involved, and that they facilitate rapid updates.
Adding an index on coupons.user_id, for example, alters MySQL's strategy.
create unique index user_id_idx on coupons(user_id);
explain update coupons set user_id = 1 where user_id is null limit 1;
+----+-------------+---------+------------+-------+---------------+-------------+---------+-------+------+----------+------------------------------+
| id | select_type | table   | partitions | type  | possible_keys | key         | key_len | ref   | rows | filtered | Extra                        |
+----+-------------+---------+------------+-------+---------------+-------------+---------+-------+------+----------+------------------------------+
|  1 | UPDATE      | coupons | NULL       | range | user_id_idx   | user_id_idx | 5       | const | 6    | 100.00   | Using where; Using temporary |
+----+-------------+---------+------------+-------+---------------+-------------+---------+-------+------+----------+------------------------------+
1 row in set (0.01 sec)
So you should work with a DBA to ensure that the schema is optimized. Trade-offs need to be considered.
Also, since you have a client application, you have the opportunity to pre-fetch a row with a NULL coupons.user_id and then do an update directly on coupons.id. Curious to hear of your solution.
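That pre-fetch idea can be sketched as a two-step claim inside a transaction. The table and column names come from the question; the literal ids (123, 42) are placeholders the application would supply:

```sql
-- Sketch of the pre-fetch approach: claim one free coupon inside a transaction.
START TRANSACTION;

SELECT id
FROM coupons
WHERE user_id IS NULL
LIMIT 1
FOR UPDATE;             -- row lock on the fetched row (plus any rows scanned to find it)

-- the application reads the returned id, then:
UPDATE coupons
SET user_id = 123       -- <some_id> from the application (placeholder)
WHERE id = 42           -- the id fetched above (placeholder)
  AND user_id IS NULL;  -- guard against a race

COMMIT;
```

The second WHERE condition makes the claim idempotent: if another transaction somehow won the race, zero rows are changed and the application can retry.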

This question might be more suitable for DBAs (and I'm not a DBA), but I'll try to give you some idea of what's going on.
InnoDB does not actually lock the whole table when you perform your update query. What it does is this: it puts a record lock that prevents any other transaction from inserting, updating, or deleting rows where the value of coupons.user_id is NULL.
With the query you have at the moment (which depends on user_id being NULL), you cannot have concurrency, because your transactions will run one after another, not in parallel.
Even an index on coupons.user_id won't help, because when placing the lock InnoDB creates a shadow index for you if you don't have one. The outcome would be the same.
So, if you want to increase your throughput, there are two options I can think of:
Assign a user to a coupon asynchronously. Put all assignment requests in a queue, then process the queue in the background. This might not be suitable for your business rules.
Decrease the number of locked records. The idea here is to lock as few records as possible while performing an update. To achieve this, you can add one or more indexed columns to your table, then use the index in the WHERE clause of your UPDATE query.
An example of such a column is a product_id, or a category, or maybe a user location (country, zip).
Then your query will look something like this:
UPDATE coupons SET user_id = <some_id> WHERE product_id = <product_id> AND user_id IS NULL LIMIT 1;
And now InnoDB will lock only records with product_id = <product_id>. This way you'll have concurrency.
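One more option worth mentioning, if you are on MySQL 8.0 or later (an addition beyond the answer above, so treat it as a suggestion to evaluate): SKIP LOCKED lets each transaction grab the first unclaimed row without waiting on rows locked by other transactions.

```sql
-- MySQL 8.0+ sketch: each concurrent request claims a different free row
-- instead of queueing behind the same locked one.
START TRANSACTION;

SELECT id
FROM coupons
WHERE user_id IS NULL
LIMIT 1
FOR UPDATE SKIP LOCKED;  -- skip rows already locked by other transactions

UPDATE coupons
SET user_id = 123        -- placeholder user id
WHERE id = 42;           -- placeholder: the id returned above

COMMIT;
```

This pattern is designed for exactly this "job queue in a table" workload, since concurrent claimers no longer serialize on the same first NULL row.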
Hope this helps!

Slow time updating primary key indexed row

I have a query that updates a field in a table using the primary key to locate the row. The table can contain many rows where the date/time field is initially NULL, and then is updated with a date/time stamp using NOW().
When I run the update statement on the table, I am getting a slow query log entry (3.38 seconds). The log indicates that 200,000 rows were examined. Why would that many rows be examined if I am using the PK to identify the row being updated?
Primary key is item_id and customer_id. I have verified the PRIMARY KEY is correct in the MySQL table structure.
UPDATE cust_item
SET status = 'approved',
lstupd_dtm = NOW()
WHERE customer_id = '7301'
AND item_id = '12498';
I wonder if it's a hardware issue.
While the changes I've mentioned in comments might help slightly, in truth, I cannot replicate this issue...
I have a data set of roughly 1m rows...:
CREATE TABLE cust_item
(customer_id INT NOT NULL
,item_id INT NOT NULL
,status VARCHAR(12) NULL
,PRIMARY KEY(customer_id,item_id)
);
-- INSERT some random rows...
SELECT COUNT(*)
, SUM(customer_id = 358) dense
, SUM(item_id=12498) sparse
FROM cust_item;
+----------+-------+--------+
| COUNT(*) | dense | sparse |
+----------+-------+--------+
|  1047720 |   104 |      8 |
+----------+-------+--------+
UPDATE cust_item
SET status = 'approved'
WHERE item_id = '12498'
AND customer_id = '358';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0
How long does it take to select the record, without the update?
If the select is fast, then you need to look into things that can affect update/write speed:
- too many indexes on the table (don't forget filtered indexes and indexed views)
- index pages with a 0 fill factor that need to split to accommodate the data change
- referential constraints with cascade
- triggers
- slow write speed at the storage level
If the select is slow:
- old/bad statistics on the index
- extreme fragmentation
- a columnstore index with too many open rowgroups
If the select speed improves significantly after the first time, you may be having some cold buffer performance issues. That could point to storage I/O problems as well.
You may also be having concurrency issues caused by another process locking the table momentarily.
Finally, any chance the tool executing the query is returning a false duration? For example, SQL Server Management Studio can occasionally be slow to return a large resultset, even if the server handled it very quickly.
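A way to split that diagnosis, sketched with the identifiers from the question (the profiling approach itself is generic):

```sql
-- Sketch: separate read cost from write cost for the slow UPDATE.

-- 1. Time the lookup alone. With a correct composite PK this should be near-instant.
SELECT status, lstupd_dtm
FROM cust_item
WHERE customer_id = '7301'
  AND item_id = '12498';

-- 2. Confirm the access path: rows examined should be 1, not 200,000.
EXPLAIN
UPDATE cust_item
SET status = 'approved', lstupd_dtm = NOW()
WHERE customer_id = '7301'
  AND item_id = '12498';
```

If the SELECT is fast and the EXPLAIN shows a single-row PK lookup, the time is going into the write side (or into lock waits), which narrows the list above considerably.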

What is the "Default order by" for a mysql Innodb query that omits the Order by clause?

So I understand, and have found posts indicating, that it is not recommended to omit the ORDER BY clause in a SQL query when you are retrieving data from the DBMS.
Resources & posts consulted (will be updated):
SQL Server UNION - What is the default ORDER BY Behaviour
When no 'Order by' is specified, what order does a query choose for your record set?
https://dba.stackexchange.com/questions/6051/what-is-the-default-order-of-records-for-a-select-statement-in-mysql
Questions:
See the logic of the question below if you want to know more.
My question is: under MySQL with the InnoDB engine, does anyone know how the DBMS effectively gives us the results?
I read that it is implementation dependent, OK, but is there a way to know it for my current implementation?
Where is this defined exactly?
Is it from MySQL, InnoDB, OS-dependent?
Isn't there some kind of list out there?
Most importantly, if I omit the ORDER BY clause and get my result, I can't be sure that this code will still work with newer database versions and that the DBMS will always give me the same result, can I?
Use case & logic:
I'm currently writing a CRUD API, and I have a table in my DB that doesn't contain an "id" field (there is a PK, though), so when I'm showing the results of that table without any search criteria, I don't really have a clue what I should use to order the results. I mean, I could use the PK or any field that is never null, but it wouldn't make it relevant. So I was wondering, as my CRUD is supposed to work for any table and I don't want to solve this problem by adding an exception for this specific table, whether I could simply omit the ORDER BY clause.
Final note:
As I'm reading other posts, examples and code samples, I feel like I may be going too far. I understand it is common knowledge that it's bad practice to omit the ORDER BY clause in a query, and that there is no reliable default order, not to say that there is no order at all unless you specify it.
I'd just love to know where this is defined, and would love to learn how this works internally, or at least where it's defined (DBMS / storage engine / OS-dependent / other / multiple criteria). I think it would also benefit other people to know it, and to understand the inner mechanisms at play here.
Thanks for taking the time to read anyway! Have a nice day.
Without a clear ORDER BY, current versions of InnoDB return rows in the order of the index it reads from. Which index varies, but it always reads from some index. Even reading from the "table" is really an index—it's the primary key index.
As noted in the comments above, there's no guarantee this will remain the same in the next version of InnoDB. You should treat it as coincidental behavior; it is not documented, and the makers of MySQL don't promise not to change it.
Even if their implementation doesn't change, reading in index order can cause some strange effects that you might not expect, and which won't give you query result sets that make sense to you.
For example, the default index is the clustered index, PRIMARY. It means index order is the same as the order of values in the primary key (not the order in which you insert them).
mysql> create table mytable ( id int primary key, name varchar(20));
mysql> insert into mytable values (3, 'Hermione'), (2, 'Ron'), (1, 'Harry');
mysql> select * from mytable;
+----+----------+
| id | name     |
+----+----------+
|  1 | Harry    |
|  2 | Ron      |
|  3 | Hermione |
+----+----------+
But if your query uses another index to read the table, like if you only access column(s) of a secondary index, you'll get rows in that order:
mysql> alter table mytable add key (name);
mysql> select name from mytable;
+----------+
| name     |
+----------+
| Harry    |
| Hermione |
| Ron      |
+----------+
This shows it's reading the table by using an index-scan of that secondary index on name:
mysql> explain select name from mytable;
+----+-------------+---------+-------+---------------+------+---------+------+------+-------------+
| id | select_type | table   | type  | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+---------+-------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | mytable | index | NULL          | name | 83      | NULL | 3    | Using index |
+----+-------------+---------+-------+---------------+------+---------+------+------+-------------+
In a more complex query, it can become very tricky to predict which index InnoDB will use for a given query. The choice can even change from day to day, as your data changes.
All this goes to show: You should just use ORDER BY if you care about the order of your query result set!
Bill's answer is good. But not complete.
If the query is a UNION, it will (I think) deliver the results of the first SELECT first (subject to the rules), then the results of the second. Also, if the table is PARTITIONed, it is likely to do a similar thing.
GROUP BY may sort by the grouping expressions, thereby leading to a predictable order, or it may use a hashing technique, which scrambles the rows. I don't know how to predict which.
A derived table used to be an ordered list that propagated into the parent query's ordering. But recently, the ORDER BY is thrown away in that subquery! (Unless there is a LIMIT.)
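That derived-table behavior can be sketched like this, reusing the mytable example from above (exact behavior depends on the MySQL version and optimizer settings):

```sql
-- Sketch: since MySQL 5.7 the optimizer may discard ORDER BY in a derived
-- table, so the outer query's row order here is NOT guaranteed to be by name:
SELECT *
FROM (SELECT id, name FROM mytable ORDER BY name) AS t;

-- A LIMIT makes the inner ORDER BY meaningful again (it picks WHICH rows),
-- but only an outer ORDER BY guarantees the final result order:
SELECT *
FROM (SELECT id, name FROM mytable ORDER BY name LIMIT 3) AS t
ORDER BY name;
```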
Bottom Line: If you care about the order, add an ORDER BY, even if it seems unnecessary based on this Q & A.
MyISAM, in contrast, starts with this premise: The default order is the order in the .MYD file. But DELETEs leave gaps, UPDATEs mess with the gaps, and INSERTs prefer to fill in gaps over appending to the file. So, the row order is rather unpredictable. ALTER TABLE x ORDER BY y temporarily sets the .MYD order; this 'feature' does not work for InnoDB.

Optimizing the performance of MySQL regarding aggregation

I'm trying to optimize a report query; as with most report queries, this one incorporates aggregation. Since the size of the table is considerable and growing, I need to attend to its performance.
For example, I have a table with three columns: id, name, action. And I would like to count the number of actions each name has done:
SELECT name, COUNT(id) AS count
FROM tbl
GROUP BY name;
As simple as it gets, I can't run it in an acceptable time. It might take 30 seconds, and there's no index whatsoever I can add that gets taken into account, let alone improves it.
When I run EXPLAIN on the above query, it never uses any of the table's indexes, i.e. the index on name.
Is there any way to improve the performance of aggregation? Why is the index not used?
[UPDATE]
Here's the EXPLAIN's output:
+----+-------------+-------+------+---------------+------+---------+------+---------+----------+-----------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows    | filtered | Extra           |
+----+-------------+-------+------+---------------+------+---------+------+---------+----------+-----------------+
|  1 | SIMPLE      | tbl   | ALL  | NULL          | NULL | NULL    | NULL | 4025567 | 100.00   | Using temporary |
+----+-------------+-------+------+---------------+------+---------+------+---------+----------+-----------------+
And here is the table's schema:
CREATE TABLE `tbl` (
`id` bigint(20) unsigned NOT NULL DEFAULT '0',
`name` varchar(1000) NOT NULL,
`action` int unsigned NOT NULL,
PRIMARY KEY (`id`),
KEY `inx` (`name`(255))
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
The problem with your query and use of indexes is that you refer to two different columns in your SELECT statement yet only have one column in your indexes, plus the use of a prefix on the index.
Try this (refer to just the name column):
SELECT name, COUNT(*) AS count
FROM tbl
GROUP BY name;
With the following index (no prefix):
tbl (name)
Don't use a prefix on the index for this query because if you do, MySQL won't be able to use it as a covering index (will still have to hit the table).
If you use the above, MySQL will scan through the index on the name column, but won't have to scan the actual table data. You should see USING INDEX in the explain result.
This is as fast as MySQL will be able to accomplish such a task. The alternative is to store the aggregate result separately and keep it updated as your data changes.
Also, consider reducing the size of the name column, especially if you're hitting index size limits, which you most likely are hence why you're using the prefix. Save some room by not using UTF8 if you don't need it (UTF8 is 3 bytes per character for index).
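A sketch of that change against the schema from the question. The 255-character cap fits the classic 767-byte InnoDB key limit at 3 bytes per utf8 character (255 * 3 = 765); the index name name_idx is my own choice:

```sql
-- Sketch: shrink name so the whole column fits in an index, then replace the
-- prefix index with a full-column one so it can act as a covering index.
ALTER TABLE tbl
  MODIFY `name` VARCHAR(255) NOT NULL,
  DROP INDEX `inx`,
  ADD INDEX `name_idx` (`name`);

-- With the full-column index, EXPLAIN should now report "Using index":
EXPLAIN
SELECT name, COUNT(*) AS count
FROM tbl
GROUP BY name;
```

Shrinking the column is only safe if no existing value exceeds 255 characters, so check with `SELECT MAX(CHAR_LENGTH(name)) FROM tbl` first.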
It's a very common question, and the key to the solution lies in the fact that your table is growing.
So, the first way would be to create an index on the name column if one isn't created yet. But this will only solve your issue for a time.
More proper approach would be: to create separate statistics table like
tbl_counts
+------+-------+
| name | count |
+------+-------+
And store your counts separately. When changing (inserting, updating, or deleting) data in the tbl table, you'll need to adjust the corresponding row inside the tbl_counts table. This way you get rid of performing the COUNT query at all, but you'll need to add some logic around the tbl table.
To maintain the integrity of your statistics table, you can either use triggers or do it inside the application. This method is good if the performance of the COUNT query is much more important to you than that of your data-changing queries (and the overhead of updating the tbl_counts table won't be too high).
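A minimal sketch of the trigger variant (tbl_counts is the table proposed above; the VARCHAR(255) key is an assumption, since the question's name column is VARCHAR(1000) and too wide for a plain PK):

```sql
-- Sketch: keep tbl_counts in sync with INSERTs and DELETEs on tbl.
-- (An UPDATE trigger for renamed rows would follow the same pattern.)
CREATE TABLE tbl_counts (
  `name`  VARCHAR(255) NOT NULL PRIMARY KEY,
  `count` BIGINT UNSIGNED NOT NULL DEFAAULT 0
);

CREATE TRIGGER tbl_ai AFTER INSERT ON tbl
FOR EACH ROW
  INSERT INTO tbl_counts (`name`, `count`) VALUES (NEW.name, 1)
  ON DUPLICATE KEY UPDATE `count` = `count` + 1;

CREATE TRIGGER tbl_ad AFTER DELETE ON tbl
FOR EACH ROW
  UPDATE tbl_counts SET `count` = `count` - 1 WHERE `name` = OLD.name;
```

The report then becomes a plain `SELECT name, count FROM tbl_counts`, at the cost of a small extra write on every change to tbl.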

Make MySQL read from multiple indexes?

Let's start off with a simple example:
CREATE TABLE `test` (
`id` INT UNSIGNED NOT NULL,
`value` CHAR(12) NOT NULL,
INDEX (`id`),
INDEX (`value`)
) ENGINE = InnoDB;
So 2 columns, both indexed. What I thought this meant was that MySQL would never have to read the actual table anymore, since all the data is stored in an index.
mysql> EXPLAIN SELECT id FROM test WHERE id = 1;
+----+-------------+-------+------+---------------+------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref   | rows | Extra       |
+----+-------------+-------+------+---------------+------+---------+-------+------+-------------+
|  1 | SIMPLE      | test  | ref  | id            | id   | 4       | const | 1    | Using index |
+----+-------------+-------+------+---------------+------+---------+-------+------+-------------+
"Using index", very nice. To my understanding this means that it is reading data from the index and not from the actual table. But what I really want is the "value" column.
mysql> EXPLAIN SELECT value FROM test WHERE id = 1;
+----+-------------+-------+------+---------------+------+---------+-------+------+-------+
| id | select_type | table | type | possible_keys | key  | key_len | ref   | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+-------+------+-------+
|  1 | SIMPLE      | test  | ref  | id            | id   | 4       | const | 1    |       |
+----+-------------+-------+------+---------------+------+---------+-------+------+-------+
Hmm, no "using index" this time.
I thought it might help if I add an index that covers both columns.
ALTER TABLE `test` ADD INDEX `id_value` (`id`,`value`);
Now let's run that previous select-statement again and tell it to use the new index.
mysql> EXPLAIN SELECT id, value FROM test USE INDEX (id_value) WHERE id = 1;
+----+-------------+-------+------+---------------+----------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key      | key_len | ref   | rows | Extra       |
+----+-------------+-------+------+---------------+----------+---------+-------+------+-------------+
|  1 | SIMPLE      | test  | ref  | id_value      | id_value | 4       | const | 1    | Using index |
+----+-------------+-------+------+---------------+----------+---------+-------+------+-------------+
Praise the Lord, it's reading from the index.
But actually I don't really need the combined index for anything else. Is it possible to make MySQL read from 2 separate indexes?
Any insights would be greatly appreciated.
EDIT: Ok, yet another example. This one is with the original table definition (so an index on each column).
mysql> EXPLAIN SELECT t1.value
-> FROM test AS t1
-> INNER JOIN test AS t2
-> ON t1.id <> t2.id AND t1.value = t2.value
-> WHERE t1.id = 1;
+----+-------------+-------+------+---------------+-------+---------+----------+------+-------------+
| id | select_type | table | type | possible_keys | key   | key_len | ref      | rows | Extra       |
+----+-------------+-------+------+---------------+-------+---------+----------+------+-------------+
|  1 | SIMPLE      | t1    | ref  | id,value      | id    | 4       | const    | 1    |             |
|  1 | SIMPLE      | t2    | ref  | value         | value | 12      | t1.value | 1    | Using where |
+----+-------------+-------+------+---------------+-------+---------+----------+------+-------------+
This must certainly read from both indexes (since both fields are used in the join condition) yet it STILL reads the data from the actual record, right? Why doesn't it just use the data it has read from the index? Or does it actually use that data without saying "using index"?
Thanks again
The key, ref and rows columns are more telling for this purpose. In each case, they indicate that MySQL has selected an index, has a value to lookup in that index, and is retrieving only one row from the table as a result. This is what you were after.
In your second query, MySQL still needs to retrieve the value from the record even though it has located the record on id via an index. If your WHERE criterion looked up based on value, then that index would have been used and there would have been no need to retrieve the record.
The manual on Using index Extra information:
The column information is retrieved from the table using only information in the index tree without having to do an additional seek to read the actual row. This strategy can be used when the query uses only columns that are part of a single index.
If the Extra column also says Using where, it means the index is being used to perform lookups of key values. Without Using where, the optimizer may be reading the index to avoid reading data rows but not using it for lookups. For example, if the index is a covering index for the query, the optimizer may scan it without using it for lookups.
For InnoDB tables that have a user-defined clustered index, that index can be used even when Using index is absent from the Extra column. This is the case if type is index and key is PRIMARY.
In your first query, MySQL says using index because it can answer your query by looking at the index and the index alone. It does not need to go to the table to look up the corresponding value for the id column, because that's actually the same thing it's already got in the index.
In the second query, MySQL does need to look at the table to fetch the correct value, but it's still using the index, as you can see in the key column of your EXPLAIN statement.
In the third query, MySQL again doesn't have to look at the table anymore, because all the information it needs to answer your query is right there in the multiple-column index.
Just think a bit about how indexes work.
Say you have 10k records in your test table and an index on the value column. While you're populating your table with data (or explicitly using the ANALYZE command), the database keeps statistics on your table and all its indexes.
At the moment you issue your query, there are several ways to deliver the data. In the very simplified case of the test table and the value column, for something like:
SELECT * FROM test WHERE value = 'a string';
the database query planner has 2 options:
performing a sequential scan on the whole table and filtering the results, or
performing an index scan to look up the desired data entries.
Querying indexes has some performance penalty, as the database must seek for the value in the index. Assuming you have a B-tree index in "good shape" (i.e. balanced), you'll find your entry in at most 14 lookups in the index (as 2^14 = 16384 > 10k). So, in order to deliver you 1 row with a string value, the database will have to perform up to 14 lookups in the index and 1 extra lookup in your table. In the unlucky case, this means the system will perform 15 random I/O operations to read in data from your disk.
In the case where there's only one value that requires a lookup in the index, and your table is quite big, index operations will give you a significant performance boost.
But there's a point after which an index scan becomes more expensive than a straightforward sequential scan:
when your table occupies a really small size on the disk;
when your query requires a lookup of around 10% of the total number of records in the test table (the 10% figure is very approximate, don't take it for granted).
Things to consider:
comparison operations on numeric data types are significantly cheaper than comparing strings;
statistics accuracy;
how often the index / table is queried, i.e. the probability of finding the needed data already in the database's cache (buffer pool).
All of these affect performance and also the plans the database chooses to deliver the data.
So, indexes are not always good.
To answer your "read from 2 separate indexes" question: the feature you're looking for is called a bitmap index, and it is not available in MySQL as far as I know.
New with 5.0, MySQL can utilize more than one index on a table with Index merge, though they're not as speedy (by far) as multi-column covering indexes, so MySQL will only use them in special cases.
So, other than the merge index case, MySQL only uses one index per table.
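A sketch of a query shape that can trigger index merge on the test table from the question (whether the optimizer actually picks it depends on table statistics, so this is illustrative, not guaranteed):

```sql
-- An OR across two separately indexed columns is the classic index-merge
-- (union) candidate. EXPLAIN may then show type=index_merge with
-- Extra: Using union(id,value); Using where.
EXPLAIN
SELECT * FROM test
WHERE id = 1 OR value = 'somevalue123';
```

By contrast, the AND-style lookups earlier in this question resolve through a single index (or a single composite index), which is why MySQL never combined two indexes for them.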
Don't be too afraid of covering indexes. They can serve double duty. Indexes are leftmost-prefixed, so you can use a multi-column index for just the leftmost column, or the first and second, and so on.
For example, if you have the multi-column index id_value (id, value), you can delete the index id (id), since it's redundant: the id_value index can also be used for just the id column.
Also, with InnoDB, every index automatically includes the primary key column(s), so if id were your primary key, an index on value provides the same benefit as having a covering index on (id, value).
Every index negatively affects inserts, and updates against the indexed columns. There's a trade-off, and only you (and some testing) can decide if covering indexes are right for you.
Deletes don't have much impact on indexes because they're just "marked for deletion", and they only get purged when your system's load is low.
Indexes also use up memory. Given enough memory, a properly configured MySQL server will have every index loaded in memory. This makes selects that utilize a covering index super fast.

Index a query "WHERE a IN (1,2,3) AND b = 4"

I am attempting to apply an index that will speed up one of the slowest queries in my application:
SELECT * FROM orders WHERE product_id IN (1, 2, 3, 4) AND user_id = 5678;
I have an index on product_id, user_id, and the pair (product_id, user_id). However, the server does not use any of these indexes:
+----+-------------+--------+------+-------------------------------------------------------------------------------------------+------+---------+------+------+-------------+
| id | select_type | table  | type | possible_keys                                                                             | key  | key_len | ref  | rows | Extra       |
+----+-------------+--------+------+-------------------------------------------------------------------------------------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | orders | ALL  | index_orders_on_product_id,index_orders_on_user_id,index_orders_on_product_id_and_user_id | NULL | NULL    | NULL | 6    | Using where |
+----+-------------+--------+------+-------------------------------------------------------------------------------------------+------+---------+------+------+-------------+
(There are only 6 rows on development, so whatever, but on production there are about 400k rows, so execution takes about 0.25s, and this query is fired pretty darn often.)
How can I avoid a simple WHERE here? I suppose I could send a query for each product_id, which would likely be faster than this version, but the number of products could be very high, so if it's doable in one query that would be significantly preferable. This query is generated by Rails, so I'm a bit limited in how much I can restructure the query itself. Thanks!
For optimal performance of this particular query on your production table (with 400k rows), you need a composite index on {user_id, product_id}, in that order.
Ideally, this would be the only index, and you would use InnoDB so the table is clustered. Every additional index incurs a penalty when modifying data, and on top of that secondary indexes in clustered tables are even more expensive than secondary indexes in heap-based tables.
To understand why user_id (and not product_id) should be at the leading edge of the index, please take a look at the Anatomy of an Index. Essentially, since the WHERE searches for only one user_id, putting it first clusters the related product_id values closer together in the index.
(The {product_id, user_id} order would also work, but would "scatter" the "target" index nodes less favorably.)
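A sketch of the suggested index, using a Rails-style name (the index name itself is my assumption, not from the question):

```sql
-- Composite index with user_id at the leading edge, as recommended above.
ALTER TABLE orders
  ADD INDEX index_orders_on_user_id_and_product_id (user_id, product_id);

-- The question's query can then resolve via the leading column (user_id = 5678)
-- plus range probes on the second column for the IN-list:
EXPLAIN
SELECT * FROM orders
WHERE product_id IN (1, 2, 3, 4) AND user_id = 5678;
```

On the production table this should turn the full scan into a handful of index range probes, one per product_id in the list.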
When there are so few rows in the database, it does not use indexes, because it's cheaper to do a full scan. Try checking the data in your prod environment and see if it uses one of your indexes.
Also, note that you can eliminate one of your indexes, index_orders_on_product_id, because you already have another index that starts with the product_id field.