Update field value but also apply cap? - mysql

Is it possible to update the value of a field but cap it at the same time? For example:
UPDATE users SET num_apples=num_apples-1 WHERE xxx = ?
I don't want the field "num_apples" to fall below zero. Can I do that in one operation?
Thanks
----- Update ------------------
UPDATE users SET num_apples=num_apples-1 WHERE user_id = 123 AND num_apples > 0;
If I only have an index on "user_id", and not "num_apples", is that going to be bad for performance? I'm not sure how MySQL implements this operation. I'm hoping that the WHERE on the user_id part makes it fast. I have to perform this operation somewhat frequently.
Thanks

Just add a WHERE condition specifying only rows > 0, so it won't update any rows into the negative.
UPDATE users SET num_apples=num_apples-1 WHERE num_apples > 0;
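If you would rather have the statement always succeed and simply clamp at zero, a conditional expression is another option. A minimal sketch, assuming num_apples is a signed integer (on an UNSIGNED column the subtraction itself can raise an out-of-range error before GREATEST applies):
-- Decrement, but never below zero
UPDATE users SET num_apples = GREATEST(num_apples - 1, 0) WHERE user_id = 123;
Note the difference: this version matches the row even when the value is already zero, whereas the WHERE-based version skips it entirely.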
Update
Following your subquestion on indexing, as always, the way to test performance is to benchmark it for yourself. Examine the EXPLAIN for the query and make sure it is using the index on user_id (it should be). And finally, don't worry too much about performance of this simple operation until it becomes a problem. You don't have an index on num_apples now, but could you not add one if performance wasn't scaling to your needs?

You don't need to create two indexes, as only one will be used; instead, index both fields together in a single composite index on the pair (user_id, num_apples):
ALTER TABLE t ADD INDEX yourNewIndex (user_id, num_apples);
You can then remove the previous index on user_id alone, since the new index covers it as a leftmost prefix:
ALTER TABLE t DROP INDEX yourOldIndex;
Before dropping it you can get information on what index is being used by running:
EXPLAIN UPDATE users SET num_apples=num_apples-1
WHERE user_id = 123 AND num_apples > 0;
If the index used is yourNewIndex, then MySQL has determined that it is faster to use it than the previous one.
Edit:
do I even need any checks? Will mysql prevent the value from going < 0 by default in that case?
MySQL will stop the value from going negative, but not gracefully: if the column is UNSIGNED, you'll get a data truncation error when running the update if you do not control it yourself:
Data truncation: BIGINT UNSIGNED value is out of range
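For illustration, a minimal sketch that reproduces the error on a hypothetical table:
CREATE TABLE users_demo (user_id INT PRIMARY KEY, num_apples BIGINT UNSIGNED NOT NULL);
INSERT INTO users_demo VALUES (123, 0);
-- Fails with "BIGINT UNSIGNED value is out of range": 0 - 1 underflows the unsigned type
UPDATE users_demo SET num_apples = num_apples - 1 WHERE user_id = 123;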


MySQL not updating information_schema, unless I manually run ANALYZE TABLE `myTable`

I need to get the last id (primary key) of an InnoDB table, and to do so I perform the following query:
SELECT (SELECT `AUTO_INCREMENT` FROM `information_schema`.`TABLES` WHERE `TABLE_SCHEMA` = 'mySchema' AND `TABLE_NAME` = 'myTable') - 1;
which returns the wrong AUTO_INCREMENT. The problem is that the TABLES table of information_schema is not updated with the current value unless I run the following query:
ANALYZE TABLE `myTable`;
Why doesn't MySQL update information_schema automatically, and how could I fix this behavior?
Running MySQL Server 8.0.13 X64.
Q: Why doesn't MySQL update information_schema automatically, and how could I fix this behavior?
A: InnoDB holds the auto_increment value in memory, and doesn't persist that to disk.
The behavior of metadata queries (e.g. SHOW TABLE STATUS) is influenced by the settings of the innodb_stats_on_metadata and innodb_stats_persistent variables.
https://dev.mysql.com/doc/refman/8.0/en/innodb-parameters.html#sysvar_innodb_stats_on_metadata
Forcing an ANALYZE every time we query metadata can be a drain on performance.
Other than the settings of those variables, or forcing statistics to be collected by manually executing the ANALYZE TABLE, I don't think there's a "fix" for the issue.
(I think that mostly because I don't think it's a problem that needs to be fixed.)
To get the highest value of an auto_increment column in a table, the normative pattern is:
SELECT MAX(`ai_col`) FROM `myschema`.`mytable`
What puzzles me is why we need to retrieve this particular piece of information. What are we going to use it for?
Certainly, we aren't going to use that in application code to determine a value that was assigned to a row we just inserted. There's no guarantee that the highest value isn't from a row that was inserted by some other session. And we have the LAST_INSERT_ID() mechanism to retrieve the value of a row our session just inserted.
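For example (a sketch with a hypothetical table and column):
INSERT INTO myschema.mytable (some_col) VALUES ('x');
-- Returns the AUTO_INCREMENT id generated by this session's insert,
-- unaffected by concurrent inserts from other sessions
SELECT LAST_INSERT_ID();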
If we go with the ANALYZE TABLE to refresh statistics, there's still a small window of time between that and a subsequent SELECT; another session could slip in another INSERT, so the value we get from the gathered stats could be "out of date" by the time we retrieve it.
SELECT * FROM tbl ORDER BY insert_datetime DESC LIMIT 1;
will get you all the data from the "latest" inserted row. No need to deal with AUTO_INCREMENT, no need to use subqueries, no ANALYZE, no information_schema, no extra fetch once you have the id, etc.
Yes, you do need an index on the column that you use to determine what is "latest". Yes, id could be used, but it should not be. AUTO_INCREMENT values are guaranteed to be unique, but nothing else.
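If insert_datetime is not indexed yet, the supporting index would be something along these lines:
ALTER TABLE tbl ADD INDEX idx_insert_datetime (insert_datetime);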

Mysql Update one column of multiple rows in one query

I've looked over all of the related questions I could find, but couldn't get one that answers mine.
I have a table like this:
id | name | age | active | ...... | ... |
where "id" is the primary key, and the ... meaning there are something like 30 columns.
the "active" column is of tinyint type.
My task:
Update ids 1,4,12,55,111 (those are just an example; it could be 1000 different ids in total) with active = 1 in a single query.
I did:
UPDATE table SET active = 1 WHERE id IN (1,4,12,55,111)
It's inside a transaction, because I'm updating something else in this process.
The engine is InnoDB.
My problem:
Someone told me that such a query is equivalent to 5 queries at execution time, because the IN will translate to a corresponding number of ORs, run one after another.
So eventually, instead of 1 query I get N, where N is the number of ids in the IN.
He suggests creating a temp table, inserting all the new values into it, and then updating by join.
Is he right, both about the equivalency and about the performance?
What do you suggest? I thought INSERT INTO .. ON DUPLICATE KEY UPDATE would help, but I don't have all the data for the row, only its id and the fact that I want to set active = 1 on it.
Maybe this query is better?
UPDATE table SET
active = CASE
WHEN id='1' THEN '1'
WHEN id='4' THEN '1'
WHEN id='12' THEN '1'
WHEN id='55' THEN '1'
WHEN id='111' THEN '1'
ELSE active END
WHERE campaign_id > 0; -- otherwise it throws an error about updating without a WHERE clause in safe mode, and I don't know if I can toggle safe mode off
Thanks.
It's the other way around. OR can sometimes be turned into IN. IN is then efficiently executed, especially if there is an index on the column. If you have 1000 entries in the IN, it will do 1000 probes into the table based on id.
If you are running a new enough version of MySQL, I think you can do EXPLAIN EXTENDED UPDATE ...OR...; SHOW WARNINGS; to see this conversion.
The UPDATE CASE... will probably tediously check each and every row.
It would probably be better for other users of the system if you broke the UPDATE up into multiple UPDATEs, each covering 100-1000 rows. More on chunking.
Where did you get the ids in the first place? If it was via a SELECT, then perhaps it would be practical to combine it with the UPDATE to make it one step instead of two.
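For example, a sketch of folding the two steps into one, assuming a hypothetical some_criteria column is what produced the id list in the first place:
UPDATE table AS t
JOIN (SELECT id FROM table WHERE some_criteria = 'x') AS s ON s.id = t.id
SET t.active = 1;
The derived table is materialized first, which is also the standard workaround for MySQL's restriction on updating a table you are selecting from.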
I think the query below is better because it uses the primary key.
UPDATE table SET active = 1 WHERE id<=5

"is not null" vs boolean MySQL - Performance

I have a column that is a datetime, converted_at.
I plan on making calls that check WHERE converted_at is not null very often. As such, I'm considering having a boolean field converted. Is there a significant performance difference between checking if a field is not null vs if it is false?
Thanks.
If something is answerable from a single field, favour that over splitting the same information into two fields. Splitting creates more infrastructure, which, in your case, is avoidable.
As to the nub of the question, I believe most database implementations, MySQL included, internally represent the NULLness of a field with what is effectively a boolean flag anyway.
You can rely on this being done for you correctly.
As to performance, the bigger question is profiling the typical queries you run against your database, creating appropriate indexes, and running ANALYZE TABLE to improve execution plans and the choice of indexes used during queries. That will have a far bigger impact on performance.
Using WHERE converted_at is not null or WHERE converted = FALSE will probably be the same in matters of query performance.
But if you have this additional bit field that stores whether the converted_at field is NULL or not, you'll have to somehow maintain its integrity (via triggers? a rough sketch follows) whenever a new row is added and every time the column is updated. So this is a de-normalization, and it also means more complicated code. Moreover, you'll have at least one more index on the table (which means slightly slower INSERT/UPDATE/DELETE operations).
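For a sense of what that maintenance would involve, a rough sketch with a hypothetical table name (a matching BEFORE INSERT trigger would also be needed):
CREATE TRIGGER trg_sync_converted BEFORE UPDATE ON mytable
FOR EACH ROW
SET NEW.converted = (NEW.converted_at IS NOT NULL);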
Therefore, I don't think it's good to add this bit field.
If you can change the column in question from NULL to NOT NULL (possibly by normalizing the table), you may get some performance gain (at the cost/gain of having more tables).
I had the same question for my own usage. So I decided to put it to the test.
So I created all the fields required for the 3 possibilities I imagined:
# option 1
ALTER TABLE mytable ADD deleted_at DATETIME NULL;
ALTER TABLE mytable ADD archived_at DATETIME NULL;
# option 2
ALTER TABLE mytable ADD deleted boolean NOT NULL DEFAULT 0;
ALTER TABLE mytable ADD archived boolean NOT NULL DEFAULT 0;
# option 3
ALTER TABLE mytable ADD invisibility TINYINT(1) UNSIGNED NOT NULL DEFAULT 0
COMMENT '4 values possible' ;
The last is a bitfield where 1=archived, 2=deleted, 3=deleted + archived
First difference: you have to create indexes for options 2 and 3.
CREATE INDEX mytable_deleted_IDX USING BTREE ON mytable (deleted) ;
CREATE INDEX mytable_archived_IDX USING BTREE ON mytable (archived) ;
CREATE INDEX mytable_invisibility_IDX USING BTREE ON mytable (invisibility) ;
Then I tried all of the options using a real-life SQL request on 13k records in the main table. Here is how it looks:
SELECT *
FROM mytable
LEFT JOIN table1 ON mytable.id_qcm = table1.id_qcm
LEFT JOIN table2 ON table2.id_class = mytable.id_class
INNER JOIN user ON mytable.id_user = user.id_user
where mytable.id_user=1
and mytable.deleted_at is null and mytable.archived_at is null
# and deleted=0
# and invisibility=0
order BY id_mytable
I ran it with each of the commented filter options above in turn.
Tested on MySQL 5.7.21-1, Debian 9.
My conclusion:
The "is null" solution (option 1) is a bit faster, or at least same performance.
The 2 others ("deleted=0" and "invisibility=0") seems in average a bit slower.
But the nullable fields option have decisive advantages: No index to create, easier to update, easier to query. And less storage space used.
(additionnaly inserts & updates virtually should be faster as well, since mysql do not need to update indexes, but you never would be able to notice that).
So you should use the nullable datatime fields option.

SELECT vs UPDATE performance with index

If I SELECT IDs and then UPDATE using those IDs, the UPDATE query is faster than if I UPDATE using the same conditions as in the SELECT.
To illustrate:
SELECT id FROM table WHERE a IS NULL LIMIT 10; -- 0.00 sec
UPDATE table SET field = value WHERE id IN (...); -- 0.01 sec
The above is about 100 times faster than an UPDATE with the same conditions:
UPDATE table SET field = value WHERE a IS NULL LIMIT 10; -- 0.91 sec
Why?
Note: the a column is indexed.
Most likely the second UPDATE statement locks many more rows, while the first one uses the unique key and locks only the rows it's going to update.
The two queries are not identical. You only know that the IDs are unique in the table.
UPDATE ... LIMIT 10 will update at most 10 records.
UPDATE ... WHERE id IN (SELECT ... LIMIT 10) may update more than 10 records if there are duplicate ids.
I don't think there can be one straightforward answer to your "why?" without doing some sort of analysis and research.
The SELECT queries are normally cached, which means that if you run the same SELECT query multiple times, the execution time of the first query is normally greater than the following queries. Please note that this behavior can only be experienced where the SELECT is heavy and not in scenarios where even the first SELECT is much faster. So, in your example it might be that the SELECT took 0.00s because of the caching. The UPDATE queries are using different WHERE clauses and hence it is likely that their execution times are different.
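If you want to rule the query cache out while benchmarking (on versions that still have it; the query cache was removed in MySQL 8.0), you can request an uncached read:
SELECT SQL_NO_CACHE id FROM table WHERE a IS NULL LIMIT 10;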
Though the column a is indexed, it is not guaranteed that MySQL will use the index when doing the SELECT or the UPDATE. Please study the EXPLAIN outputs. Also, see the output of SHOW INDEX and check whether the "Comment" column reads "disabled" for any indexes. You may read more here: http://dev.mysql.com/doc/refman/5.0/en/show-index.html and http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html.
Also, if we ignore the SELECT for a while and focus only on the UPDATE queries, it is obvious that they aren't both using the same WHERE condition: the first one runs on the id column and the latter on a. Though both columns are indexed, that does not necessarily mean that all the table's indexes perform alike. One index can be more efficient than the other depending on the size of the index, the datatype of the indexed column, or whether it is a single- or multiple-column index. There sure might be other reasons, but I ain't an expert on it.
Also, I think that the second UPDATE is doing more work in the sense that it might be taking more row-level locks than the first UPDATE. It is true that both UPDATEs finally update the same number of rows. But where the first UPDATE locks only the 10 rows, I think the second UPDATE locks all rows where a is NULL (which is more than 10) before doing the update. Perhaps MySQL first applies the locking and then applies the LIMIT clause to update only a limited number of records.
Hope the above explanation makes sense!
Do you have a composite index or separate indexes?
If it is a composite index of the id and a columns:
In the 2nd UPDATE statement, the a column's part of the index would not be used. The reason is that only leftmost prefixes of an index are used (unless a is the PRIMARY KEY).
So if you want the a column's index to be used, you need to include id in your WHERE clause as well, with id first, then a.
Also, it depends on what storage engine you are using, since MySQL implements indexes at the engine level, not the server level.
You can try this:
UPDATE table SET field = value WHERE id IN (...) AND a IS NULL LIMIT 10;
By doing this, id, the leftmost column of the index, is used first, followed by a.
Also, from your comments: the lookups are much faster because, if you are using InnoDB, updating an indexed column can mean the storage engine has to move index entries to a different page node, or split a page if it is already full, since InnoDB stores indexes in sequential order. This process is VERY slow and expensive, and gets even slower if your indexes are fragmented or your table is very big.
The comment by Michael J.V is the best description. This answer assumes a is a column that is not indexed and 'id' is.
The WHERE clause in the first UPDATE command is working off the primary key of the table, id
The WHERE clause in the second UPDATE command is working off a non-indexed column. This makes the finding of the columns to be updated significantly slower.
Never underestimate the power of indexes. A table will perform better if the indexes are used correctly than a table a tenth the size with no indexing.
Regarding "MySQL doesn't support updating the same table you're selecting from"
UPDATE table SET field = value
WHERE id IN (SELECT id FROM table WHERE a IS NULL LIMIT 10);
Just do this:
UPDATE table SET field = value
WHERE id IN (select id from (SELECT id FROM table WHERE a IS NULL LIMIT 10));
The accepted answer seems right but is incomplete; there are major differences.
As far as I understand it, and I'm not an SQL expert:
In the first query you SELECT N rows and UPDATE them using the primary key.
That's very fast as you have a direct access to all rows based on the fastest possible index.
In the second query you UPDATE N rows using LIMIT.
That will lock all the rows and release them again after the update is finished.
The big difference is that you have a RACE CONDITION in case 1) and an atomic UPDATE in case 2).
If you have two or more simultaneous calls of the case 1) query, you can end up selecting the SAME ids from the table.
Both calls will then update the same ids simultaneously, overwriting each other.
This is called a "race condition".
The second case avoids that issue: MySQL will lock the rows during the update.
If a second session runs the same command, it will have to wait until the rows are unlocked.
So no race condition is possible, at the expense of some waiting.
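If you need the two-step variant but without the race, one option (a sketch, assuming InnoDB) is to lock the selected rows inside a transaction with FOR UPDATE:
START TRANSACTION;
-- Locks the matched rows so a concurrent session cannot pick the same ids
SELECT id FROM table WHERE a IS NULL LIMIT 10 FOR UPDATE;
UPDATE table SET field = value WHERE id IN (...); -- the ids returned above
COMMIT;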

MySQL, delete and index hint

I have to delete about 10K rows from a table that has more than 100 million rows, based on some criteria. When I execute the query, it takes about 5 minutes. I ran an explain plan (with the delete query converted to a SELECT *, since MySQL does not support EXPLAIN DELETE) and found that MySQL uses the wrong index.
My question is: is there any way to tell MySQL which index to use during the delete? If not, what can I do? SELECT into a temp table, then delete using the temp table?
There is index hint syntax. (ETA: sadly, not for deletes.)
ETA:
Have you tried running ANALYZE TABLE $mytable?
If that doesn't pay off, I'm thinking you have two choices: drop the offending index before the delete and recreate it after, or JOIN the table you're deleting from to another table on the desired index, which should ensure that the desired index is used (a sketch follows).
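A sketch of that JOIN idea, with hypothetical names; the derived table finds the rows via the index you intend, and the outer DELETE then works off the primary key:
DELETE t
FROM mytable AS t
JOIN (SELECT id FROM mytable WHERE column_with_index = 0) AS s ON s.id = t.id;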
I've never really come across a situation where MySQL chose the wrong index, but rather my understanding of how indexes worked was usually at fault.
You might want to check out this book: http://oreilly.com/catalog/9780596003067
It has a great section on how indexes work and other tuning options.
As stated in other answers, you can't point a DELETE at a particular index; the one index it can reliably be made to use is the PRIMARY KEY.
So your best option, if you have a PRIMARY KEY on the table, is to run a fast SELECT, then DELETE the corresponding rows, preferably in a TRANSACTION so that you don't delete the wrong rows.
Hence:
DELETE FROM table WHERE column_with_index = 0
will be rewritten as:
SELECT primary_key FROM table WHERE column_with_index = 0; -- returns many rows
DELETE FROM table WHERE primary_key IN (?, ?, ?); -- each ? is replaced by one of the SELECTed primary keys
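Wrapped in a transaction as suggested, that could look like this sketch (FOR UPDATE, assuming InnoDB, keeps the selected rows from changing underneath you):
START TRANSACTION;
SELECT primary_key FROM table WHERE column_with_index = 0 FOR UPDATE;
DELETE FROM table WHERE primary_key IN (?, ?, ?); -- the SELECTed keys
COMMIT;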
If you don't have that many rows to delete, it is more efficient this way.
For example, I just hit a case on the same table, with the same data:
7,499,067 rows analyzed by the DELETE: 12 seconds
vs
6 rows analyzed by the SELECT using a good index: 0.10 seconds
0 rows to be deleted in the end