I've looked over all of the related questions I could find, but couldn't get one that answers mine.
I have a table like this:
id | name | age | active | ...... | ... |
where "id" is the primary key, and the ... meaning there are something like 30 columns.
the "active" column is of tinyint type.
My task:
Update ids 1, 4, 12, 55, 111 (those are just an example; it can be 1000 different ids in total) to active = 1 in a single query.
I did:
UPDATE table SET active = 1 WHERE id IN (1,4,12,55,111)
It's inside a transaction, because I'm updating something else in this process.
The engine is InnoDB.
My problem:
Someone told me that running such a query is equivalent to 5 queries at execution time, because the IN will be translated into the corresponding number of ORs and they will run one after another.
So instead of 1 query I effectively get N, where N is the number of values in the IN.
He suggests creating a temp table, inserting all the new values into it, and then updating by a join (see the sketch below).
Is he right, on both the equivalence and the performance?
What do you suggest? I thought INSERT INTO .. ON DUPLICATE KEY UPDATE might help, but I don't have all the data for the row, only its id and the fact that I want to set active = 1 on it.
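For reference, a rough sketch of the temp-table-and-join approach he described (tmp_ids is just a placeholder name I made up, and I backtick `table` since my real table obviously isn't called that):

CREATE TEMPORARY TABLE tmp_ids (id INT PRIMARY KEY);
INSERT INTO tmp_ids (id) VALUES (1),(4),(12),(55),(111);

UPDATE `table` t
JOIN tmp_ids x ON x.id = t.id
SET t.active = 1;

DROP TEMPORARY TABLE tmp_ids;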
Maybe this query is better?
UPDATE table SET
active = CASE
WHEN id='1' THEN '1'
WHEN id='4' THEN '1'
WHEN id='12' THEN '1'
WHEN id='55' THEN '1'
WHEN id='111' THEN '1'
ELSE active END
WHERE campaign_id > 0; -- otherwise it throws an error about updating without a WHERE clause in safe mode, and I don't know if I can toggle safe mode off.
Thanks.
It's the other way around. OR can sometimes be turned into IN. IN is then efficiently executed, especially if there is an index on the column. If you have 1000 entries in the IN, it will do 1000 probes into the table based on id.
If you are running a new enough version of MySQL, I think you can do EXPLAIN EXTENDED UPDATE ...OR...; SHOW WARNINGS; to see this conversion.
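For example, something along these lines (table name taken from the question; the exact output varies by version):

EXPLAIN EXTENDED UPDATE `table` SET active = 1 WHERE id = 1 OR id = 4 OR id = 12;
SHOW WARNINGS; -- the rewritten statement in the warning shows whether the ORs were collapsed into an IN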
The UPDATE CASE... will probably tediously check each and every row.
It would probably be easier on other users of the system if you broke the UPDATE up into multiple UPDATEs, each covering 100-1000 rows. More on chunking.
Where did you get the ids in the first place? If it was via a SELECT, then perhaps it would be practical to combine it with the UPDATE to make it one step instead of two.
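As a sketch of what I mean (some_source and some_condition are placeholders for wherever the ids actually come from):

UPDATE `table` t
JOIN ( SELECT id FROM some_source WHERE some_condition ) s ON s.id = t.id
SET t.active = 1;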
I think the query below is better because it uses the primary key.
UPDATE table SET active = 1 WHERE id<=5
I've got a bit of a stupid question. My program has to be able to delete data from my database; that in itself is not the problem. But how can I delete data without the risk of others seeing that something has been deleted?
User Table:
U_ID U_NAME
1 Chris
2 Peter
OTHER TABLE
ID TIMESTAMP FK_U_ID
1 2012-12-01 1
2 2012-12-02 1
So the IDs are AUTO_INCREMENT, which means that if I delete one of them there's a gap. Furthermore, each timestamp is bigger than the one in the row before, so they are ascending.
I want to let the data with ID 1 disappear from the user's profile (U_ID 1).
If I delete it, there is a gap. If I just change the FK_U_ID to 2 (Peter), it's obvious, because when I insert data there are 20 or 30 rows with the same U_ID... so it's clear that there has been a modification.
If I set the FK_U_ID to NULL, it's the same problem as when I change it to another U_ID.
Is there any solution to make this work? I know that if nobody but me has access to the database, it's no problem at all. But just in case somebody inspects my program, it should not be obvious that there have been modifications.
So here we go.
For the ID gaps issue you can use GUIDs as @SLaks suggests, but then you can't use the native RDBMS auto_increment which means you have to create the GUID and insert it along with the rest of the record data upon creation. Of course, you don't really need the ID to be globally unique, you could just store a random string of 20 characters or something, but then you have to do a DB read to see if that ID is taken and repeat (recursively) that process until you find an unused ID... could be quite taxing.
It's not at all clear why you would want to "hide" evidence that a delete was performed. That sounds like a really bad idea. I'm not a fan of promulgating misinformation.
Two of the characteristics of an ideal primary key are:
- anonymous (void of any useful information; it doesn't matter what it's set to)
- immutable (once assigned, it will never be changed.)
But, if we set that whole discussion aside...
I can answer a slightly different question (an answer you might find helpful to your particular situation)
The only way to eliminate a "gap" in the values in a column with an AUTO_INCREMENT would be to change the column values from their current values to a contiguous sequence of new values. If there are any foreign keys that reference that column, the values in those columns would need to be updated as well, to preserve the relationship. That will likely leave the current auto_increment value of the table higher than the largest value of the id column, so I'd want to reset that as well, to avoid a "gap" on the next insert.
(I have done re-sequencing of auto_increment values in development and test environments, to "clean up" lookup tables, and to move the id values of some tables to ranges that are distinct from ranges in other tables... that lets me test SQL to make sure the join predicates aren't inadvertently referencing the wrong table and returning rows that look correct by accident... those are some reasons I've done reassignment of auto_increment values)
Note that the database can "automagically" update foreign key values (for InnoDB tables) when you change the primary key value, as long as the foreign key constraint is defined with ON UPDATE CASCADE, and FOREIGN_KEY_CHECKS is not disabled.
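For illustration, a child table defined like this (names are made up) gets its foreign key values updated automatically when the parent id changes:

CREATE TABLE child_table (
  id INT NOT NULL PRIMARY KEY,
  mytable_id INT NOT NULL,
  FOREIGN KEY (mytable_id) REFERENCES mytable (id) ON UPDATE CASCADE
) ENGINE=InnoDB;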
If there are no foreign keys to deal with, and assuming that all of the current values of id are positive integers, then I've been able to do something like this: (with appropriate backups in place, so I can recover if things don't work right)
UPDATE mytable t
JOIN (
       SELECT s.id AS old_id
            , @i := @i + 1 AS new_id
         FROM mytable s
        CROSS
         JOIN (SELECT @i := 0) i
        ORDER BY s.id
     ) c
   ON t.id = c.old_id
  SET t.id = c.new_id
WHERE t.id <> c.new_id
To reset the table AUTO_INCREMENT back down to the largest id value in the table:
ALTER TABLE mytable AUTO_INCREMENT = 1;
Typically, I will create a table and populate it from that query in the inline view (aliased as c) above. I can then use that table to update both foreign key columns and the primary key column, first disabling the FOREIGN_KEY_CHECKS and then re-enabling it. (In a concurrent environment, where other processes might be inserting/updating/deleting rows from one of the tables, I would of course first obtain an exclusive lock on all of the tables to be updated.)
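A bare-bones sketch of that sequence (id_map is my own name for the mapping table; the actual UPDATE statements are elided):

CREATE TABLE id_map AS
SELECT s.id AS old_id
     , @i := @i + 1 AS new_id
  FROM mytable s
 CROSS JOIN (SELECT @i := 0) i
 ORDER BY s.id;

SET FOREIGN_KEY_CHECKS = 0;
-- ... UPDATE the foreign key columns, then mytable.id itself, joining to id_map ...
SET FOREIGN_KEY_CHECKS = 1;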
Taking up again, the discussion I set aside earlier... this type of "administrative" function can be useful in a test environment, when setting up test cases. But it is NOT a function that is ever performed in a production environment, with live data.
Is it possible to update the value of a field but cap it at the same time?:
UPDATE users SET num_apples=num_apples-1 WHERE xxx = ?
I don't want the field "num_apples" to fall below zero. Can I do that in one operation?
Thanks
----- Update ------------------
UPDATE users SET num_apples=num_apples-1 WHERE user_id = 123 AND num_apples > 0;
If I only have an index on "user_id", and not "num_apples", is that going to be bad for performance? I'm not sure how mysql implements this operation. I'm hoping that the WHERE on the user_id part makes it fast. I have to perform this operation somewhat frequently.
Thanks
Just add a WHERE condition specifying only rows > 0, so it won't update any rows into the negative.
UPDATE users SET num_apples=num_apples-1 WHERE num_apples > 0;
Update
Following your subquestion on indexing, as always, the way to test performance is to benchmark it for yourself. Examine the EXPLAIN for the query and make sure it is using the index on user_id (it should be). And finally, don't worry too much about performance of this simple operation until it becomes a problem. You don't have an index on num_apples now, but could you not add one if performance wasn't scaling to your needs?
You don't need to create 2 indexes, as only one will be used. You should put both fields into one composite index, on the pair (user_id, num_apples):
ALTER TABLE t ADD INDEX yourNewIndex (user_id, num_apples);
You can actually remove the previous index, as this one also covers it:
ALTER TABLE t DROP INDEX yourOldIndex;
Before dropping it you can get information on what index is being used by running:
EXPLAIN UPDATE users SET num_apples=num_apples-1
WHERE user_id = 123 AND num_apples > 0;
If the index used is yourNewIndex, then MySQL has decided it is faster to use that one than the previous one.
Edit:
do I even need any checks? Will mysql prevent the value from going < 0 by default in that case?
Yes, you still do. If the column is UNSIGNED and you do not guard against it, the update will fail with a data truncation error rather than silently stopping at zero:
Data truncation: BIGINT UNSIGNED value is out of range
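A minimal demonstration, assuming num_apples is declared UNSIGNED (the demo table is made up):

CREATE TABLE users_demo (user_id INT PRIMARY KEY, num_apples BIGINT UNSIGNED NOT NULL);
INSERT INTO users_demo VALUES (123, 0);
UPDATE users_demo SET num_apples = num_apples - 1 WHERE user_id = 123;
-- fails with: BIGINT UNSIGNED value is out of range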
All rows in MySQL tables are being inserted like this:
1
2
3
Is there any way to insert a new row at the top of the table, so that the table looks like this?
3
2
1
Yes, yes, I know "order by", but let me explain the problem. I have a dating website and users can search profiles by sex, age, city, etc. There are more than 20 search criteria and it's not possible to create indexes for every possible combination. So if I use "order by", the search usually ends with "using temporary, using filesort", and this causes a very high server load. If I remove "order by", the oldest profiles are shown first and users have to go to the last page to see the new profiles. That's very bad, because the first pages of the search results always look the same and users get the feeling that there are no new profiles. That's why I asked this question. If it's not possible to insert the last row at the top of the table, can you suggest anything else?
The order in which the results are returned when there's no ORDER BY clause depends on the RDBMS. In the case of MySQL, or at least most of its engines, if you don't explicitly specify the order it will be ascending, from the oldest entries to the newest. Where the row is located "physically" doesn't matter. I'm not sure if all MySQL engines work that way, though. For example, in PostgreSQL the "default" order shows the most recently updated rows first. This might be the way some of the MySQL engines work too.
Anyway, the point is: if you want the results ordered, always specify the sort order; don't just depend on some default that seems to work. In your case you want something trivial - you want the users in descending order, so just use:
SELECT * FROM users ORDER BY id DESC
I think you just need to make sure that, if you always need to show the latest data first, all of your indexes specify the date/time field first and all of your queries order by that field first.
If ORDER BY is slowing everything down, then you need to optimise your queries or your database structure, I would say.
Maybe you could add the id 'by hand' and give it a negative value, but neither I (nor probably anybody else) would recommend doing that:
Regular insert, e.g.
insert into t values (...);
Update with set, e.g.
update t set id = -id where id = last_insert_id();
Normally you specify an auto-incrementing primary key.
However, you can just specify the primary key like so:
CREATE TABLE table1 (
  id INTEGER SIGNED PRIMARY KEY DEFAULT 1,  -- no auto_increment, but has a default value
  ...                                       -- other fields
);
Now add a BEFORE INSERT trigger that changes the primary key.
DELIMITER $$
CREATE TRIGGER ai_table1_each BEFORE INSERT ON table1 FOR EACH ROW
BEGIN
DECLARE new_id INTEGER;
SELECT COALESCE(MIN(id), 0) -1 INTO new_id FROM table1;
SET NEW.id = new_id;
END $$
DELIMITER ;
Now your id will start at -1 and run down from there.
The insert trigger will make sure no concurrency problems occur.
I know that a lot of time has passed since the above question was asked. But I have something to add to the comments:
I'm using MySQL version: 5.7.18-0ubuntu0.16.04.1
When no ORDER BY clause is used with SELECT, the records tend to be returned in the table's primary key order, regardless of the order in which they were added.
I don't think this is possible, as I couldn't find anything, but I thought I would check on here in case I am not searching for the right thing.
I have a settings table in my database which has two columns. The first column is the setting name and the second column is the value.
I need to update all of these at the same time. I wanted to see if there is a way to update these values at the same time, in one query like the following:
UPDATE table SET col1='setting name' WHERE col2='1 value' AND SET col1='another name' WHERE col2='another value';
I know the above isn't correct SQL, but this is the sort of thing I would like to do, so I was wondering whether there is another way to do it instead of having to perform a separate SQL query for each setting I want to update.
Thanks for your help.
You can use INSERT INTO .. ON DUPLICATE KEY UPDATE to update multiple rows with different values.
You do need a unique index (like a primary key) to make the "duplicate key" part work.
Example:
INSERT INTO table (a,b,c) VALUES (1,2,3),(4,5,6)
ON DUPLICATE KEY UPDATE b = VALUES(b), c = VALUES(c);
-- VALUES(x) points back to the value you gave for field x
-- so for b it is 2 and 5, for c it is 3 and 6 for rows 1 and 4 respectively (if you assume that a is your unique key field)
If you have a specific case I can give you the exact query.
UPDATE table
SET col2 =
    CASE col1
      WHEN 'setting1'
      THEN 'value'
      ELSE col2
    END
  , col1 = ...
...
I decided to use multiple queries all in one go, so the code would go like:
UPDATE table SET col2='value1' WHERE col1='setting1';
UPDATE table SET col2='value2' WHERE col1='setting2';
etc
etc
I've just done a test where I inserted 1500 records into the database. Doing it without starting a DB transaction took 35 seconds. I blanked the database and did it again, but this time starting a transaction first and committing once the 1500th record was inserted, and it took 1 second, so it definitely seems like doing it in a DB transaction is the way to go.
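For completeness, wrapping the statements in a transaction looks like this (assuming an InnoDB table; 'settings' stands in for my real table name):

START TRANSACTION;
UPDATE settings SET col2 = 'value1' WHERE col1 = 'setting1';
UPDATE settings SET col2 = 'value2' WHERE col1 = 'setting2';
-- ... more updates ...
COMMIT;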
You need to run separate SQL queries and make use of transactions if you want them to run atomically.
UPDATE table SET col1=if(col2='1 value','setting name','another name') WHERE col2='1 value' OR col2='another value'
@Frits Van Campen,
The INSERT INTO .. ON DUPLICATE KEY UPDATE works for me.
I have been doing this for years when I want to update more than a thousand records from an Excel import.
The only problem with this trick is that when there is no record to update, instead of ignoring it, this method inserts a record, and in some instances that is a problem. Then I need to add another field, and after the import I have to delete all the records that were inserted instead of updated.
Is there a faster way to update the oldest row of a MySQL table that matches a certain condition than using ORDER BY id LIMIT 1 as in the following query?
UPDATE mytable SET field1 = '1' WHERE field1 = 0 ORDER BY id LIMIT 1;
Note:
Assume the primary key is id and there is also an index on field1.
We are updating a single row.
We are not updating strictly the oldest row; we are updating the oldest row that matches a condition.
We want to update the oldest matching row, i.e. the lowest id, i.e. the head of the FIFO queue.
Questions:
Is the ORDER BY id necessary? How does MySQL order by default?
Real world example
We have a DB table being used for a email queue. Rows are added when we want to queue emails to send to our users. Rows are removed by a cron job, run each minute, processing as many as possible in that minute and sending 1 email per row.
We plan to ditch this approach and use something like Gearman or Resque to process our email queue. But in the meantime I have a question on how we can efficiently mark the oldest item of the queue for processing, a.k.a. The row with the lowest ID. This query does the job:
mysql_query("UPDATE email_queue SET processingID = '1' WHERE processingID = 0 ORDER BY id LIMIT 1");
However, it is appearing in the MySQL slow log a lot due to scaling issues. The query can take more than 10s when the table has 500,000 rows. The problem is that this table has grown massively since it was first introduced and now sometimes has half a million rows and an overhead of 133.9 MiB. For example, we INSERT 6000 new rows perhaps 180 times a day and DELETE roughly the same number.
To stop the query appearing in the slow log we removed the ORDER BY id to stop a massive sort of the whole table. i.e.
mysql_query("UPDATE email_queue SET processingID = '1' WHERE processingID = 0 LIMIT 1");
... but the new query no longer always gets the row with the lowest id (although it often does). Is there a more efficient way of getting the row with the lowest id other than using ORDER BY id ?
For reference, this is the structure of the email queue table:
CREATE TABLE IF NOT EXISTS `email_queue` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`time_queued` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Time when item was queued',
`mem_id` int(10) NOT NULL,
`email` varchar(150) NOT NULL,
`processingID` int(2) NOT NULL COMMENT 'Indicate if row is being processed',
PRIMARY KEY (`id`),
KEY `processingID` (`processingID`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
Give this a read:
ORDER BY … LIMIT Performance Optimization
Sounds like you have other processes locking the table and preventing your update from completing in a timely manner - have you considered using InnoDB?
I think the 'slow part' comes from
WHERE processingID = 0
It's slow because it's not indexed. But, indexing this column (IMHO) seems incorrect too.
The idea is to change the above query to something like:
WHERE id = 0
which theoretically will be faster since it uses an index.
How about creating another table which contains the ids of rows that haven't been processed? Insertion then works twice: first insert into the real table, and second insert the id into the 'not yet processed' table. The processing part also has double duty: first retrieve an id from the 'not yet processed' table and delete it there; its second job is, of course, the actual processing.
Of course, the id column in the 'not yet processed' table needs to be indexed, just to ensure that selecting and deleting stay fast.
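A rough sketch of that setup (table and column names are only illustrative):

CREATE TABLE email_queue_pending (
  id INT NOT NULL PRIMARY KEY  -- same value as email_queue.id
);

-- on enqueue: insert into both tables
INSERT INTO email_queue (time_queued, mem_id, email, processingID)
VALUES (NOW(), 42, 'someone@example.com', 0);
INSERT INTO email_queue_pending (id) VALUES (LAST_INSERT_ID());

-- on processing: take the lowest pending id and remove it, then do the actual work
SELECT MIN(id) INTO @next_id FROM email_queue_pending;
DELETE FROM email_queue_pending WHERE id = @next_id;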
This question is old, but for reference for anyone ending up here:
You have a condition on processingID (WHERE processingID = 0), and within that constraint you want to order by ID.
What's happening with your current query is that it scans the table from the lowest ID to the greatest, stopping when it finds 1 record matching the condition. Presumably, it will first find a ton of old records, scanning almost the entire table until it finds an unprocessed one near the end.
How do we improve this?
Consider that you have an index on processingID. Technically, the primary key is always appended (which is how the index can "point" to anything in the first place). So you really have an index on processingID, id. That means ordering on that will be fast.
Change your ordering to: ORDER BY processingID, id
Since you have fixed processingID to a single value with your WHERE clause, this does not change the resulting order. However, it does make it easy for the database to apply both your condition and your ordering, without scanning any records that do not match.
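Applied to the query from the question, the suggested change looks like this:

UPDATE email_queue
SET processingID = '1'
WHERE processingID = 0
ORDER BY processingID, id
LIMIT 1;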
One funny thing is that MySQL, by default, tends to return rows ordered by ID, rather than in an arbitrary order as relational theory states (I am not sure whether this behaviour has changed in the latest versions). So the last row you get from a SELECT should be the last inserted row. I would not rely on this, of course.
As you said, the best solution is to use something like Resque, or RabbitMQ & co.
You could use an in-memory table, which is volatile but much faster, and store the latest ID there, or just use a MyISAM table to add persistence. It is simple, performs well, and takes little time to implement.
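A minimal sketch of the in-memory variant (the table name is my own; swap ENGINE=MEMORY for ENGINE=MyISAM if you want it to survive a restart):

CREATE TABLE queue_head (
  lowest_unprocessed_id INT NOT NULL
) ENGINE=MEMORY;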