SQL UPDATE multiple rows at once after SELECT - mysql

My sample table:
+------------+------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+------------+------+-----+---------+-------+
| id | bigint(20) | YES | | NULL | |
| other_id | bigint(20) | YES | | NULL | |
| another_id | bigint(20) | YES | | NULL | |
+------------+------------+------+-----+---------+-------+
+------+----------+------------+
| id | other_id | another_id |
+------+----------+------------+
| 988 | 102 | NULL |
| 989 | 103 | NULL |
| 990 | 104 | NULL |
| 991 | 105 | NULL |
| 992 | 106 | NULL |
| 987 | 101 | NULL |
+------+----------+------------+
How would I SELECT and UPDATE the above table in one query to the effect of doing something like this for every row:
UPDATE
x
SET
another_id = 987
WHERE
id = 987
AND other_id = 101;
UPDATE
x
SET
another_id = 988
WHERE
id = 988
AND other_id = 102;
I would hate to run a manual update like this for every row and would like to do it all in one go.

To me it seems that you simply want to set the value of another_id to id:
UPDATE
x
SET
another_id = id
You can provide a range of other_id values in the where clause if you need to restrict the number of rows updated:
UPDATE
x
SET
another_id = id
WHERE other_id IN (...); -- list the values you want here
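As a quick check against the sample rows from the question, the statement can be tried on a copy first. This is only a sketch: the table name x_demo is hypothetical (chosen so the real table x is left alone), and the IN list simply uses two other_id values from the sample data.

```sql
-- Hypothetical scratch copy of the question's table (x_demo is an assumed name).
CREATE TABLE x_demo (
  id         BIGINT NULL,
  other_id   BIGINT NULL,
  another_id BIGINT NULL
);

INSERT INTO x_demo (id, other_id, another_id) VALUES
  (988, 102, NULL), (989, 103, NULL), (990, 104, NULL),
  (991, 105, NULL), (992, 106, NULL), (987, 101, NULL);

-- Copy id into another_id, but only for the listed other_id values.
UPDATE x_demo
SET another_id = id
WHERE other_id IN (101, 102);

-- Rows 987 and 988 now have another_id = id; the other four rows stay NULL.
SELECT id, other_id, another_id FROM x_demo ORDER BY id;
```

Dropping the WHERE clause applies the same assignment to every row in one statement.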

Shadow's UPDATE statement is correct. It looks like you want to copy every ID straight into Another_ID. If this should only happen where Other_ID falls in a given range, just add "where other_id between 101 and 21234", or whatever range you need.
To preview what Shadow's answer would do, first write it as a plain SELECT statement. If the output is correct, switch to the UPDATE version. For example:
Select
ID,
ID AS Another_ID,
Other_ID
from
YourTable
You will get every record, with the ID value shown under the column name Another_ID. This does NOT update the Another_ID column; it only returns the value under that result column name. Again, if you only want a certain range of numbers, just add
where Other_ID between 101 and 21234
(or whatever value range)
The UPDATE form is exactly what Shadow described:
update YourTable set
Another_ID = ID
and ALL records get updated. To limit it to a specific range, use the same where clause as in the SELECT.
If you want to try this without messing up production data, work with a temporary scratch table that you can delete when you are done:
insert into MyTempTable
( ID,
Another_ID,
Other_ID
)
select ID, Another_ID, Other_ID
From YourTable
where ID between 500 and 800
Now you have a test table on which to run the update and see its impact.
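One way to build and use such a scratch table is sketched below. It assumes YourTable already exists with the columns ID, Another_ID and Other_ID; MyTempTable and the ID range are the placeholders used in this answer, not real names. CREATE TABLE ... LIKE copies the column definitions and indexes but no rows.

```sql
-- Create an empty table with the same structure as YourTable.
CREATE TABLE MyTempTable LIKE YourTable;

-- Copy a sample of rows to experiment on.
INSERT INTO MyTempTable (ID, Another_ID, Other_ID)
SELECT ID, Another_ID, Other_ID
FROM YourTable
WHERE ID BETWEEN 500 AND 800;

-- Try the update on the copy and inspect the result.
UPDATE MyTempTable SET Another_ID = ID;
SELECT ID, Another_ID, Other_ID FROM MyTempTable;

-- Remove the scratch table when finished.
DROP TABLE MyTempTable;
```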

Related

MySQL increase the old value by the new value when inserting new records by "on duplicate key update"

I've created a new table like this:
+------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+-------+
| first | varchar(100) | NO | PRI | NULL | |
| last | varchar(400) | NO | PRI | NULL | |
| source | varchar(100) | NO | | NULL | |
| count | int | YES | | 1 | |
+------------+--------------+------+-----+---------+-------+
And I try to insert multiple records to this table using this:
insert into my_table(first,last,source,count) values ('a','b','c',50),('a','b','c',20),('d','e','f',30) on duplicate key update count = count + 1;
After insert, this is the content of the table:
+------------+-----------+--------+-------+
| first | last | source | count |
+------------+-----------+--------+-------+
| a | b | c | 2 |
| d | e | f | 1 |
+------------+-----------+--------+-------+
However, I'd like the count to be updated by the numbers provided in the values of the new records (i.e., 50, 20, and 30 in the provided example). So, the table should look like this:
+------------+-----------+--------+-------+
| first | last | source | count |
+------------+-----------+--------+-------+
| a | b | c | 70 |
| d | e | f | 30 |
+------------+-----------+--------+-------+
Is it possible to achieve this using "on duplicate key update" in MySQL? Or is there any other efficient way to achieve this? The table will be very large (with millions of rows).
VALUES() is the method to use, as GMB mentioned, if you are on a MySQL version older than 8.0.19. However, it is deprecated as of 8.0.20. If you are using MySQL 8.0.19 or newer, it is recommended to give an alias to the rows being inserted and then refer to the inserted values through that alias, like this:
insert into my_table (first, last, source, count)
values ('a','b','c',50), ('a','b','c',20), ('d','e','f',30) as newRow
on duplicate key update count = count + newRow.count;
More information can be found here: https://dev.mysql.com/doc/refman/8.0/en/insert-on-duplicate.html
Consider the VALUES() syntax, which you can use in the on duplicate key update clause to refer to the column value that would otherwise have been inserted:
insert into my_table(first, last, source, count)
values ('a','b','c',50), ('a','b','c',20), ('d','e','f',30)
on duplicate key update count = count + VALUES(count);
Note: first, last and source are MySQL keywords. I would not recommend using them as column names.
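For anyone who wants to reproduce the accumulated result end to end, here is a minimal sketch. The table name my_table_demo and the alias new_row are hypothetical; the columns follow the question, and the identifiers are backtick-quoted because of the keyword note above. It assumes MySQL 8.0.19 or newer and an empty table to start with.

```sql
-- my_table_demo is an assumed name; the columns mirror the question's table.
CREATE TABLE my_table_demo (
  `first`  VARCHAR(100) NOT NULL,
  `last`   VARCHAR(400) NOT NULL,
  `source` VARCHAR(100) NOT NULL,
  `count`  INT DEFAULT 1,
  PRIMARY KEY (`first`, `last`)
);

-- The second ('a','b') row collides with the first one, so its count is added
-- to the stored count instead of being inserted as a new row.
INSERT INTO my_table_demo (`first`, `last`, `source`, `count`)
VALUES ('a','b','c',50), ('a','b','c',20), ('d','e','f',30) AS new_row
ON DUPLICATE KEY UPDATE `count` = my_table_demo.`count` + new_row.`count`;

-- Starting from an empty table this returns ('a','b','c',70) and ('d','e','f',30).
SELECT `first`, `last`, `source`, `count` FROM my_table_demo ORDER BY `first`;
```

On servers older than 8.0.19, the same script should work with the alias removed and VALUES(`count`) used in place of new_row.`count`.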

MySQL: Strange behavior of UPDATE query (ERROR 1062 Duplicate entry)

I have a MySQL database that stores news articles with the publication date (day only), the source, and the category. Based on these, I want to generate a table that holds the article counts with respect to these 3 parameters.
Since some combinations of these 3 parameters might have no articles, a simple GROUP BY won't do. I therefore first generate a table news_article_counts with all possible combinations of the 3 parameters and a default article_count of 0, like this:
SELECT * FROM news_article_counts;
+--------------+------------+----------+---------------+
| published_at | source | category | article_count |
+--------------+------------+----------+---------------+
| 2016-08-05 | 1826089206 | 0 | 0 |
| 2016-08-05 | 1826089206 | 1 | 0 |
| 2016-08-05 | 1826089206 | 2 | 0 |
| 2016-08-05 | 1826089206 | 3 | 0 |
| 2016-08-05 | 1826089206 | 4 | 0 |
| ... | ... | ... | ... |
+--------------+------------+----------+---------------+
For testing, I now created a temporary table tmp as the GROUP BY result from the original news article table:
SELECT * FROM tmp LIMIT 6;
+--------------+------------+----------+-----+
| published_at | source | category | cnt |
+--------------+------------+----------+-----+
| 2016-08-05 | 1826089206 | 3 | 1 |
| 2003-09-19 | 1826089206 | 4 | 1 |
| 2005-08-08 | 1826089206 | 3 | 1 |
| 2008-07-22 | 1826089206 | 4 | 1 |
| 2008-11-26 | 1826089206 | 8 | 1 |
| ... | ... | ... | ... |
+--------------+------------+----------+-----+
Given these two tables, the following query works as expected:
SELECT * FROM news_article_counts c, tmp t
WHERE c.published_at = t.published_at AND c.source = t.source AND c.category = t.category;
But now I need to update the article_count of table news_article_counts with the values in table tmp where the 3 parameters match up. For this I'm using the following query (I've tried different ways but with the same results):
UPDATE
news_article_counts c
INNER JOIN
tmp t
ON
c.published_at = t.published_at AND
c.source = t.source AND
c.category = t.category
SET
c.article_count = t.cnt;
Executing this query yields this error:
ERROR 1062 (23000): Duplicate entry '2018-04-07 14:46:17-1826089206-1' for key 'uniqueIndex'
uniqueIndex is a joint index over published_at, source, category of table news_article_counts. But this shouldn't be a problem since I do not -- as far as I can tell -- update any of those 3 values, only article_count.
What confuses me most is that the error mentions the timestamp at which I executed the query (here: 2018-04-07 14:46:17). I have absolutely no idea where this comes into play. In fact, some rows in news_article_counts now have 2018-04-07 14:46:17 as the value of published_at. While this explains the error, I cannot see why published_at gets overwritten with the current timestamp. There is no ON UPDATE CURRENT_TIMESTAMP on this column; see:
CREATE TABLE IF NOT EXISTS `test`.`news_article_counts` (
`published_at` TIMESTAMP NOT NULL,
`source` INT UNSIGNED NOT NULL,
`category` INT UNSIGNED NOT NULL,
`article_count` INT UNSIGNED NOT NULL DEFAULT 0,
UNIQUE INDEX `uniqueIndex` (`published_at` ASC, `source` ASC, `category` ASC))
ENGINE = MyISAM
DEFAULT CHARACTER SET = utf8mb4;
What am I missing here?
UPDATE 1: I actually checked the table definition of news_article_counts in the database. And there's indeed the following:
mysql> SHOW COLUMNS FROM news_article_counts;
+---------------+------------------+------+-----+-------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+---------------+------------------+------+-----+-------------------+-----------------------------+
| published_at | timestamp | NO | | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
| source | int(10) unsigned | NO | | NULL | |
| category | int(10) unsigned | NO | | NULL | |
| article_count | int(10) unsigned | NO | | 0 | |
+---------------+------------------+------+-----+-------------------+-----------------------------+
But why is on update CURRENT_TIMESTAMP set? I double- and triple-checked my CREATE TABLE statement. I removed the joint index and added an artificial primary key (auto_increment). Nothing helped. I even tried to explicitly remove these attributes from published_at with:
ALTER TABLE `news_article_counts` CHANGE `published_at` `published_at` TIMESTAMP NOT NULL;
Nothing seems to work for me.
It looks like you have the explicit_defaults_for_timestamp system variable disabled. One of the effects of this is:
The first TIMESTAMP column in a table, if not explicitly declared with the NULL attribute or an explicit DEFAULT or ON UPDATE attribute, is automatically declared with the DEFAULT CURRENT_TIMESTAMP and ON UPDATE CURRENT_TIMESTAMP attributes.
You could try enabling this system variable, but that could affect other applications. I think it only takes effect when a table is actually being created, so it shouldn't affect any existing tables.
If you don't want to make a system-level change like this, you could add an explicit DEFAULT attribute to the published_at column of this table; then MySQL won't automatically add ON UPDATE.
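The table-level fix could look roughly like the sketch below. Using DEFAULT CURRENT_TIMESTAMP as the explicit default is an assumption made for illustration; any explicit DEFAULT without an ON UPDATE clause should have the same effect on the auto-update behaviour.

```sql
-- Check whether the server still applies the legacy TIMESTAMP defaults.
SHOW VARIABLES LIKE 'explicit_defaults_for_timestamp';

-- Give published_at an explicit DEFAULT and no ON UPDATE clause, so that
-- updating article_count no longer rewrites the timestamp.
ALTER TABLE news_article_counts
  MODIFY published_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP;

-- The Extra column for published_at should no longer list
-- "on update CURRENT_TIMESTAMP".
SHOW COLUMNS FROM news_article_counts;
```

Rows whose published_at was already overwritten are not restored by this change and would need to be regenerated.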

My mysql statement to query by primary key sometimes returns more than one row, so what happened?

My schema is this:
CREATE TABLE `user` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_name` varchar(10) NOT NULL,
`account_type` varchar(10) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=latin1
INSERT INTO user VALUES (1, "zhangsan", "premiumv"), (2, "lisi", "premiumv"), (3, "wangwu", "p"), (4, "maliu", "p"), (5, "hengqi", "p"), (6, "shuba", "p");
I have the following 6 rows in the table:
+----+-----------+--------------+
| id | user_name | account_type |
+----+-----------+--------------+
| 1 | zhangsan | premiumv |
| 2 | lisi | premiumv |
| 3 | wangwu | p |
| 4 | maliu | p |
| 5 | hengqi | p |
| 6 | shuba | p |
+----+-----------+--------------+
Here is the MySQL query that selects from the table by id:
SELECT * FROM user WHERE id = floor(rand()*6) + 1;
I expect it to return one row, but the actual result is unpredictable: it returns 0 rows, 1 row, or sometimes more than one row. Can somebody help clarify this? Thanks!
You're testing each row against a different random number, so sometimes multiple rows will match. To fix this, calculate the random number once in a subquery.
SELECT u.*
FROM user AS u
JOIN (SELECT floor(rand()*6) + 1 AS r) AS r
ON u.id = r.r
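Another way to make sure the random number is computed only once, swapped in here as an alternative to the subquery, is to store it in a user variable first. This is a minimal sketch that assumes the user table from the question; @r is just an illustrative variable name.

```sql
-- Evaluate RAND() exactly once and keep the result in a session variable.
SET @r := FLOOR(RAND() * 6) + 1;

-- Every row is now compared against the same fixed value, so at most one row
-- can match on the primary key.
SELECT * FROM `user` WHERE id = @r;
```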
This method of selecting a random row from a table seems like a poor design. If there are any gaps in the id sequence (which can happen easily -- MySQL doesn't guarantee that they'll always be sequential, and deleting rows will leave gaps) then it could return an empty result. The usual way to select a random row from a table is with:
SELECT *
FROM user
ORDER BY RAND()
LIMIT 1
The WHERE part must be evaluated for each row to see if there is a match. Because of this, the rand() function is evaluated for every row. Getting an inconsistent number of rows seems reasonable.
If you add LIMIT 1 to your query, the probability of returning rows from the end diminishes.
It's because the WHERE condition id = floor(rand()*6) + 1 is evaluated against every row in the table to see whether that row matches. The random value can be different each time it is compared against a row.
You can test with a table that has same values in the column used in WHERE clause, and you can see the result:
select * from test;
+------+------+
| id | name |
+------+------+
| 1 | a |
| 2 | b |
| 1 | c |
| 2 | d |
| 1 | e |
| 2 | f |
+------+------+
select * from test where id = floor(rand()*2) + 1;
+------+------+
| id | name |
+------+------+
| 1 | a |
| 2 | d |
| 1 | e |
+------+------+
In the above example, the expression floor(rand()*2) + 1 returns 1 when matched against the first row (with name = 'a'), so that row is included in the result set. It then returns 2 when matched against the fourth row (with name = 'd'), so that row is also included, even though its id differs from the id of the first row in the result set.
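The per-row evaluation can also be made visible directly by selecting the random expression next to each id. The sketch below needs no existing table; the inline list of ids 1 to 6 is an assumption standing in for the user table.

```sql
-- Each output row gets its own RAND() draw, so r generally differs per row.
SELECT t.id,
       FLOOR(RAND() * 6) + 1 AS r
FROM (
  SELECT 1 AS id UNION ALL SELECT 2 UNION ALL SELECT 3
  UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6
) AS t;
```

With one independent draw per row, a condition such as id = r can hold for zero, one, or several rows in a single run, which matches the behaviour described in the question.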

SELECT ... FOR UPDATE inside a UPDATE query

I've been trying to implement a simple script that locks a table from being read, updates some fields, and then unlocks it.
Here's my table:
mysql> SHOW COLUMNS FROM tb1;
+---------------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------------+---------+------+-----+---------+-------+
| id | int(11) | YES | PRI | NULL | |
| status | int(1) | YES | | 0 | |
+---------------+---------+------+-----+---------+-------+
Let's see if you understand what I'm trying to do:
1. Start a transaction
2. SELECT all rows with status != 1 (it may return more than 1 row) using FOR UPDATE
3. UPDATE the status field of the rows that were selected in step 2
4. Commit
I tried to achieve this in many ways, but I can't persist the data that the SELECT returned in step 2, and I can't use SELECT ... FOR UPDATE as a subquery of an UPDATE like this: UPDATE tb1 SET status=1 WHERE id IN (SELECT id FROM tb1 WHERE status != 1 FOR UPDATE);
Is it possible to achieve this instead of updating row by row?
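The steps listed in the question can be sketched as separate statements inside one transaction, with the UPDATE repeating the SELECT's predicate instead of reusing its result set. This is not a confirmed answer from the thread; it assumes tb1 is an InnoDB table and the default REPEATABLE READ isolation level.

```sql
START TRANSACTION;

-- Lock every row that is about to change; other sessions that try to modify
-- these rows have to wait until COMMIT.
SELECT id FROM tb1 WHERE status != 1 FOR UPDATE;

-- Update the same set of rows by applying the same predicate.
UPDATE tb1 SET status = 1 WHERE status != 1;

COMMIT;
```

An UPDATE takes row locks on the rows it modifies by itself, so if nothing has to happen between the SELECT and the UPDATE, the single UPDATE statement may already be enough.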

MySQL database relationship without an ID

Hi StackOverflow community,
I have these two tables:
tbl_users
ID_user (PRIMARY KEY)
Username (UNIQUE)
Password
...
tbl_posts
ID_post (PRIMARY KEY)
Owner (UNIQUE)
Description
...
Why does everybody always build database relationships with foreign keys? What if I want to relate Username to Owner instead of relating ID_user to ID_user in both tables?
Username is UNIQUE and the Owner is the username of the creator of the post.
Can it be done like that? Is there something to correct or improve? Maybe I have a misconception.
I would appreciate detailed and understandable answers.
Thank you in advance.
The reason is primarily for data integrity. The argument concerning performance is a little misleading. While neither exhaustive, nor definitive, I hope this little example will shed some light on that fact:
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(i INT NOT NULL AUTO_INCREMENT PRIMARY KEY
,s CHAR(12) NOT NULL UNIQUE
);
STEP1:
INSERT IGNORE INTO my_table (s)
SELECT CONCAT(CHAR((RAND()*26)+97),CHAR((RAND()*26)+97),CHAR((RAND()*26)+97),CHAR((RAND()*26)+97),CHAR((RAND()*26)+97),CHAR((RAND()*26)+97)
,CHAR((RAND()*26)+97),CHAR((RAND()*26)+97),CHAR((RAND()*26)+97),CHAR((RAND()*26)+97),CHAR((RAND()*26)+97),CHAR((RAND()*26)+97)
);
STEP2:
INSERT IGNORE INTO my_table (s)
SELECT CONCAT(CHAR((RAND()*26)+97),CHAR((RAND()*26)+97),CHAR((RAND()*26)+97),CHAR((RAND()*26)+97),CHAR((RAND()*26)+97),CHAR((RAND()*26)+97)
,CHAR((RAND()*26)+97),CHAR((RAND()*26)+97),CHAR((RAND()*26)+97),CHAR((RAND()*26)+97),CHAR((RAND()*26)+97),CHAR((RAND()*26)+97)
)
FROM my_table;
[REPEAT STEP 2 SEVERAL TIMES]
SELECT COUNT(*) FROM my_table;
+----------+
| COUNT(*) |
+----------+
| 16384 |
+----------+
1 row in set (0.01 sec)
SELECT * FROM my_table ORDER BY i LIMIT 12;
+----+------------+
| i | s |
+----+------------+
| 1 | kkxeehxsvy |
| 2 | iuyhrk{vaq |
| 3 | ngpedelooc |
| 4 | irkbyqgkhc |
| 6 | yqkcifcxdz |
| 7 | sgezlgvjjq |
| 8 | blavbvxbnl |
| 9 | wdbtqvgvgt |
| 13 | pakzpbnhxr |
| 14 | vpoy{gdwyd |
| 15 | ezlhz{drwg |
| 16 | ncwcwbpudh |
+----+------------+
SELECT * FROM my_table x JOIN my_table y ON y.i < x.i ORDER BY x.i,y.i LIMIT 1;
+---+------------+---+------------+
| i | s | i | s |
+---+------------+---+------------+
| 2 | iuyhrk{vaq | 1 | kkxeehxsvy |
+---+------------+---+------------+
1 row in set (1 min 22.60 sec)
SELECT * FROM my_table x JOIN my_table y ON y.s < x.s ORDER BY x.s,y.s LIMIT 1;
+-------+------------+------+------------+
| i | s | i | s |
+-------+------------+------+------------+
| 21452 | aabetdlvum | 6072 | aabdnegtav |
+-------+------------+------+------------+
1 row in set (1 min 13.59 sec)
So, we have two queries doing essentially the same thing (a comparison of 270 million values). The first joins the table to itself on an integer value. The second joins the table to itself on a string value. Both columns are indexed. As you can see, in this example, the string join actually performs better than the integer join - even though the hit on the CPU may actually be greater!
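On the data-integrity point, a foreign key does not have to reference the numeric ID; it can reference the unique Username column directly. The following is a minimal sketch, assuming InnoDB. The column types, the constraint name fk_posts_owner and the sample values are assumptions, and the UNIQUE constraint on Owner from the question is left out so that one user can own several posts.

```sql
CREATE TABLE tbl_users (
  ID_user  INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  Username VARCHAR(50) NOT NULL UNIQUE,
  Password VARCHAR(255) NOT NULL
) ENGINE = InnoDB;

CREATE TABLE tbl_posts (
  ID_post     INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  Owner       VARCHAR(50) NOT NULL,
  Description TEXT,
  -- Owner must match an existing Username; renaming a user cascades to posts.
  CONSTRAINT fk_posts_owner FOREIGN KEY (Owner)
    REFERENCES tbl_users (Username)
    ON UPDATE CASCADE
) ENGINE = InnoDB;

INSERT INTO tbl_users (Username, Password) VALUES ('alice', 'placeholder-hash');

-- Accepted: 'alice' exists in tbl_users.
INSERT INTO tbl_posts (Owner, Description) VALUES ('alice', 'first post');

-- Rejected with a foreign key constraint error: there is no user 'bob'.
INSERT INTO tbl_posts (Owner, Description) VALUES ('bob', 'orphan post');
```

Without the constraint, the last insert would succeed and leave a post whose owner does not exist, which is the integrity problem the foreign key prevents.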