When I try to run the update query below, it takes about 40 hours to complete, so I added a time limitation (see "Update query with time limitation"). But it still takes nearly the same time to complete. Is there any way to speed up this update?
EDIT: What I really want to do is get only the logs between two specific dates and run this update query on those records.
create table user
(userid varchar(30));
create table logs
( log_time timestamp,
log_detail varchar(100),
userid varchar(30));
insert into user values('user1');
insert into user values('user2');
insert into user values('user3');
insert into user values('');
insert into logs (log_detail, userid) values('no user mentioned','user3');
insert into logs (log_detail, userid) values('inserted by user2','user2');
insert into logs (log_detail, userid) values('inserted by user3',null);
Table before Update
| log_time | log_detail        | userid |
|----------|-------------------|--------|
| ..       | no user mentioned | user3  |
| ..       | inserted by user2 | user2  |
| ..       | inserted by user3 | (null) |
Update query
update logs join user
set logs.userid = user.userid
where logs.log_detail LIKE concat('%', user.userid, '%') and user.userid != '';
Update query with time limitation
update logs join user
set logs.userid = IF(logs.log_time between '2015-08-01 17:39:44' AND '2015-08-11 00:39:41', user.userid, null)
where logs.log_detail LIKE concat('%', user.userid, '%') and user.userid != '';
Table after update
| log_time | log_detail        | userid |
|----------|-------------------|--------|
| ..       | no user mentioned | user3  |
| ..       | inserted by user2 | user2  |
| ..       | inserted by user3 | user3  |
EDIT: Original question: Sql update statement with variable.
Log tables can easily fill up with tons of rows of data each month, and even the best indexing won't help, especially in the case of a LIKE operator. Your log_detail column is 100 characters long, and your search pattern is CONCAT('%', user.userID, '%'). Using a function in a SQL command can slow things down because of the extra computation, but the bigger problem is the pattern itself: if your userID is John, you are searching for %John%. The leading % forces the query to scan every row in the table, because an index on log_detail is useless for a pattern that can match anywhere in the string. If you didn't have the first %, the query would be able to use the index efficiently. Your query would, in effect, do an INDEX SEEK as opposed to an INDEX SCAN.
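You can see the difference with EXPLAIN; a quick sketch, assuming a hypothetical index on log_detail:

-- Hypothetical index for demonstration
CREATE INDEX idx_log_detail ON logs (log_detail);

-- Leading wildcard: the index cannot be used; expect type=ALL (full scan)
EXPLAIN SELECT * FROM logs WHERE log_detail LIKE '%user3%';

-- No leading wildcard: the index supports a range seek; expect type=range
EXPLAIN SELECT * FROM logs WHERE log_detail LIKE 'user3%';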
For more information on these concepts, see:
Index Seek VS Index Scan
Query tuning a LIKE operator
Alright, what can you do about this? Two strategies.
Option 1 is to limit the number of rows that you're searching
through. You had the right idea using time limitations to reduce the
number of rows to search through. What I would suggest is to put the
time limitation in your WHERE clause and make sure log_time is
indexed. The order of expressions in the WHERE clause doesn't really
matter to the optimizer, but a sargable range condition on an indexed
log_time column lets the database narrow down the candidate rows
first, so the expensive LIKE only runs against the rows inside the
date range.
update logs join user
set logs.userid=user.userid
where logs.log_time between '2015-08-01' and '2015-08-11'
and logs.log_detail LIKE concat('%',user.userID,'%')
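For that range condition to pay off, log_time needs an index; a minimal sketch (the index name is illustrative):

ALTER TABLE logs ADD INDEX idx_log_time (log_time);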
Option 2 depends on your control of the database. If you have total
control (and the time and money), MySQL has a feature called
Auto-Sharding. This is available in MySQL Cluster and MySQL
Fabric. I won't go over those products in much detail, as the links
provided below explain them much better than I could
summarize, but the idea behind sharding is to split the rows into
horizontal tables, so to speak. Instead of searching through one
long database table, you search across
several sister tables at the same time. Searching through 10 tables
of 10 million rows is faster than searching through 1 table of 100
million rows.
Database Sharding - Wikipedia
MySQL Cluster
MySQL Fabric
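A lighter-weight cousin of sharding on a single server is MySQL's native table partitioning; a sketch, assuming you can recreate the logs table with range partitions by year (names and boundaries are illustrative):

CREATE TABLE logs_partitioned (
    log_time   timestamp NOT NULL,
    log_detail varchar(100),
    userid     varchar(30)
)
PARTITION BY RANGE (UNIX_TIMESTAMP(log_time)) (
    PARTITION p2014 VALUES LESS THAN (UNIX_TIMESTAMP('2015-01-01 00:00:00')),
    PARTITION p2015 VALUES LESS THAN (UNIX_TIMESTAMP('2016-01-01 00:00:00')),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);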
First, the right place to put the time limitation is in the WHERE clause, not an IF():
update logs l left join
     user u
     on l.log_detail LIKE concat('%', u.userid)
set l.userid = u.userid
where l.log_time between '2015-08-01 17:39:44' and '2015-08-11 00:39:41';
If you want to set the others to NULL, do this first:
update logs l
set l.userid = NULL
where l.log_time not between '2015-08-01 17:39:44' and '2015-08-11 00:39:41';
But, if you really want this to be fast, you need to use an index for the join. This version can use an index on user(userid):
update logs l left join
     user u
     on substring_index(l.log_detail, ' ', -1) = u.userid
set l.userid = u.userid
where l.log_time between '2015-08-01 17:39:44' and '2015-08-11 00:39:41';
Look at the EXPLAIN on the equivalent SELECT. It is really important that both sides of the join condition have the same type; here userid is varchar(30), so no cast is needed, but if it were numeric you would need a cast() to that type.
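The equivalent SELECT to inspect might look like this (a sketch):

EXPLAIN
SELECT l.log_time, l.log_detail, u.userid
FROM logs l LEFT JOIN user u
    ON substring_index(l.log_detail, ' ', -1) = u.userid
WHERE l.log_time between '2015-08-01 17:39:44' and '2015-08-11 00:39:41';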
You could add a new column called log_detail_reverse, populated by a trigger so that when you insert a new row, you also store the log_detail column in reverse character order using the MySQL function reverse(). In your update query, you then reverse the userID search as well. The net effect is that you transform the INDEX SCAN into an INDEX SEEK, which will be much faster.
update logs join user
set logs.userid=user.userid
where logs.log_time between '2015-08-01' and '2015-08-11'
and logs.log_detail_reverse LIKE concat(reverse(user.userID), '%')
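For the seek to be possible, you first need the new column and an index on it; a minimal sketch (the index name is illustrative):

ALTER TABLE logs ADD COLUMN log_detail_reverse varchar(100);
ALTER TABLE logs ADD INDEX idx_log_detail_reverse (log_detail_reverse);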
MySQL Trigger
The Trigger could be something like:
DELIMITER //
CREATE TRIGGER log_details_in_reverse
BEFORE INSERT
ON logs FOR EACH ROW
BEGIN
-- Set the reversed copy on the row itself before it is written.
-- This must be a BEFORE INSERT trigger: an AFTER INSERT trigger
-- cannot UPDATE logs, because MySQL forbids a trigger from modifying
-- the table it is defined on (error 1442, quoted later on this page).
SET NEW.log_detail_reverse = reverse(NEW.log_detail);
END; //
DELIMITER ;
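With the trigger in place, a quick sanity check (values are illustrative):

INSERT INTO logs (log_detail, userid) VALUES ('inserted by user2', NULL);
SELECT log_detail, log_detail_reverse FROM logs;
-- the new row should show log_detail_reverse = '2resu yb detresni'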
One key to speeding up updates is not to update records that need no update. You only want to update records in a certain time range where the userid doesn't already match the user mentioned in the log text. Hence, limit the records to be updated in your WHERE clause:
update logs
set userid = substring_index(log_detail, ' ', -1)
where log_time between '2015-08-01 17:39:44' and '2015-08-11 00:39:41'
and not userid <=> substring_index(log_detail, ' ', -1);
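The null-safe comparison <=> is what keeps rows whose userid is NULL in scope, since a plain = against NULL is never true. For illustration:

SELECT NULL = 'user3',      -- NULL: ordinary equality is unknown against NULL
       NULL <=> 'user3',    -- 0: null-safe equality is false, so NOT keeps the row
       'user3' <=> 'user3'; -- 1: already-correct rows are filtered out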
Related
I have two tables in MySQL like this
Users -> user_id , user_name , number_of_comments
Comments -> comment_id , comment , user_id
Is there a way to get the number of comments for each user and update it in the number_of_comments column automatically?
Not recommended, but it solves the problem nevertheless. For learning purposes only.
CREATE TRIGGER tr_ai_update_n_of_comments
AFTER INSERT ON comments
FOR EACH ROW
UPDATE users
SET number_of_comments = ( SELECT COUNT(*)
FROM comments
WHERE comments.user_id = NEW.user_id )
WHERE user_id = NEW.user_id;
If the rows in comments may be updated (with the user_id value changing) and/or deleted, then create similar AFTER DELETE and AFTER UPDATE triggers.
PS: I strongly recommend removing the users.number_of_comments column altogether and calculating the actual comment count with a query when needed.
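That query could be as simple as (a sketch against the tables above):

SELECT u.user_id, u.user_name, COUNT(c.comment_id) AS number_of_comments
FROM users u
LEFT JOIN comments c ON c.user_id = u.user_id
GROUP BY u.user_id, u.user_name;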
If you agree that the value may be approximate (slightly different from the exact one), then you can use an incremental trigger.
CREATE TRIGGER tr_ai_update_n_of_comments
AFTER INSERT ON comments
FOR EACH ROW
UPDATE users
SET number_of_comments = number_of_comments + 1
WHERE user_id = NEW.user_id;
But just in case, also provide a service stored procedure (or scheduled event) that periodically recalculates the accumulated value.
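A minimal sketch of such a periodic recount (the event name and schedule are illustrative, and the event scheduler must be enabled):

CREATE EVENT recount_comments
ON SCHEDULE EVERY 1 DAY
DO
    UPDATE users u
    SET u.number_of_comments = ( SELECT COUNT(*)
                                 FROM comments c
                                 WHERE c.user_id = u.user_id );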
I apologize for the title, I couldn't find a good way to word my question. I'm very new to SQL.
Essentially, I'm creating this table (the ids actually reference the students table, but I've simplified it here):
CREATE TABLE followers (student_id int not null,
followee_id int not null,
followsback boolean,
PRIMARY KEY(student_id, followee_id)
SET followsback = IF(SELECT from followers
WHERE student_id = followee_id AND
followee_id = student_id, 1, 0)
My problem lies with the IF statement. Say I ran this INSERT query:
INSERT into followers(student_id, followee_id) values(001,002)
This is supposed to store that student 001 is following student 002.
I need to select the followee (002) and check if they are following the student (001) back. To do this, I need to check the followers table for a user with student_id = followee_id (e.g student_id = 002), then check to see if that user (002) is following the original student_id (001).
The problem is that I don't know how to reference the student_id as specified within the INSERT query vs referencing the value within my SELECT query.
Then if the two students are following each other then I need to set followsback to 1.
Hopefully this makes sense; I'm having a ridiculously hard time explaining this.
There's no syntax in MySQL's CREATE TABLE statement to do what you show. It could be done by a rare feature in the SQL specification called an "assertion"—except there is no SQL database on the market that implements this feature.
You can try to implement it as a trigger:
CREATE TRIGGER followback_ins BEFORE INSERT ON followers
FOR EACH ROW
SET NEW.followsback = EXISTS (
SELECT * from followers
WHERE student_id = NEW.followee_id AND followee_id = NEW.student_id);
But this has a problem: it only sets followsback on the new record, not the original record.
mysql> insert into followers set student_id = 123, followee_id = 456;
mysql> insert into followers set student_id = 456, followee_id = 123;
mysql> select * from followers;
+------------+-------------+-------------+
| student_id | followee_id | followsback |
+------------+-------------+-------------+
| 123 | 456 | 0 |
| 456 | 123 | 1 |
+------------+-------------+-------------+
This is called an anomaly because when you try to store the same fact in two places, these two rows can contradict each other.
We could try to make a trigger that updates the original row too:
CREATE TRIGGER followback_ins AFTER INSERT ON followers
FOR EACH ROW
UPDATE followers AS f1 JOIN followers AS f2
ON f1.student_id=f2.followee_id AND f1.followee_id=f2.student_id
SET f1.followsback=true, f2.followsback=true;
But this is illegal. You can't update a table from a trigger on that same table (too much risk of infinite recursion).
ERROR 1442 (HY000): Can't update table 'followers' in stored function/trigger
because it is already used by statement which invoked this stored
function/trigger.
I'd suggest forgetting about storing followsback at all. Instead, just store the following relationships as two rows, without a followsback column. If you want to know whether two students follow each other, you have to join two rows together:
SELECT COUNT(*)
FROM followers AS f1 JOIN followers AS f2
ON f1.student_id=f2.followee_id AND f1.followee_id=f2.student_id
WHERE f1.student_id = 123 AND f1.followee_id = 456;
This query will return 0 if there is no mutual following, and 1 if there is (f1 is pinned to a single row, so a mutual follow produces exactly one join match).
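The same join can also list every mutual pair once (a sketch):

SELECT f1.student_id, f1.followee_id
FROM followers AS f1 JOIN followers AS f2
  ON f1.student_id = f2.followee_id AND f1.followee_id = f2.student_id
WHERE f1.student_id < f1.followee_id; -- report each mutual pair only once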
I have access to a reporting dataset (that I don't control) that we retrieve daily from a cloud service and store in a mysql db to run advanced reporting and report combining locally with 3rd party data visualization software.
The data often has duplicate values on an id field that create problems when joining with other tables for data analysis.
For example:
+-------------+----------+------------+----------+
| workfile_id | zip_code | date | total |
+-------------+----------+------------+----------+
| 78002 | 90210 | 2016-11-11 | 2010.023 |
| 78002 | 90210 | 2016-12-22 | 427.132 |
+-------------+----------+------------+----------+
workfile_id is duplicated because this is the same job; additional work on the job was performed in a different month than the original work, and instead of creating another workfile_id for the job, the software reuses the same one.
Doing joins with other tables on workfile_id is problematic when more than one of the same id is present, so I was wondering if it is possible to do one of two things:
Make duplicate workfile_ids unique. Have SQL append a number to the workfile_id when a duplicate is found: the first duplicate (i.e. the second occurrence of the same workfile_id) would get .01 appended to the end, and if another duplicate is inserted later, the appended number would auto-increment to .02, and so on for any subsequent duplicate. This method would work best with our data, but I'm curious how difficult it would be for the server from a performance perspective. If I could schedule the alteration to take place after the data is inserted, to speed up the initial insert, that would be ideal.
Sum the total columns and remove the duplicate workfile_id row. Have a task that identifies duplicate workfile_ids, sums the financial columns of the duplicates, replaces the original total with the new sum, and deletes the new row after the columns have been added together.
This is more messy from a data preservation perspective, but is acceptable if the first solution isn't possible.
My assumption is that there will be significant overhead in having the server compare new workfile_id values to all existing workfile_id values each time data is inserted, but our dataset is small and new data is only inserted once daily, at 1:30am. It should also be feasible to limit the duplicate search to rows inserted within the last 6 months.
Is finding duplicates in a column (workfile_id) and appending an auto-incrementing value onto the workfile_id possible?
EDIT:
I'm having trouble getting my trigger to work based on sdsc81's answer below.
Any ideas?
DELIMITER //
CREATE TRIGGER append_subID_to_workfile_ID_salesjournal
AFTER INSERT
ON salesjournal FOR EACH ROW
BEGIN
SET @COUNTER = ( SELECT (COUNT(*)-1) FROM salesjournal WHERE workfile_id = NEW.workfile_id );
IF @COUNTER > 1 THEN
UPDATE salesjournal SET workfile_id = CONCAT(workfile_id, @COUNTER) WHERE id = NEW.id;
END IF;
END;//
DELIMITER ;
It's hard to know if the trigger isn't working at all, or if just the code in the trigger isn't working. I get no errors on insert. Is there any way to debug trigger errors?
Well, everything is possible ;)
You don't control the dataset, but you can modify the database, right?
Then you could use a trigger after every insert of a new value, and update it if it's a duplicate. Something like:
SET @COUNTER = ( SELECT (COUNT(*)-1) FROM *your_table* WHERE workfile_id = NEW.workfile_id );
IF @COUNTER >= 1 THEN -- COUNT(*)-1 is already 1 for the first duplicate
UPDATE *your_table* SET workfile_id = CONCAT(workfile_id, @COUNTER) WHERE some_unique_id = NEW.some_unique_id;
END IF;
If there is only one insert a day, and an index is defined over the workfile_id column, then it shouldn't be any problem for your server at all.
Also, you could implement the second solution, doing:
DELIMITER //
CREATE TRIGGER append_subID_to_workfile_ID_salesjournal
AFTER INSERT ON salesjournal FOR EACH ROW
BEGIN
SET @COUNTER = ( SELECT (COUNT(*)-1) FROM salesjournal WHERE workfile_id = NEW.workfile_id );
IF @COUNTER >= 1 THEN -- COUNT(*)-1 is already 1 for the first duplicate
UPDATE salesjournal SET total = total + NEW.total WHERE workfile_id = NEW.workfile_id AND id <> NEW.id;
DELETE FROM salesjournal WHERE id = NEW.id;
END IF;
END;//
DELIMITER ;
Hope this helps.
I'm updating an existing table by adding data into an existing column.
I already have an output of the data to be inserted, but due to the number of records, I'm looking for the best way to insert this into my table without having to manually write each line of SQL.
Here's my (partial) SQL for the insert:
INSERT INTO `tbl_user_variables_dobRE` (`user_id`, `value`) VALUES
(150, '1959-11-02'),
(151, '1948-04-20'),
(152, '1961-06-18'),
And this is the table I want to insert it into:
id | 7
username | guestinvite
password | BLANK
forname | forname
surname | surname
email | guestinvite@test.com
address_id | 286
type_id | 4
dob | 0000-00-00
plusGuest | 0
update | 2016-02-16 11:54:36
created | 2016-04-04 17:03:12
So I want to insert the second item into the 'dob' column where the first item = id.
Is there any way to do this programmatically, or do I have to write WHERE & OR statements for every line?
You tagged both MySql AND sql-server in your post. The following is assuming you're using SQL Server, but the idea would remain the same in MySQL (just different syntax)...
If I'm understanding correctly, it sounds like you want to do an UPDATE, not an INSERT, being that you're modifying existing rows.
You said that you have an output of the data to be inserted - insert this into a TEMP table and JOIN it to the table you'd like to update where the IDs match.
BEGIN TRANSACTION [Transaction1] -- Do large updates as transactions to avoid data loss
CREATE TABLE #temp ( -- Create temp table
[user_id] int,
[dob] nvarchar(20)
)
INSERT INTO #temp
-- YOUR SELECT GOES HERE
SELECT my_id as [user_id], my_dob as [dob]
UPDATE my_table
SET my_table.dob = t.dob
FROM tbl_user_variables_dobRE my_table
INNER JOIN #temp t ON t.user_id = my_table.id
DROP TABLE #temp
If your data looks good, commit the transaction: (Don't dwell too long, transactions lock table data!)
COMMIT TRANSACTION [Transaction1]
Otherwise:
ROLLBACK TRANSACTION [Transaction1]
The quickest way I can think of to do this is to create a temporary table with the new data that you want to add (you could possibly bulk import it all from, say, a CSV file).
The temporary table will just need a couple of columns - one with user_id and the other one dob - you'll be getting rid of it after anyway.
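A minimal sketch of that temporary table, with a hypothetical CSV import (the file path and format are assumptions, and LOAD DATA LOCAL requires local_infile to be enabled):

CREATE TEMPORARY TABLE tmp_table (
    user_id int,
    dob date
);

LOAD DATA LOCAL INFILE '/tmp/dob_data.csv' -- hypothetical file
INTO TABLE tmp_table
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(user_id, dob);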
You could then do something like this:
UPDATE tbl_user_variables_dobRE a
JOIN tmp_table b
ON ( a.user_id = b.user_id )
SET a.dob = b.dob
Once you've done that you can DROP your temporary table and be good to go - good luck!
Important
Be super-careful when updating data - it's so easy to mess up your data by forgetting to add a clause. If possible, do this with some test data before trying it with the real production data.
I've just imported a bunch of data to a MySQL table, and I have a column "GUID" that I want to fill with new, unique random GUIDs for all existing rows.
How do I do this in MySQL ?
I tried
UPDATE db.tablename
SET columnID = UUID()
where columnID is not null
And every field just ends up the same.
I needed to add a GUID primary key column to an existing table and populate it with unique GUIDs, and this update query with an inner select worked for me:
UPDATE sri_issued_quiz SET quiz_id=(SELECT uuid());
So simple :-)
I'm not sure if it's the easiest way, but it works. The idea is to create a trigger that does all the work for you, then execute a query that updates your table, and finally drop the trigger:
delimiter //
create trigger beforeYourTableUpdate BEFORE UPDATE on YourTable
FOR EACH ROW
BEGIN
SET new.guid_column := (SELECT UUID());
END
//
Then execute
UPDATE YourTable set guid_column = (SELECT UUID());
And DROP TRIGGER beforeYourTableUpdate;
UPDATE
Another solution that doesn't use triggers, but requires a primary key or unique index:
UPDATE YourTable
INNER JOIN (SELECT unique_col, UUID() as new_id FROM YourTable) new_data
    ON (new_data.unique_col = YourTable.unique_col)
SET guid_column = new_data.new_id;
UPDATE once again:
It seems that your original query should also work (maybe you don't need WHERE columnID is not null), so all my fancy code may not be needed.
The accepted solution does create unique IDs, but at first glance they look identical; only the first few characters differ.
If you want visibly different keys, try this:
update CityPopCountry set id = (select md5(UUID()));
MySQL [imran@lenovo] {world}> select city, id from CityPopCountry limit 10;
+------------------------+----------------------------------+
| city | id |
+------------------------+----------------------------------+
| A Coruña (La Coruña) | c9f294a986a1a14f0fe68467769feec7 |
| Aachen | d6172223a472bdc5f25871427ba64e46 |
| Aalborg | 8d11bc300f203eb9cb7da7cb9204aa8f |
| Aba | 98aeeec8aa81a4064113764864114a99 |
| Abadan | 7aafe6bfe44b338f99021cbd24096302 |
| Abaetetuba | 9dd331c21b983c3a68d00ef6e5852bb5 |
| Abakan | e2206290ce91574bc26d0443ef50fc05 |
| Abbotsford | 50ca17be25d1d5c2ac6760e179b7fd15 |
| Abeokuta | ab026fa6238e2ab7ee0d76a1351f116f |
| Aberdeen | d85eef763393862e5fe318ca652eb16d |
+------------------------+----------------------------------+
I'm using MySQL Server version: 5.5.40-0+wheezy1 (Debian)
select #i:=uuid();
update some_table set guid = (#i:=uuid());
Just a minor addition, since I ended up with a weird result when trying to modify the UUIDs as they were generated. I found the answer by Rakesh to be the simplest that worked well, except in cases where you want to strip the dashes.
For reference:
UPDATE some_table SET some_field=(SELECT uuid());
This worked perfectly on its own. But when I tried this:
UPDATE some_table SET some_field=(REPLACE((SELECT uuid()), '-', ''));
Then all the resulting values were the same (not subtly different - I quadruple-checked with a GROUP BY some_field query). It didn't matter how I arranged the parentheses; the same thing happened.
UPDATE some_table SET some_field=(REPLACE(SELECT uuid(), '-', ''));
It seems that when the subquery that generates the UUID is wrapped in REPLACE, the UUID query is run only once - which probably makes perfect sense as an optimization to much smarter developers than I, but it didn't to me.
To resolve this, I just split it into two queries:
UPDATE some_table SET some_field=(SELECT uuid());
UPDATE some_table SET some_field=REPLACE(some_field, '-', '');
Simple solution, obviously, but hopefully this will save someone the time that I just lost.
Looks like a simple typo. Didn't you mean "...where columnId is null"?
UPDATE db.tablename
SET columnID = UUID()
where columnID is null
I faced mostly the same issue.
In my case the UUID is stored as BINARY(16) and has NOT NULL and UNIQUE constraints.
I found that the same UUID was generated for every row, which the UNIQUE constraint does not allow. So this expression does not work:
UNHEX(REPLACE(uuid(), '-', ''))
But it worked for me when I used a nested inner select:
UNHEX(REPLACE((SELECT uuid()), '-', ''))
That produced a unique result for every entry.
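Put together, the full statement might look like this (a sketch, assuming the BINARY(16) column is named uuid):

UPDATE some_table
SET uuid = UNHEX(REPLACE((SELECT uuid()), '-', ''));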
MySQL:
UPDATE tablename SET columnName = UUID();
Oracle:
UPDATE tablename SET columnName = SYS_GUID();
SQL Server:
UPDATE tablename SET columnName = NEWID();
UPDATE db.tablename SET columnID = (SELECT UUID()) where columnID is not null
-- UID Format: 30B9BE365FF011EA8F4C125FC56F0F50
UPDATE `events` SET `evt_uid` = (SELECT UPPER(REPLACE(@i:=UUID(),'-','')));
-- UID Format: c915ec5a-5ff0-11ea-8f4c-125fc56f0f50
UPDATE `events` SET `evt_uid` = (SELECT UUID());
-- UID Format: C915EC5A-5FF0-11EA-8F4C-125FC56F0F50
UPDATE `events` SET `evt_uid` = (SELECT UPPER(@i:=UUID()));
I got this error when running MySQL with sql_mode = "". After some testing, I concluded that the problem was caused by this setting; when I tested with the default settings, the problem was gone.
Note: Don't forget to refresh your connection after changing the mode.
I did this SELECT to build a value from five lowercase characters, five uppercase characters, and one special character:
SELECT CONCAT(SUBSTRING(REPLACE(UUID(),'-',''), 1, 5), SUBSTRING(UPPER(REPLACE(UUID(),'-','')), 4, 5), SUBSTRING('##$%(*&', FLOOR(RAND()*(1-8))+8, 1)) pass