Update with Subquery never completes - mysql

I'm currently working on a project with a MySQL DB of more than 8 million rows. I have been provided with a part of it to test some queries on. It has around 20 columns, of which 5 are of use to me, namely: First_Name, Last_Name, Address_Line1, Address_Line2, Address_Line3, RefundID.
I have to create a unique but random RefundID for each row; that is not the problem. The problem is to create the same RefundID for those rows whose First_Name, Last_Name, Address_Line1, Address_Line2 and Address_Line3 are the same.
This is my first real piece of work with MySQL at such a large row count. So far I have created these queries:
-- Creating Temporary Table --
CREATE temporary table tempT (SELECT tt.First_Name, count(tt.Address_Line1) as
a1, count(tt.Address_Line2) as a2, count(tt.Address_Line3) as a3, tt.RefundID
FROM `tempTable` tt GROUP BY First_Name HAVING a1 >= 2 AND a2 >= 2 AND a3 >= 2);
-- Updating Rows with First_Name from tempT --
UPDATE `tempTable` SET RefundID = FLOOR(RAND()*POW(10,11))
WHERE First_Name IN (SELECT First_Name FROM tempT WHERE First_Name is not NULL);
This update query keeps on running and never finishes; tempT has more than 30K rows. The query will then be run on the main DB with more than 800K rows.
Can someone help me out with this?
Regards

The solutions that seem obvious to me....
Don't use a random value - use a hash:
UPDATE yourtable
SET refundid = MD5(CONCAT('some static salt', First_Name
, Last_Name, Address_Line1, Address_Line2, Address_Line3));
The problem is that if you are using an integer value for the refundId then there's a good chance of getting a collision (hint CONV(SUBSTR(MD5(...),1,16),16,10) to get a SIGNED BIGINT). But you didn't say what the type of the field was, nor how strict the 'unique' requirement was. It does carry out the update in a single pass though.
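For instance, a minimal, untested sketch of that hint, assuming refundid is stored as a SIGNED BIGINT (the salt and the CONCAT_WS separator are illustrative; CONCAT_WS also skips NULL address lines, whereas plain CONCAT would return NULL):
UPDATE yourtable
SET refundid = CONV(
        SUBSTR(
            MD5(CONCAT_WS('|', 'some static salt', First_Name, Last_Name,
                          Address_Line1, Address_Line2, Address_Line3)),
            1, 16),
        16, 10);
-- 16 hex chars cover the full 64-bit range, so a value above 2^63-1 will
-- overflow a SIGNED BIGINT (error or clamp depending on sql_mode);
-- use SUBSTR(..., 1, 15) to stay safely inside the signed range.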
An alternate approach, which creates a densely packed sequence of numbers, is to create a temporary table with the unique values from the original table plus a random value. Order by the random value and assign a monotonically increasing refundId, then use this as a lookup table or update the original table:
CREATE TEMPORARY TABLE temptable AS
SELECT d.*, RAND() AS randomvalue, 0 AS refundId
FROM (SELECT DISTINCT First_Name, Last_Name, Address_Line1,
             Address_Line2, Address_Line3
      FROM yourtable) d;
SET @counter = -1;
UPDATE temptable t SET t.refundId = (@counter := @counter + 1)
ORDER BY t.randomvalue;
There are other solutions too - but the more efficient ones rely on having multiple copies of the data and/or using a procedural language.

Try using the following:
UPDATE `tempTable` x SET RefundID = FLOOR(RAND()*POW(10,11))
WHERE exists (SELECT 1 FROM tempT y WHERE First_Name is not NULL and x.First_Name=y.First_Name);

In MySQL, it is often more efficient to use join with update than to filter through the where clause using a subquery. The following might perform better:
UPDATE `tempTable` join
(SELECT distinct First_Name
FROM tempT
WHERE First_Name is not NULL
) fn
on `tempTable`.First_Name = fn.First_Name
SET RefundID = FLOOR(RAND()*POW(10,11));

Related

MYSQL: How to update unique random number to existing rows

It's been my first question on this website, so I'm sorry if I used any wrong keywords. I have been struggling with one problem for quite a few days.
The problem is: I have a MySQL table named property where I wanted to add a ref number, which will be a unique, 6-digit, non-incremental number, so I altered the table to add a new column named property_ref with a default value of 1.
ALTER TABLE property ADD uniqueIdentifier INT DEFAULT (1) ;
Then I wrote a script that first generates a number, then checks the DB to see whether it already exists, and if it does not exist, updates the row with the random number.
Here is the snippet I tried:
with cte as (
select subIdentifier, id from (
SELECT id, LPAD(FLOOR(RAND() * (999999 - 100000) + 100000), 6, 0) AS subIdentifier
FROM property as p1
WHERE "subIdentifier" NOT IN (SELECT uniqueIdentifier FROM property as p2)
) as innerTable group by subIdentifier
)
UPDATE property SET uniqueIdentifier = (
select subIdentifier from cte as c where c.id = property.id
) where property.id != ''
This query returns a set of records for almost all the rows, but out of a table of 20,000 entries it only fills in about 19,000, and the rest of the rows are left NULL.
If anyone can help, I would be extremely thankful.
Thanks
Instead of trying to randomly generate unique numbers that do not exist in the table, I would try the approach of randomly generating numbers using the ID column as a seed; as long as the ID number is unique, the new number will be unique as well. This is not technically fully "random" but it may be sufficient for your needs.
https://www.db-fiddle.com/f/iqMPDK8AmdvAoTbon1Yn6J/1
update Property set
UniqueIdentifier = round(rand(id)*1000000)
where UniqueIdentifier is null;
SELECT id, round(rand(id)*1000000) as UniqueIdentifier FROM test;

Delete Duplicates from large mysql Address DB

I know, deleting duplicates from MySQL is often discussed here. But none of the solutions works in my case.
So, I have a DB with address data roughly like this:
ID; Anrede; Vorname; Nachname; Strasse; Hausnummer; PLZ; Ort; Nummer_Art; Vorwahl; Rufnummer
ID is the primary key and unique.
And I have entries, for example, like this:
1;Herr;Michael;Müller;Testweg;1;55555;Testhausen;Mobile;012345;67890
2;Herr;Michael;Müller;Testweg;1;55555;Testhausen;Fixed;045678;877656
The different phone numbers are not the problem, because they are not relevant for me. So I just want to delete the duplicates in last name, street and zip code; in this case ID 1 or ID 2, and which of the two doesn't matter.
I actually tried it like this with DELETE:
DELETE db
FROM Import_Daten db,
Import_Daten dbl
WHERE db.id > dbl.id AND
db.Lastname = dbl.Lastname AND
db.Strasse = dbl.Strasse AND
db.PLZ = dbl.PLZ;
And insert into a copy table:
INSERT INTO Import_Daten_1
SELECT MIN(db.id),
db.Anrede,
db.Firstname,
db.Lastname,
db.Branche,
db.Strasse,
db.Hausnummer,
db.Ortsteil,
db.Land,
db.PLZ,
db.Ort,
db.Kontaktart,
db.Vorwahl,
db.Durchwahl
FROM Import_Daten db,
Import_Daten dbl
WHERE db.lastname = dbl.lastname AND
db.Strasse = dbl.Strasse And
db.PLZ = dbl.PLZ;
The complete table contains over 10 million rows. The size is actually my problem. MySQL runs on a MAMP server on a MacBook with 1.5 GHz and 4 GB RAM, so it is not really fast. The SQL statements run in phpMyAdmin; I have no other system options.
You can write a stored procedure that each time selects a different chunk of data (for example by row number between two values) and deletes only from that range. This way you will slowly, bit by bit, delete your duplicates.
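If it helps, here is a rough, untested sketch of that chunked approach, assuming Import_Daten has an integer primary key id (the procedure name and chunk size are illustrative):
DELIMITER //
CREATE PROCEDURE delete_duplicates_in_chunks(IN chunk_size INT)
BEGIN
  DECLARE max_id BIGINT;
  DECLARE from_id BIGINT DEFAULT 0;
  SELECT MAX(id) INTO max_id FROM Import_Daten;
  WHILE from_id <= max_id DO
    -- within this id range, delete every row that has an older duplicate
    DELETE db
    FROM Import_Daten db
    JOIN Import_Daten dbl
      ON  db.Lastname = dbl.Lastname
      AND db.Strasse  = dbl.Strasse
      AND db.PLZ      = dbl.PLZ
      AND db.id > dbl.id
    WHERE db.id BETWEEN from_id AND from_id + chunk_size - 1;
    SET from_id = from_id + chunk_size;
  END WHILE;
END //
DELIMITER ;
-- Example: CALL delete_duplicates_in_chunks(100000);
Each pass only touches a slice of the table, which keeps the individual DELETE statements small on a machine with little RAM.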
A more effective two-table solution can look like the following.
We store only the data we really need to delete, and only the fields that contain the duplicate information.
Let's assume we are looking for duplicate data in the Lastname, Branche and Hausnummer fields.
First drop the helper table if it already exists:
DROP TABLE IF EXISTS data_to_delete;
Then create the table to hold the duplicate data and populate it (I assume all fields have VARCHAR(255) type):
CREATE TABLE data_to_delete (
id BIGINT COMMENT 'this field will contain ID of row that we will not delete',
cnt INT,
Lastname VARCHAR(255),
Branche VARCHAR(255),
Hausnummer VARCHAR(255)
) AS SELECT
min(t1.id) AS id,
count(*) AS cnt,
t1.Lastname,
t1.Branche,
t1.Hausnummer
FROM Import_Daten AS t1
GROUP BY t1.Lastname, t1.Branche, t1.Hausnummer
HAVING count(*)>1 ;
Now let's delete the duplicate data, leaving only one record of each duplicate set:
DELETE Import_Daten
FROM Import_Daten LEFT JOIN data_to_delete
ON Import_Daten.Lastname=data_to_delete.Lastname
AND Import_Daten.Branche=data_to_delete.Branche
AND Import_Daten.Hausnummer = data_to_delete.Hausnummer
WHERE Import_Daten.id != data_to_delete.id;
DROP TABLE data_to_delete;
You can add a new column e.g. uq and make it UNIQUE.
ALTER TABLE Import_Daten
ADD COLUMN `uq` BINARY(16) NULL,
ADD UNIQUE INDEX `uq_UNIQUE` (`uq` ASC);
When this is done you can execute an UPDATE query like this
UPDATE IGNORE Import_Daten
SET
uq = UNHEX(
MD5(
CONCAT(
Import_Daten.Lastname,
Import_Daten.Street,
Import_Daten.Zipcode
)
)
)
WHERE
uq IS NULL;
Once all entries are updated, any duplicates will still have uq = NULL, because the UNIQUE index together with IGNORE prevents the same hash from being stored twice; those rows can then be removed. If the query is executed again, the result is:
0 row(s) affected, 1 warning(s): 1062 Duplicate entry...
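If it helps, the removal step could then be as simple as this sketch (it assumes the UPDATE above has run to completion, so the only rows still having uq IS NULL are the duplicates):
-- remove the rows that could not receive a unique hash, i.e. the duplicates
DELETE FROM Import_Daten
WHERE uq IS NULL;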
For newly added rows, always create the uq hash as well, and consider using it as the primary key once all entries are unique.

Query in MySQL to Find Distinct Status Based on Primary Key

My question is regarding a MySQL query that I am trying to write. I have written some pseudo-code to help illustrate it:
SELECT *
FROM persons AS p
INNER JOIN person_info AS pi
ON p.person_id = pi.person_id
WHERE status MAY INCLUDE lost, missing, or found
WHAT person_id has no instances of the found status
For each person_id (which can have multiple statuses), I'd like to know which ones have no instance of the status "found". I'm not just after the lost and missing records; I want the distinct person_ids that have no "found" status at all.
If I'm understanding correctly, one option is to use not in:
select *
from persons
where person_id not in (
    select person_id
    from person_info
    where status = 'found'
)
This will return all records from the persons table that don't have a matching record in the person_info table with status = 'found'.
Alternatively, you can use a left join/null check. Not exists can also work, but may be slower with MySQL, and there are some potential issues with null checks as well; it depends on the desired results at that point.
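For example, a minimal sketch of that left join/null check, using the persons and person_info tables from the question:
select p.*
from persons p
left join person_info pi
    on pi.person_id = p.person_id
   and pi.status = 'found'
where pi.person_id is null;  -- keep only persons with no matching 'found' row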
This is as far as I took it, @sgeddes. In writing it I realized it just makes people's eyes glaze over.
SQL NOT IN () danger
create table mStatus
( id int auto_increment primary key,
status varchar(10) not null
);
insert mStatus (status) values ('single'),('married'),('divorced'),('widow');
create table people
( id int auto_increment primary key,
fullName varchar(100) not null,
status varchar(10) null
);
Chunk1:
truncate table people;
insert people (fullName,status) values ('John Henry','single');
select * from mstatus where status not in (select status from people);
** 3 rows, as expected **
Chunk2:
truncate table people;
insert people (fullName,status) values ('John Henry','single'),('Kim Billings',null);
select * from mstatus where status not in (select status from people);
no rows, huh?
Obviously this is 'incorrect'. It arises from SQL's use of three-valued logic,
driven by the existence of NULL, a non-value indicating missing (or UNKNOWN) information.
With NOT IN, Chunk2 is translated like this:
status NOT IN ('single', NULL)
This is equivalent to:
NOT(status='single' OR status=NULL)
The expression "status=NULL" evaluates to UNKNOWN and, according to the rules of three-valued logic,
NOT UNKNOWN also evaluates to UNKNOWN. As a result, all rows are filtered out and the query returns an empty set.
Possible solutions include:
select s.status
from mstatus s
left join people p
on p.status=s.status
where p.status is null
or use not exists
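And a small, untested sketch of the not exists variant for the same example:
select s.status
from mstatus s
where not exists
    (select 1 from people p where p.status = s.status);
-- a NULL people.status never equals s.status, so unlike NOT IN it cannot
-- filter out every row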

DELETE FROM X WHERE IN (..) is not deleting rows

I have an issue where I'm filtering a table by a bunch of different values. There are about 30 different filters on this table, and since I'm still a novice with MySQL I do it in a stored procedure that executes multiple DELETE queries against a temporary table to filter it. This example only shows the filter I'm having issues with, which is a DELETE FROM table WHERE value IN () query.
Here's a test schema:
CREATE TABLE accounts (
user_id INT(11) NOT NULL AUTO_INCREMENT,
name VARCHAR(40) NOT NULL,
PRIMARY KEY(user_id)
);
CREATE TABLE blocked (
user_id INT(11) NOT NULL,
other_id INT(11) NOT NULL
);
INSERT INTO accounts (name) VALUES ('Chris'), ('Andy');
INSERT INTO blocked (user_id, other_id) VALUES (1, 2);
The queries create two tables: the accounts table containing two rows, and the blocked table containing one row where user_id 1 has user_id 2 blocked.
Here's the query that's causing the problem (please note that the actual queries are more complex than displayed, but the DELETE query is 100% the same, and the issue persists in the test example provided):
BEGIN
# The user_in input is an INT(11) value passed in via CALL FUNCTION(ID).
CREATE TEMPORARY TABLE IF NOT EXISTS filtered AS (SELECT * FROM accounts);
DELETE FROM filtered WHERE user_id IN (SELECT other_id FROM blocked WHERE blocked.user_id = user_in);
SELECT * FROM filtered;
END
This query should delete the row with the user_id field of 2, as in the blocked table the only row is (1, 2).
Running the SELECT query directly providing the user_id returns the other_id of 2.
SELECT other_id FROM blocked WHERE blocked.other_id = 2;
However, the stored procedure returns both rows, instead of just one. Why?
NOTE: The above query shows what is returned by SELECT other_id FROM blocked WHERE blocked.user_id = user_in; another example would be SELECT other_id FROM blocked WHERE blocked.user_id = 1, given that user_in is set to 1. Both of these queries return a set of (2), which would make the delete query effectively DELETE FROM filtered WHERE user_id IN (2). This is not working, for whatever reason.
To get a simple select of those users, use the following query:
SELECT * FROM accounts WHERE accounts.user_id NOT IN (SELECT distinct blocked.other_id from blocked)
To do it with a single select, without deleting rows from the temporary table, use the following query:
BEGIN
CREATE TEMPORARY TABLE IF NOT EXISTS filtered AS (SELECT * FROM accounts WHERE accounts.user_id NOT IN (SELECT distinct blocked.other_id from blocked));
SELECT * from filtered;
END
There is no need to select everything into the temporary table first and then delete specific rows.
Hope it helps
EDIT:
I've read the question and am still a bit confused about your problem. But I checked this solution and it works perfectly, so I don't understand what the problem with it is. In your procedure you have
DELETE FROM filtered WHERE user_id IN (SELECT other_id FROM blocked WHERE blocked.user_id = user_in);
and after that you say that
SELECT other_id FROM blocked WHERE blocked.other_id = 2;
And I can say that blocked.other_id and blocked.user_id are two different columns.
No disrespect, but it's an amateur mistake to mix up columns. :)
The problem here is with this statement:
DELETE FROM filtered WHERE user_id IN (SELECT other_id FROM blocked WHERE blocked.other_id = user_id);
Try changing it to this:
DELETE FROM filtered WHERE user_id
IN (SELECT other_id FROM blocked);
The reason is that the blocked table has both an other_id and a user_id column. So where you are attempting to join out to the filtered table, you are in fact comparing the other_id and user_id columns within the blocked table only, which are not equal, so no delete happens.

Insert into... select from.... query with where condition

I want to make an SQL query that will insert values from one table into another table, checking a WHERE condition on the first table.
I have to check whether that row is already present in the first table or not. If not present, then add it; otherwise don't add it.
There is the "insert into ... select from" pattern in SQL.
I have tried the following query, but it inserts many duplicates.
INSERT INTO
company_location (company_id, country_id, city_id)
SELECT
ci.company_id, hq_location, hq_city
FROM
company_info ci, company_location cl
WHERE
ci.company_id <> cl.company_id
AND cl.country_id <> ci.hq_location
AND cl.city_id <> ci.hq_city;
Avoiding duplicates means that the tuple (company_id, country_id, city_id) shouldn't be added again. And I have to add data from 4 more tables into this table.
I also need a query for removing the existing duplicates from company_location, i.e. each combination of (company_id, country_id, city_id) should exist only once: keep one tuple and remove the other rows.
I hope this untested Script helps! It inserts every combination just once.
INSERT INTO company_location
(company_id,country_id,city_id)
SELECT distinct ci.company_id,
ci.hq_location,
ci.hq_city
FROM company_info ci
WHERE ci.company_id NOT IN
(SELECT cl1.company_id FROM company_location cl1
WHERE cl1.country_id = ci.hq_location
AND cl1.city_id = ci.hq_city
AND cl1.company_id = ci.company_id);
INSERT IGNORE also works.
If you want a column (or column set) to be unique, put a UNIQUE constraint on your table. Once the UNIQUE constraint is in place, the table, by definition, cannot contain any undesirable duplicates; not putting a UNIQUE constraint means duplicates are acceptable.
Add UNIQUE( company_id, country_id, city_id ) (or maybe make it your primary key for that table)
and use INSERT IGNORE.
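A hedged sketch of that combination (the constraint name is illustrative, and it assumes company_location has already been de-duplicated so the constraint can be added):
ALTER TABLE company_location
    ADD CONSTRAINT uq_company_location UNIQUE (company_id, country_id, city_id);
INSERT IGNORE INTO company_location (company_id, country_id, city_id)
SELECT ci.company_id, ci.hq_location, ci.hq_city
FROM company_info ci;
-- rows that would violate the UNIQUE constraint are silently skipped
-- (they produce warnings instead of errors)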
You can also rewrite your query correctly. The query does not do what you think it does, and you cannot do what you want by using the old join syntax from the 18th century.
SELECT * FROM t1, t2, t3
Is a CROSS JOIN, this means it takes all possible combinations of rows from table t1,t2,t3. Usually the WHERE contains some "t1.id=t2.id" conditions to restrict it and turn it into an INNER JOIN, but "<>" conditions do not do this...
You need a proper LEFT JOIN :
INSERT INTO company_location (company_id,country_id,city_id)
SELECT ci.company_id, hq_location, hq_city
FROM company_info ci
LEFT JOIN company_location cl ON (
ci.company_id = cl.company_id
AND cl.country_id = ci.hq_location
AND cl.city_id = ci.hq_city
)
WHERE cl.company_id IS NULL
Here is the answer to your second question, a query to delete duplicate entries.
Please be careful with these statements; they are not tested.
Solution 1:
This solution only works if you have a row id in your table.
DELETE FROM company_location
WHERE id NOT IN
(SELECT keep_id
 FROM (SELECT MAX(cl1.id) AS keep_id
       FROM company_location cl1
       GROUP BY cl1.company_id, cl1.country_id, cl1.city_id) AS keepers);
Solution 2:
This works without a row id. It writes all the data into a temporary table, deletes the content of the first table, and then inserts every tuple just once.
For this solution: be careful if you have constraints defined on that table!
CREATE TEMPORARY TABLE tmp_company_location
(
company_id bigint
,country_id bigint
,city_id bigint
);
INSERT INTO tmp_company_location
(company_id,country_id,city_id)
SELECT DISTINCT
company_id
,country_id
,city_id
FROM company_location WHERE 1;
DELETE FROM company_location;
INSERT INTO company_location
SELECT DISTINCT
company_id
,country_id
,city_id
FROM tmp_company_location;
Use INSERT IGNORE INTO.
From the MySQL docs:
Specify IGNORE to ignore rows that would cause duplicate-key violations.