Convert MySQL query into PostgreSQL

I have this query:
DROP TABLE IF EXISTS tmp_table;
CREATE TEMPORARY TABLE tmp_table(id int primary key)
IGNORE (
SELECT user2role.userid AS userid
FROM user2role
INNER JOIN users ON users.id=user2role.userid
INNER JOIN role ON role.roleid=user2role.roleid
WHERE role.parentrole like 'H1::H2::H3::H4::H5::%')
UNION (
SELECT groupid
FROM groups
WHERE groupid IN (2,3,4));
This query was originally written in MySQL and instead of DROP TABLE IF EXISTS it used IF NOT EXISTS. I changed that part, but I don't know what to do about the IGNORE.
First off, what is IGNORE doing?
I tried looking for PostgreSQL equivalents, but they all seem to involve complicated procedures. Do I have to write a procedure for this? And if I have to write one, what would it look like? Could I just emulate IGNORE using some PHP code instead? (The SQL queries are generated by PHP.)

You would write it like this in PostgreSQL.
IGNORE is irrelevant here, because the table has just been recreated and is guaranteed to be empty, and UNION guarantees that no duplicate rows are inserted.
DROP TABLE IF EXISTS tmp_table;
CREATE TEMP TABLE tmp_table(id int4 primary key);
INSERT INTO tmp_table
SELECT user2role.userid::int4 AS id
FROM user2role
JOIN users ON users.id = user2role.userid
JOIN role ON role.roleid = user2role.roleid
WHERE role.parentrole like 'H1::H2::H3::H4::H5::%'
UNION
SELECT groupid::int4
FROM groups
WHERE groupid in (2,3,4);
If duplicates in the SELECT cannot occur, you might consider the faster UNION ALL instead of UNION. Otherwise you need UNION to eliminate possible duplicates. Read here.
If your dataset is large, you might consider creating the primary key after the INSERT. That's faster, because the index is then built once instead of being updated for every inserted row.
Read the MySQL docs on the effects of IGNORE.
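To see why UNION is enough when the table starts out empty, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL. The sample rows are hypothetical and the users join is left out for brevity. User 2 comes out of both branches: UNION collapses it into one row, while UNION ALL makes the primary key reject the whole INSERT.

```python
import sqlite3

def fill_tmp_table(set_op):
    """Rebuild tmp_table from both sources, combined with set_op ('UNION' or 'UNION ALL')."""
    con = sqlite3.connect(":memory:")
    # Hypothetical sample data; "groups" is quoted to be safe as an identifier.
    con.executescript("""
        CREATE TABLE role (roleid INTEGER, parentrole TEXT);
        CREATE TABLE user2role (userid INTEGER, roleid INTEGER);
        CREATE TABLE "groups" (groupid INTEGER);
        INSERT INTO role VALUES (10, 'H1::H2::H3::H4::H5::H6'), (20, 'H1::X');
        INSERT INTO user2role VALUES (1, 10), (2, 10), (7, 20);
        INSERT INTO "groups" VALUES (2), (3), (4), (5);
    """)
    # Fresh, empty table: the only possible conflicts come from the SELECT itself.
    con.execute("DROP TABLE IF EXISTS tmp_table")
    con.execute("CREATE TEMP TABLE tmp_table (id INTEGER PRIMARY KEY)")
    con.execute(f"""
        INSERT INTO tmp_table
        SELECT user2role.userid
        FROM user2role
        JOIN role ON role.roleid = user2role.roleid
        WHERE role.parentrole LIKE 'H1::H2::H3::H4::H5::%'
        {set_op}
        SELECT groupid FROM "groups" WHERE groupid IN (2, 3, 4)
    """)
    return [row[0] for row in con.execute("SELECT id FROM tmp_table ORDER BY id")]

print(fill_tmp_table("UNION"))  # [1, 2, 3, 4]
try:
    fill_tmp_table("UNION ALL")
except sqlite3.IntegrityError as exc:
    print("UNION ALL failed:", exc)
```

So UNION ALL is only safe here when the two branches can never return the same id.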
On revisiting the page I realized you mention IF NOT EXISTS in the original code.
You don't say so, but that only makes sense if the original code created the table only when it didn't already exist, which introduces the possibility that it is not empty before the INSERT. In that case IGNORE is relevant and needs an equivalent in PostgreSQL.
So here is an alternative answer for that interpretation of your question.
CREATE TEMP TABLE IF NOT EXISTS was introduced in PostgreSQL 9.1.
For older versions, I recently posted a solution on SO.
CREATE TEMP TABLE IF NOT EXISTS tmp_table(id int4 primary key);
INSERT INTO tmp_table
SELECT x.id
FROM (
SELECT user2role.userid::int4 AS id
FROM user2role
JOIN users ON users.id = user2role.userid
JOIN role ON role.roleid = user2role.roleid
WHERE role.parentrole like 'H1::H2::H3::H4::H5::%'
UNION
SELECT groupid::int4
FROM groups
WHERE groupid in (2,3,4)
) x
LEFT JOIN tmp_table t USING (id)
WHERE t.id IS NULL;
LEFT JOIN ... WHERE t.id IS NULL excludes any id that might already be present in tmp_table. The UNION goes into a subquery, so that clause needs to be applied only once. This should be the fastest approach.
More on LEFT JOIN here.
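The anti-join can be tried out the same way. This sketch again uses Python's sqlite3 module in place of PostgreSQL, with hypothetical sample data; tmp_table is pre-seeded with ids 3 and 99 to simulate a table left over from an earlier run. Running the insert twice shows that ids already present are skipped instead of violating the primary key.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE user2role (userid INTEGER, roleid INTEGER);
    CREATE TABLE "groups" (groupid INTEGER);
    INSERT INTO user2role VALUES (1, 10), (2, 10);
    INSERT INTO "groups" VALUES (2), (3), (4);
    -- Simulate a temp table that survived from an earlier run.
    CREATE TEMP TABLE tmp_table (id INTEGER PRIMARY KEY);
    INSERT INTO tmp_table VALUES (3), (99);
""")

def add_missing_ids():
    con.execute("CREATE TEMP TABLE IF NOT EXISTS tmp_table (id INTEGER PRIMARY KEY)")
    # UNION runs once inside the subquery; the LEFT JOIN ... IS NULL then
    # drops every id that tmp_table already contains.
    con.execute("""
        INSERT INTO tmp_table
        SELECT x.id
        FROM (
            SELECT userid AS id FROM user2role
            UNION
            SELECT groupid FROM "groups" WHERE groupid IN (2, 3, 4)
        ) x
        LEFT JOIN tmp_table t USING (id)
        WHERE t.id IS NULL
    """)
    return [row[0] for row in con.execute("SELECT id FROM tmp_table ORDER BY id")]

first = add_missing_ids()
second = add_missing_ids()  # nothing new to add, and no IntegrityError
print(first)  # [1, 2, 3, 4, 99]
```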

Related

MYSQL drop duplicates of userid

I thought I'd made the column userid in my table "userslive" unique, but somehow must have made a mistake. I've seen multiple answers to this question, but I'm afraid of messing up again so I hope someone can help me directly.
So this table has no unique columns, but I've got a column "timer", which holds the timestamp of when the data was scraped. If possible, for each duplicated "userid" I'd like to drop the rows with the lowest "timer".
It's a fairly big table, about 2 million rows (20 columns). There are about 1000 duplicate userids, which I found using this query:
SELECT userid, COUNT(userid) as cnt FROM userslive GROUP BY userid HAVING (cnt > 1);
Is this the correct syntax? I tried this on a backup table, but I suspect it is too heavy for a table this big (unless left to run for a very long time).
DELETE FROM userslive using userslive,
userslive e1
where userslive.timer < e1.timer
and userslive.userid = e1.userid
Is there a quicker way to do this?
EDIT: I should say the "timer" is not a unique column.
DELETE t1.* /* delete from a copy named t1 only */
FROM userslive t1, userslive t2
WHERE t1.userid = t2.userid
AND t1.timer < t2.timer
Logic: if, for a record in the copy aliased as t1, we can find a record in the copy aliased as t2 with the same userid but a greater (later) timer value, the t1 record must be deleted.
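A minimal sketch of that rule, run against an in-memory SQLite database from Python with hypothetical rows. SQLite does not support MySQL's multi-table DELETE syntax, so the t2 lookup is written as a correlated EXISTS subquery; the rule being applied is the same. Because timer is not unique, user 3 has two rows tied at its latest timer, and neither has a strictly later partner, so both survive.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE userslive (userid INTEGER, timer INTEGER, note TEXT);
    INSERT INTO userslive VALUES
        (1, 100, 'old'), (1, 200, 'new'),
        (2, 150, 'only'),
        (3, 300, 'tie-a'), (3, 300, 'tie-b'), (3, 50, 'oldest');
""")

# Delete a row (t1) if another row (t2) has the same userid and a later timer.
con.execute("""
    DELETE FROM userslive
    WHERE EXISTS (
        SELECT 1 FROM userslive t2
        WHERE t2.userid = userslive.userid
          AND t2.timer > userslive.timer
    )
""")
remaining = sorted(con.execute("SELECT userid, timer, note FROM userslive"))
print(remaining)
```

Resolving such ties would need an additional unique column to compare once the timers are equal.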
I've done this in the past, and the easiest way to solve it is to add an id column, then select userid and max(new_id) into a new table and join that for the delete. Something like this:
ALTER TABLE `userslive`
ADD `new_id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY;
Now that you have a new unique column, create a new table that selects the rows to delete.
CREATE TABLE `users_to_delete`
AS
SELECT userid, new_id
FROM (
SELECT userid, max(new_id) new_id, count(*) user_rows
FROM `userslive`
GROUP BY 1
) dataset
WHERE user_rows > 1
Then use that to delete your duplicate rows by joining it into a DELETE statement like this:
DELETE `userslive` FROM `userslive`
INNER JOIN `users_to_delete` USING(userid,new_id);
Make sure you back everything up before you delete anything just in case.
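To make the effect of these steps concrete, here is a sketch in SQLite via Python's sqlite3 module, with hypothetical rows. SQLite cannot add an AUTO_INCREMENT primary key through ALTER TABLE, so new_id is declared as INTEGER PRIMARY KEY up front, and because SQLite has no DELETE ... JOIN, the final step matches on new_id alone (it is unique, so that selects the same rows).

```python
import sqlite3

con = sqlite3.connect(":memory:")
# new_id INTEGER PRIMARY KEY plays the role of the AUTO_INCREMENT column.
con.executescript("""
    CREATE TABLE userslive (new_id INTEGER PRIMARY KEY, userid INTEGER, timer INTEGER);
    INSERT INTO userslive (userid, timer) VALUES
        (1, 100), (1, 200),
        (2, 150),
        (3, 10), (3, 20), (3, 30);
    CREATE TABLE users_to_delete AS
    SELECT userid, new_id
    FROM (
        SELECT userid, max(new_id) AS new_id, count(*) AS user_rows
        FROM userslive
        GROUP BY userid
    ) dataset
    WHERE user_rows > 1;
""")
# Remove exactly the (userid, new_id) pairs collected above.
con.execute("DELETE FROM userslive WHERE new_id IN (SELECT new_id FROM users_to_delete)")
remaining = sorted(con.execute("SELECT userid, timer FROM userslive"))
print(remaining)  # [(1, 100), (2, 150), (3, 10), (3, 20)]
```

Note what the sketch shows: each pass removes exactly one row per duplicated userid (user 3 still has two rows), and the row removed is the one with the highest new_id, which follows insertion order rather than timer; here that is user 1's row with the later timer. Repeat the steps, or compare timer values instead, if that is not the row you want to lose.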

How do I migrate and sync up a table efficiently?

I have a table A; it has millions of records and it's growing. A new indexed column needs to be added to table A, but migrating such a large table could be a headache. So table B was created from table A at some point. The question is: how do I keep these two tables in sync efficiently?
New records can be added to table A in multiple scenarios.
To 'sync' two tables without actually merging them, you can create a view that combines them with UNION ALL. A view can be queried just like a table, but it does not store any data itself. This assumes that both tables have the same columns (same number, same order, compatible types); if not, you'll need primary and foreign keys to join them instead.
CREATE OR REPLACE VIEW viewname AS
SELECT * FROM TABLE_A
UNION ALL
SELECT * FROM TABLE_B
If the two tables don't share the same columns, you'll need at least one field in common between them (a primary key in one table and a matching foreign key in the other), and you join the tables on those keys like this:
CREATE OR REPLACE VIEW viewname AS
SELECT TableA.FieldName1, TableB.FieldName2, TableA.FieldName3
FROM TableA
LEFT JOIN TableB
ON TableA.primarykeyField = TableB.foreignkeyField
UNION ALL
SELECT TableA.FieldName1, TableB.FieldName2, TableA.FieldName3
FROM TableA
RIGHT JOIN TableB
ON TableA.primarykeyField = TableB.foreignkeyField
WHERE TableA.primarykeyField IS NULL;
It depends on what type of join you want, but I think a FULL JOIN will give you the best results. FULL JOINs aren't supported in MySQL, but a LEFT JOIN combined through UNION ALL with a RIGHT JOIN mimics the same result. The WHERE ... IS NULL filter on the RIGHT JOIN half keeps only the TableB rows without a match, so matched rows are not returned twice.
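The FULL JOIN emulation can be checked with a small sketch in SQLite through Python's sqlite3 module. The table and column names (TableA.id, TableB.a_id, a_val, b_val) and the rows are hypothetical. Older SQLite versions lack RIGHT JOIN, so the second branch is written as the equivalent LEFT JOIN with the tables swapped; its IS NULL filter keeps only TableB rows without a partner, so the matched pair appears once rather than twice.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE TableA (id INTEGER PRIMARY KEY, a_val TEXT);
    CREATE TABLE TableB (a_id INTEGER, b_val TEXT);
    INSERT INTO TableA VALUES (1, 'a1'), (2, 'a2');
    INSERT INTO TableB VALUES (2, 'b2'), (3, 'b3');
""")

# First branch: every TableA row, with its TableB partner if any.
# Second branch: the "RIGHT JOIN" half, restricted to unmatched TableB rows.
full_join = """
    SELECT TableA.id, TableA.a_val, TableB.b_val
    FROM TableA LEFT JOIN TableB ON TableA.id = TableB.a_id
    UNION ALL
    SELECT TableA.id, TableA.a_val, TableB.b_val
    FROM TableB LEFT JOIN TableA ON TableA.id = TableB.a_id
    WHERE TableA.id IS NULL
"""
rows = con.execute(full_join).fetchall()
print(rows)
```

Without the WHERE filter, the pair (2, 'a2', 'b2') would come back from both branches.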
Or if you simply want to copy all the records from table A to table B you could use this.
INSERT INTO TableB
SELECT * FROM TableA;

Mysql, Insert new record into table B if foreign key exists in table A

There are a few similar questions on here. None provide a solution. I would like to INSERT a NEW record into table B, but only if a foreign key exists in table A. To be clear, I do not wish to insert the result of a select. I just need to know that the foreign key exists.
INSERT INTO tableB (tableA_ID,code,notes,created) VALUES ('24','1','test',NOW())
SELECT tableA_ID FROM tableA WHERE tableA_ID='24' AND owner_ID='9'
Clearly, the above does not work. But is this even possible? I want to insert the NEW data into tableB, only if the record for the row in tableA exists and belongs to owner_ID.
The queries I have seen so far relate to INSERTING the results from the SELECT query - I do not wish to do that.
Try this:
INSERT INTO tableB (tableA_ID,code,notes,created)
SELECT id, code, notes, created
FROM ( SELECT '24' as id, '1' as code, 'test' as notes, NOW() as created) t
WHERE EXISTS
(
SELECT tableA_ID
FROM tableA
WHERE tableA_ID='24' AND owner_ID='9'
)
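A sketch of the same INSERT ... SELECT ... WHERE EXISTS pattern, run through Python's sqlite3 module with hypothetical data. SQLite has no NOW(), so CURRENT_TIMESTAMP stands in for it, and the values are passed as query parameters rather than literals. cursor.rowcount tells the caller whether a row was actually inserted.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE tableA (tableA_ID INTEGER PRIMARY KEY, owner_ID INTEGER);
    CREATE TABLE tableB (tableA_ID INTEGER, code TEXT, notes TEXT, created TEXT);
    INSERT INTO tableA VALUES (24, 9), (25, 8);
""")

def insert_if_owned(a_id, owner_id, code, notes):
    """Insert one row into tableB only if tableA has a_id owned by owner_id."""
    cur = con.execute("""
        INSERT INTO tableB (tableA_ID, code, notes, created)
        SELECT id, code, notes, created
        FROM (SELECT ? AS id, ? AS code, ? AS notes, CURRENT_TIMESTAMP AS created) t
        WHERE EXISTS (
            SELECT 1 FROM tableA WHERE tableA_ID = ? AND owner_ID = ?
        )
    """, (a_id, code, notes, a_id, owner_id))
    return cur.rowcount  # 1 if inserted, 0 if the ownership check failed

print(insert_if_owned(24, 9, "1", "test"))  # 1
print(insert_if_owned(25, 9, "1", "test"))  # 0: row 25 belongs to owner 8
```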
I know this is an old, already answered question, but it now ranks highly in Google search results, and I think an addition may help someone in the future.
In some database designs, you may want to insert a row into a table that has two or more foreign keys. Let's say we have four tables in a chat application:
Users, Threads, Thread_Users and Messages
If we want a user to join a thread, we insert a row into Thread_Users, which has two foreign keys: user_id and thread_id.
Then we can use a query like this to insert only if both foreign keys exist, and silently insert nothing otherwise:
INSERT INTO `thread_users` (thread_id,user_id,status,creation_date)
SELECT 2,3,'pending',1601465161690 FROM (SELECT 1 as nb_threads, 1 as nb_users) as tmp
WHERE tmp.nb_threads = (SELECT count(*) FROM `threads` WHERE threads.id = 2)
AND tmp.nb_users = (SELECT count(*) FROM `users` WHERE users.id = 3)
It's a little verbose but it does the job pretty well.
On the application side, we just have to raise an error if affectedRows = 0, and maybe try to find out which of the keys doesn't exist. IMHO, this is a better way to do the job than executing two SELECT queries and THEN the INSERT, especially when the probability of a nonexistent foreign key is very low.
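The same idea, including the application-side affectedRows check, sketched with Python's sqlite3 module, where cursor.rowcount plays the role of affectedRows. The schema is reduced to the columns the query touches, the data is hypothetical, and raising LookupError is just one possible choice of error.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE threads (id INTEGER PRIMARY KEY);
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE thread_users (thread_id INTEGER, user_id INTEGER,
                               status TEXT, creation_date INTEGER);
    INSERT INTO threads VALUES (2);
    INSERT INTO users VALUES (3);
""")

def join_thread(thread_id, user_id, created_ms):
    """Add user_id to thread_id; raise if either referenced row is missing."""
    cur = con.execute("""
        INSERT INTO thread_users (thread_id, user_id, status, creation_date)
        SELECT ?, ?, 'pending', ?
        FROM (SELECT 1 AS nb_threads, 1 AS nb_users) AS tmp
        WHERE tmp.nb_threads = (SELECT count(*) FROM threads WHERE threads.id = ?)
          AND tmp.nb_users = (SELECT count(*) FROM users WHERE users.id = ?)
    """, (thread_id, user_id, created_ms, thread_id, user_id))
    if cur.rowcount == 0:
        # The INSERT silently did nothing, so report it application-side.
        raise LookupError(f"thread {thread_id} or user {user_id} does not exist")

join_thread(2, 3, 1601465161690)      # both keys exist: one row inserted
try:
    join_thread(2, 4, 1601465161690)  # user 4 does not exist
except LookupError as exc:
    print(exc)
```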

Deleting multiple records from a very big table (query takes forever)

I am using the following query to delete duplicate records from my table, keeping one of each. It works well with small tables, but it got stuck when I tried it on a table with more than 130000 records. The thing is, I don't even get an error. phpMyAdmin just hangs, and the query (the "loading..." yellow line) basically takes forever.
My table structure:
person_id (AI & PK)
person_name (I want to delete duplicate person_name records, keeping one)
Query:
DELETE t2
FROM `person` t1
INNER JOIN `person` t2
ON t1.person_name = t2.person_name
AND t1.person_id < t2.person_id;
UPDATE: I don't have an index on the person table. But three other tables (person_job, person_image and book_who_wrote_it) contain foreign keys referencing the person table (person_id).
First, do you have an index on person(person_name, person_id)? That would be the place to start.
Deleting lots of rows incurs overhead. Often, it is faster to put the results in another table and reinsert them:
create temporary table tmp_person as
select p.*
from person p join
(select person_name, max(person_id) as max_person_id
from person
group by person_name
) pp
on p.person_id = pp.max_person_id;
truncate table person;
insert into person
select * from tmp_person;
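Be sure you validate tmp_person before truncating person! TRUNCATE does not log the deletion of each row, so it is much, much, much faster than DELETE under most circumstances. Note, however, that if person is an InnoDB table, MySQL refuses to TRUNCATE it while foreign keys in other tables (such as person_job) reference it.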
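NOTE:
If you really only have two columns in person, then you can simplify the first query to the following (the columns are listed in person's own order, so that select * reinserts them correctly):
create temporary table tmp_person as
select max(person_id) as person_id, person_name
from person
group by person_name;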
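Here is a sketch of the copy-out, empty, copy-back approach in SQLite via Python's sqlite3 module, with hypothetical rows. SQLite has no TRUNCATE TABLE, so a DELETE without a WHERE clause stands in for it, and the validation step compares the row count of tmp_person with the number of distinct names before anything is removed.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, person_name TEXT);
    INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Ann'), (4, 'Ann'), (5, 'Cy');
    CREATE TEMP TABLE tmp_person AS
    SELECT p.*
    FROM person p
    JOIN (SELECT person_name, max(person_id) AS max_person_id
          FROM person
          GROUP BY person_name) pp
      ON p.person_id = pp.max_person_id;
""")

# Validate before emptying person: exactly one kept row per distinct name.
kept = con.execute("SELECT count(*) FROM tmp_person").fetchone()[0]
names = con.execute("SELECT count(DISTINCT person_name) FROM person").fetchone()[0]
if kept != names:
    raise RuntimeError(f"tmp_person has {kept} rows, expected {names}")

con.execute("DELETE FROM person")  # SQLite's stand-in for TRUNCATE TABLE
con.execute("INSERT INTO person SELECT * FROM tmp_person")
print(con.execute("SELECT * FROM person ORDER BY person_id").fetchall())
# [(2, 'Bob'), (4, 'Ann'), (5, 'Cy')]
```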
Try this (it keeps the lowest person_id for each person_name):
DELETE
FROM `person`
where person_id not in
(select * from
(select min(person_id) as keep_id from person group by person_name) x)
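A sketch of the NOT IN variant in SQLite through Python's sqlite3 module, with hypothetical rows. It keeps the lowest person_id per name, as the original self-join does, and creates the (person_name, person_id) index suggested in the first answer (the index name is made up). The extra derived table x is the usual workaround for MySQL's restriction on selecting from the table being deleted from; SQLite does not need it but accepts it, so the statement keeps the same shape.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, person_name TEXT);
    CREATE INDEX idx_person_name_id ON person (person_name, person_id);
    INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Ann'), (4, 'Ann'), (5, 'Cy');
""")

# Keep the smallest person_id per name and delete everything else.
con.execute("""
    DELETE FROM person
    WHERE person_id NOT IN (
        SELECT keep_id FROM (
            SELECT min(person_id) AS keep_id FROM person GROUP BY person_name
        ) x
    )
""")
print(con.execute("SELECT * FROM person ORDER BY person_id").fetchall())
# [(1, 'Ann'), (2, 'Bob'), (5, 'Cy')]
```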

Deleting duplicates in mysql (2 tables)

I have two tables (id_test, test); each of them has a unique ID column, and two entries with the same ID in the two tables describe the same item. Now, I have another column in one of the tables (id_test) that should also be unique, so I want to eliminate duplicates according to this other column; let's call it YD.
To identify the duplicates I used
SELECT ID, YD AS x, COUNT(*) AS y
FROM id_test
GROUP BY x
HAVING y>1;
Now I want to delete these entries from both tables. How can I do it?
This query shows the smallest ID for every YD in the id_test table:
SELECT MIN(ID) AS ID, YD
FROM id_test
GROUP BY YD
and these are the rows you have to keep. The following query returns the IDs you have to delete:
SELECT id_test.ID
FROM id_test LEFT JOIN (select MIN(ID) AS ID, YD from id_test group by YD) id_test_keep
on id_test.ID=id_test_keep.ID and id_test.YD = id_test_keep.YD
WHERE id_test_keep.ID IS NULL
I would need more details about your tables, but I think what you need is this:
DELETE FROM test
WHERE
test.ID IN (
SELECT id_test.ID
FROM id_test LEFT JOIN (select MIN(ID) AS ID, YD from id_test group by YD) id_test_keep
on id_test.ID=id_test_keep.ID and id_test.YD = id_test_keep.YD
WHERE id_test_keep.ID IS NULL)
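Since the question asks to remove the duplicates from both tables, here is a sketch that applies the same ID list to test and then to id_test, using Python's sqlite3 module and hypothetical rows (test's payload column is made up). In MySQL, the second DELETE may additionally need its subquery wrapped in a derived table, because MySQL restricts selecting from the table a DELETE is modifying.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE id_test (ID INTEGER PRIMARY KEY, YD TEXT);
    CREATE TABLE test (ID INTEGER PRIMARY KEY, payload TEXT);
    INSERT INTO id_test VALUES (1, 'x'), (2, 'y'), (3, 'x'), (4, 'z'), (5, 'y');
    INSERT INTO test VALUES (1, 't1'), (2, 't2'), (3, 't3'), (4, 't4'), (5, 't5');
""")

# IDs whose YD is already taken by a smaller ID in id_test.
IDS_TO_DELETE = """
    SELECT id_test.ID
    FROM id_test
    LEFT JOIN (SELECT min(ID) AS ID, YD FROM id_test GROUP BY YD) id_test_keep
      ON id_test.ID = id_test_keep.ID AND id_test.YD = id_test_keep.YD
    WHERE id_test_keep.ID IS NULL
"""
# Delete from test first: the list is computed from id_test, which is still intact.
con.execute(f"DELETE FROM test WHERE ID IN ({IDS_TO_DELETE})")
con.execute(f"DELETE FROM id_test WHERE ID IN ({IDS_TO_DELETE})")

print([r[0] for r in con.execute("SELECT ID FROM test ORDER BY ID")])     # [1, 2, 4]
print(con.execute("SELECT ID, YD FROM id_test ORDER BY ID").fetchall())  # [(1, 'x'), (2, 'y'), (4, 'z')]
```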
As documented under ALTER TABLE Syntax (emphasis added):
IGNORE is a MySQL extension to standard SQL. It controls how ALTER TABLE works if there are duplicates on unique keys in the new table or if warnings occur when strict mode is enabled. If IGNORE is not specified, the copy is aborted and rolled back if duplicate-key errors occur. If IGNORE is specified, only the first row is used of rows with duplicates on a unique key. The other conflicting rows are deleted. Incorrect values are truncated to the closest matching acceptable value.
Therefore (on MySQL 5.6 and earlier only; the IGNORE clause of ALTER TABLE was removed in MySQL 5.7):
ALTER IGNORE TABLE id_test ADD UNIQUE (YD)
I don't think you should use SELECT ... IN here, because with a large amount of data it becomes impractically slow.
Instead, clone the table with the same structure and insert only the non-duplicate data into it:
INSERT INTO test_new (ID, YD) SELECT t.ID, t.YD FROM test t LEFT JOIN id_test ti ON t.ID = ti.id WHERE ti.id IS NULL;
Then drop the table test and rename test_new to test.