Using WHERE on grouped rows after UNION statement - mysql

I have a database schema with two tables, song and editedsong. These tables are identical, except for one extra column in editedsong called deleted. The id in editedsong refers to the id in song. I want to find all the songs which aren't deleted.
I have a UNION statement in which I GROUP BY the id of the result of two SELECT statements. I want to exclude results where the deleted column has the value 1. An example of the setup can be seen here.
CREATE TABLE if not exists song
(
id int(11) NOT NULL auto_increment ,
title varchar(255),
PRIMARY KEY (id)
);
CREATE TABLE if not exists editedsong
(
id int(11) NOT NULL auto_increment ,
title varchar(255),
deleted tinyint(1),
PRIMARY KEY (id)
);
INSERT INTO song (id, title) VALUES
(1, 'Born in the USA');
INSERT INTO editedsong (id, title, deleted) VALUES
(1, 'Born in the USA', 1);
And the query is here:
SELECT * FROM
((SELECT *, 0 AS deleted FROM song WHERE id=1)
UNION
(SELECT * FROM editedsong WHERE id=1)) AS song
WHERE song.deleted!=1
GROUP BY song.id
The UNION statement is used instead of a join because there is a LOT of text in these two tables and a join results in writing to disk. This is a simplified form of the real query, but it reproduces the problem I'm experiencing. I would expect the query to yield no results, as the GROUP BY should preserve the first row and throw away all following ones. Why doesn't it do this? Is it because the WHERE is executed before the GROUP BY? If so, what is a good way to overcome this problem?
http://sqlfiddle.com/#!2/5cdb6c/3

The reason that the code in the SQLFiddle doesn't work is that the WHERE clause is excluding the deleted record from editedsong before the GROUP BY is executed.
You can use HAVING to apply criteria after a GROUP BY clause.
This appears to work:
SELECT *, max(deleted) as md FROM
((SELECT *, 0 AS deleted FROM song)
UNION
(SELECT * FROM editedsong)) AS song
-- WHERE song.deleted!=1
GROUP BY song.id
HAVING md != 1
For records that haven't been deleted, this appears to return the record from song, not the record from editedsong. If you want the other, reverse the order of the items in the UNION clause.
This syntax for GROUP BY is unusual, and I'm surprised it's supported. Most database systems I've worked with require every column in the output to be either listed in GROUP BY or wrapped in an aggregate (MAX, COUNT, etc.), so a SELECT * is incompatible with GROUP BY. MySQL accepts it only when the ONLY_FULL_GROUP_BY SQL mode is disabled, and in that case it is free to take the non-aggregated values from any row of each group, so which record comes back is not guaranteed. Most servers wouldn't like this query, and neither do I.
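To make the WHERE versus HAVING difference concrete, here is a minimal sketch. It runs on SQLite through Python's sqlite3 module purely as a stand-in for MySQL, lists the columns explicitly instead of using SELECT *, and adds a second song ('Thunder Road') that is hypothetical and not part of the original fiddle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE song (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE editedsong (id INTEGER PRIMARY KEY, title TEXT, deleted INTEGER);
-- 'Thunder Road' is a made-up extra row so that one song survives the filter
INSERT INTO song VALUES (1, 'Born in the USA'), (2, 'Thunder Road');
INSERT INTO editedsong VALUES (1, 'Born in the USA', 1);
""")

UNIONED = """
    SELECT id, title, 0 AS deleted FROM song
    UNION
    SELECT id, title, deleted FROM editedsong
"""

# WHERE runs before grouping: it drops the deleted=1 row, so song 1 still shows up.
where_ids = [r[0] for r in conn.execute(
    f"SELECT id FROM ({UNIONED}) AS s WHERE deleted != 1 GROUP BY id ORDER BY id")]

# HAVING runs after grouping: a group is dropped if any of its rows is deleted.
having_ids = [r[0] for r in conn.execute(
    f"SELECT id FROM ({UNIONED}) AS s GROUP BY id HAVING MAX(deleted) != 1 ORDER BY id")]

print(where_ids)   # [1, 2]
print(having_ids)  # [2]
```

Selecting only the grouped column and an aggregate sidesteps the question of which row's title is returned.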

Related

Why am I getting "Duplicate entry" error on SELECT DISTINCT query?

I have the following query to append data into a table if it is unique:
INSERT INTO belgarath.players(tour_id, player_id, player_name_oc)
SELECT DISTINCT 0, ID_P, NAME_P FROM oncourt.players_atp
LEFT JOIN belgarath.players
ON belgarath.players.tour_id = 0
AND belgarath.players.player_id=oncourt.players_atp.ID_P;
I run this once on an empty table and it's fine. I delete a row and run it expecting MySQL to append the one deleted row. However, I get the following error code: Error Code: 1062. Duplicate entry '0-43042' for key 'players.unique_plyrs' . I have a unique key across tour_id and player_id and clearly it's failing because I'm trying to append a duplicate record.
Why would I be getting this if I'm only selecting distinct records to insert? How do I avoid getting this in future?
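This should resolve your issue. Without a filter, the LEFT JOIN returns every row from oncourt.players_atp, whether or not it already exists in belgarath.players. Add a WHERE clause that keeps only the rows with no match, i.e. where belgarath.players.player_id IS NULL.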
INSERT INTO belgarath.players(tour_id, player_id, player_name_oc)
SELECT DISTINCT 0, ID_P, NAME_P FROM oncourt.players_atp
LEFT JOIN belgarath.players
ON belgarath.players.tour_id = 0
AND belgarath.players.player_id=oncourt.players_atp.ID_P
WHERE belgarath.players.player_id is NULL;
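The effect of the IS NULL filter can be checked with a small sketch. This uses SQLite through Python's sqlite3 module as a stand-in for MySQL, drops the schema prefixes, and invents three sample players; none of that comes from the original post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE players_atp (ID_P INTEGER, NAME_P TEXT);
CREATE TABLE players (
    tour_id INTEGER, player_id INTEGER, player_name_oc TEXT,
    UNIQUE (tour_id, player_id)
);
-- made-up sample data
INSERT INTO players_atp VALUES (1, 'john'), (2, 'steve'), (3, 'bill');
""")

INSERT_SQL = """
INSERT INTO players (tour_id, player_id, player_name_oc)
SELECT DISTINCT 0, a.ID_P, a.NAME_P
FROM players_atp AS a
LEFT JOIN players AS p
       ON p.tour_id = 0 AND p.player_id = a.ID_P
"""
ONLY_MISSING = INSERT_SQL + "WHERE p.player_id IS NULL"

def count_players():
    return conn.execute("SELECT COUNT(*) FROM players").fetchone()[0]

with conn:
    conn.execute(ONLY_MISSING)      # empty target table: all three rows go in
with conn:
    conn.execute("DELETE FROM players WHERE player_id = 2")

try:
    with conn:
        conn.execute(INSERT_SQL)    # no filter: tries to re-insert players 1 and 3
    unfiltered_failed = False
except sqlite3.IntegrityError:
    unfiltered_failed = True        # the unique key rejects the duplicates

with conn:
    conn.execute(ONLY_MISSING)      # filter: only the missing player 2 is inserted

print(unfiltered_failed, count_players())
```

The unfiltered statement fails on the unique key, while the filtered one restores only the deleted row.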
I hope this hint about the DISTINCT keyword helps. DISTINCT selects distinct rows: it applies to the whole select list, not only to the column written right after it, so we can't expect it to return distinct values for a single column. The example below illustrates this.
create table test(id1 int, id2 int);
insert into test values(1,1),(1,2),(1,3);
Here I have created a test table. When I use the DISTINCT keyword as in the query below
select distinct id1, id2 from test;
Then we'll get output like this:
id1 id2
1 1
1 2
1 3
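The same behaviour can be reproduced in a few lines. This is a small sketch on SQLite via Python's sqlite3 module (an assumption made for runnability; the row-level meaning of DISTINCT is standard SQL).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE test (id1 INTEGER, id2 INTEGER);
INSERT INTO test VALUES (1, 1), (1, 2), (1, 3);
""")

# DISTINCT compares whole rows, so all three (id1, id2) pairs are kept.
pairs = conn.execute("SELECT DISTINCT id1, id2 FROM test ORDER BY id2").fetchall()

# With only id1 in the select list, the three rows collapse into one.
singles = conn.execute("SELECT DISTINCT id1 FROM test").fetchall()

print(pairs)    # [(1, 1), (1, 2), (1, 3)]
print(singles)  # [(1,)]
```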
You are inserting tour_id as 0, and you have defined tour_id and player_id as a unique key on the belgarath.players table. Your SELECT returns tour_id as 0 for every row. DISTINCT only removes rows that are identical in every selected column, so if player_id is 1, 2, 3 and the names are john, steve, bill, the query returns three rows: (0, 1, john), (0, 2, steve), (0, 3, bill). Any of those whose (tour_id, player_id) pair already exists in belgarath.players triggers the duplicate-entry error.
If oncourt.players_atp also contains a tour_id column covered by a unique constraint, you can simply copy tour_id from there. If tour_id is not present there and you want to generate it inside belgarath.players only, you can define tour_id as an auto-increment column in the table definition. It will then generate unique ids, and your query no longer needs to select tour_id: you only insert player_id and the player name.
Hope this helps.

What is auto_increment fields values order for MySQL INSERT .. SELECT statement

Let's say we have the following table:
CREATE TABLE test_insert_order (
id INT(11) NOT NULL AUTO_INCREMENT,
parent_id INT(11) NOT NULL,
name VARCHAR(20),
PRIMARY KEY (id)
);
With some data like
INSERT INTO test_insert_order (parent_id, name) VALUES (1, 'a'),(1, 'b'),(1,'c'),(2,'b'),(2,'d'),(2,'a'),(3,'d'),(3,'a'),(4,'aa'),(5,'bb'),(6,'a'),(3,'a'),(1,'d'),(2,'c');
If we do
INSERT INTO test_insert_order (parent_id, name) SELECT 7, `name` FROM test_insert_order WHERE parent_id = 2 ORDER BY id;
can we assume that the new auto-generated ids will be in the same order as the ids in the result of this select?
SELECT id, 7, `name` FROM test_insert_order WHERE parent_id = 2 ORDER BY id;
In other words, will a.name always match b.name in the following query?
SET @i:=0; SET @j:=0;
SELECT a.id, a.parent_id, a.name, a.order_id, b.id, b.parent_id, b.name, b.order_id FROM
(SELECT *, @j:=@j+1 AS order_id FROM test_insert_order WHERE parent_id = 2 ORDER BY id) a,
(SELECT *, @i:=@i+1 AS order_id FROM test_insert_order WHERE parent_id = 7 ORDER BY id) b
WHERE a.order_id = b.order_id;
I have run a few tests with concurrent threads and it always holds. But I cannot find anything in the MySQL docs about this situation.
UPDATE:
I guess this might not hold in some cluster solutions where each instance has its own pattern for auto-increment values, one instance is lagging behind, and query execution is somehow distributed between them. But I do not have an environment to check this.
After more research, I will answer my own question.
I guess this holds in most situations, but I was able to find cases where relying on it could cause problems. The first is mentioned in my UPDATE to the question, regarding cluster solutions. The second is when the table has gaps in its auto_increment id field (for example, it initially started from 1000000, not 0) and the auto_increment value is manually changed to a lower value while the insert statement is executing. This would break the auto_increment pattern.
I would suggest instead ordering by some meaningful fields whose uniqueness we can predict. If there are none, then it makes no difference which of two identical, just-inserted records we use.
Regarding the question in the title: auto_increment values in a single bulk insert on a single MySQL instance will usually be in increasing order, but this can be interrupted if the auto_increment value is changed to a lower one. On a cluster, it depends on the cluster implementation and is most likely not predictable either.
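As a rough illustration of matching on a meaningful field rather than on insertion order, the sketch below pairs each copied row with its original by name. It assumes name is unique within a parent (true for parent_id = 2 in the sample data, not something the schema enforces), uses a shortened version of the sample data, and runs on SQLite through Python's sqlite3 module as a stand-in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE test_insert_order (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    parent_id INTEGER NOT NULL,
    name TEXT
);
-- shortened sample data; names are unique within parent_id = 2
INSERT INTO test_insert_order (parent_id, name) VALUES
    (1, 'a'), (2, 'b'), (2, 'd'), (2, 'a'), (3, 'd'), (2, 'c');
INSERT INTO test_insert_order (parent_id, name)
    SELECT 7, name FROM test_insert_order WHERE parent_id = 2 ORDER BY id;
""")

# Pair originals and copies by name, independent of the ids that were generated.
pairs = conn.execute("""
    SELECT a.id, b.id, a.name
    FROM test_insert_order AS a
    JOIN test_insert_order AS b ON b.name = a.name
    WHERE a.parent_id = 2 AND b.parent_id = 7
    ORDER BY a.name
""").fetchall()

for original_id, copy_id, name in pairs:
    print(original_id, copy_id, name)
```

This pairing stays correct even if the generated ids turn out not to follow the SELECT order.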

Mysql, Insert new record into table B if foreign key exists in table A

There are a few similar questions on here. None provide a solution. I would like to INSERT a NEW record into table B, but only if a foreign key exists in table A. To be clear, I do not wish to insert the result of a select. I just need to know that the foreign key exists.
INSERT INTO tableB (tableA_ID,code,notes,created) VALUES ('24','1','test',NOW())
SELECT tableA_ID FROM tableA WHERE tableA_ID='24' AND owner_ID='9'
Clearly, the above does not work. But is this even possible? I want to insert the NEW data into tableB, only if the record for the row in tableA exists and belongs to owner_ID.
The queries I have seen so far relate to INSERTING the results from the SELECT query - I do not wish to do that.
Try this:
INSERT INTO tableB (tableA_ID,code,notes,created)
SELECT id, code, notes, created
FROM ( SELECT '24' as id, '1' as code, 'test' as notes, NOW() as created) t
WHERE EXISTS
(
SELECT tableA_ID
FROM tableA
WHERE tableA_ID='24' AND owner_ID='9'
)
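A runnable sketch of the same conditional insert, written for SQLite via Python's sqlite3 module as a stand-in for MySQL. The helper name insert_if_owned and the reduced table definitions are hypothetical; only the INSERT ... SELECT ... WHERE EXISTS pattern comes from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- minimal made-up definitions, just enough for the example
CREATE TABLE tableA (tableA_ID INTEGER PRIMARY KEY, owner_ID INTEGER);
CREATE TABLE tableB (tableA_ID INTEGER, code INTEGER, notes TEXT, created TEXT);
INSERT INTO tableA VALUES (24, 9);
""")

def insert_if_owned(a_id, owner_id, code, notes):
    """Insert into tableB only if tableA has the row (a_id, owner_id)."""
    before = conn.total_changes
    with conn:
        conn.execute(
            """
            INSERT INTO tableB (tableA_ID, code, notes, created)
            SELECT ?, ?, ?, CURRENT_TIMESTAMP
            WHERE EXISTS (
                SELECT 1 FROM tableA WHERE tableA_ID = ? AND owner_ID = ?
            )
            """,
            (a_id, code, notes, a_id, owner_id),
        )
    # total_changes grows by one only if the row was actually inserted
    return conn.total_changes - before == 1

ok = insert_if_owned(24, 9, 1, "test")        # owner matches: row is inserted
rejected = insert_if_owned(24, 10, 1, "test") # wrong owner: nothing happens
print(ok, rejected)  # True False
```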
I know this is an old, already-answered question, but it ranks highly in Google search results now, and I think an addition may help someone in the future.
In some database designs, you may want to insert a row into a table that has two or more foreign keys. Let's say we have four tables in a chat application:
Users, Threads, Thread_Users and Messages
If we want a User to join a Thread, we need to insert a row into Thread_Users, which has two foreign keys: user_id and thread_id.
We can then use a query like this to insert the row if both foreign keys exist, and silently insert nothing otherwise:
INSERT INTO `thread_users` (thread_id,user_id,status,creation_date)
SELECT 2,3,'pending',1601465161690 FROM (SELECT 1 as nb_threads, 1 as nb_users) as tmp
WHERE tmp.nb_threads = (SELECT count(*) FROM `threads` WHERE threads.id = 2)
AND tmp.nb_users = (SELECT count(*) FROM `users` WHERE users.id = 3)
It's a little verbose but it does the job pretty well.
On the application side, we just have to raise an error if affectedRows = 0, and perhaps check which of the keys doesn't exist. IMHO, this is a better way to do the job than executing two SELECT queries and THEN the INSERT, especially when the probability of a nonexistent foreign key is very low.
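The application-side check could look roughly like this. It is a sketch on SQLite via Python's sqlite3 module, not MySQL; it swaps the count(*) comparison for two EXISTS tests, and the names join_thread and exists, as well as the reduced table definitions, are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- minimal made-up definitions
CREATE TABLE threads (id INTEGER PRIMARY KEY);
CREATE TABLE users (id INTEGER PRIMARY KEY);
CREATE TABLE thread_users (thread_id INTEGER, user_id INTEGER, status TEXT);
INSERT INTO threads VALUES (2);
INSERT INTO users VALUES (3);
""")

def exists(table, row_id):
    # table is only ever one of the fixed literals used below, never user input
    row = conn.execute(f"SELECT 1 FROM {table} WHERE id = ?", (row_id,)).fetchone()
    return row is not None

def join_thread(thread_id, user_id):
    before = conn.total_changes
    with conn:
        conn.execute(
            """
            INSERT INTO thread_users (thread_id, user_id, status)
            SELECT ?, ?, 'pending'
            WHERE EXISTS (SELECT 1 FROM threads WHERE id = ?)
              AND EXISTS (SELECT 1 FROM users WHERE id = ?)
            """,
            (thread_id, user_id, thread_id, user_id),
        )
    if conn.total_changes == before:
        # Nothing was inserted: only now pay for the extra lookups.
        missing = [name for name, rid in (("threads", thread_id), ("users", user_id))
                   if not exists(name, rid)]
        raise LookupError(f"missing foreign key target in: {missing}")

join_thread(2, 3)  # both keys exist, so one row is inserted
```

The diagnostic queries only run on the rare failure path, which is the point made above about avoiding two SELECTs before every INSERT.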

INNER JOIN and GROUP BY to prevent duplicate results

Context:
I'm working on a simple ORM (for PHP) that automates most queries, based on a static configuration.
Thus, from tables and entities definitions, the library handles joins automatically and generates appropriate fields/table alias... No problem for LEFT joins but INNER may result in duplicated results in case of relation One-to-Many.
My thought was to automatically add a GROUP BY clause (on the auto-increment key) if necessary.
The question
Is it correct to consider that I need to add a GROUP BY clause if (and only if) the join's ON and WHERE conditions don't match a unique key of the joined table?
Example
A very simple example, where I want to select all events with (at least) an associated Showing.
If there is another way to do it without INNER JOIN, I'm interested to know how :)
CREATE TABLE `Event` (
`Id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
`Name` VARCHAR(255) NOT NULL
);
INSERT INTO `Event` (`Name`) VALUES ('My cool event');
CREATE TABLE `Showing` (
`Id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
`EventId` INT UNSIGNED NOT NULL,
`Place` VARCHAR(50) NOT NULL,
FOREIGN KEY (`EventId`) REFERENCES `Event`(`Id`),
UNIQUE (`EventId`, `Place`)
);
INSERT INTO `Showing` (`EventId`, `Place`) VALUES (1, 'School');
INSERT INTO `Showing` (`EventId`, `Place`) VALUES (1, 'Park');
-- Correct queries
SELECT t.* FROM `Event` t INNER JOIN `Showing` t1 ON t.Id=t1.`EventId` WHERE t1.`Place` = 'School';
SELECT t.* FROM `Event` t INNER JOIN `Showing` t1 ON t.Id=t1.`EventId` AND t1.`Place` = 'School';
-- Query leading to duplicate values
SELECT t.* FROM `Event` t INNER JOIN `Showing` t1 ON t.Id=t1.`EventId`;
-- Group by query to prevent duplicate values
SELECT t.* FROM `Event` t INNER JOIN `Showing` t1 ON t.Id=t1.`EventId` GROUP BY t.`Id`;
Thanks !
(This should be a comment, but it's a bit long.)
No problem for LEFT joins but INNER may result in duplicated results in case of relation One-to-Many
It's clear from that sentence that at least one of us is very confused about how a relational database works, and how object-relation mapping should work.
Query leading to duplicate values
The rows produced are not duplicates - you've written the query so it doesn't show you why they are different:
SELECT t1.Place, t.*
FROM `Event` t
INNER JOIN `Showing` t1
ON t.Id=t1.EventId;
If you're not interested in the data from 'showing' then why is it in your query? If you only want the events that have at least one related showing record, then you should be using an 'EXISTS', not a join (consider the case where you have a single event but 3 million showings):
SELECT t.*
FROM `Event` t
WHERE EXISTS (SELECT 1
FROM `Showing` t1
WHERE t1.EventId=t.Id);
If you are strictly implementing ORM, then you probably shouldn't be writing queries with joins at all - but IMHO, the scenario is better served by using factories.
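To compare the two approaches side by side, here is a small sketch on SQLite via Python's sqlite3 module, used as a stand-in for MySQL. The second event ('No showings yet') is a made-up row added to show that EXISTS also filters out events without showings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Event (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Showing (
    Id INTEGER PRIMARY KEY,
    EventId INTEGER NOT NULL REFERENCES Event(Id),
    Place TEXT NOT NULL,
    UNIQUE (EventId, Place)
);
-- 'No showings yet' is a hypothetical extra event
INSERT INTO Event (Name) VALUES ('My cool event'), ('No showings yet');
INSERT INTO Showing (EventId, Place) VALUES (1, 'School'), (1, 'Park');
""")

# One output row per matching Showing: event 1 appears twice.
joined = conn.execute("""
    SELECT e.Id, e.Name
    FROM Event AS e
    INNER JOIN Showing AS s ON s.EventId = e.Id
""").fetchall()

# One output row per Event that has at least one Showing.
with_exists = conn.execute("""
    SELECT e.Id, e.Name
    FROM Event AS e
    WHERE EXISTS (SELECT 1 FROM Showing AS s WHERE s.EventId = e.Id)
""").fetchall()

print(len(joined))   # 2
print(with_exists)   # [(1, 'My cool event')]
```

No GROUP BY is needed in the EXISTS form, because the outer query never multiplies Event rows.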
The data is saying that "My Cool Event" is happening at the park, and at the school. If you inner join the tables you will get more than one result.
Do this query to see what is going on:
Select t.*, t1.* FROM `Event` t INNER JOIN `Showing` t1 ON t.Id=t1.`EventId`;
That is the same query as your duplicate query, but selecting columns from both tables.
One row of the results says the event is happening at the school. The other row says that the same event is happening at the park.

Deleting duplicates in mysql (2 tables)

I have two tables (id_test, test). Each of them has a unique ID column, and two entries with the same ID in the two tables refer to the same record. I have another column in one of the tables (id_test) that should also be unique, so I want to eliminate duplicates according to this other column; let's call it YD.
To identify the duplicates I used
SELECT ID, YD AS x, COUNT(*) AS y
FROM id_test
GROUP BY x
HAVING y>1;
Now I want to delete these entries from both tables. How can I do it?
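This query shows the first (lowest) ID for every YD in the id_test table. MIN(ID) makes the choice explicit; a bare ID column would let MySQL pick an arbitrary row from each group:
SELECT MIN(ID) AS ID, YD
FROM id_test
GROUP BY YD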
and these are the rows you have to keep. The following query returns the IDs you have to delete:
SELECT id_test.ID
FROM id_test LEFT JOIN (select MIN(ID) AS ID, YD from id_test group by YD) id_test_keep
on id_test.ID=id_test_keep.ID and id_test.YD = id_test_keep.YD
WHERE id_test_keep.ID IS NULL
I would need more details about your tables to be sure, but I think what you need is this:
DELETE FROM test
WHERE
test.ID IN (
SELECT id_test.ID
FROM id_test LEFT JOIN (select MIN(ID) AS ID, YD from id_test group by YD) id_test_keep
on id_test.ID=id_test_keep.ID and id_test.YD = id_test_keep.YD
WHERE id_test_keep.ID IS NULL)
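The whole clean-up, covering both tables, might be sketched as follows. This runs on SQLite through Python's sqlite3 module rather than MySQL, assumes every ID in test also exists in id_test, and uses made-up sample rows and a hypothetical payload column. The second DELETE reads from the table it modifies, which SQLite accepts; MySQL restricts that, so there the subquery would have to be wrapped in a derived table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- made-up sample data: YD 'x' appears three times
CREATE TABLE id_test (ID INTEGER PRIMARY KEY, YD TEXT);
CREATE TABLE test (ID INTEGER PRIMARY KEY, payload TEXT);
INSERT INTO id_test VALUES (1, 'x'), (2, 'x'), (3, 'y'), (4, 'x');
INSERT INTO test VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
""")

KEEP = "SELECT MIN(ID) FROM id_test GROUP BY YD"  # lowest ID per YD survives

with conn:
    # Clean test first, while id_test still knows which IDs are duplicates.
    conn.execute(f"DELETE FROM test WHERE ID NOT IN ({KEEP})")
    conn.execute(f"DELETE FROM id_test WHERE ID NOT IN ({KEEP})")

remaining_id_test = conn.execute("SELECT ID, YD FROM id_test ORDER BY ID").fetchall()
remaining_test = [r[0] for r in conn.execute("SELECT ID FROM test ORDER BY ID")]
print(remaining_id_test)  # [(1, 'x'), (3, 'y')]
print(remaining_test)     # [1, 3]
```

The order of the two deletes matters: once id_test has been cleaned, it can no longer identify the duplicate IDs in test.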
As documented under ALTER TABLE Syntax (emphasis added):
IGNORE is a MySQL extension to standard SQL. It controls how ALTER TABLE works if there are duplicates on unique keys in the new table or if warnings occur when strict mode is enabled. If IGNORE is not specified, the copy is aborted and rolled back if duplicate-key errors occur. If IGNORE is specified, only the first row is used of rows with duplicates on a unique key. The other conflicting rows are deleted. Incorrect values are truncated to the closest matching acceptable value.
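Therefore (on MySQL versions that still support ALTER IGNORE TABLE; the IGNORE clause was removed in MySQL 5.7.4):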
ALTER IGNORE TABLE id_test ADD UNIQUE (YD)
I think you should not use SELECT ... IN, because it becomes impractical if the data is large.
Instead, create a new table with the same structure and insert only the non-duplicate data into it.
INSERT INTO test_new (ID, YD) SELECT t.ID, t.YD FROM test t LEFT JOIN test_id ti ON t.ID = ti.id WHERE ti.id IS NULL;
After that, drop the table test and rename test_new to test.