Compare two query results and output difference - mysql

I have two queries against the same table:
SELECT * FROM table WHERE fileName='x';
SELECT * FROM table WHERE fileName='y';
Now, I am trying to compare the result that is being returned by these queries. There is no ID I could compare against but I need to compare each column.
I was trying to modify this approach, as seen here:
SELECT 'robot' AS `set`, r.*
FROM robot r
WHERE ROW(r.col1, r.col2, …) NOT IN
(
SELECT *
FROM tbd_robot
)
UNION ALL
SELECT 'tbd_robot' AS `set`, t.*
FROM tbd_robot t
WHERE ROW(t.col1, t.col2, …) NOT IN
(
SELECT *
FROM robot
)
I am not sure how to modify this code correctly. My attempts to change the table names to the same table but adding a WHERE clause failed in SQL exceptions.
Is this even the best route to take? Maybe there is an even more clever way to compare two query results and output the differences?
Thank you very much for your help in advance :-)
EDIT:
Sample Data:
ID | fileName | firstName | lastName | address
1 | x.txt | John | Doe | 1 Test st
2 | x.txt | Jane | Doe | 3 Test st
3 | y.txt | John | Doe | 2 Test st
4 | y.txt | Jane | Doe | 3 Test st
Since the address differs where ID = 3, this is the row that should be returned.

Perhaps group by is a simpler approach:
select col1, col2, col3, . . .,
sum(fileName = 'x') as count_x,
sum(fileName = '7') as count_y
from table t
where fileName in ('x', 'y')
group by col1, col2, col3, . . .;
For the columns that you specify, you will get the count of rows with 'x' and 'y'.
You can just output the differences by putting having count_x <> count_y at the end of the query.

Related

Select all records where last n characters in column are not unique

I have bit strange requirement in mysql.
I should select all records from table where last 6 characters are not unique.
for example if I have table:
I should select row 1 and 3 since last 6 letters of this values are not unique.
Do you have any idea how to implement this?
Thank you for help.
I uses a JOIN against a subquery where I count the occurences of each unique combo of n (2 in my example) last chars
SELECT t.*
FROM t
JOIN (SELECT RIGHT(value, 2) r, COUNT(RIGHT(value, 2)) rc
FROM t
GROUP BY r) c ON c.r = RIGHT(value, 2) AND c.rc > 1
Something like that should work:
SELECT `mytable`.*
FROM (SELECT RIGHT(`value`, 6) AS `ending` FROM `mytable` GROUP BY `ending` HAVING COUNT(*) > 1) `grouped`
INNER JOIN `mytable` ON `grouped`.`ending` = RIGHT(`value`, 6)
but it is not fast. This requires a full table scan. Maybe you should rethink your problem.
EDITED: I had a wrong understanding of the question previously and I don't really want to change anything from my initial answer. But if my previous answer is not acceptable in some environment and it might mislead people, I have to correct it anyhow.
SELECT GROUP_CONCAT(id),RIGHT(VALUE,6)
FROM table1
GROUP BY RIGHT(VALUE,6) HAVING COUNT(RIGHT(VALUE,6)) > 1;
Since this question already have good answers, I made my query in a slightly different way. And I've tested with sql_mode=ONLY_FULL_GROUP_BY. ;)
This is what you need: a subquery to get the duplicated right(value,6) and the main query yo get the rows according that condition.
SELECT t.* FROM t WHERE RIGHT(`value`,6) IN (
SELECT RIGHT(`value`,6)
FROM t
GROUP BY RIGHT(`value`,6) HAVING COUNT(*) > 1);
UPDATE
This is the solution to avoid the mysql error in the case you have sql_mode=only_full_group_by
SELECT t.* FROM t WHERE RIGHT(`value`,6) IN (
SELECT DISTINCT right_value FROM (
SELECT RIGHT(`value`,6) AS right_value,
COUNT(*) AS TOT
FROM t
GROUP BY RIGHT(`value`,6) HAVING COUNT(*) > 1) t2
)
Fiddle here
Might be a fast code, as there is no counting involved.
Live test: https://www.db-fiddle.com/f/dBdH9tZd4W6Eac1TCRXZ8U/0
select *
from tbl outr
where not exists
(
select 1 / 0 -- just a proof that this is not evaluated. won't cause division by zero
from tbl inr
where
inr.id <> outr.id
and right(inr.value, 6) = right(outr.value, 6)
)
Output:
| id | value |
| --- | --------------- |
| 2 | aaaaaaaaaaaaaa |
| 4 | aaaaaaaaaaaaaaB |
| 5 | Hello |
The logic is to test other rows that is not equal to the same id of the outer row. If those other rows has same right 6 characters as the outer row, then don't show that outer row.
UPDATE
I misunderstood the OP's intent. It's the reversed. Anyway, just reverse the logic. Use EXISTS instead of NOT EXISTS
Live test: https://www.db-fiddle.com/f/dBdH9tZd4W6Eac1TCRXZ8U/3
select *
from tbl outr
where exists
(
select 1 / 0 -- just a proof that this is not evaluated. won't cause division by zero
from tbl inr
where
inr.id <> outr.id
and right(inr.value, 6) = right(outr.value, 6)
)
Output:
| id | value |
| --- | ----------- |
| 1 | abcdePuzzle |
| 3 | abcPuzzle |
UPDATE
Tested the query. The performance of my answer (correlated EXISTS approach) is not optimal. Just keeping my answer, so others will know what approach to avoid :)
GhostGambler's answer is faster than correlated EXISTS approach. For 5 million rows, his answer takes 2.762 seconds only:
explain analyze
SELECT
tbl.*
FROM
(
SELECT
RIGHT(value, 6) AS ending
FROM
tbl
GROUP BY
ending
HAVING
COUNT(*) > 1
) grouped
JOIN tbl ON grouped.ending = RIGHT(value, 6)
My answer (correlated EXISTS) takes 4.08 seconds:
explain analyze
select *
from tbl outr
where exists
(
select 1 / 0 -- just a proof that this is not evaluated. won't cause division by zero
from tbl inr
where
inr.id <> outr.id
and right(inr.value, 6) = right(outr.value, 6)
)
Straightforward query is the fastest, no join, just plain IN query. 2.722 seconds. It has practically the same performance as JOIN approach since they have the same execution plan. This is kiks73's answer. I just don't know why he made his second answer unnecessarily complicated.
So it's just a matter of taste, or choosing which code is more readable select from in vs select from join
explain analyze
SELECT *
FROM tbl
where right(value, 6) in
(
SELECT
RIGHT(value, 6) AS ending
FROM
tbl
GROUP BY
ending
HAVING
COUNT(*) > 1
)
Result:
Test data used:
CREATE TABLE tbl (
id INTEGER primary key,
value VARCHAR(20)
);
INSERT INTO tbl
(id, value)
VALUES
('1', 'abcdePuzzle'),
('2', 'aaaaaaaaaaaaaa'),
('3', 'abcPuzzle'),
('4', 'aaaaaaaaaaaaaaB'),
('5', 'Hello');
insert into tbl(id, value)
select x.y, 'Puzzle'
from generate_series(6, 5000000) as x(y);
create index ix_tbl__right on tbl(right(value, 6));
Performances without the index, and with index on tbl(right(value, 6)):
JOIN approach:
Without index: 3.805 seconds
With index: 2.762 seconds
IN approach:
Without index: 3.719 seconds
With index: 2.722 seconds
Just a bit neater code (if using MySQL 8.0). Can't guarantee the performance though
Live test: https://www.db-fiddle.com/f/dBdH9tZd4W6Eac1TCRXZ8U/1
select x.*
from
(
select
*,
count(*) over(partition by right(value, 6)) as unique_count
from tbl
) as x
where x.unique_count = 1
Output:
| id | value | unique_count |
| --- | --------------- | ------------ |
| 2 | aaaaaaaaaaaaaa | 1 |
| 4 | aaaaaaaaaaaaaaB | 1 |
| 5 | Hello | 1 |
UPDATE
I misunderstood OP's intent. It's the reversed. Just change the count:
select x.*
from
(
select
*,
count(*) over(partition by right(value, 6)) as unique_count
from tbl
) as x
where x.unique_count > 1
Output:
| id | value | unique_count |
| --- | ----------- | ------------ |
| 1 | abcdePuzzle | 2 |
| 3 | abcPuzzle | 2 |

MySQL JOIN Statement from Multiple Tables

I have an old database of entries from an abandoned "Joomgalaxy" Joomla plugin.
There are three tables, joomgalaxy_entries, joomgalaxy_fields, and joomgalaxy_entries_data
The id from the entries table matches the entry_id in the entries_data table, but the actual field name is saved in another table, fields
Can someone please help me with the correct SQL statement to obtain results like you can see below in Ultimate Goal? My MySQL knowledge is very basic, and from my searching it sounds like I need to use a LEFT JOIN, but I have no idea how to use the value from field_name as the column name for returned values
Thank You!!
joomgalaxy_entries
---------------------------------------
| id | title | longitude | latitude |
---------------------------------------
| 50 | John | -79.333333 | 43.669999 |
| 51 | Bob | -79.333333 | 43.669999 |
---------------------------------------
joomgalaxy_fields
This is just two examples below to keep it simple, there are more than just these two, so it would have to be able to handle dynamically using the field_name as the column name.
--------------------------------
| id | field_type | field_name |
--------------------------------
| 1 | textbox | websiteurl |
| 2 | dropdown | occupation |
--------------------------------
joomgalaxy_entries_data
"Technically" there shouldn't be any duplicate entries (fieldid and entry_id), so from my understanding that shouldn't affect using the field_name from above as the column name, but what if there ends up being one?
-------------------------------------
| fieldid | field_value | entry_id |
-------------------------------------
| 1 | google.com | 50 |
| 2 | unemployed | 50 |
| 1 | doctor.com | 51 |
| 2 | doctor | 51 |
-------------------------------------
Ultimate Goal
Ultimately trying to get this type of result, so I can then use that statement in MySQL Workbench to export the data that would look like this:
------------------------------------------------------------------
| id | title | longitude | latitude | websiteurl | occupation |
------------------------------------------------------------------
| 50 | John | -79.333333 | 43.669999 | google.com | unemployed |
| 51 | Bob | -79.333333 | 43.669999 | doctor.com | doctor |
------------------------------------------------------------------
EDIT:
There are more than just the two fields websiteurl and occupation, I was just using those two as examples, there are numerous fields that are all different, so in theory pulling the value from field_name would be used for the column name
You can use some conditional logic, like a CASE statement, along with an aggregate function like max() or min() to return those values as columns:
SELECT je.id,
je.title,
je.longitude,
je.latitude,
max(case when jf.fieldid = 1 then jed.field_value end) as WebsiteUrl,
max(case when jf.fieldid = 2 then jed.field_value end) as Occupation
FROM joomgalaxy_entries je
INNER JOIN joomgalaxy_entries_data jed
on je.id = jed.entry_id
GROUP BY je.id,
je.title,
je.longitude,
je.latitude
Using an INNER JOIN will only return the joomgalaxy_entries rows that have values in each table, if you want to return all joomgalaxy_entries even if there are no matching rows to join on in the other tables, then change the INNER JOIN to a LEFT JOIN.
You can write a simple SELECT query like this:
SELECT je.id, je.title, je.longitude, je.latitude,
(SELECT field_value FROM joomgalaxy_entries_data WHERE fieldid = 1 AND entry_id = je.id) AS websiteurl,
(SELECT field_value FROM joomgalaxy_entries_data WHERE fieldid = 2 AND entry_id = je.id) AS occupation
FROM joomgalaxy_entries je;
First step is easy:
SELECT JE.id, JE.title, JE.longitude, JE.latitude
FROM joomgalaxy_entries JE
Now you need to JOIN:
SELECT JE.id, JE.title, JE.longitude, JE.latitude,
JD.*
FROM joomgalaxy_entries JE
JOIN joomgalaxy_entries_data JD
ON JE.id = JD.entry_id
Now you need convert rows to columns
SELECT JE.id, JE.title, JE.longitude, JE.latitude,
MIN(CASE WHEN fieldid = 1 THEN JD.field_value END) as WebsiteUrl,
MIN(CASE WHEN fieldid = 2 THEN JD.field_value END) as Occupation
FROM joomgalaxy_entries JE
JOIN joomgalaxy_entries_data JD
ON JE.id = JD.entry_id
GROUP BY JE.id, JE.title, JE.longitude, JE.latitude
This depend on you only have two field for each entry, if number of field is dynamic you would need a different aproach.
This should work:
select id, title, longitude, latitude,
(select field_value from joomgalaxy_entries_data jed
where fieldid = (select id from joomgalaxy_fields
where field_name = 'websiteurl')
and jed.entry_id = je.id
) as websiteurl,
(select field_value from joomgalaxy_entries_data jed
where fieldid = (select id from joomlgalaxy_fields
where field_name = 'occupation')
and jed.entry_id = je.id) as occupation
from joomgalaxy_entries je;
Note that the reason to have a left join would be if either websiteurl or occupation were null, however, this solution should work in that case anyway.
Well, that certainly makes it a bit more difficult... :) Honestly, I'm not sure what you're asking is possible with a static sql query. I'm sure someone will speak up, however, if I'm wrong.
That said, I do have a few options you can try:
Option 1 - Generate the SQL Dynamically
Assuming this is mysql, if you execute the following SQL, it will generate the subqueries dynamically:
select concat('(select field_value from joomgalaxy_entries_data jed ',
'where fieldid = (select id from joomgalaxy_fields ',
'where field_name = ''', field_name, ''') ',
'and jed.entry_id = je.id) as ', field_name, ',')
from joomgalaxy_fields;
Take the result of that command, copy-paste it into a text editor and add the following at the beginning:
select id, title, longitude, latitude,
And the rest of this at the end:
from joomgalaxy_entries je;
Then run your new uber-query and go grab a cup of copy, lunch, or a good night's sleep depending on how much data is in your database.
Alternatively, you could add all of this to a stored procedure so you don't have to hand edit the SQL. Also, note that my syntax works for MySQL. Other databases have different concatenation operators so you may have to work around that if applicable. Also, with 50+ subqueries there is a good chance this uber-query will be quite slow, maybe too slow to make this option viable.
Option 2 - Create a table structured the way you want, and populate it
Hopefully, this is self-explanatory, but just create a new table with all of the necessary columns from the joomgalaxy_fields table. Then populate each column separately with a long series of what should be pretty straightforward sql commands. Granted this option is only viable if the database is no longer in use which I believe you indicated. From there the result is just:
select * from my_new_table;

SQL - select rows that have the same value in two columns

The solution to the topic is evading me.
I have a table looking like (beyond other fields that have nothing to do with my question):
NAME,CARDNUMBER,MEMBERTYPE
Now, I want a view that shows rows where the cardnumber AND membertype is identical. Both of these fields are integers. Name is VARCHAR. Name is not unique, and duplicate cardnumber, membertype should show for the same name, as well.
I.e. if the following was the table:
JOHN | 324 | 2
PETER | 642 | 1
MARK | 324 | 2
DIANNA | 753 | 2
SPIDERMAN | 642 | 1
JAMIE FOXX | 235 | 6
I would want:
JOHN | 324 | 2
MARK | 324 | 2
PETER | 642 | 1
SPIDERMAN | 642 | 1
this could just be sorted by cardnumber to make it useful to humans.
What's the most efficient way of doing this?
What's the most efficient way of doing this?
I believe a JOIN will be more efficient than EXISTS
SELECT t1.* FROM myTable t1
JOIN (
SELECT cardnumber, membertype
FROM myTable
GROUP BY cardnumber, membertype
HAVING COUNT(*) > 1
) t2 ON t1.cardnumber = t2.cardnumber AND t1.membertype = t2.membertype
Query plan: http://www.sqlfiddle.com/#!2/0abe3/1
You can use exists for this:
select *
from yourtable y
where exists (
select 1
from yourtable y2
where y.name <> y2.name
and y.cardnumber = y2.cardnumber
and y.membertype = y2.membertype)
SQL Fiddle Demo
Since you mentioned names can be duplicated, and that a duplicate name still means is a different person and should show up in the result set, we need to use a GROUP BY HAVING COUNT(*) > 1 in order to truly detect dupes. Then join this back to the main table to get your full result list.
Also since from your comments, it sounds like you are wrapping this into a view, you'll need to separate out the subquery.
CREATE VIEW DUP_CARDS
AS
SELECT CARDNUMBER, MEMBERTYPE
FROM mytable t2
GROUP BY CARDNUMBER, MEMBERTYPE
HAVING COUNT(*) > 1
CREATE VIEW DUP_ROWS
AS
SELECT t1.*
FROM mytable AS t1
INNER JOIN DUP_CARDS AS DUP
ON (T1.CARDNUMBER = DUP.CARDNUMBER AND T1.MEMBERTYPE = DUP.MEMBERTYPE )
SQL Fiddle Example
If you just need to know the valuepairs of the 3 fields that are not unique then you could simply do:
SELECT concat(NAME, "|", CARDNUMBER, "|", MEMBERTYPE) AS myIdentifier,
COUNT(*) AS count
FROM myTable
GROUP BY myIdentifier
HAVING count > 1
This will give you all the different pairs of NAME, CARDNUMBER and MEMBERTYPE that are used more than once with a count (how many times they are duplicated). This doesnt give you back the entries, you would have to do that in a second step.

Select Multiple Columns From Multiple Tables

I'm a beginner at MySQL and I'm having a hard time trying to figure out how to solve this problem:
I have two tables with many entries each. Let's say these are the tables:
Table 1 || Table 2
------------- || -------------------
| dt1 | dt2 | || | dt3 | dt4 | dt5 |
------------- || -------------------
| 1 | abc | || | 3 | wsx | 123 |
| 7 | asd | || | 3 | qax | 456 |
| 19 | zxc | || | 4 | rfv | 789 |
------------- || -------------------
What I want to do is to have as a result one table with columns "dt2", "dt4" and "dt5" and with only one entry. For that, the query I'll apply to each table may even have to LIMIT the results. To get the results I want from each table separetelly I would do the following:
SELECT `dt2` FROM `table1` WHERE `dt1`=7;
and
SELECT `dt4`,`dt5` FROM `table2` WHERE `dt3`=3 LIMIT 0,1;
One more thing, I don't want to use a subquery for each column, because in the real thing I'm trying to solve, I'm calling 5 or 6 columns from each table.
Just to make clear, what I want to get is something like this:
-------------------
| dt2 | dt4 | dt5 |
-------------------
| asd | qax | 456 |
-------------------
SELECT a.dt2, b.dt4, b.dt5
FROM table1 a, table2 b
WHERE a.dt2 = 'asd'
LIMIT 0,1;
Ben's answer solved my similar issue.
SELECT t1.dt2, t2.dt4, t2.dt5, t2.dt3 #get dt3 data from table2
FROM table1 t1, table2 t2
WHERE t1.dt2 = 'asd' AND t2.dt4 = 'qax' AND t2.dt5 = 456
| asd | qax | 456 | 3 |
'3' being the data I require by querying the 'qax', 456 data in table2, otherwise you're specifying exactly what data will be returned from the columns.
I only had 2 tables to query in my instance, so the AND expression I can get away with using, it probably isn't best practice and there's most likely a better way for matching data from multiple tables.
EDIT: I've just realised this question is 5 years old.. I hope you achieved what you wanted to by now.
SELECT a.dt2, b.dt4, b.dt5
FROM table1 a, table2 b
WHERE a.dt2 = 'asd'
LIMIT 0,1;
Ben's answer is good, you can use more tables just by separating them by comma (,) , but if there's relationship between those tables then you should use some Sub Query or JOIN
In here there is smth called INNER JOIN , CROSS JOIN , LEFT JOIN and RIGHT JOIN in MYSQL and also SQL Server that allows you yo get data from different tables as much you want via conditions based on your columns;
So let's start :
First Let's create our tables (sample1,sample2) :
--Create Table sample1 :
CREATE TABLE sample1 (id BIGINT NOT NULL AUTO_INCREMEN , name_sample1 VARCHAR(100),age INT);
--Create Table sample2 :
CREATE TABLE sample2 (id BIGINT NOT NULL AUTO_INCREMEN , name_sample2 VARCHAR(100));
-- Now Let's put an trigger in order to avoid getting incorrect values :
DELIMITER $$
CREATE TRIGGER IF NOT EXISTS insert_sample1_trigger BEFORE INSERT ON sample1
FOR EACH ROW
BEGIN
IF NEW.name_sample1 <> "" AND NEW.age <> "0" THEN
INSERT INTO sample2 (name_sample2) VALUES (NEW.name_sample1);
ELSE
INSERT INTO sample1 (name_sample1,age) VALUES ("Unknown" , 10);
INSERT INTO sample2 (name_sample2) VALUES ("Unknown");
END IF;
END$$
DELIMITER ;
After U ran this query , trigger will be added;
--- Now Inserting
INSERT INTO sample1(name_sample1,age) VALUES ("SomeOne Name" , 15);
After running this query the name will be added to sample2 table, because id is auto increment it's not needed to be called in the insert query
id
name_sample1
age
1
SomeOne Name
15
But if I give another value ...
INSERT INTO sample1(name_sample1,age) VALUES ("SomeOne Name" , 0);
id
name_sample1
age
1
Unknown
10
-- Now at last Select query what we were waiting for :
SELECT * FROM sample1 s1 INNER JOIN sample2 s2 USING(id) GROUP BY s1.name_sample1
ORDER BY s1.name_sample1 DESC
This query selects all columns from tables sample1 and sample2 if you want to show some other columns change * via your column name.
That's it

Duplicating rows in sql query

I have a table containing several fields. The primary key is userId. Currently the user id column contains values '1,2,3,4...etc' like so:
+------+
|userId|
+------+
| 1 |
| 2 |
| 3 |
| 4 |
...etc
I now want to add new rows ending in a,b,c, like so:
+------+
|userId|
+------+
| 1 |
| 1a |
| 1b |
| 1c |
| 2 |
| 2a |
| 2b |
| 2c |
...etc
The new rows should be identical to their parent row, except for the userId. (i.e. 1a,1b & 1c should match 1)
Also I can't guarantee that there won't already be a few 'a', 'b' or 'c's in userid column.
Is there a way to write an sql query to do this quickly and easily?
DON'T DO IT you will run into more problems than the one you are trying to solve!
add a new column to store the letter and make the primary key cover the original UserId and this new column.
If you ever just want the userId, you need to split the letter portion off, which will be expensive for your query and be a real pain.
I agree with KM. I'm not sure why you're creating these duplicate/composite IDs, but it feels like an uncomfortable direction to take.
That said, there is only really one obsticle to overcome; Apparently you can't select from and insert into the same table in MySQL.
So, you need to insert into a Temporary Table first, then insert into the real table...
CREATE Temporary TABLE MyNewUserIDs (
UserID VARCHAR(32)
)
INSERT INTO
myNewUserIDs
SELECT
CONCAT(myTable.UserID, suffix.val)
FROM
myTable
INNER JOIN
(SELECT 'A' as val UNION ALL SELECT 'B' UNION ALL SELECT 'C' UNION ALL SELECT 'D') AS suffix
ON RIGHT(myTable.UserID, 1) <> Suffix.val
WHERE
NOT EXISTS (SELECT * FROM myTable AS lookup WHERE UserID = CONCAT(myTable.UserID, suffix.val))
INSERT INTO
myTable
SELECT
UserID
FROM
MyNewUserIDs
Depending on your environment, you may want to look into locking the tables, so that changes are not made between creating the list of IDs and inserting them into your table.
This is quite simple from a SQL perspective to generate the extra rows: I'll do that here
#Km's answer tells you how to store it as 2 distinct values which I've assumed here. Feel free to concatenate userid and suffix if you prefer.
INSERT myTable (userid, suffix, co11, col2, ...coln)
SELECT M.userid, X.suffix, M.col1, M.col2, ..., M.coln
FROM
myTable M
CROSS JOIN
(SELECT 'a' AS Suffix UNION ALL SELECT 'b' UNION ALL SELECT 'c') X
WHERE
NOT EXISTS (SELECT *
FROM
MyTable M2
WHERE
M2.userid = M.userid ANS M2.Suffix = X.Suffix)