How to calculate difference between tables in MySQL? - mysql

What is a good way to calculate the difference between two tables in MySQL, in the sense of what should be added to and deleted from one table to get the other?

Neither of the answers posted so far (from BrynJ and Vadim) does a very thorough job. And doing the thorough job is also incredibly hard. Both answers assume that it is sufficient to know which ID numbers are present in each table. However, in general, tables have more than one column.
Let's call the tables A and B.
One important question is "do the two tables have the same schema"? If not, then one issue is which columns need to be added to A and which need to be added to B to make their schemas the same. This is a metadata query, answerable from the system catalog. Which values should be inserted in the columns added to the tables is an interesting question.
Let's assume that the tables actually have the same schema, including the same primary key and the same functional dependencies between columns. Let's also assume that there is an ID column (storing a unique integer), a Name column (a string), and a RefDate column of type DATE.
Table A:
ID  Name       RefDate
1   Frederick  2007-01-23
Table B:
ID  Name       RefDate
1   Josephine  2009-01-10
Now, what needs to be inserted into, deleted from, or updated in each table to make them the same?
I think it is fair to say that there is no single answer to that without knowing a lot more context. It might be that Frederick has undergone gender-change surgery since 2007, and the entry in B represents her new identity. Or it might be a blunder; the database should not store both those records. Or there might be another resolution.
Unfortunately, the queries from BrynJ and Vadim would both indicate that there is no difference between A and B, which is a dubious proposition to me.
Incidentally, note that comparing rows when the rows might have nulls is more complex than when they don't. For example, consider comparing names:
No nulls:
(A.Name = B.Name)
With nulls:
(A.Name = B.Name OR (A.Name IS NULL AND B.Name IS NULL))
One more reason to shun nulls whenever you can.
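To make the point concrete, here is a minimal sketch of a diff that looks at every column rather than just the ID. It runs against SQLite through Python's sqlite3 module purely as a stand-in for MySQL, and the sample rows (IDs 2 to 5) are invented for illustration. It reuses the null-aware comparison shown above inside a NOT EXISTS, because simply negating that comparison would yield NULL, not TRUE, when only one side is NULL. In MySQL itself, the NULL-safe equality operator `<=>` can replace the long form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, name TEXT, refdate TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY, name TEXT, refdate TEXT);
    -- Row 1 is the example from the answer; the other rows are made up.
    INSERT INTO a VALUES (1, 'Frederick', '2007-01-23'),
                         (2, NULL,        '2008-05-01'),
                         (3, 'Carol',     '2008-06-01'),
                         (5, NULL,        '2008-08-01');
    INSERT INTO b VALUES (1, 'Josephine', '2009-01-10'),
                         (2, NULL,        '2008-05-01'),
                         (4, 'Dave',      '2008-07-01'),
                         (5, 'Eve',       '2008-08-01');
""")

def ids(sql):
    """Run a query and return the first column as a list."""
    return [row[0] for row in conn.execute(sql)]

# Keys present in one table only (the ID-based anti-joins).
only_in_a = ids("SELECT a.id FROM a LEFT JOIN b ON a.id = b.id "
                "WHERE b.id IS NULL ORDER BY a.id")
only_in_b = ids("SELECT b.id FROM b LEFT JOIN a ON b.id = a.id "
                "WHERE a.id IS NULL ORDER BY b.id")

# Keys present in both tables whose other columns differ.
# NOT EXISTS is two-valued, so NULLs cannot hide a difference.
changed = ids("""
    SELECT a.id FROM a
    WHERE EXISTS (SELECT 1 FROM b WHERE b.id = a.id)
      AND NOT EXISTS (
            SELECT 1 FROM b
            WHERE b.id = a.id
              AND (a.name = b.name
                   OR (a.name IS NULL AND b.name IS NULL))
              AND (a.refdate = b.refdate
                   OR (a.refdate IS NULL AND b.refdate IS NULL)))
    ORDER BY a.id
""")

print(only_in_a, only_in_b, changed)
```

The ID-only queries see nothing wrong with rows 1 and 5; only the column-by-column check flags them. Whether a flagged row should become an UPDATE, or a DELETE plus INSERT, still depends on the context discussed above.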

You can also use a left outer join (the first tells you where a row exists in table a and not b, the second vice-versa):
SELECT a.id FROM a LEFT JOIN b ON a.id = b.id WHERE b.id IS NULL
SELECT b.id FROM b LEFT JOIN a ON b.id = a.id WHERE a.id IS NULL

SELECT DISTINCT id FROM a WHERE NOT EXISTS (SELECT * FROM b WHERE a.id = b.id);
SELECT DISTINCT id FROM b WHERE NOT EXISTS (SELECT * FROM a WHERE a.id = b.id);

Related

MYSQL - SubSelect when FK does and doesnt exists

Situation Overview
This question is about selecting values from two tables: table A (Material) and table B (MaterialRevision). The PK of table A might or might not exist in table B. When it doesn't exist, the query described in this question won't return the values from table A, but it should. So basically, here's what's happening:
The query only returns values when A.id exists in B.id, when in fact I need it to also return values from A when A.id doesn't exist in B.id.
Problem:
Suppose two tables. Table Material and Table Material Revision.
Notice that the PK idMaterial is a FK in MaterialRevision.
Current "Mock" Tables
Query Objective
Obs: remember these two tables are a simplification of the real
tables.
For each Material, print the material variables and the last (MAX) RevisionDate from MaterialRevision. In case there's no RevisionDate, print BLANK ("") for the "last revision date".
What is wrongly happening
For each Material, the query prints the material variables and the last (MAX) RevisionDate from MaterialRevision. In case there's no Revision for the Material, it doesn't print the Material (SKIP).
Current Code
SELECT
Material.idMaterial,
Material.nextRevisionDate,
Material.obsolete,
lastRevisionDate
FROM Material,
(SELECT MaterialRevision.idMaterial, max(MaterialRevision.revisionDate) as "revisionDate" from MaterialRevision
GROUP BY MaterialRevision.idMaterial
) AS Revision
WHERE (Material.idMaterial = Revision.idMaterial AND Material.obsolete = 0)
References and Links used to reach the state described in this question
Why is MAX() 100 times slower than ORDER BY ... LIMIT 1?
MySQL get last date records from multiple
MySQL - How to SELECT based on value of another SELECT
MySQL Query Select where id does not exist in the JOIN table
PS I hope this question is correctly understood since it took me a lot of time to build it. I researched a lot in stackoverflow and after
several failed attempts I had no option but to ask for help.
You should use JOIN :
SELECT m.idMaterial, m.nextRevisionDate, mr.revisionDate AS "lastRevisionDate"
FROM Material m
LEFT JOIN MaterialRevision AS mr ON mr.idMaterial = m.idMaterial AND mr.revisionDate = (
SELECT MAX(ch.revisionDate) from MaterialRevision ch
WHERE mr.idMaterial = ch.idMaterial)
WHERE m.obsolete = 0
Here is an explanation of what INNER JOIN, LEFT JOIN and RIGHT JOIN are. (You will love them if you often cross tables in your queries)
Since the WHERE clause restricts the results to m.obsolete = 0, that column always has the same value, so I omitted it from the SELECT clause.
You should use the left outer join instead of using the cross product.
Your query should be something like this:
SELECT m.idMaterial, m.nextRevisionDate, m.obsolete,
mr.revisionDate AS lastRevisionDate
FROM Material AS m
LEFT OUTER JOIN MaterialRevision AS mr ON
m.idMaterial = mr.idMaterial
AND mr.revisionDate = (SELECT MAX(ch.revisionDate) FROM MaterialRevision ch
WHERE mr.idMaterial = ch.idMaterial)
WHERE m.obsolete = 0;
Here you can find some documentation about types of join.
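As a variation on the answers above, here is a small runnable sketch that left-joins Material to a grouped derived table and uses COALESCE to print a blank when there is no revision, which is what the question asked for. It uses SQLite through Python's sqlite3 module as a stand-in for MySQL; the idRevision column and all sample rows are assumptions made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Material (
        idMaterial INTEGER PRIMARY KEY,
        nextRevisionDate TEXT,
        obsolete INTEGER
    );
    -- idRevision is a hypothetical surrogate key, not from the question.
    CREATE TABLE MaterialRevision (
        idRevision INTEGER PRIMARY KEY,
        idMaterial INTEGER REFERENCES Material(idMaterial),
        revisionDate TEXT
    );
    INSERT INTO Material VALUES (1, '2016-01-01', 0),
                                (2, '2016-02-01', 0),
                                (3, '2016-03-01', 1);
    INSERT INTO MaterialRevision (idMaterial, revisionDate)
    VALUES (1, '2015-01-10'), (1, '2015-06-20'), (3, '2015-03-03');
""")

rows = conn.execute("""
    SELECT m.idMaterial,
           m.nextRevisionDate,
           COALESCE(r.lastRevisionDate, '') AS lastRevisionDate
    FROM Material AS m
    LEFT JOIN (
        -- one row per material with its latest revision date
        SELECT idMaterial, MAX(revisionDate) AS lastRevisionDate
        FROM MaterialRevision
        GROUP BY idMaterial
    ) AS r ON r.idMaterial = m.idMaterial
    WHERE m.obsolete = 0
    ORDER BY m.idMaterial
""").fetchall()

for row in rows:
    print(row)
```

Material 2 has no revisions but still appears, with an empty string in the last column. The filter on obsolete stays in WHERE because it refers to the left-hand table; a condition on the right-hand table would belong in the ON clause, otherwise the unmatched rows would be filtered out again.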

Not getting the Join I want

We are doing some pro bono work for a good cause and I'm having a hell of a time with a query. The coding has been done by many volunteers over the years, which has had the inevitable outcome.
I have two tables, A and B. What I need is a sum of score_hours on a join between the two, where each row of A is counted only once.
Please keep in mind that both tables are quite big (10 to 50k+ each depending on time in the month).
Table A:
id (pk, ai)
uid (int)
scores_date (timestamp (but for some reason only the actual date, not
the time))
score_hours (decimal 3,1)
Table B:
id (pk, ai)
uid (int)
shift_date (timestamp)
There are many records in table B that have the uid we are looking for on several dates (the dates are not unique). Table A has multiple records for uid but on different days. So it could have 1 uid a day, but not 2 instances of 1 uid a day.
There are obviously more selectors for both tables, but they don't match in any way between the tables (although I do need to query them with simple "AND") so this is what I have to work with. I do need to join them because of the rest of the query, but so far I'm not getting the records I need within a decent time.
My attempts were:
This almost made it. But the execution time was disgusting and failed with some simple selectors.
SELECT SUM(score_hours)
FROM A
WHERE
A.uid IN
(SELECT B.uid
FROM B
WHERE B.uid = "1")
This gives the right output but it joins one for every instance of a uid. Normally you can solve that by grouping, but the sum will still count all. So that is not an option:
SELECT SUM(score_hours)
FROM A
LEFT JOIN B ON A.uid = B.uid
WHERE A.uid = "1"
*edit: Not only do I need to JOIN on uid, but there has to be something like this in it:
DISTINCT(date(m.shift_datum)) = DATE(d.dagscores_date)
It is actually a very basic query, except for the fact that a SUM is needed on a record which is not unique in regards to the Left join and that I need to JOIN on two tables at the same time.
If you need more data please tell me so. I can provide all.
You need to remove the duplicates from the table you're joining with, otherwise the cross-product creates multiple rows that get added into the sum.
SELECT SUM(score_hours)
FROM A
JOIN (SELECT DISTINCT uid
FROM B) AS B
ON A.uid = B.uid
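Here is a small sketch of why the plain join inflates the sum and how de-duplicating the joined side fixes it. It runs on SQLite through Python's sqlite3 module as a stand-in for MySQL, with invented sample rows. The third query is only a guess at the date condition mentioned in the question's edit: it de-duplicates B on (uid, day) and matches the day as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, uid INTEGER,
                    scores_date TEXT, score_hours REAL);
    CREATE TABLE b (id INTEGER PRIMARY KEY, uid INTEGER, shift_date TEXT);
    INSERT INTO a (uid, scores_date, score_hours) VALUES
        (1, '2020-01-01', 4.0),
        (1, '2020-01-02', 3.5),
        (2, '2020-01-01', 8.0);
    INSERT INTO b (uid, shift_date) VALUES
        (1, '2020-01-01 08:00:00'),
        (1, '2020-01-01 14:00:00'),
        (1, '2020-01-03 09:00:00'),
        (2, '2020-01-01 08:00:00');
""")

def scalar(sql):
    """Return the single value produced by a one-row, one-column query."""
    return conn.execute(sql).fetchone()[0]

# Plain join: every row of a is repeated once per matching row of b.
naive = scalar("""
    SELECT SUM(score_hours) FROM a
    LEFT JOIN b ON a.uid = b.uid
    WHERE a.uid = 1
""")

# De-duplicated join: each row of a is counted once.
deduped = scalar("""
    SELECT SUM(score_hours) FROM a
    JOIN (SELECT DISTINCT uid FROM b) AS b ON a.uid = b.uid
    WHERE a.uid = 1
""")

# Assumed variant: also require a shift on the same calendar day.
same_day = scalar("""
    SELECT SUM(score_hours) FROM a
    JOIN (SELECT DISTINCT uid, DATE(shift_date) AS shift_day FROM b) AS b
      ON a.uid = b.uid AND DATE(a.scores_date) = b.shift_day
    WHERE a.uid = 1
""")

print(naive, deduped, same_day)
```

With three shifts for uid 1, the plain join counts each score three times (22.5 instead of 7.5). The same-day variant only keeps the score from 2020-01-01, since that is the only day with both a score and a shift.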

LEFT JOIN (or similar) on a single-column table to get whether each row is in other table or not

Here's the situation. I have two sets of data. One is a list of all the "ticket" entries that my system uses – at least one per ID, but potentially more. I also have a separate list of just the IDs that have known hardware problems, which is a relatively small (but important) subset of the IDs. I've put this list into a super-simple table B, which is literally a single column with just those IDs.
I need a MySQL query that joins these two tables, so I get all of the entries from table A, each of which has another field added on that is a simple boolean: whether or not the same ID exists in table B.
So something like this:
SELECT * FROM `table_A` A
LEFT JOIN `table_B` B ON A.id=B.id
If B were a two-column table, and the second column (call it down) was simply true in every row, then I could check if down were true or null.
But since B has only a single column, no data is actually added to the result.
Is there any simple way (without having this otherwise completely unnecessary column in my database) to do this "join" operation, that will simply note whether or not the ID of a given entry in A also exists in B?
(adding another field to A that is up or down is also rather inefficient, since there are often many rows for a single ID and most IDs aren't going to be down anyway)
select
A.*,
case when B.ID is null then 0 else 1 end as myBoolean
from `table_A` A
left join `table_B` B on A.ID = B.ID
With the example below, `down` will not be null if the id exists in table_B:
SELECT * , B.id as `down` FROM `table_A` A
LEFT JOIN `table_B` B ON A.id=B.id
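Both answers can be tried out with the sketch below, which also shows an EXISTS variant that produces the same flag without a join. It uses SQLite through Python's sqlite3 module as a stand-in for MySQL; the `ticket` column and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- table_A holds possibly many tickets per id; table_B holds only ids.
    CREATE TABLE table_A (id INTEGER, ticket TEXT);
    CREATE TABLE table_B (id INTEGER PRIMARY KEY);
    INSERT INTO table_A VALUES (1, 't1'), (1, 't2'), (2, 't3'), (3, 't4');
    INSERT INTO table_B VALUES (2);
""")

# LEFT JOIN plus CASE: B.id is NULL exactly when there was no match.
via_join = conn.execute("""
    SELECT A.id, A.ticket,
           CASE WHEN B.id IS NULL THEN 0 ELSE 1 END AS down
    FROM table_A A
    LEFT JOIN table_B B ON A.id = B.id
    ORDER BY A.ticket
""").fetchall()

# EXISTS: evaluates to 1 or 0 per row of table_A, no join needed.
via_exists = conn.execute("""
    SELECT A.id, A.ticket,
           EXISTS (SELECT 1 FROM table_B B WHERE B.id = A.id) AS down
    FROM table_A A
    ORDER BY A.ticket
""").fetchall()

print(via_join)
```

The two forms agree here. One difference worth knowing: if table_B ever contained the same id twice, the LEFT JOIN would duplicate the matching rows of table_A, while the EXISTS form would not.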

Explain SQL and Query optimization

The Explain SQL output (in phpMyAdmin) for a query that takes more than 5 seconds is giving me the above. I read that we can study the Explain SQL output to optimize a query. Can anyone tell whether this Explain SQL output is telling us anything useful?
Thanks guys.
Edit:
The query itself:
SELECT
a.`depart` , a.user,
m.civ, m.prenom, m.nom,
CAST( GROUP_CONCAT( DISTINCT concat( c.id, '~', c.prenom, ' ', c.nom ) ) AS char ) AS coordinateur,
z.dr
FROM `0_activite` AS a
JOIN `0_member` AS m ON a.user = m.id
LEFT JOIN `0_depart` AS d ON ( m.depart = d.depart AND d.rank = 'mod' AND d.user_sec =2 )
LEFT JOIN `0_member` AS c ON d.user_id = c.id
LEFT JOIN `zone_base` AS z ON m.depart = z.deprt_num
GROUP BY a.user
Edit 2:
Structures of the two tables a and d. Top: a and bottom: d
Edit 3:
What I want in this query?
I first want to get the value of 'depart' and 'user' (which is an id) from the table 0_activite. Next, I want to get name of the person (civ, prenom and name) from 0_member whose id I am getting from 0_activite via 'user', by matching 0_activite.user with 0_member.id. Here depart is short of department which is also an id.
So at this point, I have depart, id, civ, nom and prenom of a person from two tables, 0_activite and 0_member.
Next, I want to know which dr is related with this depart, and this I get from zone_base. The value of depart is same in both 0_activite and 0_member.
Then comes the trickier part. A person from 0_member can be associated with multiple departs, and this is stored in 0_depart. Also, every user has a level, one of which is 'mod', which stands for moderator. Now I want to get all the people who are moderators in the depart that the first user belongs to, and then get those moderators' names from 0_member again. I also have a variable user_sec, but this is probably less important in this context, though I cannot overlook it.
This is what makes the query a tricky one. 0_member is storing id, name of users, + one depart, 0_depart is storing all departs of users, one line for each depart, and 0_activite is storing some other stuffs and I want to relate those through userid of 0_activite and the rest.
Hope I have been clear. If I am not, please let me know and I will try again to edit this post.
Many many thanks again.
Aside from the few answers provided by the others here, it might help to better understand the "what do I want" from the query. As you've accepted a rather recent answer from me in another of your questions, you have filters applied by department information.
Your query is doing a LEFT join at the Department table by rank = 'mod' and user_sec = 2. Is your overall intent to show ALL records in the 0_activite table REGARDLESS of a valid join to the 0_Depart table... and if there IS a match to the 0_Depart table, you only care about the 'mod' and 2 values?
If you only care about those people specifically associated with the 0_depart with 'mod' and 2 conditions, I would reverse the query starting with THIS table first, then join to the rest.
Having keys on tables via relationship or criteria is always a performance benefit (vs not having the indexes).
Start your query with whatever would be your smallest set FIRST, then join to other tables.
From clarification in your question... I would start with the inner-most... Who it is and what departments are they associated with... THEN get the moderators (from department where condition)... Then get actual moderator's name info... and finally out to your zone_base for the dr based on the department of the MODERATOR...
select STRAIGHT_JOIN
    DeptPerMember.*,
    Moderator.Civ as ModCiv,
    Moderator.Prenom as ModPrenom,
    Moderator.Nom as ModNom,
    z.dr
from
    ( select
        m.ID,
        m.Depart,
        m.Civ,
        m.Prenom,
        m.Nom
      from
        `0_activite` as a
        join `0_member` m
            on a.User = m.ID
        join `0_depart` as d
            on m.depart = d.depart ) DeptPerMember
    join `0_depart` as DeptForMod
        on DeptPerMember.Depart = DeptForMod.Depart
        and DeptForMod.rank = 'mod'
        and DeptForMod.user_sec = 2
    join `0_member` as Moderator
        on DeptForMod.user_id = Moderator.ID
    join zone_base z
        on Moderator.depart = z.deprt_num
Notice how I tier'd the query to get each part and joined to the next and next and next. I'm building the chain based on the results of the previous with clear "alias" references for clarification of content. Now, you can get whatever respective elements from any of the levels via their distinct "alias" references...
The output from EXPLAIN is showing us that the first and third tables listed (a & d) are not having any indexes utilised by the database engine in executing this query. The key column is NULL for both - which is a shame since both are 'large' tables (OK, they're not really large, but compared to the rest of the tables they're the big 'uns).
Judging from the query, an index on user on 0_activite and an index on (depart, rank, user_sec) on 0_depart would go some way to improving performance.
You can see that the key and key_len columns are NULL; this means the query is not using any of the keys listed in the possible_keys column. So tables a and d are both scanning all rows. (Check the larger numbers in the rows column; you want these to be smaller.)
To deal with 0_depart:
Make sure you have a key on (d.depart, d.rank,d.user_sec) which are part of the join of 0_depart.
To deal with 0_activite:
I'm not positive but a GROUP column should be indexed too so you need a key on a.user
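The effect of the suggested composite index can be illustrated with a rough sketch. It uses SQLite through Python's sqlite3 module, so the plan output comes from SQLite's EXPLAIN QUERY PLAN and looks different from MySQL's EXPLAIN table; only the idea (full scan before, index lookup after) carries over. The table name `depart_members`, the index name, and the literal department number are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the 0_depart table from the question.
conn.execute(
    'CREATE TABLE depart_members '
    '(user_id INTEGER, depart INTEGER, "rank" TEXT, user_sec INTEGER)'
)

# The three columns used in the join condition of the original query.
query = (
    'SELECT user_id FROM depart_members '
    'WHERE depart = 75 AND "rank" = \'mod\' AND user_sec = 2'
)

def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

before = plan(query)  # no usable index yet, so a full table scan

conn.execute(
    'CREATE INDEX idx_depart_rank_sec '
    'ON depart_members (depart, "rank", user_sec)'
)

after = plan(query)   # the composite index can now satisfy all three tests

print(before)
print(after)
```

In MySQL the equivalent check would be to add the index with CREATE INDEX (or ALTER TABLE ... ADD INDEX) and then re-run EXPLAIN, looking for a non-NULL key column and a smaller rows estimate for that table.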

Selecting multiple columns/fields in MySQL subquery

Basically, there is an attribute table and translation table - many translations for one attribute.
I need to select id and value from translation for each attribute in a specified language, even if there is no translation record in that language. Either I am missing some join technique or join (without involving language table) is not working here since the following do not return attributes with non-existing translations in the specified language.
select a.attribute, at.id, at.translation
from attribute a left join attributeTranslation at on a.id=at.attribute
where at.language=1;
So I am using subqueries like this, problem here is making two subqueries to the same table with the same parameters (feels like performance drain unless MySQL groups those, which I doubt since it makes you do many similar subqueries)
select attribute,
(select id from attributeTranslation where attribute=a.id and language=1),
(select translation from attributeTranslation where attribute=a.id and language=1)
from attribute a;
I would like to be able to get id and translation from one query, so I concat columns and get the id from string later, which is at least making single subquery but still not looking right.
select attribute,
(select concat(id,';',title)
from offerAttribute_language
where offerAttribute=a.id and _language=1
)
from offerAttribute a
So the question part.
Is there a way to get multiple columns from a single subquery or should I use two subqueries (MySQL is smart enough to group them?) or is joining the following way to go:
[[attribute to language] to translation] (joining 3 tables seems like a worse performance than subquery).
Yes, you can do this. The knack you need is the concept that there are two ways of getting tables out of the table server. One way is ..
FROM TABLE A
The other way is
FROM (SELECT col as name1, col2 as name2 FROM ...) B
Notice that the select clause and the parentheses around it are a table, a virtual table.
So, using your second code example (I am guessing at the columns you are hoping to retrieve here):
SELECT a.attr, b.id, b.trans, b.lang
FROM attribute a
JOIN (
SELECT at.id AS id, at.translation AS trans, at.language AS lang, at.attribute
FROM attributeTranslation at
) b ON (a.id = b.attribute AND b.lang = 1)
Notice that your real table attribute is the first table in this join, and that this virtual table I've called b is the second table.
This technique comes in especially handy when the virtual table is a summary table of some kind. e.g.
SELECT a.attr, b.id, b.trans, b.lang, c.langcount
FROM attribute a
JOIN (
SELECT at.id AS id, at.translation AS trans, at.language AS lang, at.attribute
FROM attributeTranslation at
) b ON (a.id = b.attribute AND b.lang = 1)
JOIN (
SELECT count(*) AS langcount, at.attribute
FROM attributeTranslation at
GROUP BY at.attribute
) c ON (a.id = c.attribute)
See how that goes? You've generated a virtual table c containing two columns, joined it to the other two, used one of the columns for the ON clause, and returned the other as a column in your result set.
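One caveat for the original requirement (keep attributes that have no translation in the chosen language): an inner JOIN, or a LEFT JOIN with the language test in WHERE, drops those attributes. A sketch of the usual workaround, moving the language test into the ON clause of a LEFT JOIN, is below. It runs on SQLite through Python's sqlite3 module as a stand-in for MySQL, and the table contents are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE attribute (id INTEGER PRIMARY KEY, attribute TEXT);
    CREATE TABLE attributeTranslation (
        id INTEGER PRIMARY KEY,
        attribute INTEGER REFERENCES attribute(id),
        language INTEGER,
        translation TEXT
    );
    INSERT INTO attribute VALUES (1, 'color'), (2, 'size');
    -- 'size' has no translation in language 1.
    INSERT INTO attributeTranslation VALUES
        (10, 1, 1, 'colour'),
        (11, 1, 2, 'Farbe'),
        (12, 2, 2, 'Groesse');
""")

# Language test in WHERE: unmatched attributes have t.language = NULL,
# so the filter removes them again.
filter_in_where = conn.execute("""
    SELECT a.attribute, t.id, t.translation
    FROM attribute a
    LEFT JOIN attributeTranslation t ON a.id = t.attribute
    WHERE t.language = 1
    ORDER BY a.id
""").fetchall()

# Language test in ON: it only decides which translations match,
# so every attribute survives, with NULLs when nothing matches.
filter_in_on = conn.execute("""
    SELECT a.attribute, t.id, t.translation
    FROM attribute a
    LEFT JOIN attributeTranslation t
      ON a.id = t.attribute AND t.language = 1
    ORDER BY a.id
""").fetchall()

print(filter_in_where)
print(filter_in_on)
```

This returns both id and translation from a single join, without the two subqueries or the concat trick from the question.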