MySQL LEFT Join displaying incorrect data - mysql

I have 5 tables that all share the same structure. Only the PAGEVISITS values differ between them
ie. table 1:
ITEM | PAGEVISITS | Commodity
1813 50 Griddle
1851 10 Griddle
11875 100 Refrigerator
2255 25 Refrigerator
ie. table 2:
ITEM | PAGEVISITS | Commodity
1813 0 Griddle
1851 10 Griddle
11875 25 Refrigerator
2255 10 Refrigerator
I want to sum PAGEVISITS per Commodity, one column per table, to produce:
table1 | table2 | Commodity
60 10 Griddle
125 35 Refrigerator
Some of the data is actually correct but some are WAY off given the below query:
SELECT
SUM(MT.PAGEVISITS) as table1,
SUM(CT1.PAGEVISITS) as table2,
SUM(CT2.PAGEVISITS) as table3,
SUM(CT3.PAGEVISITS) as table4,
SUM(CT4.PAGEVISITS) as table5,
(COUNT(DISTINCT MT.ITEM)) + (COUNT(DISTINCT CT1.ITEM)) + (COUNT(DISTINCT CT2.ITEM)) + (COUNT(DISTINCT CT3.ITEM)) + (COUNT(DISTINCT CT4.ITEM)) as Total,
MT.Commodity
FROM table1 as MT
LEFT JOIN table2 CT1
on MT.ITEM = CT1.ITEM
LEFT JOIN table3 CT2
on MT.ITEM = CT2.ITEM
LEFT JOIN table4 CT3
on MT.ITEM = CT3.ITEM
LEFT JOIN table5 CT4
on MT.ITEM = CT4.ITEM
GROUP BY Commodity
I believe this may be caused by using the LEFT JOIN incorrectly. I have also tried INNER JOIN, with the same inconsistent results.

I would do a UNION on all five of those tables to get them as one rowset (inline view), and then run a query on that, start with something like this...
SELECT SUM(IF(t.source='MT',t.pagevisits,0)) AS table1
, SUM(IF(t.source='CT1',t.pagevisits,0)) AS table2
, t.commodity
FROM ( SELECT 'MT' as source, table1.* FROM table1
UNION ALL
SELECT 'CT1', table2.* FROM table2
UNION ALL
SELECT 'CT2', table3.* FROM table3
UNION ALL
SELECT 'CT3', table4.* FROM table4
UNION ALL
SELECT 'CT4', table5.* FROM table5
) t
GROUP BY t.commodity
(But I would specify the column list for each of those tables, rather than using the '.*' and having my query dependent on no one adding/dropping/renaming/reordering columns in any of those tables.)
I include an "extra" literal value (aliased as "source") to identify which table the row came from. I can use a conditional test in an expression in the SELECT list, to figure out whether the row came from a particular table.
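As a rough, runnable sketch of the idea, here is the same query run against the two sample tables from the question. It uses Python's sqlite3 module as a stand-in for MySQL (an assumption made only so the example is self-contained), and CASE WHEN in place of IF(), which SQLite lacks but MySQL also accepts. The column list is spelled out instead of using '.*'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (item INTEGER, pagevisits INTEGER, commodity TEXT);
CREATE TABLE table2 (item INTEGER, pagevisits INTEGER, commodity TEXT);
INSERT INTO table1 VALUES (1813, 50, 'Griddle'), (1851, 10, 'Griddle'),
                          (11875, 100, 'Refrigerator'), (2255, 25, 'Refrigerator');
INSERT INTO table2 VALUES (1813, 0, 'Griddle'), (1851, 10, 'Griddle'),
                          (11875, 25, 'Refrigerator'), (2255, 10, 'Refrigerator');
""")

# The literal 'source' column tags each row with the table it came from,
# so one pass over the concatenated rows can produce one sum per table.
UNION_QUERY = """
SELECT SUM(CASE WHEN t.source = 'MT'  THEN t.pagevisits ELSE 0 END) AS table1,
       SUM(CASE WHEN t.source = 'CT1' THEN t.pagevisits ELSE 0 END) AS table2,
       t.commodity
FROM ( SELECT 'MT' AS source, item, pagevisits, commodity FROM table1
       UNION ALL
       SELECT 'CT1', item, pagevisits, commodity FROM table2
     ) t
GROUP BY t.commodity
ORDER BY t.commodity
"""
totals = conn.execute(UNION_QUERY).fetchall()
print(totals)
```

On the sample data this yields 60 and 10 for Griddle and 125 and 35 for Refrigerator, which is the output the question asks for.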
This approach is flexible and extends to more complicated result sets. For example, to also get the total number of page visits from table3, table4 and table5 added together, alongside the individual sums, add an expression like this to the SELECT list:
SUM(IF(t.source IN ('CT2','CT3','CT4'),t.pagevisits,0)) AS total_345
To get the equivalent of your COUNT(DISTINCT item) + COUNT(DISTINCT item) + ... expression...
I would use an expression that makes a single value from both the "source" and "item" columns, being careful to have some sort of guarantee that any particular "source"+"item" will not create a duplicate of some other "source"+"item". (If we just concatenate strings, for example, we don't have any way to distinguish between 'A'+'11' and 'A1'+'1'.) The most common approach I see here is a carefully chosen delimiter which is guaranteed not to appear in either value. We can distinguish between 'A::11' and 'A1::1', so something like this will work:
COUNT(DISTINCT CONCAT(t.source,'::',t.item))
In your current query, if item is NULL, then the row doesn't get included in the COUNT. To fully replicate that behavior, you would need something like this:
COUNT(DISTINCT IF(t.item IS NOT NULL,CONCAT(t.source,'::',t.item),NULL)) AS Total
Of course, getting a count of distinct item values over the whole set of five tables is much simpler (but then, it does return a different result):
COUNT(DISTINCT t.item)
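The delimiter point is easy to check with two rows. This is a minimal sketch using sqlite3 as a stand-in for MySQL; the table t and its two rows are made up for illustration, and || is SQLite's concatenation operator where MySQL would use CONCAT().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (source TEXT, item TEXT);
INSERT INTO t VALUES ('A', '11'), ('A1', '1');
""")

# Without a delimiter both rows collapse to 'A11'; with '::' they stay distinct.
plain, delimited = conn.execute("""
SELECT COUNT(DISTINCT source || item),
       COUNT(DISTINCT source || '::' || item)
FROM t
""").fetchone()
print(plain, delimited)
```

The undelimited count sees one distinct value, while the delimited count sees two.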
But to answer your question about the use of the LEFT JOIN, the left side table is the "driver" so a matching row has to be in that table for a corresponding row to be retrieved from a table on the right. That is, unmatched rows from the tables on the right side will not be returned.
If what you have is basically five "partitions", and you want to process all of the rows whether or not a matching row appears in any of the other "partitions", I would go with the UNION ALL approach to simply concatenate all of the rows from all of those tables together, and process the rows as if they were from a single table.
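To make the "driver" behaviour concrete, here is a small sketch (sqlite3 standing in for MySQL) with hypothetical data in which item 1851 exists only in table2. The LEFT JOIN never sees that row, so its sum comes up short, while the UNION ALL form counts it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (item INTEGER, pagevisits INTEGER, commodity TEXT);
CREATE TABLE table2 (item INTEGER, pagevisits INTEGER, commodity TEXT);
INSERT INTO table1 VALUES (1813, 50, 'Griddle');
-- Hypothetical: item 1851 has no row in table1.
INSERT INTO table2 VALUES (1813, 5, 'Griddle'), (1851, 10, 'Griddle');
""")

# table1 drives the join, so the table2 row for item 1851 is never retrieved.
left_join_sum = conn.execute("""
SELECT SUM(ct1.pagevisits)
FROM table1 mt LEFT JOIN table2 ct1 ON mt.item = ct1.item
""").fetchone()[0]

# Concatenating the rows keeps every table2 row, matched or not.
union_sum = conn.execute("""
SELECT SUM(CASE WHEN source = 'CT1' THEN pagevisits ELSE 0 END)
FROM ( SELECT 'MT' AS source, pagevisits FROM table1
       UNION ALL
       SELECT 'CT1', pagevisits FROM table2 ) t
""").fetchone()[0]

print(left_join_sum, union_sum)
```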
NOTE: For very large tables, this may not be a feasible approach, since MySQL is going to have to materialize that inline view. There are other approaches which don't require concatenating all of the rows together.
Specifying a list of only the columns you need, in the SELECT from each table, may help performance, if there are columns in those tables you don't need to reference in your query.

Related

MySQL INNER JOIN - how to add extra IF statement?

I just learned how to use JOINS (so please be gentle ;) ), and wrote this query:
SELECT 1
FROM `T1`
INNER JOIN `T2` ON `T1`.`T1_ID` = `T2`.`REQUIREMENT`
WHERE `T2`.`T1_ID` = XXX
AND `T1`.`STATEMENT` = YYY
AND `T2`.`REQUIREMENT` != 0 -- last row does not work as intended!
It works perfectly without the last condition: if T1_ID from T1 does match REQUIREMENT from T2 for given T1_ID (XXX) - it does work. I wanted additional STATEMENT from T1 to be match (YYY) - and it still does work.
But then I realized that I need to handle one special case: when T2.REQUIREMENT is equal to 0, I want this query to return 1 regardless of the JOIN condition. The problem is that if T2.REQUIREMENT = 0, I know for sure that there will not be any T1.T1_ID entry that matches the JOIN condition. So I understand that this last condition cannot work the way I wish it did.
What I need is some kind of IF statement. Something that would work like:
SELECT 1
IF (`T2`.`REQUIREMENT`!=0) //if true, don't even go to join, and return 1
OR (my previous join query)
The thing is, that I have no idea how to implement such IF statement into mysql.
Any ideas? Thanks.
Sample data:
T1:
id STATEMENT T1_ID
1 irrelevant 1
2 irrelevant 5
T2:
id T1_ID REQUIREMENT
1 1 0
2 2 0
3 3 1
4 4 3
5 5 4
6 6 5
7 7 6
Such setup should return 1 for T1_ID equal 1, 2, 3, 6.
In addition, if it's even possible in single query, I'd like it to return 1 as well even if T1 was empty, for all T2.REQUIREMENT=0 - in this case T1_ID equal 1, 2.
Just FYI, good start on your post and example... tableName.columnName references (or alias.columnName) should always be provided to prevent ambiguity that others don't know your table structure. Also, you only really need the tick marks for things like reserved words or column names that have spaces (never like these anyhow).
From my reading your question and sample tables T1 and T2... T1 appears to be some Lookup table and has IDs and descriptions associated to said IDs. Your T2 table appears to be your detail/transaction based table and it may or not have an actual requirement hence your desire to always include those records without a specific requirement.
If this is the case, it sounds like you want "all detail records that have some condition REGARDLESS of a matched requirement ID as found in the lookup table." If this is accurate, you would be looking for
Select
T2.T1_ID,
coalesce( T1.Statement, '' ) StatementFromT1Table
from
SomeMainTable T2
LEFT JOIN SomeLookupTable T1
on T2.T1_ID = T1.T1_ID
where
T2.T1_ID = SomeIdParameterValue
AND ( T1.T1_ID is NULL
OR T1.Statement = SomeOtherParameterValue )
The join between tables I always try to list the left-side table first, then indent to the right-side table and have my ON condition show the left.column = right.column so you always see the relationship and how table A gets to table B (and nested more as other joins come into play).
The difference between (INNER) JOIN and LEFT JOIN is that (INNER) JOIN REQUIRES a record to match in both tables. LEFT JOIN means I want everything from the table on the left side REGARDLESS of an actual match in the right table.
So at this point, I get all rows from the T2 alias first, regardless of whether T1 has a match. Now, how to deal with the zero requirement value: if zero indicates you KNOW there won't be a match in T1, then you can just say I want all records on the T2 side where the T1 side IS NULL... But you also care about a specific statement, hence the OR within the parenthesized test.
The first part of the where is if you were specifically looking for all things of a T1_ID = some value, so that is a primary parameter and only applicable to the T2 side... you are qualifying the T1 side via the AND ( null or other equality test ) condition.
If you have confidential data, it is OK to randomize it or provide sample data, but real table names and context help us understand what you are trying to accomplish. Ambiguous table and column names make the problem harder to follow, and a clearer picture might lead to a better query.
If I am close and you need additional clarification, please advise and/or edit your original post with additional sample data and final output... such as parameters being filtered for too.
POST CLARIFICATION.
Per your comments, here are the clarifications...
The "SomeMainTable T2" is actually a breakdown of the actual table name within your database and "T2" is the ALIAS reference. Imagine your table name is "SomethingReallyLongDetail". Would you prefer to write your query something like
select
SomethingReallyLongDetail.Column1,
SomethingReallyLongDetail.Column7,
SomethingReallyLongDetail.Column20
from
SomethingReallyLongDetail
OR...
select
SRL.Column1,
SRL.Column7,
SRL.Column20
from
SomethingReallyLongDetail SRL
In this case, I used an alias "SRL" to more easily correlate to the table name as an acronym / abbreviation vs having to type the long value over and over, then have more chance of typing mistakes. Simply for readability providing the "alias" reference within the query. So, I did not know your ACTUAL table name, so I made it up but using the "T1" and "T2" references to stay in-line with your original post.
Next, COALESCE(). Since this query does a LEFT-JOIN, The right-side table may (or not) actually have a record match on the ID as you know might not always exist. Since I was trying to pull the "Statement" column from that second table (alias T1), that description could be NULL which you probably would not want to show in any sort of output. To prevent that, COALESCE() says, give me the value from the first parameter in the list... If that value is null, give me the second value. In this case the second value is just an empty string.
Parameters in the query. Your original query had reference to XXX and YYY such as you knew of a specific T1.ID value you wanted to narrow down to pulling out, but a different value YYY as being part of the statement. So the place where you had an "XXX", I just put a place-holder here for you to apply/put any value you were specifically looking for. Similarly for your "YYY" value, another place-holder for that. Just substitute whatever criteria you were looking for.
Finally that AND part of the where clause. This is for the condition of the LEFT-JOIN. Since you KNOW that not all records will have a match in the "T1" secondary table, with the LEFT JOIN, the ID will be found and HAVE a value, or there will not be a value and thus NULL.
If there is no matching record, you would never be able to compare some string, int, date, whatever to a column as it would be null. So I am doing
(T1.T1_ID IS NULL -- as in there was no match
OR T1.Statement = SomeOtherParameterValue ) -- there WAS a match, and I only want where the statement equals a given value.
Per your comments and example results, your query SHOULD be simplified to...
Select
T2.ID,
T2.T1_ID,
T2.Requirement,
coalesce( T1.Statement, '' ) StatementFromT1Table
from
T2
LEFT JOIN T1
on T2.Requirement = T1.T1_ID
where
T2.Requirement = 0
OR T1.T1_ID IS NOT NULL
In your case, the final answer is... I want all records where there is no requirement (thus = 0) OR the record DOES have a match in the T1 table (thus T1.T1_ID IS NOT NULL)
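As a quick check of that simplified query, the sketch below loads the sample T1 and T2 rows from the question into an in-memory SQLite database (used here only as a stand-in for MySQL) and lists the T2.T1_ID values that come back. The helper name matching_ids is made up for this example. It also runs the query with T1 empty, the extra case the question asks about.

```python
import sqlite3

def matching_ids(t1_rows):
    """Return the T2.T1_ID values selected when T1 holds the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE T1 (id INTEGER, STATEMENT TEXT, T1_ID INTEGER);
    CREATE TABLE T2 (id INTEGER, T1_ID INTEGER, REQUIREMENT INTEGER);
    INSERT INTO T2 VALUES (1,1,0),(2,2,0),(3,3,1),(4,4,3),(5,5,4),(6,6,5),(7,7,6);
    """)
    conn.executemany("INSERT INTO T1 VALUES (?, ?, ?)", t1_rows)
    rows = conn.execute("""
    SELECT T2.T1_ID
    FROM T2 LEFT JOIN T1 ON T2.REQUIREMENT = T1.T1_ID
    WHERE T2.REQUIREMENT = 0 OR T1.T1_ID IS NOT NULL
    ORDER BY T2.T1_ID
    """).fetchall()
    return [r[0] for r in rows]

with_t1 = matching_ids([(1, "irrelevant", 1), (2, "irrelevant", 5)])
empty_t1 = matching_ids([])
print(with_t1, empty_t1)
```

With the sample T1 rows the query selects T1_ID 1, 2, 3 and 6; with T1 empty it still selects 1 and 2, the rows whose REQUIREMENT is 0.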
I am thinking that you want a LEFT JOIN:
SELECT 1
FROM `T2` LEFT JOIN
`T1`
ON `T1`.`T1_ID` = `T2`.`REQUIREMENT`
WHERE (`T2`.`REQUIREMENT` <> 0) OR
(`T1`.`STATEMENT` = YYY AND `T2`.`T1_ID` = XXX);
This returns rows for all T2 values where REQUIREMENT != 0. It also returns the rows generated by the JOIN. Of course 1 is not very descriptive, so you won't be able to tell which rows are which.
Your question would be much easier to follow with sample data and desired results.
if T2.REQUIREMENT=0 I know for sure, that there will not be any
T1.T1_ID entry that will match the JOIN requirements
So in order to get 1 returned when T2.REQUIREMENT=0, the join must match this condition too:
SELECT 1
FROM `T1` INNER JOIN `T2`
ON `T1`.`T1_ID` = `T2`.`REQUIREMENT` OR `T2`.`REQUIREMENT`=0
WHERE `T2`.`T1_ID`=XXX
AND `T1`.`STATEMENT`=YYY
Edit:
or just append 1s with UNION for all rows that have T2.REQUIREMENT=0:
SELECT 1
FROM `T1` INNER JOIN `T2`
ON `T1`.`T1_ID` = `T2`.`REQUIREMENT`
WHERE `T2`.`T1_ID`=XXX
AND `T1`.`STATEMENT`=YYY
UNION ALL
SELECT 1
FROM `T2`
WHERE `T2`.`REQUIREMENT` = 0
this will work even if T1 is empty.

Natural join works but not with all values

I can't understand what's happening...
I use two sql queries which do not return the same thing...
this one :
SELECT * FROM table1 t1 JOIN table1 t2 on t1.attribute1 = t2.attribute1
I get 10 rows
this other :
SELECT * FROM table1 NATURAL JOIN table1
I get 8 rows
With the NATURAL JOIN, 2 rows aren't returned... I looked for the missing rows, and they have the same values for the attribute1 column...
It's impossible for me.
If anyone has an answer I could sleep better ^^
Best regards
Max
As was pointed out in the comments, the reason you are getting a different row count is that the natural join is connecting your self join using all columns. All columns are being compared because the same table appears on both sides of the join. To test this hypothesis, just check the column values from both tables, which should all match.
The moral of the story here is to avoid natural joins. Besides not being clear as to the join condition, the logic of the join could easily change should table structure change, e.g. if a new column gets added.
Follow the link below for a small demo which tried to reproduce your current results. In a table of 8 records, the natural join returns 8 records, whereas the inner join on one attribute returns 10 records due to some duplicate matching.
Demo
You need to 'project away' the attribute you don't want used in the join e.g. in a derived table (dt):
SELECT *
FROM table1
NATURAL JOIN ( SELECT attribute1 FROM table1 ) dt;
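The row counts from the question can be reproduced with a small made-up table. This is only a sketch: sqlite3 stands in for MySQL, the id column and the eight rows are assumptions, and the self join is aliased (t1, t2) to keep the two sides apart. Two rows share the attribute1 value 'g', which is what pushes the single-column join to 10 rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, attribute1 TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e'), (6, 'f'), (7, 'g'), (8, 'g')],
)

def count(sql):
    """Run a COUNT(*) query and return the single number."""
    return conn.execute(sql).fetchone()[0]

# Join on attribute1 only: rows 7 and 8 match each other as well as themselves.
on_one = count("SELECT COUNT(*) FROM table1 t1 JOIN table1 t2 "
               "ON t1.attribute1 = t2.attribute1")
# NATURAL JOIN compares every shared column (id and attribute1).
natural = count("SELECT COUNT(*) FROM table1 t1 NATURAL JOIN table1 t2")
# Projecting away id leaves attribute1 as the only shared column.
projected = count("SELECT COUNT(*) FROM table1 "
                  "NATURAL JOIN (SELECT attribute1 FROM table1) dt")
print(on_one, natural, projected)
```

Here the join on attribute1 and the projected natural join both return 10 rows, while the plain natural self join returns 8.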

Is it possible to inverse the select statement in SQL?

When I want to select all columns except foo and bar, what I normally do is explicitly list all the other columns in the select statement.
select a, b, c, d, ... from ...
But if the table has dozens of columns, this is a tedious process for such a simple goal. What I would like to do instead is something like the following pseudo statement:
select * except(foo, bar) from ...
I would also like to know whether there is a function to filter duplicate rows out of a multi-column result, i.e. when multiple rows have the same content in all corresponding columns, only one of them is kept.
------------------------
A | B | C
------------------------   ====>   ------------------------
A | B | C                          A | B | C
------------------------           ------------------------
You can query INFORMATION_SCHEMA db and get the list of columns (except two) for that table, e.g.:
SELECT GROUP_CONCAT(COLUMN_NAME ORDER BY ORDINAL_POSITION)
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = '<your_table>' AND TABLE_SCHEMA = '<database>'
AND COLUMN_NAME NOT IN ('foo', 'bar');
Once you get the list of columns, you can use that in your select query.
You can create a view based on this table with all columns except these two, and then use this view every time with
select * from view
A simple GROUP BY on all columns will remove such duplicates. There are other options as well: DISTINCT and ROW_NUMBER.
select * except(foo, bar) from
This is a frequently requested feature on SO. However, it has not made it to the SQL Standard and I don't know of any SQL products that support it. I guess when the product managers ask their developers, MVPs, usergroups, etc to measure enthusiasm for this prospective feature, they mostly hear, "SELECT * FROM is considered dangerous, we need to protect new users who don't know what they are doing, etc."
You may find it useful to use NATURAL JOIN rather than INNER JOIN etc which removes what would be duplicated columns from the resulting table expression e.g.
SELECT *
FROM Table1 t1
INNER JOIN Table2 t2
ON t1.foo = t2.foo
AND t1.bar = t2.bar;
will result in two columns named foo and two named bar (and possibly other repeated names), probably de-duplicated in some way e.g. by suffixing the range variable names t1 and t2 that INNER JOIN forced you into using.
Whereas:
SELECT *
FROM Table1 NATURAL JOIN Table2;
doesn't require the use of range variables (a good thing) because there will only be one column named foo and one named bar in the result.
And to remove duplicated rows as well as columns, change the implied SELECT ALL * into the explicit SELECT DISTINCT * e.g.
SELECT DISTINCT *
FROM Table1 NATURAL JOIN Table2;
Doing this may reduce your need for the SELECT ALL BUT { these columns } feature you desire.
Of course, if you do that you will be told, "NATURAL JOIN is considered dangerous, we need to protect you from yourself in case you don't know what you are doing, etc." :)
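The column behaviour described in this answer can be inspected directly. The following is a sketch under assumptions: sqlite3 stands in for the SQL product, and the extra columns x and y plus all row values are invented. It compares the result columns of the INNER JOIN and NATURAL JOIN forms and then applies SELECT DISTINCT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (foo INTEGER, bar INTEGER, x TEXT);
CREATE TABLE Table2 (foo INTEGER, bar INTEGER, y TEXT);
INSERT INTO Table1 VALUES (1, 2, 'left'), (1, 2, 'left');
INSERT INTO Table2 VALUES (1, 2, 'right');
""")

def columns(sql):
    """Return the result column names of a query."""
    return [d[0] for d in conn.execute(sql).description]

# INNER JOIN ... ON keeps foo and bar from both sides.
inner_cols = columns("SELECT * FROM Table1 t1 INNER JOIN Table2 t2 "
                     "ON t1.foo = t2.foo AND t1.bar = t2.bar")
# NATURAL JOIN returns each shared column once.
natural_cols = columns("SELECT * FROM Table1 NATURAL JOIN Table2")
# DISTINCT additionally collapses the two identical joined rows.
distinct_rows = conn.execute(
    "SELECT DISTINCT * FROM Table1 NATURAL JOIN Table2").fetchall()
print(inner_cols, natural_cols, distinct_rows)
```

The INNER JOIN result has six columns, the NATURAL JOIN result has four, and DISTINCT leaves a single row.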

How can i get a count of two values using multiple tables inner joins and group by?

I have a set of tables and I am trying to get the two counts by state to display. It's proving to be a little tricky partly because some results will have a count of 0. I'm not sure how to deal with those at the moment.
First I'll show my table structure and then I'll explain what counts I'm trying to get. I'm thinking its probably something simple, but I'm a little rusty on sql queries.
Here is how my tables are set up. I have one primary table that I'm using to join the other tables to.
t1 (primary table)
ID, qrtID, sdID, published
t2
qID, qTypes, qSlug
t3
stateID, stateName, stateAbbr
The values link like this. t1.qrtID = t2.qID, t1.sdID = t3.stateID.
The qSlug values has 2 possible values (past and present), so i want to get the counts based on those groups.
What I want to end up with are columns for stateName, qSlug_count1, and qSlug_count2. If there is a count of "0", i want to display "0".
So for now this is what i got.
SELECT * FROM
(SELECT sdID, COUNT(qrtID) AS past_count FROM t1 WHERE qrtID = "1" GROUP BY sdID) c1
LEFT JOIN
(SELECT sdID, COUNT(qrtID) AS pres_count FROM t1 WHERE qrtID = "2" GROUP BY sdID) c2
ON c1.sdID = c2.sdID
The results from this query are close to what I need, but I am missing some data. I need to get the stateName and stateAbbr, and if there is a count of 0, show a 0 in the column. So all states should be represented in the results.
So, the question is how can i modify the query I have above to add in the additional tables and join them to correct values AND also be able to show zero values if there are no records that match?
Just use conditional aggregation in a single query:
SELECT sdID,
sum(qrtID = "1") AS past_count,
sum(qrtID = "2") AS pres_count
FROM t1
GROUP BY sdID;
Your query is missing rows because some sdIDs have only 1's and others have only 2's. You might want to add:
where qrtID in ("1", "2")
if you don't want rows where two 0s could appear.
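The question also asks for state names and for states with no rows at all to show 0. One possible way to get there, sketched here with sqlite3 standing in for MySQL, is to drive the query from t3 and LEFT JOIN t1 onto it. The state rows and t1 rows are invented for the example, and qrtID 1 = past, 2 = present is assumed from the query in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (ID INTEGER, qrtID INTEGER, sdID INTEGER, published INTEGER);
CREATE TABLE t3 (stateID INTEGER, stateName TEXT, stateAbbr TEXT);
INSERT INTO t3 VALUES (1, 'Ohio', 'OH'), (2, 'Utah', 'UT'), (3, 'Iowa', 'IA');
INSERT INTO t1 VALUES (1, 1, 1, 1), (2, 1, 1, 1), (3, 2, 1, 1), (4, 2, 2, 1);
""")

# t3 is on the left so every state appears; COALESCE turns the NULL sum
# of a state with no t1 rows into 0.
rows = conn.execute("""
SELECT s.stateName, s.stateAbbr,
       COALESCE(SUM(t.qrtID = 1), 0) AS past_count,
       COALESCE(SUM(t.qrtID = 2), 0) AS pres_count
FROM t3 s LEFT JOIN t1 t ON t.sdID = s.stateID
GROUP BY s.stateID, s.stateName, s.stateAbbr
ORDER BY s.stateID
""").fetchall()
print(rows)
```

Iowa has no rows in t1 in this sample and still appears, with 0 in both count columns.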

MySQL joins count matched record from another table

What I have is 2 tables, the first table I want it to display all results, no "where" or anything to limit it.
The second table I want to match an id to the first table, it can have multiple rows referencing it so I want to count the number.
So lets say the first table is like this:
ID - name
1 - one
2 - two
3 - three
4 - four
And the second table is like this
ID - REF
1 - 1
2 - 1
3 - 2
4 - 2
5 - 3
6 - 3
7 - 4
8 - 4
I want to combine them like so:
ID - name - count
1 - one - 2
2 - two - 2
3 - three- 2
4 - four - 2
I have tried using subqueries, left joins, right joins, inner joins, sub query joins, grouping and 9 times out of ten I get 20 results of the first ID out of 1300 results I should get. The rest I only get an incorrect count and no name.
I feel this is MySQL 101 but after attempting multiple variations and coming up with nothing I feel there must be something I am missing.
I would be happy to be directed to a question that covers the exact same situation (2 hours of looking and nothing that works exactly like this), or to a simple query that shows the logic of this method. Thanks in advance to anyone who answers; you will have made my day.
If any additional information is needed let me know. I have left out the query deliberately because I have adapted it so many times that it will not have much relevance (I would have to list every query I tried and that would be far too much scrolling).
OK, I have tested the first answer and it seemed to work in this context, so I will expand on it. The question is "answered", so this is just an expansion; if there are no replies I will close this with the answer as follows:
SELECT t.id, t.name, count(*) AS suppliers
FROM #__tiresku AS t
LEFT JOIN #__mrsp AS m ON t.name = m.tiresku_id
GROUP BY t.id, t.name
The expansion is an inner join: I have another table that is more of a list, with just an id and a name, and I reference that table by id to get the "name" instead.
There might be a better option than joins for this (like foreign keys or something).
I had this added to the select: b.name AS brand_name
And a join: INNER JOIN #__brands AS b ON t.brand = b.id
It worked with a subquery rather than a join.
This is a basic join with aggregation:
select t1.id, t1.name, count(*) as `count`
from table1 t1 join
table2 t2
on t1.id = t2.ref
group by t1.id, t1.name;
As asked, the example does not include records in the first table that are not in the second table, but this may be possible and is implied.
I am inclined to create a nested table of the counts in the second table without regard to "exists in the first table" either, unless the counts are huge and then the probe becomes cheaper.
I would do the count of the values in the second table first, as the first table is a de facto decode of a description.
select ID, name, coalesce(`count`,0) as `count`
from (select ref, count(*) as `count`
from table2
group by ref) as T2
right join table1
on ref = ID;
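For rows in the first table that have no match in the second, a LEFT JOIN with COUNT on a column from the second table is another way to get a 0. This is a minimal sketch using sqlite3 as a stand-in for MySQL; it uses the sample rows from the question plus one hypothetical extra row ('five') that has no references.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER, name TEXT);
CREATE TABLE table2 (id INTEGER, ref INTEGER);
-- Row 5 is hypothetical: nothing in table2 references it.
INSERT INTO table1 VALUES (1,'one'),(2,'two'),(3,'three'),(4,'four'),(5,'five');
INSERT INTO table2 VALUES (1,1),(2,1),(3,2),(4,2),(5,3),(6,3),(7,4),(8,4);
""")

# COUNT(t2.ref) skips the NULL produced for an unmatched row, so it reports 0;
# COUNT(*) would report 1 for that row instead.
counts = conn.execute("""
SELECT t1.id, t1.name, COUNT(t2.ref) AS cnt
FROM table1 t1 LEFT JOIN table2 t2 ON t1.id = t2.ref
GROUP BY t1.id, t1.name
ORDER BY t1.id
""").fetchall()
print(counts)
```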