Why duplicate query results, without "GROUP BY"?

I have this query:
SELECT C.ID_PASS, C.ID_MERCE, CTRL.ESITO, M.ID_CAT, M.QTA, M.DESCRIZ,
CTRL.ID_PUNTO, CTRL.ID_ADDETTO, C.DATE_OPEN, C.DATE_CLOSE, C.NOTE
FROM CONTESTAZIONI C, CONTROLLI CTRL, FUNZIONARI F, ADDETTI A, MERCI M
WHERE A.ID=CTRL.ID_ADDETTO
AND A.ID_FUNZ=501
AND M.ID=C.ID_MERCE
AND M.ID_PASS=C.ID_PASS
AND CTRL.ESITO > 1
GROUP BY C.ID_PASS;
Why, if I don't add GROUP BY C.ID_PASS, do I get 20 rows (instead of 2)?

You get a cross product when you join tables with the "," operator and do not relate them in the WHERE clause. Every table listed in FROM needs a condition tying its IDs to another table.
Basically, you need to link all 5 tables together by equating their rows' IDs. As noted in the comment, you have two groups of tables that are not linked to each other (and FUNZIONARI is not linked to anything), so the result set has lots of duplicates.
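To see the effect on a small scale, here is a minimal sketch using Python's built-in sqlite3 module. The three tables are simplified, hypothetical stand-ins for the real schema (the id_merce column on controlli in particular is an assumption made only for this illustration). Leaving one table unlinked repeats every matching row once per row of that table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified tables; only the linking columns matter here.
conn.executescript("""
    CREATE TABLE addetti   (id INTEGER PRIMARY KEY, id_funz INTEGER);
    CREATE TABLE controlli (id INTEGER PRIMARY KEY, id_addetto INTEGER, id_merce INTEGER);
    CREATE TABLE merci     (id INTEGER PRIMARY KEY, descriz TEXT);
    INSERT INTO addetti   VALUES (1, 501), (2, 501);
    INSERT INTO controlli VALUES (10, 1, 100), (11, 2, 101);
    INSERT INTO merci     VALUES (100, 'a'), (101, 'b'), (102, 'c');
""")

# merci is in FROM but has no join condition: 2 matching pairs x 3 merci rows
unlinked = conn.execute("""
    SELECT COUNT(*)
    FROM addetti a, controlli c, merci m
    WHERE a.id = c.id_addetto
""").fetchone()[0]

# Same query with merci tied to controlli: the duplicates disappear
linked = conn.execute("""
    SELECT COUNT(*)
    FROM addetti a, controlli c, merci m
    WHERE a.id = c.id_addetto
      AND m.id = c.id_merce
""").fetchone()[0]

print(unlinked, linked)  # 6 2
```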

GROUP BY is used to group rows together when using aggregate functions, such as SUM or COUNT.
If you had, for example, 4 records, with say 2 for each cust id, each with a value:-
CustId Spend
1 10
1 20
2 30
2 40
If you wanted to know the total of each value for each customer you would use something like:-
SELECT CustId, SUM(Spend) FROM SomeTable GROUP BY CustId
This would give you
CustId Sum(Spend)
1 30
2 70
Part of what it does is remove the duplicated rows and sum up all the values into one row.
It can be misused without an aggregate function to remove duplicates and this is what you have done. Hence 2 records instead of 20.
Note that if you have fields in the SELECT that are not in the GROUP BY clause, and which are not 100% dependent on the GROUP BY fields, then the value of those fields is indeterminate. (Since MySQL 5.7.5 the default ONLY_FULL_GROUP_BY mode rejects such queries outright.)
For example
CustId Spend ShopId
1 10 1
1 20 2
2 30 3
2 40 4
If you also selected ShopId while still grouping only by CustId, with something like:-
SELECT CustId, ShopId, SUM(Spend) FROM SomeTable GROUP BY CustId
This would give you
CustId Sum(Spend) ShopId
1 30 Could be 1 or could be 2
2 70 Could be 3 or could be 4
In your query this probably applies to the fields CTRL.ESITO, M.ID_CAT, M.QTA, M.DESCRIZ,
CTRL.ID_PUNTO, CTRL.ID_ADDETTO.
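The CustId example can be reproduced with Python's built-in sqlite3 module. This is only a sketch: SomeTable and its rows come from the example above, and wrapping ShopId in MIN() is one possible way (an assumption, not the only choice) to make the extra column deterministic instead of leaving it to the engine:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SomeTable (CustId INTEGER, Spend INTEGER, ShopId INTEGER);
    INSERT INTO SomeTable VALUES (1, 10, 1), (1, 20, 2), (2, 30, 3), (2, 40, 4);
""")

# One row per customer, with the spend summed
totals = conn.execute("""
    SELECT CustId, SUM(Spend)
    FROM SomeTable
    GROUP BY CustId
    ORDER BY CustId
""").fetchall()

# ShopId is not in GROUP BY, so aggregate it explicitly to get a defined value
with_shop = conn.execute("""
    SELECT CustId, MIN(ShopId), SUM(Spend)
    FROM SomeTable
    GROUP BY CustId
    ORDER BY CustId
""").fetchall()

print(totals)     # [(1, 30), (2, 70)]
print(with_shop)  # [(1, 1, 30), (2, 3, 70)]
```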

You are doing an implicit cross join on the tables. Try an explicit join type instead, such as INNER, LEFT or RIGHT JOIN.
Example:
SELECT *
FROM CONTESTAZIONI c
INNER JOIN CONTROLLI ctrl ON c.ID = ctrl.ContId

One advantage of the modern (meaning post-1992) explicit JOIN notation is that you are less likely to forget joining conditions. You have 5 tables; you need (at least) 4 join conditions. Your SQL has only 2 join conditions (one of them a compound join). You've not given us enough schema to be sure of coming up with the right columns for joining, but your query should probably be something like:
SELECT C.ID_PASS, C.ID_MERCE, CTRL.ESITO, M.ID_CAT, M.QTA, M.DESCRIZ,
L.ID_PUNTO, L.ID_ADDETTO, C.DATE_OPEN, C.DATE_CLOSE, C.NOTE
FROM ADDETTI A
JOIN CONTROLLI L ON A.ID = L.ID_Addetto
JOIN CONTESTAZIONI C ON A.xxx1 = C.xxx2
JOIN FUNZIONARI F ON C.yyy1 = F.yyy2
JOIN MERCI M ON M.ID = C.ID_Merce AND M.ID_Pass = C.ID_Pass
WHERE A.ID_FUNZ=501
AND L.ESITO > 1;
Note that you showed the join of A and L (renamed from CTRL), and M and C. The joins of C to A and of F to C are semi-arbitrary guesses (and the column names xxx1 etc are placeholders for your real column names); you will need to understand your schema and make the appropriate joins.

Thanks to everybody. Now I understand: I have to join together all the tables that I put in the query (of course only those that are related); without these joins I get a cross product. I can write a join either with "JOIN ... ON ..." or in "WHERE ...". I wrote the joins in WHERE, and now it works:
SELECT C.ID_PASS, C.ID_PUNTO, C.ID_ADDETTO, C.TIME_START,
C.TIME_END, C.ESITO, P.ID_NAZ, C.ID_MERCE, M.QTA, M.DESCRIZ, M.ID_CAT
FROM CONTROLLI C, PASSEGGERI P, MERCI M, FUNZIONARI F, CATEGORIE, ADDETTI, NAZIONI
WHERE
ADDETTI.ID=C.ID_ADDETTO
AND P.ID=M.ID_PASS
AND P.ID_NAZ=NAZIONI.ID
AND M.ID_PASS=C.ID_PASS AND M.ID=C.ID_MERCE -- composite PK (so another AND required)
AND M.ID_CAT=CATEGORIE.ID
AND F.ID=ADDETTI.ID_FUNZ
AND ESITO > 1
AND F.ID = 501

Related

Count of joined items per group in MySql

I need to get a set of results showing the number of items accumulated for each 'esta' group.
I'm grouping the results by establishment.
Establishment is inner joined to base.
Left joined items are joined against the base.
So in Esta group 2, let's say there are 3 base ids. Each written and verbal record attached to the base ID would count towards that esta in the results set. There can be multiple 'written' or 'verbal' attached to each base record.
I have 6 verbals and 4 writtens in the database, they are spread around the different 'esta' records. In my query, they are all counting towards the first row of the result I get.
I have tried the same with much more data, and regardless of the 'esta', the first row contains every left joined element counted together.
sql:
SELECT
esta.enf_esta_id
,SUM(IF(verbal.enf_verbal_id is not null,1,0)) as verbals
,SUM(IF(written.enf_written_id is not null,1,0)) as writtens
FROM
enf_base base
INNER JOIN enf_esta esta ON esta.enf_esta_id = base.enf_esta_id
LEFT JOIN enf_verbal verbal ON verbal.enf_base_id = base.enf_base_id
LEFT JOIN enf_written written ON written.enf_base_id = base.enf_base_id
WHERE
1=1
GROUP BY
esta.enf_esta_id
result:
enf_esta_id verbals writtens
2 10 10
3 1 0
4 1 1
6 0 0
To prove that the top row is incorrect, here are the results of just getting the verbals and writtens from enf_esta_id 2.
SELECT
COUNT( * ) AS total
FROM
enf_written
INNER JOIN enf_base ON enf_base.enf_base_id = enf_written.enf_base_id
INNER JOIN enf_esta ON enf_base.enf_esta_id = enf_esta.enf_esta_id
WHERE
enf_esta.enf_esta_id =2
yields:
5
And the same with enf_verbal yields 2. Adding up the totals of each gives us the correct 10 if we discount the top row of the problem query result.
Can anyone help me get the result I need?

You are multiplying. Say there are 2 verbals and 5 writtens for one base record: your joins then produce 10 rows (i.e. all combinations). Rather than joining the tables and then aggregating, you should aggregate first and then join the aggregates. In your case that means aggregating per base ID, and then aggregating those results further to get the totals per esta.
select
base.enf_esta_id,
coalesce(sum(verbal.cnt), 0) as verbals,
coalesce(sum(written.cnt), 0) as writtens
from enf_base base
left join
(
select enf_base_id, count(*) as cnt
from enf_verbal
group by enf_base_id
) verbal on verbal.enf_base_id = base.enf_base_id
left join
(
select enf_base_id, count(*) as cnt
from enf_written
group by enf_base_id
) written on written.enf_base_id = base.enf_base_id
group by base.enf_esta_id;
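A minimal sketch of the difference, using Python's built-in sqlite3 module. The table and column names follow the question, but the rows are made up for illustration: base 1 (esta 2) gets 2 verbals and 5 writtens, base 2 (esta 3) gets 1 verbal. COUNT(column) stands in for the SUM(IF(...)) expressions because it also counts only non-NULL values:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE enf_base    (enf_base_id INTEGER PRIMARY KEY, enf_esta_id INTEGER);
    CREATE TABLE enf_verbal  (enf_verbal_id INTEGER PRIMARY KEY, enf_base_id INTEGER);
    CREATE TABLE enf_written (enf_written_id INTEGER PRIMARY KEY, enf_base_id INTEGER);
    INSERT INTO enf_base    VALUES (1, 2), (2, 3);
    INSERT INTO enf_verbal  VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO enf_written VALUES (1, 1), (2, 1), (3, 1), (4, 1), (5, 1);
""")

# Join first, aggregate later: 2 verbals x 5 writtens = 10 rows for base 1
naive = conn.execute("""
    SELECT base.enf_esta_id,
           COUNT(verbal.enf_verbal_id),
           COUNT(written.enf_written_id)
    FROM enf_base base
    LEFT JOIN enf_verbal verbal ON verbal.enf_base_id = base.enf_base_id
    LEFT JOIN enf_written written ON written.enf_base_id = base.enf_base_id
    GROUP BY base.enf_esta_id
    ORDER BY base.enf_esta_id
""").fetchall()

# Aggregate per base ID first, then join the aggregates
fixed = conn.execute("""
    SELECT base.enf_esta_id,
           COALESCE(SUM(verbal.cnt), 0),
           COALESCE(SUM(written.cnt), 0)
    FROM enf_base base
    LEFT JOIN (SELECT enf_base_id, COUNT(*) AS cnt
               FROM enf_verbal GROUP BY enf_base_id) verbal
           ON verbal.enf_base_id = base.enf_base_id
    LEFT JOIN (SELECT enf_base_id, COUNT(*) AS cnt
               FROM enf_written GROUP BY enf_base_id) written
           ON written.enf_base_id = base.enf_base_id
    GROUP BY base.enf_esta_id
    ORDER BY base.enf_esta_id
""").fetchall()

print(naive)  # [(2, 10, 10), (3, 1, 0)]
print(fixed)  # [(2, 2, 5), (3, 1, 0)]
```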

Complicated SQLite or SQL SUM() between several rows

I need to extend this question: SQLite SUM() between several rows
Previously there was a single table, and the query was:
SELECT Sum(SERVICE)
FROM (
SELECT ft.*, (
SELECT count(*)
FROM fuel_table ft2
WHERE ft2.note='Your tank was full up.' and ft2.id>=ft.id)
AS numNotesAhead
FROM fuel_table ft)
AS ft
WHERE numNotesAhead=1
But now my fuel table is split into fuel_table and note_table, and I've tried:
SELECT Sum(fuel_table.SERVICE)
FROM (
SELECT ft.*, (
SELECT count(*)
FROM fuel_table ft2
LEFT JOIN note_table ON fuel_table.EnterId = note_table.EnterId
WHERE ft2.note='Your tank was full up.' and ft2.id>=ft.id)
AS numNotesAhead
FROM fuel_table ft)
AS ft
WHERE numNotesAhead=1
but it doesn't work. My app just stops.
*Note that there is a "_" in the name of the fuel table.

So there are two issues here - one is the joining of the tables, which is different from your previous problem. The second issue is doing the appropriate summation of the right values, which is already beautifully answered here: SQLite SUM() between several rows.
So, let's look at the JOIN.
Since the fuel_table has the greater number of records, we're going to expect some 'NULL' values from the desired JOIN with the note_table. (And we're not going to lose any records from fuel_table since we're joining on enterId, so we don't need a FULL OUTER JOIN). This means (for a LEFT JOIN), we want the fuel_table to be on the left:
SELECT f.id, f.service, f.enterid, n.note
FROM fuel_table f
LEFT JOIN note_table n
ON f.enterid = n.enterid
This will give us output:
id service enterid note
----------------------------------
2 50 25 Yes
3 20 26 NULL
4 20 35 Yes
8 30 36 NULL
9 15 37 NULL
10 20 42 Yes
So far, so good.
Now - rather than thinking about it - I just looked at the answer to the previous question, and substituted this subquery from the fuel_table from this answer: SQLite SUM() between several rows
which is a very good answer, and I can take no credit for any part of it in the following bit ...
Putting them together you get the ugly but functional query:
SELECT SUM(fq.service) AS total
FROM
(SELECT fq2.*,
(SELECT COUNT(*)
FROM (SELECT f.id, f.service, f.enterid, n.note
FROM fuel_table f
LEFT JOIN note_table n
ON f.enterid = n.enterid) fq1
WHERE fq1.note = 'Yes' AND fq1.id >= fq2.id) AS numNotesAhead
FROM
(SELECT f.id, f.service, f.enterid, n.note
FROM fuel_table f
LEFT JOIN note_table n
ON f.enterid = n.enterid) fq2) fq
WHERE numNotesAhead = 1
which returns output:
total
------
65
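For anyone who wants to run this outside the app, here is a sketch with Python's built-in sqlite3 module. The rows are taken from the join output shown above; the column types are assumptions. It also uses a common table expression so the join is written only once, which is a rewrite for readability rather than the query from the answer, and it needs SQLite 3.8.3 or later:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fuel_table (id INTEGER PRIMARY KEY, service INTEGER, enterid INTEGER);
    CREATE TABLE note_table (enterid INTEGER, note TEXT);
    INSERT INTO fuel_table VALUES
        (2, 50, 25), (3, 20, 26), (4, 20, 35), (8, 30, 36), (9, 15, 37), (10, 20, 42);
    INSERT INTO note_table VALUES (25, 'Yes'), (35, 'Yes'), (42, 'Yes');
""")

total = conn.execute("""
    WITH fq AS (
        -- every fuel row, with its note if one exists
        SELECT f.id, f.service, n.note
        FROM fuel_table f
        LEFT JOIN note_table n ON f.enterid = n.enterid
    )
    SELECT SUM(a.service)
    FROM fq a
    -- keep rows that have exactly one 'Yes' note at or after them
    WHERE (SELECT COUNT(*)
           FROM fq b
           WHERE b.note = 'Yes' AND b.id >= a.id) = 1
""").fetchone()[0]

print(total)  # 65 (rows 8, 9 and 10: 30 + 15 + 20)
```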

SubQuery Join Failed

I am trying to find the records that are missing in the target. I need the employees whose records are missing.
Suppose the source is:
1,Jack,type1,add1,reg3,..,..,..,
2,Jack,type2,add1,reg3,..,,.,..,
3,Jack,type3,add2,reg4,..,.,..,.,
4,Rock,,,,,,,,
and the target is:
1,Jack,type1,add1,reg3,..,..,..,
4,Rock,,,,,,,,
I have about 1000 rows for other employees, and the target has no duplicate records.
I need the employees who are present in both source and target with a different number of occurrences.
For example, in the sample data above the source has 3 entries for Jack and 1 entry for Rock,
while the target has only one entry for Jack and one for Rock.
I am running the query below, and the required output is Jack,3.
How can I get it? The query below gives an error:
select A.EMP_NUMBER,A.CNT1
from
(select EMP_NUMBER,count(EMP_NUMBER) as CNT1
from EMPLOYEE_SOURCE
group by EMP_NUMBER ) as A
INNER JOIN
(select B.EMP_NUMBER,B.CNT2
from (select EMP_NUMBER,count(EMP_NUMBER) as CNT2
from EMPLOYEE_TARGET
group by EMP_NUMBER )as B )
ON (A.EMP_NUMBER = B.EMP_NUMBER)
where A.CNT1 != B.CNT2
Please help.

Why not get the employees that have a different number of rows in the two tables when grouped by name? (I assume Emp_Number is the field that contains the name, if that is what the query in the question returns.)
SELECT s.Emp_Number, Count(s.Emp_Number)
FROM EMPLOYEE_SOURCE s
LEFT JOIN EMPLOYEE_TARGET t ON s.Emp_Number = t.Emp_Number
GROUP BY s.Emp_Number
HAVING Count(s.Emp_Number) != Count(t.Emp_Number)
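One thing worth checking with this approach: the join happens before the counting, so each source row is paired with every matching target row, and COUNT(s.Emp_Number) and COUNT(t.Emp_Number) come out equal whenever the employee exists in the target at all. The sketch below uses Python's built-in sqlite3 module with the sample rows from the question plus Mike, a hypothetical employee who is missing from the target. It suggests the query reports only employees absent from the target, not Jack:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE_SOURCE (Emp_Number TEXT);
    CREATE TABLE EMPLOYEE_TARGET (Emp_Number TEXT);
    INSERT INTO EMPLOYEE_SOURCE VALUES ('Jack'), ('Jack'), ('Jack'), ('Rock'), ('Mike');
    INSERT INTO EMPLOYEE_TARGET VALUES ('Jack'), ('Rock');
""")

# Jack: 3 source rows x 1 target row = 3 joined rows, so both counts are 3
rows = conn.execute("""
    SELECT s.Emp_Number, COUNT(s.Emp_Number)
    FROM EMPLOYEE_SOURCE s
    LEFT JOIN EMPLOYEE_TARGET t ON s.Emp_Number = t.Emp_Number
    GROUP BY s.Emp_Number
    HAVING COUNT(s.Emp_Number) != COUNT(t.Emp_Number)
    ORDER BY s.Emp_Number
""").fetchall()

print(rows)  # [('Mike', 1)]
```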

It would be really helpful if you specified the exact error you get.
If this is your actual query, there are two problems: the second derived table has no alias (and you don't need the extra nesting at all), and, at least in Teradata, != is not valid; this is SQL and not C.
select A.EMP_NUMBER,A.CNT1
from
(
select EMP_NUMBER,count(EMP_NUMBER) as CNT1
from EMPLOYEE_SOURCE
group by EMP_NUMBER
) as A
INNER JOIN
(
select EMP_NUMBER,count(EMP_NUMBER) as CNT2
from EMPLOYEE_TARGET
group by EMP_NUMBER
) as B
ON (A.EMP_NUMBER = B.EMP_NUMBER)
where A.CNT1 <> B.CNT2
If an employee is missing from the second table, you might have to use an outer join, as Serpiton suggested, and add an additional WHERE condition:
where A.CNT1 <> B.CNT2
or b.CNT2 IS NULL
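Put together, the count-then-join version with an outer join might look like the sketch below, run here with Python's built-in sqlite3 module rather than Teradata. The rows are the sample data from the question plus Mike, a hypothetical employee who is missing from the target, and storing names in EMP_NUMBER is an assumption carried over from the sample:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE_SOURCE (EMP_NUMBER TEXT);
    CREATE TABLE EMPLOYEE_TARGET (EMP_NUMBER TEXT);
    INSERT INTO EMPLOYEE_SOURCE VALUES ('Jack'), ('Jack'), ('Jack'), ('Rock'), ('Mike');
    INSERT INTO EMPLOYEE_TARGET VALUES ('Jack'), ('Rock');
""")

# Count per employee in each table first, then compare the two counts
rows = conn.execute("""
    SELECT A.EMP_NUMBER, A.CNT1
    FROM (SELECT EMP_NUMBER, COUNT(EMP_NUMBER) AS CNT1
          FROM EMPLOYEE_SOURCE
          GROUP BY EMP_NUMBER) AS A
    LEFT JOIN
         (SELECT EMP_NUMBER, COUNT(EMP_NUMBER) AS CNT2
          FROM EMPLOYEE_TARGET
          GROUP BY EMP_NUMBER) AS B
      ON A.EMP_NUMBER = B.EMP_NUMBER
    WHERE A.CNT1 <> B.CNT2
       OR B.CNT2 IS NULL
    ORDER BY A.EMP_NUMBER
""").fetchall()

print(rows)  # [('Jack', 3), ('Mike', 1)]
```

With an INNER JOIN and no IS NULL test, only ('Jack', 3) would come back, which is the output asked for in the question.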

Comparing two values from the same select query

I have a select query which selects all products from my inventory table and joins them with two other tables (tables l_products and a_products)
SELECT
i.*,
b.title,
ROUND((i.price/100*80) - l.price,2) AS margin,
l.price AS l_price,
a.price AS a_price,
ROUND((a.price/100*80) - l.price, 2) AS l_margin
FROM inventory i
LEFT JOIN products b ON i.id = b.id
LEFT JOIN a_products a ON i.id = a.id
LEFT JOIN l_products l ON i.id = l.id
WHERE
a.condition LIKE IF(i.condition = 'New', 'New%', 'Used%')
AND l.condition LIKE IF(i.condition = 'New', 'New%', 'Used%')
This select query will normally give me a table such as...
id, title, condition, margin, l_price, a_price ...
001-new ... new 10 20 10
001-used ... used 10 25 20
002....
Now I need a condition in the query which will ignore all used products that are more expensive (have a higher a_price) than their 'new' counterparts. In the example above, 001-used has a higher a_price than 001-new.
How can I achieve this without having to resort to PHP?

Join this query with itself on a column that has the same value for every row sharing an id prefix.
You can get such a column by adding another field to your SELECT that produces the same value for 001-new and 001-used, for 002-new and 002-used, and so on.
That value can be generated by extracting the first 3 characters of the id column, either with a built-in string function or with your own SQL routine.
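A rough sketch of the self-join idea, using Python's built-in sqlite3 module. Everything here is hypothetical: the flattened table priced, the column name cond (used instead of condition), the sample prices, and the assumption that the first 3 characters of id identify the product and that each product has at most one 'New' row. The real query would need the same join applied to its own result set:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE priced (id TEXT PRIMARY KEY, cond TEXT, a_price INTEGER);
    INSERT INTO priced VALUES
        ('001-new', 'New', 10), ('001-used', 'Used', 20),
        ('002-new', 'New', 30), ('002-used', 'Used', 25);
""")

rows = conn.execute("""
    SELECT u.id, u.a_price
    FROM priced u
    -- pair every row with the 'New' row that shares its 3-character prefix
    LEFT JOIN priced n
           ON substr(n.id, 1, 3) = substr(u.id, 1, 3)
          AND n.cond = 'New'
    WHERE u.cond = 'New'            -- new products are always kept
       OR n.id IS NULL              -- used product with no new counterpart
       OR u.a_price <= n.a_price    -- used product that is not more expensive
    ORDER BY u.id
""").fetchall()

print(rows)  # [('001-new', 10), ('002-new', 30), ('002-used', 25)]
```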

Most efficient SQL, DISTINCT or WHERE...AND

Both of these work, but is there a better way to write this?
1.
SELECT asset_id,
asset.category_id,
x,
y
FROM asset
INNER JOIN map_category
ON map_category.category_id = asset.category_id
WHERE asset.map_id = 5
AND map_category.map_id = 5
2. (Added DISTINCT and removed last line)
SELECT DISTINCT asset_id,
asset.category_id,
x,
y
FROM asset
INNER JOIN map_category
ON map_category.category_id = asset.category_id
WHERE asset.map_id = 5
Without either DISTINCT or the last line AND map_cate..., I get 3 records. One for each:
map_category table
asset table

These two queries do completely different things. DISTINCT removes rows that are duplicates across all the selected columns, while the first query keeps only the joined rows where map_category.map_id = 5.
The only reason you get the same result is your data. On other data the results could be completely different, so you can't compare their efficiency.

Since your foreign key consists of both columns, you should join on both columns:
SELECT asset_id,
asset.category_id,
x,
y
FROM asset
INNER JOIN map_category
ON map_category.category_id = asset.category_id
AND asset.map_id = map_category.map_id
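A small sketch of why the second join column matters, using Python's built-in sqlite3 module. The rows are made up (the question's data is not shown): one asset on map 5, and the same category registered for three different maps, which would explain the 3 records mentioned in the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE asset (asset_id INTEGER, category_id INTEGER, map_id INTEGER,
                        x INTEGER, y INTEGER);
    CREATE TABLE map_category (category_id INTEGER, map_id INTEGER);
    INSERT INTO asset VALUES (1, 10, 5, 0, 0);
    INSERT INTO map_category VALUES (10, 5), (10, 6), (10, 7);
""")

# Joining on category_id alone matches the category row of every map
one_column = conn.execute("""
    SELECT asset_id, asset.category_id, x, y
    FROM asset
    INNER JOIN map_category
            ON map_category.category_id = asset.category_id
    WHERE asset.map_id = 5
""").fetchall()

# Joining on both key columns matches only the category row of the same map
two_columns = conn.execute("""
    SELECT asset_id, asset.category_id, x, y
    FROM asset
    INNER JOIN map_category
            ON map_category.category_id = asset.category_id
           AND map_category.map_id = asset.map_id
    WHERE asset.map_id = 5
""").fetchall()

print(len(one_column), len(two_columns))  # 3 1
```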
