Query is very slow when I add a third LEFT JOIN - mysql

Hi there. I have been playing with this query for hours and I can't get it to return results in a reasonable execution time.
Here is the case:
I have three tables -
Table 1 called : rowsall
1 id int(11)
2 masterCaseId varchar(50)
3 RowNum int(11)
4 fullCaseNumber varchar(50)
5 rowKtavNameFull varchar(250)
6 DateOpen varchar(50)
7 DateProccess varchar(50)
8 rowStatus varchar(50)
9 rowCourt varchar(100)
10 rowProcedure varchar(50)
11 rowCaseType varchar(50)
12 rowIntrest varchar(50)
13 rowDetailsGen varchar(250)
14 rowTypeTeanot varchar(50)
15 rowHisayon varchar(50)
16 rowAmount varchar(50)
17 rowZacautPtor varchar(50)
18 rowZacautApproove varchar(50)
19 rowStatIravon varchar(50)
20 rowDateClose varchar(50)
21 rowCloseReason varchar(50)
22 rowResultTaken varchar(50)
23 rowOldFile varchar(50)
24 rowOpenedInCourse varchar(50)
25 rowGniza varchar(50)
26 rowReasonDeposit varchar(50)
27 rowTypeJudgeType varchar(50)
28 rowJudgeTypeDate
29 rowJudgeTypeName varchar(50)
30 rowGishurType varchar(50)
31 rowGishurDetails varchar(250)
Total rows: 13001, size 11.7mb
Indexes:
PRIMARY BTREE Yes No id 13001 A No
RowNum BTREE No No RowNum 12 A No
rowStatus 12 A No
rowResultTaken 12 A No
rowJudgeTypeName BTREE No No rowJudgeTypeName 1083 A No
masterCaseId BTREE No No masterCaseId 13001 A No
RowNum_2 BTREE No No rowJudgeTypeName 1857 A No
RowNum 1857 A No
fullCaseNumber BTREE No No fullCaseNumber 203 A No
Table 2 called : casses_rows
1 id int(11)
2 caseFullNum varchar(50)
3 statusCrawl varchar(50)
4 courtPlace text
5 rowsNum int(11)
6 caseJudge varchar(50)
7 caseFullName text
8 whenCrawled datetime
9 yearVal varchar(5)
10 monthVal varchar(5)
11 caseVal int(11)
Total rows: ~23,846, size 4.8mb
Indexes:
PRIMARY BTREE Yes No id 26302 A No
Table 3 called : casedocs
1 id int(11)
2 caseNum varchar(20)
3 DocTitle varchar(250)
4 DocDateStr varchar(20)
5 KeyWords text
6 content text
7 DocDateParsed timestamp
Total rows: ~1,163,669, size 4.1g
Indexes:
PRIMARY BTREE Yes No id 895132 A No
caseNum BTREE No No caseNum 895132 A No
My goal:
I need to join those tables to get most of the cols in table1 + one col in table 2 + one col in table 3 or NULL if there is no match:
My Query is:
SELECT
A.`id` AS idRowCase,
C.`caseNum` AS isPaperAva,
A.`rowCaseType`,
A.`fullCaseNumber`,
A.`rowProcedure`,
B.`caseFullName`,
A.`rowCourt`,
A.`rowAmount`,
A.`rowResultTaken`, A.`rowStatus`, A.`rowIntrest` ,A.`DateOpen` ,A.`DateProccess`, A.`rowDateClose`, A.`rowJudgeTypeDate`
FROM (SELECT * FROM `rowsall` WHERE `rowJudgeTypeName` LIKE '%#value1%' AND `RowNum` ='1' ) A
INNER JOIN ( SELECT `id`,`caseFullName` FROM `casses_rows` ) B
ON A.`masterCaseId` = B.`id`
LEFT JOIN (SELECT `caseNum` FROM `casedocs` GROUP BY `caseNum` ORDER BY NULL ) C
ON A.`fullCaseNumber` = C.`caseNum`
The result is as I wanted, but the problem is that it takes 1 min to return the results...
Here is the EXPLAIN:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 121
1 PRIMARY <derived3> ALL NULL NULL NULL NULL 24185 Using where; Using join buffer
1 PRIMARY <derived4> ALL NULL NULL NULL NULL 343438
4 DERIVED casedocs index NULL caseNum 62 NULL 768024 Using index
3 DERIVED casses_rows ALL NULL NULL NULL NULL 29872
2 DERIVED rowsall ref RowNum RowNum 4 6500 Using where
As you can see I'm grouping table 3 to prevent the join creating duplicate rows in the results - actually the third join is to test if there are docs that correspond to the case or not (will be NULL).
More information:
If I remove the third join, the query takes 1 sec.
If I execute only the third join's SELECT statement, it takes 0.003 sec.
When profiling the query, the "Sending data" stage accounts for 99.9% of the time.
Any ideas why it takes so long to execute the third join?
Mission accomplished!
Thanks to #Turophile and #Joel Coehoorn, new test results are around 0.004 sec!
Here is the final query:
SELECT DISTINCT A.`id` AS idRowCase, C.`caseNum` AS isPaperAva, A.`rowCaseType` , A.`fullCaseNumber` , A.`rowProcedure` , B.`caseFullName` , A.`rowCourt` , A.`rowAmount` , A.`rowResultTaken` , A.`rowStatus` , A.`rowIntrest` , A.`DateOpen` , A.`DateProccess` , A.`rowDateClose` , A.`rowJudgeTypeDate`
FROM `rowsall` A
INNER JOIN `casses_rows` B ON A.`masterCaseId` = B.`id`
LEFT JOIN `casedocs` C ON A.`fullCaseNumber` = C.`caseNum`
WHERE A.`rowJudgeTypeName` LIKE '%#value1%'
AND A.`RowNum` = '1'

My advice would be to not sort and group unnecessarily. So, something like this:
SELECT
A.`id` AS idRowCase,
C.`caseNum` AS isPaperAva,
A.`rowCaseType`,
A.`fullCaseNumber`,
A.`rowProcedure`,
B.`caseFullName`,
A.`rowCourt`,
A.`rowAmount`,
A.`rowResultTaken`,
A.`rowStatus`,
A.`rowIntrest`,
A.`DateOpen` ,
A.`DateProccess`,
A.`rowDateClose`,
A.`rowJudgeTypeDate`
FROM `rowsall` AS A
INNER JOIN `casses_rows` AS B
ON A.`masterCaseId` = B.`id`
LEFT JOIN `casedocs` AS C
ON A.`fullCaseNumber` = C.`caseNum`
WHERE `rowJudgeTypeName` LIKE '%#value1%'
AND `RowNum` ='1'
(may return different results (multiple rows) if caseNum isn't unique).
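If you only need the cases that have at least one matching document, you could also turn the LEFT JOIN into a sub-select (unlike the LEFT JOIN, this drops cases with no row in casedocs):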
SELECT
A.`id` AS idRowCase,
A.`fullCaseNumber` AS isPaperAva,
A.`rowCaseType`,
A.`fullCaseNumber`,
A.`rowProcedure`,
B.`caseFullName`,
A.`rowCourt`,
A.`rowAmount`,
A.`rowResultTaken`,
A.`rowStatus`,
A.`rowIntrest`,
A.`DateOpen` ,
A.`DateProccess`,
A.`rowDateClose`,
A.`rowJudgeTypeDate`
FROM `rowsall` AS A
INNER JOIN `casses_rows` AS B
ON A.`masterCaseId` = B.`id`
WHERE `rowJudgeTypeName` LIKE '%#value1%'
AND `RowNum` ='1'
AND A.`fullCaseNumber` in (SELECT `caseNum` FROM `casedocs` )
But this shows that using table casedocs is kind of redundant - is it really needed?

Firstly, the first two tables have no need for subqueries at all. This can be better expressed directly through join conditions and the WHERE clause.
Also, the last join uses a sub query with a group by:
LEFT JOIN (SELECT caseNum FROM casedocs GROUP BY caseNum ORDER BY NULL )
This breaks MySQL's ability to use any indexes when computing that last join. If you can rewrite this to join the table first and do the GROUP BY in the outer query, so that you get the same results, it might perform much better, because indexes can be put to better use.
SELECT
A.`id` AS idRowCase,
C.`caseNum` AS isPaperAva,
A.`rowCaseType`,
A.`fullCaseNumber`,
A.`rowProcedure`,
B.`caseFullName`,
A.`rowCourt`,
A.`rowAmount`,
A.`rowResultTaken`, A.`rowStatus`, A.`rowIntrest` ,A.`DateOpen` ,A.`DateProccess`, A.`rowDateClose`, A.`rowJudgeTypeDate`
FROM `rowsall` A
INNER JOIN `casses_rows` B ON A.`masterCaseId` = B.`id`
LEFT JOIN (SELECT `caseNum` FROM `casedocs` GROUP BY `caseNum` ) C ON C.`caseNum` = A.`fullCaseNumber`
WHERE A.`rowJudgeTypeName` LIKE '%#value1%' AND A.`RowNum` ='1'
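Since the third table is only consulted to test whether documents exist for a case, the same flag can also be computed with a correlated EXISTS, which never multiplies rows and so needs neither GROUP BY nor DISTINCT. The following is a minimal sketch rather than a tested MySQL solution: it runs the SQL through Python's built-in sqlite3 module as a stand-in for MySQL, the tables are cut down to the two columns involved, and the sample rows, the `hasDocs` alias and the index name are invented for illustration.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the rows below are made up for the demo.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rowsall  (id INTEGER PRIMARY KEY, fullCaseNumber TEXT);
CREATE TABLE casedocs (id INTEGER PRIMARY KEY, caseNum TEXT);
CREATE INDEX idx_casedocs_caseNum ON casedocs (caseNum);
INSERT INTO rowsall  VALUES (1, 'A-1'), (2, 'B-2');
INSERT INTO casedocs VALUES (1, 'A-1'), (2, 'A-1'), (3, 'A-1');
""")

# Plain LEFT JOIN: one output row per matching document, so case 1 appears 3 times.
joined = conn.execute("""
    SELECT A.id, C.caseNum
    FROM rowsall A
    LEFT JOIN casedocs C ON C.caseNum = A.fullCaseNumber
    ORDER BY A.id
""").fetchall()

# Correlated EXISTS: exactly one row per case, with a 1/0 flag instead of a join.
flagged = conn.execute("""
    SELECT A.id,
           EXISTS (SELECT 1 FROM casedocs C
                   WHERE C.caseNum = A.fullCaseNumber) AS hasDocs
    FROM rowsall A
    ORDER BY A.id
""").fetchall()

print(len(joined))  # 4 rows: three for case 1, one NULL row for case 2
print(flagged)      # [(1, 1), (2, 0)]
```

Whether this is faster than the join on a given MySQL version depends on the optimizer, so it is worth checking with EXPLAIN on the real data.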

Related

MySQL complex semi-join without group by

Summary
I am looking for a semi-join(ish) query that selects a number of customers and joins their most recent data from other tables.
At a later time, I wish to directly append conditions to the end of the query: WHERE c.id IN (1,2,3)
Problem
As far as I am aware, my requirement rules out GROUP BY:
SELECT * FROM customer c
LEFT JOIN customer_address ca ON ca.customer_id = c.id
GROUP BY c.id
# PROBLEM: Cannot append conditions *after* GROUP BY!
With most subquery-based attempts, my problem is the same.
As an additional challenge, I cannot strictly use a semi-join, because I allow at least two types of phone numbers (mobile and landline), which come from the same table. As such, from the phone table I may be joining multiple records per customer, i.e. this is no longer a semi-join. My current solution below illustrates this.
Questions
The EXPLAIN result at the bottom looks performant to me. Am I correct? Are each of the subqueries executed only once? Update: It appears that DEPENDENT SUBQUERY is executed once for each row in the outer query. It would be great if we could avoid this.
Is there a better solution to what I am doing?
DDLs
DROP TABLE IF EXISTS customer;
CREATE TABLE `customer` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`)
);
DROP TABLE IF EXISTS customer_address;
CREATE TABLE `customer_address` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`customer_id` bigint(20) unsigned NOT NULL,
`street` varchar(85) DEFAULT NULL,
`house_number` int(10) unsigned DEFAULT NULL,
PRIMARY KEY (`id`)
);
DROP TABLE IF EXISTS customer_phone;
CREATE TABLE `customer_phone` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`customer_id` bigint(20) unsigned NOT NULL,
`phone` varchar(32) DEFAULT NULL,
`type` tinyint(3) unsigned NOT NULL COMMENT '1=mobile,2=landline',
PRIMARY KEY (`id`)
);
insert ignore customer values (1);
insert ignore customer_address values (1, 1, "OldStreet", 1),(2, 1, "NewStreet", 1);
insert ignore customer_phone values (1, 1, "12345-M", 1),(2, 1, "12345-L-Old", 2),(3, 1, "12345-L-New", 2);
SELECT * FROM customer;
+----+
| id |
+----+
| 1 |
+----+
SELECT * FROM customer_address;
+----+-------------+-----------+--------------+
| id | customer_id | street | house_number |
+----+-------------+-----------+--------------+
| 1 | 1 | OldStreet | 1 |
| 2 | 1 | NewStreet | 1 |
+----+-------------+-----------+--------------+
SELECT * FROM customer_phone;
+----+-------------+-------------+------+
| id | customer_id | phone | type |
+----+-------------+-------------+------+
| 1 | 1 | 12345-M | 1 |
| 2 | 1 | 12345-L-Old | 2 |
| 3 | 1 | 12345-L-New | 2 |
+----+-------------+-------------+------+
Solution so far
SELECT *
FROM customer c
# Join the most recent address
LEFT JOIN customer_address ca ON ca.id = (SELECT MAX(ca.id) FROM customer_address ca WHERE ca.customer_id = c.id)
# Join the most recent mobile phone number
LEFT JOIN customer_phone cphm ON cphm.id = (SELECT MAX(cphm.id) FROM customer_phone cphm WHERE cphm.customer_id = c.id AND cphm.`type` = 1)
# Join the most recent landline phone number
LEFT JOIN customer_phone cphl ON cphl.id = (SELECT MAX(cphl.id) FROM customer_phone cphl WHERE cphl.customer_id = c.id AND cphl.`type` = 2)
# Yay conditions appended at the end
WHERE c.id IN (1,2,3)
Fiddle
This fiddle gives the appropriate result set using the given solution. See my questions above.
http://sqlfiddle.com/#!9/98c57/3
I would avoid those dependent subqueries. Instead, try this:
SELECT
*
FROM customer c
LEFT JOIN (
SELECT
customer_id
, MAX(id) AS currid
FROM customer_phone
WHERE type = 1
GROUP BY
customer_id
) gm ON c.id = gm.customer_id
LEFT JOIN customer_phone mobis ON gm.currid = mobis.id
LEFT JOIN (
SELECT
customer_id
, MAX(id) AS currid
FROM customer_phone
WHERE type = 2
GROUP BY
customer_id
) gl ON c.id = gl.customer_id
LEFT JOIN customer_phone lands ON gl.currid = lands.id
WHERE c.id IN (1, 2, 3)
;
or, perhaps:
SELECT
*
FROM customer c
LEFT JOIN (
SELECT
customer_id
, MAX(case when type = 1 then id end) AS mobid
, MAX(case when type = 2 then id end) AS lndid
FROM customer_phone
GROUP BY
customer_id
) gp ON c.id = gp.customer_id
LEFT JOIN customer_phone mobis ON gp.mobid = mobis.id
LEFT JOIN customer_phone lands ON gp.lndid = lands.id
WHERE c.id IN (1, 2, 3)
;
see: http://sqlfiddle.com/#!9/ef983/1/
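As a quick check of the conditional-aggregation variant, here is a small sketch that loads the sample rows from the question and runs that query. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example is self-contained), and it selects named columns instead of `*` to keep the output readable.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; data is the sample from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY);
CREATE TABLE customer_phone (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    phone       TEXT,
    type        INTEGER NOT NULL   -- 1=mobile, 2=landline
);
INSERT INTO customer VALUES (1);
INSERT INTO customer_phone VALUES
    (1, 1, '12345-M', 1),
    (2, 1, '12345-L-Old', 2),
    (3, 1, '12345-L-New', 2);
""")

# One pass over customer_phone picks the newest id per type and customer;
# the derived table is not correlated, so it is evaluated once, not per row.
rows = conn.execute("""
    SELECT c.id, mobis.phone AS mobile, lands.phone AS landline
    FROM customer c
    LEFT JOIN (
        SELECT customer_id,
               MAX(CASE WHEN type = 1 THEN id END) AS mobid,
               MAX(CASE WHEN type = 2 THEN id END) AS lndid
        FROM customer_phone
        GROUP BY customer_id
    ) gp ON c.id = gp.customer_id
    LEFT JOIN customer_phone mobis ON gp.mobid = mobis.id
    LEFT JOIN customer_phone lands ON gp.lndid = lands.id
    WHERE c.id IN (1, 2, 3)
""").fetchall()

print(rows)  # [(1, '12345-M', '12345-L-New')]
```

The most recent address could be joined the same way, with a derived table that takes MAX(id) per customer_id from customer_address. The trailing WHERE clause stays at the end of the statement, as the question requires.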

SQL RIGHT JOIN misunderstanding

I'm working on an ASP.NET application whose SQL backend (MySQL 5.6) has 4 tables.
The first table is defined in this way:
CREATE TABLE `items` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`descr` varchar(45) NOT NULL,
`modus` varchar(8) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `id_UNIQUE` (`id`)
);
These are the items managed in the application.
the second table:
CREATE TABLE `files` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`file_path` varchar(255) NOT NULL,
`id_item` int(11) NOT NULL,
`id_type` int(11) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `id_UNIQUE` (`id`)
);
These are the files required for item management. Each 'item' can have zero or more files (the 'id_item' field holds a valid 'id' from the 'items' table).
the third table:
CREATE TABLE `file_types` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`file_type` varchar(32) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `id_UNIQUE` (`id`)
);
This table describes the type of the file.
the fourth table:
CREATE TABLE `checklist` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`id_type` int(11) NOT NULL,
`modus` varchar(8) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `id_UNIQUE` (`id`)
);
This table, as its name suggests, is a checklist. It describes which types of files need to be collected for a particular 'modus'. The 'modus' field holds the same values as 'modus' in the 'items' table, and 'id_type' holds valid 'id' values from the 'file_types' table.
Let's suppose that the first table holds those items:
id descr modus
--------------------
1 First M
2 Second P
3 Third M
4 Fourth M
--------------------
The second:
id file_path id_item id_type
--------------------------------------
1 file1.jpg 1 1
2 file2.jpg 1 2
3 file3.jpg 2 1
4 file4.jpg 1 4
5 file5.jpg 1 1
--------------------------------------
The third:
id file_type
--------------
1 red
2 blue
3 green
4 default
--------------
The fourth table:
id id_type modus
--------------------
1 1 M
2 2 M
3 3 M
4 4 M
5 1 P
6 4 P
--------------------
What I need to obtain is a table with such items (referred to id_item = 1):
id_item file_path id_type file_type
--------------------------------------------
1 file1.jpg 1 red
1 file5.jpg 1 red
1 file2.jpg 2 blue
1 file4.jpg 4 default
<null> <null> 3 green
--------------------------------------------
While the result table for id_item = 2 should be the following:
id_item file_path id_type file_type
--------------------------------------------
2 file3.jpg 1 red
<null> <null> 4 default
--------------------------------------------
where of course 'id_item' is the 'id' of the 'items' table, 'id_type' is the 'id' of the 'file_types' table, etc.
In short, I need a table that depicts the checklist status for a particular 'item' id, i.e. which files have been collected and which are missing.
I tried to use a RIGHT JOIN clause without success:
SELECT
items.id AS id_item,
files.file_path AS file_path,
file_types.id AS id_type,
file_types.file_type AS file_type
FROM
files
RIGHT JOIN
checklist ON (files.id_type = checklist.id_type )
INNER JOIN
items ON (files.id_item = items.id)
AND (items.modus = checklist.modus)
INNER JOIN
file_types ON (checklist.id_type = file_types.id)
WHERE (items.id = 1);
the result of this query is:
id_item file_path id_type file_type
------------------------------------------
1 file1.jpg 1 red
1 file5.jpg 1 red
1 file2.jpg 2 blue
1 file4.jpg 4 default
It lacks the last row (the missing file from the checklist).
The following query gives you the status of each item (a kind of checklist). I had to change some of the column names, which were reserved words in my test environment.
select item_id,
fp filepath,
m_type,
item_desc,
modee,
(select t.type from typess t where t.id = m_type)
from (select null item_id,
i.descr item_desc,
c.modee modee,
c.id_type m_type,
null fp
from items i, checklist c
where c.modee = i.modee
and i.id = 0
and c.id_type not in
(select f.id_type from files f where f.id_item = i.id)
union all
select i.id item_id,
i.descr item_desc,
c.modee modee,
c.id_type m_type,
f.file_path fp
from items i, checklist c, files f
where c.modee = i.modee
and i.id = 0
and f.id_item = i.id
and f.id_type = c.id_type)
order by item_id asc, m_type asc
Try this:
SELECT
files.file_path,
types.type
FROM files
LEFT JOIN checklist ON (files.id_type = checklist.id_type )
LEFT JOIN items ON (files.id_item = items.id)
AND (items.mode = checklist.mode)
LEFT JOIN types ON (checklist.id_type = types .id)
WHERE (items.id = 0);
I have created and populated your tables, but I see a discrepancy between what you request (for each item) and your example output (for each item type). However, I have created a query based on the output:
;with cte as (
SELECT i.id, f.file_path, f.id_type
from checklist ck
JOIN files f on f.id_type = ck.id_type
JOIN items i on i.id = f.id_item AND i.mode = ck.mode AND i.id = 0
)
SELECT cte.id, cte.file_path, T.id, T.[type]
FROM types T
LEFT JOIN cte on cte.id_type = T.id
[edit]
My result is the following (SQL):
id file_path id type
---------------------------------
0 file1.jpg 0 red
0 file5.jpg 0 red
0 file2.jpg 1 blue
NULL NULL 2 green
0 file4.jpg 3 default
No CTE version (MySQL 5.6 has no CTE support; there, also replace the [type] bracket quoting with backticks):
SELECT cte.id, cte.file_path, T.id, T.[type]
FROM types T
LEFT JOIN (
SELECT i.id, f.file_path, f.id_type
from checklist ck
JOIN files f on f.id_type = ck.id_type
JOIN items i on i.id = f.id_item AND i.mode = ck.mode AND i.id = 0
) cte on cte.id_type = T.id
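For reference, here is one way to get exactly the two result tables requested in the question, using the original table and column names. It is a sketch, not one of the answers above. It starts from `items`, expands each item into its checklist entries, and only then LEFT JOINs `files` on both the item and the type, so a checklist entry with no file survives as a NULL row. The question's RIGHT JOIN loses those rows because the later INNER JOIN on `files.id_item = items.id` discards every row where `files` is NULL. The SQL runs here through Python's built-in sqlite3 module as a stand-in for MySQL, and the `checklist_status` helper is a hypothetical name introduced for the demo.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; data is the sample from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items      (id INTEGER PRIMARY KEY, descr TEXT, modus TEXT);
CREATE TABLE files      (id INTEGER PRIMARY KEY, file_path TEXT,
                         id_item INTEGER, id_type INTEGER);
CREATE TABLE file_types (id INTEGER PRIMARY KEY, file_type TEXT);
CREATE TABLE checklist  (id INTEGER PRIMARY KEY, id_type INTEGER, modus TEXT);
INSERT INTO items VALUES (1,'First','M'),(2,'Second','P'),(3,'Third','M'),(4,'Fourth','M');
INSERT INTO files VALUES (1,'file1.jpg',1,1),(2,'file2.jpg',1,2),(3,'file3.jpg',2,1),
                         (4,'file4.jpg',1,4),(5,'file5.jpg',1,1);
INSERT INTO file_types VALUES (1,'red'),(2,'blue'),(3,'green'),(4,'default');
INSERT INTO checklist VALUES (1,1,'M'),(2,2,'M'),(3,3,'M'),(4,4,'M'),(5,1,'P'),(6,4,'P');
""")

CHECKLIST_SQL = """
    SELECT f.id_item, f.file_path, ft.id AS id_type, ft.file_type
    FROM items i
    JOIN checklist ck  ON ck.modus = i.modus          -- required types for this item
    JOIN file_types ft ON ft.id = ck.id_type
    LEFT JOIN files f  ON f.id_item = i.id            -- files this item actually has
                      AND f.id_type = ck.id_type
    WHERE i.id = ?
    ORDER BY ft.id, f.id
"""

def checklist_status(item_id):
    """Return one row per collected file, plus a NULL row per missing type."""
    return conn.execute(CHECKLIST_SQL, (item_id,)).fetchall()

for row in checklist_status(1):
    print(row)
# (1, 'file1.jpg', 1, 'red')
# (1, 'file5.jpg', 1, 'red')
# (1, 'file2.jpg', 2, 'blue')
# (None, None, 3, 'green')
# (1, 'file4.jpg', 4, 'default')
```

The filter on `i.id` is safe in the WHERE clause because `items` is on the preserved side of the outer join; any condition on `files` has to stay in the ON clause.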

Reduce MySQL database to eliminate duplicates

I have a table called "lane" with the following properties.
CREATE TABLE `lane` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`origRegion` varchar(45) NOT NULL,
`origState` char(2) NOT NULL,
`destRegion` varchar(45) NOT NULL,
`destState` char(2) NOT NULL,
PRIMARY KEY (`id`)
)
There are duplicate rows in this table on the following columns: origState, origRegion, destState, destRegion. I'd like to select all rows joined to the min(id) of the first occurrence.
For example, with data:
1 ALL MA ALL OH
2 ALL MA ALL OH
3 ALL MA ALL OH
and a SQL similar to this (which misses all the duplicate rows):
select l.*, l2.count, l2.minId from tmpLane l
JOIN (SELECT id, min(ID) as minId from tmpLane
GROUP BY origRegion, origState, destRegion, destState) l2 on l.id = l2.id;
Result (note the count and minId at the end):
1 ALL MA ALL OH 3 1
2 ALL MA ALL OH 3 1
3 ALL MA ALL OH 3 1
Note, that the query used above is an adaptation of the solution here (which doesn't work in this situation)
SELECT ID,
origRegion,
origState,
destRegion,
destState,
(SELECT COUNT(*)
FROM Lane l3
WHERE l.origRegion = l3.origRegion
and l.origState = l3.origState
and l.destRegion = l3.destRegion
and l.destState = l3.destState) as 'Count',
(SELEcT MIN(ID)
FROM Lane l2
WHERE l.origRegion = l2.origRegion
and l.origState = l2.origState
and l.destRegion = l2.destRegion
and l.destState = l2.destState) as minID
FROM lane l
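The same result can be produced with a single grouped derived table joined back on the four duplicate-defining columns, which avoids running two correlated subqueries per row. This is a sketch under assumptions: it runs through Python's built-in sqlite3 module as a stand-in for MySQL, the first three rows are the sample from the question, and the fourth row (a non-duplicate) is invented to show that unique lanes get a count of 1.

```python
import sqlite3

# In-memory SQLite stands in for MySQL. Row 4 is invented for the demo.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lane (
    id         INTEGER PRIMARY KEY,
    origRegion TEXT NOT NULL,
    origState  TEXT NOT NULL,
    destRegion TEXT NOT NULL,
    destState  TEXT NOT NULL
);
INSERT INTO lane VALUES
    (1, 'ALL', 'MA', 'ALL', 'OH'),
    (2, 'ALL', 'MA', 'ALL', 'OH'),
    (3, 'ALL', 'MA', 'ALL', 'OH'),
    (4, 'ALL', 'NY', 'ALL', 'OH');
""")

# Group once to get the size and lowest id of each duplicate set, then join
# every row to its group on the grouping columns (not on id, which would
# keep only one row per group).
rows = conn.execute("""
    SELECT l.*, g.cnt, g.minId
    FROM lane l
    JOIN (
        SELECT origRegion, origState, destRegion, destState,
               COUNT(*) AS cnt, MIN(id) AS minId
        FROM lane
        GROUP BY origRegion, origState, destRegion, destState
    ) g ON  l.origRegion = g.origRegion AND l.origState = g.origState
        AND l.destRegion = g.destRegion AND l.destState = g.destState
    ORDER BY l.id
""").fetchall()

for row in rows:
    print(row)
# (1, 'ALL', 'MA', 'ALL', 'OH', 3, 1)
# (2, 'ALL', 'MA', 'ALL', 'OH', 3, 1)
# (3, 'ALL', 'MA', 'ALL', 'OH', 3, 1)
# (4, 'ALL', 'NY', 'ALL', 'OH', 1, 4)
```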
You can run this query to remove all the duplicate rows from your table:
ALTER IGNORE TABLE `lane`
ADD UNIQUE INDEX (`origRegion`, `origState`, `destRegion`, `destState`);
This adds a unique index to your table, removes all duplicate rows, and ensures that no duplicate rows are inserted in the future. Note that ALTER IGNORE TABLE was removed in MySQL 5.7.4, so this only works on earlier versions.

Query with multiple left joins - points column value is incorrect

I have the following database structure, and I am trying to run a single query that will show classrooms and how many students are part of the classroom, and how many rewards a classroom has allocated out, as well as how many points allocated to a single classroom (based on the classroom_id column).
Using the query at the very bottom I am trying to collect the 'totalPoints' that a classroom has assigned - based on counting the points column in the classroom_redeemed_codes table and return this as a single integer.
For some reason the values are incorrect for the totalPoints - I am doing something wrong but not sure what...
-- UPDATE --
Here is the sqlfiddle:-
http://sqlfiddle.com/#!2/a9f45
My Structure:
CREATE TABLE `organisation_classrooms` (
`classroom_id` int(11) NOT NULL AUTO_INCREMENT,
`title` varchar(255) NOT NULL,
`active` tinyint(1) NOT NULL,
`organisation_id` int(11) NOT NULL,
`period` int(1) DEFAULT '0',
`classroom_bg` int(2) DEFAULT '3',
`sortby` varchar(6) NOT NULL DEFAULT 'points',
`sound` int(1) DEFAULT '0',
PRIMARY KEY (`classroom_id`)
);
CREATE TABLE organisation_classrooms_myusers (
`classroom_id` int(11) NOT NULL,
`user_id` bigint(11) unsigned NOT NULL
);
CREATE TABLE `classroom_redeemed_codes` (
`redeemed_code_id` int(11) NOT NULL AUTO_INCREMENT,
`myuser_id` bigint(11) unsigned NOT NULL DEFAULT '0',
`ssuser_id` bigint(11) NOT NULL DEFAULT '0',
`classroom_id` int(11) NOT NULL,
`order_product_id` int(11) NOT NULL DEFAULT '0',
`order_product_images_id` int(11) NOT NULL DEFAULT '0',
`date_redeemed` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`points` int(11) NOT NULL,
`type` int(1) NOT NULL DEFAULT '0',
`notified` int(1) NOT NULL DEFAULT '0',
`inactive` tinyint(3) NOT NULL,
PRIMARY KEY (`redeemed_code_id`)
);
SELECT
t.classroom_id,
title,
COALESCE (
COUNT(DISTINCT r.redeemed_code_id),
0
) AS totalRewards,
COALESCE (
COUNT(DISTINCT ocm.user_id),
0
) AS totalStudents,
COALESCE (sum(r.points), 0) AS totalPoints
FROM
`organisation_classrooms` `t`
LEFT OUTER JOIN classroom_redeemed_codes r ON (
r.classroom_id = t.classroom_id
AND r.inactive = 0
AND (
r.date_redeemed >= 1393286400
OR r.date_redeemed = 0
)
)
LEFT OUTER JOIN organisation_classrooms_myusers ocm ON (
ocm.classroom_id = t.classroom_id
)
WHERE
t.organisation_id =37383
GROUP BY title
ORDER BY t.classroom_id ASC
LIMIT 10
-- EDIT --
Oops! I hate SQL sometimes... I have made a big mistake: I am trying to count the number of STUDENTS in the classroom_redeemed_codes table rather than the organisation_classrooms_myusers table. I'm really sorry, I should have picked that up sooner!
classroom_id | totalUniqueStudents
16 1
17 2
46 1
51 1
52 1
There are 7 rows in the classroom_redeemed_codes table, but classroom_id 46 has two rows with the same myuser_id (this is the student id), so it should appear as one unique student.
Does this make sense? Essentially trying to grab the number of unique students in the classroom_redeemed_codes tables based on the myuser_id column.
e.g a classroom id 46 could have 100 rows in the classroom_redeemed_codes tables, but if it is the same myuser_id for each this should show the totalUniqueStudents count as 1 and not 100.
Let me know if this isn't clear....
-- update --
I have the following query, borrowed from a user below, which seems to work... (my head hurts). I'll accept the answer again. Sorry for the confusion, I think I was just overthinking this somewhat.
select crc.classroom_id,
COUNT(DISTINCT crc.myuser_id) AS users,
COUNT( DISTINCT crc.redeemed_code_id ) AS classRewards,
SUM( crc.points ) as classPoints, t.title
from classroom_redeemed_codes crc
JOIN organisation_classrooms t
ON crc.classroom_id = t.classroom_id
AND t.organisation_id = 37383
where crc.inactive = 0
AND ( crc.date_redeemed >= 1393286400
OR crc.date_redeemed = 0 )
group by crc.classroom_id
I ran it by first doing a pre-query aggregate of your points per specific class, then left-joining to it. I am getting more rows in the result set than your sample expected, but I don't have MySQL to test/confirm directly. However, here is a SQLFiddle of your query. Your original query sums the points after joining the users table, which produces a Cartesian result per classroom, and that is probably what duplicates the points. By pre-querying on the redeemed codes alone, you just grab that value, then join to users.
SELECT
t.classroom_id,
title,
COALESCE ( r.classRewards, 0 ) AS totalRewards,
COALESCE ( r.classPoints, 0) AS totalPoints,
COALESCE ( r.uniqStudents, 0 ) as totalUniqRedeemStudents,
COALESCE ( COUNT(DISTINCT ocm.user_id), 0 ) AS totalStudents
FROM
organisation_classrooms t
LEFT JOIN ( select crc.classroom_id,
COUNT( DISTINCT crc.redeemed_code_id ) AS classRewards,
COUNT( DISTINCT crc.myuser_id ) as uniqStudents,
SUM( crc.points ) as classPoints
from classroom_redeemed_codes crc
JOIN organisation_classrooms t
ON crc.classroom_id = t.classroom_id
AND t.organisation_id = 37383
where crc.inactive = 0
AND ( crc.date_redeemed >= 1393286400
OR crc.date_redeemed = 0 )
group by crc.classroom_id ) r
ON t.classroom_id = r.classroom_id
LEFT OUTER JOIN organisation_classrooms_myusers ocm
ON t.classroom_id = ocm.classroom_id
WHERE
t.organisation_id = 37383
GROUP BY
title
ORDER BY
t.classroom_id ASC
LIMIT 10
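The pre-aggregation idea can be taken one step further by aggregating both child tables in their own derived tables, so that neither join can multiply the other's rows and no outer GROUP BY is needed. The sketch below illustrates this under assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL, the tables are cut down to the columns that matter, the organisation, inactive and date filters are left out, and the sample rows mirror the classrooms #16, #17 and #46 shown in the result sets elsewhere in this thread. The aliases `s` and `classStudents` are invented for the demo.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; schema is reduced to the relevant columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE organisation_classrooms (classroom_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE organisation_classrooms_myusers (classroom_id INTEGER, user_id INTEGER);
CREATE TABLE classroom_redeemed_codes (
    redeemed_code_id INTEGER PRIMARY KEY, classroom_id INTEGER, points INTEGER);
INSERT INTO organisation_classrooms VALUES (16,'BLUE'),(17,'GREEN'),(46,'PINK');
INSERT INTO organisation_classrooms_myusers VALUES
    (16,1),(16,2),(17,508826),(17,508835);
INSERT INTO classroom_redeemed_codes VALUES
    (5,46,250),(6,46,100),(7,16,50),(8,17,25),(9,17,75);
""")

# Each derived table yields at most one row per classroom, so the outer
# query is a plain 1:1 join and the totals cannot be inflated.
totals = conn.execute("""
    SELECT t.classroom_id, t.title,
           IFNULL(r.classRewards, 0)  AS totalRewards,
           IFNULL(r.classPoints, 0)   AS totalPoints,
           IFNULL(s.classStudents, 0) AS totalStudents
    FROM organisation_classrooms t
    LEFT JOIN (
        SELECT classroom_id, COUNT(*) AS classRewards, SUM(points) AS classPoints
        FROM classroom_redeemed_codes
        GROUP BY classroom_id
    ) r ON r.classroom_id = t.classroom_id
    LEFT JOIN (
        SELECT classroom_id, COUNT(DISTINCT user_id) AS classStudents
        FROM organisation_classrooms_myusers
        GROUP BY classroom_id
    ) s ON s.classroom_id = t.classroom_id
    ORDER BY t.classroom_id
""").fetchall()

for row in totals:
    print(row)
# (16, 'BLUE', 1, 50, 2)
# (17, 'GREEN', 2, 100, 2)
# (46, 'PINK', 2, 350, 0)
```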
You need sum(r.points) and a subquery in the left outer join; see below:
SELECT
t.classroom_id,
title,
COALESCE (
COUNT(DISTINCT r.redeemed_code_id),
0
) AS totalRewards,
COALESCE(sum(r.points),0) AS totalPoints
,COALESCE(MAX(T1.cnt),0) as totalStudents -- MAX, not SUM: T1.cnt repeats on every joined reward row
FROM
`organisation_classrooms` `t`
left outer join (select classroom_id, count(user_id) cnt
from organisation_classrooms_myusers
group by classroom_id) T1 on (T1.classroom_id=t.classroom_id)
LEFT OUTER JOIN classroom_redeemed_codes r ON (
r.classroom_id = t.classroom_id
AND r.inactive = 0
AND (
r.date_redeemed >= 1393286400
OR r.date_redeemed = 0
)
)
WHERE
t.organisation_id =37383
GROUP BY title
ORDER BY t.classroom_id ASC
LIMIT 10
I simplified your query; there is no need to use COALESCE together with COUNT() because COUNT() never returns NULL. For SUM() I prefer to use IFNULL() because it is shorter and more readable. The results displayed below contain only the data for classroom_id #16, #17 and #46 for easier comparison with the example provided in the question. The actual result sets are bigger and contain all the classroom_ids present in the tables. However, their presence is not needed to understand how and why it works.
SELECT
t.classroom_id,
t.title,
COUNT(DISTINCT r.redeemed_code_id) AS totalRewards,
COUNT(DISTINCT ocm.user_id) AS totalStudents,
IFNULL(SUM(r.points), 0) AS totalPoints
FROM `organisation_classrooms` t
LEFT JOIN `classroom_redeemed_codes` r
ON r.classroom_id = t.classroom_id
AND r.inactive = 0
AND (r.date_redeemed >= 1393286400 OR r.date_redeemed = 0)
LEFT JOIN `organisation_classrooms_myusers` ocm
ON ocm.classroom_id = t.classroom_id
WHERE t.organisation_id = 37383
GROUP BY t.classroom_id
ORDER BY t.classroom_id ASC
Let's try to split it in pieces and put them together after that. First, let's see what users are selected:
Query #1
SELECT
t.classroom_id,
t.title,
ocm.user_id
FROM `organisation_classrooms` t
LEFT JOIN `organisation_classrooms_myusers` ocm
ON ocm.classroom_id = t.classroom_id
WHERE t.organisation_id = 37383
ORDER BY t.classroom_id ASC
I removed the classroom_redeemed_codes table and its fields, removed GROUP BY, and replaced the aggregate function COUNT(ocm.user_id) with ocm.user_id to see which users are selected.
The result shows us this part of the query is correct:
classroom_id | title | user_id
-------------+-------+--------
16 | BLUE | 2
16 | BLUE | 1
17 | GREEN | 508835
17 | GREEN | 508826
46 | PINK | NULL
There are 2 users in classroom #16, another 2 in #17 and none in classroom #46.
Putting back the GROUP BY clause will make it return the correct values (2, 2, 0) in the totalStudents column.
Let's check now the relationship with table classroom_redeemed_codes:
Query #2
SELECT
t.classroom_id,
t.title,
r.redeemed_code_id, r.points
FROM `organisation_classrooms` t
LEFT JOIN `classroom_redeemed_codes` r
ON r.classroom_id = t.classroom_id
AND r.inactive = 0
AND (r.date_redeemed >= 1393286400 OR r.date_redeemed = 0)
WHERE t.organisation_id = 37383
ORDER BY t.classroom_id ASC
The result is:
classroom_id | title | redeemed_code_id | points
-------------+-------+------------------+-------
16 | BLUE | 7 | 50
17 | GREEN | 8 | 25
17 | GREEN | 9 | 75
46 | PINK | 5 | 250
46 | PINK | 6 | 100
Again, grouping by classroom_id will produce (1, 2, 2) in column totalRewards and (50, 100, 350) in column totalPoints which is correct.
The trouble starts when you want to combine these into a single query. No matter what kind of join you use, for the provided input you will get (2*1, 2*2, 1*2) rows for classroom_id having the values 16, 17 and 46 (in this order). The values multiplied in parentheses are the numbers of rows for each classroom_id in the first and second query result sets above.
Combined
Let's try the query that selects the rows before grouping them:
SELECT
t.classroom_id,
t.title,
r.redeemed_code_id, ocm.user_id, r.points
FROM `organisation_classrooms` t
LEFT JOIN `classroom_redeemed_codes` r
ON r.classroom_id = t.classroom_id
AND r.inactive = 0
AND (r.date_redeemed >= 1393286400 OR r.date_redeemed = 0)
LEFT JOIN `organisation_classrooms_myusers` ocm
ON ocm.classroom_id = t.classroom_id
WHERE t.organisation_id = 37383
ORDER BY t.classroom_id ASC
It returns this result set:
classroom_id | title | redeemed_code_id | user_id | points
-------------+-------+------------------+---------+-------
16 | BLUE | 7 | 2 | 50
16 | BLUE | 7 | 1 | 50 <- *
-------------+-------+------------------+---------+-------
17 | GREEN | 8 | 508835 | 25
17 | GREEN | 8 | 508826 | 25 <- *
17 | GREEN | 9 | 508835 | 75
17 | GREEN | 9 | 508826 | 75 <- *
-------------+-------+------------------+---------+-------
46 | PINK | 5 | NULL | 250
46 | PINK | 6 | NULL | 100
I added horizontal rules to separate the rows that belong to the same group when we add the GROUP BY clause. This is basically the way a SQL query with GROUP BY is executed, no matter the name of the actual software that implements it.
As you can see, for each classroom, it combines all the redeemed codes associated with the classroom with all the users associated with the classroom. If you add more users and redeemed codes for classrooms #16, #17 and #46 in your tables you will get a much larger result set.
The next step on the execution of a GROUP BY query is to produce a single row from each group you see above. There is no problem with columns classroom_id and title, they contain a single value in each group. For the columns redeemed_code_id and user_id your query counts distinct values and that works fine too. The problem is with the addition of points.
If you just SUM() them, you get a redeemed code added for each user_id in the group. If you use SUM(DISTINCT points) it is also wrong because it will ignore the duplicates even when they are different entries in table classroom_redeemed_codes.
What you want is to add points for DISTINCT redeemed_code_id. I marked on the above result set the rows you don't want.
This is not possible using this query because on calculation of the aggregate values each column is independent of the other. We need a query that selects the desired rows before grouping them.
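The inflation is easy to reproduce. The sketch below loads the sample rows shown in the result sets above into an in-memory database and runs the double LEFT JOIN with SUM(). It is only an illustration: Python's built-in sqlite3 module stands in for MySQL, the tables are reduced to the columns involved, and the organisation, inactive and date filters are omitted (all of these are assumptions of the demo, not part of the original query).

```python
import sqlite3

# In-memory SQLite stands in for MySQL; schema is reduced to the relevant columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE organisation_classrooms (classroom_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE organisation_classrooms_myusers (classroom_id INTEGER, user_id INTEGER);
CREATE TABLE classroom_redeemed_codes (
    redeemed_code_id INTEGER PRIMARY KEY, classroom_id INTEGER, points INTEGER);
INSERT INTO organisation_classrooms VALUES (16,'BLUE'),(17,'GREEN'),(46,'PINK');
INSERT INTO organisation_classrooms_myusers VALUES
    (16,1),(16,2),(17,508826),(17,508835);
INSERT INTO classroom_redeemed_codes VALUES
    (5,46,250),(6,46,100),(7,16,50),(8,17,25),(9,17,75);
""")

# Both child tables joined to the parent: codes x users per classroom.
JOINED = """
    FROM organisation_classrooms t
    LEFT JOIN classroom_redeemed_codes r ON r.classroom_id = t.classroom_id
    LEFT JOIN organisation_classrooms_myusers ocm ON ocm.classroom_id = t.classroom_id
"""
GROUPED = "GROUP BY t.classroom_id ORDER BY t.classroom_id"

# Number of joined rows per classroom before aggregation.
rows_per_class = conn.execute(
    "SELECT t.classroom_id, COUNT(*)" + JOINED + GROUPED).fetchall()

# SUM() sees every code once per user, so points are multiplied.
inflated = conn.execute(
    "SELECT t.classroom_id, IFNULL(SUM(r.points), 0)" + JOINED + GROUPED).fetchall()

print(rows_per_class)  # [(16, 2), (17, 4), (46, 2)]
print(inflated)        # [(16, 100), (17, 200), (46, 350)]
```

Classroom #46 happens to come out right only because it has no users, so its codes are not repeated; the correct totals for the three classrooms are 50, 100 and 350.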
An Idea
We can try to add the missing columns (with NULL values) to the two simple queries above, UNION ALL them then select from this and GROUP BY.
First, let's be sure it selects what we need:
SELECT
t.classroom_id,
t.title,
NULL AS redeemed_code_id, ocm.user_id, NULL AS points
FROM `organisation_classrooms` t
LEFT JOIN `organisation_classrooms_myusers` ocm
ON ocm.classroom_id = t.classroom_id
WHERE t.organisation_id = 37383
UNION ALL
SELECT
t.classroom_id,
t.title,
r.redeemed_code_id, NULL AS user_id, r.points
FROM `organisation_classrooms` t
LEFT JOIN `classroom_redeemed_codes` r
ON r.classroom_id = t.classroom_id
AND r.inactive = 0
AND (r.date_redeemed >= 1393286400 OR r.date_redeemed = 0)
WHERE t.organisation_id = 37383
ORDER BY classroom_id
Attention! The ORDER BY clause applies to the UNIONed result set. If you want to order the rows of each SELECT (it doesn't help, because UNION doesn't keep the order), you need to enclose that query in parentheses and put the ORDER BY clause there.
The result set looks great:
classroom_id | title | redeemed_code_id | user_id | points
-------------+-------+------------------+---------+-------
16 | BLUE | NULL | 1 | NULL
16 | BLUE | NULL | 2 | NULL
16 | BLUE | 7 | NULL | 50
-------------+-------+------------------+---------+-------
17 | GREEN | 8 | NULL | 25
17 | GREEN | 9 | NULL | 75
17 | GREEN | NULL | 508826 | NULL
17 | GREEN | NULL | 508835 | NULL
-------------+-------+------------------+---------+-------
46 | PINK | 5 | NULL | 250
46 | PINK | 6 | NULL | 100
46 | PINK | NULL | NULL | NULL
Now we could put parentheses around the query above (stripping the ORDER BY) and use it in another query, grouping the data by classroom_id, counting the users and the redeemed codes and summing their points.
You will get a query that looks awful and, on your current database schema, crawls once your tables have several hundred rows. This is why I will not write it here.
Attention!
Its performance can be improved by adding the missing indexes to your tables, on the fields that appear in the ON, WHERE, ORDER BY and GROUP BY clauses of the query.
It will bring a significant improvement but I won't rely very much on that. For really big tables (hundreds of thousands of rows) it will still crawl.
Another Idea
We can also add GROUP BY on both Query #1 and Query #2 first and UNION ALL them after that:
SELECT
t.classroom_id,
t.title,
NULL AS totalRewards,
COUNT(DISTINCT ocm.user_id) AS totalStudents,
NULL AS totalPoints
FROM `organisation_classrooms` t
LEFT JOIN `organisation_classrooms_myusers` ocm
ON ocm.classroom_id = t.classroom_id
WHERE t.organisation_id = 37383
GROUP BY t.classroom_id
UNION ALL
SELECT
t.classroom_id,
t.title,
COUNT(DISTINCT redeemed_code_id) AS totalRewards,
NULL AS totalStudents,
SUM(points) AS totalPoints
FROM `organisation_classrooms` t
LEFT JOIN `classroom_redeemed_codes` r
ON r.classroom_id = t.classroom_id
AND r.inactive = 0
AND (r.date_redeemed >= 1393286400 OR r.date_redeemed = 0)
WHERE t.organisation_id = 37383
GROUP BY t.classroom_id
ORDER BY classroom_id, totalRewards
This produces a nice result set:
classroom_id | title | totalRewards | totalStudents | totalPoints
-------------+-------+--------------+---------------+-------------
16 | BLUE | NULL | 2 | NULL
16 | BLUE | 1 | NULL | 50
17 | GREEN | NULL | 2 | NULL
17 | GREEN | 2 | NULL | 100
46 | PINK | NULL | 0 | NULL
46 | PINK | 2 | NULL | 350
This query can be embedded in another query that groups by classroom_id and SUM()s the total columns above to get the final result. But again, the final query is big and ugly and it doesn't run very fast for large tables. And again, this is the reason I am not writing it here.
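For readers who want to see the shape of that wrapping query anyway, here is a minimal sketch. It is not a recommendation. It runs against an in-memory SQLite database through Python's sqlite3 module, which is an assumption made only so the example is runnable; the SQL is meant to carry over to MySQL. The rows are reconstructed from the result sets above, and the date_redeemed values are made up. I also added title to the GROUP BY clauses so the query does not depend on MySQL's relaxed GROUP BY handling.

```python
import sqlite3

# In-memory SQLite stands in for MySQL so the sketch is runnable (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE organisation_classrooms (
    classroom_id INTEGER PRIMARY KEY, title TEXT, organisation_id INTEGER);
CREATE TABLE organisation_classrooms_myusers (classroom_id INTEGER, user_id INTEGER);
CREATE TABLE classroom_redeemed_codes (
    redeemed_code_id INTEGER PRIMARY KEY, classroom_id INTEGER,
    points INTEGER, inactive INTEGER, date_redeemed INTEGER);
-- Rows reconstructed from the result sets above; date_redeemed values are made up.
INSERT INTO organisation_classrooms VALUES
    (16, 'BLUE', 37383), (17, 'GREEN', 37383), (46, 'PINK', 37383);
INSERT INTO organisation_classrooms_myusers VALUES
    (16, 1), (16, 2), (17, 508826), (17, 508835);
INSERT INTO classroom_redeemed_codes VALUES
    (7, 16, 50, 0, 1393300000), (8, 17, 25, 0, 0), (9, 17, 75, 0, 1393300000),
    (5, 46, 250, 0, 0), (6, 46, 100, 0, 1393300000);
""")

# The outer query collapses the two rows per classroom into one.
# SUM() ignores NULLs, so each total comes from the half that computed it.
FINAL_QUERY = """
SELECT classroom_id, title,
       SUM(totalRewards)  AS totalRewards,
       SUM(totalStudents) AS totalStudents,
       SUM(totalPoints)   AS totalPoints
FROM (
    SELECT t.classroom_id, t.title,
           NULL AS totalRewards,
           COUNT(DISTINCT ocm.user_id) AS totalStudents,
           NULL AS totalPoints
    FROM organisation_classrooms t
    LEFT JOIN organisation_classrooms_myusers ocm
           ON ocm.classroom_id = t.classroom_id
    WHERE t.organisation_id = 37383
    GROUP BY t.classroom_id, t.title
    UNION ALL
    SELECT t.classroom_id, t.title,
           COUNT(DISTINCT r.redeemed_code_id) AS totalRewards,
           NULL AS totalStudents,
           SUM(r.points) AS totalPoints
    FROM organisation_classrooms t
    LEFT JOIN classroom_redeemed_codes r
           ON r.classroom_id = t.classroom_id
          AND r.inactive = 0
          AND (r.date_redeemed >= 1393286400 OR r.date_redeemed = 0)
    WHERE t.organisation_id = 37383
    GROUP BY t.classroom_id, t.title
) AS u
GROUP BY classroom_id, title
ORDER BY classroom_id
"""

rows = conn.execute(FINAL_QUERY).fetchall()
for row in rows:
    print(row)
```

On this tiny data set it returns one row per classroom, but the derived table still has to be materialised before the outer GROUP BY runs, which is where the cost comes from on large tables.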
Conclusion
It can be done in a single query but it doesn't look good and it doesn't work well on large tables.
Regarding performance: put EXPLAIN in front of your query, then check the values in the type, key and Extra columns of the result. See the documentation for an explanation of the possible values of these columns, what to aim for and what to avoid.
The queries I wrote for both ideas produce joins of type range or ALL and show Using filesort in the Extra column (all of these are slow). Using them as subqueries in bigger queries will not improve the way they are executed; on the contrary.
I recommend running the individual SELECT queries from the last code example as two separate queries; they return the odd and the even rows of the result set above. Then combine their results in the client code. It will run faster this way.
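A rough sketch of that client-side combination, assuming a Python client. SQLite (via the standard sqlite3 module) stands in for MySQL here, the function name classroom_totals is hypothetical, and the rows are reconstructed from the result sets above with made-up date_redeemed values. With a MySQL driver the placeholder syntax would differ.

```python
import sqlite3

# In-memory SQLite stands in for MySQL so the sketch is runnable (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE organisation_classrooms (
    classroom_id INTEGER PRIMARY KEY, title TEXT, organisation_id INTEGER);
CREATE TABLE organisation_classrooms_myusers (classroom_id INTEGER, user_id INTEGER);
CREATE TABLE classroom_redeemed_codes (
    redeemed_code_id INTEGER PRIMARY KEY, classroom_id INTEGER,
    points INTEGER, inactive INTEGER, date_redeemed INTEGER);
-- Rows reconstructed from the result sets above; date_redeemed values are made up.
INSERT INTO organisation_classrooms VALUES
    (16, 'BLUE', 37383), (17, 'GREEN', 37383), (46, 'PINK', 37383);
INSERT INTO organisation_classrooms_myusers VALUES
    (16, 1), (16, 2), (17, 508826), (17, 508835);
INSERT INTO classroom_redeemed_codes VALUES
    (7, 16, 50, 0, 1393300000), (8, 17, 25, 0, 0), (9, 17, 75, 0, 1393300000),
    (5, 46, 250, 0, 0), (6, 46, 100, 0, 1393300000);
""")

STUDENTS_SQL = """
SELECT t.classroom_id, t.title, COUNT(DISTINCT ocm.user_id)
FROM organisation_classrooms t
LEFT JOIN organisation_classrooms_myusers ocm
       ON ocm.classroom_id = t.classroom_id
WHERE t.organisation_id = :org
GROUP BY t.classroom_id, t.title
"""

REWARDS_SQL = """
SELECT t.classroom_id, COUNT(DISTINCT r.redeemed_code_id), SUM(r.points)
FROM organisation_classrooms t
LEFT JOIN classroom_redeemed_codes r
       ON r.classroom_id = t.classroom_id
      AND r.inactive = 0
      AND (r.date_redeemed >= :since OR r.date_redeemed = 0)
WHERE t.organisation_id = :org
GROUP BY t.classroom_id
"""


def classroom_totals(conn, organisation_id, since):
    """Run the two aggregate queries separately and merge them by classroom_id."""
    totals = {}
    for classroom_id, title, students in conn.execute(
            STUDENTS_SQL, {"org": organisation_id}):
        totals[classroom_id] = {"title": title, "totalStudents": students,
                                "totalRewards": 0, "totalPoints": None}
    # Both queries start from organisation_classrooms with the same WHERE,
    # so every classroom_id seen here already has an entry in totals.
    for classroom_id, rewards, points in conn.execute(
            REWARDS_SQL, {"org": organisation_id, "since": since}):
        totals[classroom_id].update(totalRewards=rewards, totalPoints=points)
    return totals


totals = classroom_totals(conn, 37383, 1393286400)
for classroom_id, data in sorted(totals.items()):
    print(classroom_id, data)
```

The merge is a plain dictionary lookup per row, so its cost is linear in the number of classrooms, and each SELECT stays simple enough for the optimizer to use an index on classroom_id if one exists.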

Validating presence of value(s) in a (sub)table and return a "boolean" result

I want to create a query in MySQL on an order table that verifies whether the order has a booking_id; if it does not, a booking_id should be available on all related rows in the invoice table.
I want the value returned to be a boolean in a single field.
Given the example below:
Case of id #1: I expect an immediate true, because the order itself has a booking_id.
Case of id #2: I expect a "delayed" false from the invoice table, because not all related invoices have a booking_id. It should only return true if invoice id #3 actually had a booking_id, meaning that all invoices have a booking_id when the order does not.
I've tried several ways but still failed and don't even know what the best way to tackle this is.
Thanks for your input in advance!
Table order:
|----+------------+
| id | booking_id |
|----+------------+
| 1 | 123 |
| 2 | NULL |
|----+------------+
Table invoice:
+----+----------+------------+
| id | order_id | booking_id |
+----+----------+------------+
| 1 | 1 | 123 |
| 2 | 2 | 124 |
| 3 | 2 | NULL |
+----+----------+------------+
Schema
CREATE TABLE IF NOT EXISTS `invoice` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`order_id` int(11) NOT NULL,
`booking_id` int(11) DEFAULT NULL,
PRIMARY KEY (`id`)
);
CREATE TABLE IF NOT EXISTS `order` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`booking_id` int(11) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
If I understand you correctly, this is the base query for your request:
SELECT
O.id
, SUM(CASE WHEN I.booking_id IS NOT NULL THEN 1 ELSE 0 END) AS booked_count
, COUNT(1) AS total_count
, CASE WHEN SUM(CASE WHEN I.booking_id IS NOT NULL THEN 1 ELSE 0 END) = COUNT(1) THEN 1 ELSE 0 END AS has_all_bookings
FROM
`order` O
LEFT JOIN invoice I
ON O.id = I.order_id
GROUP BY
O.id
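If you also want to detect orders that have no record at all in the invoice table, add a count to the last CASE statement as an additional condition. Because of the LEFT JOIN, such an order still produces one joined row, so count a column of the invoice table instead of COUNT(1): COUNT(I.id) = 0.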
Fiddle Demo
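To make the difference between the counts visible, here is a small sketch using Python's sqlite3 module in place of MySQL (an assumption made so it runs anywhere). Orders 1 and 2 and the invoices come from the question; order 3, which has no invoices, is hypothetical and added only for illustration. COUNT(I.booking_id) is used as a shorter equivalent of the SUM(CASE ...) expression, since COUNT skips NULLs.

```python
import sqlite3

# In-memory SQLite stands in for MySQL so the sketch is runnable (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE `order` (id INTEGER PRIMARY KEY, booking_id INTEGER);
CREATE TABLE invoice (
    id INTEGER PRIMARY KEY, order_id INTEGER NOT NULL, booking_id INTEGER);
-- Order 3 is hypothetical: it has no invoices at all.
INSERT INTO `order` VALUES (1, 123), (2, NULL), (3, NULL);
INSERT INTO invoice VALUES (1, 1, 123), (2, 2, 124), (3, 2, NULL);
""")

rows = conn.execute("""
SELECT O.id,
       COUNT(1)            AS joined_rows,    -- counts the NULL-extended row too
       COUNT(I.id)         AS invoice_count,  -- counts real invoices only
       COUNT(I.booking_id) AS booked_count    -- counts invoices with a booking_id
FROM `order` O
LEFT JOIN invoice I ON O.id = I.order_id
GROUP BY O.id
ORDER BY O.id
""").fetchall()

for row in rows:
    print(row)
```

For order 3 the join still yields one row, so joined_rows is 1 while invoice_count is 0; comparing booked_count against invoice_count rather than COUNT(1) keeps the "no invoices" case distinguishable.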
I have not understood how the logic should work out when the order is booked but some of the invoices are not. I'll presume either condition is good enough for a true value (OR logic). I'd avoid COUNT and GROUP BY and go for a subselect, which works fine in MySQL (I'm using an old 5.1.73-1 version).
This query gives you both values in distinct columns:
SELECT o.*
, (booking_id IS NOT NULL) AS order_booked
, (NOT EXISTS (SELECT id FROM `invoice` WHERE order_id=o.id AND booking_id IS NULL)) AS invoices_all_booked
FROM `order` o
Of course you can combine the values:
SELECT o.*
, (booking_id IS NOT NULL OR NOT EXISTS (SELECT id FROM `invoice` WHERE order_id=o.id AND booking_id IS NULL)) AS booked
FROM `order` o
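As a quick check of the combined expression against the sample data from the question, here is a sketch that runs it through Python's sqlite3 module; SQLite is an assumption standing in for MySQL. Order 3 and invoice 4 are hypothetical rows added to cover the case where the order has no booking_id but all of its invoices do.

```python
import sqlite3

# In-memory SQLite stands in for MySQL so the sketch is runnable (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE `order` (id INTEGER PRIMARY KEY, booking_id INTEGER);
CREATE TABLE invoice (
    id INTEGER PRIMARY KEY, order_id INTEGER NOT NULL, booking_id INTEGER);
-- Order 3 and invoice 4 are hypothetical additions to the question's data.
INSERT INTO `order` VALUES (1, 123), (2, NULL), (3, NULL);
INSERT INTO invoice VALUES (1, 1, 123), (2, 2, 124), (3, 2, NULL), (4, 3, 125);
""")

rows = conn.execute("""
SELECT o.id,
       (o.booking_id IS NOT NULL
        OR NOT EXISTS (SELECT id FROM invoice
                       WHERE order_id = o.id AND booking_id IS NULL)) AS booked
FROM `order` o
ORDER BY o.id
""").fetchall()

for order_id, booked in rows:
    print(order_id, bool(booked))
```

Note that NOT EXISTS is also true for an order without any invoices, so such an order would count as booked; add an EXISTS check on the invoice table if that is not what you want.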
Here you go, create a view that does it:
create view booked_view as
select `order`.id as order_id
,
case when `order`.booking_id > 0 then true
-- true only if no invoice of this order is missing a booking_id
when not exists (SELECT id FROM invoice WHERE order_id=`order`.id AND invoice.booking_id IS NULL) then true
else false
end as booked
from `order` ;
Then just join your view to the order table and you will have your boolean column 'booked':
select o.id, booked from `order` o
join booked_view on (o.id = booked_view.order_id)