How to split SQL query results into columns based on two WHERE conditions and two calculated COUNT fields? - sql-server-2008

I have the following (simplified) database schema:
Persons:
[Id] [Name]
-------------------
1 'Peter'
2 'John'
3 'Anna'
Items:
[Id] [ItemName] [ItemStatus]
-------------------
10 'Cake' 1
20 'Dog' 2
ItemDocuments:
[Id] [ItemId] [DocumentName] [Date]
-------------------
101 10 'CakeDocument1' '2016-01-01 00:00:00'
201 20 'DogDocument1' '2016-02-02 00:00:00'
301 10 'CakeDocument2' '2016-03-03 00:00:00'
401 20 'DogDocument2' '2016-04-04 00:00:00'
DocumentProcessors:
[PersonId] [DocumentId]
-------------------
1 101
1 201
2 301
I have also set up an SQL fiddle to play with: http://www.sqlfiddle.com/#!3/e6082
The relation logic is the following: every Person can work on zero or more ItemDocuments (many-to-many); each ItemDocument belongs to exactly one Item (one-to-many). An Item has a status: 1 - Active, 2 - Closed.
What I need is a report that fulfills the following requirements:
for each person in Persons table, display count of Items that have ItemDocuments related to this person
the counts should be split in two columns by ItemStatus
the query should be filterable by two optional date periods (using two BETWEEN conditions on ItemDocuments.Date field) and the Item counts should also be split into two periods
if a Person does not have any ItemDocuments assigned, it still should be shown in the results with all count values set to 0
if a Person has more than one ItemDocument for an Item, the Item still should be counted only once
Essentially, here is how the results should look if I set both periods to NULL (to read all the data):
[PersonName] [Active Items for period 1] [Closed Items for period 1] [Active Items for period 2] [Closed Items for period 2]
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
'Peter' 1 1 1 1
'John' 1 0 1 0
'Anna' 0 0 0 0
While I can write an SQL query for each requirement separately, I have trouble understanding how to combine all of them into one.
For example, I can split ItemStatus counts in two columns using
COUNT(CASE WHEN t.ItemStatus = 1 THEN 1 ELSE NULL END) AS Active,
COUNT(CASE WHEN t.ItemStatus = 2 THEN 1 ELSE NULL END) AS Closed
and I can filter by two periods (with max/min date constants from MS SQL server specification to avoid NULLs for optional period dates) using
between coalesce(@start1, '1753-01-01') and coalesce(@end1, '9999-12-31')
between coalesce(@start2, '1753-01-01') and coalesce(@end2, '9999-12-31')
but how to combine all of this together, considering also JOINs between tables?
Is there any technique, join or MS SQL Server specific approach to do this in an efficient way?
My first attempt seems to work as required, but it is ugly: the same subquery is duplicated four times:
DECLARE @start1 DATETIME, @start2 DATETIME, @end1 DATETIME, @end2 DATETIME
-- SET @start2 = '2017-01-01'
SELECT
p.Name,
(SELECT COUNT(1)
FROM Items i
WHERE i.ItemStatus = 1 AND EXISTS(
SELECT 1
FROM DocumentProcessors AS dcp
INNER JOIN ItemDocuments AS idc ON dcp.DocumentId = idc.Id
WHERE dcp.PersonId = p.Id AND idc.ItemId = i.Id
AND idc.Date BETWEEN COALESCE(@start1, '1753-01-01') AND COALESCE(@end1, '9999-12-31')
)
) AS Active1,
(SELECT COUNT(*)
FROM Items i
WHERE i.ItemStatus = 2 AND EXISTS(
SELECT 1
FROM DocumentProcessors AS dcp
INNER JOIN ItemDocuments AS idc ON dcp.DocumentId = idc.Id
WHERE dcp.PersonId = p.Id AND idc.ItemId = i.Id
AND idc.Date BETWEEN COALESCE(@start1, '1753-01-01') AND COALESCE(@end1, '9999-12-31')
)
) AS Closed1,
(SELECT COUNT(1)
FROM Items i
WHERE i.ItemStatus = 1 AND EXISTS(
SELECT 1
FROM DocumentProcessors AS dcp
INNER JOIN ItemDocuments AS idc ON dcp.DocumentId = idc.Id
WHERE dcp.PersonId = p.Id AND idc.ItemId = i.Id
AND idc.Date BETWEEN COALESCE(@start2, '1753-01-01') AND COALESCE(@end2, '9999-12-31')
)
) AS Active2,
(SELECT COUNT(*)
FROM Items i
WHERE i.ItemStatus = 2 AND EXISTS(
SELECT 1
FROM DocumentProcessors AS dcp
INNER JOIN ItemDocuments AS idc ON dcp.DocumentId = idc.Id
WHERE dcp.PersonId = p.Id AND idc.ItemId = i.Id
AND idc.Date BETWEEN COALESCE(@start2, '1753-01-01') AND COALESCE(@end2, '9999-12-31')
)
) AS Closed2
FROM Persons p

I'm not absolutely sure if I really got what you want, but you might try this:
WITH AllData AS
(
SELECT p.Id AS PersonId
,p.Name AS Person
,id.Date AS DocDate
,id.DocumentName AS DocName
,i.ItemName AS ItemName
,i.ItemStatus AS ItemStatus
,CASE WHEN id.Date BETWEEN COALESCE(@start1, '1753-01-01') AND COALESCE(@end1, '9999-12-31') THEN 1 ELSE 0 END AS InPeriod1
,CASE WHEN id.Date BETWEEN COALESCE(@start2, '1753-01-01') AND COALESCE(@end2, '9999-12-31') THEN 1 ELSE 0 END AS InPeriod2
FROM Persons AS p
LEFT JOIN DocumentProcessors AS dp ON p.Id=dp.PersonId
LEFT JOIN ItemDocuments AS id ON dp.DocumentId=id.Id
LEFT JOIN Items AS i ON id.ItemId=i.Id
)
SELECT PersonID
,Person
,COUNT(CASE WHEN ItemStatus = 1 AND InPeriod1 = 1 THEN 1 ELSE NULL END) AS ActiveIn1
,COUNT(CASE WHEN ItemStatus = 2 AND InPeriod1 = 1 THEN 1 ELSE NULL END) AS ClosedIn1
,COUNT(CASE WHEN ItemStatus = 1 AND InPeriod2 = 1 THEN 1 ELSE NULL END) AS ActiveIn2
,COUNT(CASE WHEN ItemStatus = 2 AND InPeriod2 = 1 THEN 1 ELSE NULL END) AS ClosedIn2
FROM AllData
GROUP BY PersonID,Person
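One caveat on the query above: it counts one per joined document row, so a person with two documents for the same item would get that item counted twice, which the question asks to avoid. Below is a rough sketch of a variant that counts distinct item ids instead. It is not the answerer's code: it runs on SQLite through Python's sqlite3 module rather than on SQL Server (an assumption made only so the example is runnable), it uses named parameters in place of the T-SQL variables, and the ItemId column in the CTE is my addition.

```python
import sqlite3

# In-memory SQLite database loaded with the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Persons (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Items (Id INTEGER PRIMARY KEY, ItemName TEXT, ItemStatus INTEGER);
CREATE TABLE ItemDocuments (Id INTEGER PRIMARY KEY, ItemId INTEGER,
                            DocumentName TEXT, Date TEXT);
CREATE TABLE DocumentProcessors (PersonId INTEGER, DocumentId INTEGER);
INSERT INTO Persons VALUES (1, 'Peter'), (2, 'John'), (3, 'Anna');
INSERT INTO Items VALUES (10, 'Cake', 1), (20, 'Dog', 2);
INSERT INTO ItemDocuments VALUES
  (101, 10, 'CakeDocument1', '2016-01-01 00:00:00'),
  (201, 20, 'DogDocument1',  '2016-02-02 00:00:00'),
  (301, 10, 'CakeDocument2', '2016-03-03 00:00:00'),
  (401, 20, 'DogDocument2',  '2016-04-04 00:00:00');
INSERT INTO DocumentProcessors VALUES (1, 101), (1, 201), (2, 301);
""")

# Same shape as the CTE answer, but COUNT(DISTINCT ItemId) so that an item
# is counted once per person no matter how many documents link to it.
REPORT_SQL = """
WITH AllData AS (
  SELECT p.Id AS PersonId, p.Name AS Person, i.Id AS ItemId, i.ItemStatus,
         CASE WHEN d.Date BETWEEN COALESCE(:start1, '1753-01-01')
                              AND COALESCE(:end1, '9999-12-31')
              THEN 1 ELSE 0 END AS InPeriod1,
         CASE WHEN d.Date BETWEEN COALESCE(:start2, '1753-01-01')
                              AND COALESCE(:end2, '9999-12-31')
              THEN 1 ELSE 0 END AS InPeriod2
  FROM Persons AS p
  LEFT JOIN DocumentProcessors AS dp ON dp.PersonId = p.Id
  LEFT JOIN ItemDocuments AS d ON d.Id = dp.DocumentId
  LEFT JOIN Items AS i ON i.Id = d.ItemId
)
SELECT Person,
       COUNT(DISTINCT CASE WHEN ItemStatus = 1 AND InPeriod1 = 1 THEN ItemId END),
       COUNT(DISTINCT CASE WHEN ItemStatus = 2 AND InPeriod1 = 1 THEN ItemId END),
       COUNT(DISTINCT CASE WHEN ItemStatus = 1 AND InPeriod2 = 1 THEN ItemId END),
       COUNT(DISTINCT CASE WHEN ItemStatus = 2 AND InPeriod2 = 1 THEN ItemId END)
FROM AllData
GROUP BY PersonId, Person
ORDER BY PersonId
"""


def report(start1=None, end1=None, start2=None, end2=None):
    """Return (name, active1, closed1, active2, closed2) per person.

    A bound of None means that side of the period is open.
    """
    params = {"start1": start1, "end1": end1, "start2": start2, "end2": end2}
    return conn.execute(REPORT_SQL, params).fetchall()


if __name__ == "__main__":
    for row in report():
        print(row)
```

Persons without documents survive the LEFT JOINs with NULL item columns, and COUNT ignores NULL, so they come out with zeros in every column.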

Related

MySQL query slow in Where MONTH(datetime)

I tried adding an index on the datetime column, but the query is still just as slow.
SELECT s.id, s.player,
COUNT(case when dg.winner = 1 AND dp.colour <= 5 then 1 when dg.winner = 2 AND dp.colour > 5 then 1 else null end) as totalwin,
COUNT(case when dg.winner = 2 AND dp.colour <= 5 then 1 when dg.winner = 1 AND dp.colour > 5 then 1 else null end) as totallose,
COUNT(dg.winner) as totalgames
FROM dotaplayers AS dp
LEFT JOIN gameplayers AS gp ON gp.gameid = dp.gameid and dp.colour = gp.colour
LEFT JOIN stats AS s ON s.player_lower = gp.name
LEFT JOIN dotagames AS dg ON dg.gameid = dp.gameid
LEFT JOIN games AS g ON g.id = dp.gameid
LEFT JOIN bans as b ON b.name=gp.name
WHERE MONTH(g.datetime) = 4
GROUP by gp.name
ORDER BY totalwin DESC LIMIT 0,10
Showing rows 0 - 9 (10 total, Query took 7.7552 seconds.)
I want to list the players with the most wins in the 4th month (April), showing id, username, totalwins, totallose, totaldraw, totalgames. The CASE expressions in my query are how I compute those counts. The result is correct, but the query is slow.
Assuming g.datetime is indexed, try this instead:
WHERE g.`datetime` BETWEEN 20150401000000 AND 20150430235959
Using the MONTH function, or any other function, on the field data in the WHERE eliminates the benefits of any indexes you might have on those fields; this results in the query requiring a full scan of the values in the table.
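The effect can be seen in a query plan. The sketch below uses SQLite through Python's sqlite3 module, not MySQL, so the plan wording and the strftime call are SQLite-specific assumptions; the games table is cut down to two columns and filled with made-up rows. It only illustrates the general point that a function wrapped around the column forces a scan, while a plain range on the column can use the index.

```python
import sqlite3

# Simplified stand-in for the games table, with an index on datetime.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE games (id INTEGER PRIMARY KEY, datetime TEXT)")
conn.execute("CREATE INDEX idx_games_datetime ON games (datetime)")
conn.executemany(
    "INSERT INTO games (datetime) VALUES (?)",
    [("2015-%02d-15 12:00:00" % month,) for month in range(1, 13)],
)

# Function applied to the column: the index cannot be used for a lookup.
BY_FUNCTION = (
    "SELECT id FROM games "
    "WHERE CAST(strftime('%m', datetime) AS INTEGER) = 4"
)
# Plain range on the bare column: the index can be searched.
BY_RANGE = (
    "SELECT id FROM games "
    "WHERE datetime >= '2015-04-01' AND datetime < '2015-05-01'"
)


def plan(sql):
    """Return the detail text of SQLite's EXPLAIN QUERY PLAN for sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[3] for row in rows)


if __name__ == "__main__":
    print("function on column:", plan(BY_FUNCTION))
    print("range on column:   ", plan(BY_RANGE))
```

Both queries return the same rows; only the access path differs. On MySQL the equivalent check would be to run EXPLAIN on each form of the WHERE clause.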
Rearranging the order of JOINs will probably help as well:
SELECT s.id, s.player
, SUM(case
when dg.winner = 1 AND dp.colour <= 5 then 1
when dg.winner = 2 AND dp.colour > 5 then 1
else 0
end
) as totalwin
, SUM(case
when dg.winner = 2 AND dp.colour <= 5 then 1
when dg.winner = 1 AND dp.colour > 5 then 1
else 0
end
) as totallose
, COUNT(dg.winner) as totalgames -- Not sure of the nature of dg.`winner`; a SUM might be more appropriate here as well.
FROM games AS g
INNER JOIN dotaplayers AS dp ON g.id = dp.gameid
LEFT JOIN gameplayers AS gp ON gp.gameid = dp.gameid and dp.colour = gp.colour
LEFT JOIN stats AS s ON s.player_lower = gp.name
LEFT JOIN dotagames AS dg ON dg.gameid = dp.gameid
LEFT JOIN bans as b ON b.name=gp.name
WHERE g.`datetime` BETWEEN 20150401000000 AND 20150430235959
GROUP by gp.name
ORDER BY totalwin DESC
LIMIT 0,10
;
Another thing to note: Depending on the relationship between tables, some of the intermediate joins may result in effectively multiplying the resulting totals; this can be resolved by doing the sums in subqueries and joining those instead.
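To make the multiplication concrete, here is a small sketch with two hypothetical tables, results and bans (the names, columns and rows are invented for illustration and are not the poster's schema). It runs on SQLite through Python's sqlite3 module. One player has three results and two bans, so joining both tables before aggregating doubles the win total; aggregating each table in its own subquery first avoids that.

```python
import sqlite3

# Hypothetical data: 'ann' has 3 results (2 wins) and 2 bans.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE results (player TEXT, won INTEGER);
CREATE TABLE bans (player TEXT, reason TEXT);
INSERT INTO results VALUES ('ann', 1), ('ann', 1), ('ann', 0);
INSERT INTO bans VALUES ('ann', 'afk'), ('ann', 'leaver');
""")

# Joining first: every result row is repeated once per ban row,
# so the SUM sees each win twice.
NAIVE = """
SELECT r.player, SUM(r.won) AS wins
FROM results AS r
LEFT JOIN bans AS b ON b.player = r.player
GROUP BY r.player
"""

# Aggregating first: each subquery yields one row per player,
# so the join cannot multiply anything.
PRE_AGGREGATED = """
SELECT w.player, w.wins, COALESCE(b.ban_count, 0) AS ban_count
FROM (SELECT player, SUM(won) AS wins
      FROM results GROUP BY player) AS w
LEFT JOIN (SELECT player, COUNT(*) AS ban_count
           FROM bans GROUP BY player) AS b
  ON b.player = w.player
"""

naive = conn.execute(NAIVE).fetchall()
fixed = conn.execute(PRE_AGGREGATED).fetchall()

if __name__ == "__main__":
    print("joined then summed:", naive)
    print("summed then joined:", fixed)
```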

Complex querying on table with multiple userids

I have a table like this:
score
id week status
1 1 0
2 1 1
3 1 0
4 1 0
1 2 0
2 2 1
3 2 0
4 2 0
1 3 1
2 3 1
3 3 1
4 3 0
I want to get the ids of all people who have a status of zero for every week except week 3, something like this:
Result:
id w1.status w2.status w3.status
1 0 0 1
3 0 0 1
I have this query, but it is terribly inefficient on larger datasets.
SELECT w1.id, w1.status, w2.status, w3.status
FROM
(SELECT s.id, s.status
FROM score s
WHERE s.week = 1) w1
LEFT JOIN
(SELECT s.id, s.status
FROM score s
WHERE s.week = 2) w2 ON w1.id=w2.id
LEFT JOIN
(SELECT s.id, s.status
FROM score s
WHERE s.week = 3) w3 ON w1.id=w3.id
WHERE w1.status=0 AND w2.status=0 AND w3.status=1
I am looking for a more efficient way to calculate the above.
select id
from score
where week in (1, 2, 3)
group by id
having sum(
case
when week in (1, 2) and status = 0 then 1
when week = 3 and status = 1 then 1
else 0
end
) = 3
Or more generically...
select id
from score
group by id
having
sum(case when status = 0 then 1 else 0 end) = count(*) - 1
and min(case when status = 1 then week else null end) = max(week)
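As a quick check, both queries can be run against the sample table from the question. This sketch uses SQLite through Python's sqlite3 module (my choice, just to have something runnable; the question does not say which database is used) and adds an ORDER BY so the output order is fixed.

```python
import sqlite3

# The score table exactly as listed in the question: (id, week, status).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE score (id INTEGER, week INTEGER, status INTEGER)")
conn.executemany(
    "INSERT INTO score VALUES (?, ?, ?)",
    [
        (1, 1, 0), (2, 1, 1), (3, 1, 0), (4, 1, 0),
        (1, 2, 0), (2, 2, 1), (3, 2, 0), (4, 2, 0),
        (1, 3, 1), (2, 3, 1), (3, 3, 1), (4, 3, 0),
    ],
)

# Fixed weeks: a row scores 1 when it matches the wanted pattern,
# and an id qualifies when all three of its weeks match.
FIXED_WEEKS = """
select id
from score
where week in (1, 2, 3)
group by id
having sum(
  case
    when week in (1, 2) and status = 0 then 1
    when week = 3 and status = 1 then 1
    else 0
  end
) = 3
order by id
"""

# Generic: every week but one has status 0, and the only week
# with status 1 is the latest week for that id.
GENERIC = """
select id
from score
group by id
having
  sum(case when status = 0 then 1 else 0 end) = count(*) - 1
  and min(case when status = 1 then week else null end) = max(week)
order by id
"""


def ids(sql):
    """Run sql and return the id column as a plain list."""
    return [row[0] for row in conn.execute(sql)]


if __name__ == "__main__":
    print(ids(FIXED_WEEKS))
    print(ids(GENERIC))
```

Both return ids 1 and 3 for the sample data, matching the expected result in the question, and each reads the table once instead of joining it to itself three times.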
You can do this using NOT EXISTS:
select
t1.id,
'0' as `w1_status` ,
'0' as `w2_status`,
'1' as `w3_status`
from score t1
where
t1.week = 3
and t1.status = 1
and not exists(
select 1 from score t2
where t1.id = t2.id and t1.week <> t2.week and t2.status = 1
);
For better performance you can add an index to the table:
alter table score add index week_status_idx (week,status);
With a static number of weeks (1-3), GROUP_CONCAT may be used as a hack.
Concept:
SELECT
id,
GROUP_CONCAT(status ORDER BY week ASC) AS totalStatus
/* wanted pattern: w1=0, w2=0, w3=1 */
FROM
score
WHERE
week IN (1, 2, 3)
GROUP BY
id
HAVING
totalStatus = '0,0,1' /* an alias can be filtered in HAVING, not in WHERE */
(Written on the go. Not tested)
SELECT p1.id, p1.status, p2.status, p3.status
FROM score p1
JOIN score p2 ON p1.id = p2.id
JOIN score p3 ON p2.id = p3.id
WHERE p1.week = 1
AND p1.status = 0
AND p2.week = 2
AND p2.status = 0
AND p3.week = 3
AND p3.status = 1
Try this, should work

How to use user variable as counter with inner join queries that contains GROUP BY statement?

I have 2 tables odds and matches :
matches : has match_id and match_date
odds : has id, timestamp, result, odd_value, user_id, match_id
I had a query that get the following information from those tables for each user:
winnings : the winning bets for each user. (when odds.result = 1)
loses : the lost bets for each user.(when odds.result != 1)
points : the points of each user.(the sum of the odds.odd_value) for each user.
bonus : for every 5 consecutive winnings I want to add an extra bonus to this variable (for each user).
How to calculate bonus?
I tried to use this query and ran into a problem (you can check it here: SQL Fiddle):
the calculated bonuses are not right for all the users:
first user: (winnings: 13, bonus = 2).
second user: (winnings: 8, bonus = 2); the bonus here should be 1.
third user: (winnings: 14, bonus = 3); the bonus here should be 2.
Why does the query not calculate the bonus correctly?
select d.user_id,
sum(case when d.result = 1 then 1 else 0 end) as winnings,
sum(case when d.result = 2 then 1 else 0 end) as loses,
sum(case when d.result = 1 then d.odd_value else 0 end) as points,
f.bonus
FROM odds d
INNER JOIN
(
SELECT
user_id,SUM(CASE WHEN F1=5 THEN 1 ELSE 0 END) AS bonus
FROM
(
SELECT
user_id,
CASE WHEN result=1 and @counter<5 THEN @counter:=@counter+1 WHEN result=1 and @counter=5 THEN @counter:=1 ELSE @counter:=0 END AS F1
FROM odds o
cross join (SELECT @counter:=0) AS t
INNER JOIN matches mc on mc.match_id = o.match_id
WHERE MONTH(STR_TO_DATE(mc.match_date, '%Y-%m-%d')) = 2 AND
YEAR(STR_TO_DATE(mc.match_date, '%Y-%m-%d')) = 2015 AND
(YEAR(o.timestamp)=2015 AND MONTH(o.timestamp) = 02)
) Temp
group by user_id
)as f on f.user_id = d.user_id
group by d.user_id
I am not sure how your result relates to the matches table;
you can add the WHERE / INNER JOIN clause back if you need it.
Here is a link to the fiddle, and here is the query from the last iteration, updated according to your comments:
SET @user:=0;
select d.user_id,
sum(case when d.result = 1 then 1 else 0 end) as winnings,
sum(case when d.result = 2 then 1 else 0 end) as loses,
sum(case when d.result = 1 then d.odd_value else 0 end) as points,
f.bonus
FROM odds d
INNER JOIN
(
SELECT
user_id,SUM(bonus) AS bonus
FROM
(
SELECT
user_id,
CASE WHEN result=1 and @counter<5 AND @user=user_id THEN @counter:=@counter+1
WHEN result=1 and @counter=5 AND @user=user_id THEN @counter:=1
WHEN result=1 and @user<>user_id THEN @counter:=1
ELSE
@counter:=0
END AS F1,
@user:=user_id,
CASE WHEN @counter=5 THEN 1 ELSE 0 END AS bonus
FROM odds o
ORDER BY user_id , match_id
) Temp
group by user_id
)as f on f.user_id = d.user_id
group by d.user_id
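For anyone who finds the user-variable logic hard to follow, here is the same counting rule written as a plain Python sketch. It mirrors what the query above does as far as I can tell: rows are ordered by user and match, the counter restarts for each user, a win moves the counter up to 5 and then back to 1, anything else resets it to 0, and a bonus is earned each time the counter reaches 5. The function names and the (user_id, match_id, result) row layout are mine, chosen for illustration only.

```python
from itertools import groupby
from operator import itemgetter


def count_bonuses(results):
    """Count one bonus for every run of 5 consecutive wins (result == 1)."""
    counter = 0
    bonus = 0
    for result in results:
        if result == 1:
            # After a full run of 5, the next win starts a new run at 1.
            counter = 1 if counter == 5 else counter + 1
            if counter == 5:
                bonus += 1
        else:
            # Any non-winning result breaks the streak.
            counter = 0
    return bonus


def bonuses_per_user(rows):
    """rows: iterable of (user_id, match_id, result) tuples.

    Rows are sorted by user and match first, so each user's streak is
    counted on its own and never continues from the previous user.
    """
    ordered = sorted(rows, key=itemgetter(0, 1))
    return {
        user: count_bonuses(row[2] for row in group)
        for user, group in groupby(ordered, key=itemgetter(0))
    }


if __name__ == "__main__":
    sample = [(1, m, 1) for m in range(1, 11)] + [(2, m, 1) for m in range(11, 15)]
    print(bonuses_per_user(sample))
```

The original query went wrong on exactly the two points this makes explicit: it never reset the counter when the user changed, and it had no ORDER BY, so rows from different users could feed the same streak.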

mysql sort group by total and name not working

I have a PHP program that outputs names with the events each person attended and the number of times each event was attended over a period of time. An example of the output:
Name | Run | Swim | Bike | Total
John | 3   | 2    | 5    | 10
The MySQL query looks something like this:
$sql = 'SELECT
e.name as Leader,
SUM(CASE WHEN c.catid = 26 THEN 1 ELSE null END) as "Swim",
SUM(CASE WHEN c.catid = 25 THEN 1 ELSE null END) as "Bike",
SUM(CASE WHEN c.catid = 24 THEN 1 ELSE null END) as "Run",
COUNT("Swim"+"Bike"+"Run") as total
FROM
events as e
LEFT JOIN event_categories as c ON c.uid = e.uid
WHERE
(DATE(e.event_start) BETWEEN "'.$from_date.'" and "'.$to_date.'")
GROUP BY Leader WITH ROLLUP;';
This works well. However, when I try to sort the data by "total" in descending order by replacing the last GROUP BY line with the following, I get no output:
GROUP BY total DESC, Leader WITH ROLLUP;';
What I want is a listing from the highest total to the lowest, with people who have the same total listed in alphabetical order. What am I doing wrong?
As mentioned in the comments, ORDER BY and ROLLUP cannot be used together. This is stated here (http://dev.mysql.com/doc/refman/5.0/en/group-by-modifiers.html), about halfway down the page. To get around this, do the ORDER BY in an outer query, with your original query acting as the subquery:
SELECT *
FROM
(
SELECT
e.name as Leader,
SUM(CASE WHEN c.catid = 26 THEN 1 ELSE null END) as "Swim",
SUM(CASE WHEN c.catid = 25 THEN 1 ELSE null END) as "Bike",
SUM(CASE WHEN c.catid = 24 THEN 1 ELSE null END) as "Run",
COUNT("Swim"+"Bike"+"Run") as total
FROM
events as e
LEFT JOIN event_categories as c ON c.uid = e.uid
WHERE
(DATE(e.event_start) BETWEEN "'.$from_date.'" and "'.$to_date.'")
GROUP BY Leader WITH ROLLUP
) as rolldup
ORDER BY Total DESC
ORIGINAL (WRONG) ANSWER:
You do not put Sorts in a GROUP BY clause. You put them in your ORDER BY clause:
$sql = 'SELECT
e.name as Leader,
SUM(CASE WHEN c.catid = 26 THEN 1 ELSE null END) as "Swim",
SUM(CASE WHEN c.catid = 25 THEN 1 ELSE null END) as "Bike",
SUM(CASE WHEN c.catid = 24 THEN 1 ELSE null END) as "Run",
COUNT("Swim"+"Bike"+"Run") as total
FROM
events as e
LEFT JOIN event_categories as c ON c.uid = e.uid
WHERE
(DATE(e.event_start) BETWEEN "'.$from_date.'" and "'.$to_date.'")
GROUP BY Leader WITH ROLLUP
ORDER BY total DESC;';
You don't want to GROUP BY Total you just want to ORDER BY total.
So the two last lines of your query should be
GROUP BY Leader WITH ROLLUP
ORDER BY total DESC

MySQL select subqueries

This is what I have at the moment.
$db =& JFactory::getDBO();
$query = $db->getQuery(true);
$query->select('`#__catalog_commit`.`id` as id, `#__catalog_commit`.`date` as date, COUNT(`#__catalog_commit_message`.`commit_id`) as count,
(SELECT COUNT(`#__catalog_commit_message`.`type`) as count_notice FROM `#__catalog_commit_message` WHERE `#__catalog_commit_message`.`type` = 1 GROUP BY `#__catalog_commit_message`.`type`) as count_notice,
(SELECT COUNT(`#__catalog_commit_message`.`type`) as count_warning FROM `#__catalog_commit_message` WHERE `#__catalog_commit_message`.`type` = 2 GROUP BY `#__catalog_commit_message`.`type`) as count_warning,
(SELECT COUNT(`#__catalog_commit_message`.`type`) as count_error FROM `#__catalog_commit_message` WHERE `#__catalog_commit_message`.`type` = 3 GROUP BY `#__catalog_commit_message`.`type`) as count_error');
$query->from('#__catalog_commit_message');
$query->leftjoin('`#__catalog_commit` ON `#__catalog_commit`.`id` = `#__catalog_commit_message`.`commit_id`');
$query->group('`#__catalog_commit_message`.`commit_id`');
$query->order('`#__catalog_commit`.`id` DESC');
What I have is 2 tables with the following structures:
catalog_commit
==============
id
date
catalog_commit_message
======================
id
commit_id
type
message
Basically I want the count of each type of message per commit. What I have now actually selects every row (which is expected), but I'm looking for a way (nicer, if possible) to get the count per message type within the query.
EDIT: Just wanted to add that it's a JModelList.
From what I gather, this should be your query:
SELECT c.id
,c.date
,count(cm.commit_id) as ct_total
,sum(CASE WHEN cm.type = 1 THEN 1 ELSE 0 END) AS count_notice
,sum(CASE WHEN cm.type = 2 THEN 1 ELSE 0 END) AS count_warning
,sum(CASE WHEN cm.type = 3 THEN 1 ELSE 0 END) AS count_error
FROM catalog_commit c
LEFT JOIN catalog_commit_message cm ON cm.commit_id = c.id
GROUP BY c.id, c.date
ORDER BY c.id DESC
You had the order of your tables reversed in the LEFT JOIN. Also, the subqueries in the SELECT list were not tied to the commit, so each one returned the same overall count for every row.
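Here is a small runnable sketch of that query. It uses SQLite through Python's sqlite3 module instead of MySQL/Joomla, plain table names without the #__ prefix, and a few invented rows (the dates and messages are made up for the example), so treat it as an illustration of the pattern and not as a drop-in for a JModelList.

```python
import sqlite3

# Hypothetical sample data: commit 1 has two notices and one error,
# commit 2 has one warning, commit 3 has no messages at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE catalog_commit (id INTEGER PRIMARY KEY, date TEXT);
CREATE TABLE catalog_commit_message (
  id INTEGER PRIMARY KEY, commit_id INTEGER, type INTEGER, message TEXT);
INSERT INTO catalog_commit VALUES
  (1, '2013-01-01'), (2, '2013-01-02'), (3, '2013-01-03');
INSERT INTO catalog_commit_message (commit_id, type, message) VALUES
  (1, 1, 'first notice'), (1, 1, 'second notice'), (1, 3, 'an error'),
  (2, 2, 'a warning');
""")

# One pass over the join; each SUM(CASE ...) counts a single message type.
COUNTS_SQL = """
SELECT c.id
      ,c.date
      ,count(cm.commit_id) AS ct_total
      ,sum(CASE WHEN cm.type = 1 THEN 1 ELSE 0 END) AS count_notice
      ,sum(CASE WHEN cm.type = 2 THEN 1 ELSE 0 END) AS count_warning
      ,sum(CASE WHEN cm.type = 3 THEN 1 ELSE 0 END) AS count_error
FROM catalog_commit c
LEFT JOIN catalog_commit_message cm ON cm.commit_id = c.id
GROUP BY c.id, c.date
ORDER BY c.id DESC
"""

counts = conn.execute(COUNTS_SQL).fetchall()

if __name__ == "__main__":
    for row in counts:
        print(row)
```

Because the commit table is on the left side of the join, a commit without any messages still shows up, with zeros in all four count columns.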