Complex MySQL COUNT query

Evening folks,
I have a complex MySQL COUNT query I am trying to perform and am looking for the best way to do it.
In our system, we have References. Each Reference can have many (or no) Income Sources, each of which can be validated or not (status). We have a Reference table and an Income table; each row in the Income table points back to Reference via reference_id.
On our 'Awaiting' page (the screen that shows each Income that is yet to be validated), we show it grouped by Reference. So you may, for example, see Mr John Smith has 3 Income Sources.
We want it to show something like "2 of 3 Validated" beside each row.
My problem is writing the query that figures this out!
What I have been trying to do is this, using a combination of PHP and MySQL to bridge the gap where SQL (or my knowledge) falls short:
First, select a COUNT of the number of incomes associated with each reference:
SELECT `reference_id`, COUNT(status) AS status_count
FROM (`income`)
WHERE `income`.`status` = 0
GROUP BY `reference_id`
Next, having used PHP to generate a WHERE IN clause, proceed to COUNT the number of confirmed references from these:
SELECT `reference_id`, COUNT(status) AS status_count
FROM (`income`)
WHERE `reference_id` IN ('8469', '78969', '126613', ..... etc
AND status = 1
GROUP BY `reference_id`
However, this doesn't work: it returns 0 rows.
Is there any way to achieve what I'm after?
Thanks!

In MySQL, you can SUM() a boolean expression to get a count of the rows where that expression is true. This works because MySQL treats true as the integer 1 and false as the integer 0.
SELECT `reference_id`,
SUM(`status` = 1) AS `validated_count`,
COUNT(*) AS `total_count`
FROM `income`
GROUP BY `reference_id`
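A minimal sketch of this technique, run through Python's built-in sqlite3 module rather than MySQL (SQLite also evaluates a comparison to 1 or 0, so SUM(status = 1) behaves the same way). The sample rows are hypothetical, and building the "2 of 3 Validated" label in application code is an assumption about how the page is rendered.

```python
import sqlite3

# Hypothetical sample data: reference 1 has 3 incomes (2 validated),
# reference 2 has a single unvalidated income.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE income (id INTEGER PRIMARY KEY, reference_id INTEGER, status INTEGER)")
conn.executemany(
    "INSERT INTO income (reference_id, status) VALUES (?, ?)",
    [(1, 1), (1, 1), (1, 0), (2, 0)],
)

# SUM(status = 1) adds 1 for each validated row and 0 otherwise;
# COUNT(*) counts every income row in the group.
rows = conn.execute(
    """
    SELECT reference_id,
           SUM(status = 1) AS validated_count,
           COUNT(*)        AS total_count
    FROM income
    GROUP BY reference_id
    ORDER BY reference_id
    """
).fetchall()

labels = {ref: f"{validated} of {total} Validated" for ref, validated, total in rows}
print(labels)  # {1: '2 of 3 Validated', 2: '0 of 1 Validated'}
```

If the Awaiting page should list only references that still have something to validate, adding HAVING SUM(status = 0) > 0 after the GROUP BY would drop fully validated references; whether that matches the page's requirements is an assumption.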

Related

MS Access count query does not produce wanted results

I have a table (tblExam) holding exam score data, designed as follows:
Exam Name: String
Score: Number (percent)
Basically, I am trying to pull the records by exam name where the score is less than a specific value (0.695 in my case).
I am using the following statement to get the results:
SELECT DISTINCTROW tblExam.name, Count(tblExam.name) AS CountOfName
FROM tblExam WHERE (((tblExam.Score)<0.695))
GROUP BY tblExam.name;
This works fine, but it does not display the exams that have 0 records below 0.695; in other words, I am getting this:
Exam Name | Count
firstExam | 2
secondExam | 1
thirdExam | 3
Exams with a count of 0, that is, any exams whose scores are all above 0.695, do not show up. What I would like is something like this:
Exam Name | Count
firstExam | 2
secondExam | 1
thirdExam | 3
fourthExam | 0
fifthExam | 0
sixthExam | 2
...
I hope that I am making sense here. I think that I need some kind of LEFT JOIN to display all of the exam names, but I cannot come up with the proper syntax.
It seems you want to display all name groups and, within each group, the count of Score < 0.695. So I think you should move the < 0.695 test from the WHERE clause into the Count() expression, and remove the WHERE clause entirely.
SELECT
e.name,
Count(IIf(e.Score < 0.695, 1, Null)) AS CountOfName
FROM tblExam AS e
GROUP BY e.name;
That works because Count() counts only non-Null values. You could use Sum() instead of Count() if that seems clearer:
Sum(IIf(e.Score < 0.695, 1, 0)) AS CountOfName
Note that DISTINCTROW is not useful in a GROUP BY query, because the grouping already makes the rows unique. So I removed DISTINCTROW from the query.
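The same idea can be tried outside Access with a small sketch in Python's sqlite3 module. SQLite is only a stand-in here: IIf() is replaced by a standard CASE expression with no ELSE branch, which likewise yields NULL for passing scores, so Count() skips them. The table contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblExam (name TEXT, Score REAL)")
conn.executemany(
    "INSERT INTO tblExam VALUES (?, ?)",
    [
        ("firstExam", 0.50), ("firstExam", 0.60), ("firstExam", 0.90),
        ("fourthExam", 0.80), ("fourthExam", 0.95),
    ],
)

# No WHERE clause, so every exam name forms a group. The CASE yields NULL
# for scores of 0.695 or more, and COUNT() ignores NULLs, so an exam with
# no failing scores still appears with a count of 0.
rows = conn.execute(
    """
    SELECT name,
           COUNT(CASE WHEN Score < 0.695 THEN 1 END) AS CountOfName
    FROM tblExam
    GROUP BY name
    ORDER BY name
    """
).fetchall()
print(rows)  # [('firstExam', 2), ('fourthExam', 0)]
```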
Do I detect a contradiction? The query calls for results <0.695, but your text says you are also looking for results >0.695. Perhaps I don't understand. Does this give you what you are looking for?
SELECT DISTINCTROW tblExam.ExamName, Count(tblExam.ExamName) AS CountOfExamName
FROM tblExam
WHERE (((tblExam.Score)<0.695 Or (tblExam.Score)>0.695))
GROUP BY tblExam.ExamName;

Can SQL query do this?

I have a table "audit" with a "description" column, a "record_id" column and a "record_date" column. I want to select only those records where the description matches one of two possible strings (say, LIKE "NEW%" OR LIKE "ARCH%") and where the record_id of the two matching rows is the same. I then need to calculate the difference in days between the record_date values of those two rows.
For instance, my table may contain:
id | description | record_id | record_date
1 | New Sub | 1000 | 04/14/13
2 | Mod | 1000 | 04/14/13
3 | Archived | 1000 | 04/15/13
4 | New Sub | 1001 | 04/13/13
I would want to select only rows 1 and 3 and then calculate the number of days between 4/15 and 4/14 to determine how long it took to go from New to Archived for that record (1000). Both a New and an Archived entry must be present for any record for it to be counted (I don't care about ones that haven't been archived). Does this make sense and is it possible to calculate this in a SQL query? I don't know much beyond basic SQL.
I am using MySQL Workbench to do this.
The following is untested, but it should work, assuming that any given record_id can show up only once with "New Sub" and once with "Archived":
select n.id as new_id
,a.id as archive_id
,record_id
,n.record_date as new_date
,a.record_date as archive_date
,DateDiff(a.record_date, n.record_date) as days_between
from audit n
join audit a using(record_id)
where n.description = 'New Sub'
and a.description = 'Archived';
I changed from OR to AND, because I thought you wanted the number of days only for records that were actually archived.
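A minimal self-join sketch using Python's sqlite3 module and the sample rows from the question. SQLite has no DATEDIFF, so the difference of julianday() values stands in for MySQL's DATEDIFF(a.record_date, n.record_date). Storing the dates as ISO YYYY-MM-DD text is an assumption made so SQLite can parse them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE audit (id INTEGER PRIMARY KEY, description TEXT, "
    "record_id INTEGER, record_date TEXT)"
)
conn.executemany(
    "INSERT INTO audit VALUES (?, ?, ?, ?)",
    [
        (1, "New Sub", 1000, "2013-04-14"),
        (2, "Mod", 1000, "2013-04-14"),
        (3, "Archived", 1000, "2013-04-15"),
        (4, "New Sub", 1001, "2013-04-13"),
    ],
)

# Join the table to itself: n supplies the "new" row, a the "archived" row.
# The inner join drops record 1001 because it has no archived row.
rows = conn.execute(
    """
    SELECT n.id, a.id, n.record_id,
           CAST(julianday(a.record_date) - julianday(n.record_date) AS INTEGER) AS days_between
    FROM audit AS n
    JOIN audit AS a ON a.record_id = n.record_id
    WHERE n.description LIKE 'NEW%'
      AND a.description LIKE 'ARCH%'
    """
).fetchall()
print(rows)  # [(1, 3, 1000, 1)]
```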
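My test was in SQL Server, so the syntax might need to be tweaked slightly for yours (especially the DATEDIFF function: MySQL's version takes no unit argument and returns the first date minus the second in days, i.e. DATEDIFF(arc.record_date, newsub.record_date)). The idea is that you can select from the same table twice, one side grabbing the 'new' rows and the other grabbing the 'archived' rows, and then link them by record_id: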
SELECT
newsub.id,
newsub.description,
newsub.record_date,
arc.id,
arc.description,
arc.record_date,
DATEDIFF(day, newsub.record_date, arc.record_date) AS DaysBetween
FROM
audit arc
, audit newsub
WHERE
(newsub.description LIKE 'NEW%')
AND
(arc.description LIKE 'ARC%')
AND
(newsub.record_id = arc.record_id)

How can I return a row for each date, even when there is no data for that date (in which case the row should be filled with zeros)?

I hope I will be able to make my problem clear.
I have a table called tweets from which I want to extract information for each date in the daterange table. This table holds 142 dates, of which 102 have the property trading (a day on which the market was open) set to 1 (trading=1).
The query below extracts information from the tweets table for 20 companies (identified by sp100_id). The expected result set therefore contains 20 x 102 = 2,040 rows. However, I only get 1,987 rows back, because for some date-company combinations the tweets table holds no data. I need these "empty days" to be included in the result set. I thought I could accomplish this by using COALESCE(X, 0), returning 0 when there is no data, but the result is the same: 1,987 rows.
Based on this information and the query below, does anybody know how I can get it to return 102 rows (1 row for each daterange._date with trading=1) for each sp100_id in the tweets table?
SELECT
sp100.sp100_id,
daterange._date,
COALESCE(SUM(IF(tweets.classify1=2, tweets.`retweet_count`, 0)),0) AS `pos-retweet`,
COALESCE(SUM(IF(tweets.classify1=2, tweets.`user-quality`, 0)),0) AS `pos-quality`,
COALESCE(SUM(IF(tweets.classify1=2, tweets.`follow`, 0)),0) AS `pos-follow`,
COALESCE(SUM(IF(tweets.classify1=3, tweets.`retweet_count`, 0)),0) AS `neg-retweet`,
COALESCE(SUM(IF(tweets.classify1=3, tweets.`user-quality`, 0)),0) AS `neg-quality`,
COALESCE(SUM(IF(tweets.classify1=3, tweets.`follow`, 0)),0) AS `neg-follow`
FROM
sp100
CROSS JOIN
daterange
LEFT JOIN
tweets
ON tweets.nyse_date = daterange._date
AND tweets.sp100_id = sp100.sp100_id
WHERE sp100.sp100_id BETWEEN 1 AND 20 AND tweets.type != 1 AND daterange.trading = 1
GROUP BY
sp100.sp100_id, daterange._date
In any other case, I would provide you with a SQLFiddle, but it would be a lot of work to export a proper portion of the tables used to SQLFiddle while the solution might be clear to some real SQL guru anyway :-)
The problem comes from requiring that tweets.type != 1 in your WHERE clause.
For the dates that have no associated tweets, the outer join will result in all tweets columns, including tweets.type, being NULL. As documented under Working with NULL Values:
Because the result of any arithmetic comparison with NULL is also NULL, you cannot obtain any meaningful results from such comparisons.
In MySQL, 0 or NULL means false and anything else means true. The default truth value from a boolean operation is 1.
Therefore such records are filtered by your WHERE clause.
As @Martin Smith commented, you can move this filter criterion into the ON clause of your outer join (so that the test is performed only against actual tweets records rather than simulated NULL ones).
Alternatively, you could rewrite the filter to handle NULL. For example, using the NULL-safe equality operator:
NOT tweets.type <=> 1
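The difference between filtering in WHERE and in ON can be reproduced with a scaled-down, hypothetical version of the schema in Python's sqlite3 module (two companies, two trading days, one missing company/date pair). SQLite has no <=> operator; its IS operator is its NULL-safe equality and stands in for <=> in the third query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sp100 (sp100_id INTEGER);
    CREATE TABLE daterange (_date TEXT, trading INTEGER);
    CREATE TABLE tweets (sp100_id INTEGER, nyse_date TEXT, type INTEGER, retweet_count INTEGER);
    INSERT INTO sp100 VALUES (1), (2);
    INSERT INTO daterange VALUES ('2013-01-02', 1), ('2013-01-03', 1);
    -- Company 2 has no tweets on 2013-01-03.
    INSERT INTO tweets VALUES (1, '2013-01-02', 0, 5), (1, '2013-01-03', 0, 2),
                              (2, '2013-01-02', 0, 7);
    """
)

QUERY = """
    SELECT s.sp100_id, d._date, COALESCE(SUM(t.retweet_count), 0)
    FROM sp100 AS s
    CROSS JOIN daterange AS d
    LEFT JOIN tweets AS t
           ON t.nyse_date = d._date AND t.sp100_id = s.sp100_id {on_extra}
    WHERE d.trading = 1 {where_extra}
    GROUP BY s.sp100_id, d._date
    ORDER BY s.sp100_id, d._date
"""


def run(on_extra="", where_extra=""):
    return conn.execute(QUERY.format(on_extra=on_extra, where_extra=where_extra)).fetchall()


# Filter in WHERE: the NULL-extended row for company 2 on 2013-01-03 is discarded.
in_where = run(where_extra="AND t.type != 1")
# Filter in ON: it only limits which tweets are joined, so every date survives.
in_on = run(on_extra="AND t.type != 1")
# NULL-safe filter in WHERE (SQLite's IS in place of MySQL's <=>).
null_safe = run(where_extra="AND NOT (t.type IS 1)")

print(len(in_where), len(in_on), len(null_safe))  # 3 4 4
print(in_on[-1])  # (2, '2013-01-03', 0)
```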
As an aside, I usually don't bother with a daterange table; I let dates for which there is no data drop out of the result set and handle the missing dates within my application code.
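Handling missing dates in application code might look like the following Python sketch. The function name fill_missing_dates and the (company_id, date, value) row shape are hypothetical; the dates argument would be the list of trading days the report should cover.

```python
def fill_missing_dates(rows, company_ids, dates):
    """Return one (company_id, date, value) tuple per company per date.

    rows holds only the combinations the database returned; any missing
    combination is filled with 0.
    """
    found = {(company_id, day): value for company_id, day, value in rows}
    return [
        (company_id, day, found.get((company_id, day), 0))
        for company_id in company_ids
        for day in dates
    ]


# Hypothetical query result with two of the four combinations missing.
rows = [(1, "2013-01-02", 5), (2, "2013-01-03", 7)]
filled = fill_missing_dates(rows, [1, 2], ["2013-01-02", "2013-01-03"])
for row in filled:
    print(row)
# (1, '2013-01-02', 5)
# (1, '2013-01-03', 0)
# (2, '2013-01-02', 0)
# (2, '2013-01-03', 7)
```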
You need a calendar table filled with every day. I know it might sound silly, but this solution solves a lot of problems. The same approach also works with integers (an integer, or numbers, table).
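One way to populate such a calendar table is a recursive common table expression, shown here with Python's sqlite3 module (MySQL 8.0 also supports WITH RECURSIVE, but its date arithmetic syntax differs). The table name calendar and the date bounds are hypothetical; the same pattern counting integers instead of dates would fill an integer table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calendar (_date TEXT PRIMARY KEY)")

# The recursive CTE starts at the first date and keeps adding one day
# until it reaches the last date; every generated day is inserted.
conn.execute(
    """
    WITH RECURSIVE days(d) AS (
        SELECT date(?)
        UNION ALL
        SELECT date(d, '+1 day') FROM days WHERE d < date(?)
    )
    INSERT INTO calendar (_date) SELECT d FROM days
    """,
    ("2013-01-01", "2013-01-31"),
)
conn.commit()

count, first, last = conn.execute(
    "SELECT COUNT(*), MIN(_date), MAX(_date) FROM calendar"
).fetchone()
print(count, first, last)  # 31 2013-01-01 2013-01-31
```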

Formatting a MySQL Query result

I currently have a table as follows:
Column Type
time datetime
ticket int(20)
agentid int(20)
ExitStatus varchar(50)
Queue varchar(50)
I want to write a query which will break this down by week, providing a column with a count for each ExitStatus. So far I have this:
SELECT ExitStatus,COUNT(ExitStatus) AS ExitStatusCount, DAY(time) AS TimePeriod
FROM `table`
GROUP BY TimePeriod, ExitStatus
Output:
ExitStatus ExitStatusCount TimePeriod
NoAgentID 1 4
Success 3 4
NoAgentID 1 5
Success 5 5
I want to change this so it returns results in this format:
week | COUNT(NoAgentID) | COUNT(Success) |
Ideally, I'd like the columns to be dynamic as other ExitStatus values may be possible.
This information will be formatted and presented to the end user in a table on a page. Can this be done in SQL, or should I reformat it in PHP?
There is no "general" solution to your problem (called cross tabulation) that can be achieved with a single query. There are four possible solutions:
Hardcode all possible ExitStatus values in your query and keep it updated as you find you need more of them. For example:
SELECT
Day(Time) AS TimePeriod,
SUM(IF(ExitStatus = 'NoAgentID', 1, 0)) AS NoAgentID,
SUM(IF(ExitStatus = 'Success', 1, 0)) AS Success
-- #TODO: Add others here when/if needed
FROM `table`
WHERE ...
GROUP BY TimePeriod
Do a first query to get all possible ExitStatus values, and then build your final query in your high-level programming language based on those results.
Use a special module for cross tabulation in your high-level programming language. For Perl, you have the SQLCrossTab module, but I couldn't find one for PHP.
Add another layer to your application by using OLAP (multi-dimensional views of your data), like Pentaho, and then query that layer instead of your original data.
You can read a lot more about these solutions and an overall discussion of the subject
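Solution 2 (building the query from the distinct ExitStatus values) could be sketched as follows in Python, with the sqlite3 module standing in for MySQL and PHP. strftime('%d', time) plays the role of DAY(time), the sample rows are hypothetical, and quote_ident is a hypothetical helper: the status values are passed as bound parameters, while the column aliases, which cannot be parameters, are quoted by doubling any embedded double quotes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "table" (time TEXT, ExitStatus TEXT)')
conn.executemany(
    'INSERT INTO "table" VALUES (?, ?)',
    [
        ("2013-01-04 10:00", "NoAgentID"),
        ("2013-01-04 11:00", "Success"),
        ("2013-01-04 12:00", "Success"),
        ("2013-01-05 09:00", "Success"),
    ],
)


def quote_ident(name):
    """Hypothetical helper: quote an identifier by doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


# Step 1: find every ExitStatus value currently in the table.
statuses = [row[0] for row in conn.execute(
    'SELECT DISTINCT ExitStatus FROM "table" ORDER BY ExitStatus')]

# Step 2: one SUM(ExitStatus = ?) column per status; the values are bound
# as parameters, and only the quoted aliases are spliced into the SQL text.
columns = ", ".join(f"SUM(ExitStatus = ?) AS {quote_ident(s)}" for s in statuses)
sql = ("SELECT strftime('%d', time) AS TimePeriod, " + columns
       + ' FROM "table" GROUP BY TimePeriod ORDER BY TimePeriod')
rows = conn.execute(sql, statuses).fetchall()
print(statuses)  # ['NoAgentID', 'Success']
print(rows)      # [('04', 1, 2), ('05', 0, 1)]
```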
This is one way: you can use SUM() to count the number of rows for which a particular condition is true. At the end, you just group by the time period as normal.
SELECT DAY(time) AS TimePeriod,
SUM('NoAgentID' = exitStatus) AS NoAgentID,
SUM('Success' = exitStatus) AS Success, ...
FROM `table`
GROUP BY TimePeriod
Output:
4 1 3
5 1 5
The columns here are not dynamic though, which means you have to add conditions as you go along.
SELECT week(time) AS week,
SUM(ExitStatus = 'NoAgentID') AS 'COUNT(NoAgentID)',
SUM(ExitStatus = 'Success') AS 'COUNT(Success)'
FROM `table`
GROUP BY week
I'm making some guesses about how the ExitStatus column works. Also, there are many ways of interpreting "week", such as week of the year, of the month, or of the quarter... You will need to put the appropriate function there.

Mysql subquery with sum causing problems

This is a summary version of the problems I am encountering, but it hits the nub of my problem. The real problem involves huge UNION groups of monthly data tables, but that SQL would be huge and add nothing. So:
SELECT entity_id,
sum(day_call_time) as day_call_time
from (
SELECT entity_id,
sum(answered_day_call_time) as day_call_time
FROM XCDRDNCSum201108
where (day_of_the_month >= 10 AND day_of_the_month<=24)
and LPAD(core_range,4,"0")="0987"
and LPAD(subrange,3,"0")="654"
and SUBSTR(LPAD(core_number,7,"0"),4,7)="3210"
) as summary
This is the problem: when the subquery on XCDRDNCSum201108 matches no rows, the SUM still returns a single row (an aggregate without GROUP BY always does), and its column values are NULL. And entity_id is part of the primary key, so it cannot be NULL.
If I take out the SUM and just query entity_id, the subquery contains no rows, and thus the outer query does not fail; but when I use SUM, I get error 1048: Column 'entity_id' cannot be null.
How do I work around this problem? Sometimes there is no data.
You are completely overworking the query: pre-summing inside, then summing again outside. In addition, I understand you are not a DBA, but whenever you are doing an aggregation, you TYPICALLY need to GROUP BY the columns you want the totals broken down by. As written here, you are getting the sum of calls across all entity IDs, so you must GROUP BY any non-aggregated columns. However, if all you care about is the grand total WITHOUT respect to entity_id, then you could skip the GROUP BY, but you would also NOT include the actual entity_id.
If you want to show the actual time per specific entity ID:
SELECT
entity_id,
sum(answered_day_call_time) as day_call_time,
count(*) number_of_calls
FROM
XCDRDNCSum201108
where
(day_of_the_month >= 10 AND day_of_the_month<=24)
and LPAD(core_range,4,"0")="0987"
and LPAD(subrange,3,"0")="654"
and SUBSTR(LPAD(core_number,7,"0"),4,7)="3210"
group by
entity_id
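The root cause in the question can be reproduced with an empty table: an aggregate query without GROUP BY always returns exactly one row, filled with NULLs when there is no input, whereas the GROUP BY version returns no rows at all. This sketch uses Python's sqlite3 module, and the table name calls is a hypothetical stand-in for XCDRDNCSum201108. SQLite accepts the bare entity_id column in the first query; MySQL with ONLY_FULL_GROUP_BY enabled would reject it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table, deliberately left empty to mimic "sometimes there is no data".
conn.execute("CREATE TABLE calls (entity_id INTEGER, answered_day_call_time INTEGER)")

# Aggregate without GROUP BY: always exactly one row, NULLs when there is no input.
no_group = conn.execute(
    "SELECT entity_id, SUM(answered_day_call_time) FROM calls"
).fetchall()
# Aggregate with GROUP BY: one row per group, so no rows at all here.
grouped = conn.execute(
    "SELECT entity_id, SUM(answered_day_call_time) FROM calls GROUP BY entity_id"
).fetchall()
print(no_group)  # [(None, None)]
print(grouped)   # []
```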
This would result in something like this (fictitious data):
Entity_ID Day_Call_Time Number_Of_Calls
1 10 3
2 45 4
3 27 2
If all you cared about were the total call times:
SELECT
sum(answered_day_call_time) as day_call_time,
count(*) number_of_calls
FROM
XCDRDNCSum201108
where
(day_of_the_month >= 10 AND day_of_the_month<=24)
and LPAD(core_range,4,"0")="0987"
and LPAD(subrange,3,"0")="654"
and SUBSTR(LPAD(core_number,7,"0"),4,7)="3210"
This would result in something like this (fictitious data):
Day_Call_Time Number_Of_Calls
82 9
Would changing
sum(answered_day_call_time) as day_call_time
to
ifnull(sum(answered_day_call_time),0) as day_call_time
work? I'm assuming MySQL here, but the COALESCE function should work too.
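A quick check of this suggestion with Python's sqlite3 module (SQLite also provides IFNULL and COALESCE) on a hypothetical empty table. Both wrappers turn the NULL sum into 0, but they only affect the summed column: entity_id in that single aggregate row is still NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table, left empty so that SUM() has no input rows.
conn.execute("CREATE TABLE calls (entity_id INTEGER, answered_day_call_time INTEGER)")

row = conn.execute(
    """
    SELECT entity_id,
           IFNULL(SUM(answered_day_call_time), 0)   AS with_ifnull,
           COALESCE(SUM(answered_day_call_time), 0) AS with_coalesce
    FROM calls
    """
).fetchone()
print(row)  # (None, 0, 0)
```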