Formatting a MySQL query result

I've currently got a table as follows,
Column      Type
time        datetime
ticket      int(20)
agentid     int(20)
ExitStatus  varchar(50)
Queue       varchar(50)
I want to write a query which will break this down by week, providing a column with a count for each ExitStatus. So far I have this,
SELECT ExitStatus,COUNT(ExitStatus) AS ExitStatusCount, DAY(time) AS TimePeriod
FROM `table`
GROUP BY TimePeriod, ExitStatus
Output:
ExitStatus  ExitStatusCount  TimePeriod
NoAgentID   1                4
Success     3                4
NoAgentID   1                5
Success     5                5
I want to change this so it returns results in this format:
week | COUNT(NoAgentID) | COUNT(Success) |
Ideally, I'd like the columns to be dynamic as other ExitStatus values may be possible.
This information will be formatted and presented to the end user in a table on a page. Can this be done in SQL, or should I reformat it in PHP?

There is no "general" solution to your problem (called cross tabulation) that can be achieved with a single query. There are four possible solutions:
Hardcode all possible ExitStatus values in your query and keep it updated as you see the need for more of them. For example:
SELECT
Day(Time) AS TimePeriod,
SUM(IF(ExitStatus = 'NoAgentID', 1, 0)) AS NoAgentID,
SUM(IF(ExitStatus = 'Success', 1, 0)) AS Success
-- #TODO: Add others here when/if needed
FROM `table`
WHERE ...
GROUP BY TimePeriod
Run a first query to get all possible ExitStatus values and then build your final query in your high-level programming language based on those results (see the MySQL-only sketch after this list).
Use a dedicated cross-tabulation module for your high-level programming language. For Perl there is the SQLCrossTab module, but I couldn't find one for PHP.
Add another layer to your application by using OLAP (multi-dimensional views of your data), such as Pentaho, and then query that layer instead of your original data.
There is a lot more to read about these solutions and about cross tabulation in general.
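If you want to stay entirely in MySQL, here is a minimal sketch of the second approach done with GROUP_CONCAT and a prepared statement instead of a host language (assuming the table really is named `table`, as in the question; GROUP_CONCAT's default length limit of 1024 characters is plenty for a handful of status values):
-- build one "SUM(ExitStatus = '...') AS `...`" column per distinct ExitStatus
SELECT GROUP_CONCAT(DISTINCT
         CONCAT('SUM(ExitStatus = ''', ExitStatus, ''') AS `', ExitStatus, '`'))
INTO @cols
FROM `table`;

-- assemble and run the final pivot query
SET @sql = CONCAT('SELECT WEEK(time) AS week, ', @cols,
                  ' FROM `table` GROUP BY week');
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;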

This is one way: you can use SUM() to count the number of rows for which a particular condition is true. At the end you just group by the time period as normal.
SELECT DAY(time) AS TimePeriod,
SUM('NoAgentID' = exitStatus) AS NoAgentID,
SUM('Success' = exitStatus) AS Success, ...
FROM `table`
GROUP BY TimePeriod
Output:
TimePeriod  NoAgentID  Success
4           1          3
5           1          5
The columns here are not dynamic though, which means you have to add conditions as you go along.

SELECT week(time) AS week,
SUM(ExitStatus = 'NoAgentID') AS 'COUNT(NoAgentID)',
SUM(ExitStatus = 'Success') AS 'COUNT(Success)'
FROM `table`
GROUP BY week
I'm making some guesses about how the ExitStatus column works. Also, there are many ways of interpreting "week" (week of year, week of month, week of quarter, ...); you will need to put the appropriate function there.
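For example, if the data spans more than one calendar year, one possible variant is to group on YEARWEEK() so that week numbers from different years are not lumped together:
SELECT YEARWEEK(time) AS week,
       SUM(ExitStatus = 'NoAgentID') AS 'COUNT(NoAgentID)',
       SUM(ExitStatus = 'Success') AS 'COUNT(Success)'
FROM `table`
GROUP BY week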

Related

How to get a value from nested JSON by int index rather than by name in MySQL 8

I'm currently using MySQL's JSON field to store some data. The 'reports' table looks like this:
id | stock_id | type             | doc
 1 | 5        | Income_Statement | https://pastebin.com/bj1hdK0S
The pastebin link is the content of the JSON field.
What I want to do is get a number (ebit) from the first object under yearly (2018-12-31) in the JSON and then use that in a WHERE query so that it only returns rows where ebit > 50000000, for example. The issue is that the dates under yearly are not standard (i.e. one might be 2018-12-31, the other might be 2018-12-15). So essentially I want a way to get the data using integer indexes rather than the actual names of the objects, so something like yearly.[0].ebit.
How would I do this in MySQL? Alternatively, if it's not possible in MySQL, would it be possible in either PostgreSQL or Mongo? If so, could you give me an example? Most of the data fits well into MySQL; only this table has a JSON column, which is why I started with MySQL.
I don't know about MySQL or MongoDB, but here's a simple version for PostgreSQL's JSONB type:
SELECT (doc->'yearly'-> max(years) -> 'ebit')::numeric AS ebit
FROM reports, jsonb_object_keys(doc->'yearly') AS years
GROUP BY reports.doc;
...with simplistic test data:
WITH reports(doc) AS (
SELECT '{"yearly":{"2018-12-31":{"ebit":123},"2017-12-31":{"ebit":1.23}}}'::jsonb
)
SELECT (doc->'yearly'-> max(years) -> 'ebit')::numeric AS ebit
FROM reports, jsonb_object_keys(doc->'yearly') AS years
GROUP BY reports.doc;
...gives:
ebit
------
123
(1 row)
So I've basically selected the latest entry under "yearly" without knowing the actual key values, assuming that the date formatting of the keys allows a lexical sort order (in this case it seems to comply with ISO 8601).
Using data type JSON instead of JSONB would preserve object key order but is not as efficient in PostgreSQL further down the road and wouldn't help here either.
If you then want to select only those reports entries whose latest ebit is greater than a certain value, just pack it into a sub-select or a CTE. I usually prefer CTEs because they are easier to read, so here we go:
WITH
reports (id, doc) AS (
VALUES
(1, '{"yearly":{"2018-12-31":{"ebit":123},"2017-12-31":{"ebit":1.23}}}'::jsonb),
(2, '{"yearly":{"2018-12-23":{"ebit":50},"2017-12-22":{"ebit":"1200.00"}}}'::jsonb)
),
r_ebit (id, ebit) AS (
SELECT reports.id, (reports.doc->'yearly'-> max(years) -> 'ebit')::numeric AS ebit
FROM reports, jsonb_object_keys(doc->'yearly') AS years
GROUP BY reports.id, reports.doc
)
SELECT id, ebit
FROM r_ebit
WHERE ebit > 100;
However, as you can already see, it is not possible to filter the original rows using this strategy. A pre-processing step would make sense here, so that the JSON format is actually filter-friendly.
ADDENDUM
To add the possibility of selecting the values for the n-th completed fiscal year, we need to resort to window functions, and we also need to reduce the resulting set to return only a single row per actual group (in the demonstration case: reports.id):
WITH reports(id, doc) AS (VALUES
(1, '{"yearly":{"2018-12-31":{"ebit":123},"2017-12-31":{"ebit":1.23},"2016-12-31":{"ebit":"23.42"}}}'::jsonb),
(2, '{"yearly":{"2018-12-23":{"ebit":50},"2017-12-22":{"ebit":"1200.00"}}}'::jsonb)
)
SELECT DISTINCT ON (1) reports.id, (reports.doc->'yearly'-> (lead(years, 0) over (partition by reports.doc order by years desc nulls last)) ->>'ebit')::numeric AS ebit
FROM reports, jsonb_object_keys(doc->'yearly') AS years
GROUP BY 1, reports.doc, years.years ORDER BY 1;
...will behave exactly like the previous version that used the max aggregate function. Increasing the offset parameter within the lead(years, <offset>) function will select the n-th year backwards (because of the descending order of the window partition).
The DISTINCT ON (1) clause is the magic that reduces the result to a single row per distinct column value (first column = reports.id). This is why the NULLS LAST is very important inside the window OVER clause.
Here are results for different offsets (I've added a third historic entry for the first id but not for the second to also show how it deals with absent entries):
N = 0:
 id | ebit
----+------
  1 |  123
  2 |   50

N = 1:
 id | ebit
----+---------
  1 |    1.23
  2 | 1200.00

N = 2:
 id | ebit
----+-------
  1 | 23.42
  2 |
...which means absent entries will just result in a NULL value.

Can an SQL query do this?

I have a table "audit" with a "description" column, a "record_id" column and a "record_date" column. I want to select only those records where the description matches one of two possible strings (say, LIKE "NEW%" OR LIKE "ARCH%") and where the record_id of the two rows matches. I then need to calculate the difference in days between the record_date values of those two rows.
For instance, my table may contain:
id  description  record_id  record_date
 1  New Sub      1000       04/14/13
 2  Mod          1000       04/14/13
 3  Archived     1000       04/15/13
 4  New Sub      1001       04/13/13
I would want to select only rows 1 and 3 and then calculate the number of days between 4/15 and 4/14 to determine how long it took to go from New to Archived for that record (1000). Both a New and an Archived entry must be present for any record for it to be counted (I don't care about ones that haven't been archived). Does this make sense and is it possible to calculate this in a SQL query? I don't know much beyond basic SQL.
I am using MySQL Workbench to do this.
The following is untested, but it should work assuming that any given record_id can only show up once with "New Sub" and once with "Archived".
select n.id as new_id
,a.id as archive_id
,record_id
,n.record_date as new_date
,a.record_date as archive_date
,DateDiff(a.record_date, n.record_date) as days_between
from audit n
join audit a using(record_id)
where n.description = 'New Sub'
and a.description = 'Archived';
I changed from OR to AND because I thought you wanted the number of days only for records that were actually archived.
My test was in SQL Server, so the syntax might need to be tweaked slightly for your database (especially the DATEDIFF function), but you can select from the same table twice, one side grabbing the 'new' rows and one grabbing the 'archived' rows, then link them by record_id...
SELECT
newsub.id,
newsub.description,
newsub.record_date,
arc.id,
arc.description,
arc.record_date,
DATEDIFF(day, newsub.record_date, arc.record_date) AS DaysBetween
FROM
foo1 arc
, foo1 newsub
WHERE
(newsub.description LIKE 'NEW%')
AND
(arc.description LIKE 'ARC%')
AND
(newsub.record_id = arc.record_id)
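Since the question mentions MySQL Workbench, here is a hedged MySQL adaptation of the same self-join; MySQL's DATEDIFF() takes two arguments and returns the difference in days, and the audit table name from the question is used:
SELECT
    newsub.id AS new_id,
    arc.id AS archive_id,
    newsub.record_id,
    newsub.record_date AS new_date,
    arc.record_date AS archive_date,
    DATEDIFF(arc.record_date, newsub.record_date) AS days_between  -- archive date minus new date, in days
FROM audit newsub
JOIN audit arc
    ON arc.record_id = newsub.record_id
WHERE newsub.description LIKE 'NEW%'
  AND arc.description LIKE 'ARC%';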

How can I return a row for each date, even when there is no data for that date (in which case the row should be filled with zeros)?

I hope I will be able to make my problem clear.
I have a table called tweets from which I want to extract information for each date in the daterange table. This table holds 142 dates, of which 102 dates have the property trading (day on which the market was open) set to 1 (trading=1).
The below query extracts information from the tweets table for 20 companies (identified by sp100_id). The expected resultset therefore contains 20 x 102 = 2,040 rows. However, I only get returned 1,987 rows because for some date-company combinations, the tweets table holds no data. I need these "empty days" to be included in the resultset however. I thought I could accomplish this by using COALESCE(X, 0), returning 0 if there would be no data, but the result is the same: 1,987 rows.
Based on this information and the query below, does anybody know how I can get it to return 102 rows (1 row for each daterange._date with trading=1) for each sp100_id in the tweets table?
SELECT
sp100.sp100_id,
daterange._date,
COALESCE(SUM(IF(tweets.classify1=2, tweets.`retweet_count`, 0)),0) AS `pos-retweet`,
COALESCE(SUM(IF(tweets.classify1=2, tweets.`user-quality`, 0)),0) AS `pos-quality`,
COALESCE(SUM(IF(tweets.classify1=2, tweets.`follow`, 0)),0) AS `pos-follow`,
COALESCE(SUM(IF(tweets.classify1=3, tweets.`retweet_count`, 0)),0) AS `neg-retweet`,
COALESCE(SUM(IF(tweets.classify1=3, tweets.`user-quality`, 0)),0) AS `neg-quality`,
COALESCE(SUM(IF(tweets.classify1=3, tweets.`follow`, 0)),0) AS `neg-follow`
FROM
sp100
CROSS JOIN
daterange
LEFT JOIN
tweets
ON tweets.nyse_date = daterange._date
AND tweets.sp100_id = sp100.sp100_id
WHERE sp100.sp100_id BETWEEN 1 AND 20 AND tweets.type != 1 AND daterange.trading = 1
GROUP BY
sp100.sp100_id, daterange._date
Normally I would provide a SQLFiddle, but it would be a lot of work to export a proper portion of the tables involved, and the solution might be clear to a real SQL guru anyway :-)
The problem comes from requiring that tweets.type != 1 in your WHERE clause.
For the dates that have no associated tweets, the outer join will result in all tweets columns, including tweets.type, being NULL. As documented under Working with NULL Values:
Because the result of any arithmetic comparison with NULL is also NULL, you cannot obtain any meaningful results from such comparisons.
In MySQL, 0 or NULL means false and anything else means true. The default truth value from a boolean operation is 1.
Therefore such records are filtered by your WHERE clause.
As @Martin Smith commented, you can move this filter criterion into the ON clause of your outer join (so that the test is performed only against actual tweets records rather than simulated NULL ones).
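A minimal sketch of that change, assuming the same schema as in the question (only one aggregate column is repeated here; the other five stay exactly as in the original query):
SELECT
    sp100.sp100_id,
    daterange._date,
    COALESCE(SUM(IF(tweets.classify1=2, tweets.`retweet_count`, 0)),0) AS `pos-retweet`
    -- ... the remaining five COALESCE(SUM(IF(...)),0) columns unchanged ...
FROM
    sp100
CROSS JOIN
    daterange
LEFT JOIN
    tweets
    ON tweets.nyse_date = daterange._date
    AND tweets.sp100_id = sp100.sp100_id
    AND tweets.type != 1              -- moved here from the WHERE clause
WHERE sp100.sp100_id BETWEEN 1 AND 20 AND daterange.trading = 1
GROUP BY
    sp100.sp100_id, daterange._date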
Alternatively, you could rewrite the filter to handle NULL. For example, using the NULL-safe equality operator:
NOT tweets.type <=> 1
As an aside, I usually don't bother with a daterange table and simply omit dates for which there is no data from the resultset; I handle the missing dates within my application code instead.
You need a calendar table filled with each day. I know it might sound silly, but this solution solves a lot of problems for you. The same approach also works with integers (integer tables).
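A minimal sketch of what such a calendar table and its use might look like (the table and column names here are illustrative; the join mirrors the tweets.nyse_date column from the question):
CREATE TABLE calendar (
    cal_date DATE NOT NULL PRIMARY KEY
);

-- populate calendar with every date in the range you care about, then:
SELECT c.cal_date,
       COUNT(t.nyse_date) AS tweet_count   -- counts only matched rows, so empty days give 0
FROM calendar c
LEFT JOIN tweets t ON t.nyse_date = c.cal_date
GROUP BY c.cal_date;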

Complex MySQL COUNT query

Evening folks,
I have a complex MySQL COUNT query I am trying to perform and am looking for the best way to do it.
In our system, we have References. Each Reference can have many (or no) Income Sources, each of which can be validated or not (status). We have a Reference table and an Income table - each row in the Income table points back to Reference with reference_id
On our 'Awaiting' page (the screen that shows each Income that is yet to be validated), we show it grouped by Reference. So you may, for example, see Mr John Smith has 3 Income Sources.
We want it to show something like "2 of 3 Validated" beside each row
My problem is writing the query that figures this out!
What I have been trying to do is this, using a combination of PHP and MySQL to bridge the gap where SQL (or my knowledge) falls short:
First, select a COUNT of the number of incomes associated with each reference:
SELECT `reference_id`, COUNT(status) AS status_count
FROM (`income`)
WHERE `income`.`status` = 0
GROUP BY `reference_id`
Next, having used PHP to generate a WHERE IN clause, proceed to COUNT the number of confirmed references from these:
SELECT `reference_id`, COUNT(status) AS status_count
FROM (`income`)
WHERE `reference_id` IN ('8469', '78969', '126613', ..... etc
AND status = 1
GROUP BY `reference_id`
However this doesn't work. It returns 0 rows.
Any way to achieve what I'm after?
Thanks!
In MySQL, you can SUM() on a boolean expression to get a count of the rows where that expression is true. You can do this because MySQL treats true as the integer 1 and false as the integer 0.
SELECT `reference_id`,
SUM(`status` = 1) AS `validated_count`,
COUNT(*) AS `total_count`
FROM `income`
GROUP BY `reference_id`
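If you want the "2 of 3 Validated" label built directly in SQL, one possible (purely illustrative) variant concatenates the two counts:
SELECT `reference_id`,
       CONCAT(SUM(`status` = 1), ' of ', COUNT(*), ' Validated') AS `progress`
FROM `income`
GROUP BY `reference_id`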

Grouping timestamps in MySQL with PHP

I want to log certain activities in MySQL with a timecode using time(). Now that I'm accumulating thousands of records, I want to output the data by sets of hours/days/months, etc.
What would be the suggested method for grouping time codes in MySQL?
Example data:
1248651289
1248651299
1248651386
1248651588
1248651647
1248651700
1248651707
1248651737
1248651808
1248652269
Example code:
$sql = "SELECT COUNT(timecode) FROM timecodeTable";
//GROUP BY round(timecode/3600, 1) //group by hour??
Edit:
There are two groupings that could be made, so I should make that clearer: the 24 hours of the day can be grouped, but I'm more interested in grouping over time, i.e. returning 365 results for each year the tracking is in place (totals for each day passed), and then being able to select a range of dates and see more detail on the hours/minutes accessed within that selection.
This is why I've mentioned PHP in the title, as I'd expect this might be easier with a PHP loop generating the hours/days, etc.?
Peter
SELECT COUNT(*), HOUR(timecode)
FROM timecodeTable
GROUP BY HOUR(timecode);
Your result set, given the above data, would look as such:
+----------+----------------+
| COUNT(*) | HOUR(timecode) |
+----------+----------------+
| 10 | 18 |
+----------+----------------+
Many more related functions can be found in the MySQL documentation on date and time functions.
Edit
After doing some tests of my own based on the output of your comment, I determined that your database is in a state of epic fail. :) You're using INTs as TIMESTAMPs. This is never a good idea. There's no justifiable reason to use an INT in place of TIMESTAMP/DATETIME.
That said, you'd have to modify my above example as follows:
SELECT COUNT(*), HOUR(FROM_UNIXTIME(timecode))
FROM timecodeTable
GROUP BY HOUR(FROM_UNIXTIME(timecode));
Edit 2
You can use additional expressions in the GROUP BY clause to achieve this:
SELECT
COUNT(*),
YEAR(timecode),
DAYOFYEAR(timecode),
HOUR(timecode)
FROM timecodeTable
GROUP BY YEAR(timecode), DAYOFYEAR(timecode), HOUR(timecode);
Note, I omitted the FROM_UNIXTIME() for brevity.
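For the per-day totals over a selected date range described in the question's edit, a sketch along the same lines (again assuming the INT column holds Unix timestamps; the date range is only an example):
SELECT DATE(FROM_UNIXTIME(timecode)) AS day,
       COUNT(*) AS hits
FROM timecodeTable
WHERE timecode >= UNIX_TIMESTAMP('2009-01-01')
  AND timecode < UNIX_TIMESTAMP('2010-01-01')   -- filter on the raw INT so an index on timecode can be used
GROUP BY day;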