How can I return a row for each date, even when there is no data for that date (in which case the row should be filled with zero's)? - mysql

I hope I will be able to make my problem clear.
Ik have a table called tweets from which I want to extract information for each data in the daterange table. This table holds 142 dates, of which 102 dates have the property trading (day on which market was open) set to 1 (trading=1).
The below query extracts information from the tweets table for 20 companies (identified by sp100_id). The expected resultset therefore contains 20 x 102 = 2,040 rows. However, I only get returned 1,987 rows because for some date-company combinations, the tweets table holds no data. I need these "empty days" to be included in the resultset however. I thought I could accomplish this by using COALESCE(X, 0), returning 0 if there would be no data, but the result is the same: 1,987 rows.
Based on this information and the query below, does anybody know how I can get it to return 102 rows (1 row for each daterange._date with trading=1) for each sp100_id in the tweets table?
SELECT
sp100.sp100_id,
daterange._date,
COALESCE(SUM(IF(tweets.classify1=2, tweets.`retweet_count`, 0)),0) AS `pos-retweet`,
COALESCE(SUM(IF(tweets.classify1=2, tweets.`user-quality`, 0)),0) AS `pos-quality`,
COALESCE(SUM(IF(tweets.classify1=2, tweets.`follow`, 0)),0) AS `pos-follow`,
COALESCE(SUM(IF(tweets.classify1=3, tweets.`retweet_count`, 0)),0) AS `neg-retweet`,
COALESCE(SUM(IF(tweets.classify1=3, tweets.`user-quality`, 0)),0) AS `neg-quality`,
COALESCE(SUM(IF(tweets.classify1=3, tweets.`follow`, 0)),0) AS `neg-follow`
FROM
sp100
CROSS JOIN
daterange
LEFT JOIN
tweets
ON tweets.nyse_date = daterange._date
AND tweets.sp100_id = sp100.sp100_id
WHERE sp100.sp100_id BETWEEN 1 AND 20 AND tweets.type != 1 AND daterange.trading = 1
GROUP BY
sp100.sp100_id, daterange._date
In any other case, I would provide you with a SQLFiddle, but it would be a lot of work to export a proper portion of the tables used to SQLFiddle while the solution might be clear to some real SQL guru anyway :-)

The problem comes from requiring that tweets.type != 1 in your WHERE clause.
For the dates that have no associated tweets, the outer join will result in all tweets columns, including tweets.type, being NULL. As documented under Working with NULL Values:
Because the result of any arithmetic comparison with NULL is also NULL, you cannot obtain any meaningful results from such comparisons.
In MySQL, 0 or NULL means false and anything else means true. The default truth value from a boolean operation is 1.
Therefore such records are filtered by your WHERE clause.
As #Martin Smith commented, you can move this filter criterion into the ON clause of your outer join (so that the test is performed only against actual tweets records rather than simulated NULL ones).
Alternatively, you could rewrite the filter to handle NULL. For example, using the NULL-safe equality operator:
NOT tweets.type <=> 1
As an aside, I usually don't bother with a daterange table and instead omit dates for which there is no data from the resultset: instead, I handle missing dates within my application code.

You need a calendar table filled with each day. I know it might sound silly, but this solution solves yo a lot of problems. The same solution you can have also with integers ( integer tables)

Related

MYSQL Alternative to UNION for same table reusing same columns selected as new name

I'm trying to generate a result set from a table with effectively a unique/primary key as billyear, billmonth and type along with cost and consumption. So there could be 3 bill year and bill month identical entries but the type could be one of three values: E, W or NG.
I need to create a result set that has just one row per billyear and billmonth entry.
(
select month as billmonth, year as billyear, cost_estimate as eleccost, consumption_estimate as eleccons from tblbillforecast where buildingid=19 and type='E'
)
UNION (
select month as billmonth, year as billyear, cost_estimate as gascost, consumption_estimate as gascons from tblbillforecast where buildingid=19 and type='NG'
)
UNION (
select month as billmonth, year as billyear, cost_estimate as watercost, consumption_estimate as watercons from tblbillforecast where buildingid=19 and type='W'
)
This generates a result set with only billmonth, billyear, eleccost and eleccons columns. I've tried all kinds of solutions but the above example is the simplest to show where it's going wrong.
Additionally it still has 3 rows per billmonth/billyear unique combination instead of merging to one.
UPDATE:
Sample data
SELECT month AS billmonth,
year AS billyear,
SUM(CASE type WHEN 'E' THEN cost_estimate END) AS eleccost,
SUM(CASE type WHEN 'NG' THEN cost_estimate END) AS gascost,
SUM(CASE type WHEN 'W' THEN cost_estimate END) AS watercost
FROM tblbillforecast
WHERE buildingid=19
GROUP BY billmonth, billyear;
Result:
Expected result, eg:
year | month | eleccost | gascost | watercost
2018 | 1 | 32800 | 4460 | 4750
This is behaving correctly. An SQL query result set has one name per column, and this name applies to all the rows. So if you try to rename the column in the second or subsequent queries of the UNION, those new names are ignored. The name of the column is determined only by the first query of the UNION.
Additionally it still has 3 rows per billmonth/billyear unique combination instead of merging to one.
That's also correct behavior, according to the query you tried. UNION does not merge multiple rows into one, it only appends sets of rows.
As Akina hinted in the comments above, you may use multiple columns:
SELECT month AS billmonth,
year AS billyear,
SUM(CASE type WHEN 'E' THEN cost_estimate END) AS eleccost,
SUM(CASE type WHEN 'NG' THEN cost_estimate END) AS gascost,
SUM(CASE type WHEN 'W' THEN cost_estimate END) AS watercost
FROM tblbillforecast
WHERE buildingid=19
GROUP BY billmonth, billyear;
This uses GROUP BY to "merge" rows together, so you get one row in the result per month/year.
A quick bit of guidance on various data shaping operations in SQL:
JOIN - makes resultsets wider (more columns) by bringing together tables/resultsets in a side-by-side fashion generating output rows that have all the columns of the two input column sets
SELECT - typically makes resultsets narrower by allowing you to specify which columns you're interested in and which you are not; by not mentioning an available column it disappears meaning you output fewer columns
UNION - makes resultsets taller (more rows) by bringing together resultsets and outputting one on top of the other. Because columns always have a fixed data type and one name, you must have the same number of and type of, and order of columns
WHERE - makes resultsets shorter (fewer rows) by allowing you to specify truth based filters that exclude rows
It's not hard and fast; you can use select to create more columns too, but just in a very rudimentary sense these concepts hold true - JOIN to widen, UNION for taller, SELECT for narrower and WHERE for shorter. All the work you do with SQL is a data shaping exercise; you're either paring a rectangular block of data down or extending it, and in either a vertical or horizontal direction (or a mix).
I'm not going to get into grouping because that mixes rows up, and isn't something you tried in the question.. The reason for me writing this out was purely because you'd attempted to use a UNION (height-increasing) operation when you actually wanted a widen which, regardless of how it is done (JOIN or as per Bill's answer a SELECT+GROUP, which is valid, but relies on the "mixes rows up" aspect of grouping), specifically isn't done with a UNION. Union only makes stuff taller.
To give an example of how it might be done in an alternative way to Bill's approach, this task of yours has one huge table that is "too tall" - it uses 3 rows where 1 would do, if only it were a bit wider. That is to say if only there were 3 columns for electric/gas/water then we wouldn't need 3 rows with 1 utility in each.
Of course, we have this "one utility per row" because it is very flexible. Database tables don't have varying numbers of columns but they DO have varying numbers of rows. If a new bill type came along tomorrow - internet - no table changes are needed to accommodate it; add a new type I, and away you go, adding another row. We now store 4 rows of 1 utility where 1 row with 4 columns would do, but crucially we didn't have to change the table structure. We could have infinite different kinds of bills, and not need infinite columns because we can already have infinite rows
So you want to reshape your data from 4-rows-by-1-column to 1-row-by-4-columns. It could be solved as :
narrow the table to just year,month,building,type,cost AND shorten it to just electricity
separately narrow the table to just year,month,building,type,cost AND shorten it to just gas
separately narrow the table to just year,month,building,type,cost AND shorten it to just water
join (widening) all these newly created result sets , then narrow to remove the repeated year,month,building,type columns
That would look like:
SELECT e.year, e.month, e.building, e.cost, g.cost, w.cost
FROM
(SELECT year,month,building,cost FROM t WHERE type = 'E') e
JOIN
(SELECT year,month,building,cost FROM t WHERE type = 'NG') g
ON
e.year = g.year AND e.month = g.month AND e.building = g.building
JOIN
(SELECT year,month,building,cost FROM t WHERE type = 'W') w
ON
e.year = w.year AND e.month = w.month AND e.building = w.building
WHERE
e.building = 19
You can see clearly the 3 narrowing-and-shortening operations that pick out "just the gas", "just the electric", and "just the water" - they're the (SELECT year,month,building,cost FROM t WHERE type = 'NG') and that's what reduces the height of the original table, making it three times shorter than it was in each case. If we had 999 rows X 5 cols in the big table it goes to 3 sets of 333 x 5 rows each
You can see that we then JOIN these together to widen the results - our e.g 3 sets of 333 x 5 rows each widens to 333 x 15 when JOINed..
Then went from 333x15 down to 333 X 7 when SELECTed to ditch the repeated columns
It's likely not perfect (I'd perhaps left join all 3 onto a 4th set of numbers that are just the common columns in case some utilities aren't present for a particular month), and perhaps some people will come along complaining that it's less performant because it hits the table 3 times.. All that is accessory to the point I'm making about SQL being an exercise in reshaping data - tables are the starting blocks of data and you cut them up narrower and shorter, then stick them together side by side, or on top of each other and that becomes your new data block that's maybe wider, higher, both.. In any case it's definitely a different shape to what you started with. And then you can cut and shape again, and again..
Go with Bill's conditional agg (though this way would be fine if there is one row per building/year/month) but take away a stronger notion about in what direction these common operations (SELECT/JOIN/WHERE/UNION) reshape your data
Footnote about Bill's conditional aggregation (I know I said I wouldn't talk about it but it might make more sense to now). If you have:
Type, Cost
E, 123
NG, 456
W, 789
And you do a
SELECT
CASE WHEN Type = 'E' THEN Cost END as CostE,
CASE WHEN Type = 'NG' THEN Cost END as CostG,
CASE WHEN Type = 'W' THEN Cost END as CostW
...
It spreads the data out over more columns - the data has "gone from vertical to diagonal"
CostE, CostNG, CostW
123, NULL, NULL
NULL, 456, NULL
NULL, NULL, 789
But it's still too tall. If you then run a GROUP BY, which mixes rows up and ask for e.g. just the MAX from each column, then all the NULLs will disappear (because there is a non null somewhere in the column, and NULL is lost if there is a non null, no matter what you're doing) and the rows collapse, mixing together, into one:
CostE, CostNG, CostW
123, 456, 789
The data has pivoted round from being vertical, to being horizontal - another data shaping. It was pulled wider, and squashed flatter

How to select two MySQL rows and then compare a column and return an output

I've a table with a structure something like this,
Device | paid | time
abc 1 2 days ago
abc 0 1 day ago
abc 0 5 mins ago
Is it possible to write a query that checks the paid column on all the rows where Device = abc and then outputs the most recent two rows that different. Basically, something like an if statement saying if row 1 = 1 and row 2 = 0 output that but only if it's the most recent two columns that are different. For example, in this case, the first and second row. The table is being updated whenever a user changes from a free to paid account etc. It is also updated in different columns for different reasons hence the duplicate 0s for example.
I know this would probably be done better by having another table altogether and updating that every time the user switches account type, but is there any way to make this work?
Thanks
Example:
http://rextester.com/MABU7860 need further testing on edge cases but this seems to work.
SELECT A.*, B.*
FROM SQLfoo A
INNER JOIN SQLFoo B
on A.Device = B.Device
and A.mTime < B.mTime
WHERE A.Paid <> B.Paid
and A.device = 'abc'
ORDER BY B.mTime Desc, A.MTime Desc
LIMIT 1
By performing a self join we on the devices where the time from one table is less than the time from the next table (thus the two records will never matach and we only get the reuslts one way) and we order by those times descending, the highest times appear first in the result since we limit by a single device we don't need to concern ourselves with the devices. We then just need compare the paid from one source to the paid in the 2nd source and return the first result encountered thus limit 1.
Or using user variables
http://rextester.com/TWVEVX7830
in other engines one might accomplish this task by performing the join as in above, assigning a row number partitioned by the device and then simply return all those row_numbers with a value of 1; which would be the earliest date discrepency.
Use LIMIT to limit the number of record on mysql:
http://www.mysqltutorial.org/mysql-limit.aspx
In your case, use LIMIT 2
and then put the 2 record that you just select into an array, then compare the array if the value is different. If they are different then print

Print all rows from a nested Left Join + Union MySQL query

I have a query that calculates information for a revolving monthly retainer. The Project has a certain number of hours assigned to it each monthly period, with periods starting at different times of the month (e.g., February 5th to March 4th). The columns of the query result include:
Project Name
Total Hours Logged
Monthly Hours Remaining
Last Day of Period
Days Remaining
For example, Project A has 15 hours logged to it, with 5 hours remaining in the monthly period. The last day of the period is November 17th, with 3 days remaining from today.
The current Query takes 4 tables that are joined using a left join to print all of the Clients even if there are no hours logged, then it uses a nested subquery (lines 8-86) to calculate columns 2, 3, 4, and 5. However, column 4 and 5 do not print, they just show as always NULL. (Those are Last Day of Period and Days Remaining).
You can see the schema + query code at the SQL fiddle link here: http://sqlfiddle.com/#!2/fc830/12
How can I get column 4 and 5 to print the data and not be null when there are no Hours Logged? I think I may need an additional left join but I am not able to get a solution. If you have any suggestions it would be appreciated. Thanks!
When there has been no hours logged, the left join to the nested query returns nulls for the columns you're having trouble with.
The answer is to provide values for them when they are null and us ifnull() to use those value when the left join returns nulls:
select
...
ifnull(<left joined value>, <value when there's no join>),
...
See the working solution in sqlfiddle.
Incidentally, some values you are returning from the inner query, particularly the "last day" value can be derived directly from client (as seen in my solution). It's best to keep your queries as simple as possible - don't get data in a complicated way when there's an simpler or more direct way.
I think you are not thorough with the idea of LEFT JOIN. LEFT JOIN results in all rows from the outer table. Its otherwise called LEFT Outer JOIN (contrary to RIGHT OUTER JOIN). In your case the inner query results in only record with client id = 4. See the last clause X on X.id=client.id. Now what do you expect the results to be?
OUTER TABLE (client table)
id = 1, 2, 3, 4
INNER TABLE
client id = 4
ON X.id=client.id
An INNER JOIN would result in just one record since there is only matching record - for id = 4.
But a LEFT JOIN would result in all 4 records from outer table but the values for invalid fields will be NULL. Here except for client id 4, there is no valid records from inner table, hence they will be null.
To get more clarity you will have to see the id field along the records. Try this fiddle
You can see the other answer as to how to fill those NULL fields..

Formatting a MySQL Query result

I've currently got a table as follows,
Column Type
time datetime
ticket int(20)
agentid int(20)
ExitStatus varchar(50)
Queue varchar(50)
I want to write a query which will break this down by week, providing a column with a count for each ExitStatus. So far I have this,
SELECT ExitStatus,COUNT(ExitStatus) AS ExitStatusCount, DAY(time) AS TimePeriod
FROM `table`
GROUP BY TimePeriod, ExitStatus
Output:
ExitStatus ExitStatusCount TimePeriod
NoAgentID 1 4
Success 3 4
NoAgentID 1 5
Success 5 5
I want to change this so it returns results in this format:
week | COUNT(NoAgentID) | COUNT(Success) |
Ideally, I'd like the columns to be dynamic as other ExitStatus values may be possible.
This information will be formatted and presented to end user in a table on a page. Can this be done in SQL or should I reformat it in PHP?
There is no "general" solution to your problem (called cross tabulation) that can be achieved with a single query. There are four possible solutions:
Hardcode all possible ExitStatus'es in your query and keep it updated as you see the need for more and more of them. For example:
SELECT
Day(Time) AS TimePeriod,
SUM(IF(ExitStatus = 'NoAgentID', 1, 0)) AS NoAgentID,
SUM(IF(ExitStatus = 'Success', 1, 0)) AS Success
-- #TODO: Add others here when/if needed
FROM table
WHERE ...
GROUP BY TimePeriod
Do a first query to get all possible ExitStatus'es and then create your final query from your high-level programming language based on those results.
Use a special module for cross tabulation on your high-level programming language. For Perl, you have the SQLCrossTab module but I couldn't find one for PHP
Add another layer to your application by using OLAP (multi-dimensional views of your data) like Pentaho and then querying that layer instead of your original data
You can read a lot more about these solutions and an overall discussion of the subject
This is one way; you can use SUM() to count the number of items a particular condition is true. At the end you just group by the time as per normal.
SELECT DAY(time) AS TimePeriod,
SUM('NoAgentID' = exitStatus) AS NoAgentID,
SUM('Success' = exitStatus) AS Success, ...
FROM `table`
GROUP BY TimePeriod
Output:
4 1 3
5 1 5
The columns here are not dynamic though, which means you have to add conditions as you go along.
SELECT week(time) AS week,
SUM(ExitStatus = 'NoAgentID') AS 'COUNT(NoAgentID)',
SUM(ExitStatus = 'Success') AS 'COUNT(Success)'
FROM `table`
GROUP BY week
I'm making some guesses about how ExitStatus column works. Also, there are many ways of interpretting "week", such as week of year, of month, or quarter, ... You will need to put the appropriate function there.

Complex MySQL COUNT query

Evening folks,
I have a complex MySQL COUNT query I am trying to perform and am looking for the best way to do it.
In our system, we have References. Each Reference can have many (or no) Income Sources, each of which can be validated or not (status). We have a Reference table and an Income table - each row in the Income table points back to Reference with reference_id
On our 'Awaiting' page (the screen that shows each Income that is yet to be validated), we show it grouped by Reference. So you may, for example, see Mr John Smith has 3 Income Sources.
We want it to show something like "2 of 3 Validated" beside each row
My problem is writing the query that figures this out!
What I have been trying to do is this, using a combination of PHP and MySQL to bridge the gap where SQL (or my knowledge) falls short:
First, select a COUNT of the number of incomes associated with each reference:
SELECT `reference_id`, COUNT(status) AS status_count
FROM (`income`)
WHERE `income`.`status` = 0
GROUP BY `reference_id`
Next, having used PHP to generate a WHERE IN clause, proceed to COUNT the number of confirmed references from these:
SELECT `reference_id`, COUNT(status) AS status_count
FROM (`income`)
WHERE `reference_id` IN ('8469', '78969', '126613', ..... etc
AND status = 1
GROUP BY `reference_id`
However this doesn't work. It returns 0 rows.
Any way to achieve what I'm after?
Thanks!
In MySQL, you can SUM() on a boolean expression to get a count of the rows where that expression is true. You can do this because MySQL treats true as the integer 1 and false as the integer 0.
SELECT `reference_id`,
SUM(`status` = 1) AS `validated_count`,
COUNT(*) AS `total_count`
FROM `income`
GROUP BY `reference_id`