Ignore Group if LIMIT is not reached in MySQL - mysql

I am working on a SQL query that is rather tricky for my level of knowledge. I have searched and searched for an answer but haven't come across anything. Hopefully someone can shed some light on this.
How can you stop SQL from outputting group of rows if the limit set is not reached?
For example -
Data
Fruits Ordered Date
Orange 4 2015-05-01
Orange 2 2015-05-01
Orange 20 2015-05-01
Apple 30 2015-05-02
Apple 40 2015-05-02
Apple 24 2015-05-02
Apple 19 2015-05-02
Apple 22 2015-05-02
From this data I would like to select and group by Date, but only keep groups that reach a LIMIT of 5 rows.
If there aren't five rows in a group, I want SQL to ignore that group.
So if I did a SUM of all Ordered values for each Date group, and SQL ignored the groups that didn't consist of 5 values, the desired result would look like the following:
Desired Result
Fruits SUM(Ordered) Date
Apple 135 2015-05-02
Hope this makes sense, please ask any questions if required!

You can use the HAVING clause to filter out the groups you don't need, keeping only the groups that contain at least 5 rows:
SELECT Fruits, SUM(Ordered), Date
FROM table
GROUP BY Fruits, Date
HAVING COUNT(*) >= 5
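For anyone who wants to try the HAVING approach without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in. The table name `orders` is my own assumption (the question doesn't name the table), and the rows are the sample data from the question.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the table name `orders` is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (Fruits TEXT, Ordered INTEGER, Date TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("Orange", 4, "2015-05-01"),
        ("Orange", 2, "2015-05-01"),
        ("Orange", 20, "2015-05-01"),
        ("Apple", 30, "2015-05-02"),
        ("Apple", 40, "2015-05-02"),
        ("Apple", 24, "2015-05-02"),
        ("Apple", 19, "2015-05-02"),
        ("Apple", 22, "2015-05-02"),
    ],
)

# HAVING is evaluated after grouping, so it can drop whole groups by row count.
rows = conn.execute(
    """
    SELECT Fruits, SUM(Ordered), Date
    FROM orders
    GROUP BY Fruits, Date
    HAVING COUNT(*) >= 5
    """
).fetchall()
print(rows)
```

Only the Apple group survives here, because the Orange group has just three rows. The five Apple values listed in the question add up to 135.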

select Fruits, sum(Ordered), Date from Table
where Fruits in (select Fruits from Table
group by Fruits having count(*) >= 5)
group by Fruits, Date

I think you want something like this:
SELECT
Fruits, SUM(Ordered), Date
FROM (
SELECT
*,
CASE WHEN (SELECT COUNT(*) FROM t ti WHERE ti.Fruits = t.Fruits) < 5 THEN Ordered END As gID
FROM t) dt
GROUP BY
Fruits, gID
Actually you need to use your PK column instead of Ordered in the CASE like this:
CASE WHEN (SELECT COUNT(*) FROM t ti WHERE ti.Fruits = t.Fruits) < 5 THEN `PK` END As gID

Related

Group by with sum doesn't return correct result

Say a table has this schema :
grp | number
1 | 10
1 | 10
1 | 10
2 | 30
2 | 30
3 | 20
Note that each grp always has the same number, even when that grp appears in more than one row. I'm looking to sum the numbers, counting each unique grp only once.
So I want to group my table by grp to have this :
grp | number
1 | 10
2 | 30
3 | 20
And then get the sum which is now 60, but without grouping it gets me 110 as it calculates the sum of everything without grouping. All in one query, with no sub-queries if possible.
I've tried doing the following :
SELECT sum(number) as f
FROM ...
WHERE ...
GROUP BY grp
But this doesn't work, it returns multiple results and not the single result of the sum. What am I doing wrong?
You can use a subquery to select the unique records and then do the sum:
select sum(number)
from (select distinct grp, number
from table t
) t;
If you group by the group, then you'll get one result for each group. And it won't take into account the fact that you only want to use the value from each group once.
To get your desired result, taking one row from each group, you first need to make a subquery selecting DISTINCT group/number combinations from the table, and then SUM that.
SELECT
sum(`number`) as f
FROM
(SELECT DISTINCT `grp`, `number` FROM table1) g
This will output 60.
Demo: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=8a3b346041731a4b4c85f4e151c10f70
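If the fiddle link is unavailable, the same comparison can be reproduced locally. This is just a sketch using Python's sqlite3 module in place of MySQL, with the table name `table1` taken from the answer above and the rows from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (grp INTEGER, number INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [(1, 10), (1, 10), (1, 10), (2, 30), (2, 30), (3, 20)],
)

# Summing every row counts each repeated grp/number pair again.
plain_sum = conn.execute("SELECT SUM(number) FROM table1").fetchone()[0]

# Collapsing to DISTINCT (grp, number) pairs first counts each grp once.
distinct_sum = conn.execute(
    "SELECT SUM(number) FROM (SELECT DISTINCT grp, number FROM table1) AS g"
).fetchone()[0]

print(plain_sum, distinct_sum)
```

The first value is the 110 mentioned in the question and the second is the desired 60.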

Use most recent creation date in group by statement

I have an invoicing system and am trying to generate reports on hours spent. I'm saving every instance of a change to the order, so there are multiple entries for almost every item on every invoice. Due to this, I'm filtering out the old changes and am trying to only use the most recent.
Each instance sharing a project_id, phase_id, and the same weekstart are the same item on the invoice. I want to generate a report and only grab the most recent versions of those items.
Example table:
id project_id phase_id weekstart created
---------------------------------------------------------------
1 6 apple 2017-04-20 2017-04-23
2 6 apple 2017-04-20 2017-04-24
3 8 banana 2017-04-20 2017-04-23
4 9 pear 2017-04-20 2017-04-23
5 9 pear 2017-04-20 2017-04-25
I want to be able to run a query to get:
id project_id phase_id weekstart created
---------------------------------------------------------------
2 6 apple 2017-04-20 2017-04-24
3 8 banana 2017-04-20 2017-04-23
5 9 pear 2017-04-20 2017-04-25
Currently I'm using something like:
SELECT * from invoiceitems where employee_id = 10
group by project_id, phase_id, weekstart
But this doesn't account for the creation date.
Ordering the results doesn't have any effect on the GROUP BY statement. I've checked for similar posts, but the only two I found either order by the highest creation date overall or don't group the results by multiple columns.
Join to a subquery which finds the latest creation time for each item, i.e. each combination of project_id, phase_id, and weekstart. Note that we use GROUP BY here, but only in the subquery, to aggregate over those items.
SELECT t1.*
FROM invoiceitems t1
INNER JOIN
(
SELECT project_id, phase_id, weekstart, MAX(created) AS max_created
FROM invoiceitems
WHERE employee_id = 10
GROUP BY project_id, phase_id, weekstart
) t2
ON t1.project_id = t2.project_id AND
t1.phase_id = t2.phase_id AND
t1.weekstart = t2.weekstart AND
t1.created = t2.max_created
WHERE t1.employee_id = 10
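Here is a small runnable sketch of the join-to-subquery idea, grouping on all three columns that identify an invoice item. It uses Python's sqlite3 module rather than MySQL, and it leaves out the employee_id filter because that column isn't shown in the example table; both of those are my simplifications.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE invoiceitems (
        id INTEGER PRIMARY KEY,
        project_id INTEGER,
        phase_id TEXT,
        weekstart TEXT,
        created TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO invoiceitems VALUES (?, ?, ?, ?, ?)",
    [
        (1, 6, "apple", "2017-04-20", "2017-04-23"),
        (2, 6, "apple", "2017-04-20", "2017-04-24"),
        (3, 8, "banana", "2017-04-20", "2017-04-23"),
        (4, 9, "pear", "2017-04-20", "2017-04-23"),
        (5, 9, "pear", "2017-04-20", "2017-04-25"),
    ],
)

# The subquery finds the newest `created` per item; the join keeps only
# the rows whose `created` matches that maximum.
latest = conn.execute(
    """
    SELECT t1.*
    FROM invoiceitems t1
    INNER JOIN (
        SELECT project_id, phase_id, weekstart, MAX(created) AS max_created
        FROM invoiceitems
        GROUP BY project_id, phase_id, weekstart
    ) t2
      ON t1.project_id = t2.project_id
     AND t1.phase_id = t2.phase_id
     AND t1.weekstart = t2.weekstart
     AND t1.created = t2.max_created
    ORDER BY t1.id
    """
).fetchall()
latest_ids = [row[0] for row in latest]
print(latest_ids)
```

On the example table this keeps ids 2, 3 and 5. If two versions of the same item share the same created value, both would be returned.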
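Tested and works perfectly, provided id always increases with created (as it does in the sample data):
SELECT MAX(`id`) as `id`, `project_id`, `phase_id`,
`weekstart`, MAX(`created`) as `created`
FROM `invoiceitems`
GROUP BY `project_id`, `phase_id`, `weekstart`
ORDER BY `project_id` ASC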

Get highest value for each date

I have a table that logs every time a user completes a survey. It looks a bit like this:
surveyID author timestamp
-----------------------------------------------
1 person1 1461840669000
2 person2 1461840670000
3 person1 1461840680000
I'm trying to run a query that shows me the top surveyor every day (i.e. the person that does the highest number of surveys per day) since April 1st.
So far I've tried this:
SELECT author,
COUNT (DISTINCT surveyid) AS num_surveys,
STRFTIME_UTC_USEC(creation_time*1000, "%Y-%m-%d") AS date,
FROM myTable
WHERE creation_time > 1459468800000 //since April 1st
GROUP BY date, author
ORDER BY 3 DESC,2 DESC;
Which gives me this result:
author num_surveys date
------------------------------------
user1 116 2016-04-27
user2 109 2016-04-27
user3 99 2016-04-27
user3 102 2016-04-28
user1 98 2016-04-28
user2 97 2016-04-28
However, I would really just like the top record from each day:
author num_surveys date
------------------------------------
user1 116 2016-04-27
user3 102 2016-04-28 etc...
I've tried MAX() and TOP() in various places but none of them have worked so far hence the above example of my query that gets me closest to what I want... Any suggestions would be much appreciated. I'm very new to SQL!
EDIT
Thanks for the suggestions so far. I have managed to get it to work with:
DEFINE INLINE TABLE A
SELECT author,
COUNT (DISTINCT featureid) AS num_surveys,
STRFTIME_UTC_USEC(creation_time*1000, "%Y-%m-%d") AS date,
FROM placesense.surveys
WHERE creation_time > 1459468800000
GROUP BY date, author
ORDER BY 3 DESC,2 DESC;
SELECT
MAX(num_surveys),
date
FROM A AS B
WHERE date = B.date
GROUP BY date
Any other more efficient suggestions welcome though.
A pretty simple way uses a correlated subquery:
select t.*
from t
where t.num_surveys = (select max(t2.num_surveys) from t t2 where t2.date = t.date);
Note: this will return duplicates for a date in the case of ties.
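To see the correlated subquery in action, here is a sketch that runs it over the per-day counts from the question. It assumes those counts have already been materialised in a table, here hypothetically named `daily`, and it uses Python's sqlite3 module rather than the asker's actual database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# `daily` is a hypothetical table holding the already-aggregated counts.
conn.execute("CREATE TABLE daily (author TEXT, num_surveys INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO daily VALUES (?, ?, ?)",
    [
        ("user1", 116, "2016-04-27"),
        ("user2", 109, "2016-04-27"),
        ("user3", 99, "2016-04-27"),
        ("user3", 102, "2016-04-28"),
        ("user1", 98, "2016-04-28"),
        ("user2", 97, "2016-04-28"),
    ],
)

# For each row, the subquery looks up the maximum count on that row's date;
# only rows that equal their day's maximum are kept.
top_per_day = conn.execute(
    """
    SELECT t.author, t.num_surveys, t.date
    FROM daily t
    WHERE t.num_surveys = (
        SELECT MAX(t2.num_surveys) FROM daily t2 WHERE t2.date = t.date
    )
    ORDER BY t.date
    """
).fetchall()
print(top_per_day)
```

This returns one row per day on the sample data, since there are no ties.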
SELECT MAX( surveyid) AS m_surveys,
STRFTIME_UTC_USEC(creation_time*1000, "%Y-%m-%d") AS date,
FROM myTable
WHERE creation_time > 1459468800000 //since April 1st
GROUP BY date, author
ORDER BY 3 DESC,2 DESC;

MySQL Group by week num w/ multiple date column

I have a table with columns similar to the ones below, but with about 30 date columns and 500+ records
id | forcast_date | actual_date
1 10/01/2013 12/01/2013
2 03/01/2013 06/01/2013
3 05/01/2013 05/01/2013
4 10/01/2013 09/01/2013
and what I need to do is get a query with output similar to
week_no | count_forcast | count_actual
1 4 6
2 5 7
3 2 1
etc
My query is
SELECT weekofyear(forcast_date) as week_num,
COUNT(forcast_date) AS count_forcast ,
COUNT(actual_date) AS count_actual
FROM
table
GROUP BY
week_num
but what I am getting is the forcast_date counts repeated in each column, i.e.
week_no | count_forcast | count_actual
1 4 4
2 5 5
3 2 2
Can anyone please tell me the best way to formulate the query to get what I need?
Thanks
try:
SELECT weekofyear(forcast_date) AS week_forcast,
COUNT(forcast_date) AS count_forcast, t2.count_actual
FROM
t t1 LEFT JOIN (
SELECT weekofyear(actual_date) AS week_actual,
COUNT(actual_date) AS count_actual
FROM t
GROUP BY weekOfYear(actual_date)
) AS t2 ON weekofyear(forcast_date)=week_actual
GROUP BY
weekofyear(forcast_date), t2.count_actual
sqlFiddle
You have to write about 30 left joins (one per date column), and the first date column must not have an empty week (a week with a count of 0), or the joins will miss that week.
Try:
SELECT WeekInYear, ForecastCount, ActualCount
FROM ( SELECT A.WeekInYear, A.ForecastCount, B.ActualCount FROM (
SELECT weekofyear(forecast_date) as WeekInYear,
COUNT(forecast_date) as ForecastCount, 0 as ActualCount
FROM TableWeeks
GROUP BY weekofyear(forecast_date)
) A
INNER JOIN
( SELECT * FROM
(
SELECT weekofyear(actual_date) as WeekInYear,
0 as ForecastCount, COUNT(actual_date) as ActualCount
FROM TableWeeks
GROUP BY weekofyear(actual_date)
) ActualTable ) B
ON A.WeekInYear = B.WeekInYear)
AllTable
GROUP BY WeekInYear;
Here's my Fiddle Demo
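An alternative that avoids the joins altogether is to stack the date columns with UNION ALL and count per week. The sketch below is my own variation, not one of the answers above. It runs on Python's sqlite3 module, so `strftime('%W', ...)` stands in for MySQL's WEEKOFYEAR() (the two number weeks differently), and the dates are illustrative ISO-formatted values rather than the exact ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, forcast_date TEXT, actual_date TEXT)")
# Illustrative rows; dates are stored as ISO strings so SQLite can parse them.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, "2013-01-10", "2013-01-15"),
        (2, "2013-01-03", "2013-01-06"),
        (3, "2013-01-05", "2013-01-05"),
        (4, "2013-01-10", "2013-01-09"),
    ],
)

# Each branch of the UNION ALL emits one (week, kind) row per date value,
# so a week that only occurs in one column still shows up in the result.
weekly = conn.execute(
    """
    SELECT week_no,
           SUM(CASE WHEN kind = 'forcast' THEN 1 ELSE 0 END) AS count_forcast,
           SUM(CASE WHEN kind = 'actual' THEN 1 ELSE 0 END) AS count_actual
    FROM (
        SELECT CAST(strftime('%W', forcast_date) AS INTEGER) AS week_no,
               'forcast' AS kind
        FROM t WHERE forcast_date IS NOT NULL
        UNION ALL
        SELECT CAST(strftime('%W', actual_date) AS INTEGER), 'actual'
        FROM t WHERE actual_date IS NOT NULL
    ) AS u
    GROUP BY week_no
    ORDER BY week_no
    """
).fetchall()
print(weekly)
```

With 30 date columns this still needs 30 UNION ALL branches, but each branch is a one-liner and no week is lost when a column has no dates in it.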
Just in case someone else comes along with the same question:
Instead of trying to use some amazing query, I ended up creating an array of date column names and a loop in the program that was calling this query, and for each date column name, performing the same query. It is a bit slower, but it does work.
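For reference, a loop like that might look roughly like the sketch below. The original program and its language weren't posted, so everything here is an assumption: it uses Python with sqlite3, the function name `counts_by_week` is made up, `strftime('%W', ...)` stands in for MySQL's WEEKOFYEAR(), and the rows are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, forcast_date TEXT, actual_date TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, "2013-01-10", "2013-01-15"),
        (2, "2013-01-03", "2013-01-06"),
        (3, "2013-01-05", "2013-01-05"),
        (4, "2013-01-10", "2013-01-09"),
    ],
)

# Column names cannot be bound as query parameters, so only names from
# this fixed list are ever interpolated into the SQL text.
DATE_COLUMNS = ["forcast_date", "actual_date"]


def counts_by_week(connection, date_columns):
    """Run the same per-week count once per date column and merge the results."""
    result = {}
    for col in date_columns:
        if col not in DATE_COLUMNS:
            raise ValueError(f"unexpected column name: {col}")
        query = (
            f"SELECT CAST(strftime('%W', {col}) AS INTEGER) AS week_no, "
            f"COUNT({col}) FROM t WHERE {col} IS NOT NULL GROUP BY week_no"
        )
        for week_no, count in connection.execute(query):
            # Start every week with zeros so missing columns read as 0.
            week_counts = result.setdefault(week_no, dict.fromkeys(date_columns, 0))
            week_counts[col] = count
    return result


weeks = counts_by_week(conn, DATE_COLUMNS)
print(weeks)
```

This issues one query per date column, which matches the "a bit slower, but it does work" trade-off described above.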

MySQL - Count Yearly Totals when some Years have nulls

I have 1 table with similar data:
CustomerID | ProjectID | DateListed | DateCompleted
123456 | 045 | 07-29-2010 | 04-03-2011
123456 | 123 | 10-12-2011 | 11-30-2011
123456 | 157 | 12-12-2011 | 02-10-2012
123456 | 258 | 06-07-2011 | NULL
Basically, a customer contacts us, we get a project on our list, and we mark it completed when we're done with it.
What I'm after is a simple (you'd think, at least) count of all projects, with expected output like below:
YEAR | TotalListed | TotalCompleted
2010 | 1 | 0
2011 | 3 | 2
2012 | 0 | 1
However, my query below - because of the join - isn't showing 2012's count, because there's been no listed project for 2012. However, I can't really reverse the query, as then 2010's count wouldn't show up (since nothing was completed in 2010).
I'm open to any suggestions, or tips like how to do this. I've pondered a temp table, is that the best way to go? I'm open to anything that gets me what I need!
(If the code looks familiar, y'all helped me get the subquery made! MySQL Subquery with main query data variable)
SELECT YEAR(p1.DateListed) AS YearListed, COUNT(p1.ProjectID) As Listed, PreQuery.Completed
FROM(
SELECT YEAR(DateCompleted) AS YearCompleted, COUNT(ProjectID) AS Completed
FROM projects
WHERE CustomerID = 123456 AND DateListed >= DATE_SUB(Now(), INTERVAL 5 YEAR)
GROUP BY YEAR(DateCompleted)
) PreQuery
RIGHT OUTER JOIN projects p1 ON PreQuery.YearCompleted = YEAR(p1.DateListed)
WHERE CustomerID = 123456 AND DateListed >= DATE_SUB(Now(), INTERVAL 5 YEAR)
GROUP BY YearListed
ORDER BY p1.DateListed
After reviewing your table, query, and expected results, I believe I have found a revised query that suits your needs. It is a fairly complete rewrite of your existing query, but I've tested it with your given data and received the results you expect:
SELECT
years.`year`,
SUM(IF(YEAR(DateListed) = years.`year`, 1, 0)) AS TotalListed,
SUM(IF(YEAR(DateCompleted) = years.`year`, 1, 0)) AS TotalCompleted
FROM
projects
LEFT JOIN (
SELECT DISTINCT `year` FROM (
SELECT YEAR(DateListed) AS `year` FROM projects
UNION SELECT YEAR(DateCompleted) AS `year` FROM projects WHERE DateCompleted IS NOT NULL
) as year_inner
) AS years
ON YEAR(DateListed) = `year`
OR YEAR(DateCompleted) = `year`
WHERE
CustomerID = 123456 AND DateListed >= DATE_SUB(Now(), INTERVAL 5 YEAR)
GROUP BY
years.`year`
ORDER BY
years.`year`
To explain, we should start with the inner query (aliased as year_inner). It selects every year that appears in the DateListed and DateCompleted columns, and the query around it takes a DISTINCT list of those to create the years sub-query. This sub-query gives us the full list of "years" that we want data for. Doing it this way, as opposed to using a sub-query with counts and groupings, means you only have to define the WHERE clause on the outermost query (though, if efficiency becomes an issue with thousands and thousands of records, you could always add a WHERE clause to the inner query too, or an index to the date columns).
After we've built our inner queries, we join the projects table on the results with a LEFT JOIN for the DateListed or DateCompleted's YEAR() value - which will allow us to bring back null columns too!
For the field selections, we use the year column from our inner query to assure that we get a full list of years to display. Then, we compare the current row's DateListed & DateCompleted YEAR() value to the current year; if they're equal, add 1 - else add 0. When we GROUP BY year, our SUM() will count all of the 1's for that year for each column and give you the output you want (hopefully, of course =P).
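The year-list-plus-conditional-SUM idea can be checked against the sample rows with a short sketch. It is adapted for Python's sqlite3 module, so `strftime('%Y', ...)` replaces YEAR(), CASE replaces IF(), and the dates are rewritten in ISO format. I have also left out the 5-year DateListed filter, since the sample rows are older than that; those changes are mine, not part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE projects (
        CustomerID INTEGER,
        ProjectID TEXT,
        DateListed TEXT,
        DateCompleted TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO projects VALUES (?, ?, ?, ?)",
    [
        (123456, "045", "2010-07-29", "2011-04-03"),
        (123456, "123", "2011-10-12", "2011-11-30"),
        (123456, "157", "2011-12-12", "2012-02-10"),
        (123456, "258", "2011-06-07", None),
    ],
)

# `years` lists every year seen in either date column. Each project is
# joined to every year it touches, then counted once per matching column.
totals = conn.execute(
    """
    SELECT years.year,
           SUM(CASE WHEN strftime('%Y', p.DateListed) = years.year
                    THEN 1 ELSE 0 END) AS TotalListed,
           SUM(CASE WHEN strftime('%Y', p.DateCompleted) = years.year
                    THEN 1 ELSE 0 END) AS TotalCompleted
    FROM (
        SELECT strftime('%Y', DateListed) AS year FROM projects
        UNION
        SELECT strftime('%Y', DateCompleted) FROM projects
        WHERE DateCompleted IS NOT NULL
    ) AS years
    JOIN projects p
      ON strftime('%Y', p.DateListed) = years.year
      OR strftime('%Y', p.DateCompleted) = years.year
    WHERE p.CustomerID = 123456
    GROUP BY years.year
    ORDER BY years.year
    """
).fetchall()
print(totals)
```

The year comes back as a string here because SQLite's strftime returns text. The counts match the expected output in the question: 2010 has one listed, 2011 has three listed and two completed, and 2012 has one completed.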