SQL WHERE IF clause issue - mysql

I have a SQL/Java code issue. The basic setup is as follows: a MySQL database with a table that has multiple columns. One column holds names, an associated column holds months, and a third column holds counts. So a sample table would be
john - january - 5
john - january - 6
mary - january - 5
Alex - February- 5
John - February - 6
John - February - 4
Mary - February - 3
John - march - 4
The table continues to month May.
So John appears in five months, Mary in three, and Alex in one. Currently, my SQL query looks something like this.
select name, sum(count)/4
from table where (category ='something'
AND month not like 'May') group by name;
Basically, this query is supposed to display each name with its average count per month. Hence, the sum is divided by four (because I exclude May, so it covers January through April). However, the issue is that some names only appear in one month (or two or three).
This means that, for such a name, the sum of the counts should only be divided by that specific number of months to get the average count per month. How would I go about this? I feel as if this will require some kind of IF statement in a WHERE clause: if the count of distinct months (distinct because months may repeat) is a certain number, then divide sum(count) by that number for each name.
Also, it may not be a WHERE/IF issue at all. I've read in some forums that CASE could possibly be used?

If you need the average per month, you can GROUP BY name and month and use the AVG function:
SELECT `name`, `month`, avg(`count`)
FROM `table`
WHERE `category` ='something' AND `month` NOT LIKE 'May'
GROUP BY `name`, `month`;
If you need the average over the whole period, just GROUP BY name and take the AVG of count:
SELECT `name`, avg(`count`)
FROM `table`
WHERE `category` ='something' AND `month` NOT LIKE 'May'
GROUP BY `name`;
And another option, if you don't like AVG:
SELECT `name`, sum(`count`)/(SELECT count(*) FROM `table` AS `t2` WHERE `category` ='something' AND `month` NOT LIKE 'May' AND `t1`.`name` = `t2`.`name`)
FROM `table` AS `t1`
WHERE `category` ='something' AND `month` NOT LIKE 'May'
GROUP BY `name`;
But I would stay with AVG.
Actually, I prefer to use != instead of NOT LIKE; it improves readability.

Just for completeness' sake, here is a WORKING FIDDLE. Using the AVG function is the way to go, as it computes the average per person per month. Look at John in January: his counts are 5 and 6, so his average is 5.5.
SELECT
person,
month,
avg(counter)
FROM testing
where
(
category ='something'
AND month <> 'May'
)
GROUP BY person, month;
If you want to see the data on one line per person, as it sounds like from your post, then you can do this. ANOTHER FIDDLE
SELECT
person,
group_concat(month),
group_concat(average_count)
FROM(
SELECT
person,
month,
avg(counter) as average_count
FROM testing
where
(
category ='something'
AND month <> 'May'
)
GROUP BY person, month
) as t
group by person;
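As a rough way to try the nested query above without a MySQL server, here is a minimal sketch that runs it through Python's built-in sqlite3 module. The sample rows are hypothetical (modelled on the question's data), and SQLite stands in for MySQL; both provide avg() and group_concat().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE testing (person TEXT, month TEXT, counter INTEGER, category TEXT)")
# Hypothetical rows modelled on the question's sample data.
conn.executemany(
    "INSERT INTO testing VALUES (?, ?, ?, 'something')",
    [
        ("John", "January", 5), ("John", "January", 6),
        ("John", "February", 6), ("John", "February", 4),
        ("Mary", "January", 5), ("John", "May", 9),
    ],
)

rows = conn.execute("""
    SELECT person, group_concat(month), group_concat(average_count)
    FROM (
        SELECT person, month, avg(counter) AS average_count
        FROM testing
        WHERE category = 'something' AND month <> 'May'
        GROUP BY person, month
    ) AS t
    GROUP BY person
""").fetchall()

# Pair each month with its average: {person: {month: average}}
per_person = {
    person: dict(zip(months.split(","), map(float, averages.split(","))))
    for person, months, averages in rows
}
print(per_person)
```

The inner query produces one row per person and month; the outer query only folds those rows into one line per person.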

Try this :
SELECT name, SUM(count) / COUNT(DISTINCT month)
FROM table
WHERE month != 'May'
AND category = 'something'
GROUP BY name
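To see what dividing by COUNT(DISTINCT month) does, here is a small sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The table name counts, the column name cnt, and the sample rows are assumptions made for illustration. The multiplication by 1.0 is only needed because SQLite divides integers as integers, whereas MySQL's / already returns a decimal.

```python
import sqlite3

# Hypothetical rows modelled on the question's sample (names lower-cased for consistency).
rows = [
    ("john", "January", 5), ("john", "January", 6), ("mary", "January", 5),
    ("alex", "February", 5), ("john", "February", 6), ("john", "February", 4),
    ("mary", "February", 3), ("john", "March", 4), ("john", "May", 9),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counts (name TEXT, month TEXT, cnt INTEGER, category TEXT)")
conn.executemany("INSERT INTO counts VALUES (?, ?, ?, 'something')", rows)

query = """
    SELECT name,
           SUM(cnt) * 1.0 / COUNT(DISTINCT month) AS avg_per_month
    FROM counts
    WHERE month != 'May' AND category = 'something'
    GROUP BY name
"""
# {name: total count divided by the number of distinct months that name appears in}
averages = dict(conn.execute(query).fetchall())
print(averages)
```

With these rows, john's total of 25 is divided by 3 months, mary's 8 by 2, and alex's 5 by 1, so each name gets its own divisor instead of a fixed 4.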

Related

How to show months if it has no record and force it to zero if null on MySQL

I have an orders table, and I need to fetch the order totals by month. The requirement is that a month with no data should still appear, with its total forced to zero, like this:
What I have so far is this query:
select sum(total) as total_orders, DATE_FORMAT(created_at, "%M") as date
from orders
where is_active = 1
AND tenant_id = 2
AND created_at like '%2021%'
group by DATE_FORMAT(created_at, "%m")
But the result only contains the months that have data:
Can anyone here help me write the correct query?
Thank you so much
Whenever you're trying to use a value that doesn't exist in the table, one option is to join against a reference, whether that is a table or a query-generated set of values.
I'm guessing that, in terms of date data, the column created_at in table orders may contain all 12 months of the year, regardless of which year.
Let's assume that the table data for orders spans from 2019 to the present date. With that, you can simply build a 12-month reference set for a LEFT JOIN operation. So:
SELECT MONTHNAME(created_at) mnt FROM orders GROUP BY MONTHNAME(created_at);
You can append that into your query like:
SELECT IFNULL(SUM(total),0) as total_orders, mnt
from (SELECT MONTHNAME(created_at) mnt FROM orders GROUP BY MONTHNAME(created_at)) mn
LEFT JOIN orders o
ON mn.mnt=MONTHNAME(created_at)
AND is_active = 1
AND tenant_id = 2
AND created_at like '%2021%'
GROUP BY mnt;
Apart from adding the 12-month sub-query and a LEFT JOIN, there are 3 other changes from your original query:
IFNULL() is added around SUM() in the SELECT to return 0 when a month has no matching rows.
All the WHERE conditions have been moved to ON, since leaving them in WHERE would turn the LEFT JOIN into a normal (inner) JOIN.
GROUP BY uses the sub-query generated month (mnt) value instead.
Considering that table orders might not contain all 12 months, you can generate them with a query instead. There are a lot of ways of doing it, but here I'm only going to show the UNION method, which works with most MySQL versions.
SELECT MONTHNAME(CONCAT_WS('-',YEAR(NOW()),mnt,'01')) dt
FROM
(SELECT 1 AS mnt UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION
SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION
SELECT 9 UNION SELECT 10 UNION SELECT 11 UNION SELECT 12) mn
If you're using MariaDB version that supports SEQUENCE ENGINE, the same query above is much shorter:
SELECT MONTHNAME(CONCAT_WS('-',YEAR(NOW()),mnt,'01'))
FROM (SELECT seq AS mnt FROM seq_1_to_12) mn
I'm using MariaDB 10.5 in this demo fiddle; however, it seems the rows are ordered by the month name rather than by the month itself, so the result looks unordered. It's in the correct order in the MySQL 8.0 fiddle, though.
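The difference between filtering in ON and filtering in WHERE can be checked with a small sketch. It uses Python's built-in sqlite3 module instead of MySQL, so month numbers come from strftime() rather than MONTHNAME(), and the 12 months are generated with a recursive CTE as a stand-in for the UNION list. The sample orders are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        total INTEGER, is_active INTEGER, tenant_id INTEGER, created_at TEXT
    )
""")
# Hypothetical orders: only January and March 2021 have qualifying rows.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (100, 1, 2, "2021-01-05"),
        (50, 1, 2, "2021-01-20"),
        (70, 1, 2, "2021-03-11"),
        (999, 0, 2, "2021-04-01"),  # inactive, must not be counted
        (500, 1, 2, "2020-03-09"),  # wrong year, must not be counted
    ],
)

# Month numbers 1..12 (stand-in for the UNION list in the answer).
MONTHS = """
    WITH RECURSIVE mn(mnt) AS (
        SELECT 1 UNION ALL SELECT mnt + 1 FROM mn WHERE mnt < 12
    )
"""

# Filters in ON: every generated month survives the LEFT JOIN.
filters_in_on = conn.execute(MONTHS + """
    SELECT mn.mnt, IFNULL(SUM(o.total), 0) AS total_orders
    FROM mn
    LEFT JOIN orders o
           ON mn.mnt = CAST(strftime('%m', o.created_at) AS INTEGER)
          AND o.is_active = 1
          AND o.tenant_id = 2
          AND strftime('%Y', o.created_at) = '2021'
    GROUP BY mn.mnt
    ORDER BY mn.mnt
""").fetchall()

# Filters in WHERE: months without a matching order are filtered out again.
filters_in_where = conn.execute(MONTHS + """
    SELECT mn.mnt, IFNULL(SUM(o.total), 0) AS total_orders
    FROM mn
    LEFT JOIN orders o
           ON mn.mnt = CAST(strftime('%m', o.created_at) AS INTEGER)
    WHERE o.is_active = 1
      AND o.tenant_id = 2
      AND strftime('%Y', o.created_at) = '2021'
    GROUP BY mn.mnt
    ORDER BY mn.mnt
""").fetchall()

print(filters_in_on)
print(filters_in_where)
```

The first result has one row per month with zeros where nothing matched; the second only keeps the months that have qualifying orders, which is the behaviour the original query showed.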
Thanks all for the answers and comments, I really appreciate it.
I solved it by creating a helper table with the static months and then using UNION and aliasing. Since I need the month names in Indonesian, I added a CASE WHEN expression too.
So, the query is like this:
SELECT total_orders,
(CASE date WHEN 01 THEN 'Januari'
WHEN 02 THEN 'Februari'
WHEN 03 THEN 'Maret'
WHEN 04 THEN 'April'
WHEN 05 THEN 'Mei'
WHEN 06 THEN 'Juni'
WHEN 07 THEN 'Juli'
WHEN 08 THEN 'Agustus'
WHEN 09 THEN 'September'
WHEN 10 THEN 'Oktober'
WHEN 11 THEN 'November'
WHEN 12 THEN 'Desember'
ELSE date END ) AS date
FROM (SELECT SUM(total) AS total_orders,
DATE_FORMAT(created_at, "%m") AS date
FROM orders
WHERE is_active = 1
AND tenant_id = 2
AND created_at like '%2021%'
GROUP BY DATE_FORMAT(created_at, "%m")
UNION
SELECT 0 AS total_orders,
code AS date
FROM quantum_default_months ) as Q
GROUP BY date
I still don't know if this query is fully correct or not, but I get my exact result.
cmiiw.
thanks all

How can I randomly select n rows per group a different number of times for each group in MySQL?

I basically have two tables, both of which contain two columns - a list of people ID and their birth year, like this:
ID birth_year
1 1981
2 1982
3 1982
4 1983
etc
For each person in table 1 I need to select 6 people at random with the same birth year from table 2.
I think I can use ORDER BY RAND() and LIMIT 6 in this query. My issue is that in table 1 there are many with the same birth year, maybe 102 people born in 1987 and 88 people born in 1988. How can I write the query such that it selects 6 random people born in 1987 102 times, and 88 times for 1988?
I'm just starting to learn and understand the OVER/PARTITION concept, and I think it's what you are looking for.
My understanding of the OVER clause is this: of the records you are querying, break them into groups based on whatever the PARTITION BY component declares. So, for your request, you want to group the rows by their respective birth year.
Now, the ORDER BY within the partition is where you would apply RAND(). So, within each PARTITION (the birth year), order those records by RAND().
Then we can use the built-in function ROW_NUMBER() to number the records 1, 2, 3, etc. in the order they are sorted within each group.
By making this an inner pre-query, you can filter on the row number being <= 6, so you get 6 rows from each birth year group.
select
pq.*
from
( select
id,
birth_year,
ROW_NUMBER() OVER(PARTITION BY birth_year order by rand()) finalRow
from
YourTable
order by
birth_year ) pq
where
pq.finalRow <= 6
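A quick way to check the behaviour of this pattern is the sketch below, which runs it through Python's built-in sqlite3 module. It assumes an SQLite build with window-function support (3.25 or later), uses RANDOM() where MySQL uses RAND(), and the table name people and its rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, birth_year INTEGER)")
# Hypothetical data: 10 people born in 1987, 4 born in 1988.
conn.executemany(
    "INSERT INTO people (birth_year) VALUES (?)",
    [(1987,)] * 10 + [(1988,)] * 4,
)

picked = conn.execute("""
    SELECT id, birth_year
    FROM (
        SELECT id,
               birth_year,
               ROW_NUMBER() OVER (PARTITION BY birth_year ORDER BY RANDOM()) AS final_row
        FROM people
    ) pq
    WHERE pq.final_row <= 6
""").fetchall()

# Count how many rows were picked for each birth year.
per_year = {}
for _id, year in picked:
    per_year[year] = per_year.get(year, 0) + 1
print(per_year)
```

Each run picks a different random set, but never more than 6 rows per birth year; a year with fewer than 6 people simply returns all of them.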

sql: group by multiple correlated fields (date, weekday, month)

I am working on a SQL task. The goal is to know how many flights there are on average, for a given day in a given month from the flights table.
Input table:
flights
id BIGINT
dep_day_of_week varchar (255)
dep_month varchar (255)
dep_date text
An example of the flights table. There could be multiple entries for the same date.
id  dep_day_of_week  dep_month  dep_date
1   Thursday         January    4/7/2005 15:24:00
2   Friday           February   5/6/2005 12:12:12
3   Friday           February   5/6/2005 15:12:12
I read the following solution:
SELECT a.dep_month,
a.dep_day_of_week,
AVG(a.flight_count) AS average_flights
FROM (
SELECT dep_month, dep_day_of_week, dep_date,
COUNT(*) AS flight_count
FROM flights
GROUP BY 1,2,3
) a
GROUP BY 1,2
ORDER BY 1,2;
My question is about the subquery, which calculates the number of flights per day:
SELECT dep_month, dep_day_of_week, dep_date, COUNT(*) AS flight_count
FROM flights
GROUP BY 1,2,3
dep_month, dep_day_of_week, and dep_date are three correlated attributes, with dep_date being the most detailed of the three. So I thought GROUP BY 1,2,3 would do the same thing as GROUP BY 3.
To examine the possible differences, I wrapped the subquery in a count(*) to count the groups each variant produces:
Select count(*) from (
SELECT dep_month, dep_day_of_week, dep_date, COUNT(*) AS flight_count
FROM flights
GROUP BY 1,2,3 -- or: GROUP BY 3
) t
In the output, the counts for GROUP BY 1,2,3 and GROUP BY 3 are 447 and 441, respectively. Why is there a difference between these two grouping methods?
Updates:
Thanks to @trincot's excellent answer. I used his suggested query and found an inconsistency in the input data.
SELECT dep_date, count(distinct dep_month), count(distinct dep_day_of_week)
FROM flights
GROUP BY dep_date
HAVING count(distinct dep_month) > 1
OR count(distinct dep_day_of_week) > 1
Output:
dep_date    count(distinct dep_month)  count(distinct dep_day_of_week)
1/16/2001   1                          2
10/25/2003  1                          2
2/23/2000   1                          2
3/29/2001   1                          2
4/3/2001    1                          2
5/13/2000   1                          2
Specifically, the database assigns Monday to 1/16/2001 8:25:00 and Tuesday to 1/16/2001 7:56:00. That is the reason for the inconsistency.
As the date field has a time component, the count(*) in your subquery is going to be 1 every time, since the time component will be different and generate a new group. Your groups are actually per second.
You could get your results without subquery, like this:
select dep_month,
dep_day_of_week,
count(*) /
count(distinct substring_index(dep_date, ' ', 1)) avg_flights
from flights
group by dep_month,
dep_day_of_week
This counts all the flight records, and divides that by the number of different dates these flights are on. The date is extracted by only taking the part before the space.
Note that this means that when you don't have a record at all for a certain date, this day will not count in the average and might give a false impression. For instance, if in January there is only one Friday for which you have flights (let's say 10 of them), but there are 4 Fridays in January, you will still get an average of 10, even though 2.5 would be more reasonable.
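Here is a small sketch of that count(*) / count(distinct date) idea, run through Python's built-in sqlite3 module. SQLite has no substring_index(), so substr() with instr() is used as a stand-in to cut the date off at the first space; the flight rows are hypothetical. The multiplication by 1.0 avoids SQLite's integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE flights (
        id INTEGER, dep_day_of_week TEXT, dep_month TEXT, dep_date TEXT
    )
""")
# Hypothetical rows: three flights on one Friday, one flight on another Friday.
conn.executemany("INSERT INTO flights VALUES (?, ?, ?, ?)", [
    (1, "Thursday", "January", "1/6/2005 15:24:00"),
    (2, "Friday", "February", "2/4/2005 12:12:12"),
    (3, "Friday", "February", "2/4/2005 15:12:12"),
    (4, "Friday", "February", "2/4/2005 18:00:00"),
    (5, "Friday", "February", "2/11/2005 09:30:00"),
])

rows = conn.execute("""
    SELECT dep_month,
           dep_day_of_week,
           COUNT(*) * 1.0 /
           COUNT(DISTINCT substr(dep_date, 1, instr(dep_date, ' ') - 1)) AS avg_flights
    FROM flights
    GROUP BY dep_month, dep_day_of_week
""").fetchall()

# {(month, weekday): average number of flights per distinct date}
avg_flights = {(month, day): avg for month, day, avg in rows}
print(avg_flights)
```

For February Fridays there are 4 flights spread over 2 distinct dates, giving an average of 2.0. Dates with no flights at all are not in the table and therefore do not lower the average, as noted above.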
About the difference in count
You state that this query returns 447 records:
Select count(*) from (
SELECT dep_month, dep_day_of_week, dep_date, COUNT(*) AS flight_count
FROM flights
GROUP BY 1,2,3) t
And this only 441:
Select count(*) from (
SELECT dep_month, dep_day_of_week, dep_date, COUNT(*) AS flight_count
FROM flights
GROUP BY 3) t
This seems to indicate that you have identical dates in multiple records, yet with different values in one of the first two columns, which would be an inconsistency. You can find out with this query:
SELECT dep_date, count(distinct dep_month), count(distinct dep_day_of_week)
FROM flights
GROUP BY dep_date
HAVING count(distinct dep_month) > 1
OR count(distinct dep_day_of_week) > 1
In a healthy data set, this query should return 0 records. If it returns records, you'll get the dates for which the month is not correctly set in at least one record, or the day of the week is not correctly set in at least one record.
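To illustrate how a single mislabelled record produces the extra groups, here is a minimal sketch using Python's built-in sqlite3 module in place of MySQL. The four rows are made up: one date carries two different weekday labels, the other date is consistent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE flights (
        id INTEGER, dep_day_of_week TEXT, dep_month TEXT, dep_date TEXT
    )
""")
# Hypothetical rows: 1/16/2001 is labelled Monday once and Tuesday once.
conn.executemany("INSERT INTO flights VALUES (?, ?, ?, ?)", [
    (1, "Monday", "January", "1/16/2001"),
    (2, "Tuesday", "January", "1/16/2001"),
    (3, "Wednesday", "January", "1/17/2001"),
    (4, "Wednesday", "January", "1/17/2001"),
])

# Number of groups when grouping by all three columns.
groups_by_all = conn.execute("""
    SELECT count(*) FROM (
        SELECT dep_month, dep_day_of_week, dep_date
        FROM flights
        GROUP BY dep_month, dep_day_of_week, dep_date
    ) t
""").fetchone()[0]

# Number of groups when grouping by the date alone.
groups_by_date = conn.execute("""
    SELECT count(*) FROM (
        SELECT dep_date FROM flights GROUP BY dep_date
    ) t
""").fetchone()[0]

# The consistency check from the answer.
inconsistent = conn.execute("""
    SELECT dep_date, count(DISTINCT dep_month), count(DISTINCT dep_day_of_week)
    FROM flights
    GROUP BY dep_date
    HAVING count(DISTINCT dep_month) > 1
        OR count(DISTINCT dep_day_of_week) > 1
""").fetchall()

print(groups_by_all, groups_by_date, inconsistent)
```

Grouping by all three columns yields one group more than grouping by date, and the check returns exactly the date responsible for it.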

Selecting rows which have a similar value for a particular column which occurs n times each month

I have a database with multiple columns, from which I have created this view with multiple rows similar to the ones shown below. The data is available for each day of every month from 2009 to 2010, for the 5 names given. I have to get the 'Name' for which the category 'Super' occurs more than 5 times in a month, and list the names separately for each month. The view contains data for all months together.
Name Dates Category
--------------------------------
PAT 2009-01-01 Super
YAT 2009-01-01 No
ROT 2009-01-01 No
SUP 2009-01-01 Super
ANT 2009-01-01 Super
I tried getting a count of the Name in MySQL using
SELECT `NAME`,`DATES`
FROM (
SELECT `NAME`, `CATEGORY`,MONTH(`DATES`)
FROM VIEW
GROUP BY `NAME`, `CATEGORY`,MONTH(`DATES`)
HAVING COUNT(`CATEGORY`)>5
) a
GROUP BY `NAME`
HAVING count(`CATEGORY`)>5;
But it does not return any rows.
You're trying to group your rows into months. The best way to do this is to start with an expression that takes any date and converts it to the first day of the month in which it occurs. That expression is:
DATE(DATE_FORMAT(dates, '%Y-%m-01'))
Next, you use this expression in a query with a GROUP BY clause.
SELECT NAME, CATEGORY,
DATE(DATE_FORMAT(DATES, '%Y-%m-01')) DATES
FROM VIEW
GROUP BY NAME, CATEGORY, DATE(DATE_FORMAT(DATES, '%Y-%m-01'))
HAVING COUNT(*) > 5
This will yield all the name / category / month combinations occurring more than five times.
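The effect of grouping on the first day of the month can be tried out with the sketch below. It uses Python's built-in sqlite3 module, where strftime('%Y-%m-01', dates) plays the role of DATE(DATE_FORMAT(dates, '%Y-%m-01')); the table name daily and the rows are hypothetical. For comparison, it also groups by the bare month number, which is roughly what MONTH() would give.

```python
import sqlite3

rows = []
# PAT: 'Super' on six days in January 2009 and on three days in January 2010.
rows += [("PAT", f"2009-01-{d:02d}", "Super") for d in range(1, 7)]
rows += [("PAT", f"2010-01-{d:02d}", "Super") for d in range(1, 4)]
# SUP: 'Super' on three days in January 2009 and on three days in January 2010.
rows += [("SUP", f"2009-01-{d:02d}", "Super") for d in range(1, 4)]
rows += [("SUP", f"2010-01-{d:02d}", "Super") for d in range(1, 4)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily (name TEXT, dates TEXT, category TEXT)")
conn.executemany("INSERT INTO daily VALUES (?, ?, ?)", rows)

# Group by year and month together (first day of the month).
by_month_start = conn.execute("""
    SELECT name, category, strftime('%Y-%m-01', dates) AS month_start
    FROM daily
    GROUP BY name, category, strftime('%Y-%m-01', dates)
    HAVING COUNT(*) > 5
    ORDER BY name
""").fetchall()

# Group by the month number only: January 2009 and January 2010 are merged.
by_month_number = conn.execute("""
    SELECT name, category, strftime('%m', dates) AS month_number
    FROM daily
    GROUP BY name, category, strftime('%m', dates)
    HAVING COUNT(*) > 5
    ORDER BY name
""").fetchall()

print(by_month_start)
print(by_month_number)
```

Only PAT in January 2009 passes the first query. The second query also reports SUP, because its three rows from each year are added together once the year is dropped.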
I think that's what you want. But maybe you want all the monthly items listed in any month where the Super category appears more than five times for some name. To do that first we write a subquery to get a list of those dates:
SELECT DATE(DATE_FORMAT(DATES, '%Y-%m-01')) DATES
FROM VIEW
WHERE CATEGORY = 'Super'
GROUP BY NAME, DATE(DATE_FORMAT(DATES, '%Y-%m-01'))
HAVING COUNT(*) > 5
Then we write a main query to get all the data
SELECT DISTINCT
NAME, CATEGORY,
DATE(DATE_FORMAT(DATES, '%Y-%m-01')) DATES
FROM VIEW
WHERE DATE(DATE_FORMAT(DATES, '%Y-%m-01')) IN
(
SELECT DATE(DATE_FORMAT(DATES, '%Y-%m-01')) DATES
FROM VIEW
WHERE CATEGORY = 'Super'
GROUP BY NAME, DATE(DATE_FORMAT(DATES, '%Y-%m-01'))
HAVING COUNT(*) > 5
)
The trick to getting this sort of thing to work is choosing the right date-arithmetic expression to use in your GROUP BY clause. Functions like DAY(), MONTH(), and YEAR() are surprisingly difficult to use correctly (MONTH() alone, for example, lumps January 2009 together with January 2010), so I think you may find DATE(DATE_FORMAT(dates, '%Y-%m-01')) more reliable.

sql multiple columns plus sum of each column

Using MySQL, I am counting the occurrence of several events (fields) over a time span of years. I then display this in columns by year. My query works perfectly when grouped by year. I now want to add a final column which displays the aggregate over the years. How do I include the totals in the query?
Event 2008 2009 2010 2011 total
A 0 2 0 1 3
B 1 2 3 0 6
etc.
Here is the real query:
select
count(*) as total_docs,
YEAR(field_document_date_value) as doc_year,
field_document_facility_id_value as facility,
IF(count(IF(field_document_type_value ='LIC809',1, NULL)) >0,count(IF(field_document_type_value ='LIC809',1, NULL)),'-') as doc_type_LIC809,
IF(count(IF(field_document_type_value ='LIC9099',1, NULL)) >0,count(IF(field_document_type_value ='LIC9099',1, NULL)),'-') as doc_type_LIC9099,
IF(count(field_document_f1_value) >0,count(field_document_f1_value),'-') as substantial_compliance,
IF(count(field_document_f2_value) >0,count(field_document_f2_value),'-') as deficiencies_sited,
IF(count(field_document_f3_value) >0,count(field_document_f3_value),'-') as admin_outcome_809,
IF(count(field_document_f4_value) >0,count(field_document_f4_value),'-') as unfounded,
IF(count(field_document_f5_value) >0,count(field_document_f5_value),'-') as substantiated,
IF(count(field_document_f6_value) >0,count(field_document_f6_value),'-') as inconclusive,
IF(count(field_document_f7_value) >0,count(field_document_f7_value),'-') as further_investigation,
IF(count(field_document_f8_value) >0,count(field_document_f8_value),'-') as admin_outcome_9099,
IF(count(field_document_type_a_value) >0,count(field_document_type_a_value),'-') as penalty_type_a,
IF(count(field_document_type_b_value) >0,count(field_document_type_b_value),'-') as penalty_type_b,
IF(sum(field_document_civil_penalties_value) >0,CONCAT('$',sum(field_document_civil_penalties_value)),'-') as total_penalties,
IF(count(field_document_noncompliance_value) >0,count(field_document_noncompliance_value),'-') as total_noncompliance
from rcfe_content_type_facility_document
where YEAR(field_document_date_value) BETWEEN year(NOW()) -9 AND year(NOW())
and field_document_facility_id_value = :facility
group by doc_year
You cannot GROUP rows twice in one SELECT, so you can count rows either per year or in total, not both. You can UNION two SELECTs (one grouped by year, the second not grouped, giving the total) to overcome this limitation, but I think it is better to compute the total from the per-year results in your script, if there is one.
Simplified example:
SELECT by_year.amount, years.date_year FROM
-- generating years pseudo table
(
SELECT 2008 AS date_year
UNION ALL SELECT 2009
UNION ALL SELECT 2010
UNION ALL SELECT 2011
) AS years
-- joining with yearly stat data
LEFT JOIN
(
SELECT SUM(value_field) AS amount, YEAR(date_field) AS date_year FROM data
GROUP BY YEAR(date_field)
) AS by_year USING(date_year)
-- appending total
UNION ALL SELECT SUM(value_field) AS amount, 'total' AS date_year FROM data
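A runnable variation of the simplified example, as a sketch only: it goes through Python's built-in sqlite3 module, so YEAR(date_field) becomes CAST(strftime('%Y', date_field) AS INTEGER), and an IFNULL() is added (not part of the example above) so that years without data show 0. The rows in data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (value_field INTEGER, date_field TEXT)")
# Hypothetical rows: nothing in 2008 or 2010.
conn.executemany("INSERT INTO data VALUES (?, ?)", [
    (2, "2009-03-01"), (1, "2011-07-15"), (4, "2011-09-30"),
])

rows = conn.execute("""
    SELECT years.date_year, IFNULL(by_year.amount, 0) AS amount
    FROM (
        -- generated pseudo table of years
        SELECT 2008 AS date_year
        UNION ALL SELECT 2009
        UNION ALL SELECT 2010
        UNION ALL SELECT 2011
    ) AS years
    LEFT JOIN (
        -- yearly sums
        SELECT SUM(value_field) AS amount,
               CAST(strftime('%Y', date_field) AS INTEGER) AS date_year
        FROM data
        GROUP BY CAST(strftime('%Y', date_field) AS INTEGER)
    ) AS by_year USING (date_year)
    -- appended total row
    UNION ALL
    SELECT 'total', SUM(value_field) FROM data
""").fetchall()

# {year or 'total': amount}
amounts = dict(rows)
print(amounts)
```

The same total could also be computed in the script by summing the per-year amounts, which avoids scanning the table a second time.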
WITH ROLLUP is your friend:
http://dev.mysql.com/doc/refman/5.7/en/group-by-modifiers.html
Use your original query and simply change the last line to:
GROUP BY doc_year WITH ROLLUP
That will add a final cumulative row to your query's result set.