COUNT() the number of MAX() occurrences in MySQL

I have a donation database, and in one of the reports I run against it I would like to include the number of donations that equal the month's maximum donation. For example, the month's highest donation may be $100, but there may be 5 people who all donated $100; I would like to get that count.
My current query is:
SELECT SUM(mc_gross) AS Donations,
SUM(mc_fee) AS Fees,
COUNT(payment_date) AS DontationCount,
COUNT(DISTINCT payer_email) AS DonatorCount,
MAX(mc_gross) AS MaxDonation,
#MaxD:=MAX(mc_gross),
(
SELECT COUNT(*)
FROM #__paypal_donations
WHERE MONTH(payment_date) = MONTH(CURDATE())
AND YEAR(payment_date) = YEAR(CURDATE())
AND mc_gross = #MaxD
) as MaxDonationMultiplier,
AVG(mc_gross) AS AverageDonation
FROM #__paypal_donations
WHERE MONTH(payment_date) = MONTH(CURDATE())
AND YEAR(payment_date) = YEAR(CURDATE())
So I think I may be close, but it looks like either the value I am storing in #MaxD for use in my subquery is not working, or the comparison mc_gross = #MaxD itself is not working, because if I replace #MaxD with a real value I get the proper count.

You cannot depend on the order of assignment of expressions in MySQL. That makes a query such as yours quite dangerous. Fortunately, you can easily solve this problem with a correlated subquery:
SELECT SUM(mc_gross) AS Donations, SUM(mc_fee) AS Fees, COUNT(payment_date) AS DontationCount,
COUNT(DISTINCT payer_email) AS DonatorCount, MAX(mc_gross) AS MaxDonation,
(SELECT COUNT(*)
FROM #__paypal_donations pd2
WHERE MONTH(pd2.payment_date) = MONTH(pd.payment_date) AND
YEAR(pd2.payment_date) = YEAR(pd.payment_date) AND
pd2.mc_gross = MAX(pd.mc_gross)
) as MaxDonationMultiplier,
AVG(mc_gross) AS AverageDonation
FROM #__paypal_donations pd
WHERE MONTH(payment_date) = MONTH(CURDATE()) AND
YEAR(payment_date) = YEAR(CURDATE());
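If you would rather not rely on the correlated subquery, another option is a derived table that computes the month's maximum once and is joined back in. This is only a sketch (it keeps the #__ table prefix from the question and relies on the fact that in MySQL a true comparison evaluates to 1):
SELECT SUM(pd.mc_gross) AS Donations,
SUM(pd.mc_fee) AS Fees,
COUNT(pd.payment_date) AS DonationCount,
COUNT(DISTINCT pd.payer_email) AS DonatorCount,
MAX(pd.mc_gross) AS MaxDonation,
SUM(pd.mc_gross = m.MaxD) AS MaxDonationMultiplier, -- counts rows equal to the month's max
AVG(pd.mc_gross) AS AverageDonation
FROM #__paypal_donations pd
CROSS JOIN (
SELECT MAX(mc_gross) AS MaxD
FROM #__paypal_donations
WHERE MONTH(payment_date) = MONTH(CURDATE())
AND YEAR(payment_date) = YEAR(CURDATE())
) m
WHERE MONTH(pd.payment_date) = MONTH(CURDATE())
AND YEAR(pd.payment_date) = YEAR(CURDATE());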


mysql max and min subquery using date range

I have the following query:
SELECT
(Date + INTERVAL -(WEEKDAY(Date)) DAY) `Date`,
I would like to use a subquery here to get the oldest and newest inventory from the max and min Date:
(select sellable from clabDevelopment.fba_history_daily where Date =
max(Date))
max(Date), min(Date),
ASIN,
ItemSKU,
it.avgInv,
kt.Account, kt.Country, SUM(Sessions) `Sessions`, avg(Session_Pct)`Session_Pct`,
sum(Page_Views)`Page_Views`, avg(Page_Views_Pct)`Page_Views_Pct`, avg(Buy_Box_Pct)`Buy_Box_Pct`,
sum(Units_Ordered)`Units_Ordered`, sum(Units_Ordered_B2B) `Units_Ordered_B2B`,
avg(Unit_Session_Pct)`Unit_Session_Pct`, avg(Unit_Session_Pct_B2B)`Unit_Session_Pct_B2B`,
sum(Ordered_Product_Sales)`Ordered_Product_Sales`, sum(Total_Order_Items) `Total_Order_Items`, sum(Actual_Sales) `Actual_Sales`,
sum(Orders) `Orders`, sum(PPC_Revenue) `PPC_Revenue`, sum(PPC_Orders) `PPC_Orders`,
sum(Revenue)`Revenue`, sum(Sales_Tax_Collected) `Sales_Tax_Collected`, sum(Total_Ad_Spend) `Total_Ad_Spend`, sum(Impressions) `Impressions`,
sum(Profit_after_Fees_before_Costs) `Profit_after_Fees_before_Cost`
FROM clabDevelopment.KPI_kpireport as kt
left outer join
(SELECT Month(Date) as mnth, sku, account, country, avg(sellable)`avgInv` FROM clabDevelopment.`fba_history_daily`
where sellable >= 0
group by Month(Date), sku, account, country) as it
on kt.ItemSKU = it.SKU
and kt.Account = it.account
and kt.Country = it.country
and it.mnth = Month(kt.Date)
WHERE kt.Country = 'USA' or kt.Country = 'CAN'
GROUP BY Account, Country,(Date + INTERVAL -(WEEKDAY(Date)) DAY), ItemSKU
ORDER BY Date desc
The subquery would be on the same table I am joining at the bottom, except that there I group by month. So I want to run this subquery and grab the value under sellable for the date of max(Date):
(select sellable from clabDevelopment.`fba_history_daily` where Date = max(Date))
When I do it this way I get 'invalid use of group function'.
Without knowing your schema and the engine/DB it is difficult to understand the problem, but here is a best guess with the following schema:
fba_history_daily
- Date
- sku
- account
- country
- sellable
KPI_kpireport
- ItemSKU
- Account
- Country
- Date
- ASIN
The following query should give you what you're looking for. It uses GROUP_CONCAT to build the required results through aggregation. With the nested-query join, MySQL might be building a temporary table in memory to sort through those records, which would not be optimal; you can check this with EXPLAIN, where you would see Using temporary in the Extra column (a sketch of that check follows the query below).
SELECT
(Date + INTERVAL -(WEEKDAY(Date)) DAY) `Date`,
ASIN,
ItemSKU,
-- MIN
SUBSTRING_INDEX(GROUP_CONCAT(it.sellable ORDER BY it.Date ASC), ',', 1) AS minSellable,
-- MAX
SUBSTRING_INDEX(GROUP_CONCAT(it.sellable ORDER BY it.Date DESC), ',', 1) AS maxSellable,
-- AVG
AVG(it.sellable) avgInv,
kt.Account, kt.Country, SUM(Sessions) `Sessions`, avg(Session_Pct)`Session_Pct`,
sum(Page_Views)`Page_Views`, avg(Page_Views_Pct)`Page_Views_Pct`, avg(Buy_Box_Pct)`Buy_Box_Pct`,
sum(Units_Ordered)`Units_Ordered`, sum(Units_Ordered_B2B) `Units_Ordered_B2B`,
avg(Unit_Session_Pct)`Unit_Session_Pct`, avg(Unit_Session_Pct_B2B)`Unit_Session_Pct_B2B`,
sum(Ordered_Product_Sales)`Ordered_Product_Sales`, sum(Total_Order_Items) `Total_Order_Items`, sum(Actual_Sales) `Actual_Sales`,
sum(Orders) `Orders`, sum(PPC_Revenue) `PPC_Revenue`, sum(PPC_Orders) `PPC_Orders`,
sum(Revenue)`Revenue`, sum(Sales_Tax_Collected) `Sales_Tax_Collected`, sum(Total_Ad_Spend) `Total_Ad_Spend`, sum(Impressions) `Impressions`,
sum(Profit_after_Fees_before_Costs) `Profit_after_Fees_before_Cost`
FROM KPI_kpireport as kt
left outer join fba_history_daily it on
kt.ItemSKU = it.SKU
and kt.Account = it.account
and kt.Country = it.country
and Month(it.Date) = Month(kt.Date)
and it.sellable >= 0
WHERE kt.Country = 'USA' or kt.Country = 'CAN'
GROUP BY Account, Country,(Date + INTERVAL -(WEEKDAY(Date)) DAY), ItemSKU
ORDER BY Date desc
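If you want to confirm whether MySQL really materialises a temporary table for this grouping, prefix a cut-down version of the statement with EXPLAIN, as mentioned above; this is only a sketch of that check:
EXPLAIN
SELECT kt.Account, kt.Country, AVG(it.sellable) AS avgInv
FROM KPI_kpireport kt
LEFT OUTER JOIN fba_history_daily it
ON kt.ItemSKU = it.sku
AND kt.Account = it.account
AND kt.Country = it.country
AND MONTH(it.Date) = MONTH(kt.Date)
WHERE kt.Country = 'USA' OR kt.Country = 'CAN'
GROUP BY kt.Account, kt.Country;
-- "Using temporary" (often together with "Using filesort") in the Extra
-- column means an intermediate table is being built for the GROUP BY.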

date_format(from_unixtime()) subselect very slow

I have the following query I'm trying to use to spit out each day in a date range and show the # of leads, assignments, & returns:
select
date_format(from_unixtime(date_created), '%m/%d/%Y') as date_format,
(select count(distinct(id_lead)) from lead_history where (date_format(from_unixtime(date_created), '%m/%d/%Y') = date_format) and (id_vertical in (2)) and (id_website in (3,8))) as leads,
(select count(id) from assignments where deleted=0 and (date_format(from_unixtime(date_assigned), '%m/%d/%Y') = date_format) and (id_vertical in (2)) and (id_website in (3,8))) as assignments,
(select count(id) from assignments where deleted=1 and (date_format(from_unixtime(date_deleted), '%m/%d/%Y') = date_format) and (id_vertical in (2)) and (id_website in (3,8))) as returns
from lead_history
where date_created between 1509494400 and 1512086399
group by date_format
The date_created, date_assigned, and date_deleted fields are integers representing timestamps. id, id_lead, id_vertical and id_website are already indexed.
Would adding indexes to date_created, date_assigned, date_deleted, and deleted help make this faster? The issue I'm having is that it is very slow, and I'm not sure an index will help when using date_format(from_unixtime(...
Looking at your code, you could rewrite the query as:
select
date_format(from_unixtime(h.date_created), '%m/%d/%Y') as date_format
, count(distinct h.id_lead) as leads
, sum(case when a.deleted = 0 then 1 else 0 end) assignments
, sum(case when b.deleted = 1 then 1 else 0 end) returns
from lead_history h
inner join assignments a on a.date_assigned = h.date_created
and a.id_vertical = 2
and a.id_website in (3,8)
inner join assignments b on b.date_deleted = h.date_created
and b.id_vertical = 2
and b.id_website in (3,8)
where h.date_created between 1509494400 and 1512086399
group by date_format
Anyway, you should avoid unnecessary () and nested (), avoid unnecessary conversions between dates, and use joins instead of subselects, or at least reduce similar subselects by using CASE.
PS: as far as the index is concerned, remember that applying a conversion function to a column value prevents the use of the related index (see the sketch below).
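For example, the per-day counts in the original query can be made index-friendly by comparing the raw integer column against precomputed boundaries instead of wrapping it in from_unixtime()/date_format(). A sketch using the column names from the question (the day shown is just one day inside the questioner's range):
SELECT COUNT(*) AS assignments
FROM assignments
WHERE deleted = 0
AND id_vertical = 2
AND id_website IN (3, 8)
-- range on the raw column, so an index on date_assigned can be used
AND date_assigned >= UNIX_TIMESTAMP('2017-11-01 00:00:00')
AND date_assigned < UNIX_TIMESTAMP('2017-11-02 00:00:00');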

Select column(s) corresponding to max/min of another column without joins

I have a table (id, employee_id, device_id, logged_time) [simplified] that logs attendances of employees from biometric devices.
I generate reports showing the first in and last out time of each employee by date.
Currently, I am able to fetch the first in and last out time of each employee by date, but I also need to fetch the first in and last out device_ids of each employee. The entries are not in sequential order of the logged time.
I do not want to (and probably cannot) use joins as in one of the reports the columns are dynamically generated and can lead to thousands of joins. Furthermore, these are subqueries and are joined to other queries to get further details.
A sample setup of the table and queries are at http://sqlfiddle.com/#!9/3bc755/4
The first one just lists the entry and exit times by date for every employee:
select
attendance_logs.employee_id,
DATE(attendance_logs.logged_time) as date,
TIME(MIN(attendance_logs.logged_time)) as entry_time,
TIME(MAX(attendance_logs.logged_time)) as exit_time
from attendance_logs
group by date, attendance_logs.employee_id
The second one builds up an attendance chart given a date range
select
`attendance_logs`.`employee_id`,
DATE(MIN(case when DATE(`attendance_logs`.`logged_time`) = '2017-09-18' THEN `attendance_logs`.`logged_time` END)) as date_2017_09_18,
MIN(case when DATE(`attendance_logs`.`logged_time`) = '2017-09-18' THEN `attendance_logs`.`logged_time` END) as entry_2017_09_18,
MAX(case when DATE(`attendance_logs`.`logged_time`) = '2017-09-18' THEN `attendance_logs`.`logged_time` END) as exit_2017_09_18,
DATE(MIN(case when DATE(`attendance_logs`.`logged_time`) = '2017-09-19' THEN `attendance_logs`.`logged_time` END)) as date_2017_09_19,
MIN(case when DATE(`attendance_logs`.`logged_time`) = '2017-09-19' THEN `attendance_logs`.`logged_time` END) as entry_2017_09_19,
MAX(case when DATE(`attendance_logs`.`logged_time`) = '2017-09-19' THEN `attendance_logs`.`logged_time` END) as exit_2017_09_19
/*
* dynamically generated columns for dates in date range
*/
from `attendance_logs`
where `attendance_logs`.`logged_time` >= '2017-09-18 00:00:00' and `attendance_logs`.`logged_time` <= '2017-09-19 23:59:59'
group by `attendance_logs`.`employee_id`;
Tried:
Similar to the max and min logged_time for each date using CASE, I tried to select the device_id where logged_time is the max/min:
```
MIN(case
    when `attendance_logs`.`logged_time` = MIN(
        case when DATE(`attendance_logs`.`logged_time`) = '2017-09-18'
             THEN `attendance_logs`.`logged_time` END
    )
    then `attendance_logs`.`device_id` end
) as entry_device_2017_09_18
```
This results in an 'invalid use of group function' error.
A quick hack for your query to pick the device id for in and out is to use GROUP_CONCAT within SUBSTRING_INDEX:
SUBSTRING_INDEX(GROUP_CONCAT(case when DATE(`l`.`logged_time`) = '2017-09-18' THEN `l`.`device_id` END ORDER BY `l`.`logged_time` desc),',',1) exit_device_2017_09_18,
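The entry device can be picked the same way by ordering the concatenated list ascending on logged_time (a sketch following the same pattern):
SUBSTRING_INDEX(GROUP_CONCAT(case when DATE(`l`.`logged_time`) = '2017-09-18' THEN `l`.`device_id` END ORDER BY `l`.`logged_time` asc),',',1) entry_device_2017_09_18,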
Or, if the device id will be the same for each in and its corresponding out, it can be written with GROUP_CONCAT alone:
GROUP_CONCAT(DISTINCT case when DATE(`l`.`logged_time`) = '2017-09-18' THEN `l`.`device_id` END)
To avoid joins I suggest you try "correlated subqueries" instead:
select
employee_id
, logdate
, TIME(entry_time) entry_time
, (select MIN(l.device_id)
from attendance_logs l
where l.employee_id = d.employee_id
and l.logged_time = d.entry_time) entry_device
, TIME(exit_time) exit_time
, (select MAX(l.device_id)
from attendance_logs l
where l.employee_id = d.employee_id
and l.logged_time = d.exit_time) exit_device
from (
select
attendance_logs.employee_id
, DATE(attendance_logs.logged_time) as logdate
, MIN(attendance_logs.logged_time) as entry_time
, MAX(attendance_logs.logged_time) as exit_time
from attendance_logs
group by
attendance_logs.employee_id
, DATE(attendance_logs.logged_time)
) d
;
see: http://sqlfiddle.com/#!9/06e0e2/3
Note: I have used MIN() and MAX() on those subqueries only to avoid any possibility that these return more than one value. You could use limit 1 instead if you prefer.
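For example, the entry_device column could be written with limit 1 instead; a sketch (ordering on the log id only to keep the result deterministic if two rows share the same timestamp):
, (select l.device_id
from attendance_logs l
where l.employee_id = d.employee_id
and l.logged_time = d.entry_time
order by l.id
limit 1) entry_device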
Note also: I do not normally recommend correlated subqueries as they can cause performance issues, but they do supply the data you need.
Oh, and please try to avoid using date as a column name; it isn't good practice.

SQL select aggregate values in columns

I have a table in this structure:
editor_id
rev_user
rev_year
rev_month
rev_page
edit_count
here is the sqlFiddle: http://sqlfiddle.com/#!2/8cbb1/1
I need to surface the 5 most active editors during, for example, March 2011 - i.e. for each rev_user, sum the edit_count across all rev_pages for a given rev_month and rev_year.
Any suggestions how to do it?
UPDATE -
updated fiddle with demo data
You should be able to do it like this:
- Select the total using SUM and GROUP BY, filtering by rev_year and rev_month
- Order by the SUM in descending order
- Limit the results to the top five items
Here is how:
SELECT * FROM (
SELECT rev_user, SUM(edit_count) AS total_edits
FROM edit_count_user_date
WHERE rev_year='2006' AND rev_month='09'
GROUP BY rev_user
) x
ORDER BY total_edits DESC
LIMIT 5
Demo on sqlfiddle.
Surely this is as straightforward as:
SELECT rev_user, SUM(edit_count) as TotalEdits
FROM edit_count_user_date
WHERE rev_month = 'March' and rev_year = '2014'
GROUP BY rev_user
ORDER BY TotalEdits DESC
LIMIT 5;
SqlFiddle here
May I also suggest using a more appropriate DATE type for the year and month storage?
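For instance, with a hypothetical rev_date DATE column (not part of the current schema) the month filter becomes a simple range:
SELECT rev_user, SUM(edit_count) AS TotalEdits
FROM edit_count_user_date
WHERE rev_date >= '2011-03-01' AND rev_date < '2011-04-01'
GROUP BY rev_user
ORDER BY TotalEdits DESC
LIMIT 5;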
Edit, re new Info
The below will return all edits for the given month for the 'highest' MonthTotal editor, and then re-group the totals by the rev_page.
SELECT e.rev_user, e.rev_page, SUM(e.edit_count) as TotalEdits
FROM edit_count_user_date e
INNER JOIN
(
SELECT rev_user, rev_year, rev_month, SUM(edit_count) AS MonthTotal
FROM edit_count_user_date
WHERE rev_month = '09' and rev_year = '2010'
GROUP BY rev_user, rev_year, rev_month
ORDER BY MonthTotal DESC
LIMIT 1
) as x
ON e.rev_user = x.rev_user AND e.rev_month = x.rev_month AND e.rev_year = x.rev_year
GROUP BY e.rev_user, e.rev_page;
SqlFiddle here - I've adjusted the data to make it more interesting.
However, if you need to do this across several months at a time, it will be more difficult given MySql's lack of partition by / analytical windowing functions.
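For reference, on MySQL 8.0 or later (which does add window functions) the top editors for every month can be taken in one pass; this is only a sketch of that newer approach, not something available on older versions:
SELECT rev_user, rev_year, rev_month, total_edits
FROM (
SELECT rev_user, rev_year, rev_month,
SUM(edit_count) AS total_edits,
ROW_NUMBER() OVER (PARTITION BY rev_year, rev_month
ORDER BY SUM(edit_count) DESC) AS rn
FROM edit_count_user_date
GROUP BY rev_user, rev_year, rev_month
) ranked
WHERE rn <= 5;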

SQL query not returning expected result

I wrote the following query to return some statistics about purchases made in the last X amount of time. But for some reason every "COUNT" column returns the total number of rows. Did I organize the query incorrectly?
SELECT COUNT(*) as countTotal, SUM(`cost`) as cost, COUNT(`paymentType` = 'credit') as count_credit, COUNT(`paymentType` = 'cash') as count_cash
FROM `purchase` WHERE `date` >= '2011-5-4'
update
I just decided to use sub-queries. This is what I ended up with.
SELECT
COUNT(*) as countTotal,
SUM(`cost`) as cost,
(SELECT COUNT(*) FROM `purchase` WHERE `paymentType` = 'credit') as count_credit,
(SELECT COUNT(*) FROM `purchase` WHERE `paymentType` = 'cash') as count_cash
FROM `purchase` WHERE `date` >= '2011-5-4'
Update 2
Used ypercube's answer below.
COUNT does return the number of rows for the domain or group queried. It looks like you need to group by paymentType to achieve what you are looking for.
SELECT PaymentType, COUNT(*) as countTotal, SUM(`cost`) as cost
FROM `purchase`
WHERE `date` >= '2011-5-4'
Group by PaymentType
here is a reference
http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html
It may not look correct, but changing COUNT() to SUM() works fine:
SELECT COUNT(*) AS countTotal
, SUM(cost) AS cost
, SUM(paymentType = 'credit') AS count_credit -- SUM does the counting here
, SUM(paymentType = 'cash') AS count_cash -- and here
FROM purchase
WHERE `date` >= '2011-05-04'
Explanation: in MySQL, a true comparison evaluates to 1 and a false one to 0.
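For example:
SELECT ('credit' = 'credit') AS is_true, ('cash' = 'credit') AS is_false;
-- returns 1 and 0, which is why SUM() can count matching rows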
You need a GROUP BY clause after your WHERE clause.