Conditionally counting while also grouping by - mysql

I am trying to join two tables
ad_data_grouped
adID, adDate (date), totalViews
This is data that has already been grouped by both adID and adDate.
The second table is
leads
leadID, DateOfBirth, adID, state, createdAt(dateTime)
What I'm struggling with is joining these two tables so I can have a column that counts the number of leads when it shares the same adID and where the adDate = createdAt
The problem I'm running into is that the counts come out the same for every grouping of adID. I have a few other things I'm trying to do, but they are based on similar conditional counting.
Query: (I know the temp table is probably overkill, but I'm trying to break this up into small pieces where I can understand what each piece does)
CREATE TEMPORARY TABLE ad_stats_grouped
SELECT * FROM `ad_stats`
LIMIT 0;
INSERT INTO ad_stats_grouped(AdID, adDate, DailyViews)
SELECT
AdID,
adDate,
sum(DailyViews)
FROM `ad_stats`
GROUP BY adID, adDate;
SELECT
ad_stats_grouped.adID,
ad_stats_grouped.adDate,
COUNT(case when ad_stats_grouped.adDate = Date(Leads.CreatedAt) THEN 1 ELSE 0 END)
FROM `ad_stats_grouped` INNER JOIN `LEADS` ON
ad_stats_grouped.adID = Leads.AdID
GROUP BY adID, adDate;

The problem with your original query is the logic in the COUNT(). This aggregate function takes into account all non-null values, so it counts both the 0s and the 1s. One solution would be to change COUNT() to SUM().
But I think that the query can be further improved by moving the date condition to the ON part of a left join:
select
g.adid,
g.addate,
count(l.adid)
from `ad_stats_grouped` g
left join `leads` l
on g.adid = l.adid
and l.createdat >= g.addate
and l.createdat < g.addate + interval 1 day
group by g.adid, g.addate;
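The points in this answer can be checked on a toy dataset. The following is only a sketch using Python's built-in sqlite3 module rather than MySQL, so `date(x, '+1 day')` stands in for `x + interval 1 day`; the sample rows and the function name `lead_counts` are made up for illustration.

```python
import sqlite3

def lead_counts():
    """Compare three ways of counting same-day leads per (adID, adDate)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE ad_stats_grouped (adID INTEGER, adDate TEXT, DailyViews INTEGER);
        CREATE TABLE leads (leadID INTEGER, adID INTEGER, createdAt TEXT);
        -- hypothetical sample data
        INSERT INTO ad_stats_grouped VALUES
            (1, '2024-01-01', 100), (1, '2024-01-02', 80), (2, '2024-01-01', 50);
        INSERT INTO leads VALUES
            (10, 1, '2024-01-01 09:30:00'),
            (11, 1, '2024-01-01 17:45:00'),
            (12, 1, '2024-01-02 08:00:00');
    """)
    inner_join = """
        SELECT g.adID, g.adDate,
               {agg}(CASE WHEN g.adDate = date(l.createdAt) THEN 1 ELSE 0 END)
        FROM ad_stats_grouped g
        INNER JOIN leads l ON g.adID = l.adID
        GROUP BY g.adID, g.adDate
        ORDER BY g.adID, g.adDate
    """
    # COUNT counts every non-NULL value, so the 0 from ELSE is counted as well.
    count_case = con.execute(inner_join.format(agg="COUNT")).fetchall()
    # SUM adds up the 1s and 0s, which gives the conditional count.
    sum_case = con.execute(inner_join.format(agg="SUM")).fetchall()
    # LEFT JOIN with the date range in ON keeps ads that have no leads (count 0).
    left_join = con.execute("""
        SELECT g.adID, g.adDate, COUNT(l.adID)
        FROM ad_stats_grouped g
        LEFT JOIN leads l
          ON g.adID = l.adID
         AND l.createdAt >= g.adDate
         AND l.createdAt < date(g.adDate, '+1 day')
        GROUP BY g.adID, g.adDate
        ORDER BY g.adID, g.adDate
    """).fetchall()
    return count_case, sum_case, left_join

if __name__ == "__main__":
    for name, rows in zip(("COUNT(CASE)", "SUM(CASE)", "LEFT JOIN"), lead_counts()):
        print(name, rows)
```

With this data the COUNT version reports the same number for both dates of ad 1, which is the symptom described in the question, while the LEFT JOIN version also lists ad 2 with a count of 0.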

SQL Count on JOIN query is taking forever to execute?

I'm trying to run a count query on a 2-table join. The e_amazing_client table has a million entries/rows and m_user has just 50 rows, BUT the count query is taking forever!
SELECT COUNT(`e`.`id`) AS `count`
FROM `e_amazing_client` AS `e`
LEFT JOIN `user` AS `u` ON `e`.`cx_hc_user_id` = `u`.`id`
WHERE ((`e`.`date_created` >= '2018-11-11') AND (`e`.`date_created` >= '2018-11-18')) AND (`e`.`id` >= 1)
I don't know what is wrong with this query?
First, I'm guessing that this is sufficient:
SELECT COUNT(*) AS `count`
FROM e_amazing_client e
WHERE e.date_created >= '2018-11-18' AND e.id >= 1;
If user has only 50 rows, I doubt it is creating duplicates. The two comparisons on date_created are redundant: both use >=, so only the later date actually restricts the rows.
For this query, try creating an index on e_amazing_client(date_created, id).
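To see why the join can be dropped, here is a small sketch using Python's built-in sqlite3 module (not MySQL). It assumes `user.id` is a unique key, which the question does not state explicitly; the sample rows, the index name `idx_client_date_id` and the function name are invented for the example.

```python
import sqlite3

def counts_with_and_without_join(cutoff):
    """Return (count without join, count with LEFT JOIN) for rows on/after cutoff."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE "user" (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE e_amazing_client (
            id INTEGER PRIMARY KEY, cx_hc_user_id INTEGER, date_created TEXT);
        -- the index suggested in the answer (name is hypothetical)
        CREATE INDEX idx_client_date_id ON e_amazing_client (date_created, id);
        INSERT INTO "user" VALUES (1, 'a'), (2, 'b');
        INSERT INTO e_amazing_client VALUES
            (1, 1,    '2018-11-10'),
            (2, 1,    '2018-11-12'),
            (3, 2,    '2018-11-18'),
            (4, NULL, '2018-11-20'),
            (5, 2,    '2018-11-25');
    """)
    plain = con.execute("""
        SELECT COUNT(*) FROM e_amazing_client e
        WHERE e.date_created >= ? AND e.id >= 1
    """, (cutoff,)).fetchone()[0]
    # A LEFT JOIN never drops rows from e, and joining on a unique key
    # never duplicates them, so the count cannot change.
    joined = con.execute("""
        SELECT COUNT(e.id) FROM e_amazing_client e
        LEFT JOIN "user" u ON e.cx_hc_user_id = u.id
        WHERE e.date_created >= ? AND e.id >= 1
    """, (cutoff,)).fetchone()[0]
    return plain, joined

if __name__ == "__main__":
    print(counts_with_and_without_join("2018-11-18"))
```

Both numbers come out equal, so the join only adds work. Whether the index is actually used depends on the engine and its statistics, so check the real plan with EXPLAIN.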
Maybe you wanted this:
SELECT COUNT(`e`.`id`) AS `count`
FROM `e_amazing_client` AS `e`
LEFT JOIN `user` AS `u` ON `e`.`cx_hc_user_id` = `u`.`id`
WHERE ((`e`.`date_created` >= '2018-11-11') AND (`e`.`date_created` <= '2018-11-18')) AND (`e`.`id` >= 1)
to check between dates?
Also, do you really need
AND (`e`.`id` >= 1)
If id is the usual kind of table id, can it ever be less than 1?
Your query is pulling ALL records on or after the date cutoff, because the only other filter in your WHERE clause is ID >= 1 and you have no clause in there for a specific user. Your original query also compares the date with >= 2018-11-18, so both date conditions point the same way and only the later date takes effect. You MAY have meant you only wanted the count WITHIN the week 11/11 to 11/18, where the comparisons SHOULD have been >= 11-11 and <= 11-18.
As for the count, you are getting ALL people (assuming no entry has an ID less than 1) and thus a count within that date range. If you want it per user as you indicated you need to group by the cx_hc_user_id (user) column to see who has the most, or make the user part of the WHERE clause to get one person.
SELECT
e.cx_hc_user_id,
count(*) countPerUser
from
e_amazing_client e
WHERE
e.date_created >= '2018-11-11'
AND e.date_created <= '2018-11-18'
group by
e.cx_hc_user_id
You can order by the count descending to get the user with the highest count, but I'm still not positive what you are asking.

Select most recent record grouped by 3 columns

I am trying to return the price of the most recent record grouped by ItemNum and FeeSched; Customer can be eliminated. I am having trouble understanding how I can do that reasonably.
The issue is that I am joining about 5 tables containing hundreds of thousands of rows to end up with this result set. The initial query takes about a minute to run, and there has been some trouble with timeout errors in the past. Since this will run on a client's workstation, it may run even slower, and I have no access to modify server settings to increase memory / timeouts.
Here is my data:
Customer  Price  ItemNum  FeeSched  Date
5         70.75  01202    12        12-06-2017
5         70.80  01202    12        06-07-2016
5         70.80  01202    12        07-21-2017
5         70.80  01202    12        10-26-2016
5         82.63  02144    61        12-06-2017
5         84.46  02144    61        06-07-2016
5         84.46  02144    61        07-21-2017
5         84.46  02144    61        10-26-2016
I don't have access to create temporary tables or views, and there is no such thing as a #variable in C-tree, but in most ways it acts like MySQL. I wanted to use something like GROUP BY ItemNum, FeeSched and select MAX(Date). The issue is that unless I put Price into the GROUP BY I get an error.
I could run the query again only selecting ItemNum, FeeSched, Date and then doing an INNER JOIN, but with the query taking a minute to run each time, it seems there is a better way that maybe I don't know.
Here is my query I am running, it isn't really that complicated of a query other than the amount of data it is processing. Final results are about 50,000 rows. I can't share much about the database structure as it is covered under an NDA.
SELECT DISTINCT
CustomerNum,
paid as Price,
ItemNum,
n.pdate as newest
from admin.fullproclog as f
INNER JOIN (
SELECT
id,
itemId,
MAX(TO_CHAR(pdate, 'MM-DD-YYYY')) as pdate
from admin.fullproclog
WHERE pdate > timestampadd(sql_tsi_year, -3, NOW())
group by id, itemId
) as n ON n.id = f.id AND n.itemId = f.itemId AND n.pdate = f.pdate
LEFT join (SELECT itemId AS linkid, ItemNum FROM admin.itemlist) AS codes ON codes.linkid = f.itemId AND ItemNum >0
INNER join (SELECT DISTINCT parent_id,
MAX(ins1.feesched) as CustomerNum
FROM admin.customers AS p
left join admin.feeschedule AS ins1
ON ins1.feescheduleid = p.primfeescheduleid
left join admin.group AS c1
ON c1.insid = ins1.feesched
WHERE status =1
GROUP BY parent_id)
AS ip ON ip.parent_id = f.parent_id
WHERE CustomerNum >0 AND ItemNum >0
UNION ALL
SELECT DISTINCT
CustomerNum,
secpaid as Price,
ItemNum,
n.pdate as newest
from admin.fullproclog as f
INNER JOIN (
SELECT
id,
itemId,
MAX(TO_CHAR(pdate, 'MM-DD-YYYY')) as pdate
from admin.fullproclog
WHERE pdate > timestampadd(sql_tsi_year, -3, NOW())
group by id, itemId
) as n ON n.id = f.id AND n.itemId = f.itemId AND n.pdate = f.pdate
LEFT join (SELECT itemId AS linkid, ItemNum FROM admin.itemlist) AS codes ON codes.linkid = f.itemId AND ItemNum >0
INNER join (SELECT DISTINCT parent_id,
MAX(ins1.feesched) as CustomerNum
FROM admin.customers AS p
left join admin.feeschedule AS ins1
ON ins1.feescheduleid = p.secfeescheduleid
left join admin.group AS c1
ON c1.insid = ins1.feesched
WHERE status =1
GROUP BY parent_id)
AS ip ON ip.parent_id = f.parent_id
WHERE CustomerNum >0 AND ItemNum >0
The problem seemed quite simple after I read the first three paragraphs, but I got a little confused after reading the whole question.
However you got the data posted above, once you have the data in that shape it's easy to retrieve "the most recent record grouped by ItemNum and FeeSched".
How to:
Firstly, sort the whole result set by Date DESC.
Secondly, select fields you need from the sorted result set and group by ItemNum, FeeSched without any aggregation methods.
So, the query might be something like this:
SELECT t.Price, t.ItemNum, t.FeeSched, t.Date
FROM (SELECT * FROM table ORDER BY Date DESC) AS t
GROUP BY t.ItemNum, t.FeeSched;
How it works:
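When your data is grouped and you select columns without aggregate functions, MySQL (with ONLY_FULL_GROUP_BY disabled) returns the values from a single row of each group. Because the rows were sorted before grouping, the idea is that this row is the first one, i.e. "the most recent record". Be aware that MySQL documents the chosen row as indeterminate, so this ordering trick is not guaranteed, and engines that enforce standard GROUP BY rules will reject the query with an error.
Contact me if you run into any problems or errors with this approach.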
You can also try like this:
Select Price, ItemNum, FeeSched, Date from table where Date IN (Select MAX(Date) from table group by ItemNum, FeeSched,Customer);
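The inner query returns the maximum date for each ItemNum, FeeSched, Customer group, and the IN clause keeps only the records whose date is one of those maximums. Note that the IN test is not tied to a particular group: a date that is the maximum for one group also matches rows of any other group that happen to have the same date.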
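A common alternative to both answers is to join the rows back to a per-group MAX(Date), which also ties each maximum to its own group. The sketch below uses Python's built-in sqlite3 module with the sample data from the question; the table name `prices`, the alias `newest` and the function name are invented, and the dates are stored as YYYY-MM-DD text on the assumption that MAX should compare them chronologically (MM-DD-YYYY strings would not sort that way).

```python
import sqlite3

# Sample rows from the question, with dates rewritten as YYYY-MM-DD.
ROWS = [
    (5, 70.75, "01202", 12, "2017-12-06"),
    (5, 70.80, "01202", 12, "2016-06-07"),
    (5, 70.80, "01202", 12, "2017-07-21"),
    (5, 70.80, "01202", 12, "2016-10-26"),
    (5, 82.63, "02144", 61, "2017-12-06"),
    (5, 84.46, "02144", 61, "2016-06-07"),
    (5, 84.46, "02144", 61, "2017-07-21"),
    (5, 84.46, "02144", 61, "2016-10-26"),
]

def latest_prices():
    """Return the most recent row for each (ItemNum, FeeSched) pair."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE prices (Customer INTEGER, Price REAL, "
        "ItemNum TEXT, FeeSched INTEGER, Date TEXT)"
    )
    con.executemany("INSERT INTO prices VALUES (?, ?, ?, ?, ?)", ROWS)
    return con.execute("""
        SELECT p.ItemNum, p.FeeSched, p.Price, p.Date
        FROM prices p
        INNER JOIN (
            -- newest date per group; no non-aggregated columns are selected
            SELECT ItemNum, FeeSched, MAX(Date) AS MaxDate
            FROM prices
            GROUP BY ItemNum, FeeSched
        ) newest
          ON newest.ItemNum = p.ItemNum
         AND newest.FeeSched = p.FeeSched
         AND newest.MaxDate = p.Date
        ORDER BY p.ItemNum, p.FeeSched
    """).fetchall()

if __name__ == "__main__":
    for row in latest_prices():
        print(row)
```

If two rows in a group share the same newest date, this returns both of them. The question mentions that running the expensive query twice is a concern, so this shape may or may not be acceptable there.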

Is there a way to create an SQL query faster than this one?

I have a MySQL table which stores the data of a hotel's reservations.
I need a query to see the amount of guests who stayed in the hotel for each date.
I was able to create a query (using a subquery) but it performs very slowly. Is there a better way to get the requested data? (For example join the table to itself, or whatever.)
My query is:
SELECT CheckOutDate AS Date,
(SELECT SUM(NrOfGuests) FROM tblGuests tG
WHERE tG.CheckInDate <= tblGuests.CheckOutDate
AND tG.CheckOutDate > tblGuests.CheckOutDate
AND tG.IsCancelled = False AND tG.NoShow = False)
AS NrOfGuestsStaying
FROM tblGuests
GROUP BY CheckOutDate
What is the best way to make it perform faster?
In the original query, the SELECT returns a SUM on every row of the table using a subquery. The duplicates are removed afterwards using a group by CheckOutDate. So, in other words, this is the SUM(NrOfGuests) for distinct CheckOutDate.
You can remove duplicate CheckOutDate in advance by subquerying distinct CheckOutDate. So in the receiving query the SUM is applied just one time for distinct CheckOutDate:
SELECT dT.CheckOutDate
,(SELECT SUM(NrOfGuests)
FROM tblGuests tG
WHERE tG.CheckInDate <= dT.CheckOutDate
AND tG.CheckOutDate > dT.CheckOutDate
AND tG.IsCancelled = 0
AND tG.NoShow = 0
) AS NrOfGuests
FROM (
SELECT DISTINCT CheckOutDate
FROM tblGuests
) AS dT
ORDER BY dT.CheckOutDate
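Since the question also asks about joining the table to itself, here is a sketch that runs the distinct-date subquery form next to a self-join form on toy data and returns both results. It uses Python's built-in sqlite3 module rather than MySQL, keeps the question's strict `>` comparison on CheckOutDate, and wraps the sums in COALESCE so empty dates show 0 (that part is my addition); the sample bookings and the function name are made up.

```python
import sqlite3

def guests_per_date():
    """Return (subquery form, self-join form) of guests staying per checkout date."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE tblGuests (
            CheckInDate TEXT, CheckOutDate TEXT, NrOfGuests INTEGER,
            IsCancelled INTEGER, NoShow INTEGER);
        -- hypothetical bookings; the third one is cancelled
        INSERT INTO tblGuests VALUES
            ('2024-01-01', '2024-01-03', 2, 0, 0),
            ('2024-01-02', '2024-01-05', 3, 0, 0),
            ('2024-01-03', '2024-01-04', 1, 1, 0),
            ('2024-01-03', '2024-01-05', 4, 0, 0);
    """)
    # Correlated subquery evaluated once per distinct checkout date.
    subquery_form = con.execute("""
        SELECT dT.CheckOutDate,
               (SELECT COALESCE(SUM(tG.NrOfGuests), 0)
                FROM tblGuests tG
                WHERE tG.CheckInDate <= dT.CheckOutDate
                  AND tG.CheckOutDate > dT.CheckOutDate
                  AND tG.IsCancelled = 0 AND tG.NoShow = 0)
        FROM (SELECT DISTINCT CheckOutDate FROM tblGuests) AS dT
        ORDER BY dT.CheckOutDate
    """).fetchall()
    # Same conditions expressed as a self-join plus GROUP BY.
    join_form = con.execute("""
        SELECT dT.CheckOutDate, COALESCE(SUM(tG.NrOfGuests), 0)
        FROM (SELECT DISTINCT CheckOutDate FROM tblGuests) AS dT
        LEFT JOIN tblGuests tG
          ON tG.CheckInDate <= dT.CheckOutDate
         AND tG.CheckOutDate > dT.CheckOutDate
         AND tG.IsCancelled = 0 AND tG.NoShow = 0
        GROUP BY dT.CheckOutDate
        ORDER BY dT.CheckOutDate
    """).fetchall()
    return subquery_form, join_form

if __name__ == "__main__":
    print(guests_per_date())
```

Which form is faster on real data depends on the indexes (for example on CheckInDate and CheckOutDate) and on the optimizer, so it is worth timing both.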

Display zero in group by sql for a particular period

I am trying to run the following query to obtain the sales for each type of job for a particular period. However, for months where no jobs of a particular job type were performed, no row with 0 sales is displayed.
How can I display the zeros in such a case?
Here is the SQL query:
select Year(postedOn), month(postedOn), jobType, sum(price)
from tbl_jobs
group by jobType, year(postedOn), month(postedOn)
order by jobType, year(postedOn), month(postedOn)
Typically, this is where your all-purpose calendar or numbers table comes in to anchor the query with a consistent sequential set:
SELECT Calendar.year, Calendar.month, job_types.jobType,
COALESCE(job_summary.sales, 0) AS sales -- 0 when there were no jobs of this type in the month
FROM Calendar
CROSS JOIN (
-- you may not have thought about this part of the problem, though
-- what about years/months with missing job types?
SELECT distinct jobType FROM tbl_jobs
) AS job_types
LEFT JOIN (
select Year(postedOn) AS year, month(postedOn) as month, jobType, sum(price) AS sales
from tbl_jobs
group by jobType, year(postedOn), month(postedOn)
) job_summary
ON job_summary.jobType = job_types.jobType
AND job_summary.year = Calendar.year
AND job_summary.month = Calendar.month
WHERE Calendar.day = 1 -- Assuming your calendar is every day
AND Calendar.date BETWEEN some_range_goes_here -- you don't want all time, right?
order by job_types.jobType, Calendar.year, Calendar.month
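The calendar approach can be tried out on a toy dataset. This is a sketch with Python's built-in sqlite3 module, not MySQL, so `strftime` replaces `YEAR()`/`MONTH()`; the small `months` table (columns `yr`, `mon`) stands in for a full calendar table, and the sample jobs and function name are invented.

```python
import sqlite3

def monthly_sales_with_zeros():
    """Return one row per (jobType, year, month), with 0 for months without jobs."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE tbl_jobs (jobType TEXT, postedOn TEXT, price INTEGER);
        -- stand-in for a calendar table: one row per month of interest
        CREATE TABLE months (yr INTEGER, mon INTEGER);
        INSERT INTO tbl_jobs VALUES
            ('plumbing', '2024-01-10', 100),
            ('plumbing', '2024-02-05', 50),
            ('painting', '2024-01-20', 200);
        INSERT INTO months VALUES (2024, 1), (2024, 2);
    """)
    return con.execute("""
        SELECT jt.jobType, m.yr, m.mon, COALESCE(js.sales, 0) AS sales
        FROM months m
        -- every month paired with every job type, whether or not it had jobs
        CROSS JOIN (SELECT DISTINCT jobType FROM tbl_jobs) AS jt
        LEFT JOIN (
            SELECT jobType,
                   CAST(strftime('%Y', postedOn) AS INTEGER) AS yr,
                   CAST(strftime('%m', postedOn) AS INTEGER) AS mon,
                   SUM(price) AS sales
            FROM tbl_jobs
            GROUP BY jobType, yr, mon
        ) AS js
          ON js.jobType = jt.jobType AND js.yr = m.yr AND js.mon = m.mon
        ORDER BY jt.jobType, m.yr, m.mon
    """).fetchall()

if __name__ == "__main__":
    for row in monthly_sales_with_zeros():
        print(row)
```

The row for painting in February only exists because of the CROSS JOIN; the COALESCE then turns its missing sum into 0.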

Joining tables in MySQL where columns are summed

I am trying to join 3 tables. For two of the tables I am taking the sum of a column. I want to apply conditions on the sums, but the script below does not produce the result I want. The sums are not summing correctly.
SELECT
account_list.Account_ID,
account_list.Account_Name,
account_list.Short_Name,
account_list.Trader,
account_list.Status,
account_list.Notes,
sum(account_commissions.Commission),
sum(connection_cost.Monthly_Cost)
FROM
account_commissions
Join
connection_cost
ON
account_commissions.Account_ID = connection_cost.Account_ID
AND
connection_cost.Cost_Date > '2013-06-01'
AND
account_commissions.TDate > '2013-06-01'
Right Join
account_list
ON
account_list.Account_ID = connection_cost.Account_ID
WHERE
account_list.status = 'Active'
GROUP BY
Account_ID;
The conditions I want on the sums are:
sum account_commissions.Commission where account_commissions.TDate > '2013-06-01' Group BY Account_ID
and
sum connection_cost.Monthly_Cost where connection_cost.Cost_Date > '2013-06-01' Group BY Account_ID.
I tried to achieve that using the above AND statements but it is not computing correctly. Any help on how to apply these conditions to the sum columns would be appreciated.
I've changed to a LEFT JOIN, as it appears you want all account list entries, with any corresponding summation of costs and commissions per account. So each table is summed individually in its own subquery, grouped by account, and THEN joined back to the main account list table.
SELECT
AL.Account_ID,
AL.Account_Name,
AL.Short_Name,
AL.Trader,
AL.Status,
AL.Notes,
coalesce( preSumCC.CC_Costs, 0 ) as MonthlyCosts,
coalesce( preSumComm.AC_Commission, 0 ) as Commissions
FROM
account_list AL
LEFT JOIN ( SELECT CC.Account_ID,
SUM( CC.Monthly_Cost ) CC_Costs
FROM
connection_cost CC
where
CC.Cost_Date > '2013-06-01'
group by
CC.Account_ID ) preSumCC
ON AL.Account_ID = preSumCC.Account_ID
LEFT JOIN ( select AC.Account_ID,
SUM( AC.Commission ) AC_Commission
FROM
account_commissions AC
where
AC.TDate > '2013-06-01'
group by
AC.Account_ID ) preSumComm
ON AL.Account_ID = preSumComm.Account_ID
You can create a conditional SUM in MySQL using IF:
SUM(IF(account_commissions.TDate > '2013-06-01', account_commissions.Commission, 0))
To be more portable, you should use CASE, as IF is not part of the SQL standard. I tend to find IF more readable though.
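To make the difference between the approaches concrete, here is a sketch on toy data using Python's built-in sqlite3 module (not MySQL). It compares summing after joining both detail tables, summing each table first and then joining, and a conditional sum written with CASE. The sample rows and the function name are invented, and only the columns needed for the sums are created.

```python
import sqlite3

def account_sums():
    """Return (join-then-sum, sum-then-join, conditional CASE sum) results."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE account_list (Account_ID INTEGER, Status TEXT);
        CREATE TABLE account_commissions (Account_ID INTEGER, TDate TEXT, Commission INTEGER);
        CREATE TABLE connection_cost (Account_ID INTEGER, Cost_Date TEXT, Monthly_Cost INTEGER);
        -- hypothetical data: account 2 has no detail rows at all
        INSERT INTO account_list VALUES (1, 'Active'), (2, 'Active');
        INSERT INTO account_commissions VALUES
            (1, '2013-07-01', 10), (1, '2013-08-01', 20), (1, '2013-05-01', 99);
        INSERT INTO connection_cost VALUES
            (1, '2013-07-01', 5), (1, '2013-08-01', 5);
    """)
    # Joining both detail tables first pairs every commission row with every
    # cost row of the account, so both sums are inflated.
    naive = con.execute("""
        SELECT AL.Account_ID, SUM(AC.Commission), SUM(CC.Monthly_Cost)
        FROM account_list AL
        LEFT JOIN account_commissions AC
          ON AC.Account_ID = AL.Account_ID AND AC.TDate > '2013-06-01'
        LEFT JOIN connection_cost CC
          ON CC.Account_ID = AL.Account_ID AND CC.Cost_Date > '2013-06-01'
        GROUP BY AL.Account_ID ORDER BY AL.Account_ID
    """).fetchall()
    # Summing each table in its own subquery yields one row per account,
    # so the joins cannot multiply rows.
    pre_summed = con.execute("""
        SELECT AL.Account_ID, COALESCE(ac.total, 0), COALESCE(cc.total, 0)
        FROM account_list AL
        LEFT JOIN (SELECT Account_ID, SUM(Commission) AS total
                   FROM account_commissions
                   WHERE TDate > '2013-06-01'
                   GROUP BY Account_ID) ac ON ac.Account_ID = AL.Account_ID
        LEFT JOIN (SELECT Account_ID, SUM(Monthly_Cost) AS total
                   FROM connection_cost
                   WHERE Cost_Date > '2013-06-01'
                   GROUP BY Account_ID) cc ON cc.Account_ID = AL.Account_ID
        ORDER BY AL.Account_ID
    """).fetchall()
    # Portable conditional sum with CASE on a single table.
    conditional = con.execute("""
        SELECT Account_ID,
               SUM(CASE WHEN TDate > '2013-06-01' THEN Commission ELSE 0 END)
        FROM account_commissions
        GROUP BY Account_ID
    """).fetchall()
    return naive, pre_summed, conditional

if __name__ == "__main__":
    for name, rows in zip(("join then sum", "sum then join", "CASE sum"), account_sums()):
        print(name, rows)
```

Note that a conditional SUM on its own does not cure the row multiplication: if it is used inside the three-table join, the commission rows are still repeated once per matching cost row.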