Using a subquery in a SUM(IF) statement - MySQL

I am creating a pipeline report so we can count opportunities that have been added each week.
My query is:
SELECT
sum(IF(o.date_entered > date_sub(now(), INTERVAL 1 WEEK), 1,0))
Pretty simple, and it works. The problem is that sales now also wants to count an opportunity as new if it has been moved out of a loss status. So I left-joined to an audit table to include this use case. But now the query counts every audit row for a given account where field = sales_stage and before_value is a loss status. So if an opportunity moves from loss to lead, back to loss, and back to lead (not that this would happen often, if ever), it is counted as 2 new opportunities. I just want to get the latest instance of field = sales_stage with before_value in a loss status, and count it once.
I want something like a sub-query in the left join, and I keep trying to use MAX, but nothing's working. Here's part of my join:
INNER JOIN opportunities o ON ao.opportunity_id=o.id
LEFT JOIN opportunities_audit oa ON o.id=oa.parent_id
AND after_value_string = 'Loss'
AND date_created > date_sub(now(), INTERVAL 1 WEEK)
Does anybody know the solution to this type of problem? Thank you in advance for any advice!
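A runnable sketch of one common fix, using Python's built-in sqlite3 module so the SQL can be exercised without a MySQL server: reduce the audit rows to at most one row per parent_id in a derived table first, then LEFT JOIN that to opportunities, so an opportunity revived twice can only produce one row. The schema and sample rows are made up, and field_name and before_value_string are assumed column names (the join in the question filters on after_value_string = 'Loss', but "moved out of a loss status" suggests the before value). In MySQL the :since parameter would be date_sub(now(), INTERVAL 1 WEEK), and SUM(IF(...)) works in place of the CASE expressions.

```python
import sqlite3

# Hypothetical minimal schema: only the columns the queries touch.
SCHEMA = """
CREATE TABLE opportunities (id TEXT PRIMARY KEY, date_entered TEXT);
CREATE TABLE opportunities_audit (
    id TEXT PRIMARY KEY,
    parent_id TEXT,
    field_name TEXT,
    before_value_string TEXT,
    after_value_string TEXT,
    date_created TEXT
);
"""

# Joining the audit table directly yields one row per matching audit row,
# so an opportunity moved out of 'Loss' twice is counted twice.
NAIVE_QUERY = """
SELECT SUM(CASE WHEN o.date_entered > :since OR oa.id IS NOT NULL
                THEN 1 ELSE 0 END)
FROM opportunities AS o
LEFT JOIN opportunities_audit AS oa
       ON oa.parent_id = o.id
      AND oa.field_name = 'sales_stage'
      AND oa.before_value_string = 'Loss'
      AND oa.date_created > :since
"""

# The derived table collapses the qualifying audit rows to at most ONE row
# per opportunity (keeping the latest date), so the LEFT JOIN cannot
# multiply opportunity rows.
QUERY = """
SELECT SUM(CASE WHEN o.date_entered > :since OR revived.parent_id IS NOT NULL
                THEN 1 ELSE 0 END)
FROM opportunities AS o
LEFT JOIN (
    SELECT parent_id, MAX(date_created) AS last_revived
    FROM opportunities_audit
    WHERE field_name = 'sales_stage'
      AND before_value_string = 'Loss'
      AND date_created > :since
    GROUP BY parent_id
) AS revived ON revived.parent_id = o.id
"""


def demo_connection():
    """In-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO opportunities VALUES (?, ?)",
        [("a", "2024-01-10"),   # entered this week
         ("b", "2023-06-01"),   # old, moved out of Loss twice this week
         ("c", "2023-06-01")],  # old and untouched
    )
    conn.executemany(
        "INSERT INTO opportunities_audit VALUES (?, ?, 'sales_stage', ?, ?, ?)",
        [("x1", "b", "Loss", "Lead", "2024-01-09"),
         ("x2", "b", "Lead", "Loss", "2024-01-10"),
         ("x3", "b", "Loss", "Lead", "2024-01-11")],
    )
    return conn


def count_new(conn, query, since):
    return conn.execute(query, {"since": since}).fetchone()[0]


conn = demo_connection()
print(count_new(conn, NAIVE_QUERY, "2024-01-08"))  # 3: "b" counted twice
print(count_new(conn, QUERY, "2024-01-08"))        # 2: "b" counted once
```

Because the collapsing happens inside the derived table, any other SUM(IF) columns in the same SELECT are not affected by it.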

Related

MySQL 5.7.34, calculate number of days since previous relevant event

I have a query that brings back dates of inbound and outbound payments, and for each outbound payment I want to calculate the number of days since the previous inbound payment.
E.g.
SELECT
ps.clientid AS 'clientid',
psi.id AS 'scheduleid',
case when psi.status IN (4,5,6) then 'IB' when psi.status = 9 then 'OB' END AS 'type',
case when psi.status IN (4,5,6) then FROM_UNIXTIME(psit.date_cleared_on) when psi.status = 9 then FROM_UNIXTIME(psi.due_date) END AS 'date'
FROM payment_schedule_inbound psi
LEFT JOIN payment_schedule_inbound_transaction psit ON psit.payment_schedule_inbound_id = psi.id
INNER JOIN payment_schedule ps ON ps.id = psi.payment_schedule_id
WHERE psi.`status` IN (4,5,6,9)
AND ps.clientid IN (913244,913174) /*example id's, will usually run on multiple at same time or likely the full book*/
ORDER BY ps.clientid,(case when psi.status = 9 then psi.due_date else psit.date_cleared_on END)
(My CRM system stores dates as Unix time for some reason. Not my fault, I didn't build the thing!)
What I want to do is, for each 'OB' event, display the DATEDIFF between that date and the previous 'IB' event for that clientid. In an ideal world I'd then like it to show only the number of working days (excluding weekends), but that's a whole other can of worms I can get to later!
I know the theory would be to join the query back onto itself and get the MAX(date) of an IB event whose date is earlier than the date of the 'OB' event, but I'm just a layman and it's all got a bit much for me!
Any advice would be appreciated.
Thanks,
Ben.
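The approach described in the question (for each OB row, take the MAX date of an earlier IB row for the same client) can be written as a correlated subquery, which needs no window functions and so also fits MySQL 5.7. The sketch below uses Python's sqlite3 with a hypothetical events table standing in for the question's result set; in MySQL you would use DATEDIFF(ob_date, prev_ib) instead of the julianday() arithmetic. Note that MySQL does not allow a TEMPORARY table to be referenced twice in one query, so the events would need to live in a regular table or be repeated as derived tables.

```python
import sqlite3

# Hypothetical table holding the rows the question's query returns:
# one row per payment event, type 'IB' (inbound) or 'OB' (outbound).
SCHEMA = """
CREATE TABLE events (clientid INTEGER, scheduleid TEXT, type TEXT, event_date TEXT)
"""

QUERY = """
SELECT clientid, scheduleid, ob_date, prev_ib,
       CAST(julianday(ob_date) - julianday(prev_ib) AS INTEGER) AS days_since_ib
FROM (
    SELECT ob.clientid, ob.scheduleid, ob.event_date AS ob_date,
           -- latest inbound event for the same client before this outbound one
           (SELECT MAX(ib.event_date)
              FROM events AS ib
             WHERE ib.clientid = ob.clientid
               AND ib.type = 'IB'
               AND ib.event_date < ob.event_date) AS prev_ib
    FROM events AS ob
    WHERE ob.type = 'OB'
) AS t
ORDER BY clientid, ob_date
"""


def demo_connection():
    """In-memory database with made-up sample events."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO events VALUES (?, ?, ?, ?)",
        [(913244, "s1", "IB", "2021-01-01"),
         (913244, "s2", "IB", "2021-01-10"),
         (913244, "s3", "OB", "2021-01-15"),
         (913244, "s4", "OB", "2021-01-20"),
         (913174, "s5", "OB", "2021-02-01"),  # no earlier IB -> NULL
         (913174, "s6", "IB", "2021-02-03"),
         (913174, "s7", "OB", "2021-02-08")],
    )
    return conn


for row in demo_connection().execute(QUERY):
    print(row)
```

An OB event with no earlier IB event for the same client gets NULL for both prev_ib and the day count, rather than being dropped.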

How to limit result set to only the latest instance in the JOIN

I'm creating a sales pipeline report, where I capture the wins and losses for each salesperson each week.
The report works for the most part, except for one corner case that bugs me. It wouldn't typically occur, but if a salesperson moves an opportunity to a win status, then back to a loss status, then to a win status again, it counts as 2 wins. I am looking for some way to get only the latest row from the audit (detail) table in which (a) the date is within the last week, and (b) the after_value is a loss or win value.
I have tried doing this as much as possible in the join, like so:
FROM
opportunities o ON ao.opportunity_id=o.id
LEFT JOIN opportunities_audit oa ON o.id=oa.parent_id
AND after_value_string IN ('Loss', 'Win')
AND date_created > date_sub(now(), INTERVAL 1 WEEK)
INNER JOIN sweet.users u ON o.assigned_user_id=u.id
but I haven't found a way to use something like MAX(id) in the join. I also tried a MAX(id) in the SELECT, but I have several sum(IF) statements, and I didn't think it made sense to have to do it for every sum(IF) - plus I couldn't figure out how to make it work for just one of them anyway.
I keep coming back to MAX, or maybe a subquery that joins the table to itself and gets the MAX(id) that way, but I haven't figured out where to put the subquery, since I don't want every SELECT expression to use it. And I'm not sure it's even the best solution. Oh, AND, the ids in these tables look like hash values, so I don't know if MAX would work anyway. Le sigh.
Here's just part of the SELECT, in case it helps:
, sum(IF(o.sales_stage = 'Win'
AND (o.date_modified > date_sub(now(), INTERVAL 1 WEEK))
, 1,0))
AS 'W'
I hope I've given enough information, any direction/advice would be much appreciated!
Thanks!
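A sketch of the "latest row per group" pattern applied to this question, again run through Python's sqlite3 with a made-up schema: first find MAX(date_created) per opportunity in a derived table, then join back to the audit table on that date. Using the date sidesteps the hash-like ids, since MAX(id) on such values returns the lexicographically largest string, not the most recent row. The field_name = 'sales_stage' filter is an assumption borrowed from the first question above, and :since stands in for date_sub(now(), INTERVAL 1 WEEK).

```python
import sqlite3

# Hypothetical minimal schema: only the columns the query touches.
SCHEMA = """
CREATE TABLE opportunities (id TEXT PRIMARY KEY, assigned_user_id TEXT);
CREATE TABLE opportunities_audit (
    id TEXT PRIMARY KEY,
    parent_id TEXT,
    field_name TEXT,
    after_value_string TEXT,
    date_created TEXT
);
"""

QUERY = """
SELECT o.assigned_user_id,
       SUM(CASE WHEN oa.after_value_string = 'Win'  THEN 1 ELSE 0 END) AS W,
       SUM(CASE WHEN oa.after_value_string = 'Loss' THEN 1 ELSE 0 END) AS L
FROM opportunities AS o
-- step 1: the date of the latest qualifying change per opportunity
LEFT JOIN (
    SELECT parent_id, MAX(date_created) AS last_change
    FROM opportunities_audit
    WHERE field_name = 'sales_stage'
      AND after_value_string IN ('Loss', 'Win')
      AND date_created > :since
    GROUP BY parent_id
) AS latest ON latest.parent_id = o.id
-- step 2: join back to fetch only the audit row with that date
LEFT JOIN opportunities_audit AS oa
       ON oa.parent_id = latest.parent_id
      AND oa.date_created = latest.last_change
      AND oa.field_name = 'sales_stage'
      AND oa.after_value_string IN ('Loss', 'Win')
GROUP BY o.assigned_user_id
ORDER BY o.assigned_user_id
"""


def demo_connection():
    """In-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO opportunities VALUES (?, ?)",
                     [("x", "u1"), ("y", "u1"), ("z", "u2"), ("w", "u2")])
    conn.executemany(
        "INSERT INTO opportunities_audit VALUES (?, ?, 'sales_stage', ?, ?)",
        [("a1", "x", "Win", "2024-01-09 10:00:00"),   # win -> loss -> win
         ("a2", "x", "Loss", "2024-01-10 10:00:00"),
         ("a3", "x", "Win", "2024-01-11 10:00:00"),
         ("a4", "y", "Loss", "2024-01-10 11:00:00"),
         ("a5", "z", "Win", "2024-01-12 09:00:00"),   # ends as a loss
         ("a6", "z", "Loss", "2024-01-13 09:00:00"),
         ("a7", "w", "Win", "2023-12-01 09:00:00")],  # outside the window
    )
    return conn


def win_loss(conn, since):
    return conn.execute(QUERY, {"since": since}).fetchall()


print(win_loss(demo_connection(), "2024-01-08"))
```

If two qualifying audit rows for the same opportunity share exactly the same date_created, both survive the join back and that opportunity is counted twice; a unique, ordered tie-breaker column would be needed in that case.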

Sum up multiple DATEDIFFs of datetimes in MySQL

I have a table with one user and one day's worth of punches (clockin, breakout, breakin, clockout). Now say the user takes 2 or more breaks. I need to sum up the total time of all breaks taken. I have created a SQL Fiddle to make it easier to show what I am trying to do. Here is my example: http://sqlfiddle.com/#!2/21542/6 I need to compute (12:30:21 - 12:04:44) + (12:36:00 - 12:34:00) to get the total of all breaks taken. How can I do that in my query? Now pretend I have 10 users and 10 days in my table; I know I would need to group by day and user.
I would start by finding a way to link the punch-out records with the punch-in records from the same table. We can then put this data into a temporary table and query against it.
CREATE TEMPORARY TABLE breakPunchInOut (
SELECT
DATE(punchout.PunchDateTime) AS ShiftDate,
punchout.EmpId,
punchout.PunchId AS PunchOutID,
(SELECT
PunchId
FROM
timeclock
WHERE
timeclock.EmpId = punchout.EmpId
AND
timeclock.`In-Out` = 1
AND
timeclock.PunchDateTime > punchout.PunchDateTime
AND
DATE(timeclock.PunchDateTime) = DATE(punchout.PunchDateTime)
ORDER BY
timeclock.PunchDateTime ASC
LIMIT 1
) AS PunchInID
FROM
timeclock AS punchout
WHERE
punchout.`In-Out` = 0
HAVING
PunchInID IS NOT NULL
);
This query looks for all the punch-outs on a given day, and for each of these it finds the next punch-in by the same person on the same day. The HAVING clause filters out punch-outs with no later punch-in, for example when the employee goes home for the day. Keep this in mind: if someone goes home halfway through a shift, the time after that last punch-out is not added to the break total.
It's important to point out that this approach will only work for shifts which start and end on the same day. If you have a night shift which starts in the evening and finishes in the morning the next day, then you'll have to alter the way that you join the punch outs and punch ins together.
Now that we have this linking table, it's relatively simple to use it to create a summary report for each employee and each shift:
SELECT
breakPunchInOut.ShiftDate,
breakPunchInOut.EmpId,
SUM(
TIMESTAMPDIFF(MINUTE, punchOut.PunchDateTime, punchIn.PunchDateTime)
) AS TotalBreakLengthMins
FROM
breakPunchInOut
INNER JOIN
timeclock AS punchOut
ON
punchOut.PunchId = breakPunchInOut.PunchOutId
INNER JOIN
timeclock AS punchIn
ON
punchIn.PunchId = breakPunchInOut.PunchInId
GROUP BY
breakPunchInOut.ShiftDate,
breakPunchInOut.EmpId
;
Notice that we use the TIMESTAMPDIFF function, not DATEDIFF. DATEDIFF only calculates the number of days between two dates; it ignores the time part.
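For comparison, the same pairing (each punch-out matched to the next punch-in by the same employee on the same day) can be folded into a single query with a derived table instead of a temporary table. This sketch uses Python's sqlite3 and a minimal, hypothetical timeclock table, and it sums seconds rather than minutes; in MySQL the subtraction would be TIMESTAMPDIFF(SECOND, ...), which avoids losing the partial minutes that TIMESTAMPDIFF(MINUTE, ...) truncates from each break. The sample data uses the break times from the question, so employee 1 should total 25:37 + 2:00 = 1657 seconds.

```python
import sqlite3

# Minimal stand-in for the question's timeclock table; `In-Out` is 1 for a
# punch-in (clock-in / break-in) and 0 for a punch-out (break-out / clock-out).
SCHEMA = """
CREATE TABLE timeclock (
    PunchId INTEGER PRIMARY KEY,
    EmpId INTEGER,
    PunchDateTime TEXT,
    `In-Out` INTEGER
)
"""

QUERY = """
SELECT ShiftDate, EmpId, SUM(BreakSecs) AS TotalBreakSecs
FROM (
    SELECT DATE(punchout.PunchDateTime) AS ShiftDate,
           punchout.EmpId,
           -- seconds from this punch-out to the next same-day punch-in
           CAST(strftime('%s', (SELECT MIN(punchin.PunchDateTime)
                                  FROM timeclock AS punchin
                                 WHERE punchin.EmpId = punchout.EmpId
                                   AND punchin.`In-Out` = 1
                                   AND punchin.PunchDateTime > punchout.PunchDateTime
                                   AND DATE(punchin.PunchDateTime) = DATE(punchout.PunchDateTime)))
                AS INTEGER)
           - CAST(strftime('%s', punchout.PunchDateTime) AS INTEGER) AS BreakSecs
    FROM timeclock AS punchout
    WHERE punchout.`In-Out` = 0
) AS breaks
WHERE BreakSecs IS NOT NULL  -- drops clock-outs that have no later punch-in
GROUP BY ShiftDate, EmpId
ORDER BY ShiftDate, EmpId
"""


def demo_connection():
    """In-memory database with made-up punches for two employees."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO timeclock (EmpId, PunchDateTime, `In-Out`) VALUES (?, ?, ?)",
        [(1, "2013-05-01 08:00:00", 1),   # clock in
         (1, "2013-05-01 12:04:44", 0),   # break out
         (1, "2013-05-01 12:30:21", 1),   # break in
         (1, "2013-05-01 12:34:00", 0),   # break out
         (1, "2013-05-01 12:36:00", 1),   # break in
         (1, "2013-05-01 17:00:00", 0),   # clock out
         (2, "2013-05-01 09:00:00", 1),
         (2, "2013-05-01 13:00:00", 0),
         (2, "2013-05-01 13:30:00", 1),
         (2, "2013-05-01 18:00:00", 0)],
    )
    return conn


def break_totals(conn):
    return conn.execute(QUERY).fetchall()


print(break_totals(demo_connection()))
```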

Tricky Rails 3/MySQL query

In Rails 3 (optionally with the meta_where gem, if you feel like using it in your query), I have a really tricky query that I have been banging my head against:
Suppose I have two models, customers and purchases, where a customer has many purchases. Let's define customers with at least 2 purchases as "repeat_customer". I need to find the total number of repeat_customers as of each day for the past 3 months, something like:
Date TotalRepeatCustomerCount
1/1/11 10 (10 repeat customers by the end of 1/1/11)
1/2/11 15 (5 more customers gained "repeat" status on this date)
1/3/11 16 (1 more customer gained "repeat" status on this date)
...
3/30/11 150
3/31/11 160
Basically I need to group customer count based on the date of creation of their second purchase, since that is when they "gain repeat status".
Certainly this can be achieved in Ruby, something like:
Customer.includes(:purchases).all.select{|x| x.purchases.count >= 2 }.group_by{|x| x.purchases.second.created_at.to_date }.map{|date, customers| [date, customers.count]}
However, the above code will fire queries along the lines of Customer.all and Purchase.all, then do a bunch of calculation in Ruby. I would much prefer doing the selection, grouping, and calculations in MySQL, since it is not only much faster, it also reduces the bandwidth from the database. With large databases, the code above is basically useless.
I have been trying for a while to conjure up the query in Rails/ActiveRecord, but have had no luck, even with the nice meta_where gem. If I have to, I will accept a pure MySQL query as well.
Edit: I would cache it (or add a "repeat" field to customers), but only for this simplified problem. The client can change the criteria for a repeat customer at any point (2 purchases, 3 purchases, 4 purchases, etc.), so unfortunately I do have to calculate it on the spot.
SELECT p_date, COUNT(a.id) FROM
(
SELECT p_date - INTERVAL 1 day p_date, customers.id
FROM
customers JOIN purchases ON purchases.customer_id = customers.id
JOIN (SELECT DISTINCT date(purchase_date) p_date FROM purchases) p_dates
WHERE purchases.purchase_date < p_date
GROUP BY p_date, customers.id
HAVING COUNT(purchases.id) >= 2
) a
GROUP BY p_date
I didn't test this in the slightest, so I hope it works. Also, I hope I understood what you are trying to accomplish.
But please note that you should not do this, as it'll be too slow. Since the data never changes once the day has passed, just cache it for each day.
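An alternative sketch that keeps the threshold as a parameter (matching the edit in the question): for each customer, a correlated subquery picks the timestamp of their Nth purchase, the outer query counts how many customers reached N on each day, and the running total is accumulated in Python. It uses sqlite3 with a hypothetical schema following Rails conventions (purchases.customer_id, created_at). In MySQL, LIMIT/OFFSET accept only literals or prepared-statement placeholders, so the offset would be bound the same way. Days on which nobody reaches the threshold do not appear in the output.

```python
import sqlite3
from itertools import accumulate

# Hypothetical schema following Rails naming conventions.
SCHEMA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY);
CREATE TABLE purchases (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    created_at TEXT
);
"""

# For each customer, take the timestamp of their Nth purchase (NULL if they
# have fewer than N), then count how many customers reached N on each day.
QUERY = """
SELECT date(nth_purchase_at) AS day, COUNT(*) AS new_repeat_customers
FROM (
    SELECT c.id,
           (SELECT p.created_at
              FROM purchases AS p
             WHERE p.customer_id = c.id
             ORDER BY p.created_at
             LIMIT 1 OFFSET :offset) AS nth_purchase_at
    FROM customers AS c
) AS t
WHERE nth_purchase_at IS NOT NULL
GROUP BY day
ORDER BY day
"""


def repeat_customer_totals(conn, threshold=2):
    """Return [(day, cumulative repeat customers)] for a given threshold."""
    rows = conn.execute(QUERY, {"offset": threshold - 1}).fetchall()
    days = [day for day, _ in rows]
    running = accumulate(count for _, count in rows)
    return list(zip(days, running))


def demo_connection():
    """In-memory database with made-up customers and purchases."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO customers (id) VALUES (?)",
                     [(1,), (2,), (3,), (4,)])
    conn.executemany(
        "INSERT INTO purchases (customer_id, created_at) VALUES (?, ?)",
        [(1, "2011-01-01 09:00:00"), (1, "2011-01-01 15:00:00"),
         (2, "2010-12-30 10:00:00"), (2, "2011-01-02 10:00:00"),
         (3, "2011-01-02 12:00:00"), (3, "2011-01-03 12:00:00"),
         (3, "2011-01-05 12:00:00"),
         (4, "2011-01-03 08:00:00")],  # only one purchase: never repeat
    )
    return conn


conn = demo_connection()
print(repeat_customer_totals(conn, 2))
print(repeat_customer_totals(conn, 3))
```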

Is it possible to create multi-tiered WHERE statements in MySQL

I'm currently developing a program that will generate reports based upon lead data. My issue is that I'm running 3 queries for something that I would like to only have to run one query for. For instance, I want to gather data for leads generated in the past day
submission_date > (NOW() - INTERVAL 1 DAY)
and I would like to find out how many total leads there were, and how many sold leads there were in that timeframe (sold = 1 / sold = 0). The issue is that this is currently done with 2 queries, one with WHERE sold = 1 and one with WHERE sold = 0. This is all well and good, but when I want to generate this data for the past day, week, month, year, and all time, I will have to run 10 queries to obtain it. I feel like there HAS to be a more efficient way of doing this. I know I can create a MySQL function for this, but I don't see how that would solve the problem.
Thanks!!
Why not GROUP BY sold, so you get the totals for sold and not sold?
One way to do it is to exploit the aggregate functions (usually SUM and COUNT help you the most in this situation) along with MySQL's IF() function.
For example, you could use a query such as:
SELECT
SUM(IF(sold = 1, 1, 0)) AS TotalSold,
SUM(IF(sold = 0, 1, 0)) AS TotalUnsold,
SUM(IF(submission_date > (NOW() - INTERVAL 1 WEEK)
AND sold = 1, 1, 0)) AS TotalSoldThisWeek
FROM ...
WHERE ...
The condition (e.g. sold = 1) could be as complex as you want by using AND and OR.
Disclaimer: the code wasn't tested; it is just provided as an example that should work with minor modifications.
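Extending the conditional-aggregation idea to several time windows at once, so a single query replaces the separate ones: a sketch with Python's sqlite3, a hypothetical leads table, and cutoffs computed in Python (30 days stands in for "a month", and a year column would follow the same pattern). In MySQL the cutoffs could instead be written inline as NOW() - INTERVAL 1 DAY and so on.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical leads table with the two columns the question filters on.
SCHEMA = "CREATE TABLE leads (id INTEGER PRIMARY KEY, submission_date TEXT, sold INTEGER)"

# One pass over the table: every window is just another conditional SUM.
QUERY = """
SELECT
    COUNT(*)                                                               AS total_all,
    SUM(CASE WHEN sold = 1 THEN 1 ELSE 0 END)                              AS sold_all,
    SUM(CASE WHEN submission_date > :day THEN 1 ELSE 0 END)                AS total_day,
    SUM(CASE WHEN submission_date > :day AND sold = 1 THEN 1 ELSE 0 END)   AS sold_day,
    SUM(CASE WHEN submission_date > :week THEN 1 ELSE 0 END)               AS total_week,
    SUM(CASE WHEN submission_date > :week AND sold = 1 THEN 1 ELSE 0 END)  AS sold_week,
    SUM(CASE WHEN submission_date > :month THEN 1 ELSE 0 END)              AS total_month,
    SUM(CASE WHEN submission_date > :month AND sold = 1 THEN 1 ELSE 0 END) AS sold_month
FROM leads
"""


def lead_summary(conn, now):
    """Return {column name: count} for all windows, relative to `now`."""
    fmt = "%Y-%m-%d %H:%M:%S"
    params = {
        "day": (now - timedelta(days=1)).strftime(fmt),
        "week": (now - timedelta(weeks=1)).strftime(fmt),
        "month": (now - timedelta(days=30)).strftime(fmt),  # approximation
    }
    cur = conn.execute(QUERY, params)
    names = [col[0] for col in cur.description]
    return dict(zip(names, cur.fetchone()))


def demo_connection():
    """In-memory database with made-up leads."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO leads (submission_date, sold) VALUES (?, ?)",
        [("2024-03-15 08:00:00", 1),
         ("2024-03-14 18:00:00", 0),
         ("2024-03-10 09:00:00", 1),
         ("2024-02-20 10:00:00", 0),
         ("2023-12-01 10:00:00", 1)],
    )
    return conn


print(lead_summary(demo_connection(), datetime(2024, 3, 15, 12, 0, 0)))
```

Unsold counts fall out as the difference between each total_* and sold_* pair, so they need no extra columns.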