How do I add a where clause to a sum aggregate? - mysql

I am trying to figure out the best way to get the aggregate of a person's hours spent on projects whose name follows a certain pattern.
Current Tables
+--------------------+----------------+-----------------+
| Tbl_Employee       | Tbl_Projects   | tbl_timesheet   |
+--------------------+----------------+-----------------+
| employee_id        | project_id     | timesheet_id    |
| employee_full_name | cws_project_id | employee_id     |
|                    |                | project_id      |
|                    |                | timesheet_hours |
+--------------------+----------------+-----------------+
Here is the query I have so far:
select
te.employee_id,
te.employee_last_name,
te.employee_first_name,
te.employee_department,
te.employee_type_id,
te.timesheet_routing,
sum(tt.timesheet_hours) as total_hours,
month(tt.timesheet_date) as "month",
year(tt.timesheet_date) as "year"
from tbl_employee te
left join tbl_timesheet tt
on te.employee_id = tt.employee_id
join tbl_projects tp
on tp.project_id = tt.project_id
where te.employee_active = 1
and te.employee_id > 0
and employee_department IN ("Project Management","Engineering","Deployment Srvs.")
and year(tt.timesheet_date) = 2015
group by te.employee_last_name, year(tt.timesheet_date), month(tt.timesheet_date)
order by employee_last_name
What I need to add to my select statement is something to the effect of
sum(tt.timesheet_hours) as where cws_project_id like '%Training%' as training
In short, I need to know the sum of hours an employee has contributed to projects where the cws_project_id contains the word Training. I know you can't add a where clause to a SUM, but I can't seem to find another way to do it.
If this makes a difference, I need to do this several times, i.e. once for each different word that the project name may contain.
Thank you so much for any help that can be provided. I hope that is not clear as mud.

Here is the general form of what you are looking for:
SELECT SUM(IF(x LIKE '%y%', z, 0)) AS ySum
Even more generally:
SELECT SUM(IF([condition on row], [value or calculation from row], 0)) AS [partialSum]
Edit: For more RDBMS portability (earlier versions of MS SQL do not support this form of IF):
SELECT SUM(CASE WHEN [condition on row] THEN [value or calculation from row] ELSE 0 END) AS [partialSum]
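To make the CASE form concrete, here is a minimal sketch that runs it against a tiny in-memory database. It uses Python's sqlite3 module purely as a stand-in for MySQL; the sample rows and the second "Deployment" column are assumptions made up for illustration, not data from the question.

```python
import sqlite3

# Hypothetical miniature of the question's tables (rows are invented for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_projects (project_id INTEGER PRIMARY KEY, cws_project_id TEXT);
    CREATE TABLE tbl_timesheet (employee_id INTEGER, project_id INTEGER, timesheet_hours REAL);
    INSERT INTO tbl_projects VALUES (1, 'Safety Training'), (2, 'Site Deployment'), (3, 'Training 101');
    INSERT INTO tbl_timesheet VALUES (10, 1, 4), (10, 2, 8), (10, 3, 2), (11, 2, 5);
""")

# One conditional SUM per pattern; rows that do not match contribute 0.
rows = conn.execute("""
    SELECT tt.employee_id,
           SUM(tt.timesheet_hours) AS total_hours,
           SUM(CASE WHEN tp.cws_project_id LIKE '%Training%'
                    THEN tt.timesheet_hours ELSE 0 END) AS training,
           SUM(CASE WHEN tp.cws_project_id LIKE '%Deployment%'
                    THEN tt.timesheet_hours ELSE 0 END) AS deployment
    FROM tbl_timesheet tt
    JOIN tbl_projects tp ON tp.project_id = tt.project_id
    GROUP BY tt.employee_id
    ORDER BY tt.employee_id
""").fetchall()

for row in rows:
    print(row)
```

Each extra pattern is just another SUM(CASE ...) column, so the "several times" requirement does not need additional joins or subqueries.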

Related

MySQL limitations to simplify Query

Please note that I'm an absolute n00b in MySQL but somehow I managed to build some (for me) complex queries that work as they should. My main problem now is that for many of the queries we're working on:
The query is becoming too big and very hard to see through.
The same subqueries get repeated many times and that is adding to the complexity (and probably to the time needed to process the query).
We want to further expand this query but we are reaching a point where we can no longer oversee what we are doing. I've added one of these subqueries at the end of this post, just as an example.
!! You can fast forward to the Problem section if you want to skip the details below. I think the question can also be answered without the additional info.
What we want to do
Create a MySQL query that calculates purchase orders and forecasts for a given supplier based on:
Sales history in a given period (past [x] months = interval)
Current stock
Items already in backorder (from supplier)
Reserved items (for customers)
Supplier ID
I've added an example of a subquery at the bottom of this message. We're showing just this part to keep things simple for now. The output of the subquery is:
Part number
Units sold
Units sold (outliers removed)
Units sold per month (outliers removed)
Number of invoices with the part number in the period (interval)
It works quite OK for us, although I'm sure it can be optimised. It removes outliers from the sales history (e.g. one customer that orders 50 pcs of one product in one order). Unfortunately it can only remove outliers when there is substantial data, so if the first order happens to be 50 pcs then it is not considered an outlier. For that reason we take the number of invoices into account in the main query. The number of invoices has to exceed a certain threshold, otherwise the system will revert to a fixed value of "maximum stock" for that product.
As mentioned, this is only a small part of the complete query and we want to expand it even further (so that it takes into account the "sales history" of parts that were used in assembled products).
For example, if we were to build and sell cars, and we want to place an order with our tyre supplier, the query calculates the number of tyres we need to order based on the sales history of the various car models (while also taking into account the stock of the cars, reserved cars and stock of the tyres).
Problem
The query is becoming massive and incomprehensible. We are repeating the same subqueries many times, which to us seems highly inefficient, and it is the main reason why the query is becoming so bulky.
What we have tried
(Please note that we are on MySQL 5.5.33. We will update our server soon but for now we are limited to this version.)
Create a VIEW from the subqueries.
The main issue here is that we can't execute the view with parameters like supplier_id and interval period. Our subquery calculates the sum of the sold items for a given supplier within the given period. So even if we would build the VIEW so that it calculates this for ALL products from ALL suppliers we would still have the issue that we can't define the interval period after the VIEW has been executed.
A stored procedure.
Correct me if I'm wrong but as far as I know, MySQL only allows us to perform a Call on a stored procedure so we still can't run it against the parameters (period, supplier id...)
Even this workaround won't help us because we still can't run the SP against the parameters.
Using WITH at the beginning of the query
A common table expression in MySQL is a temporary result whose scope is confined to a single statement. You can refer to this expression multiple times within the statement.
The WITH clause in MySQL is used to specify a Common Table Expression. A WITH clause can have one or more comma-separated subclauses.
Not sure if this would be the solution because we can't test it. WITH is not supported until MySQL version 8.0.
What now?
My last resort would be to put the mentioned subqueries in a temp table before starting the main query. This might not completely eliminate our problems but at least the main query will be more comprehensible and with less repetition of fetching the same data. Would this be our best option or have I overlooked a more efficient way to tackle this?
Many thanks for your kind replies.
SELECT
GREATEST((verkocht_sd/6*((100 + 0)/100)),0) as 'units sold p/month ',
GREATEST(ROUND((((verkocht_sd/6)*3)-voorraad+reserved-backorder),0),0) as 'Order based on units sold',
SUM(b.aantal) as 'Units sold in period',
t4.verkocht_sd as 'Units sold in period, outliers removed',
COUNT(*) as 'Number of invoices in period',
b.art_code as 'Part number'
FROM bongegs b -- Table that has all the sales records for all products
RIGHT JOIN totvrd ON (totvrd.art_code = b.art_code) -- Right Join stock data to also include items that are not in table bongegs (no sales history).
LEFT JOIN artcred ON (artcred.art_code = b.art_code) -- add supplier ID to the part numbers.
LEFT JOIN
(
SELECT
SUM(b.aantal) as verkocht_sd,
b.art_code
FROM bongegs b
RIGHT JOIN totvrd ON (totvrd.art_code = b.art_code)
LEFT JOIN artcred ON (artcred.art_code = b.art_code)
WHERE
b.bon_datum > DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
and b.bon_soort = "f" -- Selects only invoices
and artcred.vln = 1 -- 1 = Prefered supplier
and artcred.cred_nr = 9117 -- Supplier ID
and b.aantal < (select * from (SELECT AVG(b.aantal)+3*STDDEV(aantal)
FROM bongegs b
WHERE
b.bon_soort = 'f' and
b.bon_datum > DATE_SUB(CURDATE(), INTERVAL 6 MONTH)) x)
GROUP BY b.art_code
) AS t4
ON (b.art_code = t4.art_code)
WHERE
b.bon_datum > DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
and b.bon_soort = "f"
and artcred.vln = 1
and artcred.cred_nr = 9117
GROUP BY b.art_code
Bongegs | all rows from sales forms (invoices F, offers O, delivery notes V)
| art_code | bon_datum | bon_soort | aantal |
|:---------|:---------: |:---------:|:------:|
| item_1 | 2021-08-21 | f | 6 |
| item_2 | 2021-08-29 | v | 3 |
| item_6 | 2021-09-03 | o | 2 |
| item_4 | 2021-10-21 | f | 6 |
| item_1 | 2021-11-21 | o | 6 |
| item_3 | 2022-01-17 | v | 6 |
| item_1 | 2022-01-21 | o | 6 |
| item_4 | 2022-01-26 | f | 6 |
Artcred | supplier ID's
| art_code | vln | cred_nr |
|:---------|:----:|:-------:|
| item_1 | 1 | 1001 |
| item_2 | 1 | 1002 |
| item_3 | 1 | 1001 |
| item_4 | 1 | 1007 |
| item_5 | 1 | 1004 |
| item_5 | 2 | 1008 |
| item_6 | 1 | 1016 |
| item_7 | 1 | 1567 |
totvrd | stock
| art_code | voorraad | reserved | backorder |
|:---------|:---------: |:--------:|:---------:|
| item_1 | 1 | 0 | 5 |
| item_2 | 0 | 0 | 0 |
| item_3 | 88 | 0 | 0 |
| item_4 | 9 | 0 | 0 |
| item_5 | 67 | 2 | 20 |
| item_6 | 112 | 9 | 0 |
| item_7 | 65 | 0 | 0 |
| item_8 | 7 | 1 | 0 |
Now, on to the query. You have LEFT JOINs to the artcred table, but then include artcred in the WHERE clause, which makes it behave as an INNER JOIN (a match is required in both the left and right tables) in the result. Was this intended, or are you expecting records in the bongegs table that do NOT exist in the artcred table?
Well to be honest I was not fully aware that this would essentially form an INNER JOIN but in this case it doesn't really matter. A record that exists in bongegs always exists in artcred as well (every sold product must have a supplier). That doesn't work both ways since a product can be in artcred without ever being sold.
You also have RIGHT JOIN on totvrd which implies you want every record in the TotVRD table regardless of a record in the bongegs table. Is this correct?
Yes it is intended. Otherwise only products with actual sales in the period would end up in the result and we also wanted to include products with zero sales.
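The point about a WHERE condition on the outer-joined table can be seen in a small sketch. This uses Python's sqlite3 module as a stand-in for MySQL (the join semantics shown are the same); the three stock rows and two supplier rows are a subset of the sample tables above, and the supplier filter 1001 is chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE totvrd (art_code TEXT, voorraad INTEGER);
    CREATE TABLE artcred (art_code TEXT, vln INTEGER, cred_nr INTEGER);
    INSERT INTO totvrd VALUES ('item_1', 1), ('item_3', 88), ('item_8', 7);
    INSERT INTO artcred VALUES ('item_1', 1, 1001), ('item_3', 1, 1001);
""")

# Filter in WHERE: rows without a supplier match have NULL cred_nr and are dropped,
# so the LEFT JOIN effectively behaves like an INNER JOIN.
where_filter = conn.execute("""
    SELECT t.art_code, a.cred_nr
    FROM totvrd t
    LEFT JOIN artcred a ON a.art_code = t.art_code
    WHERE a.cred_nr = 1001
    ORDER BY t.art_code
""").fetchall()

# Filter in ON: unmatched stock rows survive with NULL supplier columns.
on_filter = conn.execute("""
    SELECT t.art_code, a.cred_nr
    FROM totvrd t
    LEFT JOIN artcred a ON a.art_code = t.art_code AND a.cred_nr = 1001
    ORDER BY t.art_code
""").fetchall()

print(where_filter)
print(on_filter)
```

Whether the second form is wanted depends on the report: it keeps item_8 (in stock, no supplier row), which the WHERE version silently drops.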
One simplification:
and b.aantal < ( SELECT * from ( SELECT AVG ...
-->
and b.aantal < ( SELECT AVG ...
A personal problem: my brain hurts when I see RIGHT JOIN; please rewrite as LEFT JOIN.
Check your RIGHTs and LEFTs -- an outer join keeps the other table's rows even if there is no match; are you expecting such NULLs? That is, it looks like they can all be plain JOINs (aka INNER JOINs).
These might help performance:
b: INDEX(bon_soort, bon_datum, aantal, art_code)
totvrd: INDEX(art_code)
artcred: INDEX(vln, cred_nr, art_code)
Is b what you keep needing? Build a temp table:
CREATE TEMPORARY TABLE tmp_b
SELECT ...
FROM b
WHERE ...;
But if you need to use tmp_b multiple times in the same query, (and since you are not yet on MySQL 8.0), you may need to make it a non-TEMPORARY table for long enough to run the query. (If you have multiple connections building the same permanent table, there will be trouble.)
Yes, 5.5.33 is rather antique; upgrade soon.
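As a rough sketch of the temp-table idea above: materialise the repeated aggregation once, then read it from as many later statements as needed. This runs on Python's sqlite3 module as a stand-in, so two things differ from MySQL and are assumptions of the sketch: SQLite writes CREATE TEMP TABLE ... AS SELECT, and a fixed cutoff date replaces DATE_SUB(CURDATE(), INTERVAL 6 MONTH) so the result is reproducible. The rows are a subset of the bongegs sample.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bongegs (art_code TEXT, bon_datum TEXT, bon_soort TEXT, aantal INTEGER);
    INSERT INTO bongegs VALUES
        ('item_1', '2021-08-21', 'f', 6),
        ('item_2', '2021-08-29', 'v', 3),
        ('item_4', '2021-10-21', 'f', 6),
        ('item_1', '2021-11-21', 'o', 6),
        ('item_4', '2022-01-26', 'f', 6);
""")

# Build the aggregate once; only invoices ('f') after the cutoff are included.
conn.execute("""
    CREATE TEMP TABLE tmp_b AS
    SELECT art_code, SUM(aantal) AS verkocht, COUNT(*) AS num_invoices
    FROM bongegs
    WHERE bon_soort = 'f' AND bon_datum > '2021-08-01'
    GROUP BY art_code
""")

# Later statements read the small pre-aggregated table instead of repeating the subquery.
per_item = conn.execute(
    "SELECT art_code, verkocht, num_invoices FROM tmp_b ORDER BY art_code"
).fetchall()
grand_total = conn.execute("SELECT SUM(verkocht) FROM tmp_b").fetchone()[0]

print(per_item, grand_total)
```

The supplier ID and the interval become ordinary values in the WHERE clause of the CREATE statement, which is what a VIEW could not offer.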
By gathering what I believe are all the pieces you had, I think this significantly simplifies the query. Let's first start with the fact that you were trying to eliminate the outliers by using the average plus three standard deviations as the cutoff for what to exclude. Then you had the original summation of all sales, also from the bongegs table.
To simplify this, I have the sub-query ONCE internal that does the summation, counts, avg, stddev of all orders (f) within the last 6 months. I also computed the divide by 6 for per-month you wanted in the top.
Since the bongegs is now all pre-aggregated ONCE, and grouped per art_code, it does not need to be done one after the other. You can use the totals directly at the top (at least I THINK is similar output without all actual data and understanding of your context).
So the primary table is the stock table (totvrd, which holds the voorraad column), LEFT-JOINED to the pre-query of bongegs. This allows you to get all products regardless of whether they have been sold.
Since the one aggregation prequery has the avg and stddev in it, you can simply apply an additional AND clause when joining based on the total sold being less than the avg/stddev context.
The resulting query below.
SELECT
-- appears you are looking for the highest percentage?
-- typically NOT a good idea to name columns starting with numbers,
-- but ok. Typically let interface/output name the columns to end-users
GREATEST((b.verkocht_sdperMonth * ((100 + 0)/100)),0) as 'units sold p/month',
-- appears to be the total sold divided by 6 to get monthly average over 6 months query of data
GREATEST( ROUND(
( (b.verkocht_sdperMonth * 3) - v.voorraad + v.reserved - v.backorder), 0), 0)
as 'Order based on units sold',
b.verkocht_sd as 'Units sold in period',
b.AvgStdDev as 'AvgStdDeviation',
b.NumInvoices as 'Number of invoices in period',
v.art_code as 'Part number'
FROM
-- stock, master inventory, regardless of supplier
-- get all products, even though not all may be sold
totvrd v
-- LEFT join to pre-query of Bongegs pre-grouped by the art_code which appears
-- to be basis of all other joins, std deviation and average while at it
LEFT JOIN
(select
b.art_code,
count(*) NumInvoices,
sum( b.aantal ) verkocht_sd,
sum( b.aantal ) / 6.0 verkocht_sdperMonth,
avg( b.aantal ) AvgSale,
AVG(b.aantal) + 3 * STDDEV( b.aantal) AvgStdDev
from
bongegs b
JOIN artcred ac
on b.art_code = ac.art_code
AND ac.vln = 1
and ac.cred_nr = 9117
where
-- only for invoices ('f') and within last 6 months
b.bon_soort = 'f'
AND b.bon_datum > DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
group by
b.art_code ) b
-- result is one entry per art_code, thus preventing any Cartesian product
ON v.art_code = b.art_code
GROUP BY
v.art_code
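A cut-down sketch of the same shape (stock table LEFT JOINed to a single pre-aggregated sales subquery) may help when checking the order arithmetic. It runs on Python's sqlite3 module as a stand-in for MySQL, so the two-argument MAX(x, 0) plays the role of GREATEST(x, 0). The invoice quantities are invented, the supplier join and outlier handling are left out, and the COALESCE is an addition of this sketch: in MySQL, GREATEST returns NULL when any argument is NULL, which is what an item with no sales would otherwise produce.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE totvrd (art_code TEXT, voorraad INTEGER, reserved INTEGER, backorder INTEGER);
    CREATE TABLE bongegs (art_code TEXT, bon_soort TEXT, aantal INTEGER);
    INSERT INTO totvrd VALUES ('item_1', 1, 0, 5), ('item_4', 9, 0, 0), ('item_8', 7, 1, 0);
    -- Hypothetical quantities, chosen so the arithmetic is easy to follow
    INSERT INTO bongegs VALUES ('item_1', 'f', 18), ('item_1', 'f', 12),
                               ('item_4', 'f', 6),  ('item_4', 'f', 6),
                               ('item_1', 'o', 99);
""")

order_rows = conn.execute("""
    SELECT v.art_code,
           b.verkocht_sd,
           MAX(ROUND(COALESCE(b.per_month, 0) * 3
                     - v.voorraad + v.reserved - v.backorder), 0) AS to_order
    FROM totvrd v
    LEFT JOIN (SELECT art_code,
                      SUM(aantal)       AS verkocht_sd,
                      SUM(aantal) / 6.0 AS per_month
               FROM bongegs
               WHERE bon_soort = 'f'
               GROUP BY art_code) b
           ON b.art_code = v.art_code
    ORDER BY v.art_code
""").fetchall()

for row in order_rows:
    print(row)
```

For item_1 this works out as 30 sold, 5 per month, so 5 * 3 - 1 + 0 - 5 = 9 to order; the other two items come out negative and are clamped to 0.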

Count substrings in SQL (in Digital Metaphors' ReportBuilder)

I'm trying to create a report in ReportBuilder (Digital Metaphors, not Microsoft) and I'm having trouble getting the SQL to do what I want.
I have one table with a field building:
| building |
+------------+
| WhiteHouse |
| TajMahal |
and another table with a field locations:
| id | locations |
+----+-----------------------------------------------------------------+
| 1 | WhiteHouse:RoseGarden,WhiteHouse:MapRoom,TajMahal:MainSanctuary |
| 2 | TajMahal:NorthGarden,WhiteHouse:GreenRoom |
I would like to create a table showing how many times each building is used in locations, like so:
| building | count |
+------------+-------+
| WhiteHouse | 3 |
| TajMahal | 2 |
The characters : and , are never used in building or room names. Even a quick-and-dirty solution that assumes that building names never appear in room names would be good enough for me.
Of course this would be easy to do in just about any sane programming language (total over something like /\bWhiteHouse:/); the trick will be getting RB to do it. Suggestions for workarounds are welcome.
It is possible to split the locations string into pieces using the "," and ":" characters as separators, as follows, in SQL Server with the help of a custom SQL split function:
select
p2.val,
count(p2.val)
from locations l
cross apply dbo.split(l.locations,',') p1
cross apply dbo.split(p1.val,':') p2
inner join building b
on b.building = p2.val
group by p2.val
I'm not sure whether MySQL has a similar split function; if it does, this query can serve as a template.
You can try this, probably not the fastest, but certainly easier solution.
SELECT t1.building,
( SELECT SUM( ROUND( (LENGTH(t2.locations)
- LENGTH(REPLACE(t2.locations, concat(t1.building, ':'), ''))
) / (LENGTH(t1.building) + 1)
)
)
FROM table2 AS t2
) as count
FROM table1 as t1
SQL Fiddle Demo
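The LENGTH/REPLACE trick can be checked against the question's sample rows with a short sketch. It uses Python's sqlite3 module as a stand-in, so string concatenation is written with || instead of MySQL's CONCAT(); the table names table1 and table2 follow the answer and the column alias uses is an assumption of the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (building TEXT);
    CREATE TABLE table2 (id INTEGER, locations TEXT);
    INSERT INTO table1 VALUES ('WhiteHouse'), ('TajMahal');
    INSERT INTO table2 VALUES
        (1, 'WhiteHouse:RoseGarden,WhiteHouse:MapRoom,TajMahal:MainSanctuary'),
        (2, 'TajMahal:NorthGarden,WhiteHouse:GreenRoom');
""")

# Removing every 'Building:' token shortens the string by (occurrences * token length),
# so dividing the length difference by the token length gives the occurrence count.
usage = conn.execute("""
    SELECT t1.building,
           (SELECT SUM((LENGTH(t2.locations)
                        - LENGTH(REPLACE(t2.locations, t1.building || ':', '')))
                       / (LENGTH(t1.building) + 1))
            FROM table2 AS t2) AS uses
    FROM table1 AS t1
    ORDER BY t1.building
""").fetchall()

print(usage)
```

As the question allows, this is the quick-and-dirty variant: a room or building whose name ends in another building's name would be counted too.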

How can I write this SQL query correctly?

I'm trying to figure out how to write a query to get the right results.
I'll keep it simple. First, the foundation:
I have two tables: DEALS and TASKS
A Deal can have 1 or more Task inside, so TASKS has a deal_id field.
Also, every Task has a time_start field (unix timestamp) and a completed field (1 or 0).
Ok. Now, what do I need? In my view I need to show all deals with the "Next Task" column rendered.
So for every Deal, if I have a Task (one or more) it must show only the closest. If no Task, I'll render an alert.
Deal Title | Value | Next step
deal 1 | 1.000 | tomorrow at 11:00
deal 2 | 1.000 | NO TASK IN THIS DEAL
deal 3 | 1.500 | 12/03/2017 at 9:00
In this example, deal 1 has 3 tasks inside, but the nearest starts tomorrow. I don't want "deal 1" repeated 3 times. <-- GROUP BY deals.id??
To get this right, currently, I run the deals query without JOIN and I'm using a custom PDO class to run a new query for tasks for every row.
But this is BAD! I have a new query for every row of the DEALS table.
I'm pretty sure that there is a way to write one single query to get this result.
PS: Don't worry about the rendering of the text; I used "tomorrow" only to write the example. next_step is the unix timestamp from the db ... I can easily use moment.js to format it correctly.
EDIT:
I'll provide the data inside the 2 tables, just to complete the example.
DEALS
ID | TITLE | VALUE
1 | Deal 1 | 1000
2 | Deal 2 | 1000
3 | Deal 3 | 1000
TASKS
ID | DEAL_ID | TITLE | TIME_START | COMPLETED
1 | 1 | Send Proposal | 1483678800 | 0
2 | 1 | Follow up | 1483441200 | 0
3 | 1 | Ask for referrals | 1484441200 | 0
4 | 2 | Send email | 1483678900 | 0
5 | 3 | Sort out meeting | 1483678900 | 0
NOTE: timestamps don't match with the results that I have written in the first table. They were just an example, I take the timestamp of the time_start field and format it in a human readable mode, but this isn't my question.
If you want the time of the next task, you can use a correlated subquery:
select d.*,
(select t.time_start
from tasks t
where t.deal_id = d.deal_id and t.time_start > d.time_col
order by t.time_start
limit 1
) as task_time_start
from deals d;
EDIT:
If you want the future, then just change the time comparison:
select d.*,
(select t.time_start
from tasks t
where t.deal_id = d.id and t.completed = 0 and
t.time_start > unix_timestamp()
order by t.time_start
limit 1
) as task_time_start
from deals d;
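Here is a small sketch of the correlated-subquery approach run against the sample DEALS and TASKS rows from the question. It uses Python's sqlite3 module as a stand-in for MySQL; "now" is passed in as a fixed unix timestamp so the output is reproducible, and Deal 4 (with no tasks) is a hypothetical row added to show the no-task case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE deals (id INTEGER PRIMARY KEY, title TEXT, value INTEGER);
    CREATE TABLE tasks (id INTEGER PRIMARY KEY, deal_id INTEGER, title TEXT,
                        time_start INTEGER, completed INTEGER);
    INSERT INTO deals VALUES (1, 'Deal 1', 1000), (2, 'Deal 2', 1000),
                             (3, 'Deal 3', 1000), (4, 'Deal 4', 1000);
    INSERT INTO tasks VALUES
        (1, 1, 'Send Proposal',     1483678800, 0),
        (2, 1, 'Follow up',         1483441200, 0),
        (3, 1, 'Ask for referrals', 1484441200, 0),
        (4, 2, 'Send email',        1483678900, 0),
        (5, 3, 'Sort out meeting',  1483678900, 0);
""")

# One row per deal; the subquery picks the earliest open task after :now, or NULL.
next_tasks = conn.execute("""
    SELECT d.id, d.title,
           (SELECT t.time_start
            FROM tasks t
            WHERE t.deal_id = d.id AND t.completed = 0 AND t.time_start > :now
            ORDER BY t.time_start
            LIMIT 1) AS next_task
    FROM deals d
    ORDER BY d.id
""", {"now": 1483500000}).fetchall()

for row in next_tasks:
    print(row)
```

A NULL (None in Python) in next_task is the cue for the application to render the "NO TASK IN THIS DEAL" alert, and the whole list comes back from a single query instead of one query per deal.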
Something like this should do it.
select
d.title as 'Deal Title',
d.value as 'Value',
IsNull(CONVERT(VARCHAR(50), min(t.time_start)), 'NO TASK IN THIS DEAL') as 'Next step'
from Deals d
left join Tasks t
on t.deal_id = d.deal_id
and t.is_completed = 0
and t.time_start > GetDate()
group by d.deal_id, d.title, d.value
Here's the MySQL equivalent (IFNULL and now() in place of IsNull and GetDate):
select
d.title as 'Deal Title',
d.value as 'Value',
IfNull(min(t.time_start), 'NO TASK IN THIS DEAL') as 'Next step'
from Deals d
left join Tasks t
on t.deal_id = d.deal_id
and t.is_completed = 0
and t.time_start > now() --remove this line to include past Tasks
group by d.deal_id, d.title, d.value

How to select a MAX record based on multiple results from a unique ID in MySQL

I run a report in MySQL which returns all active members and their associated plan selections. If a member is terminated, this is displayed on the report as well. The issue I am facing, however, is when a member simply decides to 'change' plans. Our system effectively 'terminates' the original plan and starts fresh with the new one while keeping the original enrollment information for the member.
Sample report data may look like the following:
MemberID | Last | First | Plan Code | Original Effective | Change Date | Term Date
--------------------------------------------------------------------------------------------
12345 | Smith | John | A | 05-01-2011 | | 06-01-2011
12345 | Smith | John | B | 06-01-2011 | |
In the example above, a member had plan code A from May 1, 2011 to June 1, 2011. On June 1, 2011, the member changed his coverage to Plan B (and is still active since there is no term or change date).
Ultimately, I need my report to generate the following instead of the 2 line items above:
MemberID | Last | First | Plan Code | Original Effective | Change Date | Term Date
--------------------------------------------------------------------------------------------
12345 | Smith | John | B | 05-01-2011 | 06-01-2011 |
which shows his original plan effective date, the date of the change, and leaves the termination field blank.
I know I can get this into a single row by grouping by the Member ID, and I can even add the change date into the appropriate field with an IF(COUNT() > 1) statement, but I am not able to figure out how to show the correct plan code (B) or how to leave the term date blank while keeping the original effective date.
Any ideas?
Although I hate columns with embedded spaces and having to tick them, this should do it for you. The PreQuery will detect how many policy entries there are and preserve the first and last dates too, grouped by member. Once THAT is done, it can re-join to the plan enrollment on the same member but matching ONLY the last "Original Effective" date for the person. That will give you the correct Plan Code. Additionally, if the person was terminated, the term date would be filled in for you. If still employed, it will be blank on that record.
select STRAIGHT_JOIN
pe2.MemberID,
pe2.Last,
pe2.First,
pe2.`Plan Code`,
PreQuery.PlanStarted `Original Effective`,
case when PreQuery.PlanEntries > 1 then PreQuery.LastChange end `Change Date`,
pe2.`Term Date`
from
( select
pe.MemberID,
count(*) as PlanEntries,
min( `pe`.`Original Effective` ) PlanStarted,
max( `pe`.`Original Effective`) LastChange
from
PlanEnrollment pe
group by
pe.MemberID ) PreQuery
join PlanEnrollment pe2
on PreQuery.MemberID = pe2.MemberID
AND PreQuery.LastChange = pe2.`Original Effective`
I remember having the problem, but not coming up with a GROUP BY-based solution ;)
Instead, I think you can do something like
SELECT memberid, plancode
FROM members
INNER JOIN plans ON members.id=plans.member_id
WHERE plans.id =
(SELECT id FROM plans
WHERE member_id = members.id
ORDER BY date DESC
LIMIT 1)
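To see the PreQuery idea produce the single desired row, here is a minimal sketch on Python's sqlite3 module (a stand-in for MySQL). Several things are assumptions of the sketch: the column names are written without spaces, the dates are stored as ISO strings (YYYY-MM-DD) so that MIN and MAX order them correctly, and member 67890 is a made-up second member with only one plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE plan_enrollment (member_id INTEGER, last TEXT, first TEXT,
                                  plan_code TEXT, original_effective TEXT, term_date TEXT);
    INSERT INTO plan_enrollment VALUES
        (12345, 'Smith', 'John', 'A', '2011-05-01', '2011-06-01'),
        (12345, 'Smith', 'John', 'B', '2011-06-01', NULL),
        (67890, 'Jones', 'Mary', 'A', '2011-03-01', NULL);
""")

# pq finds the first and latest effective dates per member; joining back on the
# latest date picks the current plan row (and its term date, if any).
report = conn.execute("""
    SELECT pe2.member_id,
           pe2.plan_code,
           pq.plan_started AS original_effective,
           CASE WHEN pq.plan_entries > 1 THEN pq.last_change END AS change_date,
           pe2.term_date
    FROM (SELECT member_id,
                 COUNT(*)                AS plan_entries,
                 MIN(original_effective) AS plan_started,
                 MAX(original_effective) AS last_change
          FROM plan_enrollment
          GROUP BY member_id) pq
    JOIN plan_enrollment pe2
      ON pe2.member_id = pq.member_id
     AND pe2.original_effective = pq.last_change
    ORDER BY pe2.member_id
""").fetchall()

for row in report:
    print(row)
```

A member with a single enrollment row gets an empty change date, because the CASE only fills it in when more than one plan entry exists.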

MySQL: Return 0 if row doesn't exist

I've been bashing my head on this for a while, so now I'm here :) I'm a SQL beginner, so maybe this will be easy for you guys...
I have this query:
SELECT COUNT(*) AS counter, recur,subscribe_date
FROM paypal_subscriptions
WHERE recur='monthly' and subscribe_date > "2010-07-16" and subscribe_date < "2010-07-23"
GROUP BY subscribe_date
ORDER BY subscribe_date
Now the dates I've shown above are hard coded, my application will supply a variable date range.
Right now I'm getting a result table where there is a value for that date.
counter |recur | subscribe_date
2 | Monthly | 2010-07-18
3 | Monthly | 2010-07-19
4 | Monthly | 2010-07-20
6 | Monthly | 2010-07-22
I'd like to return 0 in the counter column if the date doesn't exist.
counter |recur | subscribe_date
0 | Monthly | 2010-07-16
0 | Monthly | 2010-07-17
2 | Monthly | 2010-07-18
3 | Monthly | 2010-07-19
4 | Monthly | 2010-07-20
0 | Monthly | 2010-07-21
6 | Monthly | 2010-07-22
0 | Monthly | 2010-07-23
Is this possible?
You will need a table of dates (new table added), and then you will have to do an outer join between that table and your query.
This question is also similar to another question. Answers can be quite similar.
Insert Dates in the return from a query where there is none
You will need a table of dates to group against. This is quite easy in MSSQL using CTE's like this - I'm not sure if MySQL has something similar?
Otherwise you will need to create a hard table as a one off exercise
EDIT : Give this a try:
SELECT COUNT(pp.subscribe_date) AS counter, dl.date, MIN(pp.recur)
FROM date_lookup dl
LEFT OUTER JOIN paypal pp
on (pp.subscribe_date = dl.date AND pp.recur ='monthly')
WHERE dl.date >= '2010-07-16' and dl.date <= '2010-07-23'
GROUP BY dl.date
ORDER BY dl.date
The subject of the query needs to be changed to the date_lookup table
(the order of the Left Outer Join becomes important)
Count(*) isn't going to work since the 'date' record always exists - need to count something in the PayPal table
pp.recur ='monthly' is now a join condition, not a filter because of the LOJ
Finally, showing pp.recur in the select list isn't going to work.
I've used an aggregate, but MIN(pp.recur) will return null if there are no PayPal records
What you could do when you parameterize your query is to just repeat the Recur Type Filter?
Again, plz excuse the MSSQL syntax
SELECT COUNT(pp.subscribe_date) AS counter, dl.date, @ppRecur
FROM date_lookup dl
LEFT OUTER JOIN paypal pp
on (pp.subscribe_date = dl.date AND pp.recur = @ppRecur)
WHERE dl.date >= @DateFrom and dl.date <= @DateTo
GROUP BY dl.date
ORDER BY dl.date
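A runnable sketch of the date-lookup approach, using Python's sqlite3 module as a stand-in for MySQL. The date_lookup table is filled from Python for the eight days in the question, the subscription rows are generated to match the counts shown there, and the named parameters (:recur, :date_from, :date_to) are this sketch's equivalent of the query parameters above.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE date_lookup (date TEXT PRIMARY KEY);
    CREATE TABLE paypal_subscriptions (recur TEXT, subscribe_date TEXT);
""")

# One row per calendar day in the reporting window (2010-07-16 .. 2010-07-23).
start = date(2010, 7, 16)
conn.executemany(
    "INSERT INTO date_lookup VALUES (?)",
    [((start + timedelta(days=i)).isoformat(),) for i in range(8)],
)

# Subscriptions matching the counts in the question's first result table.
counts = {"2010-07-18": 2, "2010-07-19": 3, "2010-07-20": 4, "2010-07-22": 6}
for day, n in counts.items():
    conn.executemany(
        "INSERT INTO paypal_subscriptions VALUES ('monthly', ?)", [(day,)] * n
    )

# Drive the query from the date table; COUNT of a joined column is 0 when no row matched.
daily = conn.execute("""
    SELECT dl.date, COUNT(pp.subscribe_date) AS counter
    FROM date_lookup dl
    LEFT JOIN paypal_subscriptions pp
           ON pp.subscribe_date = dl.date AND pp.recur = :recur
    WHERE dl.date BETWEEN :date_from AND :date_to
    GROUP BY dl.date
    ORDER BY dl.date
""", {"recur": "monthly", "date_from": "2010-07-16", "date_to": "2010-07-23"}).fetchall()

for row in daily:
    print(row)
```

Keeping the recur condition in the ON clause rather than in WHERE is what lets the empty days survive with a count of 0.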
Since there was no easy way to do this, I had to have the application fill in the blanks for me rather than have the database return the data I wanted. I do get a performance hit for this, but it was necessary for the completion of the report.
I will definitely look into making this return what I want from the DB in the near future. I'll give nonnb's solution a try.
thanks everyone!