MySQL limitations to simplify Query - mysql

Please note that I'm an absolute n00b in MySQL but somehow I managed to build some (for me) complex queries that work as they should. My main problem now is that for many of the queries we're working on:
The query is becoming too big and very hard to see through.
The same subqueries get repeated many times and that is adding to the complexity (and probably to the time needed to process the query).
We want to further expand this query but we are reaching a point where we can no longer oversee what we are doing. I've added one of these subqueries at the end of this post, just as an example.
!! You can fast forward to the Problem section if you want to skip the details below. I think the question can also be answered without the additional info.
What we want to do
Create a MySQL query that calculates purchase orders and forecasts for a given supplier based on:
Sales history in a given period (past [x] months = interval)
Current stock
Items already in backorder (from supplier)
Reserved items (for customers)
Supplier ID
I've added an example of a subquery at the bottom of this message. We're showing just this part to keep things simple for now. The output of the subquery is:
Part number
Units sold
Units sold (outliers removed)
Units sold per month (outliers removed)
Number of invoices with the part number in the period (interval)
It works quite OK for us, although I'm sure it can be optimised. It removes outliers from the sales history (e.g. one customer that orders 50 pcs of one product in one order). Unfortunately it can only remove outliers when there is substantial data, so if the first order happens to be 50 pcs then it is not considered an outlier. For that reason we take the number of invoices into account in the main query. The number of invoices has to exceed a certain threshold, otherwise the system will revert to a fixed value of "maximum stock" for that product.
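To illustrate why the outlier filter needs a reasonable amount of data, here is a minimal Python sketch of the same rule the query uses (`aantal < AVG(aantal) + 3 * STDDEV(aantal)`). It assumes the population standard deviation, which is what MySQL's `STDDEV()` computes; the function name and the sample quantities are made up for the illustration.

```python
from statistics import mean, pstdev


def remove_outliers(quantities):
    """Keep only quantities below mean + 3 * population standard deviation."""
    threshold = mean(quantities) + 3 * pstdev(quantities)
    return [q for q in quantities if q < threshold]


# Hypothetical order quantities: the same 50 pcs order, with little and with more history.
few_orders = [2, 3, 1, 50]
many_orders = [2, 3, 1, 2, 4, 3, 2, 1, 3, 2, 2, 50]

kept_few = remove_outliers(few_orders)
kept_many = remove_outliers(many_orders)

print(kept_few)   # the 50 is still present: too few rows to stand out
print(kept_many)  # the 50 has been dropped
```

With only four rows the single large order inflates both the mean and the standard deviation so much that it stays below the threshold, which matches the behaviour described above.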
As mentioned, this is only a small part of the complete query and we want to expand it even further (so that it takes into account the "sales history" of parts that were used in assembled products).
For example, if we were to build and sell cars and wanted to place an order with our tyre supplier, the query would calculate the number of tyres we need to order based on the sales history of the various car models (while also taking into account the stock of the cars, reserved cars and stock of the tyres).
Problem
The query is becoming massive and incomprehensible. We are repeating the same subqueries many times, which to us seems highly inefficient, and it is the main reason why the query is becoming so bulky.
What we have tried
(Please note that we are on MySQL 5.5.33. We will update our server soon but for now we are limited to this version.)
Create a VIEW from the subqueries.
The main issue here is that we can't execute the view with parameters like supplier_id and interval period. Our subquery calculates the sum of the sold items for a given supplier within the given period. So even if we would build the VIEW so that it calculates this for ALL products from ALL suppliers we would still have the issue that we can't define the interval period after the VIEW has been executed.
A stored procedure.
Correct me if I'm wrong but as far as I know, MySQL only allows us to perform a Call on a stored procedure so we still can't run it against the parameters (period, supplier id...)
Even this workaround won't help us because we still can't run the SP against the parameters.
Using WITH at the beginning of the query
A common table expression in MySQL is a temporary result whose scope is confined to a single statement. You can refer to this expression multiple times within the statement.
The WITH clause in MySQL is used to specify a Common Table Expression; a WITH clause can have one or more comma-separated subclauses.
Not sure if this would be the solution because we can't test it. WITH is not supported until MySQL version 8.0.
What now?
My last resort would be to put the mentioned subqueries in a temp table before starting the main query. This might not completely eliminate our problems but at least the main query will be more comprehensible and with less repetition of fetching the same data. Would this be our best option or have I overlooked a more efficient way to tackle this?
Many thanks for your kind replies.
SELECT
GREATEST((verkocht_sd/6*((100 + 0)/100)),0) as 'units sold p/month ',
GREATEST(ROUND((((verkocht_sd/6)*3)-voorraad+reserved-backorder),0),0) as 'Order based on units sold',
SUM(b.aantal) as 'Units sold in period',
t4.verkocht_sd as 'Units sold in period, outliers removed',
COUNT(*) as 'Number of invoices in period',
b.art_code as 'Part number'
FROM bongegs b -- Table that has all the sales records for all products
RIGHT JOIN totvrd ON (totvrd.art_code = b.art_code) -- Right Join stock data to also include items that are not in table bongegs (no sales history).
LEFT JOIN artcred ON (artcred.art_code = b.art_code) -- add supplier ID to the part numbers.
LEFT JOIN
(
SELECT
SUM(b.aantal) as verkocht_sd,
b.art_code
FROM bongegs b
RIGHT JOIN totvrd ON (totvrd.art_code = b.art_code)
LEFT JOIN artcred ON (artcred.art_code = b.art_code)
WHERE
b.bon_datum > DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
and b.bon_soort = "f" -- Selects only invoices
and artcred.vln = 1 -- 1 = Prefered supplier
and artcred.cred_nr = 9117 -- Supplier ID
and b.aantal < (select * from (SELECT AVG(b.aantal)+3*STDDEV(aantal)
FROM bongegs b
WHERE
b.bon_soort = 'f' and
b.bon_datum > DATE_SUB(CURDATE(), INTERVAL 6 MONTH)) x)
GROUP BY b.art_code
) AS t4
ON (b.art_code = t4.art_code)
WHERE
b.bon_datum > DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
and b.bon_soort = "f"
and artcred.vln = 1
and artcred.cred_nr = 9117
GROUP BY b.art_code
Bongegs | all rows from sales forms (invoices F, offers O, delivery notes V)
| art_code | bon_datum | bon_soort | aantal |
|:---------|:---------: |:---------:|:------:|
| item_1 | 2021-08-21 | f | 6 |
| item_2 | 2021-08-29 | v | 3 |
| item_6 | 2021-09-03 | o | 2 |
| item_4 | 2021-10-21 | f | 6 |
| item_1 | 2021-11-21 | o | 6 |
| item_3 | 2022-01-17 | v | 6 |
| item_1 | 2022-01-21 | o | 6 |
| item_4 | 2022-01-26 | f | 6 |
Artcred | supplier ID's
| art_code | vln | cred_nr |
|:---------|:----:|:-------:|
| item_1 | 1 | 1001 |
| item_2 | 1 | 1002 |
| item_3 | 1 | 1001 |
| item_4 | 1 | 1007 |
| item_5 | 1 | 1004 |
| item_5 | 2 | 1008 |
| item_6 | 1 | 1016 |
| item_7 | 1 | 1567 |
totvrd | stock
| art_code | voorraad | reserved | backorder |
|:---------|:---------: |:--------:|:---------:|
| item_1 | 1 | 0 | 5 |
| item_2 | 0 | 0 | 0 |
| item_3 | 88 | 0 | 0 |
| item_4 | 9 | 0 | 0 |
| item_5 | 67 | 2 | 20 |
| item_6 | 112 | 9 | 0 |
| item_7 | 65 | 0 | 0 |
| item_8 | 7 | 1 | 0 |
Now, on to the query. You have LEFT JOINs to the artcred table, but then include artcred in the WHERE clause, making it an INNER JOIN (requiring both left and right tables) in the result. Was this intended, or are you expecting records in the bongegs table that do NOT exist in the artcred table?
Well to be honest I was not fully aware that this would essentially form an INNER JOIN but in this case it doesn't really matter. A record that exists in bongegs always exists in artcred as well (every sold product must have a supplier). That doesn't work both ways since a product can be in artcred without ever being sold.
You also have RIGHT JOIN on totvrd which implies you want every record in the TotVRD table regardless of a record in the bongegs table. Is this correct?
Yes it is intended. Otherwise only products with actual sales in the period would end up in the result and we also wanted to include products with zero sales.
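The point about a WHERE filter turning an outer join into an inner join can be checked with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (the join semantics shown here are standard SQL), with a cut-down, made-up version of the totvrd and bongegs tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE totvrd  (art_code TEXT PRIMARY KEY, voorraad INTEGER);
    CREATE TABLE bongegs (art_code TEXT, bon_soort TEXT, aantal INTEGER);
    INSERT INTO totvrd  VALUES ('item_1', 1), ('item_4', 9), ('item_8', 7);
    INSERT INTO bongegs VALUES ('item_1', 'f', 6), ('item_4', 'f', 6), ('item_4', 'o', 2);
""")

# Filter in WHERE: rows without a matching sale have b.bon_soort = NULL and are dropped.
where_filtered = conn.execute("""
    SELECT v.art_code, COALESCE(SUM(b.aantal), 0)
    FROM totvrd v
    LEFT JOIN bongegs b ON b.art_code = v.art_code
    WHERE b.bon_soort = 'f'
    GROUP BY v.art_code
    ORDER BY v.art_code
""").fetchall()

# Filter in ON: the stock row survives even when no invoice matches.
on_filtered = conn.execute("""
    SELECT v.art_code, COALESCE(SUM(b.aantal), 0)
    FROM totvrd v
    LEFT JOIN bongegs b ON b.art_code = v.art_code AND b.bon_soort = 'f'
    GROUP BY v.art_code
    ORDER BY v.art_code
""").fetchall()

print(where_filtered)
print(on_filtered)
```

Only the second form keeps item_8, the product with stock but no sales, so conditions on the optional table belong in the ON clause if zero-sales products must appear in the result.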

One simplification:
and b.aantal < ( SELECT * from ( SELECT AVG ...
-->
and b.aantal < ( SELECT AVG ...
A personal problem: my brain hurts when I see RIGHT JOIN; please rewrite as LEFT JOIN.
Check your RIGHTs and LEFTs -- an outer join keeps the other table's rows even if there is no match; are you expecting such NULLs? That is, it looks like they can all be plain JOINs (aka INNER JOINs).
These might help performance:
b: INDEX(bon_soort, bon_datum, aantal, art_code)
totvrd: INDEX(art_code)
artcred: INDEX(vln, cred_nr, art_code)
Is b what you keep needing? Build a temp table:
CREATE TEMPORARY TABLE tmp_b
SELECT ...
FROM b
WHERE ...;
But if you need to use tmp_b multiple times in the same query, (and since you are not yet on MySQL 8.0), you may need to make it a non-TEMPORARY table for long enough to run the query. (If you have multiple connections building the same permanent table, there will be trouble.)
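A rough sketch of the temp-table idea, again using Python's sqlite3 module as a stand-in for MySQL: the aggregation is run once with the supplier and the start of the period passed as parameters, and later statements read the small result table instead of repeating the subquery. The helper name `build_tmp_sales`, the column names of `tmp_b` and some of the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bongegs (art_code TEXT, bon_datum TEXT, bon_soort TEXT, aantal INTEGER);
    CREATE TABLE artcred (art_code TEXT, vln INTEGER, cred_nr INTEGER);
    INSERT INTO bongegs VALUES
        ('item_1', '2021-08-21', 'f', 6),
        ('item_3', '2022-01-17', 'v', 6),
        ('item_1', '2022-01-21', 'o', 6),
        ('item_3', '2022-01-20', 'f', 4),
        ('item_1', '2022-01-25', 'f', 2);
    INSERT INTO artcred VALUES ('item_1', 1, 1001), ('item_3', 1, 1001), ('item_4', 1, 1007);
    CREATE TEMPORARY TABLE tmp_b (art_code TEXT PRIMARY KEY, verkocht INTEGER, invoices INTEGER);
""")


def build_tmp_sales(conn, supplier_id, since):
    """(Re)fill tmp_b with invoice totals per part for one supplier and period."""
    conn.execute("DELETE FROM tmp_b")
    conn.execute("""
        INSERT INTO tmp_b
        SELECT b.art_code, SUM(b.aantal), COUNT(*)
        FROM bongegs b
        JOIN artcred ac ON ac.art_code = b.art_code
        WHERE b.bon_soort = 'f' AND b.bon_datum > ? AND ac.vln = 1 AND ac.cred_nr = ?
        GROUP BY b.art_code
    """, (since, supplier_id))


build_tmp_sales(conn, 1001, "2021-08-01")
sales_long = conn.execute("SELECT art_code, verkocht, invoices FROM tmp_b ORDER BY art_code").fetchall()
total_long = conn.execute("SELECT SUM(verkocht) FROM tmp_b").fetchone()[0]

build_tmp_sales(conn, 1001, "2022-01-01")
sales_short = conn.execute("SELECT art_code, verkocht, invoices FROM tmp_b ORDER BY art_code").fetchall()

print(sales_long, total_long, sales_short)
```

In MySQL the same pattern would be `CREATE TEMPORARY TABLE` followed by `INSERT ... SELECT`; keep in mind the restriction mentioned here that older MySQL versions cannot refer to a TEMPORARY table more than once within a single query.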
Yes, 5.5.33 is rather antique; upgrade soon.

By pulling together what I believe are all the pieces you had, I think this version significantly simplifies the query. Let's first start with the fact that you were trying to eliminate the outliers by using the standard deviation to decide what to exclude. Then you had the original summation of all sales, also from the bongegs table.
To simplify this, I have the sub-query ONCE internal that does the summation, counts, avg, stddev of all orders (f) within the last 6 months. I also computed the divide by 6 for per-month you wanted in the top.
Since the bongegs is now all pre-aggregated ONCE, and grouped per art_code, it does not need to be done one after the other. You can use the totals directly at the top (at least I THINK is similar output without all actual data and understanding of your context).
So the primary table is the stock table (totvrd, which holds the voorraad column), LEFT JOINed to the pre-query of bongegs. This allows you to get all products, whether or not they have been sold.
Since the one aggregation prequery has the avg and stddev in it, you can simply apply an additional AND clause when joining based on the total sold being less than the avg/stddev context.
The resulting query below.
SELECT
-- appears you are looking for the highest percentage?
-- typically NOT a good idea to name columns starting with numbers,
-- but ok. Typically let interface/output name the columns to end-users
GREATEST((b.verkocht_sdperMonth * ((100 + 0)/100)),0) as 'units sold p/month',
-- appears to be the total sold divided by 6 to get monthly average over 6 months query of data
GREATEST( ROUND(
( (b.verkocht_sdperMonth * 3) - v.voorraad + v.reserved - v.backorder), 0), 0)
as 'Order based on units sold',
b.verkocht_sd as 'Units sold in period',
b.AvgStdDev as 'AvgStdDeviation',
b.NumInvoices as 'Number of invoices in period',
v.art_code as 'Part number'
FROM
-- stock, master inventory, regardless of supplier
-- get all products, even though not all may be sold
totvrd v
-- LEFT join to pre-query of bongegs pre-grouped by the art_code which appears
-- to be basis of all other joins, std deviation and average while at it
LEFT JOIN
(select
b.art_code,
count(*) NumInvoices,
sum( b.aantal ) verkocht_sd,
sum( b.aantal ) / 6.0 verkocht_sdperMonth,
avg( b.aantal ) AvgSale,
AVG(b.aantal) + 3 * STDDEV( b.aantal) AvgStdDev
from
bongegs b
JOIN artcred ac
on b.art_code = ac.art_code
AND ac.vln = 1
and ac.cred_nr = 9117
where
-- only for invoices ('f') and within last 6 months
b.bon_soort = 'f'
AND b.bon_datum > DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
group by
b.art_code ) b
-- result is one entry per art_code, thus preventing any Cartesian product
ON v.art_code = b.art_code
GROUP BY
v.art_code

Related

how to sum amount based on 2 columns that are both dates in mysql

I have a tbl_remit table from which I need to get the last remittance.
I'm developing a system wherein I need to get the potential collection of each Employer using the Employer's last remittance x 12. Ideally, Employers should remit once every month. But there are cases where an Employer remits again for the same month for an additional employee that is newly hired. The MySQL statement that I used was this.
SELECT Employer, MAX(AP_From) as AP_From,
MAX(AP_To) as AP_To,
MAX(Amount) as Last_Remittance,
(MAX(Amount) *12) AS LastRemit_x12
FROM view_remit
GROUP BY PEN
Result
|RemitNo.| Employer | ap_from | ap_to | amount |
| 1 | 1 |2016-01-01 |2016-01-31 | 2000 |
| 2 | 1 |2016-02-01 |2016-02-28 | 2000 |
| 3 | 1 |2016-03-01 |2016-03-31 | 2000 |
| 4 | 1 |2016-03-01 |2016-03-31 | 400 |
With that statement, I ended up getting the wrong potential collection.
What I've got:
400 - Last_Remittance
4800 - LastRemit_x12 (potential collection)
What I need to get:
2400 - Last_Remittance
28800 - LastRemit_x12 (potential collection)
Any help is greatly appreciated. I don't have a team on this project. This may be a novice question to some, but to me it's really a complex puzzle. Thank you in advance.
You want to filter the data for the last time period. So, think where rather than group by. Then, you want to aggregate by employer.
Here is one method:
SELECT Employer, MAX(AP_From) as AP_From, MAX(AP_To) as AP_To,
SUM(Amount) as Last_Remittance,
(SUM(Amount) * 12) AS LastRemit_x12
FROM view_remit vr
WHERE vr.ap_from = (SELECT MAX(vr2.ap_from)
FROM view_remit vr2
WHERE vr2.Employer = vr.Employer
)
GROUP BY Employer;
EDIT:
For performance, you want an index on view_remit(Employer, ap_from). Of course, that assumes that view_remit is really a table . . . which may be unlikely.
If you want to improve performance, you'll need to understand the view.
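As a quick check of the filter-then-aggregate approach, the sketch below loads the four sample remittances into an in-memory SQLite database (standing in for MySQL; `view_remit` is created as a plain table here, which is an assumption) and runs the same correlated-subquery query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE view_remit (RemitNo INTEGER, Employer INTEGER, ap_from TEXT, ap_to TEXT, amount INTEGER);
    INSERT INTO view_remit VALUES
        (1, 1, '2016-01-01', '2016-01-31', 2000),
        (2, 1, '2016-02-01', '2016-02-28', 2000),
        (3, 1, '2016-03-01', '2016-03-31', 2000),
        (4, 1, '2016-03-01', '2016-03-31', 400);
""")

# Keep only the rows of each employer's latest period, then sum them per employer.
rows = conn.execute("""
    SELECT vr.Employer, MAX(vr.ap_from), MAX(vr.ap_to),
           SUM(vr.amount) AS Last_Remittance,
           SUM(vr.amount) * 12 AS LastRemit_x12
    FROM view_remit vr
    WHERE vr.ap_from = (SELECT MAX(vr2.ap_from)
                        FROM view_remit vr2
                        WHERE vr2.Employer = vr.Employer)
    GROUP BY vr.Employer
""").fetchall()

print(rows)
```

Both March remittances (2000 and 400) fall in the latest period, so they are summed to 2400 before being multiplied by 12.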

Find date range overlaps within the same table, for specific user MySQL

I am by no means a MySQL expert, so I am looking for any help on this matter.
I need to perform a simple test (in principle), I have this (simplified) table:
tableid | userid | car | From | To
--------------------------------------------------------
1 | 1 | Fiesta | 2015-01-01 | 2015-01-31
2 | 1 | MX5 | 2015-02-01 | 2015-02-28
3 | 1 | Navara | 2015-03-01 | 2015-03-31
4 | 1 | GTR | 2015-03-28 | 2015-04-30
5 | 2 | Focus | 2015-01-01 | 2015-01-31
6 | 2 | i5 | 2015-02-01 | 2015-02-28
7 | 2 | Aygo | 2015-03-01 | 2015-03-31
8 | 2 | 206 | 2015-03-29 | 2015-04-30
9 | 1 | Skyline | 2015-04-29 | 2015-05-31
10 | 2 | Skyline | 2015-04-29 | 2015-05-31
I need to find two things here:
If any user has date overlaps in his car assignments of more than one day (end of the assignment can be on the same day as the new assignment start).
Did any two users try to get the same car assigned on the same date, or do the date ranges overlap for them on the same car.
So the query (or queries) I am looking for should return those rows:
tableid | userid | car | From | To
--------------------------------------------------------
3 | 1 | Navara | 2015-03-01 | 2015-03-31
4 | 1 | GTR | 2015-03-28 | 2015-04-30
7 | 2 | Aygo | 2015-03-01 | 2015-03-31
8 | 2 | 206 | 2015-03-29 | 2015-04-30
9 | 1 | Skyline | 2015-04-29 | 2015-05-31
10 | 2 | Skyline | 2015-04-29 | 2015-05-31
I feel like I am bashing my head against the wall here, I would be happy with being able to do these comparisons in separate queries. I need to display them in one table but I could always then join the results.
I've done research and a few hours of testing but I can't get anywhere near the result I want.
SQLFiddle with the above test data
I've tried these posts btw (they were not exactly what I needed but were close enough, or so I thought):
Comparing two date ranges within the same table
How to compare values of text columns from the same table
This was the closest solution I could find but when I tried it on a single table (joining table to itself) I was getting crazy results: Checking a table for time overlap?
EDIT
As a temporary solution I have adopted a different approach, similar to the posts I have found during my research (above). I will now check if the new car rental / assignment date overlaps with any date range within the table. If so I will save the id(s) of the rows that the date overlaps with. This way at least I will be able to flag overlaps and allow a user to look at the flagged rows and to resolve any overlaps manually.
Thanks to everyone who offered their help with this, I will flag philipxy's answer as the chosen one (in the next 24h) unless someone has a better way of achieving this. I have no doubt that following his answer I will be able to eventually reach the results I need. At the moment though I need to adopt any solution that works as I need to finish my project in the next few days, hence the change of approach.
Edit #2
Both answers are brilliant, and to anyone who finds this post having the same issue as I did: read them both and look at the fiddles! :) A lot of amazing brain-work went into them! Temporarily I had to go with the solution I mention in my Edit #1, but I will be adapting my queries to go with @Ryan Vincent's approach + @philipxy's edits/comments about ignoring the initial one day overlap.
Here is the first part: Overlapping cars per user...
SQLFiddle - correlated Query and Join Query
Second part - more than one user in one car at the same time: SQLFiddle - correlated Query and Join Query. Query below...
I use the correlated queries:
You will likely need indexes on userid and 'car'. However, please check the 'explain plan' to see how MySQL is accessing the data. And just try it :)
Overlapping cars per user
The query:
SELECT `allCars`.`userid` AS `allCars_userid`,
`allCars`.`car` AS `allCars_car`,
`allCars`.`From` AS `allCars_From`,
`allCars`.`To` AS `allCars_To`,
`allCars`.`tableid` AS `allCars_id`
FROM
`cars` AS `allCars`
WHERE
EXISTS
(SELECT 1
FROM `cars` AS `overlapCar`
WHERE
`allCars`.`userid` = `overlapCar`.`userid`
AND `allCars`.`tableid` <> `overlapCar`.`tableid`
AND NOT ( `allCars`.`From` >= `overlapCar`.`To` /* starts after outer ends */
OR `allCars`.`To` <= `overlapCar`.`From`)) /* ends before outer starts */
ORDER BY
`allCars`.`userid`,
`allCars`.`From`,
`allCars`.`car`;
The results:
allCars_userid allCars_car allCars_From allCars_To allCars_id
-------------- ----------- ------------ ---------- ------------
1 Navara 2015-03-01 2015-03-31 3
1 GTR 2015-03-28 2015-04-30 4
1 Skyline 2015-04-29 2015-05-31 9
2 Aygo 2015-03-01 2015-03-31 7
2 206 2015-03-29 2015-04-30 8
2 Skyline 2015-04-29 2015-05-31 10
Why it works? or How I think about it:
I use the correlated query so I don't have duplicates to deal with and it is probably the easiest to understand for me. There are other ways of expressing the query. Each has advantages and drawbacks. I want something I can easily understand.
Requirement: For each user ensure that they don't have two or more cars at the same time.
So, for each user record (AllCars) check the complete table (overlapCar) to see if you can find a different record that overlaps for the time of the current record. If we find one then select the current record we are checking (in allCars).
Therefore the overlap check is:
the allCars userid and the overLap userid must be the same
the allCars car record and the overlap car record must be different
the allCars time range and the overLap time range must overlap.
The time range check:
Instead of testing for overlap directly, use positive tests. The easiest approach is to check that the two ranges do not overlap (one starts at or after the other ends, or ends at or before the other starts), and apply a NOT to it.
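The NOT-of-no-overlap test can be tried out on the question's sample rows. This is a small sketch using Python's sqlite3 module in place of MySQL; the shortened aliases `a` and `o` are just for the example.

```python
import sqlite3

RENTALS = [
    (1, 1, "Fiesta", "2015-01-01", "2015-01-31"),
    (2, 1, "MX5", "2015-02-01", "2015-02-28"),
    (3, 1, "Navara", "2015-03-01", "2015-03-31"),
    (4, 1, "GTR", "2015-03-28", "2015-04-30"),
    (5, 2, "Focus", "2015-01-01", "2015-01-31"),
    (6, 2, "i5", "2015-02-01", "2015-02-28"),
    (7, 2, "Aygo", "2015-03-01", "2015-03-31"),
    (8, 2, "206", "2015-03-29", "2015-04-30"),
    (9, 1, "Skyline", "2015-04-29", "2015-05-31"),
    (10, 2, "Skyline", "2015-04-29", "2015-05-31"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cars (tableid INTEGER PRIMARY KEY, userid INTEGER, car TEXT, `From` TEXT, `To` TEXT)"
)
conn.executemany("INSERT INTO cars VALUES (?, ?, ?, ?, ?)", RENTALS)

# A row is reported when another row of the same user is NOT entirely before or after it.
overlapping_ids = [row[0] for row in conn.execute("""
    SELECT a.tableid
    FROM cars AS a
    WHERE EXISTS (
        SELECT 1
        FROM cars AS o
        WHERE a.userid = o.userid
          AND a.tableid <> o.tableid
          AND NOT (a.`From` >= o.`To` OR a.`To` <= o.`From`)
    )
    ORDER BY a.userid, a.`From`
""")]

print(overlapping_ids)
```

The ISO date strings compare correctly as text, so the sketch does not need a date type; in MySQL the columns would normally be DATE.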
One car with More than One User at the same time...
The query:
SELECT `allCars`.`car` AS `allCars_car`,
`allCars`.`userid` AS `allCars_userid`,
`allCars`.`From` AS `allCars_From`,
`allCars`.`To` AS `allCars_To`,
`allCars`.`tableid` AS `allCars_id`
FROM
`cars` AS `allCars`
WHERE
EXISTS
(SELECT 1
FROM `cars` AS `overlapUser`
WHERE
`allCars`.`car` = `overlapUser`.`car`
AND `allCars`.`tableid` <> `overlapUser`.`tableid`
AND NOT ( `allCars`.`From` >= `overlapUser`.`To` /* starts after outer ends */
OR `allCars`.`To` <= `overlapUser`.`From`)) /* ends before outer starts */
ORDER BY
`allCars`.`car`,
`allCars`.`userid`,
`allCars`.`From`;
The results:
allCars_car allCars_userid allCars_From allCars_To allCars_id
----------- -------------- ------------ ---------- ------------
Skyline 1 2015-04-29 2015-05-31 9
Skyline 2 2015-04-29 2015-05-31 10
Edit:
In view of the comments by @philipxy about time ranges needing 'greater than or equal to' checks, I have updated the code here. I haven't changed the SQLFiddles.
For each input and output table find its meaning. Ie a statement template parameterized by column names, aka predicate, that a row makes into a true or false statement, aka proposition. A table holds the rows that make its predicate into a true proposition. Ie rows that make a true proposition go in a table and rows that make a false proposition stay out. Eg for your input table:
rental [tableid] was user [userid] renting car [car] from [from] to [to]
Then phrase the output table predicate in terms of the input table predicate. Don't use descriptions like your 1 & 2:
If any user has date overlaps in his car assignments of more than one day (end of the assignment can be on the same day as the new assignment start).
Instead find the predicate that an arbitrary row states when in the table:
rental [tableid] was user [userid] renting car [car] from [from] to [to]
in self-conflict with some other rental
For the DBMS to calculate the rows making this true we must express this in terms of our given predicate(s) plus literals & conditions:
-- query result holds the rows where
FOR SOME t2.tableid, t2.userid, ...:
rental [t1.tableid] was user [t1.userid] renting car [t1.car] from [t1.from] to [t1.to]
AND rental [t2.tableid] was user [t2.userid] renting car [t2.car] from [t2.from] to [t2.to]
AND [t1.userid] = [t2.userid] -- userids id the same users
AND [t1.to] > [t2.from] AND ... -- tos/froms id intervals with overlap more than one day
...
(Inside an SQL SELECT statement the cross product of JOINed tables has column names of the form alias.column. Think of . as another character allowed in column names. Finally the SELECT clause drops the alias.s.)
We convert a query predicate to an SQL query that calculates the rows that make it true:
A table's predicate gets replaced by the table alias.
To use the same predicate/table multiple times make aliases.
Changing column old to new in a predicate adds AND old = new.
AND of predicates gets replaced by JOIN.
OR of predicates gets replaced by UNION.
AND NOT of predicates gets replaced by EXCEPT, MINUS or appropriate LEFT JOIN.
AND condition gets replaced by WHERE or ON condition.
For a predicate true FOR SOME columns to drop or when THERE EXISTS columns to drop, SELECT DISTINCT columns to keep.
Etc. (See this.)
Hence (completing the ellipses):
SELECT DISTINCT t1.*
FROM t t1 JOIN t t2
ON t1.userid = t2.userid -- userids id the same users
WHERE t1.`to` > t2.`from` AND t2.`to` > t1.`from` -- tos/froms id intervals with overlap more than one day
AND t1.tableid <> t2.tableid -- tableids id different rentals
Did any two users try to get the same car assigned on the same date, or do the date ranges overlap for them on the same car.
Finding the predicate that an arbitrary row states when in the table:
rental [tableid] was user [userid] renting car [car] from [from] to [to]
in conflict with some other user's rental
In terms of our given predicate(s) plus literals & conditions:
-- query result holds the rows where
FOR SOME t2.*
rental [t1.tableid] was user [t1.userid] renting car [t1.car] from [t1.from] to [t1.to]
AND rental [t2.tableid] was user [t2.userid] renting car [t2.car] from [t2.from] to [t2.to]
AND [t1.userid] <> [t2.userid] -- userids id different users
AND [t1.car] = [t2.car] -- .cars id the same car
AND [t1.to] >= [t2.from] AND [t2.to] >= [t1.from] -- tos/froms id intervals with any overlap
AND [t1.tableid] <> [t2.tableid] -- tableids id different rentals
The UNION of queries for predicates 1 & 2 returns the rows for which predicate 1 OR predicate 2.
Try to learn to express predicates--what rows state when in tables--if only as the goal for intuitive (sub)querying.
PS It is good to always have data checking edge & non-edge cases for a condition being true & being false. Eg try query 1 with GTR starting on the 31st, an overlap of only one day, which should not be a self-conflict.
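The edge case from the PS can be written as a tiny test. The sketch below (Python's sqlite3 standing in for MySQL; `self_conflicts` is a hypothetical helper) runs the self-conflict join with strict inequalities, joining on `t1.userid = t2.userid`, against two variants of the Navara/GTR pair.

```python
import sqlite3


def self_conflicts(rentals):
    """Return the tableids of rentals that overlap another rental of the same user."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (tableid INTEGER, userid INTEGER, car TEXT, `from` TEXT, `to` TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?)", rentals)
    rows = conn.execute("""
        SELECT DISTINCT t1.tableid
        FROM t t1 JOIN t t2 ON t1.userid = t2.userid
        WHERE t1.`to` > t2.`from` AND t2.`to` > t1.`from`
          AND t1.tableid <> t2.tableid
        ORDER BY t1.tableid
    """).fetchall()
    conn.close()
    return [row[0] for row in rows]


# GTR starts on the day Navara ends: allowed, so no conflict expected.
touching = [
    (3, 1, "Navara", "2015-03-01", "2015-03-31"),
    (4, 1, "GTR", "2015-03-31", "2015-04-30"),
]
# GTR starts three days before Navara ends: a real self-conflict.
overlapping = [
    (3, 1, "Navara", "2015-03-01", "2015-03-31"),
    (4, 1, "GTR", "2015-03-28", "2015-04-30"),
]

touching_result = self_conflicts(touching)
overlapping_result = self_conflicts(overlapping)
print(touching_result, overlapping_result)
```

Switching the comparisons to `>=` would flag the touching pair as well, which is the behaviour wanted for the second requirement (two users on the same car) but not for the first.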
PPS Querying involving duplicate rows, as with NULLs, has quite complex query meanings. It's hard to say when a tuple goes in or stays out of a table and how many times. For queries to have the simple intuitive meanings per my correspondences they can't have duplicates. Here SQL unfortunately differs from the relational model. In practice people rely on idioms when allowing non-distinct rows & they rely on rows being distinct because of constraints. Eg joining on UNIQUE columns per UNIQUEs, PKs & FKs. Eg: A final DISTINCT step is only doing work at a different time than a version that doesn't need it; time might or might not be an important implementation issue affecting the phrasing chosen for a given predicate/result.

Fewest grouped by distinct - SQL

Ok, I think the answer of this is somewhere but I can't find it...
(and even my title is bad)
To be short, I want to get the fewest number of groups I can make from part of an association table.
1st, keep in mind this is already the result of a 5-table (+1k lines) join with filtering and grouping, which I'll have to run many times on a prod server as powerful as a banana...
2nd, this is a made-up case that illustrates my problem.
After some Querying, I've got this data result :
+--------------------+
|id_course|id_teacher|
+--------------------+
| 6 | 1 |
| 6 | 4 |
| 6 | 14 |
| 33 | 1 |
| 33 | 4 |
| 34 | 1 |
| 34 | 4 |
| 34 | 10 |
+--------------------+
As you can see, I've got 3 courses, which are taught by up to 3 teachers. I need to attend each course once, but I want as few different teachers as possible (I'm shy...).
My first query
Should answer: what is the smallest number of teachers I need to cover every unique course?
With this data, it's 1, because Teacher 1 or Teacher 4 alone teaches all 3 courses.
Second query
Now that I've already get these courses, I want to go to two other courses, the 32 and the 50, with this schedule :
+--------------------+
|id_course|id_teacher|
+--------------------+
| 32 | 1 |
| 32 | 12 |
| 50 | 12 |
+--------------------+
My question is: for id_course N, will I have to get one more teacher?
I want to check course by course, so "check for course 32"; there is no need to check many at the same time.
The best way, I think, is to count an inner join with the list of teachers tied for the fewest rank from the first query, so with our data we have only two: Teacher(1, 4).
For Course 32, Teacher 4 doesn't teach it, but as Teacher 1 teaches Courses(6, 33, 34, 32) I don't have to get another teacher.
For Course 50, the only teacher who teaches it is Teacher 12, so I won't find a match among my chosen teachers, and I'll have to get one more (so two in total with this data).
Here is a base SQLFiddle
Best regards, Blag
You want to get a distinct count of ID_Teachers with the least count then... get a distinct count and limit the results to 1 record.
So perhaps something like...
SELECT count(Distinct ID_Teacher), Group_concat(ID_Teacher) as TeachersIDs
FROM Table
WHERE ID_Course in ('Your List')
ORDER BY count(Distinct ID_Teacher) ASC Limit 1
However this will randomly select if a tie exists... so do you want to provide the option to select which group of teachers and classes should ties exist? Meaning there are multiple paths to fulfill all classes involving the same number of teachers... For example teachers A, B and A, C fulfill all required classes.... should both records return in the result or is 1 sufficient?
So I've finally found a way to do what I want!
For the first query, as my underlying real need was "is there a single teacher who can do everything", I lowered my expectations a bit and went for this one (58 lines in my real case u_u"):
SELECT
(
SELECT count(s.id_teacher) nb
FROM t AS m
INNER JOIN t AS s
ON m.id_teacher = s.id_teacher
GROUP BY m.id_course, m.id_teacher
ORDER BY nb DESC
LIMIT 1
) AS nbMaxBySingleTeacher,
(
SELECT COUNT(DISTINCT id_course) nb
FROM t
) AS nbTotalCourseToDo
SQLFiddle
And I get back two values that answer my question "is one teacher enough?"
+--------------------------------------+
|nbMaxBySingleTeacher|nbTotalCourseToDo|
+--------------------------------------+
| 4 | 5 |
+--------------------------------------+
The 2nd query uses the schedule of the new courses and takes the id of the one I want to check. It should tell me whether I need to get one more teacher, or whether my current one(s) will do.
SELECT COUNT(*) nb
FROM (
SELECT
z.id_teacher
FROM z
WHERE
z.id_course = 50
) t1
WHERE
FIND_IN_SET(t1.id_teacher, (
SELECT GROUP_CONCAT(t2.id_teacher) lst
FROM (
SELECT DISTINCT COUNT(s.id_teacher) nb, m.id_teacher
FROM t AS m
INNER JOIN t AS s
ON m.id_teacher = s.id_teacher
GROUP BY m.id_course, m.id_teacher
ORDER BY nb DESC
) t2
GROUP BY t2.nb
ORDER BY nb DESC
LIMIT 1
));
SQLFiddle
This tells me the number of teachers that are able to teach the courses I already have AND the new one I want. So if it's over zero, then I don't need a new teacher:
+--+
|nb|
+--+
|1 |
+--+
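For comparison, the original "fewest teachers" question can be answered exactly outside SQL for small inputs. The sketch below is not the approach used in the answers above; it is a brute-force search in Python over teacher combinations (the general problem is a set cover, so this only scales to a handful of teachers). The function name is made up; the pairs are the ones from the question.

```python
from itertools import combinations


def smallest_teacher_sets(pairs):
    """Return every smallest set of teachers that together cover all courses."""
    courses = {course for course, _ in pairs}
    teaches = {}
    for course, teacher in pairs:
        teaches.setdefault(teacher, set()).add(course)
    teachers = sorted(teaches)
    # Try 1 teacher, then 2, ... and stop at the first size that works.
    for size in range(1, len(teachers) + 1):
        found = [
            set(combo)
            for combo in combinations(teachers, size)
            if set().union(*(teaches[t] for t in combo)) == courses
        ]
        if found:
            return found
    return []


first_schedule = [(6, 1), (6, 4), (6, 14), (33, 1), (33, 4), (34, 1), (34, 4), (34, 10)]
extra_courses = [(32, 1), (32, 12), (50, 12)]

first_answer = smallest_teacher_sets(first_schedule)
second_answer = smallest_teacher_sets(first_schedule + extra_courses)
print(first_answer)
print(second_answer)
```

For the first schedule one teacher is enough (Teacher 1 or Teacher 4); once courses 32 and 50 are added, two teachers are needed, and Teacher 12 must be one of them.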

SQL calculating difference between columns

I'm a bit of a newbie at SQL and I don't really understand what to do here, so any help is really appreciated. I have a table full of readings from different readers, there's like 500.000 of them, so I can't do this by hand.
I received the table without the difference in it. I managed to calculate it, but there's a bit of a problem there...
It looks a bit like this:
reader_id | date | reading | difference
1 | 01-01-2013 | 205 | 0
1 | 02-01-2013 | 210 | 5
1 | 03-01-2013 | 213 | 3
... | ... | ... | ...
1 | 31-12-2013 | 2451 | 4
2 | 01-01-2013 | 8543 | 6092
2 | 02-01-2013 | 8548 | 5
reader_id and date form the primary key. The combination is unique.
How can I make sure the difference doesn't get calculated when the previous row contained a different reader_id?
When querying my data with a query like this one, the data get skewed by the incorrect difference between the two reader_ids:
SELECT AVG(difference), reader_id FROM table GROUP BY reader_id
For
I just want to get the average difference for each reader.
your query is perfectly good. I think you got something wrong in your difference calculation. The first value for reader_id=2, 6092, is the difference between the last reading from reader 1 and the first reading from reader 2; I don't think that makes sense. If I'm not mistaken, the difference value is the current day's reading minus the previous day's reading. Therefore you should set the difference value of the first reading of each reader to 0.
You can do this with the following query:
UPDATE table t INNER JOIN (SELECT reader_id, min(date) as first_day FROM table GROUP BY reader_id) as tmp ON tmp.reader_id=t.reader_id AND tmp.first_day=t.date SET t.difference=0
Then
SELECT AVG(difference), reader_id FROM table GROUP BY reader_id
will do what you expect.
If you simply want the average difference, you can use the following query:
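SELECT
reader_id,
(MAX(reading) - MIN(reading)) / COUNT(*) AS average_difference
FROM table
GROUP BY reader_id
ORDER BY reader_id;
It works on the logic that the total difference for a given reader_id should be equal to MAX(reading) - MIN(reading), provided the readings never decrease. The parentheses matter: without them only MIN(reading) is divided by COUNT(*).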
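To see that the two approaches agree, the sketch below computes the per-row difference within each reader (0 for a reader's first row) and the aggregate `(MAX - MIN) / COUNT(*)` on a few sample readings. It uses Python's sqlite3 in place of MySQL; the table name `readings`, the ISO date format and the `* 1.0` (to avoid SQLite's integer division) are assumptions of the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE readings (reader_id INTEGER, date TEXT, reading INTEGER,
                           PRIMARY KEY (reader_id, date));
    INSERT INTO readings VALUES
        (1, '2013-01-01', 205), (1, '2013-01-02', 210), (1, '2013-01-03', 213),
        (2, '2013-01-01', 8543), (2, '2013-01-02', 8548);
""")

# Difference to the previous reading of the SAME reader; a reader's first row yields 0.
per_row = conn.execute("""
    SELECT r.reader_id, r.date,
           r.reading - COALESCE(
               (SELECT p.reading
                FROM readings p
                WHERE p.reader_id = r.reader_id AND p.date < r.date
                ORDER BY p.date DESC
                LIMIT 1),
               r.reading) AS difference
    FROM readings r
    ORDER BY r.reader_id, r.date
""").fetchall()

# Aggregate shortcut: total increase divided by the number of rows.
averages = dict(conn.execute("""
    SELECT reader_id, (MAX(reading) - MIN(reading)) * 1.0 / COUNT(*)
    FROM readings
    GROUP BY reader_id
""").fetchall())

print(per_row)
print(averages)
```

Note that dividing by `COUNT(*)` matches `AVG(difference)` only because each reader's first difference is counted as 0; the average over actual day-to-day changes would divide by `COUNT(*) - 1` instead.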

MySQL: Return 0 if row doesn't exist

I've been bashing my head on this for a while, so now I'm here :) I'm a SQL beginner, so maybe this will be easy for you guys...
I have this query:
SELECT COUNT(*) AS counter, recur,subscribe_date
FROM paypal_subscriptions
WHERE recur='monthly' and subscribe_date > "2010-07-16" and subscribe_date < "2010-07-23"
GROUP BY subscribe_date
ORDER BY subscribe_date
Now the dates I've shown above are hard coded, my application will supply a variable date range.
Right now I'm getting a result table where there is a value for that date.
counter |recur | subscribe_date
2 | Monthly | 2010-07-18
3 | Monthly | 2010-07-19
4 | Monthly | 2010-07-20
6 | Monthly | 2010-07-22
I'd like to return 0 in the counter column if the date doesn't exist.
counter |recur | subscribe_date
0 | Monthly | 2010-07-16
0 | Monthly | 2010-07-17
2 | Monthly | 2010-07-18
3 | Monthly | 2010-07-19
4 | Monthly | 2010-07-20
0 | Monthly | 2010-07-21
6 | Monthly | 2010-07-22
0 | Monthly | 2010-07-23
Is this possible?
You will need a table of dates (new table added), and then you will have to do an outer join between that table and your query.
This question is also similar to another question. Answers can be quite similar.
Insert Dates in the return from a query where there is none
You will need a table of dates to group against. This is quite easy in MSSQL using CTEs like this - I'm not sure if MySQL has something similar?
Otherwise you will need to create a hard table as a one off exercise
EDIT : Give this a try:
SELECT COUNT(pp.subscribe_date) AS counter, dl.date, MIN(pp.recur)
FROM date_lookup dl
LEFT OUTER JOIN paypal_subscriptions pp
on (pp.subscribe_date = dl.date AND pp.recur ='monthly')
WHERE dl.date >= '2010-07-16' and dl.date <= '2010-07-23'
GROUP BY dl.date
ORDER BY dl.date
The subject of the query needs to be changed to the date_lookup table
(the order of the Left Outer Join becomes important)
Count(*) isn't going to work since the 'date' record always exists - need to count something in the PayPal table
pp.recur ='monthly' is now a join condition, not a filter because of the LOJ
Finally, showing pp.recur in the select list isn't going to work.
I've used an aggregate, but MIN(pp.recur) will return null if there are no PayPal records
What you could do when you parameterize your query is to just repeat the Recur Type Filter?
Again, plz excuse the MSSQL syntax
SELECT COUNT(pp.subscribe_date) AS counter, dl.date, @ppRecur
FROM date_lookup dl
LEFT OUTER JOIN paypal_subscriptions pp
on (pp.subscribe_date = dl.date AND pp.recur = @ppRecur)
WHERE dl.date >= @DateFrom and dl.date <= @DateTo
GROUP BY dl.date
ORDER BY dl.date
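A runnable sketch of the date-lookup approach, using Python's sqlite3 module as a stand-in for MySQL. The `date_lookup` table is filled from Python for the requested range, and the subscription rows are generated to reproduce the counts from the question; both are assumptions of the example rather than part of the original schema.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE date_lookup (date TEXT PRIMARY KEY);
    CREATE TABLE paypal_subscriptions (recur TEXT, subscribe_date TEXT);
""")

# One row per calendar day in the reporting range.
start, end = date(2010, 7, 16), date(2010, 7, 23)
days = [(start + timedelta(days=i)).isoformat() for i in range((end - start).days + 1)]
conn.executemany("INSERT INTO date_lookup VALUES (?)", [(d,) for d in days])

# Sample subscriptions matching the counts shown in the question.
signups = {"2010-07-18": 2, "2010-07-19": 3, "2010-07-20": 4, "2010-07-22": 6}
for day, n in signups.items():
    conn.executemany("INSERT INTO paypal_subscriptions VALUES ('monthly', ?)", [(day,)] * n)

rows = conn.execute("""
    SELECT COUNT(pp.subscribe_date) AS counter, dl.date
    FROM date_lookup dl
    LEFT OUTER JOIN paypal_subscriptions pp
        ON pp.subscribe_date = dl.date AND pp.recur = ?
    WHERE dl.date >= ? AND dl.date <= ?
    GROUP BY dl.date
    ORDER BY dl.date
""", ("monthly", start.isoformat(), end.isoformat())).fetchall()

for counter, day in rows:
    print(counter, day)
```

Counting `pp.subscribe_date` rather than `*` is what produces the zeros: the column is NULL on days without a matching subscription, and COUNT ignores NULLs.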
Since there was no easy way to do this, I had to have the application fill in the blanks for me rather than have the database return the data I wanted. I do get a performance hit for this, but it was necessary for the completion of the report.
I will definitely look into making this return what I want from the DB in the near future. I'll give nonnb's solution a try.
thanks everyone!