I have two MySQL tables
Payments:
| employeeID | period_begin | period_end |
and
Services:
| serviceID | date | employeeID |
I need to find all serviceIDs performed by a given employee whose date is not between any of the period ranges described in Payments. So, for example if I have the following records on Payments and Services for employee 10000:
Payments:
| employeeID | period_begin | period_end |
...
| 10000 | 2013-05-01 | 2013-05-16 |
| 10000 | 2013-05-17 | 2013-06-02 |
| 10000 | 2013-07-01 | 2013-07-16 |
| 10000 | 2013-07-17 | 2013-08-02 |
...
Services:
| serviceID | date | employeeID |
...
| 2001 | 2013-01-01 | 10000 |
| 2002 | 2013-05-15 | 10000 |
| 2003 | 2013-06-01 | 10000 |
| 2004 | 2013-07-10 | 10000 |
| 2005 | 2013-08-01 | 10000 |
...
The output should be
2001,
2003,
2005
because the dates for services 2002, 2004 are in one of the intervals in the Payments table.
Any ideas? I'm having trouble checking that a service's date is not covered by any of the intervals recorded in the Payments table. I'm currently joining Services and Payments on employeeID and stating the date condition there, but I'm not getting the right answer; I should probably be joining on a different condition:
select distinct serviceID from Services as X left join Payments as Y on
(X.employeeID=Y.employeeID AND (X.date < Y.period_begin OR X.date > Y.period_end))
where X.employeeID='10000';
is not working.
... AND X.date < Y.period_begin and X.date > Y.period_end
is obviously impossible (the date cannot be both before the start date and after the end date).
You probably want to write:
... AND (X.date < Y.period_begin or X.date > Y.period_end)
Please wrap the "OR" expression in parentheses. This matters for operator precedence (AND binds tighter than OR), and it improves readability.
EDIT: As suggested by @BigToach in a comment, you could use NOT BETWEEN ... AND ... (if the AND word is not too confusing in that context):
... AND (X.date NOT BETWEEN Y.period_begin AND Y.period_end)
Concerning the main problem, which is "select the services not in any payment range", you could use a LEFT JOIN to keep only the rows that do not have a corresponding payment range:
-- the_date is the service date column (called `date` in the question)
SELECT Services.* FROM Services
LEFT JOIN
(SELECT serviceID,period_begin FROM Services
JOIN Payments
USING (employeeID)
WHERE employeeID = @EID
AND the_date BETWEEN Payments.period_begin AND Payments.period_end
) AS X
USING (serviceID)
WHERE employeeID = @EID
AND period_begin is NULL;
Or, use a subquery -- somewhat more readable, but usually less efficient:
SELECT Services.* FROM Services
WHERE employeeID = @EID
AND serviceID NOT IN (SELECT serviceID FROM Services JOIN Payments
USING (employeeID)
WHERE employeeID = @EID
AND the_date BETWEEN Payments.period_begin AND Payments.period_end);
See http://sqlfiddle.com/#!2/a3557/11
As indicated by BigToach and Sylvain Leroux, there was clearly a logic typo confusing an AND with an OR. However, once fixed, that query still doesn't give the right answer. I managed to get the right solution by first finding all serviceIDs contained in some interval, and then excluding those from the list of all serviceIDs:
select distinct serviceID from Services as X left join Payments as Y on
(X.employeeID=Y.employeeID) where X.employeeID='10000'
and X.serviceID not in (
select serviceID from Services as Z join Payments as W on
(Z.employeeID=W.employeeID and
(Z.date between W.period_begin and W.period_end))
where Z.employeeID='10000'
);
This query, however, is not very pretty as I'm really doing two queries, but it's basically the same thing as Sylvain Leroux's first answer, perhaps a bit more human-readable. Maybe there is yet another way we all haven't seen yet. Sylvain's subquery is indeed very nice.
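To see why the fixed OR join still returns too much, and what one more variant (NOT EXISTS) looks like, here is a minimal sketch. It assumes SQLite via Python's sqlite3 as a stand-in for MySQL, and it uses only two of the question's periods (05-01 to 05-16 and 07-01 to 07-16), chosen so that the expected output 2001, 2003, 2005 holds. The variable names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Payments (employeeID INTEGER, period_begin TEXT, period_end TEXT);
CREATE TABLE Services (serviceID INTEGER, date TEXT, employeeID INTEGER);
-- Assumption: only two periods, a subset of the question's rows
INSERT INTO Payments VALUES
  (10000, '2013-05-01', '2013-05-16'),
  (10000, '2013-07-01', '2013-07-16');
INSERT INTO Services VALUES
  (2001, '2013-01-01', 10000),
  (2002, '2013-05-15', 10000),
  (2003, '2013-06-01', 10000),
  (2004, '2013-07-10', 10000),
  (2005, '2013-08-01', 10000);
""")

# Joining on "outside this period": every service lies outside at least
# one of the periods, so every service finds a matching row.
outside_some_period = [row[0] for row in conn.execute("""
    SELECT DISTINCT s.serviceID
    FROM Services s
    LEFT JOIN Payments p
      ON s.employeeID = p.employeeID
     AND (s.date < p.period_begin OR s.date > p.period_end)
    WHERE s.employeeID = ?
    ORDER BY s.serviceID
""", (10000,))]

# NOT EXISTS: keep a service only if no period contains its date.
outside_every_period = [row[0] for row in conn.execute("""
    SELECT s.serviceID
    FROM Services s
    WHERE s.employeeID = ?
      AND NOT EXISTS (
          SELECT 1
          FROM Payments p
          WHERE p.employeeID = s.employeeID
            AND s.date BETWEEN p.period_begin AND p.period_end)
    ORDER BY s.serviceID
""", (10000,))]

print(outside_some_period)   # all five services
print(outside_every_period)  # only the services covered by no period
```

The first query answers "is there a period this date is outside of?", while the question asks "is there no period this date is inside of?", which is what NOT EXISTS (or the LEFT JOIN ... IS NULL and NOT IN forms above) expresses.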
I'm very average with MySQL, but usually I can write all the needed queries after reading documentation and searching for examples. Now, I'm in the situation where I spent 3 days re-searching and re-writing queries, but I can't get it to work the exact way I need. Here's the deal:
1st table (mpt_companies) contains companies:
| company_id | company_title |
------------------------------
| 1 | Company A |
| 2 | Company B |
2nd table (mpt_payment_methods) contains payment methods:
| payment_method_id | payment_method_title |
--------------------------------------------
| 1 | Cash |
| 2 | PayPal |
| 3 | Wire |
3rd table (mpt_payments) contains payments for each company:
| payment_id | company_id | payment_method_id | payment_amount |
----------------------------------------------------------------
| 1 | 1 | 1 | 10.00 |
| 2 | 2 | 3 | 15.00 |
| 3 | 1 | 1 | 20.00 |
| 4 | 1 | 2 | 10.00 |
I need to list each company along with many stats. One of stats is the sum of payments in each payment method. In other words, the result should be:
| company_id | company_title | payment_data |
--------------------------------------------------------
| 1 | Company A | Cash:30.00,PayPal:10.00 |
| 2 | Company B | Wire:15.00 |
Obviously, I need to:
Select all the companies;
Join payments for each company;
Join payment methods for each payment;
Calculate sum of payments in each method;
GROUP_CONCAT payment methods and sums;
Unfortunately, SUM() doesn't work inside GROUP_CONCAT. Some solutions I found on this site suggest using CONCAT, but that doesn't produce the list I need. Other solutions suggest using CAST(), but maybe I'm doing something wrong, because it doesn't work either. This is the closest query I wrote, which returns each company and a unique list of payment methods used by each company, but doesn't return the sum of payments:
SELECT *,
(some other sub-queries I need...),
(SELECT GROUP_CONCAT(DISTINCT(mpt_payment_methods.payment_method_title))
FROM mpt_payments
JOIN mpt_payment_methods
ON mpt_payments.payment_method_id=mpt_payment_methods.payment_method_id
WHERE mpt_payments.company_id=mpt_companies.company_id
ORDER BY mpt_payment_methods.payment_method_title) AS payment_data
FROM mpt_companies
Then I tried:
SELECT *,
(some other sub-queries I need...),
(SELECT GROUP_CONCAT(DISTINCT(mpt_payment_methods.payment_method_title), ':', CAST(SUM(mpt_payments.payment_amount) AS CHAR))
FROM mpt_payments
JOIN mpt_payment_methods
ON mpt_payments.payment_method_id=mpt_payment_methods.payment_method_id
WHERE mpt_payments.company_id=mpt_companies.company_id
ORDER BY mpt_payment_methods.payment_method_title) AS payment_data
FROM mpt_companies
...and many other variations, but all of them either returned query errors or didn't return/format the data I need.
The closest answer I could find was MySQL one to many relationship: GROUP_CONCAT or JOIN or both? but after spending 2 hours re-writing the provided query to work with my data, I couldn't do it.
Could anyone give me a suggestion, please?
You can do that by aggregating twice. First for the sum of payments per method and company and then to concatenate the sums for each company.
SELECT x.company_id,
x.company_title,
group_concat(payment_amount_and_method) payment_data
FROM (SELECT c.company_id,
c.company_title,
concat(pm.payment_method_title, ':', sum(p.payment_amount)) payment_amount_and_method
FROM mpt_companies c
INNER JOIN mpt_payments p
ON p.company_id = c.company_id
INNER JOIN mpt_payment_methods pm
ON pm.payment_method_id = p.payment_method_id
GROUP BY c.company_id,
c.company_title,
pm.payment_method_id,
pm.payment_method_title) x
GROUP BY x.company_id,
x.company_title;
Here you go
SELECT company_id,
company_title,
GROUP_CONCAT(
CONCAT(payment_method_title, ':', payment_amount)
) AS payment_data
FROM (
SELECT c.company_id, c.company_title, pm.payment_method_id, pm.payment_method_title, SUM(p.payment_amount) AS payment_amount
FROM mpt_payments p
JOIN mpt_companies c ON p.company_id = c.company_id
JOIN mpt_payment_methods pm ON pm.payment_method_id = p.payment_method_id
GROUP BY p.company_id, p.payment_method_id
) distinct_company_payments
GROUP BY distinct_company_payments.company_id
;
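For anyone who wants to try the "aggregate twice" idea without a MySQL server, here is a minimal sketch using the question's sample rows. It assumes SQLite via Python's sqlite3, so it uses `||` and `printf` in place of MySQL's CONCAT; the alias `method_and_sum` and the `payment_data` dictionary are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mpt_companies (company_id INTEGER, company_title TEXT);
CREATE TABLE mpt_payment_methods (payment_method_id INTEGER, payment_method_title TEXT);
CREATE TABLE mpt_payments (payment_id INTEGER, company_id INTEGER,
                           payment_method_id INTEGER, payment_amount REAL);
INSERT INTO mpt_companies VALUES (1, 'Company A'), (2, 'Company B');
INSERT INTO mpt_payment_methods VALUES (1, 'Cash'), (2, 'PayPal'), (3, 'Wire');
INSERT INTO mpt_payments VALUES
  (1, 1, 1, 10.00), (2, 2, 3, 15.00), (3, 1, 1, 20.00), (4, 1, 2, 10.00);
""")

rows = conn.execute("""
    SELECT x.company_id, x.company_title,
           group_concat(x.method_and_sum) AS payment_data
    FROM (
        -- First aggregation: one row per company and payment method
        SELECT c.company_id, c.company_title,
               pm.payment_method_title || ':' ||
                   printf('%.2f', SUM(p.payment_amount)) AS method_and_sum
        FROM mpt_companies c
        JOIN mpt_payments p ON p.company_id = c.company_id
        JOIN mpt_payment_methods pm
          ON pm.payment_method_id = p.payment_method_id
        GROUP BY c.company_id, c.company_title,
                 pm.payment_method_id, pm.payment_method_title
    ) x
    -- Second aggregation: one row per company
    GROUP BY x.company_id, x.company_title
    ORDER BY x.company_id
""").fetchall()

# The order of items inside group_concat is not guaranteed here,
# so the concatenated string is split into a set for comparison.
payment_data = {cid: (title, set(data.split(","))) for cid, title, data in rows}
print(payment_data)
```

In MySQL itself, GROUP_CONCAT accepts an ORDER BY inside the call if the order of the methods within each company matters.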
I have two tables (Invoices and taxes) in mysql:
Invoices:
- id
- account_id
- issued_at
- total
- gross_amount
- country
Taxes:
- id
- invoice_id
- tax_name
- tax_rate
- taxable_amount
- tax_amount
I'm trying to retrieve a report like this
rep_month | country | total_amount | tax_name | tax_rate(%) | taxable_amount | tax_amount
--------------------------------------------------------------------------------------
2017-01-01 | ES | 1000 | TAX1 | 21 | 700 | 147
2017-01-01 | ES | 1000 | TAX2 | -15 | 700 | 105
2016-12-01 | FR | 100 | TAX4 | 20 | 30 | 6
2016-12-01 | FR | 100 | B2B | 0 | 70 | 0
2017-01-01 | GB | 2500 | TAX3 | 20 | 1000 | 200
The idea behind this is that an invoice has a has_many relation with taxes, so an invoice may or may not have taxes. The report should show the total amount collected (total_amount) for a given country (regardless of whether it includes taxes) and indicate which part of that total amount is taxable (taxable_amount) for a specific tax.
My current approach is this one:
SELECT
DATE_FORMAT(invoices.issued_at, '%Y-%m-01') AS rep_month,
invoices.country AS country,
( SELECT sum(docs.gross_amount)
FROM invoices AS docs
WHERE docs.country = invoices.country
AND DATE_FORMAT(docs.issued_at, '%Y-%m-01') = rep_month
) AS total_amount,
taxes.tax_name AS tax_name,
taxes.tax_rate AS tax_rate,
SUM(taxes.taxable_amount) AS taxable_amount,
SUM(taxes.tax_amount) AS tax_amount
FROM invoices
JOIN taxes ON invoices.id = taxes.invoice_id
AND invoices.issued_at BETWEEN '2016-01-01' AND '2017-12-31'
GROUP BY account_id, rep_month, country, tax_name, tax_rate
ORDER BY country desc
Well, this works but for a real dataset (thousands of records) it's really slow as the select subquery for retrieving the total_amount is being run for each row of the report.
I cannot make a LEFT JOIN taxes with a direct SUM(gross_amount) as the GROUP BY groups by tax name and rate and I need to show the total collected per country regardless if the amount was taxed or not. Is there a faster alternative to this?
I don't know the exact use case for this query, but the issue is that you're trying to compute the entire report in one go.
Ideally, you should run the query you have once, store its result in a separate summary table, and then query the summary table directly whenever you need the report. When new entries arrive in the Invoices table, you can either re-run the query for each new entry or update the summary table periodically via a cron job.
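Independently of a summary table, one option worth trying is to compute the per-month, per-country totals once in a derived table and join them to the per-tax aggregation, so the total is not recomputed for every report row. The sketch below is only an illustration under assumptions: SQLite via Python's sqlite3 stands in for MySQL (so `strftime` replaces DATE_FORMAT), the rows are invented, and the grouping by account_id from the question is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoices (id INTEGER, account_id INTEGER, issued_at TEXT,
                       gross_amount INTEGER, country TEXT);
CREATE TABLE taxes (id INTEGER, invoice_id INTEGER, tax_name TEXT,
                    tax_rate INTEGER, taxable_amount INTEGER, tax_amount INTEGER);
-- Hypothetical rows; invoice 2 carries no tax at all
INSERT INTO invoices VALUES
  (1, 1, '2017-01-10', 700, 'ES'),
  (2, 1, '2017-01-20', 300, 'ES'),
  (3, 2, '2016-12-05', 100, 'FR');
INSERT INTO taxes VALUES
  (1, 1, 'TAX1', 21, 700, 147),
  (2, 3, 'TAX4', 20, 30, 6),
  (3, 3, 'B2B', 0, 70, 0);
""")

report = conn.execute("""
    SELECT t.rep_month, t.country, tot.total_amount,
           t.tax_name, t.tax_rate, t.taxable_amount, t.tax_amount
    FROM (
        -- Per month, country and tax: taxable and tax sums
        SELECT strftime('%Y-%m-01', i.issued_at) AS rep_month,
               i.country, x.tax_name, x.tax_rate,
               SUM(x.taxable_amount) AS taxable_amount,
               SUM(x.tax_amount) AS tax_amount
        FROM invoices i
        JOIN taxes x ON x.invoice_id = i.id
        GROUP BY rep_month, i.country, x.tax_name, x.tax_rate
    ) t
    JOIN (
        -- Per month and country: total collected, taxed or not
        SELECT strftime('%Y-%m-01', issued_at) AS rep_month,
               country, SUM(gross_amount) AS total_amount
        FROM invoices
        GROUP BY rep_month, country
    ) tot
      ON tot.rep_month = t.rep_month AND tot.country = t.country
    ORDER BY t.country, t.tax_name
""").fetchall()

for row in report:
    print(row)
```

The ES total is 1000 even though only the 700 invoice has a tax row, which is the "total regardless of taxes" requirement from the question. Whether this is actually faster on the real dataset depends on indexes and data volume, so it would need measuring.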
I am trying to do a custom report right now. It involves running this query over 50 times for different date conditions.
Anyway, this report revolves around two tables:
agreement
(a list of customer promises to pay - tied to the customer table by customer.id = agreement.customer_id)
|----|-------------|---------------------|--------|----------|
| id | customer_id | entered_timestamp | amount | campaign |
|----|-------------|---------------------|--------|----------|
| 1 | 123 | 2015-12-22 13:12:00 | 30 | 'xyz' |
|----|-------------|---------------------|--------|----------|
| 2 | 400 | 2015-12-22 13:15:00 | 20 | 'abc' |
|----|-------------|---------------------|--------|----------|
previous_customer_ids
(a list of customer ids that have at least one paid agreement - tied to customer table by customer.id = previous_customer_ids.customer_id)
|----|-------------|
| id | customer_id |
|----|-------------|
| 1 | 123 |
|----|-------------|
I am trying to get a count of all unique customer_ids whose most recent agreement was in January or July for a certain campaign and who also exist in previous_customer_ids.
I was able to figure out how to get a list of each customer's most recent agreement who exists in previous_customer_ids, and get a count of that number of customers.
However, the query takes 35 seconds to run. I have to run it 60 times over each time this report is pulled (using php to display the results).
select count(t1.customer_id)
from agreement t1
inner join (
select customer_id, max(entered_timestamp) as latestOrder
from agreement
where campaign = 'vsf'
group by customer_id
) t2
inner join previous_customer_ids pcids
on t1.customer_id = pcids.customer_id
where t1.customer_id = t2.customer_id
AND t1.entered_timestamp= t2.latestOrder
AND (substr(t1.entered_timestamp,6,2) = '01'
OR substr(t1.entered_timestamp,6,2) = '07')
How to optimize this?
I have a SQL database with a table called staff, having following columns:
workerID (Prim.key), name, department, salary
I am supposed to find the workers with the highest salary per department and used the following statement:
select staff.workerID, staff.name, staff.department, max(staff.salary) AS biggest
from staff
group by staff.department
I get one worker shown from each department, but they are NOT the workers with the highest salary, BUT the biggest salary value is shown, even though the worker does not get that salary.
The person shown is the worker with the "lowest" workerID per department.
So, there is some sorting going on using the primary key, even though it is not mentioned in the group by statement.
Can someone explain what is going on, and maybe how to sort correctly?
Explanation for what is going on:
You are performing a GROUP BY on staff.department, but your SELECT list contains two non-grouping columns, staff.workerID and staff.name. In standard SQL this is a syntax error; MySQL, however, allows it, so query writers have to handle such situations themselves.
Reference: http://dev.mysql.com/doc/refman/5.0/en/group-by-handling.html
In standard SQL, a query that includes a GROUP BY clause cannot refer to nonaggregated columns in the select list that are not named in the GROUP BY clause.
MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause.
The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate.
Starting with MySQL 5.1 the non-standard feature can be disabled by setting the ONLY_FULL_GROUP_BY flag in sql_mode: http://dev.mysql.com/doc/refman/5.6/en/sql-mode.html#sqlmode_only_full_group_by
How to fix:
select staff.workerID, staff.name, staff.department, staff.salary
from staff
join (
select staff.department, max(staff.salary) AS biggest
from staff
group by staff.department
) t
on t.department = staff.department and t.biggest = staff.salary
In the inner query, fetch department and its highest salary using GROUP BY. Then in the outer query join those results with the main table which would give you the desired results.
This is the usual case: GROUP BY with an aggregate function does not guarantee that the other columns come from the row that produced the aggregate value. There are many ways to handle it, and the usual practice is a subquery and a join. But if the table is big, that can hurt performance, so another approach is to use a LEFT JOIN.
So lets say we have the table
+----------+------+-------------+--------+
| workerid | name | department | salary |
+----------+------+-------------+--------+
| 1 | abc | computer | 400 |
| 2 | cdf | electronics | 200 |
| 3 | gfd | computer | 400 |
| 4 | wer | physics | 300 |
| 5 | hgt | computer | 700 |
| 6 | juy | electronics | 100 |
| 7 | wer | physics | 400 |
| 8 | qwe | computer | 200 |
| 9 | iop | electronics | 800 |
| 10 | kli | physics | 800 |
| 11 | qsq | computer | 600 |
| 12 | asd | electronics | 300 |
+----------+------+-------------+--------+
So we can get the data with
select st.* from staff st
left join staff st1 on st1.department = st.department
and st.salary < st1.salary
where
st1.workerid is null
The above will give you
+----------+------+-------------+--------+
| workerid | name | department | salary |
+----------+------+-------------+--------+
| 5 | hgt | computer | 700 |
| 9 | iop | electronics | 800 |
| 10 | kli | physics | 800 |
+----------+------+-------------+--------+
My favorite solution to this problem uses LEFT JOIN:
SELECT m.workerID, m.name, m.department, m.salary
FROM staff m # 'm' from 'maximum'
LEFT JOIN staff o # 'o' from 'other'
ON m.department = o.department # match rows by department
AND m.salary < o.salary # match each row in `m` with the rows from `o` having bigger salary
WHERE o.salary IS NULL # no bigger salary exists in `o`, i.e. `m`.`salary` is the maximum of its dept.
;
This query selects all the workers that have the biggest salary in their department; i.e. if two or more workers have the same salary and it is the biggest in their department, then all these workers are selected.
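As a quick check that the join-on-MAX form and the LEFT JOIN form agree, including on ties, here is a minimal sketch. It assumes SQLite via Python's sqlite3 instead of MySQL, reuses the sample staff table from the answer above, and adds one hypothetical worker (13, "tie") who shares the top physics salary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE staff (
    workerID INTEGER PRIMARY KEY, name TEXT, department TEXT, salary INTEGER)""")
conn.executemany("INSERT INTO staff VALUES (?, ?, ?, ?)", [
    (1, "abc", "computer", 400), (2, "cdf", "electronics", 200),
    (3, "gfd", "computer", 400), (4, "wer", "physics", 300),
    (5, "hgt", "computer", 700), (6, "juy", "electronics", 100),
    (7, "wer", "physics", 400), (8, "qwe", "computer", 200),
    (9, "iop", "electronics", 800), (10, "kli", "physics", 800),
    (11, "qsq", "computer", 600), (12, "asd", "electronics", 300),
    (13, "tie", "physics", 800),  # hypothetical row: ties with worker 10
])

# Variant 1: compute each department's maximum, then join back to staff.
join_on_max = conn.execute("""
    SELECT s.workerID, s.department, s.salary
    FROM staff s
    JOIN (SELECT department, MAX(salary) AS biggest
          FROM staff
          GROUP BY department) t
      ON t.department = s.department AND t.biggest = s.salary
    ORDER BY s.workerID
""").fetchall()

# Variant 2: keep rows for which no higher salary exists in the department.
anti_join = conn.execute("""
    SELECT m.workerID, m.department, m.salary
    FROM staff m
    LEFT JOIN staff o
      ON o.department = m.department AND m.salary < o.salary
    WHERE o.workerID IS NULL
    ORDER BY m.workerID
""").fetchall()

print(join_on_max)
print(anti_join)
```

Both variants return every worker who holds the department maximum, so the tie in physics yields two rows. If exactly one row per department is required, an extra tie-breaker (for example the lowest workerID) has to be added.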
Try this:
SELECT s.workerID, s.name, s.department, s.salary
FROM staff s
INNER JOIN (SELECT s.department, MAX(s.salary) AS biggest
FROM staff s GROUP BY s.department
) AS B ON s.department = B.department AND s.salary = B.biggest;
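Or (this second form relies on MySQL's nonstandard GROUP BY returning the first row of each group, which the server does not guarantee):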
SELECT s.workerID, s.name, s.department, s.salary
FROM (SELECT s.workerID, s.name, s.department, s.salary
FROM staff s
ORDER BY s.department, s.salary DESC
) AS s
GROUP BY s.department;
I have a table from which I am trying to retrieve the latest position for each security:
The Table:
My query to create the table: SELECT id, security, buy_date FROM positions WHERE client_id = 4
+-------+----------+------------+
| id | security | buy_date |
+-------+----------+------------+
| 26 | PCS | 2012-02-08 |
| 27 | PCS | 2013-01-19 |
| 28 | RDN | 2012-04-17 |
| 29 | RDN | 2012-05-19 |
| 30 | RDN | 2012-08-18 |
| 31 | RDN | 2012-09-19 |
| 32 | HK | 2012-09-25 |
| 33 | HK | 2012-11-13 |
| 34 | HK | 2013-01-19 |
| 35 | SGI | 2013-01-17 |
| 36 | SGI | 2013-02-16 |
| 18084 | KERX | 2013-02-20 |
| 18249 | KERX | 0000-00-00 |
+-------+----------+------------+
I have been messing with versions of queries based on this page, but I cannot seem to get the result I'm looking for.
Here is what I've been trying:
SELECT t1.id, t1.security, t1.buy_date
FROM positions t1
WHERE buy_date = (SELECT MAX(t2.buy_date)
FROM positions t2
WHERE t1.security = t2.security)
But this just returns me:
+-------+----------+------------+
| id | security | buy_date |
+-------+----------+------------+
| 27 | PCS | 2013-01-19 |
+-------+----------+------------+
I'm trying to get the maximum/latest buy date for each security, so the results would have one row for each security with the most recent buy date. Any help is greatly appreciated.
EDIT: The position's id must be returned with the max buy date.
You can use this query. I tested it with a larger data set and it returned results in about 75% less time; the subquery versions take longer.
SELECT p1.id,
p1.security,
p1.buy_date
FROM positions p1
left join
positions p2
on p1.security = p2.security
and p1.buy_date < p2.buy_date
where
p2.id is null;
You can use a subquery to get the result:
SELECT p1.id,
p1.security,
p1.buy_date
FROM positions p1
inner join
(
SELECT MAX(buy_date) MaxDate, security
FROM positions
group by security
) p2
on p1.buy_date = p2.MaxDate
and p1.security = p2.security
Or you can use the following with a WHERE clause:
SELECT t1.id, t1.security, t1.buy_date
FROM positions t1
WHERE buy_date = (SELECT MAX(t2.buy_date)
FROM positions t2
WHERE t1.security = t2.security
group by t2.security)
This is done with a simple group by. You want to group by the securities and get the max of buy_date. The SQL:
SELECT security, max(buy_date)
from positions
group by security
Note, this is faster than bluefeet's answer but does not display the ID.
The answer by @bluefeet has two more ways to get the results you want - and the first will probably be more efficient than your query.
What I don't understand is why you say that your query doesn't work. It seems pretty fine and returns the expected result. Tested at SQL-Fiddle
SELECT t1.id, t1.security, t1.buy_date
FROM positions t1
WHERE buy_date = ( SELECT MAX(t2.buy_date)
FROM positions t2
WHERE t1.security = t2.security ) ;
If the problem appears when you add the client_id = 4 condition, then it's because you add it in only one WHERE clause while you have to add it in both:
SELECT t1.id, t1.security, t1.buy_date
FROM positions t1
WHERE client_id = 4
AND buy_date = ( SELECT MAX(t2.buy_date)
FROM positions t2
WHERE client_id = 4
AND t1.security = t2.security ) ;
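The effect of filtering in only one place can be reproduced with a tiny data set. This is a minimal sketch assuming SQLite via Python's sqlite3 rather than MySQL; the client 4 rows are taken from the question, while the row for client 7 is hypothetical and only there to give RDN a later buy_date under another client.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE positions (id INTEGER, client_id INTEGER,
                        security TEXT, buy_date TEXT);
INSERT INTO positions VALUES
  (26, 4, 'PCS', '2012-02-08'),
  (27, 4, 'PCS', '2013-01-19'),
  (30, 4, 'RDN', '2012-08-18'),
  (31, 4, 'RDN', '2012-09-19'),
  (900, 7, 'RDN', '2013-03-01');  -- hypothetical row for another client
""")

# Filter only in the outer query: MAX() still looks at every client, so the
# RDN maximum belongs to client 7 and no client 4 row can equal it.
filter_outer_only = [row[0] for row in conn.execute("""
    SELECT t1.id
    FROM positions t1
    WHERE t1.client_id = 4
      AND t1.buy_date = (SELECT MAX(t2.buy_date)
                         FROM positions t2
                         WHERE t2.security = t1.security)
    ORDER BY t1.id
""")]

# Filter in both places: MAX() is taken over client 4's rows only.
filter_both = [row[0] for row in conn.execute("""
    SELECT t1.id
    FROM positions t1
    WHERE t1.client_id = 4
      AND t1.buy_date = (SELECT MAX(t2.buy_date)
                         FROM positions t2
                         WHERE t2.client_id = 4
                           AND t2.security = t1.security)
    ORDER BY t1.id
""")]

print(filter_outer_only)  # only the PCS position survives
print(filter_both)        # one position per security for client 4
```

With the filter only on the outside, just the PCS row comes back, which matches the single-row result described in the question.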
select security, max(buy_date) from positions group by security;
is all you need to get max buy date for each security (when you say out loud what you want from a query and you include the phrase "for each x", you probably want a group by on x)
When you use a group by, all columns in your select must either be columns that have been grouped by or aggregates, so if, for example, you wanted to include id, you'd probably have to use a subquery similar to what you had before, since there doesn't seem to be any aggregate you can reasonably use on the ids, and another group by would give you too many rows.