Using SELECT across three tables - MySQL

I tried to write a query, but unfortunately I didn't succeed.
I want to know how many packages a person delivered over a given period.
Specifically, I want to know how many packages were delivered by John (user_id = 1) between 01-02-18 and 28-02-18. John drives a different car (a different plate_id) every day.
(orders_drivers.user_id, plates.plate_name, orders.delivery_date, orders.package_amount)
I have 3 tables:
orders with plate_id, delivery_date, package_amount
plates with plate_id, plate_name
orders_drivers with plate_id, plate_date, user_id
I tried some solutions but didn't get the expected result. Thanks!

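Try using a JOIN, as shown below. Because John drives a different car every day, match the driver's plate_date as well as the plate_id, and restrict the delivery dates to the period:
SELECT SUM(o.package_amount)
FROM orders o INNER JOIN orders_drivers od
ON o.plate_id = od.plate_id
AND o.delivery_date = od.plate_date
WHERE od.user_id = <the_user_id>
AND o.delivery_date BETWEEN '2018-02-01' AND '2018-02-28';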
See MySQL Join Made Easy for insight.
You can also use a subquery:
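SELECT SUM(o.package_amount)
FROM orders o
WHERE o.delivery_date BETWEEN '2018-02-01' AND '2018-02-28'
AND EXISTS (SELECT 1
FROM orders_drivers od
WHERE od.user_id = <user_id> AND o.plate_id = od.plate_id AND o.delivery_date = od.plate_date);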
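To see why the join needs the date as well as the plate, here is a minimal sketch that runs the JOIN version against an in-memory SQLite database standing in for MySQL. The rows are hypothetical: John (user_id 1) and a second driver share plate 10 on different days. The helper `total_packages` is made up for this illustration.

```python
import sqlite3

# SQLite in-memory database standing in for MySQL; the schema follows the
# question, but every row below is hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (plate_id INTEGER, delivery_date TEXT, package_amount INTEGER);
CREATE TABLE orders_drivers (plate_id INTEGER, plate_date TEXT, user_id INTEGER);
-- John (user_id 1) has plate 10 on Feb 1 and plate 20 on Feb 2;
-- user 2 has plate 10 on Feb 2.
INSERT INTO orders_drivers VALUES
  (10, '2018-02-01', 1), (20, '2018-02-02', 1), (10, '2018-02-02', 2);
INSERT INTO orders VALUES
  (10, '2018-02-01', 5),  -- delivered by John
  (20, '2018-02-02', 7),  -- delivered by John
  (10, '2018-02-02', 3),  -- same plate, but user 2 was driving
  (20, '2018-03-01', 9);  -- outside the period
""")

def total_packages(user_id, start, end, match_date=True):
    """Sum package_amount for user_id with delivery_date in [start, end].

    match_date=False reproduces a join on plate_id alone, for comparison.
    """
    # Only one of two fixed strings is interpolated here, never user input.
    date_condition = "AND od.plate_date = o.delivery_date" if match_date else ""
    sql = f"""
        SELECT COALESCE(SUM(o.package_amount), 0)
        FROM orders o
        JOIN orders_drivers od
          ON od.plate_id = o.plate_id {date_condition}
        WHERE od.user_id = ?
          AND o.delivery_date BETWEEN ? AND ?
    """
    return conn.execute(sql, (user_id, start, end)).fetchone()[0]

print(total_packages(1, "2018-02-01", "2018-02-28"))                    # 12
print(total_packages(1, "2018-02-01", "2018-02-28", match_date=False))  # 15
```

Joining on plate_id alone also credits John with the 3 packages delivered on plate 10 on the day the other driver had it.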

SELECT SUM(orders.package_amount) AS amount
FROM orders
LEFT JOIN plates ON plates.plate_id = orders.plate_id
LEFT JOIN orders_drivers ON orders_drivers.plate_id = orders.plate_id AND orders_drivers.plate_date = orders.delivery_date
WHERE orders.delivery_date BETWEEN date1 AND date2 AND orders_drivers.user_id = userid
GROUP BY orders_drivers.user_id
But seriously, you need to ask questions that make more sense.
SUM is an aggregate function that adds up all the values in each group formed by GROUP BY.
LEFT JOIN connects the tables on matching ids. Any other join would do the same here, as all ids are unique (at least I hope). Also, because the WHERE clause filters on orders_drivers.user_id, rows without a matching driver are dropped anyway, so the LEFT JOIN behaves like an INNER JOIN.
WHERE is where you give the dates and the user.
GROUP BY user_id means that if there are several records for the same user, they are returned as one row (with their package amounts summed).
With AS, your result is returned under the name 'amount'.

If you want the total package_amount per user over a period, you can use this query:
UPDATE: added a WHERE clause on user_id to retrieve John's data.
SELECT od.user_id
, p.plate_name
, SUM(o.package_amount) AS TotalPackageAmount
FROM orders_drivers od
JOIN plates p
ON p.plate_id = od.plate_id
JOIN orders o
ON o.plate_id = od.plate_id
AND o.delivery_date = od.plate_date
WHERE o.delivery_date BETWEEN STR_TO_DATE('01/02/2018', '%d/%m/%Y') AND STR_TO_DATE('28/02/2018', '%d/%m/%Y')
AND od.user_id = 1
GROUP BY od.user_id
, p.plate_name
It groups rows by user_id and plate_name, filters delivery_date to the given period, and then calculates the sum of package_amount for each group.
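The per-plate breakdown can be sketched the same way. This is a minimal example against SQLite standing in for MySQL, with hypothetical plates and rows; the dates are written as ISO strings so plain BETWEEN works without STR_TO_DATE.

```python
import sqlite3

# Hypothetical data: John (user_id 1) uses plate AB-123 twice and CD-456 once.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE plates (plate_id INTEGER PRIMARY KEY, plate_name TEXT);
CREATE TABLE orders (plate_id INTEGER, delivery_date TEXT, package_amount INTEGER);
CREATE TABLE orders_drivers (plate_id INTEGER, plate_date TEXT, user_id INTEGER);
INSERT INTO plates VALUES (10, 'AB-123'), (20, 'CD-456');
INSERT INTO orders_drivers VALUES
  (10, '2018-02-01', 1), (20, '2018-02-02', 1), (10, '2018-02-03', 1);
INSERT INTO orders VALUES
  (10, '2018-02-01', 5), (20, '2018-02-02', 7), (10, '2018-02-03', 4);
""")

rows = conn.execute("""
SELECT od.user_id, p.plate_name, SUM(o.package_amount) AS total_package_amount
FROM orders_drivers od
JOIN plates p ON p.plate_id = od.plate_id
JOIN orders o ON o.plate_id = od.plate_id
             AND o.delivery_date = od.plate_date  -- the day this driver had the plate
WHERE o.delivery_date BETWEEN '2018-02-01' AND '2018-02-28'
  AND od.user_id = 1
GROUP BY od.user_id, p.plate_name  -- one row per (driver, plate)
ORDER BY p.plate_name
""").fetchall()

print(rows)  # [(1, 'AB-123', 9), (1, 'CD-456', 7)]
```

Dropping p.plate_name from both SELECT and GROUP BY collapses this into a single total per user.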

Related

MySQL View in place of subquery does not return the same result

The query below grabs some information about a category of toys and shows the most recent sale price for three levels of condition (e.g., Brand New, Used, Refurbished). The price for each sale is almost always different. One other thing: the sales table row ids are not necessarily in chronological order (e.g., a toy with a sale id of 5 could have been sold later than a toy with a sale id of 10).
This query works but is not performant. It runs in a manageable amount of time, usually about 1s. However, I need to add yet another left join to include some more data, which causes the query time to balloon up to about 9s, no bueno.
Here is the working but nonperformant query:
SELECT b.brand_name, t.toy_id, t.toy_name, t.toy_number, tt.toy_type_name, cp.catalog_product_id, s.date_sold, s.condition_id, s.sold_price FROM brands AS b
LEFT JOIN toys AS t ON t.brand_id = b.brand_id
JOIN toy_types AS tt ON t.toy_type_id = tt.toy_type_id
LEFT JOIN catalog_products AS cp ON cp.toy_id = t.toy_id
LEFT JOIN toy_category AS tc ON tc.toy_category_id = t.toy_category_id
LEFT JOIN (
SELECT date_sold, sold_price, catalog_product_id, condition_id
FROM sales
WHERE invalid = 0 AND condition_id <= 3
ORDER BY date_sold DESC
) AS s ON s.catalog_product_id = cp.catalog_product_id
WHERE tc.toy_category_id = 1
GROUP BY t.toy_id, s.condition_id
ORDER BY t.toy_id ASC, s.condition_id ASC
But, like I said, it's slow. The sales table has about 200k rows.
What I tried to do was create the subquery as a view, e.g.,
CREATE VIEW sales_view AS
SELECT date_sold, sold_price, catalog_product_id, condition_id
FROM sales
WHERE invalid = 0 AND condition_id <= 3
ORDER BY date_sold DESC
Then I replaced the subquery with the view, like so:
SELECT b.brand_name, t.toy_id, t.toy_name, t.toy_number, tt.toy_type_name, cp.catalog_product_id, s.date_sold, s.condition_id, s.sold_price FROM brands AS b
LEFT JOIN toys AS t ON t.brand_id = b.brand_id
JOIN toy_types AS tt ON t.toy_type_id = tt.toy_type_id
LEFT JOIN catalog_products AS cp ON cp.toy_id = t.toy_id
LEFT JOIN toy_category AS tc ON tc.toy_category_id = t.toy_category_id
LEFT JOIN sales_view AS s ON s.catalog_product_id = cp.catalog_product_id
WHERE tc.toy_category_id = 1
GROUP BY t.toy_id, s.condition_id
ORDER BY t.toy_id ASC, s.condition_id ASC
Unfortunately, with this change the query no longer picks the most recent sale, so the sale price it returns is no longer the latest one.
Why doesn't the view return the same result as the identical SELECT used as a subquery?
After reading just about every top-n-per-group stackoverflow question and blog article I could find, getting a query that actually worked was fantastic. But now that I need to extend the query one more step I'm running into performance issues. If anybody wants to sidestep the above question and offer some ways to optimize the original query, I'm all ears!
Thanks for any and all help.
The solution to the subquery performance issue was to use the answer provided here: Groupwise maximum
I thought this approach could only be used when querying a single table, but it works even when you've joined many other tables. You just have to LEFT JOIN the sales table a second time (as s2) on the same catalog_product_id and condition_id plus the s.date_sold < s2.date_sold condition, and make sure the WHERE clause checks that the second table's id column IS NULL.
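As for why the view changed the result: MySQL does not guarantee that an ORDER BY inside a derived table or view carries through to the outer GROUP BY, and for non-aggregated columns the server may pick values from any row in each group, so "first row after sorting" was never a reliable way to get the latest sale. The groupwise-maximum self-join avoids relying on row order. Below is a minimal sketch of that LEFT JOIN / IS NULL pattern on the sales table alone, using SQLite as a stand-in for MySQL and hypothetical rows (note that the lower id 5 is the newest sale).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (
  id INTEGER PRIMARY KEY, catalog_product_id INTEGER, condition_id INTEGER,
  date_sold TEXT, sold_price REAL, invalid INTEGER
);
INSERT INTO sales VALUES
  (5,  100, 1, '2020-03-01', 30.0, 0),  -- newest condition-1 sale despite the lower id
  (10, 100, 1, '2020-01-15', 25.0, 0),
  (11, 100, 2, '2020-02-10', 18.0, 0),
  (12, 100, 2, '2020-02-20', 99.0, 1),  -- invalid, must be ignored
  (13, 200, 1, '2020-01-05', 40.0, 0);
""")

latest = conn.execute("""
SELECT s.catalog_product_id, s.condition_id, s.date_sold, s.sold_price
FROM sales s
LEFT JOIN sales s2
  ON  s2.catalog_product_id = s.catalog_product_id
  AND s2.condition_id = s.condition_id
  AND s2.invalid = 0
  AND s.date_sold < s2.date_sold   -- s2 is a newer valid sale in the same group
WHERE s.invalid = 0
  AND s.condition_id <= 3
  AND s2.id IS NULL                -- keep s only if no newer sale exists
ORDER BY s.catalog_product_id, s.condition_id
""").fetchall()

for row in latest:
    print(row)
```

If two valid sales in a group share the same date_sold, both survive this filter; a tie-breaker (for example on id) would be needed in that case.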

How to join on a row with max value

I have three tables: households, voters, door_knocks
Each household can have several voters associated with it. Each household can also have several door knocks associated with it.
I'm trying to pull together all the voters in a household and the date of the last door_knock from the door_knocks table and I'm having trouble figuring out the proper query syntax. Here is my latest attempt:
SELECT households.hh_id, voters.id
FROM households
INNER JOIN voters ON households.hh_id = voters.hh_id
INNER JOIN ( SELECT MAX(dk.date), dk.hh_id FROM door_knocks dk GROUP BY dk.date) dks
ON dks.hh_id = households.hh_id
WHERE households.street = ?
The above query, however, pulls up one result for each door knock. I just want the date of the last door knock.
So, conceptually, what you're hoping for is a table that lists the date of the last knock for each household.
You'd like to join against that table and combine it with the voters and the households.
What your query does instead is group by dk.date, which gives you one row per distinct knock date rather than one row per household.
If you group by hh_id instead, you will get the max date for each household.
SELECT households.hh_id, voters.id, dks.max_date
FROM households
INNER JOIN voters ON households.hh_id = voters.hh_id
INNER JOIN ( SELECT MAX(dk.date) AS max_date, dk.hh_id FROM door_knocks dk GROUP BY dk.hh_id ) dks
ON dks.hh_id = households.hh_id
WHERE households.street = ?
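Here is a quick check of the corrected query: a minimal sketch against SQLite standing in for MySQL, with hypothetical households, voters and knocks. Each voter gets the date of the most recent knock on their household.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE households (hh_id INTEGER PRIMARY KEY, street TEXT);
CREATE TABLE voters (id INTEGER PRIMARY KEY, hh_id INTEGER);
CREATE TABLE door_knocks (id INTEGER PRIMARY KEY, hh_id INTEGER, date TEXT);
INSERT INTO households VALUES (1, 'Main St'), (2, 'Main St'), (3, 'Oak Ave');
INSERT INTO voters VALUES (101, 1), (102, 1), (201, 2), (301, 3);
INSERT INTO door_knocks (hh_id, date) VALUES
  (1, '2020-09-01'), (1, '2020-10-15'), (2, '2020-09-20'), (3, '2020-10-01');
""")

rows = conn.execute("""
SELECT households.hh_id, voters.id, dks.max_date
FROM households
INNER JOIN voters ON households.hh_id = voters.hh_id
INNER JOIN (
    SELECT MAX(dk.date) AS max_date, dk.hh_id
    FROM door_knocks dk
    GROUP BY dk.hh_id          -- one row per household, not per date
) dks ON dks.hh_id = households.hh_id
WHERE households.street = ?
ORDER BY voters.id
""", ("Main St",)).fetchall()

for row in rows:
    print(row)
```

Because the derived table is INNER JOINed, households that were never knocked drop out; a LEFT JOIN would keep them with a NULL max_date.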

SQL Query: How to use sub-query or AVG function to find number of days between a new entry?

I have two tables. One is called entities, with these relevant columns:
id, company_id, and integration_id. The other table is transactions, with columns id, entity_id and created_at. The foreign keys linking the two tables are integration_id and entity_id.
The transactions table records the transactions received from each company in the entities table.
Ultimately, I want to find the date range with the highest volume of transactions and then, within that range, find the average number of days between transactions for each company.
To find the date range I used this query.
SELECT DATE_FORMAT(t.created_at, '%Y/%m/%d'), COUNT(t.id)
FROM entities e
JOIN transactions t
ON e.integration_id = t.entity_id
GROUP BY DATE_FORMAT(t.created_at, '%Y/%m/%d');
I get this:
DATE_FORMAT(t.created_at, '%Y/%m/%d') | COUNT(t.id)
--------------------------------------+------------
2015/11/09                            | 4
etc.
From that, I determined the range I want to use: 2015/11/09 to 2015/12/27,
and I wrote this query:
SELECT company_id, COUNT(t.id)
FROM entities e
INNER JOIN transactions t
ON e.integration_id = t.entity_id
WHERE t.created_at BETWEEN '2015-11-09' AND '2015-12-27'
GROUP BY company_id;
I get this:
company_id | COUNT(t.id)
-----------+------------
1234       | 17
and so on
This gives me the total number of transactions made by each company over this date range. What's the best way now to query for the average number of days between transactions per company? Should I use a subquery, or is there a way to use the AVG function on dates in a WHERE clause?
EDIT:
Playing around with the query, I'm wondering if there is a way I can do
SELECT company_id, (49 / COUNT(t.id))...
using 49 because that is the number of days in that date range, in order to get the average number of days between transactions?
I think this might be it. Does that make sense?
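The window length divided by the count is not quite the average gap between transactions: n transactions have only n - 1 gaps, and the sum of consecutive gaps telescopes to the time between the first and last transaction. Writing one company's transaction dates in order as t_1 <= t_2 <= ... <= t_n:

```latex
\bar{g} \;=\; \frac{1}{n-1}\sum_{i=2}^{n}\bigl(t_i - t_{i-1}\bigr) \;=\; \frac{t_n - t_1}{n-1}
```

So 49 / COUNT(t.id) measures days per transaction over the whole window, which is a different quantity. In MySQL the average gap would be something along the lines of DATEDIFF(MAX(t.created_at), MIN(t.created_at)) / (COUNT(t.id) - 1), guarding against companies with a single transaction.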
I think this may work:
Select z.company_id,
datediff(max(y.created_at), min(y.created_at)) / nullif(count(y.id) - 1, 0) as avg_days_between_orders,
max(y.created_at) as latest_order,
min(y.created_at) as earliest_order,
count(y.id) as orders
From
(SELECT entity_id, max(t.created_at) latest, min(t.created_at) earliest
FROM entities e, transactions t
Where e.id = t.entity_id
group by entity_id
order by COUNT(t.id) desc
limit 1) x,
transactions y,
entities z
where z.id = x.entity_id
and z.integration_id = y.entity_id
and y.created_at between x.earliest and x.latest
group by company_id;
It's tough without the data. I may have the reference to integration_id wrong in the subquery or in the outer query's join.
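To answer the "can AVG work on dates" part directly, here is a minimal sketch using SQLite as a stand-in for MySQL, with hypothetical entities and transactions joined on integration_id = entity_id as the question describes. SQLite has no DATEDIFF, so julianday() differences are used instead. The sketch computes the average gap two ways: with the telescoped (last - first) / (n - 1) form, and with AVG over each transaction's gap to the previous one (a correlated subquery rather than a window function). Both should agree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entities (id INTEGER PRIMARY KEY, company_id INTEGER, integration_id INTEGER);
CREATE TABLE transactions (id INTEGER PRIMARY KEY, entity_id INTEGER, created_at TEXT);
INSERT INTO entities VALUES (1, 1234, 900), (2, 5678, 901);
INSERT INTO transactions (entity_id, created_at) VALUES
  (900, '2015-11-09'), (900, '2015-11-19'), (900, '2015-12-09'),  -- gaps: 10, 20
  (901, '2015-11-10'), (901, '2015-12-27');                       -- gap: 47
""")

START, END = "2015-11-09", "2015-12-27"

# Approach 1: (last - first) / (n - 1); NULLIF avoids dividing by zero.
span_rows = conn.execute("""
SELECT e.company_id,
       (julianday(MAX(t.created_at)) - julianday(MIN(t.created_at)))
         / NULLIF(COUNT(t.id) - 1, 0) AS avg_days_between
FROM entities e
JOIN transactions t ON t.entity_id = e.integration_id
WHERE t.created_at BETWEEN ? AND ?
GROUP BY e.company_id
ORDER BY e.company_id
""", (START, END)).fetchall()

# Approach 2: AVG of each transaction's gap to the previous one for the same company.
gap_rows = conn.execute("""
SELECT company_id, AVG(julianday(created_at) - julianday(prev_at)) AS avg_days_between
FROM (
  SELECT e.company_id, t.created_at,
         (SELECT MAX(t2.created_at)
          FROM transactions t2
          JOIN entities e2 ON t2.entity_id = e2.integration_id
          WHERE e2.company_id = e.company_id
            AND t2.created_at < t.created_at
            AND t2.created_at BETWEEN ? AND ?) AS prev_at
  FROM entities e
  JOIN transactions t ON t.entity_id = e.integration_id
  WHERE t.created_at BETWEEN ? AND ?
) gaps
WHERE prev_at IS NOT NULL   -- the first transaction has no previous one
GROUP BY company_id
ORDER BY company_id
""", (START, END, START, END)).fetchall()

print(span_rows)
print(gap_rows)
```

The first form is cheaper since it needs only MIN, MAX and COUNT per company; the second shows how AVG can be applied to date differences when you do want the individual gaps.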

Can I use the row results of a query to run a subquery and get the data returned?

To be clear, I want to avoid a for loop in my Node.js program.
My current approach is a GROUP_CONCAT() query (which is working correctly):
SELECT DISTINCT(c.main), GROUP_CONCAT(c.cId) AS cId_List FROM customers c LEFT JOIN boxes b ON b.boxId = c.boxId WHERE c.opId = ? GROUP BY c.conNo ORDER BY c.conNo ASC;
// response as JSON
{
"main": 2,
"cId_List": "512,513"
},{
"main": 3,
"cId_List": "514,515,516,517"
},....
The next query I need to run is for every "cId_List":
for(every cId_List){
qry = "SELECT SUM(amount) FROM payments p WHERE p.cId IN (cId_List);"
}
How can I avoid it?
The reason I want to avoid it is that there is no limit on the number of queries: it can be 10,000+ in a single request.
Added Info
What is happening?
There are two tables, namely customers and payments.
There can be multiple rows in the customers table with the same "connection number [main]".
By doing a GROUP_CONCAT I get the ids of those rows into cId_List.
Now, for every cId_List, I want to run the SUM() query on the payments table,
so my result should be:
{
"main": 2,
"cId_List": "512,513", //multiple rows of customers table
"amount_sum": 500 //data from payments table using above cId_List
},{
"main": 3,
"cId_List": "514,515,516,517",
"amount_sum": -200
},....
sqlFiddle
As asked, an explanation of the sqlfiddle:
customers.conNo is a unifying column for multiple customers (basically a family; they are billed together).
customers.cId is the primary key and the distinguishing factor (when we need to bill on a per-person basis).
payments.cId is a foreign key to customers.cId, and payments are entered per cId.
The report needs to be generated per conNo,
so to get all the payments for a conNo I need to send all the matching cIds to the payments table.
I hope this clears up any doubts.
EDIT:
I am checking this query, which may be the answer. Is this query format good performance-wise?
SELECT GROUP_CONCAT(DISTINCT(customers.cId)) AS cId_List, customers.*, payments.cId, SUM(amount) AS amt FROM `payments` left join customers on customers.cId = payments.cId GROUP BY `customers`.`conNo` ORDER BY `customers`.`conNo` ASC
So it seems that you can simply replace all of your code with the following:
SELECT c.conno
, SUM(p.amount) AS total
FROM customers c
LEFT JOIN payments p
ON p.cid = c.cid
GROUP BY c.conno
http://sqlfiddle.com/#!9/a65cf6/11
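A single grouped query can also return the cId_List and the payment total together, so no per-row follow-up query is needed. Below is a minimal sketch with SQLite standing in for MySQL. The customer ids mirror the question's example, but the payment rows are hypothetical and chosen to produce the question's 500 and -200 totals. DISTINCT inside GROUP_CONCAT keeps a customer with several payments from being listed twice, and COALESCE turns "no payments" into 0.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (cId INTEGER PRIMARY KEY, conNo INTEGER);
CREATE TABLE payments (id INTEGER PRIMARY KEY, cId INTEGER, amount INTEGER);
INSERT INTO customers VALUES (512, 2), (513, 2), (514, 3), (515, 3), (516, 3), (517, 3);
INSERT INTO payments (cId, amount) VALUES (512, 300), (513, 200), (514, -250), (516, 50);
""")

rows = conn.execute("""
SELECT c.conNo AS main,
       GROUP_CONCAT(DISTINCT c.cId) AS cId_List,  -- order within the list is not guaranteed
       COALESCE(SUM(p.amount), 0) AS amount_sum   -- customers without payments count as 0
FROM customers c
LEFT JOIN payments p ON p.cId = c.cId
GROUP BY c.conNo
ORDER BY c.conNo
""").fetchall()

# Shape the rows like the JSON response in the question.
result = [{"main": m, "cId_List": ids, "amount_sum": total} for m, ids, total in rows]
print(json.dumps(result, indent=2))
```

The database does one pass over customers and payments instead of the application issuing one SUM() query per connection number.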
SELECT c.cid, SUM(p.amount)
FROM customers AS c
LEFT JOIN payments AS p ON p.cid = c.cid
GROUP BY c.cid
This query seems to work. Can anyone tell me whether it is appropriate performance-wise? I would also welcome any suggestions. Thanks to @Strawberry and @Luca Giardina.
SELECT
GROUP_CONCAT(DISTINCT(customers.cId)) AS cId_List,
customers.*, payments.cId,
SUM(amount) AS amt
FROM `payments` LEFT JOIN customers ON customers.cId = payments.cId
GROUP BY `customers`.`conNo`
ORDER BY `customers`.`conNo` ASC

Complex SQL query over four tables does not fetch wanted result

Imagine the following scenario: Employees of a company can give votes to an arbitrary question (integer value).
I have a complex request where I want to fetch five pieces of information:
Name of the company
Average vote value per company
Number of employees
Number of votes
Participation (no of votes/no of employees)
The SQL query shall only fetch votes of companies, that the current user is employed at.
Therefore, I am accessing four different tables; here is an excerpt of the table declarations:
User
- id
Company
- id
- name
Employment
- user_id (FK User.id)
- company_id (FK Company.id)
Vote
- company_name
- vote_value
- timestamp
User and Company are related by an Employment (an n:m relation, so it needs to be an extra table). The table Vote shall not be connected by a PK/FK relation, but votes can be related to a company by the company name (Company.name = Vote.company_name).
I managed to fetch all information except for the number of employees correctly by the following SQL query:
SELECT
c.name AS company,
AVG(v.vote_value) AS value,
COUNT(e.user_id) AS employees,
COUNT(v.vote_value) AS votes,
(COUNT(v.vote_value) / COUNT(e.user_id)) AS participation
FROM Company c
JOIN Employment e ON e.company_id = c.id
JOIN User u ON u.id = e.user_id
JOIN Vote v
ON v.company_name = c.name
AND YEAR(v.timestamp) = :year
AND MONTH(v.timestamp) = :month
AND DAY(v.timestamp) = :day
WHERE u.id = :u_id
GROUP BY v.company_name, e.company_id
But instead of fetching the correct number of employees, the employees field always equals the number of votes (and therefore the participation value is also wrong).
Is there any way to do this in one query without subqueries[1]? What do I have to change so that the query fetches the correct number of employees?
[1] I am using Doctrine2 and am trying to avoid subqueries, as Doctrine does not support them. I just did not want to turn this into a Doctrine discussion; that's why I broke this topic down to the SQL level.
If you want to fetch the number of employees, the issue is that you are filtering down to only one employee:
WHERE u.id = :u_id
Secondly, bear in mind that if you count employees at the vote grouping level, you will of course get as many rows as there are votes. So you will have to use a distinct count, as @Przem... mentioned:
COUNT(DISTINCT e.user_id) AS employees,
That way you will uniquely count the employees for the company (getting rid of the repeated employee ids for all the votes the employee has).
As you mentioned in a comment:
It returns the 1 as employee count
This is because the WHERE condition restricts the result to one employee with many votes. DISTINCT then counts only the single employee let through by the WHERE clause, which is why you get 1. However, that is the correct result given your filter condition.
Adding subqueries in the select clause will also get you to the right result but at the expense of performance.
Try this: it calculates the votes in one subquery and the employees in another.
SELECT c.name,
ce.employee_count,
cv.vote_count,
cv.vote_count / ce.employee_count AS participation,
cv.vote_value
FROM
(SELECT company_id, COUNT(*) AS employee_count
FROM employment GROUP BY company_id) ce
INNER JOIN company c
ON c.id = ce.company_id
INNER JOIN
(SELECT company_name, AVG(vote_value) AS vote_value, COUNT(*) AS vote_count
FROM vote v GROUP BY company_name) cv
ON c.name = cv.company_name
Well, I think with a query defined like that you should add the DISTINCT keyword when counting the number of employees:
SELECT
c.name AS company,
AVG(v.vote_value) AS value,
COUNT(DISTINCT e.user_id) AS employees,
-- the joins yield one row per (employee, vote) pair, so divide out the employee count
COUNT(v.vote_value) / COUNT(DISTINCT e.user_id) AS votes,
(COUNT(v.vote_value) / COUNT(DISTINCT e.user_id)) / COUNT(DISTINCT e.user_id) AS participation
FROM Company c
JOIN Employment e ON e.company_id = c.id
JOIN User u ON u.id = e.user_id
JOIN Vote v
ON v.company_name = c.name
AND YEAR(v.timestamp) = :year
AND MONTH(v.timestamp) = :month
AND DAY(v.timestamp) = :day
GROUP BY v.company_name, e.company_id;
Not sure if it is possible in MySQL, though.
Edit: as @Mosty Mostacho pointed out, the condition on u.id was the problem; without it, and with the DISTINCT keyword added, the query returns correct results. I have edited the query above accordingly.
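Why the counts go wrong in a single join: employment and vote are both joined to the company independently, so each vote is paired with every employee. Here is a minimal sketch of that fan-out and of the aggregate-each-side-first approach, using SQLite as a stand-in for MySQL and hypothetical data (4 employees, 2 votes). SQLite truncates integer division, hence the CAST; MySQL's / already returns a decimal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employment (user_id INTEGER, company_id INTEGER);
CREATE TABLE vote (company_name TEXT, vote_value INTEGER);
INSERT INTO company VALUES (1, 'Acme');
INSERT INTO employment VALUES (10, 1), (11, 1), (12, 1), (13, 1);  -- 4 employees
INSERT INTO vote VALUES ('Acme', 3), ('Acme', 5);                   -- 2 votes
""")

# Single join: rows = employees * votes, so COUNT(v.vote_value) is 8, not 2.
# AVG is unaffected because every vote is repeated the same number of times.
joined = conn.execute("""
SELECT COUNT(DISTINCT e.user_id), COUNT(v.vote_value), AVG(v.vote_value)
FROM company c
JOIN employment e ON e.company_id = c.id
JOIN vote v ON v.company_name = c.name
GROUP BY c.id
""").fetchone()

# Aggregate each side per company first, then join the totals.
separate = conn.execute("""
SELECT c.name, ce.employee_count, cv.vote_count,
       CAST(cv.vote_count AS REAL) / ce.employee_count AS participation,
       cv.vote_value
FROM company c
JOIN (SELECT company_id, COUNT(*) AS employee_count
      FROM employment GROUP BY company_id) ce ON ce.company_id = c.id
JOIN (SELECT company_name, AVG(vote_value) AS vote_value, COUNT(*) AS vote_count
      FROM vote GROUP BY company_name) cv ON cv.company_name = c.name
""").fetchone()

print(joined)    # (4, 8, 4.0)
print(separate)  # ('Acme', 4, 2, 0.5, 4.0)
```

When subqueries are not an option, the fan-out can be divided out instead (the vote count is the joined COUNT divided by COUNT(DISTINCT e.user_id)), but that relies on every company having at least one employee and one vote in the joined rows.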