Finding count of 0 when using joined tables (MySQL)

EDIT I've put up an sqlfiddle with this schema here: http://sqlfiddle.com/#!2/0726f2. I'm trying to select customers 3, 4, 5, 6.
Consider a db with three tables:
customers
---------
id
seats
-----
id
buyer_id (fk to customers)
flight_id
flights
-------
id
datetime (This is the UTC time of the flight)
I'm trying to find customers who have not booked seats on any flight in March.
This query provides a list of customers who have not booked seats on any flight:
SELECT customers.id, count(seats.id) as seat_count FROM `customers`
LEFT JOIN `seats` ON `seats`.`buyer_id` = `customers`.`id`
LEFT JOIN `flights` ON `flights`.`id` = `seats`.`flight_id`
GROUP BY customers.id
HAVING seat_count=0
I tried this query to find a list of customers who have not booked seats on any flight in March
SELECT customers.id, count(seats.id) as seat_count FROM `customers`
LEFT JOIN `seats` ON `seats`.`buyer_id` = `customers`.`id`
LEFT JOIN `flights` ON `flights`.`id` = `seats`.`flight_id`
WHERE flights.datetime >= '2014-03-01 00:00:00'
AND flights.datetime <= '2014-04-01 00:00:00'
GROUP BY customers.id
HAVING seat_count=0
But it returns an empty list. I understand why: I'm selecting a list of customers who have booked seats in March then finding customers in that list who have not booked seats. Clearly an empty set.
Likewise with adding this to the WHERE clause
AND seats.id IS NULL
I can't figure a proper way to do this.
I've tried:
Flipping the JOINs every which way
Using a subquery in the LEFT JOIN statement. Performance was prohibitively bad.
Trying SELECT customers.id FROM customers WHERE id NOT IN ([above query]). MySQL runs this as a correlated subquery, and performance is also prohibitively bad.
Because this is wrapped up in a larger search feature, I can't come at this from another direction (selecting from seats and going from there, for example). Schema changes are not possible.
Thanks.

You can use NOT EXISTS like
SELECT *
FROM customers
WHERE NOT EXISTS (
SELECT * FROM seats
INNER JOIN flights ON flights.id = seats.flight_id
WHERE flights.datetime >= '2014-03-01 00:00:00'
AND flights.datetime < '2014-04-01 00:00:00'
AND seats.buyer_id = customers.id
)
Here is a corresponding SQLFiddle.
By the way, you should at least add an index on seats.buyer_id, since that is the column the subquery matches on. With that index in place, the execution plan does not look that bad.
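The NOT EXISTS pattern above can be tried on a toy data set. This is a minimal sketch, assuming SQLite (through Python's sqlite3 module) as a stand-in for MySQL, made-up sample rows, and a half-open date range so that a flight at exactly midnight on April 1 does not count as a March flight. The index name seats_buyer_id is hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL schema from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY);
CREATE TABLE flights (id INTEGER PRIMARY KEY, datetime TEXT);
CREATE TABLE seats (id INTEGER PRIMARY KEY, buyer_id INTEGER, flight_id INTEGER);
CREATE INDEX seats_buyer_id ON seats (buyer_id);  -- hypothetical index name

INSERT INTO customers (id) VALUES (1), (2), (3), (4);
INSERT INTO flights (id, datetime) VALUES
    (10, '2014-02-15 09:00:00'),   -- February
    (11, '2014-03-10 12:00:00'),   -- March
    (12, '2014-04-01 00:00:00');   -- first instant of April
INSERT INTO seats (id, buyer_id, flight_id) VALUES
    (100, 1, 11),   -- customer 1 flies in March
    (101, 2, 10),   -- customer 2 flies in February only
    (102, 3, 12);   -- customer 3 flies on April 1 only; customer 4 has no seats
""")

query = """
SELECT c.id
FROM customers c
WHERE NOT EXISTS (
    SELECT 1
    FROM seats s
    JOIN flights f ON f.id = s.flight_id
    WHERE s.buyer_id = c.id
      AND f.datetime >= '2014-03-01 00:00:00'
      AND f.datetime <  '2014-04-01 00:00:00'
)
ORDER BY c.id
"""
no_march_customers = [row[0] for row in conn.execute(query)]
print(no_march_customers)
```

Customers with no seats at all and customers whose seats are all outside March both come back, which is what the question asks for.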

This works:
SELECT customers.id, count(seats.id) as seat_count FROM `seats`
INNER JOIN (SELECT id FROM flights WHERE flights.datetime >= '2014-03-01 00:00:00'
AND flights.datetime < '2014-04-01 00:00:00') `flights` ON `flights`.`id` = `seats`.`flight_id`
RIGHT JOIN customers ON customers.id=seats.buyer_id
GROUP BY customers.id
HAVING seat_count=0
here's the fiddle
here's another way to do it:
SELECT customers.id FROM customers WHERE id NOT IN (SELECT seats.buyer_id FROM seats
INNER JOIN `flights` ON `flights`.`id` = `seats`.`flight_id`
WHERE flights.datetime >= '2014-03-01 00:00:00'
AND flights.datetime < '2014-04-01 00:00:00')
second fiddle

Related

fetching values from database which are not from specific month

I am trying to fetch the hotel id, hotel name and hotel type of hotels which have not taken any orders in the month of 'MAY 19', but I am not getting the proper output. What is wrong in my query?
select hotel_details.hotel_id,hotel_name,hotel_type
from hotel_details inner join orders on hotel_details.hotel_id=orders.hotel_id
where Month(order_date) between 1 and 4 or Month(order_date) between 6 and 12
order by hotel_id;
You can use the following, using NOT EXISTS to check if there is any order for the hotel in May 2019:
SELECT hotel_id, hotel_name, hotel_type
FROM hotel_details
WHERE NOT EXISTS (
SELECT 1
FROM orders
WHERE hotel_id = hotel_details.hotel_id
AND MONTH(order_date) = 5
AND YEAR(order_date) = 2019
)
The sub-query on EXISTS checks if the hotel_id is available in orders on May 2019. Using NOT in front of EXISTS filters all hotels which have no order in May 2019. The sub-query is connected to the outer part of the query with hotel_id = hotel_details.hotel_id.
Here's a standard, if somewhat old-fashioned approach...
(I've assumed a column name on the orders table, but you can change it to any non-nullable orders column, if it's wrong)
SELECT d.hotel_id
, d.hotel_name
, d.hotel_type
FROM hotel_details d
LEFT
JOIN orders o
ON d.hotel_id = o.hotel_id
AND o.order_date >= '2019-05-01'
AND o.order_date < '2019-06-01'
WHERE o.id IS NULL
ORDER
BY d.hotel_id;
For next time, see: Why should I provide an MCRE for what seems to me to be a very simple SQL query?
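Why the date range belongs in the ON clause rather than the WHERE clause can be seen on a small example. This is a sketch under assumptions: SQLite via Python's sqlite3 instead of MySQL, invented sample hotels, and an orders.id column as assumed in the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hotel_details (hotel_id INTEGER PRIMARY KEY, hotel_name TEXT, hotel_type TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, hotel_id INTEGER, order_date TEXT);

INSERT INTO hotel_details VALUES (1, 'Alpha', 'veg'), (2, 'Beta', 'veg'), (3, 'Gamma', 'non-veg');
INSERT INTO orders VALUES
    (1, 1, '2019-05-12'),   -- hotel 1 has a May 2019 order
    (2, 2, '2019-04-30');   -- hotel 2 only has an April order; hotel 3 has none
""")

# Date range in the ON clause: unmatched hotels keep a NULL-extended row.
range_in_on = """
SELECT d.hotel_id
FROM hotel_details d
LEFT JOIN orders o
       ON o.hotel_id = d.hotel_id
      AND o.order_date >= '2019-05-01'
      AND o.order_date <  '2019-06-01'
WHERE o.id IS NULL
ORDER BY d.hotel_id
"""

# Date range in the WHERE clause: NULL-extended rows fail the range test,
# and matched rows fail "o.id IS NULL", so nothing survives.
range_in_where = """
SELECT d.hotel_id
FROM hotel_details d
LEFT JOIN orders o ON o.hotel_id = d.hotel_id
WHERE o.order_date >= '2019-05-01'
  AND o.order_date <  '2019-06-01'
  AND o.id IS NULL
ORDER BY d.hotel_id
"""

on_clause_ids = [r[0] for r in conn.execute(range_in_on)]
where_clause_ids = [r[0] for r in conn.execute(range_in_where)]
print(on_clause_ids, where_clause_ids)
```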
SELECT HOTEL_ID,HOTEL_NAME,HOTEL_TYPE FROM HOTEL_DETAILS
WHERE HOTEL_ID NOT IN
(SELECT HOTEL_ID FROM ORDERS
WHERE MONTH(ORDER_DATE) = 5 AND YEAR(ORDER_DATE) = 2019)
ORDER BY HOTEL_ID ASC;
The subquery collects the HOTEL_IDs that have at least one order in May, using the MONTH function. The NOT IN condition in the outer query then drops every hotel whose HOTEL_ID appears in that list, leaving only the hotels that took no orders in May.
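One detail worth checking with NOT IN and a month filter is whether the year is constrained too, since a month-only test matches May of every year. The sketch below assumes SQLite via Python's sqlite3, where strftime('%m', ...) and strftime('%Y', ...) stand in for MySQL's MONTH() and YEAR(); the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hotel_details (hotel_id INTEGER PRIMARY KEY, hotel_name TEXT, hotel_type TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, hotel_id INTEGER, order_date TEXT);

INSERT INTO hotel_details VALUES (1, 'Alpha', 'veg'), (2, 'Beta', 'veg'), (3, 'Gamma', 'non-veg');
INSERT INTO orders VALUES
    (1, 1, '2019-05-10'),   -- May 2019
    (2, 2, '2018-05-20'),   -- May, but of the previous year
    (3, 3, '2019-06-01');   -- June 2019
""")

# Month-only filter: also treats the May 2018 order as a hit.
month_only = """
SELECT hotel_id FROM hotel_details
WHERE hotel_id NOT IN (
    SELECT hotel_id FROM orders
    WHERE strftime('%m', order_date) = '05'
)
ORDER BY hotel_id
"""

# Month and year filter: only May 2019 orders exclude a hotel.
month_and_year = """
SELECT hotel_id FROM hotel_details
WHERE hotel_id NOT IN (
    SELECT hotel_id FROM orders
    WHERE strftime('%m', order_date) = '05'
      AND strftime('%Y', order_date) = '2019'
)
ORDER BY hotel_id
"""

month_only_ids = [r[0] for r in conn.execute(month_only)]
month_and_year_ids = [r[0] for r in conn.execute(month_and_year)]
print(month_only_ids, month_and_year_ids)
```

Hotel 2 is dropped by the month-only version even though it took no orders in May 2019.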
SELECT DISTINCT h.hotel_id,
h.hotel_name,
h.hotel_type
FROM hotel_details h
WHERE h.hotel_id NOT IN (SELECT od.hotel_id
FROM orders od
WHERE MONTH(od.order_date) = 5
AND YEAR(od.order_date) = 2019)
ORDER BY h.hotel_id ASC;
Use Nested Queries:
SELECT hotel_id, hotel_name, hotel_type
FROM hotel_details
WHERE hotel_id NOT IN (
SELECT DISTINCT hotel_id
FROM orders
WHERE order_date BETWEEN '2019-05-01' AND '2019-05-31'
)
ORDER BY hotel_id;
Explanation :
In the Inner Query, we are selecting distinct hotel IDs from the order table with orders between May 1 and May 31.
Once we have list of hotel IDs, in the outer query we can display the required columns of the hotel table which have IDs not in the list.

SQL beginner practice problems

Given two tables, orders (order_id, date, $, customer_id) and customers (ID, name)
Here's my method but I'm not sure if it's working & I'd like to know if there's faster/better way of solving these problems:
1) find out number of customers who made at least one order on date 7/9/2018
Select count (distinct customer_id)
From
(
Select customer_id from orders a
Left join customer b
On a.customer_id = b.ID
Group by customer_id,date
Having date = 7/9/2018
) a
2) find out number of customers who did not make an order on 7/9/2018
Select count (customer_id) from customer where customer_id not in
(
Select customer_id from orders a
Left join customer b
On a.customer_id = b.ID
Group by customer_id,date
Having date = 7/9/2018
)
3) find the date with most sales between 7/1 and 7/30
select date, max($)
from (
Select sum($),date from orders a
Left join customer b
On a.customer_id = b.ID
Group by date
Having date between 7/1 and 7/30
)
Thanks,
For problem 1, a valid solution might look like this:
SELECT COUNT(DISTINCT customer_id) x
FROM orders
WHERE date = '2018-09-07'; -- or is that '2018-07-09' ??
For problem 2, a valid solution might look like this:
SELECT COUNT(*) x
FROM customer c
LEFT
JOIN orders o
ON o.customer_id = c.id
AND o.date = '2018-07-09'
WHERE o.order_id IS NULL;
Assuming there are no ties, a valid solution to problem 3 might look like this:
SELECT date
, COUNT(*) sales
FROM orders
WHERE date BETWEEN '2018-07-01' AND '2018-07-30'
GROUP
BY date
ORDER
BY sales DESC
LIMIT 1;
MySQL expects date literals in YYYY-MM-DD format. You have to put quotes around the date; otherwise 7/9/2018 is treated as an arithmetic expression (7 divided by 9 divided by 2018).
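The effect of leaving the quotes off can be checked directly. This is a small sketch assuming SQLite via Python's sqlite3 rather than MySQL, with a column named order_date in place of date and invented rows. SQLite uses integer division here and returns 0, while MySQL would return a small decimal; either way the value is not a date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_date TEXT, customer_id INTEGER);
INSERT INTO orders VALUES
    (1, '2018-07-09', 7),
    (2, '2018-07-09', 8),
    (3, '2018-07-10', 7);
""")

# Unquoted: evaluated as the arithmetic expression 7 / 9 / 2018.
unquoted_value = conn.execute("SELECT 7/9/2018").fetchone()[0]

# Comparing the date column against that number matches no rows.
unquoted_matches = conn.execute(
    "SELECT COUNT(DISTINCT customer_id) FROM orders WHERE order_date = 7/9/2018"
).fetchone()[0]

# Quoted ISO literal: matches the two customers who ordered that day.
quoted_matches = conn.execute(
    "SELECT COUNT(DISTINCT customer_id) FROM orders WHERE order_date = '2018-07-09'"
).fetchone()[0]

print(unquoted_value, unquoted_matches, quoted_matches)
```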
And none of your queries need to join with the customer table. The customer ID is already in the orders table, and you're not returning any info about the customers (like the name or address), you're just counting them.
1) You don't need the subquery or grouping.
SELECT COUNT(DISTINCT customer_id)
FROM orders
WHERE date = '2018-07-09'
2) Again, you don't need GROUP BY in the subquery. There's also a better pattern than NOT IN to get the count of non-matching rows.
SELECT COUNT(*)
FROM customer AS c
LEFT JOIN orders AS o ON c.id = o.customer_id AND o.date = '2018-07-09'
WHERE o.order_id IS NULL
See Return row only if value doesn't exist for various patterns to do this.
3) You can't use MAX($) in the outer query because the inner query doesn't return a column with that name. But even if you fix that, it still won't work, because the date column won't necessarily come from the same row that has the maximum. See SQL select only rows with max value on a column for more explanation of this.
You don't need a subquery at all. Use a query that returns the total sales for each day, then use ORDER BY to get the highest one.
SELECT date, SUM($) AS total_sales
FROM orders
WHERE date BETWEEN '2018-07-01' AND '2018-07-30'
GROUP BY date
ORDER BY total_sales DESC
LIMIT 1
If "most sales" is supposed to mean "most number of sales", replace SUM($) with COUNT(*).
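The difference between the SUM and COUNT readings of "most sales" shows up on a small data set. A minimal sketch, assuming SQLite via Python's sqlite3, with hypothetical column names order_date and amount standing in for date and $, and made-up rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_date TEXT, amount INTEGER, customer_id INTEGER);
INSERT INTO orders VALUES
    (1, '2018-07-01', 10, 1),
    (2, '2018-07-01', 20, 2),
    (3, '2018-07-02', 50, 1),
    (4, '2018-07-03', 5, 1),
    (5, '2018-07-03', 5, 2),
    (6, '2018-07-03', 5, 3),
    (7, '2018-08-01', 1000, 1);   -- outside the July range
""")

# Template for "aggregate per day, sort descending, keep the top row".
template = """
SELECT order_date, {agg} AS sales
FROM orders
WHERE order_date BETWEEN '2018-07-01' AND '2018-07-30'
GROUP BY order_date
ORDER BY sales DESC
LIMIT 1
"""

top_by_amount = conn.execute(template.format(agg="SUM(amount)")).fetchone()
top_by_count = conn.execute(template.format(agg="COUNT(*)")).fetchone()
print(top_by_amount, top_by_count)
```

With ties, LIMIT 1 returns just one of the tied dates, so the "no ties" assumption mentioned earlier still applies.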

Optimising MySql Query with LEFT JOINS

I am trying to get a list of customers who haven't ordered for 6 months or more. I have 4 tables which I have used in the query:
accounts (account_id)
stores (store_id, account_id)
customers (store_id, customer_id)
orders (order_id, customer_id, store_id)
The customers and orders tables are very big, 3M and 26M rows respectively, so using left joins makes the query time extremely long. I believe I have indexed my tables correctly.
Here is the query I have used:
SELECT cus.customer_id, MAX(o.order_date), cus.store_id, s.account_id, store_name
FROM customers cus
LEFT JOIN stores s ON s.store_id=cus.store_id
LEFT JOIN orders o ON o.customer_id=cus.customer_id AND o.store_id=cus.store_id
WHERE account_id=26 AND
(SELECT order_id
FROM orders o
WHERE o.customer_id=cus.customer_id
AND o.store_id=cus.store_id
AND o.order_date < CURRENT_DATE() - INTERVAL 6 MONTH
ORDER BY order_id DESC LIMIT 0,1) IS NOT NULL
GROUP BY cus.customer_id, cus.client_id;
I need to get the last order date and this is the reason why I have joined the orders table, however since the customers can have multiple orders it is returning multiple rows of the customer and that is why I have used the group by clause.
If anyone can assist me with my query.
Start with this:
SELECT customer_id, MAX(order_date) AS last_order_date
FROM orders
GROUP BY customer_id
HAVING last_order_date < NOW() - INTERVAL 6 MONTH;
Assuming that gives you the relevant customer_ids, then move on to
SELECT ...
FROM ( that-select-as-a-subquery ) AS old
JOIN other-tables-as-needed USING(customer_id)
If necessary, JOIN back to orders to get more info. Do not try to get other columns in that subquery. (That's a "groupwise max" problem.)
Your strategy of using an ordered and limited subquery on your orders table is probably responsible for your poor performance.
This subquery will generate a virtual table showing the date of the most recent order for each distinct customer. (I guess a distinct customer is distinguished by the pair customer_id, store_id).
SELECT MAX(order_date) recent_order_date,
customer_id, store_id
FROM orders
GROUP BY customer_id, store_id
Then, you can use that subquery as if it were a table in your query.
SELECT cus.customer_id, summary.recent_order_date,
cus.store_id, s.account_id, store_name
FROM customers cus
JOIN stores s ON s.store_id=cus.store_id
JOIN (
SELECT MAX(order_date) recent_order_date,
customer_id, store_id
FROM orders
GROUP BY customer_id, store_id
) summary ON summary.customer_id = cus.customer_id
AND summary.store_id = s.store_id
WHERE summary.recent_order_date < CURRENT_DATE - INTERVAL 6 MONTH
AND s.account_id = 26
This approach moves the GROUP BY to an inner query, and eliminates the wasteful ORDER BY ... LIMIT query pattern. The inner query doesn't have to be remade for every row in the outer query.
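The derived-table approach can be exercised end to end on a few rows. This sketch assumes SQLite via Python's sqlite3 instead of MySQL, so the cutoff is passed in as a fixed parameter rather than computed with CURRENT_DATE - INTERVAL 6 MONTH; the sample rows and the cutoff date are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stores (store_id INTEGER PRIMARY KEY, account_id INTEGER, store_name TEXT);
CREATE TABLE customers (customer_id INTEGER, store_id INTEGER);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, store_id INTEGER, order_date TEXT);

INSERT INTO stores VALUES (1, 26, 'Main St'), (2, 99, 'Harbour');
INSERT INTO customers VALUES (1, 1), (2, 1), (3, 2), (4, 1);
INSERT INTO orders VALUES
    (1, 1, 1, '2013-05-01'),
    (2, 1, 1, '2013-06-01'),   -- customer 1: last order is old
    (3, 2, 1, '2013-01-01'),
    (4, 2, 1, '2014-03-01'),   -- customer 2: has a recent order
    (5, 3, 2, '2012-01-01');   -- customer 3: old, but belongs to another account
""")

query = """
SELECT cus.customer_id, summary.recent_order_date, s.store_name
FROM customers cus
JOIN stores s ON s.store_id = cus.store_id
JOIN (
    SELECT customer_id, store_id, MAX(order_date) AS recent_order_date
    FROM orders
    GROUP BY customer_id, store_id
) summary ON summary.customer_id = cus.customer_id
         AND summary.store_id = cus.store_id
WHERE s.account_id = 26
  AND summary.recent_order_date < ?
ORDER BY cus.customer_id
"""
cutoff = "2014-01-01"  # stand-in for "six months ago"
lapsed = conn.execute(query, (cutoff,)).fetchall()
print(lapsed)
```

Customer 4, who has never ordered, is dropped by the inner join to the summary. Whether such customers should count as lapsed is a decision the original question leaves open.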
I don't understand why you used LEFT JOIN operations in your query.
And, by the way, most people, when they're new to SQL, don't have great intuition about which indexes are useful and which aren't. So, when asking for help, it's always good to show your indexes. In the meantime, read this:
http://use-the-index-luke.com/

MySQL query help (join, subselect)

I have 2 tables: orders and bookings.
The orders table contains unique orders from customers. Each customer has booked a locker for a period of 1 to 5 years, and therefore the bookings table can contain 1 to 5 rows per order (1 row for each year). Each row in the bookings table contains an end_date, which is the same date every year (20XX-06-30).
I want to select all the orders where the corresponding final end_date in the bookings table is this year (2014-06-30).
SELECT DISTINCT orders.id
FROM orders,
bookings
WHERE orders.id = bookings.order_id
AND bookings.end_date = '2014-06-30'
The problem with this query is that it also selects the orders where the end_date in the booking rows continue the following years (2015-06-30, 2016-06-30 etc).
I am not sure I understood correctly, but here is a solution for what I understood: this should get you the order ids whose last (max) end date is 2014-06-30.
SELECT orders.id, MAX(bookings.end_date)
FROM orders INNER JOIN bookings
ON orders.id = bookings.order_id
GROUP BY bookings.order_id
HAVING MAX(bookings.end_date) = '2014-06-30'
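The HAVING MAX(...) filter can be checked against a few bookings. A minimal sketch, assuming SQLite via Python's sqlite3 and invented rows; it groups by orders.id, which is equivalent here because of the join condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY);
CREATE TABLE bookings (order_id INTEGER, end_date TEXT);

INSERT INTO orders VALUES (1), (2), (3);
INSERT INTO bookings VALUES
    (1, '2013-06-30'), (1, '2014-06-30'),   -- ends in 2014
    (2, '2014-06-30'), (2, '2015-06-30'),   -- continues into 2015
    (3, '2014-06-30');                      -- single year, ends in 2014
""")

query = """
SELECT orders.id, MAX(bookings.end_date) AS final_end_date
FROM orders
INNER JOIN bookings ON orders.id = bookings.order_id
GROUP BY orders.id
HAVING MAX(bookings.end_date) = '2014-06-30'
ORDER BY orders.id
"""
ending_this_year = [row[0] for row in conn.execute(query)]
print(ending_this_year)
```

Order 2 has a 2014-06-30 row but is excluded because its latest end_date is 2015-06-30.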
Maybe join to the bookings again, checking for a larger booking date for the same order id:-
SELECT orders.id
FROM orders
INNER JOIN bookings ON orders.id = bookings.order_id
LEFT OUTER JOIN bookings AS bookings2 ON orders.id = bookings2.order_id AND bookings2.end_date > bookings.end_date
WHERE bookings.end_date = '2014-06-30'
AND bookings2.end_date IS NULL

Slow aggregate query with join on same table

I have a query to show customers and the total dollar value of all their orders. The query takes about 100 seconds to execute.
I'm querying on an ExpressionEngine CMS database. ExpressionEngine uses one table exp_channel_data, for all content. Therefore, I have to join on that table for both customer and order data. I have about 14,000 customers, 30,000 orders and 160,000 total records in that table.
Can I change this query to speed it up?
SELECT link.author_id AS customer_id,
customers.field_id_122 AS company,
Sum(orders.field_id_22) AS total_orders
FROM exp_channel_data customers
JOIN exp_channel_titles link
ON link.author_id = customers.field_id_117
AND customers.channel_id = 7
JOIN exp_channel_data orders
ON orders.entry_id = link.entry_id
AND orders.channel_id = 3
GROUP BY customer_id
Thanks, and please let me know if I should include other information.
UPDATE SOLUTION
My apologies. I noticed that entry_id for the exp_channel_data table customers corresponds to author_id for the exp_channel_titles table. So I don't have to use field_id_117 in the join. field_id_117 duplicates entry_id, but in a TEXT field. JOINING on that text field slowed things down. The query is now 3 seconds
However, the inner join solution posted by @DRapp is 1.5 seconds. Here is his SQL with a minor edit:
SELECT
PQ.author_id CustomerID,
c.field_id_122 CompanyName,
PQ.totalOrders
FROM
( SELECT
t.author_id,
SUM( o.field_id_22 ) as totalOrders
FROM
exp_channel_data o
JOIN
exp_channel_titles t ON t.author_id = o.entry_id AND o.channel_id = 3
GROUP BY
t.author_id ) PQ
JOIN
exp_channel_data c ON PQ.author_id = c.entry_id AND c.channel_id = 7
ORDER BY CustomerID
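The aggregate-first shape of this query (sum the orders per author in a derived table, then join once to the customer rows) can be sketched on toy data. Everything here is an assumption for illustration: SQLite via Python's sqlite3 instead of MySQL, a cut-down version of the two ExpressionEngine tables with only the columns named in the question, made-up rows, and order rows joined to exp_channel_titles on entry_id as in the original question's query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Cut-down stand-ins for the ExpressionEngine tables.
CREATE TABLE exp_channel_data (entry_id INTEGER PRIMARY KEY, channel_id INTEGER,
                               field_id_22 INTEGER, field_id_122 TEXT);
CREATE TABLE exp_channel_titles (entry_id INTEGER PRIMARY KEY, author_id INTEGER);

-- Customer entries live in channel 7, order entries in channel 3.
INSERT INTO exp_channel_data VALUES
    (1, 7, NULL, 'Acme'),
    (2, 7, NULL, 'Globex'),
    (101, 3, 10, NULL),
    (102, 3, 15, NULL),
    (103, 3, 7, NULL);
-- Each entry's author_id points at the customer's entry_id.
INSERT INTO exp_channel_titles VALUES (1, 1), (2, 2), (101, 1), (102, 1), (103, 2);
""")

query = """
SELECT pq.author_id AS customer_id, c.field_id_122 AS company, pq.total_orders
FROM (
    SELECT t.author_id, SUM(o.field_id_22) AS total_orders
    FROM exp_channel_data o
    JOIN exp_channel_titles t ON t.entry_id = o.entry_id
    WHERE o.channel_id = 3
    GROUP BY t.author_id
) pq
JOIN exp_channel_data c ON c.entry_id = pq.author_id AND c.channel_id = 7
ORDER BY customer_id
"""
totals = conn.execute(query).fetchall()
print(totals)
```

Because the grouping happens before the join to the customer rows, each customer row is joined to exactly one pre-computed total.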
If this is the same table, then the same columns across the board for all alias instances.
I would ensure an index on (channel_id, entry_id, field_id_117 ) if possible. Another index on (author_id) for the prequery of order totals
Then start with what will become the inner query, which does nothing but sum the order amounts per customer. Since author_id serves as the customer ID, just group and sum on that first. I don't completely understand the structure (which I would consider a poor design) or what channel_id really indicates, but you don't want summed values duplicated by other rows in the mix.
select
o.author_id,
sum( o.field_id_22 ) as totalOrders
FROM
exp_channel_data o
where
o.channel_id = 3
group by
o.author_id
If that is correct on the per customer (via author_id column), then that can be wrapped as follows
select
PQ.author_id CustomerID,
c.field_id_122 CompanyName,
PQ.totalOrders
from
( select
o.author_id,
sum( o.field_id_22 ) as totalOrders
FROM
exp_channel_data o
where
o.channel_id = 3
group by
o.author_id ) PQ
JOIN exp_channel_data c
on PQ.author_id = c.field_id_117
AND c.channel_id = 7
Can you post the results of an EXPLAIN query?
I'm guessing that your tables are not indexed well for this operation. All of the columns that you join on should probably be indexed. As a first guess I'd look at indexing exp_channel_data.field_id_117
Try something like this. You may have an error in your joins, so also check that the join columns are correct in your database. If the join conditions are wrong, the query can degrade into a cross join, which takes a long time on large data.
select
link.author_id as customer_id,
customers.field_id_122 as company,
sum(orders.field_id_22) as total_or_orders
from exp_channel_data customers
join exp_channel_titles link on (link.author_id = customers.field_id_117 and
customers.channel_id = 7)
join exp_channel_data orders on (orders.entry_id = link.entry_id and orders.channel_id = 3)
group by customer_id