Rewrite IN subquery as JOIN - mysql

I've never had good performance with IN in MySQL and I've hit a performance issue with it again.
I'm trying to create a view. The relevant part of it is:
SELECT
c.customer_id,
....
IF (c.customer_id IN (
SELECT cn.customer_id FROM customer_notes cn
), 1, 0) AS has_notes
FROM customers c;
Basically, I just want to know if the customer has a note attached to it or not. It doesn't matter how many notes. How can I rewrite this using JOIN to speed it up?
The customers table currently has 1.5 million rows so performance is an issue.

Don't you need the customer ID selected? As it stands, aren't you running the subquery once per customer, and getting a stream of true or false values with no idea which one applies to which customer?
If that is what you need, you don't need to reference the customers table (unless you keep your database in a state of semantic disintegrity and there could be entries in customer_notes for which there is no corresponding customer - but then you have bigger problems than the performance of this query); you can simply use:
SELECT DISTINCT Customer_ID
FROM Customer_Notes
ORDER BY Customer_ID;
to obtain the list of customer ID values with at least one entry in the Customer_Notes table.
If you want a list of Customer ID values and an associated true/false value, then you need to do a join:
SELECT C.Customer_ID,
CASE WHEN N.Have_Notes IS NULL THEN 0 ELSE 1 END AS Has_Notes
FROM Customers AS C
LEFT JOIN (SELECT Customer_ID, COUNT(*) AS Have_Notes
FROM Customer_Notes
GROUP BY Customer_ID) AS N
ON C.Customer_ID = N.Customer_ID
ORDER BY C.Customer_ID;
If this gives poor performance, check that you have an index on Customer_Notes.Customer_ID. If that isn't the issue, study the query plan.
Can't do ... in a view
The petty restrictions on what is allowed in a view are always a nuisance in any DBMS (MySQL is not alone in having restrictions). However, we can do it with a single regular join. COUNT(column) counts only non-null values and returns 0 if all values are null, so, if you don't mind getting a count rather than just 0 or 1, you can use:
SELECT C.Customer_ID,
COUNT(N.Customer_ID) AS Num_Notes
FROM Customers AS C
LEFT JOIN Customer_Notes AS N
ON C.Customer_ID = N.Customer_ID
GROUP BY C.Customer_ID
ORDER BY C.Customer_ID;
And if you absolutely must have 0 or 1:
SELECT C.Customer_ID,
CASE WHEN COUNT(N.Customer_ID) = 0 THEN 0 ELSE 1 END AS Has_Notes
FROM Customers AS C
LEFT JOIN Customer_Notes AS N
ON C.Customer_ID = N.Customer_ID
GROUP BY C.Customer_ID
ORDER BY C.Customer_ID;
Note that the use of 'N.Customer_ID' is crucial - though any column in the table would do (but you've not divulged the names of any other columns, AFAICR) and I'd normally use something other than the joining column for clarity.
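As a minimal sketch of why the counted column matters, the following runs the same LEFT JOIN against an in-memory SQLite database (a stand-in for MySQL; the sample rows and the note_id column are assumptions made up for illustration). COUNT(n.customer_id) ignores the NULL-extended row that the outer join produces for a customer without notes, whereas COUNT(*) counts it.

```python
import sqlite3

# SQLite stands in for MySQL here; the COUNT semantics shown are standard SQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY);
    -- note_id is a hypothetical surrogate key, not taken from the question
    CREATE TABLE customer_notes (note_id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1), (2), (3);
    -- customer 1 has two notes, customer 2 has one, customer 3 has none
    INSERT INTO customer_notes (customer_id) VALUES (1), (1), (2);
""")

rows = conn.execute("""
    SELECT c.customer_id,
           COUNT(n.customer_id) AS num_notes,  -- counts only matched (non-NULL) rows
           COUNT(*)             AS num_rows    -- also counts the NULL-extended row
    FROM customers AS c
    LEFT JOIN customer_notes AS n ON c.customer_id = n.customer_id
    GROUP BY c.customer_id
    ORDER BY c.customer_id
""").fetchall()

for row in rows:
    print(row)
```

Customer 3 comes back with num_notes = 0 but num_rows = 1, which is why COUNT(*) cannot be used for the has-notes flag.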

I think EXISTS suits your situation better than JOIN or IN.
SELECT
IF (EXISTS (
SELECT *
FROM customer_notes cn
WHERE c.customer_id = cn.customer_id),
1, 0) AS filter_notes
FROM customers c
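A small sketch of the EXISTS approach, run against in-memory SQLite as a stand-in for MySQL (the sample data and the note_id column are assumptions; CASE is used instead of MySQL's IF() so the statement is portable). It also counts the rows of a plain LEFT JOIN without grouping or DISTINCT, to show that EXISTS keeps exactly one row per customer while the bare join repeats a customer once per note.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY);
    -- note_id is a hypothetical surrogate key, not taken from the question
    CREATE TABLE customer_notes (note_id INTEGER PRIMARY KEY, customer_id INTEGER);
    CREATE INDEX idx_notes_customer ON customer_notes (customer_id);
    INSERT INTO customers VALUES (1), (2), (3);
    INSERT INTO customer_notes (customer_id) VALUES (1), (1), (2);
""")

# One row per customer: EXISTS only asks whether at least one note is present.
exists_rows = conn.execute("""
    SELECT c.customer_id,
           CASE WHEN EXISTS (SELECT 1
                             FROM customer_notes cn
                             WHERE cn.customer_id = c.customer_id)
                THEN 1 ELSE 0 END AS has_notes
    FROM customers c
    ORDER BY c.customer_id
""").fetchall()

# A bare LEFT JOIN returns one row per (customer, note) pair instead.
join_row_count = conn.execute("""
    SELECT COUNT(*)
    FROM customers c
    LEFT JOIN customer_notes cn ON c.customer_id = cn.customer_id
""").fetchone()[0]

print(exists_rows)
print(join_row_count)
```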

Try this
SELECT DISTINCT
c.customer_id,
CASE WHEN cn.customer_id IS NOT NULL THEN 1
ELSE 0
END AS filter_notes
FROM customers c LEFT JOIN customer_notes cn
ON c.customer_id = cn.customer_id

MySQL: Optimizing Sub-queries

I have this query I need to optimize further since it requires too much cpu time and I can't seem to find any other way to write it more efficiently. Is there another way to write this without altering the tables?
SELECT category, b.fruit_name, u.name
, r.count_vote, r.text_c
FROM Fruits b, Customers u
, Categories c
, (SELECT * FROM
(SELECT *
FROM Reviews
ORDER BY fruit_id, count_vote DESC, r_id
) a
GROUP BY fruit_id
) r
WHERE b.fruit_id = r.fruit_id
AND u.customer_id = r.customer_id
AND category = "Fruits";
This is your query re-written with explicit joins:
SELECT
category, b.fruit_name, u.name, r.count_vote, r.text_c
FROM Fruits b
JOIN
(
SELECT * FROM
(
SELECT *
FROM Reviews
ORDER BY fruit_id, count_vote DESC, r_id
) a
GROUP BY fruit_id
) r on r.fruit_id = b.fruit_id
JOIN Customers u ON u.customer_id = r.customer_id
CROSS JOIN Categories c
WHERE c.category = 'Fruits';
(I am guessing here that the category column belongs to the categories table.)
There are some parts that look suspicious:
Why do you cross join the Categories table, when you don't even display a column of the table?
What is ORDER BY fruit_id, count_vote DESC, r_id supposed to do? Sub query results are considered unordered sets, so an ORDER BY is superfluous and can be ignored by the DBMS. What do you want to achieve here?
SELECT * FROM [ reviews ] GROUP BY fruit_id is invalid. If you group by fruit_id, which count_vote and which r.text_c do you expect to get for the ID? You don't tell the DBMS (that would be something like MAX(count_vote) and MIN(r.text_c), for instance). MySQL should throw an error, but when ONLY_FULL_GROUP_BY is not enabled it silently replaces count_vote, r.text_c by ANY_VALUE(count_vote), ANY_VALUE(r.text_c) instead. This means you get arbitrarily picked values for a fruit.
The answer hence to your question is: Don't try to speed it up, but fix it instead. (Maybe you want to place a new request showing the query and explaining what it is supposed to do, so people can help you with that.)
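If the intent behind the ORDER BY plus GROUP BY trick was "one row per fruit, holding its most-voted review" (an assumption; the question never states this), a deterministic way to say so is a greatest-per-group query. The sketch below uses NOT EXISTS with r_id as the tie-breaker, mirroring the original sort order, and runs on in-memory SQLite as a stand-in for MySQL. The sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Reviews (
        r_id INTEGER PRIMARY KEY,
        fruit_id INTEGER,
        customer_id INTEGER,
        count_vote INTEGER,
        text_c TEXT
    );
    -- invented sample data: fruit 10 has a tie on count_vote between r_id 2 and 3
    INSERT INTO Reviews VALUES
        (1, 10, 100, 5, 'ok'),
        (2, 10, 101, 9, 'great'),
        (3, 10, 102, 9, 'tied, but higher r_id'),
        (4, 20, 100, 2, 'meh');
""")

# Keep a review only if no other review of the same fruit beats it:
# more votes wins, and on equal votes the lower r_id wins.
top_reviews = conn.execute("""
    SELECT r.fruit_id, r.r_id, r.count_vote, r.text_c
    FROM Reviews r
    WHERE NOT EXISTS (
        SELECT 1
        FROM Reviews better
        WHERE better.fruit_id = r.fruit_id
          AND (better.count_vote > r.count_vote
               OR (better.count_vote = r.count_vote AND better.r_id < r.r_id))
    )
    ORDER BY r.fruit_id
""").fetchall()

for row in top_reviews:
    print(row)
```

Unlike GROUP BY without aggregates, this returns the same row on every run, because the tie-break is spelled out in the query itself.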
Your Categories table does not seem to be joined or related to the others; this produces a Cartesian product between all the rows.
If you want distinct results, use DISTINCT rather than GROUP BY so you can avoid an unnecessary subquery,
and you don't need an ORDER BY in a subquery.
SELECT category
, b.fruit_name
, u.name
, r.count_vote
, r.text_c
FROM Fruits b
INNER JOIN (
SELECT DISTINCT fruit_id, count_vote, text_c, customer_id
FROM Reviews
) r ON b.fruit_id = r.fruit_id
INNER JOIN Customers u ON u.customer_id = r.customer_id
INNER JOIN Categories c ON ?????? /* Your Categories table seems not joined/related to the others */
WHERE category = "Fruits";
For better readability, you should use explicit join syntax and avoid the old join syntax based on comma-separated table names and WHERE conditions.
The next time you want help optimizing a query, please include the table/index structure, an indication of the cardinality of the indexes and the EXPLAIN plan for the query.
There appears to be absolutely no reason for a single sub-query here, let alone 2. Using sub-queries mostly prevents the DBMS optimizer from doing its job. So your biggest win will come from eliminating these sub-queries.
The CROSS JOIN creates a deliberate Cartesian join. It's also unclear whether any attributes from this table are actually required for the result, whether it is there to produce multiples of the same row in the output, or whether it is just an error.
The attribute category in the last line of your query is not attributed to any of the tables (but I suspect it comes from the categories table).
Further, your code uses a GROUP BY clause with no aggregation function. This will produce non-deterministic results and is a bug. Assuming that you are not exploiting a side-effect of that, the query can be re-written as:
SELECT
category, b.fruit_name, u.name, r.count_vote, r.text_c
FROM Fruits b
JOIN Reviews r
ON r.fruit_id = b.fruit_id
JOIN Customers u ON u.customer_id = r.customer_id
ORDER BY r.fruit_id, count_vote DESC, r_id;
Since there are no predicates other than joins in your query, there is no scope for further optimization beyond ensuring there are indexes on the join predicates.
As all too frequently, the biggest benefit may come from simply asking the question of why you need to retrieve every single row in the tables in a single query.

subquery shows more than one row group by

I am trying to get the data for the best 5 customers in a railway reservation system. To get that, I tried getting the max value by summing up their fare every time they make a reservation. Here is the code.
SELECT c. firstName, c.lastName,MAX(r.totalFare) as Fare
FROM customer c, Reservation r, books b
WHERE r.resID = b.resID
AND c.username = b.username
AND r.totalfare < (SELECT sum(r1.totalfare) Revenue
from Reservation r1, for_res f1, customer c1,books b1
where r1.resID = f1.resID
and c1.username = b1.username
and r1.resID = b1.resID
group by c1.username
)
GROUP BY c.firstName, c.lastName, r.totalfare
ORDER BY r.totalfare desc
LIMIT 5;
This throws the error: [21000][1242] Subquery returns more than 1 row
If I remove the GROUP BY from the subquery, the result is (shown here in tabular form):
Jade,Smith,1450
Jade,Smith,725
Jade,Smith,25.5
Monica,Geller,20.1
Rach,Jones,10.53
But that's not what I want. As you can see, I want a single row for 'Jade' with her total fare.
I just don't see the point of the subquery. It seems like you can get the result you want with a sum():
select c.firstname, c.lastname, sum(totalfare) as totalfare
from customer c
inner join books b on b.username = c.username
inner join reservation r on r.resid = b.resid
group by c.username
order by totalfare desc
limit 5
This sums all reservations of each customer and uses that information to sort the result set. This guarantees one row per customer.
The query assumes that username is the primary key of table customer. If that's not the case, you need to add columns firstname and lastname to the group by clause.
Note that this uses standard joins (with the inner join ... on keywords) rather than old-school, implicit joins (with commas in the from clause); the latter are legacy syntax that should not be used in new code.
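To check the shape of the result, here is a small sketch of the SUM approach on in-memory SQLite (a stand-in for MySQL). The fares come from the sample output in the question; the usernames and the mapping of reservations to customers are assumptions made up for the example. The aggregate is aliased as total_spent (a hypothetical name) so that the ORDER BY cannot be confused with the totalFare column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (username TEXT PRIMARY KEY, firstName TEXT, lastName TEXT);
    CREATE TABLE reservation (resID INTEGER PRIMARY KEY, totalFare REAL);
    CREATE TABLE books (username TEXT, resID INTEGER);
    -- usernames and the reservation-to-customer mapping are invented
    INSERT INTO customer VALUES
        ('jade', 'Jade', 'Smith'), ('mon', 'Monica', 'Geller'), ('rach', 'Rach', 'Jones');
    INSERT INTO reservation VALUES (1, 1450), (2, 725), (3, 25.5), (4, 20.1), (5, 10.53);
    INSERT INTO books VALUES ('jade', 1), ('jade', 2), ('jade', 3), ('mon', 4), ('rach', 5);
""")

top_customers = conn.execute("""
    SELECT c.firstName, c.lastName, SUM(r.totalFare) AS total_spent
    FROM customer c
    INNER JOIN books b ON b.username = c.username
    INNER JOIN reservation r ON r.resID = b.resID
    GROUP BY c.username, c.firstName, c.lastName
    ORDER BY total_spent DESC
    LIMIT 5
""").fetchall()

for row in top_customers:
    print(row)
```

Jade now appears once, with her three fares added together, instead of three times.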

SQL retrieving filtered value in subquery

In this query, cust_id is a foreign key and ords returns the number of orders for every customer:
SELECT cust_name, (
SELECT COUNT(*)
FROM Orders
WHERE Orders.cust_id = Customers.cust_id
) AS ords
FROM Customers
The output is correct, but I want to filter it to retrieve only the customers with fewer than a given number of orders. I don't know how to filter on the subquery ords. I tried WHERE ords < 2 at the end of the code, but it doesn't work, and I tried adding AND COUNT(*)<2 after the cust_id comparison, but that doesn't work either. I am using MySQL.
Use the HAVING clause (and use a join instead of a subquery).....
SELECT Customers.cust_id, Customers.cust_name, COUNT(*) ords
FROM Orders, Customers
WHERE Orders.cust_id = Customers.cust_id
GROUP BY 1,2
HAVING COUNT(*)<2
If you want to include people with zero orders you change the join to an outer join.
There is no need for a correlated subquery here: it calculates the value separately for each row, which does not perform well. A better approach is to use a regular query with joins, GROUP BY, and a HAVING clause to apply your condition to groups.
Since your condition is to return only customers that have fewer than 2 orders, a left join instead of an inner join would be appropriate. It also returns customers that have no orders (with a count of 0), provided you count a column from orders rather than using count(*), which would count the NULL-extended row as 1.
select
cust_name, count(o.cust_id)
from
customers c
left join orders o on c.cust_id = o.cust_id
group by cust_name
having count(o.cust_id) < 2
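A minimal sketch of the LEFT JOIN plus HAVING filter, run on in-memory SQLite as a stand-in for MySQL (customer names, the order_id column, and the rows are invented for illustration). It groups by cust_id as well as cust_name, on the assumption that two customers could share a name, and counts o.cust_id so that customers without orders report 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (cust_id INTEGER PRIMARY KEY, cust_name TEXT);
    -- order_id is a hypothetical surrogate key, not taken from the question
    CREATE TABLE Orders (order_id INTEGER PRIMARY KEY, cust_id INTEGER);
    INSERT INTO Customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    -- Ann has two orders, Bob has one, Cy has none
    INSERT INTO Orders (cust_id) VALUES (1), (1), (2);
""")

few_orders = conn.execute("""
    SELECT c.cust_name, COUNT(o.cust_id) AS ords
    FROM Customers c
    LEFT JOIN Orders o ON o.cust_id = c.cust_id
    GROUP BY c.cust_id, c.cust_name
    HAVING COUNT(o.cust_id) < 2      -- filter on the aggregate, after grouping
    ORDER BY c.cust_id
""").fetchall()

print(few_orders)
```

WHERE cannot be used for this filter because it is evaluated before the groups and their counts exist; HAVING is evaluated afterwards.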

Slow aggregate query with join on same table

I have a query to show customers and the total dollar value of all their orders. The query takes about 100 seconds to execute.
I'm querying on an ExpressionEngine CMS database. ExpressionEngine uses one table exp_channel_data, for all content. Therefore, I have to join on that table for both customer and order data. I have about 14,000 customers, 30,000 orders and 160,000 total records in that table.
Can I change this query to speed it up?
SELECT link.author_id AS customer_id,
customers.field_id_122 AS company,
Sum(orders.field_id_22) AS total_orders
FROM exp_channel_data customers
JOIN exp_channel_titles link
ON link.author_id = customers.field_id_117
AND customers.channel_id = 7
JOIN exp_channel_data orders
ON orders.entry_id = link.entry_id
AND orders.channel_id = 3
GROUP BY customer_id
Thanks, and please let me know if I should include other information.
UPDATE SOLUTION
My apologies. I noticed that entry_id for the exp_channel_data table customers corresponds to author_id for the exp_channel_titles table. So I don't have to use field_id_117 in the join. field_id_117 duplicates entry_id, but in a TEXT field. Joining on that text field slowed things down. The query now takes 3 seconds.
However, the inner join solution posted by @DRapp takes 1.5 seconds. Here is his SQL with a minor edit:
SELECT
PQ.author_id CustomerID,
c.field_id_122 CompanyName,
PQ.totalOrders
FROM
( SELECT
t.author_id,
SUM( o.field_id_22 ) as totalOrders
FROM
exp_channel_data o
JOIN
exp_channel_titles t ON t.author_id = o.entry_id AND o.channel_id = 3
GROUP BY
t.author_id ) PQ
JOIN
exp_channel_data c ON PQ.author_id = c.entry_id AND c.channel_id = 7
ORDER BY CustomerID
If this is the same table, then the same columns are available across the board for all alias instances.
I would ensure an index on (channel_id, entry_id, field_id_117), if possible, and another index on (author_id) for the prequery of order totals.
Then, start with what will become an inner query that does nothing but a per-customer sum of order amounts. Since the join uses author_id as the customer ID, just query and sum that first. I don't completely understand the (in my view) poor design of the structure or what channel_id really indicates, but you don't want to duplicate summation values because of the other things in the mix.
select
o.author_id,
sum( o.field_id_22 ) as totalOrders
FROM
exp_channel_data o
where
o.channel_id = 3
group by
o.author_id
If that is correct on the per customer (via author_id column), then that can be wrapped as follows
select
PQ.author_id CustomerID,
c.field_id_122 CompanyName,
PQ.totalOrders
from
( select
o.author_id,
sum( o.field_id_22 ) as totalOrders
FROM
exp_channel_data o
where
o.channel_id = 3
group by
o.author_id ) PQ
JOIN exp_channel_data c
on PQ.author_id = c.field_id_117
AND c.channel_id = 7
Can you post the results of an EXPLAIN query?
I'm guessing that your tables are not indexed well for this operation. All of the columns that you join on should probably be indexed. As a first guess, I'd look at indexing exp_channel_data.field_id_117.
Try something like this. You may have an error in your joins, so check that the join columns are correct for your database. If the join conditions are wrong, you can end up with an accidental cross join, which takes a long time on large data.
select
link.author_id as customer_id,
customers.field_id_122 as company,
sum(orders.field_id_22) as total_or_orders
from exp_channel_data customers
join exp_channel_titles link on (link.author_id = customers.field_id_117 and
customers.channel_id = 7)
join exp_channel_data orders on (orders.entry_id = link.entry_id and orders.channel_id = 3)
group by customer_id

MySQL query - 'CAST' ' CONCAT' - How to multiply several rows of data by a certain amount and display their individual totals in a new column in £'s?

What's the best way to query a total price?
I want to multiply several rows of data by a certain amount and display their individual totals in a new column in £'s in my database?
What syntax do I need?
Edit:
I have loads of customers. Some have only one order, some have multiple orders. I want to start charging them £1.50 per order, so I need to multiply the number of orders by £1.50 and display the result in a new column in £. E.g. a customer with 4 orders would be 4 x £1.50, which would display £6.00 in column 3, and so on. The first column is the name, the second column is the number of orders, and the third column needs to be the total price. Hope that makes sense.
Update from comments:
It's counted the orders, however it's returning BLOB values in the 3rd column where I want to display £ values for the * calculation of orders:
SELECT CONCAT_WS(" "
, c.customer_title
, c.customer_fname
, c.customer_sname
) AS Customer
, COUNT(O.order_name) AS Ordertotal
, concat('£' * 1.5) TotalPrice
FROM Order O, Friend F, Customer C, FriendOrder
WHERE C.customer_id = F.Customer_id
AND F.Friend_id = FriendOrder.friend_id
AND O.order_id = FriendOrder.order_id
GROUP BY Customer
ORDER BY C.customer_sname, C.customer_fname
It is a best practice to separate tasks, leaving computation to the SQL programming, and presentation to whatever programming language you use for the front end.
So your SQL should use its native * operator. Your query might look like:
SELECT `column_1` * `column_2` as `product`;
This would return the product of two columns in a column named 'product'.
The £ sign is formatting. You should leave that to whatever architecture you have written for presenting the information (PHP or Java, for example).
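As a sketch of that separation, assuming a Python front end (the function and constant names here are hypothetical), the query would return only the numeric order count and the presentation layer would apply the per-order price and the £ sign. Decimal is used so the currency arithmetic stays exact.

```python
from decimal import Decimal

# The £1.50 per-order charge from the question; the constant name is made up.
PRICE_PER_ORDER = Decimal("1.50")

def format_total(order_count: int) -> str:
    """Turn an order count (as returned by COUNT(*) in SQL) into a display string."""
    total = order_count * PRICE_PER_ORDER
    return f"£{total:.2f}"

print(format_total(4))
```

With this split, changing the currency symbol or the number of decimals never requires touching the query.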
Applying a lot of imagination because of the lack of description of your data and fields, this should do the trick:
select c.name, count(*) orderAmount, concat('£', count(*) * 1.5) totalPrice
from customers c
join orders o on c.customerId = o.customerId
group by c.customerId, c.name
You shouldn't add the answer to the original question, since this makes finding out the question confusing.
It looks like everything was answered except the blob part -- here is the final result:
SELECT
CONCAT_WS(
" ",
c.customer_title,
c.customer_fname,
c.customer_sname
) AS Customer,
COUNT(*) AS Ordertotal,
CONCAT('£', cast(count(*) * 1.5 as char)) AS TotalPrice
FROM `Order` O
INNER JOIN FriendOrder fo
ON O.order_id = fo.order_id
INNER JOIN Friend F
ON fo.friend_id = F.Friend_id
INNER JOIN Customer C
ON F.Customer_id = C.customer_id
GROUP BY Customer
ORDER BY C.customer_sname, C.customer_fname
To avoid the 'blobs', cast to char since you are creating a display string. Here is the snippet from the query:
cast(count(*) * 1.5 as char)