MySQL Subquery and count number of rows containing a value

I have two tables oc_order o and oc_order_total ot
oc_order o has fields o.customer_id, o.date_added, o.email,o.total
And oc_order_total ot has fields ot.code and ot.value
I want to show results only if a customer orders more than 3 times, that is, if customer_id repeats thrice or more in the result, and show ot.value where ot.code = 'shipping'.
I am trying to do following
SELECT COUNT(o.customer_id) AS 'Orders Count', o.date_added, o.email,o.total, ot.value
FROM oc_order o
Inner join oc_order_total ot ON ot.order_id = o.order_id
WHERE count(o.customer_id) > 3 AND ot.value = (select value from oc_order_total where code = 'shipping' )
GROUP BY o.customer_id
I am getting an "Invalid use of group function" error, and I think I am not using the subquery correctly in the WHERE clause.

You cannot use SUM/COUNT in a WHERE clause; you need the HAVING clause for this. Try this query:
SELECT COUNT(o.customer_id) AS 'Orders Count', o.date_added, o.email,o.total, ot.value
FROM oc_order o
INNER JOIN oc_order_total ot ON ot.order_id = o.order_id
WHERE ot.code = 'shipping'
GROUP BY o.customer_id
HAVING count(o.customer_id) > 3
EDIT: Adding example with ot.value not affected by GROUP BY.
SELECT o.order_id, q.orders_count AS 'Orders Count', o.date_added, o.email,o.total, ot.value
FROM oc_order o
INNER JOIN oc_order_total ot ON ot.order_id = o.order_id
INNER JOIN (SELECT o.customer_id, COUNT(o.customer_id) AS orders_count
FROM oc_order o
INNER JOIN oc_order_total ot ON ot.order_id = o.order_id
WHERE ot.code = 'shipping'
GROUP BY o.customer_id
HAVING count(o.customer_id) > 3) AS q ON q.customer_id = o.customer_id
WHERE ot.code = 'shipping'
Basically, what happens here is that you pre-filter customers with the previous query and then use this pre-filtered list to get the individual orders for those customers that meet your criteria. On these individual order_id rows you can perform any operation without grouping, because you have already eliminated the customers that don't meet your needs. Hope it helps.
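The pre-filter-then-join pattern can be tried without a MySQL server. This is a minimal sketch that uses Python's standard sqlite3 module as a stand-in for MySQL, with made-up rows. Unlike the answer, the derived table counts orders straight from oc_order, so each order is counted once regardless of how many oc_order_total rows it has.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """In-memory stand-in for oc_order / oc_order_total with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE oc_order (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                               email TEXT, total REAL);
        CREATE TABLE oc_order_total (order_id INTEGER, code TEXT, value REAL);
    """)
    # Customer 1 places 4 orders, customer 2 places 2.
    orders = [(1, 1, "a@example.com", 10), (2, 1, "a@example.com", 20),
              (3, 1, "a@example.com", 30), (4, 1, "a@example.com", 40),
              (5, 2, "b@example.com", 15), (6, 2, "b@example.com", 25)]
    conn.executemany("INSERT INTO oc_order VALUES (?, ?, ?, ?)", orders)
    for order_id, *_ in orders:
        # Each order gets a shipping line and a sub_total line.
        conn.execute("INSERT INTO oc_order_total VALUES (?, 'shipping', 5)", (order_id,))
        conn.execute("INSERT INTO oc_order_total VALUES (?, 'sub_total', 1)", (order_id,))
    return conn


PREFILTER_SQL = """
SELECT o.order_id, q.orders_count, o.email, o.total, ot.value
FROM oc_order o
JOIN oc_order_total ot ON ot.order_id = o.order_id
JOIN (SELECT customer_id, COUNT(*) AS orders_count   -- step 1: pre-filter customers
      FROM oc_order
      GROUP BY customer_id
      HAVING COUNT(*) > 3) AS q ON q.customer_id = o.customer_id
WHERE ot.code = 'shipping'                            -- step 2: their individual orders
ORDER BY o.order_id
"""


def frequent_customer_orders(conn: sqlite3.Connection) -> list:
    return conn.execute(PREFILTER_SQL).fetchall()


for row in frequent_customer_orders(build_demo_db()):
    print(row)
```

Only customer 1's four orders come back, each carrying its own shipping value and the customer's order count.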

You are getting the error because you can't use an aggregate function like COUNT() in a WHERE condition; it's only allowed in the HAVING clause. In addition, your GROUP BY doesn't contain all the non-aggregated columns listed in the SELECT list. You can modify your query like this:
SELECT o.date_added
, o.email
, o.total
, ot.value
FROM oc_order o
JOIN oc_order_total ot
ON ot.order_id = o.order_id
JOIN
( SELECT COUNT(customer_id) AS `Orders Count`
, customer_id
FROM oc_order
GROUP BY customer_id
) xxx
ON xxx.customer_id = o.customer_id
WHERE xxx.`Orders Count` > 3
AND ot.code = 'shipping';

For one thing, aggregates cannot be referenced in the WHERE clause.
The predicates in the WHERE clause are evaluated when the rows are accessed. At the time the rows are being retrieved, MySQL doesn't have any information about the value returned by aggregate functions (e.g. COUNT()).
The COUNT() aggregate will be evaluated after the rows are accessed, and after the GROUP BY operation.
An aggregate can be referenced in a HAVING clause, following the GROUP BY clause.
GROUP BY ...
HAVING COUNT(...) > 3
Note that the HAVING clause is evaluated after the GROUP BY operation, once the values of the aggregate expressions have been computed. That's very different from the WHERE clause.
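The WHERE/HAVING difference is easy to see on a toy table. This is a minimal sketch using Python's sqlite3 as a stand-in for MySQL: SQLite also rejects an aggregate in WHERE, although its error text differs from MySQL's "Invalid use of group function". The data is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE oc_order (order_id INTEGER PRIMARY KEY, customer_id INTEGER)")
# Made-up data: customer 1 has 4 orders, customer 2 has 2.
conn.executemany("INSERT INTO oc_order (customer_id) VALUES (?)",
                 [(1,), (1,), (1,), (1,), (2,), (2,)])


def where_with_aggregate_fails() -> bool:
    """WHERE runs per row, before grouping, so COUNT() cannot appear there."""
    try:
        conn.execute("SELECT customer_id FROM oc_order "
                     "WHERE COUNT(*) > 3 GROUP BY customer_id").fetchall()
    except sqlite3.OperationalError:
        return True
    return False


def customers_with_more_than(n: int) -> list:
    """HAVING runs after GROUP BY, once the COUNT() values exist."""
    return conn.execute("SELECT customer_id, COUNT(*) FROM oc_order "
                        "GROUP BY customer_id HAVING COUNT(*) > ?", (n,)).fetchall()


print(where_with_aggregate_fails())  # True
print(customers_with_more_than(3))   # [(1, 4)]
```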
Also, the subquery is a bit odd. Because it's referenced in an equality comparison, the subquery must return at most one row. We don't see anything in the query that would guarantee that, unless code is guaranteed to be unique in oc_order_total.
If code is unique, then we wouldn't need a subquery at all, we could just test for code.
WHERE ot.code = 'shipping'
If there are multiple rows with "shipping" in the code column, we don't see any guarantee that the value column on those rows will be the same. To test for any of the possible values, we could use an IN operator instead of the equality (scalar) comparison. e.g.
WHERE ot.value IN ( SELECT v.value
FROM oc_order_total v
WHERE v.code = 'shipping'
)
But that still looks really odd. What's strange is that it's using the same table as the outer query. If we are using a subquery to lookup the set of values related to a string code, that's usually a separate lookup table. And we'd normally prefer a JOIN operation rather than an IN (subquery). Very strange.
Also, the non-aggregate expressions in the SELECT list
o.date_added, o.email, o.total, ot.value
do not appear in the GROUP BY clause. Most relational databases will throw an error like "non-aggregate in select not in group by", or something of that ilk.
A MySQL extension will allow the query to run, but the values returned for the non-aggregates are indeterminate. MySQL will return values for those expressions from some row included in the collapsed set, but there's no guarantee which row that will be.
We can also get MySQL to throw an error like other databases do, by including ONLY_FULL_GROUP_BY in the sql_mode variable; it is part of the default sql_mode as of MySQL 5.7.5.

Related

MySQL aliasing not working when subjoining

I have two main tables: orders and PayPal transactions. I'm trying to get only the distinct values from my PayPal transactions table. Since there is no unique identifier in my transactions table I have tried to use a subquery to retrieve them.
The problem with my query is that MySQL doesn't recognize my aliases. Therefore, it gives me an Unknown column error.
/* SQL Error (1054): Unknown column 'pp.Date' in 'field list' */
SELECT
pp.Date
FROM hub.orders o
LEFT JOIN
(SELECT p.transaction_of_interest AS ppID
FROM financial.paypal AS p
GROUP BY p.transaction_of_interest
) AS pp ON pp.ppID = o.ex_trans_id
You are not selecting the DATE column in the PP subquery. If you include that column, it will work as you expect. If your result set is multiplied because the TRANSACTION_OF_INTEREST values are not distinct, use an aggregate function such as MAX on P.DATE so that each TRANSACTION_OF_INTEREST value yields a single row.
Which PP.DATE values do you need? Is there any condition, like the last date or something?
SELECT PP.DATE
FROM HUB.ORDERS O
LEFT JOIN (SELECT P.TRANSACTION_OF_INTEREST AS PPID,P.DATE
FROM FINANCIAL.PAYPAL AS P
GROUP BY P.TRANSACTION_OF_INTEREST,P.DATE
) AS PP ON PP.PPID = O.EX_TRANS_ID
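To see why the MAX() suggestion matters: grouping the derived table by both the transaction and its date still yields one row per distinct date, while MAX(Date) yields one row per transaction. This is a minimal sketch in Python's sqlite3 as a stand-in for MySQL; the hub/financial schema prefixes are dropped and the data is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, ex_trans_id TEXT);
    CREATE TABLE paypal (transaction_of_interest TEXT, Date TEXT);
    INSERT INTO orders (ex_trans_id) VALUES ('T1');
    -- the same transaction appears twice with different dates
    INSERT INTO paypal VALUES ('T1', '2020-01-01'), ('T1', '2020-01-05');
""")

# Grouping by both columns still yields one row per (transaction, date) pair.
GROUP_BY_BOTH = """
SELECT pp.Date FROM orders o
LEFT JOIN (SELECT transaction_of_interest AS ppID, Date
           FROM paypal
           GROUP BY transaction_of_interest, Date) AS pp ON pp.ppID = o.ex_trans_id
ORDER BY pp.Date
"""

# MAX(Date) collapses each transaction to a single (latest) date.
LATEST_ONLY = """
SELECT pp.Date FROM orders o
LEFT JOIN (SELECT transaction_of_interest AS ppID, MAX(Date) AS Date
           FROM paypal
           GROUP BY transaction_of_interest) AS pp ON pp.ppID = o.ex_trans_id
"""

print(conn.execute(GROUP_BY_BOTH).fetchall())  # [('2020-01-01',), ('2020-01-05',)]
print(conn.execute(LATEST_ONLY).fetchall())    # [('2020-01-05',)]
```

MAX works on these text dates because ISO-formatted strings sort in date order; MIN would pick the earliest date instead.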
You can only refer to those fields via the derived table's alias that you included in the select list for the derived table. Since you did not include the date field in the select list, you cannot reference it.
You need to add the Date field to the select list in the subquery and to the GROUP BY clause as well.
SELECT
pp.Date
FROM hub.orders o
LEFT JOIN
(SELECT p.transaction_of_interest AS ppID, p.Date
FROM financial.paypal AS p
GROUP BY p.transaction_of_interest, p.Date
) AS pp ON pp.ppID = o.ex_trans_id
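A derived table only exposes the columns in its own select list, which is the cause of the "Unknown column" error. This is a minimal sketch in Python's sqlite3 as a stand-in for MySQL (SQLite reports "no such column" rather than error 1054); the schema prefixes are dropped and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, ex_trans_id TEXT);
    CREATE TABLE paypal (transaction_of_interest TEXT, Date TEXT);
    INSERT INTO orders (ex_trans_id) VALUES ('T1'), ('T2');
    INSERT INTO paypal VALUES ('T1', '2020-01-01'), ('T1', '2020-01-01'),
                              ('T2', '2020-02-01');
""")

# The derived table only exposes ppID, so pp.Date does not exist outside it.
WITHOUT_DATE = """
SELECT pp.Date FROM orders o
LEFT JOIN (SELECT p.transaction_of_interest AS ppID
           FROM paypal AS p
           GROUP BY p.transaction_of_interest) AS pp ON pp.ppID = o.ex_trans_id
"""

# Adding Date to the select list (and to GROUP BY) makes pp.Date visible.
WITH_DATE = """
SELECT pp.Date FROM orders o
LEFT JOIN (SELECT p.transaction_of_interest AS ppID, p.Date
           FROM paypal AS p
           GROUP BY p.transaction_of_interest, p.Date) AS pp ON pp.ppID = o.ex_trans_id
ORDER BY o.id
"""


def try_query(sql: str):
    """Return the rows, or an 'error: ...' string if the query is rejected."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError as exc:
        return f"error: {exc}"


print(try_query(WITHOUT_DATE))  # an "error: ..." string
print(try_query(WITH_DATE))     # [('2020-01-01',), ('2020-02-01',)]
```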
My best interpretation of your question is that you want distinct dates of PayPal transactions.
If you only want dates from the paypal table, doesn't this do what you want?
SELECT DISTINCT p.DATE
FROM financial.paypal p;
If the dates come from the orders table, but you only want them for PayPal transactions, then LEFT JOIN is not appropriate:
SELECT DISTINCT o.Date
FROM hub.orders o JOIN
financial.paypal p
ON p.transaction_of_interest = o.ex_trans_id

select DISTINCT email based off of matching ID across tables

I'm trying to pull unique emails that have matching IDs across two tables.
SELECT line_items.order_id, line_items.id, orders.email, orders.name
FROM orders INNER JOIN
line_items
ON orders.id = line_items.order_id
WHERE line_items.order_id IN (SELECT DISTINCT email FROM orders WHERE status = 0 AND created_at BETWEEN '2018-01-10' AND NOW() )
LIMIT 50;
I know my error is based upon the fact that line_items.order_id is an INT, and therefore the IN operator is looking for another INT column to match against. However, I'm not sure how to modify this to pull the proper results. Any and all help is greatly appreciated.
I'm trying to pull unique emails that have matching IDs across two tables.
If you mean distinct emails, then your subquery would appear to do this:
SELECT DISTINCT o.email
FROM orders o
WHERE o.status = 0 AND o.created_at BETWEEN '2018-01-10' AND NOW();
Because an order should have at least one line item, I don't see why that table is necessary.
Your query comes close to answering the question: "Pull all orders for emails that have made a recent order with status 0". If that is what you want:
SELECT li.order_id, li.id, o.email, o.name
FROM orders o INNER JOIN
line_items li
ON o.id = li.order_id
WHERE o.email IN (SELECT o2.email FROM orders o2 WHERE o2.status = 0 AND o2.created_at BETWEEN '2018-01-10' AND NOW() )
LIMIT 50;
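The key change is comparing email with email instead of order_id with email. This is a minimal sketch of that query in Python's sqlite3 as a stand-in for MySQL; date('now') replaces MySQL's NOW(), and the orders and line items are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, email TEXT, name TEXT,
                         status INTEGER, created_at TEXT);
    CREATE TABLE line_items (id INTEGER PRIMARY KEY, order_id INTEGER);
    INSERT INTO orders VALUES
        (1, 'a@example.com', '#1001', 0, '2018-02-01'),  -- recent, status 0
        (2, 'a@example.com', '#1002', 1, '2017-05-01'),  -- older order, same email
        (3, 'b@example.com', '#1003', 1, '2018-03-01');  -- never status 0
    INSERT INTO line_items (order_id) VALUES (1), (2), (2), (3);
""")

# email IN (set of emails): the types on both sides of IN now match.
SQL = """
SELECT li.order_id, li.id, o.email, o.name
FROM orders o
INNER JOIN line_items li ON o.id = li.order_id
WHERE o.email IN (SELECT o2.email FROM orders o2
                  WHERE o2.status = 0
                    AND o2.created_at BETWEEN '2018-01-10' AND date('now'))
ORDER BY li.id
LIMIT 50
"""

for row in conn.execute(SQL):
    print(row)
```

All line items of every order placed by a@example.com come back, including the older order, because the filter is on the email rather than on the individual order.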
Hard to follow but I think you want:
SELECT sub.order_id, sub.line_item_id, sub.email, o.name
FROM
(SELECT o.email, MIN(o.id) AS order_id, MIN(i.id) AS line_item_id
FROM orders o
INNER JOIN line_items i
ON o.id = i.order_id
WHERE o.status = 0
AND o.created_at BETWEEN '2018-01-10' AND NOW()
GROUP BY o.email) sub
LEFT JOIN orders o
ON sub.order_id = o.id
In the subquery, select each email along with the first order ID and line item ID. Then join this back to orders to pull the order name. This does assume that the MIN(i.id) line item belongs to the MIN(o.id) order for each email, so you'll have to let me know if that is not a valid assumption.
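The assumption matters because two independent MIN() calls can pick values from different rows. This is a minimal sketch in Python's sqlite3 (standing in for MySQL, with made-up ids and the status/date filters omitted) showing the mismatch, plus one way to pair them: take the first order per email, then the first line item of that order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE line_items (id INTEGER PRIMARY KEY, order_id INTEGER);
    INSERT INTO orders VALUES (1, 'a@example.com'), (2, 'a@example.com');
    -- line item ids do not follow order ids: item 3 belongs to order 2
    INSERT INTO line_items VALUES (5, 1), (3, 2);
""")

# Independent MINs: order 1, but line item 3 (which belongs to order 2).
NAIVE = """
SELECT o.email, MIN(o.id) AS order_id, MIN(i.id) AS line_item_id
FROM orders o JOIN line_items i ON o.id = i.order_id
GROUP BY o.email
"""

# First order per email, then the first line item of that same order.
PAIRED = """
SELECT f.email, f.order_id, MIN(i.id) AS line_item_id
FROM (SELECT o.email, MIN(o.id) AS order_id
      FROM orders o JOIN line_items i ON o.id = i.order_id
      GROUP BY o.email) AS f
JOIN line_items i ON i.order_id = f.order_id
GROUP BY f.email, f.order_id
"""

print(conn.execute(NAIVE).fetchall())   # [('a@example.com', 1, 3)]
print(conn.execute(PAIRED).fetchall())  # [('a@example.com', 1, 5)]
```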

SQL retrieving filtered value in subquery

In this query, cust_id is a foreign key, and ords returns the number of orders for every customer:
SELECT cust_name, (
SELECT COUNT(*)
FROM Orders
WHERE Orders.cust_id = Customers.cust_id
) AS ords
FROM Customers
The output is correct, but I want to filter it to retrieve only the customers with fewer than a given number of orders. I don't know how to filter on the subquery column ords: I tried WHERE ords < 2 at the end of the query, but it doesn't work, and I've tried adding AND COUNT(*) < 2 after the cust_id comparison, but that doesn't work either. I am using MySQL.
Use the HAVING clause (and use a join instead of a subquery):
SELECT Customers.cust_id, Customers.cust_name, COUNT(*) ords
FROM Orders, Customers
WHERE Orders.cust_id = Customers.cust_id
GROUP BY 1,2
HAVING COUNT(*)<2
If you want to include people with zero orders, change the join to an outer join and count Orders.cust_id instead of *, so that customers without orders are counted as 0 rather than 1.
There is no need for a correlated subquery here: it calculates the value separately for each row, which tends to perform poorly. A better approach is a regular query with joins, GROUP BY, and a HAVING clause that applies your condition to the groups.
Since your condition is to return only customers that have fewer than 2 orders, a LEFT JOIN rather than an INNER JOIN is appropriate. It returns customers that have no orders as well, with a count of 0 as long as you count a column from orders rather than *.
select
c.cust_name, count(o.cust_id) as ords
from
customers c
left join orders o on c.cust_id = o.cust_id
group by c.cust_id, c.cust_name
having count(o.cust_id) < 2
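The COUNT(*) versus COUNT(column) distinction after a LEFT JOIN can be checked directly. This is a minimal sketch in Python's sqlite3 as a stand-in for MySQL, with made-up customers; order_counts is a hypothetical helper, and its f-string is only ever fed the two fixed expressions shown, never user input.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (cust_id INTEGER PRIMARY KEY, cust_name TEXT);
    CREATE TABLE Orders (order_id INTEGER PRIMARY KEY, cust_id INTEGER);
    INSERT INTO Customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO Orders (cust_id) VALUES (1), (1), (1), (2);  -- Cy has no orders
""")


def order_counts(count_expr: str, max_orders: int) -> list:
    """Customers with fewer than max_orders orders, counted with count_expr."""
    sql = f"""
        SELECT c.cust_name, COUNT({count_expr}) AS ords
        FROM Customers c
        LEFT JOIN Orders o ON c.cust_id = o.cust_id
        GROUP BY c.cust_id, c.cust_name
        HAVING COUNT({count_expr}) < ?
        ORDER BY c.cust_id
    """
    return conn.execute(sql, (max_orders,)).fetchall()


# COUNT(*) counts the all-NULL row produced by the outer join for Cy.
print(order_counts("*", 2))          # [('Bob', 1), ('Cy', 1)]
# COUNT(o.cust_id) ignores NULLs, so Cy correctly shows 0.
print(order_counts("o.cust_id", 2))  # [('Bob', 1), ('Cy', 0)]
```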

Optimize SQL: Customers that haven't ordered for x days

I have created this SQL in order to find customers that haven't ordered for X days.
It is returning a result set, so this post is mainly just to get a second opinion on it, and possible optimizations.
SELECT o.order_id,
o.order_status,
o.order_created,
o.user_id,
i.identity_firstname,
i.identity_email,
(SELECT COUNT(*)
FROM orders o2
WHERE o2.user_id=o.user_id
AND o2.order_status=1) AS order_count,
(SELECT o4.order_created
FROM orders o4
WHERE o4.user_id=o.user_id
AND o4.order_status=1
ORDER BY o4.order_created DESC LIMIT 1) AS last_order
FROM orders o
INNER JOIN user_identities ui ON o.user_id=ui.user_id
INNER JOIN identities i ON ui.identity_id=i.identity_id
AND i.identity_email!=''
INNER JOIN subscribers s ON i.identity_id=s.identity_id
AND s.subscriber_status=1
AND s.subsriber_type=e
AND s.subscription_id=1
WHERE DATE(o.order_created) = "2013-12-14"
AND o.order_status=1
AND o.user_id NOT IN
(SELECT o3.user_id
FROM orders o3
WHERE o3.user_id=o.user_id
AND o3.order_status=1
AND DATE(o3.order_created) > "2013-12-14")
Can you guys find any potential problems with this SQL? Dates are dynamically inserted.
The final SQL that I put in production, will basically only include o.order_id, i.identity_id and o.order_count - this order_count will need to be correct. The other selected fields and 'last_order' subquery will not be included, it's only for testing.
This should give me a list of users whose last order was on that particular day and who are newsletter subscribers. I am particularly in doubt about the correctness of the NOT IN part in the WHERE clause, and of the order_count subquery.
There are several problems:
A. Using functions on indexable columns
You are searching for orders by comparing DATE(order_created) with some constant. This is a terrible idea, because a) the DATE() function is executed for every row (CPU) and b) the database can't use an index on the column (assuming one existed)
B. Using WHERE ID NOT IN (...)
Using a NOT IN (...) is almost always a bad idea, because optimizers usually have trouble with this construct, and often get the plan wrong. You can almost always express it as an outer join with a WHERE condition that filters for misses using an IS NULL condition for a joined column (and adds the side benefit of not needing DISTINCT, because there's only ever one miss returned)
C. Leaving joins that filter out large portions of rows until too late
The earlier you can mask off rows by not making joins, the better. You can do this by joining the tables least likely to match earlier in the joined table list, and by putting non-key conditions into the join rather than the WHERE clause, so that rows are excluded as early as possible. Some optimizers do this anyway, but I've often found they don't.
D. Avoid correlated subqueries like the plague!
You have several correlated subqueries: ones that are executed for every row of the main table. That's really an incredibly bad idea. Again, sometimes the optimizer can turn them into a join, but why rely (hope) on that? Most correlated subqueries can be expressed as a join; your examples are no exception.
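Point D can be sketched on a toy table: the correlated count and a derived-table join return the same numbers. This is a minimal sketch in Python's sqlite3 as a stand-in for MySQL, with made-up rows. Note the COALESCE: a user with no status-1 orders gets NULL from the LEFT JOIN, whereas the correlated COUNT(*) gives 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id INTEGER,
                         order_status INTEGER);
    INSERT INTO orders (user_id, order_status) VALUES (1, 1), (1, 1), (1, 0), (2, 0);
""")

# Correlated: the inner SELECT is logically re-run for every row of o.
CORRELATED = """
SELECT o.order_id,
       (SELECT COUNT(*) FROM orders o2
        WHERE o2.user_id = o.user_id AND o2.order_status = 1) AS order_count
FROM orders o
ORDER BY o.order_id
"""

# Join: count once per user in a derived table, then join it back.
# COALESCE turns "no matching group" (NULL) into the 0 that COUNT(*) gives above.
JOINED = """
SELECT o.order_id, COALESCE(s.order_count, 0) AS order_count
FROM orders o
LEFT JOIN (SELECT user_id, COUNT(*) AS order_count
           FROM orders
           WHERE order_status = 1
           GROUP BY user_id) AS s ON s.user_id = o.user_id
ORDER BY o.order_id
"""

correlated = conn.execute(CORRELATED).fetchall()
joined = conn.execute(JOINED).fetchall()
print(correlated)            # [(1, 2), (2, 2), (3, 2), (4, 0)]
print(joined == correlated)  # True
```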
With the above in mind, there are some specific changes:
o2 and o4 are the same join, so o4 may be dispensed with entirely - just use o2 after conversion to a join
DATE(order_created) = "2013-12-14" should be written as order_created between "2013-12-14 00:00:00" and "2013-12-14 23:59:59"
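The index argument from point A can be observed with SQLite's EXPLAIN QUERY PLAN. This is a minimal sketch in Python's sqlite3, standing in for MySQL (MySQL's EXPLAIN output looks different, but the principle that a function on the column blocks a range lookup is the same); the table, index name and rows are made up. A half-open range (>= '2013-12-14' AND < '2013-12-15') is an alternative that also covers fractional seconds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_created TEXT);
    CREATE INDEX idx_orders_created ON orders (order_created);
    INSERT INTO orders (order_created) VALUES
        ('2013-12-13 23:59:59'), ('2013-12-14 00:00:00'),
        ('2013-12-14 18:30:00'), ('2013-12-15 00:00:00');
""")

WITH_FUNCTION = "SELECT order_id FROM orders WHERE DATE(order_created) = '2013-12-14'"
WITH_RANGE = ("SELECT order_id FROM orders WHERE order_created "
              "BETWEEN '2013-12-14 00:00:00' AND '2013-12-14 23:59:59'")


def plan(sql: str) -> str:
    # Each EXPLAIN QUERY PLAN row has four columns; the last one is the detail text.
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


def ids(sql: str) -> list:
    return [r[0] for r in conn.execute(sql + " ORDER BY order_id")]


print(ids(WITH_FUNCTION), ids(WITH_RANGE))  # [2, 3] [2, 3]
print(plan(WITH_FUNCTION))  # a SCAN: DATE() is evaluated for every row
print(plan(WITH_RANGE))     # a SEARCH using idx_orders_created
```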
This query should be what you want:
SELECT
o.order_id,
o.order_status,
o.order_created,
o.user_id,
i.identity_firstname,
i.identity_email,
count(o2.user_id) AS order_count,
max(o2.order_created) AS last_order
FROM orders o
LEFT JOIN orders o2 ON o2.user_id = o.user_id AND o2.order_status=1
LEFT JOIN orders o3 ON o3.user_id = o.user_id
AND o3.order_status=1
AND o3.order_created >= "2013-12-15 00:00:00"
JOIN user_identities ui ON o.user_id=ui.user_id
JOIN identities i ON ui.identity_id=i.identity_id AND i.identity_email != ''
JOIN subscribers s ON i.identity_id=s.identity_id
AND s.subscriber_status=1
AND s.subsriber_type=e
AND s.subscription_id=1
WHERE o.order_created between "2013-12-14 00:00:00" and "2013-12-14 23:59:59"
AND o.order_status=1
AND o3.order_created IS NULL -- This gets only missed joins on o3
GROUP BY
o.order_id,
o.order_status,
o.order_created,
o.user_id,
i.identity_firstname,
i.identity_email;
The LEFT JOIN to o3 combined with the o3.order_created IS NULL condition is how you achieve the same as NOT IN (...): only rows of o with no later matching order survive.
Disclaimer: Not tested.
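The LEFT JOIN ... IS NULL anti-join can be compared with NOT IN on a toy table. This is a minimal sketch in Python's sqlite3 as a stand-in for MySQL, with made-up orders. It also shows a general NOT IN caveat: when an uncorrelated NOT IN subquery returns a NULL, the predicate is never true and no rows come back, while the anti-join is unaffected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id INTEGER,
                         order_created TEXT);
    INSERT INTO orders (user_id, order_created) VALUES
        (1, '2013-12-14 09:00:00'),                              -- last order on the 14th
        (2, '2013-12-14 10:00:00'), (2, '2013-12-16 08:00:00');  -- ordered again later
""")

DAY = "o.order_created BETWEEN '2013-12-14 00:00:00' AND '2013-12-14 23:59:59'"

NOT_IN = f"""
SELECT o.user_id FROM orders o
WHERE {DAY}
  AND o.user_id NOT IN (SELECT o3.user_id FROM orders o3
                        WHERE o3.order_created >= '2013-12-15 00:00:00')
ORDER BY o.user_id
"""

# Anti-join: keep only rows of o for which no later o3 row was found.
ANTI_JOIN = f"""
SELECT o.user_id FROM orders o
LEFT JOIN orders o3 ON o3.user_id = o.user_id
                   AND o3.order_created >= '2013-12-15 00:00:00'
WHERE {DAY}
  AND o3.order_created IS NULL
ORDER BY o.user_id
"""

print(conn.execute(NOT_IN).fetchall(), conn.execute(ANTI_JOIN).fetchall())  # [(1,)] [(1,)]

# A later order with a NULL user_id: "1 NOT IN (2, NULL)" is NULL, not true.
conn.execute("INSERT INTO orders (user_id, order_created) VALUES (NULL, '2013-12-20 12:00:00')")
print(conn.execute(NOT_IN).fetchall(), conn.execute(ANTI_JOIN).fetchall())  # [] [(1,)]
```

The question's own NOT IN subquery is correlated (o3.user_id = o.user_id), which happens to filter out NULL user_ids, so this trap mainly bites the uncorrelated form.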
Can't really comment on the results, as you have not posted any table declarations or example data, but your query has 3 correlated subqueries, which are likely to make it perform poorly (OK, one of those is for last_order and is only for testing).
Eliminating the correlated subqueries and replacing them with joins would give something like this:
SELECT o.order_id,
o.order_status,
o.order_created,
o.user_id,
i.identity_firstname,
i.identity_email,
Sub1.order_count,
Sub2.last_order
FROM orders o
INNER JOIN user_identities ui ON o.user_id=ui.user_id
INNER JOIN identities i ON ui.identity_id=i.identity_id
AND i.identity_email!=''
INNER JOIN subscribers s ON i.identity_id=s.identity_id
AND s.subscriber_status=1
AND s.subsriber_type=e
AND s.subscription_id=1
LEFT OUTER JOIN
(
SELECT user_id, COUNT(*) AS order_count
FROM orders
WHERE order_status=1
GROUP BY user_id
) Sub1
ON o.user_id = Sub1.user_id
LEFT OUTER JOIN
(
SELECT user_id, MAX(order_created) as last_order
FROM orders
WHERE order_status=1
GROUP BY user_id
) AS Sub2
ON o.user_id = Sub2.user_id
LEFT OUTER JOIN
(
SELECT DISTINCT user_id
FROM orders
WHERE order_status=1
AND DATE(order_created) > "2013-12-14"
) Sub3
ON o.user_id = Sub3.user_id
WHERE DATE(o.order_created) = "2013-12-14"
AND o.order_status=1
AND Sub3.user_id IS NULL

Error 1111 (HY000) - Invalid use of group function

I'm getting a problem when trying to run this query:
Select
c.cname as custName,
count(distinct o.orderID) as No_of_orders,
avg(count(distinct o.orderID)) as avg_order_amt
From Customer c
Inner Join Order_ o
On o.customerID = c.customerID
Group by cname;
This is an error message: #1111 (HY000) - Invalid use of group function
I just want to select each customer, find how many orders each customer has, and average the total number of orders for each customer. I think the problem might be too many aggregates in the query.
The issue is that you need to have two separate groupings if you want to calculate the average over a count, so this expression isn't valid:
avg(count(distinct o.orderID))
Now it's hard to understand what exactly you mean, but it sounds as if you just want to use avg(o.amount) instead.
[edit] I see your addition now, so while the error is still the same, the solution will be slightly more complex. The last value you need, the average number of orders per customer, is not a value to calculate per customer. You'd need analytic (window) functions for that, which MySQL only supports from version 8.0, so before that it is quite tricky. I'd recommend writing a separate query for it; otherwise you would have a very complex query that returns the same number on every row anyway.
select c.cname, o.customerID, count(*), avg(order_total)
from Order_ o join Customer c on c.customerID = o.customerID
group by 1,2
This will calculate the number of orders and average order total (substitute the real column name for order_total) for each customer.
how many orders each customer has,
average the total number of orders.
SELECT
c1.cname AS custName,
c1.No_of_orders,
c2.avg_order_amt
FROM (
SELECT
c.customerID,
c.cname,
COUNT(DISTINCT o.orderID) AS No_of_orders
FROM
Customer c
JOIN Order_ o ON o.customerID = c.customerID
GROUP BY c.customerID, c.cname
) c1
CROSS JOIN (SELECT AVG(No_of_orders) AS avg_order_amt FROM (
SELECT
c.customerID,
COUNT(DISTINCT o.orderID) AS No_of_orders
FROM
Customer c
JOIN Order_ o ON o.customerID = c.customerID
GROUP BY c.customerID
) AS per_customer) c2
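The two-level grouping (counts per customer first, then the average of those counts) can be checked on a toy dataset. This is a minimal sketch in Python's sqlite3 as a stand-in for MySQL, using the question's table names with made-up rows; like MySQL's error 1111, SQLite rejects AVG(COUNT(...)) in a single SELECT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (customerID INTEGER PRIMARY KEY, cname TEXT);
    CREATE TABLE Order_ (orderID INTEGER PRIMARY KEY, customerID INTEGER);
    INSERT INTO Customer VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO Order_ (customerID) VALUES (1), (1), (1), (2);
""")

# Level 1: one count per customer.
PER_CUSTOMER = """
SELECT c.customerID, c.cname, COUNT(DISTINCT o.orderID) AS No_of_orders
FROM Customer c JOIN Order_ o ON o.customerID = c.customerID
GROUP BY c.customerID, c.cname
"""

# Level 2: average those counts, then attach the average to every customer row.
SQL = f"""
SELECT c1.cname AS custName, c1.No_of_orders, c2.avg_order_amt
FROM ({PER_CUSTOMER}) AS c1
CROSS JOIN (SELECT AVG(No_of_orders) AS avg_order_amt
            FROM ({PER_CUSTOMER}) AS per_customer) AS c2
ORDER BY c1.customerID
"""

print(conn.execute(SQL).fetchall())  # [('Ann', 3, 2.0), ('Bob', 1, 2.0)]


def nested_aggregate_fails() -> bool:
    """AVG(COUNT(...)) in one SELECT is rejected: aggregates cannot be nested."""
    try:
        conn.execute("SELECT AVG(COUNT(*)) FROM Order_ GROUP BY customerID").fetchall()
    except sqlite3.OperationalError:
        return True
    return False


print(nested_aggregate_fails())  # True
```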