MySQL Querying Multiple Tables - mysql

Thank you for taking a look at my question! I've been trying to figure out a single query to do the following but have been unsuccessful. I would truly appreciate any help. Thank you in advance :-)
I am making an admin page for my e-commerce store that shows products that haven't sold in the last X days. There are three tables that need to be used...
Table: products
Column: product_id (int)
This table/column contains all the products in the store
Table: orders
Columns: order_id (int), date_ordered (datetime)
This table contains all the orders which are identified by order_id and the date for which they were ordered (date_ordered).
Table: order_products
Column: order_id (int), product_id (int)
This table contains a complete listing of all products ordered (product_id) and the corresponding order (order_id).
So, the query I'm trying to figure out would use the order_id in tables orders and order_products to determine which products have sold in the last X days, then return any product_id from the products table that has not sold in the last X days.
Any suggestions? Any help would be very appreciated! Thank you :-)

Okay so while I agree in part that you should do some poking around and learn more about left joins, there is also some trickiness to answering this question correctly that might be lost on a beginner. I'm gonna go ahead and help you answer it, but I would recommend learning more about joins.
My exact query would depend on the available indices, but it would very likely resemble something like this:
SELECT a.*
FROM products AS a
LEFT JOIN (
SELECT product_id FROM order_products as b
INNER JOIN orders AS c
ON b.order_id = c.order_id
WHERE c.date_ordered >= date_sub(NOW(), INTERVAL 7 DAY)
GROUP BY product_id
) AS d
ON a.product_id = d.product_id
WHERE d.product_id IS NULL
What I'm doing is writing a subquery that joins orders and order_products together, where date_ordered falls within a certain date range. (I would recommend learning about the date_sub function here: http://www.w3schools.com/sql/func_date_sub.asp and also running a few quick SELECT date_sub(date_ordered, INTERVAL X DAY) FROM orders queries to make sure you understand how this calculation works in practice.)
Now, I get my list of orders for the last X days (7 in the query above) and I join it with the order_products table to get the products that were ordered. Here, I want to dedup my products, basically. Product_id = 300 may have been ordered 70 times. Product_id = 200 may have been ordered 50 times. Whatever the case may be, I don't want to join 70 records and 50 records to my product table for product ids 300 and 200, so I dedup them. That last GROUP BY statement does that. It's functionally the same thing as writing DISTINCT (although there can be minor differences in how these are computed in certain circumstances, none of those circumstances seem to apply here, so use DISTINCT if that's clearer for you).
Once I have my list of unique product ids that were ordered in the past X days, I join that with my product table. Here, I use a left join. Like the comments noted above, you'll want to look into the notion of joins pretty carefully. Do that, if you haven't already.
Last, I apply a WHERE filter that says "WHERE d.product_id IS NULL." What this is doing is saying, "okay, if product_id = Y was ordered in the past X days, then it will join to my products table successfully with a.product_id = d.product_id. If it wasn't ordered, then a.product_id will exist in my result set, but d.product_id won't. That is, d.product_id will be null."
That last twist may be the part that's not apparent / standing out.
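The LEFT JOIN ... IS NULL pattern described above can be tried on a few rows. This is only a sketch: it uses Python's built-in sqlite3 module in place of MySQL, the sample rows are invented, and the 7-day cutoff is computed in Python from a fixed "now" and passed as a parameter, standing in for MySQL's date arithmetic.

```python
import sqlite3
from datetime import datetime, timedelta

# In-memory SQLite database standing in for the MySQL schema from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_id INTEGER PRIMARY KEY);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, date_ordered TEXT);
CREATE TABLE order_products (order_id INTEGER, product_id INTEGER);
-- Invented sample data: product 300 sold recently (twice), 200 sold long ago, 100 never.
INSERT INTO products VALUES (100), (200), (300);
INSERT INTO orders VALUES (1, '2024-01-30 10:00:00'), (2, '2024-01-02 09:00:00');
INSERT INTO order_products VALUES (1, 300), (1, 300), (2, 200);
""")

now = datetime(2024, 1, 31)  # fixed "now" (an assumption) so the result is reproducible
cutoff = (now - timedelta(days=7)).strftime("%Y-%m-%d %H:%M:%S")

unsold = [row[0] for row in conn.execute("""
    SELECT a.product_id
    FROM products AS a
    LEFT JOIN (
        -- deduplicated list of products sold since the cutoff
        SELECT DISTINCT b.product_id
        FROM order_products AS b
        INNER JOIN orders AS c ON b.order_id = c.order_id
        WHERE c.date_ordered >= ?
    ) AS d ON a.product_id = d.product_id
    WHERE d.product_id IS NULL      -- keep only products with no recent sale
    ORDER BY a.product_id
""", (cutoff,))]

print(unsold)
```

Products 100 and 200 come back because neither has a row in the recent-sales subquery; product 300 is filtered out even though it appears twice in order_products.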
Hope this helps.

Related

Need the list of products and orders

This is my homework task:
Products which were ordered along with the 5 most ordered products more than once and the count of orders they were included in. (Do not include the 5 most ordered products in the final result)
Products and orders are in same table. Order detail contain Order detail ID, order id, product id, quantity.
I've tried everything but I'm struggling with "along with" statement in the query.
Here is a query I have tried:
select
productid,
count
(
(select productid from orderdetails)
and
(select productid from orderdetails order by quantity desc limit 5)
) as ORDERS
from orderdetails
group by productid
order by ORDERS desc
You select from orderdetails, aggregate to get one result row per product and you count. It is very common to count rows with COUNT(*), but you can also count expressions, e.g. COUNT(mycolumn) where you just count those that are not null. You are counting an expression (because it is not COUNT(*) but COUNT(something else) that you are using). The expression to test for null and count is
(select productid from orderdetails)
and
(select productid from orderdetails order by quantity desc limit 5)
This, however, is not an expression that leads to one value that gets counted (when it's not null) or not (when it's null). You are selecting all product IDs from the orderdetails table and you are selecting the five product IDs from the orderdetails table that got ordered with the highest quantity. And then you apply AND as if these were two booleans, but they are not, they are data sets. Apart from the inappropriate use of AND, which is an operator on booleans and not on data sets, you are missing the point that you should be looking for products in the same order, i.e. compare the order number somehow.
So all in all: This is completely wrong. Sorry to say that. However, the task is not at all easy in my opinion and in order to solve it, you should go slowly, step by step, to build your query.
Products which were ordered along with the 5 most ordered products more than once
Dammit; such a short sentence, but that is deceiving ;-) There is a lot to do for us...
First we must find the 5 products that got ordered most. That means sum up all sales and find the five top ones:
select productid
from orderdetails
group by productid
order by sum(quantity) desc
limit 5
(The problem with this: What if six products got ordered most, e.g. products A, B, and C with a quantity of 200 and products D, E, and F with a quantity of 100? We would get the top three plus two of the top 4 to 6. In standard SQL we would solve this with a ties clause, but MySQL's LIMIT doesn't feature this.)
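The ties problem can be seen on the A to F example from the paragraph above. The sketch below is not part of the original answer: it runs on SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or newer; MySQL 8 also has RANK()), and the orderdetails rows are invented to match the example quantities.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orderdetails (orderid INTEGER, productid TEXT, quantity INTEGER)")
# Invented data: A, B, C total 200 each; D, E, F total 100 each; G totals 10.
conn.executemany(
    "INSERT INTO orderdetails VALUES (?, ?, ?)",
    [(1, "A", 200), (1, "B", 200), (2, "C", 200),
     (2, "D", 100), (3, "E", 100), (3, "F", 100), (4, "G", 10)],
)

# LIMIT 5 cuts arbitrarily through the three-way tie between D, E and F.
limited = [r[0] for r in conn.execute("""
    SELECT productid
    FROM orderdetails
    GROUP BY productid
    ORDER BY SUM(quantity) DESC
    LIMIT 5
""")]

# RANK() gives tied products the same rank, so rnk <= 5 keeps all of D, E and F.
ranked = [r[0] for r in conn.execute("""
    SELECT productid
    FROM (
        SELECT productid, RANK() OVER (ORDER BY total DESC) AS rnk
        FROM (
            SELECT productid, SUM(quantity) AS total
            FROM orderdetails
            GROUP BY productid
        ) AS totals
    ) AS ranked_totals
    WHERE rnk <= 5
    ORDER BY productid
""")]

print(limited)
print(ranked)
```

With RANK(), A, B and C share rank 1 and D, E and F share rank 4, so six products qualify; LIMIT 5 always returns exactly five.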
Anyway. Now we are looking for products that got ordered along with these five products. Does this mean with all five at once? Probably not. We are rather looking for products that were in the same order with at least one of the top five.
with top_5_products as
(query above)
, orders_with_top_5 as
(select orderid
from orderdetails
where productid in (select productid from top_5_products)
)
, other_products_in_order as
(select productid, orderid
from orderdetails
where orderid in (select orderid from orders_with_top_5)
and productid not in (select productid from top_5_products)
)
And once we've got there, we must even find products that got ordered with some of the top 5 "more than once", which I interpret as appearing in at least two orders containing top 5 products.
with <all the above>
select productid
from other_products_in_order
group by productid
having count(*) > 1;
And while we have counted how many orders the products share with top 5 products, we are still not there, because we are supposed to show the number of orders the products were included in, which I suppose refers to all orders, not only those containing top 5 products. That is another count, that we can get in the select clause for instance. The query then becomes:
with <all the above>
select
productid,
(select count(*) from orderdetails od where od.productid = opio.productid)
from other_products_in_order opio
group by productid
having count(*) > 1;
That's quite a lot for homework seeing that you are struggling with the syntax still. And we haven't even addressed that top-5-or-more ties problem yet (for which analytic functions come in handy).
The WITH clause has been available since MySQL 8 and helps keep a query that builds up step by step readable. Older MySQL versions don't support it. If you are working with an old version, I suggest you upgrade :-) Otherwise, you can use subqueries directly instead.
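Putting the steps together, the whole query can be run end to end on a handful of rows. This is a sketch under assumptions: it runs on SQLite through Python's sqlite3 module rather than MySQL 8, the sample rows are invented, and TOP_N is set to 2 instead of 5 purely to keep the data small. Ties among the top products are not handled here.

```python
import sqlite3

TOP_N = 2  # the task says 5; 2 is an assumption that keeps the sample data small

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orderdetails (
        orderdetailid INTEGER PRIMARY KEY,
        orderid INTEGER, productid INTEGER, quantity INTEGER)
""")
# Invented data: products 1 and 2 are the top sellers by total quantity.
conn.executemany(
    "INSERT INTO orderdetails (orderid, productid, quantity) VALUES (?, ?, ?)",
    [(1, 1, 50), (1, 10, 1), (1, 11, 1),
     (2, 2, 40), (2, 10, 1),
     (3, 10, 1), (3, 12, 2),
     (4, 12, 3)],
)

result = conn.execute("""
    WITH top_products AS (
        SELECT productid
        FROM orderdetails
        GROUP BY productid
        ORDER BY SUM(quantity) DESC
        LIMIT ?
    ),
    orders_with_top AS (
        SELECT orderid
        FROM orderdetails
        WHERE productid IN (SELECT productid FROM top_products)
    ),
    other_products_in_order AS (
        SELECT productid, orderid
        FROM orderdetails
        WHERE orderid IN (SELECT orderid FROM orders_with_top)
          AND productid NOT IN (SELECT productid FROM top_products)
    )
    SELECT opio.productid,
           -- count over all order lines of the product, not only those with a top product
           (SELECT COUNT(*) FROM orderdetails od
            WHERE od.productid = opio.productid) AS order_count
    FROM other_products_in_order opio
    GROUP BY opio.productid
    HAVING COUNT(*) > 1          -- shared at least two orders with a top product
""", (TOP_N,)).fetchall()

print(result)
```

Product 10 shares orders 1 and 2 with a top product and appears in three orders overall; product 11 shares only one order and product 12 shares none, so both drop out.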

sql SELECT query for 3 tables

I have 3 tables:
1. products(product_id,name)
2. orders(id,order_id,product_id)
3. factors(id,order_id,date)
I want to retrieve the product names (products.name) whose order_id appears in both of the last two tables on a given date.
I use this query for this purpose:
select products.name
from products
WHERE products.product_id IN
(
SELECT distinct orders.product_id FROM orders WHERE
order_id IN (select order_id FROM factors WHERE
factors.datex ='2017-04-29') GROUP BY product_id
)
But I get no result. Where is my mistake? How can I resolve it? Thanks.
Your query should be fine. I am rewriting it to make a few changes to the structure, but not the logic (this makes it easier for me to understand the query):
select p.name
from products p
where p.product_id in (select o.product_id
from orders o
where o.order_id in (select f.order_id
from factors f
where f.datex = '2017-04-29'
)
) ;
Notes on the changes:
When using multiple tables in a query, always qualify the column names.
Use table aliases. They make queries easier to write and to read.
SELECT DISTINCT and GROUP BY are unnecessary in IN subqueries. The logic of IN already handles (i.e. ignores) duplicates. And by explicitly including the operations, you run the risk of a less efficient query plan.
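The note about DISTINCT inside IN can be checked quickly. The sketch below uses Python's sqlite3 module in place of MySQL and invented rows; it only shows that the result is the same with and without DISTINCT, not anything about query plans.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_id INTEGER, name TEXT);
CREATE TABLE orders (id INTEGER, order_id INTEGER, product_id INTEGER);
-- Invented data: product 1 was ordered three times, product 2 never.
INSERT INTO products VALUES (1, 'a'), (2, 'b');
INSERT INTO orders VALUES (1, 500, 1), (2, 501, 1), (3, 502, 1);
""")

plain = conn.execute("""
    SELECT p.name FROM products p
    WHERE p.product_id IN (SELECT o.product_id FROM orders o)
""").fetchall()

with_distinct = conn.execute("""
    SELECT p.name FROM products p
    WHERE p.product_id IN (SELECT DISTINCT o.product_id FROM orders o)
""").fetchall()

print(plain, with_distinct)
```

Each product row is tested once against the subquery, so the three duplicate order rows do not multiply the output.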
Why might your query not work?
factors.datex may have a time component. If so, comparing on the date part will work: date(f.datex) = '2017-04-29'.
There are no factors on that date.
There are no orders that match factors on that date.
There are no products in the orders that match the factors on that date.
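The first possibility, a time component in datex, is easy to reproduce. The following is a minimal sketch on SQLite via Python's sqlite3 module (SQLite also has a date() function); the rows are invented and datex is stored as text here, which is an assumption about the column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_id INTEGER, name TEXT);
CREATE TABLE orders (id INTEGER, order_id INTEGER, product_id INTEGER);
CREATE TABLE factors (id INTEGER, order_id INTEGER, datex TEXT);
-- Invented data: the factor was recorded in the afternoon, not at midnight.
INSERT INTO products VALUES (1, 'widget');
INSERT INTO orders VALUES (1, 500, 1);
INSERT INTO factors VALUES (1, 500, '2017-04-29 13:45:00');
""")

query = """
    SELECT p.name
    FROM products p
    WHERE p.product_id IN (SELECT o.product_id
                           FROM orders o
                           WHERE o.order_id IN (SELECT f.order_id
                                                FROM factors f
                                                WHERE {condition}))
"""

# Comparing the full timestamp to a bare date finds nothing ...
exact = conn.execute(query.format(condition="f.datex = '2017-04-29'")).fetchall()
# ... while comparing only the date part matches the row.
by_day = conn.execute(query.format(condition="date(f.datex) = '2017-04-29'")).fetchall()

print(exact, by_day)
```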
In the factors table the column is named date, so the condition should be:
factors.date ='2017-04-29'
You have written:
factors.datex ='2017-04-29'

Joining two tables, including count, and sorting by count in MySQL

I need to run a somewhat more complex MySQL query. I have two tables that I need to join, where one contains the primary key of the other. That's easy enough, but then I also need to find the number of occurrences of each ID returned, and ultimately sort all the results by this number.
Normally this would just be a group by, but I also need to see ALL of the results (so if it were a group by containing 10 records, I'd need to see all 10, as well as that count returned as well).
So for instance, two tables could be:
Customers table:
CustomerID name address phone etc..
Orders table:
OrderID CustomerID product info etc..
The idea is to output, and sort the orders table to find the customer with the most orders in a given time period. The resultant report would have a few hundred customers, along with their order info below.
I couldn't figure out a way to have it return the rows containing ALL the info from both tables, plus the number of occurrences of each in one row (customer info, individual orders info, and count).
I considered separating it into multiple queries (get the list of top customers), then a bunch of sub-queries for each order programmatically. But that was going to end up with many hundreds of sub-queries every time this is submitted.
So I was hoping someone might know of an easier way to do this. My thought was to have a return result with repeated information, but get it only in one query.
Thanks in advance!
SELECT CUST.CustomerID, CUST.Name, ORDR.OrderID, ORDR.OrderDate, ORDR.ProductInfo, COUNTS.cnt
FROM Customers CUST
INNER JOIN Orders ORDR
ON ORDR.CustomerID = CUST.CustomerID
INNER JOIN
(
SELECT C.CustomerID, COUNT(DISTINCT O.OrderID) AS cnt
FROM Customers C
INNER JOIN Orders O
ON O.CustomerID = C.CustomerID
GROUP BY C.CustomerID
) COUNTS
ON COUNTS.CustomerID = CUST.CustomerID
ORDER BY COUNTS.cnt DESC, CUST.CustomerID
This will return one row per order, displayed by customer, ordered by the number of orders for that customer.
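To see the shape of the result, here is a small runnable sketch of the same join against a derived table of per-customer counts. It uses Python's sqlite3 module instead of MySQL, the two customers and three orders are invented, and OrderID is added to the ORDER BY only to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, ProductInfo TEXT);
-- Invented data: Ann has one order, Bob has two.
INSERT INTO Customers VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO Orders VALUES (10, 1, 'lamp'), (11, 2, 'desk'), (12, 2, 'chair');
""")

rows = conn.execute("""
    SELECT CUST.CustomerID, CUST.Name, ORDR.OrderID, ORDR.ProductInfo, COUNTS.cnt
    FROM Customers CUST
    INNER JOIN Orders ORDR
        ON ORDR.CustomerID = CUST.CustomerID
    INNER JOIN (
        -- one row per customer with that customer's order count
        SELECT C.CustomerID, COUNT(DISTINCT O.OrderID) AS cnt
        FROM Customers C
        INNER JOIN Orders O ON O.CustomerID = C.CustomerID
        GROUP BY C.CustomerID
    ) COUNTS
        ON COUNTS.CustomerID = CUST.CustomerID
    ORDER BY COUNTS.cnt DESC, CUST.CustomerID, ORDR.OrderID
""").fetchall()

for row in rows:
    print(row)
```

Every order row is kept, and the customer's total is repeated on each of that customer's rows, which is the "repeated information in one query" the question asks for.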

counting the most sold products from mysql

I am trying to get a list of the most sold products with a MySQL query. The problem is that it's still returning all of the data even after I use count.
select
mf.*,
od.*,
count(od.product_id) as product_count
from masterflight mf ,
order_details od
where od.product_id=mf.ProductCode
group by od.product_id
order by product_count
Here masterflight is the table where the product details are stored with their ids, and order_details is the table where a record of each individual sale of a product is stored. The logic I was trying to implement: suppose a product with id 2 is sold 4 times and each sale has a separate entry; then I would count those entries using COUNT and display the result, but that does not seem to be working.
Try something a little neater:
select
mf.ProductCode,
count(*) as product_count
from
order_details od
inner join masterflight mf on
od.product_id = mf.ProductCode
group by
mf.ProductCode
order by product_count desc
The problem is that you're selecting all of od, but you're not grouping by it, so you're just getting all of the order rows, which doesn't really help you at all. I should note that MySQL is the only one of the major RDBMSes that allows that behavior (and since MySQL 5.7.5 only when the ONLY_FULL_GROUP_BY SQL mode is turned off), and it's confusing and tough to debug. I'd advise against using that particular feature. As a general rule, if you've selected a column but don't have an aggregate (e.g. sum, avg, min, etc.) on it, then you need it in the group by clause.
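As a quick check of the rule, the sketch below selects only the grouped column and an aggregate, using COUNT(*). It runs on SQLite through Python's sqlite3 module rather than MySQL, and the rows are invented to match the "product 2 sold 4 times" case from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE masterflight (ProductCode INTEGER PRIMARY KEY);
CREATE TABLE order_details (id INTEGER PRIMARY KEY, product_id INTEGER);
-- Invented data: product 2 sold four times, product 1 once, product 3 never.
INSERT INTO masterflight VALUES (1), (2), (3);
INSERT INTO order_details (product_id) VALUES (2), (2), (1), (2), (2);
""")

rows = conn.execute("""
    SELECT mf.ProductCode, COUNT(*) AS product_count
    FROM order_details od
    INNER JOIN masterflight mf ON od.product_id = mf.ProductCode
    GROUP BY mf.ProductCode          -- every selected non-aggregate column is grouped
    ORDER BY product_count DESC
""").fetchall()

print(rows)
```

Product 3 does not appear because the inner join drops products without sales; a LEFT JOIN from masterflight with COUNT(od.product_id) would list it with a count of 0.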

MySQL huge tables JOIN makes database collapse

Following my recent question Select information from last item and join to the total amount, I am having some memory problems while generating tables.
I have two tables sales1 and sales2 like this:
id | dates | customer | sale
With this table definition:
CREATE TABLE sales (
id int auto_increment primary key,
dates date,
customer int,
sale int
);
sales1 and sales2 have the same definition, but sales2 has sale=-1 in every row. A customer can be in none, one or both tables. Both tables have around 300.000 records and many more fields than indicated here (around 50 fields). They are InnoDB.
I want to select, for each customer:
number of purchases
last purchase value
total amount of purchases, when it has a positive value
The query I am using is:
SELECT a.customer, count(a.sale), max_sale
FROM sales a
INNER JOIN (SELECT customer, sale max_sale
from sales x where dates = (select max(dates)
from sales y
where x.customer = y.customer
and y.sale > 0
)
)b
ON a.customer = b.customer
GROUP BY a.customer, max_sale;
The problem is:
I have to get the results, that I need for certain calculations, separated for dates: information on year 2012, information on year 2013, but also information from all the years together.
Whenever I do just one year, it takes about 2-3 minutes to store all the information.
But when I try to gather information from all the years, the database crashes and I get messages like:
InternalError: (InternalError) (1205, u'Lock wait timeout exceeded; try restarting transaction')
It seems that joining such huge tables is too much for the database. When I explain the query, almost all of the time is spent creating a tmp table.
I thought of splitting the data gathering into quarters. We get the results for every three months and then join and sort them. But I guess this final join and sort will be too much for the database again.
So, what would you experts recommend to optimize these queries as long as I cannot change the tables structure?
300k rows is not a huge table. We frequently see 300 million row tables.
The biggest problem with your query is that you're using a correlated subquery, so it has to re-execute the subquery for each row in the outer query.
It's often the case that you don't need to do all your work in one SQL statement. There are advantages to breaking it up into several simpler SQL statements:
Easier to code.
Easier to optimize.
Easier to debug.
Easier to read.
Easier to maintain if/when you have to implement new requirements.
Number of Purchases
SELECT customer, COUNT(sale) AS number_of_purchases
FROM sales
GROUP BY customer;
An index on sales(customer,sale) would be best for this query.
Last Purchase Value
This is the greatest-n-per-group problem that comes up frequently.
SELECT a.customer, a.sale as max_sale
FROM sales a
LEFT OUTER JOIN sales b
ON a.customer=b.customer AND a.dates < b.dates
WHERE b.customer IS NULL;
In other words, try to match row a to a hypothetical row b that has the same customer and a greater date. If no such row is found, then a must have the greatest date for that customer.
An index on sales(customer,dates,sale) would be best for this query.
If you might have more than one sale for a customer on that greatest date, this query will return more than one row per customer. You'd need to find another column to break the tie. If you use an auto-increment primary key, it's suitable as a tie breaker because it's guaranteed to be unique and it tends to increase chronologically.
SELECT a.customer, a.sale as max_sale
FROM sales a
LEFT OUTER JOIN sales b
ON a.customer=b.customer AND (a.dates < b.dates OR a.dates = b.dates and a.id < b.id)
WHERE b.customer IS NULL;
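The effect of the tie breaker is easiest to see with two sales on the same latest date. A minimal sketch, assuming SQLite via Python's sqlite3 module in place of MySQL and four invented rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sales (id INTEGER PRIMARY KEY, dates TEXT, customer INTEGER, sale INTEGER)
""")
# Invented data: customer 1 has two sales on the latest date (ids 2 and 3).
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    (1, "2013-01-01", 1, 10),
    (2, "2013-02-01", 1, 20),
    (3, "2013-02-01", 1, 30),
    (4, "2012-05-05", 2, 7),
])

# Date-only comparison: both rows on the latest date survive for customer 1.
without_tiebreak = conn.execute("""
    SELECT a.customer, a.sale AS max_sale
    FROM sales a
    LEFT OUTER JOIN sales b
        ON a.customer = b.customer AND a.dates < b.dates
    WHERE b.customer IS NULL
    ORDER BY a.customer, a.id
""").fetchall()

# Adding the id as a tie breaker leaves exactly one row per customer.
with_tiebreak = conn.execute("""
    SELECT a.customer, a.sale AS max_sale
    FROM sales a
    LEFT OUTER JOIN sales b
        ON a.customer = b.customer
       AND (a.dates < b.dates OR (a.dates = b.dates AND a.id < b.id))
    WHERE b.customer IS NULL
    ORDER BY a.customer
""").fetchall()

print(without_tiebreak)
print(with_tiebreak)
```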
Total Amount of Purchases, When It Has a Positive Value
SELECT customer, SUM(sale) AS total_purchases
FROM sales
WHERE sale > 0
GROUP BY customer;
An index on sales(customer,sale) would be best for this query.
You should consider using NULL to signify a missing sale value instead of -1. Aggregate functions like SUM() and COUNT() ignore NULLs, so you don't have to use a WHERE clause to exclude rows with sale < 0.
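A small sketch of how NULL behaves in aggregates, again on SQLite through Python's sqlite3 module with invented rows. Note that COUNT(*) still counts rows whose sale is NULL; only COUNT(sale) and SUM(sale) skip them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, customer INTEGER, sale INTEGER)")
# Invented data: NULL marks a missing sale value instead of -1.
conn.executemany("INSERT INTO sales (customer, sale) VALUES (?, ?)", [
    (1, 10), (1, None), (1, 5),
    (2, None),
])

rows = conn.execute("""
    SELECT customer,
           COUNT(*)    AS row_count,     -- counts every row
           COUNT(sale) AS known_sales,   -- skips NULLs
           SUM(sale)   AS total          -- skips NULLs; NULL if nothing is left
    FROM sales
    GROUP BY customer
    ORDER BY customer
""").fetchall()

print(rows)
```

For a customer with no known sales, SUM returns NULL (None in Python) rather than 0, so wrap it in COALESCE(SUM(sale), 0) if a zero is wanted.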
Re: your comment
What I have now is a table with fields year, quarter, total_sale (regarding the pair (year, quarter)) and sale. What I want to gather is information regarding a certain period: this quarter, several quarters, year 2011... Info has to be split into top customers, ones with bigger sales, etc. Would it be possible to get the last purchase value from customers with total_purchases bigger than 5?
Top Five Customers for Q4 2012
SELECT customer, SUM(sale) AS total_purchases
FROM sales
WHERE (year, quarter) = (2012, 4) AND sale > 0
GROUP BY customer
ORDER BY total_purchases DESC
LIMIT 5;
I'd want to test it against real data, but I believe an index on sales(year, quarter, customer, sale) would be best for this query.
Last Purchase for Customers with Total Purchases > 5
SELECT a.customer, a.sale as max_sale
FROM sales a
INNER JOIN sales c ON a.customer=c.customer
LEFT OUTER JOIN sales b
ON a.customer=b.customer AND (a.dates < b.dates OR a.dates = b.dates and a.id < b.id)
WHERE b.customer IS NULL
GROUP BY a.id
HAVING COUNT(*) > 5;
As in the other greatest-n-per-group query above, an index on sales(customer,dates,sale) would be best for this query. It probably can't optimize both the join and the group by, so this will incur a temporary table. But at least it will only do one temporary table instead of many.
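The query above can be tried on a toy table to confirm what the extra join to c contributes. This is a sketch on SQLite via Python's sqlite3 module, not MySQL, with invented rows: one customer with six purchases and one with two.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sales (id INTEGER PRIMARY KEY, dates TEXT, customer INTEGER, sale INTEGER)
""")
# Invented data: customer 1 buys on six consecutive days, customer 2 only twice.
rows_in = [(f"2013-01-{day:02d}", 1, day * 10) for day in range(1, 7)]
rows_in += [("2013-03-01", 2, 5), ("2013-03-02", 2, 6)]
conn.executemany("INSERT INTO sales (dates, customer, sale) VALUES (?, ?, ?)", rows_in)

result = conn.execute("""
    SELECT a.customer, a.sale AS max_sale
    FROM sales a
    INNER JOIN sales c ON a.customer = c.customer      -- one row per purchase of the customer
    LEFT OUTER JOIN sales b
        ON a.customer = b.customer
       AND (a.dates < b.dates OR (a.dates = b.dates AND a.id < b.id))
    WHERE b.customer IS NULL                           -- a is the customer's latest sale
    GROUP BY a.id
    HAVING COUNT(*) > 5                                -- more than five purchases in c
""").fetchall()

print(result)
```

After the anti-join leaves only each customer's latest row, the join to c repeats that row once per purchase, so COUNT(*) per a.id is the customer's purchase count.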
These queries are complex enough. You shouldn't try to write a single SQL query that can give all of these results. Remember the classic quote from Brian Kernighan:
Everyone knows that debugging is twice as hard as writing a program in the first place. So if you’re as clever as you can be when you write it, how will you ever debug it?
I think you should try adding an index on sales(customer, dates). The subquery is probably the performance bottleneck.
You can make this puppy scream. Dump the whole inner join query. Really. This is a trick virtually no one seems to know about.
Assuming dates is a datetime, convert it to a sortable string, concatenate the values you want, max (or min), substring, cast. You may need to adjust the date convert function (this one works in MS-SQL), but this idea will work anywhere:
SELECT customer, count(sale), max_sale = cast(substring(max(convert(char(19), dates, 120) + str(sale, 12, 2)), 20, 12) as numeric(12, 2))
FROM sales a
group by customer
Voilà. If you need more result columns, do:
SELECT yourkey
, maxval = left(val, N1) --you often won't need this
, result1 = substring(val, N1+1, N2)
, result2 = substring(val, N1+N2+1, N3) --etc. for more values
FROM ( SELECT yourkey, val = max(cast(maxval as char(N1))
+ cast(resultCol1 as char(N2))
+ cast(resultCol2 as char(N3)) )
FROM yourtable GROUP BY yourkey ) t
Be sure that you have fixed lengths for all but the last field. This takes a little work to get your head around, but is very learnable and repeatable. It will work on any database engine, and even if you have rank functions, this will often significantly outperform them.
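For readers who want to try the concatenate-then-MAX idea without MS-SQL, here is a hedged adaptation to SQLite run through Python's sqlite3 module. It is not the answer's original code: dates is assumed to be stored as fixed-width 'YYYY-MM-DD' text, sale is assumed to be a non-negative integer, and printf('%012d', ...) replaces str() to get a fixed-width, zero-padded value. The rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sales (id INTEGER PRIMARY KEY, dates TEXT, customer INTEGER, sale INTEGER)
""")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    (1, "2013-01-01", 1, 10),
    (2, "2013-02-01", 1, 20),
    (3, "2013-02-01", 1, 30),
    (4, "2012-05-05", 2, 7),
])

rows = conn.execute("""
    SELECT customer,
           COUNT(sale) AS purchases,
           -- 10-char date + 12-digit sale: MAX picks the latest date, then the
           -- largest sale on that date; characters 11 onward are the sale.
           CAST(substr(MAX(dates || printf('%012d', sale)), 11) AS INTEGER) AS last_sale
    FROM sales
    GROUP BY customer
    ORDER BY customer
""").fetchall()

print(rows)
```

When several sales share the latest date, this picks the largest sale on that date rather than the row with the highest id, which differs from the id-based tie breaker used in the self-join queries.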
More on this very common challenge here.