My database has 3 tables. One is called Customer, one is called Orders, and one is called RMA. The RMA table has the info regarding returns. I'll include a screen shot of all 3 so you can see the appropriate attributes. This is the code of the query I'm working on:
SELECT State, SKU, count(*)
from Orders INNER JOIN Customer ON Orders.Customer_ID = Customer.CustomerID
INNER JOIN RMA ON Orders.Order_ID = RMA.Reason
Group by SKU
Order by SKU
LIMIT 10;
I'm trying to get how much of each product(SKU) is returned in each state(State). Any help would really be appreciated. I'm not sure why, but anytime I include a JOIN statement, my query takes anywhere from 5 minutes to 20 minutes to process.
[Screenshots: Customer table, RMA table]
Your query should look like this:
SELECT c.State, o.SKU, COUNT(*)
FROM Orders o INNER JOIN
Customer c
ON o.Customer_ID = c.CustomerID JOIN
RMA
ON o.Order_ID = RMA.Order_Id
GROUP BY c.State, o.SKU
ORDER BY SKU;
Your issue is probably the incorrect JOIN condition between Orders and RMA.
If you have primary keys properly declared on the tables, then this query should have good-enough performance.
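To make the corrected join concrete, here is a minimal sketch run through Python's sqlite3 module so it needs no server. The column names follow the queries above; the RMA_ID column, the sample rows, and the use of SQLite instead of your actual database are assumptions made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Toy schema; RMA_ID and all rows are hypothetical sample data.
CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, State TEXT);
CREATE TABLE Orders (Order_ID INTEGER PRIMARY KEY, Customer_ID INTEGER, SKU TEXT);
CREATE TABLE RMA (RMA_ID INTEGER PRIMARY KEY, Order_ID INTEGER, Reason TEXT);
INSERT INTO Customer VALUES (1, 'TX'), (2, 'TX'), (3, 'CA');
INSERT INTO Orders VALUES
    (10, 1, 'A1'), (11, 2, 'A1'), (12, 3, 'A1'), (13, 3, 'B2'), (14, 1, 'B2');
INSERT INTO RMA VALUES
    (100, 10, 'broken'), (101, 11, 'wrong size'), (102, 13, 'broken');
""")

# Join RMA to Orders on the order id (not on Reason), then count
# returned orders per (State, SKU) pair.
returns_by_state_sku = conn.execute("""
    SELECT c.State, o.SKU, COUNT(*) AS returns
    FROM Orders o
    JOIN Customer c ON o.Customer_ID = c.CustomerID
    JOIN RMA r ON o.Order_ID = r.Order_ID
    GROUP BY c.State, o.SKU
    ORDER BY o.SKU, c.State
""").fetchall()

for row in returns_by_state_sku:
    print(row)
```

Orders 12 and 14 have no RMA row, so they drop out of the inner join and are not counted as returns.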
Given you are joining with an Orders table I'm going to assume this table contains all the orders that the company has ever done. This can be quite large and would likely cause the slowness you are seeing.
You can likely improve this query by placing some constraint on the Orders rows you select; restricting the date range is a common way to do this. If you provide more information about what the query is for and how large the dataset is, everyone will be able to provide better guidance as to which filters would work best.
I am trying to make an INNER JOIN statement that will join two tables, the Orders table and the Customer table, these both share the value/key of CustomerID. The Customer table has the information for which state a customer lives in. The Orders table has the information for which customer, according to their customer ID, bought which product. I need to find which products are the top 3 most popular in certain states. Please find the table descriptions images below, so you can understand what I mean.
Orders table:
Customer table:
How can I make this INNER JOIN statement and include the logical operators (and/or) to make this happen?
Thanks!
Try this:
SELECT column_name(s)
FROM Customer
INNER JOIN Orders
ON Customer.CustomerID = Orders.Customer_ID AND <conditions>;
INNER JOIN is only one way of joining two tables. You can also join these two tables using a WHERE clause, as follows:
SELECT column_name(s)
FROM Customer c, Orders o
WHERE c.CustomerID = o.Customer_ID AND <condition>;
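For the "top 3 products in a state" part, a rough sketch of how the join, the state filter, and the ranking could fit together is below. It runs on SQLite through Python's sqlite3 module so it is self-contained; the SKU column, the sample rows, and the helper name top_products are assumptions for illustration, not taken from your tables.

```python
import sqlite3


def top_products(conn, state, limit=3):
    """Return the most-ordered SKUs for one state as (SKU, order_count) rows."""
    return conn.execute(
        """
        SELECT o.SKU, COUNT(*) AS n
        FROM Customer c
        INNER JOIN Orders o ON c.CustomerID = o.Customer_ID
        WHERE c.State = ?
        GROUP BY o.SKU
        ORDER BY n DESC, o.SKU   -- SKU breaks ties so the result is stable
        LIMIT ?
        """,
        (state, limit),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical sample data.
CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, State TEXT);
CREATE TABLE Orders (Order_ID INTEGER PRIMARY KEY, Customer_ID INTEGER, SKU TEXT);
INSERT INTO Customer VALUES (1, 'NY'), (2, 'NY'), (3, 'FL');
INSERT INTO Orders VALUES
    (1, 1, 'A'), (2, 1, 'A'), (3, 2, 'A'), (4, 2, 'A'),
    (5, 1, 'B'), (6, 1, 'B'), (7, 2, 'B'),
    (8, 2, 'C'), (9, 2, 'C'),
    (10, 1, 'D'),
    (11, 3, 'D'), (12, 3, 'D'), (13, 3, 'D'), (14, 3, 'D'), (15, 3, 'D');
""")

ny_top3 = top_products(conn, "NY")
fl_top3 = top_products(conn, "FL")
print(ny_top3)
print(fl_top3)
```

The state filter sits in WHERE here; with an INNER JOIN it could equally go in the ON clause with AND, as in the first form above.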
I'm having trouble creating some queries.
I'm using Sakila DB. I am trying to create a new column with the number of delays per client, using "count ((datediff (rental.rental_date, rental.return_date))> film.rental_duration as n"...
Which are the top 10 customers with the most delays in returning movies.
Select customer.first_name, customer.last_name, count ((datediff (rental.rental_date, rental.return_date))> film.rental_duration as nTime
From customer,film,rental,inventory
Where customer.customer_id=rental.customer_id
and rental.inventory_id=inventory.inventory_id
and (datediff (rental.rental_date,rental.return_date)) > film.rental_duration
limit 10;
What am I doing wrong?
Thanks!!
I made a few assumptions, but this should be quite close to what you want:
select c.customer_id, count(*)
from
customer c
inner join rental r
on r.customer_id = c.customer_id
inner join film f
on f.film_id = r.film_id
and datediff(r.return_date, r.rental_date) > f.rental_duration
group by c.customer_id
order by count(*) desc
limit 10;
Problems with your query:
it is missing aggregation; you need to group records by customers, so you can compute how many late rental returns happened per customer
it is missing a join condition for the film table; I assumed that film relates to rental through column film_id
the previous issue would have been much easier to spot if you were using standard, explicit joins instead of old-school, implicit joins; this is one of the many reasons why you should always use standard joins
the arguments of datediff are reversed; datediff(a, b) returns a minus b in days, so the return date has to come first for a late return to give a positive number
as commented by Thorsten Kettner, the inventory table seems superfluous in this query: the 3 other tables contain all the information you need
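One caveat on the film_id assumption: in the stock Sakila schema, rental carries an inventory_id column and no film_id, so the path from rental to film runs through inventory. A minimal sketch of that variant follows, run through Python's sqlite3 module so it is self-contained. The cut-down tables and rows are made up, and julianday() subtraction stands in for MySQL's datediff(), which SQLite does not have.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Cut-down, hypothetical copies of the Sakila tables (only the columns used).
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE film (film_id INTEGER PRIMARY KEY, rental_duration INTEGER);
CREATE TABLE inventory (inventory_id INTEGER PRIMARY KEY, film_id INTEGER);
CREATE TABLE rental (rental_id INTEGER PRIMARY KEY, rental_date TEXT,
                     return_date TEXT, inventory_id INTEGER, customer_id INTEGER);
INSERT INTO customer VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray');
INSERT INTO film VALUES (1, 3), (2, 5);
INSERT INTO inventory VALUES (1, 1), (2, 2);
INSERT INTO rental VALUES
    (1, '2005-05-01', '2005-05-06', 1, 1),  -- 5 days on a 3-day film: late
    (2, '2005-05-10', '2005-05-12', 1, 1),  -- 2 days: on time
    (3, '2005-05-01', '2005-05-08', 2, 1),  -- 7 days on a 5-day film: late
    (4, '2005-05-01', '2005-05-05', 1, 2),  -- 4 days on a 3-day film: late
    (5, '2005-05-01', NULL,         2, 2);  -- not returned yet: not counted
""")

# Late returns per customer, reaching film through inventory.
late_returns = conn.execute("""
    SELECT c.customer_id, c.first_name, c.last_name, COUNT(*) AS n_delays
    FROM customer c
    JOIN rental r ON r.customer_id = c.customer_id
    JOIN inventory i ON i.inventory_id = r.inventory_id
    JOIN film f ON f.film_id = i.film_id
    WHERE julianday(r.return_date) - julianday(r.rental_date) > f.rental_duration
    GROUP BY c.customer_id, c.first_name, c.last_name
    ORDER BY n_delays DESC, c.customer_id
    LIMIT 10
""").fetchall()

print(late_returns)
```

Rentals that have not been returned yet have a NULL return_date, so the comparison evaluates to NULL and they are skipped rather than counted as late.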
I have 3 tables:
1. products(product_id,name)
2. orders(id,order_id,product_id)
3. factors(id,order_id,date)
I want to retrieve the product names (products.name) whose orders have a matching order_id in the last two tables on a given date.
I use this query for this purpose:
select products.name
from products
WHERE products.product_id IN
(
SELECT distinct orders.product_id FROM orders WHERE
order_id IN (select order_id FROM factors WHERE
factors.datex ='2017-04-29') GROUP BY product_id
)
But I get no result. Where is my mistake? How can I resolve it? Thanks.
Your query should be fine. I am rewriting it to make a few changes to the structure, but not the logic (this makes it easier for me to understand the query):
select p.name
from products p
where p.product_id in (select o.product_id
from orders o
where o.order_id in (select f.order_id
from factors f
where f.datex = '2017-04-29'
)
) ;
Notes on the changes:
When using multiple tables in a query, always qualify the column names.
Use table aliases. They make queries easier to write and to read.
SELECT DISTINCT and GROUP BY are unnecessary in IN subqueries. The logic of IN already handles (i.e. ignores) duplicates. And by explicitly including the operations, you run the risk of a less efficient query plan.
Why might your query not work?
factors.datex has a time component. If so, then this will work: date(f.datex) = '2017-04-29'.
There are no factors on that date.
There are no orders that match factors on that date.
There are no products in the orders that match the factors on that date.
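The first possibility is easy to reproduce. Below is a small sketch using Python's sqlite3 module with made-up rows; SQLite stores the timestamp as text while MySQL would use a DATETIME column, so treat it as an illustration of the idea rather than of MySQL's exact behavior. It also shows that a duplicate row in the subquery does not duplicate the output of IN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical sample data; datex includes a time of day.
CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, order_id INTEGER, product_id INTEGER);
CREATE TABLE factors (id INTEGER PRIMARY KEY, order_id INTEGER, datex TEXT);
INSERT INTO products VALUES (7, 'Widget'), (8, 'Gadget');
INSERT INTO orders VALUES (1, 500, 7), (2, 500, 7);  -- same product twice
INSERT INTO factors VALUES (1, 500, '2017-04-29 10:15:00');
""")

QUERY = """
    select p.name
    from products p
    where p.product_id in (select o.product_id
                           from orders o
                           where o.order_id in (select f.order_id
                                                from factors f
                                                where {predicate}))
"""

# Comparing the full timestamp to a bare date finds nothing.
exact_match = conn.execute(QUERY.format(predicate="f.datex = '2017-04-29'")).fetchall()
# Stripping the time component first finds the order.
day_match = conn.execute(QUERY.format(predicate="date(f.datex) = '2017-04-29'")).fetchall()

print(exact_match)
print(day_match)
```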
In the factors table the column is named date, so it should be:
factors.date = '2017-04-29'
You have written:
factors.datex = '2017-04-29'
I want to expand UI on my CodeIgniter shop with suggestions on what other people bought with the current product (either when viewing product or when product is put in the cart, irrelevant now for the question).
I have come up with this query (the orders table contains order details, while order_items contains the products in a specific order via a foreign key; the prd alias is for the products table, where all the important info about a product is stored).
Query looks like this
SELECT
pr.product_id,
COUNT(*) AS num,
prd.*
FROM
orders AS o
INNER JOIN order_items AS po ON o.id = po.order_id
INNER JOIN order_items AS pr ON o.id = pr.order_id
INNER JOIN products AS prd ON pr.product_id = prd.id
WHERE
po.product_id = '14211'
AND pr.product_id <> '14211'
GROUP BY
pr.product_id
ORDER BY
num DESC
LIMIT 3
It works nice and dandy: query time is 0.030ish seconds and it returns the products that were bought together with the one I am currently viewing.
As for the questions and considerations, the Percona query analyzer complains about these two things: Non-deterministic GROUP BY, and GROUP BY or ORDER BY on different tables. I need both so that the items on top are actually relevant for the related query, but I have absolutely no idea how to fix it, or even whether I should be bothered by this notice from the query analyzer.
The second question is regarding performance: since this query is using temporary and filesort, I was thinking of creating a view out of it and using that instead of actually executing the query each time some product is opened.
Mind you that I am not asking for CI model/view/controller tips, just tips on how to optimize this query, and/or suggestions regarding performance and going for views approach...
Any help is more than appreciated.
SELECT p.num, prd.*
FROM
(
SELECT a.product_id, COUNT(*) AS num
FROM orders AS o
INNER JOIN order_items AS b ON o.id = b.order_id
INNER JOIN order_items AS a ON o.id = a.order_id
WHERE b.product_id = '14211'
AND a.product_id <> '14211'
GROUP BY a.product_id
ORDER BY num DESC
LIMIT 3
) AS p
JOIN products AS prd ON p.product_id = prd.id
ORDER BY p.num DESC
This should
Run faster (especially as your data grows),
Avoid the group by complaint,
not over-inflate the count,
etc
Ignore the complaint about GROUP BY and ORDER BY coming from different tables -- that is a performance issue; you need it.
As for translating that back to CodeIgniter, good luck.
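If you want to try the derived-table version before wiring it into the application, a small sketch like the following might help. It uses Python's sqlite3 module with invented rows, integer product ids instead of the quoted '14211', and an extra tie-breaker in ORDER BY; all of those are assumptions for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical sample data: product 1 plays the role of the viewed product.
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY);
CREATE TABLE order_items (order_id INTEGER, product_id INTEGER);
INSERT INTO products VALUES (1, 'Target'), (2, 'B'), (3, 'C'), (4, 'D'), (5, 'E');
INSERT INTO orders VALUES (1), (2), (3), (4), (5);
INSERT INTO order_items VALUES
    (1, 1), (1, 2), (1, 3),
    (2, 1), (2, 2),
    (3, 1), (3, 2), (3, 4),
    (4, 2), (4, 3), (4, 5),   -- order without the target product
    (5, 1), (5, 3);
""")

target = 1
# Count and rank co-purchased products first, then join the 3 winners
# to products, so the GROUP BY only ever touches order_items columns.
bought_together = conn.execute("""
    SELECT p.num, prd.id, prd.name
    FROM (
        SELECT a.product_id, COUNT(*) AS num
        FROM orders AS o
        INNER JOIN order_items AS b ON o.id = b.order_id
        INNER JOIN order_items AS a ON o.id = a.order_id
        WHERE b.product_id = ?
          AND a.product_id <> ?
        GROUP BY a.product_id
        ORDER BY num DESC, a.product_id
        LIMIT 3
    ) AS p
    JOIN products AS prd ON p.product_id = prd.id
    ORDER BY p.num DESC, prd.id
""", (target, target)).fetchall()

print(bought_together)
```

Product 5 only appears in an order that does not contain the target, so it never shows up in the suggestions.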
I'm trying to answer to the following query:
Select the first name and last name of the clients which rent films (that have DVD's) from all the categories, ordering by first name and last name.
Database consists in:
Inventory -> DVD's
Rental -> Rents customers did
Category table:
| category_id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| name | varchar(25) | YES | | NULL |
My doubt is in how to assign that a field from a query must contain all ids from another query (categories).
I mean I understand the fact we can natural join inventory with rental and film, and then find an id that fails on a single category, then we know he doesn't contain all... But I can't complete this.
I have this solution (But I can't understand it very well):
SELECT first_name, last_name
FROM customer AS C WHERE NOT EXISTS
(SELECT * FROM category AS K WHERE NOT EXISTS
(SELECT * FROM (film NATURAL JOIN inventory) NATURAL JOIN rental
WHERE C.customer_id = customer_id AND K.category_id = category_id));
Are there any other solutions?
On our projects, we NEVER use NATURAL JOIN. That doesn't work for us, because the PRIMARY KEY is always a surrogate column named id, and the foreign key columns are always tablename_id.
A natural join would match id in one table to id in the other table, and that's not what we want. We also frequently have "housekeeping" columns in the tables that are named the same, such as version column used for optimistic locking pattern.
And even if our naming conventions were different, and the join columns were named the same, there would be a potential for a join in an existing query to change if we added a column to a table that was named the same as a column in another table.
And, reading a SQL statement that includes a NATURAL JOIN, we can't see what columns are actually being matched without running through the table definitions, looking for columns that are named the same. That seems to put an unnecessary burden on the reader of the statement. (A SQL statement is going to be "read" many more times than it's written... the author of the statement saving keystrokes isn't a beneficial tradeoff for ambiguity leading to extra work by future readers.)
(I know others have different opinions on this topic. I'm sure that successful software can be written using the NATURAL JOIN pattern. I'm just not smart enough or good enough to work with that. I'll give significant weight to the opinions of DBAs that have years of experience with database modeling, implementing schemas, writing and tuning SQL, supporting operational systems, and dealing with evolving requirements and ongoing maintenance.)
Where was I... oh yes... back to regularly scheduled programming...
The image of the schema is way too small for me to decipher, and I can't seem to copy any text from it. Output from a SHOW CREATE TABLE is much easier to work with.
Did you have a SQL Fiddle setup?
I don't think the query in the question will actually work. I thought there was a limitation on how far "up" a correlated subquery could reference an outer query.
To me, it looks like this predicate
WHERE C.customer_id = customer_id
^^^^^^^^^^^^^
is too deep. The subquery that it's in isn't allowed to reference columns from C; that table is too high up. (Maybe I'm totally wrong about that; maybe it's Oracle or SQL Server or Teradata that has that restriction. Or maybe MySQL used to have that restriction, but a later version has lifted it.)
OTHER APPROACHES
As another approach, we could get each customer and a distinct list of every category that he's rented from.
Then, we could compare that list of "customer rented category" with a complete list of (distinct) category. One fairly easy way to do that would be to collapse each list into a "count" of distinct category, and then compare the counts. If a count for a customer is less than the total count, then we know he's not rented from every category. (There are a few caveats. We need to ensure that the customer "rented from category" list contains only categories in the total category list.)
Another approach would be to take a list of (distinct) customer, and perform a cross join (cartesian product) with every possible category. (WARNING: this could be fairly large set.)
With that set of "customer cross product category", we could then eliminate rows where the customer has rented from that category (probably using an anti-join pattern.)
That would leave us with a set of customers and the categories they haven't rented from.
OP hasn't set up a SQL Fiddle with tables and exemplar data; so, I'm not going to bother doing it either.
I would offer some example SQL statements, but the table definitions from the image are unusable; to demonstrate those statements actually working, I'd need some exemplar data in the tables.
(Again, I don't believe the statement in the question actually works. There's no demonstration that it does work.)
I'd be more inclined to test it myself, if it weren't for the NATURAL JOIN syntax. I'm not smart enough to figure that out, without usable table definitions.
If I worked on that, the first thing I would do would be to rewrite it to remove the NATURAL keyword, and add actual predicates in an actual ON clause, and qualify all of the column references.
And the query would end up looking something like this:
SELECT c.first_name
, c.last_name
FROM customer c
WHERE NOT EXISTS
( SELECT 1
FROM category k
WHERE NOT EXISTS
( SELECT 1
FROM film f
JOIN inventory i
ON i.film_id = f.film_id
JOIN rental r
ON r.inventory_id = i.inventory_id
WHERE f.category_id = k.category_id
AND r.customer_id = c.customer_id
)
)
(I think that reference to c.customer_id is too deep to be valid.)
EDIT
I stand corrected on my conjecture that the reference to C.customer_id was too many levels "deep". That query doesn't throw an error for me.
But it also doesn't seem to return the resultset that we're expecting. I may have screwed it up somehow. Oh well.
Here's an example of getting the "count of distinct rental category" for each customer (GROUP BY c.customer_id, just in case we have two customers with the same first and last names) and comparing to the count of category.
SELECT c.last_name
, c.first_name
FROM customer c
JOIN rental r
ON r.customer_id = c.customer_id
JOIN inventory i
ON i.inventory_id = r.inventory_id
JOIN film f
ON f.film_id = i.film_id
GROUP
BY c.last_name
, c.first_name
, c.customer_id
HAVING COUNT(DISTINCT f.category_id)
= (SELECT COUNT(DISTINCT a.category_id) FROM category a)
ORDER
BY c.last_name
, c.first_name
, c.customer_id
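For anyone who wants to compare the two formulations side by side, here is a rough sketch on a toy dataset, using Python's sqlite3 module. It assumes, like the queries above, that film carries a category_id column; the rows are invented, and agreement on this tiny sample says nothing about the asker's real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical sample data: two categories, one film and one DVD per category.
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE category (category_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE film (film_id INTEGER PRIMARY KEY, category_id INTEGER);
CREATE TABLE inventory (inventory_id INTEGER PRIMARY KEY, film_id INTEGER);
CREATE TABLE rental (rental_id INTEGER PRIMARY KEY, inventory_id INTEGER, customer_id INTEGER);
INSERT INTO customer VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray'), (3, 'Cy', 'Day');
INSERT INTO category VALUES (1, 'Action'), (2, 'Comedy');
INSERT INTO film VALUES (1, 1), (2, 2);
INSERT INTO inventory VALUES (1, 1), (2, 2);
INSERT INTO rental VALUES (1, 1, 1), (2, 2, 1), (3, 1, 1),  -- Ann: both categories
                          (4, 1, 2);                        -- Bob: Action only
""")

# "No category exists that this customer has not rented from."
double_not_exists = conn.execute("""
    SELECT c.first_name, c.last_name
    FROM customer c
    WHERE NOT EXISTS (
        SELECT 1 FROM category k
        WHERE NOT EXISTS (
            SELECT 1
            FROM film f
            JOIN inventory i ON i.film_id = f.film_id
            JOIN rental r ON r.inventory_id = i.inventory_id
            WHERE f.category_id = k.category_id
              AND r.customer_id = c.customer_id))
    ORDER BY c.first_name, c.last_name
""").fetchall()

# "Distinct categories rented equals the total number of categories."
count_compare = conn.execute("""
    SELECT c.first_name, c.last_name
    FROM customer c
    JOIN rental r ON r.customer_id = c.customer_id
    JOIN inventory i ON i.inventory_id = r.inventory_id
    JOIN film f ON f.film_id = i.film_id
    GROUP BY c.last_name, c.first_name, c.customer_id
    HAVING COUNT(DISTINCT f.category_id)
         = (SELECT COUNT(DISTINCT a.category_id) FROM category a)
    ORDER BY c.first_name, c.last_name
""").fetchall()

print(double_not_exists)
print(count_compare)
```

One difference worth knowing: if the category table were empty, the NOT EXISTS form would return every customer, while the count form would return none.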
EDIT
And here's a demonstration of the other approach, generating a cartesian product of all customers and all categories (WARNING: do NOT do this on LARGE sets!), and find out if any of those rows don't have a match.
-- customers who have rented from EVERY category
-- h = cartesian (cross) product of all customers with all categories
-- g = all categories rented by each customer
-- perform outer join, return all rows from h and matching rows from g
-- if a row from h does not have a "matching" row found in g
-- columns from g will be null, test if any rows have null values from g
SELECT h.last_name
, h.first_name
FROM ( SELECT hi.customer_id
, hi.last_name
, hi.first_name
, hj.category_id
FROM customer hi
CROSS
JOIN category hj
) h
LEFT
JOIN ( SELECT c.customer_id
, f.category_id
FROM customer c
JOIN rental r
ON r.customer_id = c.customer_id
JOIN inventory i
ON i.inventory_id = r.inventory_id
JOIN film f
ON f.film_id = i.film_id
GROUP
BY c.customer_id
, f.category_id
) g
ON g.customer_id = h.customer_id
AND g.category_id = h.category_id
GROUP
BY h.last_name
, h.first_name
, h.customer_id
HAVING MIN(g.category_id IS NOT NULL)
ORDER
BY h.last_name
, h.first_name
, h.customer_id
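The same cross join can also answer the complementary question: which categories is each customer still missing? A minimal anti-join sketch, again on invented rows through Python's sqlite3 module and again assuming film has a category_id column:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical sample data.
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE category (category_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE film (film_id INTEGER PRIMARY KEY, category_id INTEGER);
CREATE TABLE inventory (inventory_id INTEGER PRIMARY KEY, film_id INTEGER);
CREATE TABLE rental (rental_id INTEGER PRIMARY KEY, inventory_id INTEGER, customer_id INTEGER);
INSERT INTO customer VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray'), (3, 'Cy', 'Day');
INSERT INTO category VALUES (1, 'Action'), (2, 'Comedy');
INSERT INTO film VALUES (1, 1), (2, 2);
INSERT INTO inventory VALUES (1, 1), (2, 2);
INSERT INTO rental VALUES (1, 1, 1), (2, 2, 1), (3, 1, 2);
""")

# h = every (customer, category) pair; g = pairs that actually occurred.
# Keeping only rows of h with no match in g is the anti-join.
missing_pairs = conn.execute("""
    SELECT h.customer_id, h.category_id
    FROM (SELECT c.customer_id, k.category_id
          FROM customer c
          CROSS JOIN category k) h
    LEFT JOIN (SELECT DISTINCT r.customer_id, f.category_id
               FROM rental r
               JOIN inventory i ON i.inventory_id = r.inventory_id
               JOIN film f ON f.film_id = i.film_id) g
      ON g.customer_id = h.customer_id
     AND g.category_id = h.category_id
    WHERE g.customer_id IS NULL
    ORDER BY h.customer_id, h.category_id
""").fetchall()

print(missing_pairs)
```

Customers who never appear in this result are the ones who have rented from every category.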
I will take a stab at this, only because I am curious why the answer proposed seems so complex. First, a couple of questions.
So your question is: "Select the first name and last name of the clients which rent films (that have DVD's) from all the categories, ordering by first name and last name."
So, just go through the rental database, joining customer. I am not sure what the category part has anything to do with this, as you are not selecting or displaying any category, so that does not need to be part of the search, it is implied as when they rent a DVD, that DVD has a category.
SELECT C.first_name, C.last_name
FROM customer as C JOIN rental as R
ON (C.customer_id = R.customer_id)
WHERE R.return_date IS NULL;
So, you are looking for movies that are currently rented, and displaying the first and last names of customers with active rentals.
You can also use SELECT DISTINCT to remove the duplicate customers that show up in the list.
Does this help?!