I have a MySQL database of 3 tables:
I. Person (id, name, purchases)
II. Purchase(id, product, date_purchased)
III. Catalog(id, product, cost-per-unit)
Person.purchases holds Purchase.id values. That is, every time a person buys something, the order id gets recorded in Person.purchases. For example, Person.purchases might have 1, 300, 292 stored in it.
Each Purchase entry records an instance of any item purchased. So, Purchase.id = 300 could be "foo".
And Catalog holds the description of "foo".
What I want to find out is how to answer: Who bought "foo"? I know how to answer this question in 2 steps, like this:
Step 1: SELECT Purchase.id FROM Purchase INNER JOIN Catalog ON Purchase.product = Catalog.product;
I would store step 1's result in a variable tmp;
Step 2: SELECT name FROM Person WHERE Person.purchases LIKE "%tmp%";
I am using LIKE above because Person.purchases stores multiple Purchase.id values.
Is there a way to combine these two into one query?
The question can be answered using a single query:
Using EXISTS
SELECT a.name
FROM PERSON a
WHERE EXISTS(SELECT NULL
FROM PURCHASE b
JOIN CATALOG c ON c.product = b.product
WHERE FIND_IN_SET(b.id, a.purchases) > 0
AND c.product = 'foo')
Using a JOIN:
This requires DISTINCT (or GROUP BY) because duplicates are possible, if a person/customer has bought "foo" more than once.
SELECT DISTINCT a.name
FROM PERSON a
JOIN PURCHASE b ON FIND_IN_SET(b.id, a.purchases) > 0
JOIN CATALOG c ON c.product = b.product
WHERE c.product = 'foo'
Addendum
I agree with the other answers that the data model is poor - there should be a person/customer id in the PURCHASE table, not the PERSON table. But it doesn't change things drastically.
This is a poor database design and it's holding you back from answering a relatively simple question. I'd design your tables somewhat like this:
customers (id, name)
purchases (id, product_id, customer_id, date_purchased)
products (id, product_name, cost_per_unit)
Thus, your query to figure out 'Who bought foo?' is:
SELECT c.id, c.name
FROM products pr
INNER JOIN purchases pu ON (pr.id = pu.product_id)
INNER JOIN customers c ON (pu.customer_id = c.id)
WHERE pr.product_name = 'foo'
-- could filter on pr.id instead if you already know the product id
This has your database in a somewhat normal form (I don't remember which one exactly) so you can take advantage of the features that relational databases offer.
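To make the normalized layout concrete, here is a minimal sketch that builds it in an in-memory SQLite database from Python and asks who bought "foo". The sample rows (Ann, Bob, Cy and their purchases) are made up for illustration, and SQLite stands in for MySQL only so the snippet runs on its own.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the proposed schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE products  (id INTEGER PRIMARY KEY, product_name TEXT, cost_per_unit REAL);
CREATE TABLE purchases (
    id INTEGER PRIMARY KEY,
    product_id INTEGER REFERENCES products(id),
    customer_id INTEGER REFERENCES customers(id),
    date_purchased TEXT
);
INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO products  VALUES (10, 'foo', 2.5), (11, 'bar', 4.0);
INSERT INTO purchases VALUES
    (1, 10, 1, '2020-01-01'),
    (2, 11, 2, '2020-01-02'),
    (3, 10, 1, '2020-01-03'),
    (4, 10, 3, '2020-01-04');
""")

# DISTINCT collapses Ann's two purchases of foo into one row.
who_bought_foo = conn.execute("""
    SELECT DISTINCT c.id, c.name
    FROM products pr
    JOIN purchases pu ON pu.product_id = pr.id
    JOIN customers c  ON c.id = pu.customer_id
    WHERE pr.product_name = ?
    ORDER BY c.id
""", ("foo",)).fetchall()

print(who_bought_foo)
```

The product name is passed as a bound parameter rather than pasted into the SQL string, which is the usual way to avoid quoting problems.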
It might also be useful to make another table here, call it receipts, and rename purchases to line_items. This ensures that you can track customers who buy multiple items in one purchase, etc.
For MySQL this might be a better solution:
SELECT person.*
FROM person
JOIN purchase
ON FIND_IN_SET(purchase.id, person.purchases) > 0
WHERE purchase.product = 'foo';
A much better structure of your tables would be:
I. Person (personid, name) ---purchases deleted from here
II. Purchase (purchaseid, buyerid, productid, date_purchased) ---buyerid added
III. Catalog (productid, product, cost-per-unit)
So, instead of storing the purchases of a person in the Person table, store them in the Purchase table.
This will have several benefits:
You can store as many purchases as you like. The way it is now, the "purchases" field will eventually fill up, and what will you do then?
Your queries become easier to write.
With the current structure, if Person.purchases has ",1,300,292," stored in it (that is, commas at the start and end of the field and no spaces), your question can be answered in one query like the one below. If there are spaces, or no commas at the start and end, the condition will be more complex, but it can still be done.
SELECT p.id, p.name
FROM Person p
JOIN Purchase pur
ON p.purchases LIKE CONCAT("%,",CAST(pur.id AS CHAR),",%")
WHERE pur.product LIKE "foo"
You don't need the join with Catalog, since the product name is in the Purchase table too.
If you do want to have info from Catalog, you could have the other join too:
SELECT p.id, p.name, cat.*
FROM Person p
JOIN Purchase pur
ON p.purchases LIKE CONCAT("%,",CAST(pur.id AS CHAR),",%")
JOIN Catalog cat
ON pur.product = cat.product
WHERE pur.product LIKE "foo" -- or cat.product LIKE "foo"
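The reason the commas at both ends matter can be checked with a small sketch. It runs the LIKE comparison through SQLite from Python (an assumption made only to keep it runnable; MySQL's LIKE behaves the same way for these patterns), and the helper name matches is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def matches(stored, purchase_id, wrap):
    """Return True if the LIKE pattern finds purchase_id in the stored list."""
    pattern = f"%,{purchase_id},%" if wrap else f"%{purchase_id}%"
    (hit,) = conn.execute("SELECT ? LIKE ?", (stored, pattern)).fetchone()
    return bool(hit)

# Without delimiters, id 30 falsely matches inside 300.
naive_false_positive = matches("1,300,292", 30, wrap=False)

# With commas at both ends of the field, only whole ids match.
wrapped_30 = matches(",1,300,292,", 30, wrap=True)
wrapped_300 = matches(",1,300,292,", 300, wrap=True)
wrapped_first = matches(",1,300,292,", 1, wrap=True)

print(naive_false_positive, wrapped_30, wrapped_300, wrapped_first)
```

This is also why the first id in the list needs a leading comma: without it, the pattern "%,1,%" would have nothing to anchor on.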
I am using the Chinook database for a project and I have two difficult queries to execute, but both provide errors.
I am looking for all the orders (invoice) that were sent to 'New York' and contain tracks that belong to more than one genre. [InvoiceId, amount of products, total1, total2]. Total1 should be unitprice*quantity and total2 is total. It should show only 2 rows.
So far I have come up with this. I have also tried switching up with left join, full outer join, etc
CREATE TEMPORARY TABLE temp AS
SELECT *
FROM track join invoiceline USING (TrackId)
WHERE (select * from track t1 where EXISTS (select * from track t2 where t1.GenreId <> t2.GenreId));
SELECT invoice.InvoiceId, invoiceline.Quantity, invoiceline.UnitPrice*invoiceline.Quantity, invoice.Total
FROM (SELECT * FROM invoice JOIN invoiceline
WHERE invoice.BillingCity LIKE '%New York%') JOIN temp cc ON invoiceline.TrackId
GROUP BY invoiceline.InvoiceId;
DROP TABLE temp;
It provides the error:
Operand should contain 1 column(s)
I am looking for clients (in couples) that have bought more than two of the same tracks. It should provide 14 rows.
Until now I have come up with this.
SELECT CONCAT(FIRSTNAME,',', LASTNAME) AS name1 FROM customer
JOIN invoice ON customer.CustomerId = invoice.CustomerId
JOIN invoiceline ON invoice.InvoiceId = invoiceline.InvoiceId
JOIN track ON invoiceline.TrackId = track.TrackId
UNION
(
SELECT CONCAT(FIRSTNAME,',', LASTNAME) AS name2 FROM customer
JOIN invoice ON customer.CustomerId = invoice.CustomerId
JOIN invoiceline ON invoice.InvoiceId = invoiceline.InvoiceId
JOIN track ON invoiceline.TrackId = track.TrackId
);
So A) Does anybody know why it provides that error?
B) Could anyone give any tips or suggest a better way to write these queries?
Here are two helpful schemas: an ER diagram and a relational diagram.
Answer to your first question:
The error comes up because the subquery in your WHERE clause uses SELECT *, which returns several columns where MySQL expects a single value. The temporary table is also redundant.
Instead, count the distinct genre ids per invoice and keep only the invoices where that count is more than 1, as shown below:
SELECT i.InvoiceId,
SUM(il.Quantity) AS amount_of_products,
SUM(il.UnitPrice * il.Quantity) AS total1,
i.Total AS total2
FROM invoice i
JOIN invoiceline il ON il.InvoiceId = i.InvoiceId
JOIN track t ON t.TrackId = il.TrackId
WHERE i.BillingCity LIKE '%New York%'
GROUP BY i.InvoiceId, i.Total
HAVING COUNT(DISTINCT t.GenreId) > 1;
I have assumed that TrackId is the primary key of track, so every track belongs to exactly one genre.
For the second question, I assume that you want to find customers buying the same records. You can use a query like the one below:
SELECT invoiceline.TrackId, group_concat(customer.CustomerId) as customers FROM customer
JOIN invoice ON customer.CustomerId = invoice.CustomerId
JOIN invoiceline ON invoice.InvoiceId = invoiceline.InvoiceId
JOIN track ON invoiceline.TrackId = track.TrackId
group by 1
This will give you comma separated customer ids who have bought the same track. Also, use customer id instead of first name and last name since some customers can have the same name. Using primary key is best.
Since you mentioned that you want customers buying the same records in couples, I would suggest reading up on market basket analysis (association analysis) using the Apriori algorithm. You can import your dataset into R or Python, whichever you are more comfortable with, and build a visualization. In my experience, Python is faster and handles more data but its plotting for this is weaker, while R is a bit slow with large amounts of data but has good visualizations for Apriori results.
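Before reaching for market basket tooling, the "couples" part can also be sketched in plain SQL with a self-join. The snippet below is only an illustration under assumptions: bought is a hypothetical flattened table of (customer_id, track_id) rows, which in Chinook you would get by joining customer, invoice and invoiceline, and SQLite is used from Python so it runs standalone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical flattened view of who bought which track.
CREATE TABLE bought (customer_id INTEGER, track_id INTEGER);
INSERT INTO bought VALUES
    (1, 1), (1, 2), (1, 3), (1, 4),
    (2, 1), (2, 2), (2, 3),
    (3, 1), (3, 2);
""")

# Pair each customer with every higher-numbered customer on the same track,
# then keep the pairs that share more than two distinct tracks.
pairs = conn.execute("""
    SELECT a.customer_id, b.customer_id, COUNT(DISTINCT a.track_id) AS shared
    FROM bought a
    JOIN bought b
      ON b.track_id = a.track_id
     AND a.customer_id < b.customer_id
    GROUP BY a.customer_id, b.customer_id
    HAVING COUNT(DISTINCT a.track_id) > 2
    ORDER BY a.customer_id, b.customer_id
""").fetchall()

print(pairs)
```

The a.customer_id < b.customer_id condition keeps each couple once and stops a customer from being paired with themselves.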
I need to identify products that have been purchased more than once.
ERD diagram looks like this:
I wrote this query
SELECT DISTINCT good_name
FROM Goods
JOIN Payments
on Payments.good = Goods.good_id
WHERE good in (SELECT good
FROM (SELECT good
, COUNT(good) as c
FROM Payments
GROUP
BY good) as a
WHERE c > 1)
It works, but is this code great?
Grouping would work better:
SELECT good_name
FROM Goods
JOIN Payments on Payments.good = Goods.good_id
GROUP BY Goods.good_id
HAVING COUNT(Payments.good) > 1
You probably also need an index on the Payments.good column.
It may also be worth adding a column to the Goods table that holds the count of successful payments and updating it after each payment.
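As a quick check that the grouped form returns the same goods as the nested IN version, here is a small sketch using SQLite from Python. The rows (apple, pear, plum) and the index name idx_payments_good are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Goods (good_id INTEGER PRIMARY KEY, good_name TEXT);
CREATE TABLE Payments (
    payment_id INTEGER PRIMARY KEY,
    good INTEGER REFERENCES Goods(good_id)
);
CREATE INDEX idx_payments_good ON Payments (good);
INSERT INTO Goods VALUES (1, 'apple'), (2, 'pear'), (3, 'plum');
INSERT INTO Payments (good) VALUES (1), (1), (2), (3), (3), (3);
""")

# Grouped version: one row per good, filtered on the payment count.
grouped = conn.execute("""
    SELECT g.good_name
    FROM Goods g
    JOIN Payments p ON p.good = g.good_id
    GROUP BY g.good_id, g.good_name
    HAVING COUNT(p.good) > 1
    ORDER BY g.good_name
""").fetchall()

# Nested version, close to the query in the question.
nested = conn.execute("""
    SELECT DISTINCT g.good_name
    FROM Goods g
    JOIN Payments p ON p.good = g.good_id
    WHERE p.good IN (SELECT good FROM Payments GROUP BY good HAVING COUNT(*) > 1)
    ORDER BY g.good_name
""").fetchall()

print(grouped, nested)
```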
I have 3 tables:
Store(sID, name, address, mID)
Sells(sID, pID)
Product(pID, name, manufacturer, price)
I need to find which stores stock every product from a given manufacturer. For example: to search Unilever, I'd expect to return only stores which stock ALL Unilever products listed in Product, not just some of them.
I've tried lots of different queries with most being completely off the mark.
Am I right in thinking I need to create a subset of all the products made by Unilever, then somehow go through Sells and check that the list of pIDs for each sID contain all of those in the initial subset? I can then join the result with Store to get the store details.
If that's the correct logic, where would one begin?
This creates a subset of all the unilever products:
SELECT pID FROM Product WHERE manufacturer = "Unilever"
How would I then check this list against each store in Sells to find the ones that contain all the products in the list?
One possible way is to join the product table twice, once via the sells table and once directly, then use COUNT(DISTINCT ...) on each joined table to check that they match:
SELECT st.*
FROM store st
INNER JOIN sells se ON st.sID = se.sID
INNER JOIN product pr1 ON se.pID = pr1.pID AND pr1.manufacturer = "Unilever"
INNER JOIN product pr2 ON pr2.manufacturer = "Unilever"
GROUP BY st.sID,
st.name,
st.address,
st.mID
HAVING COUNT(DISTINCT pr1.pID) = COUNT(DISTINCT pr2.pID)
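Here is a small runnable sketch of that count-matching idea, using SQLite from Python. The store, sells and product tables are cut down to the columns the query needs, and the rows and the helper name stores_stocking_all are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE store   (sID INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE product (pID INTEGER PRIMARY KEY, name TEXT, manufacturer TEXT);
CREATE TABLE sells   (sID INTEGER, pID INTEGER);
INSERT INTO store   VALUES (1, 'North'), (2, 'South'), (3, 'East');
INSERT INTO product VALUES (1, 'Dove', 'Unilever'), (2, 'Lux', 'Unilever'), (3, 'Tide', 'P&G');
INSERT INTO sells   VALUES (1, 1), (1, 2), (1, 3), (2, 1), (2, 3), (3, 2), (3, 1);
""")

def stores_stocking_all(manufacturer):
    """Stores whose stocked products cover every product of the manufacturer."""
    return conn.execute("""
        SELECT st.sID, st.name
        FROM store st
        JOIN sells se    ON se.sID = st.sID
        JOIN product pr1 ON pr1.pID = se.pID AND pr1.manufacturer = :m
        JOIN product pr2 ON pr2.manufacturer = :m
        GROUP BY st.sID, st.name
        HAVING COUNT(DISTINCT pr1.pID) = COUNT(DISTINCT pr2.pID)
        ORDER BY st.sID
    """, {"m": manufacturer}).fetchall()

unilever_stores = stores_stocking_all("Unilever")
pg_stores = stores_stocking_all("P&G")
print(unilever_stores, pg_stores)
```

South stocks only one of the two Unilever products, so it drops out of the first result but still appears for P&G.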
How about creating a Product column in the Store table, so that all you have to do is search for the Unilever product in the Store table? Then apply this:
SELECT Product FROM Store WHERE Product = "Unilever"
But of course you wouldn't want a Product column in your Store table, so my answer can't be correct then.
I'm trying to write the following query:
Select the first name and last name of the clients which rent films (that have DVD's) from all the categories, ordering by first name and last name.
The database consists of:
Inventory -> the DVDs
Rental -> the rentals customers made
Category table:
| Field | Type | Null | Key | Default | Extra |
| category_id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| name | varchar(25) | YES | | NULL | |
My doubt is how to express that the rows for one customer must cover all ids from another query (the categories).
I understand that we can natural join inventory with rental and film, and that if a customer fails on a single category, then we know he hasn't rented from all of them. But I can't complete this.
I have this solution (but I can't understand it very well):
SELECT first_name, last_name
FROM customer AS C WHERE NOT EXISTS
(SELECT * FROM category AS K WHERE NOT EXISTS
(SELECT * FROM (film NATURAL JOIN inventory) NATURAL JOIN rental
WHERE C.customer_id = customer_id AND K.category_id = category_id));
Are there any other solutions?
On our projects, we NEVER use NATURAL JOIN. That doesn't work for us, because the PRIMARY KEY is always a surrogate column named id, and the foreign key columns are always tablename_id.
A natural join would match id in one table to id in the other table, and that's not what we want. We also frequently have "housekeeping" columns in the tables that are named the same, such as a version column used for the optimistic locking pattern.
And even if our naming conventions were different, and the join columns were named the same, there would be a potential for a join in an existing query to change if we added a column to a table that was named the same as a column in another table.
And, reading a SQL statement that includes a NATURAL JOIN, we can't see what columns are actually being matched without running through the table definitions, looking for columns that are named the same. That seems to put an unnecessary burden on the reader of the statement. (A SQL statement is going to be "read" many more times than it's written... the author of the statement saving keystrokes isn't a beneficial tradeoff for ambiguity leading to extra work by future readers.)
(I know others have different opinions on this topic. I'm sure that successful software can be written using the NATURAL JOIN pattern. I'm just not smart enough or good enough to work with that. I'll give significant weight to the opinions of DBAs that have years of experience with database modeling, implementing schemas, writing and tuning SQL, supporting operational systems, and dealing with evolving requirements and ongoing maintenance.)
Where was I... oh yes... back to regularly scheduled programming...
The image of the schema is way too small for me to decipher, and I can't seem to copy any text from it. Output from a SHOW CREATE TABLE is much easier to work with.
Did you have a SQL Fiddle setup?
I don't think the query in the question will actually work. I thought there was a limitation on how far "up" a correlated subquery could reference an outer query.
To me, it looks like this predicate
WHERE C.customer_id = customer_id
^^^^^^^^^^^^^
is too deep. The subquery that it's in isn't allowed to reference columns from C; that table is too high up. (Maybe I'm totally wrong about that; maybe it's Oracle or SQL Server or Teradata that has that restriction. Or maybe MySQL used to have that restriction, but a later version has lifted it.)
OTHER APPROACHES
As another approach, we could get each customer and a distinct list of every category that he's rented from.
Then, we could compare that list of "customer rented category" with a complete list of (distinct) category. One fairly easy way to do that would be to collapse each list into a "count" of distinct category, and then compare the counts. If a count for a customer is less than the total count, then we know he's not rented from every category. (There is a caveat: we need to ensure that the customer "rented from category" list contains only categories in the total category list.)
Another approach would be to take a list of (distinct) customer, and perform a cross join (cartesian product) with every possible category. (WARNING: this could be fairly large set.)
With that set of "customer cross product category", we could then eliminate rows where the customer has rented from that category (probably using an anti-join pattern.)
That would leave us with a set of customers and the categories they haven't rented from.
OP hasn't set up a SQL Fiddle with tables and exemplar data; so, I'm not going to bother doing it either.
I would offer some example SQL statements, but the table definitions from the image are unusable; to demonstrate those statements actually working, I'd need some exemplar data in the tables.
(Again, I don't believe the statement in the question actually works. There's no demonstration that it does work.)
I'd be more inclined to test it myself, if it weren't for the NATURAL JOIN syntax. I'm not smart enough to figure that out, without usable table definitions.
If I worked on that, the first thing I would do would be to re-write it to remove the NATURAL keyword, add actual predicates in an actual ON clause, and qualify all of the column references.
And the query would end up looking something like this:
SELECT c.first_name
, c.last_name
FROM customer c
WHERE NOT EXISTS
( SELECT 1
FROM category k
WHERE NOT EXISTS
( SELECT 1
FROM film f
JOIN inventory i
ON i.film_id = f.film_id
JOIN rental r
ON r.inventory_id = i.inventory_id
WHERE f.category_id = k.category_id
AND r.customer_id = c.customer_id
)
)
(I think that reference to c.customer_id is too deep to be valid.)
EDIT
I stand corrected on my conjecture that the reference to C.customer_id was too many levels "deep". That query doesn't throw an error for me.
But it also doesn't seem to return the resultset that we're expecting. I may have screwed it up somehow. Oh well.
Here's an example of getting the "count of distinct rental category" for each customer (GROUP BY c.customer_id, just in case we have two customers with the same first and last names) and comparing to the count of category.
SELECT c.last_name
, c.first_name
FROM customer c
JOIN rental r
ON r.customer_id = c.customer_id
JOIN inventory i
ON i.inventory_id = r.inventory_id
JOIN film f
ON f.film_id = i.film_id
GROUP
BY c.last_name
, c.first_name
, c.customer_id
HAVING COUNT(DISTINCT f.category_id)
= (SELECT COUNT(DISTINCT a.category_id) FROM category a)
ORDER
BY c.last_name
, c.first_name
, c.customer_id
EDIT
And here's a demonstration of the other approach, generating a cartesian product of all customers and all categories (WARNING: do NOT do this on LARGE sets!), and find out if any of those rows don't have a match.
-- customers who have rented from EVERY category
-- h = cartesian (cross) product of all customers with all categories
-- g = all categories rented by each customer
-- perform outer join, return all rows from h and matching rows from g
-- if a row from h does not have a "matching" row found in g
-- columns from g will be null, test if any rows have null values from g
SELECT h.last_name
, h.first_name
FROM ( SELECT hi.customer_id
, hi.last_name
, hi.first_name
, hj.category_id
FROM customer hi
CROSS
JOIN category hj
) h
LEFT
JOIN ( SELECT c.customer_id
, f.category_id
FROM customer c
JOIN rental r
ON r.customer_id = c.customer_id
JOIN inventory i
ON i.inventory_id = r.inventory_id
JOIN film f
ON f.film_id = i.film_id
GROUP
BY c.customer_id
, f.category_id
) g
ON g.customer_id = h.customer_id
AND g.category_id = h.category_id
GROUP
BY h.last_name
, h.first_name
, h.customer_id
HAVING MIN(g.category_id IS NOT NULL)
ORDER
BY h.last_name
, h.first_name
, h.customer_id
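For anyone who wants to try these against data, here is a minimal sketch that runs the double NOT EXISTS form and the count-comparison form on a tiny made-up data set and checks that they agree. It uses SQLite from Python so it is self-contained, and it assumes, like the queries above, that film carries a category_id column; the real schema in the question may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical cut-down tables and rows, just enough for the two queries.
CREATE TABLE customer  (customer_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE category  (category_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE film      (film_id INTEGER PRIMARY KEY, category_id INTEGER);
CREATE TABLE inventory (inventory_id INTEGER PRIMARY KEY, film_id INTEGER);
CREATE TABLE rental    (rental_id INTEGER PRIMARY KEY, inventory_id INTEGER, customer_id INTEGER);
INSERT INTO customer  VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray');
INSERT INTO category  VALUES (1, 'Drama'), (2, 'Comedy');
INSERT INTO film      VALUES (1, 1), (2, 2);
INSERT INTO inventory VALUES (1, 1), (2, 2);
INSERT INTO rental    VALUES (1, 1, 1), (2, 2, 1), (3, 1, 2);
""")

# "There is no category that this customer has not rented from."
double_not_exists = conn.execute("""
    SELECT c.first_name, c.last_name
    FROM customer c
    WHERE NOT EXISTS (
        SELECT 1 FROM category k
        WHERE NOT EXISTS (
            SELECT 1
            FROM film f
            JOIN inventory i ON i.film_id = f.film_id
            JOIN rental r    ON r.inventory_id = i.inventory_id
            WHERE f.category_id = k.category_id
              AND r.customer_id = c.customer_id))
    ORDER BY c.first_name, c.last_name
""").fetchall()

# "The customer's distinct rented categories equal the total category count."
count_match = conn.execute("""
    SELECT c.first_name, c.last_name
    FROM customer c
    JOIN rental r    ON r.customer_id = c.customer_id
    JOIN inventory i ON i.inventory_id = r.inventory_id
    JOIN film f      ON f.film_id = i.film_id
    GROUP BY c.customer_id, c.first_name, c.last_name
    HAVING COUNT(DISTINCT f.category_id) = (SELECT COUNT(*) FROM category)
    ORDER BY c.first_name, c.last_name
""").fetchall()

print(double_not_exists, count_match)
```

On this toy data only Ann has rented from both categories, so both forms return her alone. That says nothing about how they behave on the real tables, only that the two shapes of query are meant to answer the same question.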
I will take a stab at this, only because I am curious why the answer proposed seems so complex. First, a couple of questions.
So your question is: "Select the first name and last name of the clients which rent films (that have DVD's) from all the categories, ordering by first name and last name."
So, just go through the rental database, joining customer. I am not sure what the category part has anything to do with this, as you are not selecting or displaying any category, so that does not need to be part of the search, it is implied as when they rent a DVD, that DVD has a category.
SELECT C.first_name, C.last_name
FROM customer as C JOIN rental as R
ON (C.customer_id = R.customer_id)
WHERE R.return_date IS NULL;
So, you are looking for movies that are currently rented (not yet returned), and displaying the first and last names of customers with active rentals.
You can also use SELECT DISTINCT to remove the duplicate customers that show up in the list.
Does this help?!
I have two tables, Customers and Products. A customer can have more than one product.
I am trying to retrieve customers that do not have a specific product.
For example, 10 customers bought products A and B, another 10 customers bought A, B, and C. How can I retrieve those customers that do not have the C product?
For your current DB structure, this is what you are looking for:
select c.id, c.name, c.phone, c.address
from Customers c
where not exists (select * from products p
where p.customer_id = c.id and p.id = 'c')
However, you should consider creating a third table to store the individual purchases.
select *
FROM Customers c
WHERE NOT EXISTS (SELECT 1 from products p
WHERE p.customer_id = c.id AND p.id = 'c')
You should really (as already suggested by @Tony Andrews and @Adrian) have a third table to store details of which customers bought which product.
Something like:
**Customer**
Id
Name
Address
Phone
**Product**
Id
Name
Price
**Customer_Product**
customer_id
product_id
This means you're removing redundancy from your product table. Consider what you'd need to do if a product name changed slightly: instead of updating multiple rows (as you'd have to do now), you'd only have to update 1 row, and you wouldn't need to touch your transaction history at all.
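With that three-table layout, "customers who do not have product C" becomes a NOT EXISTS against the link table. A minimal sketch, using SQLite from Python with made-up rows (Ann, Bob and Cy are hypothetical customers):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE product  (id INTEGER PRIMARY KEY, name TEXT, price REAL);
CREATE TABLE customer_product (
    customer_id INTEGER REFERENCES customer(id),
    product_id  INTEGER REFERENCES product(id)
);
INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO product  VALUES (1, 'A', 1.0), (2, 'B', 2.0), (3, 'C', 3.0);
-- Ann bought A and B, Bob bought A, B and C, Cy bought nothing.
INSERT INTO customer_product VALUES (1, 1), (1, 2), (2, 1), (2, 2), (2, 3);
""")

without_c = conn.execute("""
    SELECT c.id, c.name
    FROM customer c
    WHERE NOT EXISTS (
        SELECT 1
        FROM customer_product cp
        JOIN product p ON p.id = cp.product_id
        WHERE cp.customer_id = c.id
          AND p.name = ?)
    ORDER BY c.id
""", ("C",)).fetchall()

print(without_c)
```

Note that Cy, who has bought nothing at all, is included too, which is usually what "does not have product C" should mean.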