Find customers with similar tastes while excluding certain customers - mysql

I have a table documenting purchases from customers, with one row per purchase:
CustomerID | ProductID
1 | 1000
1 | 2000
1 | 3000
2 | 1000
3 | 1000
3 | 3000
... | ...
I am using the following code to find the ten customers with the greatest number of overlapping products with customer #1 (the first result is the one with the most overlap, and so on):
SELECT othercustomers.CustomerID, COUNT(DISTINCT othercustomers.ProductID)
FROM `purchases` AS thiscustomer
JOIN `purchases` AS othercustomers ON
thiscustomer.CustomerID != othercustomers.CustomerID
AND thiscustomer.ProductID = othercustomers.ProductID
WHERE thiscustomer.CustomerID = '1'
GROUP BY othercustomers.CustomerID
ORDER BY COUNT(DISTINCT othercustomers.ProductID) DESC
LIMIT 10
The code yields the expected output (Customer ID + total number of overlapping products with customer #1).
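To see what the self-join computes, here is a minimal sketch that runs the question's query against the sample rows above. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL, which is an assumption for illustration only.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; the schema mirrors the
# question's purchases table (CustomerID, ProductID).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (CustomerID INTEGER, ProductID INTEGER)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [(1, 1000), (1, 2000), (1, 3000), (2, 1000), (3, 1000), (3, 3000)],
)

# Pair each product bought by the given customer with every other customer
# who bought the same product, then count shared products per customer.
OVERLAP_SQL = """
SELECT othercustomers.CustomerID,
       COUNT(DISTINCT othercustomers.ProductID) AS overlap
FROM purchases AS thiscustomer
JOIN purchases AS othercustomers
  ON thiscustomer.CustomerID != othercustomers.CustomerID
 AND thiscustomer.ProductID = othercustomers.ProductID
WHERE thiscustomer.CustomerID = ?
GROUP BY othercustomers.CustomerID
ORDER BY overlap DESC
LIMIT 10
"""

rows = conn.execute(OVERLAP_SQL, (1,)).fetchall()
print(rows)  # [(3, 2), (2, 1)]
```

Customer 3 shares products 1000 and 3000 with customer 1, customer 2 shares only 1000, so customer 3 ranks first.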
I would now like the query to exclude customers with overlapping purchases who have purchased more than 1000 different products, because these are bulk buyers who purchase the entire stock and whose purchase history therefore has no meaning when searching for customers with a similar taste.
In other words, if customer #500 had bought >1000 different products, I want him/her excluded from the results when searching for customers with a similar taste to that of customer #1 - even if customer #500 has bought all three products that customer #1 had bought and would ordinarily rank first in similarity/overlap.
I suppose some HAVING is in order, but I cannot seem to figure out what the appropriate condition is.
Thanks!

I think that HAVING won't do what you want here: after the join, each group only contains the overlapping products, so HAVING can only see the count of overlapping products, while you need the total count of products bought by the other customer.
You could filter with a correlated subquery in the WHERE clause:
SELECT othercustomers.CustomerID, COUNT(DISTINCT othercustomers.ProductID)
FROM `purchases` AS thiscustomer
JOIN `purchases` AS othercustomers ON
thiscustomer.CustomerID != othercustomers.CustomerID
AND thiscustomer.ProductID = othercustomers.ProductID
WHERE
thiscustomer.CustomerID = '1'
AND (
SELECT COUNT(DISTINCT ProductID)
FROM `purchases` AS p
WHERE p.CustomerID = othercustomers.CustomerID
) <= 1000
GROUP BY othercustomers.CustomerID
ORDER BY COUNT(DISTINCT othercustomers.ProductID) DESC
LIMIT 10
For performance, you want an index on purchases(CustomerID, ProductID).
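A minimal sketch of the correlated-subquery filter, again using Python's sqlite3 as a stand-in for MySQL. Customer 500 is a hypothetical bulk buyer, and the cap is passed as a parameter (`max_products`, a hypothetical name) so a small value can stand in for 1000 with this tiny data set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (CustomerID INTEGER, ProductID INTEGER)")
# Index suggested in the answer: purchases(CustomerID, ProductID).
conn.execute("CREATE INDEX idx_cust_prod ON purchases (CustomerID, ProductID)")
data = [(1, 1000), (1, 2000), (1, 3000), (2, 1000), (3, 1000), (3, 3000)]
# Hypothetical bulk buyer 500: all of customer 1's products plus two more.
data += [(500, p) for p in (1000, 2000, 3000, 4000, 5000)]
conn.executemany("INSERT INTO purchases VALUES (?, ?)", data)

FILTERED_SQL = """
SELECT othercustomers.CustomerID,
       COUNT(DISTINCT othercustomers.ProductID) AS overlap
FROM purchases AS thiscustomer
JOIN purchases AS othercustomers
  ON thiscustomer.CustomerID != othercustomers.CustomerID
 AND thiscustomer.ProductID = othercustomers.ProductID
WHERE thiscustomer.CustomerID = :customer
  AND (SELECT COUNT(DISTINCT p.ProductID)      -- total products of the other customer
       FROM purchases AS p
       WHERE p.CustomerID = othercustomers.CustomerID) <= :max_products
GROUP BY othercustomers.CustomerID
ORDER BY overlap DESC
LIMIT 10
"""

# With a cap of 4 distinct products, customer 500 (5 products) is excluded
# even though it shares all three of customer 1's products.
rows = conn.execute(FILTERED_SQL, {"customer": 1, "max_products": 4}).fetchall()
print(rows)  # [(3, 2), (2, 1)]

# A generous cap lets customer 500 back in, and it ranks first.
rows_no_cap = conn.execute(FILTERED_SQL, {"customer": 1, "max_products": 1000}).fetchall()
print(rows_no_cap)  # [(500, 3), (3, 2), (2, 1)]
```

The subquery is evaluated per candidate customer, which is why the composite index matters on a large table.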

Related

MySQL Count and SUM from second table with group by

I'm trying to get the number of sales and the quantity sold by joining two tables, grouping by the first one and summing from the second one.
First table has sales/operations: id_sales, sales_rep
Second table has sales details: id_sales_details, id_sales, quantity
What I need to know is how many operations each sales_rep had and what the total quantity of all those sales was.
This MySQL query gives me the first part:
SELECT sales.sales_rep, count(*) AS sales
from sales
Group by sales_rep
Order by sales DESC
What I cannot solve is how to add to that query the second part I need. The result should look something like:
sales_rep sales quantity
Claire 4 13
Peter 2 18
Mary 1 8
John 1 7
Here's a Fiddle to make things clearer: http://sqlfiddle.com/#!9/708234/5
SELECT s.sales_rep, COUNT(DISTINCT s.id_sales) AS operations, SUM(d.quantity) AS quantity
FROM sales s
JOIN sales_details d ON s.id_sales = d.id_sales
GROUP BY s.sales_rep
ORDER BY operations DESC;
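A minimal sketch of the join-and-aggregate approach, using Python's sqlite3 as a stand-in for MySQL with hypothetical data in which one of Claire's sales has two detail lines. It shows why operations are counted with COUNT(DISTINCT s.id_sales): a plain COUNT(*) counts detail rows after the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id_sales INTEGER PRIMARY KEY, sales_rep TEXT)")
conn.execute(
    "CREATE TABLE sales_details ("
    "id_sales_details INTEGER PRIMARY KEY, id_sales INTEGER, quantity INTEGER)"
)
# Hypothetical data: sale 1 (Claire) has two detail lines.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)", [(1, "Claire"), (2, "Claire"), (3, "Peter")]
)
conn.executemany(
    "INSERT INTO sales_details VALUES (?, ?, ?)",
    [(1, 1, 5), (2, 1, 3), (3, 2, 4), (4, 3, 10)],
)

SQL = """
SELECT s.sales_rep,
       COUNT(DISTINCT s.id_sales) AS operations,  -- each sale counted once
       SUM(d.quantity) AS quantity                -- summed over all detail lines
FROM sales s
JOIN sales_details d ON s.id_sales = d.id_sales
GROUP BY s.sales_rep
ORDER BY operations DESC
"""
rows = conn.execute(SQL).fetchall()
print(rows)  # [('Claire', 2, 12), ('Peter', 1, 10)]

# COUNT(*) after the join counts detail rows, inflating Claire's operations.
naive = conn.execute(
    "SELECT s.sales_rep, COUNT(*) FROM sales s "
    "JOIN sales_details d ON s.id_sales = d.id_sales "
    "GROUP BY s.sales_rep ORDER BY s.sales_rep"
).fetchall()
print(naive)  # [('Claire', 3), ('Peter', 1)]
```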
Quick solution
SELECT w.sales_rep, w.sales, SUM(w.quantity) AS quantity
FROM
(SELECT s.sales_rep, t.sales, d.quantity
FROM sales AS s
INNER JOIN sales_details AS d ON s.id_sales = d.id_sales
INNER JOIN
(SELECT sales_rep, COUNT(*) AS sales
FROM sales
GROUP BY sales_rep) AS t
ON s.sales_rep = t.sales_rep) AS w
GROUP BY w.sales_rep, w.sales
ORDER BY w.sales DESC

Group mysql results by cumulative column value

I have a database table events and a table bets. All bets placed for a particular event are located in the bets table while information about the event is stored in the events table.
Let's say I have these tables:
events table:
id event_title
1 Call of Duty Finals
2 DOTA 2 Semi-Finals
3 GTA V Air Race
bets table:
id event_id amount
1 1 $10
1 2 $50
1 2 $100
1 3 $25
1 3 $25
1 3 $25
I want to be able to sort by popularity aka how many bets have been placed for that event and by prize aka the total amount of money for that event.
SORTING BY PRIZE
Obviously this query doesn't work but I want to do something like this:
SELECT * FROM bets GROUP BY event_id SORT BY amount
amount from the query above should be a cumulative value of all the bet amounts for that event_id added together, so this query would return
Array (
[0]=>Array(
'event_id'=>2
'amount'=>$150
)
[1]=>Array(
'event_id'=>3
'amount'=>$75
)
[2]=>Array(
'event_id'=>1
'amount'=>$10
)
)
SORTING BY POPULARITY
Obviously this query doesn't work either but I want to do something like this:
SELECT * FROM bets GROUP BY event_id SORT BY total_rows
total_rows from the query above should be the number of rows in the bets table for that event_id, so this query would return
Array (
[0]=>Array(
'event_id'=>3
'total_rows'=>3
)
[1]=>Array(
'event_id'=>2
'total_rows'=>2
)
[2]=>Array(
'event_id'=>1
'total_rows'=>1
)
)
I wouldn't necessarily need it to return the total_rows value as I could calculate that, but it does need to be sorted by the number of occurrences for that particular event_id in the bets table.
I think count and sum are your friends here:
SELECT event_id,
COUNT(event_id) AS NumberBets,
SUM(amount) AS TotalPrize
FROM bets
GROUP BY event_id
Should do the trick.
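Then you can ORDER BY either NumberBets (popularity) or TotalPrize (prize) as you need. This assumes amount is stored as a number: MySQL converts a string such as '$10' to 0 inside SUM. A JOIN is only needed if you want the event titles.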
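A minimal sketch of that COUNT/SUM query with both orderings, using Python's sqlite3 as a stand-in for MySQL. It assumes amount is stored as a plain number (the question shows '$10' strings) and gives the bets unique ids, which the question's sample does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: amount is numeric, not a string like '$10'.
conn.execute(
    "CREATE TABLE bets (id INTEGER PRIMARY KEY, event_id INTEGER, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO bets (event_id, amount) VALUES (?, ?)",
    [(1, 10), (2, 50), (2, 100), (3, 25), (3, 25), (3, 25)],
)

# One row per event: how many bets, and the total amount bet.
BASE_SQL = """
SELECT event_id, COUNT(event_id) AS NumberBets, SUM(amount) AS TotalPrize
FROM bets
GROUP BY event_id
"""
by_prize = conn.execute(BASE_SQL + " ORDER BY TotalPrize DESC").fetchall()
by_popularity = conn.execute(BASE_SQL + " ORDER BY NumberBets DESC").fetchall()
print(by_prize)       # [(2, 2, 150), (3, 3, 75), (1, 1, 10)]
print(by_popularity)  # [(3, 3, 75), (2, 2, 150), (1, 1, 10)]
```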
You can use SUM and COUNT aggregate functions:
SELECT
e.id AS event_id, SUM(b.amount) AS sum_amount
FROM `events` e
LEFT JOIN bets b
ON b.event_id = e.id
GROUP BY
e.id
ORDER BY
sum_amount DESC;
SELECT
e.id AS event_id, COUNT(b.id) AS no_of_bets
FROM `events` e
LEFT JOIN bets b
ON b.event_id = e.id
GROUP BY
e.id
ORDER BY
no_of_bets DESC;
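A minimal sketch of the LEFT JOIN variant, using Python's sqlite3 as a stand-in for MySQL. Event 4 is hypothetical and has no bets; counting bets with COUNT(b.id) gives it 0, whereas COUNT(*) would count the NULL-padded row that the LEFT JOIN produces.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, event_title TEXT)")
conn.execute(
    "CREATE TABLE bets (id INTEGER PRIMARY KEY, event_id INTEGER, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "Call of Duty Finals"), (2, "DOTA 2 Semi-Finals"),
     (3, "GTA V Air Race"), (4, "Hypothetical event with no bets")],
)
conn.executemany(
    "INSERT INTO bets (event_id, amount) VALUES (?, ?)",
    [(1, 10), (2, 50), (2, 100), (3, 25), (3, 25), (3, 25)],
)

POPULARITY_SQL = """
SELECT e.id AS event_id,
       COUNT(b.id) AS no_of_bets,                -- NULL b.id (no bets) is not counted
       COALESCE(SUM(b.amount), 0) AS sum_amount  -- SUM over only NULLs is NULL
FROM events e
LEFT JOIN bets b ON b.event_id = e.id
GROUP BY e.id
ORDER BY no_of_bets DESC
"""
rows = conn.execute(POPULARITY_SQL).fetchall()
print(rows)  # [(3, 3, 75), (2, 2, 150), (1, 1, 10), (4, 0, 0)]

# COUNT(*) counts the NULL-padded row, so event 4 would show 1 bet.
star = dict(conn.execute(
    "SELECT e.id, COUNT(*) FROM events e "
    "LEFT JOIN bets b ON b.event_id = e.id GROUP BY e.id"
).fetchall())
print(star[4])  # 1
```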

Real world total sales SQL query

I'm fairly new to SQL and am having difficulty solving a problem.
'What are the total sales across all products for the salespeople that sell at least one unit of each of the five individual products with the highest sales by unit? Make sure that the query returns the total sales dollars in descending order. Only consider sales that take place over the six complete months prior to a #target_date parameter.'
3 tables exist in the DB.
SalesPerson (SalesPersonID,SalesYTD)
SalesOrderHeader (SalesOrderID,OrderDate,ShipDate)
SalesOrderDetail (SalesOrderID,SalesOrderDetailID,OrderQty,ProductID,UnitPrice)
This is where I'm at so far. I need to compile what I have into one statement and make necessary revisions. Please help!
To capture the top 5 highest sales by unit, the following SYNTAX should work:
SELECT
ProductID,
SUM(Orderqty*Unitprice)
FROM SalesOrderDetail
GROUP BY ProductID
WHERE Orderqty >=1
AND COUNT(productID) =5
ORDER BY SUM(Orderqty*Unitprice) DESC
LIMIT 5;
For the target_date parameter, I think it would be something along these lines?
SELECT
SalespersonID AS 'Sales Representative',
SalesYTD AS 'Total Sales', target_date
FROM Salesperson
WHERE target_date BETWEEN '01-DEC-13' AND '01-May-14';
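For the top five highest sales, I would rather propose the slightly simplified query below (in your version, WHERE has to come before GROUP BY, and an aggregate such as COUNT() cannot be used in WHERE):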
select productid, sum(orderqty * unitprice) as sales
from salesorderdetail
group by productid
order by sales desc
limit 5
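A minimal sketch of that top-five query, using Python's sqlite3 as a stand-in for MySQL with six hypothetical products, so LIMIT 5 actually drops one. Like the answer, it ranks products by sales dollars; if "highest sales by unit" is meant as units sold, SUM(orderqty) would be the ranking expression instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE salesorderdetail ("
    "SalesOrderID INTEGER, SalesOrderDetailID INTEGER, OrderQty INTEGER, "
    "ProductID INTEGER, UnitPrice INTEGER)"
)
# Hypothetical order lines; comments give each product's sales dollars.
conn.executemany(
    "INSERT INTO salesorderdetail VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 1, 101, 100),  # product 101: 100
        (1, 2, 3, 102, 30),   # product 102: 90
        (2, 3, 2, 103, 40),   # product 103: 80
        (2, 4, 7, 104, 10),   # product 104: 70
        (3, 5, 6, 105, 10),   # product 105: 60
        (3, 6, 1, 106, 10),   # product 106: 10
    ],
)

# SQLite identifiers are case-insensitive, so productid matches ProductID.
TOP5_SQL = """
SELECT productid, SUM(orderqty * unitprice) AS sales
FROM salesorderdetail
GROUP BY productid
ORDER BY sales DESC
LIMIT 5
"""
top5 = conn.execute(TOP5_SQL).fetchall()
print(top5)  # [(101, 100), (102, 90), (103, 80), (104, 70), (105, 60)]
```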
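and for the six months prior to the target date something like the following, with the date held in a MySQL user variable @target_date (a bare # starts a comment in MySQL):
where orderdate between date_sub(@target_date, interval 6 month) and @target_date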
Assuming a FK SalesOrderHeader(SalesPersonID), you can then join the tables with the top five products as follows:
select p.*
from salesperson p
join salesorderheader h on h.salespersonid = p.salespersonid
join salesorderdetail d on d.salesorderid = h.salesorderid
join (select productid, sum(orderqty * unitprice) as sales
from salesorderdetail
group by productid
order by sales desc
limit 5) t5 on t5.productid = d.productid
where h.orderdate between date_sub(@target_date, interval 6 month) and @target_date
order by p.salesytd desc
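The join above returns one row per matching order line and does not check that a salesperson sold every one of the five products. Below is a minimal sketch of one way to enforce that with HAVING COUNT(DISTINCT ...) and then total each qualifying salesperson's sales in the window. It uses Python's sqlite3 as a stand-in for MySQL, assumes the SalesOrderHeader.SalesPersonID column from the answer, and uses hypothetical data. In MySQL 8.0+ the same CTEs should work, with the lower date bound written as DATE_SUB(@target_date, INTERVAL 6 MONTH).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption (as in the answer): SalesOrderHeader carries SalesPersonID.
conn.execute(
    "CREATE TABLE salesorderheader ("
    "SalesOrderID INTEGER, OrderDate TEXT, ShipDate TEXT, SalesPersonID INTEGER)"
)
conn.execute(
    "CREATE TABLE salesorderdetail ("
    "SalesOrderID INTEGER, SalesOrderDetailID INTEGER, OrderQty INTEGER, "
    "ProductID INTEGER, UnitPrice INTEGER)"
)
conn.executemany(
    "INSERT INTO salesorderheader VALUES (?, ?, NULL, ?)",
    [(1, "2014-01-15", 10), (2, "2014-02-10", 20), (3, "2014-03-05", 30),
     (4, "2013-06-01", 20)],  # order 4 falls outside the six-month window
)
# Hypothetical order lines as (order, product, qty, price).
lines = (
    [(1, p, 1, 10) for p in (1, 2, 3, 4, 5)]    # person 10: all five top products
    + [(2, p, 1, 10) for p in (1, 2, 3, 4)]   # person 20: no product 5 in window
    + [(2, 6, 1, 1)]
    + [(3, p, 2, 10) for p in (1, 2, 3, 4, 5)]  # person 30: all five top products
    + [(4, 5, 100, 10)]                       # person 20's product 5, too old
)
conn.executemany(
    "INSERT INTO salesorderdetail VALUES (?, ?, ?, ?, ?)",
    [(o, n, q, p, price) for n, (o, p, q, price) in enumerate(lines, 1)],
)

SQL = """
WITH windowed AS (            -- order lines inside the six-month window
  SELECT h.SalesPersonID, d.ProductID, d.OrderQty * d.UnitPrice AS amount
  FROM salesorderheader h
  JOIN salesorderdetail d ON d.SalesOrderID = h.SalesOrderID
  WHERE h.OrderDate BETWEEN date(:target, '-6 months') AND :target
),
top5 AS (                     -- five products with the highest sales dollars
  SELECT ProductID FROM windowed
  GROUP BY ProductID
  ORDER BY SUM(amount) DESC
  LIMIT 5
),
qualified AS (                -- salespeople who sold every top-5 product
  SELECT w.SalesPersonID
  FROM windowed w
  JOIN top5 t ON t.ProductID = w.ProductID
  GROUP BY w.SalesPersonID
  HAVING COUNT(DISTINCT w.ProductID) = (SELECT COUNT(*) FROM top5)
)
SELECT w.SalesPersonID, SUM(w.amount) AS total_sales
FROM windowed w
JOIN qualified q ON q.SalesPersonID = w.SalesPersonID
GROUP BY w.SalesPersonID
ORDER BY total_sales DESC
"""
rows = conn.execute(SQL, {"target": "2014-06-01"}).fetchall()
print(rows)  # [(30, 100), (10, 50)]
```

Salesperson 20 is dropped because the only sale of product 5 is older than six months; the totals cover all products sold in the window, not just the top five.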

(My)SQL: group rows by a given field and force the newest data to be used in the grouped row

I have an invoices table which stores a history of invoices for every order, so one order can have multiple invoices.
However in everyday use I only want to select the newest invoice of every order.
An example of two invoices assigned to the same order:
invoice_id | order_id | invoice_number | created_at
=====================================================
1 | 42 | 10621 | 2014-05-28
2 | 42 | 10621 | 2014-05-31
I tried the following:
SELECT * FROM invoices GROUP BY order_id;
which returns, for each order_id, whichever row it finds first, here the oldest invoice. Adding an 'ORDER BY created_at DESC' clause doesn't change that, because ORDER BY is applied only after the rows have been grouped.
Is there a way to get only the newest row for each order_id?
Use a self join on the maximum invoice date per order. With GROUP BY, the values returned for non-aggregated columns are indeterminate, so there is no guarantee which row's data ends up in each group. The query below should do the trick:
SELECT
i.*
FROM
invoices i
JOIN
(SELECT
order_id,
MAX(created_at) created_at
FROM
invoices
GROUP BY order_id) ii
ON (
i.order_id = ii.order_id
AND i.created_at = ii.created_at
)
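A minimal sketch of the self-join on MAX(created_at), using Python's sqlite3 as a stand-in for MySQL. Order 43 is hypothetical, added so that more than one order is grouped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices (invoice_id INTEGER PRIMARY KEY, order_id INTEGER, "
    "invoice_number INTEGER, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?, ?)",
    [(1, 42, 10621, "2014-05-28"), (2, 42, 10621, "2014-05-31"),
     (3, 43, 10622, "2014-06-02")],  # order 43 is hypothetical
)

# The derived table finds the newest date per order; joining back on
# (order_id, created_at) retrieves the full row for that date.
LATEST_SQL = """
SELECT i.*
FROM invoices i
JOIN (SELECT order_id, MAX(created_at) AS created_at
      FROM invoices
      GROUP BY order_id) ii
  ON i.order_id = ii.order_id AND i.created_at = ii.created_at
ORDER BY i.invoice_id
"""
latest = conn.execute(LATEST_SQL).fetchall()
print(latest)  # [(2, 42, 10621, '2014-05-31'), (3, 43, 10622, '2014-06-02')]
```

If two invoices of the same order share the newest created_at, both rows are returned.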
Here is a reasonable way to get the most recent row with your data. Note that it does not use group by:
select i.*
from invoices i
where not exists (select 1
from invoices i2
where i2.order_id = i.order_id and i2.created_at > i.created_at
);
If performance is a concern, you will want an index on invoices(order_id, created_at).
This version changes the question from "Get me the invoice with the biggest date for each order" to "Get me the invoice for each order such that no other invoice for that order has a larger date".
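A minimal sketch of the NOT EXISTS formulation, using Python's sqlite3 as a stand-in for MySQL, with the suggested index on (order_id, created_at). The invoices for order 43 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices (invoice_id INTEGER PRIMARY KEY, order_id INTEGER, "
    "invoice_number INTEGER, created_at TEXT)"
)
# Index suggested in the answer for the correlated lookup.
conn.execute("CREATE INDEX idx_order_created ON invoices (order_id, created_at)")
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?, ?)",
    [(1, 42, 10621, "2014-05-28"), (2, 42, 10621, "2014-05-31"),
     (3, 43, 10622, "2014-06-02"), (4, 43, 10622, "2014-06-05")],
)

# Keep an invoice only if no other invoice of the same order is newer.
NEWEST_SQL = """
SELECT i.*
FROM invoices i
WHERE NOT EXISTS (SELECT 1
                  FROM invoices i2
                  WHERE i2.order_id = i.order_id
                    AND i2.created_at > i.created_at)
ORDER BY i.invoice_id
"""
newest = conn.execute(NEWEST_SQL).fetchall()
print(newest)  # [(2, 42, 10621, '2014-05-31'), (4, 43, 10622, '2014-06-05')]
```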

MYSQL sum the total up and down votes by all users for the items bought by a single user

I'd like to sum the total up and down votes on only the items bought by a single user. I have a big table so I don't want to sum all votes made by everyone for EVERY item, just the items that a particular user bought.
Here's my query so far:
select SUM(purchaseyesno) AS tots, SUM(rating=1) AS yes, SUM(rating=0) AS no, item_id
from items_purchased
where purchaser_account_id=12373
group by item_id
As you can expect, these sums only cover user 12373's own purchases, so each is just one value. I'm not sure how to get ALL the purchases of the item_ids bought by user 12373.
I'm sure there is some kind of subquery or nesting I need to include, but I'm clueless.
Here's how I'd like my data to look: item_id=3,4,5 were all bought by user 12373, whereas item_id=1,2,6 were bought by other users.
item_id tots yes no
3 7 4 2
4 5 1 3
5 1 0 1
thoughts?
select item_id, SUM(purchaseyesno) tots, SUM(rating = 1) yes, SUM(rating = 0) no
from items_purchased
where item_id in (
select item_id from items_purchased
where purchaser_account_id = 12373
)
group by item_id
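A minimal sketch of the IN-subquery approach, using Python's sqlite3 as a stand-in for MySQL with hypothetical rows (not the question's exact numbers). The aliases are yes_votes and no_votes rather than yes and no to avoid any keyword clash; the comparisons inside SUM evaluate to 1 or 0 in both MySQL and SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items_purchased ("
    "purchaser_account_id INTEGER, item_id INTEGER, "
    "purchaseyesno INTEGER, rating INTEGER)"
)
# Hypothetical rows: user 12373 bought items 3, 4 and 5.
conn.executemany(
    "INSERT INTO items_purchased VALUES (?, ?, ?, ?)",
    [(12373, 3, 1, 1), (111, 3, 1, 1), (222, 3, 1, 0),
     (12373, 4, 1, 0), (333, 4, 1, 1),
     (12373, 5, 1, 0),
     (111, 1, 1, 1), (222, 2, 1, 0), (333, 6, 1, 1)],
)

# The subquery picks the user's items; the outer query then aggregates
# every user's rows for just those items.
VOTES_SQL = """
SELECT item_id,
       SUM(purchaseyesno) AS tots,
       SUM(rating = 1) AS yes_votes,
       SUM(rating = 0) AS no_votes
FROM items_purchased
WHERE item_id IN (SELECT item_id FROM items_purchased
                  WHERE purchaser_account_id = ?)
GROUP BY item_id
ORDER BY item_id
"""
rows = conn.execute(VOTES_SQL, (12373,)).fetchall()
print(rows)  # [(3, 3, 2, 1), (4, 2, 1, 1), (5, 1, 0, 1)]
```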