Compare individual to group - mysql

I am relatively new to MySQL. I have a huge database of salespeople, each sale they make, and the amount of each sale. I know that I can put AVG(SaleAmount) in a SELECT statement to get the average sale amount across everyone, but I was wondering how I can get a list of each individual's average sale and then compare it to the group average to find the salespeople who are above that average.
I eventually want a list of those salespeople whose averages are more than two standard deviations above the group average.
I'm sorry if this is relatively simple, but I would really appreciate the help.

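Join with a subquery that computes the global statistics. The subquery returns a single row, so a CROSS JOIN attaches the overall average and standard deviation (in MySQL, STDDEV() is the population standard deviation) to every row, and HAVING keeps the salespeople whose own average is more than two standard deviations above the overall average. Listing every non-aggregated column in GROUP BY keeps the query valid when ONLY_FULL_GROUP_BY is enabled.
SELECT s.id, s.name, AVG(s.SaleAmount) AS personAvg, av.avgSale, av.stdSale
FROM salespeople AS s
CROSS JOIN (SELECT AVG(SaleAmount) AS avgSale, STDDEV(SaleAmount) AS stdSale
FROM salespeople) AS av
GROUP BY s.id, s.name, av.avgSale, av.stdSale
HAVING personAvg > av.avgSale + 2 * av.stdSale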
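To see the query run end to end, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. Assumptions: SQLite has no STDDEV(), so the sketch registers a hypothetical PopStdDev aggregate with MySQL's population semantics, and the salespeople table (one row per sale) and its sample rows are made up.

```python
import math
import sqlite3


class PopStdDev:
    """Hypothetical helper: population standard deviation, like MySQL's STDDEV()."""

    def __init__(self):
        self.n = 0
        self.total = 0.0
        self.total_sq = 0.0

    def step(self, value):
        if value is not None:  # aggregate functions skip NULLs
            self.n += 1
            self.total += value
            self.total_sq += value * value

    def finalize(self):
        if self.n == 0:
            return None
        mean = self.total / self.n
        # max() guards against tiny negative values caused by float rounding
        return math.sqrt(max(self.total_sq / self.n - mean * mean, 0.0))


def above_two_sigma(conn):
    """Salespeople whose average sale exceeds group mean + 2 * group std dev."""
    query = """
        SELECT s.id, s.name, AVG(s.SaleAmount) AS personAvg
        FROM salespeople AS s
        CROSS JOIN (SELECT AVG(SaleAmount) AS avgSale,
                           STDDEV(SaleAmount) AS stdSale
                    FROM salespeople) AS av
        GROUP BY s.id, s.name, av.avgSale, av.stdSale
        HAVING AVG(s.SaleAmount) > av.avgSale + 2 * av.stdSale
        ORDER BY s.id
    """
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
# SQLite has no STDDEV, so register one with the same (population) semantics.
conn.create_aggregate("STDDEV", 1, PopStdDev)
# Hypothetical layout and data: one row per sale, repeating id and name.
conn.execute("CREATE TABLE salespeople (id INTEGER, name TEXT, SaleAmount REAL)")
rows = [(1, "Ann", 100), (1, "Ann", 100), (2, "Bob", 100), (2, "Bob", 100),
        (3, "Cat", 100), (3, "Cat", 100), (4, "Dan", 100), (4, "Dan", 100),
        (5, "Eve", 1000)]
conn.executemany("INSERT INTO salespeople VALUES (?, ?, ?)", rows)
print(above_two_sigma(conn))  # [(5, 'Eve', 1000.0)]
```

The threshold here is based on the spread of individual sales, as in the query above. If you want the spread of the per-person averages instead, compute AVG and STDDEV over a derived table of those averages.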

Related

MYSQL: Averaging the sum of two columns

Using MYSQL I am trying to get the avg amount spent by all the customers after determining the sum of what each customer spent.
select customernumber, round(sum(price_per_each*quantity_ordered),2) as 'ordertotal'
from orderdetails
join orders using (ordernumber)
join customers using (customernumber)
group by customernumber;
This gives me the sum of what each customer has spent across multiple orders. The query returns about a hundred records, ranging from 8k to 900k.
I now need to get the avg of all the sum totals shown in the previous query. So far every time I try to write this, I get an error message regarding invalid use of group function.
When I try getting the average by using division via count(*), the number I get is in the 3k range which is too small compared to what is expected.
Please help. I am just starting to learn MySQL and cannot seem to figure this out after several hours.
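I would apply AVG to the ordertotal column of a derived table. Nesting AVG(SUM(...)) directly in one query is an invalid use of a group function, but wrapping the grouped query in a subquery lets AVG run once over the per-customer totals. Dividing by COUNT(*) likely gave a number that was too small because it counts order-detail rows rather than customers.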
SELECT AVG(`ordertotal`)
FROM (
select customernumber, round(sum(price_per_each*quantity_ordered),2) as 'ordertotal'
from orderdetails
join orders using (ordernumber)
join customers using (customernumber)
group by customernumber
) nested;
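A small sketch of why the derived table matters, using Python's sqlite3 as a stand-in for MySQL with hypothetical sample rows: averaging the per-customer totals gives one value per customer, while dividing the grand total by COUNT(*) divides by the number of order lines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: customer 1 spends 2*50 + 1*100 = 200,
# customer 2 spends 3*100 + 1*100 = 400 across two orders.
conn.executescript("""
    CREATE TABLE customers (customernumber INTEGER PRIMARY KEY);
    CREATE TABLE orders (ordernumber INTEGER PRIMARY KEY, customernumber INTEGER);
    CREATE TABLE orderdetails (ordernumber INTEGER, price_per_each REAL,
                               quantity_ordered INTEGER);
    INSERT INTO customers VALUES (1), (2);
    INSERT INTO orders VALUES (10, 1), (20, 2), (21, 2);
    INSERT INTO orderdetails VALUES (10, 50, 2), (10, 100, 1), (20, 100, 3), (21, 100, 1);
""")

# The question's per-customer query (alias written without quotes).
PER_CUSTOMER = """
    SELECT customernumber,
           ROUND(SUM(price_per_each * quantity_ordered), 2) AS ordertotal
    FROM orderdetails
    JOIN orders USING (ordernumber)
    JOIN customers USING (customernumber)
    GROUP BY customernumber
"""


def average_customer_total(conn):
    # AVG runs over the derived table: one input row per customer total.
    sql = f"SELECT AVG(ordertotal) FROM ({PER_CUSTOMER}) AS nested"
    return conn.execute(sql).fetchone()[0]


def naive_line_average(conn):
    # Dividing by COUNT(*) here counts order-detail lines, not customers.
    sql = """
        SELECT SUM(price_per_each * quantity_ordered) / COUNT(*)
        FROM orderdetails
        JOIN orders USING (ordernumber)
    """
    return conn.execute(sql).fetchone()[0]


print(average_customer_total(conn))  # 300.0
print(naive_line_average(conn))      # 150.0
```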

Joining three tables and finding a sum of payments in MySQL

I am struggling with this problem in MySQL. The question asks...
Find the names of the individuals and businesses that have made no more than three payments.
Individuals is a table, Businesses is a table, and Payments is a table. The problem I am having is that Payments only contains the columns dateFiled and amountPaid. I tried creating a count operation, but it returns empty results.
Here is my code:
SELECT Individuals.name, Businesses.name, Payments.taxpayerID, COUNT(*) AS 'Payments'
FROM Payments
JOIN Individuals ON Payments.taxpayerID=Individuals.taxpayerID
JOIN Businesses ON Payments.taxpayerID=Businesses.taxpayerID
GROUP BY Businesses.name, Individuals.name, Payments.taxpayerID
HAVING COUNT(*) <= 3;
If anyone can help me solve this it would be greatly appreciated.
I guess what you are looking for is not a join of three tables but a union of two selects:
SELECT Individuals.name, Payments.taxpayerID, COUNT(*) AS 'Payments'
FROM Payments
JOIN Individuals ON Payments.taxpayerID=Individuals.taxpayerID
GROUP BY Individuals.name, Payments.taxpayerID
HAVING COUNT(*) <= 3
UNION
SELECT Businesses.name, Payments.taxpayerID, COUNT(*) AS 'Payments'
FROM Payments
JOIN Businesses ON Payments.taxpayerID=Businesses.taxpayerID
GROUP BY Businesses.name, Payments.taxpayerID
HAVING COUNT(*) <= 3;
Your version returns no rows because each tax ID belongs to either a business or an individual, never both, so no payment can match both inner joins. Therefore you need to query the two tables independently and combine the results with UNION.
That said, you could use joins in a single SELECT, but then you would need outer joins and the query would be less readable, IMHO.
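One thing the UNION version does not cover: a taxpayer with no payments at all also has "no more than three payments", but inner joins drop them. If those should be listed, here is a sketch (Python's sqlite3 standing in for MySQL, hypothetical data) that starts from each taxpayer table and LEFT JOINs Payments. The inner-join query from the question is included to show that it returns nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: Alice has 2 payments, Bob 4, Acme 1, Globex none.
conn.executescript("""
    CREATE TABLE Individuals (taxpayerID INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Businesses (taxpayerID INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Payments (taxpayerID INTEGER, dateFiled TEXT, amountPaid REAL);
    INSERT INTO Individuals VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO Businesses VALUES (3, 'Acme'), (4, 'Globex');
    INSERT INTO Payments VALUES
        (1, '2013-01-15', 100), (1, '2013-04-15', 100),
        (2, '2013-01-15', 50), (2, '2013-04-15', 50),
        (2, '2013-07-15', 50), (2, '2013-10-15', 50),
        (3, '2013-03-01', 900);
""")

# The question's query: a payment would have to match BOTH tables, so nothing survives.
ORIGINAL = """
    SELECT Individuals.name, Businesses.name, Payments.taxpayerID, COUNT(*)
    FROM Payments
    JOIN Individuals ON Payments.taxpayerID = Individuals.taxpayerID
    JOIN Businesses ON Payments.taxpayerID = Businesses.taxpayerID
    GROUP BY Businesses.name, Individuals.name, Payments.taxpayerID
    HAVING COUNT(*) <= 3
"""

# Start from each taxpayer table and LEFT JOIN the payments, so taxpayers with
# zero payments still form a group; COUNT(column) ignores the NULL padding.
INCLUDING_ZERO = """
    SELECT i.name, i.taxpayerID, COUNT(p.taxpayerID) AS payments
    FROM Individuals AS i
    LEFT JOIN Payments AS p ON p.taxpayerID = i.taxpayerID
    GROUP BY i.taxpayerID, i.name
    HAVING COUNT(p.taxpayerID) <= 3
    UNION ALL
    SELECT b.name, b.taxpayerID, COUNT(p.taxpayerID)
    FROM Businesses AS b
    LEFT JOIN Payments AS p ON p.taxpayerID = b.taxpayerID
    GROUP BY b.taxpayerID, b.name
    HAVING COUNT(p.taxpayerID) <= 3
    ORDER BY 2
"""

print(conn.execute(ORIGINAL).fetchall())        # []
print(conn.execute(INCLUDING_ZERO).fetchall())  # [('Alice', 1, 2), ('Acme', 3, 1), ('Globex', 4, 0)]
```

UNION ALL is used because the two arms cannot produce the same taxpayer ID, so the duplicate-removal step of UNION would be wasted work.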

Joining two tables, including count, and sorting by count in MySQL

I need to run a somewhat more complex MySQL query. I have two tables that I need to join, where one contains the primary key of the other. That's easy enough, but I also need to find the number of occurrences of each ID returned, and ultimately sort all the results by this number.
Normally this would just be a GROUP BY, but I also need to see ALL of the rows (so if a group contained 10 records, I'd need to see all 10, along with the count).
So for instance, two tables could be:
Customers table:
CustomerID name address phone etc..
Orders table:
OrderID CustomerID product info etc..
The idea is to output and sort the orders table to find the customers with the most orders in a given time period. The resulting report would have a few hundred customers, each with their order info below.
I couldn't figure out a way to return rows containing ALL the info from both tables plus the number of occurrences in each row (customer info, individual order info, and count).
I considered separating it into multiple queries (get the list of top customers), then a bunch of sub-queries for each order programmatically. But that was going to end up with many hundreds of sub-queries every time this is submitted.
So I was hoping someone might know of an easier way to do this. My thought was to have a return result with repeated information, but get it only in one query.
Thanks in advance!
SELECT CUST.CustomerID, CUST.Name, ORDR.OrderID, ORDR.OrderDate, ORDR.ProductInfo, COUNTS.cnt
FROM Customers CUST
INNER JOIN Orders ORDR
ON ORDR.CustomerID = CUST.CustomerID
INNER JOIN
(
SELECT C.CustomerID, COUNT(DISTINCT O.OrderID) AS cnt
FROM Customers C
INNER JOIN Orders O
ON O.CustomerID = C.CustomerID
GROUP BY C.CustomerID
) COUNTS
ON COUNTS.CustomerID = CUST.CustomerID
ORDER BY COUNTS.cnt DESC, CUST.CustomerID
This will return one row per order, displayed by customer, ordered by the number of orders for that customer.
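A runnable sketch of the query above with Python's sqlite3 standing in for MySQL; the sample customers and orders are hypothetical. Two small liberties: the counting subquery reads Orders alone (the join to Customers is not needed just to count), and ORDR.OrderID is added to the ORDER BY so each customer's orders come out in a stable order. On MySQL 8.0 or later, COUNT(*) OVER (PARTITION BY CustomerID) is another way to attach the per-customer count to each row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: Ann has one order, Bob has three.
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER,
                         OrderDate TEXT, ProductInfo TEXT);
    INSERT INTO Customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO Orders VALUES
        (100, 1, '2013-05-01', 'lamp'),
        (101, 2, '2013-05-02', 'desk'),
        (102, 2, '2013-05-03', 'chair'),
        (103, 2, '2013-05-04', 'shelf');
""")

QUERY = """
    SELECT CUST.CustomerID, CUST.Name, ORDR.OrderID, ORDR.ProductInfo, COUNTS.cnt
    FROM Customers CUST
    INNER JOIN Orders ORDR ON ORDR.CustomerID = CUST.CustomerID
    INNER JOIN (SELECT O.CustomerID, COUNT(DISTINCT O.OrderID) AS cnt
                FROM Orders O
                GROUP BY O.CustomerID) COUNTS      -- one row per customer
        ON COUNTS.CustomerID = CUST.CustomerID
    ORDER BY COUNTS.cnt DESC, CUST.CustomerID, ORDR.OrderID
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
# (2, 'Bob', 101, 'desk', 3)
# (2, 'Bob', 102, 'chair', 3)
# (2, 'Bob', 103, 'shelf', 3)
# (1, 'Ann', 100, 'lamp', 1)
```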

MySQL huge tables JOIN makes database collapse

Following my recent question "Select information from last item and join to the total amount", I am having some memory problems while generating tables.
I have two tables sales1 and sales2 like this:
id | dates | customer | sale
With this table definition:
CREATE TABLE sales (
id int auto_increment primary key,
dates date,
customer int,
sale int
);
sales1 and sales2 have the same definition, but sales2 has sale=-1 in every row. A customer can be in none, one, or both tables. Both tables have around 300,000 records and many more fields than shown here (around 50). They are InnoDB.
I want to select, for each customer:
number of purchases
last purchase value
total amount of purchases, when it has a positive value
The query I am using is:
SELECT a.customer, count(a.sale), max_sale
FROM sales a
INNER JOIN (SELECT customer, sale max_sale
from sales x where dates = (select max(dates)
from sales y
where x.customer = y.customer
and y.sale > 0
)
)b
ON a.customer = b.customer
GROUP BY a.customer, max_sale;
The problem is:
I need the results for certain calculations separated by date: information for 2012, information for 2013, and also information for all the years together.
Whenever I do just one year, it takes about 2-3 minutes to store all the information.
But when I try to gather information from all the years, the database crashes and I get messages like:
InternalError: (InternalError) (1205, u'Lock wait timeout exceeded; try restarting transaction')
It seems that joining such huge tables is too much for the database. When I EXPLAIN the query, almost all of the time goes to creating a temporary table.
I thought of splitting the data gathering into quarters: get the results for every three months, then join and sort them. But I suspect this final join and sort will be too much for the database again.
So, what would you experts recommend to optimize these queries, given that I cannot change the table structure?
300k rows is not a huge table. We frequently see 300 million row tables.
The biggest problem with your query is that you're using a correlated subquery, so it has to re-execute the subquery for each row in the outer query.
It's often the case that you don't need to do all your work in one SQL statement. There are advantages to breaking it up into several simpler SQL statements:
Easier to code.
Easier to optimize.
Easier to debug.
Easier to read.
Easier to maintain if/when you have to implement new requirements.
Number of Purchases
SELECT customer, COUNT(sale) AS number_of_purchases
FROM sales
GROUP BY customer;
An index on sales(customer,sale) would be best for this query.
Last Purchase Value
This is the greatest-n-per-group problem that comes up frequently.
SELECT a.customer, a.sale as max_sale
FROM sales a
LEFT OUTER JOIN sales b
ON a.customer=b.customer AND a.dates < b.dates
WHERE b.customer IS NULL;
In other words, try to match row a to a hypothetical row b that has the same customer and a greater date. If no such row is found, then a must have the greatest date for that customer.
An index on sales(customer,dates,sale) would be best for this query.
If you might have more than one sale for a customer on that greatest date, this query will return more than one row per customer. You'd need to find another column to break the tie. If you use an auto-increment primary key, it's suitable as a tie breaker because it's guaranteed to be unique and it tends to increase chronologically.
SELECT a.customer, a.sale as max_sale
FROM sales a
LEFT OUTER JOIN sales b
ON a.customer=b.customer AND (a.dates < b.dates OR (a.dates = b.dates AND a.id < b.id))
WHERE b.customer IS NULL;
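To see the difference the tie breaker makes, here is a sketch using Python's sqlite3 as a stand-in for MySQL, with hypothetical rows where customer 1 has two sales on the same latest date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; ids 2 and 3 share customer 1's latest date.
conn.executescript("""
    CREATE TABLE sales (id INTEGER PRIMARY KEY, dates TEXT, customer INTEGER, sale INTEGER);
    INSERT INTO sales VALUES
        (1, '2013-01-05', 1, 10),
        (2, '2013-02-01', 1, 20),
        (3, '2013-02-01', 1, 30),
        (4, '2012-12-31', 2, 5);
""")

# Keep row a only if no row b of the same customer has a later date.
WITHOUT_TIE_BREAK = """
    SELECT a.customer, a.sale AS max_sale
    FROM sales a
    LEFT OUTER JOIN sales b
        ON a.customer = b.customer AND a.dates < b.dates
    WHERE b.customer IS NULL
    ORDER BY a.customer, a.sale
"""

# Same idea, but on equal dates the higher id counts as "later".
WITH_TIE_BREAK = """
    SELECT a.customer, a.sale AS max_sale
    FROM sales a
    LEFT OUTER JOIN sales b
        ON a.customer = b.customer
       AND (a.dates < b.dates OR (a.dates = b.dates AND a.id < b.id))
    WHERE b.customer IS NULL
    ORDER BY a.customer
"""

print(conn.execute(WITHOUT_TIE_BREAK).fetchall())  # [(1, 20), (1, 30), (2, 5)]
print(conn.execute(WITH_TIE_BREAK).fetchall())     # [(1, 30), (2, 5)]
```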
Total Amount of Purchases, When It Has a Positive Value
SELECT customer, SUM(sale) AS total_purchases
FROM sales
WHERE sale > 0
GROUP BY customer;
An index on sales(customer,sale) would be best for this query.
You should consider using NULL to signify a missing sale value instead of -1. Aggregate functions like SUM() and COUNT() ignore NULLs, so you don't have to use a WHERE clause to exclude rows with sale < 0.
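A quick check of the NULL suggestion, using Python's sqlite3 (which treats NULL in SUM() and COUNT(column) the same way MySQL does); the two tables and their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: one marks a missing sale with -1, the other with NULL.
conn.executescript("""
    CREATE TABLE sales_sentinel (customer INTEGER, sale INTEGER);
    CREATE TABLE sales_null (customer INTEGER, sale INTEGER);
    INSERT INTO sales_sentinel VALUES (1, 10), (1, 20), (1, -1);
    INSERT INTO sales_null VALUES (1, 10), (1, 20), (1, NULL);
""")


def stats(table, where=""):
    # Returns (SUM(sale), COUNT(sale), COUNT(*)) for the given table.
    return conn.execute(
        f"SELECT SUM(sale), COUNT(sale), COUNT(*) FROM {table} {where}"
    ).fetchone()


print(stats("sales_sentinel"))                    # (29, 3, 3): -1 skews both
print(stats("sales_sentinel", "WHERE sale > 0"))  # (30, 2, 2): needs the filter
print(stats("sales_null"))                        # (30, 2, 3): NULL skipped, COUNT(*) still counts the row
```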
Re: your comment
What I have now is a table with fields year, quarter, total_sale (for the pair (year, quarter)) and sale. What I want to gather is information for a certain period: this quarter, several quarters, the year 2011... The info has to be split into top customers, the ones with bigger sales, etc. Would it be possible to get the last purchase value for customers with total_purchases bigger than 5?
Top Five Customers for Q4 2012
SELECT customer, SUM(sale) AS total_purchases
FROM sales
WHERE (year, quarter) = (2012, 4) AND sale > 0
GROUP BY customer
ORDER BY total_purchases DESC
LIMIT 5;
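A sketch of the quarter filter, with Python's sqlite3 standing in for MySQL. It assumes the year and quarter columns described in the comment, uses made-up rows, and binds the period and limit as parameters; row-value comparisons such as (year, quarter) = (?, ?) need SQLite 3.15 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (year INTEGER, quarter INTEGER, customer INTEGER, sale INTEGER)")
# Hypothetical sample data: customers 1..6 buy 10..60 in Q4 2012.
rows = [(2012, 4, c, c * 10) for c in range(1, 7)]
rows += [(2012, 3, 7, 1000),   # big sale, but in a different quarter
         (2012, 4, 1, -1)]     # -1 "missing sale" marker, excluded by sale > 0
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)

TOP_N = """
    SELECT customer, SUM(sale) AS total_purchases
    FROM sales
    WHERE (year, quarter) = (?, ?) AND sale > 0
    GROUP BY customer
    ORDER BY total_purchases DESC
    LIMIT ?
"""
print(conn.execute(TOP_N, (2012, 4, 5)).fetchall())
# [(6, 60), (5, 50), (4, 40), (3, 30), (2, 20)]
```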
I'd want to test it against real data, but I believe an index on sales(year, quarter, customer, sale) would be best for this query.
Last Purchase for Customers with More Than 5 Purchases
SELECT a.customer, a.sale as max_sale
FROM sales a
INNER JOIN sales c ON a.customer=c.customer
LEFT OUTER JOIN sales b
ON a.customer=b.customer AND (a.dates < b.dates OR (a.dates = b.dates AND a.id < b.id))
WHERE b.customer IS NULL
GROUP BY a.id
HAVING COUNT(*) > 5;
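A sketch that runs the query above against hypothetical data, using Python's sqlite3 as a stand-in for MySQL. Note that it counts purchases (rows), not the summed amount. Because the join to c repeats each latest row once per purchase, a derived table of per-customer counts may be cheaper on large tables; that is untested here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, dates TEXT, customer INTEGER, sale INTEGER)")
# Hypothetical sample data: customer 1 has six sales (latest is 60), customer 2 has three.
rows = [(i, f"2013-01-0{i}", 1, 10 * i) for i in range(1, 7)]
rows += [(i, f"2013-02-0{i}", 2, 5) for i in range(7, 10)]
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)

LAST_SALE_BUSY_CUSTOMERS = """
    SELECT a.customer, a.sale AS max_sale
    FROM sales a
    INNER JOIN sales c ON a.customer = c.customer   -- one row per purchase of that customer
    LEFT OUTER JOIN sales b
        ON a.customer = b.customer
       AND (a.dates < b.dates OR (a.dates = b.dates AND a.id < b.id))
    WHERE b.customer IS NULL                        -- keep only each customer's latest row
    GROUP BY a.id
    HAVING COUNT(*) > 5                             -- COUNT(*) = that customer's purchases
    ORDER BY a.customer
"""
print(conn.execute(LAST_SALE_BUSY_CUSTOMERS).fetchall())  # [(1, 60)]
```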
As in the other greatest-n-per-group query above, an index on sales(customer,dates,sale) would be best for this query. It probably can't optimize both the join and the group by, so this will incur a temporary table. But at least it will only do one temporary table instead of many.
These queries are complex enough. You shouldn't try to write a single SQL query that can give all of these results. Remember the classic quote from Brian Kernighan:
Everyone knows that debugging is twice as hard as writing a program in the first place. So if you’re as clever as you can be when you write it, how will you ever debug it?
I think you should try adding an index on sales(customer, dates). The subquery is probably the performance bottleneck.
You can make this puppy scream. Dump the whole inner join query. Really. This is a trick virtually no one seems to know about.
Assuming dates is a datetime, convert it to a sortable string, concatenate the values you want, take the max (or min), then substring and cast. You may need to adjust the date conversion function (this one works in MS SQL Server), but this idea will work anywhere:
SELECT customer, count(sale), max_sale = cast(substring(max(convert(char(19), dates, 120) + str(sale, 12, 2)), 20, 12) as numeric(12, 2))
FROM sales a
group by customer
Voilà. If you need more result columns, do:
SELECT yourkey
, maxval = left(val, N1) --you often won't need this
, result1 = substring(val, N1+1, N2)
, result2 = substring(val, N1+N2+1, N3) --etc. for more values
FROM ( SELECT yourkey, val = max(cast(maxval as char(N1))
+ cast(resultCol1 as char(N2))
+ cast(resultCol2 as char(N3)) )
FROM yourtable GROUP BY yourkey ) t
Be sure that you have fixed lengths for all but the last field. This takes a little work to get your head around, but is very learnable and repeatable. It will work on any database engine, and even if you have rank functions, this will often significantly outperform them.
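The same trick can be sketched without MS SQL functions. This version uses Python's sqlite3 with hypothetical rows: dates are stored as 'YYYY-MM-DD' text (10 characters, which sorts chronologically) and sales are zero-padded to 12 characters with printf() so the fixed-length rule above holds. In MySQL the analogous expression would be something like CAST(SUBSTRING(MAX(CONCAT(dates, LPAD(sale, 12, '0'))), 11) AS SIGNED), assuming non-negative sale values (LPAD does not pad negative numbers correctly), for example after filtering with WHERE sale > 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data with positive sales only.
conn.executescript("""
    CREATE TABLE sales (id INTEGER PRIMARY KEY, dates TEXT, customer INTEGER, sale INTEGER);
    INSERT INTO sales VALUES
        (1, '2013-01-05', 1, 10),
        (2, '2013-02-01', 1, 20),
        (3, '2012-12-31', 2, 5);
""")

# MAX() over "date + zero-padded sale" picks the latest row per customer
# (on a date tie, the larger padded sale wins); characters 11+ hold the sale.
LAST_SALE = """
    SELECT customer,
           COUNT(sale) AS number_of_purchases,
           CAST(SUBSTR(MAX(dates || printf('%012d', sale)), 11) AS INTEGER) AS last_sale
    FROM sales
    GROUP BY customer
    ORDER BY customer
"""
print(conn.execute(LAST_SALE).fetchall())  # [(1, 2, 20), (2, 1, 5)]
```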

Sorting deals by their revenue, which is sum of orders table rows in MySQL

Okay, I have always wondered about this, but never really needed it until today.
Normally, I would grab all deals, loop through each one, and find the SUM(amount) of all the orders related to that deal (in PHP).
Then I would do a simple uasort(), which works fine.
But now I need to do it all with an SQL query.
This is what I have tried:
SELECT deals.ID, SUM(orders.amount) AS revenue FROM deals
JOIN orders ON (orders.deal_id = deals.id)
ORDER BY revenue DESC
Now this gives me one row, with a 'random' deal ID, and a bigger number in revenue.
I suspect this revenue number is the SUM of ALL the amount values, not just the ones for this particular deal ID.
What I would like is a row for each deal ID with the right number in the revenue column, all rows sorted DESC.
How would this be done? I want to sort the deals by their revenue (a custom column), which is the sum of the amount column over all order rows for the respective deal ID.
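You are missing a GROUP BY clause. Without it, SUM() aggregates over every joined row, which is why you get a single row with an arbitrary deals.ID and the grand total. Group by the deal: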
SELECT deals.ID, SUM(orders.amount) AS revenue
FROM deals
INNER JOIN orders
ON orders.deal_id = deals.id
GROUP BY deals.ID
ORDER BY revenue DESC
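A sketch comparing the two queries, with Python's sqlite3 standing in for MySQL and hypothetical deals and orders. SQLite, like MySQL with ONLY_FULL_GROUP_BY disabled, accepts the version without GROUP BY and returns a single row; MySQL 5.7.5 and later reject it under the default sql_mode. Deals with no orders drop out of both results because of the inner join; a LEFT JOIN with COALESCE(SUM(orders.amount), 0) would keep them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: deal revenues are 30, 100 and 15.
conn.executescript("""
    CREATE TABLE deals (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, deal_id INTEGER, amount REAL);
    INSERT INTO deals VALUES (1), (2), (3);
    INSERT INTO orders (deal_id, amount) VALUES
        (1, 10), (1, 20), (2, 100), (3, 5), (3, 5), (3, 5);
""")

# The question's query: SUM() collapses every joined row into one group.
NO_GROUP_BY = """
    SELECT deals.id, SUM(orders.amount) AS revenue
    FROM deals JOIN orders ON orders.deal_id = deals.id
    ORDER BY revenue DESC
"""
# The answer's query: one group, and therefore one row, per deal.
WITH_GROUP_BY = """
    SELECT deals.id, SUM(orders.amount) AS revenue
    FROM deals JOIN orders ON orders.deal_id = deals.id
    GROUP BY deals.id
    ORDER BY revenue DESC
"""

no_group = conn.execute(NO_GROUP_BY).fetchall()
print(len(no_group), no_group[0][1])           # 1 145.0
print(conn.execute(WITH_GROUP_BY).fetchall())  # [(2, 100.0), (1, 30.0), (3, 15.0)]
```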