complex sql query (GROUP BY) - mysql

I need some help building a query.
Here is what I need :
I have a table called data:
ID| PRODUCT | VALUE |COUNTRY| DEVICE | SYSTEM
-----+---------+-------+-------+---------+--------
48 | p1 | 0.4 | US | dev1 | system1
47 | p2 | 0.67 | IT | dev2 | system2
46 | p3 | 1.2 | GB | dev3 | system3
45 | p1 | 0.9 | ES | dev4 | system4
44 | p1 | 0.6 | ES | dev4 | system1
I need to show which products have produced the most revenue and which country, device and system contributed the most.
For example, the result I would get from the table would be:
PRODUCT | TOTAL COST |COUNTRY| DEVICE | SYSTEM
-------+------------+-------+---------+--------
p1 | 1.9 | ES | dev4 | system1
p2 | 0.67 | IT | dev2 | system2
p3 | 1.2 | GB | dev3 | system3
Top country is ES because ES contributed 0.9 + 0.6 = 1.5 > 0.4 (the contribution of US).
The same logic applies to the top device and top system.
I guess for total revenue and product something like this will do :
SELECT SUM(value) as total_revenue,product FROM data GROUP BY product
But how can I add country,device and system?
Is this even feasible in a single query, if not what is the best way (performance wise) to do it?
Many thanks for your help.
EDIT
I edited the sample table to explain better.

Do it in separate queries:
SELECT country, -- change to device, system, etc. as required
       SUM(value) AS amount
FROM data
GROUP BY country -- must match the column selected above
ORDER BY amount DESC
LIMIT 1

You are correct... it is not just a simple query... but 3 queries wrapped into one result.
I've posted my sample out on SQL Fiddle here...
First query -- the inner most. You need to get all revenue based on a per product/country and sort that by the product and DESCENDING on the total revenue to have highest revenue in first position per product.
Next query (where I've used MySQL @variables). Since the first result is already ordered by product and revenue, I reset the rank to 1 every time the product changes from whatever "@LastProd" is. This produces ES = Rank #1 for product 1, then US = Rank #2 for product 1, and so on for the other products.
The final outermost query re-joins back to the raw Data table but gets a list of all the devices and systems that comprised the product sale in question, but ONLY where the product rank was #1.
select
      pqRank.product,
      pqRank.country,
      pqRank.revenue,
      group_concat( distinct d2.device ) as PartDevices,
      group_concat( distinct d2.system ) as PartSystems
   from
      ( select
              pq.product,
              pq.country,
              pq.revenue,
              @RevenueRank := if( @LastProd = pq.product, @RevenueRank + 1, 1 ) as ProdRank,
              @LastProd := pq.product
           from
              ( select
                      d.product,
                      d.country,
                      sum( d.value ) as Revenue
                   from
                      data d
                   group by
                      d.product,
                      d.country
                   order by
                      d.product,
                      Revenue desc ) pq,
              ( select @RevenueRank := 0,
                       @LastProd := ' ') as sqlvars
      ) pqRank
         JOIN data d2
            on pqRank.product = d2.product
           and pqRank.country = d2.country
   where
      pqRank.ProdRank = 1
   group by
      pqRank.product,
      pqRank.country
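The three steps described above (aggregate per product and dimension, rank within each product, keep rank 1) can also be sketched with a window function instead of user variables. This is only an illustration under assumptions: it uses Python's bundled sqlite3 module as a stand-in for MySQL (SQLite 3.25 or newer is needed for ROW_NUMBER()), and the helper name top_contributor is hypothetical. On MySQL 8.0 or later a similar SELECT should be possible, though a column named system may need backtick quoting there.

```python
import sqlite3

# Sample rows copied from the question's table.
ROWS = [
    (48, "p1", 0.4, "US", "dev1", "system1"),
    (47, "p2", 0.67, "IT", "dev2", "system2"),
    (46, "p3", 1.2, "GB", "dev3", "system3"),
    (45, "p1", 0.9, "ES", "dev4", "system4"),
    (44, "p1", 0.6, "ES", "dev4", "system1"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data (id INTEGER PRIMARY KEY, product TEXT, value REAL,"
    " country TEXT, device TEXT, system TEXT)"
)
conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?, ?, ?)", ROWS)


def top_contributor(column):
    """Return {product: value of `column` with the highest summed revenue}."""
    # The column name is interpolated into the SQL text, so whitelist it.
    if column not in ("country", "device", "system"):
        raise ValueError(column)
    sql = f"""
        SELECT product, {column}
        FROM (
            SELECT product, {column},
                   ROW_NUMBER() OVER (
                       PARTITION BY product ORDER BY revenue DESC
                   ) AS rn
            FROM (
                SELECT product, {column}, SUM(value) AS revenue
                FROM data
                GROUP BY product, {column}
            ) AS agg
        ) AS ranked
        WHERE rn = 1
    """
    return dict(conn.execute(sql).fetchall())


# Total revenue per product, rounded to hide floating-point noise.
totals = dict(
    conn.execute("SELECT product, ROUND(SUM(value), 2) FROM data GROUP BY product")
)
countries = top_contributor("country")
devices = top_contributor("device")
systems = top_contributor("system")

for product in sorted(totals):
    print(product, totals[product], countries[product],
          devices[product], systems[product])
```

Each dimension is ranked independently, which matches the question's rule that the top country, device and system are each chosen by their own summed contribution.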

You could do something like this:
CREATE TABLE data
(
id int auto_increment primary key,
product varchar(20),
country varchar(4),
device varchar(20),
system varchar(20),
value decimal(5,2)
);
INSERT INTO data (product, country, device, system, value)
VALUES
('p1', 'US', 'dev1', 'system1', 0.4),
('p2', 'IT', 'dev2', 'system2', 0.67),
('p1', 'IT', 'dev1', 'system2', 0.23);
select 'p' as grouping_type, product, sum(value) as sumval
from data
group by product
union all
select 'c' as grouping_type, country, sum(value) as sumval
from data
group by country
union all
select 'd' as grouping_type, device, sum(value) as sumval
from data
group by device
union all
select 's' as grouping_type, system, sum(value) as sumval
from data
group by system
order by grouping_type, sumval
It's ugly, I wouldn't use it, but it should work.

Related

Return preferred record when there is more than one record for the same user

I have a table where it stores the types of discounts that a user can have.
Some users will get the standard discount, but some will get a bigger and better discount. For users who have the biggest and best discount, there will be two records in the database, one for the default discount and the other for the biggest and best discount. The biggest and best discount will be preferred in the search.
I would like to do a SELECT that would return the record with the highest discount and if you don't find it, return it with the standard discount for me to avoid making two queries in the database or having to filter in the source code.
Ex:
| id | user_id | country | discount | cashback | free_trial |
|-----------------------------------------------------------------------|
| 1 | 1 | EUA | DEFAULT | 10 | false |
| 2 | 1 | EUA | CHRISTMAS | 20 | true |
| 3 | 3 | EUA | DEFAULT | 10 | false |
SELECT *
FROM users
WHERE country = 'EUA'
AND (discount = 'CHRISTMAS' OR discount = 'DEFAULT');
In this example above for user 1 it would return the record with the discount equal to "CHRISTMAS" and for user 3 it would return "DEFAULT" because it is the only one that has. Can you help me please?
You can use the row_number() window function to do this. This function includes a PARTITION BY that lets you start the numbering over with each user, as well as its own ORDER BY that lets you determine which rows will sort first within each user/partition.
Then you nest this inside another SELECT to limit to rows where the row_number() result is 1 (the discount that sorted best):
SELECT *
FROM (
SELECT *, row_number() OVER (PARTITION BY user_id ORDER BY cashback DESC) rn
FROM users
WHERE country = 'EUA'
) u
WHERE rn = 1
You could also use a LATERAL JOIN, which is usually better than the correlated join in the other answer, but not as good as the window function.
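A runnable sketch of the row_number() approach, with the numbering restarted per user_id so each user keeps exactly one row. It assumes, as the answer's ORDER BY does, that a higher cashback means a better discount, and it uses Python's sqlite3 module (SQLite 3.25+) in place of the asker's database, which is not named in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, user_id INTEGER,"
    " country TEXT, discount TEXT, cashback INTEGER, free_trial TEXT)"
)
# Rows copied from the question's example table.
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, "EUA", "DEFAULT", 10, "false"),
        (2, 1, "EUA", "CHRISTMAS", 20, "true"),
        (3, 3, "EUA", "DEFAULT", 10, "false"),
    ],
)

# Number each user's rows from best to worst cashback, then keep row 1.
best = conn.execute(
    """
    SELECT user_id, discount, cashback
    FROM (
        SELECT *, ROW_NUMBER() OVER (
                      PARTITION BY user_id ORDER BY cashback DESC
                  ) AS rn
        FROM users
        WHERE country = 'EUA'
    ) AS u
    WHERE rn = 1
    ORDER BY user_id
    """
).fetchall()

for row in best:
    print(row)
```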
You can use GROUP BY to do it:
SELECT u1.*
FROM users u1
JOIN
(
SELECT COUNT(id) AS cnt,user_id
FROM users WHERE country = 'EUA'
GROUP BY user_id
) u2 ON u1.user_id=u2.user_id
WHERE IF(u2.cnt=1,u1.discount='DEFAULT',u1.discount='CHRISTMAS')
DB Fiddle Demo

SQL Order results by Match Against Relevance and display the price based on sellers rank

I'm looking to display results based on the 'relevance' of the user's search, along with the price from the seller that ranks highest. A live example of what I'm after is Amazon's search results. I understand their algorithm is extremely complicated, but I'm after a simplified version.
Let's say we search for 'Jumper'. The results returned are products related to 'Jumper', but the price shown is not always the cheapest; it is based on the seller's rank. The seller with the highest rank gets his/her price displayed.
Here's what I have been working on. It is not giving me the expected results mentioned above, and to be honest I don't think it is very efficient.
SELECT a.catalogue_id, a.productTitle, a.prod_rank, b.catalogue_id, b.display_price, b.sellers_rank
FROM
(
SELECT c.catalogue_id,
c.productTitle,
MATCH(c.productTitle) AGAINST ('+jumper*' IN BOOLEAN MODE) AS prod_rank
FROM catalogue AS c
WHERE c.catalogue_id IN (1, 2, 3)
) a
JOIN
(
SELECT inventory.catalogue_id,
inventory.amount AS display_price,
(accounts.comsn + inventory.quantity - inventory.amount) AS sellers_rank
FROM inventory
JOIN accounts ON inventory.account_id = accounts.account_id
WHERE inventory.catalogue_id IN (1, 2, 3)
) AS b
ON a.catalogue_id = b.catalogue_id
ORDER BY a.prod_rank DESC
LIMIT 100;
Sample Tables:
Accounts:
----------------------------
account_id | comsn
----------------------------
1 | 100
2 | 9999
Catalogue:
----------------------------
catalogue_id | productTitle
----------------------------
1 | blue jumper
2 | red jumper
3 | green jumper
Inventory:
-----------------------------------------------
product_id | catalogue_id | account_id | quantity | amount |
-----------------------------------------------
1 | 2 | 1 | 6 | 699
2 | 2 | 2 | 2 | 2999
Expected Results:
Product Title:
red jumper
Amount:
29.99 (because he/she has sellers rank of: 7002)
First, you should limit the results only to the matches for the first subquery:
Second, you should eliminate the second subquery:
SELECT p.catalogue_id, p.productTitle, p.prod_rank,
       i.amount AS display_price,
       (a.comsn + i.quantity - i.amount) AS sellers_rank
FROM (SELECT c.catalogue_id, c.productTitle,
             MATCH(c.productTitle) AGAINST ('+jumper*' IN BOOLEAN MODE) AS prod_rank
      FROM catalogue AS c
      WHERE c.catalogue_id IN (1, 2, 3)
      HAVING prod_rank > 0
     ) p JOIN
     inventory i
     ON i.catalogue_id = p.catalogue_id JOIN
     accounts a
     ON i.account_id = a.account_id
ORDER BY p.prod_rank DESC
LIMIT 100;
I'm not sure if you can get rid of the final ORDER BY. MATCH with JOIN can be a bit tricky in that respect. But only ordering by the matches should help.

SQL: How to find top customers that pay 80% of revenue?

Let's say I have a table TRANSACTIONS:
desc customer_transactions;
+------------------------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------------------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| transactionID | varchar(128) | YES | | NULL | |
| customerID | varchar(128) | YES | | NULL | |
| amountAuthorized | DECIMAL(5,2) | YES | | NULL | |
| createdDatetime | datetime | YES | | NULL | |
+------------------------------+--------------+------+-----+---------+----------------+
This table has records of credit card transactions for a SAAS business for the last 5 years. The business has a typical monthly subscription model, where customers are automatically charged based on their plan.
I need to find the top customers that are responsible for 80% of all revenue (per time period). The SAAS business is very uneven, because some customers pay 10/month, others may pay in thousands per month.
I will add a "time period" filter later, just need help with aggregation.
I want to generate a report where I only select the customers that generated 80% of revenue in this format:
+------------+-------+
| customerID | Total |
+------------+-------+
Not sure why this question was "on hold". I just need help writing a query and do not have enough experience with SQL. Basically, the title of the question states what is needed here:
I need to list customers and their corresponding totals, however, only need to select those customers that make up 80% of total revenue. The report needs to aggregate a total per customer.
Using MariaDB version 10.3.9
This is the kind of thing you need to use window functions for.
WITH
-- define some sample data,
-- where the sum total of amountAuthorized is 10,000
customer_transactions( `id`, transactionID, customerID,
amountAuthorized, createdDatetime) AS
(
SELECT 1, 1, 1, 5000, '2018-08-01'
UNION ALL SELECT 2, 2, 2, 2000, '2018-08-01'
UNION ALL SELECT 3, 3, 3, 1000, '2018-08-01'
UNION ALL SELECT 4, 4, 4, 1000, '2018-08-01'
UNION ALL SELECT 5, 5, 5, 1000, '2018-08-01'
)
-- a query that gives us the running total, sorted to give us the biggest customers first.
-- note that the additional sorts affect what customers might be returned.
,running_totals AS
(
SELECT *, SUM(amountAuthorized) OVER (ORDER BY amountAuthorized DESC, createdDatetime DESC, `id`) AS runningTotal
FROM customer_transactions
)
SELECT *
FROM running_totals
WHERE runningTotal <= ( SELECT 0.8 * SUM(amountAuthorized)
FROM customer_transactions)
Note that this takes into account (no pun intended) all data in the table. When you want to only look at a specific time period, you might want to create an intermediate CTE that filters out the dates you want.
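Because the question asks for one total per customer, a variation is to aggregate per customer first and then take the running total over those totals. The sketch below does that under stated assumptions: the rows are hypothetical sample data (customer c1 has two transactions), and Python's sqlite3 module (SQLite 3.25+) stands in for MariaDB 10.3.

```python
import sqlite3

# Hypothetical sample data: (transactionID, customerID, amountAuthorized).
# The grand total is 10,000, so the 80% threshold is 8,000.
TRANSACTIONS = [
    ("t1", "c1", 3000),
    ("t2", "c1", 2000),
    ("t3", "c2", 2000),
    ("t4", "c3", 1000),
    ("t5", "c4", 1000),
    ("t6", "c5", 1000),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_transactions ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT, transactionID TEXT,"
    " customerID TEXT, amountAuthorized INTEGER)"
)
conn.executemany(
    "INSERT INTO customer_transactions"
    " (transactionID, customerID, amountAuthorized) VALUES (?, ?, ?)",
    TRANSACTIONS,
)

top_customers = conn.execute(
    """
    WITH per_customer AS (
        -- one row per customer with that customer's total
        SELECT customerID, SUM(amountAuthorized) AS total
        FROM customer_transactions
        GROUP BY customerID
    ),
    running AS (
        -- biggest customers first; customerID breaks ties deterministically
        SELECT customerID, total,
               SUM(total) OVER (ORDER BY total DESC, customerID) AS running_total
        FROM per_customer
    )
    SELECT customerID, total
    FROM running
    WHERE running_total <= 0.8 * (SELECT SUM(total) FROM per_customer)
    ORDER BY total DESC, customerID
    """
).fetchall()

for customer_id, total in top_customers:
    print(customer_id, total)
```

With a <= comparison, the customer whose total crosses the 80% line is left out unless the running total lands on the threshold exactly, so the cut-off rule is a choice worth making deliberately.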
You will find that surprisingly close to 20% of the customers account for the 80%. See the 80/20 rule.
But, if you don't want to go that direction, you have 2 options:
Switch to MySQL 8.0 or MariaDB 10.2 in order to use 'windowing' functions; or
Use @variables to produce a running total, then (in an outer query) grab the desired rows.
Since you are using MariaDB 10.3.9, the windowing seems to be the way to go. But first, you need a separate query (or derived table) that computes the total revenue so you can get 80% of it.
Suggest
SELECT @revenue80 := 0.8 * SUM(amountAuthorized)
FROM customer_transactions
Then use @revenue80 inside the WHERE that Zack suggests.
I see that each amount can be no more than 999.99. Really? Is this a coffee shop?
Use the following:
SELECT
ct1.customerID,
SUM(ct1.amountAuthorized) as Total,
100 * (SUM(ct1.amountAuthorized) / ct3.total_revenue) as percent_revenue
FROM
customer_transactions ct1
CROSS JOIN (SELECT SUM(amountAuthorized) AS total_revenue
FROM customer_transactions ct2) AS ct3
GROUP BY
ct1.customerID
HAVING percent_revenue >= 80

Is it possible to make this query working properly?

I'm creating a simple game and I want to get the best lap_time for each type in the db.
However, my query returns the wrong player_id (3 in second row) and total_ranks (all ranks instead of count by type).
Link to sqlfiddle: http://sqlfiddle.com/#!9/a0c36a/2
Desired result
+--------+-----+-------+------------+----------------+-------------+
| level | cp | type | player_id | MIN(lap_time) | total_ranks |
+--------+-----+-------+------------+----------------+-------------+
| 1 | 1 | 0 | 1 | 10.5 | 4 |
| 1 | 1 | 1 | 2 | 10.45 | 3 |
+--------+-----+-------+------------+----------------+-------------+
Is it possible to make it work in 1 query or do I need at least 2?
Fiddle
Same concept as Tim, but with Total_Ranks column
SELECT level, cp, R.type, player_id, MinTime, Total_Ranks
FROM runtimes R
JOIN (SELECT TYPE, MIN(LAP_TIME) MinTime, Count(*) Total_Ranks
FROM RUNTIMES
GROUP BY TYPE) T on R.Type = T.Type
and R.lap_time = T.MinTime
WHERE level=1
AND cp=1
One canonical way to solve this problem in MySQL is to use a subquery to identify the minimum lap time for each type, then join your full table to it to obtain the entire record. A nice side effect of this approach is that we also get back ties if a given type has more than one person sharing the minimum lap time.
SELECT r1.*, r2.total_ranks
FROM runtimes r1
INNER JOIN
(
SELECT type, MIN(lap_time) AS min_lap_time, COUNT(*) AS total_ranks
FROM runtimes
GROUP BY type
) r2
ON r1.type = r2.type AND
r1.lap_time = r2.min_lap_time
Here is a link to your updated Fiddle:
SQLFiddle
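The join-to-subquery pattern can be tried out with a small sketch. The contents of the linked fiddle are not reproduced here, so the rows below are hypothetical, chosen only to be consistent with the desired result in the question; Python's sqlite3 module stands in for MySQL.

```python
import sqlite3

# Hypothetical rows: (level, cp, type, player_id, lap_time).
RUNTIMES = [
    (1, 1, 0, 1, 10.5),
    (1, 1, 0, 2, 11.0),
    (1, 1, 0, 3, 12.0),
    (1, 1, 0, 4, 13.0),
    (1, 1, 1, 2, 10.45),
    (1, 1, 1, 1, 10.9),
    (1, 1, 1, 3, 11.2),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE runtimes (level INTEGER, cp INTEGER, type INTEGER,"
    " player_id INTEGER, lap_time REAL)"
)
conn.executemany("INSERT INTO runtimes VALUES (?, ?, ?, ?, ?)", RUNTIMES)

# The subquery finds the best time and the row count per type; joining back
# on (type, lap_time) recovers the player who set that time.
best_laps = conn.execute(
    """
    SELECT r1.level, r1.cp, r1.type, r1.player_id,
           r2.min_lap_time, r2.total_ranks
    FROM runtimes r1
    INNER JOIN (
        SELECT type, MIN(lap_time) AS min_lap_time, COUNT(*) AS total_ranks
        FROM runtimes
        GROUP BY type
    ) r2
        ON r1.type = r2.type
       AND r1.lap_time = r2.min_lap_time
    ORDER BY r1.type
    """
).fetchall()

for row in best_laps:
    print(row)
```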

Query to Segment Results Based on Equal Sets of Column Value

I'd like to construct a single query (or as few as possible) to group a data set. So given a number of buckets, I'd like to return results based on a specific column.
So given a column called score which is a double which contains:
90.00
91.00
94.00
96.00
98.00
99.00
I'd like to be able to use a GROUP BY clause with a function like:
SELECT MIN(score), MAX(score), SUM(score) FROM table GROUP BY BUCKETS(score, 3)
Ideally this would return 3 rows (grouping the results into 3 buckets with as close to equal count in each group as is possible):
90.00, 91.00, 181.00
94.00, 96.00, 190.00
98.00, 99.00, 197.00
Is there some function that would do this? I'd like to avoid returning all the rows and figuring out the bucket segments myself.
Dave
create table test (
id int not null auto_increment primary key,
val decimal(4,2)
) engine = myisam;
insert into test (val) values
(90.00),
(91.00),
(94.00),
(96.00),
(98.00),
(99.00);
select min(val) as lower,max(val) as higher,sum(val) as total from (
select id,val,@row:=@row+1 as row
from test,(select @row:=0) as r order by id
) as t
group by ceil(row/2)
+-------+--------+--------+
| lower | higher | total |
+-------+--------+--------+
| 90.00 | 91.00 | 181.00 |
| 94.00 | 96.00 | 190.00 |
| 98.00 | 99.00 | 197.00 |
+-------+--------+--------+
3 rows in set (0.00 sec)
Unfortunately MySQL doesn't have an analytic function like rownum(), so you have to use a variable to emulate it. Once you do, you can simply use the ceil() function to group every N rows as you like. Hope that helps.
set @r = (select count(*) from test);
select min(val) as lower,max(val) as higher,sum(val) as total from (
select id,val,@row:=@row+1 as row
from test,(select @row:=0) as r order by id
) as t
group by ceil(row/ceil(@r/3))
or, with a single query
select min(val) as lower,max(val) as higher,sum(val) as total from (
select id,val,@row:=@row+1 as row,tot
from test,(select count(*) as tot from test) as t2,(select @row:=0) as r order by id
) as t
group by ceil(row/ceil(tot/3))
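On engines that do support window functions (MySQL 8.0, for example), NTILE(n) is a different technique from the variable-based row counter: it assigns each row to one of n buckets of as equal a size as possible. A minimal sketch, using Python's sqlite3 module (SQLite 3.25+) as a stand-in and the same six values as the test table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY AUTOINCREMENT, val REAL)")
conn.executemany(
    "INSERT INTO test (val) VALUES (?)",
    [(90.0,), (91.0,), (94.0,), (96.0,), (98.0,), (99.0,)],
)

# NTILE(3) labels the rows 1..3 in val order; grouping by that label
# gives one summary row per bucket.
buckets = conn.execute(
    """
    SELECT MIN(val) AS lower, MAX(val) AS higher, SUM(val) AS total
    FROM (
        SELECT val, NTILE(3) OVER (ORDER BY val) AS bucket
        FROM test
    ) AS t
    GROUP BY bucket
    ORDER BY bucket
    """
).fetchall()

for row in buckets:
    print(row)
```

When the row count does not divide evenly, NTILE gives the extra rows to the earlier buckets, which can differ from the ceil()-based grouping above.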