Calculate tax amount between 3 different tables with MySQL - mysql

I have the following table structure and am trying to build a report from it:
___BillableDatas
|--------|------------|---------|--------------|------------|
| BIL_Id | BIL_Date |BIL_Rate | BIL_Quantity | BIL_Status |
|--------|------------|---------|--------------|------------|
| 1 | 2018-03-01 | 105 | 1 | charged |
| 2 | 2018-03-02 | 105 | 1 | cancelled |
| 3 | 2018-03-01 | 15 | 2 | notcharged |
| 4 | 2018-03-01 | 21 | 1 | notcharged |
| 5 | 2018-03-02 | 15 | 2 | notcharged |
| 6 | 2018-03-02 | 21 | 1 | notcharged |
|--------|------------|---------|--------------|------------|
___SalesTaxes
|--------|--------------|------------|
| STX_Id | STX_TaxeName | STX_Amount |
|--------|--------------|------------|
| 8 | Tax 1 | 5.000 |
| 9 | Tax 2 | 5.000 |
| 10 | Tax 3 | 19.975 |
|--------|--------------|------------|
STX_Amount is a percentage.
___ApplicableTaxes
|-----------|-----------|
| ATX_BILId | ATX_STXId |
|-----------|-----------|
| 1 | 8 |
| 1 | 9 |
| 1 | 10 |
| 2 | 8 |
| 2 | 9 |
| 2 | 10 |
| 3 | 9 |
| 3 | 10 |
| 4 | 9 |
| 5 | 9 |
| 5 | 10 |
| 6 | 9 |
|-----------|-----------|
ATX_BILId is the item ID and links to BIL_Id in ___BillableDatas.
ATX_STXId is the tax ID and links to STX_Id in ___SalesTaxes.
I need to get the sum of the items per day:
- without tax
- with tax
Something like this:
|------------------|---------------|------------|
| BIL_RateNonTaxed | BIL_RateTaxed | BIL_Status |
|------------------|---------------|------------|
| 105.00 | 136.47 | charged | <- Taxes #8, #9 and #10 applicable
| 102.00 | 118.035 | notcharged | <- Taxes #9 and #10 applicable
|------------------|---------------|------------|
Explanation of the totals:
105 = 105*1 -- (total of the charged item multiplied by the quantity)
102 = (15*2)*2+(21*2) -- (total of the notcharged items multiplied by the quantity)
136.47 = 105+(105*(5+5+19.975)/100)
119.085 = 102+(((15*2)*2)*(5+19.975)/100+(21*2)*5/100)
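The two taxed totals can be double-checked with a throwaway SELECT. This is only a sanity check of the arithmetic above; it needs no tables, and the column aliases are made up for illustration.

```sql
-- Sanity check of the arithmetic above (aliases are illustrative only).
SELECT 105 + 105 * (5 + 5 + 19.975) / 100      AS charged_taxed,     -- 136.47375
       102 + ((15 * 2) * 2 * (5 + 19.975) / 100
              + (21 * 2) * 5 / 100)            AS notcharged_taxed;  -- 119.085
```

MySQL may print the values with extra trailing zeros because of how it scales decimal division.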
My last try was this one:
SELECT
BIL_Date,
(BIL_Rate*BIL_Quantity) AS BIL_RateNonTaxed,
(((BIL_Rate*BIL_Quantity)*SUM(STX_Amount)/100)+BIL_Rate*BIL_Quantity) AS BIL_RateTaxed,
BIL_Status
FROM ___BillableDatas
LEFT JOIN ___SalesTaxes
ON FIND_IN_SET(STX_Id, BIL_ApplicableTaxes) > 0
LEFT JOIN ___ApplicableTaxes
ON ___BillableDatas.BIL_Id = ___ApplicableTaxes.ATX_BILId
WHERE BIL_BookingId=1
GROUP BY BIL_Id AND BIL_Status
ORDER BY BIL_Date
ASC
Please see this SQLFiddle to help you if needed:
http://sqlfiddle.com/#!9/425854f
Thanks.

I cannot bear to work with your naming policy, so I made my own...
DROP TABLE IF EXISTS bills;
CREATE TABLE bills
(bill_id SERIAL PRIMARY KEY
,bill_date DATE NOT NULL
,bill_rate INT NOT NULL
,bill_quantity INT NOT NULL
,bill_status ENUM('charged','cancelled','notcharged')
);
INSERT INTO bills VALUES
(1,'2018-03-01',105,1,'charged'),
(2,'2018-03-02',105,1,'cancelled'),
(3,'2018-03-01',15,2,'notcharged'),
(4,'2018-03-01',21,1,'notcharged'),
(5,'2018-03-02',15,2,'notcharged'),
(6,'2018-03-02',21,1,'notcharged');
DROP TABLE IF EXISTS sales_taxes;
CREATE TABLE sales_taxes
(sales_tax_id SERIAL PRIMARY KEY
,sales_tax_name VARCHAR(12) NOT NULL
,sales_tax_amount DECIMAL(5,3) NOT NULL
);
INSERT INTO sales_taxes VALUES
( 8,'Tax 1', 5.000),
( 9,'Tax 2', 5.000),
(10,'Tax 3',19.975);
DROP TABLE IF EXISTS applicable_taxes;
CREATE TABLE applicable_taxes
(bill_id INT NOT NULL
,sales_tax_id INT NOT NULL
,PRIMARY KEY(bill_id,sales_tax_id)
);
INSERT INTO applicable_taxes VALUES
(1, 8),
(1, 9),
(1,10),
(2, 8),
(2, 9),
(2,10),
(3, 9),
(3,10),
(4, 9),
(5, 9),
(5,10),
(6, 9);
SELECT bill_status
, SUM(bill_rate*bill_quantity) nontaxed
, SUM((bill_rate*bill_quantity)+(bill_rate*bill_quantity*total_sales_tax/100)) taxed
FROM
( SELECT b.*
, SUM(t.sales_tax_amount) total_sales_tax
FROM bills b
JOIN applicable_taxes bt
ON bt.bill_id = b.bill_id
JOIN sales_taxes t
ON t.sales_tax_id = bt.sales_tax_id
GROUP
BY bill_id
) x
GROUP
BY bill_status;
+-------------+----------+-------------+
| bill_status | nontaxed | taxed       |
+-------------+----------+-------------+
| charged     |      105 | 136.4737500 |
| cancelled   |      105 | 136.4737500 |
| notcharged  |      102 | 119.0850000 |
+-------------+----------+-------------+
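My notcharged total (119.085) is very slightly different from the 118.035 in your expected table, although it matches your own explanation of the totals, so the value in that table looks like a typo. Either way, this should get you pretty close.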
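Since the question asks for totals per day, the same idea can be grouped by date as well. The following is a sketch against the renamed tables above, not a tested drop-in: it assumes cancelled items should be left out of the report (as in the expected output of the question), and it uses LEFT JOIN with COALESCE so that an item without any applicable tax would still be counted. The alias net is made up for illustration.

```sql
SELECT x.bill_date
     , x.bill_status
     , SUM(x.net)                                 AS nontaxed
     , SUM(x.net * (1 + x.total_sales_tax / 100)) AS taxed
  FROM
     ( -- one row per bill, with the sum of its tax percentages
       SELECT b.bill_id
            , b.bill_date
            , b.bill_status
            , b.bill_rate * b.bill_quantity        AS net
            , COALESCE(SUM(t.sales_tax_amount), 0) AS total_sales_tax
         FROM bills b
         LEFT JOIN applicable_taxes bt
           ON bt.bill_id = b.bill_id
         LEFT JOIN sales_taxes t
           ON t.sales_tax_id = bt.sales_tax_id
        GROUP BY b.bill_id, b.bill_date, b.bill_status
               , b.bill_rate, b.bill_quantity
     ) x
 WHERE x.bill_status <> 'cancelled'   -- assumption: cancelled items are not reported
 GROUP BY x.bill_date, x.bill_status
 ORDER BY x.bill_date, x.bill_status;
```

With the sample rows this should give 105 / 136.47375 for the charged item on 2018-03-01, and 51 / 59.5425 for the notcharged items on each of the two days.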

SELECT a.BIL_Date, BIL_RateNonTaxed, BIL_RateNonTaxed+BIL_RateTaxed AS BIL_RateTaxed FROM (
SELECT BIL_Date,
SUM(BIL_Rate*BIL_Quantity) AS BIL_RateNonTaxed
FROM ___BillableDatas
WHERE BIL_Status != 'cancelled'
GROUP BY BIL_Date
) a INNER JOIN (
SELECT BIL_Date,
SUM(BIL_Rate*BIL_Quantity*STX_Amount/100) AS BIL_RateTaxed
FROM ___BillableDatas
LEFT JOIN ___ApplicableTaxes
ON ___BillableDatas.BIL_Id = ___ApplicableTaxes.ATX_BILId
LEFT JOIN ___SalesTaxes
ON STX_Id = ATX_STXId
WHERE BIL_Status != 'cancelled'
GROUP BY BIL_Date
) b
ON a.BIL_Date = b.BIL_Date
ORDER BY a.BIL_Date;
Explanation:
Your BIL_RateNonTaxed calculation does not use the ___SalesTaxes table, so that table must not be joined in the query that computes it; otherwise the extra joined rows would distort the SUM.
However, your BIL_RateTaxed does use the ___SalesTaxes table, so I computed the two values in separate subqueries and joined the results.
I know there are better answers, but I'm not familiar with MySQL syntax.
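To see why the tax join distorts the untaxed sum, it may help to look at a single item. The sketch below assumes the tables contain exactly the columns and sample rows shown in the question; the aliases are made up for illustration.

```sql
-- Item 1 has three applicable taxes, so the join returns it three times.
SELECT COUNT(*)                     AS joined_rows     -- 3
     , SUM(BIL_Rate * BIL_Quantity) AS inflated_total  -- 315, not 105
  FROM ___BillableDatas
  JOIN ___ApplicableTaxes
    ON ATX_BILId = BIL_Id
 WHERE BIL_Id = 1;
```

Every additional applicable tax repeats the item's row once more, which is why the untaxed total has to be computed before (or separately from) the join.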

Related

Assigning passengers to buses based on bus capacity in MySQL

Problem: Buses and passengers arrive at a station. If a bus arrives at the station at time t_bus and a passenger arrives at time t_passenger, where t_passenger <= t_bus, then the passenger will attempt to use the first available bus whose capacity has not been exceeded. If, at the moment a bus arrives at the station, more passengers are waiting than its capacity, only capacity passengers will use the bus.
Write a SQL query to report the users that appear on each bus (if two passengers arrive at the same time, then the passenger with the smaller passenger_id value should be given priority). The query result format is in the following example (schema and table descriptions appear at the end of this post).
Example
Input:
Buses table:
+--------+--------------+----------+
| bus_id | arrival_time | capacity |
+--------+--------------+----------+
| 1 | 2 | 1 |
| 2 | 4 | 10 |
| 3 | 7 | 2 |
+--------+--------------+----------+
Passengers table:
+--------------+--------------+
| passenger_id | arrival_time |
+--------------+--------------+
| 11 | 1 |
| 12 | 1 |
| 13 | 5 |
| 14 | 6 |
| 15 | 7 |
+--------------+--------------+
Output:
+--------+----------+-----------+------+--------------+-----------+
| bus_id | capacity | b_arrival | spot | passenger_id | p_arrival |
+--------+----------+-----------+------+--------------+-----------+
| 1 | 1 | 2 | 1 | 11 | 1 |
| 2 | 10 | 4 | 1 | 12 | 1 |
| 2 | 10 | 4 | 2 | NULL | NULL |
| 2 | 10 | 4 | 3 | NULL | NULL |
| 2 | 10 | 4 | 4 | NULL | NULL |
| 2 | 10 | 4 | 5 | NULL | NULL |
| 2 | 10 | 4 | 6 | NULL | NULL |
| 2 | 10 | 4 | 7 | NULL | NULL |
| 2 | 10 | 4 | 8 | NULL | NULL |
| 2 | 10 | 4 | 9 | NULL | NULL |
| 2 | 10 | 4 | 10 | NULL | NULL |
| 3 | 2 | 7 | 1 | 13 | 5 |
| 3 | 2 | 7 | 2 | 14 | 6 |
+--------+----------+-----------+------+--------------+-----------+
Explanation:
Passenger 11 arrives at time 1.
Passenger 12 arrives at time 1.
Bus 1 arrives at time 2 and collects passenger 11 as it has one empty seat.
Bus 2 arrives at time 4 and collects passenger 12 as it has ten empty seats.
Passenger 13 arrives at time 5.
Passenger 14 arrives at time 6.
Passenger 15 arrives at time 7.
Bus 3 arrives at time 7 and collects passengers 13 and 14 as it has two empty seats.
Attempt
The CTE
WITH RECURSIVE bus_spots AS (
SELECT B.bus_id, B.arrival_time AS b_arrival, B.capacity, 1 AS spot FROM Buses B
UNION ALL
SELECT BS.bus_id, BS.b_arrival, BS.capacity, BS.spot + 1 FROM bus_spots BS WHERE BS.spot < BS.capacity
) SELECT * FROM bus_spots ORDER BY bus_id, spot;
gives
+--------+-----------+----------+------+
| bus_id | b_arrival | capacity | spot |
+--------+-----------+----------+------+
| 1 | 2 | 1 | 1 |
| 2 | 4 | 10 | 1 |
| 2 | 4 | 10 | 2 |
| 2 | 4 | 10 | 3 |
| 2 | 4 | 10 | 4 |
| 2 | 4 | 10 | 5 |
| 2 | 4 | 10 | 6 |
| 2 | 4 | 10 | 7 |
| 2 | 4 | 10 | 8 |
| 2 | 4 | 10 | 9 |
| 2 | 4 | 10 | 10 |
| 3 | 7 | 2 | 1 |
| 3 | 7 | 2 | 2 |
+--------+-----------+----------+------+
as its result set while
WITH bus_queue AS (
SELECT
P.passenger_id,
P.arrival_time AS p_arrival,
ROW_NUMBER() OVER(ORDER BY P.arrival_time, P.passenger_id) AS queue_pos
FROM Passengers P
) SELECT * FROM bus_queue ORDER BY p_arrival, passenger_id;
gives
+--------------+-----------+-----------+
| passenger_id | p_arrival | queue_pos |
+--------------+-----------+-----------+
| 11 | 1 | 1 |
| 12 | 1 | 2 |
| 13 | 5 | 3 |
| 14 | 6 | 4 |
| 15 | 7 | 5 |
+--------------+-----------+-----------+
as its result set. But I'm not sure how to effectively relate the CTE result sets (or if this is even the best way of going about things), especially given the complications introduced by handling capacity effectively.
Question: Any ideas on how to work out a solution for this kind of problem (preferably without using variables)? For reference, I'm using MySQL 8.0.26.
Schema and Table Descriptions
Schema:
DROP TABLE IF EXISTS Buses;
CREATE TABLE IF NOT EXISTS
Buses (bus_id int, arrival_time int, capacity int);
INSERT INTO
Buses (bus_id, arrival_time, capacity)
VALUES
(1, 2, 1),
(2, 4, 10),
(3, 7, 2);
DROP TABLE IF EXISTS Passengers;
CREATE TABLE IF NOT EXISTS
Passengers (passenger_id int, arrival_time int);
INSERT INTO
Passengers (passenger_id, arrival_time)
VALUES
(11, 1),
(12, 1),
(13, 5),
(14, 6),
(15, 7);
Table descriptions:
Buses:
+--------------+------+
| Column Name | Type |
+--------------+------+
| bus_id | int |
| arrival_time | int |
| capacity | int |
+--------------+------+
bus_id is the primary key column for this table.
Each row of this table contains information about the arrival time of a bus at the station and its capacity (i.e., the number of empty seats it has).
There will be no two buses that arrive at the same time and capacity will be a positive integer.
Passengers:
+--------------+------+
| Column Name | Type |
+--------------+------+
| passenger_id | int |
| arrival_time | int |
+--------------+------+
passenger_id is the primary key column for this table.
Each row of this table contains information about the arrival time of a passenger at the station.
Using a recursive CTE followed by several successive CTEs:
with recursive cte(id, a, c, s) as (
select b.*, 1 from buses b
union all
select c.id, c.a, c.c, c.s + 1 from cte c where c.s+1 <= c.c
),
_passengers as (
select row_number() over (order by p.passenger_id) n, p.* from passengers p
),
gps(bid, n, a, pid) as (
select b.bus_id, p.n, p.arrival_time, p.passenger_id from buses b
join _passengers p on p.arrival_time <= b.arrival_time and not exists
(select 1 from buses b1 where b1.arrival_time < b.arrival_time and p.arrival_time <= b1.arrival_time)
),
slts(v, n, a, pid) as (
select case when
(select sum(g.bid = g1.bid and g1.n <= g.n) from gps g1) <= (select sum(c.id = g.bid) from cte c)
then g.bid else null end, g.n, g.a, g.pid from gps g
),
dists as (
select case when s.v is not null
then s.v
else (select min(b.bus_id) from buses b where b.arrival_time >= s.a and
(select sum(s2.v is null and s2.n <= s.n) from slts s2) <
(select sum(c3.id = b.bus_id) from cte c3)) end v,
s.a, s.pid from slts s
)
select c.id bus_id, c.c capacity, c.a arrival_time, c.s spot, p.pid passenger_id, p.a arrival from cte c
left join (select (select sum(d.v = d1.v and d1.a < d.a) from dists d1) + 1 r,
d.* from dists d where d.v is not null) p
on c.id = p.v and c.s = p.r
order by c.a, c.s
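A shorter alternative, offered as a sketch rather than as part of the answer above: because passengers board strictly in queue order, the passengers who have already left are always a prefix of the queue. It is therefore enough to walk the buses in arrival order while carrying a running total of boarded passengers. The sketch reuses the two CTEs from the question; bus_order, boarding, arrived, boarded and total_boarded are names invented here, and it assumes MySQL 8.0 and at least one row in Buses.

```sql
WITH RECURSIVE
-- One row per seat on every bus (CTE from the question).
bus_spots AS (
  SELECT B.bus_id, B.arrival_time AS b_arrival, B.capacity, 1 AS spot
  FROM Buses B
  UNION ALL
  SELECT BS.bus_id, BS.b_arrival, BS.capacity, BS.spot + 1
  FROM bus_spots BS
  WHERE BS.spot < BS.capacity
),
-- Passengers in boarding order (CTE from the question).
bus_queue AS (
  SELECT P.passenger_id, P.arrival_time AS p_arrival,
         ROW_NUMBER() OVER (ORDER BY P.arrival_time, P.passenger_id) AS queue_pos
  FROM Passengers P
),
-- Buses in arrival order, with the number of passengers arrived by then.
bus_order AS (
  SELECT B.bus_id, B.capacity,
         ROW_NUMBER() OVER (ORDER BY B.arrival_time) AS rn,
         (SELECT COUNT(*) FROM Passengers P
           WHERE P.arrival_time <= B.arrival_time) AS arrived
  FROM Buses B
),
-- Walk the buses in order, carrying the running total of boarded passengers.
boarding AS (
  SELECT rn, bus_id,
         LEAST(capacity, arrived) AS boarded,
         LEAST(capacity, arrived) AS total_boarded
  FROM bus_order
  WHERE rn = 1
  UNION ALL
  SELECT o.rn, o.bus_id,
         LEAST(o.capacity, o.arrived - r.total_boarded),
         r.total_boarded + LEAST(o.capacity, o.arrived - r.total_boarded)
  FROM boarding r
  JOIN bus_order o ON o.rn = r.rn + 1
)
-- Seat n of a bus takes the n-th passenger of that bus's slice of the queue.
SELECT s.bus_id, s.capacity, s.b_arrival, s.spot, q.passenger_id, q.p_arrival
FROM bus_spots s
JOIN boarding g ON g.bus_id = s.bus_id
LEFT JOIN bus_queue q
       ON s.spot <= g.boarded
      AND q.queue_pos = g.total_boarded - g.boarded + s.spot
ORDER BY s.b_arrival, s.spot;
```

On the sample data the running totals are 1, 2 and 4 after buses 1, 2 and 3, so queue positions 1, 2 and 3 to 4 go to those buses, which matches the expected output in the question.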

Select top 2 scorers from each combination of 3 columns in MySQL

I have following tables and data:
player_scores
+----+-----------+---------------------+-------+
| id | player_id | created_at | score |
+----+-----------+---------------------+-------+
| 1 | 1 | 2020-01-01 01:00:00 | 20 |
| 2 | 1 | 2020-01-02 01:00:00 | 30 |
| 3 | 2 | 2020-01-01 01:00:00 | 20 |
| 4 | 3 | 2020-01-01 01:00:00 | 20 |
| 5 | 4 | 2020-05-01 01:00:00 | 40 |
| 6 | 5 | 2020-01-02 01:00:00 | 20 |
| 7 | 6 | 2020-01-01 01:00:00 | 20 |
| 8 | 7 | 2020-01-03 01:00:00 | 20 |
| 9 | 1 | 2020-03-01 01:00:00 | 20 |
+----+-----------+---------------------+-------+
players
+----+---------+-------------+----------+---------------------+---------+
| id | city_id | category_id | group_id | created_at | name |
+----+---------+-------------+----------+---------------------+---------+
| 1 | 1 | 1 | 1 | 2020-01-01 01:00:00 | Player1 |
| 2 | 1 | 2 | 1 | 2020-01-02 01:00:00 | Player2 |
| 3 | 2 | 2 | 1 | 2020-01-01 01:00:00 | Player3 |
| 4 | 2 | 1 | 1 | 2020-05-01 01:00:00 | Player4 |
| 5 | 3 | 1 | 1 | 2020-01-02 01:00:00 | Player5 |
| 6 | 4 | 2 | 1 | 2020-01-01 01:00:00 | Player6 |
| 7 | 3 | 1 | 1 | 2020-01-01 01:00:00 | Player7 |
| 8 | 4 | 2 | 1 | 2020-01-01 01:00:00 | Player8 |
+----+---------+-------------+----------+---------------------+---------+
cities
+----+------------+------------+
| id | country_id | name |
+----+------------+------------+
| 1 | 1 | London |
| 2 | 2 | Sydney |
| 3 | 2 | Melbourne |
| 4 | 3 | Toronto |
+----+------------+------------+
countries
+----+-----------+
| id | name |
+----+-----------+
| 1 | England |
| 2 | Australia |
| 3 | Canada |
+----+-----------+
categories
+----+------------+
| id | name |
+----+------------+
| 1 | Category 1 |
| 2 | Category 2 |
+----+------------+
groups
+----+---------+
| id | name |
+----+---------+
| 1 | Group 1 |
| 2 | Group 2 |
+----+---------+
SQL code to create tables and data:
CREATE TABLE players
(
id INT UNSIGNED auto_increment PRIMARY KEY,
city_id INT UNSIGNED NOT NULL,
category_id INT UNSIGNED NOT NULL,
group_id INT UNSIGNED NOT NULL,
created_at DATETIME NOT NULL,
name VARCHAR(255) NOT NULL
);
CREATE TABLE player_scores
(
id INT UNSIGNED auto_increment PRIMARY KEY,
player_id INT UNSIGNED NOT NULL,
created_at DATETIME NOT NULL,
score INT(10) NOT NULL
);
CREATE TABLE cities
(
id INT UNSIGNED auto_increment PRIMARY KEY,
country_id INT UNSIGNED NOT NULL,
name VARCHAR(255) NOT NULL
);
CREATE TABLE countries
(
id INT UNSIGNED auto_increment PRIMARY KEY,
name VARCHAR(255) NOT NULL
);
CREATE TABLE categories
(
id INT UNSIGNED auto_increment PRIMARY KEY,
name VARCHAR(255) NOT NULL
);
CREATE TABLE `groups`
(
id INT UNSIGNED auto_increment PRIMARY KEY,
name VARCHAR(255) NOT NULL
);
INSERT INTO players (id, city_id, category_id, group_id, created_at, name) VALUES (1, 1, 1, 1, '2020-01-01 01:00:00', 'Player1'),(2, 1, 2, 1, '2020-01-02 01:00:00', 'Player2'),(3, 2, 2, 1, '2020-01-01 01:00:00', 'Player3'),(4, 2, 1, 1, '2020-05-01 01:00:00', 'Player4'),(5, 3, 1, 1, '2020-01-02 01:00:00', 'Player5'),(6, 4, 2, 1, '2020-01-01 01:00:00', 'Player6'),(7, 3, 1, 1, '2020-01-01 01:00:00', 'Player7'),(8, 4, 2, 1, '2020-01-01 01:00:00', 'Player8');
INSERT INTO player_scores (id, player_id, created_at, score) VALUES (1, 1, '2020-01-01 01:00:00', 20), (2, 1, '2020-01-02 01:00:00', 30),(3, 2, '2020-01-01 01:00:00', 20),(4, 3, '2020-01-01 01:00:00', 20),(5, 4, '2020-05-01 01:00:00', 40),(6, 5, '2020-01-02 01:00:00', 20),(7, 6, '2020-01-01 01:00:00', 20),(8, 7, '2020-01-03 01:00:00', 20),(9, 1, '2020-03-01 01:00:00', 20);
INSERT INTO cities (id, country_id, name) VALUES (1,1,'London'), (2,2,'Sydney'), (3,2,'Melbourne'), (4,3,'Toronto');
INSERT INTO countries (id, name) VALUES (1,'England'),(2,'Australia'),(3,'Canada');
INSERT INTO categories (id, name) VALUES (1,'Category 1'),(2,'Category 2');
INSERT INTO `groups` (id, name) VALUES (1,'Group 1'),(2,'Group 2');
The relationship between 'players' and 'player_scores' is one-to-many. Also, some players might have no scores at all.
I have to return a single list of the top 2 scorers from each combination of country, category and group. If there are no scores at all for a combination, then no scorers are selected for that combination. If there is only one scorer within a combination, then only that scorer is selected. If a player does not have any scores yet, that player is not selected.
If 2 or more players have the same score within a combination, the earliest created player (created_at field in the 'players' table) should be selected.
I use MySQL 5.7, therefore I cannot use window functions!
So, the result from the testing data above should be:
+-----------+--------------+---------------+------------+---------------------+---------------------+--------------------------+
| player.id | country.name | category.name | group.name | player.created_at | player_scores.score | player_scores.created_at |
+-----------+--------------+---------------+------------+---------------------+---------------------+--------------------------+
| 1 | England | Category 1 | Group 1 | 2020-01-01 01:00:00 | 20 | 2020-03-01 01:00:00 |
| 2 | England | Category 2 | Group 1 | 2020-01-02 01:00:00 | 20 | 2020-01-01 01:00:00 |
| 3 | Australia | Category 2 | Group 1 | 2020-01-01 01:00:00 | 20 | 2020-01-01 01:00:00 |
| 4 | Australia | Category 1 | Group 1 | 2020-05-01 01:00:00 | 40 | 2020-05-01 01:00:00 |
| 7 | Australia | Category 1 | Group 1 | 2020-01-01 01:00:00 | 20 | 2020-01-03 01:00:00 |
| 6 | Canada | Category 2 | Group 1 | 2020-01-01 01:00:00 | 20 | 2020-01-01 01:00:00 |
+-----------+--------------+---------------+------------+---------------------+---------------------+--------------------------+
So far, I have this query, but obviously it is far from a solution. I have searched for hints, but could not find any so far:
SELECT players.*, player_scores.*, cities.*, countries.*, categories.*, groups.*
FROM players
LEFT JOIN cities
ON players.city_id = cities.id
LEFT JOIN countries
ON cities.country_id = country.id
LEFT JOIN categories
ON players.category_id = categories.id
LEFT JOIN groups
ON players.group_id = groups.id
LEFT JOIN player_scores
ON player_scores.player_id = players.id
AND player_scores.id IN (
SELECT MAX(ps.id)
FROM player_scores AS ps
JOIN players AS p
ON p.id = ps.player_id
GROUP BY p.id
)
INNER JOIN (
SELECT DISTINCT countries.id, groups.id, categories.id
FROM players
LEFT JOIN cities
ON players.city_id = cities.id
LEFT JOIN countries
ON cities.country_id = country.id
LEFT JOIN groups
ON players.group_id = groups.id
LEFT JOIN categories
ON players.category_id = categories.id
INNER JOIN player_scores
ON player_scores.player_id = players.id
WHERE player_scores.id IN (
SELECT MAX(ps.id)
FROM player_scores AS ps
JOIN players AS p
ON p.id = ps.player_id
GROUP BY p.id
)
GROUP BY countries.id, categories.id, groups.id
HAVING MAX(player_scores.score) > 0
) players2
ON countries.id = players2.country_id
AND categories.id = players2.category_id
AND groups.id = players2.group_id;
Any help will be highly appreciated.
UPDATE: Provided testing data and result table.
To recap, am I right in thinking that this is the intermediate result, from which we have to select a subset of results based upon the stated criteria?
SELECT p.name
, s.score
, c.name city
, x.name country
, y.name category
, g.name player_group
, p.created_at
FROM players p
JOIN player_scores s
ON s.player_id = p.id
JOIN cities c
ON c.id = p.city_id
JOIN countries x
ON x.id = c.country_id
JOIN categories y
ON y.id = p.category_id
JOIN groups g
ON g.id = p.group_id;
+---------+-------+-----------+-----------+------------+--------------+---------------------+
| name | score | city | country | category | player_group | created_at |
+---------+-------+-----------+-----------+------------+--------------+---------------------+
| Player1 | 20 | London | England | Category 1 | Group 1 | 2020-01-01 01:00:00 |
| Player1 | 30 | London | England | Category 1 | Group 1 | 2020-01-01 01:00:00 |
| Player4 | 40 | Sydney | Australia | Category 1 | Group 1 | 2020-05-01 01:00:00 |
| Player5 | 20 | Melbourne | Australia | Category 1 | Group 1 | 2020-01-02 01:00:00 |
| Player7 | 20 | Melbourne | Australia | Category 1 | Group 1 | 2020-01-01 01:00:00 |
| Player1 | 20 | London | England | Category 1 | Group 1 | 2020-01-01 01:00:00 |
| Player2 | 20 | London | England | Category 2 | Group 1 | 2020-01-02 01:00:00 |
| Player3 | 20 | Sydney | Australia | Category 2 | Group 1 | 2020-01-01 01:00:00 |
| Player6 | 20 | Toronto | Canada | Category 2 | Group 1 | 2020-01-01 01:00:00 |
+---------+-------+-----------+-----------+------------+--------------+---------------------+
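One way to cut that intermediate result down to the top two per combination without window functions is to count, for each player, how many players in the same combination rank higher, and keep those with fewer than two. The sketch below rests on assumptions that are not confirmed by the question: a player's score is taken to be the row with the highest player_scores.id (as in the asker's attempt), and players.id is used as a final tie-break when both score and created_at are equal.

```sql
SELECT p.id
     , x.name       AS country
     , y.name       AS category
     , g.name       AS player_group
     , p.created_at
     , s.score
     , s.created_at AS scored_at
  FROM players p
  JOIN cities c     ON c.id = p.city_id
  JOIN countries x  ON x.id = c.country_id
  JOIN categories y ON y.id = p.category_id
  JOIN `groups` g   ON g.id = p.group_id
  JOIN player_scores s
    ON s.player_id = p.id
   AND s.id = (SELECT MAX(s2.id) FROM player_scores s2 WHERE s2.player_id = p.id)
 WHERE ( -- number of players in the same combination that rank above p
         SELECT COUNT(*)
           FROM players p2
           JOIN cities c2 ON c2.id = p2.city_id
           JOIN player_scores s3
             ON s3.player_id = p2.id
            AND s3.id = (SELECT MAX(s4.id) FROM player_scores s4
                          WHERE s4.player_id = p2.id)
          WHERE c2.country_id  = c.country_id
            AND p2.category_id = p.category_id
            AND p2.group_id    = p.group_id
            AND (   s3.score > s.score
                 OR (s3.score = s.score AND p2.created_at < p.created_at)
                 OR (s3.score = s.score AND p2.created_at = p.created_at
                     AND p2.id < p.id) )
       ) < 2
 ORDER BY x.name, y.name, g.name, s.score DESC, p.created_at;
```

On the sample data this keeps players 1, 2, 3, 4, 6 and 7: Player5 is dropped because Player4 has a higher score and Player7 has the same score but was created earlier.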

Correlated Subqueries with MAX() and GROUP BY

I have an issue using MAX() and GROUP BY.
I have the following tables:
personal_prizes
___________ ___________ _________ __________
| id | userId | specId| group |
|___________|___________|_________|__________|
| 1 | 1 | 1 | 1 |
|___________|___________|_________|__________|
| 2 | 1 | 2 | 1 |
|___________|___________|_________|__________|
| 3 | 2 | 3 | 1 |
|___________|___________|_________|__________|
| 4 | 2 | 4 | 2 |
|___________|___________|_________|__________|
| 5 | 1 | 5 | 2 |
|___________|___________|_________|__________|
| 6 | 1 | 6 | 2 |
|___________|___________|_________|__________|
| 7 | 2 | 7 | 3 |
|___________|___________|_________|__________|
prizes
___________ ___________ _________
| id | title | group |
|___________|___________|_________|
| 1 | First | 1 |
|___________|___________|_________|
| 2 | Second | 1 |
|___________|___________|_________|
| 3 | Newby | 1 |
|___________|___________|_________|
| 4 | General| 2 |
|___________|___________|_________|
| 5 | Leter | 2 |
|___________|___________|_________|
| 6 | Ter | 2 |
|___________|___________|_________|
| 7 | Mentor | 3 |
|___________|___________|_________|
So, I need to select the highest title for a user in each group.
E.g. the user with id = 1 must have the prizes 'Second' and 'Ter'.
I don't know how to implement it in one query.
So, first of all, I tried to select the highest specId for the user.
I tried the following:
SELECT pp.specID
FROM personal_prizes pp
WHERE pp.specID IN (SELECT MAX(pp1.id)
FROM personal_prizes pp1
WHERE pp1.userId = 1
GROUP BY pp1.group)
And it doesn't work.
So please help me solve this problem.
And if you can also help me select the prize titles for the user, that would be great!
The problem I perceive here is that prizes.id isn't really a reliable way to determine which is the "highest" prize. Ignoring this however I suggest using ROW_NUMBER() OVER() to locate the "highest" prize per user as follows:
Refer to this SQL Fiddle
CREATE TABLE personal_prizes
([id] int, [userId] int, [specId] int, [group] int)
;
INSERT INTO personal_prizes
([id], [userId], [specId], [group])
VALUES
(1, 1, 1, 1),
(2, 1, 2, 1),
(3, 2, 3, 1),
(4, 2, 4, 2),
(5, 1, 5, 2),
(6, 1, 6, 2),
(7, 2, 7, 3)
;
CREATE TABLE prizes
([id] int, [title] varchar(7), [group] int)
;
INSERT INTO prizes
([id], [title], [group])
VALUES
(1, 'First', 1),
(2, 'Second', 1),
(3, 'Newby', 1),
(4, 'General', 2),
(5, 'Leter', 2),
(6, 'Ter', 2),
(7, 'Mentor', 3)
;
Query 1:
select
*
from (
select
pp.*, p.title
, row_number() over(partition by pp.userId order by p.id ASC) as prize_order
from personal_prizes pp
inner join prizes p on pp.specid = p.id
) d
where prize_order = 1
Results:
| id | userId | specId | group | title | prize_order |
|----|--------|--------|-------|-------|-------------|
| 1 | 1 | 1 | 1 | First | 1 |
| 3 | 2 | 3 | 1 | Newby | 1 |
The result can be "reversed" by changing the ORDER BY within the over clause:
select
*
from (
select
pp.*, p.title
, row_number() over(partition by pp.userId order by p.id DESC) as prize_order
from personal_prizes pp
inner join prizes p on pp.specid = p.id
) d
where prize_order = 1
| id | userId | specId | group | title | prize_order |
|----|--------|--------|-------|--------|-------------|
| 6 | 1 | 6 | 2 | Ter | 1 |
| 7 | 2 | 7 | 3 | Mentor | 1 |
You could expand on this logic to locate "highest prize per group" too
select
*
from (
select
pp.*, p.title
, row_number() over(partition by pp.userId, p.[group] order by p.id ASC) as prize_order
from personal_prizes pp
inner join prizes p on pp.specid = p.id
) d
where prize_order = 1
| id | userId | specId | group | title | prize_order |
|----|--------|--------|-------|---------|-------------|
| 1 | 1 | 1 | 1 | First | 1 |
| 5 | 1 | 5 | 2 | Leter | 1 |
| 3 | 2 | 3 | 1 | Newby | 1 |
| 4 | 2 | 4 | 2 | General | 1 |
| 7 | 2 | 7 | 3 | Mentor | 1 |
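For the MAX() approach the question started from, a correlated subquery can pick the largest specId per user and group. This is a sketch under two assumptions: a larger specId means a higher prize (which, as noted above, may not be reliable), and MySQL-style backtick quoting for the reserved word group (SQL Server would use [group] instead). Note that the original attempt compared specId with MAX(id), which are different columns.

```sql
SELECT pp.userId
     , pp.`group`
     , p.title
  FROM personal_prizes pp
  JOIN prizes p
    ON p.id = pp.specId
 WHERE pp.specId = ( -- highest specId this user holds within the same group
                     SELECT MAX(pp1.specId)
                       FROM personal_prizes pp1
                      WHERE pp1.userId  = pp.userId
                        AND pp1.`group` = pp.`group`
                   )
 ORDER BY pp.userId, pp.`group`;
```

With the sample rows, user 1 gets 'Second' and 'Ter', as requested in the question; adding AND pp.userId = 1 restricts the result to that user.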

Include NULL in SQL Join when using WHERE

I have the following two tables:
Table TempUser22 : 57,000 rows:
+------+-----------+
| Id | Followers |
+------+-----------+
| 874 | 55542 |
| 1081 | 330624 |
| 1378 | 17919 |
| 1621 | 920 |
| 1688 | 255463 |
| 2953 | 751 |
| 3382 | 204466 |
| 3840 | 273489 |
| 4145 | 376 |
| ... | ... |
+------+-----------+
Table temporal_users : 10,000,000 rows total, 3200 rows Where Date=2010-12-31:
+---------------------+---------+--------------------+
| Date | User_Id | has_original_tweet |
+---------------------+---------+--------------------+
| 2008-02-22 12:00:00 | 676493 | 2 |
| 2008-02-22 12:00:00 | 815263 | 1 |
| 2008-02-22 12:00:00 | 6245822 | 1 |
| 2008-02-22 12:00:00 | 8854092 | 1 |
| 2008-02-23 12:00:00 | 676493 | 2 |
| 2008-02-23 12:00:00 | 815263 | 1 |
| 2008-02-23 12:00:00 | 6245822 | 1 |
| 2008-02-23 12:00:00 | 8854092 | 1 |
| 2008-02-24 12:00:00 | 676493 | 2 |
| ............. | ... | .. |
+---------------------+---------+--------------------+
I am running the following join query on these tables:
SELECT sum(has_original_tweet), b.Id
FROM temporal_users AS a
RIGHT JOIN TempUser22 AS b
ON a.User_ID = b.Id
GROUP BY b.Id;
Which returns 57,000 rows as expected, with NULL answers on the first field:
+-------------------------+------+
| sum(has_original_tweet) | Id |
+-------------------------+------+
| NULL | 874 |
| NULL | 1081 |
| 135 | 1378 |
| 164 | 1621 |
| 652 | 1688 |
| 691 | 2953 |
| NULL | 3382 |
| NULL | 3840 |
| NULL | 4145 |
| ... | .... |
+-------------------------+------+
However, when adding the WHERE line specifying a date as below:
SELECT sum(has_original_tweet), b.Id
FROM temporal_users AS a
RIGHT JOIN TempUser22 AS b
ON a.User_ID = b.Id
WHERE a.Date BETWEEN '2010-12-31-00:00:00' AND '2010-12-31-23:59:59'
GROUP BY b.Id;
I receive the following answer, of only 3200 rows, and without any NULL in the first field.
+-------------------------+---------+
| sum(has_original_tweet) | Id |
+-------------------------+---------+
| 1 | 797194 |
| 1 | 815263 |
| 0 | 820678 |
| 1 | 1427511 |
| 0 | 4653731 |
| 1 | 5933862 |
| 2 | 7530552 |
| 1 | 7674072 |
| 1 | 8149632 |
| .. | .... |
+-------------------------+---------+
My question is: for a given date, how do I get a result of 57,000 rows, one for each user in TempUser22, with NULL where has_original_tweet is not present in temporal_users for that date?
Thanks.
SELECT b.Id, SUM(a.has_original_tweet) s
FROM TempUser22 b
LEFT JOIN temporal_users a ON b.Id = a.User_Id
AND a.Date BETWEEN '2010-12-31-00:00:00' AND '2010-12-31-23:59:59'
GROUP BY b.Id;
Id s
1 null
2 1
3 null
4 3
5 null
6 null
For debugging, I used:
CREATE TEMPORARY TABLE TempUser22(Id INT, Followers INT)
SELECT 1 Id, 10 Followers UNION ALL
SELECT 2, 20 UNION ALL
SELECT 3, 30 UNION ALL
SELECT 4, 40 UNION ALL
SELECT 5, 50 UNION ALL
SELECT 6, 60
;
CREATE TEMPORARY TABLE temporal_users(`Date` DATETIME, User_Id INT, has_original_tweet INT)
SELECT '2008-02-22 12:00:00' `Date`, 1 User_Id, 1 has_original_tweet UNION ALL
SELECT '2008-12-31 12:00:00', 2, 1 UNION ALL
SELECT '2010-12-31 12:00:00', 2, 1 UNION ALL
SELECT '2012-12-31 12:00:00', 2, 1 UNION ALL
SELECT '2008-12-31 12:00:00', 4, 9 UNION ALL
SELECT '2010-12-31 12:00:00', 4, 1 UNION ALL
SELECT '2010-12-31 12:00:00', 4, 2 UNION ALL
SELECT '2012-12-31 12:00:00', 4, 9
;
That's because a comparison against NULL never evaluates to true, so the WHERE clause discards every row in which a.Date is NULL.
You can use COALESCE in your WHERE clause:
WHERE coalesce(a.Date, 'some-date-in-the-range') BETWEEN '2010-12-31-00:00:00' AND '2010-12-31-23:59:59'
This way, rows with a NULL date are treated as if they fell inside the range.
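The two suggestions are not equivalent, which can be seen by running both on the small debugging tables from the previous answer. The sketch below assumes those temporary tables exist; the half-open date range and the explicit CAST are choices made here for illustration, not something taken from either answer.

```sql
-- Variant A: date filter in the ON clause. Every TempUser22 row survives.
SELECT b.Id, SUM(a.has_original_tweet) AS s
  FROM TempUser22 b
  LEFT JOIN temporal_users a
    ON a.User_Id = b.Id
   AND a.`Date` >= '2010-12-31 00:00:00'
   AND a.`Date` <  '2011-01-01 00:00:00'
 GROUP BY b.Id;
-- Ids 1 to 6 are all returned; s is 1 for Id 2, 3 for Id 4, NULL otherwise.

-- Variant B: COALESCE in the WHERE clause.
SELECT b.Id, SUM(a.has_original_tweet) AS s
  FROM TempUser22 b
  LEFT JOIN temporal_users a
    ON a.User_Id = b.Id
 WHERE COALESCE(a.`Date`, CAST('2010-12-31 12:00:00' AS DATETIME))
       BETWEEN '2010-12-31 00:00:00' AND '2010-12-31 23:59:59'
 GROUP BY b.Id;
-- Id 1 is missing: its only row is dated 2008-02-22, so a.Date is not NULL
-- and the row fails the range test.
```

In other words, COALESCE only rescues users with no rows at all in temporal_users; users who have rows on other dates only are still filtered out, so the ON-clause form is the one that returns a row for every user.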

MySQL - condition on the joined row from the right table

I have two tables:
mysql> select * from orders;
+------+---------------------+------------+---------+
| id | created_at | foreign_id | data |
+------+---------------------+------------+---------+
| 1 | 2010-10-10 10:10:10 | 3 | order 1 |
| 4 | 2010-10-10 00:00:00 | 6 | order 4 |
| 5 | 2010-10-10 00:00:00 | 7 | order 5 |
+------+---------------------+------------+---------+
mysql> select * from activities;
+------+---------------------+------------+------+
| id | created_at | foreign_id | verb |
+------+---------------------+------------+------+
| 1 | 2010-10-10 10:10:10 | 3 | get |
| 2 | 2010-10-10 10:10:15 | 3 | set |
| 3 | 2010-10-10 10:10:20 | 3 | put |
| 4 | 2010-10-10 00:00:00 | 6 | get |
| 5 | 2010-10-11 00:00:00 | 6 | set |
| 6 | 2010-10-12 00:00:00 | 6 | put |
+------+---------------------+------------+------+
Now I need to join activities with orders on the foreign_id column, selecting only one activity (if one exists) for every order such that ABS(TIMESTAMPDIFF(SECOND, orders.created_at, activities.created_at)) is minimal, i.e. the order and the activity were created at approximately the same time.
+----------+---------+---------------------+-------------+------+---------------------+
| order_id | data | order_created_at | activity_id | verb | activity_created_at |
+----------+---------+---------------------+-------------+------+---------------------+
| 1 | order 1 | 2010-10-10 10:10:10 | 1 | get | 2010-10-10 10:10:10 |
| 4 | order 4 | 2010-10-10 00:00:00 | 4 | get | 2010-10-10 00:00:00 |
| 5 | order 5 | 2010-10-10 00:00:00 | NULL | NULL | NULL |
+----------+---------+---------------------+-------------+------+---------------------+
The following query produces a set of rows that includes the desired rows. If a GROUP BY clause is added, it's not possible to control which row from activities is joined.
SELECT o.id AS order_id
, o.data AS data
, o.created_at AS order_created_at
, a.id AS activity_id
, a.verb AS verb
, a.created_at AS activity_created_at
FROM orders AS o
LEFT JOIN activities AS a ON a.foreign_id = o.foreign_id;
Is it possible to write such a query? Ideally I'd like to avoid using GROUP BY because this section is part of a larger reporting query.
Because both tables reference some mysterious foreign key there's potential for errors with the query below, but it may give you a principle which you can adapt for your purposes...
DROP TABLE IF EXISTS orders;
CREATE TABLE orders
(id INT NOT NULL PRIMARY KEY
,created_at DATETIME NOT NULL
,foreign_id INT NOT NULL
,data VARCHAR(20) NOT NULL
);
INSERT INTO orders VALUES
(1 ,'2010-10-10 10:10:10',3 ,'order 1'),
(4 ,'2010-10-10 00:00:00',6 ,'order 4'),
(5 ,'2010-10-10 00:00:00',7 ,'order 5');
DROP TABLE IF EXISTS activities;
CREATE TABLE activities
(id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
,created_at DATETIME NOT NULL
,foreign_id INT NOT NULL
,verb VARCHAR(20) NOT NULL
);
INSERT INTO activities VALUES
(1,'2010-10-10 10:10:10',3,'get'),
(2,'2010-10-10 10:10:15',3,'set'),
(3,'2010-10-10 10:10:20',3,'put'),
(4,'2010-10-10 00:00:00',6,'get'),
(5,'2010-10-11 00:00:00',6,'set'),
(6,'2010-10-12 00:00:00',6,'put');
SELECT o.id order_id
, o.data
, o.created_at order_created_at
, a.id activity_id
, a.verb
, a.created_at activity_created_at
FROM activities a
JOIN orders o
ON o.foreign_id = a.foreign_id
JOIN
( SELECT a.foreign_id
, MIN(ABS(TIMEDIFF(a.created_at,o.created_at))) x
FROM activities a
JOIN orders o
ON o.foreign_id = a.foreign_id
GROUP
BY a.foreign_id
) m
ON m.foreign_id = a.foreign_id
AND m.x = ABS(TIMEDIFF(a.created_at,o.created_at))
UNION DISTINCT
SELECT o.id
, o.data
, o.created_at
, a.id
, a.verb
, a.created_at
FROM orders o
LEFT
JOIN activities a
ON a.foreign_id = o.foreign_id
WHERE a.foreign_id IS NULL;
+----------+---------+---------------------+-------------+------+---------------------+
| order_id | data | order_created_at | activity_id | verb | activity_created_at |
+----------+---------+---------------------+-------------+------+---------------------+
| 1 | order 1 | 2010-10-10 10:10:10 | 1 | get | 2010-10-10 10:10:10 |
| 4 | order 4 | 2010-10-10 00:00:00 | 4 | get | 2010-10-10 00:00:00 |
| 5 | order 5 | 2010-10-10 00:00:00 | NULL | NULL | NULL |
+----------+---------+---------------------+-------------+------+---------------------+
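If GROUP BY and UNION are to be avoided, as the question prefers, a correlated subquery in the join condition is another option. This is a sketch, not a benchmarked solution: it uses TIMESTAMPDIFF as in the question (TIMEDIFF returns a TIME value, whose range is limited to roughly 838 hours), and breaking ties on the smaller activity id is an assumption made here.

```sql
SELECT o.id         AS order_id
     , o.data
     , o.created_at AS order_created_at
     , a.id         AS activity_id
     , a.verb
     , a.created_at AS activity_created_at
  FROM orders o
  LEFT JOIN activities a
    ON a.id = ( -- the single closest activity for this order, if any
                SELECT a2.id
                  FROM activities a2
                 WHERE a2.foreign_id = o.foreign_id
                 ORDER BY ABS(TIMESTAMPDIFF(SECOND, o.created_at, a2.created_at))
                        , a2.id
                 LIMIT 1
              )
 ORDER BY o.id;
```

Because the subquery returns at most one id per order, each order appears exactly once, and orders without any activity (order 5 in the sample data) keep NULLs in the activity columns.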