Update With Another Column Max Value - mysql

I have a table states_risk:
id | state | municipally | rating
example:
id | state | municipally | rating
1 | AG | AG1 | 5
2 | AG | AG2 | 6
3 | AG | AG3 | 2
4 | AG | AG4 | 1
5 | AG | OTHER | -
6 | AB | AB1 | 0.2
7 | AB | AB2 | 2
8 | AB | AB3 | 10
9 | AB | OTHER | -
I need to update "rating" for the rows where municipally = 'OTHER', setting it to the MAX(rating) of the same state ("AG", "AB"). For example, id 5 should be set to 6 because that is the max value for state AG.

You can do it by joining the table to a query that returns the max rating for each state:
update states_risk s inner join (
select state, max(rating) rating
from states_risk
group by state
) g on g.state = s.state
set s.rating = g.rating
where s.municipally = 'OTHER';
See the demo.
Results:
| id | state | municipally | rating |
| --- | ----- | ----------- | ------ |
| 1 | AG | AG1 | 5 |
| 2 | AG | AG2 | 6 |
| 3 | AG | AG3 | 2 |
| 4 | AG | AG4 | 1 |
| 5 | AG | OTHER | 6 |
| 6 | AB | AB1 | 0.2 |
| 7 | AB | AB2 | 2 |
| 8 | AB | AB3 | 10 |
| 9 | AB | OTHER | 10 |

This gives you the max values:
SELECT state, max(rating) as maxrating
FROM states_risk
GROUP BY state
This gives you the rows you want to update:
SELECT id, state
FROM states_risk
WHERE municipally = 'OTHER'
MySQL does not let a subquery read directly from the table being updated (error 1093), so the update reads the max values through a derived table:
UPDATE states_risk
SET rating = (
SELECT m.maxrating
FROM (
SELECT state, max(rating) as maxrating
FROM states_risk
GROUP BY state
) m
WHERE m.state = states_risk.state
)
WHERE municipally = 'OTHER'
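The per-state maximum logic can be checked quickly outside MySQL. This is only a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for MySQL: SQLite accepts a correlated subquery against the table being updated, which MySQL rejects. It also assumes the '-' placeholders are stored as NULL in a numeric column, which is not how the original table is defined.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE states_risk (id INTEGER, state TEXT, municipally TEXT, rating REAL)"
)
conn.executemany(
    "INSERT INTO states_risk VALUES (?, ?, ?, ?)",
    [
        (1, "AG", "AG1", 5), (2, "AG", "AG2", 6), (3, "AG", "AG3", 2),
        (4, "AG", "AG4", 1), (5, "AG", "OTHER", None),  # '-' stored as NULL (assumption)
        (6, "AB", "AB1", 0.2), (7, "AB", "AB2", 2), (8, "AB", "AB3", 10),
        (9, "AB", "OTHER", None),
    ],
)

# Set each OTHER row to the maximum rating of its own state.
# MAX() skips NULL values, so the OTHER rows do not affect the result.
conn.execute(
    """
    UPDATE states_risk
    SET rating = (SELECT MAX(m.rating)
                  FROM states_risk AS m
                  WHERE m.state = states_risk.state)
    WHERE municipally = 'OTHER'
    """
)

other_ratings = dict(
    conn.execute(
        "SELECT id, rating FROM states_risk WHERE municipally = 'OTHER' ORDER BY id"
    ).fetchall()
)
print(other_ratings)  # {5: 6.0, 9: 10.0}
```

In MySQL itself the join form or a derived table is still needed; the sketch only confirms the expected values for ids 5 and 9.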

If your rating column contains '-' values,
you also need to cast the column,
like this:
select version();
| version() |
| :-------- |
| 8.0.18 |
CREATE TABLE states_risk
(`id` int, `state` varchar(2), `municipally` varchar(5), `rating` varchar(3))
;
INSERT INTO states_risk
(`id`, `state`, `municipally`, `rating`)
VALUES
(1, 'AG', 'AG1', '5'),
(2, 'AG', 'AG2', '6'),
(3, 'AG', 'AG3', '2'),
(4, 'AG', 'AG4', '1'),
(5, 'AG', 'OTHER', '-'),
(6, 'AB', 'AB1', '0.2'),
(7, 'AB', 'AB2', '2'),
(8, 'AB', 'AB3', '10'),
(9, 'AB', 'OTHER', '-')
;
Select * From states_risk;
id | state | municipally | rating
-: | :---- | :---------- | :-----
1 | AG | AG1 | 5
2 | AG | AG2 | 6
3 | AG | AG3 | 2
4 | AG | AG4 | 1
5 | AG | OTHER | -
6 | AB | AB1 | 0.2
7 | AB | AB2 | 2
8 | AB | AB3 | 10
9 | AB | OTHER | -
SELECT state, MAX(CAST(rating as FLOAT)) MAXrating
FROM states_risk WHERE rating <> '-' GROUP BY state
state | MAXrating
:---- | --------:
AG | 6
AB | 10
UPDATE states_risk sr INNER JOIN (SELECT state, MAX(CAST(rating as FLOAT)) MAXrating
FROM states_risk WHERE rating <> '-' GROUP BY state) t1
ON sr.state = t1.state
SET sr.rating = t1.MAXrating WHERE sr.municipally = 'OTHER';
Select * From states_risk;
id | state | municipally | rating
-: | :---- | :---------- | :-----
1 | AG | AG1 | 5
2 | AG | AG2 | 6
3 | AG | AG3 | 2
4 | AG | AG4 | 1
5 | AG | OTHER | 6
6 | AB | AB1 | 0.2
7 | AB | AB2 | 2
8 | AB | AB3 | 10
9 | AB | OTHER | 10
db<>fiddle here

Related

islands and gaps ordering issue MYSQL 8.0

I am trying to use partition by and row_number() to count consecutive duplicate values for a given date range. Essentially, it's an attempt to capture "streaks". If there is a break in the streak, the count should start over when the value occurs again.
To reproduce these results here is the code:
CREATE TABLE partion_test (
daily DATE,
response_short_name VARCHAR(10)
);
INSERT INTO `partion_test` (`daily`, `response_short_name`) VALUES
('2020-09-21', 'A'),
('2020-09-25', 'A'),
('2020-09-26', 'A'),
('2020-09-27', 'A'),
('2020-09-28', 'A'),
('2020-09-22', 'B'),
('2020-09-20', 'C'),
('2020-09-23', 'C'),
('2020-09-24', 'C');
SELECT
daily,
response_short_name
,row_number() over (partition by response_short_name order by daily) as seqnum
FROM (
select
daily,
response_short_name
FROM partion_test
order by daily limit 1000
) A;
Here is the current output:
| daily | response_short_name | seqnum | |
+------------+---------------------+--------+--+
| 2020-09-21 | A | 1 | |
| 2020-09-25 | A | 2 | |
| 2020-09-26 | A | 3 | |
| 2020-09-27 | A | 4 | |
| 2020-09-28 | A | 5 | |
| 2020-09-22 | B | 1 | |
| 2020-09-20 | C | 1 | |
| 2020-09-23 | C | 2 | |
| 2020-09-24 | C | 3 | |
+------------+---------------------+--------+--+
Here is the desired output:
+------------+---------------------+--------+--+
| daily | response_short_name | seqnum | |
+------------+---------------------+--------+--+
| 2020-09-20 | C | 1 | |
| 2020-09-21 | A | 1 | |
| 2020-09-22 | B | 1 | |
| 2020-09-23 | C | 1 | |
| 2020-09-24 | C | 2 | |
| 2020-09-25 | A | 1 | |
| 2020-09-26 | A | 2 | |
| 2020-09-27 | A | 3 | |
| 2020-09-28 | A | 4 | |
+------------+---------------------+--------+--+
I've been racking my brain on this for a while. Any help would be appreciated.
You can do:
select *,
row_number() over(partition by grp order by daily) as seqnum
from (
select *,
sum(inc) over(order by daily) as grp
from (
select *,
case when lag(response_short_name) over(order by daily) = response_short_name
then 0 else 1 end as inc
from partion_test
order by daily
) x
) y
order by daily
Result:
daily response_short_name inc grp seqnum
----------- -------------------- ---- ---- ------
2020-09-20 C 1 1 1
2020-09-21 A 1 2 1
2020-09-22 B 1 3 1
2020-09-23 C 1 4 1
2020-09-24 C 0 4 2
2020-09-25 A 1 5 1
2020-09-26 A 0 5 2
2020-09-27 A 0 5 3
2020-09-28 A 0 5 4
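The lag / running-sum approach can be reproduced end to end. The following is a minimal sketch, assuming Python's built-in sqlite3 module (with SQLite 3.25 or newer for window functions) as a stand-in for MySQL 8.0; the query is the same idea: flag each row that starts a new streak, turn the flags into a group number with a running sum, then number the rows inside each group.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL 8.0 (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE partion_test (daily TEXT, response_short_name TEXT)")
conn.executemany(
    "INSERT INTO partion_test VALUES (?, ?)",
    [
        ("2020-09-21", "A"), ("2020-09-25", "A"), ("2020-09-26", "A"),
        ("2020-09-27", "A"), ("2020-09-28", "A"), ("2020-09-22", "B"),
        ("2020-09-20", "C"), ("2020-09-23", "C"), ("2020-09-24", "C"),
    ],
)

rows = conn.execute(
    """
    SELECT daily, response_short_name,
           ROW_NUMBER() OVER (PARTITION BY grp ORDER BY daily) AS seqnum
    FROM (
        -- running sum of the flags gives one group number per streak
        SELECT *, SUM(inc) OVER (ORDER BY daily) AS grp
        FROM (
            -- inc = 1 when the value differs from the previous day's value
            SELECT *,
                   CASE WHEN LAG(response_short_name) OVER (ORDER BY daily)
                             = response_short_name
                        THEN 0 ELSE 1 END AS inc
            FROM partion_test
        ) x
    ) y
    ORDER BY daily
    """
).fetchall()

seqnums = [seqnum for _, _, seqnum in rows]
print(seqnums)  # [1, 1, 1, 1, 2, 1, 2, 3, 4]
```

The first row has no previous value, so LAG returns NULL, the comparison is not true, and the row is flagged as the start of a streak.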
See the running example at DB Fiddle.
Your data doesn't fit your result, so it is quite difficult to achieve your result.
CREATE TABLE partion_test (
daily DATE,
response_short_name VARCHAR(10)
);
INSERT INTO `partion_test` (`daily`, `response_short_name`) VALUES
('2020-09-21', 'A'),
('2020-09-25', 'A'),
('2020-09-26', 'A'),
('2020-09-27', 'A'),
('2020-09-28', 'A'),
('2020-09-22', 'B'),
('2020-09-20', 'C'),
('2020-09-23', 'C'),
('2020-09-24', 'C');
select `daily`,`response_short_name`,
row_number() over (partition by `response_short_name`, grp order by `daily`) as row_num
from (select t.*,
(row_number() over (order by `daily`) -
row_number() over (partition by `response_short_name` order by `daily`)
) as grp
from partion_test t
) t
ORDER BY `daily`
daily | response_short_name | row_num
:--------- | :------------------ | ------:
2020-09-20 | C | 1
2020-09-21 | A | 1
2020-09-22 | B | 1
2020-09-23 | C | 1
2020-09-24 | C | 2
2020-09-25 | A | 1
2020-09-26 | A | 2
2020-09-27 | A | 3
2020-09-28 | A | 4
db<>fiddle here

Show all the grouped values that have the same value more than once

I have table like this
table1
| ID | Val | Val2 |
| 2 | AA | 0 |
| 3 | AA | 1 |
| 4 | AD | 0 |
| 5 | CV | 1 |
| 6 | AF | 1 |
| 7 | CV | 1 |
I want to know whether any value in column Val appears more than once, and how many times each duplicate appears. I used the GROUP BY clause:
select Val, count(Val) from table1
group by Val
having count(Val) > 1
result :
| Val | count(val) |
| AA | 2 |
| CV | 2 |
Now I want to know which rows have the duplicate value, so I used GROUP_CONCAT with a query like this:
select Val, count(Val), group_concat(ID) from table1
group by Val
having count(Val) > 1
Results
| Val | count(val) | group_concat(ID) |
| AA | 2 | 2,3 |
| CV | 2 | 5,7 |
Now I don't know how to show all the rows with duplicate values. I can only show which IDs have a duplicate value via group_concat(), but I couldn't show the full rows without the group_concat column. I tried to use FIELD_IN_SET but it doesn't seem to work.
select Val,count(Val), group_concat(ID) from table1
where FIELD_IN_SET(ID,group_concat(ID))
group by Val where
having count(val) > 1
What I expect is to show all the rows with duplicate values after grouping and counting, like the table below:
| ID | Val | Val2 |
| 2 | AA | 0 |
| 3 | AA | 1 |
| 5 | CV | 1 |
| 7 | CV | 1 |
CREATE TABLE table1 (
`ID` INTEGER,
`Val` VARCHAR(2),
`Val2` INTEGER
);
INSERT INTO table1
(`ID`, `Val`, `Val2`)
VALUES
('2', 'AA', '0'),
('3', 'AA', '1'),
('4', 'AD', '0'),
('5', 'CV', '1'),
('6', 'AF', '1'),
('7', 'CV', '1');
SELECT t1.*
FROM table1 t1
INNER JOIN (SELECT COUNT(`Val`) countval, `Val` FROM table1 GROUP BY `Val`) t2
ON t1.`Val` = t2.`Val`
WHERE countval > 1
ID | Val | Val2
-: | :-- | ---:
2 | AA | 0
3 | AA | 1
5 | CV | 1
7 | CV | 1
SELECT `ID`, `Val`, `Val2`
FROM ( SELECT
`ID`, `Val`, `Val2`,
COUNT(`Val`) OVER(PARTITION BY `Val`) c1
FROM table1) t1
WHERE c1 > 1
ID | Val | Val2
-: | :-- | ---:
2 | AA | 0
3 | AA | 1
5 | CV | 1
7 | CV | 1
db<>fiddle here
If you want to return the rows with the duplicate Vals, then use EXISTS:
select t.* from table1 t
where exists (select 1 from table1 where Val = t.Val and ID <> t.ID)
See the demo.
Results:
| ID | Val | Val2 |
| --- | --- | ---- |
| 2 | AA | 0 |
| 3 | AA | 1 |
| 5 | CV | 1 |
| 7 | CV | 1 |
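The EXISTS approach can be tried without a MySQL server. A minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for MySQL and the table1 data from the question: a row is kept when another row with the same Val but a different ID exists.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (ID INTEGER, Val TEXT, Val2 INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [(2, "AA", 0), (3, "AA", 1), (4, "AD", 0), (5, "CV", 1), (6, "AF", 1), (7, "CV", 1)],
)

# Keep only rows whose Val also appears on at least one other row.
duplicates = conn.execute(
    """
    SELECT t.ID, t.Val, t.Val2
    FROM table1 t
    WHERE EXISTS (SELECT 1 FROM table1 d WHERE d.Val = t.Val AND d.ID <> t.ID)
    ORDER BY t.ID
    """
).fetchall()
print(duplicates)  # [(2, 'AA', 0), (3, 'AA', 1), (5, 'CV', 1), (7, 'CV', 1)]
```

This relies on ID being unique per row; if IDs could repeat, counting per Val (as in the join and window-function versions) would be the safer test.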

Calculate tax amount between 3 different tables with MySQL

I have the following tables structure and trying to make a report from these:
___BillableDatas
|--------|------------|---------|--------------|------------|
| BIL_Id | BIL_Date |BIL_Rate | BIL_Quantity | BIL_Status |
|--------|------------|---------|--------------|------------|
| 1 | 2018-03-01 | 105 | 1 | charged |
| 2 | 2018-03-02 | 105 | 1 | cancelled |
| 3 | 2018-03-01 | 15 | 2 | notcharged |
| 4 | 2018-03-01 | 21 | 1 | notcharged |
| 5 | 2018-03-02 | 15 | 2 | notcharged |
| 6 | 2018-03-02 | 21 | 1 | notcharged |
|--------|------------|---------|--------------|------------|
___SalesTaxes
|--------|--------------|------------|
| STX_Id | STX_TaxeName | STX_Amount |
|--------|--------------|------------|
| 8 | Tax 1 | 5.000 |
| 9 | Tax 2 | 5.000 |
| 10 | Tax 3 | 19.975 |
|--------|--------------|------------|
STX_Amount is a percentage.
___ApplicableTaxes
|-----------|-----------|
| ATX_BILId | ATX_STXId |
|-----------|-----------|
| 1 | 8 |
| 1 | 9 |
| 1 | 10 |
| 2 | 8 |
| 2 | 9 |
| 2 | 10 |
| 3 | 9 |
| 3 | 10 |
| 4 | 9 |
| 5 | 9 |
| 5 | 10 |
| 6 | 9 |
|-----------|-----------|
ATX_BILId is the item ID link with ___BillableDatas.
ATX_STXId is the tax ID link with ___SalesTaxes.
I need to get the sum of the items per day:
- without tax
- with tax
Something like this:
|------------------|---------------|------------|
| BIL_RateNonTaxed | BIL_RateTaxed | BIL_Status |
|------------------|---------------|------------|
| 105.00 | 136.47 | charged | <- Taxes #8, #9 and #10 applicable
| 102.00 | 118.035 | notcharged | <- Taxes #9 and #10 applicable
|------------------|---------------|------------|
Explanation of the totals:
105 = 105*1 -- (total of the charged item multiply by the quantity)
102 = (15*2)*2+(21*2) -- (total of the notcharged items multiply by the quantity)
136.47 = 105+(105*(5+5+19.975)/100)
119.085 = 102+(((15*2)*2)*(5+19.975)/100+(21*2)*5/100)
My last try was this one:
SELECT
BIL_Date,
(BIL_Rate*BIL_Quantity) AS BIL_RateNonTaxed,
(((BIL_Rate*BIL_Quantity)*SUM(STX_Amount)/100)+BIL_Rate*BIL_Quantity) AS BIL_RateTaxed,
BIL_Status
FROM ___BillableDatas
LEFT JOIN ___SalesTaxes
ON FIND_IN_SET(STX_Id, BIL_ApplicableTaxes) > 0
LEFT JOIN ___ApplicableTaxes
ON ___BillableDatas.BIL_Id = ___ApplicableTaxes.ATX_BILId
WHERE BIL_BookingId=1
GROUP BY BIL_Id AND BIL_Status
ORDER BY BIL_Date
ASC
Please see this SQLFiddle to help you if needed:
http://sqlfiddle.com/#!9/425854f
Thanks.
I cannot bear to work with your naming policy, so I made my own...
DROP TABLE IF EXISTS bills;
CREATE TABLE bills
(bill_id SERIAL PRIMARY KEY
,bill_date DATE NOT NULL
,bill_rate INT NOT NULL
,bill_quantity INT NOT NULL
,bill_status ENUM('charged','cancelled','notcharged')
);
INSERT INTO bills VALUES
(1,'2018-03-01',105,1,'charged'),
(2,'2018-03-02',105,1,'cancelled'),
(3,'2018-03-01',15,2,'notcharged'),
(4,'2018-03-01',21,1,'notcharged'),
(5,'2018-03-02',15,2,'notcharged'),
(6,'2018-03-02',21,1,'notcharged');
DROP TABLE IF EXISTS sales_taxes;
CREATE TABLE sales_taxes
(sales_tax_id SERIAL PRIMARY KEY
,sales_tax_name VARCHAR(12) NOT NULL
,sales_tax_amount DECIMAL(5,3) NOT NULL
);
INSERT INTO sales_taxes VALUES
( 8,'Tax 1', 5.000),
( 9,'Tax 2', 5.000),
(10,'Tax 3',19.975);
DROP TABLE IF EXISTS applicable_taxes;
CREATE TABLE applicable_taxes
(bill_id INT NOT NULL
,sales_tax_id INT NOT NULL
,PRIMARY KEY(bill_id,sales_tax_id)
);
INSERT INTO applicable_taxes VALUES
(1, 8),
(1, 9),
(1,10),
(2, 8),
(2, 9),
(2,10),
(3, 9),
(3,10),
(4, 9),
(5, 9),
(5,10),
(6, 9);
SELECT bill_status
, SUM(bill_rate*bill_quantity) nontaxed
, SUM((bill_rate*bill_quantity)+(bill_rate*bill_quantity*total_sales_tax/100)) taxed
FROM
( SELECT b.*
, SUM(t.sales_tax_amount) total_sales_tax
FROM bills b
JOIN applicable_taxes bt
ON bt.bill_id = b.bill_id
JOIN sales_taxes t
ON t.sales_tax_id = bt.sales_tax_id
GROUP
BY bill_id
) x
GROUP
BY bill_status;
+-------------+---------+-------------+
| bill_status | nontaxed | taxed |
+-------------+---------+-------------+
| charged | 105 | 136.4737500 |
| cancelled | 105 | 136.4737500 |
| notcharged | 102 | 119.0850000 |
+-------------+---------+-------------+
My answer is very slightly different from yours, so one of us has made a mistake somewhere. Either way, this should get you pretty close.
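To see which of the two notcharged totals the arithmetic supports, the same query can be run on the sample data. This is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for MySQL, with the renamed tables from this answer; the tax is folded in as a factor of (1 + total_sales_tax / 100), which is algebraically the same as the expression above.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE bills (bill_id INTEGER PRIMARY KEY, bill_date TEXT,
                        bill_rate INTEGER, bill_quantity INTEGER, bill_status TEXT);
    INSERT INTO bills VALUES
        (1,'2018-03-01',105,1,'charged'), (2,'2018-03-02',105,1,'cancelled'),
        (3,'2018-03-01',15,2,'notcharged'), (4,'2018-03-01',21,1,'notcharged'),
        (5,'2018-03-02',15,2,'notcharged'), (6,'2018-03-02',21,1,'notcharged');
    CREATE TABLE sales_taxes (sales_tax_id INTEGER PRIMARY KEY,
                              sales_tax_name TEXT, sales_tax_amount REAL);
    INSERT INTO sales_taxes VALUES (8,'Tax 1',5.000), (9,'Tax 2',5.000), (10,'Tax 3',19.975);
    CREATE TABLE applicable_taxes (bill_id INTEGER, sales_tax_id INTEGER,
                                   PRIMARY KEY (bill_id, sales_tax_id));
    INSERT INTO applicable_taxes VALUES
        (1,8),(1,9),(1,10),(2,8),(2,9),(2,10),(3,9),(3,10),(4,9),(5,9),(5,10),(6,9);
    """
)

# Inner query: one row per bill with its combined tax percentage.
# Outer query: totals per status, without and with tax.
totals = {
    status: (nontaxed, round(taxed, 3))
    for status, nontaxed, taxed in conn.execute(
        """
        SELECT bill_status,
               SUM(bill_rate * bill_quantity) AS nontaxed,
               SUM(bill_rate * bill_quantity * (1 + total_sales_tax / 100)) AS taxed
        FROM (
            SELECT b.bill_id, b.bill_status, b.bill_rate, b.bill_quantity,
                   SUM(t.sales_tax_amount) AS total_sales_tax
            FROM bills b
            JOIN applicable_taxes bt ON bt.bill_id = b.bill_id
            JOIN sales_taxes t ON t.sales_tax_id = bt.sales_tax_id
            GROUP BY b.bill_id
        ) x
        GROUP BY bill_status
        """
    )
}
print(totals["notcharged"])  # (102, 119.085)
```

The notcharged total comes out as 119.085, which matches the question's own worked formula rather than the 118.035 shown in its expected-output table.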
SELECT a.BIL_Date, BIL_RateNonTaxed, BIL_RateNonTaxed+BIL_RateTaxed AS BIL_RateTaxed FROM (
SELECT BIL_Date,
SUM(BIL_Rate*BIL_Quantity) AS BIL_RateNonTaxed
FROM ___BillableDatas
WHERE BIL_Status != 'cancelled'
GROUP BY BIL_Date
) a INNER JOIN (
SELECT BIL_Date,
SUM(BIL_Rate*BIL_Quantity*STX_Amount/100) AS BIL_RateTaxed
FROM ___BillableDatas
LEFT JOIN ___ApplicableTaxes
ON ___BillableDatas.BIL_Id = ___ApplicableTaxes.ATX_BILId
LEFT JOIN ___SalesTaxes
ON STX_Id = ATX_STXId
WHERE BIL_Status != 'cancelled'
GROUP BY BIL_Date
) b
ON a.BIL_Date = b.BIL_Date
ORDER BY a.BIL_Date;
Explanation:
Your BIL_RateNonTaxed calculation does not use the ___SalesTaxes table, so that table must not appear in its query; otherwise the extra joined rows would distort the SUM.
However, your BIL_RateTaxed does use the ___SalesTaxes table, so I solved it by creating two subqueries and joining the results.
I know there are better answers, but I'm not familiar with MySQL syntax.

Correlated Subqueries with MAX() and GROUP BY

I have an issue using MAX() and GROUP BY.
I have the following tables:
personal_prizes
___________ ___________ _________ __________
| id | userId | specId| group |
|___________|___________|_________|__________|
| 1 | 1 | 1 | 1 |
|___________|___________|_________|__________|
| 2 | 1 | 2 | 1 |
|___________|___________|_________|__________|
| 3 | 2 | 3 | 1 |
|___________|___________|_________|__________|
| 4 | 2 | 4 | 2 |
|___________|___________|_________|__________|
| 5 | 1 | 5 | 2 |
|___________|___________|_________|__________|
| 6 | 1 | 6 | 2 |
|___________|___________|_________|__________|
| 7 | 2 | 7 | 3 |
|___________|___________|_________|__________|
prizes
___________ ___________ _________
| id | title | group |
|___________|___________|_________|
| 1 | First | 1 |
|___________|___________|_________|
| 2 | Second | 1 |
|___________|___________|_________|
| 3 | Newby | 1 |
|___________|___________|_________|
| 4 | General| 2 |
|___________|___________|_________|
| 5 | Leter | 2 |
|___________|___________|_________|
| 6 | Ter | 2 |
|___________|___________|_________|
| 7 | Mentor | 3 |
|___________|___________|_________|
So, I need to select the highest title for a user.
E.g. the user with id = 1 must have the prizes 'Second' and 'Ter'.
I don't know how to implement it in one query.
So, first of all, I tried to select the highest specId for the user.
I tried the following:
SELECT pp.specID
FROM personal_prizes pp
WHERE pp.specID IN (SELECT MAX(pp1.id)
FROM personal_prizes pp1
WHERE pp1.userId = 1
GROUP BY pp1.group)
And it doesn't work.
Please help me solve this problem.
If you can also help me select the prizes for the user, that would be great!
The problem I perceive here is that prizes.id isn't really a reliable way to determine which is the "highest" prize. Ignoring this however I suggest using ROW_NUMBER() OVER() to locate the "highest" prize per user as follows:
Refer to this SQL Fiddle
CREATE TABLE personal_prizes
([id] int, [userId] int, [specId] int, [group] int)
;
INSERT INTO personal_prizes
([id], [userId], [specId], [group])
VALUES
(1, 1, 1, 1),
(2, 1, 2, 1),
(3, 2, 3, 1),
(4, 2, 4, 2),
(5, 1, 5, 2),
(6, 1, 6, 2),
(7, 2, 7, 3)
;
CREATE TABLE prizes
([id] int, [title] varchar(7), [group] int)
;
INSERT INTO prizes
([id], [title], [group])
VALUES
(1, 'First', 1),
(2, 'Second', 1),
(3, 'Newby', 1),
(4, 'General', 2),
(5, 'Leter', 2),
(6, 'Ter', 2),
(7, 'Mentor', 3)
;
Query 1:
select
*
from (
select
pp.*, p.title
, row_number() over(partition by pp.userId order by p.id ASC) as prize_order
from personal_prizes pp
inner join prizes p on pp.specid = p.id
) d
where prize_order = 1
Results:
| id | userId | specId | group | title | prize_order |
|----|--------|--------|-------|-------|-------------|
| 1 | 1 | 1 | 1 | First | 1 |
| 3 | 2 | 3 | 1 | Newby | 1 |
The result can be "reversed" by changing the ORDER BY within the over clause:
select
*
from (
select
pp.*, p.title
, row_number() over(partition by pp.userId order by p.id DESC) as prize_order
from personal_prizes pp
inner join prizes p on pp.specid = p.id
) d
where prize_order = 1
| id | userId | specId | group | title | prize_order |
|----|--------|--------|-------|--------|-------------|
| 6 | 1 | 6 | 2 | Ter | 1 |
| 7 | 2 | 7 | 3 | Mentor | 1 |
You could expand on this logic to locate "highest prize per group" too
select
*
from (
select
pp.*, p.title
, row_number() over(partition by pp.userId, p.[group] order by p.id ASC) as prize_order
from personal_prizes pp
inner join prizes p on pp.specid = p.id
) d
where prize_order = 1
| id | userId | specId | group | title | prize_order |
|----|--------|--------|-------|---------|-------------|
| 1 | 1 | 1 | 1 | First | 1 |
| 5 | 1 | 5 | 2 | Leter | 1 |
| 3 | 2 | 3 | 1 | Newby | 1 |
| 4 | 2 | 4 | 2 | General | 1 |
| 7 | 2 | 7 | 3 | Mentor | 1 |
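The question expects 'Second' and 'Ter' for user 1, which corresponds to the highest prize id within each group, so the per-group query would need ORDER BY ... DESC instead of ASC. A minimal sketch of that variant, assuming Python's built-in sqlite3 module (SQLite 3.25 or newer for window functions) as a stand-in for the database in the answer, and assuming, as the answer does, that a higher prizes.id means a higher prize:

```python
import sqlite3

# In-memory SQLite database used as a stand-in (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE personal_prizes (id INTEGER, userId INTEGER, specId INTEGER, "group" INTEGER);
    INSERT INTO personal_prizes VALUES
        (1,1,1,1),(2,1,2,1),(3,2,3,1),(4,2,4,2),(5,1,5,2),(6,1,6,2),(7,2,7,3);
    CREATE TABLE prizes (id INTEGER, title TEXT, "group" INTEGER);
    INSERT INTO prizes VALUES
        (1,'First',1),(2,'Second',1),(3,'Newby',1),(4,'General',2),
        (5,'Leter',2),(6,'Ter',2),(7,'Mentor',3);
    """
)

# Rank each user's prizes inside every group, highest prize id first,
# and keep only the top-ranked prize per (user, group).
top_prizes = conn.execute(
    """
    SELECT userId, grp, title
    FROM (
        SELECT pp.userId, p."group" AS grp, p.title,
               ROW_NUMBER() OVER (PARTITION BY pp.userId, p."group"
                                  ORDER BY p.id DESC) AS prize_order
        FROM personal_prizes pp
        INNER JOIN prizes p ON pp.specId = p.id
    ) d
    WHERE prize_order = 1
    ORDER BY userId, grp
    """
).fetchall()

user1_titles = [title for user_id, _, title in top_prizes if user_id == 1]
print(user1_titles)  # ['Second', 'Ter']
```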

Include NULL in SQL Join when using WHERE

I have the following two tables:
Table TempUser22 : 57,000 rows:
+------+-----------+
| Id | Followers |
+------+-----------+
| 874 | 55542 |
| 1081 | 330624 |
| 1378 | 17919 |
| 1621 | 920 |
| 1688 | 255463 |
| 2953 | 751 |
| 3382 | 204466 |
| 3840 | 273489 |
| 4145 | 376 |
| ... | ... |
+------+-----------+
Table temporal_users : 10,000,000 rows total, 3200 rows Where Date=2010-12-31:
+---------------------+---------+--------------------+
| Date | User_Id | has_original_tweet |
+---------------------+---------+--------------------+
| 2008-02-22 12:00:00 | 676493 | 2 |
| 2008-02-22 12:00:00 | 815263 | 1 |
| 2008-02-22 12:00:00 | 6245822 | 1 |
| 2008-02-22 12:00:00 | 8854092 | 1 |
| 2008-02-23 12:00:00 | 676493 | 2 |
| 2008-02-23 12:00:00 | 815263 | 1 |
| 2008-02-23 12:00:00 | 6245822 | 1 |
| 2008-02-23 12:00:00 | 8854092 | 1 |
| 2008-02-24 12:00:00 | 676493 | 2 |
| ............. | ... | .. |
+---------------------+---------+--------------------+
I am running the following join query on these tables:
SELECT sum(has_original_tweet), b.Id
FROM temporal_users AS a
RIGHT JOIN TempUser22 AS b
ON a.User_ID = b.Id
GROUP BY b.Id;
Which returns 57,000 rows as expected, with NULL answers on the first field:
+-------------------------+------+
| sum(has_original_tweet) | Id |
+-------------------------+------+
| NULL | 874 |
| NULL | 1081 |
| 135 | 1378 |
| 164 | 1621 |
| 652 | 1688 |
| 691 | 2953 |
| NULL | 3382 |
| NULL | 3840 |
| NULL | 4145 |
| ... | .... |
+-------------------------+------+
However, when adding the WHERE line specifying a date as below:
SELECT sum(has_original_tweet), b.Id
FROM temporal_users AS a
RIGHT JOIN TempUser22 AS b
ON a.User_ID = b.Id
WHERE a.Date BETWEEN '2010-12-31-00:00:00' AND '2010-12-31-23:59:59'
GROUP BY b.Id;
I receive the following answer, of only 3200 rows, and without any NULL in the first field.
+-------------------------+---------+
| sum(has_original_tweet) | Id |
+-------------------------+---------+
| 1 | 797194 |
| 1 | 815263 |
| 0 | 820678 |
| 1 | 1427511 |
| 0 | 4653731 |
| 1 | 5933862 |
| 2 | 7530552 |
| 1 | 7674072 |
| 1 | 8149632 |
| .. | .... |
+-------------------------+---------+
My question is: how do I get, for a given date, a result of 57,000 rows, one for each user in TempUser22, with NULL values when has_original_tweet is not present in temporal_users for the given date?
Thanks.
SELECT b.Id, SUM(a.has_original_tweet) s
FROM TempUser22 b
LEFT JOIN temporal_users a ON b.Id = a.User_Id
AND a.Date BETWEEN '2010-12-31-00:00:00' AND '2010-12-31-23:59:59'
GROUP BY b.Id;
Id s
1 null
2 1
3 null
4 3
5 null
6 null
For debugging, I used:
CREATE TEMPORARY TABLE TempUser22(Id INT, Followers INT)
SELECT 1 Id, 10 Followers UNION ALL
SELECT 2, 20 UNION ALL
SELECT 3, 30 UNION ALL
SELECT 4, 40 UNION ALL
SELECT 5, 50 UNION ALL
SELECT 6, 60
;
CREATE TEMPORARY TABLE temporal_users(`Date` DATETIME, User_Id INT, has_original_tweet INT)
SELECT '2008-02-22 12:00:00' `Date`, 1 User_Id, 1 has_original_tweet UNION ALL
SELECT '2008-12-31 12:00:00', 2, 1 UNION ALL
SELECT '2010-12-31 12:00:00', 2, 1 UNION ALL
SELECT '2012-12-31 12:00:00', 2, 1 UNION ALL
SELECT '2008-12-31 12:00:00', 4, 9 UNION ALL
SELECT '2010-12-31 12:00:00', 4, 1 UNION ALL
SELECT '2010-12-31 12:00:00', 4, 2 UNION ALL
SELECT '2012-12-31 12:00:00', 4, 9
;
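The difference between filtering in the ON clause and filtering in WHERE can be seen on the debugging data above. A minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for MySQL, with dates stored as 'YYYY-MM-DD HH:MM:SS' text so that BETWEEN compares them correctly as strings:

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE TempUser22 (Id INTEGER, Followers INTEGER);
    INSERT INTO TempUser22 VALUES (1,10),(2,20),(3,30),(4,40),(5,50),(6,60);
    CREATE TABLE temporal_users (Date TEXT, User_Id INTEGER, has_original_tweet INTEGER);
    INSERT INTO temporal_users VALUES
        ('2008-02-22 12:00:00',1,1),
        ('2008-12-31 12:00:00',2,1), ('2010-12-31 12:00:00',2,1), ('2012-12-31 12:00:00',2,1),
        ('2008-12-31 12:00:00',4,9), ('2010-12-31 12:00:00',4,1),
        ('2010-12-31 12:00:00',4,2), ('2012-12-31 12:00:00',4,9);
    """
)

date_range = ("2010-12-31 00:00:00", "2010-12-31 23:59:59")

# Date filter in ON: unmatched users survive the join with NULL columns.
on_rows = conn.execute(
    """
    SELECT b.Id, SUM(a.has_original_tweet)
    FROM TempUser22 b
    LEFT JOIN temporal_users a
           ON b.Id = a.User_Id AND a.Date BETWEEN ? AND ?
    GROUP BY b.Id ORDER BY b.Id
    """,
    date_range,
).fetchall()

# Date filter in WHERE: rows with a NULL a.Date fail the test and are removed.
where_rows = conn.execute(
    """
    SELECT b.Id, SUM(a.has_original_tweet)
    FROM TempUser22 b
    LEFT JOIN temporal_users a ON b.Id = a.User_Id
    WHERE a.Date BETWEEN ? AND ?
    GROUP BY b.Id ORDER BY b.Id
    """,
    date_range,
).fetchall()

print(on_rows)     # [(1, None), (2, 1), (3, None), (4, 3), (5, None), (6, None)]
print(where_rows)  # [(2, 1), (4, 3)]
```

With the condition in ON, every user in TempUser22 appears exactly once; with the condition in WHERE, the outer join effectively behaves like an inner join.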
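That's because rows where a.Date is NULL never satisfy the WHERE condition, so they are discarded.
You can use COALESCE in your WHERE clause:
WHERE coalesce(a.Date, 'some-date-in-the-range') BETWEEN '2010-12-31-00:00:00' AND '2010-12-31-23:59:59'
With this, rows with a NULL date are treated as falling inside the range. Note that this only keeps users with no rows at all in temporal_users; users whose rows all fall outside the range are still filtered out, so moving the date condition into the ON clause is more robust.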