I have a SQL table (temp2) like this:
I want to calculate the balance*rate/sum(balance) for each cat
So, my desired output would be something like this:
To get this output, I used the following code:
DROP TABLE IF EXISTS temp3;
create table temp3 as
select cat, balance * rate / sum(balance) as prod
from temp2
group by cat;

select temp2.emp_id, temp2.cat, temp2.balance, temp2.rate, temp3.prod
from temp2
left outer join temp3 on temp2.cat = temp3.cat;
So here I have created a new table to get the answer.
Will there be an easier way to get the same results?
There's no need for the new table unless you need to refer to it in multiple queries. You can just join with a subquery.
SELECT t2.emp_id, t2.cat, t2.balance, t2.rate, t3.prod
FROM temp2 AS t2
JOIN (
SELECT cat, balance * rate / sum(balance) AS prod
FROM temp2
GROUP BY cat
) AS t3 ON t2.cat = t3.cat
There's no need to use LEFT JOIN. Since the subquery gets cat from the same table, there can't be any non-matching rows.
Sometimes it's useful to create the new table so you can add an index for performance reasons.
You actually don't need a join or subquery at all thanks to window functions:
SELECT emp_id, cat, balance, rate,
balance * rate / sum(balance) OVER (PARTITION BY cat) AS prod
FROM temp2
ORDER BY emp_id;
gives
emp_id cat balance rate prod
------ --- ------- ---- ------------------
1 1 1000.0 0.25 0.0625
2 3 1250.0 0.25 0.0568181818181818
3 2 1500.0 0.25 0.0681818181818182
4 1 1000.0 0.25 0.0625
5 2 1250.0 0.25 0.0568181818181818
6 3 1500.0 0.25 0.0681818181818182
100 1 1000.0 0.25 0.0625
101 3 1250.0 0.25 0.0568181818181818
102 2 1500.0 0.25 0.0681818181818182
103 1 1000.0 0.25 0.0625
104 2 1250.0 0.25 0.0568181818181818
105 3 1500.0 0.25 0.0681818181818182
(Create an index on temp2.cat for best performance).
This is also more accurate. Both your query and Barmar's use balance and rate in the grouped query without including those columns in the GROUP BY clause. That's an error in most databases, but SQLite will pick an arbitrary row from the group to supply those values, and when different rows in the group have different values, that throws off the final calculations. To do it properly with grouping (if, for example, you're using an old database version that doesn't support window functions), you need something like
SELECT t2.emp_id, t2.cat, t2.balance, t2.balance * t2.rate / t3.sumbalance AS prod
FROM temp2 AS t2
JOIN (SELECT cat, sum(balance) AS sumbalance
FROM temp2
GROUP BY cat) AS t3
ON t2.cat = t3.cat
ORDER BY t2.emp_id;
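Both forms can be sanity-checked with Python's built-in sqlite3 module. The sample rows below are assumptions for illustration (the question doesn't show temp2's contents); the two queries are the ones from this answer:

```python
import sqlite3

# Build a small in-memory table shaped like the question's temp2.
# These data values are assumptions, not the asker's real data.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE temp2 (emp_id INTEGER, cat INTEGER, balance REAL, rate REAL)")
con.executemany(
    "INSERT INTO temp2 VALUES (?, ?, ?, ?)",
    [(1, 1, 1000.0, 0.25), (2, 3, 1250.0, 0.25), (3, 2, 1500.0, 0.25),
     (4, 1, 1000.0, 0.25), (5, 2, 1250.0, 0.25), (6, 3, 1500.0, 0.25)],
)

# Window-function version: per-row balance * rate divided by the
# per-category balance total (requires SQLite 3.25+).
window_rows = con.execute("""
    SELECT emp_id, cat,
           balance * rate / sum(balance) OVER (PARTITION BY cat) AS prod
    FROM temp2
    ORDER BY emp_id
""").fetchall()

# Equivalent grouped-subquery version: compute each category's
# balance total first, then join it back to the detail rows.
join_rows = con.execute("""
    SELECT t2.emp_id, t2.cat, t2.balance * t2.rate / t3.sumbalance AS prod
    FROM temp2 AS t2
    JOIN (SELECT cat, sum(balance) AS sumbalance
          FROM temp2 GROUP BY cat) AS t3 ON t2.cat = t3.cat
    ORDER BY t2.emp_id
""").fetchall()

for w, j in zip(window_rows, join_rows):
    assert abs(w[2] - j[2]) < 1e-9  # both approaches agree row by row
```

With this sample data, cat 1 has a balance total of 2000, so emp_id 1 gets prod = 1000 * 0.25 / 2000 = 0.125; the two queries return identical results on every row.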
I have a system that stores data only when it changes. So the dataset looks like below.
data_type_id | data_value | inserted_at
------------ | ---------- | -------------------
2            | 240        | 2022-01-19 17:20:52
1            | 30         | 2022-01-19 17:20:47
2            | 239        | 2022-01-19 17:20:42
1            | 29         | 2022-01-19 17:20:42
My data frequency is every 5 seconds. So whether or not a row exists at a given 5-second mark, I need the result to assume the value at that mark is the same as the previous stored value.
Since I store only the values that changed, the full dataset should actually look like the one below.
data_type_id | data_value | inserted_at
------------ | ---------- | -------------------
2            | 240        | 2022-01-19 17:20:52
1            | 30         | 2022-01-19 17:20:52
2            | 239        | 2022-01-19 17:20:47
1            | 30         | 2022-01-19 17:20:47
2            | 239        | 2022-01-19 17:20:42
1            | 29         | 2022-01-19 17:20:42
I don't want to insert into my table, I just want to retrieve the data like this on the SELECT statement.
Is there any way I can create this query?
PS: I have many data_type_ids, so a query like this usually returns around a million rows.
EDIT:
Information about the server: 10.3.27-MariaDB-0+deb10u1 (Debian 10)
The user determines the datetime range in the SELECT, so there is no fixed time window.
As @Akina mentioned, there are sometimes gaps between inserted_at values: the difference might be ~4 or ~6 seconds instead of exactly 5. Since this doesn't happen often, it's okay to generate the series while ignoring that fact.
With the help of a query that gets you all the combinations of data_type_id and the 5-second moments you need, you can achieve the result you need using a subquery that gets you the closest data_value:
with recursive u as
(select '2022-01-19 17:20:42' as d
union all
select DATE_ADD(d, interval 5 second) from u
where d < '2022-01-19 17:20:52'),
v as
(select * from u cross join (select distinct data_type_id from table_name) t)
select v.data_type_id,
(select data_value from table_name where inserted_at <= d and data_type_id = v.data_type_id
order by inserted_at desc limit 1) as data_value,
d as inserted_at
from v
Fiddle
You can replace the recursive CTE with any query that gets you all the 5-second moments you need.
WITH RECURSIVE
cte1 AS ( SELECT @start_datetime dt
          UNION ALL
          SELECT dt + INTERVAL 5 SECOND FROM cte1 WHERE dt < @end_datetime),
cte2 AS ( SELECT *,
                 ROW_NUMBER() OVER (PARTITION BY test.data_type_id, cte1.dt
                                    ORDER BY test.inserted_at DESC) rn
          FROM cte1
          LEFT JOIN test ON FIND_IN_SET(test.data_type_id, @data_type_ids)
                        AND cte1.dt >= test.inserted_at )
SELECT *
FROM cte2
WHERE rn = 1
https://dbfiddle.uk/?rdbms=mariadb_10.3&fiddle=380ad334de0c980a0ddf1b49bb6fa38e
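The "closest earlier value" technique from the first answer can be sketched in Python with sqlite3 (SQLite stands in for MariaDB here, so `DATE_ADD(d, interval 5 second)` becomes `datetime(d, '+5 seconds')`; the data is taken from the question):

```python
import sqlite3

# Generate the 5-second grid with a recursive CTE, cross join it with
# the distinct data_type_ids, then pick the latest data_value at or
# before each grid point via a correlated subquery.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE table_name (data_type_id INTEGER, data_value INTEGER, inserted_at TEXT)")
con.executemany("INSERT INTO table_name VALUES (?, ?, ?)", [
    (2, 240, "2022-01-19 17:20:52"),
    (1, 30,  "2022-01-19 17:20:47"),
    (2, 239, "2022-01-19 17:20:42"),
    (1, 29,  "2022-01-19 17:20:42"),
])

rows = con.execute("""
    WITH RECURSIVE u(d) AS (
        SELECT '2022-01-19 17:20:42'
        UNION ALL
        SELECT datetime(d, '+5 seconds') FROM u
        WHERE d < '2022-01-19 17:20:52'
    ),
    v AS (SELECT * FROM u CROSS JOIN
          (SELECT DISTINCT data_type_id FROM table_name) t)
    SELECT v.data_type_id,
           (SELECT data_value FROM table_name
            WHERE inserted_at <= v.d AND data_type_id = v.data_type_id
            ORDER BY inserted_at DESC LIMIT 1) AS data_value,
           v.d AS inserted_at
    FROM v
    ORDER BY v.d DESC, v.data_type_id DESC
""").fetchall()
# rows -> [(2, 240, '2022-01-19 17:20:52'), (1, 30, '2022-01-19 17:20:52'),
#          (2, 239, '2022-01-19 17:20:47'), (1, 30, '2022-01-19 17:20:47'),
#          (2, 239, '2022-01-19 17:20:42'), (1, 29, '2022-01-19 17:20:42')]
```

The result matches the "filled-in" dataset from the question: the 17:20:47 and 17:20:52 rows for data_type_id 1 both carry the last stored value forward.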
I am writing a MySQL query that will take a table, split it into equal-size buckets of a given column, and then return a count of values within each bucket. This isn't the same as 10 buckets of equal row count; I expect the number of records in each bucket to vary, but the buckets themselves to cover equal ranges of the given column.
I have data as follows:
User | Followers
----------------
User 1 | 100
User 2 | 1000
User 3 | 1300
User 4 | 2000
User 5 | 10000
I would like to split the data into 5 equally sized "follower" buckets, i.e. buckets spanning 2000 followers each. The output would be as follows:
Bucket | Count
-----------------------
1.(0 - 2000) | 3
2.(2000 - 4000) | 1
3.(4000 - 6000) | 0
4.(6000 - 8000) | 0
5.(8000 - 10000)| 1
So far I've tried the following:
SELECT (followers)%(bucket_size),COUNT(*) FROM (SELECT (ROUND((MAX(followers)/MIN(followers))/10,0)) as bucket_size FROM users
WHERE followers > 0) as a
INNER JOIN users
GROUP BY (followers)%(bucket_size)
But this is providing me with all distinct values.
You can use aggregation as follows:
select 1 + (t.followers - 1) div b.bucket_size bucket, count(*) no_users
from mytable t
cross join (select 2000 bucket_size) b
group by 1 + (t.followers - 1) div b.bucket_size
On the other hand, if you want to also return empty buckets, as shown in your desired results, it is a bit different. You can use an inline query to list the buckets, then bring the table with a left join:
select n.bucket, count(t.followers) cnt
from (select 2000 bucket_size) b
cross join (select 1 bucket union all select 2 union all select 3 union all select 4 union all select 5) n
left join mytable t on (t.followers - 1) div b.bucket_size = n.bucket - 1
group by n.bucket
If having empty buckets is not important, here is a simple and readable solution:
select bucket as Bucket,
count(*) as Count
from (
select case when followers between 0 and 1999 then '(0-2000)'
when followers between 2000 and 3999 then '(2000-4000)'
when followers between 4000 and 5999 then '(4000-6000)'
when followers between 6000 and 7999 then '(6000-8000)'
when followers between 8000 and 10000 then '(8000-10000)'
end as bucket
from users
) buckets
group by bucket
You can also play around with the above query here: db-fiddle
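The integer-division approach can be sketched with Python's sqlite3 (in SQLite, `/` on integers already truncates, so it plays the role of MySQL's `div`; data is from the question). Note that which bucket a boundary value like 2000 or 10000 lands in depends on whether you divide `followers` or `followers - 1`:

```python
import sqlite3

# Map each follower count to a bucket index by integer division.
# Using (followers - 1) makes each bucket's upper bound inclusive,
# so 2000 lands in bucket 1 and 10000 in bucket 5.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (user TEXT, followers INTEGER)")
con.executemany("INSERT INTO users VALUES (?, ?)",
                [("User 1", 100), ("User 2", 1000), ("User 3", 1300),
                 ("User 4", 2000), ("User 5", 10000)])

rows = con.execute("""
    SELECT 1 + (followers - 1) / 2000 AS bucket, COUNT(*) AS cnt
    FROM users
    GROUP BY bucket
    ORDER BY bucket
""").fetchall()
# Empty buckets (3 and 4) simply do not appear; join against a bucket
# list, as in the answers above, if you need them as zero rows.
```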
I'm trying to extract stats from the DB.
Table's structure is:
UpdatedId product_name revenue
980 Product1 1000
975 Product1 950
973 Product1 900
970 Product1 800
965 Product21 1200
So revenue = previous revenue + new revenue.
In order to make graphs, the goal is to get the output for Product1 like this
UpdateId Difference
980 50
975 50
973 100
970 0
I tried this query but MySQL gets stuck :)
select a.product_name, a.revenue, b.revenue, b.revenue - a.revenue as difference
from updated_stats a, updated_stats b
where a.product_name = b.product_name
  and b.revenue = (select min(revenue)
                   from updated_stats
                   where product_name = a.product_name
                     and revenue > a.revenue
                     and product_name = 'Product1')
Could you please tell me, how it should be queried? Thanks.
I would do this with a correlated subquery:
select u.*,
(select u.revenue - u2.revenue
from updated_stats u2
where u2.product_name = u.product_name and
u2.updatedid < u.updatedid
order by u2.updatedid desc
limit 1
) as diff
from updated_stats u;
Note: This returns NULL instead of 0 for 970. That actually makes more sense to me. But you can use COALESCE() or a similar function to turn it into a 0.
If updated_stats is even moderately sized, you will want an index on updated_stats(product_name, updatedid, revenue). This index covers the subquery.
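The correlated subquery can be verified with Python's sqlite3 using the data from the question:

```python
import sqlite3

# For each row, look up the next-lower updatedid for the same product
# and subtract its revenue from the current row's revenue.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE updated_stats (updatedid INTEGER, product_name TEXT, revenue INTEGER)")
con.executemany("INSERT INTO updated_stats VALUES (?, ?, ?)", [
    (980, "Product1", 1000), (975, "Product1", 950),
    (973, "Product1", 900),  (970, "Product1", 800),
    (965, "Product21", 1200),
])

rows = con.execute("""
    SELECT u.updatedid,
           (SELECT u.revenue - u2.revenue
            FROM updated_stats u2
            WHERE u2.product_name = u.product_name
              AND u2.updatedid < u.updatedid
            ORDER BY u2.updatedid DESC
            LIMIT 1) AS diff
    FROM updated_stats u
    WHERE u.product_name = 'Product1'
    ORDER BY u.updatedid DESC
""").fetchall()
# rows -> [(980, 50), (975, 50), (973, 100), (970, None)]
```

As the answer notes, the earliest row (970) yields NULL rather than 0; wrap the subquery in COALESCE(..., 0) if you need the 0.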
I'm attempting to join two tables and also get a SUM and flailing badly. I need to get the total commission amounts for each affiliate where affiliate.approved=1 AND order.status=3.
//affiliate table
affiliate_id | firstname | lastname | approved
1            | joe       | shmoe    | 1
2            | frank     | dimag    | 0
3            | bob       | roosky   | 1
here's the order table
//order
affiliate_id | order_status_id | commission
1            | 3               | 0.20
1            | 0               | 0.30
2            | 3               | 0.10
3            | 3               | 0.25
1            | 3               | 0.25
2            | 3               | 0.15
2            | 0               | 0.20
and here's what I'd like the query to return:
affiliate_id | commission
1            | 0.45
3            | 0.25
Here is my attempt that doesn't work. It outputs just one line.
SELECT order.affiliate_id, SUM(order.commission) AS total,
       affiliate.firstname, affiliate.lastname
FROM `order`, `affiliate`
WHERE order.order_status_id = 3
  AND affiliate.approved = 1
  AND order.affiliate_id = affiliate.affiliate_id
ORDER BY total;
thanks for any help.
You've missed GROUP BY, try this:
SELECT
`order`.affiliate_id,
SUM(`order`.commission) AS total,
affiliate.firstname,
affiliate.lastname
FROM `order`
JOIN `affiliate`
ON `order`.order_status_id = 3 AND affiliate.approved = 1 AND `order`.affiliate_id = affiliate.affiliate_id
GROUP BY `order`.affiliate_id
ORDER BY total;
Demo Here
You can try this query for your solution:
SELECT order.affiliate_id, SUM(order.commission) AS total,affiliate.firstname,
affiliate.lastname
FROM `order`, `affiliate`
WHERE order.order_status_id=3
AND affiliate.approved=1
AND order.affiliate_id = affiliate.affiliate_id
GROUP BY order.affiliate_id
ORDER BY total;
Here is the solution:
select affiliate.affiliate_id, sum(`order`.commission) as total
from affiliate
left join `order` on affiliate.affiliate_id = `order`.affiliate_id
where affiliate.approved = 1 and `order`.order_status_id = 3
group by affiliate.affiliate_id
In addition, `order` is a reserved word in SQL, so I recommend not using it as a table or column name.
First: Remove the implicit join syntax. It's confusing.
Second: You needed to group by affiliate_id. Using aggregate function without group by collapses your result set into a single row.
Here's the query using INNER JOIN:
SELECT
`order`.affiliate_id,
SUM(`order`.commission) AS total,
affiliate.firstname,
affiliate.lastname
FROM `order`
INNER JOIN `affiliate` ON `order`.affiliate_id = affiliate.affiliate_id
WHERE `order`.order_status_id = 3
AND affiliate.approved = 1
GROUP BY affiliate.affiliate_id
ORDER BY total;
WORKING DEMO
Caution: You have picked one of MySQL's reserved words (order) as a table name. Be sure to always enclose it in backticks (`).
Just a gentle reminder
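The GROUP BY fix can be checked with Python's sqlite3 (data from the question; in SQLite the reserved table name is quoted with double quotes instead of backticks):

```python
import sqlite3

# One total per approved affiliate, counting only status-3 orders.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE affiliate (affiliate_id INTEGER, firstname TEXT, lastname TEXT, approved INTEGER);
    INSERT INTO affiliate VALUES (1,'joe','shmoe',1),(2,'frank','dimag',0),(3,'bob','roosky',1);
    CREATE TABLE "order" (affiliate_id INTEGER, order_status_id INTEGER, commission REAL);
    INSERT INTO "order" VALUES (1,3,0.20),(1,0,0.30),(2,3,0.10),(3,3,0.25),(1,3,0.25),(2,3,0.15),(2,0,0.20);
""")

rows = con.execute("""
    SELECT a.affiliate_id, SUM(o.commission) AS total
    FROM "order" o
    JOIN affiliate a ON o.affiliate_id = a.affiliate_id
    WHERE o.order_status_id = 3 AND a.approved = 1
    GROUP BY a.affiliate_id
    ORDER BY total
""").fetchall()
```

Without the GROUP BY, the aggregate collapses everything into one row, which is exactly the symptom the asker describes; with it, affiliate 1 sums to 0.45 and affiliate 3 to 0.25, while unapproved affiliate 2 is filtered out.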
Assume I have a table like this:
id pay
-- ---
1 10
2 20
3 30
4 40
5 50
6 60
I want to create a view from table above with this result:
id pay paid_before
-- --- -------------
1 10 0
2 20 10
3 30 30
4 40 60
5 50 100
6 60 150
which "paid_before" is sum of pay rows that have smaller id.
How could I do this job?
This accomplishes what you want.
SELECT p1.id, p1.pay, COALESCE(SUM(p2.pay), 0) AS Paid_Before
FROM PAYMENTS p1
LEFT JOIN PAYMENTS p2 ON p1.id > p2.id
GROUP BY p1.id, p1.pay
See this sql fiddle
In MySQL, this is most efficiently done with variables:
select p.id, p.pay, (#p := #p + p.pay) - p.pay as PaidBefore
from payments p cross join
(select #p := 0) vars
order by id;
Although this is not standard SQL (which I usually prefer), that is okay. The standard SQL solution is to use cumulative sum:
select p.id, p.pay, sum(p.pay) over (order by p.id) - p.pay as PaidBefore
from payments p;
Many databases support this syntax, but MySQL only added it in version 8.0.
The SQL Fiddle (courtesy of Atilla) is here.
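The standard-SQL cumulative-sum form runs as-is under SQLite 3.25+, so it can be checked with Python's sqlite3 using the table from the question:

```python
import sqlite3

# Running total of pay ordered by id, minus the current row's pay,
# gives the amount paid before each row.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE payments (id INTEGER, pay INTEGER)")
con.executemany("INSERT INTO payments VALUES (?, ?)",
                [(1, 10), (2, 20), (3, 30), (4, 40), (5, 50), (6, 60)])

rows = con.execute("""
    SELECT id, pay,
           sum(pay) OVER (ORDER BY id) - pay AS paid_before
    FROM payments
    ORDER BY id
""").fetchall()
# rows -> [(1, 10, 0), (2, 20, 10), (3, 30, 30),
#          (4, 40, 60), (5, 50, 100), (6, 60, 150)]
```

This reproduces the desired view exactly, including the 0 for the first row, with a single pass over the table and no self-join.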