In Standard SQL, is it possible to run operations on each row during the grouping process? I'm not sure if I'm even asking the right question. Here's an example.
Let's say I have 3 rows like this:
| move_id | item_id | quantity | value |
|---------|---------|----------|-------|
| 1 | 1 | 10 | 100 |
| 1 | 2 | 20 | 150 |
| 1 | 3 | 30 | 200 |
I now want to group the table by move_id, summing values based on the proportion of each row's quantity to the minimum quantity.
For example, the minimum quantity is 10, and row 2 has a quantity of 20, which means its value should be cut in half before summing. Row 3 has a quantity of 30, which means its value should be cut to a third before summing.
So my final value column should be 100 + (150 / 2) + (200 / 3) = 241.67.
My result should be:
| move_id | quantity | value |
|---------|----------|--------|
| 1 | 10 | 241.67 |
The query should be something like:
SELECT ANY_VALUE(move_id) AS move_id, MIN(quantity) AS quantity, SUM([THIS IS MY QUESTION, WHAT GOES HERE?]) as value FROM table GROUP BY move_id;
Is this possible?
The query below is for BigQuery Standard SQL and does everything in one shot:
#standardSQL
SELECT move_id,
MIN(quantity) AS quantity,
SUM(value/quantity) * MIN(quantity) AS value
FROM `project.dataset.table`
GROUP BY move_id
Applied to the sample data from your question, the result is:
| Row | move_id | quantity | value |
|-----|---------|----------|-------|
| 1 | 1 | 10 | 241.66666666666669 |
As you can see, instead of splitting the calculation and the aggregation inside the query, you can transform your formula as shown below:
100 + (150 / 2) + (200 / 3)
= (100 * 10 / 10) + (150 * 10 / 20) + (200 * 10 / 30)
= ((100 / 10) + (150 / 20) + (200 / 30)) * 10
= SUM(value / quantity) * MIN(quantity)
So you end up with just a simple aggregation "in one shot".
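If you want to convince yourself that the rearranged formula matches the row-by-row calculation, here is a minimal sketch using Python's built-in sqlite3 module rather than BigQuery. The table name `moves` is hypothetical, and the columns are declared REAL so that SQLite does not fall back to integer division.

```python
import sqlite3

# In-memory SQLite database standing in for BigQuery (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE moves (move_id INTEGER, item_id INTEGER, quantity REAL, value REAL)"
)
conn.executemany(
    "INSERT INTO moves VALUES (?, ?, ?, ?)",
    [(1, 1, 10, 100), (1, 2, 20, 150), (1, 3, 30, 200)],
)

# The "one shot" aggregation from the answer.
one_shot = conn.execute(
    """
    SELECT move_id,
           MIN(quantity) AS quantity,
           SUM(value / quantity) * MIN(quantity) AS value
    FROM moves
    GROUP BY move_id
    """
).fetchall()

# The same figure computed row by row, the way the question describes it.
rows = conn.execute("SELECT quantity, value FROM moves").fetchall()
min_q = min(q for q, _ in rows)
by_hand = sum(v / (q / min_q) for q, v in rows)

print(one_shot, by_hand)
```

Both numbers agree up to floating-point rounding, which is all the algebra above claims.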
The somewhat difficult part of your query is that you want to aggregate, but the sum you have in mind itself requires the result of an aggregation: the minimum quantity for each move_id group. One option here would be to first generate the min quantity in a CTE, then aggregate that CTE using your logic.
WITH cte AS (
SELECT *, MIN(quantity) OVER (PARTITION BY move_id) min_quantity
FROM yourTable
)
SELECT
move_id,
MIN(quantity) AS quantity,
SUM(value * min_quantity / quantity) AS value
FROM cte
GROUP BY
move_id;
Demo
Note: The above demo uses SQL Server, but the SQL used is ANSI compliant and should also run on BigQuery without any issues.
Also, if your version of BigQuery does not support CTEs, then you can just inline the code contained inside the CTE as a subquery.
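As a rough illustration of that inlining, the sketch below runs the CTE form and the subquery form side by side and checks that they return the same rows. It uses Python's sqlite3 module (which needs SQLite 3.25 or later for window functions) instead of BigQuery or SQL Server, and the second move_id group is made-up data added only to show that the partitioning works.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourTable (move_id INTEGER, item_id INTEGER, quantity REAL, value REAL)"
)
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?, ?)",
    [
        (1, 1, 10, 100), (1, 2, 20, 150), (1, 3, 30, 200),
        (2, 1, 5, 50), (2, 2, 10, 40),  # hypothetical second group
    ],
)

# CTE form: attach each group's minimum quantity to every row, then aggregate.
with_cte = """
WITH cte AS (
    SELECT *, MIN(quantity) OVER (PARTITION BY move_id) AS min_quantity
    FROM yourTable
)
SELECT move_id, MIN(quantity) AS quantity,
       SUM(value * min_quantity / quantity) AS value
FROM cte
GROUP BY move_id
ORDER BY move_id
"""

# Same logic with the CTE body inlined as a subquery.
inlined = """
SELECT move_id, MIN(quantity) AS quantity,
       SUM(value * min_quantity / quantity) AS value
FROM (
    SELECT *, MIN(quantity) OVER (PARTITION BY move_id) AS min_quantity
    FROM yourTable
) AS t
GROUP BY move_id
ORDER BY move_id
"""

cte_rows = conn.execute(with_cte).fetchall()
inline_rows = conn.execute(inlined).fetchall()
print(cte_rows)
print(inline_rows)
```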
In the absence of CTEs, you can use a derived table (subquery) to fetch the minimum quantity for every move_id separately, and then use it in the main query to compute the sum:
SELECT t.move_id,
dt.min_quantity,
Sum(t.value / ( t.quantity / dt.min_quantity )) AS value
FROM your_table AS t
JOIN (SELECT move_id,
Min(quantity) AS min_quantity
FROM your_table
GROUP BY move_id) AS dt
ON dt.move_id = t.move_id
GROUP BY t.move_id, dt.min_quantity
Related
I have one table and am trying to subtract the total where a condition is true from the full total.
| Ticket | Amount | Code |
|--------|--------|------|
| 11 | 5.00 | |
| 12 | 3.00 | X |
| 13 | 10.00 | |
| 14 | 2.00 | X |
My query was
SELECT SUM(AMOUNT)
FROM Table
MINUS
SELECT SUM(Amount)
FROM Table
WHERE Code = 'X'
So the answer should be 20 - 5 = 15.
Below are two possible queries:
-- Use IF operator
SELECT SUM(amount) - SUM(IF(code = 'X', amount, 0)) FROM tbl;
-- Use implicit MySQL conversion boolean to int (true => 1)
SELECT SUM(amount) - SUM(amount * (code = 'X')) FROM tbl;
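To check both ideas against the sample rows, here is a small sketch using Python's sqlite3 module. Two things are assumptions on my part: the blank Code cells are stored as NULL, and the MySQL `IF(...)` call is swapped for a portable `CASE` expression because SQLite is the engine used here. The boolean-to-integer trick happens to work in SQLite as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (ticket INTEGER, amount REAL, code TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [(11, 5.0, None), (12, 3.0, "X"), (13, 10.0, None), (14, 2.0, "X")],
)

# Conditional sum with CASE (stand-in for MySQL's IF).
with_case = conn.execute(
    "SELECT SUM(amount) - SUM(CASE WHEN code = 'X' THEN amount ELSE 0 END) FROM tbl"
).fetchone()[0]

# Boolean treated as 0/1; rows with a NULL code yield NULL and are skipped by SUM.
with_bool = conn.execute(
    "SELECT SUM(amount) - SUM(amount * (code = 'X')) FROM tbl"
).fetchone()[0]

print(with_case, with_bool)
```

Either way the whole thing is a single pass over the table, with no set operator such as MINUS involved.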
I have a query which returns some figures from a table of sales data. The table contains data for a number of different sales managers. I need to return data for the individual managers and also some calculated figures.
One of the figures I'm trying to get at involves a subquery. Getting the figures for each individual manager is fine and works well. The problem occurs when I am trying to get a figure which involves the use of a subquery. It seems that, though the outer query uses a group by clause to separate out individual salespeople, the subquery operates on the entire set.
Sample Data
name | Amount | Sell_at | Profit
--------------------------------
Fred | 1 | 3.99 | 0.99
Joe | 2 | 10.50 | 5.00
Fred | 5 | 20.00 | 15.00
Joe | 10 | 10.00 | 6.00
Desired result:
name | Total Profit | < 50% | > 50%
------------------------------------
Fred | 75.99 | 0.99 | 75.00
Joe | 70.00 | 10 | 60
SELECT
Account_Manager,
SUM(Profit * Amount) AS 'Total Profit',
(SELECT sum(Profit * Amount) from sales WHERE Profit * Amount / (Sell_at * Amount) < 0.5) AS '< 50%',
(SELECT sum(Profit * Amount) from sales WHERE Profit * Amount / (Sell_at * Amount) > 0.5) AS '> 50%'
FROM sales WHERE Invoice_Date = 'some date' GROUP BY Account_Manager
This gives me a row for each salesperson and their profit for that day, but the sub queries return figures totaled from the entire table. I could add a clause to the subquery WHERE in order to limit the result to the same date as the outer query, but ideally what I need to do really is to get the results for each individual salesperson.
Am I on the right track or is there another way I should be approaching this?
From your sample data and desired results it appears you only need a conditional aggregate:
select Name, sum(amount * profit) as TotalProfit,
Sum(case when Profit * Amount / (Sell_at * Amount) < 0.5 then Profit * Amount end) as '<50%',
Sum(case when Profit * Amount / (Sell_at * Amount) > 0.5 then Profit * Amount end) as '>50%'
from t
group by name
The expression:
Profit * Amount / (Sell_at * Amount)
is equivalent to just:
Profit / Sell_at
Use it in a CASE expression to perform conditional aggregation:
SELECT Account_Manager,
SUM(Amount * Profit) as TotalProfit,
SUM(CASE WHEN Profit / Sell_at < 0.5 THEN Profit * Amount END) `< 50%`,
SUM(CASE WHEN Profit / Sell_at > 0.5 THEN Profit * Amount END) `> 50%`
FROM Sales
WHERE Invoice_Date = 'some date'
GROUP BY Account_Manager;
You should also check for the case that Profit / Sell_at is equal to 0.5.
See the demo.
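For anyone who wants to try the conditional aggregation locally, here is a minimal sketch with Python's sqlite3 module and the sample rows from the question. The `Invoice_Date` filter is left out because the sample data has no date column, and the column aliases `below_half` and `above_half` are names I made up for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (name TEXT, amount INTEGER, sell_at REAL, profit REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("Fred", 1, 3.99, 0.99),
        ("Joe", 2, 10.50, 5.00),
        ("Fred", 5, 20.00, 15.00),
        ("Joe", 10, 10.00, 6.00),
    ],
)

# Each CASE yields NULL for rows outside its bucket, and SUM ignores NULLs,
# so every bucket is computed per group in the same pass as the total.
query = """
SELECT name,
       SUM(profit * amount) AS total_profit,
       SUM(CASE WHEN profit / sell_at < 0.5 THEN profit * amount END) AS below_half,
       SUM(CASE WHEN profit / sell_at > 0.5 THEN profit * amount END) AS above_half
FROM sales
GROUP BY name
ORDER BY name
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Because the CASE expressions are evaluated inside the grouped query, they respect the GROUP BY, unlike the uncorrelated subqueries in the question.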
If you want to classify the rows based on their percentile, then use window functions. Let me assume that you want to know who is above and below average:
SELECT Account_Manager,
SUM(Profit * Amount) AS Total_Profit,
(CASE WHEN SUM(Profit * Amount) > AVG(SUM(Profit * Amount)) OVER ()
THEN 'Above average'
WHEN SUM(Profit * Amount) < AVG(SUM(Profit * Amount)) OVER ()
THEN 'Below average'
ELSE 'Average'
END) as relative_position
FROM sales
WHERE Invoice_Date = 'some date'
GROUP BY Account_Manager;
I'm working to create a SQL report on an answers table:
id | created_at
1 | 2018-03-02 18:05:56
2 | 2018-04-02 18:05:56
3 | 2018-04-02 18:05:56
4 | 2018-05-02 18:05:56
5 | 2018-06-02 18:05:56
And output is:
weeks_ago | record_count (# of rows per weekly cohort) | growth (%)
-4 | 21 | 22%
-3 | 22 | -12%
-2 | 32 | 2%
-1 | 2 | 20%
0 | 31 | 0%
My query is currently erring with:
1111 - Invalid use of group function
What am I doing wrong here?
SELECT floor(datediff(f.created_at, curdate()) / 7) AS weeks_ago,
count(DISTINCT f.id) AS "New Records in Cohort",
100 * (count(*) - lag(count(*), 1) over (order by f.created_at)) / lag(count(*), 1) over (order by f.created_at) || '%' as growth
FROM answers f
WHERE f.completed_at IS NOT NULL
GROUP BY weeks_ago
HAVING count(*) > 1;
I think you want to find running count of all rows excluding the current row. I think you can ditch the LAG function as follows:
SELECT
COUNT(*) OVER (ORDER BY f.created_at ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) x, -- running count before current row
COUNT(*) OVER (ORDER BY f.created_at) y -- running count including current row
FROM answers f
You can divide and multiply all you want.
Nope. You simply need to separate the GROUP BY from the LAG ... OVER:
WITH cte AS (
SELECT
FLOOR(DATEDIFF(created_at, CURDATE()) / 7) AS weeks_ago,
COUNT(DISTINCT id) AS new_records
FROM answers
WHERE 1 = 1 -- todo: change this
GROUP BY weeks_ago
HAVING 1 = 1 -- todo: change this
)
SELECT
cte.*,
100 * (
new_records - LAG(new_records) OVER (ORDER BY weeks_ago)
) / LAG(new_records) OVER (ORDER BY weeks_ago) AS percent_increase
FROM cte
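Here is a rough, runnable version of the same two-step shape using Python's sqlite3 module (SQLite 3.25 or later for `LAG`). SQLite has no `DATEDIFF` or `CURDATE`, so this sketch assumes a precomputed `weeks_ago` column and a handful of made-up rows; only the "group first, then apply LAG" structure is the point.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# weeks_ago is stored directly here (an assumption made for this sketch).
conn.execute("CREATE TABLE answers (id INTEGER PRIMARY KEY, weeks_ago INTEGER)")
conn.executemany(
    "INSERT INTO answers (weeks_ago) VALUES (?)",
    [(-2,), (-2,), (-1,), (-1,), (-1,), (-1,), (0,), (0,), (0,)],
)

query = """
WITH cte AS (
    -- Step 1: plain GROUP BY, one row per weekly cohort
    SELECT weeks_ago, COUNT(DISTINCT id) AS new_records
    FROM answers
    GROUP BY weeks_ago
)
-- Step 2: the window function runs over the already aggregated rows
SELECT weeks_ago,
       new_records,
       100.0 * (new_records - LAG(new_records) OVER (ORDER BY weeks_ago))
             / LAG(new_records) OVER (ORDER BY weeks_ago) AS percent_increase
FROM cte
ORDER BY weeks_ago
"""
growth = conn.execute(query).fetchall()
for row in growth:
    print(row)
```

The first cohort has no predecessor, so its growth comes back as NULL (None in Python) rather than 0.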
You can't use LAG around the COUNT aggregate here, because it isn't valid to nest an aggregate function inside another function this way.
You can try using a subquery instead:
SELECT weeks_ago,
NewRecords "New Records in Cohort",
-- CONCAT is used because || is a logical OR in MySQL by default
CONCAT(100 * (cnt - lag(cnt, 1) over (order by weeks_ago)) / lag(cnt, 1) over (order by weeks_ago), '%') as growth
FROM (
-- aggregate per week first, so the outer LAG only sees plain columns
SELECT floor(datediff(f.created_at, curdate()) / 7) AS weeks_ago,
COUNT(*) cnt,
count(DISTINCT f.id) NewRecords
FROM answers f
GROUP BY weeks_ago
) t1
I want to get the average of a calculated sum. I have tried the syntax from this Stack Overflow answer, so my SQL query looks like this:
SELECT AVG(iq.stockvalue_sum), iq.date
FROM(
SELECT CONCAT(DATE_FORMAT(s.date, '%Y'), '-01-01') as date,
SUM(GREATEST(s.stockvalue,0)) as stockvalue_sum
FROM stockvalues s
GROUP BY CONCAT(DATE_FORMAT(s.date, '%Y'), '-01-01')
) iq
However, this is not giving me a correct average. I want to get the average stockvalue for each year. The idea behind the table is to save the stock and stockvalue for each product every day. So this specific query is meant to show the average stockvalue for each year it has data for.
Edit: Sample output data
Stockvalue | Year
------------------
205 | 2015
300 | 2014
Input data:
pid | val | date
----------------------
1 | 100 | 28-04-2015
2 | 150 | 28-04-2015
1 | 80 | 27-04-2015
2 | 80 | 27-04-2015
....
1 | 100 | 29-01-2014
2 | 100 | 29-01-2014
1 | 200 | 30-01-2014
2 | 200 | 30-01-2014
So I need to calculate the average of the total stockvalue: first the sum of all stockvalues for each day, and then the average of those daily sums.
At minimum you are missing a group by in your outer query:
SELECT AVG(iq.stockvalue_sum), iq.date
FROM(
SELECT CONCAT(DATE_FORMAT(s.date, '%Y'), '-01-01') as date,
SUM(GREATEST(s.stockvalue,0)) as stockvalue_sum
FROM stockvalues s
GROUP BY CONCAT(DATE_FORMAT(s.date, '%Y'), '-01-01')
) iq
GROUP BY iq.date
However, given your inner query is returning a single year with a summed value, the average of that value would be the same. Perhaps you can clarify your intentions. Are you sure you need the inner query at all? Perhaps this is all you need?
select avg(GREATEST(s.stockvalue,0)), CONCAT(DATE_FORMAT(s.date, '%Y'), '-01-01') as date
from stockvalues s
group by CONCAT(DATE_FORMAT(s.date, '%Y'), '-01-01')
I think you need to group your inner query by date, and your outer query by year to get the results you are after:
SELECT AVG(s.Stockvalue) AS Stockvalue,
YEAR(s.Date) AS Date
FROM (
SELECT DATE(s.Date) AS Date,
SUM(GREATEST(s.stockvalue,0)) AS Stockvalue
FROM stockvalues AS s
GROUP BY DATE(s.Date)
) AS s
GROUP BY YEAR(s.Date);
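To see why the two levels of grouping matter, here is a minimal sketch with Python's sqlite3 module and the sample rows from the question. It is SQLite rather than MySQL, so a few things are swapped in: dates are stored as ISO strings, `strftime('%Y', ...)` stands in for `YEAR()`, and the two-argument `MAX(x, 0)` stands in for `GREATEST(x, 0)`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stockvalues (pid INTEGER, stockvalue REAL, date TEXT)")
conn.executemany(
    "INSERT INTO stockvalues VALUES (?, ?, ?)",
    [
        (1, 100, "2015-04-28"), (2, 150, "2015-04-28"),
        (1, 80, "2015-04-27"), (2, 80, "2015-04-27"),
        (1, 100, "2014-01-29"), (2, 100, "2014-01-29"),
        (1, 200, "2014-01-30"), (2, 200, "2014-01-30"),
    ],
)

# Inner query: total per day. Outer query: average of those daily totals per year.
yearly = conn.execute(
    """
    SELECT strftime('%Y', d.date) AS year, AVG(d.day_total) AS avg_daily_total
    FROM (
        SELECT date, SUM(MAX(stockvalue, 0)) AS day_total
        FROM stockvalues
        GROUP BY date
    ) AS d
    GROUP BY strftime('%Y', d.date)
    ORDER BY year DESC
    """
).fetchall()

# For contrast: a single-level average works per row, not per daily total.
per_row = conn.execute(
    """
    SELECT strftime('%Y', date) AS year, AVG(MAX(stockvalue, 0))
    FROM stockvalues
    GROUP BY strftime('%Y', date)
    ORDER BY year DESC
    """
).fetchall()

print(yearly)
print(per_row)
```

The nested version reproduces the 205 and 300 from the expected output, while the single-level average gives the mean per product row instead.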
I am trying to get the maximum value out of an aggregate function, and then also get the min value out of a Price column which comes back in the results.
id | discount | price
1 | 60 | 656
2 | 60 | 454
3 | 60 | 222
4 | 30 | 335
5 | 30 | 333
6 | 10 | 232
So in above table, I would like to separate Minimum Price vs Highest Discount.
This is the result I should be seeing:
id | discount | price
3 | 60 | 222
5 | 30 | 333
6 | 10 | 232
As you can see, it has taken the discount=60 group and separated out the lowest price, 222, and done the same for all other discount groups.
Could someone give me the SQL for this please, something like this -
SELECT MAX(discount) AS Maxdisc
, MIN(price) as MinPrice
,
FROM mytable
GROUP
BY discount
However, this doesn't separate the minimum price for each group. I think I need to join this table to itself to achieve that. Also, the table contains millions of rows, so the SQL needs to be fast. It is one flat table.
This question is asked and answered with tedious regularity in SO. If only the algorithm was better at spotting duplicates. Anyway...
SELECT x.*
FROM my_table x
JOIN
( SELECT discount,MIN(price) min_price FROM my_table GROUP BY discount) y
ON y.discount = x.discount
AND y.min_price = x.price;
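A quick way to try this join on the sample rows is the sketch below, using Python's sqlite3 module. The composite index is my own suggestion for the "millions of rows" concern and its name is hypothetical; whether it helps on your engine is something to check with the query plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, discount INTEGER, price INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [(1, 60, 656), (2, 60, 454), (3, 60, 222), (4, 30, 335), (5, 30, 333), (6, 10, 232)],
)
# Hypothetical index covering both the grouping column and the value being minimised.
conn.execute("CREATE INDEX idx_discount_price ON my_table (discount, price)")

# The derived table finds the lowest price per discount;
# the join then pulls back the full row that carries that price.
cheapest = conn.execute(
    """
    SELECT x.id, x.discount, x.price
    FROM my_table x
    JOIN (SELECT discount, MIN(price) AS min_price
          FROM my_table
          GROUP BY discount) y
      ON y.discount = x.discount
     AND y.min_price = x.price
    ORDER BY x.discount DESC
    """
).fetchall()
print(cheapest)
```

Note that if two rows in a group tie on the lowest price, the join returns both of them.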
In your query, you cannot group by discount and then maximize the discount value.
This should get you the result you are looking for..
SELECT Max(ID) AS ID, discount, MIN(price) as MinPrice FROM mytable GROUP BY discount
If you do not need the id, you would do:
select discount, min(price) as minprice
from mytable t
group by discount;
If you want other columns in the row, you can either join back to the original table or use the substring_index()/group_concat() trick:
select substring_index(group_concat(id order by price), ',', 1) as id,
discount, min(price)
from mytable t
group by discount;
This will not always work because the intermediate result of group_concat() can be truncated if there are too many matches within a group. The maximum length is controlled by the group_concat_max_len system variable, which can be increased if necessary.