When I add "GROUP BY date", the SUM and AVG functions no longer give the overall result I want.
Here is a table
| date       | calories |
|------------|----------|
| 2021-03-28 | 42.50    |
| 2021-03-30 | 500.00   |
| 2021-03-31 | 35.00    |
| 2021-04-01 | 200.00   |
| 2021-04-01 | 35.00    |
Here is the query:
SELECT CONCAT(round(IF(avg(up.calories), avg(up.calories), 0), 2), "kcal") as avg,
       CONCAT(round(IF(SUM(up.calories), SUM(up.calories), 0), 2), "kcal") as total_burned
FROM `tbl` as `up`
WHERE `date` BETWEEN "2021-03-28" AND "2021-04-03"
AND `calories` != '0'
GROUP BY `date`
Below is my query result
| avg    | total_burned |
|--------|--------------|
| 42.50  | 42.50        |
| 500.00 | 500.00       |
| 35.00  | 35.00        |
| 235.00 | 235.00       |
But actually, I want this type of result:
| avg    | total_burned |
|--------|--------------|
| 203.13 | 812.50       |
Roll your own
DROP TABLE IF EXISTS T;
create table t( date date, calories decimal(10,2));
insert into t values
( '2021-03-28' , 42.50 ),
( '2021-03-30' , 500.00 ),
( '2021-03-31' , 35.00 ),
( '2021-04-01' , 200.00 ),
( '2021-04-01' , 35.00 );
select sum(calories) sumcal,sum(calories) / count(distinct date) calcavg, avg(calories)
from t;
+--------+------------+---------------+
| sumcal | calcavg | avg(calories) |
+--------+------------+---------------+
| 812.50 | 203.125000 | 162.500000 |
+--------+------------+---------------+
1 row in set (0.002 sec)
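If the kcal-suffixed formatting from the original query is still wanted, the same whole-range figures can be wrapped in ROUND and CONCAT. A minimal sketch against the t table above (the COALESCE fallback and the alias names are illustrative, not from the original post):

-- No GROUP BY `date`, so the aggregates run over the whole range;
-- the per-day average is the total divided by the number of distinct dates.
SELECT CONCAT(ROUND(COALESCE(SUM(calories) / COUNT(DISTINCT date), 0), 2), 'kcal') AS avg_burned,
       CONCAT(ROUND(COALESCE(SUM(calories), 0), 2), 'kcal') AS total_burned
FROM t
WHERE date BETWEEN '2021-03-28' AND '2021-04-03'
  AND calories <> 0;

With the sample rows this should come out as 203.13kcal and 812.50kcal.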
Related
I am trying to create a classifier model for a dataset, but I have too many distinct values for my target variable. If I run something like this:
Create or replace model `model_name`
options (model_type="AUTOML_CLASSIFIER", input_label_cols=["ORIGIN_AIRPORT"]) as
select DAY_OF_WEEK, ARRIVAL_TIME, ARRIVAL_DELAY, ORIGIN_AIRPORT
from `table_name`
limit 1000
I end up getting
Error running query
Classification model currently only supports classification with up to 50 unique labels and the label column had 111 unique labels.
So how can I select, for example, all rows that have one of the first 50 values of ORIGIN_AIRPORT?
SELECT *
FROM TABLE_NAME AS T1
LEFT OUTER JOIN (SELECT DISTINCT COLUMN_NAME FROM TABLE_NAME
                 ORDER BY COLUMN_NAME LIMIT 50) AS T2
    ON T1.COLUMN_NAME = T2.COLUMN_NAME
The inner query fetches 50 distinct values; the outer query then matches them through the T1.COLUMN_NAME = T2.COLUMN_NAME join condition and returns all records (rows whose value is not among the 50 show NULL in the joined columns).
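If the goal is to keep only the rows whose ORIGIN_AIRPORT is among those 50 values (rather than returning every row and getting NULLs for the rest), a possible sketch uses an IN subquery; the table and column names are placeholders matching the question:

-- Keep only the rows whose label is among the first 50 distinct values.
-- The extra derived table around the LIMIT is only needed on engines
-- (MySQL, for example) that reject LIMIT directly inside an IN subquery.
SELECT DAY_OF_WEEK, ARRIVAL_TIME, ARRIVAL_DELAY, ORIGIN_AIRPORT
FROM `table_name`
WHERE ORIGIN_AIRPORT IN (
    SELECT ORIGIN_AIRPORT
    FROM (
        SELECT DISTINCT ORIGIN_AIRPORT
        FROM `table_name`
        ORDER BY ORIGIN_AIRPORT
        LIMIT 50
    ) AS first_fifty
);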
Given a table of values (origin_airport), with unique identifiers (id) and date, find the minimum date for each unique value (origin_airport) to decide which N origin_airport values are to be returned.
Return all rows which match the first 3 unique origin_airport values (densely ranked, by min(date) per origin_airport).
Updated: to use columns that more closely match the model, with origin_airport and a date column for ordering.
Full working test case
The test data:
CREATE TABLE airportlogs (
origin_airport int
, id int primary key auto_increment
, date date DEFAULT NULL
);
INSERT INTO airportlogs (origin_airport) VALUES
( 1 )
, ( 1 )
, ( 8 )
, ( 8 )
, ( 8 )
, ( 7 )
, ( 7 )
, ( 6 )
, ( 5 )
, ( 4 )
, ( 3 )
, ( 3 )
, ( 7 )
, ( 7 )
, ( 1 )
, ( 8 )
, ( 3 )
, ( 1 )
;
-- Create some dates to use for ordering.
-- Ordering can be as complicated as we need.
UPDATE airportlogs SET date = current_date + INTERVAL +id DAY;
-- Intermediate calculation to show the MIN(date) per origin_airport
WITH nvals (origin_airport, mdate) AS (
SELECT origin_airport, MIN(date) AS mdate FROM airportlogs GROUP BY origin_airport
)
SELECT *
FROM nvals
ORDER BY mdate
;
+----------------+------------+
| origin_airport | mdate |
+----------------+------------+
| 1 | 2021-08-05 |
| 8 | 2021-08-07 |
| 7 | 2021-08-10 |
| 6 | 2021-08-12 |
| 5 | 2021-08-13 |
| 4 | 2021-08-14 |
| 3 | 2021-08-15 |
+----------------+------------+
-- Calculation of ordered rank for the unique origin_airport values
-- by MIN(date) per origin_airport.
WITH nvals0 (origin_airport, date, mdate) AS (
SELECT origin_airport
, date
, MIN(date) OVER (PARTITION BY origin_airport) AS mdate
FROM airportlogs
)
, nvals (origin_airport, date, mdate, r) AS (
SELECT origin_airport
, date
, mdate
, DENSE_RANK() OVER (ORDER BY mdate) AS r
FROM nvals0
)
SELECT *
FROM nvals
ORDER BY r, date
;
Result:
+----------------+------------+------------+---+
| origin_airport | date | mdate | r |
+----------------+------------+------------+---+
| 1 | 2021-08-05 | 2021-08-05 | 1 |
| 1 | 2021-08-06 | 2021-08-05 | 1 |
| 1 | 2021-08-19 | 2021-08-05 | 1 |
| 1 | 2021-08-22 | 2021-08-05 | 1 |
| 8 | 2021-08-07 | 2021-08-07 | 2 |
| 8 | 2021-08-08 | 2021-08-07 | 2 |
| 8 | 2021-08-09 | 2021-08-07 | 2 |
| 8 | 2021-08-20 | 2021-08-07 | 2 |
| 7 | 2021-08-10 | 2021-08-10 | 3 |
| 7 | 2021-08-11 | 2021-08-10 | 3 |
| 7 | 2021-08-17 | 2021-08-10 | 3 |
| 7 | 2021-08-18 | 2021-08-10 | 3 |
| 6 | 2021-08-12 | 2021-08-12 | 4 |
| 5 | 2021-08-13 | 2021-08-13 | 5 |
| 4 | 2021-08-14 | 2021-08-14 | 6 |
| 3 | 2021-08-15 | 2021-08-15 | 7 |
| 3 | 2021-08-16 | 2021-08-15 | 7 |
| 3 | 2021-08-21 | 2021-08-15 | 7 |
+----------------+------------+------------+---+
The final solution:
WITH min_date (origin_airport, date, mdate) AS (
SELECT origin_airport
, date
, MIN(date) OVER (PARTITION BY origin_airport) AS mdate
FROM airportlogs
)
, ranks (origin_airport, date, mdate, r) AS (
SELECT origin_airport
, date
, mdate
, DENSE_RANK() OVER (ORDER BY mdate) AS r
FROM min_date
)
SELECT *
FROM ranks
WHERE r <= 3
ORDER BY r, date
;
The final result:
+----------------+------------+------------+---+
| origin_airport | date | mdate | r |
+----------------+------------+------------+---+
| 1 | 2021-08-05 | 2021-08-05 | 1 |
| 1 | 2021-08-06 | 2021-08-05 | 1 |
| 1 | 2021-08-19 | 2021-08-05 | 1 |
| 1 | 2021-08-22 | 2021-08-05 | 1 |
| 8 | 2021-08-07 | 2021-08-07 | 2 |
| 8 | 2021-08-08 | 2021-08-07 | 2 |
| 8 | 2021-08-09 | 2021-08-07 | 2 |
| 8 | 2021-08-20 | 2021-08-07 | 2 |
| 7 | 2021-08-10 | 2021-08-10 | 3 |
| 7 | 2021-08-11 | 2021-08-10 | 3 |
| 7 | 2021-08-17 | 2021-08-10 | 3 |
| 7 | 2021-08-18 | 2021-08-10 | 3 |
+----------------+------------+------------+---+
There are a number of other solutions.
The poster didn't mention the logic for this ordering. But with the above window function behavior, that's trivial to specify.
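Applied back to the original BigQuery question, the same two-step pattern would look roughly like this. Using ARRIVAL_TIME as the ordering column is only an assumption; substitute whichever ordering actually matters to you:

-- Rank the distinct ORIGIN_AIRPORT values by their earliest ARRIVAL_TIME,
-- then keep only the rows belonging to the first 50 of them.
WITH mins AS (
    SELECT DAY_OF_WEEK, ARRIVAL_TIME, ARRIVAL_DELAY, ORIGIN_AIRPORT,
           MIN(ARRIVAL_TIME) OVER (PARTITION BY ORIGIN_AIRPORT) AS first_seen
    FROM `table_name`
),
ranked AS (
    SELECT mins.*,
           DENSE_RANK() OVER (ORDER BY first_seen) AS r
    FROM mins
)
SELECT DAY_OF_WEEK, ARRIVAL_TIME, ARRIVAL_DELAY, ORIGIN_AIRPORT
FROM ranked
WHERE r <= 50;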
I am trying to build a timeline. My table has a type column, plus date_start and date_end, like so:
+------+---------------------+---------------------+----------+
| Type | Start | End | Diff |
+------+---------------------+---------------------+----------+
| 1 | 2020-11-23 23:40:00 | 2020-11-23 23:41:00 | 00:01:00 |
| 1 | 2020-11-23 23:42:00 | 2020-11-23 23:43:00 | 00:01:00 |
| 1 | 2020-11-23 23:44:00 | 2020-11-23 23:45:00 | 00:01:00 |
| 2 | 2020-11-23 23:46:00 | 2020-11-23 23:47:00 | 00:01:00 |
| 2 | 2020-11-23 23:48:00 | 2020-11-23 23:49:00 | 00:01:00 |
| 1 | 2020-11-23 23:50:00 | 2020-11-23 23:51:00 | 00:01:00 |
| 1 | 2020-11-23 23:52:00 | 2020-11-23 23:53:00 | 00:01:00 |
+------+---------------------+---------------------+----------+
I need to sum the differences for as long as the type value stays the same as in the previous row. Once the type value changes, a new line should start, giving a result like this:
+------+----------+
| Type | Diff |
+------+----------+
| 1 | 00:03:00 |
| 2 | 00:02:00 |
| 1 | 00:02:00 |
+------+----------+
How can I achieve such grouping and sum result in MySQL?
PS: Don't worry about the time logic; if you want to set up an example using integers, that's perfectly OK.
Use a variable to assign a block number and then aggregate
drop table if exists t;
create table t
( Type int, Startdt datetime, Enddt datetime, Diff time);
insert into t values
( 1 ,'2020-11-23 23:40:00' ,'2020-11-23 23:41:00' , '00:01:00' ),
( 1 ,'2020-11-23 23:42:00' ,'2020-11-23 23:43:00' , '00:01:00' ),
( 1 ,'2020-11-23 23:44:00' ,'2020-11-23 23:45:00' , '00:01:00' ),
( 2 ,'2020-11-23 23:46:00' ,'2020-11-23 23:47:00' , '00:01:00' ),
( 2 ,'2020-11-23 23:48:00' ,'2020-11-23 23:49:00' , '00:01:00' ),
( 1 ,'2020-11-23 23:50:00' ,'2020-11-23 23:51:00' , '00:01:00' ),
( 1 ,'2020-11-23 23:52:00' ,'2020-11-23 23:53:00' , '00:01:00' );
select type,block,sec_to_time(sum(time_to_sec(diff)))
from
(
select t.*,
if(type <> @p, @b := @b + 1, @b := @b) block,
@p := type p
from t
cross join (select @b := 0, @p := 0) b
order by startdt,type
) s
group by s.block,s.type;
+------+-------+-------------------------------------+
| type | block | sec_to_time(sum(time_to_sec(diff))) |
+------+-------+-------------------------------------+
| 1 | 1 | 00:03:00 |
| 2 | 2 | 00:02:00 |
| 1 | 3 | 00:02:00 |
+------+-------+-------------------------------------+
3 rows in set (0.148 sec)
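On MySQL 8+ the same block numbering can be done without user variables, using LAG to flag type changes and a running SUM to number the blocks. A sketch against the same t table, not part of the original answer:

-- A new block starts whenever type differs from the previous row (ordered by startdt);
-- a running sum of those change flags gives the block number to group on.
WITH marked AS (
    SELECT t.*,
           CASE WHEN type <> LAG(type) OVER (ORDER BY startdt) THEN 1 ELSE 0 END AS chg
    FROM t
),
blocks AS (
    SELECT marked.*,
           SUM(chg) OVER (ORDER BY startdt) AS block
    FROM marked
)
SELECT type, block, SEC_TO_TIME(SUM(TIME_TO_SEC(diff))) AS total_diff
FROM blocks
GROUP BY type, block
ORDER BY block;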
SELECT t.type, SUM(ta.dif)
FROM `TABLE` t
JOIN (SELECT c.type, TIMEDIFF(c.end, c.start) AS dif
      FROM `TABLE` c) ta ON ta.type = t.type
GROUP BY t.type
SELECT source, SUM(deposit), SUM(distribute), SUM(deposit)-SUM(distribute),
       (SUM(deposit)-SUM(distribute)) / (SUM(SUM(deposit)-SUM(distribute))) * 100 as percentage
FROM tbl_sourceofFunds
GROUP BY source
It keeps saying "#1111 - Invalid use of group function".
| source | deposit    | withdraw  |
|--------|------------|-----------|
| A      | 300,000.00 |           |
| B      | 300,000.00 |           |
| C      | 220,000.00 |           |
| A      |            | 53,300.00 |
| A      | 20,000.00  |           |
| B      |            | 3,700.00  |
| C      |            | 5,100.00  |
What I want to get is:
| source | sum.deposit | sum.withdraw | balance    | percentage |
|--------|-------------|--------------|------------|------------|
| A      | 320,000.00  | 53,300       | 266,700.00 | 34.284612  |
| B      | 300,000.00  | 3,700        | 296,300.00 | 38.089729  |
| C      | 220,000.00  | 5,100        | 214,900.00 | 34.284612  |
You can get the total across the entire table by computing it in a subquery:
DROP TABLE IF EXISTS sourcefunds;
CREATE TABLE sourcefunds(source VARCHAR(1), deposit DECImal (10,2), distribute decimal(10,2));
insert into sourcefunds values
('A' , 320000.00 , null ),
('B' , 300000.00 , null ),
('C' , 220000.00 , null ),
('A' , null , 53300.00 ),
('A' , 20000.00 , null ),
('B' , null , 3700.00 ),
('C' , null , 5100.00 );
SELECT source, SUM(deposit), SUM(distribute), SUM(deposit)-SUM(distribute),
(SUM(deposit)-SUM(distribute)) / (select sum(deposit) - sum(distribute) from sourcefunds) * 100 as percentage
FROM sourcefunds
GROUP BY source;
+--------+--------------+-----------------+------------------------------+------------+
| source | SUM(deposit) | SUM(distribute) | SUM(deposit)-SUM(distribute) | percentage |
+--------+--------------+-----------------+------------------------------+------------+
| A | 340000.00 | 53300.00 | 286700.00 | 35.931821 |
| B | 300000.00 | 3700.00 | 296300.00 | 37.134979 |
| C | 220000.00 | 5100.00 | 214900.00 | 26.933200 |
+--------+--------------+-----------------+------------------------------+------------+
3 rows in set (0.00 sec)
And if you want a grand total, use ROLLUP:
SELECT source, SUM(deposit), SUM(distribute), SUM(deposit)-SUM(distribute),
(SUM(deposit)-SUM(distribute)) / (select sum(deposit) - sum(distribute) from sourcefunds) * 100 as percentage
FROM sourcefunds
GROUP BY source with rollup;
+--------+--------------+-----------------+------------------------------+------------+
| source | SUM(deposit) | SUM(distribute) | SUM(deposit)-SUM(distribute) | percentage |
+--------+--------------+-----------------+------------------------------+------------+
| A | 340000.00 | 53300.00 | 286700.00 | 35.931821 |
| B | 300000.00 | 3700.00 | 296300.00 | 37.134979 |
| C | 220000.00 | 5100.00 | 214900.00 | 26.933200 |
| NULL | 860000.00 | 62100.00 | 797900.00 | 100.000000 |
+--------+--------------+-----------------+------------------------------+------------+
4 rows in set (0.00 sec)
Try this
SELECT source, deposit, withdrew,
       deposit - withdrew AS balance,
       (deposit - withdrew) / (SELECT SUM(deposit) - SUM(distribute) FROM tbl_sourceofFunds) * 100 AS percentage
FROM (
    SELECT source,
           SUM(deposit) AS deposit,
           SUM(distribute) AS withdrew
    FROM tbl_sourceofFunds
    GROUP BY source
) x
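On MySQL 8+ a window function over the grouped rows can also supply the grand total, which avoids the scalar subquery entirely. A sketch against the sourcefunds test table above (not one of the original answers):

-- The inner SUM(...) is the per-source aggregate; SUM(...) OVER () then
-- totals those per-source balances across the whole grouped result.
SELECT source,
       SUM(deposit) AS deposit,
       SUM(distribute) AS withdraw,
       SUM(deposit) - SUM(distribute) AS balance,
       (SUM(deposit) - SUM(distribute))
           / SUM(SUM(deposit) - SUM(distribute)) OVER () * 100 AS percentage
FROM sourcefunds
GROUP BY source;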
I have the following two tables:
Table TempUser22 : 57,000 rows:
+------+-----------+
| Id | Followers |
+------+-----------+
| 874 | 55542 |
| 1081 | 330624 |
| 1378 | 17919 |
| 1621 | 920 |
| 1688 | 255463 |
| 2953 | 751 |
| 3382 | 204466 |
| 3840 | 273489 |
| 4145 | 376 |
| ... | ... |
+------+-----------+
Table temporal_users: 10,000,000 rows total, 3,200 rows where Date = 2010-12-31:
+---------------------+---------+--------------------+
| Date | User_Id | has_original_tweet |
+---------------------+---------+--------------------+
| 2008-02-22 12:00:00 | 676493 | 2 |
| 2008-02-22 12:00:00 | 815263 | 1 |
| 2008-02-22 12:00:00 | 6245822 | 1 |
| 2008-02-22 12:00:00 | 8854092 | 1 |
| 2008-02-23 12:00:00 | 676493 | 2 |
| 2008-02-23 12:00:00 | 815263 | 1 |
| 2008-02-23 12:00:00 | 6245822 | 1 |
| 2008-02-23 12:00:00 | 8854092 | 1 |
| 2008-02-24 12:00:00 | 676493 | 2 |
| ............. | ... | .. |
+---------------------+---------+--------------------+
I am running the following join query on these tables:
SELECT sum(has_original_tweet), b.Id
FROM temporal_users AS a
RIGHT JOIN TempUser22 AS b
ON a.User_ID = b.Id
GROUP BY b.Id;
This returns 57,000 rows as expected, with NULLs in the first field:
+-------------------------+------+
| sum(has_original_tweet) | Id |
+-------------------------+------+
| NULL | 874 |
| NULL | 1081 |
| 135 | 1378 |
| 164 | 1621 |
| 652 | 1688 |
| 691 | 2953 |
| NULL | 3382 |
| NULL | 3840 |
| NULL | 4145 |
| ... | .... |
+-------------------------+------+
However, when adding the WHERE line specifying a date as below:
SELECT sum(has_original_tweet), b.Id
FROM temporal_users AS a
RIGHT JOIN TempUser22 AS b
ON a.User_ID = b.Id
WHERE a.Date BETWEEN '2010-12-31-00:00:00' AND '2010-12-31-23:59:59'
GROUP BY b.Id;
I receive the following result of only 3,200 rows, without any NULLs in the first field.
+-------------------------+---------+
| sum(has_original_tweet) | Id |
+-------------------------+---------+
| 1 | 797194 |
| 1 | 815263 |
| 0 | 820678 |
| 1 | 1427511 |
| 0 | 4653731 |
| 1 | 5933862 |
| 2 | 7530552 |
| 1 | 7674072 |
| 1 | 8149632 |
| .. | .... |
+-------------------------+---------+
My question is: how can I get, for a given date, a 57,000-row result (one row for each user in TempUser22), with NULL when has_original_tweet is not present in temporal_users for that date?
Thanks.
SELECT b.Id, SUM(a.has_original_tweet) s
FROM TempUser22 b
LEFT JOIN temporal_users a ON b.Id = a.User_Id
AND a.Date BETWEEN '2010-12-31-00:00:00' AND '2010-12-31-23:59:59'
GROUP BY b.Id;
Id s
1 null
2 1
3 null
4 3
5 null
6 null
For debugging, I used:
CREATE TEMPORARY TABLE TempUser22(Id INT, Followers INT)
SELECT 1 Id, 10 Followers UNION ALL
SELECT 2, 20 UNION ALL
SELECT 3, 30 UNION ALL
SELECT 4, 40 UNION ALL
SELECT 5, 50 UNION ALL
SELECT 6, 60
;
CREATE TEMPORARY TABLE temporal_users(`Date` DATETIME, User_Id INT, has_original_tweet INT)
SELECT '2008-02-22 12:00:00' `Date`, 1 User_Id, 1 has_original_tweet UNION ALL
SELECT '2008-12-31 12:00:00', 2, 1 UNION ALL
SELECT '2010-12-31 12:00:00', 2, 1 UNION ALL
SELECT '2012-12-31 12:00:00', 2, 1 UNION ALL
SELECT '2008-12-31 12:00:00', 4, 9 UNION ALL
SELECT '2010-12-31 12:00:00', 4, 1 UNION ALL
SELECT '2010-12-31 12:00:00', 4, 2 UNION ALL
SELECT '2012-12-31 12:00:00', 4, 9
;
That's because rows with a NULL a.Date always fail the comparison and are discarded by the WHERE clause.
You can use COALESCE in your WHERE clause:
WHERE coalesce(a.Date, 'some-date-in-the-range') BETWEEN '2010-12-31-00:00:00' AND '2010-12-31-23:59:59'
With this instead, you force the NULL dates to be treated as falling inside the range.
+-----------+-----------+--------+
| punchtime | punchdate | emp_id |
+-----------+-----------+--------+
| 9:51:00 | 4/1/2016 | 2 |
| 12:59:00 | 4/1/2016 | 2 |
| 10:28:00 | 4/1/2016 | 5 |
| 14:13:00 | 4/1/2016 | 5 |
| 9:56:00 | 4/1/2016 | 10 |
| 15:31:00 | 4/1/2016 | 10 |
| 10:08:00 | 5/1/2016 | 2 |
| 18:09:00 | 5/1/2016 | 2 |
| 10:15:00 | 5/1/2016 | 5 |
| 18:32:00 | 5/1/2016 | 5 |
| 10:11:00 | 6/1/2016 | 2 |
| 18:11:00 | 6/1/2016 | 2 |
| 10:25:00 | 6/1/2016 | 5 |
| 18:28:00 | 6/1/2016 | 5 |
| 10:19:00 | 6/1/2016 | 10 |
| 18:26:00 | 6/1/2016 | 10 |
+-----------+-----------+--------+
I need to count, for each emp_id, the days where the span between the first and last punchtime is less than 4 hours, and report that count over the whole period. I am trying the code below but it's not working.
SELECT
a.emp_id,
sum( case when TIMESTAMPDIFF(hour, min(a.punchtime),
max(a.punchtime))< 4 then 1 else 0 end ) as 'Half Day'
FROM machinedata a
GROUP BY
a.emp_id
I am getting an error: #1111 - Invalid use of group function.
Desired output:
+--------+----------+
| emp_id | Half Day |
+--------+----------+
| 2      | 1        |
| 8      | 0        |
| 10     | 0        |
+--------+----------+
Your data set and desired result don't match, so I'm going to ignore them...
Instead consider the following...
Note both the way in which I have presented the problem, and the construction of the solution.
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(employee_id INT NOT NULL
,punchtime DATETIME NOT NULL
,PRIMARY KEY(employee_id,punchtime)
);
INSERT INTO my_table VALUES
( 2,'2016/01/04 09:51:00'),
( 2,'2016/01/04 12:59:00'),
( 5,'2016/01/04 10:28:00'),
( 5,'2016/01/04 14:13:00'),
(10,'2016/01/04 09:56:00'),
(10,'2016/01/04 15:31:00'),
( 2,'2016/01/05 10:08:00'),
( 2,'2016/01/05 18:09:00'),
( 5,'2016/01/05 10:15:00'),
( 5,'2016/01/05 18:32:00'),
( 2,'2016/01/06 10:11:00'),
( 2,'2016/01/06 18:11:00'),
( 5,'2016/01/06 10:25:00'),
( 5,'2016/01/06 18:28:00'),
(10,'2016/01/06 10:19:00'),
(10,'2016/01/06 18:26:00');
SELECT employee_id
, SUM(diff < 14400 ) half
FROM
( SELECT x.*
, DATE(x.punchtime) dt
, TIME_TO_SEC(MAX(y.punchtime)) - TIME_TO_SEC(MIN(x.punchtime)) diff
FROM my_table x
JOIN my_table y
ON y.employee_id = x.employee_id
AND DATE(y.punchtime) = DATE(x.punchtime)
GROUP
BY x.employee_id
, dt
) n
GROUP
BY employee_id;
+-------------+------+
| employee_id | half |
+-------------+------+
| 2 | 1 |
| 5 | 1 |
| 10 | 0 |
+-------------+------+
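The self-join isn't strictly required. An alternative sketch (same my_table) first takes the per-day first and last punch in a derived table and then counts the short days; this also sidesteps the #1111 error, because the aggregates are no longer nested:

-- Inner query: first and last punch per employee per day.
-- Outer query: count the days whose span is under 4 hours (14400 seconds).
SELECT employee_id,
       SUM(TIMESTAMPDIFF(SECOND, first_punch, last_punch) < 14400) AS half
FROM (
    SELECT employee_id,
           DATE(punchtime) AS dt,
           MIN(punchtime)  AS first_punch,
           MAX(punchtime)  AS last_punch
    FROM my_table
    GROUP BY employee_id, DATE(punchtime)
) AS per_day
GROUP BY employee_id;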