GROUP BY and SUM distinct date across 2 tables - mysql

I'm not sure if this is possible in one mysql query so I might just combine the results via php.
I have 2 tables: 'users' and 'billing'
I'm trying to group summed activity for every date that is available in these two tables. 'users' is not historical data but 'billing' contains a record for each transaction.
In this example I am showing a user's status, which I'd like to sum by created date, and deposit amounts that I would also like to sum by created date. I realize there is a bit of a disconnect between the data but I'd like to sum all of it together and display it as seen below. This will show me an overview of all of the users by when they were created and what the current statuses are next to total transactions.
I've tried UNION as well as LEFT JOIN but I can't seem to get either to work.
Union example is pretty close but doesn't combine the dates into one row.
(
SELECT
created,
SUM(status) as totalActive,
NULL as totalDeposit
FROM users
GROUP BY created
)
UNION
(
SELECT
created,
NULL as totalActive,
SUM(transactionAmount) as totalDeposit
FROM billing
GROUP BY created
)
I've also tried using a date lookup table and joining on the dates but the SUM values are being added multiple times.
note: I don't care about the userIds at all but have it in here for the example.
users table
(where status of '1' denotes "active")
(one record for each user)
created | userId | status
2010-03-01 | 10 | 0
2010-03-01 | 11 | 1
2010-03-01 | 12 | 1
2010-03-10 | 13 | 0
2010-03-12 | 14 | 1
2010-03-12 | 15 | 1
2010-03-13 | 16 | 0
2010-03-15 | 17 | 1
billing table
(record created for every instance of a billing "transaction")
created | userId | transactionAmount
2010-03-01 | 10 | 50
2010-03-01 | 18 | 50
2010-03-01 | 19 | 100
2010-03-10 | 89 | 55
2010-03-15 | 16 | 50
2010-03-15 | 12 | 90
2010-03-22 | 99 | 150
desired result:
created | sumStatusActive | sumStatusInactive | sumTransactions
2010-03-01 | 2 | 1 | 200
2010-03-10 | 0 | 1 | 55
2010-03-12 | 2 | 0 | 0
2010-03-13 | 0 | 1 | 0
2010-03-15 | 1 | 0 | 140
2010-03-22 | 0 | 0 | 150
Table dump:
CREATE TABLE IF NOT EXISTS `users` (
`created` date NOT NULL,
`userId` int(11) NOT NULL,
`status` smallint(6) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
INSERT INTO `users` (`created`, `userId`, `status`) VALUES
('2010-03-01', 10, 0),
('2010-03-01', 11, 1),
('2010-03-01', 12, 1),
('2010-03-10', 13, 0),
('2010-03-12', 14, 1),
('2010-03-12', 15, 1),
('2010-03-13', 16, 0),
('2010-03-15', 17, 1);
CREATE TABLE IF NOT EXISTS `billing` (
`created` date NOT NULL,
`userId` int(11) NOT NULL,
`transactionAmount` int(11) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
INSERT INTO `billing` (`created`, `userId`, `transactionAmount`) VALUES
('2010-03-01', 10, 50),
('2010-03-01', 18, 50),
('2010-03-01', 19, 100),
('2010-03-10', 89, 55),
('2010-03-15', 16, 50),
('2010-03-15', 12, 90),
('2010-03-22', 99, 150);

Try this:
Select created, sum(status) as totalActive, sum(transactionAmount) as totalDeposit
From
( (
SELECT
created,
status,
0 as transactionAmount
FROM users
)
UNION ALL
(
SELECT
created,
0 as status,
transactionAmount
FROM billing
) ) as x group by created

Ah. Thanks to p.g.I.hall I was able to modify the query and get my desired result:
Select
createdDate,
SUM(statusSum),
SUM(transactionAmountSum)
From
( (
SELECT
created as createdDate,
sum(status) as statusSum,
0 as transactionAmountSum
FROM users
GROUP BY createdDate
)
UNION
(
SELECT
created as createdDate,
0 as statusSum,
sum(transactionAmount) as transactionAmountSum
FROM billing
GROUP BY createdDate
) )
as x
group by createdDate
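For reference, here is a minimal sketch of the same idea extended to the three columns in the desired result (active, inactive, transactions). It is only an illustration: it runs against an in-memory SQLite database through Python's sqlite3 module rather than MySQL, the column aliases active, inactive and amount are made up for the example, and it assumes status only ever holds 0 or 1. The SELECT itself sticks to syntax that MySQL also accepts. UNION ALL is used so that identical rows from different users are not collapsed before summing.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (created TEXT NOT NULL, userId INTEGER NOT NULL, status INTEGER NOT NULL);
INSERT INTO users VALUES
 ('2010-03-01', 10, 0), ('2010-03-01', 11, 1), ('2010-03-01', 12, 1),
 ('2010-03-10', 13, 0), ('2010-03-12', 14, 1), ('2010-03-12', 15, 1),
 ('2010-03-13', 16, 0), ('2010-03-15', 17, 1);
CREATE TABLE billing (created TEXT NOT NULL, userId INTEGER NOT NULL, transactionAmount INTEGER NOT NULL);
INSERT INTO billing VALUES
 ('2010-03-01', 10, 50), ('2010-03-01', 18, 50), ('2010-03-01', 19, 100),
 ('2010-03-10', 89, 55), ('2010-03-15', 16, 50), ('2010-03-15', 12, 90),
 ('2010-03-22', 99, 150);
""")

# Each branch contributes one row per source record, padded with zeros for
# the columns it does not own; the outer query sums everything per date.
QUERY = """
SELECT created,
       SUM(active)   AS sumStatusActive,
       SUM(inactive) AS sumStatusInactive,
       SUM(amount)   AS sumTransactions
FROM (
    SELECT created,
           CASE WHEN status = 1 THEN 1 ELSE 0 END AS active,
           CASE WHEN status = 0 THEN 1 ELSE 0 END AS inactive,
           0 AS amount
    FROM users
    UNION ALL
    SELECT created, 0, 0, transactionAmount
    FROM billing
) AS x
GROUP BY created
ORDER BY created
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Dates that appear in only one of the two tables still get a row, with zeros for the other table's columns, which is what the date lookup table was trying to achieve.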

A word of warning - your users table does not have a unique key. I'm going to take a wild guess here and say that you should probably create a primary key with the userId column.
A table without primary keys means you have no protection against bad, duplicate data slipping into your tables! Aaaaaah!

Related

MySQL Order by show specific data at the top

I have the data below in my activity table. I want to show the records whose followup_date is today or later at the top, in ascending order; after that, the records whose followup_date is in the past, in ascending order; and finally the records whose followup_date is null.
DROP TABLE IF EXISTS `activity`;
CREATE TABLE IF NOT EXISTS `activity` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`type_id` int(11) NOT NULL,
`followup_date` date DEFAULT NULL,
`created` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=9 DEFAULT CHARSET=utf8;
--
-- Dumping data for table `activity`
--
INSERT INTO `activity` (`id`, `type_id`, `followup_date`, `created`) VALUES
(1, 1, '2022-03-22', '2022-03-24 18:51:23'),
(2, 1, '2022-03-23', '2022-03-24 18:51:23'),
(3, 1, '2022-03-24', '2022-03-24 18:51:58'),
(4, 1, '2022-03-25', '2022-03-24 18:51:58'),
(5, 1, '2022-03-26', '2022-03-24 18:52:21'),
(6, 1, '2022-03-13', '2022-03-24 18:52:21'),
(7, 1, NULL, '2022-03-24 18:54:15'),
(8, 1, NULL, '2022-03-24 18:54:15');
I tried the query below but could not work out how to use an ORDER BY CASE statement to get the result described above.
SELECT * FROM `activity` ORDER BY CASE WHEN followup_date IS NULL THEN 2 WHEN followup_date >= '2022-03-24' THEN 1 END ASC
What changes do I need to make to the above query to get the expected output?
I moved the expression into the select-list so we could see it in the result, but you may keep it in the ORDER BY clause:
SELECT CASE WHEN followup_date IS NULL THEN 2
WHEN followup_date < '2022-03-24' THEN 1
ELSE 0 END AS sort_bucket,
id, followup_date
FROM `activity`
ORDER BY sort_bucket ASC, followup_date ASC
Output:
+-------------+----+---------------+
| sort_bucket | id | followup_date |
+-------------+----+---------------+
| 0 | 3 | 2022-03-24 |
| 0 | 4 | 2022-03-25 |
| 0 | 5 | 2022-03-26 |
| 1 | 6 | 2022-03-13 |
| 1 | 1 | 2022-03-22 |
| 1 | 2 | 2022-03-23 |
| 2 | 7 | NULL |
| 2 | 8 | NULL |
+-------------+----+---------------+
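If you would rather not hard-code the date, the same bucket expression can take "today" as a bound parameter. The sketch below is an illustration only: it uses Python's sqlite3 module with an in-memory table instead of MySQL, stores the dates as ISO text, and the helper name ordered_ids is made up for the example. The extra id in the ORDER BY is an assumption added to make the order of the NULL rows deterministic. In MySQL itself you could compare against CURDATE() instead of passing a parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activity (id INTEGER PRIMARY KEY, followup_date TEXT)")
conn.executemany(
    "INSERT INTO activity VALUES (?, ?)",
    [(1, "2022-03-22"), (2, "2022-03-23"), (3, "2022-03-24"), (4, "2022-03-25"),
     (5, "2022-03-26"), (6, "2022-03-13"), (7, None), (8, None)],
)

def ordered_ids(today):
    """Return ids: upcoming follow-ups first, then past ones, then NULLs."""
    sql = """
    SELECT id
    FROM activity
    ORDER BY CASE WHEN followup_date IS NULL THEN 2   -- no date: last
                  WHEN followup_date < ?     THEN 1   -- past: middle
                  ELSE 0 END,                         -- today onwards: first
             followup_date,
             id
    """
    return [row[0] for row in conn.execute(sql, (today,))]

print(ordered_ids("2022-03-24"))
```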

Sort records on multiple columns and conditions

I have the table below, which stores the rank of each person participating in the respective events.
event_running and event_longjump are the events, and the values stored are the ranks.
CREATE TABLE `ranks` (
`id` int(11) NOT NULL,
`personid` int(11) NOT NULL,
`event_running` int(11) DEFAULT NULL,
`event_longjump` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
Sample data
INSERT INTO `ranks` (`id`, `personid`, `event_running`, `event_longjump`) VALUES
(1, 1, 4, 8),
(2, 2, 10, 6),
(3, 3, 5, 0),
(4, 5, 20, 1),
(5, 4, 9, 3),
(6, 6, 1, 2);
I want to build a leaderboard as below
| Standing | PersonID | RunningRank | JumpingRank |
| 1 | 6 | 1 | 2 |
| 2 | 4 | 9 | 3 |
| 3 | 1 | 4 | 8 |
| 4 | 3 | 5 | 0 |
| 5 | 2 | 10 | 6 |
This has to be sorted in ascending order: irrespective of the event, the lowest ranks come first, and ranks above 20 are ignored.
Any inputs on how this can be done?
You can use something similar to the query below:
select personid as PersonID,
event_running as RunningRank,
event_longjump as JumpingRank,
(event_running + event_longjump) as Total
from ranks
order by Total asc
limit 20;
Here's your query.
set @row_number = 0;
select (@row_number:=@row_number + 1) as standing, personid, event_running, event_longjump from ranks
where event_running < 20 and event_longjump < 20
order by
case when if(event_longjump=0,event_running ,event_longjump) < event_running
then event_longjump else event_running end
see dbfiddle
Your sorting criteria are a bit vague. I am assuming that you want to sort on the cumulative rank across all events, using the jumping rank as a tie-breaker.
Also, please explain the position of person ID 3 in your question.
You can do:
select personid as PersonID,
event_running as RunningRank,
event_longjump as JumpingRank,
(event_longjump + event_running) as cumulativeRank
from ranks
ORDER BY cumulativeRank, JumpingRank asc
limit 20;
This will get you all the positions barring person ID 3.

How to structure SQL query to not return null for calculation

I'm trying to build a query that will perform the following:
1) Query a row with a given 'acctuniqueid' that has the second largest value for 'acctoutputoctets', or if no matching row is found, return 0
2) Perform the following calculation 250+350-(return value of 'acctinputoctets'+ return value of 'acctoutputoctets' || "0")
Using "a25d16693309cdb4807effe00a9f076c" as the 'acctuniqueid' field.
Table Name: radacct
Example #1
+-----------+----------------------------------+-----------------+------------------+
| radacctid | acctuniqueid | acctinputoctets | acctoutputoctets |
+-----------+----------------------------------+-----------------+------------------+
| 5 | a25d16693309cdb4807effe00a9f076c | 150 | 250 |
| 8 | a25d16693309cdb4807effe00a9f076c | 250 | 350 |
+-----------+----------------------------------+-----------------+------------------+
Example #2
+-----------+----------------------------------+-----------------+------------------+
| radacctid | acctuniqueid | acctinputoctets | acctoutputoctets |
+-----------+----------------------------------+-----------------+------------------+
| 4 | a25d16693309cdb4807effe00a9f076c | 250 | 350 |
+-----------+----------------------------------+-----------------+------------------+
In Example #1: 250+350-(150+250) = 200
So the expected result is 200
In Example #2: 250+350-(0) = 600
So the expected result is 600
Query I've been tinkering with so far:
SELECT (SUM(250)+SUM(350)-((SUM(IFNULL(acctinputoctets,0)))+
(SUM(IFNULL(acctoutputoctets,0))))
)
FROM
( SELECT *,IFNULL(acctinputoctets,0),
IFNULL(acctoutputoctets,0)
FROM radacct
WHERE acctuniqueid = 'a25d16693309cdb4807effe00a9f076c'
ORDER BY acctoutputoctets DESC
LIMIT 1 , 1
) as meh
Which returns something for Example #1, but for Example #2 I get NULL as the result.
It should be noted that in the above query "250", "350" and "a25d16693309cdb4807effe00a9f076c" have been added manually for testing and readability purposes, but will be replaced with run-time variable output later.
I've tried various iterations, combinations and placements of IFNULL and COALESCE and have tried searching for similar problem/solution posts online - but haven't been able to find anything close enough to what I'm doing that I've had the "aha!" moment yet.
Given my lack of SQL experience, I'm guessing that either (a) it's something really simple that someone will spot immediately, and/or (b) I've gone about my query the completely wrong way and there is a different and more correct way of structuring this query which is outside my current level of knowledge.
At 4AM this morning, and following hours of swearing, pleading and bargaining - I finally conceded defeat and so any assistance provided would be much appreciated.
Thanks in advance.
This seems to work for me:
Example #1
CREATE TABLE radacct (
radacctid int,
acctuniqueid nvarchar(50),
acctinputoctets int,
acctoutputoctets int
);
INSERT INTO radacct values
(5, 'a25d16693309cdb4807effe00a9f076c', 150, 250),
(8, 'a25d16693309cdb4807effe00a9f076c', 250, 350);
SELECT 250 + 350
- COALESCE((SELECT acctinputoctets FROM radacct ORDER BY acctoutputoctets DESC LIMIT 1, 1), 0)
- COALESCE((SELECT acctoutputoctets FROM radacct ORDER BY acctoutputoctets DESC LIMIT 1, 1), 0);
Example #2
CREATE TABLE radacct (
radacctid int,
acctuniqueid nvarchar(50),
acctinputoctets int,
acctoutputoctets int
);
INSERT INTO radacct values
(4, 'a25d16693309cdb4807effe00a9f076c', 250, 350);
SELECT 250 + 350
- COALESCE((SELECT acctinputoctets FROM radacct ORDER BY acctoutputoctets DESC LIMIT 1, 1), 0)
- COALESCE((SELECT acctoutputoctets FROM radacct ORDER BY acctoutputoctets DESC LIMIT 1, 1), 0);
However, if there are multiple rows tied for the second largest value in acctoutputoctets, the outcome is effectively arbitrary. For example, if in Example #1 both rows had 350 as the value in the acctoutputoctets column, then the result would depend on how the values were inserted, since both rows fit the criteria but have different values in the acctinputoctets column (which affects the answer). If you give some more information on how you want ties to be broken, I'd be happy to modify the code to accommodate it.
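As an illustration of one possible tie-break, here is a rough sketch that filters on acctuniqueid and orders by acctoutputoctets and then radacctid, so the "second" row is always well defined. The choice of radacctid as the tie-breaker is purely an assumption, as are the helper names make_db and remaining. The sketch runs on SQLite through Python's sqlite3 module rather than MySQL, so it writes LIMIT 1 OFFSET 1 where the queries above write LIMIT 1, 1, and it assumes both octet columns are never NULL.

```python
import sqlite3

ACCT = "a25d16693309cdb4807effe00a9f076c"

def make_db(rows):
    """Build an in-memory radacct table holding the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE radacct (radacctid INTEGER, acctuniqueid TEXT, "
        "acctinputoctets INTEGER, acctoutputoctets INTEGER)"
    )
    conn.executemany("INSERT INTO radacct VALUES (?, ?, ?, ?)", rows)
    return conn

def remaining(conn, acct, new_in, new_out):
    """new_in + new_out minus the octets of the second-largest row, or minus 0."""
    sql = """
    SELECT ? + ? - COALESCE((
        SELECT acctinputoctets + acctoutputoctets
        FROM radacct
        WHERE acctuniqueid = ?
        ORDER BY acctoutputoctets DESC, radacctid DESC  -- assumed tie-break
        LIMIT 1 OFFSET 1                                -- skip the largest row
    ), 0)
    """
    return conn.execute(sql, (new_in, new_out, acct)).fetchone()[0]

example1 = make_db([(5, ACCT, 150, 250), (8, ACCT, 250, 350)])
example2 = make_db([(4, ACCT, 250, 350)])
print(remaining(example1, ACCT, 250, 350))
print(remaining(example2, ACCT, 250, 350))
```

The COALESCE wraps the whole scalar subquery, which is what turns the "no second row" case into 0 instead of NULL.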
A major problem here is deciding, in the case of draws, which acctinputoctets to pick. This nasty-looking code tries to pick the most recent highest and second-highest values based on radacctid before left joining to get the final result.
drop table if exists t;
create table t( radacctid int, acctuniqueid varchar(40), acctinputoctets int, acctoutputoctets int);
insert into t values
( 5 , 'a25d16693309cdb4807effe00a9f076c' , 150 , 250),
( 8 , 'a25d16693309cdb4807effe00a9f076c' , 250 , 350),
( 9 , 'a25d16693309cdb4807effe00a9f076c' , 200 , 250),
( 4 , 'b25d16693309cdb4807effe00a9f076c' , 10 , 10),
( 6 , 'b25d16693309cdb4807effe00a9f076c' , 250 , 350),
( 7 , 'c25d16693309cdb4807effe00a9f076c' , 20 , 30);
select mm.acctuniqueid,mm.maxid,mm.maxin,mm.maxout ,
sm.acctuniqueid,sm.secondmaxid,sm.secondmaxin,sm.secondmaxout,
(ifnull(mm.maxin,0) + ifnull(mm.maxout,0)) -
(ifnull(sm.secondmaxin,0) + ifnull(sm.secondmaxout,0)) as Total
from
(
select t.acctuniqueid,t.radacctid maxid,t.acctinputoctets maxin,t.acctoutputoctets maxout
from t
join
(
select t.acctuniqueid,s.maxout, max(radacctid) maxid #adjust maxid as required
from
(
select acctuniqueid, max(acctoutputoctets) maxout
from t
where acctoutputoctets = (select max(acctoutputoctets) from t t1 where t1.acctuniqueid = t.acctuniqueid)
group by acctuniqueid
) s
join t on t.acctuniqueid = s.acctuniqueid and t.acctoutputoctets = s.maxout
group by t.acctuniqueid,s.maxout
) s
on s.acctuniqueid = t.acctuniqueid and s.maxid = t.radacctid
) mm
left join
(
select t.acctuniqueid,t.radacctid secondmaxid,t.acctinputoctets secondmaxin,t.acctoutputoctets secondmaxout
from t
join
(
select t.acctuniqueid,s.secondmaxout, max(radacctid) secondmaxid #adjust secondmaxid as required
from
(
select acctuniqueid, max(acctoutputoctets) secondmaxout
from t
where acctoutputoctets < (select max(acctoutputoctets) from t t1 where t1.acctuniqueid = t.acctuniqueid)
group by acctuniqueid
) s
join t on t.acctuniqueid = s.acctuniqueid and t.acctoutputoctets = secondmaxout
group by t.acctuniqueid,s.secondmaxout
) s
on s.acctuniqueid = t.acctuniqueid and s.secondmaxid = t.radacctid
) sm
on mm.acctuniqueid = sm.acctuniqueid
+----------------------------------+-------+-------+--------+----------------------------------+-------------+-------------+--------------+-------+
| acctuniqueid | maxid | maxin | maxout | acctuniqueid | secondmaxid | secondmaxin | secondmaxout | Total |
+----------------------------------+-------+-------+--------+----------------------------------+-------------+-------------+--------------+-------+
| a25d16693309cdb4807effe00a9f076c | 8 | 250 | 350 | a25d16693309cdb4807effe00a9f076c | 9 | 200 | 250 | 150 |
| b25d16693309cdb4807effe00a9f076c | 6 | 250 | 350 | b25d16693309cdb4807effe00a9f076c | 4 | 10 | 10 | 580 |
| c25d16693309cdb4807effe00a9f076c | 7 | 20 | 30 | NULL | NULL | NULL | NULL | 50 |
+----------------------------------+-------+-------+--------+----------------------------------+-------------+-------------+--------------+-------+
3 rows in set (0.01 sec)
Possibly could be simplified if you simply needed the max (or min) in value to go with the max out value in the event of a draw.

Counting records with related records which appear first in a given date

I have two tables, players and games, created as follows:
CREATE TABLE IF NOT EXISTS `players` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
`created_at` datetime NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=1 ;
CREATE TABLE IF NOT EXISTS `games` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`player` int(11) NOT NULL,
`played_at` datetime NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=1 ;
I wish to extract 3 values for each day:
1) The number of players created on that day
2) The number of players who played on that day
3) The number of players who played for the first time on that day
So, suppose for example that the players table looks as follows:
+----+--------+---------------------+
| id | name | created_at |
+----+--------+---------------------+
| 1 | Alan | 2016-02-01 00:00:00 |
| 2 | Benny | 2016-02-01 06:00:00 |
| 3 | Calvin | 2016-02-02 00:00:00 |
| 4 | Dan | 2016-02-03 00:00:00 |
+----+--------+---------------------+
And the games table looks as follows:
+----+--------+---------------------+
| id | player | played_at |
+----+--------+---------------------+
| 1 | 1 | 2016-02-01 01:00:00 |
| 2 | 3 | 2016-02-02 02:00:00 |
| 3 | 2 | 2016-02-03 14:00:00 |
| 4 | 3 | 2016-02-03 17:00:00 |
| 5 | 3 | 2016-02-03 18:00:00 |
+----+--------+---------------------+
Then the query should return something like
+------------+-----+--------+-------+
| day | new | played | first |
+------------+-----+--------+-------+
| 2016-02-01 | 2 | 1 | 1 |
| 2016-02-02 | 1 | 1 | 1 |
| 2016-02-03 | 1 | 2 | 1 |
+------------+-----+--------+-------+
I have a solution for 1 (new):
SELECT Date(created_at) AS day,
Count(*) AS new
FROM players
GROUP BY day;
That's easy. I think I also have a solution for 2 (played), thanks to MySQL COUNT DISTINCT:
select Date(played_at) AS day,
Count(Distinct player) AS played
FROM games
GROUP BY day;
But I have no idea how to get the needed result for 3 (first). I also don't know how to put everything in a single query, to save execution time (the games table may include millions of records).
In case you need it, here's a query which inserts the example data:
INSERT INTO `players` (`id`, `name`, `created_at`) VALUES
(1, 'Alan', '2016-02-01 00:00:00'),
(2, 'Benny', '2016-02-01 06:00:00'),
(3, 'Calvin', '2016-02-02 00:00:00'),
(4, 'Dan', '2016-02-03 00:00:00');
INSERT INTO `games` (`id`, `player`, `played_at`) VALUES
(1, 1, '2016-02-01 01:00:00'),
(2, 3, '2016-02-02 02:00:00'),
(3, 2, '2016-02-03 14:00:00'),
(4, 3, '2016-02-03 17:00:00'),
(5, 3, '2016-02-03 18:00:00');
One version is to get all relevant data into a union and do the analysis from there;
SELECT date AS day,
SUM(type='P') new,
COUNT(DISTINCT CASE WHEN type='G' THEN pid END) played,
SUM(type='F') first
FROM (
SELECT id pid, DATE(created_at) date, 'P' type FROM players
UNION ALL
SELECT player, DATE(played_at) date, 'G' FROM games
UNION ALL
SELECT player, MIN(DATE(played_at)), 'F' FROM games GROUP BY player
) z
GROUP BY date;
In the union;
Records with type P are player creation statistics.
Records with type G are player-related game statistics.
Records with type F are statistics for when players played their first game.
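To check the tagged-union idea against the sample data, here is a small sketch using Python's sqlite3 module with an in-memory database. It is an approximation rather than the MySQL query itself: SUM(type='P') is spelled out as a CASE expression, and the names kind, new_players and first_played are made up for the example to stay clear of keywords.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE players (id INTEGER PRIMARY KEY, name TEXT, created_at TEXT);
CREATE TABLE games (id INTEGER PRIMARY KEY, player INTEGER, played_at TEXT);
INSERT INTO players VALUES
 (1, 'Alan',   '2016-02-01 00:00:00'),
 (2, 'Benny',  '2016-02-01 06:00:00'),
 (3, 'Calvin', '2016-02-02 00:00:00'),
 (4, 'Dan',    '2016-02-03 00:00:00');
INSERT INTO games VALUES
 (1, 1, '2016-02-01 01:00:00'),
 (2, 3, '2016-02-02 02:00:00'),
 (3, 2, '2016-02-03 14:00:00'),
 (4, 3, '2016-02-03 17:00:00'),
 (5, 3, '2016-02-03 18:00:00');
""")

# Every source row is tagged with a kind: P = player created, G = game played,
# F = a player's first game. One GROUP BY then produces all three counts.
QUERY = """
SELECT day,
       SUM(CASE WHEN kind = 'P' THEN 1 ELSE 0 END)       AS new_players,
       COUNT(DISTINCT CASE WHEN kind = 'G' THEN pid END) AS played,
       SUM(CASE WHEN kind = 'F' THEN 1 ELSE 0 END)       AS first_played
FROM (
    SELECT id AS pid, DATE(created_at) AS day, 'P' AS kind FROM players
    UNION ALL
    SELECT player, DATE(played_at), 'G' FROM games
    UNION ALL
    SELECT player, MIN(DATE(played_at)), 'F' FROM games GROUP BY player
) AS z
GROUP BY day
ORDER BY day
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

COUNT(DISTINCT ...) skips the NULLs produced for non-G rows, so a player with several games on one day is still counted once.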
You can count the rows of a derived table that is grouped by player and filtered with HAVING on min(played_at):
select count(player) from
( select player, min(played_at)
from games
group by player
having date(min(played_at)) = YOUR_GIVEN_DATE ) as t;
this query will give you the result:
select day,( select count(distinct(id)) from players where Date(created_at) = temp.day ) as no_created_at ,
( select count(distinct(player)) from games where Date(played_at) = temp.day) as no_played_at,
( select count(distinct(player)) from games where Date(played_at) =
(select min(Date(played_at)) from games internal_games
where internal_games.player =games.player and Date(games.played_at) = temp.day )) as no_first_played_at
from (
SELECT Date(created_at) AS day
FROM players
GROUP BY day
union
select Date(played_at) AS day
FROM games
GROUP BY day) temp
Here's a solution with a bunch of subqueries, which accounts for the possibility that players may have been created on days with no games, and vice versa:
select
all_dates.date as day,
ifnull(new.num, 0) as new,
ifnull(players.num, 0) as players,
ifnull(first.num, 0) as first
from (
select date(created_at) as date from players
union
select date(played_at) from games
) as all_dates
left join (
select date(created_at) as created_at_date, count(*) as num
from players
group by created_at_date
) as new on all_dates.date = new.created_at_date
left join (
select date(played_at) as played_at_date, count(distinct player) as num
from games
group by played_at_date
) as players on all_dates.date = players.played_at_date
left join (
select min_date, count(*) num
from (
select player, date(min(played_at)) as min_date
from games
group by player
) as players_first
group by min_date
) as first on all_dates.date = first.min_date
order by day;

MySQL GROUP BY with sorting

I'm having some trouble writing succinct code to generate the desired result efficiently (on a multiple million records DB).
items will be grouped by time
items will be selected by provider being that B takes precedence over A (and C over B)
value must match value of selected provider
Table vs wanted result:
// given this table
id | provider | time | value
---+----------+------------+-----------
1 | A | 2013-07-01 | 0.1
2 | A | 2013-07-02 | 0.2
3 | B | 2013-07-02 | 0.3
4 | A | 2013-07-03 | 0.4
// extrapolate this result
---+----------+------------+-----------
1 | A | 2013-07-01 | 0.1
3 | B | 2013-07-02 | 0.3
4 | A | 2013-07-03 | 0.4
The queries to generate table and populate data:
CREATE TABLE `data_teste` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`provider` varchar(12) NOT NULL,
`time` date NOT NULL,
`value` double NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `index` (`provider`,`time`),
KEY `provider` (`provider`),
KEY `time` (`time`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO data_teste(`provider`, `time`, `value`) VALUES('A', '2013-07-01', 0.1),('A', '2013-07-02', 0.2),('B', '2013-07-02', 0.3),('A', '2013-07-03', 0.4);
This is the classic group_by/sort problem windowed.
Thank you very much.
select d.*
from data_teste d
inner join
(
select `time`, max(provider) mp
from data_teste
group by `time`
) x on x.mp = d.provider
and x.`time` = d.`time`
order by `time` asc,
provider desc
How well does this perform?
SELECT
dt1.*
FROM
`data_teste` dt1
LEFT JOIN `data_teste` dt2 ON ( dt2.time = dt1.time
AND dt2.provider > dt1.provider )
WHERE
dt2.ID IS NULL
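Here is a quick way to sanity-check the self-join version on the sample rows. This is only a sketch on an in-memory SQLite database through Python's sqlite3 module, so it says nothing about how the query performs on a multi-million row MySQL table. It also assumes, as the sample data suggests, that provider precedence matches alphabetical order (C over B over A).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data_teste (
    id INTEGER PRIMARY KEY,
    provider TEXT NOT NULL,
    time TEXT NOT NULL,
    value REAL NOT NULL,
    UNIQUE (provider, time)
);
INSERT INTO data_teste (provider, time, value) VALUES
 ('A', '2013-07-01', 0.1),
 ('A', '2013-07-02', 0.2),
 ('B', '2013-07-02', 0.3),
 ('A', '2013-07-03', 0.4);
""")

# Anti-join: keep a row only when no other row for the same time has a
# "greater" provider, i.e. the row already has the highest precedence.
QUERY = """
SELECT dt1.id, dt1.provider, dt1.time, dt1.value
FROM data_teste AS dt1
LEFT JOIN data_teste AS dt2
       ON dt2.time = dt1.time
      AND dt2.provider > dt1.provider
WHERE dt2.id IS NULL
ORDER BY dt1.time
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

If precedence ever stops following alphabetical order, the provider comparison would need to go through an explicit priority column or lookup table instead.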