I have a table like this:
ID   StartDate    EndDate
--------------------------------
1    05/01/2012   02/03/2013
2    06/30/2013   07/12/2013
3    02/17/2010   02/17/2013
4    12/10/2012   11/16/2013
I'm trying to get a count of the IDs that were active during each year. If an ID was active across multiple years, it should be counted once in each of those years. I don't want to hardcode years into my query because the data spans many years (i.e. I can't use CASE YEAR(StartDate) WHEN x THEN y or IF...).
Desired Result from the table above:
YEAR   COUNT
2010   1
2011   1
2012   3
2013   4
I've tried:
SELECT COUNT(ID)
FROM table
WHERE (DATE_FORMAT(StartDate, '%Y-%m') BETWEEN '2013-01' AND '2013-12'
OR DATE_FORMAT(EndDate, '%Y-%m') BETWEEN '2013-01' AND '2013-12')
Of course, this only covers the year 2013. I also tried:
SELECT YEAR(StartDate) AS 'Start Year', YEAR(EndDate) AS 'End Year', COUNT(id)
FROM table
WHERE StartDate IS NOT NULL
GROUP BY YEAR(StartDate);
This, however, only gave me the IDs that started in a given year.
Assuming that there is an auxiliary table containing consecutive numbers from 1 to X (where X must be at least as large as the number of years covered by the table):
create table series( x int primary key auto_increment );
insert into series( x )
select null from information_schema.tables;
then the query might look like:
SELECT years.year, count(*)
FROM (
SELECT mm.min_year + s.x - 1 as year
FROM (
SELECT min( year( start_date )) min_year,
max( year( end_date )) max_year
FROM tab
) mm
JOIN series s
ON s.x <= mm.max_year - mm.min_year + 1
GROUP BY mm.min_year + s.x - 1
) years
JOIN tab
ON years.year between year( tab.start_date )
and year( tab.end_date )
GROUP BY years.year
;
see a demo: http://www.sqlfiddle.com/#!2/f49ab/14
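If a helper table is not an option, the same "generate the years, then join on the range" idea can be sketched with a recursive CTE. The snippet below is only an illustration under a few assumptions: it runs against SQLite through Python's sqlite3 module rather than MySQL, the dates are stored as ISO strings, and the names bounds and years are made up for the example.

```python
import sqlite3

# Sample rows from the question, with dates rewritten as ISO strings
# (assumption: SQLite has no native date type, so text is used).
ROWS = [
    (1, "2012-05-01", "2013-02-03"),
    (2, "2013-06-30", "2013-07-12"),
    (3, "2010-02-17", "2013-02-17"),
    (4, "2012-12-10", "2013-11-16"),
]

QUERY = """
WITH RECURSIVE
bounds(min_year, max_year) AS (
    -- first and last year that appear anywhere in the table
    SELECT MIN(CAST(strftime('%Y', start_date) AS INTEGER)),
           MAX(CAST(strftime('%Y', end_date) AS INTEGER))
    FROM tab
),
years(year) AS (
    -- one row per year between min_year and max_year
    SELECT min_year FROM bounds
    UNION ALL
    SELECT year + 1 FROM years, bounds WHERE year < max_year
)
SELECT years.year, COUNT(*)
FROM years
JOIN tab
  ON years.year BETWEEN CAST(strftime('%Y', tab.start_date) AS INTEGER)
                    AND CAST(strftime('%Y', tab.end_date) AS INTEGER)
GROUP BY years.year
ORDER BY years.year
"""


def active_per_year(rows):
    """Return (year, number of active ids) pairs for the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tab (id INTEGER PRIMARY KEY, start_date TEXT, end_date TEXT)"
    )
    conn.executemany("INSERT INTO tab VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


year_counts = active_per_year(ROWS)
print(year_counts)
```

As with the query above, a year in which no ID was active is simply missing from the output, because the join is an inner join.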
Related
I have a table that contains an orderId, a timestamp and a customerId, like this:
DROP TABLE IF EXISTS testdata;
CREATE TABLE testdata (
`orderId` int,
`createdOn` datetime(6),
`customerId` int,
PRIMARY KEY (`orderId`)
);
INSERT INTO testdata (orderId, createdOn, customerId) VALUES
('1000001','2020-01-01 17:08:41.460000','101'),
('1000002','2020-01-02 18:01:00.180000','102'),
('1000003','2020-01-03 12:26:02.460000','103'),
('1000004','2020-01-04 13:32:42.610000','104'),
('1000005','2020-01-05 20:21:28.540000','101'),
('1000006','2020-01-06 11:54:20.530000','102'),
('1000007','2020-02-01 20:54:42.470000','102'),
('1000008','2020-02-02 10:21:29.470000','102'),
('1000009','2020-02-03 16:22:23.880000','102'),
('1000010','2020-02-04 16:22:23.880000','103'),
('1000011','2020-02-05 17:08:41.460000','103'),
('1000012','2020-02-06 18:01:00.180000','103'),
('1000013','2020-03-01 12:26:02.460000','102'),
('1000014','2020-03-02 13:32:42.610000','102'),
('1000015','2020-03-03 20:21:28.540000','103'),
('1000016','2020-03-04 11:54:20.530000','103'),
('1000017','2020-03-05 20:54:42.470000','104'),
('1000018','2020-03-06 10:21:29.470000','104'),
('1000019','2020-04-01 16:22:23.880000','103'),
('1000020','2020-04-02 16:22:23.880000','103'),
('1000021','2020-04-03 17:08:41.460000','103'),
('1000022','2020-04-04 18:01:00.180000','104'),
('1000023','2020-04-05 12:26:02.460000','104'),
('1000024','2020-04-06 13:32:42.610000','104'),
('1000025','2020-05-01 20:21:28.540000','103'),
('1000026','2020-05-02 11:54:20.530000','103'),
('1000027','2020-05-03 20:54:42.470000','104'),
('1000028','2020-05-04 10:21:29.470000','104'),
('1000029','2020-05-05 16:22:23.880000','105'),
('1000030','2020-05-06 16:22:23.880000','105'),
('1000031','2020-05-01 20:21:28.540000','104'),
('1000032','2020-05-02 11:54:20.530000','104'),
('1000033','2020-05-03 20:54:42.470000','104'),
('1000034','2020-05-04 10:21:29.470000','105'),
('1000035','2020-05-05 16:22:23.880000','105'),
('1000036','2020-05-06 16:22:23.880000','105')
;
Now I want to calculate for each month the number of customers that have been active (i.e., have an order) within the last 3 months (i.e., current month or the preceding two months).
I manage to calculate the active users for the current month, like this:
SELECT
EXTRACT(YEAR_MONTH FROM createdOn) AS order_createdOn_ym
,COUNT(DISTINCT customerId) AS mau
FROM testdata
GROUP BY order_createdOn_ym
ORDER BY order_createdOn_ym asc
;
(Fiddle over here.)
However, I'm completely stumped as to how you can approach calculating the 3-months-active users.
Any help is greatly appreciated!
Here is one option:
select c.createdmonth, count(distinct customerid) as mau
from (
select distinct date_format(createdon, '%Y-%m-01') as createdmonth
from testdata
) c
left join testdata t
on t.createdon >= c.createdmonth - interval 2 month
and t.createdon < c.createdmonth + interval 1 month
group by c.createdmonth
The idea is to enumerate the distinct months, then bring in the table with a left join that picks up the rows from the current month and the two preceding months. You can then aggregate and count the number of distinct customers per group.
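To make the enumerate-then-join idea easy to try out, here is a minimal sketch that runs against SQLite via Python's sqlite3 module. It is an adaptation, not the MySQL query itself: the date arithmetic uses SQLite's date() modifiers, the fractional seconds are trimmed, and only the first order per customer and month from the sample data is loaded to keep it short.

```python
import sqlite3

# (orderId, createdOn, customerId): first order per customer and month,
# taken from the question's sample data with fractional seconds trimmed.
ORDERS = [
    (1000001, "2020-01-01 17:08:41", 101),
    (1000002, "2020-01-02 18:01:00", 102),
    (1000003, "2020-01-03 12:26:02", 103),
    (1000004, "2020-01-04 13:32:42", 104),
    (1000007, "2020-02-01 20:54:42", 102),
    (1000010, "2020-02-04 16:22:23", 103),
    (1000013, "2020-03-01 12:26:02", 102),
    (1000015, "2020-03-03 20:21:28", 103),
    (1000017, "2020-03-05 20:54:42", 104),
    (1000019, "2020-04-01 16:22:23", 103),
    (1000022, "2020-04-04 18:01:00", 104),
    (1000025, "2020-05-01 20:21:28", 103),
    (1000027, "2020-05-03 20:54:42", 104),
    (1000029, "2020-05-05 16:22:23", 105),
]

QUERY = """
SELECT strftime('%Y-%m', c.month_start) AS order_month,
       COUNT(DISTINCT t.customerId)     AS mau_3m
FROM (
    -- one row per month that has at least one order
    SELECT DISTINCT date(createdOn, 'start of month') AS month_start
    FROM testdata
) AS c
LEFT JOIN testdata AS t
  -- window: start of the month two months back, up to the end of this month
  ON t.createdOn >= date(c.month_start, '-2 months')
 AND t.createdOn <  date(c.month_start, '+1 month')
GROUP BY c.month_start
ORDER BY c.month_start
"""


def three_month_active(orders):
    """Return (month, distinct customers active in that month or the two before)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE testdata (orderId INTEGER PRIMARY KEY, createdOn TEXT, customerId INTEGER)"
    )
    conn.executemany("INSERT INTO testdata VALUES (?, ?, ?)", orders)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


mau_3m = three_month_active(ORDERS)
print(mau_3m)
```

Using a half-open range (>= start, < start of next month) avoids having to reason about the time-of-day part of createdOn on the last day of a month.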
Thanks to @GMB for providing the solution. Purely as a matter of taste, I prefer to express the month interval the following way:
SELECT date_format(c.end_of_createdOn_month, '%Y-%m') as order_month,
count(distinct customerid) as mau_3m
FROM (
select distinct LAST_DAY(createdOn) as end_of_createdOn_month
from testdata
) c
LEFT JOIN testdata t
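ON t.createdon >= date_format(c.end_of_createdOn_month - interval 2 month, '%Y-%m-01') -- first day of the month two months back
AND t.createdon < c.end_of_createdOn_month + interval 1 day -- keep orders placed during the last day of the month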
GROUP BY c.end_of_createdOn_month;
Click link below to see image.
I need to get the time diff highlighted in red and the time diff in blue. Then add the time together to get the total.
Below is the query that will show all the records.
For the first 2 records I need the time difference, which is around 3 days 4 hours; for the last 2 records it should only be 2 minutes. So the total should be around 3 days, 4 hours and 2 minutes.
query image
select so
, createDate
, o
, n
from userTrans
where ( n = 10 OR o = 10 ) and so = 'g220'
Below I grouped the records to get the total time. This is not what I want, because it takes the difference between the overall min and max time. The result turns out to be 32 hours.
select so
, min(createDate) minDate
, max(createDate) maxDate
, TIMESTAMPDIFF(MINUTE, min(createDate), max(createDate)) diff
, CONCAT(
FLOOR(HOUR(TIMEDIFF(min(createDate), max(createDate))) / 24), ' days ',
MOD(HOUR(TIMEDIFF(min(createDate), max(createDate))), 24), ' hours ',
MINUTE(TIMEDIFF(min(createDate), max(createDate))), ' minutes') diff1
, count(*) hits
from userTrans
where ( n = 10 OR o = 10 ) and so = 'g220'
group by so
order by TIMESTAMPDIFF(MINUTE, min(createDate), max(createDate)) DESC
Will this be possible to achieve? I hope I was clear.
Thank you
You can use a correlated subquery to align the 2 date/time references for a calculation.
SELECT
*
, TIME_TO_SEC(TIMEDIFF(createDate,nextcreatedate))
FROM (
SELECT
so
, createDate
, (
SELECT
createDate
FROM table1 AS t2
WHERE t2.so = t1.so
AND t2.createDate > t1.createDate
ORDER BY
createDate
LIMIT 1
)
AS nextcreatedate
, n
, test
FROM table1 AS t1
WHERE o = 0
) d
In other DBMSs, using lead() might provide a more convenient way of getting the next createDate value.
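As a rough illustration of the lead() variant, the sketch below uses SQLite (3.25 or newer, for window functions) through Python's sqlite3 module. The rows are hypothetical, since the question's data is only available as an image, and the reading of o and n as "old" and "new" status values is an assumption: a row with n = 10 is taken to open an interval and the next matching row to close it.

```python
import sqlite3

# Hypothetical rows (so, createDate, o, n); the real data is only in the image.
ROWS = [
    ("g220", "2020-01-01 08:00:00", 5, 10),   # enters status 10
    ("g220", "2020-01-04 12:00:00", 10, 7),   # leaves status 10 (3 days 4 hours later)
    ("g220", "2020-01-04 13:00:00", 7, 3),    # unrelated change, filtered out
    ("g220", "2020-01-05 09:00:00", 3, 10),   # enters status 10 again
    ("g220", "2020-01-05 09:02:00", 10, 8),   # leaves status 10 (2 minutes later)
    ("x999", "2020-01-02 00:00:00", 1, 10),   # different so, filtered out
]

QUERY = """
SELECT SUM(
         CAST(strftime('%s', next_date)  AS INTEGER)
       - CAST(strftime('%s', createDate) AS INTEGER)
       ) AS total_seconds
FROM (
    SELECT createDate, n,
           -- timestamp of the following row for the same so
           LEAD(createDate) OVER (PARTITION BY so ORDER BY createDate) AS next_date
    FROM userTrans
    WHERE (n = 10 OR o = 10) AND so = 'g220'
) AS d
WHERE n = 10                 -- only rows that open an interval
  AND next_date IS NOT NULL  -- ignore an interval that is still open
"""


def total_seconds_in_status(rows):
    """Sum the seconds between each 'enter' row and the row that follows it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE userTrans (so TEXT, createDate TEXT, o INTEGER, n INTEGER)")
    conn.executemany("INSERT INTO userTrans VALUES (?, ?, ?, ?)", rows)
    (seconds,) = conn.execute(QUERY).fetchone()
    conn.close()
    return seconds


total_seconds = total_seconds_in_status(ROWS)
days, rest = divmod(total_seconds, 86400)
hours, rest = divmod(rest, 3600)
minutes = rest // 60
print(f"{days} days {hours} hours {minutes} minutes")
```

Summing the per-interval differences, rather than taking max minus min, is what keeps the gap between the two intervals out of the total.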
I have used the query function in SlamData.
My code:
SELECT
DATE_PART("year",thedate) AS year,DATE_PART("month",thedate) AS month,
SUM(runningPnL) AS PnL
FROM "/Mickey/testdb/sampledata3" AS c
GROUP BY DATE_PART("year", thedate) ,DATE_PART("month", thedate)
order by DATE_PART("year", thedate) ,DATE_PART("month", thedate)
The extract of my table:
PnL month year
-1651.8752 1 2001
17180.4776 2 2001
48207.54560000001 3 2001
Now, how can I find the cumulative sum of the PnL?
e.g. -1651.8752 for the first month
15528.6024 for the second month
Thank you very much >.<
I generated the same sample data as yours to demonstrate a cumulative sum. Hopefully this gives you an idea.
Create table tempData
(
pnl float,
[month] int,
[year] int
)
Go
insert into tempData values ( -1651.8752, 1,2001)
insert into tempData values ( 17180.4776, 2,2001)
insert into tempData values ( 48207.54560000001, 3,2001)
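Select * , (SELECT SUM(Alias.pnl)
FROM tempData As Alias
-- compare the year as well as the month so the sum stays correct across years
WHERE Alias.[year] < tempData.[year]
OR (Alias.[year] = tempData.[year] AND Alias.[month] <= tempData.[month])
) As CumulativeSum
FROM tempData
ORDER BY tempData.[year], tempData.[month]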
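Where window functions are available, a running total can also be written with SUM() OVER. The following is a small sketch using SQLite (3.25+) via Python's sqlite3 module with the three sample rows; whether SlamData's SQL dialect accepts the same syntax is not something this thread confirms.

```python
import sqlite3

# (pnl, month, year): the three sample rows from the question.
ROWS = [
    (-1651.8752, 1, 2001),
    (17180.4776, 2, 2001),
    (48207.54560000001, 3, 2001),
]

QUERY = """
SELECT year, month, pnl,
       -- running total over all rows up to and including the current month
       SUM(pnl) OVER (ORDER BY year, month) AS cumulative_pnl
FROM tempData
ORDER BY year, month
"""


def running_pnl(rows):
    """Return (year, month, pnl, cumulative_pnl) tuples in chronological order."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tempData (pnl REAL, month INTEGER, year INTEGER)")
    conn.executemany("INSERT INTO tempData VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


running = running_pnl(ROWS)
for year, month, pnl, cumulative in running:
    print(year, month, pnl, round(cumulative, 4))
```

Ordering the window by year and then month keeps the total correct once the data spans more than one year.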
done
my code is
SELECT a1.year, a1.month, a1.PnL, a1.PnL/(SUM(a2.PnL)+125000) as Running_Total
FROM "/Mickey/testdb/sampledata6" as a1, "/Mickey/testdb/sampledata6" as a2
WHERE (a1.month > a2.month And a1.year=a2.year) or (a1.year>a2.year)
GROUP BY a1.year, a1.month,a1.PnL
ORDER BY a1.year,a1.month ASC;
I have a single table with rows like this: (Date, Score, Name)
The Date field has two possible dates, and it's possible that a Name value will appear under only one date (if that name was recently added or removed).
I'm looking to get a table with rows like this: (Delta, Name), where delta is the score change for each name between the earlier and later dates. In addition, only a negative change interests me, so if Delta>=0, it shouldn't appear in the output table at all.
My main challenge is calculating the Delta field.
As stated in the title, it should be an SQL query.
Thanks in advance for any help!
I assumed that each name can have its own start/end dates. It can be simplified significantly if there are only two possible dates for the entire table.
I tried this out in SQL Fiddle here
SELECT (score_end - score_start) delta, name_start
FROM
( SELECT date date_start, score score_start, name name_start
FROM t t
WHERE NOT EXISTS
( SELECT 1
FROM t x
WHERE x.date < t.date
AND x.name = t.name
)
) AS start_date_t
JOIN
( SELECT date date_end, score score_end, name name_end
FROM t t
WHERE NOT EXISTS
( SELECT 1
FROM t x
WHERE x.date > t.date
AND x.name = t.name
)
) end_date_t ON start_date_t.name_start = end_date_t.name_end
WHERE score_end-score_start < 0
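For the simpler case mentioned above, where the whole table contains exactly two dates, a plain self-join is enough. This is a minimal sketch under that assumption, run against SQLite via Python's sqlite3 module; the sample rows and the aliases earlier and later are made up for the illustration.

```python
import sqlite3

# Hypothetical rows (date, score, name). C exists only on the earlier date
# and D only on the later one, so neither can produce a delta.
ROWS = [
    ("2014-01-01", 10, "A"), ("2014-02-01", 7, "A"),
    ("2014-01-01", 5, "B"),  ("2014-02-01", 8, "B"),
    ("2014-01-01", 4, "C"),
    ("2014-02-01", 9, "D"),
    ("2014-01-01", 20, "E"), ("2014-02-01", 12, "E"),
]

QUERY = """
SELECT later.score - earlier.score AS delta, earlier.name
FROM t AS earlier
JOIN t AS later
  ON later.name = earlier.name
WHERE earlier.date = (SELECT MIN(date) FROM t)  -- the earlier of the two dates
  AND later.date   = (SELECT MAX(date) FROM t)  -- the later of the two dates
  AND later.score - earlier.score < 0           -- keep negative changes only
ORDER BY earlier.name
"""


def negative_deltas(rows):
    """Return (delta, name) for names whose score dropped between the two dates."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (date TEXT, score INTEGER, name TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


deltas = negative_deltas(ROWS)
print(deltas)
```

The inner join drops names that appear under only one date, which matches the requirement that such names should not show up in the output.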
Let's say you have a table with the columns date_value and sum_value.
Then it should be something like this:
select t.date_value,sum_value,
sum_value - COALESCE((
select top 1 sum_value
from tmp_num
where date_value > t.date_value
order by date_value
),0) as sum_change
from tmp_num as t
order by t.date_value
The following uses a "trick" in MySQL that I don't really like using, because it turns the score into a string and then back into a number. But, it is an easy way to get what you want:
select t.name, (lastscore - firstscore) as diff
from (select t.name,
substring_index(group_concat(score order by date asc), ',', 1) as firstscore,
substring_index(group_concat(score order by date desc), ',', 1) as lastscore
from table t
group by t.name
) t
where lastscore - firstscore < 0;
If MySQL supported window functions, such tricks wouldn't be necessary. (MySQL 8.0 and later do support them, so this trick is only needed on older versions.)
I have a query that works correctly to pull a series of targets and total hours worked for company A. I would like to run the exact same query for company B and join them on a common date, which happens to be grouped by week. My current query:
SELECT * FROM (
SELECT org, date,
( SELECT SUM( target ) FROM target WHERE org = "companyA" ) AS companyA_target,
SUM( hours ) AS companyA_actual
FROM time_management_system
WHERE org = "companyA"
GROUP BY WEEK( date )
ORDER BY DATE
) q1
LEFT JOIN (
SELECT org, date,
( SELECT SUM( target ) FROM target WHERE org = "companyB" ) AS companyB_target,
SUM( hours ) AS companyB_actual
FROM time_management_system
WHERE org = "companyB"
GROUP BY WEEK( date )
ORDER BY DATE
) q2
ON q1.date = q2.date
The results show all of the dates / information of companyA, however companyB only shows sporadic data. Separately, the two queries will show the exact same set of dates, just with different information in the 'target' and 'actual' columns.
companyA 2012-01-28 105.00 39.00 NULL NULL NULL NULL
companyA 2012-02-05 105.00 15.00 NULL NULL NULL NULL
companyA 2012-02-13 105.00 60.50 companyB 2012-02-13 97.50 117.50
Any idea why I'm not getting all the information for companyB?
As a side note, would anybody be able to point in the direction of converting each row's week value into a column? With companyA and companyB as the only two rows?
I appreciate all the help! Thanks.
With no date apparent in the target table, the summation will be constant across all weeks. So, I have performed a pre-query for only those "org" values of company A and B with a group by. This will ensure only 1 record per "org" so you don't get a Cartesian result.
Then, I am querying the time_management_system ONCE for BOTH companies. Within the field computations, I am applying an IF() to test the company value and apply when correct. The WEEK activity is the same for both in the final result, so I don't have to do separately and join. This also prevents the need of having the date column appear twice. I also don't need to explicitly add the org column names as the final column names reflect that.
SELECT
WEEK( tms.date ) as GrpWeek,
MAX( IF( tms.org = "companyA", TargetSums.CompTarget, 00000.00 )) as CompanyATarget,
SUM( IF( tms.org = "companyA", tms.hours, 0000.00 )) as CompanyAHours,
MAX( IF( tms.org = "companyB", TargetSums.CompTarget, 00000.00 )) as CompanyBTarget,
SUM( IF( tms.org = "companyB", tms.hours, 000.00 )) as CompanyBHours
from
time_management_system tms
JOIN ( select
t.org,
SUM( t.target ) as CompTarget
from
target t
where
t.org in ( "companyA", "companyB" )
group by
t.org ) as TargetSums
ON tms.org = TargetSums.org
where
tms.org in ( "companyA", "companyB" )
group by
WEEK( tms.date )
order by
WEEK( tms.date )
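The single-pass, conditional-aggregation approach described above can be tried out with the following sketch. It is an adaptation for SQLite through Python's sqlite3 module, so CASE replaces IF() and strftime('%W', ...) stands in for WEEK(); the rows and the alias ts are hypothetical.

```python
import sqlite3

# Hypothetical rows: (org, date, hours) and (org, target).
HOURS = [
    ("companyA", "2012-02-13", 5.0),
    ("companyA", "2012-02-14", 3.0),
    ("companyA", "2012-02-20", 4.0),
    ("companyB", "2012-02-13", 7.0),
    ("companyB", "2012-02-21", 2.0),
]
TARGETS = [("companyA", 60.0), ("companyA", 45.0), ("companyB", 97.5)]

QUERY = """
SELECT strftime('%W', tms.date) AS grp_week,
       MAX(CASE WHEN tms.org = 'companyA' THEN ts.comp_target END)      AS a_target,
       SUM(CASE WHEN tms.org = 'companyA' THEN tms.hours ELSE 0 END)    AS a_hours,
       MAX(CASE WHEN tms.org = 'companyB' THEN ts.comp_target END)      AS b_target,
       SUM(CASE WHEN tms.org = 'companyB' THEN tms.hours ELSE 0 END)    AS b_hours
FROM time_management_system AS tms
JOIN (
    -- one row per org, so the join cannot multiply the hours rows
    SELECT org, SUM(target) AS comp_target
    FROM target
    GROUP BY org
) AS ts ON ts.org = tms.org
WHERE tms.org IN ('companyA', 'companyB')
GROUP BY strftime('%W', tms.date)
ORDER BY strftime('%W', tms.date)
"""


def weekly_pivot(hours, targets):
    """Return one row per week with target and hours for both companies."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE time_management_system (org TEXT, date TEXT, hours REAL)")
    conn.execute("CREATE TABLE target (org TEXT, target REAL)")
    conn.executemany("INSERT INTO time_management_system VALUES (?, ?, ?)", hours)
    conn.executemany("INSERT INTO target VALUES (?, ?)", targets)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


weekly = weekly_pivot(HOURS, TARGETS)
for row in weekly:
    print(row)
```

Because both companies are aggregated in the same grouped pass, a week appears once with both companies' figures side by side, and there is no second date column to join on.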
Both of your subqueries are wrong.
Either you want this:
SELECT
org,
WEEK(date),
( SELECT SUM( target ) FROM target WHERE org = "companyB" ) AS companyB_target,
SUM( hours ) AS companyB_actual
FROM time_management_system
WHERE org = "companyB"
GROUP BY WEEK( date )
Or else you want this:
SELECT
org,
date,
( SELECT SUM( target ) FROM target WHERE org = "companyB" ) AS companyB_target,
SUM( hours ) AS companyB_actual
FROM time_management_system
WHERE org = "companyB"
GROUP BY date
The way you are doing it now is not correctly formed SQL. In pretty much any other database your query would fail immediately with an error. MySQL is more lax and runs the query but gives indeterminate results.
See "GROUP BY and HAVING with Hidden Columns" in the MySQL manual.