How to give 'where' condition in select query while selecting the columns? - mysql

I want to apply a condition to a single column in the select list of a select statement.
I want to average TOTAL_TIMEONSITE per visitor over the values recorded in Jun'20, Jul'20 and Aug'20, and rename the result.
The query as a whole must cover Aug'20 only, so the three-month constraint should apply to the TOTAL_TIMEONSITE average alone.
select FULLVISITORID AS VISITOR_ID,
VISITID AS VISIT_ID,
VISITSTARTTIME_TS,
USER_ACCOUNT_TYPE,
(select AVG(TOTAL_TIMEONSITE) AS AVG_TOTAL_TIME_ON_SITE_LAST_3M FROM "ACRO_DEV"."GA"."GA_MAIN" WHERE
(cast((visitstarttime_ts) as DATE) >= to_date('2020-06-01 00:00:00.000') and CAST((visitstarttime_ts) AS DATE) <= to_date('2020-08-31 23:59:00.000'))
GROUP BY TOTAL_TIMEONSITE),
CHANNELGROUPING,
GEONETWORK_CONTINENT
from "ACRO_DEV"."GA"."GA_MAIN"
where (FULLVISITORID) in (select distinct (FULLVISITORID) from "ACRO_DEV"."GA"."GA_MAIN" where user_account_type in ('anonymous', 'registered')
and (cast((visitstarttime_ts) as DATE) >= to_date('2020-08-01 00:00:00.000') and CAST((visitstarttime_ts) AS DATE) <= to_date('2020-08-31 23:59:00.000')));
The issue is that the resulting column is named after the text of the subquery on TOTAL_TIMEONSITE, and the values in that column are all the same, whereas I want a separate value for each visitor.

For Snowflake:
I am going to assume visitstarttime_ts is a timestamp, so
cast((visitstarttime_ts) as DATE) is the same as visitstarttime_ts::date:
select to_timestamp('2020-08-31 23:59:00') as ts
,cast((ts) as DATE) as date_a
,ts::date as date_b;
gives:
TS                       DATE_A      DATE_B
2020-08-31 23:59:00.000  2020-08-31  2020-08-31
and thus the date range can also be written more simply:
select to_timestamp('2020-08-31 13:59:00') as ts
,cast((ts) as DATE) as date_a
,ts::date as date_b
,date_a >= to_date('2020-08-01 00:00:00.000') and date_a <= to_date('2020-08-31 23:59:00.000') as comp_a
,date_b >= to_date('2020-08-01 00:00:00.000') and date_b <= to_date('2020-08-31 23:59:00.000') as comp_b
,date_b >= '2020-08-01'::date and date_b <= '2020-08-31 23:59:00.000'::date as comp_c
,date_b between '2020-08-01'::date and '2020-08-31 23:59:00.000'::date as comp_d;
gives:
TS                       DATE_A      DATE_B      COMP_A  COMP_B  COMP_C  COMP_D
2020-08-31 13:59:00.000  2020-08-31  2020-08-31  TRUE    TRUE    TRUE    TRUE
Anyway, if I understand what you want, I would write it using CTEs, which I find more readable:
with distinct_aug_ids as (
SELECT DISTINCT
fullvisitorid
FROM acro_dev.ga.ga_main
WHERE user_account_type IN ('anonymous', 'registered')
AND visitstarttime_ts::date BETWEEN '2020-08-01'::date AND '2020-08-31'::date
), three_month_avg as (
SELECT
fullvisitorid
,AVG(total_timeonsite) AS avg_total_time_on_site_last_3m
FROM acro_dev.ga.ga_main
WHERE visitstarttime_ts::DATE BETWEEN to_date('2020-06-01 00:00:00.000') AND to_date('2020-08-31 23:59:00.000')
GROUP BY 1
)
select
m.fullvisitorid as visitor_id,
m.visitid as visit_id,
m.visitstarttime_ts,
m.user_account_type,
tma.avg_total_time_on_site_last_3m,
m.channelgrouping,
m.geonetwork_continent
FROM acro_dev.ga.ga_main as m
JOIN distinct_aug_ids AS dai
ON m.fullvisitorid = dai.fullvisitorid
JOIN three_month_avg AS tma
ON m.fullvisitorid = tma.fullvisitorid
;
But if you would rather use sub-selects, the equivalent query is:
select
m.fullvisitorid as visitor_id,
m.visitid as visit_id,
m.visitstarttime_ts,
m.user_account_type,
tma.avg_total_time_on_site_last_3m,
m.channelgrouping,
m.geonetwork_continent
FROM acro_dev.ga.ga_main as m
JOIN (
SELECT DISTINCT
fullvisitorid
FROM acro_dev.ga.ga_main
WHERE user_account_type IN ('anonymous', 'registered')
AND visitstarttime_ts::date BETWEEN '2020-08-01'::date AND '2020-08-31'::date
) AS dai
ON m.fullvisitorid = dai.fullvisitorid
JOIN (
SELECT
fullvisitorid
,AVG(total_timeonsite) AS avg_total_time_on_site_last_3m
FROM acro_dev.ga.ga_main
WHERE visitstarttime_ts::DATE BETWEEN to_date('2020-06-01 00:00:00.000') AND to_date('2020-08-31 23:59:00.000')
GROUP BY 1
)AS tma
ON m.fullvisitorid = tma.fullvisitorid
;
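To try the per-visitor average idea without a Snowflake account, here is a minimal sketch that runs the same CTE shape against an in-memory SQLite database from Python. The function name, the simplified ga_main columns (visit_date stored as a 'YYYY-MM-DD' string) and the omission of the user_account_type filter are assumptions made for illustration, and SQLite stands in for Snowflake, so the date handling is not identical.

```python
import sqlite3


def visits_with_three_month_avg(rows):
    """rows: (fullvisitorid, visitid, visit_date 'YYYY-MM-DD', total_timeonsite)."""
    con = sqlite3.connect(":memory:")
    # Hypothetical, cut-down version of the ga_main table from the question.
    con.execute(
        "CREATE TABLE ga_main (fullvisitorid TEXT, visitid INTEGER, "
        "visit_date TEXT, total_timeonsite REAL)"
    )
    con.executemany("INSERT INTO ga_main VALUES (?, ?, ?, ?)", rows)
    query = """
        WITH distinct_aug_ids AS (
            -- visitors with at least one visit in August 2020
            SELECT DISTINCT fullvisitorid
            FROM ga_main
            WHERE visit_date BETWEEN '2020-08-01' AND '2020-08-31'
        ), three_month_avg AS (
            -- one average per visitor over June to August 2020
            SELECT fullvisitorid,
                   AVG(total_timeonsite) AS avg_total_time_on_site_last_3m
            FROM ga_main
            WHERE visit_date BETWEEN '2020-06-01' AND '2020-08-31'
            GROUP BY fullvisitorid
        )
        SELECT m.fullvisitorid, m.visitid, tma.avg_total_time_on_site_last_3m
        FROM ga_main AS m
        JOIN distinct_aug_ids AS dai ON m.fullvisitorid = dai.fullvisitorid
        JOIN three_month_avg AS tma ON m.fullvisitorid = tma.fullvisitorid
        ORDER BY m.fullvisitorid, m.visitid
    """
    result = con.execute(query).fetchall()
    con.close()
    return result
```

Because the average is grouped by fullvisitorid and joined back on it, each visitor gets their own value instead of one figure repeated on every row. As in the queries above, every row of a visitor seen in August is returned, including rows outside August.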

Related

MySql - Selecting MAX & MIN and returning the corresponding rows

I am trying to get the min and max prices in my table for the last 6 months and display them grouped by month. My query is not returning the corresponding row values, such as the date and time at which the max or min price occurred.
I want to select the min and max prices, the date and time at which each occurred, and the rest of the data for those rows.
(I use CONCAT for report_term because I need to print it with the dataset when displaying results, e.g. February 2018 -> ...., January 2018 -> ...)
SELECT metal_price_id, CONCAT(MONTHNAME(metal_price_datetime), ' ', YEAR(metal_price_datetime)) AS report_term, max(metal_price) as highest_gold_price, metal_price_datetime FROM metal_prices_v2
WHERE metal_id = 1
AND DATEDIFF(NOW(), metal_price_datetime) BETWEEN 0 AND 180
GROUP BY report_term
ORDER BY metal_price_datetime DESC
I have made an example, extract from my DB:
http://sqlfiddle.com/#!9/617bcb2/4/0
My desired result would be the min and max prices grouped by month, with the date of the min and the date of the max, all within the last 6 months.
thanks
UPDATE:
The code below works, but it returns rows from beyond the 180 days specified. I have just checked, and it is because it joins on the price alone, which may be duplicated a number of times over the years. See: http://sqlfiddle.com/#!9/5f501b/1
You could inner join twice on subselects for the min and the max:
select a.metal_price_datetime
, t1.highest_gold_price
, t1.report_term
, t2.lowest_gold_price
,t2.metal_price_datetime
from metal_prices_v2 a
inner join (
SELECT CONCAT(MONTHNAME(metal_price_datetime), ' ', YEAR(metal_price_datetime)) AS report_term
, max(metal_price) as highest_gold_price
from metal_prices_v2
WHERE metal_id = 1
AND DATEDIFF(NOW(), metal_price_datetime) BETWEEN 0 AND 180
GROUP BY report_term
) t1 on t1.highest_gold_price = a.metal_price
inner join (
select a.metal_price_datetime
, t.lowest_gold_price
, t.report_term
from metal_prices_v2 a
inner join (
SELECT CONCAT(MONTHNAME(metal_price_datetime), ' ', YEAR(metal_price_datetime)) AS report_term
, min(metal_price) as lowest_gold_price
from metal_prices_v2
WHERE metal_id = 1
AND DATEDIFF(NOW(), metal_price_datetime) BETWEEN 0 AND 180
GROUP BY report_term
) t on t.lowest_gold_price = a.metal_price
) t2 on t2.report_term = t1.report_term
Here is a simplified version of what you should do, so you can learn the working process.
You need to calculate the min() and max() for the periods you need. That is the first brick of this building.
You have tableA; you calculate min() and call the result R1:
SELECT group_field, MIN(value) AS min_value
FROM TableA
GROUP BY group_field
Do the same for max() and call it R2:
SELECT group_field, MAX(value) AS max_value
FROM TableA
GROUP BY group_field
Now you need to bring in all the data from the original fields, so you join each result with your original table.
We call those T1 and T2:
SELECT tableA.group_field, tableA.value, tableA.date
FROM tableA
JOIN ( ... .. ) as R1
ON tableA.group_field = R1.group_field
AND tableA.value = R1.min_value
SELECT tableA.group_field, tableA.value, tableA.date
FROM tableA
JOIN ( ... .. ) as R2
ON tableA.group_field = R2.group_field
AND tableA.value = R2.max_value
Now we join T1 and T2.
SELECT *
FROM ( .... ) as T1
JOIN ( .... ) as T2
ON T1.group_field = T2.group_field
So the idea is: once you can build one brick, you build the next one. You can then also add filters, such as the last 6 months or anything else you need.
In this case the group_field is the CONCAT() value.
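The brick-by-brick steps can be tried end to end with a small sketch like the one below, which uses Python's sqlite3 module as a stand-in for MySQL. The table name prices, its two columns and the function name are hypothetical, strftime('%Y-%m', ...) replaces the CONCAT(MONTHNAME(...), ...) grouping value, and R1 and R2 are folded into one CTE for brevity.

```python
import sqlite3


def monthly_extremes(rows):
    """rows: (price_datetime 'YYYY-MM-DD HH:MM:SS', price)."""
    con = sqlite3.connect(":memory:")
    # Hypothetical stand-in for metal_prices_v2, reduced to two columns.
    con.execute("CREATE TABLE prices (price_datetime TEXT, price REAL)")
    con.executemany("INSERT INTO prices VALUES (?, ?)", rows)
    query = """
        WITH r AS (
            -- first brick: min and max per month (R1 and R2 combined)
            SELECT strftime('%Y-%m', price_datetime) AS term,
                   MIN(price) AS min_price,
                   MAX(price) AS max_price
            FROM prices
            GROUP BY strftime('%Y-%m', price_datetime)
        )
        -- next bricks: join back on BOTH the group and the value
        SELECT r.term,
               lo.price_datetime AS min_datetime, r.min_price,
               hi.price_datetime AS max_datetime, r.max_price
        FROM r
        JOIN prices AS lo
          ON strftime('%Y-%m', lo.price_datetime) = r.term
         AND lo.price = r.min_price
        JOIN prices AS hi
          ON strftime('%Y-%m', hi.price_datetime) = r.term
         AND hi.price = r.max_price
        ORDER BY r.term
    """
    result = con.execute(query).fetchall()
    con.close()
    return result
```

Joining on the month as well as the price is what keeps a price that repeats in another month from matching the wrong row. If the same extreme price occurs twice within one month, this sketch returns one row per occurrence.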

MySQL Derived Table Issue

I'd like to be able to output the following fields from the query below:
AddedById,AddedByName,HoursWorked,CurrentYearlyFlexiAvailable
However, where I have WHERE addedby=1, I'd like to use the field AddedById instead. I do not want to hard-code this value, because the overall query should and will return more than one person; I want to get the value from the rpt_timesheet_data view, where it is available. The CurrentYearlyFlexiAvailable field should tell me how much time each person has left for the current year to date, by doing the calculation between the SELECT SUM(ttl) and the FROM (SELECT SUM(worked)-420 AS ttl ...) derived table.
SELECT AddedById,AddedByName,SUM(HoursWorked) AS HoursWorked
,(SELECT SUM(ttl) - (
SELECT SUM(worked)
FROM vwtimesheet
WHERE addedby=AddedById
AND entrydate BETWEEN '2017-01-01' AND '2017-04-13'
AND activityid=3192
GROUP BY addedby ) AS flexihours
FROM (
SELECT SUM(worked)-420 AS ttl
FROM vwtimesheet
WHERE addedby=1 -- HERE IS THE ISSUE
AND entrydate BETWEEN '2017-01-01' AND '2017-04-13'
AND projectid<>113 AND activityid<>3192
GROUP BY entrydate
HAVING SUM(worked)>420
) AS s) AS CurrentYearlyFlexiAvailable
FROM rpt_timesheet_data
WHERE entrydate BETWEEN '2017-04-02' AND '2017-04-13 23:59:59'
AND ActivityId=3192
GROUP BY AddedById,AddedByName
ORDER BY AddedByName
but I keep getting:
Error Code: 1054. Unknown column 'AddedById' in 'where clause'
Just in that one location. I've tried various queries to sort this out, but I just cannot figure it out. Sorry, I'm not too good at explaining this; I can see in my head what I want to do...
Here is a query that does something very similar, in that it returns the results for a single user, whereas the one above is meant to go through all users and give me the results:
SELECT addedbyname, SUM(ttl) -
(SELECT SUM(worked)
FROM vwtimesheet
WHERE addedby=1
AND entrydate BETWEEN '2017-01-01' AND '2017-04-13'
AND activityid=3192
GROUP BY addedby ) AS CurrentYearlyFlexiAvailable
,(SELECT SUM(worked) FROM vwtimesheet
WHERE addedby=1
AND entrydate BETWEEN '2017-01-01' AND '2017-04-13'
AND activityid=3192
GROUP BY addedby ) AS flexiused
,(SELECT sum(worked) FROM vwtimesheet
WHERE addedby=1
AND entrydate BETWEEN DATE_FORMAT(NOW() ,'%Y-%m-01') AND curdate()
AND activityid=3192
GROUP BY addedby ) as fleximonthused
FROM ( SELECT entrydate,addedbyname,SUM(worked)-420 AS ttl FROM vwtimesheet
WHERE addedby=1
AND entrydate BETWEEN '2017-01-01' AND '2017-04-13'
AND projectid<>113
AND activityid<>3192
GROUP BY entrydate,addedbyname
HAVING SUM(worked)>420
) AS s
Please try the following...
SELECT AddedById,
AddedByName,
SUM( HoursWorked ) AS HoursWorked,
SUM( ttl ) - sumWorked AS CurrentYearlyFlexiAvailable
FROM ( SELECT AddedById AS AddedById,
AddedByName AS AddedByName
FROM rpt_timesheet_data
GROUP BY AddedById
) AS AddedByFinder
JOIN ( SELECT addedby AS addedby,
entrydate AS entrydate,
SUM( worked ) - 420 AS ttl
FROM vwtimesheet
WHERE entrydate BETWEEN '2017-01-01' AND '2017-04-13'
AND projectid <> 113
AND activityid <> 3192
GROUP BY addedby,
entrydate
HAVING SUM( worked ) > 420
) AS ttlFinder ON AddedByFinder.AddedById = ttlFinder.addedby
JOIN ( SELECT addedby AS addedby,
SUM( worked ) AS sumWorked
FROM vwtimesheet
WHERE entrydate BETWEEN '2017-01-01' AND '2017-04-13'
AND activityid = 3192
GROUP BY addedby
) sumWorkedFinder ON AddedByFinder.AddedById = sumWorkedFinder.addedby
WHERE entrydate BETWEEN '2017-04-02' AND '2017-04-13 23:59:59'
AND ActivityId = 3192
GROUP BY AddedById,
AddedByName
ORDER BY AddedByName;
(Explanation to follow...)
If you have any questions or comments, then please feel free to post a Comment accordingly.

tsql best way to merge records with start date and end date when there is no gap in between

I have a mapping table as follows :
FirstEntityID int
MappedTo int
BeginDate Date
EndDate Date
and let's say I have the following records in the table:
FirstEntityID MappedTo BeginDate EndDate
1 2 2012-09-01 2012-10-01
2 3 2012-09-01 2012-10-01
1 2 2012-10-02 2012-11-24
2 3 2012-11-01 2012-11-24
I need a script that takes this table and merges records based on the begin and end dates, returning a result like:
FirstEntityID MappedTo BeginDate EndDate
1 2 2012-09-01 2012-11-24
2 3 2012-09-01 2012-10-01
2 3 2012-11-01 2012-11-24
Using CTEs, we find the starting dates first:
; WITH StartD AS
( SELECT
FirstEntityID
, MappedTo
, BeginDate
, ROW_NUMBER() OVER( PARTITION BY FirstEntityID, MappedTo
ORDER BY BeginDate )
AS Rn
FROM
tableX AS t
WHERE
NOT EXISTS
( SELECT *
FROM tableX AS p
WHERE p.FirstEntityID = t.FirstEntityID
AND p.MappedTo = t.MappedTo
AND p.BeginDate < t.BeginDate
AND t.BeginDate <= DATEADD(day, 1, p.EndDate)
)
)
then the ending dates:
, EndD AS
( SELECT
FirstEntityID
, MappedTo
, EndDate
, ROW_NUMBER() OVER( PARTITION BY FirstEntityID, MappedTo
ORDER BY EndDate )
AS Rn
FROM
tableX AS t
WHERE
NOT EXISTS
( SELECT *
FROM tableX AS p
WHERE p.FirstEntityID = t.FirstEntityID
AND p.MappedTo = t.MappedTo
AND DATEADD(day, -1, p.BeginDate) <= t.EndDate
AND t.EndDate < p.EndDate
)
)
and the final result:
SELECT
s.FirstEntityID
, s.MappedTo
, s.BeginDate
, e.EndDate
FROM
StartD AS s
JOIN
EndD AS e
ON e.FirstEntityID = s.FirstEntityID
AND e.MappedTo = s.MappedTo
AND e.Rn = s.Rn ;
Tested in SQL-Fiddle
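For readers without SQL Server at hand, the StartD/EndD approach can be reproduced roughly as follows with Python's sqlite3 module. This is a sketch under assumptions: the function name is made up, dates are stored as 'YYYY-MM-DD' strings, SQLite's date(x, '+1 day') stands in for DATEADD(day, 1, x), and the bundled SQLite must be 3.25 or newer for ROW_NUMBER().

```python
import sqlite3


def merge_adjacent_periods(rows):
    """rows: (FirstEntityID, MappedTo, BeginDate, EndDate), dates as 'YYYY-MM-DD'."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE tableX (FirstEntityID INTEGER, MappedTo INTEGER, "
        "BeginDate TEXT, EndDate TEXT)"
    )
    con.executemany("INSERT INTO tableX VALUES (?, ?, ?, ?)", rows)
    query = """
        WITH StartD AS (
            -- begin dates that no earlier period touches or overlaps
            SELECT FirstEntityID, MappedTo, BeginDate,
                   ROW_NUMBER() OVER (PARTITION BY FirstEntityID, MappedTo
                                      ORDER BY BeginDate) AS Rn
            FROM tableX AS t
            WHERE NOT EXISTS (
                SELECT 1 FROM tableX AS p
                WHERE p.FirstEntityID = t.FirstEntityID
                  AND p.MappedTo = t.MappedTo
                  AND p.BeginDate < t.BeginDate
                  AND t.BeginDate <= date(p.EndDate, '+1 day'))
        ), EndD AS (
            -- end dates that no later-ending period touches or overlaps
            SELECT FirstEntityID, MappedTo, EndDate,
                   ROW_NUMBER() OVER (PARTITION BY FirstEntityID, MappedTo
                                      ORDER BY EndDate) AS Rn
            FROM tableX AS t
            WHERE NOT EXISTS (
                SELECT 1 FROM tableX AS p
                WHERE p.FirstEntityID = t.FirstEntityID
                  AND p.MappedTo = t.MappedTo
                  AND date(p.BeginDate, '-1 day') <= t.EndDate
                  AND t.EndDate < p.EndDate)
        )
        -- the n-th start pairs with the n-th end within each mapping
        SELECT s.FirstEntityID, s.MappedTo, s.BeginDate, e.EndDate
        FROM StartD AS s
        JOIN EndD AS e
          ON e.FirstEntityID = s.FirstEntityID
         AND e.MappedTo = s.MappedTo
         AND e.Rn = s.Rn
        ORDER BY s.FirstEntityID, s.MappedTo, s.BeginDate
    """
    result = con.execute(query).fetchall()
    con.close()
    return result
```

Fed the four sample rows from the question, it returns the three merged rows the question asks for.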
I tested this and it seems to work.
It will fail on an edge case with duplicate rows.
For that you would need to go with a ROW_NUMBER approach like Ypercube's,
or add a constraint on the table to force the rows to be unique.
-- first the overlaps
SELECT T1.FirstEntityId, T1.MappedTo, T1.BeginDate, Max(T2.EndDate) as [EndDate]
FROM tablex as T1
join tablex as T2
on T1.FirstEntityId = T2.FirstEntityId
and T1.MappedTo = T2.MappedTo
and T1.EndDate >= T2.BeginDate
and T1.EndDate < T2.EndDate
and T1.BeginDate <= T2.BeginDate
GROUP BY T1.FirstEntityId, T1.MappedTo, T1.BeginDate
union
-- add the non overlaps
SELECT T1.FirstEntityId, T1.MappedTo, T1.BeginDate, T1.EndDate
FROM tablex as T1
join tablex as T2
on T1.FirstEntityId = T2.FirstEntityId
and T1.MappedTo = T2.MappedTo
and ( T1.EndDate < T2.BeginDate or T1.BeginDate > T2.EndDate
or (T1.BeginDate < T2.BeginDate and T1.EndDate > T2.EndDate) )
order by FirstEntityId, MappedTo, BeginDate
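This might work, but note that it collapses each FirstEntityId/MappedTo pair into a single row, so it also merges periods separated by a gap: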
SELECT FirstEntityId, MappedTo, Min(BeginDate), Max(EndDate)
FROM
T1
GROUP BY
FirstEntityId, MappedTo

JIRA : Issue status count for the past x (i.e 30 ) days

With the query below I am able to see the number of issues for all issue types in JIRA on a given date,
i.e.
SELECT count(*), STEP.STEP_ID
FROM (SELECT STEP_ID, ENTRY_ID
FROM OS_CURRENTSTEP
WHERE OS_CURRENTSTEP.START_DATE < '<your date>'
UNION SELECT STEP_ID, ENTRY_ID
FROM OS_HISTORYSTEP
WHERE OS_HISTORYSTEP.START_DATE < '<your date>'
AND OS_HISTORYSTEP.FINISH_DATE > '<your date>' ) As STEP,
(SELECT changeitem.OLDVALUE AS VAL, changegroup.ISSUEID AS ISSID
FROM changegroup, changeitem
WHERE changeitem.FIELD = 'Workflow'
AND changeitem.GROUPID = changegroup.ID
UNION SELECT jiraissue.WORKFLOW_ID AS VAL, jiraissue.id as ISSID
FROM jiraissue) As VALID,
jiraissue as JI
WHERE STEP.ENTRY_ID = VALID.VAL
AND VALID.ISSID = JI.id
AND JI.project = <proj_id>
Group By STEP.STEP_ID;
the result is
Status Count
open 12
closed 13
..... ....
What I'd like to achieve is actually something like this, with the total count for statuses open and closed for each day:
Date COUNT(Open) COUNT(Closed)
12-1-2012 12 1
13-1-2012 14 5
The general strategy would be this:
Select from a table of all the days in a month
LEFT OUTER JOIN your table that gets counts for each day
(left outer join being necessary in case there were no entries for that day, you'd want it to show a zero value).
So I think this is roughly what you need (not complete and date-function syntax is probably wrong for your db, but it will get you closer):
SELECT aDate
, COALESCE(SUM(CASE WHEN IssueStatus = 'whateverMeansOpen' THEN 1 END), 0) OpenCount
, COALESCE(SUM(CASE WHEN IssueStatus = 'whateverMeansClosed' THEN 1 END), 0) ClosedCount
FROM
(
SELECT DATEADD(DAY, I, #START_DATE) aDate
FROM
(
SELECT number AS I FROM [SomeTableWithAtLeast31Rows]
where number between 1 and 31
) Numbers
WHERE DATEADD(DAY, I, #START_DATE) < #END_DATE
) DateTimesInInterval
LEFT OUTER JOIN
(
-- Put your query here. It needs to output two columns, DateTimeOfIssue and IssueStatus
) yourHugeQuery ON yourHugeQuery.DateTimeOfIssue BETWEEN aDate and DATEADD(DAY, 1, aDate)
GROUP BY aDate
ORDER BY aDate
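A runnable miniature of this strategy, using Python's sqlite3 module, might look like the sketch below. The issue_status table, its columns, the 'open'/'closed' labels and the function name are invented for illustration, and a recursive CTE generates the list of days in place of the numbers table used above.

```python
import sqlite3


def daily_status_counts(issues, start_date, days):
    """issues: (issue_date 'YYYY-MM-DD', status); returns one row per calendar day."""
    con = sqlite3.connect(":memory:")
    # Hypothetical stand-in for the output of the large JIRA query.
    con.execute("CREATE TABLE issue_status (issue_date TEXT, status TEXT)")
    con.executemany("INSERT INTO issue_status VALUES (?, ?)", issues)
    query = """
        WITH RECURSIVE calendar(a_date, n) AS (
            -- generate `days` consecutive dates starting at start_date
            SELECT date(?), 1
            UNION ALL
            SELECT date(a_date, '+1 day'), n + 1 FROM calendar WHERE n < ?
        )
        SELECT c.a_date,
               COALESCE(SUM(CASE WHEN s.status = 'open' THEN 1 END), 0),
               COALESCE(SUM(CASE WHEN s.status = 'closed' THEN 1 END), 0)
        FROM calendar AS c
        -- LEFT JOIN keeps days that have no issues at all
        LEFT JOIN issue_status AS s ON s.issue_date = c.a_date
        GROUP BY c.a_date
        ORDER BY c.a_date
    """
    result = con.execute(query, (start_date, days)).fetchall()
    con.close()
    return result
```

Days with no matching issues still appear, with both counts at zero, which is the reason for driving the query from the calendar side of the join.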

mysql find date where no row exists for previous day

I need to select how many days since there is a break in my data. It's easier to show:
Table format:
id (autoincrement), user_id (int), start (datetime), end (datetime)
Example data (times left out as only need days):
1, 5, 2011-12-18, 2011-12-18
2, 5, 2011-12-17, 2011-12-17
3, 5, 2011-12-16, 2011-12-16
4, 5, 2011-12-13, 2011-12-13
As you can see there would be a break between 2011-12-13 and 2011-12-16. Now, I need to be able to say:
Using the date 2011-12-18, how many days are there until a break:
2011-12-18: Lowest sequential date = 2011-12-16: Total consecutive days: 3
Probably: DATE_DIFF(2011-12-18, 2011-12-16)
So my problem is, how can I select that 2011-12-16 is the lowest sequential date? Remembering that data applies for particular user_id's.
It's kinda like the example here: http://www.artfulsoftware.com/infotree/queries.php#72 but in the reverse.
I'd like this done in SQL only, no php code
Thanks
-- `table` stands in for the real table name
SELECT qmin.start, qmax.end, DATEDIFF( qmax.end, qmin.start ) FROM table AS qmin
LEFT JOIN (
SELECT t1.end FROM table AS t1
LEFT JOIN table AS t2 ON
t2.start > t1.end AND
t2.start < DATE_ADD( t1.end, INTERVAL 1 DAY )
WHERE t1.end >= '2011-12-18' AND t2.start IS NULL
ORDER BY t1.end ASC LIMIT 1
) AS qmax ON 1 = 1
LEFT JOIN table AS t2 ON
t2.end < qmin.start AND
t2.end > DATE_SUB( qmin.start, INTERVAL 1 DAY )
WHERE qmin.start <= '2011-12-18' AND t2.start IS NULL
ORDER BY qmin.start DESC LIMIT 1
This should work: each left join looks for a record that would continue the sequence, so the max can be found by taking the nearest record that has no sequential successor (t2.anyfield is null); we do the same thing for the minimal date.
If you can calculate the days between in a script, do it using unions (e.g. row 1 the minimum, row 2 the maximum).
Check this,
SELECT DATEDIFF((SELECT MAX(`start`) FROM testtbl WHERE `user_id`=1),
(select a.`start` from testtbl as a
left outer join testtbl as b on a.user_id = b.user_id
AND a.`start` = b.`start` + INTERVAL 1 DAY
where a.user_id=1 AND b.`start` is null
ORDER BY a.`start` desc LIMIT 1))
DATEDIFF() shows the difference between the two dates; if you want the number of consecutive days, add one to that result.
If it's not a beauty contest then you may try something like:
select t.start, t2.start, datediff(t2.start, t.start) + 1 as consecutive_days
from tab t
join tab t2 on t2.start = (select min(start) from (
select c1.*, case when c2.id is null then 1 else 0 end as gap
from tab c1
left join tab c2 on c1.start = adddate(c2.start, -1)
) t4 where t4.start <= t.start and t4.start >= (select max(start) from (
select c1.*, case when c2.id is null then 1 else 0 end as gap
from tab c1
left join tab c2 on c1.start = adddate(c2.start, -1)
) t3 where t3.start <= t.start and t3.gap = 1))
where t.start = '2011-12-18'
Result should be:
start start consecutive_days
2011-12-18 2011-12-16 3
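The anti-join idea from the answers (find the latest day that has no row for the day before it, then count the days up to the reference date) can be checked with a small sqlite3 sketch in Python. The sessions table, the start_day column and the function name are assumptions, dates are stored as 'YYYY-MM-DD' strings, and julianday() arithmetic replaces MySQL's DATEDIFF().

```python
import sqlite3


def consecutive_days(rows, user_id, as_of):
    """rows: (user_id, start_day 'YYYY-MM-DD'). Assumes as_of itself has a row."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sessions (user_id INTEGER, start_day TEXT)")
    con.executemany("INSERT INTO sessions VALUES (?, ?)", rows)
    query = """
        SELECT a.start_day,
               CAST(julianday(?) - julianday(a.start_day) AS INTEGER) + 1
        FROM sessions AS a
        -- look for a row on the previous day for the same user
        LEFT JOIN sessions AS b
               ON b.user_id = a.user_id
              AND b.start_day = date(a.start_day, '-1 day')
        WHERE a.user_id = ?
          AND a.start_day <= ?
          AND b.start_day IS NULL      -- no previous day: a streak starts here
        ORDER BY a.start_day DESC      -- latest streak start at or before as_of
        LIMIT 1
    """
    result = con.execute(query, (as_of, user_id, as_of)).fetchone()
    con.close()
    return result
```

With the example data from the question, calling it for user 5 and 2011-12-18 gives 2011-12-16 as the lowest sequential date and 3 consecutive days.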