I have a table with the following columns:
email ---- created at
abc#gmail.com 2019-12-12 16:03:34
rp#gamil.com 2019-11-12 16:03:34
abc#gmail.com 2020-1-12 16:03:34
er#gmail.com 2020-1-12 16:03:34
I want to design a query that returns the number of emails that registered in two consecutive months. I am a novice with queries and have been struggling to come up with a query for this.
For the data above, abc#gmail.com was registered twice in consecutive months.
By doing a self-join for Month + 1 and email (and also taking December-to-January transitions into account) this should work:
SELECT
*
FROM
(
SELECT
email,
YEAR( created ) AS createdYear,
MONTH( created ) AS createdMonth
FROM
table
) AS t
INNER JOIN
(
SELECT
email,
YEAR( created ) AS createdYear,
MONTH( created ) AS createdMonth
FROM
table
) AS monthPlus1 ON
t.email = monthPlus1.email
AND
(
(
monthPlus1.createdMonth = t.createdMonth + 1
AND
t.createdYear = monthPlus1.createdYear
)
OR
(
t.createdMonth = 12
AND
monthPlus1.createdMonth = 1
AND
t.createdYear + 1 = monthPlus1.createdYear
)
)
The date logic in this query is a bit gnarly; it can probably be improved by representing the month as a single date value or as an integer months-since-epoch count rather than a year + month tuple.
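As a rough sketch of the "single integer per month" idea, the example below uses Python's built-in sqlite3 module so it can be run without a server. The table name `registrations` is made up for the example, and the sample dates are zero-padded so SQLite can parse them. In MySQL the same index would be `YEAR(created) * 12 + MONTH(created)`.

```python
import sqlite3

# Hypothetical table name; sample rows adapted from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE registrations (email TEXT, created_at TEXT)")
conn.executemany(
    "INSERT INTO registrations VALUES (?, ?)",
    [
        ("abc#gmail.com", "2019-12-12 16:03:34"),
        ("rp#gamil.com", "2019-11-12 16:03:34"),
        ("abc#gmail.com", "2020-01-12 16:03:34"),
        ("er#gmail.com", "2020-01-12 16:03:34"),
    ],
)

# year * 12 + month turns each timestamp into one integer month index,
# so "the next month" is simply index + 1, including December -> January.
query = """
WITH m AS (
    SELECT email,
           CAST(strftime('%Y', created_at) AS INTEGER) * 12
             + CAST(strftime('%m', created_at) AS INTEGER) AS month_index
    FROM registrations
)
SELECT DISTINCT a.email
FROM m AS a
JOIN m AS b
  ON b.email = a.email
 AND b.month_index = a.month_index + 1
ORDER BY a.email
"""
consecutive_emails = [row[0] for row in conn.execute(query)]
print(consecutive_emails)
```

With a single month index, the join condition no longer needs a separate branch for the year boundary.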
You can use an EXISTS query to check if an email exists that also had a registration in the previous month:
SELECT DISTINCT email
FROM yourtable t1
WHERE EXISTS (SELECT *
FROM yourtable t2
WHERE t2.email = t1.email
AND DATE_FORMAT(t2.createdat, '%Y%m') = DATE_FORMAT(t1.createdat - INTERVAL 1 MONTH, '%Y%m'))
Output for your sample data
abc#gmail.com
Demo on dbfiddle
We use DISTINCT so we don't get multiple copies of the same email if an email address is registered in more than one consecutive month.
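You can use lag(). If an email registered in two consecutive months, then for at least one of its rows the previous registration returned by lag() falls in the month immediately before the current one.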
select t.email
from (select t.*,
lag(created_at) over (partition by t.email order by created_at) as prev_created_at
from t
) t
where extract(year_month from created_at) = extract(year_month from (prev_created_at + interval 1 month));
You may need select distinct, if this can occur multiple times.
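Here is a small runnable sketch of the lag() idea, again using Python's sqlite3 module (window functions need SQLite 3.25 or newer). It is an adaptation, not the MySQL query above: SQLite has no `extract(year_month ...)`, so the comparison is done on `strftime('%Y-%m', ...)`, and `'start of month'` is applied before adding a month so that dates such as the 31st do not roll over into the wrong month.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (email TEXT, created_at TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        ("abc#gmail.com", "2019-12-12 16:03:34"),
        ("rp#gamil.com", "2019-11-12 16:03:34"),
        ("abc#gmail.com", "2020-01-12 16:03:34"),
        ("er#gmail.com", "2020-01-12 16:03:34"),
    ],
)

# For each row, look at the same email's previous registration and keep the
# row when that previous registration falls in the preceding calendar month.
query = """
SELECT DISTINCT email
FROM (
    SELECT email,
           created_at,
           LAG(created_at) OVER (PARTITION BY email ORDER BY created_at)
               AS prev_created_at
    FROM t
) AS x
WHERE strftime('%Y-%m', created_at)
      = strftime('%Y-%m', prev_created_at, 'start of month', '+1 month')
"""
lag_emails = [row[0] for row in conn.execute(query)]
print(lag_emails)
```

Rows with no previous registration have a NULL `prev_created_at`, so the comparison is NULL and they are filtered out.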
In this scenario I have two tables, users and transactions. I would like to sort all the transactions for a specified time period into 3 categories: first-time deposits, second-time deposits, and additional deposits.
To identify a first-time deposit, you check that the user has no transactions before that one, using the created_at field. For a second-time deposit, the user has exactly one other transaction before it, and for the rest the user has 2 or more before it.
The transactions table has 2 fields we care about here:
user (user id)
created_at (time transaction was created)
Here is my attempt but I am having trouble visualising the whole query. Any ideas on how I would do this?
SELECT
COUNT(t.id) as first_time_deposits
FROM
transactions t
WHERE
status = 'approved' AND DATE(t.created_at) BETWEEN (CURDATE() - INTERVAL 0 DAY) AND CURDATE()
GROUP BY user
HAVING NOT EXISTS
(
SELECT
u.id
FROM
transactions u
WHERE
u.created_at < t.created_at
)
I use the date interval here just for filtering transactions within a day, week, etc. This query doesn't work because I am trying to reference the date of the outer query in the subquery. I am also missing second-time deposits and additional deposits.
Example output I am looking for:
first_time_deposits, second_time_deposits, additional_deposits
15, 5, 6
All for a selected time period.
Any help would be greatly appreciated.
This is how I'd do it. The solution also works if, for example, several "first" transactions took place at the same time, because dense_rank() gives ties the same rank. The same applies to the other ranks.
"first_to_last" is a recursive query that just generates the numbers of the transactions we want to display (1 to 3 in your case). This makes the query easily adjustable if you suddenly need the first 10 transactions rather than the first 3.
"numbered" ranks each user's transactions by date.
The main query joins the first two CTEs and replaces the numbers with labels such as "first", "second", and "third". I didn't find a way to do this other than hardcoding the values.
with recursive first_to_last(step) as (
select 1
union all
select step + 1
from first_to_last
where step < 3 -- how many lines to display
),
numbered as (
select dense_rank() over(partition by user_id order by created_at) rnk, created_at, user_id
from transactions
)
select user_id,
concat(case when f.step = 1 then 'first_deposit: '
when f.step = 2 then 'second_deposit: '
when f.step = 3 then 'third_deposit: '
end,
count(rnk))
from numbered n
join first_to_last f
on n.rnk = f.step
group by user_id, f.step
order by user_id, f.step
dbfiddle
UPD. Answer to the additional question: "I just want the count of all first, second and any deposit that isn't first or second"
Just remove the "first_to_last" CTE:
with numbered as (
select dense_rank() over(partition by user_id order by created_at) rnk, created_at, user_id
from transactions
)
select user_id,
concat(case when n.rnk = 1 then 'first_deposit: '
when n.rnk = 2 then 'second_deposit: '
else 'other_deposits: '
end,
count(rnk))
from numbered n
group by user_id, case when n.rnk = 1 then 'first_deposit: '
when n.rnk = 2 then 'second_deposit: '
else 'other_deposits: '
end
order by user_id, min(n.rnk)
UPD2. output in 3 columns: first, second and others
with numbered as (
select dense_rank() over(partition by user_id order by created_at) rnk, created_at, user_id
from transactions
)
select
sum(case when n.rnk = 1 then 1 else 0 end) first_deposit,
sum(case when n.rnk = 2 then 1 else 0 end) second_deposit,
sum(case when n.rnk not in (1,2) then 1 else 0 end) other_deposit
from numbered n
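The three-column query above counts over all transactions. A minimal sketch of how it might be restricted to a selected period is shown below, using Python's sqlite3 module (SQLite 3.25+ for window functions) and made-up sample rows. The key assumption is that ranking has to happen over each user's full history inside the CTE, and the period filter is applied only afterwards in the outer query; the `status` column from the question is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions ("
    "id INTEGER PRIMARY KEY, user_id INTEGER, created_at TEXT)"
)
# Made-up sample data: (user_id, created_at)
conn.executemany(
    "INSERT INTO transactions (user_id, created_at) VALUES (?, ?)",
    [
        (1, "2024-01-01 10:00:00"),
        (1, "2024-01-05 10:00:00"),
        (1, "2024-01-09 10:00:00"),
        (1, "2024-01-10 10:00:00"),
        (2, "2024-01-08 10:00:00"),
        (2, "2024-01-09 12:00:00"),
        (3, "2023-12-31 09:00:00"),
    ],
)

# Rank over the whole history first, then filter to the reporting period,
# so a user's third deposit is not mistaken for a first one.
query = """
WITH numbered AS (
    SELECT user_id,
           created_at,
           DENSE_RANK() OVER (PARTITION BY user_id ORDER BY created_at) AS rnk
    FROM transactions
)
SELECT SUM(CASE WHEN rnk = 1 THEN 1 ELSE 0 END) AS first_deposit,
       SUM(CASE WHEN rnk = 2 THEN 1 ELSE 0 END) AS second_deposit,
       SUM(CASE WHEN rnk NOT IN (1, 2) THEN 1 ELSE 0 END) AS other_deposit
FROM numbered
WHERE created_at >= ? AND created_at < ?
"""
period = ("2024-01-08 00:00:00", "2024-01-15 00:00:00")
counts = conn.execute(query, period).fetchone()
print(counts)
```

In this sample, user 2 makes a first and a second deposit inside the period, while user 1's two deposits in the period are their third and fourth.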
We log when we send our clients promotional emails. Sometimes clients are in our database for a while before they receive their first email, so we want to know how many clients received their first-ever email in each month over the past 12 months.
So far I can only think of getting the information month by month, but there has to be a way to query all 12 months in a single query.
SELECT DISTINCT
`id`
FROM
`table1`
WHERE
`sendtime` BETWEEN '2019-08-01' AND '2019-09-01'
AND `id` NOT IN (SELECT
`id`
FROM
`table1`
WHERE
`sendtime` < '2019-08-01');
Check for the minimum sendtime for each user:
SELECT id
FROM table1
GROUP BY id
HAVING MIN(sendtime) BETWEEN '2019-08-01' AND '2019-09-01'
If you want the number of these ids:
SELECT COUNT(*) counter
FROM (
SELECT id
FROM table1
GROUP BY id
HAVING MIN(sendtime) BETWEEN '2019-08-01' AND '2019-09-01'
) t
You can use two levels of aggregation:
select date_format(min_sendtime, '%Y-%m') yyyy_mm, count(*) no_clients
from (
select id, min(sendtime) min_sendtime
from table1
group by id
) t
where min_sendtime >= date_format(current_date, '%Y-%m-01') - interval 1 year
group by yyyy_mm
order by yyyy_mm
This gives you one row for each of the last twelve months that has at least one client who received their first email, with the count of such "new" clients for that month.
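To make the two levels of aggregation concrete, here is a small sketch using Python's sqlite3 module with made-up rows. SQLite's `strftime` stands in for MySQL's `date_format`, and the "last twelve months" filter is left out so that the fixed sample dates still produce output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, sendtime TEXT)")
# Made-up sample data: client id and the time an email was sent.
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [
        (1, "2019-08-03 09:00:00"),
        (1, "2019-09-10 09:00:00"),
        (2, "2019-08-20 09:00:00"),
        (3, "2019-09-01 09:00:00"),
        (3, "2019-10-02 09:00:00"),
        (4, "2019-10-15 09:00:00"),
    ],
)

# Inner level: first email per client. Outer level: count clients per month.
query = """
SELECT strftime('%Y-%m', min_sendtime) AS yyyy_mm,
       COUNT(*) AS no_clients
FROM (
    SELECT id, MIN(sendtime) AS min_sendtime
    FROM table1
    GROUP BY id
) AS firsts
GROUP BY yyyy_mm
ORDER BY yyyy_mm
"""
first_emails_by_month = conn.execute(query).fetchall()
print(first_emails_by_month)
```

Client 1 is counted only in 2019-08 even though they also received an email in September, which is the point of taking the minimum first.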
So far I have been able to collect_set() every active user without any problem:
with aux as(
select date
,collect_set(user_id) over(
partition by feature
order by cast(timestamp(date) as float)
range between (-90*60*60*24) following and 0 preceding
) as user_id
,feature
--
from (
select date
,feature
,collect_set(user_id) as user_id
--
from table
--
group by date, feature
)
)
--
select date
,array_distinct(flatten(user_id))
,feature
--
from aux
The problem is that I now have to keep only the users who were created more than 90 days before each date.
I tried this, but it didn't work:
select date
,collect_set(case when user_created_at < date - interval 90 day
then user_id end) over(
partition by feature
order by cast(timestamp(date) as float)
range between (-90*60*60*24) following and 0 preceding
) as teste
,feature
from table
It didn't work because the filter inside collect_set() is applied to each day's users separately instead of to all the users from the last 90 days, so the result contains more users than expected.
How can I do this correctly?
For reference, I'm using this query to check whether the result is correct:
select
count(distinct user_id) as total
,count(distinct case when user_created_at < date('2020-04-30') - interval 90 day then user_id end)
,count(distinct case when user_created_at >= date('2020-04-30') - interval 90 day then user_id end)
--
from table
--
where 1=1
and date >= date('2020-04-30') - interval 90 day
and date <= '2020-04-30'
and feature = 'a_feature'
It's a pretty ugly workaround, but:
select data
,feature
,collect_set(cus.client_id) as client
from (
select data
,explode(array_distinct(flatten(cliente))) as client
,feature
from(
select data
,collect_set(client_id) over(
partition by feature
order by cast(timestamp(data) as float)
range between (-90*60*60*24) following and 0 preceding
) as cliente
,feature
from (
select data
,feature
,collect_set(client_id) as cliente
from da_pandora.ds_transaction dtr
--
group by data, feature
)
)
)as dtr
left join costumer as cus
on cus.client_id = dtr.client and date(client_created_at) < data - interval 90 day
group by data, feature
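For checking the output of a query like this on a small sample, it can help to have the intended rule written out in plain code. The sketch below is not Spark; it is a plain-Python reference under the assumption (taken from the verification query in the question) that a user counts for a given date when they were active within the 90 days up to that date and were created before that window started. The function name and sample data are made up, and the `feature` dimension is ignored.

```python
from datetime import date, timedelta


def established_active_users(events, created_at, as_of, window_days=90):
    """Users active in [as_of - window_days, as_of] and created before that window."""
    window_start = as_of - timedelta(days=window_days)
    return {
        user_id
        for user_id, event_date in events
        if window_start <= event_date <= as_of
        and created_at[user_id] < window_start
    }


# Made-up sample data: (user_id, activity date) and user creation dates.
events = [
    ("u1", date(2020, 4, 29)),
    ("u1", date(2020, 3, 1)),
    ("u2", date(2020, 4, 1)),   # active, but created inside the window
    ("u3", date(2020, 1, 15)),  # old user, but last active before the window
]
created_at = {
    "u1": date(2019, 6, 1),
    "u2": date(2020, 3, 15),
    "u3": date(2019, 1, 1),
}

result = established_active_users(events, created_at, date(2020, 4, 30))
print(result)
```

The creation-date test is made against the window of the reporting date, not against each activity row's own date, which is the distinction the failed attempt in the question missed.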
We have a date_value column and another Boolean column which indicates whether the day is a business day or not.
We are trying to find the first business day of the next month (for example, for September 2015 I want it to return 2015-10-01).
We have tried a couple different methods involving last_day, intervals and subqueries but can't quite get it to work.
We also don't have the ability to create custom functions, which makes this a little more difficult.
I think you want something like this:
select min(date_value) fwd
from tablename
where isWorkDay = 1 and
date_value > last_day(curdate()) and
date_value <= last_day(curdate() + interval 1 month)
For all months (v0.3) (please note that I can't test this now, so it might contain errors):
select t1.month_number, min(t2.date_value)
from tablename t1 join
tablename t2 on extract(year from t1.date_value) * 12 + t1.month_number = extract(year from t2.date_value) * 12 + t2.month_number - 1
where t2.isWorkDay = 1
group by t1.month_number
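A runnable sketch of the single-month lookup, using Python's sqlite3 module: the calendar table is generated on the fly, weekends are treated as non-business days, and 2016-01-01 is marked as a holiday purely as an assumption for the example. Filtering on a date range for "next month" avoids the month-number arithmetic, so December needs no special handling.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (date_value TEXT, isWorkDay INTEGER)")

# Assumed calendar: Monday to Friday are business days, except one holiday.
holidays = {date(2016, 1, 1)}
day = date(2015, 9, 1)
while day <= date(2016, 1, 31):
    is_work_day = int(day.weekday() < 5 and day not in holidays)
    conn.execute(
        "INSERT INTO tablename VALUES (?, ?)", (day.isoformat(), is_work_day)
    )
    day += timedelta(days=1)

# "Next month" is the half-open range [start of next month, start of the
# month after that), computed relative to the reference date.
QUERY = """
SELECT MIN(date_value)
FROM tablename
WHERE isWorkDay = 1
  AND date_value >= date(:ref, 'start of month', '+1 month')
  AND date_value <  date(:ref, 'start of month', '+2 months')
"""


def first_business_day_of_next_month(ref):
    return conn.execute(QUERY, {"ref": ref}).fetchone()[0]


print(first_business_day_of_next_month("2015-09-15"))
print(first_business_day_of_next_month("2015-12-10"))
```

The second call crosses the year boundary and skips the assumed holiday and the following weekend.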
I was able to get it using the query below:
SELECT
d.year
,d.month_number
,first_business_period as next_month_first_period
,d2.previous_business_day
FROM lk_date d
JOIN (
SELECT
a.*
, MAX(CASE WHEN d2.business_period_in_days<>0 THEN d2.date_value ELSE NULL END) AS previous_business_day
FROM(
SELECT
d1.year
,d1.month_number
, MIN(CASE WHEN d1.business_period_in_days <> 0 THEN d1.date_value END) AS first_business_period
FROM lk_date d1
GROUP BY 1,2
) a
JOIN lk_date d2 ON d2.date_Value < a.first_business_period
GROUP BY 1,2,3) d2 on d2.previous_business_day = d.date_value
I have a lookup table that relates dates and people associated with those dates:
id, user_id,date
1,1,2014-11-01
2,2,2014-11-01
3,1,2014-11-02
4,3,2014-11-02
5,1,2014-11-03
I can group these by date(day):
SELECT DATE_FORMAT(
MIN(date),
'%Y/%m/%d 00:00:00 GMT-0'
) AS date,
COUNT(*) as count
FROM user_x_date
GROUP BY ROUND(UNIX_TIMESTAMP(created_at) / 43200)
But how can I get the number of unique users that have not shown up previously? For instance, this would be a valid result:
unique, non-unique, date
2,0,2014-11-01
1,1,2014-11-02
0,1,2014-11-03
Is this possible without having to rely on a scripting language to keep track of this data?
I think this query will do what you want, at least it seems to work for your limited sample data.
The idea is to use a correlated sub-query to check if the user_id has occurred on a date before the date of the current row and then do some basic arithmetic to determine number of unique/non-unique users for each date.
Please give it a try.
select
sum(u) - sum(n) as "unique",
sum(n) as "non-unique",
date
from (
select
date,
count(user_id) u,
case when exists (
select 1
from Table1 i
where i.user_id = o.user_id
and i.date < o.date
) then 1 else 0
end n
from Table1 o
group by date, user_id
) q
group by date
order by date;
Sample SQL Fiddle
I didn't include the id column in the sample fiddle as it's not needed (or used) to produce the result and won't change anything.
This is the relevant question: "But how can I get the number of unique users that have not shown up previously?"
Calculate the first time a person shows up, and then use that for the aggregation:
SELECT date, count(*) as FirstVisit
FROM (SELECT user_id, MIN(date) as date
FROM user_x_date
GROUP BY user_id
) x
GROUP BY date;
I would then use this as a subquery for another aggregation:
SELECT v.date, v.NumVisits, COALESCE(fv.FirstVisit, 0) as NumFirstVisit
FROM (SELECT date, count(*) as NumVisits
FROM user_x_date
GROUP BY date
) v LEFT JOIN
(SELECT date, count(*) as FirstVisit
FROM (SELECT user_id, MIN(date) as date
FROM user_x_date
GROUP BY user_id
) x
GROUP BY date
) fv
ON v.date = fv.date;
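To check the first-visit approach against the sample data from the question, here is a small sketch using Python's sqlite3 module. It is a variation rather than the exact query above: it counts distinct users per date and subtracts the first-time visitors to get the "non-unique" (returning) column, and the column aliases are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_x_date (id INTEGER, user_id INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO user_x_date VALUES (?, ?, ?)",
    [
        (1, 1, "2014-11-01"),
        (2, 2, "2014-11-01"),
        (3, 1, "2014-11-02"),
        (4, 3, "2014-11-02"),
        (5, 1, "2014-11-03"),
    ],
)

# v: distinct users seen per date; fv: users whose first-ever date is that date.
query = """
SELECT v.date,
       COALESCE(fv.first_visits, 0) AS new_users,
       v.num_users - COALESCE(fv.first_visits, 0) AS returning_users
FROM (
    SELECT date, COUNT(DISTINCT user_id) AS num_users
    FROM user_x_date
    GROUP BY date
) AS v
LEFT JOIN (
    SELECT first_date, COUNT(*) AS first_visits
    FROM (
        SELECT user_id, MIN(date) AS first_date
        FROM user_x_date
        GROUP BY user_id
    ) AS x
    GROUP BY first_date
) AS fv ON fv.first_date = v.date
ORDER BY v.date
"""
per_day = conn.execute(query).fetchall()
for row in per_day:
    print(row)
```

The rows match the unique / non-unique table given in the question.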