mySQL query that is a bit tricky - mysql

Hi there, I want to design this query in MySQL.
Statement: Of all the customers who transacted during 2017, what percentage made another transaction within 30 days?
Can you tell me how such a query can be designed?
This is the picture of the table to perform this query on:
Table name is: transactions

Just use lead() to get the next date. Then aggregate at the customer level to determine if any transaction in the time period has another within 30 days for that customer.
Finally, aggregate again:
select avg(case when mindiff < 30 then 1.0 else 0 end) as within_30days
from (select customerid, min(datediff(next_date, date)) as mindiff
from (select t.*, lead(date) over (partition by customerid order by date) as next_date
from transactions t
) t
where date >= '2017-01-01' and date < '2018-01-01'
group by customerid
) c
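A minimal sketch for sanity-checking the approach above on toy data. It assumes SQLite (through Python's sqlite3 module) as a stand-in for MySQL, so a julianday() subtraction replaces datediff(); the sample rows are invented for illustration, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (customerid INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [
        (1, "2017-01-05"), (1, "2017-01-20"),  # next transaction after 15 days
        (2, "2017-03-01"), (2, "2017-06-01"),  # next transaction after 92 days
        (3, "2017-12-20"), (3, "2018-01-04"),  # follow-up falls in 2018
        (4, "2017-07-07"),                     # single transaction
        (5, "2016-05-05"),                     # no activity in 2017
    ],
)

query = """
SELECT AVG(CASE WHEN mindiff < 30 THEN 1.0 ELSE 0 END) AS within_30days
FROM (SELECT customerid,
             MIN(julianday(next_date) - julianday(date)) AS mindiff
      FROM (SELECT t.*,
                   LEAD(date) OVER (PARTITION BY customerid ORDER BY date) AS next_date
            FROM transactions t) t
      WHERE date >= '2017-01-01' AND date < '2018-01-01'
      GROUP BY customerid) c
"""
within_30days = conn.execute(query).fetchone()[0]
print(within_30days)
```

Because lead() runs before the 2017 filter, customer 3's follow-up in January 2018 still counts. Customers with no later transaction get a NULL mindiff and fall into the else branch. Use <= 30 if a transaction exactly 30 days later should count as well.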

Related

Avg function not returning proper value

I expect this query to give me the average number of daily active users to date, grouped by month (from October to December). But the result is approximately 164K when it should be 128K. Why is AVG not working? The average should be the SUM of the values divided by the number of days in the current month up to today.
SELECT sq.month_year AS 'month_year', AVG(number)
FROM
(
SELECT CONCAT(MONTHNAME(date), "-", YEAR(DATE)) AS 'month_year', count(distinct id_user) AS number
FROM table1
WHERE date between '2020-10-01' and '2020-12-31 23:59:59'
GROUP BY EXTRACT(year_month FROM date)
) sq
GROUP BY 1
OK, thanks for your help. The problem was that the subquery was pulling the info by month and not by day. So I should pull the info by day there and group by month in the outer query. This finally worked:
SELECT sq.day_month, AVG(number)
FROM (SELECT date(date) AS day_month,
count(distinct id_user) AS number
FROM table1
WHERE date >= '2020-10-01' AND
date < '2021-01-01'
GROUP BY 1
) sq
GROUP BY EXTRACT(year_month FROM day_month)
Do not use single quotes for column aliases!
SELECT sq.month_year, AVG(number)
FROM (SELECT CONCAT(MONTHNAME(date), '-', YEAR(DATE)) AS month_year,
count(distinct id_user) AS number
FROM table1
WHERE date >= '2020-10-01' AND
date < '2021-01-01'
GROUP BY month_year
) sq
GROUP BY 1;
Note the fixes to the query:
The GROUP BY uses the same columns as the SELECT. Your query should return an error (although it works in older versions of MySQL).
The date comparisons have been simplified.
No single quotes on column aliases.
Note that the outer query is not needed. I assume it is there just to illustrate the issue you are having.
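A small sketch of the idea from the self-answer (count distinct users per day first, then average those daily counts per month), run on invented rows. It assumes SQLite through Python's sqlite3 module instead of MySQL, so strftime() stands in for EXTRACT(year_month ...); the column and table names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id_user INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [
        # 2020-10-01: users 1 and 2 (user 1 appears twice)
        (1, "2020-10-01 08:00:00"), (2, "2020-10-01 09:00:00"), (1, "2020-10-01 21:00:00"),
        # 2020-10-02: users 1, 2, 3 and 4
        (1, "2020-10-02 10:00:00"), (2, "2020-10-02 11:00:00"),
        (3, "2020-10-02 12:00:00"), (4, "2020-10-02 13:00:00"),
        # 2020-11-15: user 1 only
        (1, "2020-11-15 12:00:00"),
    ],
)

query = """
SELECT strftime('%Y-%m', day) AS month_year, AVG(number) AS avg_daily_users
FROM (SELECT date(date) AS day, COUNT(DISTINCT id_user) AS number
      FROM table1
      WHERE date >= '2020-10-01' AND date < '2021-01-01'
      GROUP BY day) sq
GROUP BY month_year
ORDER BY month_year
"""
rows = conn.execute(query).fetchall()
print(rows)
```

October averages the two daily counts (2 and 4), whereas counting distinct users over the whole month would give 4. Note that days without any rows do not appear in the subquery, so this averages over active days only, not over every calendar day.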

collect_set() distinct users by day from last 90 days only when user is older than last 90 days

So far I have been able to collect_set() everyone who is active with no problem:
with aux as(
select date
,collect_set(user_id) over(
partition by feature
order by cast(timestamp(date) as float)
range between (-90*60*60*24) following and 0 preceding
) as user_id
,feature
--
from (
select date
,feature
,collect_set(user_id) as user_id
--
from table
--
group by date, feature
)
)
--
select date
,array_distinct(flatten(user_id))
,feature
--
from aux
The problem is that I now have to keep only users who were created more than 90 days before each date.
I tried this and it didn't work:
select date
,collect_set(case when user_created_at < date - interval 90 day
then user_id end) over(
partition by feature
order by cast(timestamp(date) as float)
range between (-90*60*60*24) following and 0 preceding
) as teste
,feature
from table
The reason it didn't work is that the filter inside collect_set() is applied to the users of each day separately instead of to all the users from the last 90 days, so the result contains more users than expected.
How can I get the correct result?
For reference, I'm using this query to verify whether the result is correct:
select
count(distinct user_id) as total
,count(distinct case when user_created_at < date('2020-04-30') - interval 90 day then user_id end)
,count(distinct case when user_created_at >= date('2020-04-30') - interval 90 day then user_id end)
--
from table
--
where 1=1
and date >= date('2020-04-30') - interval 90 day
and date <= '2020-04-30'
and feature = 'a_feature'
A pretty ugly workaround, but:
select data
,feature
,collect_set(cus.client_id) as client
from (
select data
,explode(array_distinct(flatten(cliente))) as client
,feature
from(
select data
,collect_set(client_id) over(
partition by feature
order by cast(timestamp(data) as float)
range between (-90*60*60*24) following and 0 preceding
) as cliente
,feature
from (
select data
,feature
,collect_set(client_id) as cliente
from da_pandora.ds_transaction dtr
--
group by data, feature
)
)
)as dtr
left join costumer as cus
on cus.client_id = dtr.client and date(client_created_at) < data - interval 90 day
group by data, feature
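To make the intended result explicit, here is a plain-Python reference sketch of the logic (not Hive/Spark SQL). It assumes the rule used in the verification query: for each (date, feature), keep the distinct users active in the trailing 90 days whose creation date is earlier than the start of that window. The function name and the sample rows are hypothetical.

```python
from datetime import date, timedelta

# Hypothetical sample rows: (activity_date, feature, user_id)
activity = [
    (date(2020, 1, 15), "a_feature", 3),
    (date(2020, 4, 1), "a_feature", 1),
    (date(2020, 4, 30), "a_feature", 2),
]
# Hypothetical creation dates per user
created_at = {1: date(2019, 1, 1), 2: date(2020, 3, 1), 3: date(2019, 6, 1)}


def older_active_users(activity, created_at, window_days=90):
    """For each (date, feature), collect users active in the trailing window
    who were created before the window started."""
    result = {}
    for day, feature in {(d, f) for d, f, _ in activity}:
        window_start = day - timedelta(days=window_days)
        result[(day, feature)] = {
            user
            for d, f, user in activity
            if f == feature
            and window_start <= d <= day        # active in the last 90 days
            and created_at[user] < window_start  # created before the window
        }
    return result


result = older_active_users(activity, created_at)
print(result[(date(2020, 4, 30), "a_feature")])
```

The key point is that the creation date is compared with the date of the output row (the end of the window), not with the date of the activity row, which is what the workaround achieves by exploding the collected set and joining back to the customer table.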

Find number of rows for each hour where datetime columns match certain criteria

RDBMS: MySQL
The time column(s) datatype is of datetime
For every hour of the 24-hour day I need to retrieve the number of rows whose start_time matches the hour OR whose end_time is greater than or equal to the hour.
Below is the current query I have which returns the data I need but only based off of one hour. I can loop through and do 24 separate queries for each hour of the day but I would love to have this in one query.
SELECT COUNT(*) as total_online
FROM broadcasts
WHERE DATE(start_time) = '2018-01-01' AND (HOUR(start_time) = '0' OR
HOUR(end_time) >= '0')
Is there a better way of querying the data I need? Perhaps by using group by somehow? Thank you.
Not exactly sure if I am following, but try something like this (note that datepart() and getdate() are SQL Server functions, not MySQL):
select datepart(hh, getdate()) , count(*)
from broadcasts
where datepart(hh, starttime) <=datepart(hh, endtime)
and cast(starttime as date)=cast(getdate() as date) and cast(endtime as date)=cast(getdate() as date)
group by datepart(hh, getdate())
Join with a subquery that returns all the hour numbers:
SELECT h.hour_num, COUNT(*) AS total_online
FROM (SELECT 0 AS hour_num UNION SELECT 1 UNION SELECT 2 ... UNION SELECT 23) AS h
JOIN broadcasts AS b ON HOUR(b.start_time) = h.hour_num OR HOUR(b.end_time) >= h.hour_num
WHERE DATE(b.start_time) = '2018-01-01'
GROUP BY h.hour_num
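A runnable sketch of the same hour-table join, assuming SQLite through Python's sqlite3 module rather than MySQL. The hours 0 to 23 are generated with a recursive CTE instead of a chain of UNIONs, strftime('%H', ...) stands in for HOUR(), and a LEFT JOIN is used so that hours with no matching rows show up with a count of 0. The id column and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE broadcasts (id INTEGER, start_time TEXT, end_time TEXT)")
conn.executemany(
    "INSERT INTO broadcasts VALUES (?, ?, ?)",
    [
        (1, "2018-01-01 01:10:00", "2018-01-01 03:20:00"),
        (2, "2018-01-01 02:00:00", "2018-01-01 02:30:00"),
        (3, "2018-01-02 00:00:00", "2018-01-02 05:00:00"),  # other day, ignored
    ],
)

query = """
WITH RECURSIVE hours(hour_num) AS (
    SELECT 0
    UNION ALL
    SELECT hour_num + 1 FROM hours WHERE hour_num < 23
)
SELECT h.hour_num, COUNT(b.id) AS total_online
FROM hours AS h
LEFT JOIN broadcasts AS b
       ON date(b.start_time) = '2018-01-01'
      AND (CAST(strftime('%H', b.start_time) AS INTEGER) = h.hour_num
           OR CAST(strftime('%H', b.end_time) AS INTEGER) >= h.hour_num)
GROUP BY h.hour_num
ORDER BY h.hour_num
"""
counts = dict(conn.execute(query).fetchall())
print(counts)
```

With the condition taken from the question, a broadcast is counted for every hour up to its end hour, including hours before it started. If the goal is "online during that hour", a condition such as start hour <= hour_num AND end hour >= hour_num may be closer to what is wanted.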

Get percentage of total when using GROUP BY in SQL query

I have a SQL query that I'm using to return the number of training sessions recorded by a client on each day of the week (during the last year).
SELECT COUNT(*) total_sessions
, DAYNAME(log_date) day_name
FROM programmes_results
WHERE log_date >= DATE_SUB(CURDATE(), INTERVAL 1 YEAR)
AND log_date <= CURDATE()
AND client_id = 7171
GROUP
BY day_name
ORDER
BY FIELD(day_name, 'MONDAY', 'TUESDAY', 'WEDNESDAY', 'THURSDAY', 'FRIDAY', 'SATURDAY', 'SUNDAY')
I would like to then plot a table showing these values as a percentage of the total, as opposed to as a 'count' for each day. However I'm at a bit of a loss as to how to do that without another query (which I'd like to avoid).
Any thoughts?
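Use a derived table and divide each row by the grand total with a window function (MySQL 8.0+). Grouping the outer query by day_name and total_sessions would make every group's sum equal to its own count, so every percentage would come out as 100:
select day_name, total_sessions, total_sessions / sum(total_sessions) over () * 100 percentage
from (
query from your question goes here
) temp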
You can add the number of trainings per day in your client application to get the total count. This way you definitely avoid having a 2nd query to get the total.
Use the with rollup modifier in the query to get the total returned in the last row:
...GROUP BY day_name WITH ROLLUP ORDER BY ...
Use a subquery to return the overall count within each row
SELECT ..., t.total_count
...FROM programmes_results INNER JOIN (SELECT COUNT(*) as total_count FROM programmes_results WHERE <same where criteria>) as t --NO join condition
...
This will have the largest performance impact on the database, however, it enables you to have the total number in each row.
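A sketch of one way to get the percentage in a single statement: a window function over the grouped counts. It assumes SQLite through Python's sqlite3 module in place of MySQL (MySQL 8.0+ also supports SUM(...) OVER ()), uses strftime('%w', ...) instead of DAYNAME(), and leaves out the one-year date filter so the invented sample rows give a fixed result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE programmes_results (client_id INTEGER, log_date TEXT)")
conn.executemany(
    "INSERT INTO programmes_results VALUES (?, ?)",
    [
        (7171, "2023-01-02"),  # Monday
        (7171, "2023-01-09"),  # Monday
        (7171, "2023-01-03"),  # Tuesday
        (7171, "2023-01-04"),  # Wednesday
        (42, "2023-01-02"),    # another client, filtered out
    ],
)

query = """
SELECT day_num,
       total_sessions,
       total_sessions * 100.0 / SUM(total_sessions) OVER () AS percentage
FROM (SELECT CAST(strftime('%w', log_date) AS INTEGER) AS day_num,  -- 0 = Sunday
             COUNT(*) AS total_sessions
      FROM programmes_results
      WHERE client_id = 7171
      GROUP BY day_num) AS temp
ORDER BY day_num
"""
rows = conn.execute(query).fetchall()
print(rows)
```

The empty OVER () makes the sum run over all grouped rows, so each day's count is divided by the grand total without a second query or a join.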

Count number of entries in time interval 1 that appear in time interval 2 - SQL

I am new here and tried to look up the answer to my question but couldn't find anything on it. I am currently learning how to work with SQL queries and am wondering how I can count the number of unique values that appear in two time intervals.
I have two columns; one is the timestamp while the other is a customer id. What I want to do is to check, for example, the number of customers that appear in time interval A, let's say January 2014 - February 2014. I then want to see how many of these also appear in another time interval that I specify, for example February 2014 - April 2014. If the total sample were 2 people who both bought something in January while only one of them bought something else before the end of April, the count would be 1.
I am a total beginner and tried the query below, but it obviously won't return what I want: each entry has only one timestamp, so it cannot be in two intervals at once.
SELECT
count(customer_id)
FROM db.table
WHERE time >= date('2014-01-01 00:00:00')
AND time < date('2014-02-01 00:00:00')
AND time >= date('2014-02-01 00:00:00')
AND time < date('2014-05-01 00:00:00')
;
Try this.
select count(distinct t.customer_id) from Table t
INNER JOIN Table t1 on t1.customer_id = t.customer_id
and t1.time >= '2014-01-01 00:00:00' and t1.time<'2014-02-01 00:00:00'
where t.time >='2014-02-01 00:00:00' and t.time<'2014-05-01 00:00:00'
Here's one method of doing this with conditional grouping in an inner-select.
Select Case
When GroupBy = 1 Then 'January - February 2014'
When GroupBy = 2 Then 'February - April 2014'
End As Period,
Count (Customer_Id) As Total
From
(
SELECT Customer_Id,
Case
When Time Between '2014-01-01' And '2014-02-01' Then 1
When Time Between '2014-02-01' And '2014-04-01' Then 2
Else -1
End As GroupBy
From db.Table
) D
Where GroupBy <> -1
Group By GroupBy
Edit: Sorry, misread the question. This will show you those that overlap those two time ranges:
Select Count(Distinct t1.Customer_Id)
From db.Table t1
Where Exists
(
Select Customer_Id
From db.Table t2
Where t1.customer_id = t2.customer_id
And t2.Time Between '2014-02-01' And '2014-04-01'
)
And t1.Time Between '2014-01-01' And '2014-02-01'
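A small sketch of the EXISTS approach on the example from the question (two customers buy in January, only one buys again before the end of April). It assumes SQLite through Python's sqlite3 module, a hypothetical table name purchases, and half-open intervals (>= start, < end) so that a timestamp on the boundary belongs to exactly one interval.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (customer_id INTEGER, time TEXT)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [
        (1, "2014-01-10 09:00:00"),
        (1, "2014-01-20 09:00:00"),  # second January purchase, same customer
        (1, "2014-03-05 14:00:00"),  # also appears in interval 2
        (2, "2014-01-15 10:00:00"),  # interval 1 only
        (3, "2014-03-01 11:00:00"),  # interval 2 only
    ],
)

query = """
SELECT COUNT(DISTINCT a.customer_id)
FROM purchases AS a
WHERE a.time >= '2014-01-01' AND a.time < '2014-02-01'
  AND EXISTS (SELECT 1
              FROM purchases AS b
              WHERE b.customer_id = a.customer_id
                AND b.time >= '2014-02-01' AND b.time < '2014-05-01')
"""
both_intervals = conn.execute(query).fetchone()[0]
print(both_intervals)
```

COUNT(DISTINCT ...) matters here: customer 1 has two rows in January, and a plain COUNT would count that customer twice.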