SQL, multiple email addresses problem - mysql

I've a user table (MySQL) with the following data
id email creation_date
1 bob#mail.com 2011-08-01 09:00:00
2 bob#mail.com 2011-06-24 02:00:00
3 john#mail.com 2011-02-01 04:00:00
4 john#mail.com 2011-08-05 20:30:00
5 john#mail.com 2011-08-05 23:00:00
6 jill#mail.com 2011-08-01 00:00:00
As you can see, we allow duplicate emails, so it's possible to register several accounts with the same email address.
Now I need to select all addresses ordered by creation_date, but without duplicates. This is easy (I think):
SELECT * FROM (SELECT * FROM users ORDER BY creation_date) AS X GROUP BY email
Expected result:
id email creation_date
2 bob#mail.com 2011-06-24 02:00:00
6 jill#mail.com 2011-08-01 00:00:00
3 john#mail.com 2011-02-01 04:00:00
But then I also need to select all the other addresses, i.e. all rows that are not present in the result of the first query. Duplicates are allowed here.
Expected result:
id email creation_date
1 bob#mail.com 2011-08-01 09:00:00
4 john#mail.com 2011-08-05 20:30:00
5 john#mail.com 2011-08-05 23:00:00
Any ideas? Performance is important because the real database is very large.

SELECT a.*
FROM users a
LEFT JOIN (SELECT email, MIN(creation_date) AS min_date FROM users GROUP BY email) x ON
(x.email = a.email AND x.min_date = a.creation_date)
WHERE x.email IS NULL
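As a rough check of the anti-join above, here is a minimal sketch that runs both halves of the problem against the sample rows. It uses Python's built-in sqlite3 module as a stand-in for MySQL, which is an assumption made only so the example is self-contained; the index name and the helper split_users are hypothetical. An inner join on the same derived table gives the first row per email, and the LEFT JOIN ... IS NULL form gives the rest. Note that if two rows share both the email and the earliest creation_date, both land in the "first" set.

```python
import sqlite3

# Sample rows from the question (the '#' in the addresses is kept as posted).
USERS = [
    (1, "bob#mail.com", "2011-08-01 09:00:00"),
    (2, "bob#mail.com", "2011-06-24 02:00:00"),
    (3, "john#mail.com", "2011-02-01 04:00:00"),
    (4, "john#mail.com", "2011-08-05 20:30:00"),
    (5, "john#mail.com", "2011-08-05 23:00:00"),
    (6, "jill#mail.com", "2011-08-01 00:00:00"),
]

# Derived table holding the earliest creation_date per email.
EARLIEST = "(SELECT email, MIN(creation_date) AS min_date FROM users GROUP BY email)"

# Inner join: keeps only the earliest row of each email.
FIRST_SQL = f"""
    SELECT a.id FROM users a
    JOIN {EARLIEST} x ON x.email = a.email AND x.min_date = a.creation_date
    ORDER BY a.email
"""

# Anti-join: keeps every row that is NOT the earliest one for its email.
REST_SQL = f"""
    SELECT a.id FROM users a
    LEFT JOIN {EARLIEST} x ON x.email = a.email AND x.min_date = a.creation_date
    WHERE x.email IS NULL
    ORDER BY a.id
"""


def split_users(rows):
    """Return (ids of first account per email, ids of all other accounts)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, creation_date TEXT)"
    )
    conn.executemany("INSERT INTO users VALUES (?, ?, ?)", rows)
    # Hypothetical composite index; it lets the GROUP BY and the join use one index.
    conn.execute("CREATE INDEX idx_users_email_date ON users (email, creation_date)")
    first = [r[0] for r in conn.execute(FIRST_SQL)]
    rest = [r[0] for r in conn.execute(REST_SQL)]
    conn.close()
    return first, rest


first_ids, rest_ids = split_users(USERS)
print(first_ids)  # ids of the earliest account per email, ordered by email
print(rest_ids)   # ids of the remaining accounts
```

On a large table, a composite index on (email, creation_date) is probably the main thing that matters for performance, since both the grouped subquery and the join condition use exactly those two columns.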

In SQL Server we would do a SELECT statement using a rank.
Here are some MySQL samples:
How to perform grouped ranking in MySQL
http://thinkdiff.net/mysql/how-to-get-rank-using-mysql-query/
I hope this helps.
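The ranking idea can be sketched with ROW_NUMBER(), assuming a server that supports window functions (MySQL 8.0 or later does; older versions need the user-variable tricks from the links above). The example below runs the query through Python's sqlite3 module as a stand-in, which needs SQLite 3.25 or newer; the helper name rank_users is hypothetical. Rows ranked 1 are the first account per email, and everything ranked higher is a duplicate.

```python
import sqlite3

# Sample rows from the question (the '#' in the addresses is kept as posted).
USERS = [
    (1, "bob#mail.com", "2011-08-01 09:00:00"),
    (2, "bob#mail.com", "2011-06-24 02:00:00"),
    (3, "john#mail.com", "2011-02-01 04:00:00"),
    (4, "john#mail.com", "2011-08-05 20:30:00"),
    (5, "john#mail.com", "2011-08-05 23:00:00"),
    (6, "jill#mail.com", "2011-08-01 00:00:00"),
]

# Number the rows of each email by creation_date; id breaks ties deterministically.
RANKED_SQL = """
    SELECT id,
           ROW_NUMBER() OVER (PARTITION BY email ORDER BY creation_date, id) AS rn
    FROM users
"""


def rank_users(rows):
    """Return (ids ranked first per email, ids ranked second or later)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, creation_date TEXT)"
    )
    conn.executemany("INSERT INTO users VALUES (?, ?, ?)", rows)
    ranked = conn.execute(RANKED_SQL).fetchall()
    conn.close()
    first = sorted(user_id for user_id, rn in ranked if rn == 1)
    rest = sorted(user_id for user_id, rn in ranked if rn > 1)
    return first, rest


ranked_first, ranked_rest = rank_users(USERS)
print(ranked_first)
print(ranked_rest)
```

Unlike the MIN() join, the tie-breaker in the ORDER BY guarantees exactly one "first" row per email even when two accounts were created at the same instant.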

Related

Mysql / Big Query to Find when a customer loses access for the first time (Gap / Island)

I'm hoping the query will work in both MySQL and BigQuery.
For each customer I need to find the first date they had a subscription and the first date that they stopped having any subscriptions (e.g. a break in subscription). A customer can have multiple overlapping subscriptions. Once a customer stops having access then future subscriptions are not considered.
This is a sample table with a few rows. The actual table will have millions of rows and thousands of customers.
Using this sample data:
select * from test_sub order by customer_id, effect_date, expire_date;
subscription_id customer_id effect_date expire_date
1 1 2022-01-01 00:00:00 2022-03-01 00:00:00
2 2 2021-01-01 00:00:00 2021-03-01 00:00:00
3 2 2021-02-01 00:00:00 2021-04-25 00:00:00
4 2 2021-05-01 00:00:00 2021-06-01 00:00:00
5 2 2021-08-01 00:00:00 2022-10-01 00:00:00
The answer should be:
customer_id min(effect_date) max(expire_date)
1 2022-01-01 00:00:00 2022-03-01 00:00:00
2 2021-01-01 00:00:00 2021-04-25 00:00:00
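One possible approach, sketched here under assumptions rather than offered as a definitive answer, is the usual gaps-and-islands pattern with window functions: track the running maximum of expire_date over the earlier rows of each customer, flag a row as the start of a new island when its effect_date is later than that maximum, number the islands with a running sum of the flags, and keep only island 0. The example runs through Python's sqlite3 module (SQLite 3.25+) as a stand-in; the same window syntax should carry over to MySQL 8.0+ and BigQuery, but that has not been verified here. It also assumes that a subscription starting exactly when the previous one expires counts as continuous. For customer 2 the first island ends on 2021-04-25, the latest expire_date before the gap.

```python
import sqlite3

# Sample rows from the question: (subscription_id, customer_id, effect_date, expire_date).
SUBSCRIPTIONS = [
    (1, 1, "2022-01-01 00:00:00", "2022-03-01 00:00:00"),
    (2, 2, "2021-01-01 00:00:00", "2021-03-01 00:00:00"),
    (3, 2, "2021-02-01 00:00:00", "2021-04-25 00:00:00"),
    (4, 2, "2021-05-01 00:00:00", "2021-06-01 00:00:00"),
    (5, 2, "2021-08-01 00:00:00", "2022-10-01 00:00:00"),
]

FIRST_ISLAND_SQL = """
WITH ordered AS (
    -- Latest expire_date among all EARLIER rows of the same customer.
    SELECT customer_id, effect_date, expire_date,
           MAX(expire_date) OVER (
               PARTITION BY customer_id
               ORDER BY effect_date, expire_date
               ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
           ) AS prev_max_expire
    FROM test_sub
),
flagged AS (
    -- A row opens a new island when it starts after everything before it ended.
    SELECT customer_id, effect_date, expire_date,
           CASE WHEN prev_max_expire IS NOT NULL
                 AND effect_date > prev_max_expire THEN 1 ELSE 0 END AS new_island
    FROM ordered
),
numbered AS (
    -- Running sum of the flags numbers the islands 0, 1, 2, ...
    SELECT customer_id, effect_date, expire_date,
           SUM(new_island) OVER (
               PARTITION BY customer_id
               ORDER BY effect_date, expire_date
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS island_no
    FROM flagged
)
SELECT customer_id, MIN(effect_date), MAX(expire_date)
FROM numbered
WHERE island_no = 0
GROUP BY customer_id
ORDER BY customer_id
"""


def first_islands(rows):
    """Return (customer_id, first start, first loss of access) per customer."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE test_sub (subscription_id INTEGER, customer_id INTEGER, "
        "effect_date TEXT, expire_date TEXT)"
    )
    conn.executemany("INSERT INTO test_sub VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(FIRST_ISLAND_SQL).fetchall()
    conn.close()
    return result


islands = first_islands(SUBSCRIPTIONS)
for row in islands:
    print(row)
```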

Combining last and first timestamp data of a group data in 1 row

I have a screenshot table and I want to get, for each user, the time their screenshots start and the time they end. I want to build a query so I can export this data for my users.
Let's say this is my table data.
scs_id scs_tracker_id created_at
1 1000 2022-02-22 00:00:00
2 1001 2022-02-22 04:00:00
3 1000 2022-02-22 01:00:00
4 1002 2022-02-22 12:00:00
5 1001 2022-02-22 08:00:00
3 1000 2022-02-22 02:00:00
My expected output should be:
scs_tracker_id screenshot_starts screenshot_ends
1000 2022-02-22 00:00:00 2022-02-22 02:00:00
1001 2022-02-22 04:00:00 2022-02-22 08:00:00
1002 2022-02-22 12:00:00 2022-02-22 12:00:00
Code I'm experimenting with at the moment:
SELECT
(SELECT MIN(created_at) FROM screen_shots GROUP BY scs_tracker_id ORDER BY scs_id ASC LIMIT 1) AS screenshot_starts,
(SELECT MAX(created_at) FROM screen_shots GROUP BY scs_tracker_id ORDER BY scs_id DESC LIMIT 1) AS screenshot_ends
FROM screen_shots
Aggregate by tracker ID and then take the min/max timestamp:
SELECT
scs_tracker_id,
MIN(created_at) AS screenshot_starts,
MAX(created_at) AS screenshot_ends
FROM screen_shots
GROUP BY scs_tracker_id;

How do I get the correct id with the query results

I want to create a stored procedure in MySQL, but first I want to get the query right. However, I can't seem to get back the id that corresponds to the DateTime stamps my query returns.
This is the table I am trying to get the result from:
id EventId start end
1 1 2019-04-05 00:00:00 2019-04-07 00:00:00
2 2 2020-04-03 00:00:00 2020-04-03 00:00:00
3 3 2020-04-02 00:00:00 2020-04-02 00:00:00
7 1 2020-06-11 00:00:00 2020-06-11 00:00:00
9 2 2020-06-18 00:00:00 2020-06-18 00:00:00
10 3 2020-06-11 00:00:00 2020-06-11 00:00:00
11 3 2020-06-07 00:00:00 2020-06-07 00:00:00
query:
SELECT DISTINCT Eventid, MIN(start), id
from date_planning
WHERE `start` >= NOW()
GROUP BY Eventid
this gives me the following result
EventId Min(start) id
1 2020-06-11 00:00:00 3
2 2020-06-18 00:00:00 9
3 2020-06-07 00:00:00 10
but these are the correct ids that belong to those DateTimes:
EventId Min(start) id
1 2020-06-11 00:00:00 7
2 2020-06-18 00:00:00 9
3 2020-06-07 00:00:00 11
You want the row with the minimum "future" date for each eventId. To solve this greatest-n-per-group problem, you need to filter rather than aggregate. Here is one option using a correlated subquery:
select dt.*
from date_planning dt
where dt.start = (
select min(dt1.start)
from date_planning dt1
where dt1.eventId = dt.eventId and dt1.start >= now()
)
For performance, you need an index on (eventId, start).
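To see the correlated subquery return the matching ids, here is a small sketch on the sample table. It uses Python's sqlite3 module as a stand-in for MySQL, and it replaces now() with a bound parameter (:as_of, set to 2020-06-01) so the result is reproducible; both choices are assumptions for illustration. The index name idx_event_start and the helper next_dates are hypothetical. The start and end columns are quoted because END is a keyword in SQLite.

```python
import sqlite3

# Sample rows from the question: (id, EventId, start, end).
PLANNING = [
    (1, 1, "2019-04-05 00:00:00", "2019-04-07 00:00:00"),
    (2, 2, "2020-04-03 00:00:00", "2020-04-03 00:00:00"),
    (3, 3, "2020-04-02 00:00:00", "2020-04-02 00:00:00"),
    (7, 1, "2020-06-11 00:00:00", "2020-06-11 00:00:00"),
    (9, 2, "2020-06-18 00:00:00", "2020-06-18 00:00:00"),
    (10, 3, "2020-06-11 00:00:00", "2020-06-11 00:00:00"),
    (11, 3, "2020-06-07 00:00:00", "2020-06-07 00:00:00"),
]

# Filter each row against the earliest future start of its own event.
NEXT_SQL = """
    SELECT dt.id, dt.EventId, dt."start"
    FROM date_planning dt
    WHERE dt."start" = (
        SELECT MIN(dt1."start")
        FROM date_planning dt1
        WHERE dt1.EventId = dt.EventId AND dt1."start" >= :as_of
    )
    ORDER BY dt.EventId
"""


def next_dates(rows, as_of):
    """Return (id, EventId, start) of the next planned date per event."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE date_planning (id INTEGER PRIMARY KEY, EventId INTEGER, '
        '"start" TEXT, "end" TEXT)'
    )
    conn.executemany("INSERT INTO date_planning VALUES (?, ?, ?, ?)", rows)
    # Composite index matching the subquery's equality + range predicate.
    conn.execute('CREATE INDEX idx_event_start ON date_planning (EventId, "start")')
    result = conn.execute(NEXT_SQL, {"as_of": as_of}).fetchall()
    conn.close()
    return result


upcoming = next_dates(PLANNING, "2020-06-01 00:00:00")
for row in upcoming:
    print(row)
```

Because the whole row is selected after filtering, the id always belongs to the same row as the minimum start, which is what the GROUP BY version could not guarantee.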

Is there a way to include double condition in HAVING?

I'm currently working on a query that looks like this. There are two tables - members and member_gathering.
SELECT id, city_id, name FROM members
WHERE "id" = "member_id" IN
(
SELECT "member_id" from member_gathering
GROUP BY "member_id"
HAVING COUNT(DATEDIFF("visited","joined">=365))>=5
ORDER BY "member_id"
)
ORDER BY id*1;
The goal is to have an output of all IDs satisfying the condition of being in more than 5 groups, in which a member is active for more than a year. Being active means having a difference between "visited" and "joined" columns (both are TIMESTAMP) for more than a year (I set that as 365 days).
However, when I run it, this code returns all the rows in the members table (though a manual check of both tables shows that some rows do not satisfy both conditions at the same time).
Any ideas on how to improve the code above? I'm not sure if I can use 'nested' condition inside COUNT(), but all other variants used before show either NULL values or returned all rows in the table, which is obviously not right. Also, I was thinking that problem might be with DATEDIFF function.
All suggestions are welcome: I'm a newbie to MySQL, so I'm not that familiar with it.
UPD: data sample:
1) members
id city_id name
2 980 Joey
5 980 Carl
10 1009 Louis
130 1092 Andrea
2) member_gathering
member_id gathering_id joined visited
2 1 2010-01-01 00:00:00 2010-02-01 00:00:00
2 2 2010-01-01 00:00:00 2010-02-01 00:00:00
5 2 2010-01-01 00:00:00 2010-02-01 00:00:00
10 3 2010-01-01 00:00:00 2010-02-01 00:00:00
130 1 2010-02-01 00:00:00 2013-02-01 00:00:00
130 2 2010-02-01 00:00:00 2013-02-01 00:00:00
130 3 2010-02-01 00:00:00 2014-02-01 00:00:00
130 4 2010-02-01 00:00:00 2018-02-01 00:00:00
130 5 2010-02-01 00:00:00 2015-02-01 00:00:00
Expected result would be only ID 130, thus: 130, 1092, Andrea.
I believe you first need to find all records where the datediff is 365 days or more, then find the members who have 5 or more such records. This needs both a WHERE and a HAVING clause:
SELECT id, city_id, name
FROM members
WHERE id IN (
SELECT member_id
FROM member_gathering
WHERE DATEDIFF(visited, joined) >= 365
GROUP BY member_id
HAVING COUNT(*) >= 5
)
You could do it this way:
SELECT id, city_id, name FROM members
WHERE id IN
(
SELECT member_id from member_gathering
GROUP BY member_id
HAVING SUM(DATEDIFF(visited, joined) >= 365)>=5
ORDER BY member_id
)
You should use a separate expression for each category of datediff you want to count. Remember that COUNT counts every non-NULL value, and a comparison returns 0 or 1 rather than NULL, so COUNT includes the false rows as well. If you want the total of the true values, you should use SUM.
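The difference between COUNT and SUM over a boolean expression can be checked on the sample data. The sketch below uses Python's sqlite3 module as a stand-in for MySQL; since SQLite has no DATEDIFF, the day difference is computed with julianday(visited) - julianday(joined), which is a substitution made for this example only. The helper name long_term_members is hypothetical.

```python
import sqlite3

MEMBERS = [(2, 980, "Joey"), (5, 980, "Carl"), (10, 1009, "Louis"), (130, 1092, "Andrea")]

# (member_id, gathering_id, joined, visited) from the question.
GATHERINGS = [
    (2, 1, "2010-01-01 00:00:00", "2010-02-01 00:00:00"),
    (2, 2, "2010-01-01 00:00:00", "2010-02-01 00:00:00"),
    (5, 2, "2010-01-01 00:00:00", "2010-02-01 00:00:00"),
    (10, 3, "2010-01-01 00:00:00", "2010-02-01 00:00:00"),
    (130, 1, "2010-02-01 00:00:00", "2013-02-01 00:00:00"),
    (130, 2, "2010-02-01 00:00:00", "2013-02-01 00:00:00"),
    (130, 3, "2010-02-01 00:00:00", "2014-02-01 00:00:00"),
    (130, 4, "2010-02-01 00:00:00", "2018-02-01 00:00:00"),
    (130, 5, "2010-02-01 00:00:00", "2015-02-01 00:00:00"),
]

# SQLite stand-in for MySQL's DATEDIFF(visited, joined) >= 365; evaluates to 0 or 1.
ACTIVE = "julianday(visited) - julianday(joined) >= 365"

# COUNT sees every non-NULL 0/1 value; SUM adds up only the 1s.
COUNTS_SQL = f"""
    SELECT member_id, COUNT({ACTIVE}), SUM({ACTIVE})
    FROM member_gathering
    GROUP BY member_id
    ORDER BY member_id
"""

RESULT_SQL = f"""
    SELECT id, city_id, name FROM members
    WHERE id IN (
        SELECT member_id FROM member_gathering
        GROUP BY member_id
        HAVING SUM({ACTIVE}) >= 5
    )
"""


def long_term_members(members, gatherings):
    """Return (per-member COUNT/SUM comparison, members passing the HAVING filter)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE members (id INTEGER PRIMARY KEY, city_id INTEGER, name TEXT)")
    conn.execute(
        "CREATE TABLE member_gathering (member_id INTEGER, gathering_id INTEGER, "
        "joined TEXT, visited TEXT)"
    )
    conn.executemany("INSERT INTO members VALUES (?, ?, ?)", members)
    conn.executemany("INSERT INTO member_gathering VALUES (?, ?, ?, ?)", gatherings)
    counts = conn.execute(COUNTS_SQL).fetchall()
    matches = conn.execute(RESULT_SQL).fetchall()
    conn.close()
    return counts, matches


counts, matches = long_term_members(MEMBERS, GATHERINGS)
print(counts)   # (member_id, COUNT of the condition, SUM of the condition)
print(matches)
```

For member 2 the COUNT is 2 while the SUM is 0, which shows why a COUNT over the comparison cannot be used to count only the rows where it is true.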

how to select Count of date ranges from mysql table

I have a table:
Name Registered Date
Amit 2017-01-01
Akshay 2017-01-03
Ankith 2017-01-05
Amit 2017-01-12
Amit 2017-01-13
Amit 2017-02-01
Amit 2017-02-01
I want to write a query which will display the registration weekly report:
Say date between 2017-01-01 to 2017-03-01
Week Count
2017-01-01 3
2017-01-08 2
2017-01-15 0
2017-01-22 0
2017-01-29 2
2017-02-05 0
2017-02-12 0
2017-02-19 0
2017-02-26 0
Here Count is the number of people who registered that week. 3 people registered in between 2017-01-01 to 2017-01-07.
So which query do I have to use to get this result?
Thanks
If you can use the WEEK function and display the week number instead of a date, then:
select dummy.n, count(t.`Registered Date`)
from (SELECT 1 as n UNION SELECT 2 as n UNION SELECT 3 as n ... UNION SELECT 53 as n) dummy
left outer join `table` t on dummy.n = WEEK(t.`Registered Date`)
and t.`Registered Date` >= x and t.`Registered Date` <= y
group by dummy.n
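If week start dates are wanted instead of week numbers, one alternative (a sketch, not the approach of the answer above) is to generate the week starts with a recursive CTE and left join the registrations onto them, so that empty weeks still appear with a count of 0. The example runs through Python's sqlite3 module as a stand-in for MySQL; MySQL 8.0+ also supports WITH RECURSIVE, but the date arithmetic would use DATE_ADD instead of SQLite's date(..., '+7 days'). The table name registrations, its column names, and the helper weekly_report are hypothetical.

```python
import sqlite3

# Sample rows from the question: (name, registered date).
REGISTRATIONS = [
    ("Amit", "2017-01-01"),
    ("Akshay", "2017-01-03"),
    ("Ankith", "2017-01-05"),
    ("Amit", "2017-01-12"),
    ("Amit", "2017-01-13"),
    ("Amit", "2017-02-01"),
    ("Amit", "2017-02-01"),
]

WEEKLY_SQL = """
WITH RECURSIVE weeks(week_start) AS (
    -- First week starts at the beginning of the requested range.
    SELECT :range_start
    UNION ALL
    -- Keep adding 7 days while the next week start is still inside the range.
    SELECT date(week_start, '+7 days') FROM weeks
    WHERE date(week_start, '+7 days') <= :range_end
)
SELECT w.week_start, COUNT(r.name)
FROM weeks w
LEFT JOIN registrations r
  ON r.registered_date >= w.week_start
 AND r.registered_date < date(w.week_start, '+7 days')
GROUP BY w.week_start
ORDER BY w.week_start
"""


def weekly_report(rows, range_start, range_end):
    """Return (week_start, registrations in that week), including empty weeks."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE registrations (name TEXT, registered_date TEXT)")
    conn.executemany("INSERT INTO registrations VALUES (?, ?)", rows)
    result = conn.execute(
        WEEKLY_SQL, {"range_start": range_start, "range_end": range_end}
    ).fetchall()
    conn.close()
    return result


report = weekly_report(REGISTRATIONS, "2017-01-01", "2017-03-01")
for week_start, count in report:
    print(week_start, count)
```

Putting the date conditions in the ON clause rather than in a WHERE clause is what keeps the zero-count weeks: a WHERE filter on the joined table would discard the rows where the left join found no match.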