MySQL: Select multiple distinct

I have one table which holds lots of records. It's for an auction site. The same user_id can appear in many rows, and so can the same auction_id.
I am trying to write a script that sends an email to each user who has placed a bid. If a user places, say, 10 bids on the same auction, I only want the email to be sent once per user per auction, not once per bid.
How would I do this with DISTINCT over two fields, user_id and auction_listing? I will also need a WHERE clause so I only select records of auctions that have less than 24 hours left to run.

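Also filter on end_datetime > NOW(), otherwise auctions that have already ended match the 24-hour condition too:
SELECT DISTINCT b.user_id, b.auction_listing
FROM AuctionBids b
JOIN Auctions a USING (auction_listing)
WHERE a.end_datetime > NOW()
AND a.end_datetime < NOW() + INTERVAL 24 HOUR
Or, equivalently, with GROUP BY:
SELECT b.user_id, b.auction_listing
FROM AuctionBids b
JOIN Auctions a USING (auction_listing)
WHERE a.end_datetime > NOW()
AND a.end_datetime < NOW() + INTERVAL 24 HOUR
GROUP BY b.user_id, b.auction_listing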
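A quick way to check that DISTINCT over two columns yields one row per (user, auction) pair is to run the query against a throwaway database. The sketch below uses Python's built-in sqlite3 module rather than MySQL, so datetime(:now, '+24 hours') stands in for NOW() + INTERVAL 24 HOUR, and `now` is passed in as a parameter to keep the result reproducible. The simplified table layouts, the function name users_to_notify and the sample rows are all hypothetical.

```python
import sqlite3

def users_to_notify(conn, now):
    """Return one (user_id, auction_listing) pair per user per auction
    whose auction ends within 24 hours of `now` and has not ended yet."""
    sql = """
        SELECT DISTINCT b.user_id, b.auction_listing
        FROM AuctionBids b
        JOIN Auctions a USING (auction_listing)
        WHERE a.end_datetime > :now
          AND a.end_datetime < datetime(:now, '+24 hours')
        ORDER BY b.user_id, b.auction_listing
    """
    return conn.execute(sql, {"now": now}).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Auctions (auction_listing INTEGER PRIMARY KEY, end_datetime TEXT);
    CREATE TABLE AuctionBids (bid_id INTEGER PRIMARY KEY, user_id INTEGER,
                              auction_listing INTEGER);
    INSERT INTO Auctions VALUES (1, '2024-01-01 20:00:00'),  -- ends soon
                                (2, '2024-01-05 12:00:00');  -- ends in days
    -- user 7 bids three times on auction 1 and once on auction 2
    INSERT INTO AuctionBids (user_id, auction_listing)
    VALUES (7, 1), (7, 1), (7, 1), (7, 2), (8, 1);
""")

print(users_to_notify(conn, "2024-01-01 12:00:00"))  # [(7, 1), (8, 1)]
```

User 7's three bids on auction 1 collapse into a single row, which is exactly the "one email per user per auction" behavior asked for.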

Without knowing your table schema it is hard to answer your question.
In general, if you GROUP BY user_id, each user will appear only once in your query results; GROUP BY user_id, auction_listing gives one row per user per auction.

Related

SQL how to find the average users from each category on a typical date

I have a user table as below:
Column name      Data type  Description
user_id          varchar    Unique user id
reg_ts           timestamp  Registration date
reg_device       varchar    Device registered
reg_attribution  varchar    Acquisition type
I am trying to find "On a typical day, what share of registrants are coming from each acquisition source?"
I wrote the code below but not sure how to divide by the total number of records:
select reg_ts as registration_date,
reg_attribution as acquisition_type,
count(*)
from users
group by 1,2
order by 1 asc
After I run the code above, I only get the count of each acquisition type for each date, but I need the share of registrants coming from each acquisition type. Can you please help me fix my query?
You can use a correlated subquery that gets the count for a day (assuming that reg_ts holds only a date, despite being a timestamp).
SELECT u1.reg_ts AS registration_date,
u1.reg_attribution AS acquisition_type,
count(*) / (SELECT count(*)
FROM users u2
WHERE u2.reg_ts = u1.reg_ts) AS share
FROM users u1
GROUP BY u1.reg_ts,
u1.reg_attribution
ORDER BY u1.reg_ts ASC;
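A minimal sketch of the per-day share query, run through Python's built-in sqlite3 module with made-up rows. Following the answer's assumption, reg_ts holds only a date here. One SQLite-specific detail: dividing two integers truncates, so the count is multiplied by 1.0 first; MySQL's / operator already returns a decimal, so the MySQL query does not need this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id TEXT, reg_ts TEXT, reg_device TEXT, reg_attribution TEXT);
    INSERT INTO users VALUES
        ('u1', '2024-01-01', 'ios', 'organic'),
        ('u2', '2024-01-01', 'web', 'paid'),
        ('u3', '2024-01-01', 'web', 'paid'),
        ('u4', '2024-01-01', 'ios', 'paid'),
        ('u5', '2024-01-02', 'web', 'organic');
""")

# Share of each day's registrants per acquisition type.
# 1.0 * forces real division: in SQLite, integer / integer truncates.
rows = conn.execute("""
    SELECT u1.reg_ts AS registration_date,
           u1.reg_attribution AS acquisition_type,
           1.0 * count(*) / (SELECT count(*)
                             FROM users u2
                             WHERE u2.reg_ts = u1.reg_ts) AS share
    FROM users u1
    GROUP BY u1.reg_ts, u1.reg_attribution
    ORDER BY u1.reg_ts, u1.reg_attribution
""").fetchall()

for row in rows:
    print(row)
# ('2024-01-01', 'organic', 0.25)
# ('2024-01-01', 'paid', 0.75)
# ('2024-01-02', 'organic', 1.0)
```

Within each day the shares add up to 1, which is a handy sanity check on real data.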
Edit:
If you want the ratio relative to the total number of users, rather than to the users who registered that day, just remove the WHERE clause from the subquery.
SELECT u1.reg_ts AS registration_date,
u1.reg_attribution AS acquisition_type,
count(*) / (SELECT count(*)
FROM users u2) AS share
FROM users u1
GROUP BY u1.reg_ts,
u1.reg_attribution
ORDER BY u1.reg_ts ASC;
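Or use window functions. With OVER () the denominator is the total over all rows; use OVER (PARTITION BY reg_ts) instead to get each day's share:
select reg_ts as registration_date,
reg_attribution as acquisition_type,
count(*) / sum(count(*)) over () as ratio
from users
group by 1, 2
order by 1 asc;
Window functions are available in MySQL 8.0 and later.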

How can I use SQL to select duplicate rows of specific fields, allowing a time difference?

I record sending emails in a MySQL database, and I want to find duplicate emails that were sent at the same time.
This query works successfully to find emails sent at the exact same time:
SELECT user_id, template, created_at, COUNT(*)
FROM emails
WHERE sender_id = 08347
GROUP BY user_id, template, created_at
HAVING COUNT(*) > 1;
But if I want to allow a time margin, say created_at +/- 5 seconds, I'm not sure how to implement that in the GROUP BY.
How can I select duplicate emails allowing a time difference?
EDIT:
There could be more than two emails sent around the same time, which the query would ideally include, although I realize that could get complicated, for example if many identical emails are sent one second apart continuously for an hour.
This is just an example of how to achieve what you want.
But it is a pretty expensive query: on a huge table it will become very slow. To improve performance, I would recommend creating another column, 10_sec_period, filling it with a trigger on each insert, and adding that column to an index.
Rounding UNIX_TIMESTAMP(created_at) rather than TIME_TO_SEC(created_at) keeps emails from different days in separate buckets, and DIV 10 * 10 gives 10-second buckets:
SELECT user_id,
template,
FROM_UNIXTIME((UNIX_TIMESTAMP(created_at) DIV 10) * 10) AS 10_sec_period,
COUNT(*)
FROM emails
WHERE sender_id = 08347
GROUP BY user_id, template, 10_sec_period
HAVING COUNT(*) > 1;
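The suggestion of a precomputed, indexed bucket column could look roughly like this. It is a sketch in SQLite through Python's sqlite3 module, not MySQL: created_at is assumed to be stored as Unix seconds, the column is given the hypothetical name ten_sec_period (avoiding an identifier that starts with a digit), and because SQLite cannot assign to NEW in a trigger, an AFTER INSERT trigger updates the freshly inserted row instead. In MySQL, a BEFORE INSERT trigger that sets the NEW column would be the more direct equivalent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emails (id INTEGER PRIMARY KEY, sender_id TEXT, user_id INTEGER,
                         template TEXT,
                         created_at INTEGER,       -- seconds since epoch (assumed)
                         ten_sec_period INTEGER);  -- hypothetical bucket column
    -- Fill the bucket column for every inserted row.
    CREATE TRIGGER emails_set_period AFTER INSERT ON emails
    BEGIN
        UPDATE emails SET ten_sec_period = (NEW.created_at / 10) * 10
        WHERE id = NEW.id;
    END;
    -- Index that lets the GROUP BY below work from the index.
    CREATE INDEX emails_dup_idx
        ON emails (sender_id, user_id, template, ten_sec_period);
    INSERT INTO emails (sender_id, user_id, template, created_at) VALUES
        ('08347', 1, 'welcome', 1001), ('08347', 1, 'welcome', 1008),
        ('08347', 1, 'welcome', 1012), ('08347', 2, 'welcome', 1003);
""")

# Same duplicate check as above, but grouping on the stored column.
dups = conn.execute("""
    SELECT user_id, template, ten_sec_period, COUNT(*)
    FROM emails
    WHERE sender_id = '08347'
    GROUP BY user_id, template, ten_sec_period
    HAVING COUNT(*) > 1
""").fetchall()
print(dups)  # [(1, 'welcome', 1000, 2)]
```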
The correct solution would use exists:
SELECT e.*
FROM emails e
WHERE sender_id = '08347' AND
EXISTS (SELECT 1
FROM emails e2
WHERE e2.user_id = e.user_id and e2.template = e.template and
e2.sender_id = e.sender_id and
e2.created_at > e.created_at - interval 5 second and
e2.created_at < e.created_at + interval 5 second and
e2.id <> e.id
)
ORDER BY sender_id, user_id, template, created_at;
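A runnable sketch of the EXISTS approach, using Python's sqlite3 module with made-up rows. To sidestep date-function differences between SQLite and MySQL, created_at is assumed to be stored as Unix seconds, so "interval 5 second" becomes plain integer arithmetic; the function name near_duplicates and the margin parameter are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emails (id INTEGER PRIMARY KEY, sender_id TEXT, user_id INTEGER,
                         template TEXT, created_at INTEGER);  -- seconds since epoch
    INSERT INTO emails (sender_id, user_id, template, created_at) VALUES
        ('08347', 1, 'welcome', 1000),
        ('08347', 1, 'welcome', 1003),   -- 3 s after the first: duplicate
        ('08347', 1, 'welcome', 1100),   -- far from the others
        ('08347', 2, 'welcome', 1001),   -- different user
        ('08347', 1, 'reset',   1002);   -- different template
""")

# Keep an email if another email with the same sender, user and template
# was sent strictly less than `margin` seconds before or after it.
def near_duplicates(conn, sender_id, margin=5):
    return conn.execute("""
        SELECT e.id, e.user_id, e.template, e.created_at
        FROM emails e
        WHERE e.sender_id = :sender AND EXISTS (
            SELECT 1 FROM emails e2
            WHERE e2.user_id = e.user_id AND e2.template = e.template
              AND e2.sender_id = e.sender_id
              AND e2.created_at > e.created_at - :margin
              AND e2.created_at < e.created_at + :margin
              AND e2.id <> e.id)
        ORDER BY e.user_id, e.template, e.created_at
    """, {"sender": sender_id, "margin": margin}).fetchall()

print(near_duplicates(conn, "08347"))
# [(1, 1, 'welcome', 1000), (2, 1, 'welcome', 1003)]
```

Because every email is compared with its neighbors, a long chain of emails sent a second apart is returned in full, which covers the case raised in the question's EDIT.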
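Alternatively, round created_at down to 5-second buckets, grouping by DATE(created_at) as well, because TIME_TO_SEC ignores the date:
SELECT
user_id,
template,
DATE(created_at) AS sent_date,
SEC_TO_TIME((TIME_TO_SEC(created_at) DIV 5) * 5) AS rounded_time,
COUNT(*)
FROM emails
WHERE sender_id = 08347
GROUP BY user_id, template, sent_date, rounded_time
HAVING COUNT(*) > 1;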
You can convert the date to Unix time to get seconds, divide by 5 and take the floor to find which 5-second bucket the row belongs to (the bucket start always ends in 0 or 5 seconds), then multiply by 5 to get back to real seconds, and finally convert the result back to a date.
Functions:
UNIX_TIMESTAMP: to convert date to unix time
FLOOR: to get the floor from a decimal
FROM_UNIXTIME: to convert unix time to date
SELECT
user_id,
template,
COUNT(1),
FROM_UNIXTIME(FLOOR(UNIX_TIMESTAMP(created_at) / 5)*5)
FROM emails
GROUP BY
FROM_UNIXTIME(FLOOR(UNIX_TIMESTAMP(created_at) / 5)*5) ,
template,
user_id
HAVING COUNT(1) > 1;
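The bucketing answers are cheap, but a fixed grid of buckets can separate two emails that are only a second or two apart when they fall on either side of a bucket boundary. The EXISTS query above compares timestamps pairwise, so it does not have this blind spot. A small pure-Python illustration of the arithmetic (the timestamps are arbitrary Unix seconds; the helper names are hypothetical):

```python
def bucket(unix_seconds: int, width: int = 5) -> int:
    """Round a Unix timestamp down to the start of its `width`-second bucket,
    like FLOOR(UNIX_TIMESTAMP(created_at) / 5) * 5 in the query above."""
    return (unix_seconds // width) * width

def same_bucket(a: int, b: int, width: int = 5) -> bool:
    return bucket(a, width) == bucket(b, width)

def within_margin(a: int, b: int, margin: int = 5) -> bool:
    # The sliding-window test used by the EXISTS answer.
    return abs(a - b) < margin

# Two emails 2 seconds apart, straddling the bucket boundary at 1005:
t1, t2 = 1004, 1006
print(bucket(t1), bucket(t2))  # 1000 1005
print(same_bucket(t1, t2))     # False: the GROUP BY misses this pair
print(within_margin(t1, t2))   # True: the EXISTS version catches it
```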

Growth for each quarter+year in SQL over my user table

I am using MySQL and I have a Users table where my registered users are stored. I'd like to see the cumulative number of registered users at the end of each quarter. So maybe in Q1 2016 I had 1000 users total, then in Q2 2016 2000 users total, in Q3 2016 4000 users total, etc. (so I want to see the running total, not just how many registered in each quarter).
From another Stack Overflow post, I was able to create a query to see it by each day:
select u.created, count(*)
from (select distinct date(DateCreated) created from `Users`) u
join `Users` u2 on u.created >= date(u2.DateCreated)
group by u.created
and this works for each day, but I'd now like to group it by quarter and year. I tried using the QUARTER(d) function in MySQL and even QUARTER(d) + YEAR(d) to combine them, but I still can't get the data right (the count(*) ends up producing incredibly high values).
Would anyone be able to help me get my data grouped by quarter/year? My timestamp column is called DateCreated (it's a Unix timestamp in milliseconds, so I have to divide by 1000 too).
Thanks so much
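I would suggest using a correlated subquery; this allows you to easily define each row in the result set. Since DateCreated is in milliseconds, convert it with FROM_UNIXTIME(DateCreated / 1000) first. For each year and quarter, count every user registered in that quarter or earlier (yyyy * 10 + q orders quarters chronologically):
select dates.yyyy, dates.q,
(select count(*)
from Users u
where year(from_unixtime(u.DateCreated / 1000)) * 10
+ quarter(from_unixtime(u.DateCreated / 1000)) <= dates.yyyy * 10 + dates.q
) as cnt
from (select year(from_unixtime(DateCreated / 1000)) as yyyy,
quarter(from_unixtime(DateCreated / 1000)) as q
from Users u
group by yyyy, q
) dates
order by dates.yyyy, dates.q;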
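To sanity-check what the running total per quarter should look like, here is a minimal pure-Python sketch of the same logic: convert the millisecond timestamp to seconds, map it to (year, quarter), count per quarter, then take a running sum. It assumes UTC (MySQL's FROM_UNIXTIME uses the session time zone), and like the SQL answer it only lists quarters in which someone registered. The function names and sample timestamps are made up.

```python
from collections import Counter
from datetime import datetime, timezone
from itertools import accumulate

def quarter_of(ms: int) -> tuple[int, int]:
    """Map a Unix timestamp in milliseconds to (year, quarter), in UTC."""
    dt = datetime.fromtimestamp(ms / 1000, tz=timezone.utc)
    return dt.year, (dt.month - 1) // 3 + 1

def cumulative_users_per_quarter(created_ms: list[int]) -> list[tuple[int, int, int]]:
    """Return (year, quarter, users registered up to the end of that quarter)."""
    per_quarter = Counter(quarter_of(ms) for ms in created_ms)
    quarters = sorted(per_quarter)                        # chronological order
    totals = accumulate(per_quarter[q] for q in quarters)  # running sum
    return [(y, q, total) for (y, q), total in zip(quarters, totals)]

def ms(year: int, month: int, day: int) -> int:
    """Sample-data helper: midnight UTC of the given day, in milliseconds."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp()) * 1000

sample = [ms(2016, 1, 15), ms(2016, 2, 1), ms(2016, 5, 10),
          ms(2016, 8, 20), ms(2016, 9, 2), ms(2016, 9, 30)]
print(cumulative_users_per_quarter(sample))
# [(2016, 1, 2), (2016, 2, 3), (2016, 3, 6)]
```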

SQL query with condition in second table

I have two tables: rooms and scheduled. The scheduled table has entries with a begin_time, end_time and room_id, which record when rooms are booked. The rooms table contains an entry with an id for every room.
The begin_time and end_time only contain the hour, for example '9' and '11', which means the room is booked from 9 till 11.
I want to display a list of the rooms that are currently available. To do this, I need a query that selects all room ids from the rooms table on the condition that there is no entry in scheduled for the current hour (i.e. the current hour is not between begin_time and end_time for that room id).
I tried the following:
SELECT DISTINCT id
, room_nr
FROM rooms
WHERE NOT EXISTS(SELECT room_id
FROM scheduled
WHERE rooms.id = scheduled.room_id
AND date = CURDATE()
AND HOUR(CURDATE()) NOT BETWEEN `begin_time` AND `end_time`)
But that does not work: it only shows the rooms whose id does not exist in scheduled for the current date, ignoring the time condition. I also tried something with joins, but I don't really understand them. How can I write a query that returns the ids from the rooms table under the described condition?
UPDATE:
I built the database in this SQL Fiddle: http://sqlfiddle.com/#!2/ecd82c
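Two things go wrong in your query. CURDATE() has no time part, so HOUR(CURDATE()) does not give the current hour; use CURTIME() instead. And NOT BETWEEN inside NOT EXISTS is a double negative: you want rooms for which no booking today covers the current hour. Treating end_time as exclusive, so that a 9-11 booking frees the room at 11, try the following query:
SELECT DISTINCT id
, room_nr
FROM rooms
WHERE NOT EXISTS(SELECT room_id
FROM scheduled
WHERE rooms.id = scheduled.room_id
AND date = CURDATE()
AND HOUR(CURTIME()) >= `begin_time`
AND HOUR(CURTIME()) < `end_time`)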
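A minimal sketch of the availability check using Python's sqlite3 module, with made-up rooms and bookings. The day and hour are passed as parameters instead of CURDATE() and HOUR(CURTIME()) so the output is reproducible, and end_time is assumed to be exclusive (a 9-11 booking frees the room at 11); if a booking should also block its final hour, use <= instead. The function name free_rooms is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE rooms (id INTEGER PRIMARY KEY, room_nr TEXT);
    CREATE TABLE scheduled (room_id INTEGER, date TEXT,
                            begin_time INTEGER, end_time INTEGER);
    INSERT INTO rooms VALUES (1, 'A101'), (2, 'A102'), (3, 'B201');
    INSERT INTO scheduled VALUES
        (1, '2024-03-04', 9, 11),   -- room 1 booked 9-11
        (2, '2024-03-04', 13, 15),  -- room 2 booked 13-15
        (3, '2024-03-05', 9, 17);   -- room 3 booked, but on another day
""")

def free_rooms(conn, day, hour):
    # A room is free if NO booking on `day` covers `hour`.
    return conn.execute("""
        SELECT id, room_nr
        FROM rooms
        WHERE NOT EXISTS (SELECT 1
                          FROM scheduled s
                          WHERE s.room_id = rooms.id
                            AND s.date = :day
                            AND :hour >= s.begin_time
                            AND :hour < s.end_time)
        ORDER BY id
    """, {"day": day, "hour": hour}).fetchall()

print(free_rooms(conn, "2024-03-04", 10))  # [(2, 'A102'), (3, 'B201')]
print(free_rooms(conn, "2024-03-04", 11))  # [(1, 'A101'), (2, 'A102'), (3, 'B201')]
```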

join results of two mysql queries on the same table

I have the intuition that I'm missing something simple, so please excuse me if it's a stupid question but I haven't been able to find an answer here.
I'm working with a database of usage behavior. We have one row per user, with date and time spent (plus other irrelevant info).
I'd like to output a histogram of the number of visits per day and the number of visits that lasted more than a certain time; ideally I'd like to have that in one query.
For now I have these two queries:
SELECT DATE(date), COUNT(date) AS Number_of_users FROM users GROUP BY DATE(date)
SELECT DATE(date), COUNT(date) AS Number_of_stayers FROM users WHERE timespent>5 GROUP BY DATE(date)
How can I combine them to obtain a result in the form of:
date users stayers
2014-01-01 21 5
2014-01-02 13 0
etc.
Thanks in advance for any help!
You can try using IF, like this:
SELECT DATE(date),
COUNT(date) AS Number_of_users,
SUM(IF(timespent>5,1,0)) AS Number_of_stayers
FROM users
GROUP BY DATE(date)
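The IF approach is conditional aggregation: a single pass over the table in which every row adds to COUNT, while only qualifying rows add 1 to the SUM. Below is a runnable sketch using Python's sqlite3 module with made-up visits; in place of MySQL's IF() it uses the standard CASE WHEN expression, which both SQLite and MySQL accept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER, date TEXT, timespent INTEGER);
    INSERT INTO users VALUES
        (1, '2014-01-01 09:15:00', 3),
        (2, '2014-01-01 10:40:00', 8),
        (3, '2014-01-01 18:05:00', 12),
        (4, '2014-01-02 11:00:00', 2);
""")

# COUNT(*) counts every visit; the CASE expression contributes 1 only for
# visits longer than 5, so the SUM counts the "stayers" in the same pass.
rows = conn.execute("""
    SELECT DATE(date) AS day,
           COUNT(*) AS Number_of_users,
           SUM(CASE WHEN timespent > 5 THEN 1 ELSE 0 END) AS Number_of_stayers
    FROM users
    GROUP BY DATE(date)
    ORDER BY day
""").fetchall()
print(rows)  # [('2014-01-01', 3, 2), ('2014-01-02', 1, 0)]
```

Days without any stayers still show 0, because the SUM runs over that day's rows rather than over a filtered subset.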
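If you want to use a JOIN, don't join the table to itself on date: every visit gets paired with every matching visit, so the counts come out too high (and a GROUP BY is needed in any case). Instead, aggregate each of your two queries first and join the results on the day:
SELECT u.visit_date,
u.Number_of_users,
COALESCE(s.Number_of_stayers, 0) AS Number_of_stayers
FROM (SELECT DATE(date) AS visit_date, COUNT(date) AS Number_of_users
FROM users GROUP BY DATE(date)) u
LEFT JOIN (SELECT DATE(date) AS visit_date, COUNT(date) AS Number_of_stayers
FROM users WHERE timespent>5 GROUP BY DATE(date)) s
ON u.visit_date = s.visit_date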