I have a db table with about a half-million rows of user sign in data.
simple db table:
users_signin:
id
userid
datetime
What I am trying to figure out is how to get the average, or most common, hour of day at which a specific person signs in to the website.
I want an hour returned, such as 04 or 23 (4am/11pm).
The datetime field is a Unix timestamp.
I have fiddled around with avg(), but extracting just the hour is where I am hitting a wall.
If you want the most common hour of the day for a specific user, you can try the following query:
select hour(datetime) as hr, count(*)
from users_signin
where userid = $userid
group by hour(datetime)
order by count(*) desc
limit 1;
EDIT:
If the thing you are calling datetime is really a unix time, then you should do:
select hour(from_unixtime(datetime)) as hr, count(*)
from users_signin
where userid = $userid
group by hour(from_unixtime(datetime))
order by count(*) desc
limit 1;
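To make the logic of the query above concrete, here is a minimal Python sketch of the same idea: convert each Unix timestamp to an hour of day, count the hours, and keep the most frequent one. The function name `most_common_hour` and the sample timestamps are made up for illustration, and the sketch assumes UTC, whereas FROM_UNIXTIME converts using the MySQL session time zone. It counts the most common hour rather than averaging, because hours wrap at midnight (the plain average of 23 and 1 is 12, which is not a meaningful "typical" hour).

```python
from collections import Counter
from datetime import datetime, timezone


def most_common_hour(timestamps, tz=timezone.utc):
    """Return (hour, count) for the most frequent hour of day, or None if empty."""
    # Equivalent in spirit to: GROUP BY hour(from_unixtime(datetime))
    hours = Counter(datetime.fromtimestamp(ts, tz=tz).hour for ts in timestamps)
    if not hours:
        return None
    # Equivalent in spirit to: ORDER BY count(*) DESC LIMIT 1
    return hours.most_common(1)[0]


# Hypothetical sign-in times for one user, as Unix timestamps (UTC).
signins = [4 * 3600 + 10, 86400 + 4 * 3600 + 500, 23 * 3600, 2 * 86400 + 4 * 3600]

hour, count = most_common_hour(signins)
print(f"{hour:02d}", count)
```

Formatting with `{hour:02d}` gives the zero-padded form ("04", "23") asked for in the question.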
Related
I record sending emails in a MySQL database, and I want to find duplicate emails that were sent at the same time.
This query works successfully to find emails sent at the exact same time:
SELECT user_id, template, created_at, COUNT(*)
FROM emails
WHERE sender_id = 08347
GROUP BY user_id, template, created_at
HAVING COUNT(*) > 1;
But if I want to allow a time margin, say created_at +/- 5 seconds, I'm not sure how to implement that in the GROUP BY.
How can I select duplicate emails allowing a time difference?
EDIT:
There could be more than 2 emails sent around the same time, which the query would ideally include, although I realize that could get complicated, for example if there are many identical emails sent a second apart consistently for an hour.
This is just an example of how to achieve what you want.
It is a pretty expensive query, though. On a huge table it will become very slow. To improve performance, I would recommend adding another column, 10_sec_period, and populating it with a trigger on each insert. That new column should also be added to an index.
SELECT user_id,
template,
SEC_TO_TIME((TIME_TO_SEC(created_at) DIV 10) * 10) AS 10_sec_period,
COUNT(*)
FROM emails
WHERE sender_id = 08347
GROUP BY user_id, template, 10_sec_period
HAVING COUNT(*) > 1;
The correct solution would use exists:
SELECT e.*
FROM emails e
WHERE sender_id = '08347' AND
EXISTS (SELECT 1
FROM emails e2
WHERE e2.user_id = e.user_id and e2.template = e.template and
e2.sender_id = e.sender_id and
e2.created_at > e.created_at - interval 5 second and
e2.created_at < e.created_at + interval 5 second and
e2.id <> e.id
)
ORDER BY sender_id, user_id, template, created_at;
SELECT
user_id,
template,
SEC_TO_TIME((TIME_TO_SEC(created_at) DIV 5) * 5) AS rounded_time,
COUNT(*)
FROM emails
WHERE sender_id = 08347
GROUP BY user_id, template, rounded_time
HAVING COUNT(*) > 1;
You can convert the date to Unix time to get seconds, divide by 5, and take the floor to find the 5-second group each row belongs to. Then multiply by 5 to get back to real seconds. All that is left at that point is to convert the result back to a date.
Functions:
UNIX_TIMESTAMP: to convert date to unix time
FLOOR: to get the floor from a decimal
FROM_UNIXTIME: to convert unix time to date
SELECT
user_id,
template,
COUNT(1),
FROM_UNIXTIME(FLOOR(UNIX_TIMESTAMP(created_at) / 5)*5)
FROM emails
GROUP BY
FROM_UNIXTIME(FLOOR(UNIX_TIMESTAMP(created_at) / 5)*5) ,
template,
user_id
HAVING COUNT(1) > 1;
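The bucketing queries and the EXISTS query above do not always agree, because fixed buckets split pairs that straddle a bucket boundary. The following Python sketch illustrates the difference on plain integer seconds for a single user_id/template combination. The function names and sample values are hypothetical, and the sketch models only the grouping logic, not the SQL itself.

```python
def bucket_duplicates(times, width=5):
    """Group timestamps into fixed buckets, like FLOOR(t / width) * width."""
    buckets = {}
    for t in times:
        buckets.setdefault((t // width) * width, []).append(t)
    # Like HAVING COUNT(*) > 1: keep only buckets with more than one entry.
    return {start: ts for start, ts in buckets.items() if len(ts) > 1}


def window_duplicates(times, margin=5):
    """Keep each timestamp that has another one strictly within +/- margin seconds."""
    # Like the EXISTS subquery: compare every row against every other row.
    return [
        t
        for i, t in enumerate(times)
        if any(i != j and abs(t - u) < margin for j, u in enumerate(times))
    ]


sent = [4, 6, 20]  # seconds; 4 and 6 fall in different 5-second buckets
print(bucket_duplicates(sent))  # {}
print(window_duplicates(sent))  # [4, 6]
```

Bucketing is cheap and can be indexed, but it misses near-duplicates on either side of a boundary. The pairwise comparison catches them at the cost of a self-join.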
I have a table containing:
Balance, Client_ID, Date
This table has ~25 Million rows - Most days, a service executes and creates a new row for each client, with today's date, and balance of the client.
Inside a date range, let's say 01/01/2016 to 12/05/2016, I need to get the first and last row.
*The service does not run every day, so filtering on Date = 12/05/2016 will not work. Also, if today's balance is equal to yesterday's balance, no row is inserted (this saves me about 90% of the data, which, if I calculate correctly, would otherwise be about 300 million rows).
To do such, I run these two queries:
Get the first date: 6.9433851242065 seconds
SELECT * FROM (SELECT * FROM daily
WHERE TIME >= '01/01/2016' AND TIME < '13/05/2016') dates
GROUP BY Client_ID
Get the last date: 32.034277915955 seconds
SELECT * FROM (SELECT * FROM daily
WHERE TIME >= '01/01/2016' AND TIME < '13/05/2016'
ORDER BY Date DESC) dates
GROUP BY Client_ID
The first query has no ORDER BY, because the service mentioned above always inserts rows in date order, and so it is much faster (about 7 s versus 32 s).
How can I make both queries faster, or at least the second one?
Query description:
Get the row where the date is the first date after 01/01/2016
Get the row where the date is the last date before 13/05/2016
EDIT: The checked answer gives me the following:
ASC and DESC are mine, 'combined' is the suggested answer
dates_ASC: 33.300458192825
dates_DESC: 8.9232740402222
dates_combined: 8.4357199668884
dates_ASC: 5.4825110435486
dates_DESC: 10.173403978348
dates_combined: 2.7024359703064
dates_ASC: 15.090759038925
dates_DESC: 29.375104904175
dates_combined: 3.2885720729828
Pick each client's min and max time in a derived table. Join with that table:
select *
from daily d1
join (select Client_ID, max(TIME) as maxtime, min(TIME) as mintime
from daily
WHERE TIME >= '01/01/2016' AND TIME < '13/05/2016'
group by Client_ID) d2
on d1.Client_ID = d2.Client_ID and d1.TIME in (d2.mintime, d2.maxtime)
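As an illustration of what the derived-table join above computes, here is a small Python sketch that makes one pass over the rows and keeps, per client, the earliest and latest row inside a half-open date range. The function name `first_and_last`, the row layout `(client_id, day, balance)` and the sample data are assumptions made for this example only.

```python
from datetime import date


def first_and_last(rows, start, end):
    """Return {client_id: (first_row, last_row)} for rows with start <= day < end."""
    result = {}
    for row in rows:
        client_id, day, _balance = row
        if not (start <= day < end):
            continue  # outside the requested range
        # The first row seen for a client is both its earliest and latest so far.
        first, last = result.get(client_id, (row, row))
        if day < first[1]:
            first = row
        if day > last[1]:
            last = row
        result[client_id] = (first, last)
    return result


# Hypothetical data: (client_id, day, balance)
rows = [
    (1, date(2015, 12, 30), 90.0),
    (1, date(2016, 1, 4), 100.0),
    (1, date(2016, 3, 1), 120.0),
    (1, date(2016, 5, 12), 150.0),
    (2, date(2016, 2, 2), 10.0),
]

for client_id, (first, last) in first_and_last(rows, date(2016, 1, 1), date(2016, 5, 13)).items():
    print(client_id, first[1], last[1])
```

A client with a single row in the range gets that row as both first and last, which matches what the IN (mintime, maxtime) join returns when the two values coincide.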
Try first query as:
SELECT * FROM daily WHERE TIME >= '01/01/2016' AND TIME < '13/05/2016' ORDER BY TIME ASC LIMIT 1
The second query as:
SELECT * FROM daily WHERE TIME >= '01/01/2016' AND TIME < '13/05/2016' ORDER BY TIME DESC LIMIT 1
I've been tasked with converting a Rails app from MySQL to Postgres asap and ran into a small issue.
The active record query:
current_user.profile_visits.limit(6).order("created_at DESC").where("created_at > ? AND visitor_id <> ?", 2.months.ago, current_user.id).distinct
Produces the SQL:
SELECT visitor_id, MAX(created_at) as created_at, distinct on (visitor_id) *
FROM "profile_visits"
WHERE "profile_visits"."social_user_id" = 21
AND (created_at > '2015-02-01 17:17:01.826897' AND visitor_id <> 21)
ORDER BY created_at DESC, id DESC
LIMIT 6
I'm pretty confident when working with MySQL but I'm honestly new to Postgres. I think this query is failing for multiple reasons.
I believe the distinct on needs to be first.
I don't know how to order by the results of max function
Can I even use the max function like this?
The high level goal of this query is to return the 6 most recent profile views of a user. Any pointers on how to fix this ActiveRecord query (or its resulting SQL) would be greatly appreciated.
"The high level goal of this query is to return the 6 most recent profile views of a user."
That would be simple. You don't need max() nor DISTINCT for this:
SELECT *
FROM profile_visits
WHERE social_user_id = 21
AND created_at > (now() - interval '2 months')
AND visitor_id <> 21 -- ??
ORDER BY created_at DESC NULLS LAST, id DESC NULLS LAST
LIMIT 6;
I suspect your question is incomplete. If you want:
the 6 latest visitors with their latest visit to the page
then you need a subquery. You cannot get this sort order in one query level, neither with DISTINCT ON, nor with window functions:
SELECT *
FROM (
SELECT DISTINCT ON (visitor_id) *
FROM profile_visits
WHERE social_user_id = 21
AND created_at > (now() - interval '2 months')
AND visitor_id <> 21 -- ??
ORDER BY visitor_id, created_at DESC NULLS LAST, id DESC NULLS LAST
) sub
ORDER BY created_at DESC NULLS LAST, id DESC NULLS LAST
LIMIT 6;
The subquery sub gets the latest visit per visitor (but not older than two months, and not for visitor 21). ORDER BY must have the same leading columns as DISTINCT ON.
You need the outer query to get the 6 latest visitors then.
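The two query levels correspond to two separate steps, which the following Python sketch spells out: first reduce to one row per visitor (their newest visit), then sort those rows by time and cut off at the limit. The function name `latest_visitors` and the integer timestamps are hypothetical; the date and visitor filters from the WHERE clause are left out to keep the sketch short.

```python
def latest_visitors(visits, limit=6):
    """visits: iterable of (visitor_id, created_at). Newest visit per visitor, newest first."""
    # Step 1 (inner query, DISTINCT ON (visitor_id)): newest visit per visitor.
    latest = {}
    for visitor_id, created_at in visits:
        if visitor_id not in latest or created_at > latest[visitor_id]:
            latest[visitor_id] = created_at
    # Step 2 (outer query): ORDER BY created_at DESC, then LIMIT.
    ranked = sorted(latest.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


# Hypothetical visits: (visitor_id, created_at as an integer timestamp)
visits = [(7, 10), (8, 12), (7, 15), (9, 11)]
print(latest_visitors(visits, limit=2))  # [(7, 15), (8, 12)]
```

Sorting before deduplicating would not work in one step, because the deduplication needs rows ordered by visitor while the final result needs them ordered by time.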
Consider the sequence of events:
Best way to get result count before LIMIT was applied
Why NULLS LAST? To be sure, you did not provide the table definition.
PostgreSQL sort by datetime asc, null first?
I apologize if this has been asked before.. I'm very new to developing and although I've tried searching a lot, I'm not really sure what to look for.
Anyway so I have a table which counts records being entered per day. It looks something like this (each record is represented by a letter) (assume today's date is 27/01/2013):
RECORD | COUNT | DATE
A      | 4     | 27/01/2013
B      | 7     | 27/01/2013
B      | 3     | 24/01/2013
C      | 8     | 22/01/2013
A      | 2     | 19/01/2013
Each new post is checked in the table and it updates the count if the record already exists on the current day, otherwise a new record is created.
For the page which prints the records which have been added 'TODAY', I have the MySQL query
SELECT * FROM `table` ORDER BY `date` DESC, `count` DESC LIMIT 1000
and use a php 'if' statement to only print the records where the date('Y-m-d') = date in the table. So only the records and the corresponding count which has been entered that day are printed.
- the table above would produce the result:
1. B 7
2. A 4
What I would like is a page which prints the records which have been entered in the last week. I know I can use DATE_SUB(now(),INTERVAL 1 WEEK) AND NOW() to select the records from the last week, but I need duplicate records to be combined and their counts added together, so the result for this table would look like this:
1. B 10
2. C 8
3. A 4
How would I go about combining those duplicate records and have a list of records ordered by count? Is this the best method to get a 'last week' record count, or is there another table structure which would be better?
Again I'm sorry if this a silly question or if my explanation was long-winded, but just some simple pointers will be really appreciated.
Try this
SELECT `record`, SUM(`count`) AS `count`
FROM `table`
WHERE `date` > DATE_SUB(CURDATE(),INTERVAL 1 WEEK)
GROUP BY `record`
ORDER BY `count` DESC
You can also add LIMIT 1000 to the grouped result set if you need to.
Using GROUP BY will allow you to group related records together:
SELECT `record`
, SUM(`count`) AS `count`
FROM `table`
WHERE `date` > CURDATE() - INTERVAL 1 WEEK
GROUP BY `record`
ORDER BY `count` DESC
LIMIT 1000
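To check the grouping against the sample table from the question, here is a small Python sketch of the same filter, sum and sort. The function name `weekly_totals` is hypothetical, and "today" is passed in explicitly as 27/01/2013 to match the question's assumption.

```python
from collections import Counter
from datetime import date, timedelta


def weekly_totals(rows, today):
    """rows: iterable of (record, count, day). Sum counts per record over the last week."""
    cutoff = today - timedelta(weeks=1)  # like CURDATE() - INTERVAL 1 WEEK
    totals = Counter()
    for record, count, day in rows:
        if day > cutoff:  # WHERE `date` > cutoff
            totals[record] += count  # SUM(`count`) ... GROUP BY `record`
    return totals.most_common()  # ORDER BY summed count DESC


# The sample table from the question.
rows = [
    ("A", 4, date(2013, 1, 27)),
    ("B", 7, date(2013, 1, 27)),
    ("B", 3, date(2013, 1, 24)),
    ("C", 8, date(2013, 1, 22)),
    ("A", 2, date(2013, 1, 19)),
]

print(weekly_totals(rows, today=date(2013, 1, 27)))
# [('B', 10), ('C', 8), ('A', 4)]
```

The row for A on 19/01/2013 is older than the cutoff and is excluded, which is why A totals 4 rather than 6, as in the expected output in the question.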
All I want is to count entries by date (i.e. entries with the same date).
My table is
You can see that the 5th and 6th entries have the same date.
Now, the real problem, I think, is that entries with the same date have different times, so I am not getting what I want.
I am using this sql
SELECT COUNT( created_at ) AS entries, created_at
FROM wp_frm_items
WHERE user_id =1
GROUP BY created_at
LIMIT 0 , 30
What I am getting is this.
I want entries as 2 for date 2012-02-22
You get that result because grouping on created_at also compares the time, down to the second. So only entries created in the same second are grouped together.
To achieve what you actually want, you need to apply a date function to the created_at column:
SELECT COUNT(1) AS entries, DATE(created_at) as date
FROM wp_frm_items
WHERE user_id =1
GROUP BY DATE(created_at)
LIMIT 0 , 30
This would remove the time part from the column field, and so group together any entries created on the same day. You could take this further by removing the day part to group entries created on the same month of the same year etc.
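The effect of applying DATE() before grouping can be seen in a few lines of Python: dropping the time part makes entries from the same day fall under the same key. The function name `entries_per_day` and the sample timestamps are made up for this sketch.

```python
from collections import Counter
from datetime import datetime


def entries_per_day(created_at_values):
    """Count entries per calendar day, ignoring the time of day."""
    # ts.date() plays the role of DATE(created_at) in the GROUP BY.
    return Counter(ts.date() for ts in created_at_values)


# Hypothetical created_at values; two of them share a date but not a time.
created = [
    datetime(2012, 2, 21, 9, 30, 0),
    datetime(2012, 2, 22, 10, 15, 42),
    datetime(2012, 2, 22, 18, 5, 7),
]

for day, entries in sorted(entries_per_day(created).items()):
    print(day, entries)
```

Grouping on the full `datetime` values instead would give three groups of one entry each, which is the behaviour described in the question.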
To restrict the query to entries created in the current month, you add a WHERE-clause to the query to only select entries that satisfy that condition. Here's an example:
SELECT COUNT(1) AS entries, DATE(created_at) as date
FROM wp_frm_items
WHERE user_id = 1
AND created_at >= DATE_FORMAT(CURDATE(),'%Y-%m-01')
GROUP BY DATE(created_at)
Note: The COUNT(1) part of the query simply means "count each row", and you could just as well have written COUNT(*) or COUNT(id). Be aware that COUNT(column) skips rows where that column is NULL, so only a column that is never NULL gives the same result. Historically, the most efficient approach was to count the primary key, since that is always available in whatever index the query engine could utilize. COUNT(*) used to have to leave the index and retrieve the corresponding row in the table, which was sometimes inefficient. In more modern query planners this is probably no longer the case. COUNT(1) is another variant that didn't force the query planner to retrieve the rows from the table.
Edit: The query to group by month can be created in a number of different ways. Here is an example:
SELECT COUNT(1) AS entries, DATE_FORMAT(created_at,'%Y-%c') as month
FROM wp_frm_items
WHERE user_id =1
GROUP BY DATE_FORMAT(created_at,'%Y-%c')
You must eliminate the time with GROUP BY
SELECT COUNT(*) AS entries, DATE(created_at) AS created_date
FROM wp_frm_items
WHERE user_id =1
GROUP BY DATE(created_at)
LIMIT 0 , 30
Oops, misread it.
Use GROUP BY DATE(created_at)
Try:
SELECT COUNT( created_at ) AS entries, DATE(created_at) AS created_date
FROM wp_frm_items
WHERE user_id =1
GROUP BY DATE(created_at)
LIMIT 0 , 30