I need a better way of retrieving the top 10 distinct UIDs from a table I have.
The setup:
Table user_view_tracker
Contains pairs of {user id (uid), timestamp (ts)}
Is growing every day (today it's 41k entries)
My goal:
To produce a top 10 of the most viewed user IDs in the table user_view_tracker
My current code is working, but killing the database slowly:
select
distinct uvt.uid as UID,
(select count(*) from user_view_tracker temp where temp.uid=uvt.uid and temp.ts>date_sub(now(),interval 1 month)) as CLICK
from user_view_tracker uvt
order by CLICK
limit 10
It's quite obvious that a different data structure would help. But I can't do that as of now.
First of all, get rid of that correlated subquery; this should be enough ;)
select
uvt.uid as UID
,count(*) as CLICK
from
user_view_tracker uvt
where
uvt.ts > date_sub(now(),interval 1 month)
group by
uvt.uid
order by CLICK DESC
limit 10
Try:
select uid, count(*) as num_stamps
from user_view_tracker
where ts > date_sub(now(), interval 1 month)
group by uid
order by num_stamps desc
limit 10
I kept your criteria as far as getting the count for just the past month. You can remove that line if you want to count all.
The removal of DISTINCT should improve performance. It is not necessary if you aggregate in your outer query and group by uid, as that will aggregate the data to one row per uid with the count.
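If adding an index is allowed (it does not change the table's columns), a composite index on ts and uid may help the grouped query further, since MySQL can then read just the last month's range from the index. This is only a sketch: the index name idx_uvt_ts_uid is made up, and whether the optimizer actually uses it should be checked with EXPLAIN on your own data.

```sql
-- Hypothetical index name; assumes you are allowed to add indexes to the table.
CREATE INDEX idx_uvt_ts_uid ON user_view_tracker (ts, uid);

-- Check the plan of the aggregated query from the answers above.
EXPLAIN
SELECT uid, COUNT(*) AS num_stamps
FROM user_view_tracker
WHERE ts > DATE_SUB(NOW(), INTERVAL 1 MONTH)
GROUP BY uid
ORDER BY num_stamps DESC
LIMIT 10;
```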
You should use aggregate functions in MySQL:
SELECT UID, COUNT(ts) as Number_Of_Views FROM user_view_tracker
GROUP BY UID
ORDER BY Number_Of_Views DESC
LIMIT 10
A simple demo which selects the top 10 UID viewed
http://sqlfiddle.com/#!2/907c10/3
There is a task: develop a fragment of the Web site that provides work with one table.
Attributes of the table:
Day of the week,
Time of the beginning of the lesson,
Subject name,
Number of the audience,
Full name of the teacher.
We need to write a query that determines the day of the week with the largest number of entries; if several days share the maximum, output them all. I wrote the query as follows:
SELECT COUNT(*) AS cnt, day
FROM schedule
GROUP BY day
ORDER BY cnt DESC
LIMIT 1;
But if there are several identical maxima, only one is displayed. How do I write a query that returns them all?
You can use your query as a subquery in the HAVING clause, e.g.:
SELECT day, count(*) as cnt
FROM schedule
GROUP BY day
HAVING count(*) = (
SELECT count(*) as cnt
FROM schedule
GROUP BY day
ORDER BY cnt DESC
LIMIT 1
)
ORDER BY day
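The same idea can also be written without ORDER BY ... LIMIT 1 in the subquery, by taking MAX over the per-day counts. This is a sketch of an equivalent form, not a tested drop-in; the derived-table alias per_day and the column alias day_cnt are names I made up.

```sql
SELECT day, COUNT(*) AS cnt
FROM schedule
GROUP BY day
HAVING COUNT(*) = (
    -- largest per-day count; every day that reaches it is returned
    SELECT MAX(day_cnt)
    FROM (
        SELECT COUNT(*) AS day_cnt
        FROM schedule
        GROUP BY day
    ) AS per_day
)
ORDER BY day;
```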
I've been tasked with converting a Rails app from MySQL to Postgres asap and ran into a small issue.
The active record query:
current_user.profile_visits.limit(6).order("created_at DESC").where("created_at > ? AND visitor_id <> ?", 2.months.ago, current_user.id).distinct
Produces the SQL:
SELECT visitor_id, MAX(created_at) as created_at, distinct on (visitor_id) *
FROM "profile_visits"
WHERE "profile_visits"."social_user_id" = 21
AND (created_at > '2015-02-01 17:17:01.826897' AND visitor_id <> 21)
ORDER BY created_at DESC, id DESC
LIMIT 6
I'm pretty confident when working with MySQL but I'm honestly new to Postgres. I think this query is failing for multiple reasons.
I believe the distinct on needs to be first.
I don't know how to order by the results of max function
Can I even use the max function like this?
The high-level goal of this query is to return the 6 most recent profile views of a user. Any pointers on how to fix this ActiveRecord query (or its resulting SQL) would be greatly appreciated.
The high level goal of this query is to return the 6 most recent
profile views of a user.
That would be simple. You don't need max() nor DISTINCT for this:
SELECT *
FROM profile_visits
WHERE social_user_id = 21
AND created_at > (now() - interval '2 months')
AND visitor_id <> 21 -- ??
ORDER BY created_at DESC NULLS LAST, id DESC NULLS LAST
LIMIT 6;
I suspect your question is incomplete. If you want:
the 6 latest visitors with their latest visit to the page
then you need a subquery. You cannot get this sort order in one query level, neither with DISTINCT ON, nor with window functions:
SELECT *
FROM (
SELECT DISTINCT ON (visitor_id) *
FROM profile_visits
WHERE social_user_id = 21
AND created_at > (now() - interval '2 months')
AND visitor_id <> 21 -- ??
ORDER BY visitor_id, created_at DESC NULLS LAST, id DESC NULLS LAST
) sub
ORDER BY created_at DESC NULLS LAST, id DESC NULLS LAST
LIMIT 6;
The subquery sub gets the latest visit per visitor (but not older than two months, and not for visitor 21). ORDER BY must have the same leading columns as DISTINCT ON.
You need the outer query to get the 6 latest visitors then.
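If only the visitor_id and the time of the latest visit are needed, rather than the whole row, a plain aggregate might be enough. This is a sketch under that assumption; last_visit is an alias I introduced, and the filter values are copied from the question.

```sql
SELECT visitor_id, max(created_at) AS last_visit
FROM   profile_visits
WHERE  social_user_id = 21
AND    created_at > (now() - interval '2 months')
AND    visitor_id <> 21
GROUP  BY visitor_id          -- one row per visitor
ORDER  BY last_visit DESC     -- most recent visitors first
LIMIT  6;
```

The DISTINCT ON version above is still the one to use when the other columns of the latest visit row are needed.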
Consider the sequence of events:
Best way to get result count before LIMIT was applied
Why NULLS LAST? Just to be safe, since you did not provide the table definition.
PostgreSQL sort by datetime asc, null first?
I apologize if this has been asked before.. I'm very new to developing and although I've tried searching a lot, I'm not really sure what to look for.
Anyway so I have a table which counts records being entered per day. It looks something like this (each record is represented by a letter) (assume today's date is 27/01/2013):
RECORD | COUNT | DATE
A      | 4     | 27/01/2013
B      | 7     | 27/01/2013
B      | 3     | 24/01/2013
C      | 8     | 22/01/2013
A      | 2     | 19/01/2013
Each new post is checked in the table and it updates the count if the record already exists on the current day, otherwise a new record is created.
For the page which prints the records which have been added 'TODAY', I have the MySQL query
SELECT * FROM `table` ORDER BY `date` DESC, `count` DESC LIMIT 1000
and use a PHP 'if' statement to print only the records where date('Y-m-d') equals the date in the table. So only the records entered that day, with their corresponding counts, are printed.
- the table above would produce the result:
1. B 7
2. A 4
What I would like is a page which prints the records which have been entered in the last week. I know I can use DATE_SUB(now(),INTERVAL 1 WEEK) AND NOW() to select the records from the last week, but I need duplicate records to be combined and their counts added together, so the result for this table would look like this:
1. B 10
2. C 8
3. A 4
How would I go about combining those duplicate records and have a list of records ordered by count? Is this the best method to get a 'last week' record count, or is there another table structure which would be better?
Again, I'm sorry if this is a silly question or if my explanation was long-winded, but some simple pointers would be really appreciated.
Try this:
SELECT `record`, SUM(`count`) AS `count`
FROM `table`
WHERE `date` > DATE_SUB(CURDATE(),INTERVAL 1 WEEK)
GROUP BY `record`
ORDER BY `count` DESC
You can also add LIMIT 1000 to the grouped result set if you need to.
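To check the query against the sample data from the question, here is a small sketch. The table name record_counts, the column types, and the alias total are my assumptions, and the date 2013-01-27 is hard-coded in place of CURDATE() so the result is reproducible.

```sql
-- Hypothetical table mirroring the question's sample rows.
CREATE TABLE record_counts (
  `record` VARCHAR(10) NOT NULL,
  `count`  INT         NOT NULL,
  `date`   DATE        NOT NULL
);

INSERT INTO record_counts (`record`, `count`, `date`) VALUES
  ('A', 4, '2013-01-27'),
  ('B', 7, '2013-01-27'),
  ('B', 3, '2013-01-24'),
  ('C', 8, '2013-01-22'),
  ('A', 2, '2013-01-19');

-- '2013-01-27' stands in for CURDATE(), matching the question's "today".
SELECT `record`, SUM(`count`) AS total
FROM record_counts
WHERE `date` > DATE_SUB('2013-01-27', INTERVAL 1 WEEK)
GROUP BY `record`
ORDER BY total DESC;
-- B 10
-- C 8
-- A 4
```

The row for A on 19/01/2013 falls outside the one-week window, which is why A ends up with 4 rather than 6.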
Using GROUP BY will allow you to group related records together:
SELECT `record`
, SUM(`count`) AS `count`
FROM `table`
WHERE `date` > CURDATE() - INTERVAL 1 WEEK
GROUP BY `record`
ORDER BY `count` DESC
LIMIT 1000
All I want is to count entries by date (i.e. entries with the same date).
My table is
You can see 5th and 6th entry have same date.
I think the real problem is that entries with the same date have different times, so I am not getting what I want.
I am using this SQL:
SELECT COUNT( created_at ) AS entries, created_at
FROM wp_frm_items
WHERE user_id =1
GROUP BY created_at
LIMIT 0 , 30
What I am getting is this.
I want entries as 2 for date 2012-02-22
The reason you get what you get is that created_at also contains the time, down to the second, so only entries created in the same second are grouped together.
To achieve what you actually want, you need to apply a date function to the created_at column:
SELECT COUNT(1) AS entries, DATE(created_at) as date
FROM wp_frm_items
WHERE user_id =1
GROUP BY DATE(created_at)
LIMIT 0 , 30
This would remove the time part from the column field, and so group together any entries created on the same day. You could take this further by removing the day part to group entries created on the same month of the same year etc.
To restrict the query to entries created in the current month, you add a WHERE-clause to the query to only select entries that satisfy that condition. Here's an example:
SELECT COUNT(1) AS entries, DATE(created_at) as date
FROM wp_frm_items
WHERE user_id = 1
AND created_at >= DATE_FORMAT(CURDATE(),'%Y-%m-01')
GROUP BY DATE(created_at)
Note: The COUNT(1) part of the query simply means "count each row", and you could just as well have written COUNT(*). COUNT(id) gives the same result as long as id is never NULL, because COUNT(column) only counts rows where that column is not NULL. Historically, counting the primary key was considered the most efficient approach, since it is always available in whatever index the query engine could utilize, whereas COUNT(*) was said to leave the index and retrieve the corresponding row from the table, which was sometimes inefficient. In more modern query planners this is probably no longer the case. COUNT(1) is another variant that didn't force the query planner to retrieve the rows from the table.
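One practical difference between the COUNT variants is worth seeing once: COUNT(*) and COUNT(1) count rows, while COUNT(column) skips NULLs. A minimal sketch with a made-up inline table (the aliases t and x are only for illustration):

```sql
SELECT COUNT(*) AS all_rows,
       COUNT(1) AS all_rows_too,
       COUNT(x) AS non_null_x
FROM (
    SELECT 1 AS x
    UNION ALL SELECT NULL
    UNION ALL SELECT 3
) AS t;
-- all_rows = 3, all_rows_too = 3, non_null_x = 2
```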
Edit: The query to group by month can be created in a number of different ways. Here is an example:
SELECT COUNT(1) AS entries, DATE_FORMAT(created_at,'%Y-%c') as month
FROM wp_frm_items
WHERE user_id =1
GROUP BY DATE_FORMAT(created_at,'%Y-%c')
You must eliminate the time part in the GROUP BY:
SELECT COUNT(*) AS entries, DATE(created_at) AS created_date
FROM wp_frm_items
WHERE user_id =1
GROUP BY DATE(created_at)
LIMIT 0 , 30
Oops, misread it.
Use GROUP BY DATE(created_at)
Try:
SELECT COUNT( created_at ) AS entries, DATE(created_at) AS created_date
FROM wp_frm_items
WHERE user_id =1
GROUP BY DATE(created_at)
LIMIT 0 , 30
I have a table that stores actions for rate-limiting purposes. What I want to do is fetch the newest row that has a 'key_action' (the action that starts the time for rate-limiting) and then find all entries after that date.
The only way I can currently think to do it is with two queries:
SELECT created_at FROM actions WHERE key_action=1 ORDER BY created_at DESC LIMIT 1
SELECT * FROM actions WHERE created_at >= (created_at from query 1)
Is there a way to combine these two queries into one?
You can make query 1 a subquery of query 2.
SELECT *
FROM actions
WHERE created_at >= (SELECT MAX(created_at)
FROM actions
WHERE key_action=1)
I'd have thought @Joe Stefanelli's answer was right, but MySQL does not allow LIMIT in some subqueries (for example, inside IN). Based on this workaround, I put together this query, which moves the LIMIT into a derived table (not tested):
SELECT actions.*
FROM actions
JOIN (SELECT created_at FROM actions WHERE key_action=1 ORDER BY created_at DESC LIMIT 1) AS createdActions
ON actions.created_at >= createdActions.created_at