MySQL order by count for last hour recursive

I have a SQL question. First, I'd like to know whether this is even possible with just SQL, and if not, whether anyone knows a good workaround.
We are building a site, where users can vote for videos.
The users can vote by SMS or directly on site after Facebook authentication.
We have to make a top list of all videos, and calculate the "position" on the list for each video.
So far, we have done that with a simple subquery, something like this:
SELECT v.video_id AS id,
(SELECT (COUNT(*)+1) FROM videos AS v2
WHERE (v2.SMS_votes + v2.facebook_votes) > (v.SMS_votes + v.facebook_votes)) AS total_position
FROM videos AS v
SMS_votes and facebook_votes are aggregated fields. There are separate tables for each kind of vote, with a record for each vote that includes the time the vote was cast.
This works fine: the positions are calculated, and if 2 or more videos have the same number of votes, they "share" the position.
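The correlated subquery gives each video a position of 1 plus the number of videos with a strictly larger total. That is why tied videos share a position (1, 2, 2, 4, ...). Here is a minimal Python sketch of the same rule. The video ids and vote totals are hypothetical:

```python
def competition_rank(totals):
    """Mirror of the correlated subquery: position = 1 + number of
    videos whose total (SMS + Facebook) is strictly greater."""
    return {
        vid: 1 + sum(1 for other in totals.values() if other > total)
        for vid, total in totals.items()
    }

# Hypothetical totals: "b" and "c" are tied, so both get position 2
# and the next video gets position 4.
totals = {"a": 10, "b": 7, "c": 7, "d": 3}
print(competition_rank(totals))  # {'a': 1, 'b': 2, 'c': 2, 'd': 4}
```

Like the correlated subquery, this compares every video with every other video, so the work grows quadratically with the number of videos.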
Unfortunately, positions cannot be shared, so ties have to be resolved by the following rules:
if 2 videos have the same number of votes, the one with more SMS votes has the advantage
if they also have the same number of SMS votes, the one which has more SMS votes in the last hour has the advantage
if they also have the same number of SMS votes in the last hour, they are compared on the hour before that, and so on recursively, until there is a difference between the two
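These tie-break rules amount to comparing two sequences of values element by element. The sketch below does this in application code under some stated assumptions. Each video is a hypothetical dict with an "id", a "facebook_votes" count and a list of SMS vote timestamps ("sms_times"). The look-back is capped at max_hours, because an unbounded recursion cannot be turned into a fixed sort key:

```python
from datetime import datetime, timedelta

def hourly_sms_counts(sms_times, now, max_hours):
    """SMS votes per one-hour bucket, counting back from `now`:
    index 0 = the last hour, index 1 = the hour before, and so on."""
    counts = [0] * max_hours
    for t in sms_times:
        age_seconds = (now - t).total_seconds()
        if age_seconds < 0:
            continue  # ignore timestamps after `now`
        bucket = int(age_seconds // 3600)
        if bucket < max_hours:  # older votes fall outside the look-back limit
            counts[bucket] += 1
    return counts

def ranking_key(video, now, max_hours):
    """Total votes, then SMS votes, then SMS votes hour by hour (newest first).
    Tuples compare element by element, so a later element only matters
    when every earlier one is tied: the "recursive" rule, bounded by max_hours."""
    sms_total = len(video["sms_times"])
    total = sms_total + video["facebook_votes"]
    return (total, sms_total, *hourly_sms_counts(video["sms_times"], now, max_hours))

def rank_videos(videos, now, max_hours=24):
    """Video ids ordered best first; ties that survive every key keep input order."""
    ordered = sorted(videos, key=lambda v: ranking_key(v, now, max_hours), reverse=True)
    return [v["id"] for v in ordered]

now = datetime(2024, 1, 1, 12, 0)
videos = [
    {"id": "x", "facebook_votes": 1,
     "sms_times": [now - timedelta(minutes=30), now - timedelta(hours=2, minutes=30)]},
    {"id": "y", "facebook_votes": 1,
     "sms_times": [now - timedelta(minutes=15), now - timedelta(hours=1, minutes=30)]},
    {"id": "z", "facebook_votes": 2, "sms_times": [now - timedelta(minutes=1)]},
]
print(rank_videos(videos, now))  # ['y', 'x', 'z']
```

Python's sort is stable, so videos that are still tied after max_hours keep their input order. Guaranteeing unique positions would still need some final tie-break.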
Is it possible to do this kind of recursive ordering in SQL alone, or do we have to resolve it manually in code? All ideas are welcome. Just to note, performance is important here, because the top list is used all over the site.

I don't think it's feasible to perform this kind of ordering with a recursive calculation (which is potentially unbounded), but if you're willing to limit how far back in time you look, there are ways it could be done.
Here's one possibility.
SELECT v.video_id,
v.SMS_votes + v.facebook_votes AS total_votes,
v.SMS_votes,
COUNT(CASE WHEN s.time > NOW() - INTERVAL 1 HOUR THEN 1 END) AS h1,
COUNT(CASE WHEN s.time > NOW() - INTERVAL 2 HOUR THEN 1 END) AS h2,
COUNT(CASE WHEN s.time > NOW() - INTERVAL 3 HOUR THEN 1 END) AS h3
FROM videos AS v
LEFT JOIN SMS_votes AS s ON s.video_id = v.video_id -- LEFT JOIN keeps videos that have no SMS votes
GROUP BY v.video_id
ORDER BY total_votes DESC, v.SMS_votes DESC, h1 DESC, h2 DESC, h3 DESC;
This assumes you have a table called SMS_votes tracking each vote, with a video_id field and a time field.
For each video, it calculates the total votes, the SMS votes, the SMS votes in the past hour, the past two hours, and the past three hours. It then does an ORDER BY on all those values to get the correct position.
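To check the ordering logic without a MySQL server, here is the same query run through Python's built-in sqlite3 module. It is an adaptation, not the original. SQLite has no NOW() - INTERVAL n HOUR, so a fixed :now parameter and datetime(:now, '-n hours') stand in for it. The table contents are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE videos (video_id INTEGER PRIMARY KEY, SMS_votes INTEGER, facebook_votes INTEGER);
CREATE TABLE SMS_votes (video_id INTEGER, time TEXT);
-- videos 1 and 2 tie on total and SMS votes; video 3 has only Facebook votes
INSERT INTO videos VALUES (1, 2, 1), (2, 2, 1), (3, 0, 5);
INSERT INTO SMS_votes VALUES
  (1, '2024-01-01 11:30:00'), (1, '2024-01-01 09:30:00'),
  (2, '2024-01-01 11:45:00'), (2, '2024-01-01 10:30:00');
""")

query = """
SELECT v.video_id,
       v.SMS_votes + v.facebook_votes AS total_votes,
       v.SMS_votes,
       COUNT(CASE WHEN s.time > datetime(:now, '-1 hours') THEN 1 END) AS h1,
       COUNT(CASE WHEN s.time > datetime(:now, '-2 hours') THEN 1 END) AS h2,
       COUNT(CASE WHEN s.time > datetime(:now, '-3 hours') THEN 1 END) AS h3
FROM videos v
LEFT JOIN SMS_votes s ON s.video_id = v.video_id
GROUP BY v.video_id
ORDER BY total_votes DESC, v.SMS_votes DESC, h1 DESC, h2 DESC, h3 DESC
"""
rows = conn.execute(query, {"now": "2024-01-01 12:00:00"}).fetchall()
for row in rows:
    print(row)  # (video_id, total_votes, SMS_votes, h1, h2, h3)
```

Videos 1 and 2 tie on everything up to h1. Video 2 wins on h2 because its second SMS vote is more recent.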
It's fairly easy to extend this to include a wider range of hours, but you might also want to consider using an increasing time range as you go back in time. For example, you first look at votes in the past hour, then the past day, then the past week, etc. I suspect that would lower your chance of videos having the same votes without having to add as many extra calculations.
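The widening-window idea can be sketched the same way. The windows (last hour, last day, last week) and the function name are assumptions for illustration. Each count is cumulative from now, like h1/h2/h3 in the query. In SQL, each window would simply be one more COUNT(CASE WHEN ...) column:

```python
from datetime import datetime, timedelta

# Hypothetical look-back windows that widen as they go further back in time.
WINDOWS = (timedelta(hours=1), timedelta(days=1), timedelta(weeks=1))

def widening_window_key(total, sms_total, sms_times, now, windows=WINDOWS):
    """Sort key: total votes, SMS votes, then the number of SMS votes inside
    each window ending at `now` (windows are cumulative, like h1/h2/h3)."""
    window_counts = tuple(
        sum(1 for t in sms_times if now - w < t <= now) for w in windows
    )
    return (total, sms_total) + window_counts

now = datetime(2024, 1, 8, 12, 0)
a = widening_window_key(10, 2, [now - timedelta(minutes=30), now - timedelta(days=3)], now)
b = widening_window_key(10, 2, [now - timedelta(minutes=20), now - timedelta(hours=5)], now)
# Tied on the last hour, but b has more SMS votes in the last day, so b ranks higher.
print(a, b, b > a)
```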
SQL Fiddle example

Related

Calculate the number of regular users in mysql

Given the table shown in the picture, I want to calculate the number of users whose timestamps fall on more than one day. Basically, the problem is to calculate the number of regular visitors.
For example, the user adrian# has 3 timestamps: 2 of them on the same day and the other one 2 days later, so this user came back. By contrast, the user david# has only 2 timestamps (on the same day), which means this user didn't come back. Any ideas?
You can use the following query:
SELECT usuario_email
FROM users
GROUP BY usuario_email
HAVING COUNT(DISTINCT DATE(fecha)) > 1
The above selects users who visited your site on 2 or more different dates, so based on your sample data it selects only adrian#.
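A quick way to try this without MySQL is Python's sqlite3 module, whose DATE() function also strips the time part. The email addresses come from the question. The timestamps are made up to match its description. Wrapping the query in a derived table turns the list of users into the number of regular users that was asked for:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (usuario_email TEXT, fecha TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [
        ("adrian#", "2014-03-01 10:00:00"),
        ("adrian#", "2014-03-01 18:30:00"),
        ("adrian#", "2014-03-03 09:15:00"),
        ("david#", "2014-03-02 11:00:00"),
        ("david#", "2014-03-02 11:45:00"),
    ],
)

# Users seen on more than one distinct calendar day.
regular = conn.execute("""
    SELECT usuario_email FROM users
    GROUP BY usuario_email
    HAVING COUNT(DISTINCT DATE(fecha)) > 1
""").fetchall()

# The same query wrapped in a derived table gives the number of regular users.
(count,) = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT usuario_email FROM users
        GROUP BY usuario_email
        HAVING COUNT(DISTINCT DATE(fecha)) > 1
    ) AS regular_users
""").fetchone()
print(regular, count)
```

MySQL requires an alias on a derived table, which is why the sketch includes AS regular_users.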
Demo here

Grouping by time ignoring the Date portion

I have a table with a column called score and another called date_time.
For each 5-minute time increment, I am trying to find out how many rows I have that are above a certain score. I want to ignore the date portion completely and base this only on the time.
This is a bit like a stats program that displays your peak hours. The only difference is that I want more detail, in 5-minute time segments.
I am still fairly new at MySQL and Google seems to be my best companion.
What I have found so far is:
SELECT id, score, date_time, COUNT(id)
FROM data
WHERE score >= 500
GROUP BY TIME(date_time) DIV 300;
Would this work, or is there a better way to do this?
I don't think your query would work. You need to do a bit more work to round the time down to 5-minute intervals. Something like:
SELECT SEC_TO_TIME(FLOOR(TIME_TO_SEC(time(date_time))/300)*300) as time5, COUNT(id)
FROM data
WHERE score >= 500
GROUP BY SEC_TO_TIME(FLOOR(TIME_TO_SEC(time(date_time))/300)*300)
ORDER BY time5;
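Here is why the original GROUP BY TIME(date_time) DIV 300 doesn't give 5-minute slots. As far as I know, MySQL turns a TIME value used in arithmetic into the number HHMMSS (10:07:30 becomes 100730). Dividing that by 300 therefore groups by an arbitrary number rather than by seconds since midnight. The Python sketch below mirrors both expressions on hypothetical rows to show the difference. The HHMMSS conversion is the stated assumption:

```python
from collections import Counter
from datetime import datetime, time

def five_minute_bucket(dt):
    """Start of the 5-minute time-of-day slot containing dt, mirroring
    SEC_TO_TIME(FLOOR(TIME_TO_SEC(TIME(dt)) / 300) * 300)."""
    secs = dt.hour * 3600 + dt.minute * 60 + dt.second
    start = (secs // 300) * 300
    return time(start // 3600, (start % 3600) // 60, start % 60)

def time_as_number(dt):
    """A TIME value in numeric context (assumed to be HHMMSS)."""
    return dt.hour * 10000 + dt.minute * 100 + dt.second

# Hypothetical rows: (date_time, score). The date part is deliberately ignored.
rows = [
    (datetime(2014, 5, 1, 10, 7, 30), 520),
    (datetime(2014, 5, 2, 10, 9, 59), 610),
    (datetime(2014, 5, 2, 10, 12, 0), 300),
]
counts = Counter(five_minute_bucket(dt) for dt, score in rows if score >= 500)
print(counts)  # both qualifying rows land in the 10:05:00 slot

# The DIV 300 version puts the same two rows into different groups.
print([time_as_number(dt) // 300 for dt, _ in rows[:2]])  # [335, 336]
```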

Group by date from multiple columns?

First of all, sorry for the title, but I have no idea how to describe this:
I'm saving sessions in my table and I would like to get the count of sessions per hour to know how many sessions were active over the day. The sessions are specified by two timestamps: start and end.
Hopefully you can help me.
Here we go:
http://sqlfiddle.com/#!2/bfb62/2/0
While I'm still not sure how you'd like to compare the start and end dates, it looks like you could get your desired results using COUNT, YEAR, MONTH, DAY, and HOUR.
Possibly something similar to this:
SELECT COUNT(ID), YEAR(Start), HOUR(Start), DAY(Start), MONTH(Start)
FROM Sessions
GROUP BY YEAR(Start), HOUR(Start), DAY(Start), MONTH(Start)
And the SQL Fiddle.
What you want to do is rather hard in MySQL. You can, however, get an approximation without too much difficulty. The following counts up users who start and stop within one day:
select date(start), hour,
sum(case when hours.hour between hour(s.start) and hour(s.end) then 1 else 0
end) as GoodEstimate
from sessions s cross join
(select 0 as hour union all
select 1 union all
. . .
select 23
) hours
group by date(start), hour
When a session spans multiple days, the query is harder. Here is one approach, which assumes that at least one user starts a session during every hour:
select thehour, count(*)
from (select distinct cast(date(start) as datetime) + interval hour(start) hour as thehour
from sessions
) dh left outer join
sessions s
on s.start <= thehour + interval 1 hour and
s.end >= thehour
group by thehour
Note: these are untested so might have syntax errors.
OK, this is another problem where the index table comes to the rescue.
An index table is something that everyone should have in their toolkit, preferably in the master database. It is a table with a single indexed INT primary key column, id, containing sequential numbers from 0 to n, where n is big enough for whatever you need: 100,000 is good, 1,000,000 is better. You only need to create this table once, but once you do, you will find it has all kinds of applications.
For your problem, you need to consider each hour and, if I understand your problem correctly, count every session that started before the end of that hour and hadn't ended before the hour started.
Here is the SQL fiddle for the solution.
It uses a known sequential number from the index table (only 0 to 100 for this fiddle, which is just over 4 days of hours; you can see why you need a big n) to link with your data at the top and bottom of each hour.
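The fiddle isn't reproduced here, so the sketch below shows the index-table idea with Python's sqlite3 module. Table names, column names, the base time and the sample sessions are all assumptions. Each index value id becomes the hour base + id hours. A session counts for that hour if it started before the hour ended and hadn't ended before the hour started. In MySQL the hour expression would be written base + INTERVAL id HOUR instead of SQLite's datetime():

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE indextable (id INTEGER PRIMARY KEY);
-- "end" is a keyword in SQLite, so it is quoted
CREATE TABLE sessions (id INTEGER PRIMARY KEY, start TEXT, "end" TEXT);
INSERT INTO sessions (start, "end") VALUES
  ('2013-06-01 00:10:00', '2013-06-01 02:30:00'),
  ('2013-06-01 01:05:00', '2013-06-01 01:20:00'),
  ('2013-06-01 23:50:00', '2013-06-02 00:40:00');
""")
# Sequential numbers 0..100, as in the fiddle.
conn.executemany("INSERT INTO indextable (id) VALUES (?)", [(i,) for i in range(101)])

query = """
SELECT h.hour_start, COUNT(s.id) AS active_sessions
FROM (
    SELECT datetime(:base, '+' || id || ' hours') AS hour_start
    FROM indextable
    WHERE id < :n_hours
) h
LEFT JOIN sessions s
  ON s.start < datetime(h.hour_start, '+1 hours')  -- started before the hour ends
 AND s."end" >= h.hour_start                       -- not ended before the hour starts
GROUP BY h.hour_start
ORDER BY h.hour_start
"""
rows = conn.execute(query, {"base": "2013-06-01 00:00:00", "n_hours": 25}).fetchall()
for hour_start, active in rows:
    if active:
        print(hour_start, active)
```

The LEFT JOIN keeps hours with no active sessions (count 0). This avoids the assumption that some session starts in every hour.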

how to calculate the trend of a specific activity

I have a table in MySQL that contains posts/entries; these posts have a creation date and a category. What I want is the trend of those categories: for each category, what is the trend over the past hour? By trend, I mean the trend of posting.
Since you tagged your question data-warehouse, you should probably have two dimensions: a date dimension for the day, and a time dimension for the hour, minute and second components of a date. If you have those two pieces, you can simply run a query that joins your time dimension to your main fact table and groups by hour.
select pc.category, t.hour, count(*)
from post_details pc, -- since you said they were categorized
posts p,
time_of_day t
where p.time_of_day_id = t.time_of_day_id
and p.post_details_id = pc.post_details_id
group by pc.category, t.hour;
Even if you don't have everything dimensionalized, you still should be able to extract the hour of the date the entry was posted and do a group by.
select p.category, extract(hour from p.post_date) as post_hour, count(*)
from posts p
group by p.category, extract(hour from p.post_date);
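The same grouping in plain Python, applied to hypothetical posts, makes one property of the query visible. EXTRACT(HOUR ...) keeps only the hour of day, so posts from different days land in the same bucket unless the query also restricts post_date to the period of interest:

```python
from collections import Counter
from datetime import datetime

def posts_per_category_hour(posts):
    """Post counts keyed by (category, hour of day), the equivalent of
    GROUP BY category, EXTRACT(HOUR FROM post_date)."""
    return Counter((p["category"], p["post_date"].hour) for p in posts)

# Hypothetical posts; note the last one is on a different day.
posts = [
    {"category": "news", "post_date": datetime(2012, 3, 5, 9, 15)},
    {"category": "news", "post_date": datetime(2012, 3, 5, 9, 50)},
    {"category": "sports", "post_date": datetime(2012, 3, 5, 10, 5)},
    {"category": "news", "post_date": datetime(2012, 3, 6, 9, 30)},
]
print(posts_per_category_hour(posts))  # ('news', 9) counts all three 9 o'clock posts
```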

Tricky Rails3/mysql query

In Rails 3 (also with the meta_where gem, if you feel like using it in your query), I have a really tricky query that I have been banging my head against:
Suppose I have two models, customers and purchases, where a customer has many purchases. Let's define customers with at least 2 purchases as "repeat_customer". I need to find the total number of repeat_customers by each day for the past 3 months, something like:
Date TotalRepeatCustomerCount
1/1/11 10 (10 repeat customers by the end of 1/1/11)
1/2/11 15 (5 more customer gained "repeat" status on this date)
1/3/11 16 (1 more customer gained "repeat" status on this date)
...
3/30/11 150
3/31/11 160
Basically I need to group customer count based on the date of creation of their second purchase, since that is when they "gain repeat status".
Certainly this can be achieved in Ruby, with something like:
Customer.includes(:purchases).all.select{|x| x.purchases.size >= 2 }.group_by{|x| x.purchases.sort_by(&:created_at).second.created_at.to_date }.map{|date, customers| [date, customers.count]}
However, the above code will fire queries along the lines of Customer.all and Purchase.all, then do a bunch of calculation in Ruby. I would much prefer to do the selection, grouping and calculations in MySQL, since that is not only much faster but also reduces the bandwidth used from the database. In large databases, the code above is basically useless.
I have been trying for a while to conjure up the query in Rails/ActiveRecord, but have had no luck, even with the nice meta_where gem. If I have to, I will accept a pure MySQL query as a solution as well.
Edit: I would cache it (or add a "repeat" field to customers), but only for this simplified problem. The criteria for a repeat customer can be changed by the client at any point (2 purchases, 3 purchases, 4 purchases, etc.), so unfortunately I do have to calculate it on the spot.
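Because the threshold can change at any time, it helps to state the rule explicitly. A customer gains repeat status on the date of their threshold-th purchase, and the report is a running total of those dates. Here is a Python sketch of that logic. It assumes purchases arrive as hypothetical (customer_id, purchase_date) pairs. Days on which nobody gains repeat status are simply omitted:

```python
from collections import Counter, defaultdict
from datetime import date
from itertools import accumulate

def repeat_customers_by_day(purchases, threshold=2):
    """purchases: iterable of (customer_id, purchase_date) pairs.
    Returns [(day, cumulative number of repeat customers), ...]."""
    by_customer = defaultdict(list)
    for customer_id, day in purchases:
        by_customer[customer_id].append(day)
    gained = Counter()  # day -> customers reaching `threshold` purchases that day
    for days in by_customer.values():
        if len(days) >= threshold:
            gained[sorted(days)[threshold - 1]] += 1
    days_sorted = sorted(gained)
    totals = accumulate(gained[d] for d in days_sorted)  # running total
    return list(zip(days_sorted, totals))

purchases = [
    (1, date(2011, 1, 1)), (1, date(2011, 1, 1)),
    (2, date(2011, 1, 1)), (2, date(2011, 1, 2)),
    (3, date(2011, 1, 2)), (3, date(2011, 1, 3)),
    (4, date(2011, 1, 3)),  # only one purchase: never a repeat customer
]
print(repeat_customers_by_day(purchases))
print(repeat_customers_by_day(purchases, threshold=3))  # []
```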
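SELECT p_date, COUNT(a.customer_id) FROM
(
SELECT p_dates.p_date - INTERVAL 1 DAY AS p_date, customers.id AS customer_id
FROM customers
-- assumes the Rails default foreign key purchases.customer_id; a NATURAL JOIN
-- would also match on shared columns such as id, created_at and updated_at
JOIN purchases ON purchases.customer_id = customers.id
JOIN (SELECT DISTINCT DATE(purchase_date) AS p_date FROM purchases) p_dates
ON purchases.purchase_date < p_dates.p_date
GROUP BY p_dates.p_date, customers.id
HAVING COUNT(purchases.id) >= 2
) a
GROUP BY p_date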
I didn't test this in the slightest, so I hope it works. Also, I hope I understood what you are trying to accomplish.
But please note that you should not do this: it'll be too slow. Since the data never changes once the day has passed, just cache it for each day.