How can I optimize MySQL's ORDER BY RAND() function? - mysql

I'd like to optimize my queries, so I'm looking into mysql-slow.log.
Most of my slow queries contain ORDER BY RAND(). I cannot find a real solution to this problem. There is a possible solution at MySQLPerformanceBlog, but I don't think it is enough. On poorly optimized (or frequently updated, user-managed) tables it doesn't work, or I need to run two or more queries before I can select my PHP-generated random row.
Is there any solution for this issue?
A dummy example:
SELECT accomodation.ac_id,
accomodation.ac_status,
accomodation.ac_name,
accomodation.ac_status,
accomodation.ac_images
FROM accomodation, accomodation_category
WHERE accomodation.ac_status != 'draft'
AND accomodation.ac_category = accomodation_category.acat_id
AND accomodation_category.acat_slug != 'vendeglatohely'
AND ac_images != 'b:0;'
ORDER BY
RAND()
LIMIT 1

Try this:
SELECT *
FROM (
SELECT @cnt := COUNT(*) + 1,
@lim := 10
FROM t_random
) vars
STRAIGHT_JOIN
(
SELECT r.*,
@lim := @lim - 1
FROM t_random r
WHERE (@cnt := @cnt - 1)
AND RAND(20090301) < @lim / @cnt
) i
This is especially efficient on MyISAM (since the COUNT(*) is instant), but even in InnoDB it's 10 times more efficient than ORDER BY RAND().
The main idea here is that we don't sort, but instead keep two variables and calculate the running probability of a row to be selected on the current step.
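For example, with 1,000 rows and a limit of 10, the first row is kept with probability 10/1000; if it is kept, the next row is checked against 9/999, otherwise against 10/999, and so on, so exactly 10 rows come out without any sorting.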
See this article in my blog for more detail:
Selecting random rows
Update:
If you need to select but a single random record, try this:
SELECT aco.*
FROM (
SELECT minid + FLOOR((maxid - minid) * RAND()) AS randid
FROM (
SELECT MAX(ac_id) AS maxid, MIN(ac_id) AS minid
FROM accomodation
) q
) q2
JOIN accomodation aco
ON aco.ac_id =
COALESCE
(
(
SELECT accomodation.ac_id
FROM accomodation
WHERE ac_id > randid
AND ac_status != 'draft'
AND ac_images != 'b:0;'
AND NOT EXISTS
(
SELECT NULL
FROM accomodation_category
WHERE acat_id = ac_category
AND acat_slug = 'vendeglatohely'
)
ORDER BY
ac_id
LIMIT 1
),
(
SELECT accomodation.ac_id
FROM accomodation
WHERE ac_status != 'draft'
AND ac_images != 'b:0;'
AND NOT EXISTS
(
SELECT NULL
FROM accomodation_category
WHERE acat_id = ac_category
AND acat_slug = 'vendeglatohely'
)
ORDER BY
ac_id
LIMIT 1
)
)
This assumes your ac_id's are distributed more or less evenly.

It depends on how random you need to be. The solution you linked works pretty well IMO. Unless you have large gaps in the ID field, it's still pretty random.
However, you should be able to do it in one query using this (for selecting a single value):
SELECT [fields] FROM [table] WHERE id >= (SELECT FLOOR(RAND() * MAX(id)) FROM [table]) ORDER BY id LIMIT 1
Other solutions:
Add a permanent FLOAT column (e.g. rnd) to the table and fill it with random numbers. You can then generate a random number in PHP and do "SELECT ... WHERE rnd > $random" (see the sketch after this list).
Grab the entire list of IDs and cache them in a text file. Read the file and pick a random ID from it.
Cache the results of the query as HTML and keep it for a few hours.
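A minimal sketch of the random-column option from the list above (column and index names are illustrative, not from the original answer):
ALTER TABLE accomodation ADD COLUMN rnd FLOAT NOT NULL DEFAULT 0;
UPDATE accomodation SET rnd = RAND();
CREATE INDEX idx_accomodation_rnd ON accomodation (rnd);

-- At query time, substitute a PHP-generated value (e.g. mt_rand() / mt_getrandmax()) for :random
SELECT ac_id, ac_name
FROM accomodation
WHERE rnd > :random
ORDER BY rnd
LIMIT 1;
If the generated number lands near 1 and too few rows remain, repeat the query against the smallest rnd values instead.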

Here's how I'd do it:
SET @r := (SELECT ROUND(RAND() * (SELECT COUNT(*)
FROM accomodation a
JOIN accomodation_category c
ON (a.ac_category = c.acat_id)
WHERE a.ac_status != 'draft'
AND c.acat_slug != 'vendeglatohely'
AND a.ac_images != 'b:0;')));
SET @sql := CONCAT('
SELECT a.ac_id,
a.ac_status,
a.ac_name,
a.ac_status,
a.ac_images
FROM accomodation a
JOIN accomodation_category c
ON (a.ac_category = c.acat_id)
WHERE a.ac_status != ''draft''
AND c.acat_slug != ''vendeglatohely''
AND a.ac_images != ''b:0;''
LIMIT ', @r, ', 1');
PREPARE stmt1 FROM @sql;
EXECUTE stmt1;

(Yeah, I will get dinged for not having enough meat here, but can't you be a vegan for one day?)
Case: Consecutive AUTO_INCREMENT without gaps, 1 row returned
Case: Consecutive AUTO_INCREMENT without gaps, 10 rows
Case: AUTO_INCREMENT with gaps, 1 row returned
Case: Extra FLOAT column for randomizing
Case: UUID or MD5 column
Those 5 cases can be made very efficient for large tables. See my blog for the details.

This gives you a single subquery that uses the index to pick a random id; the outer query then fetches the joined row.
SELECT accomodation.ac_id,
accomodation.ac_status,
accomodation.ac_name,
accomodation.ac_status,
accomodation.ac_images
FROM accomodation, accomodation_category
WHERE accomodation.ac_status != 'draft'
AND accomodation.ac_category = accomodation_category.acat_id
AND accomodation_category.acat_slug != 'vendeglatohely'
AND ac_images != 'b:0;'
AND accomodation.ac_id IN (
SELECT accomodation.ac_id FROM accomodation ORDER BY RAND() LIMIT 1
)

The solution for your dummy-example would be:
SELECT accomodation.ac_id,
accomodation.ac_status,
accomodation.ac_name,
accomodation.ac_status,
accomodation.ac_images
FROM accomodation
JOIN
accomodation_category
ON accomodation.ac_category = accomodation_category.acat_id
JOIN
(
SELECT CEIL(RAND()*(SELECT MAX(ac_id) FROM accomodation)) AS ac_id
) AS Choices
WHERE accomodation.ac_id >= Choices.ac_id
AND accomodation.ac_status != 'draft'
AND accomodation_category.acat_slug != 'vendeglatohely'
AND ac_images != 'b:0;'
LIMIT 1
To read more about alternatives to ORDER BY RAND(), you should read this article.

I am optimizing a lot of existing queries in my project. Quassnoi's solution has helped me speed up the queries a lot! However, I find it hard to incorporate the said solution in all queries, especially for complicated queries involving many subqueries on multiple large tables.
So I am using a less optimized solution. Fundamentally it works the same way as Quassnoi's solution.
SELECT accomodation.ac_id,
accomodation.ac_status,
accomodation.ac_name,
accomodation.ac_status,
accomodation.ac_images
FROM accomodation, accomodation_category
WHERE accomodation.ac_status != 'draft'
AND accomodation.ac_category = accomodation_category.acat_id
AND accomodation_category.acat_slug != 'vendeglatohely'
AND ac_images != 'b:0;'
AND rand() <= $size * $factor / [accomodation_table_row_count]
LIMIT $size
$size * $factor / [accomodation_table_row_count] works out the probability of picking a random row. RAND() generates a random number, and the row is selected if RAND() is smaller than or equal to that probability. This effectively performs a random selection to limit the table size. Since there is a chance it will return fewer rows than the defined limit, we need to increase the probability to ensure we select enough rows; hence we multiply $size by a $factor (I usually set $factor = 2, which works in most cases). Finally we apply LIMIT $size.
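For instance (the numbers are only illustrative): with 50,000 rows in accomodation, $size = 20 and $factor = 2, each row is kept with probability 20 * 2 / 50000 = 0.0008, so on average about 40 rows survive the RAND() filter and the LIMIT 20 then trims them down.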
The problem now is working out the accomodation_table_row_count.
If we know the table size, we COULD hard-code it. That would run the fastest, but obviously it is not ideal. If you are using MyISAM, getting the table count is very efficient. Since I am using InnoDB, I am just doing a simple count + selection. In your case, it would look like this:
SELECT accomodation.ac_id,
accomodation.ac_status,
accomodation.ac_name,
accomodation.ac_status,
accomodation.ac_images
FROM accomodation, accomodation_category
WHERE accomodation.ac_status != 'draft'
AND accomodation.ac_category = accomodation_category.acat_id
AND accomodation_category.acat_slug != 'vendeglatohely'
AND ac_images != 'b:0;'
AND rand() <= $size * $factor / (select (SELECT count(*) FROM `accomodation`) * (SELECT count(*) FROM `accomodation_category`))
LIMIT $size
The tricky part is working out the right probability. As you can see, the expression (select (SELECT count(*) FROM accomodation) * (SELECT count(*) FROM accomodation_category)) only calculates a rough size for the joined result (in fact, far too rough), but you can refine this logic to give a closer approximation. Note that it is better to OVER-select than to under-select rows; i.e. if the probability is set too low, you risk not selecting enough rows.
This solution runs slower than Quassnoi's solution since we need to recalculate the table size. However, I find the code a lot more manageable. This is a trade-off between accuracy and performance versus coding complexity. Having said that, on large tables this is still by far faster than ORDER BY RAND().
Note: If the query logic permits, perform the random selection as early as possible before any join operations.
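For example, a sketch of pushing the random filter into a derived table before the join (using the same $size and $factor placeholders as above; illustrative, not a drop-in replacement):
SELECT a.ac_id, a.ac_status, a.ac_name, a.ac_images
FROM (
    SELECT *
    FROM accomodation
    WHERE ac_status != 'draft'
      AND ac_images != 'b:0;'
      AND RAND() <= $size * $factor / (SELECT COUNT(*) FROM accomodation)
    LIMIT $size
) a
JOIN accomodation_category c ON a.ac_category = c.acat_id
WHERE c.acat_slug != 'vendeglatohely'
LIMIT $size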

My recommendation is to add a column with a UUID (version 4) or other random value, with a unique index (or just the primary key).
Then you can simply generate a random value at query time and select rows greater than the generated value, ordering by the random column.
Make sure that if you receive fewer than the expected number of rows, you repeat the query without the greater-than clause (to select rows at the "beginning" of the result set).
uuid = generateUUIDV4()
select * from foo
where uuid > :uuid
order by uuid
limit 42
if count(results) < 42 {
select * from foo
order by uuid
limit :remainingResultsRequired
}
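A possible one-time setup for the random column itself (a sketch; note that MySQL's built-in UUID() produces version-1 values, so a version-4 UUID would typically be generated by the application, and the names below are illustrative):
ALTER TABLE foo ADD COLUMN uuid CHAR(36) NULL;
UPDATE foo SET uuid = UUID();  -- or backfill with application-generated version-4 UUIDs
ALTER TABLE foo MODIFY uuid CHAR(36) NOT NULL, ADD UNIQUE INDEX idx_foo_uuid (uuid);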

function getRandomRow($db){
    // $db is assumed to be a PDO connection; NUM_OF_ROWS_OR_CLOSE_TO_IT is a rough row-count constant
    $id = rand(0, NUM_OF_ROWS_OR_CLOSE_TO_IT);
    $res = getRowById($db, $id);
    if (!empty($res))
        return $res;
    return getRandomRow($db); // keep guessing until an existing id is hit
}
// rowid is a key on the table
function getRowById($db, $rowid = false){
    $stmt = $db->prepare("SELECT * FROM `table` WHERE rowid = ? LIMIT 1");
    $stmt->execute([$rowid]);
    return $stmt->fetch(PDO::FETCH_ASSOC);
}

Related

Optimization of relatively basic JOIN and GROUP BY query

I have a relatively basic query that fetches the most recent messages per conversation:
SELECT `message`.`conversation_id`, MAX(`message`.`add_time`) AS `max_add_time`
FROM `message`
LEFT JOIN `conversation` ON `message`.`conversation_id` = `conversation`.`id`
WHERE ((`conversation`.`receiver_user_id` = 1 AND `conversation`.`status` != -2)
OR (`conversation`.`sender_user_id` = 1 AND `conversation`.`status` != -1))
GROUP BY `conversation_id`
ORDER BY `max_add_time` DESC
LIMIT 12
The message table contains more than 911,000 records and the conversation table around 680,000. The execution time for this query varies between 4 and 10 seconds, depending on the load on the server, which is far too long.
Below is a screenshot of the EXPLAIN result:
The cause is apparently the MAX and/or the GROUP BY, because the following similar query only takes 10ms:
SELECT COUNT(*)
FROM `message`
LEFT JOIN `conversation` ON `message`.`conversation_id` = `conversation`.`id`
WHERE (`message`.`status`=0)
AND (`message`.`user_id` <> 1)
AND ((`conversation`.`sender_user_id` = 1 OR `conversation`.`receiver_user_id` = 1))
The corresponding EXPLAIN result:
I have tried adding different indices to both tables without any improvement, for example conv_msg_idx(add_time, conversation_id) on message, which seems to be used according to the first EXPLAIN result; however, the query still takes around 10 seconds to execute.
Any help improving the indices or query to get the execution time down would be greatly appreciated.
EDIT:
I have changed the query to use an INNER JOIN:
SELECT `message`.`conversation_id`, MAX(`message`.`add_time`) AS `max_add_time`
FROM `message`
INNER JOIN `conversation` ON `message`.`conversation_id` = `conversation`.`id`
WHERE ((`conversation`.`receiver_user_id` = 1 AND `conversation`.`status` != -2)
OR (`conversation`.`sender_user_id` = 1 AND `conversation`.`status` != -1))
GROUP BY `conversation_id`
ORDER BY `max_add_time` DESC
LIMIT 12
But the execution time is still ~ 6 seconds.
You should create a multiple-column index on the columns that are in your WHERE clause and that you want to SELECT (except conversation_id). (reference)
conversation_id should be indexed in both tables.
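For illustration only (index names and column order are assumptions based on the WHERE and GROUP BY above), composite indexes along these lines could be tried:
ALTER TABLE conversation
    ADD INDEX idx_conv_receiver (receiver_user_id, status),
    ADD INDEX idx_conv_sender   (sender_user_id, status);
ALTER TABLE message
    ADD INDEX idx_msg_conv_time (conversation_id, add_time);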
Try to avoid 'OR' in SQL queries; it makes fetching slow. Instead, use UNION or another approach.
SELECT message.conversation_id, MAX(message.add_time) AS max_add_time
FROM message
INNER JOIN conversation ON message.conversation_id = conversation.id
WHERE (conversation.sender_user_id = 1 AND conversation.status != -1)
GROUP BY conversation_id
UNION
SELECT message.conversation_id, MAX(message.add_time) AS max_add_time
FROM message
INNER JOIN conversation ON message.conversation_id = conversation.id
WHERE (conversation.receiver_user_id = 1 AND conversation.status != -2)
GROUP BY conversation_id
ORDER BY max_add_time DESC LIMIT 12
Instead of depending on the single table message, have two tables: one for message, as you have, plus another table, thread, that keeps the status of each thread of messages.
Yes, that requires a little more work when adding a new message -- update a column or two in thread.
But it eliminates the GROUP BY and MAX that are causing grief in this query.
While doing this split, see if some other columns would be better off in the new table.
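A minimal sketch of that idea (the thread table layout and the upsert below are assumptions, not taken from the answer):
CREATE TABLE thread (
    conversation_id INT PRIMARY KEY,
    last_add_time   DATETIME NOT NULL,
    status          TINYINT NOT NULL DEFAULT 0,
    INDEX (last_add_time)
);

-- Maintained whenever a message is inserted:
INSERT INTO thread (conversation_id, last_add_time)
VALUES (?, NOW())
ON DUPLICATE KEY UPDATE last_add_time = VALUES(last_add_time);

-- The feed query then needs no GROUP BY or MAX():
SELECT conversation_id, last_add_time
FROM thread
ORDER BY last_add_time DESC
LIMIT 12;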
SELECT `message`.`conversation_id`, MAX(`message`.`add_time`) AS `max_add_time`
FROM `message`
INNER JOIN `conversation` ON `message`.`conversation_id` = `conversation`.`id`
WHERE ((`conversation`.`receiver_user_id` = 1 AND `conversation`.`status` != -2)
OR (`conversation`.`sender_user_id` = 1 AND `conversation`.`status` != -1))
GROUP BY `conversation_id`
ORDER BY `max_add_time` DESC
LIMIT 12
You can try an INNER JOIN, if it does not affect your logic.
You can also modify this query to avoid MAX() by using ROW_NUMBER() (available in MySQL 8.0 and later):
SELECT * FROM (
    SELECT `message`.*,
           ROW_NUMBER() OVER (PARTITION BY conversation_id ORDER BY add_time DESC) AS p1
    FROM `message`
) t1
WHERE t1.p1 = 1

Slow Execution of MySQL Select Query

I have the following query…
SELECT DISTINCT * FROM
vPAS_Posts_Users
WHERE (post_user_id =:id AND post_type != 4)
AND post_updated >:updated
GROUP BY post_post_id
UNION
SELECT DISTINCT vPAS_Posts_Users.* FROM PAS_Follow
JOIN vPAS_Posts_Users ON
( PAS_Follow.folw_followed_user_id = vPAS_Posts_Users.post_user_id )
WHERE (( PAS_Follow.folw_follower_user_id =:id AND PAS_Follow.folw_deleted = 0 )
OR ( post_type = 4 AND post_passed_on_by = PAS_Follow.folw_follower_user_id
AND post_user_id !=:id ))
AND post_updated >:updated
GROUP BY post_post_id ORDER BY post_posted_date DESC LIMIT :limit
Where :id = 7, :updated = 0.0 and :limit=40 for example
My issue is that the query is taking about a minute to return results. Is there anything in this query that I can do to speed up the result?
I am using RDS
********EDIT*********
I was asked to run the query with EXPLAIN; the result is below
********EDIT**********
View Definition
CREATE ALGORITHM=UNDEFINED DEFINER=`MySQLUSer`@`%` SQL SECURITY DEFINER VIEW `vPAS_Posts_Users`
AS SELECT
`PAS_User`.`user_user_id` AS `user_user_id`,
`PAS_User`.`user_country` AS `user_country`,
`PAS_User`.`user_city` AS `user_city`,
`PAS_User`.`user_company` AS `user_company`,
`PAS_User`.`user_account_type` AS `user_account_type`,
`PAS_User`.`user_account_premium` AS `user_account_premium`,
`PAS_User`.`user_sign_up_date` AS `user_sign_up_date`,
`PAS_User`.`user_first_name` AS `user_first_name`,
`PAS_User`.`user_last_name` AS `user_last_name`,
`PAS_User`.`user_avatar_url` AS `user_avatar_url`,
`PAS_User`.`user_cover_image_url` AS `user_cover_image_url`,
`PAS_User`.`user_bio` AS `user_bio`,
`PAS_User`.`user_telephone` AS `user_telephone`,
`PAS_User`.`user_dob` AS `user_dob`,
`PAS_User`.`user_sector` AS `user_sector`,
`PAS_User`.`user_job_type` AS `user_job_type`,
`PAS_User`.`user_unique` AS `user_unique`,
`PAS_User`.`user_deleted` AS `user_deleted`,
`PAS_User`.`user_updated` AS `user_updated`,
`PAS_Post`.`post_post_id` AS `post_post_id`,
`PAS_Post`.`post_language_id` AS `post_language_id`,
`PAS_Post`.`post_type` AS `post_type`,
`PAS_Post`.`post_promoted` AS `post_promoted`,
`PAS_Post`.`post_user_id` AS `post_user_id`,
`PAS_Post`.`post_posted_date` AS `post_posted_date`,
`PAS_Post`.`post_latitude` AS `post_latitude`,
`PAS_Post`.`post_longitude` AS `post_longitude`,
`PAS_Post`.`post_location_name` AS `post_location_name`,
`PAS_Post`.`post_text` AS `post_text`,
`PAS_Post`.`post_media_url` AS `post_media_url`,
`PAS_Post`.`post_image_height` AS `post_image_height`,
`PAS_Post`.`post_link` AS `post_link`,
`PAS_Post`.`post_link_title` AS `post_link_title`,
`PAS_Post`.`post_unique` AS `post_unique`,
`PAS_Post`.`post_deleted` AS `post_deleted`,
`PAS_Post`.`post_updated` AS `post_updated`,
`PAS_Post`.`post_original_post_id` AS `post_original_post_id`,
`PAS_Post`.`post_original_type` AS `post_original_type`,
`PAS_Post`.`post_passed_on_by` AS `post_passed_on_by`,
`PAS_Post`.`post_passed_on_caption` AS `post_passed_on_caption`,
`PAS_Post`.`post_passed_on_fullname` AS `post_passed_on_fullname`,
`PAS_Post`.`post_passed_on_avatar_url` AS `post_passed_on_avatar_url`
FROM (`PAS_User` join `PAS_Post` on((`PAS_User`.`user_user_id` = `PAS_Post`.`post_user_id`)));
try this query:
SELECT *
FROM
vPAS_Posts_Users
WHERE
post_user_id =:id
AND post_type != 4
AND post_updated > :updated
UNION
SELECT u.*
FROM vPAS_Posts_Users u
JOIN PAS_Follow f ON f.folw_followed_user_id = u.post_user_id
WHERE
u.post_updated > :updated
AND ( (f.folw_follower_user_id = :id AND f.folw_deleted = 0)
OR (u.post_type = 4 AND u.post_passed_on_by = f.folw_follower_user_id AND u.post_user_id != :id)
)
ORDER BY post_posted_date DESC
LIMIT :limit;
Other improvements
Indices:
Be sure you have indices on the following columns:
PAS_User.user_user_id
PAS_Post.post_user_id
PAS_Post.post_type
PAS_Post.post_updated
PAS_Follow.folw_followed_user_id
PAS_Follow.folw_deleted
PAS_Post.post_passed_on_by
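For example, the missing ones could be added roughly like this (index names are made up, and user_user_id is presumably already covered by the primary key of PAS_User):
ALTER TABLE PAS_Post
    ADD INDEX idx_post_user (post_user_id),
    ADD INDEX idx_post_type (post_type),
    ADD INDEX idx_post_updated (post_updated),
    ADD INDEX idx_post_passed_on_by (post_passed_on_by);
ALTER TABLE PAS_Follow
    ADD INDEX idx_folw_followed (folw_followed_user_id),
    ADD INDEX idx_folw_deleted (folw_deleted);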
After that is done, please (1) check the performance again (using SQL_NO_CACHE) and (2) extract another EXPLAIN plan so we can adjust the query.
EXPLAIN Results
Here are some suggestions for the query and the view. First of all, using UNION for the two result sets might be what makes your query slow; instead you can use UNION ALL.
Why am I suggesting UNION ALL?
The reason is that both UNION ALL and UNION use a temporary table for result generation. The difference in execution speed comes from the fact that UNION requires an internal temporary table with an index (to skip duplicate rows), while UNION ALL creates the table without such an index. This explains the slight performance improvement when using UNION ALL.
UNION on its own removes any duplicate records, so there is no need for the DISTINCT clause. Also, try to do only one GROUP BY over the whole result set produced by the subqueries; this minimizes execution time compared with grouping the results in each subquery.
Make sure you have added the right indexes, especially on the columns used in the WHERE, ORDER BY and GROUP BY clauses. The data types should also be appropriate for the nature of the data in each column; for example, post_posted_date should be DATETIME or DATE, with an index as well.
Here is the rough idea for the query
SELECT q.* FROM (
SELECT * FROM
vPAS_Posts_Users
WHERE (post_user_id =:id AND post_type != 4)
AND post_updated >:updated
UNION ALL
SELECT vPAS_Posts_Users.* FROM PAS_Follow
JOIN vPAS_Posts_Users ON
( PAS_Follow.folw_followed_user_id = vPAS_Posts_Users.post_user_id
AND vPAS_Posts_Users.post_updated >:updated)
WHERE (( PAS_Follow.folw_follower_user_id =:id AND PAS_Follow.folw_deleted = 0 )
OR ( post_type = 4 AND post_passed_on_by = PAS_Follow.folw_follower_user_id
AND post_user_id !=:id ))
) q
GROUP BY q.post_post_id ORDER BY q.post_posted_date DESC LIMIT :limit
References
Difference Between Union vs. Union All – Optimal Performance Comparison
Optimize Mysql Union
MySQL Performance Blog
From your EXPLAIN I can see that most of your tables don't have any keys except the primary one. I would suggest adding some extra keys on the columns you're going to join, for example on PAS_Follow.folw_followed_user_id and vPAS_Posts_Users.post_user_id; just this will result in a big performance boost.
Bye,
Gnagno

MySQL Sub-Queries Simplifying

I have two tables: Races and RacesTimes. I want to extract everything from Races and, from RacesTimes, only Finisher and Time, taking only the best RacesTimes.TotalTime (ordered ASC with LIMIT 1) for each RaceID (a column in RacesTimes).
So the result would be:
Races.*, RacesTimes.Finisher, RacesTimes.Time
This is what I made:
SELECT
Races.*,
(
SELECT
`TotalTime`
FROM
`RacesTimes`
WHERE
`RaceID` = Races.ID
ORDER BY
`TotalTime` ASC
LIMIT 1
) AS `BestTime`,
(
SELECT
`Time`
FROM
`RacesTimes`
WHERE
`RaceID` = Races.ID
ORDER BY
`TotalTime` ASC
LIMIT 1
) AS `BestTimeS`,
(
SELECT
`Finisher`
FROM
`RacesTimes`
WHERE
`RaceID` = Races.ID
ORDER BY
`TotalTime` ASC
LIMIT 1
) AS `BestFinisher`
FROM `Races`
It extracts everything correctly, but the query is way too long; can't it be simplified? I think the simplified version would use a LEFT JOIN or something like that, but I don't know how to write queries with JOINs.
The approach here is to aggregate RaceTimes by race. The trick is to get the finisher with the minimum time.
MySQL offers a solution for this, by using group_concat() and substring_index() in a clever way. group_concat() takes an order by argument, so it can order the results by the time. Then the best finisher is in the first position.
The SQL looks like this:
select r.*, rtr.mintt as TotalTime, rtr.Finisher
from Races r join
     (select RaceID, MIN(TotalTime) as mintt,
             substring_index(group_concat(Finisher order by TotalTime separator ','), ',', 1) as Finisher
      from RacesTimes rt
      group by RaceID
     ) rtr
     on rtr.RaceID = r.ID

How to optimize a query if a table contains 10000 entries using MySQL?

When I execute this query it takes a lot of execution time, because the user_fans table contains 10000 user entries. How can I optimize it?
Query
SELECT uf.`user_name`,uf.`user_id`,
@post := (SELECT COUNT(*) FROM post WHERE user_id = uf.`user_id`) AS post,
@post_comment_likes := (SELECT COUNT(*) FROM post_comment_likes WHERE user_id = uf.`user_id`) AS post_comment_likes,
@post_comments := (SELECT COUNT(*) FROM post_comments WHERE user_id = uf.`user_id`) AS post_comments,
@post_likes := (SELECT COUNT(*) FROM post_likes WHERE user_id = uf.`user_id`) AS post_likes,
(@post+@post_comments) AS `sum_post`,
(@post_likes+@post_comment_likes) AS `sum_like`,
((@post+@post_comments)*10) AS `post_cal`,
((@post_likes+@post_comment_likes)*5) AS `like_cal`,
((@post*10)+(@post_comments*10)+(@post_likes*5)+(@post_comment_likes*5)) AS `total`
FROM `user_fans` uf ORDER BY `total` DESC LIMIT 20
I would try to simplify this COMPLETELY by putting triggers on your other tables, and just adding a few columns to your User_Fans table... One for each respective count() you are trying to get... from Posts, PostLikes, PostComments, PostCommentLikes.
When a record is added to whichever table, just update your user_fans table to add 1 to the count... it will be virtually instantaneous based on the user's key ID anyhow. As for the "LIKES"... Similar, only under the condition that something is triggered as a "Like", add 1.. Then your query will be a direct math on the single record and not rely on ANY joins to compute a "weighted" total value. As your table gets even larger, the queries too will get longer as they have more data to pour through and aggregate. You are going through EVERY user_fan record which in essence is querying every record from all the other tables.
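A minimal sketch of that counter-column idea (column, trigger, and index names below are illustrative assumptions; only the post trigger is shown, and the other three tables would get equivalent triggers):
ALTER TABLE user_fans
    ADD COLUMN post_count INT NOT NULL DEFAULT 0,
    ADD COLUMN post_like_count INT NOT NULL DEFAULT 0,
    ADD COLUMN comment_count INT NOT NULL DEFAULT 0,
    ADD COLUMN comment_like_count INT NOT NULL DEFAULT 0;

DELIMITER //
CREATE TRIGGER trg_post_after_insert AFTER INSERT ON post
FOR EACH ROW
BEGIN
    UPDATE user_fans SET post_count = post_count + 1 WHERE user_id = NEW.user_id;
END//
DELIMITER ;

-- The ranking then becomes simple arithmetic over one table:
SELECT user_name, user_id,
       (post_count + comment_count) * 10 + (post_like_count + comment_like_count) * 5 AS total
FROM user_fans
ORDER BY total DESC
LIMIT 20;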
All that being said, keeping the tables as you have them, I would restructure as follows...
SELECT
uf.user_name,
uf.user_id,
@pc := coalesce( PostSummary.PostCount, 000000 ) as PostCount,
@pl := coalesce( PostLikes.LikesCount, 000000 ) as PostLikes,
@cc := coalesce( CommentSummary.CommentsCount, 000000 ) as PostComments,
@cl := coalesce( CommentLikes.LikesCount, 000000 ) as CommentLikes,
@pc + @cc AS sum_post,
@pl + @cl AS sum_like,
@pCalc := (@pc + @cc) * 10 AS post_cal,
@lCalc := (@pl + @cl) * 5 AS like_cal,
@pCalc + @lCalc AS `total`
FROM
( select @pc := 0,
@pl := 0,
@cc := 0,
@cl := 0,
@pCalc := 0,
@lCalc := 0 ) sqlvars,
user_fans uf
LEFT JOIN ( select user_id, COUNT(*) as PostCount
from post
group by user_id ) as PostSummary
ON uf.user_id = PostSummary.User_ID
LEFT JOIN ( select user_id, COUNT(*) as LikesCount
from post_likes
group by user_id ) as PostLikes
ON uf.user_id = PostLikes.User_ID
LEFT JOIN ( select user_id, COUNT(*) as CommentsCount
from post_comments
group by user_id ) as CommentSummary
ON uf.user_id = CommentSummary.User_ID
LEFT JOIN ( select user_id, COUNT(*) as LikesCount
from post_comment_likes
group by user_id ) as CommentLikes
ON uf.user_id = CommentLikes.User_ID
ORDER BY
`total` DESC
LIMIT 20
My variables are abbreviated as
"@pc" = PostCount
"@pl" = PostLikes
"@cc" = CommentCount
"@cl" = CommentLike
"@pCalc" = weighted calc of post and comment count * 10 weighted value
"@lCalc" = weighted calc of post and comment likes * 5 weighted value
The LEFT JOIN to prequeries runs those queries ONCE through, then the entire thing is joined instead of being hit as a sub-query for every record. By using the COALESCE(), if there are no such entries in the LEFT JOINed table results, you won't get hit with NULL values messing up the calcs, so I've defaulted them to 000000.
CLARIFICATION OF YOUR QUESTIONS
You can have any QUERY as an "AS AliasResult". The "AS" can also be used to shorten long table names for readability. Aliases can also be used on the same table, under a different alias, to get similar content for a different purpose.
select
MyAlias.SomeField
from
MySuperLongTableNameInDatabase MyAlias ...
select
c.LastName,
o.OrderAmount
from
customers c
join orders o
on c.customerID = o.customerID ...
select
PQ.SomeKey
from
( select ST.SomeKey
from SomeTable ST
where ST.SomeDate between X and Y ) as PQ
JOIN SomeOtherTable SOT
on PQ.SomeKey = SOT.SomeKey ...
Now, the third query above is not practical as written, requiring the full query (aliased as "PQ", representing "PreQuery"). This could be done if you wanted to pre-limit a certain set of other complex conditions and wanted a smaller set BEFORE doing extra joins to many other tables for the final results.
Since a "FROM" does not HAVE to be an actual table but can be a query in itself, anywhere else the prequery result set is used in the query, it has to be referenced by that alias.
Also, when querying fields, they too can be "As FinalColumnName" to simplify results wherever they will be used.
select
CONCAT( User.Salutation, User.LastName ) as CourtesyName
from ...
select
Order.NonTaxable
+ Order.Taxable
+ ( Order.Taxable * Order.SalesTaxRate ) as OrderTotalWithTax
from ...
The "As" columnName is NOT required to be an aggregate, but that is how it is most commonly seen.
Now, with respect to the MySQL variables... If you were doing a stored procedure, many people will pre-declare them, setting their default values before the rest of the procedure. You can do them in-line in a query by just setting them and giving that result an "Alias" reference. When doing these variables, the select will simulate always returning a SINGLE RECORD worth of the values. It's almost like an update-able single record used within the query. You don't need to apply any specific "Join" conditions as it may not have any bearing on the rest of the tables in a query... In essence, it creates a Cartesian result, but one record against any other table will never create duplicates anyhow, so no damage downstream.
select
...
from
( select @SomeVar := 0,
@SomeDate := curdate(),
@SomeString := "hello" ) as SQLVars
Now, how the sqlvars work. Think of a linear program... One command is executed in the exact sequence as the query runs. That value is then re-stored back in the "SQLVars" record, ready for the next time through. However, you don't reference it as SQLVars.SomeVar or SQLVars.SomeDate... just the @SomeVar := someNewValue. Now, when the @var is used in a query, it is also stored as an "As ColumnName" in the result set. Sometimes, this can be just a place-holder computed value in preparation for the next record. Each value is then directly available for the next row. So, given the following sample...
select
@SomeVar := @SomeVar * 2 as FirstVal,
@SomeVar := @SomeVar * 2 as SecondVal,
@SomeVar := @SomeVar * 2 as ThirdVal
from
( select @SomeVar := 1 ) sqlvars,
AnotherTable
limit 3
Will result in 3 records with the values of
FirstVal SecondVal ThirdVal
2 4 8
16 32 64
128 256 512
Notice how the value of @SomeVar is used as each column uses it... So even on the same record, the updated value is immediately available for the next column... That said, now look at trying to build a simulated record count / ranking per each customer...
select
o.CustomerID,
o.OrderID,
@SeqNo := if( @LastID = o.CustomerID, @SeqNo + 1, 1 ) as CustomerSequence,
@LastID := o.CustomerID as PlaceHolderToSaveForNextRecordCompare
from
orders o,
( select @SeqNo := 0, @LastID := 0 ) sqlvars
order by
o.CustomerID
The "Order By" clause forces the results to be returned in sequence first. So, here, the records per customer are returned. First time through, LastID is 0 and the customer ID is, say... 5. Since they differ, it returns 1 as the @SeqNo, THEN it preserves that customer ID into the @LastID field for the next record. Now, for the next record for the customer... the Last ID is the same, so it takes the @SeqNo (now = 1), adds 1 to it, and becomes 2 for the same customer... Continue on the path...
As for getting better at writing queries, take a look at the MySQL tag and look at some of the heavy contributors. Look into the questions and some of the complex answers and how problem solving works. Not to say there are not others with lower reputation scores just starting out and completely competent, but you'll find who gives good answers and why. Look at their history of answers posted too. The more you read and follow, the better a handle you'll get on writing more complex queries.
You can convert this query to use a GROUP BY clause instead of a subquery for each column.
You can create indexes on the relationship columns (this will be the most helpful way of optimizing your query response time).
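For example (index names are made up), the user_id columns hit by the correlated subqueries could be indexed like this:
ALTER TABLE post               ADD INDEX idx_post_user_id (user_id);
ALTER TABLE post_likes         ADD INDEX idx_post_likes_user_id (user_id);
ALTER TABLE post_comments      ADD INDEX idx_post_comments_user_id (user_id);
ALTER TABLE post_comment_likes ADD INDEX idx_post_comment_likes_user_id (user_id);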
10,000 user records isn't much data at all.
There may be work you can do on the database itself:
1) Have you got the relevant indexes set on the foreign keys (indexes on user_id in each of the tables)? Try running EXPLAIN before the query: http://www.slideshare.net/phpcodemonkey/mysql-explain-explained
2) Are your data types correct?
See the difference between my query (see image 1) and @DRapp's (see image 2) in execution time and EXPLAIN output. When I read @DRapp's answer I realized what I was doing wrong in this query and why it took so much time: basically, the answer is simple. My query depended on subqueries, whereas @DRapp used derived tables (temporary/filesort) with the help of session variables, aliases and joins...
Image 1 execution time (00:02:56:321)
Image 2 execution time (00:00:32:860)

Number rows of query result with index number

After some help I have my top ten results from my database, but there is no number that specifies the order. The user currently has to look at the data to understand whether it is in ascending or descending order. I would like each row to have a number that specifies its ranking.
I am not sure how to go about this, so advice rather than code would be sufficient. A little confused and would rather consult someone who knows what they are doing before I start breaking my code.
Sorry, I forgot to add my code:
$result = mysql_query("SELECT coffeeshops.*, services.*, sum(temp.total) as final_total FROM coffeeshops inner join services on coffeeshops.shop_id=services.shop_id
inner join (
select
SUM(comfort + service + ambience + friendliness + spacious + experience + bud_quality + bud_price + drink_price + space_cake + accessibility + toilet_access)/(12) /COUNT(shop_id) AS total, shop_id FROM ratings GROUP BY shop_id
) as temp on coffeeshops.shop_id=temp.shop_id
GROUP BY shop_name
ORDER BY final_total DESC, shop_name ASC limit 10");
while ($row = mysql_fetch_array($result)) {
    // output HTML here
}
Michael's solution will work, but you can also use PHP. If you have an array of your results $results, you can sort it, and use a for loop to print out the result number. For example:
sort($results); //Asc order, or use rsort($results) for Desc order
for($i=0; $i<count($results); $i++) {
echo $results[$i].' Rank #: '.($i + 1);
}
This works to create an incrementing column rank, if I understand your question:
SELECT
@rownum:=@rownum+1 `rank`,
column1
FROM table, (SELECT @rownum:=0) r
ORDER BY column1 DESC
LIMIT 10;
In order to use this with a GROUP BY and aggregates, you will probably need to wrap it in a subquery and place the @rownum incrementer in the outer query.
Example:
SELECT
@rownum:=@rownum+1 `rank`,
column1,
column1_count
FROM (
SELECT
column1,
COUNT(column1) column1_count
FROM table
GROUP BY column1
ORDER BY COUNT(column1) DESC
) c,
(SELECT @rownum:=0) r
LIMIT 10;