There are two tables: posts (~5,000,000 rows) and relations (~8,000 rows).
posts columns:
-------------------------------------------------
| id | source_id | content | date (int) |
-------------------------------------------------
relations columns:
---------------------------
| source_id | user_id |
---------------------------
I wrote a MySQL query to get the 10 most recent rows from posts that are related to a specific user:
SELECT p.id, p.content
FROM posts AS p
LEFT JOIN relations AS r
ON r.source_id = p.source_id
WHERE r.user_id = 1
ORDER BY p.date DESC
LIMIT 10
However, it takes ~30 seconds to execute.
I already have indexes on relations for (source_id, user_id) and (user_id), and on posts for (source_id), (date) and (date, source_id).
EXPLAIN results:
How can I optimize the query?
Your WHERE clause renders your outer join a mere inner join (because in an outer-joined pseudo record user_id will always be null, never 1).
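To see the effect on a toy data set, here is a minimal sketch that uses Python's sqlite3 module as a stand-in for MySQL. The sample rows are made up for illustration; only the table and column names come from the question.

```python
import sqlite3

# Toy versions of the posts and relations tables (sample rows are made up).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, source_id INTEGER, content TEXT, date INTEGER);
    CREATE TABLE relations (source_id INTEGER, user_id INTEGER);
    INSERT INTO posts VALUES (1, 10, 'a', 100), (2, 20, 'b', 200), (3, 30, 'c', 300);
    INSERT INTO relations VALUES (10, 1), (20, 2);
""")

query = """
    SELECT p.id, r.user_id
    FROM posts AS p
    {join} relations AS r ON r.source_id = p.source_id
    {where}
    ORDER BY p.date DESC
"""

# Without the WHERE clause the outer join keeps post 3 with a NULL user_id.
outer_unfiltered = conn.execute(query.format(join="LEFT JOIN", where="")).fetchall()
# With the WHERE clause the NULL row can never satisfy user_id = 1, so it is dropped.
outer_filtered = conn.execute(query.format(join="LEFT JOIN", where="WHERE r.user_id = 1")).fetchall()
inner_filtered = conn.execute(query.format(join="JOIN", where="WHERE r.user_id = 1")).fetchall()

print(outer_unfiltered)
print(outer_filtered, inner_filtered)
```

The filtered outer join and the inner join return the same rows, which is why the LEFT keyword buys nothing here.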
If you really want this to be an outer join then it is completely superfluous, because every record in posts either has or has not a match in relations of course. Your query would then be
select id, content
from posts
order by `date` desc limit 10;
If you don't want this to be an outer join really, but want a match in relations, then we are talking about existence in a table, an EXISTS or IN clause hence:
select id, content
from posts
where source_id in
(
select source_id
from relations
where user_id = 1
)
order by `date` desc
limit 10;
There should be an index on relations(user_id, source_id) - in this order, so we can select user_id 1 first and get an array of all desired source_id which we then look up.
Of course you also need an index on posts(source_id) which you probably have already, as source_id is an ID. You can even speed things up with a composite index posts(source_id, date, id, content), so the table itself doesn't have to be read anymore - all the information needed is in the index already.
UPDATE: Here is the related EXISTS query:
select id, content
from posts p
where exists
(
select *
from relations r
where r.user_id = 1
and r.source_id = p.source_id
)
order by `date` desc
limit 10;
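As a rough check that the IN and EXISTS forms return the same rows, here is a small sketch using Python's sqlite3 module as a stand-in for MySQL. The sample rows and the index names (idx_relations_user_source, idx_posts_source) are made up; the index column orders follow the suggestion above. It only checks the results, not the 30-second timing, which depends on the real data and on MySQL's optimizer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, source_id INTEGER, content TEXT, date INTEGER);
    CREATE TABLE relations (source_id INTEGER, user_id INTEGER);
    -- Hypothetical index names; column order as suggested in the answer.
    CREATE INDEX idx_relations_user_source ON relations (user_id, source_id);
    CREATE INDEX idx_posts_source ON posts (source_id);
    INSERT INTO posts VALUES
        (1, 10, 'a', 100), (2, 20, 'b', 200), (3, 30, 'c', 300),
        (4, 10, 'd', 400), (5, 20, 'e', 500);
    INSERT INTO relations VALUES (10, 1), (30, 1), (20, 2);
""")

in_rows = conn.execute("""
    SELECT p.id, p.content
    FROM posts AS p
    WHERE p.source_id IN (SELECT source_id FROM relations WHERE user_id = 1)
    ORDER BY p.date DESC
    LIMIT 2
""").fetchall()

exists_rows = conn.execute("""
    SELECT p.id, p.content
    FROM posts AS p
    WHERE EXISTS (
        SELECT 1 FROM relations AS r
        WHERE r.user_id = 1 AND r.source_id = p.source_id
    )
    ORDER BY p.date DESC
    LIMIT 2
""").fetchall()

print(in_rows, exists_rows)
```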
You could put an index on the date column of the posts table, I believe that will help the order-by speed.
You could also try reducing the number of results before ordering with some additional WHERE conditions. For example, if you know that there will likely be ten records with the correct user_id today, you could limit the date to just today (or N days back, depending on your actual data).
Try this:
SELECT p.id, p.content FROM posts AS p
WHERE p.source_id IN (SELECT source_id FROM relations WHERE user_id = 1)
ORDER BY p.date DESC
LIMIT 10
I'd consider the following :-
Firstly, you only want the 10 most recent rows from posts which are related to a user. So, an INNER JOIN should do just fine.
SELECT p.id, p.content
FROM posts AS p
JOIN relations AS r
ON r.source_id = p.source_id
WHERE r.user_id = 1
ORDER BY p.date DESC
LIMIT 10
The LEFT JOIN is only needed if you want to fetch the records which do not have a relations mapping. Doing the LEFT JOIN can result in a full table scan of the left table, which as per your info contains ~5,000,000 rows. This could be the root cause of your slow query.
For further optimisation, consider moving the WHERE clause into the ON clause.
SELECT p.id, p.content
FROM posts AS p
JOIN relations AS r
ON (r.source_id = p.source_id AND r.user_id = 1)
ORDER BY p.date DESC
LIMIT 10
I would try with a composite index on relations :
INDEX source_user (user_id,source_id)
and change the query to this :
SELECT p.id, p.content
FROM posts AS p
INNER JOIN relations AS r
ON ( r.user_id = 1 AND r.source_id = p.source_id )
ORDER BY p.date DESC
LIMIT 10
Related
I want to return all of my user's posts except the last 7 ones (MySQL).
The pattern would be something like this:
SELECT *
FROM posts
WHERE id_post < (SELECT * FROM posts min(id_post) WHERE id_user=4
ORDER BY id_post DESC LIMIT 7)
ORDER id_post ASC
If I Left Join with the Users table
SELECT q.*,q.id_post as id
FROM posts q
LEFT JOIN users u ON u.id_user=q.id_user
WHERE p.id_user=4
AND q.id_post < (SELECT min(rel.id_post) as min_id_post
FROM
(
SELECT p.*
FROM posts p
WHERE p.id_user=4
ORDER BY p.date DESC
LIMIT 7
) rel )
I retrieve the results by this last query, but it has so many subqueries...
Is a subquery needed to achieve what I want? Is there a shorter version?
You don't need an Outer Join (in fact the WHERE p.id_user=4 turns it into an Inner Join anyway).
You should be able to use a LIMIT to skip the first 7 rows:
SELECT q.*, q.id_post AS id
FROM
( SELECT p.*
FROM posts AS p
WHERE p.id_user=4
ORDER BY p.date DESC
LIMIT 7, 999999999 -- offset 7 skips the 7 most recent rows
) q
JOIN users u ON u.id_user=q.id_user
WHERE u.id_user=4
ORDER BY q.id_post ASC
Also, your current query doesn't need to join to users, since you don't access any column from that table (but this was probably a stripped-down version).
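Here is a minimal sketch of the offset trick on made-up data, using Python's sqlite3 module as a stand-in for MySQL (SQLite accepts the same `LIMIT offset, count` form). The ten sample posts for user 4 and the single post for user 5 are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id_post INTEGER PRIMARY KEY, id_user INTEGER, date INTEGER)")
# Ten posts for user 4 (dates increase with id_post) plus one post for another user.
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?)",
    [(i, 4, i * 10) for i in range(1, 11)] + [(11, 5, 999)],
)

rows = conn.execute("""
    SELECT q.id_post
    FROM (
        SELECT p.*
        FROM posts AS p
        WHERE p.id_user = 4
        ORDER BY p.date DESC
        LIMIT 7, 999999999   -- skip the 7 newest rows, keep everything older
    ) AS q
    ORDER BY q.id_post ASC
""").fetchall()

print(rows)
```

Only the three oldest posts of user 4 remain, since the seven newest ones are skipped by the offset.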
I need to combine 2 identical tables (posts and posts2) to display the same list sorted by id.
Previously we only worked with 1 table, but we have since been using a second table (posts2) to store the new data from a certain id.
This is the query I used when I worked with 1 table (posts), and it works fine.
select posts.id_usu,posts.id_cat,posts.titulo,posts.html,posts.slug,posts.fecha,hits.id,hits.hits,usuarios.id,usuarios.usuario,posts.id
From posts
Join hits On posts.id = hits.id
Join usuarios On posts.id_usu = usuarios.id
where posts.id_cat='".$catid."' order by posts.id desc
Now I tried to apply this query to a UNION of the 2 tables, but I don't know at what point to add the JOINs. I tried several ways, but each one gives a MySQL error. The following query merges the 2 tables and orders by id, but I still need to add the JOINs.
select * from (
SELECT posts.id,posts.id_usu,posts.id_cat,posts.titulo,posts.html,posts.slug,posts.fecha
FROM posts where id_cat='6' ORDER BY id
)X
UNION ALL
SELECT posts2.id,posts2.id_usu,posts2.id_cat,posts2.titulo,posts2.html,posts2.slug,posts2.fecha FROM posts2 where id_cat='4' ORDER BY id DESC limit 20
I need to add this to the above query:
Join hits On posts.id = hits.id
Join usuarios On posts.id_usu = usuarios.id
Thanks in advance guys.
If you want the same query as your first one, but this time over the union with your identical table posts2, you can do it like this:
select
p.id_usu,p.id_cat,p.titulo,p.html,p.slug,p.fecha
,hits.id,hits.hits,usuarios.id,usuarios.usuario
from (
(select
id_usu,id_cat,titulo,html,slug,fecha ,id
From posts
where id_cat='".$catid."' order by id desc limit 20)
UNION ALL
(select
id_usu,id_cat,titulo,html,slug,fecha ,id
From posts2
where id_cat='".$catid."' order by id desc limit 20)
) p
Join hits On p.id = hits.id
Join usuarios On p.id_usu = usuarios.id
order by p.id desc limit 20
I'm looking for the best MySQL query for that situation:
I'm listing 10 last posts of a member.
table for posts:
post_id | uid | title | content | date
Members can subscribe to other members' posts, so that those posts are listed in the same list (sorted by date, same table).
So it's ok to select last posts of userid X and userid Y
But I'd like to allow members to disable the display of some posts (the ones they don't want to be displayed).
My problem is: how can I make that as simple as possible for MySQL?... I thought about a second table where I put the post ids that the user doesn't want:
table postdenied
uid | post_id
Then make a select like:
select * from posts as p where not exists (select 1 from postdenied as d where d.post_id = p.post_id and d.uid = p.uid) order by date DESC limit 10
Am I right?
Or is there something better?
Thanks
If I understand correctly, the posts.uid column stores the ID of the poster. And the postdenied.uid stores the ID of the user that doesn't want to see a certain post.
If the above assumptions are correct, then your query is fine, except that you should not join on the uid columns, only on the post_id ones. You also need a parameter or constant holding the userID (noted as #X in the code below) of the user for whom you want to show all the posts, except those he has "denied":
select p.*
from posts as p
where not exists
(select 1
from postdenied as d
where d.post_id = p.post_id
and d.uid = #X -- #X is the userID of the specific user
)
order by date DESC
limit 10 ;
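To illustrate the difference between comparing d.uid with the poster and comparing it with the viewing user, here is a small sketch using Python's sqlite3 module as a stand-in for MySQL. The sample rows and the viewer id 9 are made up, and the `?` placeholder plays the role of #X.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (post_id INTEGER PRIMARY KEY, uid INTEGER, title TEXT, date INTEGER);
    CREATE TABLE postdenied (uid INTEGER, post_id INTEGER);
    INSERT INTO posts VALUES (1, 5, 'first', 100), (2, 5, 'second', 200), (3, 6, 'third', 300);
    -- User 9 hides post 2; user 8 hides post 3.
    INSERT INTO postdenied VALUES (9, 2), (8, 3);
""")

viewer_id = 9  # hypothetical viewing user, i.e. the #X parameter

# Corrected version: the deny list is checked against the viewer.
visible = conn.execute("""
    SELECT p.post_id
    FROM posts AS p
    WHERE NOT EXISTS (
        SELECT 1 FROM postdenied AS d
        WHERE d.post_id = p.post_id AND d.uid = ?
    )
    ORDER BY p.date DESC
    LIMIT 10
""", (viewer_id,)).fetchall()

# Original version: d.uid is compared with the poster, so nothing is hidden here.
original = conn.execute("""
    SELECT p.post_id
    FROM posts AS p
    WHERE NOT EXISTS (
        SELECT 1 FROM postdenied AS d
        WHERE d.post_id = p.post_id AND d.uid = p.uid
    )
    ORDER BY p.date DESC
    LIMIT 10
""").fetchall()

print(visible, original)
```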
Another approach to implementing this would be with a LEFT JOIN clause.
SELECT * FROM posts AS p
LEFT JOIN postdenied as d ON d.post_id = p.post_id and d.uid = p.uid
WHERE d.uid IS NULL
ORDER BY date DESC
LIMIT 10
It's unclear to me whether this would be more amenable to the query optimizer. If you have a large amount of data, it may be worth testing both queries and seeing if one is more performant than the other.
See http://sqlfiddle.com/#!2/be7e3/1
Appreciation to ypercube and Lamak for their feedback on my original answer
I have an article_views table which holds the number of article views for each day. A new record is created to hold the count for each separate day for each article.
The query below gets the article id and total views for the top 5 viewed article id for all time :
SELECT article_id,
SUM(article_count) as cnt
FROM article_views
GROUP BY article_id
ORDER BY cnt DESC
LIMIT 5
I also have a separate articles table which holds all the article fields. I want to amend the query above to join to the articles table and get two fields for each article id. I have tried to do this below, but the count is coming back incorrectly:
SELECT article_views.article_id, SUM( article_views.article_count ) AS cnt, articles.article_title, articles.artcile_url
FROM article_views
INNER JOIN articles ON articles.article_id = article_views.article_id
GROUP BY article_views.article_id
ORDER BY cnt DESC
LIMIT 5
I'm not sure exactly what I'm doing wrong. Do I need to do a subquery?
Add articles.article_title, articles.artcile_url to the GROUP BY clause:
SELECT
article_views.article_id,
articles.article_title,
articles.artcile_url,
SUM( article_views.article_count ) AS cnt
FROM article_views
INNER JOIN articles ON articles.article_id = article_views.article_id
GROUP BY article_views.article_id,
articles.article_title,
articles.artcile_url
ORDER BY cnt DESC
LIMIT 5;
The reason you were not getting the correct result set is that when you select columns that are neither in the GROUP BY nor inside an aggregate function, MySQL picks an arbitrary value for them.
You are using a MySQL (mis) feature called Hidden Columns, because article title is not in the group by. However, this may or may not be causing your problem.
If the counts are wrong, then I think you have duplicate article_id in the article table. You can check this by doing:
select article_id, count(*) as cnt
from articles
group by article_id
having cnt > 1
If any appear, then that is your problem. If they all have different titles, then grouping by the title (as suggested by Mahmoud) would fix the problem.
If not, one way to fix it is the following:
SELECT article_views.article_id, SUM( article_views.article_count ) AS cnt, articles.article_title, articles.artcile_url
FROM article_views INNER JOIN
(select a.* from articles a group by a.article_id) articles
ON articles.article_id = article_views.article_id
GROUP BY article_views.article_id
ORDER BY cnt DESC
LIMIT 5
This chooses an arbitrary title for the article.
Your query looks basically right to me...
But the value returned for cnt depends on the article_id column being UNIQUE in the articles table. (We'd assume that it's the primary key, but absent a schema definition, that's only an assumption.)
Also, we're likely to assume there's a foreign key between the tables, that is, there are no values of article_id in the article_views table which don't match a value of article_id on a row from the articles table.
To check for "orphan" article_id values, run a query like:
SELECT v.article_id
FROM article_views v
LEFT
JOIN articles a
ON a.article_id = v.article_id
WHERE a.article_id IS NULL
To check for "duplicate" article_id values in articles, run a query like:
SELECT a.article_id
FROM articles a
GROUP BY a.article_id
HAVING COUNT(1) > 1
If either of those queries returns rows, that could be an explanation for the behavior you observe.
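A small sketch of how both problems distort the totals, using Python's sqlite3 module as a stand-in for MySQL. The sample rows are made up: article 1 is deliberately duplicated in articles, and article 3 has views but no articles row. The column artcile_url is spelled as in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles (article_id INTEGER, article_title TEXT, artcile_url TEXT);
    CREATE TABLE article_views (article_id INTEGER, article_count INTEGER);
    -- article_id 1 appears twice in articles; article_id 3 has no articles row.
    INSERT INTO articles VALUES (1, 'A', 'u1'), (1, 'A copy', 'u1b'), (2, 'B', 'u2');
    INSERT INTO article_views VALUES (1, 5), (1, 7), (2, 20), (3, 4);
""")

# Totals from article_views alone.
plain = conn.execute("""
    SELECT article_id, SUM(article_count) AS cnt
    FROM article_views
    GROUP BY article_id
    ORDER BY cnt DESC
""").fetchall()

# After the join, each view row of article 1 matches two articles rows,
# so its total doubles, and the orphan article 3 disappears.
joined = conn.execute("""
    SELECT v.article_id, SUM(v.article_count) AS cnt
    FROM article_views v
    JOIN articles a ON a.article_id = v.article_id
    GROUP BY v.article_id
    ORDER BY cnt DESC
""").fetchall()

duplicates = conn.execute("""
    SELECT a.article_id FROM articles a
    GROUP BY a.article_id HAVING COUNT(1) > 1
""").fetchall()

orphans = conn.execute("""
    SELECT v.article_id FROM article_views v
    LEFT JOIN articles a ON a.article_id = v.article_id
    WHERE a.article_id IS NULL
""").fetchall()

print(plain, joined, duplicates, orphans)
```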
Context:
I have an app that shows posts and comments on the home page.
My intention is to limit the number of posts shown (ie, 10 posts) and...
Limit the number of comments shown per post (ie, 2 comments).
Show the total number of comments in the front end (ie, "read all 10 comments")
MySQL:
(SELECT *
FROM (SELECT *
FROM post
ORDER BY post_timestamp DESC
LIMIT 0, 10) AS p
JOIN user_profiles
ON user_id = p.post_author_id
LEFT JOIN (SELECT *
FROM data
JOIN pts
ON pts_id = pts_id_fk) AS d
ON d.data_id = p.data_id_fk
LEFT JOIN (SELECT *
FROM comment
JOIN user_profiles
ON user_id = comment_author_id
ORDER BY comment_id ASC) AS c
ON p.post_id = c.post_id_fk))
I've failed to insert LIMIT and COUNT in this code to get what I want - any suggestions? - will be glad to post more info if needed.
If I'm understanding you correctly you want no more than 10 posts (and 2 comments) to come back for each unique user in the returned result set.
This is very easy in SQL Server / Oracle / PostgreSQL using ROW_NUMBER() OVER (PARTITION BY ...).
Unfortunately there is no such function in MySQL before version 8.0, which added window functions. A similar question has been asked here:
ROW_NUMBER() in MySQL
I'm sorry I can't offer a more specific solution for older MySQL versions. Definitely research "row number partition by" equivalents for MySQL further.
The essence of what this does:
You can add a set of columns that make up a unique set, say user id for example's sake (this is the "partition"). A "row number" column is then added to each row that matches the partition, and it starts over when the partition changes.
This should illustrate:
user_id row_number
1 1
1 2
1 3
2 1
2 2
You can then add an outer query that says: select where row_number <= 10, which can be used in your case to limit to no more than 10 posts. You can use the max row_number for that user to determine the "read all 10 comments" part.
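For reference, this is roughly what the row-number approach looks like where window functions are available. The sketch uses Python's sqlite3 module as a stand-in and assumes the bundled SQLite is version 3.25 or newer. The column names comment_id and post_id_fk come from the question; the sample rows are made up, and partitioning by post rather than by user is my choice, to show the "2 comments per post plus total count" case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comment (comment_id INTEGER PRIMARY KEY, post_id_fk INTEGER);
    -- Post 1 has three comments, post 2 has two.
    INSERT INTO comment VALUES (1, 1), (2, 1), (3, 1), (4, 2), (5, 2);
""")

rows = conn.execute("""
    SELECT post_id_fk, comment_id, rn, total
    FROM (
        SELECT post_id_fk,
               comment_id,
               -- Numbering restarts at 1 for every post (the partition).
               ROW_NUMBER() OVER (PARTITION BY post_id_fk ORDER BY comment_id) AS rn,
               -- Total comments per post, for the "read all N comments" link.
               COUNT(*) OVER (PARTITION BY post_id_fk) AS total
        FROM comment
    ) AS numbered
    WHERE rn <= 2
    ORDER BY post_id_fk, rn
""").fetchall()

print(rows)
```

Each post keeps at most two comments, and the total column still carries the full comment count for that post.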
Good luck!
This is the skeleton of the query you're looking for:
select * from (
select p1.id from posts p1
join posts p2 on p1.id <= p2.id
group by p1.id
having count(*) <= 3
order by p1.post_timestamp desc
) p left join (
select c1.id, c2.post_id from comments c1
join comments c2 on c1.id <= c2.id and c1.post_id = c2.post_id
group by c1.id
having count(*) <= 2
order by c1.comment_timestamp desc
) c
on p.id = c.post_id
It will get posts ordered by their descending timestamp, but only the top 3 of them. That result will be joined with the top 2 comments of each post, ordered by their descending timestamp. Note that the <= comparisons rank rows by id, so this assumes that ids grow with the timestamps. Just change the column names and it will work :)
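Here is the comments half of that skeleton run on made-up data, using Python's sqlite3 module as a stand-in for MySQL. It is only a sketch of the self-join counting idea: a comment is kept when at most 2 comments of the same post have an id greater than or equal to its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER);
    -- Post 1 has three comments, post 2 has two.
    INSERT INTO comments VALUES (1, 1), (2, 1), (3, 1), (4, 2), (5, 2);
""")

top_two = conn.execute("""
    SELECT c1.id, c1.post_id
    FROM comments c1
    JOIN comments c2
      ON c1.id <= c2.id AND c1.post_id = c2.post_id
    GROUP BY c1.id
    HAVING COUNT(*) <= 2      -- c1 is among the 2 highest ids of its post
    ORDER BY c1.post_id, c1.id
""").fetchall()

print(top_two)
```

Comment 1 is dropped because three comments of post 1 have an id of at least 1, while the two highest ids of each post survive.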