MySQL UNION LIMIT problem

I want a paging script to work properly, but the situation is a bit complex: I need to pick data from the union of two SQL queries. See the query below. I have a table book and a table bookvisit. What I want is to show all books for a particular category in order of popularity. I get the data for all books with at least one visit by joining book and bookvisit, and then UNION it with all books that have no visits. Everything works fine, but for paging I need to limit the result like (0,10), (10,10), (20,10), (30,10), correct? If I have 9 books in bookvisit for that category and 3761 books without any visit for that category (3770 books total), it should list 377 pages with 10 books on each page. But some pages show no data at all, because it tries to show books with LIMIT 3760,10 and at that offset the second query in the union returns no records. I may not be explaining the situation well, but if you think about it a bit, you will get my point.
SELECT * FROM (
SELECT * FROM (
SELECT viewcount, b.isbn, booktitle, stock_status, price, description FROM book AS b
INNER JOIN bookvisit AS bv ON b.isbn = bv.isbn WHERE b.price <> 0 AND hcategoryid = '25'
ORDER BY viewcount DESC
LIMIT 10, 10
) AS t1
UNION
SELECT * FROM
(
SELECT viewcount, b.isbn, booktitle, stock_status, price, description FROM book AS b
LEFT JOIN bookvisit AS bv ON b.isbn = bv.isbn WHERE b.price <> 0 AND hcategoryid = '25'
AND viewcount IS NULL
ORDER BY viewcount DESC
LIMIT 10, 10
) AS t2
)
AS qry
ORDER BY viewcount DESC
LIMIT 10

Do not use LIMIT in the separate queries; use LIMIT only at the end. You want to get the whole result set from the two queries and then show only the 10 results that you need, no matter whether that is LIMIT 0,10 or LIMIT 3760,10:
SELECT * FROM (
SELECT * FROM (
SELECT viewcount, b.isbn, booktitle, stock_status, price, description FROM book AS b
INNER JOIN bookvisit AS bv ON b.isbn = bv.isbn WHERE b.price <> 0 AND hcategoryid = '25'
ORDER BY viewcount DESC
) AS t1
UNION
SELECT * FROM
(
SELECT viewcount, b.isbn, booktitle, stock_status, price, description FROM book AS b
LEFT JOIN bookvisit AS bv ON b.isbn = bv.isbn WHERE b.price <> 0 AND hcategoryid = '25'
AND viewcount IS NULL
ORDER BY viewcount DESC
) AS t2
)
AS qry
ORDER BY viewcount DESC
LIMIT 10, 10

An old one, but still relevant.
Basically, performance-wise, you have to use LIMIT on each query involved in the UNION. If you know there will be no duplicates between the result sets, consider using UNION ALL, again for performance. Then, if you need, let's say, LIMIT 100, 20, you LIMIT each query with 120 (OFFSET + LIMIT). You always fetch twice as many records as you need, but not all of them.
SELECT [fields] FROM
(
(SELECT [fields] FROM ... LIMIT 10)
UNION ALL
(SELECT [fields] FROM ... LIMIT 10)
) query
LIMIT 0, 10
For the 5th page:
SELECT [fields] FROM
(
(SELECT [fields] FROM ... LIMIT 50)
UNION ALL
(SELECT [fields] FROM ... LIMIT 50)
) query
LIMIT 40, 10
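And for the LIMIT 100, 20 case mentioned above, the same pattern would be (a sketch with the answer's placeholder field and table names): fetch OFFSET + LIMIT = 120 rows from each side, then page in the outer query.
SELECT [fields] FROM
(
(SELECT [fields] FROM ... LIMIT 120)
UNION ALL
(SELECT [fields] FROM ... LIMIT 120)
) query
LIMIT 100, 20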

A decade after this question was asked, I can offer a solution, one that perhaps seems obvious to anyone familiar with views: instead of attempting a nested SELECT statement to combine the two tables, use CREATE VIEW (or CREATE OR REPLACE VIEW) to combine the two tables into a view. Performance may be poor, as the tables will have to be combined for every page access and may have to be recombined for every pagination, depending on how your code is arranged, but it will work.
If you run into SQL user permissions issues that you and your sysadmin cannot solve, my best advice is to create a new user with full permissions, assign the new user to the table, and use the new user to create the views. That was the only thing that worked for me.
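A minimal sketch of that idea, using the tables from the question (the view name is made up, and a single LEFT JOIN is assumed to replace the UNION, since in MySQL rows with a NULL viewcount sort last under ORDER BY ... DESC):
CREATE OR REPLACE VIEW category_books_by_popularity AS
SELECT viewcount, b.isbn, booktitle, stock_status, price, description
FROM book AS b
LEFT JOIN bookvisit AS bv ON b.isbn = bv.isbn
WHERE b.price <> 0 AND hcategoryid = '25';

-- paging then happens against the view, with a single LIMIT:
SELECT * FROM category_books_by_popularity
ORDER BY viewcount DESC
LIMIT 3760, 10;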

Related

How to optimize a MySQL query that brings back a huge quantity of rows?

I really need your help. I'm doing an assignment for my university and, before coming here, I read a lot of the MySQL documentation and searched and searched, but none of it helped me with my SQL query. Look, I have this query:
SELECT a.nome, COUNT(*)
FROM publ p JOIN auth a on p.pubid = a.pubid
WHERE p.pubid IN (SELECT pubid
FROM auth
GROUP BY pubid
HAVING COUNT(*) < 3) -- I also have to run this with the values 2, 4 and 5,
GROUP BY a.nome      -- in separate queries.
ORDER BY COUNT(*) DESC, a.nome ASC
I tried to put an index on the WHERE clause column, but I never get the results and it takes too long. What can I do to make my query return the results faster? Thank you for the help.
I would create these indexes and reorder the query:
CREATE INDEX publ_pubid ON publ(pubid);
CREATE INDEX auth_pubid ON auth(pubid, nome);
SELECT a.nome, COUNT(*)
FROM (
SELECT pubid
FROM auth
GROUP BY pubid
HAVING COUNT(*) < 3
) L
LEFT JOIN publ p ON L.pubid = p.pubid
JOIN auth a on p.pubid = a.pubid
GROUP BY a.nome
ORDER BY COUNT(*) DESC, a.nome ASC;

How can I speed up a multiple inner join query?

I have two tables. The first table (users) is a simple "id, username" table with 100,000 rows and the second (stats) is "id, date, stat" with 20M rows.
I'm trying to figure out which username's stat went up the most, and here's the query I have. On a powerful machine, this query takes minutes to complete. Is there a better way to write it to speed it up?
SELECT a.id, a.username, b.stat, c.stat, (b.stat - c.stat) AS stat_diff
FROM users AS a
INNER JOIN stats AS b ON (b.id=a.id)
INNER JOIN stats AS c ON (c.id=a.id)
WHERE b.date = '2016-01-10'
AND c.date = '2016-01-13'
GROUP BY a.id
ORDER BY stat_diff DESC
LIMIT 100
The other way I tried, which doesn't seem optimal, is:
SELECT a.id, a.username,
(SELECT b.stat FROM stats AS b WHERE b.id = a.id AND b.date = '2016-01-10') AS start,
(SELECT c.stat FROM stats AS c WHERE c.id = a.id AND c.date = '2016-01-14') AS `end`,
((SELECT b.stat FROM stats AS b WHERE b.id = a.id AND b.date = '2016-01-10') -
(SELECT c.stat FROM stats AS c WHERE c.id = a.id AND c.date = '2016-01-14')) AS stat_diff
FROM users AS a
GROUP BY a.id
ORDER BY stat_diff DESC
LIMIT 100
Introduction
Let's suppose we rewrite the query like this:
SELECT a.id, a.username, b.stat, c.stat, (b.stat - c.stat) AS stat_diff
FROM users AS a
INNER JOIN stats AS b ON
b.date = STR_TO_DATE('2016-01-10', '%Y-%m-%d' ) and b.id=a.id
INNER JOIN stats AS c ON
c.date = STR_TO_DATE('2016-01-13', '%Y-%m-%d' ) and c.id=a.id
GROUP BY a.id
ORDER BY stat_diff DESC
LIMIT 100
And we ensure that:
the users table has an index on the id field;
the stats table has an index on the composite (date, id) field: create index stats_idx_d_i on stats ( date, id );
Then
The database optimizer may use the indexes to select a Restricted Set of Data ('RSD'), that is, the rows that match the filtered dates. This is fast.
But
You are sorting by a calculated field:
(b.stat - c.stat) AS stat_diff #<-- calculated
ORDER BY stat_diff DESC #<-- this forces to calculate it
No optimization is possible for this sort, because the value has to be calculated one by one for every row in your 'RSD' (restricted set of data).
Conclusion
The question is: how many rows are in your 'RSD'? If there are only a few hundred rows, your query may run fast; otherwise, it will be slow.
In any case, you should make sure the first step of the query (without the sort) is resolved by index and not by a full scan. Use the EXPLAIN command to check.
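A minimal way to do that check, assuming the stats_idx_d_i index above has been created, is to run EXPLAIN on the rewritten query and read the key and type columns of the plan:
EXPLAIN
SELECT a.id, a.username, b.stat, c.stat, (b.stat - c.stat) AS stat_diff
FROM users AS a
INNER JOIN stats AS b ON
b.date = STR_TO_DATE('2016-01-10', '%Y-%m-%d' ) and b.id=a.id
INNER JOIN stats AS c ON
c.date = STR_TO_DATE('2016-01-13', '%Y-%m-%d' ) and c.id=a.id
GROUP BY a.id
ORDER BY stat_diff DESC
LIMIT 100;
-- the rows for b and c should show stats_idx_d_i in the key column, not type = ALL (a full table scan)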
All you need to do is help the optimizer. At a bare minimum, have a checklist that looks like the one below:
1. Are my join columns indexed?
2. Are the WHERE clauses sargable (see the sketch after this list)?
3. Are there any implicit or explicit conversions?
4. Am I seeing any statistics issues?
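As a sketch of points 2 and 3 (not from the original answer), using the stats table from the question: wrapping the indexed column in a function, or forcing a conversion on it, makes the predicate non-sargable, so an index on stats(date, id) cannot be used to seek.
-- non-sargable: the function hides the indexed column, forcing a scan
SELECT COUNT(*) FROM stats WHERE DATE_FORMAT(date, '%Y-%m-%d') = '2016-01-10';

-- sargable: the bare column is compared with a constant, so the (date, id) index can be used
SELECT COUNT(*) FROM stats WHERE date = '2016-01-10';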
One more interesting aspect to look at is how your data is distributed. Once you understand the data, you will be able to interpret the execution plan and alter it as per your need.
Example: suppose I have a customers table with 100 rows, and each customer has a minimum of 10 orders (up to 10,000 orders in total). If you only need the top 3 orders by date, you don't want a scan of the whole orders table.
Now, in your case I would not go with the second option, even though the optimizer may choose a good plan for it as well. I would take the first approach and see whether the execution time is acceptable; if not, I would go through my checklist and try to tune it further.
The query seems OK; verify your indexes.
Or try this query:
SELECT a.id, a.username, b.stat, c.stat, (b.stat - c.stat) AS stat_diff
FROM users AS a
INNER JOIN (select id,stat from stats where date = '2016-01-10') AS b ON (b.id=a.id)
INNER JOIN (select id,stat from stats where date = '2016-01-13') AS c ON (c.id=a.id)
GROUP BY a.id
ORDER BY stat_diff DESC
LIMIT 100

SQL top records based on two tables relations

I have three main items I am storing: Articles, Entities, and Keywords. This makes 5 tables:
article { id }
entity {id, name}
article_entity {id, article_id, entity_id}
keyword {id, name}
article_keyword {id, article_id, keyword_id}
I would like to get all articles that contain the TOP X keywords + entities. I can get the top X keywords or entities with a simple group by on the entity_id/keyword_id.
SELECT [entity|keyword]_id, count(*) as num FROM article_entity
GROUP BY entity_id ORDER BY num DESC LIMIT 10
How would I get all articles that have a relation to the top entities and keywords?
This is what I imagined, but I know it doesn't work because the GROUP BY on entity limits the article_ids that are returned.
SELECT * FROM article
WHERE EXISTS (
[... where article is mentioned in top X entities.. ]
) AND EXISTS (
[... where article is mentioned in top X keywords.. ]
);
If I understand you correctly, the objective of the query is to find the articles that have a relation to both one of the top 10 entities and one of the top 10 keywords. If that is the case, the following query should do it, by requiring that each article returned has a match in both the set of top 10 entities and the set of top 10 keywords.
Please give it a try.
SELECT a.id
FROM article a
INNER JOIN article_entity ae ON a.id = ae.article_id
INNER JOIN article_keyword ak ON a.id = ak.article_id
INNER JOIN (
SELECT entity_id, COUNT(article_id) AS article_entity_count
FROM article_entity
GROUP BY entity_id
ORDER BY article_entity_count DESC LIMIT 10
) top_ae ON ae.entity_id = top_ae.entity_id
INNER JOIN (
SELECT keyword_id, COUNT(article_id) AS article_keyword_count
FROM article_keyword
GROUP BY keyword_id
ORDER BY article_keyword_count DESC LIMIT 10
) top_ak ON ak.keyword_id = top_ak.keyword_id
GROUP BY a.id;
The downside to using a simple LIMIT 10 in the two subqueries for top entities/keywords is that it won't handle ties, so if the 11th keyword were just as popular as the 10th it still wouldn't get chosen. This can be fixed by using a ranking function, but AFAIK MySQL doesn't have anything built in (like the RANK() window function in Oracle or MSSQL).
I set up a sample SQL Fiddle (but using fewer data points and LIMIT 2, as I'm lazy).
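For what it's worth, MySQL 8.0 and later do have window functions, so a tie-friendly version of the top-entities subquery could be sketched like this (not part of the original answer):
SELECT entity_id
FROM (
    SELECT entity_id,
           RANK() OVER (ORDER BY COUNT(article_id) DESC) AS rnk
    FROM article_entity
    GROUP BY entity_id
) ranked
WHERE rnk <= 10;  -- entities tied with 10th place share its rank and are not cut off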
Not knowing the volume of data you are working with, I would first recommend adding two counter columns to your article table, for the count of entities and keywords respectively. Then, via triggers on the insert/delete of each, update the respective counter columns. This way you don't have to run an expensive query every time it is needed, especially in a web-based interface. You can then just select from the articles table ordered by the E+K counts descending and be done with it, instead of constantly sub-querying the underlying tables.
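A minimal sketch of that counter-column idea, assuming hypothetical columns entity_count and keyword_count on article (only the entity side is shown; the keyword side would mirror it):
ALTER TABLE article
  ADD COLUMN entity_count INT NOT NULL DEFAULT 0,
  ADD COLUMN keyword_count INT NOT NULL DEFAULT 0;

CREATE TRIGGER article_entity_ai AFTER INSERT ON article_entity
FOR EACH ROW
  UPDATE article SET entity_count = entity_count + 1 WHERE id = NEW.article_id;

CREATE TRIGGER article_entity_ad AFTER DELETE ON article_entity
FOR EACH ROW
  UPDATE article SET entity_count = entity_count - 1 WHERE id = OLD.article_id;

-- the "top" articles then come from a plain ordered select:
SELECT id FROM article ORDER BY entity_count + keyword_count DESC LIMIT 10;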
Now, that said, the other suggestions are somewhat similar to what I am posting, but they all appear to limit to 10 records for each set. Let's throw this scenario into the picture. Say you have articles 1-20, all with 10, 9 or 8 entities and only 1-2 keywords. Then articles 21-50 have the reverse: 10, 9, 8 keywords and 1-2 entities. Now you also have articles 51-58 with 7 entities AND 7 keywords, a combined total of 14. None of those queries would catch them, as the entities side would only return the qualifying records 1-20 and the keywords side records 21-50. Articles 51-58 would be so far down each list that they would not even be considered, even though their total is 14.
To handle this, each sub-query is a full query specifically on the article ID and its count, simply ordered by article_id, which is the basis of the join to the master article table.
Then COALESCE() gets each count if available, otherwise 0, and the two values are added together. From that, the results are ordered with the highest combined counts first (thus catching articles 51-58 from the sample scenario, plus a few of the others) before the limit is applied.
SELECT
a.id,
coalesce( JustE.ECount, 0 ) ECount,
coalesce( JustK.KCount, 0 ) KCount,
coalesce( JustE.ECount, 0 ) + coalesce( JustK.KCount, 0 ) TotalCnt
from
article a
LEFT JOIN ( select article_id, COUNT(*) as ECount
from article_entity
group by article_id
order by article_id ) JustE
on a.id = JustE.article_id
LEFT JOIN ( select article_id, COUNT(*) as KCount
from article_keyword
group by article_id
order by article_id ) JustK
on a.id = JustK.article_id
order by
coalesce( JustE.ECount, 0 ) + coalesce( JustK.KCount, 0 ) DESC
limit 10
I took this in several steps.
tl;dr This shows all the articles from the top (4) keywords and entities:
Here's a fiddle
select
distinct article_id
from
(
select
article_id
from
article_entity ae
inner join
(select
entity_id, count(*)
from
article_entity
group by
entity_id
order by
count(*) desc
limit 4) top_entities on ae.entity_id = top_entities.entity_id
union all
select
article_id
from
article_keyword ak
inner join
(select
keyword_id, count(*)
from
article_keyword
group by
keyword_id
order by
count(*) desc
limit 4) top_keywords on ak.keyword_id = top_keywords.keyword_id) as articles
Explanation:
This starts with an effort to find the top X entities (4 seemed to work for the number of associations I wanted to make in the fiddle).
I didn't want to select articles here because it skews the GROUP BY; you want to focus solely on the top entities. Fiddle
select
entity_id, count(*)
from
article_entity
group by
entity_id
order by
count(*) desc
limit 4
Then I selected all the articles from these top entities. Fiddle
select
*
from
article_entity ae
inner join
(select
entity_id, count(*)
from
article_entity
group by
entity_id
order by
count(*) desc
limit 4) top_entities on ae.entity_id = top_entities.entity_id
Obviously the same logic needs to happen for the keywords. The queries are then unioned together (fiddle) and the distinct article ids are pulled from the union.
This will give you all articles that have a relation to the top (x) entities and keywords.
This gets the top 10 keyword articles that are also top 10 entity articles. You may not get 10 records back, because it is possible that an article only meets one of the criteria (top entity but not top keyword, or top keyword but not top entity).
select *
from article a
inner join
(select count(*),ae.article_id
from article_entity ae
group by ae.article_id
order by count(*) Desc limit 10) e
on a.id = e.article_id
inner join
(select count(*),ak.article_id
from article_keyword ak
group by ak.article_id
order by count(*) Desc limit 10) k
on a.id = k.article_id

MySQL ORDER BY multiple columns ASC and DESC

I have 2 MYSQL tables, users and scores. Detail:
users table:
scores table:
My intention is to get a list of 20 users sorted by the point field descending (DESC) combined with the avg_time field ascending (ASC). I use this query:
SELECT users.username, scores.point, scores.avg_time
FROM scores, users
WHERE scores.user_id = users.id
GROUP BY users.username
ORDER BY scores.point DESC, scores.avg_time
LIMIT 0, 20
The result I get is wrong: the first line should be exactly point = 100 and avg_time = 60.
My desired result is:
username point avg_time
demo123 100 60
demo123456 100 100
demo 90 120
I tried many times with different queries but the result is still wrong. Could you give me some solutions?
Ok, I THINK I understand what you want now, so let me clarify to confirm before the query. You want one record for each user. For each user, you want their BEST POINTS score record. Of the best points per user, you want the one with the best average time. Once you have all users' "best" values, you want the final results sorted with the best points first... almost like the ranking of a competition.
So now the query. If the above statement is accurate, you need to start by getting the best point/average time per person and assigning a "rank" to that entry. This is easily done using MySQL @ variables. Then just include a HAVING clause to keep only the records ranked 1 for each person. Finally, apply the ORDER BY of best points and shortest average time.
select
U.UserName,
PreSortedPerUser.Point,
PreSortedPerUser.Avg_Time,
@UserRank := if( @lastUserID = PreSortedPerUser.User_ID, @UserRank +1, 1 ) FinalRank,
@lastUserID := PreSortedPerUser.User_ID
from
( select
S.user_id,
S.point,
S.avg_time
from
Scores S
order by
S.user_id,
S.point DESC,
S.Avg_Time ) PreSortedPerUser
JOIN Users U
on PreSortedPerUser.user_ID = U.ID,
( select @lastUserID := 0,
@UserRank := 0 ) sqlvars
having
FinalRank = 1
order by
Point Desc,
Avg_Time
Results as handled by SQLFiddle
Note: due to the inline @variables needed to get the answer, there are two extra columns at the end of each row. These are just left over and can be ignored in any actual output presentation you are trying to do... OR, you can wrap the entire thing above one more level to get just the few columns you want, like:
select
PQ.UserName,
PQ.Point,
PQ.Avg_Time
from
( entire query above pasted here ) as PQ
I think you misunderstand the table relation:
users : scores = 1 : *
A plain join is not the solution.
Is this your intention?
SELECT users.username, avg(scores.point), avg(scores.avg_time)
FROM scores, users
WHERE scores.user_id = users.id
GROUP BY users.username
ORDER BY avg(scores.point) DESC, avg(scores.avg_time)
LIMIT 0, 20
(This query gets each user's average point and average avg_time, ordered by point descending and avg_time ascending.)
If you want each score's ranking instead, use a left outer join:
SELECT users.username, scores.point, scores.avg_time
FROM scores left outer join users on scores.user_id = users.id
ORDER BY scores.point DESC, scores.avg_time
LIMIT 0, 20
@DRapp is a genius. I never understood how he coded his SQL, so I tried coding it in my own understanding.
SELECT
f.username,
f.point,
f.avg_time
FROM
(
SELECT
userscores.username,
userscores.point,
userscores.avg_time
FROM
(
SELECT
users.username,
scores.point,
scores.avg_time
FROM
scores
JOIN users
ON scores.user_id = users.id
ORDER BY scores.point DESC
) userscores
ORDER BY
point DESC,
avg_time
) f
GROUP BY f.username
ORDER BY point DESC
It yields the same result by using GROUP BY instead of the user @variables.
GROUP BY by default picks rows in primary key id order, so the result is:
username point avg_time
demo123 100 90 ---> id = 4
demo123456 100 100 ---> id = 7
demo 90 120 ---> id = 1

speed up mysql query with multiple subqueries

I'm wondering if there's a way to speed up a mysql query which is ordered by multiple subqueries.
On a music related site users can like different things like artists, songs, albums etc. These "likes" are all stored in the same table. Now I want to show a list of artists ordered by the number of "likes" by the users friends and all users. I want to show all artists, also those who have no likes at all.
I have the following query:
SELECT `artists`.*,
-- friend likes
(SELECT COUNT(*)
FROM `likes`
WHERE like_type = 'artist'
AND like_id = artists.id
AND user_id IN (1,2,3,4, etc) // ids of friends
GROUP BY like_id
) AS `friend_likes`,
-- all likes
(SELECT COUNT(*)
FROM `likes`
WHERE like_type = 'artist'
AND like_id = artists.id
GROUP BY like_id
) AS `all_likes`
FROM artists
ORDER BY
friend_likes DESC,
all_likes DESC,
artists.name ASC
The query takes ± 1.5 seconds on an artists table with 2000 rows. I'm afraid this will take longer and longer as the table gets bigger and bigger. I tried using JOINs but can't seem to get this working because the subqueries contain WHERE clauses.
Any ideas in the right direction would be greatly appreciated!
Try using JOINs instead of subqueries:
SELECT
a.*, -- do you really need all this?
count(user_id) AS all_likes,
sum(user_id IN (1, 2, 3, 4)) AS friend_likes
FROM artists a
LEFT JOIN likes l
ON l.like_type = 'artist' AND l.like_id = a.id
GROUP BY a.id
ORDER BY
friend_likes DESC,
all_likes DESC,
a.name ASC;
If this doesn't make the query faster, try adding indices, or consider selecting fewer fields.
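A sketch of an index that matches the join above, using the columns shown in the question (the index name is made up): like_type and like_id drive the join, and including user_id lets both aggregates be answered from the index alone.
CREATE INDEX likes_artist_lookup ON likes (like_type, like_id, user_id);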
You need to break this down a bit to see where the time goes. You're absolutely right that 1.5 sec on 2000 rows won't scale well. I suspect you need to look at indexes and foreign-key relationships. Look at each count/group-by query individually to get them tuned as best you can then recombine.
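For example, you might time just the aggregation half on its own before recombining (a sketch using the tables from the question):
-- friend likes per artist, in isolation
SELECT like_id, COUNT(*) AS friend_likes
FROM likes
WHERE like_type = 'artist'
  AND user_id IN (1, 2, 3, 4)
GROUP BY like_id;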
Try rolling it up into a single query using an inline IF() and go through the table/join ONCE:
SELECT STRAIGHT_JOIN
artists.*
, LikeCounts.AllCount
, LikeCounts.FriendLikeCount
FROM
(SELECT
like_id
, count(*) AllCount
, sum( IF( user_id IN ( 1, 2, 3, 4 ), 1, 0 ) ) AS FriendLikeCount
FROM
likes
WHERE
like_type = 'artist'
GROUP BY
like_id ) LikeCounts
JOIN artists ON LikeCounts.like_id = artists.id
ORDER BY
LikeCounts.FriendLikeCount DESC
, LikeCounts.AllCount DESC
, artists.name ASC