The data looks as follows:
13 users with ids 2-14. User 2 got 2 likes, user 10 got 2 likes, and user 3 got 1 like. The rest didn't get any likes.
Prisma query looks like this:
return this.prisma.user.findMany({
  skip: Number(page) * Number(size),
  take: Number(size),
  orderBy: { likesReceived: { _count: "desc" } },
});
When I send a query to the database, ordering by likesReceived I get these responses:
page | size | items ids
0    | 5    | 2, 10, 3, 4, 14
1    | 5    | 6, 7, 8, 9, 11
2    | 5    | 12, 13, 14
User 14 appears twice, and user 5 is missing. Why?
Additional sorting by id fixes the problem:
return this.prisma.user.findMany({
  skip: Number(page) * Number(size),
  take: Number(size),
  orderBy: [{ likesReceived: { _count: "desc" } }, { id: "asc" }],
});
Results:
page | size | items ids
0    | 5    | 2, 10, 3, 4, 5
1    | 5    | 6, 7, 8, 9, 11
2    | 5    | 12, 13, 14
When is specifying a second field in orderBy necessary for pagination to work correctly?
I agree with the provided answer, just posting here what I replied on the Prisma repo issue directly:
What I suspect is happening here is that in the first case, where you order only by the count, the count values are not unique (if I read your description correctly, a lot of them have count 0). In this case the order within the rows that share the same count is not stable between different requests at the database level. This is normal in the SQL world. The solution is to add a tiebreaker: a second field to order by that is ideally unique. That is what you did in the second request, and then you got a stable ordering.
So I'd say this is not a bug but expected behaviour from the database, and therefore from Prisma. The fix is the one you already found: add a second, unique field to break ties, or use that field directly as a cursor.
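To make the tiebreaker concrete, here is a rough sketch of the SQL the paginated query boils down to at the database level (the table and column names are illustrative, not the exact ones Prisma generates, and id is assumed to be the primary key):
SELECT u.*
FROM "User" u
LEFT JOIN "Like" l ON l."receiverId" = u."id"
GROUP BY u."id"
ORDER BY COUNT(l."id") DESC, u."id" ASC  -- the unique id breaks ties between equal counts
LIMIT 5 OFFSET 0;                        -- page 0, size 5
Without the trailing u."id" ASC, rows sharing the same count may come back in a different order on every request, so OFFSET/LIMIT pages can overlap or skip rows.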
I had a similar issue with Laravel using SQL Server.
Laravel was doing a different query for the first page than for subsequent pages. For page 1 it used...
SELECT TOP 100 * FROM users
while for subsequent pages it used ROW_NUMBER(), something like...
SELECT * FROM (
    SELECT
        ROW_NUMBER() OVER (ORDER BY ...) AS row_num,
        *
    FROM
        users
) u
WHERE
    row_num > 100 AND row_num <= 201;
SQL Server doesn't apply a default row order (see "Default row order in SELECT query - SQL Server 2008 vs SQL 2012"); rather, each time it chooses the most optimized plan. Therefore on the page 1 query using TOP it chose one way to order, and on page 2 with ROW_NUMBER() it chose a different way to order, thereby returning duplicate results on page 2 that were already on page 1. This was true even though I had many other order bys.
MySQL also seems not to have a default order by (see "SQL: What is the default Order By of queries?").
I don't know if Prisma does the same thing with MySQL. Printing out the queries may shed light on whether different queries are used for different pages.
Either way, if you're using pagination it may make sense to do as you mentioned and always use id as a final order by. That way, even if your other intended order bys allow the same record to appear on multiple pages, the final order by id ensures that doesn't occur, since you're now forcing it to order by ids instead of letting the database choose a more optimal approach that doesn't.
In your case, since user 14 has 0 likes it can appear on any page after 2, 10 and 3 and still satisfy your likesReceived orderBy. But with the id order by it will always land on page 2, since the first page will now end with 4 and 5 instead of 14, due to the second orderBy on id.
Related
We're currently dealing with a slow query in an odd situation. The issue comes into play when we LIMIT the results by 1, 2, 3, 4, 5, 6, but it works with any other limits. This issue is also limited to this one specific user. We can't reproduce the slowness/timeouts with any other user.
We can change the ORDER BY to use a different column, and the query works. We can remove the LIMIT 1, and the query works. Once we change the LIMIT to anything between 1 and 6, it times out.
We could get away with setting the ORDER BY on a different column, but this may cause reporting issues in the future and doesn't address why this is happening.
The query:
SELECT
*
FROM
table_name tn
WHERE
tn.user = '123'
ORDER BY
timestamp_col DESC
LIMIT 1
And our data:
user    timestamp_col
123 2005-02-23 02:02:34
123 2005-03-21 00:12:30
123 2006-01-09 14:23:48
123 2006-01-10 15:01:05
123 2006-01-20 13:11:13
123 2006-10-20 20:08:00
123 2006-11-01 18:31:03
123 2006-12-01 09:10:12
Are there special needs when ordering by a timestamp?
Add the composite
INDEX(user, timestamp_col)
That way the WHERE, the ORDER BY, and the LIMIT are all handled by the index, and it will stop after fetching the desired LIMIT.
Any single-column index needs to read lots of rows and/or sort those rows.
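As a concrete sketch (keeping the literal table_name from the question), the index and a quick way to check that it is being used might look like this:
ALTER TABLE table_name
  ADD INDEX idx_user_timestamp (user, timestamp_col);

-- With the composite index, the engine can seek to user = '123', read the rows
-- already ordered by timestamp_col, and stop as soon as the LIMIT is satisfied.
EXPLAIN
SELECT *
FROM table_name tn
WHERE tn.user = '123'
ORDER BY tn.timestamp_col DESC
LIMIT 1;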
I am trying to return the ranking of a user in a table, and I am stumped.
First off, I have a table that captures scores in my game called Participation. For two players, it would contain results with a user_id, game_id, and finally a score, like so:
<Participation id: 168, user_id: 1, game_id: 7, ranking: 0, created_at: "2016-04-07 05:36:48", updated_at: "2016-04-07 05:36:58", finished: true, current_question_index: 3, score: 2>
And then a second result may be:
<Participation id: 169, user_id: 2, game_id: 7, ranking: 0, created_at: "2016-04-07 05:36:48", updated_at: "2016-04-07 05:36:58", finished: true, current_question_index: 3, score: 1>
If I wanted to show a leaderboard of where users placed, it would be easy, something like Participation.where(game_id: "7").order(score: :desc). I am doing that now successfully.
Instead though, I want to return a result of where a user ranks in the table, if organized by score, against the competition. For bare bones, in the example above, I would have user 1 and user 2, both played game 7, and:
user_id 1: should return a 1 for 1st place, higher score of 2 points
user_id 2: should return a 2 for 2nd place, lower score of 1 point
How can I rewrite that participation statement in my controller to check where a user ranks for a matching game_id based on score and then assign an integer value based on that ranking?
For bonus points, if I can have the controller return that value (like 1 for user_id 1), do you think it would be a bad idea to use update_attributes to add that to the ranking column rather than breaking out a new table to store user rankings?
If you're using MySQL 8.0+ (or PostgreSQL), try using the ROW_NUMBER window function on an ordered query to calculate rank (the alias avoids rank, which is a reserved word in MySQL 8):
Participation.select("user_id, ROW_NUMBER() OVER (ORDER BY score DESC) AS score_rank").where(game_id: game_id)
The generated SQL would be something like:
SELECT user_id, ROW_NUMBER() OVER (ORDER BY score DESC) AS score_rank FROM "participations" WHERE "participations"."game_id" = 7
I usually use Postgres so not able to test the query directly, however, it should work.
I'd recommend caching the rank column if you need to access it frequently, or if you need to access rank for a single user/game pair. You'd also need to set up a background job to do this on a recurring basis, perhaps once every 15 minutes or so. The benefit of the dynamic query above is that it's more likely to be up to date, but it takes time to generate depending on how many participation entries exist for that particular game.
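If you do cache it, the background job could boil down to a single statement per game, something along these lines (a sketch assuming MySQL 8.0+; game_id 7 is just the example from the question):
UPDATE participations p
JOIN (
  SELECT id, ROW_NUMBER() OVER (ORDER BY score DESC) AS rnk
  FROM participations
  WHERE game_id = 7
) ranked ON ranked.id = p.id
SET p.ranking = ranked.rnk;  -- writes the computed rank into the cached ranking column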
Try
Participation.all.order("game_id asc, score desc")
I ended up figuring out a strategy on how to do this. Again, the key thing here is that I want to assign a "rank" to a user, which is effectively an integer representing which "row" they would be in if we were to order all results by the column score.
Here's my process:
a = Participation.where(user_id: current_user.id).last.score
b = Participation.where(user_id: current_user.id).last.id
scores = Participation.where(game_id: params[:game_id]).where("score > ?", a).count
if scores == 0
  Participation.update(b, :ranking => 1)
else
  Participation.update(b, :ranking => scores + 1)
end
In short, I took a count of how many higher scores there are for a particular result. So if I am user 2 and I have the second highest score, this would count 1 result. Using the if/else logic, I then translate this into the ranking and update my table accordingly.
You could push back here and say the ranking likely would need frequent updates (like a background job), and that is surely true. This method does work to answer my initial question though.
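For reference, the same "count the higher scores and add one" idea can be pushed down into a single SQL statement (a sketch; :game_id and :score stand in for the values fetched above):
SELECT 1 + COUNT(*) AS ranking
FROM participations
WHERE game_id = :game_id
  AND score > :score;  -- rows that beat this user's score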
I have 2 tables: posts<id, user_id, text, votes_counter, created> and votes<id, post_id, user_id, vote>. Here the vote column can be either 1 (upvote) or -1 (downvote). Now if I need to fetch the total votes (upvotes - downvotes) on a post, I can do it in 2 ways.
Use count(*) to count the number of upvotes and downvotes on that post from the votes table and then do the maths.
Set up a counter column votes_counter and increment or decrement it every time a user upvotes or downvotes. Then simply read that votes_counter.
My question is which one is better and under what conditions. By conditions, I mean factors like scalability, peak time, et cetera.
From what I know, if I use method 1, for a table with millions of rows count(*) could be a heavy operation. To avoid that situation, if I use a counter, then during peak time the votes_counter row might get deadlocked, with too many users trying to update the counter!
Is there a third way better than both and as simple to implement?
The two approaches represent a common tradeoff between complexity of implementation and speed.
The first approach is very simple to implement, because it does not require you to do any additional coding.
The second approach is potentially a lot faster, especially when you need to count a small percentage of items in a large table.
The first approach can be sped up by well designed indexes: rather than searching through the whole table, your RDBMS could retrieve a few records from the index and do the counts using them.
The second approach can become very complex very quickly:
You need to consider what happens to the counts when a user gets deleted
You should consider what happens when the table of votes is manipulated by tools outside your program. For example, merging records from two databases may prove a lot more complex when the current counts are stored along with the individual ones.
I would start with the first approach, and see how it performs. Then I would try optimizing it with indexing. Finally, I would consider going with the second approach, possibly writing triggers to update counts automatically.
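If you do go the trigger route, a minimal sketch of the insert side might look like this (MySQL syntax assumed; matching triggers for deletes and vote changes would be needed as well):
DELIMITER //
CREATE TRIGGER votes_after_insert
AFTER INSERT ON votes
FOR EACH ROW
BEGIN
  -- NEW.vote is +1 for an upvote and -1 for a downvote,
  -- so adding it keeps the cached total in sync.
  UPDATE posts
  SET votes_counter = votes_counter + NEW.vote
  WHERE id = NEW.post_id;
END//
DELIMITER ;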
As this sounds a lot like StackExchange, I'll refer you to this answer on the meta about the database schema used on the site. The votes table looks like this:
Votes table:
Id
PostId
VoteTypeId, one of the following values:
1 - AcceptedByOriginator
2 - UpMod
3 - DownMod
4 - Offensive
5 - Favorite (if VoteTypeId = 5, UserId will be populated)
6 - Close
7 - Reopen
8 - BountyStart (if VoteTypeId = 8, UserId will be populated)
9 - BountyClose
10 - Deletion
11 - Undeletion
12 - Spam
15 - ModeratorReview
16 - ApproveEditSuggestion
UserId (only present if VoteTypeId is 5 or 8)
CreationDate
BountyAmount (only present if VoteTypeId is 8 or 9)
And so, based on that, it sounds like the way it would be run is (with @PostId standing in for the post in question):
SELECT VoteTypeId FROM Votes WHERE PostId = @PostId AND VoteTypeId IN (2, 3)
And then based on the value, do the maths:
int score = 0;
for each vote in voteQueryResults
    if (vote == 2) score++;
    if (vote == 3) score--;
Even with millions of results, this is probably going to be a very fast operation as it's so simple.
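That said, the arithmetic can also be pushed into the query itself with conditional aggregation, so only one row ever crosses the wire (a sketch; @PostId is a placeholder for the post being scored):
SELECT SUM(CASE VoteTypeId
             WHEN 2 THEN 1   -- UpMod
             WHEN 3 THEN -1  -- DownMod
             ELSE 0
           END) AS Score
FROM Votes
WHERE PostId = @PostId
  AND VoteTypeId IN (2, 3);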
I'm stumped with how to do the following purely in MySQL, and I've resorted to taking my result set and manipulating it in ruby afterwards, which doesn't seem ideal.
Here's the question. With a dataset of 'items' like:
id   state_id   price   issue_date   listed
1    5          450     2011         1
1    5          455     2011         1
1    5          490     2011         1
1    5          510     2012         0
1    5          525     2012         1
...
I'm trying to get something like:
SELECT * FROM items
WHERE ([some conditions], e.g. issue_date >= 2011 and listed=1)
AND state_id = 5
GROUP BY id
HAVING AVG(price) <= 500
ORDER BY price DESC
LIMIT 25
Essentially I want to grab a "group" of items whose average price fall under a certain threshold. I know that my above example "group by" and "having" are not correct since it's just going to give the AVG(price) of that one item, which doesn't really make sense. I'm just trying to illustrate my desired result.
The important thing here is I want all of the individual items in my result set, I don't just want to see one row with the average price, total, etc.
Currently I'm just doing the above query without the HAVING AVG(price) and adding up the individual items one by one (in Ruby) until I reach the desired average. It would be really great if I could figure out how to do this in SQL. Subqueries, or something clever like joining the table onto itself, are certainly acceptable solutions if they work well! Thanks!
UPDATE: In response to Tudor's answer below, here are some clarifications. There is always going to be a target quantity in addition to the target average. And we would always sort the results by price low to high, and by date.
So if we did have 10 items that were all priced at $5 and we wanted to find 5 items with an average < $6, we'd simply return the first 5 items. We wouldn't return the first one only, and we wouldn't return the first 3 grouped with the last 2. That's essentially how my code in ruby is working right now.
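For what it's worth, that greedy "take the cheapest items while the running average stays under the threshold" logic can be expressed in SQL where window functions are available (a sketch assuming MySQL 8.0+; the CTE name and running_avg column are illustrative):
WITH priced AS (
  SELECT i.*,
         AVG(price) OVER (ORDER BY price ASC, issue_date ASC
                          ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_avg
  FROM items i
  WHERE issue_date >= 2011
    AND listed = 1
    AND state_id = 5
)
SELECT *
FROM priced
WHERE running_avg <= 500  -- rows are cheapest-first, so this keeps a prefix of the list
ORDER BY price ASC
LIMIT 5;                  -- the target quantity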
I would do almost the inverse of what Jasper provided... start your query with your criteria to explicitly limit the few items that MAY qualify, instead of getting all items and running a sub-select on each entry. The sub-select could pose a larger performance hit... I could be wrong, but here's my offering:
select
i2.*
from
( SELECT i.id
FROM items i
WHERE
i.issue_date > 2011
AND i.listed = 1
AND i.state_id = 5
GROUP BY
i.id
HAVING
AVG( i.price) <= 500 ) PreQualify
JOIN items i2
on PreQualify.id = i2.id
AND i2.issue_date > 2011
AND i2.listed = 1
AND i2.state_id = 5
order by
i2.price desc
limit
25
Not sure of the order by, especially if you wanted grouping by item... In addition, I would ensure an index on (state_id, listed, id, issue_date).
CLARIFICATION per comments
I think I AM correct on it. Don't confuse the HAVING clause with WHERE. WHERE decides whether a row is included at all, based on certain conditions. Only after all the WHERE filtering and the grouping is done is HAVING checked; each group then stays in the result set only if it still qualifies, otherwise it is thrown out. Try the INNER query below on its own... once WITHOUT the HAVING clause, then again WITH the HAVING clause...
SELECT i.id, avg( i.price )
FROM items i
WHERE i.issue_date > 2011
AND i.listed = 1
AND i.state_id = 5
GROUP BY
i.id
HAVING
AVG( i.price) <= 500
As you get more into writing queries, try the parts individually to see what you are getting vs what you are expecting... you'll find how and why certain things work. In addition, you are now talking in your updated question about getting multiple IDs and prices at apparently low and high ranges... yet you are also applying a limit. If you had 20 items, and each had 10 qualifying records, your limit of 25 would show all of the first item and 5 of the second... which is NOT what I think you want... you may want 25 of each qualified "id". That would wrap this query into yet another level...
What MySQL does makes perfect sense. What you want to do does not make sense:
if you have, let's say, 4 items, each with a price of 5, and you put HAVING AVG(price) <= 7, what you are saying is that the query should return ALL the combinations, like:
{1} - since item with id 1, can be a group by itself
{1,2}
{1,3}
{1,4}
{1,2,3}
{1,2,4}
...
and so on?
Your algorithm of computing the average in Ruby is also not valid: if you have items with values 5, 1, 7, 10 and seek an average value of less than 7, the element with value 10 can only be returned in a group together with the element of value 1. But by your algorithm (if I understood correctly), the element with value 1 is returned in the first group.
Update
What you want is something like the Knapsack problem, and your approach is a kind of Greedy Algorithm for solving it. I don't think there are straightforward, easy and correct ways to implement that in SQL.
After a Google search, I found this article, which tries to solve the knapsack problem with AI written in SQL.
By treating your item price as a weight, and given the number of items and the desired average, you can compute the maximum total weight that fits in the 'knapsack' by multiplying the desired average by the number of items.
I'm not entirely sure from your question, but I think this is a solution to your problem:
SELECT * FROM items
WHERE (some "conditions", e.g. issue_date > 2011 and listed=1)
AND state_id = 5
AND id IN (SELECT id
FROM items
GROUP BY id
HAVING AVG(price) <= 500)
ORDER BY price DESC
LIMIT 25
note: This is off the top of my head and I haven't done complex SQL in a while, so it might be wrong. I think this or something like it should work, though.
I've got a database table called servers with three columns 'id', 'name', and 'votes'.
How can I select the position of the row with id 5, ordered by votes?
For example, I want to check which position server 3 is in, by votes, in my competition.
If I've interpreted your question correctly, you are asking how to find the rank of the row with id 5 in a list of servers sorted by votes. There is a more complex solution that requires sorting, but the easier solution, which can be done in O(1) extra space and O(n) time, is to simply look up the number of votes for id = 5:
select votes from servers where id = 5;
and then walk through the database and add one for every server encountered that has a smaller number of votes. Alternatively, you can do something like:
select count(*) from servers where votes <= %votes
It is excessive to sort (O(n log n) time) when you can simply iterate through the entire list once and gather all the information you need.
Use LIMIT:
SELECT id, name, votes FROM servers ORDER BY votes DESC LIMIT 2,1;
LIMIT a, b means "give me b rows, starting at row a", and a is zero-based.
OK, I misunderstood. Now suppose your server has 27 votes.
SELECT COUNT(*) FROM servers WHERE votes < 27;
Your server's rank will be 1 plus the result; ties are possible (i.e. ranks will be like 1, 2, 3, 3, 3, 6, 7, 7, 9 etc.).
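The lookup and the count can also be folded into a single statement (a sketch; flip < to > if a higher vote count should mean a better rank):
SELECT 1 + COUNT(*) AS server_rank
FROM servers
WHERE votes < (SELECT votes FROM servers WHERE id = 5);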