I am very surprised that I can't figure this out.
I'm currently outputting a table from MySQL in a somewhat random order. I say somewhat because there is a formula that is partially reliant on RAND(). In any case, we can assume the order is effectively random for my question.
This was all working great, except I want to keep the same order for a "session". I don't want it to keep jumping around while actively using the data. I have been trying to figure out how to have MySQL generate the same sequence a second time.
I know that you can do RAND(N) where N is a seed, but as far as I can tell that will be the exact same number each time. So basically there will be no random factor at all if I use that.
What I would like is a way I can feed a seed into my ORDER BY and always get a reliable output order. For the same seed, I will get the same order, and if I feed in a different seed, it will be a different random order.
The best I could come up with is that I could create an additional table cell with a RAND for each row and use that for sorting. There are a few issues:
Additional memory is used in the database.
It doesn't work for multiple users, because I'd need a separate column for each user.
I have to think about this, but I'm pretty certain that there is a clever solution here that doesn't involve me adding an additional column to the database. Has anyone else ever encountered the need to do something like this?
As you mentioned, you can provide a seed to generate a sequence of random numbers. The algorithm for generating random numbers returns the same sequence of numbers for the same seed number. For example;
SELECT Rand(1) AS rnd, CustomerId, CustomerName FROM Customers ORDER BY rnd
By doing so you, you will always have the same random order for the seed "1". You can provide session Id or some similar number in order to get the same result.
Hope it helps.
I would strongly suggest that you use one of the columns to generate the number:
select t.*
from t
order by rand(t.id);
This assumes that id is an integer. You can get a different ordering by adjusting the seed, say, rand(t.id + 1).
Related
The problem is I need to do pagination.I want to use order by and limit.But my colleague told me mysql will return records in the same order,and since this job doesn't care in which order the records are shown,so we don't need order by.
So I want to ask if what he said is correct? Of course assuming that no records are updated or inserted between the two queries.
You don't show your query here, so I'm going to assume that it's something like the following (where ID is the primary key of the table):
select *
from TABLE
where ID >= :x:
limit 100
If this is the case, then with MySQL you will probably get rows in the same order every time. This is because the only predicate in the query involves the primary key, which is a clustered index for MySQL, so is usually the most efficient way to retrieve.
However, probably may not be good enough for you, and if your actual query is any more complex than this one, probably no longer applies. Even though you may think that nothing changes between queries (ie, no rows inserted or deleted), so you'll get the same optimization plan, that is not true.
For one thing, the block cache will have changed between queries, which may cause the optimizer to choose a different query plan. Or maybe not. But I wouldn't take the word of anyone other than one of the MySQL maintainers that it won't.
Bottom line: use an order by on whatever column(s) you're using to paginate. And if you're paginating by the primary key, that might actually improve your performance.
The key point here is that database engines need to handle potentially large datasets and need to care (a lot!) about performance. MySQL is never going to waste any resource (CPU cycles, memory, whatever) doing an operation that doesn't serve any purpose. Sorting result sets that aren't required to be sorted is a pretty good example of this.
When issuing a given query MySQL will try hard to return the requested data as quick as possible. When you insert a bunch of rows and then run a simple SELECT * FROM my_table query you'll often see that rows come back in the same order than they were inserted. That makes sense because the obvious way to store the rows is to append them as inserted and the obvious way to read them back is from start to end. However, this simplistic scenario won't apply everywhere, every time:
Physical storage changes. You won't just be appending new rows at the end forever. You'll eventually update values, delete rows. At some point, freed disk space will be reused.
Most real-life queries aren't as simple as SELECT * FROM my_table. Query optimizer will try to leverage indices, which can have a different order. Or it may decide that the fastest way to gather the required information is to perform internal sorts (that's typical for GROUP BY queries).
You mention paging. Indeed, I can think of some ways to create a paginator that doesn't require sorted results. For instance, you can assign page numbers in advance and keep them in a hash map or dictionary: items within a page may appear in random locations but paging will be consistent. This is of course pretty suboptimal, it's hard to code and requieres constant updating as data mutates. ORDER BY is basically the easiest way. What you can't do is just base your paginator in the assumption that SQL data sets are ordered sets because they aren't; neither in theory nor in practice.
As an anecdote, I once used a major framework that implemented pagination using the ORDER BY and LIMIT clauses. (I won't say the same because it isn't relevant to the question... well, dammit, it was CakePHP/2). It worked fine when sorting by ID. But it also allowed users to sort by arbitrary columns, which were often not unique, and I once found an item that was being shown in two different pages because the framework was naively sorting by a single non-unique column and that row made its way into both ORDER BY type LIMIT 10 and ORDER BY type LIMIT 10, 10 because both sortings complied with the requested condition.
I have already prepared a MySQL statement that gives me a "friend suggestions".
The table/result goes like
suggestion_id | suggestion_count
suggestion_count tells how many of "my" friends have the suggestion_id in their friends.
In other words it tells how many of "my" friends have this mutual friend.
The goal is to choose a few random rows from this result.
Please note that the goal is not just ORDER BY RAND()...
But this randomness should show more often the suggestion_ids with more count,
but not everytime.
The goal is to choose random suggestions but more often those with higher suggestion_count.
I am stuck at the ORDER BY RAND() part - is RAND() somehow settable for this?
Any suggestions ?
You are looking for a weighted random sample.
The RAND() function returns a value from 0 to 1. So, you need to generate a random number that's based on the value of suggestion_count.
How about this?
ORDER BY (100.0*RAND()) - LEAST(100,suggestion_count)
This gives a random number that's smaller the higher your suggestion_count value goes. It's based on a guess that 100 is a large suggestion_count value.
Edit
I arbitrarily chose a number of 100 as a "largest" value for suggestion_count. My little formula works like this:
For each row in the table it generates a random number in the range 0-100.
It then subtracts the value of suggestion_count for that row from it. So, if suggestion_count is 10 in one row and 20 in another row, the row with the 20 is more likely to come up first than the row with the 10 in the ORDER BY operation.
But if there was a row with more than 100 in suggestion_count, it would overwhelm the random number and come first every time. So we use that number 100 for all large suggestion_count values. That's the purpose of LEAST().
I hope that helps explain my procedure.
Edit I used the value 100 because using MAX(suggestion_count) is a bit harder to code up and debug. To do that you'll need a more complex query, maybe like this. But this isn't going to work directly for you because I don't know exactly what your tables look like.
SELECT a.suggestion_id
FROM suggestions AS a
JOIN ( SELECT MAX(suggestion_count) FROM suggestions) AS maxsug) AS b
ORDER BY (maxsug*RAND()) - LEAST(maxsug,a.suggestion_count)
If you used just MAX() in your ORDER BY clause, you turned your whole query into a single-row aggregate query, because MAX() is an aggregate function.
In my jsp application I have a search box that lets user to search for user names in the database. I send an ajax call on each keystroke and fetch 5 random names starting with the entered string.
I am using the below query:
select userid,name,pic from tbl_mst_users where name like 'queryStr%' order by rand() limit 5
But this is very slow as I have more than 2000 records in my table.
Is there any better approach which takes less time and let me achieve the same..? I need random values.
How slow is "very slow", in seconds?
The reason why your query could be slow is most likely that you didn't place an index on name. 2000 rows should be a piece of cake for MySQL to handle.
The other possible reason is that you have many columns in the SELECT clause. I assume in this case the MySQL engine first copies all this data to a temp table before sorting this large result set.
I advise the following, so that you work only with indexes, for as long as possible:
SELECT userid, name, pic
FROM tbl_mst_users
JOIN (
-- here, MySQL works on indexes only
SELECT userid
FROM tbl_mst_users
WHERE name LIKE 'queryStr%'
ORDER BY RAND() LIMIT 5
) AS sub USING(userid); -- join other columns only after picking the rows in the sub-query.
This method is a bit better, but still does not scale well. However, it should be sufficient for small tables (2000 rows is, indeed, small).
The link provided by #user1461434 is quite interesting. It describes a solution with almost constant performance. Only drawback is that it returns only one random row at a time.
does table has indexing on name?
if not apply it
2.MediaWiki uses an interesting trick (for Wikipedia's Special:Random feature): the table with the articles has an extra column with a random number (generated when the article is created). To get a random article, generate a random number and get the article with the next larger or smaller (don't recall which) value in the random number column. With an index, this can be very fast. (And MediaWiki is written in PHP and developed for MySQL.)
This approach can cause a problem if the resulting numbers are badly distributed; IIRC, this has been fixed on MediaWiki, so if you decide to do it this way you should take a look at the code to see how it's currently done (probably they periodically regenerate the random number column).
3.http://jan.kneschke.de/projects/mysql/order-by-rand/
I'm trying to figure out how to make a scope that will return random ActiveRecords while also supporting will_paginate.
Right now each page view gets a completely different random set of ActiveRecords. Thus each pagination link, is actually just another random set of records.
How might I set up a scope so that it will be a random order that persists through pagination?
I'm guessing that I need to have some sort of seed that is based on time?
I'm guessing that you're using ORDER BY RAND() in your SQL to get your random ordering. If you are, then you can provide a seed for the random number generator:
RAND(), RAND(N)
[...] If a constant integer argument N is specified, it is used as the seed value, which produces a repeatable sequence of column values.
So you just need to pick a seed (possibly even by using seed = rand(1e6) or something similar in your Ruby code) and track that seed in the session. Then, for the next page, pull the seed out of the session and feed it to MySQL's RAND function.
Keep in mind that ORDER BY RAND() (with or without a seed) isn't the cheapest operation on the planet. If your searchable table is small, you could generate a table with a bunch of columns, fill it with random numbers (probably generated by RAND), and join that table in to provide your random sequence to order by. You would provide different sequences to different viewers by choosing different columns from the random number table: for one user you sort by column 11 of the random number table/matrix but another user will use column 23. Keep in mind that tables (with a primary key) really are just functions on a finite domain (and vice versa) so which you choose is often just an implementation detail. Implementing RAND using a table will usually be pretty cumbersome but I thought I'd mention the option anyway.
I want to pick a random number from a database say 1, when I query again say 10 times I'm querying it should return all different random numbers.
Use ORDER BY RAND()
If you want to get just one random record, your query should look like:
SELECT * FROM table ORDER BY RAND() LIMIT 1
If you only want a random number, don't use a database, use a random number generator. If you don't want repeats, simply keep track of the random numbers you've seen before and, if you pick one again, increment the new number by one until you reach a number you haven't seen yet.
If you want, say 10, random records from the database, then use #sAc's solution, but get them all at the same time. That will ensure that you don't have repeats in your selection. If you must select them one at a time, use the same technique as for the random number and keep track of the records you've seen before. Don't use the LIMIT directive, but simply select the first record you haven't seen on each iteration.