What I'm trying to achieve is to return a random sample of x size from a dataset, then order it based on a column. This is what I have tried:
SELECT *
FROM Table
WHERE integerField > 0
ORDER BY RAND(), integerField DESC
LIMIT 100
The idea here is that it will first shuffle the table by ordering it randomly, then order the first 100 rows returned by integerField. I believe the problem is that the limit is not applied before the ordering, so I'm going to get back either 100 random rows or the first 100 rows of the table ordered by score (in this example, it's the former).
Is there a way to achieve this in a single query, or will the output have to be manually parsed through external logic/additional queries?
Solution: Use a subquery to collect the initial randomised sample, then order it:
(SELECT * FROM Table
WHERE integerField > 0
ORDER BY RAND() LIMIT 100)
ORDER BY integerField DESC
Credit to Akina and jarlh for the pointer to use a subquery
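As a quick check of the subquery approach, here is a minimal sketch using Python's built-in sqlite3 module. The table name scores, its contents, and the helper sample_then_sort are made up for illustration, and SQLite spells the function RANDOM() rather than RAND(), so treat it as an analogue of the MySQL query rather than the same statement.

```python
import sqlite3


def sample_then_sort(conn, sample_size):
    """Take a random sample in the inner query, then sort only that sample."""
    query = """
        SELECT id, integerField
        FROM (
            SELECT id, integerField
            FROM scores                -- hypothetical table name
            WHERE integerField > 0
            ORDER BY RANDOM()          -- SQLite's counterpart of MySQL's RAND()
            LIMIT ?
        ) AS sample
        ORDER BY integerField DESC
    """
    return conn.execute(query, (sample_size,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (id INTEGER PRIMARY KEY, integerField INTEGER)")
conn.executemany(
    "INSERT INTO scores (integerField) VALUES (?)",
    [(value,) for value in range(-50, 950)],
)

rows = sample_then_sort(conn, 100)
print(len(rows), rows[:3])
```

The inner query decides which 100 rows are in the sample; the outer ORDER BY only rearranges those rows, so rerunning it gives a different sample that is still sorted by integerField.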
Related
I need to select say 2000000 records at random from a very large database. I looked at previous questions. So please do not mark this question as duplicate. I need clarification. Most answers suggest using ORDER BY RAND() function. So my query will be:
SELECT DISTINCT no
FROM table
WHERE name != "null"
ORDER BY RAND()
LIMIT 2000000;
I want each record to be selected at random. I am not sure if I understand the ORDER BY RAND() effect here. But I am afraid it will select a random record, say 3498 and will continue selection from there, say, the next records will be: 3499, 3500, 3501, etc.
I want each record to be random, not to start the order from a random record.
How can I select 2000000 random records where each record is selected at random? Can you explain in simple terms what exactly ORDER BY RAND() does?
Note that I use Google BigQuery so the performance issue should not be a big problem here. I just want to achieve the requirement of selecting random 2000000 records.
SELECT x
FROM T
ORDER BY RAND()
is equivalent to
SELECT x
FROM (
SELECT x, RAND() AS r
FROM T
)
ORDER BY r
The query generates a random value for each row, then uses that random value to order the rows. If you include a limit:
SELECT x
FROM T
ORDER BY RAND()
LIMIT 10
This randomly selects 10 rows from the table.
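The equivalence described above can be mimicked outside the database. This is only an illustrative sketch in plain Python (the function name order_by_rand_limit and the sample data are invented here); it is not how any particular engine implements the sort.

```python
import random


def order_by_rand_limit(rows, limit, rng=random):
    """Mimic ORDER BY RAND() LIMIT n: tag each row with a random key, sort, cut."""
    keyed = [(rng.random(), row) for row in rows]   # SELECT x, RAND() AS r
    keyed.sort(key=lambda pair: pair[0])            # ORDER BY r
    return [row for _, row in keyed[:limit]]        # LIMIT n


table = list(range(1, 1001))   # stand-in for column x of table T
picked = order_by_rand_limit(table, 10)
print(picked)
```

Because every row receives its own independent key, the selected rows are scattered across the table rather than forming a run such as 3498, 3499, 3500 that starts at a random position.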
We have a table of 50k items and we display it at a search page with a random sort and 10 items per page. We need to apply some filters.
RAND() with or without a seed is very slow. Note that items have three categories. The first category should be displayed first with random order, and then the second category, also with random order.
Generating a random number between 0 and max_id is not working because of paging and the previously mentioned constraints.
Randomizing the records with PHP makes items always display on the same page.
Is there a better solution to speed up this random search?
Here are a few tips that may help:
Put indexes on the main fields you are filtering on
Reduce the number of columns in your SELECT query (only select the columns you need)
Recheck your joins
Recheck your conditions
Recheck your GROUP BY / HAVING / ORDER BY clauses
Tip: Don't seed your RAND() call unless you're trying to test with a reproducible sequence of items.
This is tricky to do nearly perfectly without a lot of programming. In the meantime here are a couple of things to do.
First, try this. Instead of doing SELECT * FROM t ORDER BY RAND() LIMIT 10 use the following kind of subquery:
SELECT t.* FROM t
JOIN (
(SELECT id FROM t WHERE category = 1 ORDER BY RAND() LIMIT 10)
UNION ALL
(SELECT id FROM t WHERE category = 2 ORDER BY RAND() LIMIT 10)
) AS picked ON picked.id = t.id
ORDER BY RAND()
This should save some time on the ORDER BY RAND() LIMIT 10 operation because it only has to shuffle the id values, not the whole record. But it's not an algorithmic change, just a volume-of-data change: it still has to shuffle the whole list of id values. So it's a quick patch, not a real fix.
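A runnable sketch of this id-only shuffle, using Python's sqlite3 module as a stand-in for MySQL. The sample data, the index name idx_category_id and the alias picked are assumptions for illustration; SQLite uses RANDOM() and wants each limited branch of the UNION wrapped in its own subquery. The final ORDER BY here sorts by category first, which is one way to show the first category before the second, as the question asks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, category INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO t (category, name) VALUES (?, ?)",
    [(1 + i % 3, f"item {i}") for i in range(300)],
)
conn.execute("CREATE INDEX idx_category_id ON t (category, id)")

query = """
    SELECT t.id, t.category, t.name
    FROM t
    JOIN (
        -- shuffle only the id values of each category, ten per category
        SELECT id FROM (SELECT id FROM t WHERE category = 1 ORDER BY RANDOM() LIMIT 10)
        UNION ALL
        SELECT id FROM (SELECT id FROM t WHERE category = 2 ORDER BY RANDOM() LIMIT 10)
    ) AS picked ON picked.id = t.id
    ORDER BY t.category, RANDOM()   -- category 1 first, random within a category
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```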
Second, if you can write a PHP function that will generate a text string with, let's say, 100 random numbers between 1 and max_id, you could try this to get your first category.
SELECT t.* FROM t
JOIN ( SELECT DISTINCT id FROM t
WHERE category = 1 AND id IN (num, num, num, ..., num, num)
LIMIT 10 ) AS picked ON picked.id = t.id
ORDER BY RAND()
This will give you ten, or fewer, randomly chosen records in the named category, pretty cheaply. Notice that you must provide many more than ten random numbers in your (num, num, num, num) list because not all the num values will be valid for rows with category = 1.
If you need more than one category, just use a similar query in a UNION to get the other category.
Both these approaches' performance will be improved by a compound index on (category, id).
Notice there's an extra ORDER BY RAND() at the end of each of those approaches' queries. That's because the lists of id values generated by the subqueries are likely to be in a non-random order.
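The second approach, drawing candidate ids in application code, might look roughly like this. It is a simplified sketch in Python with sqlite3 rather than PHP with MySQL; the function random_rows_in_category, its parameters and the test data are hypothetical, and it folds the nested query into a single SELECT.

```python
import random
import sqlite3


def random_rows_in_category(conn, category, max_id, wanted=10, candidates=100):
    """Draw candidate ids in application code, keep those in the category."""
    # Many more candidates than wanted rows: not every id is in the category.
    ids = random.sample(range(1, max_id + 1), min(candidates, max_id))
    placeholders = ", ".join("?" for _ in ids)
    query = f"""
        SELECT id, category FROM t
        WHERE category = ? AND id IN ({placeholders})
        ORDER BY RANDOM()
        LIMIT ?
    """
    return conn.execute(query, [category, *ids, wanted]).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, category INTEGER)")
conn.executemany(
    "INSERT INTO t (category) VALUES (?)",
    [(1 + i % 3,) for i in range(3000)],
)
max_id = conn.execute("SELECT MAX(id) FROM t").fetchone()[0]

rows = random_rows_in_category(conn, 1, max_id)
print(rows)
```

The database only ever looks at the candidate ids, so the cost no longer grows with the size of the table. The result can contain fewer than ten rows if too few candidates fall in the requested category.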
I am trying to query my database to return, say, the top 16 ordered results (ordered by a field called rank) but in a random order.
I can do this easily by shuffling the returned (and ordered) 16 results using php to adjust the array that php will use. I am wondering if there is an easy way to do this directly in the query itself.
Try:
select * from
(
select * from your_table
order by rank
limit 16
) x
order by rand()
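To see that this keeps exactly the top 16 rows and only changes their order, here is a small sketch with Python's sqlite3 module. The sample data is invented, RANDOM() is SQLite's spelling of RAND(), and the column name is quoted because newer MySQL versions treat rank as a reserved word (backticks would be used there).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE your_table (id INTEGER PRIMARY KEY, "rank" INTEGER)')
conn.executemany(
    'INSERT INTO your_table ("rank") VALUES (?)',
    [(r,) for r in range(1, 101)],
)

shuffled_top = conn.execute(
    """
    SELECT * FROM (
        SELECT * FROM your_table ORDER BY "rank" LIMIT 16   -- pick the top 16
    ) AS x
    ORDER BY RANDOM()                                       -- then shuffle them
    """
).fetchall()

print([row[1] for row in shuffled_top])
```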
Is there a way to grab an exact number of entries from a database? For example, say you had a table whose only columns are an id and total visits, and you wanted to grab exactly 20 entries and sort them by total visits. How would you go about this? I know how to sort the whole table, but I would like to be able to grab the top twenty by total visits and then sort them. Thanks
Oh, and right now I am using SQLite, but I know I will also be using MySQL in the future. Thanks
Try:
SELECT * FROM TableName ORDER BY TotalVisits DESC LIMIT 20
This uses LIMIT to get the top 20.
If you want to add another sort key, add it after the visits column,
like this:
SELECT * FROM mytable ORDER BY visits DESC
/* put another ORDER BY field here, such as date */
, date
LIMIT 20
Use the ORDER BY and LIMIT clauses:
SELECT * FROM table ORDER BY field [ASC|DESC] LIMIT 20 OFFSET [offset value]
You need to use LIMIT, but you will need to put the whole thing in a subquery if you intend to re-sort the top 20 based on separate criteria. So
SELECT * from <table> order by <total visits column> DESC LIMIT 20
will get you the top 20, but then to sort within that result you would do something like
SELECT * from
(SELECT * from <table> ORDER BY <total visits column> DESC LIMIT 20) AS top20
ORDER BY <other criteria>
The LIMIT clause can be used to constrain the number of rows returned by the SELECT statement. LIMIT takes one or two numeric arguments, which must both be nonnegative integer constants (except when using prepared statements).
With two arguments, the first argument specifies the offset of the first row to return, and the second specifies the maximum number of rows to return. The offset of the initial row is 0 (not 1):
SELECT * FROM tbl LIMIT 5,10; # Retrieve rows 6-15
To retrieve all rows from a certain offset up to the end of the result set, you can use some large number for the second parameter. This statement retrieves all rows from the 96th row to the last:
SELECT * FROM tbl LIMIT 95,18446744073709551615;
With one argument, the value specifies the number of rows to return from the beginning of the result set:
SELECT * FROM tbl LIMIT 5; # Retrieve first 5 rows
In other words, LIMIT row_count is equivalent to LIMIT 0, row_count.
All from http://dev.mysql.com/doc/refman/5.5/en/select.html, which has a better explanation for MySQL; SQLite works the same way: http://www.sqlite.org/lang_select.html
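The offset and row-count forms quoted above can be tried out in SQLite, which accepts the same comma syntax. This is a small sketch with Python's sqlite3 module; the 30-row table and the helper ids are made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO tbl (id) VALUES (?)", [(i,) for i in range(1, 31)])


def ids(sql):
    """Run a query and return the first column of every row."""
    return [row[0] for row in conn.execute(sql)]


rows_6_to_15 = ids("SELECT id FROM tbl ORDER BY id LIMIT 5, 10")   # offset 5, 10 rows
first_five = ids("SELECT id FROM tbl ORDER BY id LIMIT 5")         # first 5 rows
same_first_five = ids("SELECT id FROM tbl ORDER BY id LIMIT 0, 5") # same as LIMIT 5

print(rows_6_to_15)     # → [6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
print(first_five)       # → [1, 2, 3, 4, 5]
```

An explicit ORDER BY is included because "the first rows" is only well defined once the result has an order.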
I have a query that looks like this:
SELECT article FROM table1 ORDER BY publish_date LIMIT 20
How does ORDER BY work? Will it order all records, then get the first 20, or will it get 20 records and order them by the publish_date field?
If it's the latter, you're not guaranteed to really get the most recent 20 articles.
It will order first, then get the first 20. A database will also process anything in the WHERE clause before ORDER BY.
The LIMIT clause can be used to constrain the number of rows returned by the SELECT statement. LIMIT takes one or two numeric arguments, which must both be nonnegative integer constants (except when using prepared statements).
With two arguments, the first argument specifies the offset of the first row to return, and the second specifies the maximum number of rows to return. The offset of the initial row is 0 (not 1):
SELECT * FROM tbl LIMIT 5,10; # Retrieve rows 6-15
To retrieve all rows from a certain offset up to the end of the result set, you can use some large number for the second parameter. This statement retrieves all rows from the 96th row to the last:
SELECT * FROM tbl LIMIT 95,18446744073709551615;
With one argument, the value specifies the number of rows to return from the beginning of the result set:
SELECT * FROM tbl LIMIT 5; # Retrieve first 5 rows
In other words, LIMIT row_count is equivalent to LIMIT 0, row_count.
All details on: http://dev.mysql.com/doc/refman/5.0/en/select.html
Just as @James says, it will order all records, then get the first 20 rows.
Because of that, you are guaranteed to get the 20 earliest-published articles; the newer ones will not be shown.
In your situation, I recommend adding DESC to ORDER BY publish_date if you want the newest articles; the newest article will then be first.
If you need to keep the result in ascending order and still want only the 10 newest articles, you can ask MySQL to sort your result twice.
The query below sorts the result in descending order and limits it to 10 rows (that is the query inside the parentheses). That result is still in descending order, which is not what we want, so we ask MySQL to sort it once more. Now the newest result is on the last row.
select t.article
from
(select article, publish_date
from table1
order by publish_date desc limit 10) t
order by t.publish_date asc;
If you need all columns, it is done this way:
select t.*
from
(select *
from table1
order by publish_date desc limit 10) t
order by t.publish_date asc;
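A runnable version of the double sort, sketched with Python's sqlite3 module. The 30 sample articles and their dates are invented; the SQL itself follows the queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (article TEXT, publish_date TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [(f"article {day:02d}", f"2024-01-{day:02d}") for day in range(1, 31)],
)

newest_ascending = conn.execute(
    """
    SELECT t.article, t.publish_date
    FROM (
        SELECT article, publish_date
        FROM table1
        ORDER BY publish_date DESC   -- inner sort: newest first
        LIMIT 10                     -- keep the 10 newest
    ) AS t
    ORDER BY t.publish_date ASC      -- outer sort: oldest of those 10 first
    """
).fetchall()

print(newest_ascending[0], newest_ascending[-1])
# → ('article 21', '2024-01-21') ('article 30', '2024-01-30')
```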
I use this technique when I manually write queries to examine the database for various things. I have not used it in a production environment, but when I benchmarked it, the extra sort did not noticeably affect performance.
You can add ASC or DESC at the end of the ORDER BY clause to get the earliest or the latest records first.
For example, this will give you the latest records first:
ORDER BY stamp DESC
Then append the LIMIT clause after ORDER BY.
If there is a suitable index, in this case on the publish_date field, then MySQL need not scan the whole index to get the 20 records requested - the 20 records will be found at the start of the index. But if there is no suitable index, then a full scan of the table will be needed.
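The effect of an index on this kind of query can be observed in a query plan. The sketch below uses SQLite through Python's sqlite3 module, so it only illustrates the idea: MySQL's optimizer and its EXPLAIN output differ, and the index name idx_publish_date and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (article TEXT, publish_date TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [(f"article {n}", f"2024-01-{1 + n % 28:02d}") for n in range(500)],
)

QUERY = "SELECT article FROM table1 ORDER BY publish_date LIMIT 20"


def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


plan_without_index = plan(QUERY)   # full scan plus a temporary sort structure
conn.execute("CREATE INDEX idx_publish_date ON table1 (publish_date)")
plan_with_index = plan(QUERY)      # rows are read in index order instead

print(plan_without_index)
print(plan_with_index)
```

The exact wording of the plan varies between SQLite versions, so no expected output is shown here.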
There is a MySQL Performance Blog article from 2009 on this.
You can use this code:
SELECT article FROM table1 ORDER BY publish_date LIMIT 0,10
where 0 is the offset of the first row to return and 10 is the number of rows.
LIMIT is usually applied as the last operation, so the result will first be sorted and then limited to 20. In fact, MySQL can stop sorting as soon as it has found the first 20 rows of the sorted result.
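A small check of the sort-then-limit behaviour, sketched with Python's sqlite3 module. The rows are inserted in shuffled order so that storage order says nothing about publish_date; the data and variable names are invented.

```python
import random
import sqlite3

dates = [f"2024-{month:02d}-{day:02d}" for month in range(1, 5) for day in range(1, 26)]
random.shuffle(dates)  # storage order is unrelated to publish_date

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (article TEXT, publish_date TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [(f"article {i}", d) for i, d in enumerate(dates)],
)

from_sql = [
    row[0]
    for row in conn.execute(
        "SELECT publish_date FROM table1 ORDER BY publish_date LIMIT 20"
    )
]
sort_all_then_cut = sorted(dates)[:20]   # sort everything, then keep 20

print(from_sql == sort_all_then_cut)  # → True
```

The query returns the 20 earliest dates of the whole table, not 20 arbitrary rows that were sorted afterwards.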
Using the standard SQL syntax, on databases that support it, this could be written as:
SELECT article FROM table1 ORDER BY publish_date DESC FETCH FIRST 20 ROWS ONLY;
You can also list several columns in the ORDER BY, separated by commas, for example: ORDER BY publish_date, tab2, tab3 DESC. Note that ASC or DESC applies only to the column it follows.
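A short sketch of a multi-column ORDER BY, using Python's sqlite3 module. The views column and the four sample rows are hypothetical; the point is that each column carries its own sort direction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (article TEXT, publish_date TEXT, views INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [
        ("a", "2024-01-01", 5),
        ("b", "2024-01-01", 9),
        ("c", "2024-01-02", 1),
        ("d", "2024-01-02", 7),
    ],
)

# publish_date is ascending (the default); only views is descending.
rows = conn.execute(
    "SELECT article FROM table1 ORDER BY publish_date, views DESC"
).fetchall()
order = [r[0] for r in rows]

print(order)  # → ['b', 'a', 'd', 'c']
```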