Results within radius - Optimising slow MySQL query

SELECT property.paon, property.saon, property.street, property.postcode, property.lastSalePrice, property.lastTransferDate,
       epc.ADDRESS1, epc.POSTCODE, epc.TOTAL_FLOOR_AREA,
       (
           3959 * acos(
               cos(radians(54.6921))
               * cos(radians(property.latitude))
               * cos(radians(property.longitude) - radians(-1.2175))
               + sin(radians(54.6921))
               * sin(radians(property.latitude))
           )
       ) AS distance
FROM property
RIGHT JOIN epc ON property.postcode = epc.POSTCODE AND CONCAT(property.paon, ', ', property.street) = epc.ADDRESS1
WHERE property.paon IS NOT NULL AND epc.TOTAL_FLOOR_AREA > 0
GROUP BY CONCAT(property.paon, ', ', property.street)
HAVING distance < 1.4
ORDER BY property.lastTransferDate DESC
LIMIT 10
Table property has 22 million rows; table epc has 14 million rows.
Without the GROUP BY and ORDER BY, it runs fast.
The property table often has the same property multiple times, but I need to select the one with the most current lastTransferDate.
If there is a better approach, I'm open to it.
Here is the EXPLAIN of the query:
[EXPLAIN output attached as an image]

You can do a few things:
Create a new column so you don't need CONCAT(property.paon, ', ', property.street) in the GROUP BY and the JOIN (this will speed it up a lot!)
As JackHacks says, you need to create indexes in the right spots (property postcode and the newly created column, and epc postcode and address).
Filter in the WHERE clause rather than in HAVING wherever you can (epc.TOTAL_FLOOR_AREA > 0 belongs in the WHERE; the distance test can move there too if you repeat the expression), so rows are discarded before grouping.
If you need more help, share an EXPLAIN of your query with us.
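A minimal sketch of the first two suggestions, assuming MySQL 5.7+ generated columns; the column and index names here are illustrative, not from the original post:
-- Materialise the concatenated address once, so the JOIN and GROUP BY
-- can use a plain indexed column instead of evaluating CONCAT() per row.
ALTER TABLE property
ADD COLUMN address1 VARCHAR(255)
GENERATED ALWAYS AS (CONCAT(paon, ', ', street)) STORED;
-- Index the join keys on both sides (use a prefix length if the
-- combined columns exceed the index size limit).
ALTER TABLE property ADD INDEX idx_property_pc_addr (postcode, address1);
ALTER TABLE epc ADD INDEX idx_epc_pc_addr (POSTCODE, ADDRESS1);
On older MySQL versions, maintain the column at write time or with triggers instead.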

Do you control the database? If you do, you could try adding indexes on the address and postcode columns (anything in the join clause). That should probably speed up the query.
Edit: reread your question.
If the GROUP BY and ORDER BY clauses are slowing it down, I would try adding indexes on the columns referenced in those clauses.
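For example, a sketch with illustrative index names; note that GROUP BY on a CONCAT(...) expression cannot use an index directly, so the first index assumes the concatenated address has been materialised as a column (as the other answer suggests):
ALTER TABLE property ADD INDEX idx_address1 (address1);
ALTER TABLE property ADD INDEX idx_last_transfer (lastTransferDate);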

Related

How can I improve SQL Count performance?

One of my SQL queries is very slow. I need to COUNT on a table with a total of close to 300,000 records, but it takes 8 seconds for the query to return results.
SELECT oc_subject.*,
(SELECT COUNT(sid) FROM oc_details
WHERE DATE(oc_details.created) > DATE(NOW() - INTERVAL 1 DAY)
AND oc_details.sid = oc_subject.id) as totalDetails
FROM oc_subject
WHERE oc_subject.status='1'
ORDER BY created DESC LIMIT " . (int)$start . ", " . (int)$limit;
This way: total 50, query time 8.5837 seconds.
SELECT oc_subject.*
FROM oc_subject
WHERE oc_subject.status='1'
ORDER BY created DESC LIMIT 0, 50
Without the count: total 50, query time 0.0457 seconds.
Lots of improvements possible:
Firstly, let's talk about the outer query (the main SELECT) on the oc_subject table. This query can benefit from the ORDER BY optimization by using the composite index (status, created). So, define the following index (if not defined already):
ALTER TABLE oc_subject ADD INDEX (status, created);
Secondly, your subquery to get the count is not sargable, because it applies the DATE() function to a column inside the WHERE clause. Due to this, it cannot use indexes properly.
Also, DATE(oc_details.created) > DATE(NOW() - INTERVAL 1 DAY) simply means that you are considering the details created on the current date (today). This can be written more simply as: oc_details.created >= CURRENT_DATE. The trick here is that even if the created column is of DATETIME type, MySQL will implicitly typecast CURRENT_DATE to CURRENT_DATE 00:00:00.
So change the inner subquery to as follows:
SELECT COUNT(sid)
FROM oc_details
WHERE oc_details.created >= CURRENT_DATE
AND oc_details.sid = oc_subject.id
Thirdly, all these improvements to the inner subquery will only help when you have a proper index defined on the oc_details table. So, define the following composite (and covering) index on the oc_details table: (sid, created). Note that the order of columns is important here, because created is a range condition and hence should appear at the end. Define this index as well (if not defined already):
ALTER TABLE oc_details ADD INDEX (sid, created);
Fourthly, in multi-table queries it is advisable to use aliasing, for code clarity (enhanced readability) and to avoid ambiguous behaviour.
So, once you have defined all the indexes (as discussed above), you can use the following query:
SELECT s.*,
(SELECT COUNT(d.sid)
FROM oc_details AS d
WHERE d.created >= CURRENT_DATE
AND d.sid = s.id) as totalDetails
FROM oc_subject AS s
WHERE s.status='1'
ORDER BY s.created DESC LIMIT " . (int)$start . ", " . (int)$limit;
Instead of several subqueries (one for each row in your oc_subject table), you could try using a join onto a single subquery grouped by sid and DATE():
SELECT oc_subject.*, b.count_sid
FROM oc_subject
INNER JOIN (
SELECT sid, DATE(oc_details.created) date_created, COUNT(sid) count_sid
FROM oc_details
GROUP BY sid, DATE(oc_details.created)
) b on b.sid = oc_subject.id
AND b.date_created > DATE(NOW() - INTERVAL 1 DAY)
WHERE oc_subject.status='1'
ORDER BY created DESC LIMIT " . (int)$start . ", " . (int)$limit;
Anyway, be careful using PHP variables for LIMIT; make sure you use sanitised values to avoid SQL injection.
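The same guard expressed at the SQL level, as a sketch: server-side prepared statements accept placeholders in LIMIT, and the PHP drivers (mysqli/PDO) expose the same mechanism through bound parameters.
PREPARE stmt FROM 'SELECT * FROM oc_subject WHERE status = ? ORDER BY created DESC LIMIT ?, ?';
SET @status := '1';
SET @start := 0;
SET @limit := 50;
EXECUTE stmt USING @status, @start, @limit;
DEALLOCATE PREPARE stmt;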

MySQL update join from select very slow

We have a stored procedure that is used to prepare data for a report. Note that the schema isn't as normalized as it should be, but it is what it is and we cannot modify it, hence building the temporary table for the report. MySQL version is 5.1.70.
There is an UPDATE statement that updates the temp table from a join to a SELECT. Note that all columns in this query that should have an index do, including on the temp table (OI):
UPDATE `tmpOrderInquiry` OI
INNER JOIN (
SELECT `SalesOrderNo`,
GROUP_CONCAT(Inv.`InvoiceNo` ORDER BY `InvoiceNo` ASC SEPARATOR '~|~') as `InvoiceNoGRP`,
GROUP_CONCAT(DATE_FORMAT(date(`ShipDate`), ' %c/%d/%y') ORDER BY `ShipDate` ASC SEPARATOR '~|~') as `ShipDateGRP`,
GROUP_CONCAT(DATE_FORMAT(date(`LastPmtDate`), ' %c/%d/%y') ORDER BY `LastPmtDate` ASC SEPARATOR '~|~') as `LastPmtDateGRP`
FROM `InsynchInvoiceHistoryHeader` Inv
GROUP BY Inv.`SalesOrderNo`
) as OrdInv ON OI.`SalesOrderNo` = OrdInv.`SalesOrderNo`
SET OI.`InvoiceNo` = OrdInv.`InvoiceNoGRP`, OI.`ShipDate` = OrdInv.`ShipDateGRP`, OI.`LastPmtDate` = OrdInv.`LastPmtDateGRP`
This query takes approximately 70 seconds on average to complete. The select on its own executes sub-second. After much head banging, on a lark I replaced the above with:
CREATE TEMPORARY TABLE `tempWorking` AS SELECT
`SalesOrderNo`,
GROUP_CONCAT(Inv.`InvoiceNo` ORDER BY `InvoiceNo` ASC SEPARATOR '~|~') as `InvoiceNoGRP`,
GROUP_CONCAT(DATE_FORMAT(date(`ShipDate`), ' %c/%d/%y') ORDER BY `ShipDate` ASC SEPARATOR '~|~') as `ShipDateGRP`,
GROUP_CONCAT(DATE_FORMAT(date(`LastPmtDate`), ' %c/%d/%y') ORDER BY `LastPmtDate` ASC SEPARATOR '~|~') as `LastPmtDateGRP`
FROM `InsynchInvoiceHistoryHeader` Inv
GROUP BY Inv.`SalesOrderNo`;
UPDATE `tmpOrderInquiry` OI
INNER JOIN tempWorking as OrdInv ON OI.`SalesOrderNo` = OrdInv.`SalesOrderNo`
SET OI.`InvoiceNo` = OrdInv.`InvoiceNoGRP`, OI.`ShipDate` = OrdInv.`ShipDateGRP`, OI.`LastPmtDate` = OrdInv.`LastPmtDateGRP`
This query runs in about 2 seconds. Since the SELECT on its own is fast, I am at a loss to explain why the first query is so slow. Because the problem was acute, I released this change to production, but I don't like introducing a fix that, all things being equal, I would expect to make things slower.
Any insight as to why the first update statement is so slow would be appreciated.

What is the best way to pick a random row from a table in MySQL? [duplicate]

What is a fast way to select a random row from a large mysql table?
I'm working in php, but I'm interested in any solution even if it's in another language.
Grab all the id's, pick a random one from it, and retrieve the full row.
If you know the id's are sequential without holes, you can just grab the max and calculate a random id.
If there are holes here and there but mostly sequential values, and you don't care about a slightly skewed randomness, grab the max value, calculate an id, and select the first row with an id equal to or above the one you calculated. The reason for the skewing is that id's following such holes will have a higher chance of being picked than ones that follow another id.
If you order by random, you're going to have a terrible table-scan on your hands, and the word quick doesn't apply to such a solution.
Don't do that, nor should you order by a GUID, it has the same problem.
I knew there had to be a way to do it in a single query in a fast way. And here it is:
A fast way without involvement of external code, kudos to
http://jan.kneschke.de/projects/mysql/order-by-rand/
SELECT name
FROM random AS r1 JOIN
(SELECT (RAND() *
(SELECT MAX(id)
FROM random)) AS id)
AS r2
WHERE r1.id >= r2.id
ORDER BY r1.id ASC
LIMIT 1;
MediaWiki uses an interesting trick (for Wikipedia's Special:Random feature): the table with the articles has an extra column with a random number (generated when the article is created). To get a random article, generate a random number and get the article with the next larger or smaller (don't recall which) value in the random number column. With an index, this can be very fast. (And MediaWiki is written in PHP and developed for MySQL.)
This approach can cause a problem if the resulting numbers are badly distributed; IIRC, this has been fixed on MediaWiki, so if you decide to do it this way you should take a look at the code to see how it's currently done (probably they periodically regenerate the random number column).
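A minimal sketch of that trick; the table and column names are illustrative, not MediaWiki's actual schema:
ALTER TABLE article ADD COLUMN random_val DOUBLE;
ALTER TABLE article ADD INDEX idx_random_val (random_val);
-- Populate random_val when each row is created. Then, per lookup,
-- generate r in [0,1) in application code and run:
SELECT * FROM article
WHERE random_val >= 0.73421  -- r, generated per request
ORDER BY random_val
LIMIT 1;
-- If this returns nothing (r landed above the largest stored value),
-- retry downwards: WHERE random_val < r ORDER BY random_val DESC LIMIT 1.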
Here's a solution that runs fairly quickly, and it gets a better random distribution without depending on id values being contiguous or starting at 1.
SET @r := (SELECT FLOOR(RAND() * (SELECT COUNT(*) FROM mytable)));
SET @sql := CONCAT('SELECT * FROM mytable LIMIT ', @r, ', 1');
PREPARE stmt1 FROM @sql;
EXECUTE stmt1;
Maybe you could do something like:
SELECT * FROM table
WHERE id=
(FLOOR(RAND() *
(SELECT COUNT(*) FROM table)
)
);
This is assuming your ID numbers are all sequential with no gaps.
Add a column containing a calculated random value to each row, and use that in the ordering clause, limiting to one result upon selection. This works out faster than the table scan that ORDER BY RAND() causes.
Update: You still need to calculate some random value prior to issuing the SELECT statement upon retrieval, of course, e.g.
SELECT * FROM `foo` WHERE `foo_rand` >= {some random value} LIMIT 1
There is another way to produce random rows using only a query and without order by rand().
It involves User Defined Variables.
See how to produce random rows from a table
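Since the linked example isn't quoted here, this is a reconstruction of the user-variable idea, as a sketch against a placeholder table mytable: number the rows in a single pass and keep the randomly chosen row number, avoiding the filesort that ORDER BY RAND() triggers.
SET @rownum := 0;
SET @target := FLOOR(1 + RAND() * (SELECT COUNT(*) FROM mytable));
SELECT *
FROM (
-- One pass over the table, tagging each row with a running number.
SELECT mytable.*, (@rownum := @rownum + 1) AS rn
FROM mytable
) AS numbered
WHERE rn = @target;
This still scans the table, but it never sorts it.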
In order to find random rows from a table, don't use ORDER BY RAND(), because it forces MySQL to do a full filesort and only then retrieve the required number of rows. To avoid this full filesort, use the RAND() function in the WHERE clause instead; it will stop as soon as it reaches the required number of rows.
See
http://www.rndblog.com/how-to-select-random-rows-in-mysql/
If you don't delete rows from this table, the most efficient way is:
(if you already know the minimum id, just skip that first query)
SELECT MIN(id) AS minId, MAX(id) AS maxId FROM table WHERE 1
$randId=mt_rand((int)$row['minId'], (int)$row['maxId']);
SELECT id,name,... FROM table WHERE id=$randId LIMIT 1
I see a lot of solutions here. One or two seem OK, but the others have some constraints. The following solution will work in all situations:
select a.* from random_data a, (select max(id)*rand() randid from random_data) b
where a.id >= b.randid limit 1;
Here, id doesn't need to be sequential. It could be any primary key / unique / auto-increment column. Please see the following: Fastest way to select a random row from a big MySQL table
Thanks
Zillur
- www.techinfobest.com
For selecting multiple random rows from a given table (say 'words'), our team came up with this beauty:
SELECT * FROM
`words` AS r1 JOIN
(SELECT MAX(`WordID`) as wid_c FROM `words`) as tmp1
WHERE r1.WordID >= (SELECT (RAND() * tmp1.wid_c) AS id) LIMIT n
The classic "SELECT id FROM table ORDER BY RAND() LIMIT 1" is actually OK.
See the follow excerpt from the MySQL manual:
If you use LIMIT row_count with ORDER BY, MySQL ends the sorting as soon as it has found the first row_count rows of the sorted result, rather than sorting the entire result.
With an ORDER BY you will do a full table scan.
It's best to do a SELECT COUNT(*) first and then get a random row number between 0 and the last record.
An easy but slow way would be (good for smallish tables)
SELECT * from TABLE order by RAND() LIMIT 1
In pseudo code:
sql "select id from table"
store result in list
n = random(size of list)
sql "select * from table where id=" + list[n]
This assumes that id is a unique (primary) key.
Take a look at this link by Jan Kneschke or this SO answer as they both discuss the same question. The SO answer goes over various options also and has some good suggestions depending on your needs. Jan goes over all the various options and the performance characteristics of each. He ends up with the following for the most optimized method by which to do this within a MySQL select:
SELECT name
FROM random AS r1 JOIN
(SELECT (RAND() *
(SELECT MAX(id)
FROM random)) AS id)
AS r2
WHERE r1.id >= r2.id
ORDER BY r1.id ASC
LIMIT 1;
HTH,
-Dipin
I'm a bit new to SQL but how about generating a random number in PHP and using
SELECT * FROM the_table WHERE primary_key >= $randNr
This doesn't solve the problem with holes in the table, but here's a twist on lassevk's suggestion:
SELECT primary_key FROM the_table
Use mysql_num_rows() in PHP to create a random number based on the above result, then:
SELECT * FROM the_table WHERE primary_key = rand_number
On a side note, just how slow is this: SELECT * FROM the_table, then creating a random number based on mysql_num_rows() and moving the data pointer there with mysql_data_seek()? Just how slow will that be on large tables with, say, a million rows?
I ran into the problem where my IDs were not sequential. Here's what I came up with:
SELECT * FROM products WHERE RAND()<=(5/(SELECT COUNT(*) FROM products)) LIMIT 1
The rows returned are approximately 5, but I limit it to 1.
If you want to add another WHERE clause it becomes a bit more interesting. Say you want to search for products on discount.
SELECT * FROM products WHERE RAND()<=(100/(SELECT COUNT(*) FROM products)) AND discount<.2 LIMIT 1
What you have to do is make sure you are returning enough results, which is why I have it set to 100. Having the WHERE discount<.2 clause in the subquery was 10x slower, so it's better to return more results and limit.
Use the below query to get the random row
SELECT user_firstname ,
COUNT(DISTINCT usr_fk_id) cnt
FROM userdetails
GROUP BY usr_fk_id
ORDER BY cnt ASC
LIMIT 1
In my case my table has an id as primary key, auto-increment with no gaps, so I can use COUNT(*) or MAX(id) to get the number of rows.
I made this script to test the fastest operation:
logTime();
query("SELECT COUNT(id) FROM tbl");
logTime();
query("SELECT MAX(id) FROM tbl");
logTime();
query("SELECT id FROM tbl ORDER BY id DESC LIMIT 1");
logTime();
The results are:
Count: 36.8418693542479 ms
Max: 0.241041183472 ms
Order: 0.216960906982 ms
Answer with the order method:
SELECT FLOOR(RAND() * (
SELECT id FROM tbl ORDER BY id DESC LIMIT 1
)) n FROM tbl LIMIT 1
...
SELECT * FROM tbl WHERE id = $result;
I have used this and it did the job; the reference is from here:
SELECT * FROM myTable WHERE RAND()<(SELECT ((30/COUNT(*))*10) FROM myTable) ORDER BY RAND() LIMIT 30;
Create a function to do this; it is most likely the best and fastest answer here!
Pros: works even with gaps, and it is extremely fast.
<?php
$sqlConnect = mysqli_connect('localhost','username','password','database');
function rando($data = '', $find = 0, $max = 0){
global $sqlConnect; // mysqli connection variable, pulled into the function via GLOBAL
if($data == 's1'){
$query = mysqli_query($sqlConnect, "SELECT * FROM `yourtable` ORDER BY `id` DESC LIMIT {$find},1");
if(mysqli_num_rows($query) > 0){
return mysqli_fetch_assoc($query); // Row found at the random offset
}else{
return rando('', '', $max); // Nothing returned; start over
}
}else{
if($max != 0){
$irand = rand(0, $max);
return rando('s1', $irand, $max); // Retry with a new random offset to fetch
}else{
$query = mysqli_query($sqlConnect, "SELECT `id` FROM `yourtable` ORDER BY `id` DESC LIMIT 0,1");
$fetched_data = mysqli_fetch_assoc($query);
$max = $fetched_data['id'];
$irand = rand(1, $max);
return rando('s1', $irand, $max); // Fetch the row at the random offset we just picked
}
}
}
$your_data = rando(); // Returns a random entry as an associative array
?>
Please keep in mind this code has not been tested, but it is a working concept for returning random entries even with gaps, as long as the gaps are not big enough to cause load-time issues.
Quick and dirty method:
SET @COUNTER = (SELECT COUNT(*) FROM your_table);
SELECT PrimaryKey
FROM your_table
LIMIT 1 OFFSET (RAND() * @COUNTER);
The complexity of the first query is O(1) for MyISAM tables.
The second query requires a full table scan; complexity = O(n).
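Note that outside of prepared statements MySQL only accepts constants in LIMIT/OFFSET, so the OFFSET (RAND() * @COUNTER) expression above won't actually parse; a working variant of the same idea, as a sketch using dynamic SQL:
SET @COUNTER = (SELECT COUNT(*) FROM your_table);
SET @offset = FLOOR(RAND() * @COUNTER);
SET @sql = CONCAT('SELECT PrimaryKey FROM your_table LIMIT 1 OFFSET ', @offset);
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;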
Dirty and quick method:
Keep a separate table for this purpose only. You should also insert the same rows to this table whenever inserting to the original table. Assumption: No DELETEs.
CREATE TABLE Aux(
MyPK INT AUTO_INCREMENT PRIMARY KEY,
PrimaryKey INT
);
SET @MaxPK = (SELECT MAX(MyPK) FROM Aux);
SET @RandPK = CAST(RAND() * @MaxPK AS SIGNED);
SET @PrimaryKey = (SELECT PrimaryKey FROM Aux WHERE MyPK = @RandPK);
If DELETEs are allowed,
SET @delta = CAST(@RandPK/10 AS SIGNED);
SET @PrimaryKey = (SELECT PrimaryKey
FROM Aux
WHERE MyPK BETWEEN @RandPK - @delta AND @RandPK + @delta
LIMIT 1);
The overall complexity is O(1).
SELECT DISTINCT * FROM yourTable WHERE 4 = 4 LIMIT 1;
