I have a fairly large dataset and a query that requires two joins, so query efficiency is very important to me. I need to retrieve 3 random rows from the database that meet a condition based on the result of a join. The most obvious solution, ORDER BY RAND(), is described as inefficient in an article I found, because
[these solutions] need a sequential scan of the whole table (because the random value associated with each row needs to be calculated so that the smallest one can be determined), which can be quite slow for even medium-sized tables.
However, the method suggested by the author there (SELECT * FROM table WHERE num_value >= RAND() * (SELECT MAX(num_value) FROM table) LIMIT 1, where num_value is the ID) doesn't work for me because some IDs might be missing (some rows may have been deleted by users).
So, what would be the most efficient way to retrieve 3 random rows in my situation?
EDIT: The solution does not need to be pure SQL; I also use PHP.
Since you don't want many results, there are a couple of interesting options using LIMIT and OFFSET.
I'm going to assume an id column which is unique and suitable for sorting.
The first step is to execute a COUNT(id) and then, in PHP, pick 3 distinct random numbers from 0 to COUNT(id) - 1. (How best to do that is a separate question; the best approach depends on the total number of rows and how many you want.)
The second step has two options. Suppose the random numbers you picked are 0, 15 and 2234. Either run one query per offset in a PHP loop:
// $offsets = array(0, 15, 2234);
// execute_sql() stands for whatever database helper you use
foreach ($offsets as $offset) {
    $rows[] = execute_sql('SELECT ... ORDER BY id LIMIT 1 OFFSET ?', $offset);
}
or build a UNION. Note: this requires sub-selects because we're using ORDER BY.
// $offsets = array(0, 15, 2234);
$query = '';
foreach ($offsets as $index => $offset) {
    if ($query) $query .= ' UNION ';
    // Each ORDER BY ... LIMIT must live in its own derived table
    $query .= 'SELECT * FROM (SELECT ... ORDER BY id LIMIT 1 OFFSET ?) Sub'.$index;
}
$rows = execute_sql($query, $offsets);
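A minimal Python sketch of the COUNT-then-OFFSET approach, using the standard-library sqlite3 module and an in-memory table as a stand-in for the real joined query. The table `items`, its columns and the helper name are hypothetical. `random.sample` picks distinct offsets, so the same row is never returned twice, and gaps left by deleted ids do not matter because OFFSET counts rows, not id values.

```python
import random
import sqlite3

def random_rows_by_offset(conn, k=3):
    """Return up to k distinct random rows using COUNT + LIMIT 1 OFFSET."""
    # Step 1: count the candidate rows (the real WHERE/joins would go here).
    (count,) = conn.execute("SELECT COUNT(id) FROM items").fetchone()
    # Step 2: choose k distinct offsets in [0, count - 1].
    offsets = random.sample(range(count), min(k, count))
    rows = []
    for offset in offsets:
        # One query per offset; ORDER BY id makes each offset well defined.
        row = conn.execute(
            "SELECT id, name FROM items ORDER BY id LIMIT 1 OFFSET ?", (offset,)
        ).fetchone()
        rows.append(row)
    return rows

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items (id, name) VALUES (?, ?)",
                 [(i, f"item{i}") for i in range(1, 101)])
# Simulate rows deleted by users: ids 7, 14, ... no longer exist.
conn.execute("DELETE FROM items WHERE id % 7 = 0")
print(random_rows_by_offset(conn))
```

Keep in mind that OFFSET n still has to step over n index entries, so a large offset is cheaper than a full random sort but not free.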
Adding your RAND() call to the ORDER BY clause lets you ignore the ID entirely. Try this:
SELECT * FROM table WHERE ... ORDER BY RAND() LIMIT 3;
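For comparison, the same query run through Python's sqlite3 module, where the function is spelled RANDOM() rather than MySQL's RAND(). The `items` table and its `category` column are hypothetical. This handles missing ids, but, as the question points out, every row that passes the WHERE clause still gets a random sort key before three are kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, category TEXT)")
conn.executemany(
    "INSERT INTO items (id, category) VALUES (?, ?)",
    [(i, "a" if i % 2 else "b") for i in range(1, 51)],
)

# Every matching row is given a random key and sorted; only 3 are returned.
rows = conn.execute(
    "SELECT id, category FROM items WHERE category = ? ORDER BY RANDOM() LIMIT 3",
    ("a",),
).fetchall()
print(rows)
```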
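After the performance issues were pointed out, your best bet may be something along these lines (using PHP):
// $pdo is assumed to be an existing PDO connection
$max = (int) $pdo->query('SELECT MAX(id) FROM table')->fetchColumn();
$ids = array();
$rows = 5;
for ($i = 0; $i < $rows; $i++) {
    $ids[] = rand(1, $max);
}
// One ? placeholder per id: binding the imploded string to a single
// placeholder would pass it as one value, not as a list.
$placeholders = implode(', ', array_fill(0, count($ids), '?'));
$query = $pdo->prepare("SELECT * FROM table WHERE id IN ($placeholders)");
$query->execute($ids);
$results = $query->fetchAll();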
At this point you should be able to take the first 3 results. The catch with this approach is deleted rows (and, since rand() can repeat, duplicate IDs): you may need to either increase $rows or add logic to run another query when fewer than 3 results come back.
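A Python sketch of the MAX(id) approach with the retry logic described above, again using sqlite3 and a hypothetical `items` table. It guesses a batch of ids, keeps the ones that still exist (keyed by id, so repeated guesses collapse), and tries again until it has k rows or runs out of attempts. The parameter names `batch` and `max_attempts` are illustrative assumptions.

```python
import random
import sqlite3

def random_rows_by_max_id(conn, k=3, batch=5, max_attempts=20):
    """Guess ids in [1, MAX(id)] and keep those that still exist."""
    (max_id,) = conn.execute("SELECT MAX(id) FROM items").fetchone()
    if max_id is None:  # empty table
        return []
    found = {}
    for _ in range(max_attempts):
        if len(found) >= k:
            break
        ids = [random.randint(1, max_id) for _ in range(batch)]
        # One "?" per id, so the driver binds a real list of values.
        placeholders = ", ".join("?" for _ in ids)
        sql = f"SELECT id, name FROM items WHERE id IN ({placeholders})"
        for row in conn.execute(sql, ids):
            found[row[0]] = row  # keyed by id, so duplicate guesses collapse
    return list(found.values())[:k]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items (id, name) VALUES (?, ?)",
                 [(i, f"item{i}") for i in range(1, 101)])
conn.execute("DELETE FROM items WHERE id % 3 = 0")  # leave gaps in id
print(random_rows_by_max_id(conn))
```

On a very sparse table (many deleted ids) this can still return fewer than k rows after `max_attempts`, so the caller should check the length.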
Related
How can I get the total number of results using mysql and sphinx?
First I tried with a PDO statement, which does return a number, but it is not accurate.
$array = $pdo_sphinx->prepare("select * from `my_index` where MATCH ('@name ($search)') limit $start, $limit");
$array->execute();
$query = $pdo_sphinx->prepare("select COUNT(*) from `my_index` where MATCH ('@name ($search)')");
$query->execute();
$total = $query->fetchColumn();
Then I read that you can get total_found from SHOW META if you run it after the query:
$array = $sphinx->Query("select * from `my_index` where MATCH ('@name ($search)') limit $start, $limit; SHOW META");
$total = $array['total_found'];
$total is returning 0 when it should be 9. How do I get the correct total_found from the query above? Is there a way to do this with the PDO statement? I need the correct total for paging.
Note that when you add 'SHOW META', it becomes a multi-query: two separate queries, each with its own result set.
See http://php.net/manual/en/pdostatement.nextrowset.php and the related question "PDO multiple queries".
(And yes, COUNT(*) may be inaccurate, because Sphinx's grouping can be somewhat approximate.)
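A hedged Python sketch of the two-statement alternative, written against any DB-API cursor connected to Sphinx's MySQL-protocol (SphinxQL) listener, for example through a driver such as PyMySQL whose paramstyle is %s; that driver choice is an assumption. The index name is from the question; the helper name is hypothetical. Instead of one multi-query, SHOW META is run as its own statement on the same connection right after the SELECT, and its Variable_name/Value rows are read into a dict.

```python
def search_page_with_total(cursor, search, start, limit):
    """Run one page of a SphinxQL search, then SHOW META for total_found."""
    cursor.execute(
        "SELECT * FROM my_index WHERE MATCH(%s) LIMIT %s, %s",
        ("@name (" + search + ")", start, limit),
    )
    rows = cursor.fetchall()
    # SHOW META reports on the previous query issued on this connection.
    cursor.execute("SHOW META")
    meta = {name: value for name, value in cursor.fetchall()}
    return rows, int(meta.get("total_found", 0))
```

Because the search text is bound as a parameter, the driver handles SQL quoting; characters that are special in Sphinx's own query syntax would still need escaping separately.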
Hi, I need to get the results and apply the ORDER BY only to the limited section. When you apply ORDER BY, all the rows are sorted; what I want is to sort only the limited section. Here is an example:
-- all rows
SELECT * FROM users ORDER BY name
-- partial 40 rows ordered "globally"
SELECT * FROM users ORDER BY name LIMIT 200,40
The solution is:
-- partial 40 rows ordered "locally"
SELECT * FROM (SELECT * FROM users LIMIT 200,40) AS T ORDER BY name
This solution works well, but there is a problem: I'm working with a Listview component that needs the TOTAL row count of the table (obtained with SQL_CALC_FOUND_ROWS). With this solution I cannot get the total count; I only get the count of the limited section (40).
I hope you can give me a solution based on the query itself, for example something like "ORDER BY LOCALLY".
Since you're using PHP, you might as well keep things simple. It is possible to do this in MySQL alone, but why complicate things? (Also, placing less load on the MySQL server is always a good idea.)
$result = db_query_function("SELECT SQL_CALC_FOUND_ROWS * FROM `users` LIMIT 200,40");
$users = array();
while($row = db_fetch_function($result)) $users[] = $row;
usort($users,function($a,$b) {return strnatcasecmp($a['name'],$b['name']);});
$totalcount = db_fetch_function(db_query_function("SELECT FOUND_ROWS() AS `count`"));
$totalcount = $totalcount['count'];
Note that I used made-up function names to show that this is library-agnostic ;) Substitute your chosen functions.
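The same idea as a Python sketch with sqlite3 and a made-up `users` table. SQLite has no SQL_CALC_FOUND_ROWS, so the total comes from a separate COUNT(*); that is also the replacement the MySQL manual suggests, since SQL_CALC_FOUND_ROWS and FOUND_ROWS() are deprecated as of MySQL 8.0.17. The page query orders by id so that which rows form the "section" is deterministic; without any ORDER BY the database may return any 40 rows.

```python
import sqlite3

def page_sorted_locally(conn, offset, size):
    """Take one page in id order, then sort only that page by name."""
    page = conn.execute(
        "SELECT id, name FROM users ORDER BY id LIMIT ? OFFSET ?", (size, offset)
    ).fetchall()
    # Case-insensitive sort of this page only (not natural-order like
    # PHP's strnatcasecmp).
    page.sort(key=lambda row: row[1].lower())
    # Total for the list view, independent of the LIMIT.
    (total,) = conn.execute("SELECT COUNT(*) FROM users").fetchone()
    return page, total

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
names = ["zoe", "Adam", "mia", "Bob", "eve", "Carl", "dan", "Lily"]
conn.executemany("INSERT INTO users (name) VALUES (?)", [(n,) for n in names])
page, total = page_sorted_locally(conn, 2, 4)
print(page, total)  # [(4, 'Bob'), (6, 'Carl'), (5, 'eve'), (3, 'mia')] 8
```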
if (strcmp($sort, "popular") == 0) {
    $query = "SELECT * FROM projects WHERE project_id IN ($idResults) ORDER BY rating";
}
I first select the projects by genre; the resulting IDs are in $idResults. Then I want them all sorted by rating.
It may be a very long list of results, so my question is: how can I adjust the SELECT so that I only get the first 7 results (and on the next call results 7 to 14, and so on)?
Thanks in advance ;)
I have now tried:
$query = "SELECT * FROM projects WHERE project_id IN ".implode(',',$idResults)." ORDER BY rating LIMIT ".$count.", 7";
because I got a PHP "array to string conversion" error, so I imploded the array into a comma-separated string.
But if I then run:
$result = $this->dbc->query($query);
$resultLines = array();
while ($row = $result->fetch_array(MYSQLI_BOTH)) {
    $resultLines[] = $row;
}
return $resultLines;
I get an error:
Fatal error: Call to a member function fetch_array() on a non-object in * on line 66
So I suspect something is still wrong in my query, but I can't figure out what.
Use the LIMIT clause - LIMIT 7.
For subsequent queries, you can use the full LIMIT clause - i.e. LIMIT offset, count. For example:
SELECT * FROM projects WHERE project_id IN ($idResults) ORDER BY rating LIMIT 0, 7;
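SELECT * FROM projects WHERE project_id IN ($idResults) ORDER BY rating LIMIT 7, 7;
As for the fatal error: the query you built has no parentheses around the imploded list (IN 1,2,3 instead of IN (1,2,3)), which is a syntax error, so query() returns false instead of a result object, and fetch_array() cannot be called on false.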
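A Python sketch of the same paging with sqlite3, building one placeholder per id so the IN list is always parenthesised and every id is bound rather than interpolated. The `projects` data and the helper name are hypothetical; project_id is added as a tie-breaker so rows with equal ratings cannot shift between pages.

```python
import sqlite3

def projects_page(conn, id_results, page, per_page=7):
    """Return one page of projects whose ids are in id_results, ordered by rating."""
    if not id_results:  # "IN ()" would be invalid SQL
        return []
    # Produces "?, ?, ?" for three ids; wrapped in parentheses below.
    placeholders = ", ".join("?" for _ in id_results)
    sql = (f"SELECT project_id, rating FROM projects "
           f"WHERE project_id IN ({placeholders}) "
           f"ORDER BY rating, project_id LIMIT ? OFFSET ?")
    return conn.execute(sql, [*id_results, per_page, page * per_page]).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE projects (project_id INTEGER PRIMARY KEY, rating INTEGER)")
conn.executemany("INSERT INTO projects (project_id, rating) VALUES (?, ?)",
                 [(i, i * 10) for i in range(1, 21)])
id_results = list(range(2, 21, 2))     # ids chosen by genre: 2, 4, ..., 20
print(projects_page(conn, id_results, 0))  # first 7
print(projects_page(conn, id_results, 1))  # next page
```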
Is there a better way to check if there are at least two rows in a table for a given condition?
Please look at this PHP code:
$result = mysql_query("SELECT COUNT(*) FROM table WHERE .......");
$has_2_rows = (mysql_result($result, 0) >= 2);
The reason I dislike this is that I assume MySQL will fetch and count ALL the matching rows, which can be slow for large result sets.
I would like to know if there is a way that MySQL will stop and return "1" or true when two rows are found. Would a LIMIT 2 at the end help?
Thank you.
This is a good question, and yes there is a way to make this very efficient even for large tables.
Use this query:
SELECT count(*) FROM (
    SELECT 1
    FROM table
    WHERE .......
    LIMIT 2
) x
All the work is done by the inner query, which stops when it gets 2 rows. Also note the select 1, which gives you a tiny bit more efficiency, since it doesn't have to retrieve any values from columns - just the constant 1.
The outer count(*) will count at most 2 rows.
Note: Since this is an SQL question, I've omitted PHP code from the answer.
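The same query checked with Python's sqlite3 module against a small hypothetical `items` table: the inner SELECT 1 ... LIMIT 2 stops after two matches, and the outer COUNT(*) therefore never exceeds 2.

```python
import sqlite3

def has_at_least_two(conn, category):
    """True if at least two rows match; the inner LIMIT 2 stops the scan early."""
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM (SELECT 1 FROM items WHERE category = ? LIMIT 2) AS x",
        (category,),
    ).fetchone()
    return n >= 2

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, category TEXT)")
conn.executemany("INSERT INTO items (category) VALUES (?)",
                 [("a",)] * 5 + [("b",)])
print(has_at_least_two(conn, "a"), has_at_least_two(conn, "b"))  # True False
```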
The query below inspects only two rows, as you requested. mysql_num_rows() tells you how many rows were returned without any looping.
$result = mysql_query("SELECT col1 FROM t1 WHERE ... LIMIT 2");
if (mysql_num_rows($result) == 2) {
    // at least two matching rows exist
}
Please avoid ext/mysql (it was removed in PHP 7.0) and switch to PDO or mysqli if you can.
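// SELECT COUNT(*) always returns exactly one row, so fetch up to two
// matching rows instead and count those.
$result = mysql_query("SELECT 1 FROM table WHERE ....... LIMIT 2");
if (mysql_num_rows($result) > 1)
{
    echo 'at least 2 rows';
}
else
{
    echo 'less than 2 rows';
}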
my $sth = $dbh->prepare("SELECT id
                         FROM user
                         WHERE `group` = '1'
                         ORDER BY id DESC
                         LIMIT 1");
I was trying to get the id of the last row in a table without reading the whole table.
I am already accessing it via:
my $sth = $dbh->prepare("SELECT name,
                                `group`
                         FROM user
                         WHERE `group` = '1'
                         LIMIT $from, $thismany");
$sth->execute();
while (my ($name, $group) = $sth->fetchrow_array()) {
    # ...
}
...and setting up a little pagination query as you can see.
But, I am trying to figure out how to detect when I am on the last (<= 500) rows so I can turn off my "next 500" link. Everything else is working fine. I figured out how to turn off the "previous 500" link when on first 500 page all by myself!
I thought I would set up a "switch" in the while loop, so that if ($id == $last_id) I can set the switch variable.
Like:
if ($id == $last_id) {
    $lastpage = 1; # the switch
}
So I can turn off the "next 500" link if ($lastpage == 1).
I am really new to this and keep getting stuck on these types of things.
Thanks for any assistance.
Try grabbing an extra row and see how many rows you actually got. Something like this:
my @results = ( );
my $total = 0;
my $sth = $dbh->prepare(qq{
    SELECT name, `group`
    FROM user
    WHERE `group` = ?
    LIMIT ?, ?
});
$sth->execute(1, $from, $thismany + 1);
while (my ($name, $group) = $sth->fetchrow_array()) {
    push(@results, [$name, $group]); # Or something more interesting.
    ++$total;
}
$sth->finish();
my $has_next = 0;
if ($total == $thismany + 1) {
    pop(@results);
    $has_next = 1;
}
And by the way, please use placeholders in all of your SQL; interpolation is fraught with danger.
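The same "fetch one extra row" idea as a Python sketch with sqlite3, using a hypothetical `user` table (`"group"` is quoted because GROUP is an SQL keyword). Asking for per_page + 1 rows means the presence of the extra row alone decides whether a "next" link is needed; no COUNT query is required.

```python
import sqlite3

def fetch_page(conn, group, offset, per_page):
    """Fetch per_page rows plus one extra; the extra row only signals 'has next'."""
    rows = conn.execute(
        'SELECT name FROM user WHERE "group" = ? ORDER BY id LIMIT ? OFFSET ?',
        (group, per_page + 1, offset),
    ).fetchall()
    has_next = len(rows) > per_page
    return rows[:per_page], has_next  # drop the probe row before display

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT, "group" INTEGER)')
conn.executemany('INSERT INTO user (name, "group") VALUES (?, ?)',
                 [(f"u{i}", 1) for i in range(1, 6)] + [("v1", 2), ("v2", 2)])
print(fetch_page(conn, 1, 0, 2))  # ([('u1',), ('u2',)], True)
print(fetch_page(conn, 1, 4, 2))  # ([('u5',)], False)
```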
Always asking for one more row than you are going to show, as suggested by mu is too short, is a good way.
But if you want to take the other suggested approach of doing two separate queries, one to get the desired rows, and one to get the total count if there had not been a limit clause, MySQL provides an easy way to do that while combining as much of the work as possible:
SELECT SQL_CALC_FOUND_ROWS name, `group` FROM user WHERE `group` = '1' LIMIT ..., ...;
then:
SELECT FOUND_ROWS();
The SQL_CALC_FOUND_ROWS qualifier changes what a following FOUND_ROWS() returns without requiring you to do a whole separate SELECT COUNT(*) from user WHERE group = '1' query.
SELECT COUNT(*) from tablename will give you the number of rows, so if you keep a running count of how many rows you have read so far, you'll know when you're on the last page of results.
You could generate that query with (untested; away from a good workstation at the moment):
my $sth = $dbh->prepare("SELECT COUNT(*) FROM user WHERE `group` = ?");
$sth->execute(1);
my ($count) = $sth->fetchrow_array;
(PS: you should be aware of SQL injection issues.)
As Ether mentioned in the comments, pagination usually requires two queries. One to return your paged set, the other to return the total number of records (using a COUNT query).
Knowing the total number of records, your current offset and the number of records in each page is enough data to work out how many pages there are in total and how many before and after the current page.
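That arithmetic, as a small Python sketch. The function name and the returned keys are illustrative; `total` would come from the COUNT query and `offset`/`per_page` from the paged query's LIMIT.

```python
def page_info(total, offset, per_page):
    """Derive page numbers from the total row count, offset and page size."""
    total_pages = max(1, (total + per_page - 1) // per_page)  # ceiling division
    current = offset // per_page + 1                          # 1-based page number
    return {
        "total_pages": total_pages,
        "current_page": current,
        "pages_before": current - 1,
        "pages_after": total_pages - current,
        "has_prev": offset > 0,
        "has_next": offset + per_page < total,
    }

print(page_info(1234, 1000, 500))
# {'total_pages': 3, 'current_page': 3, 'pages_before': 2, 'pages_after': 0,
#  'has_prev': True, 'has_next': False}
```

With 1234 rows and 500 per page, the third page starts at offset 1000 and `has_next` is false, which is exactly the condition for hiding the "next 500" link.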
Although your initial suggestion of SELECT id FROM table WHERE ... ORDER BY id DESC LIMIT 1 should work for finding the highest matching ID, the standard way of doing this is SELECT max(id) FROM table WHERE ...