I'm writing a query where I'm looking to pull related fields from a database with a limit of 10 rows.
The query is easy to write; however, I was wondering if there is a way to write it so that it searches for related items and pulls those first, and if there are fewer than 10 of them, fills the remaining slots with random rows.
Here is the query I use to pull the related rows
SELECT * FROM table WHERE term LIKE '%term1%' or term LIKE '%term2%' LIMIT 0,10
You just need to order the table so that rows matching the terms you are looking for come first. One way of doing this is as follows:
SELECT * FROM table
ORDER BY (
(
CASE WHEN term LIKE '%term1%'
THEN 1
ELSE 0
END
) + (
CASE WHEN term LIKE '%term2%'
THEN 1
ELSE 0
END
)
) DESC
LIMIT 0,10
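Note that the ordering above puts matching rows first, but the rows that fill the remaining slots come back in whatever order the engine picks, not at random. A minimal sketch of adding a random tie-breaker, run through Python's sqlite3 module so it is self-contained. The `items` table, its contents, and the function name are made up for illustration, and SQLite's RANDOM() stands in for MySQL's RAND().

```python
import sqlite3

def related_then_random(conn, term1, term2, limit=10):
    """Return up to `limit` rows: best matches first, then random filler rows."""
    sql = """
        SELECT id, term FROM items
        ORDER BY (CASE WHEN term LIKE ? THEN 1 ELSE 0 END)
               + (CASE WHEN term LIKE ? THEN 1 ELSE 0 END) DESC,
                 RANDOM()          -- tie-breaker; RAND() in MySQL
        LIMIT ?
    """
    return conn.execute(sql, (f"%{term1}%", f"%{term2}%", limit)).fetchall()

# Hypothetical sample data: three related rows and twenty unrelated ones.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, term TEXT)")
terms = ["red apple", "green apple pie", "pie chart"]
terms += [f"filler {i}" for i in range(20)]
conn.executemany("INSERT INTO items (term) VALUES (?)", [(t,) for t in terms])

rows = related_then_random(conn, "apple", "pie")
for row in rows:
    print(row)
```

The row matching both terms sorts first, the rows matching one term follow in random order, and the other seven slots are filled with randomly chosen unrelated rows.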
I need to select say 2000000 records at random from a very large database. I looked at previous questions. So please do not mark this question as duplicate. I need clarification. Most answers suggest using ORDER BY RAND() function. So my query will be:
SELECT DISTINCT no
FROM table
WHERE name != "null"
ORDER BY RAND()
LIMIT 2000000;
I want each record to be selected at random. I am not sure if I understand the ORDER BY RAND() effect here. But I am afraid it will select a random record, say 3498 and will continue selection from there, say, the next records will be: 3499, 3500, 3501, etc.
I want each record to be random, not to start the order from a random record.
How can I select 2000000 random records where each record is selected at random? Can you explain in simple terms what exactly ORDER BY RAND() does?
Note that I use Google BigQuery so the performance issue should not be a big problem here. I just want to achieve the requirement of selecting random 2000000 records.
SELECT x
FROM T
ORDER BY RAND()
is equivalent to
SELECT x
FROM (
SELECT x, RAND() AS r
FROM T
)
ORDER BY r
The query generates a random value for each row, then uses that random value to order the rows. If you include a limit:
SELECT x
FROM T
ORDER BY RAND()
LIMIT 10
This randomly selects 10 rows from the table.
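To make the "one random value per row" point concrete, here is a small sketch of the same idea in plain Python. It only imitates what the query does; the function name and the sample `table` are made up for illustration.

```python
import random

def order_by_rand_limit(rows, limit):
    """Mimic ORDER BY RAND() LIMIT n: one random key per row, sort, cut."""
    keyed = [(random.random(), row) for row in rows]  # SELECT x, RAND() AS r
    keyed.sort(key=lambda pair: pair[0])              # ORDER BY r
    return [row for _, row in keyed[:limit]]          # LIMIT n

table = list(range(1, 1001))
sample = order_by_rand_limit(table, 10)
print(sample)
```

Because every row gets its own independent key, the result is not a run of consecutive records starting at a random position; each selected row is chosen at random.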
As the title states, I'm trying to count the number of matches within the last 100 records in a certain table.
This query works, but the data is very dynamic, with lots of inserts on that particular table, and similar queries are being run, and they all end up being extremely slow (20s) probably blocking each other out.
Because caching the result is not acceptable (the data has to be live), I'm thinking of moving the outer query into PHP. I know that would be slower than a well-behaved query, but it would still be faster than 20s.
Here's the query
SELECT count(*) as matches
FROM (
SELECT first_name FROM customers
WHERE division = 'some_division'
AND type = 'employee'
ORDER BY request_time DESC
LIMIT 0, 100
) as entries
WHERE first_name = 'some_first_name_here'
What I'm looking for is a more optimized way of performing the same task, without having to implement it in PHP, since that's the naive/obviously wrong approach.
the table looks something like this:
id first_name last_name type division request_time
Just to set things straight, this is obviously not the actual table / data due to NDA reasons, but, the table looks exactly the same with different column names.
So again, what I'm trying to achieve is to pull a count of matches found WITHIN the last 100 records which satisfy some constraints.
for example,
how many times does the name 'John' appear within the last 100 employees added in the HR division?
I see.
How about something like this...
SELECT i
FROM
( SELECT CASE WHEN first_name = 'some_first_name_here' THEN @i:=@i+1 END i
FROM customers
, (SELECT @i:=0) val
WHERE division = 'some_division'
AND type = 'employee'
ORDER
BY request_time
DESC
LIMIT 0,100
) n
ORDER
BY i DESC
LIMIT 1;
Try this:
SELECT SUM(matches)
FROM
(
SELECT IF(first_name = 'some_first_name_here', 1, 0) AS matches
FROM customers
WHERE division = 'some_division' AND type = 'employee'
ORDER BY request_time DESC
LIMIT 0,100
) AS entries
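A quick way to check that this kind of conditional sum counts only within the newest 100 rows is to run it against a small in-memory table. This is a sketch using Python's sqlite3 module rather than MySQL; the sample data and the function name are made up, and CASE is used in place of IF() because SQLite has no two-way IF() in older versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE customers (
    id INTEGER PRIMARY KEY, first_name TEXT, type TEXT,
    division TEXT, request_time INTEGER)""")
# Hypothetical data: 150 HR employees, every 10th one named John.
conn.executemany(
    "INSERT INTO customers (first_name, type, division, request_time) "
    "VALUES (?, 'employee', 'HR', ?)",
    [("John" if t % 10 == 0 else "Jane", t) for t in range(1, 151)])

def matches_in_last(conn, name, division, n=100):
    """Count rows named `name` among the n most recent employees of a division."""
    sql = """
        SELECT COALESCE(SUM(CASE WHEN first_name = ? THEN 1 ELSE 0 END), 0)
        FROM (SELECT first_name FROM customers
              WHERE division = ? AND type = 'employee'
              ORDER BY request_time DESC
              LIMIT ?) AS entries
    """
    return conn.execute(sql, (name, division, n)).fetchone()[0]

print(matches_in_last(conn, "John", "HR"))  # Johns among request_time 51..150
```

The COALESCE is there so that an empty inner result gives 0 instead of NULL.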
I have a large dataset in MySQL and I would like to speed up the select statement when reading data. Assuming that there are 1000 records, I would like to issue a select statement that retrieves half of them for example but based on time-stamp.
Using something like this will not work, because id is not tightly coupled with the time-stamp:
select * from table where table.id mod 5 = 0;
Retrieving all the data and selecting the needed records afterwards is not a solution, because I want to avoid retrieving the large dataset. Thus, I'm looking for something that would distinguish the records at select time.
Thnx
If you need speed then try this
select * from table ORDER BY table.id DESC LIMIT 0,500;
select * from table ORDER BY table.id DESC LIMIT 500,500;
and so on...
I've looked around and there doesn't seem to be any easy way to do this. It almost looks like it's easier just to grab a subset of records and do all the randomizing in code (perl). The methods I've seen online seem like they're geared towards at most hundreds of thousands of records, but certainly not millions.
The table I'm working with has 6 million records (and growing). The IDs are auto-incremented, but not every ID is still present in the table (there are gaps).
I've tried to do the LIMIT 1 query that's been recommended, but the query takes forever to run -- is there a quick way to do this, given that there are gaps in the record? I can't just take the max and randomize over the range.
Update:
One idea I had maybe was to grab the max, randomize a limit based on the max, and then grab a range of 10 records from random_limit_1 to random_limit_2 and then taking the first record found in that range.
Or, if I know the max, is there a way I can just pick, say, the 5th record of the table without having to know which ID it is, and then just grab the id of that record?
Update:
This query is somewhat faster-ish. Still not fast enough =/
SELECT t.id FROM table t JOIN (SELECT(FLOOR(max(id) * rand())) as maxid FROM table) as tt on t.id >= tt.maxid LIMIT 1
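One variation on this idea is to pick the random point in code first and then ask for the first existing id at or after it, which lets the lookup run as a range scan on the primary key. A minimal sketch with Python's sqlite3 module; the table `t`, its data, and the function name are assumptions for illustration. Ids that follow a large gap are picked more often than others, so this is not perfectly uniform.

```python
import random
import sqlite3

def random_id_with_gaps(conn):
    """Return an existing id near a random point in the id range, or None."""
    lo, hi = conn.execute("SELECT MIN(id), MAX(id) FROM t").fetchone()
    if lo is None:          # empty table
        return None
    point = random.randint(lo, hi)
    # First existing id at or after the random point.
    row = conn.execute(
        "SELECT id FROM t WHERE id >= ? ORDER BY id LIMIT 1", (point,)
    ).fetchone()
    return row[0]

# Hypothetical table with gaps: only every third id exists.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
ids = list(range(3, 3001, 3))
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in ids])

picked = random_id_with_gaps(conn)
print(picked)
```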
SELECT * FROM TABLE ORDER BY RAND() LIMIT 1;
OK, this is slow. If you search for ORDER BY RAND() MYSQL, you will find a lot of results saying that this is very slow, and that is indeed the case. I did a little research and found this alternative: "MySQL rand() is slow on large datasets".
I hope this is better
Yeah, idea seems good:
select min(ID), max(ID) from table into @min, @max;
set @range = @max - @min;
set @mr = @min + ((@range / 1000) * (rand() * 1000));
select ID from table
where ID >= @mr and ID <= @mr + 1000
order by rand()
limit 1
-- into @result
;
May change 1000 to 10000 or whatever as needed to scale...
EDIT: you could also try this:
select ID from table
where (ID % 1000) = floor(rand() * 1000)
order by rand()
limit 1
;
Splits it along different lines...
EDIT 2:
See: What is the best way to pick a random row from a table in MySQL?
This is probably the fastest way:
select @row := floor(count(*) * rand()) from some_tbl;
select some_ID from some_tbl limit @row, 1;
Unfortunately, variables can't be used in a LIMIT clause, so you'd have to use a dynamic query, either by building the query string in code or by using PREPARE and EXECUTE. Also, LIMIT n, 1 still requires scanning past n rows of the table, so it's only about twice as fast as the second method listed above on average. (It is probably more uniform, though, and guarantees that a matching row will always be found.)
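When the query is issued from application code, the offset can simply be computed there and passed in as a bound parameter. A rough sketch of the two-step approach using Python's sqlite3 module; the sample data and function name are made up, and with a MySQL driver the placeholder syntax may differ.

```python
import random
import sqlite3

def random_row_by_offset(conn):
    """Pick one row uniformly: count the rows, then skip a random number of them."""
    (count,) = conn.execute("SELECT COUNT(*) FROM some_tbl").fetchone()
    if count == 0:
        return None
    offset = random.randrange(count)  # same role as floor(count(*) * rand())
    row = conn.execute(
        "SELECT some_ID FROM some_tbl LIMIT 1 OFFSET ?", (offset,)
    ).fetchone()
    return row[0]

# Hypothetical table whose ids have gaps (multiples of 7 only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_tbl (some_ID INTEGER PRIMARY KEY)")
ids = list(range(7, 7001, 7))
conn.executemany("INSERT INTO some_tbl VALUES (?)", [(i,) for i in ids])

picked = random_row_by_offset(conn)
print(picked)
```

Gaps in the ids do not matter here, because the offset counts rows rather than id values.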
SELECT ID
FROM YourTable
ORDER BY RAND() LIMIT 1;
I need to randomly select, in an efficient way, 10 rows from my table.
I found out that the following works nicely (after the query, I just select 10 random elements in PHP from the 10 to 30 I get from the query):
SELECT * FROM product WHERE RAND() <= (SELECT 20 / COUNT(*) FROM product)
However, the subquery, though relatively cheap, is computed for every row in the table. How can I prevent that? With a variable? A join?
Thanks!
A variable would do it. Something like this:
SELECT @myvar := (SELECT 20 / COUNT(*) FROM product);
SELECT * FROM product WHERE RAND() <= @myvar;
Or, from the MySql math functions doc:
You cannot use a column with RAND()
values in an ORDER BY clause, because
ORDER BY would evaluate the column
multiple times. However, you can
retrieve rows in random order like
this:
mysql> SELECT * FROM tbl_name ORDER BY RAND();
ORDER BY RAND() combined with LIMIT is
useful for selecting a random sample
from a set of rows:
mysql> SELECT * FROM table1, table2 WHERE a=b AND c<d
    -> ORDER BY RAND() LIMIT 1000;
RAND() is not meant to be a perfect
random generator. It is a fast way to
generate random numbers on demand that
is portable between platforms for the
same MySQL version.
It's a highly MySQL-specific trick, but by wrapping it in another subquery, MySQL will treat it as a constant table and compute it only once.
SELECT * FROM product WHERE RAND() <= (
select * from ( SELECT 20 / COUNT(*) FROM product ) as const_table
)
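If the query is driven from code anyway, another way to make sure the threshold is computed only once is to calculate it in the application and bind it as a parameter. A sketch with Python's sqlite3 module; SQLite has no RAND(), so one is registered from Python here, and the table contents and function name are assumptions for illustration.

```python
import random
import sqlite3

def sample_products(conn, want=10, oversample=2):
    """Return up to `want` random product ids using a per-row coin flip."""
    (count,) = conn.execute("SELECT COUNT(*) FROM product").fetchone()
    if count == 0:
        return []
    threshold = oversample * want / count  # 20 / COUNT(*), computed once
    candidates = conn.execute(
        "SELECT id FROM product WHERE RAND() <= ?", (threshold,)
    ).fetchall()
    ids = [row[0] for row in candidates]
    # About 20 candidates are expected; trim them down to `want` in code.
    return random.sample(ids, min(want, len(ids)))

conn = sqlite3.connect(":memory:")
conn.create_function("RAND", 0, random.random)  # stand-in for MySQL's RAND()
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO product VALUES (?)", [(i,) for i in range(1, 5001)])

chosen = sample_products(conn)
print(chosen)
```

Since each row passes the filter independently, the number of candidates varies from run to run, and occasionally fewer than 10 come back.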
SELECT * FROM product ORDER BY RAND() LIMIT 10
Don't use order by rand(). This will result in a table scan. If you have much data at all in your table this will not be efficient at all. First determine how many rows are in the table:
select count(*) from table might work for you, though you should probably cache this value for some time since it can be slow for large datasets.
explain select * from table will give you the db statistics for the table (how many rows the statistics thinks are in the table) This is much faster, however it is less accurate and less accurate still for InnoDB.
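Once you have the number of rows, you should write some code like this (Java-style sketch; numResults and tableRows are assumed to be set already):
StringBuilder sql = new StringBuilder("SELECT * FROM product WHERE id IN (");
for (int i = 0; i < numResults; i++) {
    // separator goes between ids, so there is no trailing "," to trim
    if (i > 0) sql.append(", ");
    // random id in 1..tableRows; assumes ids start at 1
    sql.append((int) (Math.random() * tableRows) + 1);
}
sql.append(")");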
this will give you fast lookup on PK and avoid the table scan.
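For a version that can be run end to end, here is a sketch of the same idea with Python's sqlite3 module, using bound parameters rather than string concatenation. The table contents and function name are made up for illustration, and it assumes ids run from 1 to the row count. If there are gaps, or the same id is drawn twice, fewer rows than requested come back.

```python
import random
import sqlite3

def random_rows_by_id(conn, table_rows, num_results):
    """Look up randomly drawn ids with a single primary-key IN (...) query."""
    if num_results <= 0 or table_rows <= 0:
        return []
    ids = [random.randint(1, table_rows) for _ in range(num_results)]
    placeholders = ", ".join("?" for _ in ids)
    sql = f"SELECT id FROM product WHERE id IN ({placeholders})"
    return [row[0] for row in conn.execute(sql, ids)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO product VALUES (?)", [(i,) for i in range(1, 1001)])

(table_rows,) = conn.execute("SELECT COUNT(*) FROM product").fetchone()
found = random_rows_by_id(conn, table_rows, 10)
print(found)
```

To always get exactly 10 rows, one option is to draw a few extra ids and trim the result in code.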