I have a large dataset in MySQL and I would like to speed up the select statement when reading data. Assuming that there are 1000 records, I would like to issue a select statement that retrieves, for example, half of them, chosen by timestamp.
Something like this will not work, because id is not tightly coupled with the timestamp:
select * from table where table.id mod 5 = 0;
Retrieving all the data and selecting the needed records afterwards is not a solution, because I want to avoid retrieving the large dataset. Thus, I'm looking for something that would pick out the records in the select itself.
Thanks
If you need speed then try this
select * from table ORDER BY table.id DESC LIMIT 0,500;
select * from table ORDER BY table.id DESC LIMIT 500,500;
and so on...
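A minimal sketch of that paging loop, in case it helps. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the readings table with its id and ts columns is made up for the demo. Note that MySQL's LIMIT 500,500 is the same as LIMIT 500 OFFSET 500, which is the form used here.

```python
import sqlite3

def iter_pages(conn, page_size):
    """Yield successive pages of rows, highest id first."""
    offset = 0
    while True:
        rows = conn.execute(
            "SELECT id, ts FROM readings ORDER BY id DESC LIMIT ? OFFSET ?",
            (page_size, offset),
        ).fetchall()
        if not rows:
            return  # past the last page
        yield rows
        offset += page_size

# Hypothetical demo data: 1000 rows in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, ts INTEGER)")
conn.executemany(
    "INSERT INTO readings (id, ts) VALUES (?, ?)",
    [(i, 1000 + i) for i in range(1, 1001)],
)
pages = list(iter_pages(conn, 500))
```

Each page is fetched only when the caller asks for it, so the full dataset is never held in memory at once.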
`SELECT * FROM Post
WHERE Tid = Id
ORDER BY Time
LIMIT 0,1`
`SELECT * FROM Post
WHERE Tid = Id
ORDER BY Time
LIMIT start,offset;`
Can I use only one SELECT to complete this?
Just like
`SELECT * FROM Post
WHERE Tid = Id
ORDER BY Time
LIMIT 0,1 and start,offset;`
In this case combine the 2 sql statements in a union, since you cannot provide multiple limit clauses in a single select:
(SELECT * FROM Post LIMIT 1,2)
UNION ALL
(SELECT * FROM Post LIMIT 5,6);
However, I would add an order by clause to the 2 select statements just to make 100% sure you know which records will be selected.
UPDATE: Technically you could do this in a single statement using a running counter and filtering on the counter in the WHERE clause. However, that would not be a good idea performance-wise, since MySQL would have to loop through all records in the table; it cannot know in advance which records satisfy the criteria. LIMIT clauses are better optimised.
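Here is a rough, runnable sketch of the UNION approach with an ORDER BY on both halves. It runs against SQLite through Python's sqlite3 module rather than MySQL, so each limited SELECT is wrapped in a derived table, a form both engines should accept. The sample rows and the :tid, :start and :count parameter names are invented for the demo, and MySQL's LIMIT start,count is written as LIMIT count OFFSET start.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Post (Id INTEGER PRIMARY KEY, Tid INTEGER, Time INTEGER)")
# Hypothetical data: ten posts in thread 7, posted in Id order.
conn.executemany(
    "INSERT INTO Post (Id, Tid, Time) VALUES (?, ?, ?)",
    [(i, 7, 100 + i) for i in range(1, 11)],
)

QUERY = """
SELECT * FROM (SELECT * FROM Post WHERE Tid = :tid
               ORDER BY Time LIMIT 1) AS first_post
UNION ALL
SELECT * FROM (SELECT * FROM Post WHERE Tid = :tid
               ORDER BY Time LIMIT :count OFFSET :start) AS page
"""

# First post of the thread plus three posts starting at offset 4.
rows = conn.execute(QUERY, {"tid": 7, "start": 4, "count": 3}).fetchall()
ids = sorted(row[0] for row in rows)
```

Because it is UNION ALL, the first post shows up twice if the requested page also starts at offset 0.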
I have a scenario. Say I have 300 records in my table. I execute a query to get the total count. Then, since I have to implement pagination,
I select the data from the same table using limits according to the count. I was wondering whether I can get the count and the data in a single query.
I tried the code below:
Select * ,count(*) as cnt from table;
But this gave me the total count and only 1 record!
Is there a way to save the time spent on the extra query and get both results in a single query?
something like:
select t1.*,t2.cnt
from table t1
cross join (select count(*) as cnt from table) t2
limit 'your limit for the first page'
or
select *,(select count(*) from table) as cnt
from table
limit 'your limit for the first page'
You can get the information in the structure you mentioned, but there is really no reason to do it. There is no performance problem with running two queries, one for the row count and another for the data. You don't save anything by trying to select all the information in one query. Run two simple queries instead; it is the better solution for your app, because it keeps the code simple and clear.
Using two queries might not be as bad as you may think, you can read this for more information.
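To compare the two styles side by side, here is a small sketch using Python's sqlite3 module as a stand-in for MySQL. The items table and both function names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items (id, name) VALUES (?, ?)",
    [(i, f"item-{i}") for i in range(1, 301)],
)

def page_two_queries(conn, limit, offset):
    """One query for the total, one for the page."""
    total = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
    rows = conn.execute(
        "SELECT id, name FROM items ORDER BY id LIMIT ? OFFSET ?",
        (limit, offset),
    ).fetchall()
    return total, rows

def page_one_query(conn, limit, offset):
    """Single query: the count is repeated on every returned row."""
    rows = conn.execute(
        "SELECT t1.id, t1.name, t2.cnt FROM items t1 "
        "CROSS JOIN (SELECT COUNT(*) AS cnt FROM items) t2 "
        "ORDER BY t1.id LIMIT ? OFFSET ?",
        (limit, offset),
    ).fetchall()
    total = rows[0][2] if rows else 0  # no rows means no count either
    return total, [row[:2] for row in rows]

two = page_two_queries(conn, 20, 40)
one = page_one_query(conn, 20, 40)
```

One thing the sketch makes visible: when the offset runs past the end of the table, the single-query version returns no rows and therefore no count, while the two-query version still reports the total.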
Currently I am using:
SELECT *
FROM
table AS t1
JOIN (
SELECT (RAND() * (SELECT MAX(id) FROM table where column_x is null)) AS id
) AS t2
WHERE
t1.id >= t2.id
and column_x is null
ORDER BY t1.id ASC
LIMIT 1
This is normally extremely fast; however, when I include the column_x IS NULL condition, it gets slow.
What would be the fastest random querying solution where the records' column X is null?
ID is PK, column X is int(4). Table contains about a million records and over 1 GB in total size doubling itself every 24 hours currently.
column_x is indexed.
Column ID may not be consecutive.
The DB engine used in this case is InnoDB.
Thank you.
Getting a genuinely random record can be slow. There's not really much getting around this fact; if you want it to be truly random, then the query has to load all the relevant data in order to know which records it has to choose from.
Fortunately however, there are quicker ways of doing it. They're not properly random, but if you're happy to trade a bit of pure randomness for speed, then they should be good enough for most purposes.
With that in mind, the fastest way to get a "random" record is to add an extra column to your DB, which is populated with a random value. Perhaps a salted MD5 hash of the primary key? Whatever. Add appropriate indexes on this column, and then simply add the column to your ORDER BY clause in the query, and you'll get your records back in a random order.
To get a single random record, simply specify LIMIT 1 and add a WHERE random_field > $random_value where random value would be a value in the range of your new field (say an MD5 hash of a random number, for example).
Of course the down side here is that although your records will be in a random order, they'll be stuck in the same random order. I did say it was trading perfection for query speed. You can get around this by updating them periodically with fresh values, but I guess that could be a problem for you if you need to keep it fresh.
The other down-side is that adding an extra column might be too much to ask if you have storage constraints and your DB is already massive in size, or if you have a strict DBA to get past before you can add columns. But again, you have to trade off something; if you want the query speed, you need this extra column.
Anyway, I hope that helped.
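A minimal sketch of the extra-column idea, run against SQLite via Python's sqlite3 module instead of MySQL. The rnd column name, the composite index on (column_x, rnd) and the wrap-around fallback are my own assumptions, and a plain random float stands in for the hash suggested above.

```python
import random
import sqlite3

rng = random.Random(42)  # seeded only to make the demo repeatable
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, column_x INTEGER, rnd REAL)")
# Hypothetical data: every third row has column_x NULL.
conn.executemany(
    "INSERT INTO t (id, column_x, rnd) VALUES (?, ?, ?)",
    [(i, None if i % 3 == 0 else 1, rng.random()) for i in range(1, 301)],
)
# Assumed index so the filter and the ordering can both use it.
conn.execute("CREATE INDEX idx_x_rnd ON t (column_x, rnd)")

def pick_random_null_row(conn, rng):
    """Return the first matching row at or after a random pivot in rnd order."""
    pivot = rng.random()
    row = conn.execute(
        "SELECT id, column_x FROM t "
        "WHERE column_x IS NULL AND rnd >= ? ORDER BY rnd LIMIT 1",
        (pivot,),
    ).fetchone()
    if row is None:
        # The pivot landed above every stored value: wrap to the smallest.
        row = conn.execute(
            "SELECT id, column_x FROM t "
            "WHERE column_x IS NULL ORDER BY rnd LIMIT 1"
        ).fetchone()
    return row

picked = [pick_random_null_row(conn, rng) for _ in range(50)]
```

As noted above, the stored order never changes, so rows that sit after a large gap in rnd are picked more often until the column is refreshed.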
I don't think you need a join, nor an order by, nor a limit 1 (providing the ids are unique).
SELECT *
FROM myTable
WHERE column_x IS NULL
AND id = ROUND(RAND() * (SELECT MAX(Id) FROM myTable), 0)
Have you run EXPLAIN on the query? What was the output?
Why not store or cache the value of SELECT MAX(id) FROM table where column_x is null and use that as a variable? Your query would then become:
$rand = rand(0, $storedOrCachedMaxId);
SELECT *
FROM
table AS t1
WHERE
t1.id >= $rand
and column_x is null
ORDER BY t1.id ASC
LIMIT 1
A simpler query will likely be easier on the db.
Know that if your data contains sizable holes, you aren't going to get consistently random results with this kind of query.
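To put a number on the holes problem, here is a tiny pure-Python simulation. It needs no database: pick_for is a hypothetical helper that mimics WHERE id >= $rand ORDER BY id ASC LIMIT 1, and the id list is invented. It walks through every value that rand(0, max) could return and counts which row wins.

```python
from bisect import bisect_left
from collections import Counter

def pick_for(ids, r):
    """Return the first id >= r, or None if r is past the largest id."""
    pos = bisect_left(ids, r)
    return ids[pos] if pos < len(ids) else None

# Sorted ids with one large hole between 3 and 100.
ids = [1, 2, 3, 100]

# Try every possible random value from 0 to MAX(id) inclusive.
hits = Counter(pick_for(ids, r) for r in range(0, max(ids) + 1))
```

With these ids, 97 of the 101 possible random values land on id 100, so the row after the hole is chosen almost every time.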
I'm new to MySQL syntax, but digging a little further I think a dynamic query might work. We select the Nth row, where the Nth is random:
SELECT @r := CAST(FLOOR(COUNT(1) * RAND()) AS UNSIGNED) FROM table WHERE column_x IS NULL;
PREPARE stmt FROM
'SELECT *
FROM table
WHERE column_x IS NULL
LIMIT 1 OFFSET ?';
EXECUTE stmt USING @r;
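The same count-then-offset idea can be driven from application code. This is only a sketch on SQLite through Python's sqlite3 module, with an invented table t; the ORDER BY id is my addition so that the offset always refers to the same row.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, column_x INTEGER)")
# Hypothetical data: every third row has column_x NULL.
conn.executemany(
    "INSERT INTO t (id, column_x) VALUES (?, ?)",
    [(i, None if i % 3 == 0 else 1) for i in range(1, 301)],
)

def random_null_row(conn, rng):
    """Pick the Nth matching row, where N is uniform over the match count."""
    count = conn.execute(
        "SELECT COUNT(*) FROM t WHERE column_x IS NULL"
    ).fetchone()[0]
    if count == 0:
        return None
    offset = rng.randrange(count)  # 0 .. count - 1, never out of range
    return conn.execute(
        "SELECT id, column_x FROM t WHERE column_x IS NULL "
        "ORDER BY id LIMIT 1 OFFSET ?",
        (offset,),
    ).fetchone()

rng = random.Random(7)
draws = [random_null_row(conn, rng) for _ in range(200)]
```

Unlike the id >= rand queries, this is uniform over the matching rows even when ids have holes. The cost is that the database still has to step over the skipped rows to reach the offset.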
Is there any way to reference a subquery in a union?
I am trying to do something like the following and would like to avoid a temporary table, but the subquery will be drawn from a much larger dataset, so it makes sense to run it only once.
SELECT * FROM (SELECT * FROM ads WHERE state='FL' AND city='Maitland' AND page='home' ORDER BY RAND()) AS sq WHERE spot = 'full-banner' LIMIT 1
UNION
SELECT * FROM sq WHERE spot = 'leaderboard' LIMIT 1
UNION
SELECT * FROM sq WHERE spot = 'rectangle1' LIMIT 1
UNION
SELECT * FROM sq WHERE spot = 'rectangle2' LIMIT 1
... etc.
It's a shame that DISTINCT can't be specified for a single column of a result set.
Well, there is no way to do what you're trying to do without repeating the creation of the derived table.
If querying ads is really expensive then you should try adding an index like:
alter table ads add index (state, city, page, spot);
If the query still takes too long after adding that index, then I'd recommend creating a table to store this data and then querying that table for each spot.
Depending on your data, you could play around with GROUP BY to get similar results.
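For what it's worth, the "store it in a table, then query per spot" suggestion could look roughly like this. It is a sketch on SQLite via Python's sqlite3 module, not MySQL: the sample rows are invented, sq is filled once from the expensive filter, and SQLite's RANDOM() plays the role of MySQL's RAND().

```python
import sqlite3

SPOTS = ["full-banner", "leaderboard", "rectangle1", "rectangle2"]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ads (id INTEGER PRIMARY KEY, state TEXT, city TEXT, "
    "page TEXT, spot TEXT)"
)
# Hypothetical data: 40 ads, alternating city in blocks of four spots.
conn.executemany(
    "INSERT INTO ads (id, state, city, page, spot) VALUES (?, ?, ?, ?, ?)",
    [
        (i + 1, "FL", "Maitland" if (i // 4) % 2 == 0 else "Orlando",
         "home", SPOTS[i % 4])
        for i in range(40)
    ],
)

# Run the expensive filter once and keep only what the spot queries need.
conn.execute("CREATE TEMP TABLE sq (id INTEGER, spot TEXT)")
conn.execute(
    "INSERT INTO sq SELECT id, spot FROM ads "
    "WHERE state = ? AND city = ? AND page = ?",
    ("FL", "Maitland", "home"),
)

# One cheap query per spot against the small table.
chosen = {}
for spot in SPOTS:
    row = conn.execute(
        "SELECT id FROM sq WHERE spot = ? ORDER BY RANDOM() LIMIT 1",
        (spot,),
    ).fetchone()
    if row is not None:
        chosen[spot] = row[0]
```

Sorting randomly is cheap here because it only touches the handful of rows already filtered into sq.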
I need to randomly select, in an efficient way, 10 rows from my table.
I found out that the following works nicely (after the query, I just select 10 random elements in PHP from the 10 to 30 I get from the query):
SELECT * FROM product WHERE RAND() <= (SELECT 20 / COUNT(*) FROM product)
However, the subquery, though relatively cheap, is computed for every row in the table. How can I prevent that? With a variable? A join?
Thanks!
A variable would do it. Something like this:
SELECT @myvar := (SELECT 20 / COUNT(*) FROM product);
SELECT * FROM product WHERE RAND() <= @myvar;
Or, from the MySql math functions doc:
You cannot use a column with RAND() values in an ORDER BY clause, because ORDER BY would evaluate the column multiple times. However, you can retrieve rows in random order like this:
mysql> SELECT * FROM tbl_name ORDER BY RAND();
ORDER BY RAND() combined with LIMIT is useful for selecting a random sample from a set of rows:
mysql> SELECT * FROM table1, table2 WHERE a=b AND c<d
    -> ORDER BY RAND() LIMIT 1000;
RAND() is not meant to be a perfect random generator. It is a fast way to generate random numbers on demand that is portable between platforms for the same MySQL version.
It's a highly MySQL-specific trick, but if you wrap it in another subquery, MySQL will treat it as a constant table and compute it only once.
SELECT * FROM product WHERE RAND() <= (
select * from ( SELECT 20 / COUNT(*) FROM product ) as const_table
)
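If you would rather not depend on how the optimizer treats the subquery, the threshold can also be computed once in application code and passed in as a parameter. A rough sketch, using SQLite through Python's sqlite3 module; rand01 is a function I register myself to stand in for MySQL's RAND(), and the product table holds made-up rows.

```python
import random
import sqlite3

rng = random.Random(1)  # seeded only to make the demo repeatable
conn = sqlite3.connect(":memory:")
# Stand-in for RAND(): returns a float in [0, 1), evaluated per row.
conn.create_function("rand01", 0, rng.random)
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY)")
conn.executemany(
    "INSERT INTO product (id) VALUES (?)", [(i,) for i in range(1, 2001)]
)

# Computed exactly once, outside the row-by-row filter.
total = conn.execute("SELECT COUNT(*) FROM product").fetchone()[0]
threshold = 20 / total

# Each row passes with probability `threshold`, about 20 rows on average.
candidates = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM product WHERE rand01() <= ?", (threshold,)
    )
]

# Final pick of (up to) 10 rows in application code, as in the question.
chosen = rng.sample(candidates, min(10, len(candidates)))
```

The number of candidates varies from run to run, so the code takes at most 10 rather than assuming that 10 are available.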
SELECT * FROM product ORDER BY RAND() LIMIT 10
Don't use ORDER BY RAND(). It forces a table scan, so it will not be efficient if the table holds much data. First determine how many rows are in the table:
select count(*) from table might work for you, though you should probably cache this value for some time since it can be slow for large datasets.
explain select * from table will give you the optimizer's statistics for the table (how many rows it estimates the table holds). This is much faster but less accurate, and less accurate still for InnoDB.
Once you have the number of rows, you should write some code like:
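Java sketch (numResults and tableRows are assumed to be set already, and ids are assumed to run from 1 to tableRows):
StringBuilder sql = new StringBuilder("SELECT * FROM product WHERE id IN (");
for (int i = 0; i < numResults; i++) {
    if (i > 0) {
        sql.append(", "); // separator goes before every id except the first
    }
    // random id in the range 1..tableRows
    sql.append(1 + (int) (Math.random() * tableRows));
}
sql.append(")");
This gives you a fast lookup on the PK and avoids the table scan.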
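One caveat with the IN-list approach: if the ids have gaps, some of the random ids will not exist and you get fewer rows back than you asked for. A possible workaround, sketched here with Python's sqlite3 module standing in for MySQL, is to draw more candidate ids than needed and trim afterwards. The function name, the oversample factor of 3 and the sample table are all assumptions of mine.

```python
import random
import sqlite3

def random_rows_by_id(conn, wanted, max_id, rng, oversample=3):
    """Fetch up to `wanted` rows by looking up random primary keys."""
    k = min(max_id, wanted * oversample)
    # sample() draws without replacement, so there are no duplicate ids.
    candidate_ids = rng.sample(range(1, max_id + 1), k)
    placeholders = ", ".join("?" for _ in candidate_ids)
    rows = conn.execute(
        f"SELECT id FROM product WHERE id IN ({placeholders})",
        candidate_ids,
    ).fetchall()
    # Shuffle before trimming so the cut does not favour any id range.
    rng.shuffle(rows)
    return rows[:wanted]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY)")
# Hypothetical data with holes: only even ids from 2 to 200 exist.
conn.executemany(
    "INSERT INTO product (id) VALUES (?)", [(i,) for i in range(2, 201, 2)]
)

rng = random.Random(3)
rows = random_rows_by_id(conn, wanted=10, max_id=200, rng=rng)
```

Oversampling only makes a short result less likely; with very sparse ids you may still get fewer than the requested number of rows.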