I want to get the first and last row of a table, and I found two ways to do it:
$first = DB::table('shops')->first();
$last = DB::table('shops')->orderBy('id','DESC')->first();
And:
$shops = DB::table('shops')->get();
$first2 = $shops[0];
$last2 = $shops[count($shops)-1];
My question is: which way performs faster on the same DB? And is there any way to log the query time?
The DB may be large: 1,000 rows, 10,000 rows, etc.
For about 8000 rows, here are the results:
The first way:
[2016-03-01 19:14:11] local.DEBUG: select * from `shops` limit 1; in 1.27 ms
[2016-03-01 19:14:11] local.DEBUG: select * from `shops` order by `id` desc limit 1; in 3.04 ms
The second way:
local.DEBUG: select * from `shops`; in 188.98 ms
You can see that the second way is much slower than the first.
That's because the second way has to fetch every record from the shops table, which takes a lot of time.
For a bigger data set, I think the second way would not work at all because the request would time out.
Update:
Just as another experiment, I tried a third way that solves the problem in a single query:
$shops = DB::table('shops')
    ->whereRaw('id = (SELECT MIN(id) FROM shops)')
    ->orWhereRaw('id = (SELECT MAX(id) FROM shops)')
    ->get();
I compared it with the first way, and here is the result:
# the 3rd way
[2016-03-01 19:51:56] local.DEBUG: select * from `shops` where id = (SELECT MIN(id) from shops) or id = (Select MAX(id) from shops); in 1.04 ms
# the 1st way
[2016-03-01 19:52:02] local.DEBUG: select * from `shops` limit 1; in 0.67 ms
[2016-03-01 19:52:02] local.DEBUG: select * from `shops` order by `id` desc limit 1; in 0.5 ms
It seems that the single query with subqueries is even slightly faster.
The timing difference between the queries will really come down to how quickly your machine can process them and, of course, the number of rows being worked with.
I can tell you now, though, that the first solution grabs only one result per query and so will be much quicker: you are fetching just two rows with two queries.
The second loads all of your rows into an array and then grabs the first and last; with 1,000 to 10,000 rows this could take a fair bit of time.
You could always run the queries manually and time them to see the actual difference.
You can see the exact SQL that would be executed by calling toSql() at the end of the query builder chain, in place of get() or first().
Like I commented: the first example will be way faster, because you're only getting two records. The last example gets all the records, which in your case might be thousands. That's a huge difference in page loading time.
If you want to see how long your page load takes, you might want to check out barryvdh/laravel-debugbar, which gives great information about all sorts of things, including page loading time.
Related
I'm currently working on a multi-threaded program (in Java) that needs to select random rows in a database in order to update them. This is working well, but I have started to encounter some performance issues with my SELECT query.
I tried multiple solutions before finding this website:
http://jan.kneschke.de/projects/mysql/order-by-rand/
I tried the following solution:
SELECT * FROM Table
JOIN (SELECT FLOOR( COUNT(*) * RAND() ) AS Random FROM Table)
AS R ON Table.ID > R.Random
WHERE Table.FOREIGNKEY_ID IS NULL
LIMIT 1;
It selects a single row whose id is just past the randomly generated number. This works pretty well (an average of less than 100 ms per request on 150k rows). But after my program has processed a row, its FOREIGNKEY_ID will no longer be NULL (it will be updated with some value).
The problem is that my SELECT will then "forget" some rows that have an id below the randomly generated one, and I won't be able to process them.
So I tried to adapt my query like this:
SELECT * FROM Table
JOIN (SELECT FLOOR(
(SELECT COUNT(id) FROM Table WHERE FOREIGNKEY_ID IS NULL) * RAND() )
AS Random FROM Table)
AS R ON Table.ID > R.Random
WHERE Table.FOREIGNKEY_ID IS NULL
LIMIT 1;
With that query there is no longer any problem of skipped rows, but performance decreases drastically (an average of 1 s per request on 150k rows).
I could simply execute the fast query while I still have a lot of rows to process and switch to the slow one when only a few rows remain, but that would be a "dirty" fix in the code, and I would prefer an elegant SQL query that can do the job.
Thank you for your help; please let me know if I'm not clear or if you need more details.
For your method to work more generally, you want max(id) rather than count(*):
SELECT t.*
FROM Table t JOIN
(SELECT FLOOR(MAX(id) * RAND() ) AS Random FROM Table) r
ON t.ID > r.Random
WHERE t.FOREIGNKEY_ID IS NULL
ORDER BY t.ID
LIMIT 1;
The ORDER BY is added to be sure that the "next" id, the smallest one above the random value, is returned; without it, MySQL could in theory always hand back the same row, even the one with the maximum id.
The other problem is gaps in the ids: with gaps it is easy to end up with rows that are chosen far too rarely. Say the four ids are 1, 2, 3, and 1000000. The COUNT(*)-based random value never exceeds 4, so that method returns 1000000 only about a quarter of the time, whereas the MAX(id) version above almost always returns it.
Perhaps the simplest solution to your problem is to run the first query multiple times until it gets a valid row. The next suggestion would be an index on (FOREIGNKEY_ID, ID), which the subquery can use. That might speed the query.
I tend to favor something more along these lines:
SELECT t.id
FROM Table t
WHERE t.FOREIGNKEY_ID IS NULL AND
RAND() < 1.0 / 1000
ORDER BY RAND()
LIMIT 1;
The purpose of the WHERE clause is to reduce the volume considerably, so the ORDER BY doesn't take much time.
Unfortunately, this still requires scanning the table, so you probably won't get responses in the 100 ms range on a 150k-row table. You can reduce that to an index scan with an index on t(FOREIGNKEY_ID, ID).
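The composite index suggested here might be created like so (the index name is an assumption):
-- lets MySQL answer the FOREIGNKEY_ID IS NULL filter and the id comparison
-- from the index alone, instead of scanning the whole table
CREATE INDEX idx_table_fkid_id ON `Table` (FOREIGNKEY_ID, ID);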
EDIT:
If you want a reasonable chance of a uniform distribution and performance that does not increase as the table gets larger, here is another idea, which -- alas -- requires a trigger.
Add a new column to the table called `random`, which is initialized with rand(). Build an index on `random`. Then run a query such as:
select t.*
from ((select t.*
       from t
       where random >= @random
       order by random
       limit 10
      ) union all
      (select t.*
       from t
       where random < @random
       order by random desc
       limit 10
      )
     ) t
order by rand()
limit 1;
The idea is that the subqueries can use the index to choose a set of 20 rows that are pretty arbitrary -- 10 before and after the chosen point. The rows are then sorted (some overhead, which you can control with the limit number). These are randomized and returned.
The point is that if you simply pick a random number, arbitrary gaps in the stored random values would make the selection not quite uniform. By taking a larger sample around the chosen value, the probability of any one row being selected gets much closer to a uniform distribution. There are still edge effects, but they should be minor on a large amount of data.
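For reference, the setup this approach needs (the new column, an index on it, and a trigger to populate it for new rows) might look like the following sketch; the trigger and index names are assumptions:
-- add the random column and backfill the existing rows
ALTER TABLE t ADD COLUMN random DOUBLE NOT NULL DEFAULT 0;
UPDATE t SET random = RAND();
-- index it so the two range scans in the query above stay cheap
CREATE INDEX idx_t_random ON t (random);
-- keep the column populated for rows inserted later
CREATE TRIGGER trg_t_random_bi
BEFORE INSERT ON t
FOR EACH ROW
SET NEW.random = RAND();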
Your IDs are probably going to contain gaps, and anything that works off COUNT(*) is not going to be able to reach all of them.
Take a table whose records have IDs 1, 2, 3, 10, 11, 12, 13: it has only 7 records, so a COUNT(*)-based random value is always below 7. Since IDs 4, 5 and 6 do not exist, ID 10 ends up being picked far too often (every random value between 3 and 7 lands on it), while IDs 11, 12 and 13 can never be picked at all. The selection is both unbalanced and incomplete.
To get a fair, uniformly distributed random selection of records, I would suggest loading the IDs of the table first. Even for 150k rows, loading a set of integer IDs will not consume a lot of memory (< 1 MB):
SELECT id FROM table;
You can then use a function like Collections.shuffle to randomize the order of the IDs. To get the rest of the data, you can select records one at a time, or, say, 10 at a time:
SELECT * FROM table WHERE id = :id
Or:
SELECT * FROM table WHERE id IN (:id1, :id2, :id3)
This should be fast if the id column has an index, and it will give you a proper random distribution.
If prepared statements can be used, then this should work:
SELECT @skip := FLOOR(RAND() * COUNT(*)) FROM Table WHERE FOREIGNKEY_ID IS NULL;
PREPARE STMT FROM 'SELECT * FROM Table WHERE FOREIGNKEY_ID IS NULL LIMIT ?, 1';
EXECUTE STMT USING @skip;
LIMIT in a SELECT statement can be used to skip rows.
I have one query that returns over 180k rows. I need to make a slight change so that it returns about 10 fewer rows.
How do I show only those ~10 rows that differ between the two results?
I've tried EXCEPT, but it seems to return a lot more than just the 10.
You can use LIMIT. This will show the first n rows. Example:
SELECT * FROM Orders LIMIT 10
If you are trying to implement pagination, add OFFSET. This will return 10 rows starting from row 20. Example:
SELECT * FROM Orders LIMIT 10 OFFSET 20
MySQL doesn't support EXCEPT (to my knowledge).
Probably the most efficient route would be to incorporate the two WHERE clauses into one. I say efficient in the sense of "Do it that way if you're going to run this query in a regular report or production application."
For example:
-- Query 1
SELECT * FROM table WHERE `someDate`>'2016-01-01'
-- Query 2
SELECT * FROM table WHERE `someDate`>'2016-01-10'
-- Becomes (the rows query 1 returns but query 2 excludes)
SELECT * FROM table WHERE `someDate` > '2016-01-01' AND `someDate` <= '2016-01-10'
It's possible you're implying that the queries are quite complicated, and you're after a quick (read: not necessarily efficient) way of getting the difference for a one-off investigation.
That being the case, you could abuse UNION and a sub-query:
(Untested, treat as pseudo-SQL...)
SELECT
    *
FROM (
    SELECT * FROM table WHERE `someDate` > '2016-01-01'
    UNION ALL
    SELECT * FROM table WHERE `someDate` > '2016-01-10'
) AS sub
GROUP BY
    `primaryKey`
HAVING
    COUNT(1) = 1;
It's ugly though. And not efficient.
Assuming the only difference is that one side (I'll call it the "right-hand side") is missing records that the left includes, you could LEFT JOIN the two queries (as subqueries) and filter to rows where the right side is NULL, as in the sketch below. But that'd be dependent on all those caveats being true.
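A rough sketch of that anti-join, reusing the placeholder table and `primaryKey` names from above (untested, same pseudo-SQL caveats):
SELECT a.*
FROM (
    SELECT * FROM `table` WHERE `someDate` > '2016-01-01'
) AS a
LEFT JOIN (
    SELECT * FROM `table` WHERE `someDate` > '2016-01-10'
) AS b
    ON a.`primaryKey` = b.`primaryKey`
WHERE b.`primaryKey` IS NULL;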
Temporary tables can be your friend - especially given they're so easily created (and can be indexed):
CREATE TEMPORARY TABLE tmp_xyz AS SELECT ... FROM ... WHERE ...;
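For example, one side of the comparison could be materialised and indexed like this (table and column names are the same placeholders as above):
CREATE TEMPORARY TABLE tmp_q1 AS
    SELECT * FROM `table` WHERE `someDate` > '2016-01-01';
-- index the temporary table so joins and lookups against it are fast
ALTER TABLE tmp_q1 ADD INDEX (`primaryKey`);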
I have a scenario: I have, say, 300 records in my table. I execute a query to get the total count. Then, since I have to implement pagination,
I select the data from the same table using limits according to the count. I was wondering if I can get the count and the data in a single query.
I tried the code below:
Select * ,count(*) as cnt from table;
But this gave me the total count but only 1 record!
Is there a way to avoid the extra query and get both the count and the results in a single query?
Something like:
select t1.*,t2.cnt
from table t1
cross join (select count(*) as cnt from table) t2
limit 'your limit for the first page'
or
select *,(select count(*) from table) as cnt
from table
limit 'your limit for the first page'
You can get the information in the data structure you mentioned, but there is really no reason to do it. There is no performance problem in running two queries: one to get the row count and another to select the data. You don't save anything by trying to select all of the information in one query. Do two simple queries instead; it will be a better solution for your app, as you will preserve its simplicity and clarity.
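In other words, something like this (a page size of 10 is assumed here):
-- query 1: total row count, used to build the pager
SELECT COUNT(*) AS cnt FROM `table`;
-- query 2: just the page of data you actually need
SELECT * FROM `table` LIMIT 10 OFFSET 0;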
Using two queries might not be as bad as you may think; you can read this for more information.
I need to query MySQL with some condition and get five different random rows from the result.
Say I have a table named 'user' and a field named 'cash'. I can compose SQL like:
SELECT * FROM user WHERE cash < 1000 ORDER BY RAND() LIMIT 5;
The result is good: totally random, unsorted, all rows different from each other, exactly what I want.
But I learned from Google that the efficiency is bad when the table gets large, because MySQL creates a temporary table with all the result rows and assigns each one of them a random sorting index. The results are then sorted and returned.
Then I kept searching and found a solution like:
SELECT * FROM `user` AS t1
JOIN (SELECT ROUND(RAND() * ((SELECT MAX(id) FROM `user`) - (SELECT MIN(id) FROM `user`)) + (SELECT MIN(id) FROM `user`)) AS id) AS t2
WHERE t1.id >= t2.id AND cash < 1000
ORDER BY t1.id
LIMIT 5;
This method uses a JOIN and MAX(id), and its efficiency is better than the first query's according to my testing. However, there is a problem: since I also need the condition "cash < 1000", if the generated random id is so big that no row at or above it has cash < 1000, then no result is returned.
Does anyone have a good idea of how to compose SQL that has the same effect as the first query but with better efficiency?
Or should I just do a simple query in MySQL and let PHP randomly pick 5 different rows from the query result?
Your help is appreciated.
To make the first query faster, just SELECT the id; that will make the temporary table rather small (it will contain only IDs and not all the fields of each row) and maybe it will fit in memory (temp tables containing text/blob columns are always created on disk, for example). Then, when you have the result, run another query of the form SELECT * FROM xy WHERE id IN (a, b, c, d, ...); a sketch follows below. As you mentioned, this approach is not very efficient, but as a quick fix this modification will make it several times faster.
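A sketch of that two-step version, using the asker's user/cash schema; the literal ids in the second query simply stand in for whatever the first query happens to return:
-- step 1: randomise only the ids, so the temporary table stays small
SELECT id FROM user WHERE cash < 1000 ORDER BY RAND() LIMIT 5;
-- step 2: fetch the full rows for the five ids returned by step 1
SELECT * FROM user WHERE id IN (3, 17, 42, 56, 91);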
One of the best approaches seems to be getting the total number of rows, choosing random numbers, and for each one running a new query of the form SELECT * FROM xy WHERE abc LIMIT $random, 1. It should be quite efficient for 3-5 random rows, but not good if you want 100 random rows each time :)
Also consider caching your results. Often you don't need different random rows to be displayed on every page load. Generate your random rows only once per minute. If you generate the data via cron, for example, you can even live with a query that takes several seconds, as users will see the old data while the new data is being generated.
Here are some of my bookmarks for this problem for reference:
http://jan.kneschke.de/projects/mysql/order-by-rand/
http://www.titov.net/2005/09/21/do-not-use-order-by-rand-or-how-to-get-random-rows-from-table/
Is there a way in MySQL to COUNT(*) from a table such that, if the count exceeds x, it stops counting there? Basically, I only want to know whether the number of records returned from a query is more or less than a particular number. If it's more than that number, I don't really care how many rows there are; if it's less, tell me the count.
I've been able to fudge it like this:
-- let x be 100
SELECT COUNT(*) FROM (
SELECT `id` FROM `myTable`
WHERE myCriteria = 1
LIMIT 100
) AS temp
...but I was wondering if there was some handy built-in way to do this?
Thanks for the suggestions, but I should have been clearer about the reasons behind this question. It's selecting from a couple of joined tables, each with tens of millions of records. Running COUNT(*) using an indexed criterion still takes about 80 seconds, and running one without an index takes about 30 minutes or so. It's more about optimising the query than getting the correct output.
SELECT * FROM WhateverTable WHERE WhateverCriteria
LIMIT 100, 1
LIMIT 100, 1 returns the 101st record, if there is one, or no record otherwise. You might be able to use the above query as a subquery in an EXISTS clause, if that helps; see the sketch below.
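For instance, wrapped in EXISTS (the table and criteria are the same placeholders as above), this returns 1 when there are more than 100 matching rows and 0 otherwise:
SELECT EXISTS (
    SELECT 1
    FROM WhateverTable
    WHERE WhateverCriteria
    LIMIT 1 OFFSET 100    -- equivalent to LIMIT 100, 1
) AS more_than_100;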
I can't think of anything. It looks to me like what you're doing exactly fulfils the purpose, and SQL certainly doesn't seem to go out of its way to make it easy for you to do this more succinctly.
Consider this: What you're trying to do doesn't make sense in a strict context of set arithmetic. Mathematically, the answer is to count everything and then take the MIN() with 100, which is what you're (pragmatically and with good reason) trying to avoid.
This works:
select count(*) from ( select * from stockinfo s limit 100 ) s
but was not any faster (that I could tell) than just:
select count(*) from stockinfo
which returned 5170965.