Simple select nth highest - mysql

I'm trying to figure out which is the more efficient way to get the nth highest record in a MySQL database:
SELECT *
FROM table_name
ORDER BY column_name DESC
LIMIT n - 1, 1
or
SELECT *
FROM table_name AS a
WHERE n - 1 = (
    SELECT COUNT(primary_key_column)
    FROM table_name AS b
    WHERE b.column_name > a.column_name)
There is an index on column_name.
I would think MySQL would handle the LIMIT clause efficiently and the first option is the way to go.
I'm not too clear on what the second query does exactly, so if it is more efficient, can someone explain why?
Thanks.

I tried EXPLAIN on both those queries on a database of mine (note: the optimizer may choose different plans for your schema/data) and it definitely looks like the first one wins in every regard: it's simpler to read and understand, and will most likely be faster.
As aaronls said, and EXPLAIN confirms, the second query has a correlated subquery which will require an extra iteration through the entire table for each row.
Since the first one is way easier to read, I'd choose it in a shot. If you do find that it's a bottleneck (after profiling your application), you could give the second a try but I don't see how it could possibly be faster.

I think with the second query it's going to run the subquery in an inner loop, evaluating it against each row in table_name. If that is the case, you might be looking at something like O(n^2) runtime.
Based on that I would personally go with the first query, but if it were that important to me, I would do some performance testing. Make sure you test against very large data sets as well to get a good idea of how the performance scales. Something that runs in O(n) may be faster for very small datasets, but something that runs in O(log(n)) is much better for large data sets.

Run EXPLAIN on both queries and see which one MySQL thinks is more complicated.
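For example, with the names from the question (and a concrete n = 5, since LIMIT does not accept expressions):
-- plan for the LIMIT version (n = 5, so offset 4)
EXPLAIN SELECT * FROM table_name ORDER BY column_name DESC LIMIT 4, 1;
-- plan for the correlated-subquery version
EXPLAIN SELECT *
FROM table_name AS a
WHERE 4 = (SELECT COUNT(primary_key_column)
           FROM table_name AS b
           WHERE b.column_name > a.column_name);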

This isn't really an answer but...
Go with the first query assuming your loads aren't super heavy, just because it works and is simple. You can always go back later and change if really necessary.

I would suggest (though I'm not sure about the exact SQL syntax myself) that you compute an additional RANK column on a simple query that orders elements as desired (DESC). Then just select the row where RANK = n.
You can probably do this with a variable that gets incremented, I guess. It's basically a count of how many rows come before this row, so it should be very easy to compute.
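A sketch of that idea, assuming MySQL 8.0+ where window functions are available (on older versions the incrementing @rank user-variable trick gives the same result), using the names from the question:
SELECT *
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (ORDER BY t.column_name DESC) AS rnk  -- rank, highest value first
    FROM table_name t
) AS ranked
WHERE rnk = 5;  -- n = 5, i.e. the 5th highest value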

If you're really concerned about efficiency, maybe you should look into implementing a selection algorithm in SQL.

For the 2nd highest record:
SELECT MAX(Price) AS price FROM OrderDetails
WHERE Price < (SELECT MAX(Price) FROM OrderDetails)
For the Nth highest record:
SELECT * FROM OrderDetails AS a
WHERE n - 1 = (SELECT COUNT(OrderNo) FROM OrderDetails b WHERE b.Price > a.Price)

Related

SQL get result and number of rows in the result with LIMIT

I have a large database in which I use LIMIT in order not to fetch all the results of the query every time (it is not necessary). But I have an issue: I need to count the number of results. The dumbest solution is the following, and it works:
We just get the data that we need:
SELECT * FROM table_name WHERE param > 3 LIMIT 10
And then we find the length:
SELECT COUNT(1) FROM table_name WHERE param > 3 LIMIT 10
But this solution bugs me: unlike the query shown here, the one I actually work with is complex, and I basically have to run it twice to get the result.
Another dumb solution for me was to do:
SELECT COUNT(1), param, anotherparam, additionalparam FROM table_name WHERE param > 3 LIMIT 10
But this results in only one row. At this point I would be OK if it just filled a count column with the same number in every row; I just need this information without wasting computation time.
Is there a better way to achieve this?
P.S. By the way, I am not looking to get 10 as the result of COUNT; I need the length without the LIMIT.
You should (probably) run the query twice.
MySQL does have a FOUND_ROWS() function that reports the number of rows matched before the limit. But using this function is often worse for performance than running the query twice!
https://www.percona.com/blog/2007/08/28/to-sql_calc_found_rows-or-not-to-sql_calc_found_rows/
...when we have appropriate indexes for WHERE/ORDER clause in our query, it is much faster to use two separate queries instead of one with SQL_CALC_FOUND_ROWS.
There are exceptions to every rule, of course. If you don't have an appropriate index to optimize the query, it could be more costly to run the query twice. The only way to be sure is to repeat the tests shown in that blog, using your data and your query on your server.
This question is very similar to: How can I count the numbers of rows that a MySQL query returned?
See also: https://mariadb.com/kb/en/found_rows/
This is probably the most efficient solution to your problem, but it's best to test it using EXPLAIN with a reasonably sized dataset.
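For reference, the two approaches look like this with the query from the question (note that SQL_CALC_FOUND_ROWS and FOUND_ROWS() are deprecated in recent MySQL 8.0 releases, one more reason to prefer the two separate queries):
-- Option 1: two separate queries (usually faster when the WHERE clause is indexed)
SELECT * FROM table_name WHERE param > 3 LIMIT 10;
SELECT COUNT(*) FROM table_name WHERE param > 3;
-- Option 2: one query plus FOUND_ROWS()
SELECT SQL_CALC_FOUND_ROWS * FROM table_name WHERE param > 3 LIMIT 10;
SELECT FOUND_ROWS();  -- number of rows the previous query matched before LIMIT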

Faster counts with mysql by sampling table

I'm looking for a way to get a count of records meeting a condition, but my problem is that the table is billions of rows long and a basic COUNT(*) is not possible as it times out.
I thought that maybe it would be possible to sample the table by doing something like selecting a quarter of the records. I believe that older records will be more likely to match, so I'd need a method which accounts for this (perhaps random sorting).
Is it possible or reasonable to query a certain percent of rows in mysql? And is this the smartest way to go about solving this problem?
The query I currently have which doesn't work is pretty simple:
SELECT count(*) FROM table_name WHERE deleted_at IS NOT NULL
SHOW TABLE STATUS will 'instantly' give an approximate row count. (There is an equivalent SELECT ... FROM information_schema.tables.) However, this may be significantly off.
A COUNT(*) that uses an index on any column of the PRIMARY KEY will be faster because the index is smaller than the data. But this still may not be fast enough.
There is no way to "sample". Or at least no way that is reliably better than SHOW TABLE STATUS. EXPLAIN SELECT ... with some simple query will do an estimate; again, not necessarily any better.
Please describe what kind of data you have; there may be some other tricks we can use.
See also Random. There may be a technique there that will help you "sample". Be aware that all techniques are affected by how the data was generated and whether there has been "churn" on the table.
Can you periodically run the full COUNT(*) and save it somewhere? And then maintain the count after that?
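For illustration, one hypothetical way to "save it somewhere" is a small counter table that the application (or a trigger) keeps up to date:
-- hypothetical counter table, seeded once with a full (slow) count
CREATE TABLE row_counts (name VARCHAR(64) PRIMARY KEY, cnt BIGINT NOT NULL);
INSERT INTO row_counts (name, cnt)
SELECT 'soft_deleted', COUNT(*) FROM table_name WHERE deleted_at IS NOT NULL;
-- afterwards, bump the counter whenever a row is soft-deleted
UPDATE row_counts SET cnt = cnt + 1 WHERE name = 'soft_deleted';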
I assume you don't have this case. (Else the solution is trivial.)
AUTO_INCREMENT id
Never DELETEd or REPLACEd or INSERT IGNOREd or ROLLBACKd any rows
Add an index on the deleted_at column to improve execution time, and try counting id where it is set (i.e. COUNT(id)).
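A sketch of that suggestion, using the table from the question (note that adding an index to a table with billions of rows is itself a slow, space-consuming operation):
ALTER TABLE table_name ADD INDEX idx_deleted_at (deleted_at);  -- index name is arbitrary
-- with the index in place, the count may be satisfied from the index alone
-- (InnoDB secondary indexes also contain the primary key columns)
SELECT COUNT(id) FROM table_name WHERE deleted_at IS NOT NULL;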

Best way to check for updated rows in MySQL

I am trying to see if there were any rows updated since the last time it was checked.
I'd like to know if there are any better alternatives to
"SELECT id FROM xxx WHERE changed > some_timestamp;"
However, as there are 200,000+ rows it can get heavy pretty fast... would a count be any better?
"SELECT count(*) FROM xxx WHERE changed > some_timestamp;"
I have thought of creating a unit test but I am not the best at this yet /:
Thanks for the help!
EDIT: Because in many cases there would not be any rows that changed, would it be better to always test with a MAX(xx) first, and if it's greater than the old update timestamp, then do the full query?
If you just want to know if any rows have changed, the following query is probably faster than either of yours:
SELECT id FROM xxx WHERE changed > some_timestamp LIMIT 1
Just for the sake of completeness: Make sure you have an index on changed.
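If it does not exist yet, something like this (the index name is arbitrary):
-- with this index, the WHERE changed > ... check becomes a cheap index range scan
ALTER TABLE xxx ADD INDEX idx_changed (changed);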
Edit: A tiny performance improvement
Now that I think about it, you should probably do a SELECT changed instead of selecting the id, because that eliminates accessing the table row at all. This query will tell you pretty quickly if any change was performed.
SELECT changed FROM xxx WHERE changed > some_timestamp LIMIT 1
It should be a tiny bit faster than my first query - not by a lot, though, since accessing a single table row is going to be very fast.
Should I select MAX(changed) instead?
Selecting MAX(changed), as suggested by Federico should pretty much result in the same index access pattern. Finding the highest element in an index is a very cheap operation. Finding any element that is greater than some constant is potentially cheaper, so both should have approximately the same performance. In either case, both queries are extremely fast even on very large tables if - and only if - there is an index.
Should I first check if any rows were changed, and then retrieve the rows in a separate step
No. If no row has changed, SELECT id FROM xxx WHERE changed > some_timestamp will be as fast as any such check, making it pointless to perform one separately. It only becomes slower when there are results. Unless you add expensive operations (such as ORDER BY), the performance should be (almost) linear in the number of rows retrieved.
Make an index on changed and run:
SELECT MAX(changed) FROM xxx;
If the table is MyISAM, the query will be immediate.

Should you always do a COUNT(*) before a SELECT * to determine if there are any rows?

In MySQL, is it generally a good idea to always do a COUNT(*) first to determine if you should do a SELECT * to actually fetch the rows, or is it better to just do the SELECT * directly and then check if it returned any rows?
Unless you lock the table(s) in question, doing a SELECT COUNT(*) first is useless. Consider:
Process 1:
SELECT COUNT(*) FROM T;
Process 2:
INSERT INTO T
Process 1:
...now doing something based on the obsolete count retrieved before...
Of course, locking a table is not a very good idea in a server environment.
It depends on whether you need the number, but in particular in MySQL there's SQL_CALC_FOUND_ROWS, IIRC. Look it up in the docs.
Always just do the SELECT [field1, field2 | *] FROM .... The SELECT COUNT(*) will just bloat your code, add extra transport and data overhead, and generally be unmaintainable.
The former is 2 queries, the latter is 1 query. Each query needs a round trip to the database server. Do the math.
The answer, as with many questions of this kind, is "it depends". What you shouldn't do is run both queries when you don't have an index on the table. In general, performing just the COUNT is a waste of I/O time, so it is only worth doing if it saves you more I/O than it costs in most cases.
In some cases, DB driver implementations may not return the count of rows actually selected by a SELECT statement that returns the records themselves. A COUNT(*) issued beforehand is useful when you need to know the precise size of the resulting recordset before you select the actual data.

MySql queries: really never use SELECT *?

I'm a self-taught developer and I've always been told not to use SELECT *, but most of my queries require knowing all the values of a certain row...
What should I use then? Should I list ALL of the columns every time, like SELECT elem1, elem2, elem3, ..., elem15 FROM ...?
Thanks
If you really need all the columns and you're fetching the results by name, I would go ahead and use SELECT *. If you're fetching the row results by index, then it makes sense to specify the column names or else they might not be in the order you expect (especially if the table schema changes).
SELECT * FROM ... is not always the best way to go unless you need all columns. If, for example, a table has 10 columns and you only need 2-3 of them, and those columns are indexed, then using SELECT * makes the query run slower because the server must read the full rows from the data file. If instead you selected only the 2-3 columns you actually need, the server could run the query much faster when the rows can be fetched from a covering index. A covering index is one that is used to return results without reading the data file.
So, use SELECT * only when you actually need all columns.
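A hypothetical illustration of the covering-index point (the index name and literal value are made up here):
-- composite index over the two columns the narrow query needs
CREATE INDEX idx_elem1_elem2 ON mytable (elem1, elem2);
SELECT * FROM mytable WHERE elem1 = 42;            -- must read the full data rows
SELECT elem1, elem2 FROM mytable WHERE elem1 = 42; -- can be answered from the index alone
EXPLAIN will show "Using index" in the Extra column when a query is served from a covering index.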
If you absolutely have to use *, try to limit it to a specific table; e.g.:
SELECT t.*
FROM mytable t
List only the columns that you need, ideally with a table alias:
SELECT t.elem1,
t.elem2
FROM YOUR_TABLE t
The presence of a table alias helps demonstrate what is a column (and where it's from) vs a derived column.
If you are positive that you will always need all the columns, then SELECT * should be OK. But my reasoning for avoiding it is: say another developer adds a column to the table which isn't required by your query; then there is overhead. This gets worse as more columns are added.
The only real performance hit you take from using select * is in the bandwidth required to send back extra columns in your result set, if they're not necessary. Other than that, there's nothing inherently "bad" about using select *.
You might SELECT * from a subquery.
Yes, SELECT * is bad. You do not state which language you will be using to process the returned data. Suppose you receive these records back as an array (not a hash map). In that case, what is in Row[12]? Maybe it was ZipCode when you wrote the app, but guess what happens when someone inserts a field SuiteNumber before ZipCode.
Or suppose the next coder appends a huge blob field to each record. Ouch!
Or, a little more subtle: let's assume you're doing a join or subselect, and you have no TEXT or BLOB fields. MySQL will create any temporary tables it needs in memory. But as soon as you include a TEXT field (even TINYTEXT), MySQL has to put the temporary table on disk and do a sort-merge there. This won't break the program, but it can kill performance.
Select * is sacrificing maintainability in order to save a little typing.