How to add a LIMIT to a user-generated SQL query - MySQL

I have built a query editor where a user can enter a query. However, it needs to limit the user's entry to 1000 results; otherwise the user could enter something like:
SELECT * FROM mybigtable
which could try to download 1 billion results.
What would be the best way to enforce a limit? The first approach I thought of was to do:
SELECT * FROM (
user-query
) x LIMIT 1000
However, this would execute the entire query (and could take forever) before doing the actual limit. What would be the best way to enforce a strict limit on the user's sql input?
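The wrapping idea can be sketched as follows. This is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for a MySQL connection (an assumption; with MySQL you would send the same string through your connector). The helper name `run_limited` is hypothetical, and it only strips trailing semicolons, so it does not address the caveats raised in the answer below.

```python
import sqlite3


def run_limited(conn: sqlite3.Connection, user_sql: str, max_rows: int = 1000) -> list:
    """Run an untrusted SELECT wrapped in a derived table, capped at max_rows.

    Hypothetical helper: it only removes trailing semicolons and does not
    parse the SQL, so any query that is invalid inside a subquery still fails.
    """
    inner = user_sql.strip()
    # A trailing ';' would be a syntax error inside the parentheses.
    while inner.endswith(";"):
        inner = inner[:-1].rstrip()
    wrapped = f"SELECT * FROM ({inner}) AS x LIMIT ?"
    # Only the LIMIT is a bound parameter; the user's SQL is pasted in as-is.
    return conn.execute(wrapped, (max_rows,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mybigtable (id INTEGER PRIMARY KEY, v TEXT)")
conn.executemany("INSERT INTO mybigtable (v) VALUES (?)", [(str(i),) for i in range(5000)])
rows = run_limited(conn, "SELECT * FROM mybigtable;")
print(len(rows))  # 1000
```

This caps what the client receives; whether the server also stops early depends on how it executes the inner query.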

This is too long for a comment.
I don’t think that there is a generic solution for this.
Wrapping the user query in a SELECT * FROM ... LIMIT 1000 statement is attractive, but:
there are edge cases where it produces invalid SQL, for example if the user query ends with a semicolon, returns two columns with the same name (a derived table needs unique column names), or contains a CTE in a database that only allows the WITH clause at the very beginning of the statement;
while it will happily limit the number of rows returned to the user, it does not guarantee that the database stops early: if the inner query has to be fully materialized first (for example because it uses GROUP BY, DISTINCT or an aggregate), the entire result set is still computed.
The typical solution for the second issue is to filter rows on an auto-incremented integer column (usually the primary key of your table). But that’s even harder to make generic.
In short: manipulating SQL from a distance is tricky. If you want a complete solution for your use case, get yourself a real query builder (or at least an SQL parser).
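Filtering on an auto-incremented key (often called keyset or "seek" pagination) can be sketched like this for a query you control. It is a minimal sketch using Python's `sqlite3` as a stand-in for MySQL; the table `posts` and its columns are hypothetical. Because the condition is a range on the primary key, each page starts with an index seek instead of rereading earlier rows.

```python
import sqlite3


def fetch_after(conn: sqlite3.Connection, last_id: int, page_size: int = 1000) -> list:
    """Return the next page of rows whose primary key is greater than last_id."""
    # Range condition on the primary key: the engine seeks to last_id in the
    # index and reads forward, instead of skipping over earlier rows.
    return conn.execute(
        "SELECT id, title FROM posts WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, page_size),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY AUTOINCREMENT, title TEXT)")
conn.executemany("INSERT INTO posts (title) VALUES (?)", [(f"post {i}",) for i in range(2500)])

last_id, pages = 0, 0
while True:
    page = fetch_after(conn, last_id)
    if not page:
        break
    pages += 1
    last_id = page[-1][0]  # remember where this page ended
print(pages, last_id)  # 3 2500
```

The hard part the answer points out remains: injecting such a condition into an arbitrary user query requires knowing which table and key it reads from.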

Related

Limiting search result items

I want to optimize my paginated search result page
For example, I have 100 million posts to search, and a user just types "a". Searching all of them takes a very long time because we use SQL_CALC_FOUND_ROWS for pagination.
The fact is that there is no need to search all of the millions of rows (posts); the answer "1000+" is enough for users. So we need to stop searching after we have found 1000 results.
We want to show information like this to user:
Showing 1–10 of 1000+ results
[RESULTS]
Page 1 .... Page 100
How to do this without losing our pagination functionality?
My current query looks something like this:
SELECT SQL_CALC_FOUND_ROWS xxx_posts.ID
FROM xxx_posts
WHERE 1=1
AND (xxx_posts.post_title LIKE '%a%')
LIMIT 0, 10
Your particular example is too risky to try to speed up. However, for the general case...
SELECT id
FROM xxx_posts
WHERE ...
LIMIT 1000, 1;
If you get a row back, then there are more than 1000 matching rows (LIMIT 1000, 1 skips the first 1000 rows and returns the 1001st).
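A variant of the same probe can also produce the label for "Showing 1–10 of 1000+ results": count the matches inside a derived table that is itself limited to cap + 1 rows, so the count never goes past 1001. This is a sketch using Python's `sqlite3` as a stand-in for MySQL (an assumption); the function name `capped_count_label` is hypothetical, and `where_sql` must be a trusted fragment with user input passed through `params`.

```python
import sqlite3


def capped_count_label(conn, where_sql, params=(), cap=1000):
    """Count matches, but stop after cap + 1 of them; return e.g. '37' or '1000+'.

    where_sql is a trusted SQL fragment (hypothetical); user input goes in params.
    """
    (n,) = conn.execute(
        f"SELECT COUNT(*) FROM (SELECT 1 FROM posts WHERE {where_sql} LIMIT ?) AS capped",
        (*params, cap + 1),
    ).fetchone()
    return f"{cap}+" if n > cap else str(n)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, post_title TEXT)")
titles = [f"a{i}" for i in range(1500)] + [f"b{i}" for i in range(50)]
conn.executemany("INSERT INTO posts (post_title) VALUES (?)", [(t,) for t in titles])

label = capped_count_label(conn, "post_title LIKE ?", ("%a%",))
print(f"Showing 1-10 of {label} results")  # Showing 1-10 of 1000+ results
```

As the answer warns, a leading-wildcard LIKE still has to scan rows until the cap is reached, so this only helps when there are plenty of matches.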
Do not use GROUP BY or ORDER BY unless an index can handle both the WHERE clause and those clauses. Otherwise, the server has to fetch all the matching rows and sort them before it can apply the LIMIT.
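The effect of an unindexed ORDER BY can be seen in a query plan. This sketch uses SQLite's EXPLAIN QUERY PLAN through Python's `sqlite3` as a stand-in (an assumption; MySQL's EXPLAIN reports the equivalent as "Using filesort"). The table and index names are hypothetical.

```python
import sqlite3


def plan(conn, sql):
    """Join the 'detail' column of SQLite's EXPLAIN QUERY PLAN rows."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, created INTEGER, title TEXT)")
conn.execute("CREATE INDEX idx_posts_created ON posts (created)")

# The index delivers rows already in `created` order, so the scan can stop after 10 rows.
indexed = plan(conn, "SELECT id FROM posts ORDER BY created LIMIT 10")
# No index on title: every row is read and sorted in a temporary B-tree before LIMIT applies.
unindexed = plan(conn, "SELECT id FROM posts ORDER BY title LIMIT 10")
print(indexed)
print(unindexed)
```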
Your particular example is risky: LIKE with a leading wildcard cannot use an index. If there are not more than 1000 matching rows, the server will check every title in the entire table without ever satisfying the LIMIT 1000, 1. Nothing saved!
Can you use a FULLTEXT index?
See also MariaDB's setting to limit an individual statement's execution time.

Limit maximum number of records/rows in Table [duplicate]

Is it possible to set the number of rows that a table can accommodate in MySQL ?
I don't want to use any java code. I want to do this using pure mysql scripts.
I wouldn't recommend trying to limit the number of rows in a SQL table, unless you had a very good reason to do so. It seems you would be better off using a query like:
SELECT entityID, entityName FROM TableName LIMIT 1000
rather than physically limiting the rows of the table.
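However, if you really want to limit it to 1000 rows:
DELETE FROM TableName WHERE entityID NOT IN (SELECT entityID FROM (SELECT entityID FROM TableName ORDER BY entityID LIMIT 1000) AS keep_rows)
(MySQL allows neither LIMIT directly inside an IN subquery nor selecting from the table you are deleting from, hence the extra derived table.)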
MySQL supports a MAX_ROWS table option when creating (and altering) a table. http://dev.mysql.com/doc/refman/5.0/en/create-table.html
Edit: Sadly, it turns out this is only a hint for optimization:
"The maximum number of rows you plan to store in the table. This is not a hard limit, but rather a hint to the storage engine that the table must be able to store at least this many rows."
Your question implied that scripts are OK; would it be unreasonable to use something as simple as a cron job that regularly deletes rows above a given ID? It's not nearly as elegant as having MySQL throw an error when something tries to add one row too many, but it would do the job, and you may be able to have your application also check whether its ID is too high and warn the user or relevant party.
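The cleanup job and the application-side check might look like this. It is a minimal sketch using Python's `sqlite3` as a stand-in for MySQL (an assumption); the table and column names are taken from the earlier answer, while the function names are hypothetical. Scheduling it (for example from cron) is left out.

```python
import sqlite3


def prune_rows(conn: sqlite3.Connection, max_id: int) -> int:
    """Delete every row whose entityID is above max_id; return the number removed."""
    with conn:  # commit on success, roll back if the DELETE fails
        cur = conn.execute("DELETE FROM TableName WHERE entityID > ?", (max_id,))
    return cur.rowcount


def id_too_high(conn: sqlite3.Connection, max_id: int) -> bool:
    """Application-side check: has the newest row gone past the limit?"""
    (top,) = conn.execute("SELECT MAX(entityID) FROM TableName").fetchone()
    return top is not None and top > max_id


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableName (entityID INTEGER PRIMARY KEY, entityName TEXT)")
conn.executemany("INSERT INTO TableName (entityName) VALUES (?)", [(f"e{i}",) for i in range(1200)])
print(id_too_high(conn, 1000), prune_rows(conn, 1000), id_too_high(conn, 1000))  # True 200 False
```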

Selecting all fields (but one) instead of using asterisk (*) decreases running time by 10 times [duplicate]

I've Googled this question and can't seem to find a consistent opinion, or many opinions that are based on solid data. I simply would like to know whether using the wildcard in a SQL SELECT statement incurs more overhead than naming each column individually. I have compared the execution plans of both in several different test queries, and the estimates always read the same. Is it possible that some overhead is incurred elsewhere, or are they truly handled identically?
What I am referring to specifically:
SELECT *
vs.
SELECT item1, item2, etc.
SELECT * FROM...
and
SELECT every, column, list, ... FROM...
will perform the same because both are an unoptimised scan
The differences are:
the extra lookup in sys.columns (SQL Server) to resolve *
the contract/signature change when the table schema changes
inability to create a covering index; in fact, no tuning options at all, really
views have to be refreshed if they are not schema-bound
you cannot index or schema-bind a view that uses *
...and other stuff
Other SO questions on the same subject...
What is the reason not to use select * ?
Is there a difference between Select * and Select list each col
SQL Query Question - Select * from view or Select col1,col2…from view
“select * from table” vs “select colA,colB,etc from table” interesting behaviour in SqlServer2005
Do you mean select * from ... instead of select col1, col2, col3 from ...?
I think it's always better to name the columns and retrieve the minimal amount of information, because:
your code will work independently of the physical order of the columns in the database. The column order should not affect your application, but it will if you use *. This can be dangerous during a database migration, etc.
if you name the columns, the DBMS can optimize the execution further. For instance, if there is an index that contains all the data you are interested in, the table will not be accessed at all.
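The covering-index point can be checked with a query plan. This sketch uses SQLite's EXPLAIN QUERY PLAN via Python's `sqlite3` as a stand-in (an assumption; MySQL's EXPLAIN shows the same situation as "Using index" in the Extra column). The table `t` and index `idx_t_a_b` are hypothetical.

```python
import sqlite3


def plan(conn, sql):
    """Join the 'detail' column of SQLite's EXPLAIN QUERY PLAN rows."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER, c TEXT)")
conn.execute("CREATE INDEX idx_t_a_b ON t (a, b)")

# Every selected column lives in the index, so the table itself is never read.
named = plan(conn, "SELECT a, b FROM t WHERE a = 1")
# SELECT * also needs column c, so each index hit requires a lookup in the table.
star = plan(conn, "SELECT * FROM t WHERE a = 1")
print(named)
print(star)
```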
If you mean something else with "wildcard", just ignore my answer...
EDIT: If you are talking about the asterisk wildcard as in Select * From ..., then see the other responses...
If you are talking about wildcards in predicate clauses or other query expressions using the LIKE operator (_, %), as described below, then:
This has to do with whether using the wildcard affects whether the SQL is "SARGable" or not. SARGable (Search-ARGument-able) means whether or not the query's search or sort arguments can be used as entry parameters to an existing index. If you prepend the wildcard to the beginning of an argument
Where Name Like '%ing'
Then there is no way to traverse an index on the name field to find the nodes that end in 'ing'.
If, on the other hand, you append the wildcard to the end,
Where Name like 'Donald%'
then the optimizer can still use an index on the name column, and the query is still SARGable.
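The difference between the two patterns shows up in a query plan. This sketch uses SQLite through Python's `sqlite3` as a stand-in (an assumption). SQLite's LIKE is case-insensitive by default, so its LIKE optimization needs a NOCASE index on a TEXT column; MySQL applies its own rules for the same idea. The table `people` and index name are hypothetical.

```python
import sqlite3


def uses_index_seek(conn, sql):
    """True if SQLite's plan contains an index SEARCH (a seek) rather than only a SCAN."""
    details = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
    return any(d.startswith("SEARCH") for d in details)


conn = sqlite3.connect(":memory:")
# NOCASE on the column, inherited by the index, matches SQLite's case-insensitive LIKE.
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT COLLATE NOCASE)")
conn.execute("CREATE INDEX idx_people_name ON people (name)")

prefix = uses_index_seek(conn, "SELECT id FROM people WHERE name LIKE 'Donald%'")
suffix = uses_index_seek(conn, "SELECT id FROM people WHERE name LIKE '%ing'")
print(prefix, suffix)
```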
If what you call the SQL wildcard is *, it does not imply any performance overhead by itself. However, if the table is extended, you could find yourself retrieving fields you don't need.
In general not being specific in the fields you search or insert is a bad habit.
Consider
insert into mytable values(1,2)
What happens if the table is extended to three fields?
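The answer to that question can be demonstrated directly. This sketch uses Python's `sqlite3` as a stand-in (an assumption): SQLite rejects the positional insert with an error about 3 columns but 2 values, while MySQL reports "Column count doesn't match value count at row 1". The column names `a`, `b`, `c` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (a INTEGER, b INTEGER)")
conn.execute("INSERT INTO mytable VALUES (1, 2)")  # works while the table has two columns

# Later, someone extends the table with a third column.
conn.execute("ALTER TABLE mytable ADD COLUMN c INTEGER")

try:
    conn.execute("INSERT INTO mytable VALUES (1, 2)")  # positional insert now fails
    positional_ok = True
except sqlite3.OperationalError as exc:
    positional_ok = False
    print("positional insert failed:", exc)

# Naming the columns keeps working; c is simply left NULL.
conn.execute("INSERT INTO mytable (a, b) VALUES (1, 2)")
```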
It may not be more work from an execution plan standpoint. But if you're fetching columns you don't actually need, that's additional network bandwidth being used between the database and your application. Also if you're using a high-level client API that performs some work on the returned data (for example, Perl's selectall_hashref) then those extra columns will impose performance cost on the client side. How much? Depends.

Fast mysql query to randomly select N usernames

In my JSP application I have a search box that lets users search for user names in the database. I send an AJAX call on each keystroke and fetch 5 random names starting with the entered string.
I am using the below query:
select userid,name,pic from tbl_mst_users where name like 'queryStr%' order by rand() limit 5
But this is very slow as I have more than 2000 records in my table.
Is there a better approach that takes less time and still lets me achieve the same thing? I need random values.
How slow is "very slow", in seconds?
The reason why your query could be slow is most likely that you didn't place an index on name. 2000 rows should be a piece of cake for MySQL to handle.
The other possible reason is that you have many columns in the SELECT clause. I assume in this case the MySQL engine first copies all this data to a temp table before sorting this large result set.
I advise the following, so that you work only with indexes, for as long as possible:
SELECT userid, name, pic
FROM tbl_mst_users
JOIN (
-- here, MySQL works on indexes only
SELECT userid
FROM tbl_mst_users
WHERE name LIKE 'queryStr%'
ORDER BY RAND() LIMIT 5
) AS sub USING(userid); -- join other columns only after picking the rows in the sub-query.
This method is a bit better, but still does not scale well. However, it should be sufficient for small tables (2000 rows is, indeed, small).
The link provided by @user1461434 is quite interesting. It describes a solution with almost constant performance. The only drawback is that it returns only one random row at a time.
1. Does the table have an index on name? If not, add one.
2. MediaWiki uses an interesting trick (for Wikipedia's Special:Random feature): the table with the articles has an extra column with a random number (generated when the article is created). To get a random article, generate a random number and get the article with the next larger or smaller (don't recall which) value in the random number column. With an index, this can be very fast. (And MediaWiki is written in PHP and developed for MySQL.)
This approach can cause a problem if the resulting numbers are badly distributed; IIRC, this has been fixed on MediaWiki, so if you decide to do it this way you should take a look at the code to see how it's currently done (probably they periodically regenerate the random number column).
3. http://jan.kneschke.de/projects/mysql/order-by-rand/
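The random-column trick from point 2 can be sketched as follows. This is not MediaWiki's actual code: it uses Python's `sqlite3` as a stand-in for MySQL, and the `rand_key` column and function names are hypothetical. It reads forward from a random point in the index and wraps around if it runs off the end. The n rows returned are neighbours in `rand_key` order, so the same groups tend to recur, and it does not apply the question's name prefix filter.

```python
import random
import sqlite3


def create_schema(conn):
    conn.execute("CREATE TABLE tbl_mst_users (userid INTEGER PRIMARY KEY, name TEXT, rand_key REAL NOT NULL)")
    conn.execute("CREATE INDEX idx_users_rand_key ON tbl_mst_users (rand_key)")


def add_user(conn, name, rng):
    # The random key is drawn once, when the row is created.
    conn.execute("INSERT INTO tbl_mst_users (name, rand_key) VALUES (?, ?)", (name, rng.random()))


def random_users(conn, n, rng):
    """Return up to n distinct rows, starting at a random point in rand_key order."""
    r = rng.random()
    rows = conn.execute(
        "SELECT userid, name FROM tbl_mst_users WHERE rand_key >= ? ORDER BY rand_key LIMIT ?",
        (r, n),
    ).fetchall()
    if len(rows) < n:
        # Ran off the end of the index: wrap around to the smallest keys.
        rows += conn.execute(
            "SELECT userid, name FROM tbl_mst_users WHERE rand_key < ? ORDER BY rand_key LIMIT ?",
            (r, n - len(rows)),
        ).fetchall()
    return rows


rng = random.Random(42)
conn = sqlite3.connect(":memory:")
create_schema(conn)
for i in range(2000):
    add_user(conn, f"user{i}", rng)
print(random_users(conn, 5, rng))
```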

MySql queries: really never use SELECT *?

I'm a self-taught developer and I've always been told not to use SELECT *, but most of my queries need all the values of a given row...
What should I use then? Should I list ALL of the columns every time, like SELECT elem1, elem2, elem3, ..., elem15 FROM ...?
thanks
If you really need all the columns and you're fetching the results by name, I would go ahead and use SELECT *. If you're fetching the row results by index, then it makes sense to specify the column names or else they might not be in the order you expect (especially if the table schema changes).
SELECT * FROM ... is not the best way to go unless you need all columns. For example, if a table has 10 columns and you only need 2-3 of them, and those columns are indexed, SELECT * will make the query slower, because the server must read the full rows from the data files. If instead you select only the 2-3 columns you actually need, the server can run the query much faster by fetching the rows from a covering index. A covering index is one that is used to return results without reading the data file.
So, use SELECT * only when you actually need all columns.
If you absolutely have to use *, try to limit it to a specific table; e.g.:
SELECT t.*
FROM mytable t
List only the columns that you need, ideally with a table alias:
SELECT t.elem1,
t.elem2
FROM YOUR_TABLE t
The presence of a table alias helps demonstrate what is a column (and where it's from) vs a derived column.
If you are positive that you will always need all the columns, then SELECT * should be OK. My reasoning for avoiding it is this: say another developer adds a column to the table that your query doesn't need; then there is overhead. This gets worse as more columns are added.
The only real performance hit you take from using select * is in the bandwidth required to send back extra columns in your result set, if they're not necessary. Other than that, there's nothing inherently "bad" about using select *.
You might SELECT * from a subquery.
Yes, SELECT * is bad. You do not state what language you will be using to process the returned data. Suppose you receive these records back as an array (not a hash map). In this case, what is in Row[12]? Maybe it was ZipCode when you wrote the app, but guess what happens when someone inserts a field SuiteNumber before ZipCode.
Or suppose the next coder appends a huge blob field to each record. Ouch!
Or a little more subtle: let's assume you're doing a join or subselect, and you have no TEXT or BLOB columns. MySQL will create any temporary tables it needs in memory. But as soon as you include a TEXT column (even TINYTEXT), MySQL has to create the temporary table on disk instead (at least with the MEMORY engine used for internal temporary tables in older MySQL versions) and sort it there. This won't break the program, but it can kill performance.
Select * is sacrificing maintainability in order to save a little typing.
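The Row[12] scenario can be reproduced with a small table. This sketch uses Python's `sqlite3` as a stand-in (an assumption). SQLite can only append columns, so the "insert SuiteNumber before ZipCode" change is simulated by rebuilding the table; the table and column names are hypothetical. `sqlite3.Row` lets the same row be read by position or by name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows support both row[2] and row["zipcode"]

conn.execute("CREATE TABLE addresses (id INTEGER PRIMARY KEY, street TEXT, zipcode TEXT)")
conn.execute("INSERT INTO addresses (street, zipcode) VALUES ('Main St', '12345')")
before = conn.execute("SELECT * FROM addresses").fetchone()

# Simulate someone adding suitenumber *before* zipcode by rebuilding the table.
conn.executescript("""
    CREATE TABLE addresses_new (id INTEGER PRIMARY KEY, street TEXT, suitenumber TEXT, zipcode TEXT);
    INSERT INTO addresses_new (id, street, zipcode) SELECT id, street, zipcode FROM addresses;
    DROP TABLE addresses;
    ALTER TABLE addresses_new RENAME TO addresses;
""")
after = conn.execute("SELECT * FROM addresses").fetchone()

print(before[2], after[2])                  # 12345 None  (position 2 is now suitenumber)
print(before["zipcode"], after["zipcode"])  # 12345 12345
```

Reading by name survives the schema change; reading by position silently returns the wrong column.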