What is the maximum number of items passed to IN() in MySQL

I am given a bunch of IDs (from an external source) that I need to cross-reference with the ones in our database, filtering down to those that fall within a certain date range, are "enabled", and match some other parameters. I can easily do this with:
SELECT * FROM `table` WHERE `id` IN (csv_list_of_external_ids)
AND (my other cross-reference parameters);
By doing this, of all those incoming IDs I will get the ones that I want. But obviously this is not a very efficient method when the external IDs number in the thousands, and I'm not sure MySQL will even support such a huge query.
Given that nothing can be cached (both the user data and the external IDs change on pretty much every query), and that these queries happen at least every 10 seconds, what other SQL alternatives are there?

I believe the only limit is the length of the actual query, which is controlled by the "max_allowed_packet" parameter in your my.cnf file.

If you express it as a subquery:
SELECT * FROM table
WHERE id IN (SELECT ID FROM SOME_OTHER_TABLE)
AND ...
there is no limit.
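When the IDs arrive from an external source rather than another table, a common alternative to a giant IN() list is to bulk-insert them into a temporary table and join against it. A minimal sketch of the idea, using Python's sqlite3 as a stand-in for MySQL (the table and column names here are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, enabled INTEGER)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(1, 1), (2, 0), (3, 1), (4, 1)])

# IDs received from the external source on this request
external_ids = [2, 3, 4, 99]

# Load the IDs into a temp table instead of splicing them into the SQL text
conn.execute("CREATE TEMP TABLE ext_ids (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO ext_ids VALUES (?)", [(i,) for i in external_ids])

# The join replaces the huge IN (...) list; other filters stay in the WHERE
rows = conn.execute(
    "SELECT items.* FROM items JOIN ext_ids ON items.id = ext_ids.id "
    "WHERE items.enabled = 1 ORDER BY items.id"
).fetchall()
print(rows)  # only the enabled rows whose id appeared in the external list
```

On MySQL the same idea works with CREATE TEMPORARY TABLE plus a multi-row INSERT, and it sidesteps the max_allowed_packet concern for very long ID lists.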

Related

Querying a DB2 table using the "DISTINCT" keyword from MSAccess

I've got a rather simple query to a linked DB2 table.
SELECT GC_TBSELC.*
FROM GC_TBSELC
WHERE SELC_EFF_DATE > #1/1/2017#;
Works fine, returns results. However, when I add the "DISTINCT" keyword, I get an error:
ODBC -- CALL FAILED
[[IBM][CLI Driver][DB2] SQL0904N Unsuccessful execution caused by an
unavailable resource. Reason code: "00C90305", type of resource:
"00000100", and resource name: "DSNDB07". SQLSTATE=57011
Any idea on why the "DISTINCT" keyword would cause this, and if there's a way around it to get distinct records from the table?
SQL0904N with Reason code: 00C90305 indicates the following:
The limit on the space usage of the work file storage by an agent was
exceeded. The space usage limit is determined by the zparm keyword
MAXTEMPS.
By adding the DISTINCT clause on a SELECT * (all columns), you likely exceeded the work space available.
Let me ask a better question: Why would you want to DISTINCT all columns from a Table? Is this really the result set you are looking for? Would it be more appropriate to DISTINCT a subset of the columns in this table?
The query without the DISTINCT did not require duplicate removal - rows could just be streamed back to the caller.
The DISTINCT tells Db2 to remove duplicates before passing back the rows. In this case, Db2 likely materialized the rows into sort work and sorted them to remove duplicates, and during that process the sort work limits were exceeded.
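The advice about narrowing the DISTINCT can be sketched as follows, using SQLite as a stand-in with a made-up table (the DB2 work-file limits themselves are not reproduced here): deduplicating only the columns you care about gives the sort far less data to materialize.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("EAST", "A", 10.0),
    ("EAST", "A", 12.5),   # same region/product, different amount
    ("WEST", "B", 7.0),
])

# DISTINCT over all columns must compare every column of every row,
# and here the differing amounts keep all three rows "distinct"
all_cols = conn.execute("SELECT DISTINCT * FROM sales").fetchall()

# DISTINCT over just the columns you need is both cheaper and
# more likely the result set you actually want
pairs = conn.execute("SELECT DISTINCT region, product FROM sales").fetchall()

print(len(all_cols), len(pairs))  # 3 2
```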

How to add a LIMIT to a user-generated sql query

I have built a query editor where a user can enter a query. However, it needs to limit the user's entry to 1000 results; otherwise the user could enter something like:
SELECT * FROM mybigtable
and try to download 1 billion results.
What would be the best way to enforce a limit? The first approach I thought of was to do:
SELECT * FROM (
user-query
) x LIMIT 1000
However, this would execute the entire query (and could take forever) before doing the actual limit. What would be the best way to enforce a strict limit on the user's sql input?
I don't think there is a generic solution for this.
Wrapping the user query in a SELECT * FROM ... LIMIT 1000 statement is attractive, but:
there are edge cases where it can produce invalid SQL, for example if the user query contains a CTE (the WITH clause must be placed at the very beginning of the query);
while it will happily limit the number of rows returned to the user, it will not prevent the database from scanning the entire resultset.
The typical solution for the second issue is to filter rows on an auto-incremented integer column (usually the primary key of your table). But that's even harder to make generic.
To make it short: manipulating SQL at a distance is tricky. If you want a complete solution for your use case, get yourself a real query builder (or at least an SQL parser).
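A minimal sketch of the wrapping approach, using SQLite as a stand-in (the user query string is of course hypothetical). It caps what is returned, but note the caveats from above: the inner query may still scan the whole table, and the naive string wrap can break on queries that must start with WITH on some engines.

```python
import sqlite3

LIMIT = 1000

def capped(user_query: str) -> str:
    # Naive wrap: fine for plain SELECTs, but can produce invalid SQL
    # for CTE queries on engines that require WITH at the very start,
    # and it does not stop the inner query from scanning everything.
    return f"SELECT * FROM ({user_query}) AS x LIMIT {LIMIT}"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mybigtable (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO mybigtable VALUES (?)",
                 [(i,) for i in range(5000)])

user_query = "SELECT * FROM mybigtable"   # hypothetical user input
rows = conn.execute(capped(user_query)).fetchall()
print(len(rows))  # 1000
```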

Should I use the sql COUNT(*) or use SELECT to fetch rows and count the rows later

I am writing a NodeJs application which should be very lightweight for the MySQL db (engine: InnoDB).
I am trying to count the number of records in a table in the MySQL db.
So I was wondering whether I should use the COUNT(*) function or get all the rows with a SELECT query and then count them in JavaScript.
Which way is better with respect to:
DB operation cost
Overall performance
Definitely use the count() function - unless you also need the data within the records for some other purpose.
If you query all rows, then on MySQL side the server has to prepare a resultset (memory consumption, time to fetch data into resultset), then push it down through the connection to your application (more data takes more time), your application has to receive the data (again, memory consumption and time to create the resultset), and finally your application has to count the number of records in the resultset.
If you use count(), MySQL counts records and returns just a single number.
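The difference can be sketched like this (SQLite stand-in; the table is made up). Both approaches yield the same number, but count(*) moves a single integer across the connection instead of every row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO records VALUES (?, ?)",
                 [(i, "x" * 100) for i in range(10_000)])

# Option 1: let the database count - a single-row, single-column result
(db_count,) = conn.execute("SELECT COUNT(*) FROM records").fetchone()

# Option 2: drag every row to the client and count there
all_rows = conn.execute("SELECT * FROM records").fetchall()
client_count = len(all_rows)

print(db_count, client_count)  # 10000 10000
```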
count() is clearly better than fetching and counting separately, since count() can read the total from an index (such as the primary key), while fetching all the data takes far more time (disk I/O and network operations).
When getting information from a database, the usual best approach is to get what you need and nothing more. This includes things like selecting specific columns rather than select *, and aggregating at the DBMS rather than in your client code. In this case, since all you apparently need is a count, use count().
It's a good bet that will outperform any other attempted solution since:
you'll be sending only what's absolutely necessary over the network (this may be less important for local databases but, once you have your data elsewhere, it can have a real impact); and
the DBMS will almost certainly be optimised for that use case.
Do a count(FIELD_NAME), as it will be much faster than fetching all rows: it returns only the count, which can usually be read straight from an index. (Note that count(column) counts only non-NULL values in that column, unlike count(*).)

Why shouldn't we use Select * in a mysql query on a production server?

Based on this question, Selecting NOT NULL columns from a table, one of the posters said:
you shouldn't use SELECT * in production.
My Question: Is it true that we shouldn't use Select * in a mysql query on a production server? If yes, why shouldn't we use select all?
Most people do advise against using SELECT * in production, because it tends to break things. There are a few exceptions though.
SELECT * fetches all columns, while most of the time you don't need them all. This causes the SQL server to send more columns than needed, which is a waste and makes the system slower.
With SELECT *, when you later add a column, the old query will also select this new column, while typically it will not need it. Naming the columns explicitly prevents this.
Most people that write SELECT * queries also tend to grab the rows and use column order to get the columns - which WILL break your code once columns are injected between existing columns.
Explicitly naming the columns also guarantees they are always in the same order, while SELECT * might behave differently when the table column order is modified.
But there are exceptions, for example statements like these:
INSERT INTO table_history
SELECT * FROM table
A query like that takes rows from table and inserts them into table_history. If you want this query to keep working when new columns are added to table AND to table_history, SELECT * is the way to go.
Remember that your database server isn't necessarily on the same machine as the program querying the database. The database server could be on a network with limited bandwidth; it could even be halfway across the world.
If you really do need every column, then by all means do SELECT * FROM table.
If you only need certain columns, though, it would waste bandwidth to ask for all columns using SELECT * FROM table only to throw half the columns away.
Other potential reasons it might be good to specify which exact columns you want:
The database structure may change. If your program assumes certain column names, then it may fail if the column names change, for example. Explicitly naming the columns you want to retrieve will make the program fail immediately if your assumptions about the column names are violated.
As @Konerak mentioned, naming the columns you want also ensures that the order of the columns in your result is the same, even if the table schema changes (i.e. inserting one column in between two others). This is important if you're depending on FirstName being the 2nd element of a result.
(Note: a more robust and self-documenting way of dealing with this is to ask for your database results as a list of key-value pairs, like a PHP associative array, Perl hash or a Python dict. That way you never need to use a number to index into the result (name = result[2]); instead you can use the column name: name = result["FirstName"].)
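The positional-index hazard can be sketched like this (SQLite stand-in, made-up schema): the same query text yields a different value at the same index once a column is inserted, while access by name keeps working. The before/after schemas are simulated with two separate tables, since this is just an illustration.

```python
import sqlite3

def fetch_row(conn):
    conn.row_factory = sqlite3.Row   # enables access by column name
    return conn.execute("SELECT * FROM people").fetchone()

# Original schema: FirstName sits at column index 1
old = sqlite3.connect(":memory:")
old.execute("CREATE TABLE people (id INTEGER, FirstName TEXT)")
old.execute("INSERT INTO people VALUES (1, 'Ada')")

# Someone later inserts a column before FirstName
new = sqlite3.connect(":memory:")
new.execute("CREATE TABLE people (id INTEGER, Title TEXT, FirstName TEXT)")
new.execute("INSERT INTO people VALUES (1, 'Ms', 'Ada')")

row_old, row_new = fetch_row(old), fetch_row(new)
print(row_old[1], row_new[1])                      # Ada Ms  <- index shifted
print(row_old["FirstName"], row_new["FirstName"])  # Ada Ada <- name still works
```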
Using SELECT * is very inefficient, especially for tables that have a lot of columns. You should only select the columns you need.
Besides this, using column names makes the query easier to read and maintain.

MySql queries: really never use SELECT *?

I'm a self-taught developer and I've always been told not to use SELECT *, but most of my queries require knowing all the values of a certain row...
What should I use then? Should I list ALL of the columns every time, like SELECT elem1, elem2, elem3, ..., elem15 FROM ...?
If you really need all the columns and you're fetching the results by name, I would go ahead and use SELECT *. If you're fetching the row results by index, then it makes sense to specify the column names or else they might not be in the order you expect (especially if the table schema changes).
SELECT * FROM ... is not always the best way to go unless you need all columns. For example, if a table has 10 columns and you only need 2-3 of them, and those columns are indexed, then SELECT * will run slower because the server must fetch all rows from the datafiles. If instead you select only the 2-3 columns you actually need, the server can run the query much faster when the rows can be fetched from a covering index. A covering index is one that can be used to return results without reading the datafile.
So, use SELECT * only when you actually need all columns.
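The covering-index effect is visible in the query plan. A sketch using SQLite's EXPLAIN QUERY PLAN as a stand-in (SQLite reports "COVERING INDEX" when the index alone can answer the query; MySQL's EXPLAIN shows "Using index" in the same situation):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c TEXT)")
conn.execute("CREATE INDEX idx_ab ON t (a, b)")

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the detail text
    return " ".join(r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Only indexed columns requested: the index alone can answer the query
narrow = plan("SELECT a, b FROM t WHERE a = 1")

# SELECT * also needs column c, so the table rows must be visited
wide = plan("SELECT * FROM t WHERE a = 1")

print(narrow)  # ... USING COVERING INDEX idx_ab ...
print(wide)    # ... USING INDEX idx_ab ... (no COVERING)
```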
If you absolutely have to use *, try to limit it to a specific table; e.g.:
SELECT t.*
FROM mytable t
List only the columns that you need, ideally with a table alias:
SELECT t.elem1,
t.elem2
FROM YOUR_TABLE t
The presence of a table alias helps demonstrate what is a column (and where it's from) vs a derived column.
If you are positive that you will always need all the columns, then SELECT * should be OK. But my reasoning for avoiding it is: say another developer adds another column to the table which isn't required by your query - then there is overhead. This gets worse as more columns are added.
The only real performance hit you take from using select * is in the bandwidth required to send back extra columns in your result set, if they're not necessary. Other than that, there's nothing inherently "bad" about using select *.
You might SELECT * from a subquery.
Yes, SELECT * is bad. You do not state what language you will be using to process the returned data. Suppose you receive these records back as an array (not a hash map). In this case, what is in Row[12]? Maybe it was ZipCode when you wrote the app, but guess what happens when someone inserts a field SuiteNumber before ZipCode.
Or suppose the next coder appends a huge blob field to each record. Ouch!
Or a little more subtle: let's assume you're doing a join or subselect, and you have no Text or Blob type fields. MySQL will create any temp files it needs in memory. But as soon as you include a Text field (even TinyText), MySQL will need to create the temp file on disk, and call sort-merge. This won't break the program, but it can kill performance.
Select * is sacrificing maintainability in order to save a little typing.