Go-MySQL-Driver: Prepared Statements with Variable Query Parameters

I'd like to use prepared statements with MySQL on my Go server, but I'm not sure how to make it work with an unknown number of parameters. One endpoint allows users to send an array of id's, and Go will SELECT the objects from the database matching the given id's. This array could contain anywhere from 1 to 20 id's, so how would I construct a prepared statement to handle that? All the examples I've seen require you to know exactly the number of query parameters.
The only (very unlikely) option I can think of is to prepare 20 different SELECT statements and use the one that matches the number of id's the user submits - but this seems like a terrible hack. Would I even see the performance benefits of prepared statements at that point?
I'm pretty stuck here, so any help would be appreciated!

No RDBMS I'm aware of is able to bind an unknown number of parameters. It is never possible to match an array against an unknown number of parameter placeholders. That means there is no smart way to bind an array to a query such as:
SELECT xxx FROM xxx WHERE xxx in (?,...,?)
This is not a limitation of the client driver, this is simply not supported by database servers.
There are various workarounds.
You can create the query with 20 ?, bind the values you have, and complete the binding with NULL values. It works because of the particular semantics of comparison operations involving NULL values. A condition like "field = ?" never evaluates to true when the parameter is bound to a NULL value, so no extra rows match. Supposing you have 5 values in your array, the database server will have to deal with the 5 provided values plus 15 NULL values. It is usually smart enough to just ignore the NULL values.
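For the Go question above, a minimal sketch of that NULL-padding workaround with database/sql (the items table and its columns are made-up names used only for illustration):

package store

import (
	"database/sql"
	"fmt"
	"strings"
)

const maxIDs = 20

// selectByIDs pads the bind list with NULLs up to maxIDs placeholders,
// so a single statement shape covers anything from 1 to 20 ids.
func selectByIDs(db *sql.DB, ids []int64) (*sql.Rows, error) {
	if len(ids) == 0 || len(ids) > maxIDs {
		return nil, fmt.Errorf("expected 1..%d ids, got %d", maxIDs, len(ids))
	}
	query := "SELECT id, name FROM items WHERE id IN (?" + strings.Repeat(",?", maxIDs-1) + ")"
	args := make([]interface{}, maxIDs)
	for i := range args {
		if i < len(ids) {
			args[i] = ids[i]
		} else {
			args[i] = nil // bound as NULL; "id = NULL" never matches a row
		}
	}
	return db.Query(query, args...)
}

Since the statement text never changes, it could also be prepared once with db.Prepare and reused.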
An alternative solution is to prepare all the queries (each one with a different number of parameters). It is only interesting if the maximum number of parameters is limited. It works well on databases for which prepared statements really matter (such as Oracle).
As far as MySQL is concerned, the gain from using a prepared statement is quite limited. Keep in mind that prepared statements are only maintained per session; they are not shared across sessions. If you have a lot of sessions, they take memory. On the other hand, parsing statements with MySQL does not involve much overhead (contrary to some other database systems). Generally, generating plenty of prepared statements to cover a single query is not worth it.
Note that some MySQL drivers offer a prepared statement interface while not using the prepared statement capability of the MySQL protocol internally (again, because it is often not worth it).
There are also some other solutions (like relying on a temporary table), but they are only interesting if the number of parameters is significant.

Related

Optimization: WHERE x IN (1, 2 .., 100.000) vs INNER JOIN tmp_table USING(x)?

I went to an interesting job interview recently. There I was asked a question about optimizing a query with a WHERE..IN clause containing a long list of scalars (thousands of values, that is). This question is NOT about subqueries in the IN clause, but about a simple list of scalars.
I answered right away that this can be optimized using an INNER JOIN with another table (possibly a temporary one) that contains only those scalars. My answer was accepted, and the interviewer added a note that "no database engine currently can optimize long WHERE..IN conditions to be performant enough". I nodded.
But when I walked out, I started to have some doubts. The construct seemed too trivial and too widely used for a modern RDBMS not to be able to optimize it. So, I started some digging.
PostgreSQL:
It seems that PostgreSQL parses scalar IN() constructs into a ScalarArrayOpExpr structure, which is sorted. This structure is later used during an index scan to locate matching rows. EXPLAIN ANALYZE for such queries shows only one loop; no joins are done. So I would expect such a query to be even faster than an INNER JOIN. I tried some queries on my existing database, and my tests supported that position. But I didn't take much care about test purity, and that Postgres instance was running under Vagrant, so I might be wrong.
MSSQL Server:
MSSQL Server builds a hash structure from the list of constant expressions and then does a hash join with the source table. Even though no sorting seems to be done, that should perform comparably, I think. I didn't do any tests since I don't have any experience with this RDBMS.
MySQL Server:
Slide 13 of these slides says that before 5.0 this problem did indeed occur in MySQL in some cases. Other than that, I didn't find any other problems related to bad IN() handling. Unfortunately, I didn't find any proof of the opposite either. If you did, please kick me.
SQLite:
The documentation page hints at some problems, but I tend to believe the things described there are really at the conceptual level. No other information was found.
So, I'm starting to think I misunderstood my interviewer or misused Google ;) Or maybe it's because we didn't set any conditions and our talk became a little vague (we didn't specify any concrete RDBMS or other constraints; it was just abstract talk).
It looks like the days when databases rewrote IN() as a set of OR conditions (which, by the way, can sometimes cause problems with NULL values in the list) are long gone. Or not?
Of course, in cases where the list of scalars is longer than the maximum allowed database protocol packet, an INNER JOIN might be the only solution available.
I think in some cases query parsing time (if it was not prepared) alone can kill performance.
Also, a database could be unable to prepare an IN(?) query, which would lead to reparsing it again and again (which may kill performance). Actually, I never tried, but I think that even in such cases query parsing and planning is not huge compared to query execution.
But other than that, I don't see any other problems. Well, other than the problem of just HAVING this problem: if you have queries that contain thousands of IDs, something is wrong with your architecture.
Do you?
Your answer is only correct if you build an index (preferably a primary key index) on the list, unless the list is really small.
Any description of the optimization is definitely database specific. However, MySQL is quite specific about how it optimizes IN:
Returns 1 if expr is equal to any of the values in the IN list, else returns 0. If all values are constants, they are evaluated according to the type of expr and sorted. The search for the item then is done using a binary search. This means IN is very quick if the IN value list consists entirely of constants.
This would definitely be a case where using IN would be faster than using another table -- and probably faster than another table using a primary key index.
I think that SQL Server replaces the IN with a list of ORs. These would then be implemented as sequential comparisons. Note that sequential comparisons can be faster than a binary search, if some elements are much more common than others and those appear first in the list.
I think it is bad application design. The values used with the IN operator are most probably not hardcoded but dynamic. In such cases we should always use prepared statements, the only reliable mechanism to prevent SQL injection.
In this case it results in dynamically formatting the prepared statement (as the number of placeholders is dynamic too), and it also results in excessive hard parsing (as many unique queries as there are different numbers of IN values: IN (?), IN (?,?), ...).
I would either load these values into a table and use a join as you mentioned (unless loading is too much overhead), or use an Oracle pipelined function IN foo(params), where the params argument can be a complex structure (array) coming from memory (PL/SQL, Java, etc.).
If the number of values is larger, I would consider using EXISTS (select 1 from mytable m where m.key = x.key) or EXISTS (select x from foo(params)) instead of IN. In such cases EXISTS provides better performance than IN.
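For illustration, a rough sketch of the "load the values into a (temporary) table and join" idea, written here in MySQL terms with Go's database/sql (the client from the opening question); every table and column name is made up:

package search

import "database/sql"

// namesByTempTable loads the id list into a temporary table and joins against it
// instead of generating a long IN (...) list. A transaction pins one connection,
// which matters because MySQL temporary tables are only visible on the connection
// that created them; the work is read-only, so rolling back on the way out is fine.
func namesByTempTable(db *sql.DB, ids []int64) ([]string, error) {
	tx, err := db.Begin()
	if err != nil {
		return nil, err
	}
	defer tx.Rollback()

	if _, err := tx.Exec("CREATE TEMPORARY TABLE tmp_ids (id BIGINT PRIMARY KEY)"); err != nil {
		return nil, err
	}
	defer tx.Exec("DROP TEMPORARY TABLE tmp_ids")

	for _, id := range ids {
		if _, err := tx.Exec("INSERT INTO tmp_ids (id) VALUES (?)", id); err != nil {
			return nil, err
		}
	}

	rows, err := tx.Query("SELECT i.name FROM items i JOIN tmp_ids t ON t.id = i.id")
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	var names []string
	for rows.Next() {
		var name string
		if err := rows.Scan(&name); err != nil {
			return nil, err
		}
		names = append(names, name)
	}
	return names, rows.Err()
}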

performance of dynamic cursors in stored procedures

I would like to know the best practice for good performance when you have a SELECT query that can take any combination of a large number (20+) of parameters in the WHERE clause, passed to a stored procedure.
Let's say I have a query that should return a list of people and their addresses (maybe more than one address per person). The user wants to search by any possible combination of fields from the person/address tables. The search could be on one field, all 20, or anything in between.
The way I currently handle this is by creating one cursor like this
(for simplicity I am listing only 2 parameters, a varchar and an int):
create procedure dynasp (
    in in_name varchar(40),
    in in_age int
    ..... rest of parameters here...
)
begin
    declare cs cursor for
        select .... from person join address ....
        where
            (in_age = 0 or in_age = person_age) and
            (in_name is null or rtrim(in_name) = '' or in_name = person.name)
            and ...
I believe that since the value of an input variable is constant, the query should not evaluate it on each row, or does it?
The other option I use is a dynamic cursor built from a string inside the SP. That way it contains only the fields that are not empty in the WHERE clause, but I believe this means the SQL needs to be constructed and recompiled on every call to the SP.
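The same "only include the filters that were actually supplied" idea can also be sketched at the application level rather than inside the procedure; here is a hedged Go sketch with every value kept behind a placeholder (table, column, and parameter names are made up):

package search

import (
	"database/sql"
	"strings"
)

// peopleSearch appends a condition only for the filters that were actually
// supplied, so the generated WHERE clause stays short and all values stay
// parameterized.
func peopleSearch(db *sql.DB, name string, age int) (*sql.Rows, error) {
	var conds []string
	var args []interface{}
	if strings.TrimSpace(name) != "" {
		conds = append(conds, "person.name = ?")
		args = append(args, name)
	}
	if age != 0 {
		conds = append(conds, "person.age = ?")
		args = append(args, age)
	}
	query := "SELECT person.name, address.city FROM person JOIN address ON address.person_id = person.id"
	if len(conds) > 0 {
		query += " WHERE " + strings.Join(conds, " AND ")
	}
	return db.Query(query, args...)
}

Each distinct combination of filters still produces a distinct statement to parse, which is the recompilation cost mentioned above.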
My question is: as a best practice, which of the two methods above is more recommended, and is there any better way than the two mentioned above?
Thank you
The question of performance basically hinges on one simple thing: does your table have any indexes that you intend to use to improve the performance of the query?
If indexes aren't an issue, then your approach is fine. Well, let me add: assuming a cursor is necessary for the additional processing that you are doing. If you can just return the result set and do set-based processing, that is superior to using cursors.
If indexes are an issue, then a long complex where statement with a bunch of constant expressions might confuse the MySQL compiler. The documentation on using indexes for where clauses is here. MySQL definitely removes constant expressions. However, in a very complex expression, I'm not sure how well this interacts with choosing the right index. (I am assuming you are using MySQL based on the syntax).
For this latter case, a dynamic cursor would be beneficial, because it would encourage MySQL to choose an execution plan that uses indexes.
So, if you are not using indexing (or partitioning), then your current approach is fine. If you are, look at the execution plan for your queries. If they use the appropriate indexes, then your current approach is fine. If they are not, consider dynamic cursors.
It depends on the size of the tables you're searching. If you have 20+ criteria in the where clause, all containing OR operators, the query optimizer will not be able to choose a good index to use and will likely just scan the entire table(s). For small tables this won't matter, but for very large tables it will be slow.
The other alternative, constructing a dynamic query, will incur some overhead in parsing and choosing a query plan, but when executed, the query will likely be more efficient. (Make sure you're protecting against SQL injection vulnerabilities.)
So the best practice is to benchmark both and see what's best in your situation.

Codeigniter Complex MySQL Query - Removing Backticks - is it a Security Issue?

I'm trying to build a MySQL query to return appropriate search results by examining several different database fields. For example, if a user searches "plumber leeds" and a business has 'leeds' in the 'city' field and the word 'plumber' as part of its name, I would want that result to be returned.
User searches could contain several words and are unpredictable. I'm currently achieving what I need by exploding the search term, trimming it and using it to compile a complex search query to return all relevant results.
I'm storing this complex query in a variable and using Codeigniter's Active Record to run the query.
$this->db->where($compiled_query, null, false);
What I'm concerned about is that I'm not protecting the query with backticks, and I'm unsure if this is a security issue. I have XSS Clean enabled, but I'm still not sure if this is OK.
According to CI's user manual:
$this->db->where() accepts an optional third parameter. If you set it to FALSE, CodeIgniter will not try to protect your field or table names with backticks.
Source: http://ellislab.com/codeigniter/user-guide/database/active_record.html
There is some info about how I compile the query in a separate question (linked below). I'm aware mysql_real_escape_string is about to be deprecated and isn't a catch-all, hence part of my concern about this method.
https://stackoverflow.com/questions/13321642/codeigniter-active-record-sql-query-of-multiple-words-vs-multiple-database-fi
Any help appreciated
Backticks have nothing to do with security. They are really just a way to "stringify" your field and table names, so that you could use a field called datatype, for example, and not have it conflict with MySQL keywords.
You are safe
I wouldn't say you're absolutely "safe", because you're never technically safe if you accept user input in a SQL query (even if you've manipulated it... when there's a will, there's a way).
Once you relinquish control over what is given to your application, you must be very careful how you deal with that data so that you don't open yourself up to an injection attack.
XSS Clean will help with POST or cookie data -- it does not run automatically on GET variables. I would manually run $data = $this->security->xss_clean($data); on the input if it's from the GET array.

What databases could run the following SQL?

I have constructed a query and I'm wondering if it would work on any database besides MySQL. I have never actually used another database so I'm not great with the differences.
UPDATE `locks` AS `l1`
CROSS JOIN (SELECT SUM(`value`) AS `sum` FROM `locks`
WHERE `key` IN ("key3","key2")) AS `l2`
SET `l1`.`value` = `l1`.`value` + 1
WHERE `l1`.`key` = "key1" AND (`l2`.`sum` < 1);
Here are the specific features I'm relying on (as I can think of them):
Update queries.
Joins in update queries.
Aggregate functions in non-explicitly-grouped queries.
WHERE...IN condition.
I'm sure people will be curious exactly what this does, and this may also include database features that might not be ubiquitous. This is an implementation of mutual exclusion using a database, intended for a web application. In my case I needed it because certain user actions cause tables to be dropped and recreated with different columns, and I want to avoid errors if other parts of the application try to insert data. The implementation, therefore, is specialized to solve the readers-writers problem.
This query assumes there exists a table locks with two fields: key (varchar) and value (int). It further assumes that the table contains a row such that key="key1". Then it tries to increment the value for "key1". It only does so if for every key in the list ("key2","key3"), the associated value is 0 (the WHERE condition for l2 is an approximation that assumes value is never negative). Therefore this query only "obtains a lock" if certain conditions are met, presumably in an atomic fashion. Then, the application checks if it received a lock by the return value of the query which presumably states how many rows were affected. If and only if no rows were affected, the application did not receive a lock.
So, here are the additional conditions not discernable from the query itself:
Assumes that in a multi-threaded environment, a copy of this query will never be interleaved with another copy.
Processing the query must return whether any values were affected.
As a secondary request, I would appreciate any resources on "standard SQL." I've heard about it but never been able to find any kind of definition, and I feel like I'm missing a lot of things when the MySQL documentation says "this feature is an extension of standard SQL."
Based on the responses, this query should work better across all systems:
UPDATE locks AS l1
CROSS JOIN (SELECT SUM(val) AS others FROM locks
WHERE keyname IN ('key3','key2')) AS l2
SET l1.val = l1.val + 1
WHERE l1.keyname = 'key1' AND (l2.others < 1);
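Given that revised statement, the "did I obtain the lock?" check from application code is just a matter of reading the affected-row count; a sketch in Go's database/sql, purely as an illustration (the question does not name a client language):

package locks

import "database/sql"

// tryLock runs the UPDATE from the question and reports whether any row was
// changed, i.e. whether the lock was obtained.
func tryLock(db *sql.DB) (bool, error) {
	res, err := db.Exec(`
		UPDATE locks AS l1
		CROSS JOIN (SELECT SUM(val) AS others FROM locks
		            WHERE keyname IN ('key3','key2')) AS l2
		SET l1.val = l1.val + 1
		WHERE l1.keyname = 'key1' AND (l2.others < 1)`)
	if err != nil {
		return false, err
	}
	n, err := res.RowsAffected()
	if err != nil {
		return false, err
	}
	return n > 0, nil
}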
Upvotes for everyone because of the good answers. The marked answer seeks to directly answer my question, even if just for one other DBMS, and even though there may be better solutions to my particular problem (or even the problem of cross-platform SQL in general).
This exact syntax would only work in MySQL.
It's an ugly workaround for this construct:
UPDATE locks
SET value = 1
WHERE key = 'key1'
AND NOT EXISTS
(
SELECT NULL
FROM locks li
WHERE li.key IN ('key2', 'key3')
AND li.value > 0
)
which works in all systems except MySQL, because the latter does not allow subqueries on the target table in UPDATE or DELETE statements.
For PostgreSQL
1) Update queries.
Can't imagine an RDBMS that has no UPDATE. (?)
2) Joins in update queries.
In PostgreSQL you would include additional tables with FROM from_list.
3) Aggregate functions in non-grouped queries.
Not possible in PostgreSQL. Use subqueries, CTE or Window functions for that.
But your query is grouped. The GROUP BY clause is just not spelled out. That works in PostgreSQL, too.
The presence of HAVING turns a query into a grouped query even if there is no GROUP BY clause. This is the same as what happens when the query contains aggregate functions but no GROUP BY clause.
(Quote from the manual).
4) WHERE...IN condition
Works in any RDBMS I know of.
"Additional conditions": Assumes that in a multi-threaded environment, a copy of this query will never be interleaved with another copy.
PostgreSQL's multiversion model MVCC (Multiversion Concurrency Control) is superior to MySQL for handling concurrency. Then again, most RDBMS are superior to MySQL in this respect.
Processing the query must return whether any values were affected.
Postgres does that, most every RDBMS does.
Furthermore, this query wouldn't run in PostgreSQL because:
no identifiers with backticks (that's MySQL slang).
values need to be single-quoted, not double-quoted.
See the list of reserved words in Postgres and SQL standards.
A combined list for various RDBMS.
This will only work in MySQL, simply because you use the "`" delimiter, which is MySQL-specific.
What if you replaced the delimiter with a more "standard" one? Then it would probably work in all modern DBMSs (Postgres, SQL Server, Oracle), but I would never write one general query for all of them. I would rather write a specific query for each DBMS that is used (or potentially used), in order to use its specific language dialect and get the best performance and query readability.
What about "As a secondary request, I would appreciate any resources on "standard SQL."" --- get a look at http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt

Microsoft Access and paging large datasets

Is there an easy way to page large datasets using the Access database via straight SQL? Let's say my query would normally return 100 rows, but I want the query to page through the results so that it only retrieves (let's say) the first 10 rows. Not until I request the next 10 rows would it query for rows 11-20.
If you run a ranking query, you will get a column containing ascending numbers in your output. You can then run a query against this column using a BETWEEN...AND clause to perform your paging.
So, for example, if your pages each contain 10 records and you want the third page, you would do (assuming the ranking starts at 1):
SELECT * FROM MyRankingQuery WHERE MyAscendingField BETWEEN 21 AND 30
How to Rank Records Within a Query
Microsoft support KB208946
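For reference, a sketch of what such a ranking query could look like (the correlated-COUNT technique from KB208946), kept here as a query string; the People table and ID column are made-up names:

package paging

// myRankingQuerySQL sketches the saved "MyRankingQuery" from the answer above:
// a correlated COUNT(*) produces the ascending rank used for BETWEEN paging.
const myRankingQuerySQL = `
SELECT p1.*,
       (SELECT COUNT(*) FROM People AS p2 WHERE p2.ID <= p1.ID) AS MyAscendingField
FROM People AS p1;`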
The Access Database Engine doesn’t handle this very well: the proprietary TOP N syntax returns ties and the N cannot be parameterized; the optimizer doesn't handle the equivalent subquery construct very well at all :(
But, to be fair, this is something SQL in general doesn't handle very well. This is one of the few scenarios where I would consider dynamic SQL (shudder). But first I would consider using an ADO classic recordset, which has properties for AbsolutePage, PageCount, and PageSize (which, incidentally, the DAO libraries lack).
You could also consider using the Access Database Engine's little-known LIMIT TO nn ROWS syntax. From the Access 2003 help:
You may want to use ANSI-92 SQL for the following reasons...
... Using the LIMIT TO nn ROWS clause to limit the number of rows returned by a query
Could come in handy?
... my tongue is firmly embedded in my cheek :) This syntax doesn't exist in the Access Database Engine and never has. Instead, it's yet another example of the appalling state of the Access documentation on the engine side of the house.
Is the product fit for purpose when the documentation has massive holes and its content cannot be trusted? Caveat emptor.
I'm not certain how ranking answers your question. Also, I'm having trouble imagining why you would need this -- this is usually something you do on a website in order to break down the data retrieved into small chunks. But a Jet/ACE database is not a very good candidate for a website back end, unless it's strictly read-only.
One other SQL solution would use nested TOP N, but usually requires on-the-fly procedural code to write the SQL.
It also has the problem with ties, in that unless you include a unique field in your ORDER BY, you can get 11 records with a TOP 10 should two records have a tie on the values in the ORDER BY clause.
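A sketch of that on-the-fly construction, assuming a 1-based page number and a unique ID column in the ORDER BY to sidestep the tie problem (table and column names are made up):

package paging

import "fmt"

// topNPageSQL builds the nested TOP N paging statement for the Access Database
// Engine: take the first page*size rows, reverse-order the last "size" of them,
// then restore the original order.
func topNPageSQL(page, size int) string {
	return fmt.Sprintf(`
SELECT t3.* FROM (
    SELECT TOP %d t2.* FROM (
        SELECT TOP %d t1.* FROM People AS t1 ORDER BY t1.ID ASC
    ) AS t2 ORDER BY t2.ID DESC
) AS t3 ORDER BY t3.ID ASC;`, size, page*size)
}

Note that on the last page this returns the final "size" rows, which can overlap the previous page.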
I'm not suggesting this is a better solution, just a different one.