How is this MySQL query vulnerable to SQL injection? - mysql

In a comment on a previous question, someone said that the following sql statement opens me up to sql injection:
select
ss.*,
se.name as engine,
ss.last_run_at + interval ss.refresh_frequency day as next_run_at,
se.logo_name
from
searches ss join search_engines se on ss.engine_id = se.id
where
ss.user_id='.$user_id.'
group by ss.id
order by ss.project_id, ss.domain, ss.keywords
Assuming that the $user_id variable is properly escaped, how does this make me vulnerable, and what can I do to fix it?

Every SQL interface library worth using has some kind of support for binding parameters. Don't try to be clever, just use it.
You may really, really think/hope you've escaped stuff properly, but it's just not worth the risk of the one time you haven't.
Also, several databases support prepared statement caching, so doing it right can also bring you efficiency gains.
Easier, safer, faster.

Assuming it is properly escaped, it doesn't make you vulnerable. The thing is that escaping properly is harder than it looks at first sight, and you condemn yourself to escaping properly every time you write a query like that. If possible, avoid all that trouble and use prepared statements (also called bound parameters or parameterized queries). The idea is to let the data access library handle the values properly.
For example, in PHP, using mysqli:
$db_connection = new mysqli("localhost", "user", "pass", "db");
$statement = $db_connection->prepare("SELECT thing FROM stuff WHERE id = ?");
$statement->bind_param("i", $user_id); // $user_id is an integer which goes in place of ?
$statement->execute();

If $user_id is escaped, then you should not be vulnerable to SQL Injection.
In this case, I would also ensure that the $user_id is numeric or an integer (depending on the exact type required). You should always limit the data to the most restrictive type you can.

If it is properly escaped and validated, then you don't have a problem.
The problem arises when it is not properly escaped or validated. This could occur by sloppy coding or an oversight.
The problem is not with particular instances, but with the pattern. This pattern makes SQL injection possible, while the other pattern makes it impossible.
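To make the difference between the two patterns concrete, here is a minimal sketch. It is written in Python with the standard-library sqlite3 module standing in for PHP and MySQL, and the table layout and function names are made up for illustration. The hostile value contains no quote characters at all, so a quote-escaping function would pass it through unchanged, yet the concatenated query still leaks every row.

```python
import sqlite3

# In-memory stand-in for the real database; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE searches (id INTEGER PRIMARY KEY, user_id INTEGER, keywords TEXT)"
)
conn.executemany(
    "INSERT INTO searches (user_id, keywords) VALUES (?, ?)",
    [(1, "alice one"), (1, "alice two"), (2, "bob secret")],
)


def searches_concatenated(user_id):
    # Vulnerable pattern: the value becomes part of the SQL text.
    sql = "SELECT keywords FROM searches WHERE user_id = " + str(user_id)
    return [row[0] for row in conn.execute(sql)]


def searches_bound(user_id):
    # Safe pattern: the value is passed separately from the SQL text.
    sql = "SELECT keywords FROM searches WHERE user_id = ?"
    return [row[0] for row in conn.execute(sql, (user_id,))]


hostile = "1 OR 1=1"  # no quotes, so quote-escaping would not change it

leaked = searches_concatenated(hostile)  # WHERE user_id = 1 OR 1=1
contained = searches_bound(hostile)      # compared as one literal value

print(len(leaked), contained)
```

With the bound version the hostile string is only ever treated as a value to compare against, never as SQL.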

All answers are good and right, but I feel I need to add that the prepare/execute paradigm is not the only solution, either. You should have a database abstraction layer, rather than using the library functions directly and such a layer is a good place to explicitly escape string parameters, whether you let prepare do it, or you do it yourself.

I think 'Properly Escaped' is the key phrase here. In your last question, I assumed that your code was copied and pasted from your production code, and since you were asking about a three-table join, I also assumed that you hadn't done proper escaping, hence my remark about SQL injection.
To answer your question, as so many people here have described, IF the variable has been 'Properly Escaped', then you have no problem. But why trouble yourself by doing that? As some people have pointed out, Properly Escaping is not always a straightforward thing to do. There are patterns and libraries in PHP that make SQL injection impossible, so why not just use them? (I am also deliberately assuming that your code is in fact PHP.) Vinko Vrsalovic's answer may give you ideas on how to approach this problem.

That statement as such isn't really a problem, it's "safe"; however, I don't know how you are building it (one level up on the API stack). If $user_id is being inserted into the statement using string operations (as if you are letting PHP automatically fill out the statement), then it's dangerous.
If it's being filled in using a binding API, then you're ready to go.

Related

What pattern to check on an SQL query for possible injection?

I want to detect possible SQL injection attacks by checking the SQL query. I am using PDO and prepared statements, so hopefully I am not in danger of getting attacked by someone. However, what I want to detect is the possibility of input/resulting query string that may become a dangerous query. For example, my app--properly--will never generate a "1=1" query, so I may check the generated query string for that, and flag the user/IP producing that query. Same thing with "drop table", but maybe I can check only by looping over the input array; or maybe I should just check the generated query all over again. I am using MySQL, but patterns for other drivers are also appreciated.
I have read RegEx to Detect SQL Injection and some of the comments are heading in this direction. It helps that I'm developing for users who rarely use English as input, so a simple /drop/ match on the query may be enough to log the user/query for further inspection. Some of the patterns I found while researching SQL injection are:
semicolon in the middle of sentence -- although this may be common
double dash/pound sign for commenting the rest of the query
using quote in the beginning & ending of value
using hex (my target users have small to low chance for inputting 0x in their form)
declare/exec/drop/1=1 (my app should not generate these values)
html tag (low probability coming from intended user/use case)
etc.
All of the above are easier to detect by looping the input values before the query string is generated because they haven't been escaped. But how much did I miss? (a lot, I guess) Any other obscure pattern I should check? What about checking the generated query? Any pattern that may emerge?
tl;dr: What pattern to match an SQL query (MySQL) to check for possible injection? I am using PDO with prepared statement and value binding, so the check is for logging/alert purposes.
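For the logging/alert purpose described above, the listed patterns could be checked against the raw input values roughly like this. It is only a sketch in Python: the pattern set, its names, and the `flag_input` helper are illustrative assumptions, it will produce false positives (a lone `#` or `;` is common in ordinary text), and it is in no way a defense against injection.

```python
import re

# Hypothetical pattern set taken from the list above; tune it for real traffic.
SUSPICIOUS_PATTERNS = {
    "comment marker": re.compile(r"--|#"),
    "statement separator": re.compile(r";"),
    "hex literal": re.compile(r"\b0x[0-9a-f]+", re.IGNORECASE),
    "sql keyword": re.compile(r"\b(declare|exec|drop)\b", re.IGNORECASE),
    "tautology": re.compile(r"\b1\s*=\s*1\b"),
    "html tag": re.compile(r"<[a-z/][^>]*>", re.IGNORECASE),
}


def flag_input(value):
    """Return the names of all suspicious patterns found in one raw input value."""
    return [
        name
        for name, pattern in SUSPICIOUS_PATTERNS.items()
        if pattern.search(value)
    ]


# Log-only usage: the query itself still goes through bound parameters.
for raw in ["joe", "x'; DROP TABLE users; --", "1 OR 1=1"]:
    hits = flag_input(raw)
    if hits:
        print("flagged", repr(raw), hits)
```

Checking the raw values before binding keeps the check independent of how the driver builds the final query.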
In my shop we have two rules.
1. Always use parameters in SQL queries.
2. If for some reason you can't follow rule one, then every piece of data put into a query must be sanitized, either with intval() for integer parameters or an appropriate function to sanitize a string variable according to its application data type. For example, a personal name might be Jones or O'Brien or St. John-Smythe but will never have special characters other than apostrophe ', hyphen -, space, or dot. A product number probably contains only letters or numbers. And so forth.
If rule 2 is too hard, follow rule 1.
We inspect code to make sure we're doing these things.
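Sanitizing by application data type, as in rule 2 above, could be sketched like this in Python. The exact character sets are assumptions lifted from the examples in the rule (ASCII letters only, which real names will not always satisfy), and the helper names are hypothetical.

```python
import re

# Assumed rules: names are letters plus apostrophe, hyphen, space, or dot;
# product numbers are letters and digits only.
NAME_RE = re.compile(r"[A-Za-z][A-Za-z' .\-]*")
PRODUCT_NUMBER_RE = re.compile(r"[A-Za-z0-9]+")


def clean_name(raw):
    """Return raw unchanged if it looks like a personal name, else raise."""
    if not NAME_RE.fullmatch(raw):
        raise ValueError(f"not a valid name: {raw!r}")
    return raw


def clean_product_number(raw):
    """Return raw unchanged if it is purely alphanumeric, else raise."""
    if not PRODUCT_NUMBER_RE.fullmatch(raw):
        raise ValueError(f"not a valid product number: {raw!r}")
    return raw


print(clean_name("St. John-Smythe"), clean_product_number("AB1234"))
```

Note that a name such as O'Brien passes validation and still contains an apostrophe, which is one more reason rule 1 comes first.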
But how much did I miss?
You guess right. Creating a huge blacklist wouldn't make your code immune. This approach is history. The other questions follow the same idea.
Your best bets are:
Validating input data (input doesn't necessarily come from an external party)
Using prepared statements.
Few steps but bulletproof.
Not possible.
You will spend the rest of your life in an arms race -- you build a defense, they build a better weapon, then you build a defense against that, etc, etc.
It is probably possible to write a 'simple' SELECT that will take 24 hours to run.
Unless you lock down the tables, they can look, for example, at the encrypted passwords and re-attack with a root login.
If you allow any type of string, it will be a challenge to handle the various combinations of quoting.
There are nasty things that can be done with semi-valid utf8 strings.
And what about SET statements.
And LOAD DATA.
And Stored procs.
Instead, decide on the minimal set of queries you allow, then parameterize that so you can check, or escape, the pieces individually. Then build the query.
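One way to read "decide on the minimal set of queries you allow" is a fixed registry of statements in which only the bound values vary. A rough sketch in Python with sqlite3 as a stand-in; the query names, the schema, and the `run_allowed` helper are all hypothetical.

```python
import sqlite3

# The only statements the application is allowed to run (hypothetical set).
ALLOWED_QUERIES = {
    "searches_for_user": (
        "SELECT keywords FROM searches WHERE user_id = ? ORDER BY keywords"
    ),
    "search_by_id": "SELECT keywords FROM searches WHERE id = ?",
}


def run_allowed(conn, name, params):
    """Run one of the predefined statements; callers supply values only."""
    sql = ALLOWED_QUERIES.get(name)
    if sql is None:
        raise ValueError(f"query not allowed: {name!r}")
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE searches (id INTEGER PRIMARY KEY, user_id INTEGER, keywords TEXT)"
)
conn.executemany(
    "INSERT INTO searches (user_id, keywords) VALUES (?, ?)",
    [(1, "beta"), (1, "alpha"), (2, "gamma")],
)

print(run_allowed(conn, "searches_for_user", (1,)))
```

Because callers never hand over SQL text, there is nothing for a pattern matcher to police.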

Why do people on SO prefer CASE WHEN to other alternatives?

I have noticed that on SO a lot of people seem to prefer CASE ... WHEN to other alternatives.
For example all of the answers in this question use CASE ... WHEN whereas I would have used a simple IF. IF is quite a bit less to type and is prevalent in all programming languages so it seems kind of weird to me that not a single answer uses it. (I would also expect that IF is a bit faster though I did not measure it).
Even more interesting are the answers to this question. 2 out of 3 answers (among them the accepted answer) suggest using CASE ... WHEN when from my point of view COALESCE is the better solution (after all COALESCE was created for exactly the problem the OP has). (Also, in this case I am almost certain that COALESCE would be faster.)
So, my question is, is there any benefit to CASE ... WHEN (that offsets the additional typing) that I am missing or is it a case of "To a man with a hammer, everything looks like a nail"?
One reason, a good one actually, is that a CASE WHEN expression is ANSI compliant while IF is not. Were someone to face porting a MySQL query to another database the IF calls in MySQL would probably all have to be rewritten.
MySQL, like most databases, extended ANSI by introducing the IF() function. Perhaps IF, or something similar to it, will become part of the standard some day.
CASE WHEN is in the SQL standard. IF is not. As SQL databases do have vastly different dialects, it is not the worst idea to stick to code that will work on most databases for the following reasons:
If you build the habit of using code that is specific to one database, you will have troubles when working on another.
If you use code that is specific to one database, you cannot test your query with other databases by simply copy pasting them. You can also not migrate your application to other databases without changing your SQL queries.
CASE WHEN is the ANSI standard expression for conditional expressions. IF() is a function specific to MySQL.
In general, I prefer ANSI standard functionality when available -- although there are occasional exceptions.
Specifically about IF() as a function. It is easily confused with IF as a statement in MySQL. Using it as a function seems like unnecessary confusion (admittedly, there are other databases where CASE can be confused with a CASE statement in the scripting language, but that is not an issue in MySQL).
In addition, IF() is pretty close to control flow, which makes it different from most other functions anyway.
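As a small illustration of the portability argument, the following sketch runs CASE WHEN and COALESCE unchanged on SQLite, an engine other than MySQL, through Python's standard sqlite3 module. The table and column names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT, stock INTEGER, nickname TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [("bolt", 0, None), ("nut", 12, "hex nut")],
)

# Both expressions are standard SQL, so the same text should also be
# accepted by MySQL and most other engines.
rows = conn.execute(
    """
    SELECT name,
           CASE WHEN stock > 0 THEN 'in stock' ELSE 'sold out' END AS availability,
           COALESCE(nickname, name) AS label
    FROM items
    ORDER BY name
    """
).fetchall()

print(rows)
```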

Table name changing to avoid SQL injection attack

I understand the basic process of SQL injection attack. My question is related to SQL injection prevention. I was told that one way to prevent such an attack is by frequently changing the table name! Is that possible?
If so, can someone provide me a link to read about it more because I couldn't find an explanation about it on the web.
No. That makes no sense. You'd either have to change every line of code that references the table or you'd have to leave in place something like a view with the old table name that acts exactly like the old table. No reasonable person would do that. Plus, it's not like there are a ton of reasonable names for tables so you'd be doing crazy things like saying table A stores customer data and AA stores employer data and AAA was the intersection between customers and employers.
SQL injection is almost comically simple to prevent. Use prepared statements with bind variables. Don't dynamically build SQL statements. Done. Of course, in reality, making sure that the new developer doesn't violate this dictum either because they don't know any better or because they can hack something out in a bit less time if they just do a bit of string concatenation makes it a bit more complex. But the basic approach is very simple.
Pffft. What? Frequently changing a table name?
That's bogus advice, as far as "preventing SQL Injection".
The only prevention for SQL Injection vulnerabilities is to write code that isn't vulnerable. And in the vast majority of cases, that is very easy to do.
Changing table names doesn't do anything to close a SQL Injection vulnerability. It might make a successful attack vector less repeatable, requiring an attacker to make some adjustments. But it does nothing to prevent SQL Injection.
As a starting point for research on SQL Injection, I recommend OWASP (Open Web Application Security Project)
Start here: https://www.owasp.org/index.php/SQL_Injection
If you run across "changing a table name" as a mitigation, let me know. I've never run across that as a prevention or mitigation for SQL Injection vulnerability.
Here are things you can do to prevent SQL injection:
Use an ORM that encapsulates your SQL calls and provides a friendly layer to your database records. Most of these are very good at writing high quality queries and protecting you from injection bugs simply because of how you use them.
Use prepared statements with placeholder values whenever possible. Write queries like this:
INSERT INTO table_name (name, age) VALUES (:name, :age)
Be very careful to properly escape any and all values that are inserted into SQL through any other method. This is always a risky thing to do, so any code you do write like this should have any escaping you do made blindingly obvious so that a quick code review can verify it's working properly. Never hide escaping behind abstractions or methods with cute names like scrub or clean. Those methods might be subtly broken and you'd never notice.
Be absolutely certain any table name parameters, if dynamic, are tested versus a white list of known-good values. For example, if you can create records of more than one type, or put data into more than one table ensure that the parameter supplied is valid.
Trust nothing supplied by the user. Presume every single bit of data is tainted and hostile unless you've taken the trouble to clean it up. This goes doubly for anything that's in your database if you got your database from some other source, like inheriting a historical project. Paranoia is not unfounded, it's expected.
Write your code such that deleting a line does not introduce a security problem. That means never doing this:
$value = $db->escaped($value);
$db->query("INSERT INTO table (value) VALUES ('$value')");
You're one line away from failure here. If you must do this, write it like so:
$value_escaped = $db->escaped($value);
$db->query("INSERT INTO table (value) VALUES ('$value_escaped')");
That way deleting the line that does the escaping does not immediately cause an injection bug. The default here is to fail safely.
Make every effort to block direct access to your database server by aggressively firewalling it and restricting access to those that actually need access. In practice this means blocking port 3306 and using SSH for any external connections. If you can, eliminate SSH and use a secured VPN to connect to it.
Never generate errors which spew out stack traces that often contain information highly useful to attackers. For example, an error that includes a table name, a script path, or a server identifier is providing way too much information. Have these for development, and ensure these messages are suppressed on production servers.
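The table-name whitelist item from the list above could look roughly like this. Identifiers cannot be sent as bound parameters, so the check has to happen before the name is placed in the SQL text. This is a sketch in Python with sqlite3 as a stand-in; the table names and the `count_rows` helper are hypothetical.

```python
import sqlite3

# Known-good table names; anything else is rejected outright.
ALLOWED_TABLES = {"customers", "employers"}


def count_rows(conn, table):
    """Count rows in a table whose name was supplied dynamically."""
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unknown table: {table!r}")
    # Interpolation is acceptable here only because of the whitelist check above.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT)")
conn.execute("CREATE TABLE employers (name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?)", [("Jones",), ("O'Brien",)])

print(count_rows(conn, "customers"), count_rows(conn, "employers"))
```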
Randomly changing table names is utterly pointless and will make your code a total nightmare. It will be very hard to keep all your code in sync with whatever random name the table is assuming at any particular moment. It will also make backing up and restoring your data almost impossible without some kind of decoder utility.
Anyone who recommends doing this is proposing a pointless and naïve solution to an already solved problem.
Suggesting that randomly changing the table names fixes anything demonstrates a profound lack of understanding of the form SQL injection bugs take. Knowing the table name is a nice thing to have, it makes your life easier as an attacker, but many attacks need no knowledge of this. A common attack is to force a login as an administrator by injecting additional clauses into the WHERE condition; the table name is irrelevant.

SQL parameterization: How does this work behind the scenes?

SQL parameterization is a hot topic nowadays, and for a good reason, but does it really do anything besides escaping decently?
I could imagine a parameterization engine simply making sure the data is decently escaped before inserting it into the query string, but is that really all it does? It would make more sense to do something differently in the connection, e.g. like this:
> Sent data. Formatting: length + space + payload
< Received data
-----
> 69 SELECT * FROM `users` WHERE `username` LIKE ? AND `creation_date` > ?
< Ok. Send parameter 1.
> 4 joe%
< Ok. Send parameter 2.
> 1 0
< Ok. Query result: [...]
This way would simply eliminate the issue of SQL injections, so you wouldn't have to avoid them through escaping. The only other way I can think of how parameterization might work, is by escaping the parameters:
// $params would usually be an argument, not in the code like this
$params = ['joe%', 0];
// Escape the values
foreach ($params as $key => $value) {
    $params[$key] = mysql_real_escape_string($value);
}
// For each question mark in $query_string (another argument of the function),
// replace it with the escaped value.
$n = 0;
$offset = 0;
// The parentheses matter: without them $pos would be assigned the boolean result.
while (($pos = strpos($query_string, "?", $offset)) !== false && $n < count($params)) {
    // If it's numeric, don't use quotes around it.
    $param = is_numeric($params[$n]) ? $params[$n] : "'" . $params[$n] . "'";
    // Update the query string with the replaced question mark
    $query_string = substr($query_string, 0, $pos)
        . $param
        . substr($query_string, $pos + 1);
    // Resume the search after the inserted value, in case it contains a "?"
    $offset = $pos + strlen((string) $param);
    $n++;
}
If the latter is the case, I'm not going to switch my sites to parameterization just yet. It has no advantage that I can see, it's just another strong vs weak variable typing discussion. Strong typing may catch more errors in compiletime, but it doesn't really make anything possible that would be hard to do otherwise - same with this parameterization. (Please correct me if I'm wrong!)
Update:
I knew this would depend on the SQL server (and also on the client, but I assume the client uses the best possible techniques), but mostly I had MySQL in mind. Answers concerning other databases are (and were) also welcome though.
As far as I understand the answers, parameterization does indeed do more than simply escaping the data. It is really sent to the server in a parameterized way, so with variables separated and not as a single query string.
This also enables the server to store and reuse the query with different parameters, which provides better performance.
Did I get everything? One thing I'm still curious about is whether MySQL has these features, and whether query reuse is done automatically (or if not, how it can be done).
Also, please comment when anyone reads this update. I'm not sure if it bumps the question or something...
Thanks!
I'm sure that the way that your command and parameters are handled will vary depending on the particular database engine and client library.
However, speaking from experience with SQL Server, I can tell you that parameters are preserved when sending commands using ADO.NET. They are not folded into the statement. For example, if you use SQL Profiler, you'll see a remote procedure call like:
exec sp_executesql N'INSERT INTO Test (Col1) VALUES (@p0)',N'@p0 nvarchar(4000)',@p0=N'p1'
Keep in mind that there are other benefits to parameterization besides preventing SQL injection. For example, the query engine has a better chance of reusing query plans for parameterized queries because the statement is always the same (just the parameter values change).
In response to update:
Query parameterization is so common I would expect MySQL (and really any database engine) to handle it similarly.
Based on the MySQL protocol documentation, it looks like prepared statements are handled using COM_STMT_PREPARE and COM_STMT_EXECUTE packets, which do support separate parameters in binary format. It's not clear if all parameterized statements will be prepared, but it does look like unprepared statements are handled by COM_QUERY, which has no mention of parameter support.
When in doubt: test. If you really want to know what's sent over the wire, use a network protocol analyzer like Wireshark and look at the packets.
Regardless of how it's handled internally and any optimizations it may or may not currently provide for a given engine, there's very little (nothing?) to gain from not using parameters.
Parameterized queries are passed to the SQL implementation as parameterized queries; the parameters are never concatenated into the query itself unless an implementation decides to fall back to concatenating them itself. A parameterized query avoids the need for escaping, and improves performance since the query is generic and it is more likely that a compiled form of the query is already cached by the database server.
The straight answer is "it's implemented whatever way it's implemented in the particular implementation in question". There are dozens of databases, dozens of access layers and in some cases more than one way for the same access layer to deal with the same code.
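A small sketch of the "same statement, different parameters" idea, in Python with the standard sqlite3 module. This is SQLite, not MySQL's wire protocol, and the `cached_statements` setting is a client-library cache of compiled statements, so treat it as an analogy to server-side caching rather than a description of any particular server. The schema and data are invented.

```python
import sqlite3

# cached_statements controls how many compiled statements the connection keeps.
conn = sqlite3.connect(":memory:", cached_statements=16)
conn.execute("CREATE TABLE users (username TEXT, creation_date INTEGER)")

INSERT_USER = "INSERT INTO users (username, creation_date) VALUES (?, ?)"
# One SQL text, many parameter sets; the last value is stored verbatim.
conn.executemany(INSERT_USER, [("joe", 10), ("joey", 20), ("o'brien; --", 30)])

FIND_USERS = (
    "SELECT username FROM users "
    "WHERE username LIKE ? AND creation_date > ? ORDER BY username"
)
# The SQL text is identical both times; only the bound values differ.
recent_joes = [row[0] for row in conn.execute(FIND_USERS, ("joe%", 0))]
later_joes = [row[0] for row in conn.execute(FIND_USERS, ("joe%", 15))]

odd_name_count = conn.execute(
    "SELECT COUNT(*) FROM users WHERE username = ?", ("o'brien; --",)
).fetchone()[0]

print(recent_joes, later_joes, odd_name_count)
```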
So, there isn't a single correct answer here.
One example would be that if you use Npgsql with a query that isn't a prepared statement, then it pretty much just escapes things correctly (though escaping in Postgresql has some edge cases that people who know about escaping miss, and Npgsql catches them all, so still a gain). With a prepared statement, it sends parameters as prepared-statement parameters. So one case allows for greater query-plan reuse than another.
The SQLServer driver for the same framework (ADO.NET) passes queries through as calls to sp_executesql, which allows for query-plan re-use.
As well as that, the matter of escaping is still worth considering for a few reasons:
It's the same code each time. If you're escaping yourself, then either you're doing so through the same piece of code each time (so it's not like there's any downside to using someone else's same piece of code), or you're risking a slip-up each time.
They're also better at not escaping. There's no point going through every character in the string representation of a number looking for ' characters, for example. But does not escaping count as a needless risk, or as a reasonable micro-optimisation?
Well, "reasonable micro-optimisation" in itself means one of two things. Either it requires no mental effort to write or to read for correctness afterwards (in which case you might as well), or it's hit frequently enough that tiny savings will add up, and it's easily done.
(Relatedly, it also makes more sense to write a highly optimised escaper - the sort of string replacement involved is the sort of case where the most common approach of replacing isn't as fast as some other approaches in some languages at least, but the optimisation only makes sense if the method will be called a very large number of times).
If you've a library that includes type checking the parameter (either in basing the format used on the type, or by validation, both of which are common with such code), then it's easy to do and since these libraries aim at mass use, it's a reasonable micro-opt.
If you're thinking each time about whether parameter number 7 of an 8-parameter call could possibly contain a ' character, then it's not.
They're also easier to translate to other systems if you want. To again look at the two examples I gave above, apart from the classes created, you can use pretty much identical code with System.Data.SqlClient as with Npgsql, though SQL-Server and Postgresql have different escaping rules. They also have an entirely different format for binary strings, date-times and a few other datatypes they have in common.
Also, I can't really agree with calling this a "hot topic". It's had a well-established consensus for well over a decade at the very least.

Should I sanitize inputs to a parametrized query?

I have a couple of basic questions on parametrized queries
Consider this code:
$id = (int)$_GET['id'];
mysql_query("UPDATE table SET field=1 WHERE id=".$id);
Now the same thing using a parametrized query
$sql = "UPDATE table SET field=1 WHERE id=?";
$q = $db->prepare($sql);
$q->execute(array($_GET['id']));
My questions are:
1. is there any situation where the first code (i.e. with the (int) cast) is unsafe?
2. is the second piece of code OK or should I also cast $_GET['id'] to int?
3. is there any known vulnerability of the second piece of code? That is, is there any way an SQL attack can be made if I am using the second query?
is there any situation where the first code (i.e. with the (int) cast) is unsafe?
I'm not a PHP expert, but I think there shouldn't be. That's not to say that PHP doesn't have bugs (either known or yet to be discovered) that could be exploited here.
is the second piece of code OK or should I also cast $_GET['id'] to int?
Likewise, the second piece of code should be absolutely fine - even if the data type was a string, MySQL would know not to evaluate it for SQL as it's a parameter and therefore only to be treated as a literal value. However, there's certainly no harm in also performing the cast (which would avoid any flaws in MySQL's handling of parameters) - I'd recommend doing both.
EDIT - @Tomalak makes a very good point about cast potentially resulting in incorrect data and suggests first verifying your inputs with sanity checks such as is_numeric(); I agree wholeheartedly.
is there any known vulnerability of the second piece of code? That is, is there any way an SQL attack can be made if I am using the second query?
Not to my knowledge.
1. (int) will yield 0 when the conversion fails. This could lead to updating the wrong record. Besides, it's sloppy and an open invitation to "forget" proper type casting when the query gets more complex later on.
It's safe in its current form (against SQL injection, not against updating the wrong record) but I'd still not recommend it. Once the query gets more complex you're bound to use prepared statements anyway, so just do it right from the start - also for the sake of consistency.
2. That's sloppy, too. The parameter will be transferred to the DB as a string and the DB will try to cast it. It's safe (against SQL injection), but unless you know exactly how the DB server reacts when you pass invalid data, you should sanitize the value up-front (is_numeric() and casting).
3. No. (Unless there is a bug in PDO, that is.)
As a rule of thumb:
Don't pass unchecked data to the database and expect the right thing to happen.
Don't knowingly pass invalid data and trust that the other system reacts in a certain way. Do sanity checks and error handling yourself.
Don't make "Oh, that converts to 0 and I don't have a record with ID 0 anyway so that's okay." part of your thought process.
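The "check first, then cast, and handle the error yourself" advice might look like this. The sketch is in Python rather than PHP; the `parse_record_id` helper is hypothetical, and it assumes record IDs are positive integers, which the thread does not actually state.

```python
def parse_record_id(raw):
    """Return raw as a positive int, or raise ValueError instead of coercing to 0."""
    # Accept plain ASCII digits only: no sign, whitespace, or decimal point.
    if not isinstance(raw, str) or not raw.isascii() or not raw.isdigit():
        raise ValueError(f"not a valid record id: {raw!r}")
    value = int(raw)
    if value == 0:
        # Assumption: 0 is never a real ID, so reject it explicitly.
        raise ValueError("record id must be positive")
    return value


# Usage: validate, then bind the resulting integer as a parameter.
for candidate in ["42", "abc", ""]:
    try:
        print("ok", parse_record_id(candidate))
    except ValueError as error:
        print("rejected:", error)
```

The invalid input is turned into an explicit error at the edge of the application instead of silently becoming ID 0 somewhere further down.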