Table name changing to avoid SQL injection attack - mysql

I understand the basic process of SQL injection attack. My question is related to SQL injection prevention. I was told that one way to prevent such an attack is by frequently changing the table name! Is that possible?
If so, can someone provide a link where I can read more about it? I couldn't find an explanation on the web.

No. That makes no sense. You'd either have to change every line of code that references the table, or you'd have to leave in place something like a view with the old table name that acts exactly like the old table. No reasonable person would do that. Plus, it's not as if there are a ton of reasonable names for tables, so you'd be doing crazy things like saying table A stores customer data, AA stores employer data, and AAA is the intersection between customers and employers.
SQL injection is almost comically simple to prevent. Use prepared statements with bind variables. Don't dynamically build SQL statements. Done. Of course, in reality, making sure that the new developer doesn't violate this dictum either because they don't know any better or because they can hack something out in a bit less time if they just do a bit of string concatenation makes it a bit more complex. But the basic approach is very simple.
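For illustration, here is what that looks like in PHP with PDO; the table and column names are made up for the sketch:

// Dynamic SQL (vulnerable): don't do this.
// $rows = $db->query("SELECT * FROM users WHERE name = '$name'");

// Prepared statement with a bind variable: the value is sent as data,
// never parsed as SQL.
$stmt = $pdo->prepare('SELECT * FROM users WHERE name = :name');
$stmt->execute([':name' => $name]);
$rows = $stmt->fetchAll();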

Pffft. What? Frequently changing a table name?
That's bogus advice, as far as "preventing SQL Injection".
The only prevention for SQL Injection vulnerabilities is to write code that isn't vulnerable. And in the vast majority of cases, that is very easy to do.
Changing table names doesn't do anything to close a SQL Injection vulnerability. It might make a successful attack vector less repeatable, requiring an attacker to make some adjustments. But it does nothing to prevent SQL Injection.
As a starting point for research on SQL Injection, I recommend OWASP (the Open Web Application Security Project).
Start here: https://www.owasp.org/index.php/SQL_Injection
If you run across "changing a table name" as a mitigation, let me know. I've never run across that as a prevention or mitigation for SQL Injection vulnerability.

Here are some things you can do to prevent SQL injection:
Use an ORM that encapsulates your SQL calls and provides a friendly layer to your database records. Most of these are very good at writing high quality queries and protecting you from injection bugs simply because of how you use them.
Use prepared statements with placeholder values whenever possible. Write queries like this:
INSERT INTO table_name (name, age) VALUES (:name, :age)
Be very careful to properly escape any and all values that are inserted into SQL through any other method. This is always a risky thing to do, so any code you write like this should make its escaping blindingly obvious, so that a quick code review can verify it's working properly. Never hide escaping behind abstractions or methods with cute names like scrub or clean. Those methods might be subtly broken and you'd never notice.
Be absolutely certain any table name parameters, if dynamic, are tested against a whitelist of known-good values. For example, if you can create records of more than one type, or put data into more than one table, ensure that the parameter supplied is valid, as sketched below.
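A minimal sketch of such a whitelist in PHP (the table names here are hypothetical):

// Only these tables may ever be targeted by this code path.
$allowed_tables = ['customers', 'employers'];
if (!in_array($table, $allowed_tables, true)) {
    throw new InvalidArgumentException("Unexpected table name: $table");
}
$stmt = $pdo->prepare("INSERT INTO `$table` (name) VALUES (:name)");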
Trust nothing supplied by the user. Presume every single bit of data is tainted and hostile unless you've taken the trouble to clean it up. This goes doubly for anything that's in your database if you got your database from some other source, like inheriting a historical project. Paranoia is not unfounded, it's expected.
Write your code such that deleting a line does not introduce a security problem. That means never doing this:
$value = $db->escaped($value);
$db->query("INSERT INTO table (value) VALUES ('$value')");
You're one line away from failure here. If you must do this, write it like so:
$value_escaped = $db->escaped($value);
$db->query("INSERT INTO table (value) VALUES ('$value_escaped')");
That way deleting the line that does the escaping does not immediately cause an injection bug. The default here is to fail safely.
Make every effort to block direct access to your database server by aggressively firewalling it and restricting access to those that actually need access. In practice this means blocking port 3306 and using SSH for any external connections. If you can, eliminate SSH and use a secured VPN to connect to it.
Never generate errors which spew out stack traces that often contain information highly useful to attackers. For example, an error that includes a table name, a script path, or a server identifier is providing way too much information. Keep these for development, and ensure they are suppressed on production servers.
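In PHP, for example, the production side of that usually comes down to a few ini settings; a sketch, with a hypothetical log path:

// Production: log errors for yourself, never display them to the client.
ini_set('display_errors', '0');
ini_set('log_errors', '1');
ini_set('error_log', '/var/log/php/app-errors.log');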
Randomly changing table names is utterly pointless and will make your code a total nightmare. It will be very hard to keep all your code in sync with whatever random name the table is assuming at any particular moment. It will also make backing up and restoring your data almost impossible without some kind of decoder utility.
Anyone who recommends doing this is proposing a pointless and naïve solution to an already-solved problem.
Suggesting that randomly changing the table names fixes anything demonstrates a profound lack of understanding of the form SQL injection bugs take. Knowing the table name is nice to have, and it makes your life easier as an attacker, but many attacks need no knowledge of it. A common attack is to force a login as an administrator by injecting additional clauses into the WHERE condition; the table name is irrelevant.
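To make that concrete, here is the classic shape of such an attack against naively concatenated PHP; the query is illustrative, and note that no table name appears anywhere in the payload:

// Vulnerable login check: user input concatenated into the WHERE clause.
$sql = "SELECT * FROM users WHERE name = '$name' AND pass = '$pass'";
// Submitting  ' OR '1'='1  as the password turns the WHERE into:
//   WHERE name = 'x' AND pass = '' OR '1'='1'
// ...which is true for every row, so the attacker is logged in.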

Related

SSIS Data Flow: duplicated rule problem after lookup

I have a data flow in which I need to get a column value from 'SQL tableA' and do a lookup task in 'SQL tableB' using this column value. If the lookup finds a match between the two tables, I need to get the value of another column from 'SQL tableA' and put that value in 'SQL tableC' (the table that will be persisted). If the lookup fails, this column value will be NULL.
My problem: after the behavior above, the rest of my flow is the same, so I have two identical duplicated flows below the lookup. And this is terrible for readability and maintenance.
What can I do to resolve this situation with as little performance loss as possible?
The data model is legacy, so changing the data model is impossible.
Best Regards,
Luis
The way I see it, there are really three options:
Use UNION ALL and possibly sacrifice performance for modularity. There may in fact be no performance issue; you should test and see.
If possible, implement all of this in a stored procedure. You can implement code reuse there, and it will quite possibly run much faster.
Build a custom transformation component that implements those last three steps.
This option appeals to all programmers but may have the worst performance and in my opinion will just cause issues down the track. If you're writing reams of C# code inside SSIS then you'll eventually reach a point where it's easier to just build a standalone app.
It would be much easier to answer if you explained
What you're really doing
slowly changing dimension?
data cleansing?
adding reference data?
spamming?
What are those three activities?
sending an email?
calling a web service?
calling some other API?
What your constraints are
Is all of this data on one server and can you create stored procs and tables?

What pattern to check on an SQL query for possible injection?

I want to detect possible SQL injection attacks by checking the SQL query. I am using PDO and prepared statements, so hopefully I am not in danger of being attacked. However, what I want to detect is input (or a resulting query string) that may become a dangerous query. For example, my app--properly--will never generate a "1=1" query, so I may check the generated query string for that and flag the user/IP producing it. Same thing with "drop table", but maybe I can check that just by looping over the input array; or maybe I should check the generated query all over again. I am using MySQL, but patterns for other drivers are also appreciated.
I have read RegEx to Detect SQL Injection and some of the comments are heading in this direction. In my favor, I'm developing for users who rarely use English in their input, so a simple /drop/ match on the query may be enough to log the user/query for further inspection. Some of the patterns I found while researching SQL injection are:
a semicolon in the middle of a statement -- although this may be common
a double dash/pound sign commenting out the rest of the query
a quote at the beginning and end of a value
hex values (my target users have little to no chance of typing 0x into their forms)
declare/exec/drop/1=1 (my app should never generate these values)
an HTML tag (low probability of coming from the intended user/use case)
etc.
All of the above are easier to detect by looping over the input values before the query string is generated, because they haven't been escaped yet. But how much did I miss? (a lot, I guess) Any other obscure pattern I should check? What about checking the generated query? Any pattern that may emerge?
tl;dr: What pattern to match an SQL query (MySQL) to check for possible injection? I am using PDO with prepared statement and value binding, so the check is for logging/alert purposes.
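For clarity, here is a crude sketch of the logging-only check I have in mind (the pattern list is illustrative, not exhaustive):

// Logging-only heuristic, applied to raw input before binding.
// It never sanitizes anything; it just flags requests for review.
$suspicious = '/--|#|;|\b(drop|declare|exec)\b|0x[0-9a-fA-F]+|1\s*=\s*1|<[a-z]+>/i';
foreach ($_POST as $field => $value) {
    if (is_string($value) && preg_match($suspicious, $value)) {
        error_log("Possible injection probe in '$field' from {$_SERVER['REMOTE_ADDR']}");
    }
}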
In my shop we have two rules.
Always use parameters in SQL queries.
If for some reason you can't follow rule one, then every piece of data put into a query must be sanitized, either with intval() for integer parameters or an appropriate function to sanitize a string variable according to its application data type. For example, a personal name might be Jones or O'Brien or St. John-Smythe but will never have special characters other than apostrophe ', hyphen -, space, or dot. A product number probably contains only letters or numbers. And so forth.
If rule 2 is too hard to follow, follow rule 1.
We inspect code to make sure we're doing these things.
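Here is roughly what rule 2 looks like in code, following the application data types described above (the patterns are illustrative, not exhaustive):

// Integer parameter: force it to an integer.
$id = intval($_GET['id']);

// Personal name: letters plus apostrophe, hyphen, space, and dot only.
if (!preg_match("/^[A-Za-z .'-]+$/", $name)) {
    // Reject or log; this value never reaches a query.
}

// Product number: letters and digits only.
if (!preg_match('/^[A-Za-z0-9]+$/', $product)) {
    // Reject.
}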
But how much did I miss?
You guessed right. Creating a huge blacklist wouldn't make your code immune; that approach is history. The other questions follow the same idea.
Your best bets are:
Validating input data (input doesn't necessarily come from an external party)
Using prepared statements.
Few steps but bulletproof.
Not possible.
You will spend the rest of your life in an arms race -- you build a defense, they build a better weapon, then you build a defense against that, etc, etc.
It is probably possible to write a 'simple' SELECT that will take 24 hours to run.
Unless you lock down the tables, they can look, for example, at the encrypted passwords and re-attack with a root login.
If you allow any type of string, it will be a challenge to handle the various combinations of quoting.
There are nasty things that can be done with semi-valid utf8 strings.
And what about SET statements?
And LOAD DATA?
And stored procs?
Instead, decide on the minimal set of queries you allow, then parameterize that so you can check, or escape, the pieces individually. Then build the query.
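One way to read that advice, sketched in PHP with made-up query names: keep a fixed map of query templates and only ever bind values into them:

// The complete set of queries this code path may run (names hypothetical).
$queries = [
    'user_by_id'     => 'SELECT * FROM users WHERE id = ?',
    'orders_by_user' => 'SELECT * FROM orders WHERE user_id = ? AND created > ?',
];
// Each piece is checked/bound individually; the SQL text never changes.
$stmt = $pdo->prepare($queries['user_by_id']);
$stmt->execute([(int) $id]);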

What, exactly, does allowMultiQueries do?

Adding allowMultiQueries=true to the JDBC connection string makes MySQL's JDBC driver accept Statements containing multiple queries.
But what exactly does this do? Is there any benefit to this?
Perhaps it reduces the delay due to round trips? Something like
LOCK
UPDATE ...
UNLOCK
which, if done in one statement, holds the lock for less time.
When, if ever, would I want to combine queries in a single Statement, rather than in separate ones?
For running safe scripts of your own creation that would otherwise need to be run line by line: for instance, a script from mysqldump, or one you would have run anyway, safely and trusted. This was pointed out to me once by someone when I asked, "Why would you want to do that?" His answer: his stockpile of his own scripts, each of which takes no user input, so there is no room for tomfoolery or potential SQL injection. The size of these routines is limited by max_allowed_packet, and the strategy is, of course, to read the file into your buffer and use that as the query in a multi.
For running a few statements in concert, where one relies on the transient nature of a prior call. Transient meaning that had you issued the subsequent call separately, not via a multi, the necessary information would no longer be available for a piece of it. A common example, wise or not, is the duo of SQL_CALC_FOUND_ROWS and FOUND_ROWS(), popularly debunked in the Percona article To SQL_CALC_FOUND_ROWS or not to SQL_CALC_FOUND_ROWS?. There is an argument to be made that a single call which not only returns the resultset but also has the count available to be grabbed shortly thereafter is the wiser route for accurate pagination routines. This assumes that a separate call for COUNT(*) and another for the data could generate a discrepancy in multi-user concurrent systems, which most likely describes all of ours. So the verbiage just mentioned addresses accuracy, not performance, which is what the Percona article is about. Another use case is priming and using user-defined variables in queries; many of these can be folded into the query and initialized with a cross join, however.
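PHP has a close analogue in mysqli::multi_query, which makes the "trusted script" use case easy to picture; a sketch, with hypothetical connection details and file path:

// Run a trusted, self-authored script (no user input anywhere)
// as one multi-statement call.
$mysqli = new mysqli('localhost', 'user', 'pass', 'mydb');
$script = file_get_contents('/backups/nightly.sql');
if ($mysqli->multi_query($script)) {
    do {
        if ($result = $mysqli->store_result()) {
            $result->free();
        }
    } while ($mysqli->more_results() && $mysqli->next_result());
}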
When, if ever, would I want to combine queries in a single Statement, rather than in separate ones?
There are two great use cases for this feature:
If you are lazy and like to blindly run queries without checking for success or row counts or auto_increment value assignment, or
If you like the idea of increasing the odds of SQL injection vulnerabilities: username ='' AND 0 = 1; ← right here. With this mode inactive, anything after the injected semicolon is an error, as it should be. With this mode active, a whole world of "oops" can open right up.
What I am saying is... You're right. Don't use it.
Yes, it reduces the impact of round-trip time to the database, pipelining queries... which can be significant with a distant database... but at the cost of increased risk that isn't worth it.

SQL parameterization: How does this work behind the scenes?

SQL parameterization is a hot topic nowadays, and for a good reason, but does it really do anything besides escaping decently?
I could imagine a parameterization engine simply making sure the data is decently escaped before inserting it into the query string, but is that really all it does? It would make more sense to do something differently in the connection, e.g. like this:
> Sent data. Formatting: length + space + payload
< Received data
-----
> 69 SELECT * FROM `users` WHERE `username` LIKE ? AND `creation_date` > ?
< Ok. Send parameter 1.
> 4 joe%
< Ok. Send parameter 2.
> 1 0
< Ok. Query result: [...]
This way would simply eliminate the issue of SQL injections, so you wouldn't have to avoid them through escaping. The only other way I can think of how parameterization might work, is by escaping the parameters:
// $params would usually be an argument, not hard-coded like this
$params = ['joe%', 0];

// Escape the values
foreach ($params as $key => $value) {
    $params[$key] = mysql_real_escape_string($value);
}

// For each question mark in $query_string (another argument of the
// function), substitute the corresponding escaped value.
$n = 0;
while (($pos = strpos($query_string, "?")) !== false && $n < count($params)) {
    // If it's numeric, don't use quotes around it.
    $param = is_numeric($params[$n]) ? $params[$n] : "'" . $params[$n] . "'";
    // Splice the value into the query string in place of the question mark
    $query_string = substr($query_string, 0, $pos)
        . $param
        . substr($query_string, $pos + 1);
    $n++;
}
If the latter is the case, I'm not going to switch my sites to parameterization just yet. It has no advantage that I can see; it's just another strong vs weak variable typing discussion. Strong typing may catch more errors at compile time, but it doesn't really make anything possible that would be hard to do otherwise - same with this parameterization. (Please correct me if I'm wrong!)
Update:
I knew this would depend on the SQL server (and also on the client, but I assume the client uses the best possible techniques), but mostly I had MySQL in mind. Answers concerning other databases are (and were) also welcome though.
As far as I understand the answers, parameterization does indeed do more than simply escaping the data. The query really is sent to the server in a parameterized way, with the variables separate and not as a single query string.
This also enables the server to store and reuse the query with different parameters, which provides better performance.
Did I get everything? One thing I'm still curious about is whether MySQL has these features, and whether query reuse happens automatically (and if not, how it can be done).
Also, please comment when anyone reads this update. I'm not sure if it bumps the question or something...
Thanks!
I'm sure that the way that your command and parameters are handled will vary depending on the particular database engine and client library.
However, speaking from experience with SQL Server, I can tell you that parameters are preserved when sending commands using ADO.NET. They are not folded into the statement. For example, if you use SQL Profiler, you'll see a remote procedure call like:
exec sp_executesql N'INSERT INTO Test (Col1) VALUES (@p0)',N'@p0 nvarchar(4000)',@p0=N'p1'
Keep in mind that there are other benefits to parameterization besides preventing SQL injection. For example, the query engine has a better chance of reusing query plans for parameterized queries because the statement is always the same (just the parameter values change).
In response to update:
Query parameterization is so common I would expect MySQL (and really any database engine) to handle it similarly.
Based on the MySQL protocol documentation, it looks like prepared statements are handled using COM_PREPARE and COM_EXECUTE packets, which do support separate parameters in binary format. It's not clear if all parameterized statements will be prepared, but it does look like unprepared statements are handled by COM_QUERY which has no mention of parameter support.
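One practical wrinkle if you're testing this from PHP: PDO's MySQL driver emulates prepares client-side by default, so you have to opt in to true server-side prepared statements before you'll see the prepare/execute packets described above. A small sketch, with hypothetical connection details:

// Ask PDO for real server-side prepares instead of
// client-side parameter interpolation.
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass', [
    PDO::ATTR_EMULATE_PREPARES => false,
]);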
When in doubt: test. If you really want to know what's sent over the wire, use a network protocol analyzer like Wireshark and look at the packets.
Regardless of how it's handled internally and any optimizations it may or may not currently provide for a given engine, there's very little (nothing?) to gain from not using parameters.
Parameterized queries are passed to the SQL implementation as parameterized queries; the parameters are never concatenated into the query itself unless the implementation decides to fall back to concatenation. Parameterized queries avoid the need for escaping and improve performance, since the query is generic and a compiled form of it is more likely to already be cached by the database server.
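The reuse point is easy to see from the client side; a PDO sketch, assuming a hypothetical audit_log table:

// Prepare once, execute many times; the server can reuse the
// compiled statement for every execution.
$stmt = $pdo->prepare('INSERT INTO audit_log (message) VALUES (:msg)');
foreach ($messages as $msg) {
    $stmt->execute([':msg' => $msg]);
}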
The straight answer is "it's implemented whatever way it's implemented in the particular implementation in question". There's dozens of databases, dozens of access layers and in some cases more than one way for the same access layer to deal with the same code.
So, there isn't a single correct answer here.
One example would be that if you use Npgsql with a query that isn't a prepared statement, then it pretty much just escapes things correctly (though escaping in Postgresql has some edge cases that people who know about escaping miss, and Npgsql catches them all, so still a gain). With a prepared statement, it sends parameters as prepared-statement parameters. So one case allows for greater query-plan reuse than another.
The SQLServer driver for the same framework (ADO.NET) passes queries through as calls to sp_executesql, which allows for query-plan re-use.
As well as that, the matter of escaping is still worth considering for a few reasons:
It's the same code each time. If you're escaping yourself, then either you're doing so through the same piece of code each time (so it's not like there's any downside to using someone else's same piece of code), or you're risking a slip-up each time.
They're also better at not escaping. There's no point going through every character in the string representation of a number looking for ' characters, for example. But does not escaping count as a needless risk, or a reasonable micro-optimisation?
Well, "reasonable micro-optimisation" in itself means one of two things. Either it requires no mental effort to write or to read for correctness afterwards (in which case you might as well), or it's hit frequently enough that tiny savings will add up, and it's easily done.
(Relatedly, it also makes more sense to write a highly optimised escaper - the sort of string replacement involved is the sort of case where the most common approach of replacing isn't as fast as some other approaches in some languages at least, but the optimisation only makes sense if the method will be called a very large number of times).
If you've a library that includes type checking the parameter (either in basing the format used on the type, or by validation, both of which are common with such code), then it's easy to do and since these libraries aim at mass use, it's a reasonable micro-opt.
If you're thinking each time about whether parameter number 7 of an 8-parameter call could possibly contain a ' character, then it's not.
They're also easier to translate to other systems if you want. To again look at the two examples I gave above, apart from the classes created, you can use pretty much identical code with System.Data.SqlClient as with Npgsql, though SQL-Server and Postgresql have different escaping rules. They also have an entirely different format for binary strings, date-times and a few other datatypes they have in common.
Also, I can't really agree with calling this a "hot topic". It's had a well-established consensus for well over a decade at the very least.

Alternatives to LINQ To SQL on high loaded pages

To begin with, I LOVE LINQ TO SQL. It's so much easier to use than direct querying.
But, there's one great problem: it doesn't work well on high loaded requests. I have some actions in my ASP.NET MVC project, that are called hundreds times every minute.
I used to have LINQ to SQL there, but since the amount of requests is gigantic, LINQ TO SQL almost always returned "Row not found or changed" or "X of X updates failed". And it's understandable. For instance, I have to increase some value by one with every request.
var stat = DB.Stats.First();
stat.Visits++;
// ....
DB.SubmitChanges();
But while ASP.NET was working on those //... instructions, the stats.Visits value stored in the table got changed.
I found a solution, I created a stored procedure
UPDATE Stats SET Visits=Visits+1
It works well.
Unfortunately now I'm getting more and more moments like that. And it sucks to create stored procedures for all cases.
So my question is, how to solve this problem? Are there any alternatives that can work here?
I hear that Stackoverflow works with LINQ to SQL. And it's more loaded than my site.
This isn't exactly a problem with Linq to SQL, per se, it's an expected result with optimistic concurrency, which Linq to SQL uses by default.
Optimistic concurrency means that when you update a record, you check the current version in the database against the copy that was originally retrieved before making any offline updates; if they don't match, report a concurrency violation ("row not found or changed").
There's a more detailed explanation of this here. There's also a fairly sizable guide on handling concurrency errors. Typically the solution involves simply catching ChangeConflictException and picking a resolution, such as:
try
{
    // Make changes
    db.SubmitChanges();
}
catch (ChangeConflictException)
{
    foreach (var conflict in db.ChangeConflicts)
    {
        conflict.Resolve(RefreshMode.KeepCurrentValues);
    }
}
The above version will overwrite whatever is in the database with the current values, regardless of what other changes were made. For other possibilities, see the RefreshMode enumeration.
Your other option is to disable optimistic concurrency entirely for fields that you expect might be updated. You do this by setting the UpdateCheck option to UpdateCheck.Never. This has to be done at the field level; you can't do it at the entity level or globally at the context level.
Maybe I should also mention that you haven't picked a very good design for the specific problem you're trying to solve. Incrementing a "counter" by repeatedly updating a single column of a single row is not a very good/appropriate use of a relational database. What you should be doing is actually maintaining a history table - such as Visits - and if you really need to denormalize the count, implement that with a trigger in the database itself. Trying to implement a site counter at the application level without any data to back it up is just asking for trouble.
Use your application to put actual data in your database, and let the database handle aggregates - that's one of the things databases are good at.
Use a producer/consumer or message queue model for updates that don't absolutely have to happen immediately, particularly status updates. Instead of trying to update the database immediately keep a queue of updates that the asp.net threads can push to and then have a writer process/thread that writes the queue to the database. Since only one thread is writing, there will be much less contention on the relevant tables/roles.
For reads, use caching. For high volume sites even caching data for a few seconds can make a difference.
Firstly, you could call DB.SubmitChanges() right after stats.Visits++, and that would greatly reduce the problem.
However, that still is not going to save you from the concurrency violation (that is, simultaneously modifying a piece of data by two concurrent processes). To fight that, you may use the standard mechanism of transactions. With LINQ-to-SQL, you use transactions by instantiating a TransactionScope class, thusly:
using( TransactionScope t = new TransactionScope() )
{
    var stats = DB.Stats.First();
    stats.Visits++;
    DB.SubmitChanges();
}
Update: as Aaronaught correctly pointed out, TransactionScope is not going to help here, actually. Sorry. But read on.
Be careful, though, not to make the body of a transaction too long, as it will block other concurrent processes, and thus, significantly reduce your overall performance.
And that brings me to the next point: your very design is probably flawed.
The core principle in dealing with highly shared data is to design your application in such way that the operations on that data are quick, simple, and semantically clear, and they must be performed one after another, not simultaneously.
The one operation that you're describing - counting visits - is pretty clear and simple, so it should be no problem once you add the transaction. I must add, however, that while this will be clear, type-safe and otherwise "good", the solution with a stored procedure is actually the much preferred one. This is actually exactly the way database applications were designed in ye olden days. Think about it: why would you need to fetch the counter all the way from the database to your application (potentially over the network!) if there is no business logic involved in processing it? The database server may increment it just as well, without even sending anything back to the application.
Now, as for other operations, that are hidden behind // ..., it seems (by your description) that they're somewhat heavy and long. I can't tell for sure, because I don't see what's there, but if that's the case, you probably want to separate them into smaller and quicker ones, or otherwise rethink your design. I really can't tell anything else with this little information.