Does JSON.stringify a string protect against (My)SQL injection? - mysql

I've run across some node.js code that gets a user-supplied string, calls JSON.stringify(str) and injects the value directly into an SQL statement.
e.g.
var x = JSON.stringify(UNSAFE_USER_STRING);
mysql_execute('UPDATE foo SET v = ' + x + ' WHERE id = 1');
Obviously this is an abuse of JSON.stringify, however this is not my code and the authors would like to see an attack vector before they patch it. Because UNSAFE_USER_STRING is a string, not an object and does escaping of the obvious " and \ it's not obvious if there is a serious problem
Is this code safe? And if not, could someone demonstrate what would be unsafe input?
Thanks!

If you are sure x is a string, then I'm 99% sure this makes it impossible to conduct an SQL injection attack. My confidence goes down to 90% when you are unsure of the type for x. That said, considering all of the following should not pose a vulnerability:
Null, NaN, Infinity, -Infinity all seem to come back as null which is safe.
Undefined comes back as the value undefined, not a string, so I'm not sure about that. I think it would just be considered invalid SQL rather than pose a vulnerability.
Date in node.js JSON.stringify(new Date()) returns '"2015-11-09T18:53:46.198Z"' which is exactly what you'd want.
Arrays and Objects should result in invalid SQL although a smart conversion could enable successful use of SQL arrays. That said, there might be some tricky way to fill the array with Objects that might cause a vulnerability, but I doubt it.
Hex seems to just convert it to an integer.
Buffers and Uint8Arrays seem to come back as objects. Again, there might be some way to populate the Object with something that would be a vulnerability, but I doubt it.

Even if characters like " are being escaped. Character(combinations) used for comments like -- or # could still cause the WHERE clause to be ignored.

Related

Restrict text field to mathematical expressions - Convert String to mathematical expression

How can I restrict the input of a TextFieldsuch that it can only contain mathematical expressions?
Accepted inputs would be:
"3+5"
"-5 + 6"
"3/2(6*4)"
"6--5"
"+5-3"
etc..
And rejected inputs would be:
"5+++3"
"6(7)"
"6-6-+-7"
and so on.
Basically; the syntax I want it to be restricted to is the kind of syntax that programming languages normally use for evaluating mathematical expressions, kinda like the syntax input you'd expect from your everyday calculator.
I'm making a program in which I want the user to be able to input numbers and/or calculations into a text box, instead of having to use a calculator to do it and then arduously type out a number with 7 decimal places.
I've done a little look around and I've seen a lot of stuff involving Regex, postfix, BNF, and the like. A lot of it looked very complicated, too complex for my understanding, and none of it had anything to do with AS3.
However, I've had a thought about making this problem a whole lot simpler by just converting the string into a mathematical expression that AS3 can understand, and let Flash handle the errors using try catch, but I don't know how to do that either (Number("3+5")resulted to NaN).
I'm currently restricting text input to just numbers using Event.CHANGE, like this:
function Restrict(event:Event):void
{
if (event.currentTarget.text.indexOf(".") == -1)
{
event.currentTarget.restrict = "0-9.";
}
else
{
event.currentTarget.restrict = "0-9";
}
}
and it's seemed to work well so far.
I intend to implement this new restriction in this manner, but if there is a much more efficient way of restricting text input, please feel free to include it in a response.
Just to reiterate for clarity, I am asking how to implement functionality that will enable someone to input a mathematical expression into a TextField, and the program will register that input as an expression and calculate it.
Thanks for reading.
EDIT: I've done a bit more research and I've stumbled upon a Reverse Polish Notation calculator/parser/utility class/library/thingy that looks very useful. Seems kinda similar to the Executer class that fsbmain mentioned, but it looks a lot simpler to use and easier for me to understand.
However the problem still remains that I'd have to find an efficient way to restrict the syntax of user input to mathematical expressions, but at least now I have at least two ways of converting the string into a number for calculations.
That's quite a tough question actually, even definition for valid mathematical expressions which you mentioned is very complicated itself, i.e. expression 6-6-+-7 is a valid from as3 syntax point of view and gives result 7.
Regarding second part of your question:
converting the string into a mathematical expression that AS3 can understand
That's not possible to do with only native as3 means since eval-like functions are gone since as2 time, but you can try to use some as3-written syntax translator, i.e. Executer from flash-console project:
var exec:Executer = new Executer();
var res:* = exec.exec(this, "6-7");
trace("exec = " + res); //output "-1"
Although it's failed with some complex expressions from your question:
var exec:Executer = new Executer();
var res:* = exec.exec(this, "6-6-+-7");
trace("exec = " + res); //output "- 7"

Hashed password must be sanitized?

It's just a curiosity. If you encrypt a password (using sha1 or other methods) before inserting it in a query, it must be anyway sanitized? Or the hash's result is always safe?
This simple code are safe?
$salt = "123xcv";
$password = $_POST['password'];
$password = sha1($password+$salt);
$query = "select * from user where password='$password'";
Unless you validated the input somehow you shouldn't assume that it will always return a safe output because functions such as SHA1 can return error values if given unexpected input. For example:
echo '<?php echo sha1(''); ?>' | php
Warning: sha1() expects at least 1 parameter, 0 given in - on line 1
And this output obviously violates the assumption that "it's always a hex string". Other hashing functions in other languages can present yet another behaviour.
Apart from that, the above password hashing code scheme ($password = sha1($password+$salt);) is very weak (see why) and I would strongly recommend not using it even in an example as someone is eventually guaranteed to find it on StackOverflow and use in production.
Also, as already noted above, building SQL queries by concatenating strings is also a bad practice and can lead to security issues in future: today the only parameter in the query will be the password, tomorrow someone decides to add some other option and I bet they won't rewrite the query but just use the template that is already there...
This sql injection question question is asked out of a common delusion.
In fact, there is no such thing like "sanitization" at all, nor any function to perform such non-existent task. As well as there is no "safe" or "unsafe" data. Every data is "safe", as long as you're following simple rules.
Not to mention that a programmer have a lot more important things to keep in mind, other than if some particular piece of data is "safe" in some particular context.
What you really need, is to avoid raw SQL for such silly queries at all, using an ORM to run SQL for you. While in such rare cases when you really need to run a complex query, you have to use placeholders to substitute every variable in your query.
From the documentation:
The value is returned as a string of 40 hex digits, or NULL if the argument was NULL.
Assuming you have a large enough varchar column, you have no sanitization to do.
This being said, it's always cleaner to use prepared statements, there's no reason to just concat strings to build queries.

How should substring() work?

I do not understand why Java's [String.substring() method](http://java.sun.com/j2se/1.5.0/docs/api/java/lang/String.html#substring(int,%20int%29) is specified the way it is. I can't tell it to start at a numbered-position and return a specified number of characters; I have to compute the end position myself. And if I specify an end position beyond the end of the String, instead of just returning the rest of the String for me, Java throws an Exception.
I'm used to languages where substring() (or substr()) takes two parameters: a start position, and a length. Is this objectively better than the way Java does it, and if so, can you prove it? What's the best language specification for substring() that you have seen, and when if ever would it be a good idea for a language to do things differently? Is that IndexOutOfBoundsException that Java throws a good design idea, or not? Does all this just come down to personal preference?
There are times when the second parameter being a length is more convenient, and there are times when the second parameter being the "offset to stop before" is more convenient. Likewise there are times when "if I give you something that's too big, just go to the end of the string" is convenient, and there are times when it indicates a bug and should really throw an exception.
The second parameter being a length is useful if you've got a fixed length of field. For instance:
// C#
String guid = fullString.Substring(offset, 36);
The second parameter being an offset is useful if you're going up to another delimited:
// Java
int nextColon = fullString.indexOf(':', start);
if (start == -1)
{
// Handle error
}
else
{
String value = fullString.substring(start, nextColon);
}
Typically, the one you want to use is the opposite to the one that's provided on your current platform, in my experience :)
I'm used to languages where
substring() (or substr()) takes two
parameters: a start position, and a
length. Is this objectively better
than the way Java does it, and if so,
can you prove it?
No, it's not objectively better. It all depends on the context in which you want to use it. If you want to extract a substring of a specific length, it's bad, but if you want to extract a substring that ends at, say, the first occurrence of "." in the string, it's better than if you first had to compute a length. The question is: which requirement is more common? I'd say the latter. Of course, the best solution would be to have both versions in the API, but if you need the length-based one all the time, using a static utility method isn't that horrible.
As for the exception, yeah, that's definitely good design. You asked for something specific, and when you can't get that specific thing, the API should not try to guess what you might have wanted instead - that way, bugs become apparent more quickly.
Also, Java DOES have an alternative substring() method that returns the substring from a start index until the end of the string.
second parameter should be optional, first parameter should accept negative values..
If you leave off the 2nd parameter it will go to the end of the string for you without you having to compute it.
Having gotten some feedback, I see when the second-parameter-as-index scenario is useful, but so far all of those scenarios seem to be working around other language/API limitations. For example, the API doesn't provide a convenient routine to give me the Strings before and after the first colon in the input String, so instead I get that String's index and call substring(). (And this explains why the second position parameter in substr() overshoots the desired index by 1, IMO.)
It seems to me that with a more comprehensive set of string-processing functions in the language's toolkit, the second-parameter-as-index scenario loses out to second-parameter-as-length. But somebody please post me a counterexample. :)
If you store this away, the problem should stop plaguing your dreams and you'll finally achieve a good night's rest:
public String skipsSubstring(String s, int index, int length) {
return s.subString(index, index+length);
}

Are hard-coded STRINGS ever acceptable?

Similar to Is hard-coding literals ever acceptable?, but I'm specifically thinking of "magic strings" here.
On a large project, we have a table of configuration options like these:
Name Value
---- -----
FOO_ENABLED Y
BAR_ENABLED N
...
(Hundreds of them).
The common practice is to call a generic function to test an option like this:
if (config_options.value('FOO_ENABLED') == 'Y') ...
(Of course, this same option may need to be checked in many places in the system code.)
When adding a new option, I was considering adding a function to hide the "magic string" like this:
if (config_options.foo_enabled()) ...
However, colleagues thought I'd gone overboard and objected to doing this, preferring the hard-coding because:
That's what we normally do
It makes it easier to see what's going on when debugging the code
The trouble is, I can see their point! Realistically, we are never going to rename the options for any reason, so about the only advantage I can think of for my function is that the compiler would catch any typo like fo_enabled(), but not 'FO_ENABLED'.
What do you think? Have I missed any other advantages/disadvantages?
If I use a string once in the code, I don't generally worry about making it a constant somewhere.
If I use a string twice in the code, I'll consider making it a constant.
If I use a string three times in the code, I'll almost certainly make it a constant.
if (config_options.isTrue('FOO_ENABLED')) {...
}
Restrict your hard coded Y check to one place, even if it means writing a wrapper class for your Map.
if (config_options.isFooEnabled()) {...
}
Might seem okay until you have 100 configuration options and 100 methods (so here you can make a judgement about future application growth and needs before deciding on your implementation). Otherwise it is better to have a class of static strings for parameter names.
if (config_options.isTrue(ConfigKeys.FOO_ENABLED)) {...
}
I realise the question is old, but it came up on my margin.
AFAIC, the issue here has not been identified accurately, either in the question, or the answers. Forget about 'harcoding strings" or not, for a moment.
The database has a Reference table, containing config_options. The PK is a string.
There are two types of PKs:
Meaningful Identifiers, that the users (and developers) see and use. These PKs are supposed to be stable, they can be relied upon.
Meaningless Id columns which the users should never see, that the developers have to be aware of, and code around. These cannot be relied upon.
It is ordinary, normal, to write code using the absolute value of a meaningful PK IF CustomerCode = "IBM" ... or IF CountryCode = "AUS" etc.
referencing the absolute value of a meaningless PK is not acceptable (due to auto-increment; gaps being changed; values being replaced wholesale).
.
Your reference table uses meaningful PKs. Referencing those literal strings in code is unavoidable. Hiding the value will make maintenance more difficult; the code is no longer literal; your colleagues are right. Plus there is the additional redundant function that chews cycles. If there is a typo in the literal, you will soon find that out during Dev testing, long before UAT.
hundreds of functions for hundreds of literals is absurd. If you do implement a function, then Normalise your code, and provide a single function that can be used for any of the hundreds of literals. In which case, we are back to a naked literal, and the function can be dispensed with.
the point is, the attempt to hide the literal has no value.
.
It cannot be construed as "hardcoding", that is something quite different. I think that is where your issue is, identifying these constructs as "hardcoded". It is just referencing a Meaningfull PK literally.
Now from the perspective of any code segment only, if you use the same value a few times, you can improve the code by capturing the literal string in a variable, and then using the variable in the rest of the code block. Certainly not a function. But that is an efficiency and good practice issue. Even that does not change the effect IF CountryCode = #cc_aus
I really should use constants and no hard coded literals.
You can say they won't be changed, but you may never know. And it is best to make it a habit. To use symbolic constants.
In my experience, this kind of issue is masking a deeper problem: failure to do actual OOP and to follow the DRY principle.
In a nutshell, capture the decision at startup time by an appropriate definition for each action inside the if statements, and then throw away both the config_options and the run-time tests.
Details below.
The sample usage was:
if (config_options.value('FOO_ENABLED') == 'Y') ...
which raises the obvious question, "What's going on in the ellipsis?", especially given the following statement:
(Of course, this same option may need to be checked in many places in the system code.)
Let's assume that each of these config_option values really does correspond to a single problem domain (or implementation strategy) concept.
Instead of doing this (repeatedly, in various places throughout the code):
Take a string (tag),
Find its corresponding other string (value),
Test that value as a boolean-equivalent,
Based on that test, decide whether to perform some action.
I suggest encapsulating the concept of a "configurable action".
Let's take as an example (obviously just as hypthetical as FOO_ENABLED ... ;-) that your code has to work in either English units or metric units. If METRIC_ENABLED is "true", convert user-entered data from metric to English for internal computation, and convert back prior to displaying results.
Define an interface:
public interface MetricConverter {
double toInches(double length);
double toCentimeters(double length);
double toPounds(double weight);
double toKilograms(double weight);
}
which identifies in one place all the behavior associated with the concept of METRIC_ENABLED.
Then write concrete implementations of all the ways those behaviors are to be carried out:
public class NullConv implements MetricConverter {
double toInches(double length) {return length;}
double toCentimeters(double length) {return length;}
double toPounds(double weight) {return weight;}
double toKilograms(double weight) {return weight;}
}
and
// lame implementation, just for illustration!!!!
public class MetricConv implements MetricConverter {
public static final double LBS_PER_KG = 2.2D;
public static final double CM_PER_IN = 2.54D
double toInches(double length) {return length * CM_PER_IN;}
double toCentimeters(double length) {return length / CM_PER_IN;}
double toPounds(double weight) {return weight * LBS_PER_KG;}
double toKilograms(double weight) {return weight / LBS_PER_KG;}
}
At startup time, instead of loading a bunch of config_options values, initialize a set of configurable actions, as in:
MetricConverter converter = (metricOption()) ? new MetricConv() : new NullConv();
(where the expression metricOption() above is a stand-in for whatever one-time-only check you need to make, including looking at the value of METRIC_ENABLED ;-)
Then, wherever the code would have said:
double length = getLengthFromGui();
if (config_options.value('METRIC_ENABLED') == 'Y') {
length = length / 2.54D;
}
// do some computation to produce result
// ...
if (config_options.value('METRIC_ENABLED') == 'Y') {
result = result * 2.54D;
}
displayResultingLengthOnGui(result);
rewrite it as:
double length = converter.toInches(getLengthFromGui());
// do some computation to produce result
// ...
displayResultingLengthOnGui(converter.toCentimeters(result));
Because all of the implementation details related to that one concept are now packaged cleanly, all future maintenance related to METRIC_ENABLED can be done in one place. In addition, the run-time trade-off is a win; the "overhead" of invoking a method is trivial compared with the overhead of fetching a String value from a Map and performing String#equals.
I believe that the two reasons you have mentioned, Possible misspelling in string, that cannot be detected until run time and the possibility (although slim) of a name change would justify your idea.
On top of that you can get typed functions, now it seems you only store booleans, what if you need to store an int, a string etc. I would rather use get_foo() with a type, than get_string("FOO") or get_int("FOO").
I think there are two different issues here:
In the current project, the convention of using hard-coded strings is already well established, so all the developers working on the project are familiar with it. It might be a sub-optimal convention for all the reasons that have been listed, but everybody familiar with the code can look at it and instinctively knows what the code is supposed to do. Changing the code so that in certain parts, it uses the "new" functionality will make the code slightly harder to read (because people will have to think and remember what the new convention does) and thus a little harder to maintain. But I would guess that changing over the whole project to the new convention would potentially be prohibitively expensive unless you can quickly script the conversion.
On a new project, symbolic constants are the way IMO, for all the reasons listed. Especially because anything that makes the compiler catch errors at compile time that would otherwise be caught by a human at run time is a very useful convention to establish.
Another thing to consider is intent. If you are on a project that requires localization hard coded strings can be ambiguous. Consider the following:
const string HELLO_WORLD = "Hello world!";
print(HELLO_WORLD);
The programmer's intent is clear. Using a constant implies that this string does not need to be localized. Now look at this example:
print("Hello world!");
Here we aren't so sure. Did the programmer really not want this string to be localized or did the programmer forget about localization while he was writing this code?
I too prefer a strongly-typed configuration class if it is used through-out the code. With properly named methods you don't lose any readability. If you need to do conversions from strings to another data type (decimal/float/int), you don't need to repeat the code that does the conversion in multiple places and can cache the result so the conversion only takes place once. You've already got the basis of this in place already so I don't think it would take much to get used to the new way of doing things.

Should functions that only output return anything?

I'm rewriting a series of PHP functions to a container class. Many of these functions do a bit of processing, but in the end, just echo content to STDOUT.
My question is: should I have a return value within these functions? Is there a "best practice" as far as this is concerned?
In systems that report errors primarily through exceptions, don't return a return value if there isn't a natural one.
In systems that use return values to indicate errors, it's useful to have all functions return the error code. That way, a user can simply assume that every single function returns an error code and develop a pattern to check them that they follow everywhere. Even if the function can never fail right now, return a success code. That way if a future change makes it possible to have an error, users will already be checking errors instead of implicitly silently ignoring them (and getting really confused why the system is behaving oddly).
Can the processing fail? If so, should the caller know about that? If either of these is no, then I don't see value in a return. However, if the processing can fail, and that can make a difference to the caller, then I'd suggest returning a status or error code.
Do not return a value if there is no value to return. If you have some value you need to convey to the caller, then return it but that doesn't sound like the case in this instance.
I will often "return: true;" in these cases, as it provides a way to check that the function worked. Not sure about best practice though.
Note that in C/C++, the output functions (including printf()) return the number of bytes written, or -1 if this fails. It may be worth investigating this further to see why it's been done like this. I confess that
I'm not sure that writing to stdout could practically fail (unless you actively close your STDOUT stream)
I've never seen anyone collect this value, let alone do anything with it.
Note that this is distinct from writing to file streams - I'm not counting stream redirection in the shell.
To do the "correct" thing, if the point of the method is only to print the data, then it shouldn't return anything.
In practice, I often find that having such functions return the text that they've just printed can often be useful (sometimes you also want to send an error message via email or feed it to some other function).
In the end, the choice is yours. I'd say it depends on how much of a "purist" you are about such things.
You should just:
return;
In my opinion the SRP (single responsibility principle) is applicable for methods/functions as well, and not only for objects. One method should do one thing, if it outputs data it shouldn't do any data processing - if it doesn't do processing it shouldn't return data.
There is no need to return anything, or indeed to have a return statement. It's effectively a void function, and it's comprehensible enough that these have no return value. Putting in a 'return;' solely to have a return statement is noise for the sake of pedantry.