Prevent quotes from being saved in the DB as HTML entities after sanitization - mysql

This might be a stupid question, but after all my research on best practices – including this great SO post that explains sanitizing, validation, escaping for storage and escaping for display – I am still confused.
I have built a routine where I sanitize user input – say, a comment post, or an "edit my first name" string – with $value = filter_var($value, FILTER_SANITIZE_STRING);. Given a value like O'Hara, that gets rid of <a></a> and similar tags nicely. Then this new value gets validated: error if the value is empty and the field is not nullable; or if it's too long; etc. Lastly, I save that value in the DB using a CakePHP query builder – which, of course, supports binding string values.
But when I then save that value in the DB, it is saved as O&#39;Hara instead of O'Hara – because of said sanitization.
Am I supposed to decode it back, or into yet another format? If so, with which method?
Or am I to use the sanitized version for validation but then the original value for DB stora-- no, that can't be it.
Or is there a flag for FILTER_SANITIZE_STRING that I need to tweak? The tutorials I've seen [1] [2] suggest that the filter alone is enough.
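For reference, the flag route would be a sketch like this (per the PHP docs, FILTER_FLAG_NO_ENCODE_QUOTES stops the filter from encoding quotes):
// Strip tags as before, but leave single and double quotes alone,
// so O'Hara survives as O'Hara instead of becoming O&#39;Hara.
$value = filter_var($value, FILTER_SANITIZE_STRING, FILTER_FLAG_NO_ENCODE_QUOTES);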
I feel dumb because that great post mentioned earlier still doesn't seem to be enough for me. All I can find are posts from ~2012 that say you should bind.
Any help would be appreciated.

Related

XSS validation: Is it safe to check only whether a value contains <>, %3C, or %3E?

I've been doing server-side XSS validation. Here is what I found to use:
List of forbidden attributes: javascript:,mocha:,eval(,alert(,vbscript:,livescript:,expression(,url(,&{,&#,/*,*/,onclick,oncontextmenu,ondblclick,onmousedown,onmouseenter,onmouseleave,onmousemove,onmouseover,onmouseout,onmouseup,onkeydown,onkeypress,onkeyup,onblur,onchange,onfocus,onfocusin,onfocusout,oninput,oninvalid,onreset,onsearch,onselect,onsubmit,ondrag,ondragend,ondragenter,ondragleave,ondragover,ondragstart,ondrop,oncopy,oncut,onpaste,ontouchcancel,ontouchend,ontouchmove,ontouchstart,onwheel
However, this list seems too strict, since some legitimate values such as "™" are considered illegal if I use it.
I'm wondering: would checking only that the value doesn't contain any of the characters '<', '>', "%3C", "%3E" be enough to prevent an XSS attack?
You do not need to block any of those characters. Long lesson learned: you're potentially just cutting out support for other languages, and you can work with these characters safely.
Imagine we have a chat box, as a textarea is where these characters get used most often. The user can pass the following: <html></html>. If it isn't handled properly, every user receiving the message will open up a new HTML document inside the chat (scary).
Fortunately this gets handled on the client side, with some hacks to make the message "text-only".
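As a sketch of that "text-only" handling (PHP shown server-side for concreteness; the equivalent client-side trick is assigning to textContent instead of innerHTML):
// Escape the chat message before echoing it into the page, so a payload
// like <html></html> renders as literal text instead of being parsed as markup.
echo htmlspecialchars($message, ENT_QUOTES, 'UTF-8');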
When dealing with login, you could regex-check the username; if that passes, you can then search for the username in SQL, and if a match is made, check whether the password matches too by hashing the password just given and comparing it. No match: get out.
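A minimal sketch of that login flow (all names here are assumed; it uses password_hash()-style hashing via password_verify() where the text above says "encrypting", since passwords should be hashed, and a prepared statement for the lookup):
// Only proceed if the username matches a conservative whitelist pattern.
if (preg_match('/^[A-Za-z0-9_]{3,20}$/', $username)) {
    $stmt = $pdo->prepare('SELECT password_hash FROM users WHERE username = ?');
    $stmt->execute([$username]);
    $row = $stmt->fetch();
    // Compare the supplied password against the stored hash; no match, get out.
    if ($row && password_verify($givenPassword, $row['password_hash'])) {
        // logged in
    }
}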
I've never needed to block characters or treat anything as special once I knew proper encoding/decoding (escaping practices and whatnot). Maybe this will help your search.

Refactor old Web Audio API to new one

I'm evaluating an HTML5 Web Audio API example and trying to get it to work. Here is what I'm working with. As far as I can tell, it's using the old API, and I need to refactor the function refreshFilterType() around line 590. Link - www.smartjava.org/examples/webaudio-filters/
According to the Web Audio BiquadFilterNode documentation, I need to rework the switch statement to use the new string-based values. (I.e. a value of "3" - the default lowshelf filter - needs to be passed into currentFilterType as "lowshelf".) I've tried to implement the new BiquadFilterNode, but it was still unsuccessful.
Thank you in advance.
I just opened a pull request with the necessary fixes. Or look at https://github.com/cwilso/smartjava.
...and actually, I note the real problem with the code you have is not that it's using numeric values - that is wrong according to the spec, but still supported - it's line 663:
filter.type = currentFilterType;
currentFilterType is "3" - that is, the STRING "3" - and type now takes a string, so it's not being coerced. If you changed this line to
filter.type = parseInt(currentFilterType);
it would actually fix the problem (because filter.type accepts ints, and will coerce them to the appropriate string of "lowpass", etc. - but it doesn't accept the string of a number.)
However, this will fail in the long term, when we remove the deprecated types.
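For a forward-compatible fix, a sketch along these lines should work (the array order follows the old spec's numeric filter types, where 3 was lowshelf; verify it against the example's own switch statement):
// Translate the legacy numeric value into the string type the spec now uses,
// so the code keeps working once the deprecated numeric types are removed.
var filterTypes = ["lowpass", "highpass", "bandpass", "lowshelf",
                   "highshelf", "peaking", "notch", "allpass"];
filter.type = filterTypes[parseInt(currentFilterType, 10)];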

Iterating over a string in Vimscript or Parse a JSON file

So I'm creating a vim script that needs to load and parse a JSON file into a local object graph. I searched and I couldn't find any native way to process a JSON file, and I don't want to add any dependencies to the script. So I wrote my own function to parse the JSON string (gotten from the file), but it's really slow. At the moment, I iterate through each character in the file like so:
let len = strlen(jsonString)
let i = 0
while i < len
  let c = strpart(jsonString, i, 1)
  let i += 1
  " A lot of code to process the file....
  " Note: I've tried shortcutting the process by searching for the enclosing
  " double quote when I come across an initial double quote (also taking into
  " account the escaping '\' character). It doesn't help.
endwhile
I've also tried this method:
for c in split(jsonString, '\zs')
  " Do a lot of parsing ....
endfor
For reference, a file with ~29,000 characters takes about 4 seconds to process, which is unacceptable.
Is there a better way to iterate over a string in vim script?
Or better yet, have I missed a native function to parse JSON?
Update:
I asked for no dependencies because I:
Didn't want to deal with them.
Genuinely wanted some ideas for the best way to do this without relying on someone else's work.
Sometimes just like to do things manually, even though the problem has already been solved.
I'm not against plugins or dependencies at all; it's just that I'm curious. Hence the question.
I ended up creating my own function to parse the JSON file. I was creating a script that could parse the package.json file associated with node.js modules. Because of this, I could rely on a fairly consistent format and quit processing whenever I'd retrieved the information I needed. This usually cut out large chunks of the file, since most developers put the largest chunk of the file - their "readme" section - at the end. Because the package.json file is strictly defined, I left the process somewhat fragile: it assumes a root dictionary { } and actively looks for certain entries. You can find the script here: https://github.com/ahayman/vim-nodejs-complete/blob/master/after/ftplugin/javascript.vim#L33.
Of course, this doesn't answer my own question. It's only the solution to my unique problem. I'll wait a few days for new answers and pick the best one before the bounty ends (already set an alarm on my phone).
The simplest solution with the fewest dependencies is just using the built-in json_decode() Vim function.
let dict = json_decode(jsonString)
Even though Vim's origins date back a long way, its internal string()/eval() representation is so close to JSON that it's likely to work unless you need special characters.
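To illustrate the idea, a sketch that works for the subset of JSON that is also valid Vim expression syntax (no true/false/null, no special characters):
" A simple JSON object is already a valid Vim dictionary literal,
" so eval() turns it into a Vim dict directly.
let jsonString = '{"name": "example", "version": "0.1"}'
let dict = eval(jsonString)
echo dict.name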
You can look up the implementation here, which even supports true/false/null if you want:
https://github.com/MarcWeber/vim-addon-json-encoding
Better to use that library (vim-addon-manager allows you to install dependencies easily).
Now it depends on your data whether this is good enough.
Now, Benjamin Klein posted your question to vim_use, which is why I'm replying.
The best and fastest replies happen if you subscribe to the Vim mailing list.
Go to vim.sf.net and follow the community link.
You cannot expect the Vim community to scrape Stack Overflow.
I've added the keywords "json" and "parsing" to that little code so that it can be found more easily.
If this solution does not work for you, you can try the many :h if_* bindings, or write an external script which extracts the information you're looking for or turns the JSON into Vim's dictionary representation, which can be read by eval(), escaping the special characters you care about correctly.
If you seek a completely correct solution, omitting dependencies is one of the worst things you can do. The eval() variant mentioned by #MarcWeber is one of the fastest, but it has its disadvantages:
Using the solution for securing eval() I mentioned in a comment makes it no longer the fastest. In fact, after you use it, eval() becomes slower by more than an order of magnitude (0.02s vs 0.53s in my test).
It does not respect surrogate pairs.
It cannot be used to verify that you have correct JSON: it accepts some strings (e.g. "\<C-o>") that are not JSON strings, and it allows trailing commas.
It fails to give normal error messages. It fails badly if you use the vam#VerifyIsJSON I mentioned in p.1.
It fails to load floating-point values like 1e10 (Vim requires numbers to look like 1.0e10, but numbers like 1e10 are allowed by JSON: note “and/or” in the first paragraph).
All of the above statements (except for the first) also apply to the vim-addon-json-encoding mentioned by #MarcWeber, because it uses eval(). There are some other possibilities:
The fastest and most correct is using Python: pyeval('json.loads(vim.eval("varname"))'). It is not faster than eval(), but it is the fastest among the other possibilities (0.04s in my test: approximately two times slower than eval()).
Note that I use pyeval() here. If you want a solution for a Vim version that lacks this functionality, it will no longer be one of the fastest.
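A minimal sketch of the Python route (assuming a Vim built with +python; g:jsonString is a placeholder variable name):
" Make Python's json module available to the embedded interpreter.
python import json
" vim.eval() pulls the Vim variable into Python; pyeval() converts the
" resulting Python object back into a Vim dictionary/list.
let dict = pyeval('json.loads(vim.eval("g:jsonString"))')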
Use my json.vim plugin. It has the advantage of slightly better error reporting compared to a failed vam#VerifyIsJSON (slightly worse compared to eval()), and it correctly loads floating-point numbers. It can be used for verification of strings (it does not accept "\<C-a>"), but it loads lists with trailing commas just fine. It does not support surrogate pairs. It is also very slow: in the test I used (a 279702-character string) it takes 11.59s to load. json.vim tries to use Python if possible, though.
For the best error reporting you can take yaml.vim and purge the YAML support out of it, leaving only JSON (I once did the same thing for pyyaml, though in Python: see the markedjson library used in powerline; it is pyyaml minus the YAML stuff, plus classes with marks). But this variant is even slower than json.vim and should only be used if the main thing you need is error reporting: 207 seconds to load the same 279702-character string.
Note that the only variant mentioned that satisfies both requirements, “no dependencies” and “no Python”, is eval(). If you are not fine with its disadvantages, you have to throw away one or both of these requirements, or copy-paste code. If you take speed into account, only two candidates are left: eval() and Python. If you want to parse JSON fast you really must use C, and only these solutions spend most of their time in functions written in C.
Most other interpreters (Ruby/Perl/Tcl) do not have a pyeval() equivalent, so they will be slower even if their JSON implementation is written in C. Some others (Lua/Racket (mzscheme)) have a pyeval() equivalent, but e.g. luaeval('{}') is zero, meaning that you will have to add an additional step of explicitly and recursively converting objects into Vim dictionaries and lists (e.g. luaeval('vim.dict({})')), which will impact performance. I cannot say anything about mzeval(), but I have never heard of anybody actually using Racket (mzscheme) with Vim.

What data format is this?

I was checking one share-trading site's AJAX response, and below is what showed up in the Response tab of the XHR section in Firebug. Can anyone explain to me what format this is and how it is parsed?
<ST=tat>
<SI=0>
<TB=txtSearch>
<560v=Tata Motors Ltdv=TATMOT>
<566v=Tata Steel Ltdv=TATSTE>
<3199v=Ashram Online.com Ltdv=ASHONL>
<4866v=Kreon Finnancial Services Ltdv=KREFIN>
<552v=Tata Chemicals Ltdv=TATCHE>
<554v=Tata Power Company Ltdv=TATPOW>
<2986v=Tata Metaliks Ltdv=TATMET>
<300v=Tata Sponge Iron Ltdv=TATSPO>
<121v=Tata Coffee Ltdv=TATCOF>
<2295v=Tata Communications Ltdv=TATCOM>
<0v=Time In Milli-Secondsv=0>
I think what we are dealing with here is some proprietary format, likely an Eldritch SGML Horror of some sort.
Banking in general has all sorts of Eldritch horrors running about.
On a related note, this is very much not XML.
Edit:
A quick analysis* indicates that this is a format consisting of a series of statements bracketed by <>, with the parts of the statements separated by = or v=. = seems to indicate a parameter to a control statement, which is indicated by a two-letter code (<ST=tat>), while v= seems to indicate an assignment or coupling of some kind (short for "value"?), or perhaps just a field separator.
<ST appears to be short for "search term"; <TB appears to be short for "(source) table". The meaning of <SI eludes me. It is possible that <TB terminates the metadata section, but it's equally possible that the metadata section has a fixed number of terms.
As nothing refers to the number of fields in each statement in the data section, and the statements are all of the same length (3 fields), it is likely that the number of fields is fixed, but it might derive from the value of <TB, or even <SI, in some way.
What is abundantly clear, however, is that this data is not intended for consumption by any application other than the one that supplies it.
*Caveat: Without a much larger sample it's impossible to tell if this analysis is valid.
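Purely to illustrate the guessed structure, here is a sketch of how such a response could be pulled apart in JavaScript (every rule in it is an assumption from the analysis above; the real client code may do something entirely different):
// Split the response into <...> statements, then split each statement
// on the guessed "v=" field separator.
var records = response.match(/<([^>]*)>/g).map(function (stmt) {
  return stmt.slice(1, -1).split('v=');
});
// e.g. "<560v=Tata Motors Ltdv=TATMOT>" -> ["560", "Tata Motors Ltd", "TATMOT"]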
It is not a commonly used "web format".
It is probably a proprietary format used by that site and will be parsed by their custom JavaScript.

Sanitizing data against MySQL injection and XSS - am I doing it right with PDO and HTML Purifier?

I am still working on securing my web app. I decided to use the PDO library to prevent MySQL injection and HTML Purifier to prevent XSS attacks. Because all the data that comes from input goes to the database, I perform these steps to work with the data:
get data from the input field
start PDO, prepare the query
bind each variable (POST variable) to the query, sanitizing it with HTML Purifier
execute the query (save to the database).
In code it looks like this:
// start htmlpurifier
require_once '/path/to/htmlpurifier/library/HTMLPurifier.auto.php';
$config = HTMLPurifier_Config::createDefault();
$purifier = new HTMLPurifier($config);
// start pdo
$pdo = new PDO('mysql:host=host;dbname=dbname', 'login', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
// prepare and bind
$stmt = $pdo->prepare('INSERT INTO `table` (`field1`) VALUES ( :field1 )');
// purify data and bind it.
$stmt->bindValue(':field1', $purifier->purify($_POST['field1']), PDO::PARAM_INT);
// execute (save to database)
$stmt->execute();
Here are the questions:
Is that all I have to do to prevent XSS and MySQL injection? I am aware that I can't be 100% sure, but in most cases should it work fine, and is it enough?
Should I sanitize the data once again when grabbing it from the DB and putting it into the browser, or is filtering before saving enough?
I was reading on a wiki that it's smart to turn off magic_quotes. Of course, if magic quotes puts in unnecessary slashes it can be annoying, but if I don't care about those slashes, isn't turning it off just losing another line of defense?
Answer:
Please note that the code I have written in this example is just an example. There are a lot of inputs, and the query to the DB is much more complicated. Unfortunately, I can't agree with you that if the PDO type of a variable should be int, I do not have to filter it against XSS attacks. Correct me if I am wrong:
If the input should be an integer, and it is, then it's OK - I can put it into the DB. But remember that any input can be changed, and we have to expect the worst. So if everything is all right then it is all right, but if a malicious user were to input XSS code, then I have multiple lines of defense:
client side - check if it is a numeric value. Easy to compromise, but it can stop total newbies.
server side - XSS test (with HTML Purifier or, e.g., htmlspecialchars()).
DB side - if somebody somehow submits malicious code that gets past the XSS protection, the database is going to return an error, because the column expects an integer, not any other kind of value.
I guess it is not doing anything wrong, and it can do a lot of good. Of course we lose some time computing everything, but I guess we have to weigh performance against security and determine which is more important. My app is going to be used by 2-3 users at a time - not many - and security is much more important to me than performance.
Fortunately my whole site is in UTF-8, so I do not expect any problems with encoding.
While searching the net I came across a lot of opinions about addslashes(), stripslashes(), htmlspecialchars(), htmlentities()... and I've chosen HTML Purifier and PDO. Everyone says they are the best solutions against XSS and MySQL injection threats. If you have any other opinion, please share.
As for SQL injection, yes, you can be 100% sure if you always use prepared statements. As for XSS, you must also make sure that all your pages are UTF-8. HTML Purifier sanitizes data with the assumption that it's encoded in UTF-8, so there may be unexpected problems if you put that data in a page with a different encoding. Every page should have a <meta> tag that specifies the encoding as UTF-8.
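For example, the HTML5 form of that tag, placed inside <head>, is simply:
<meta charset="utf-8">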
Nope, you don't need to sanitize the data after you grab it from the DB, provided that you already sanitized it and you're not adding any user-submitted stuff to it.
If you always use prepared statements, magic quotes is nothing but a nuisance. It does not provide any additional lines of defense because prepared statements are bulletproof.
Now, here's a question for you. PDO::PARAM_INT will turn $field1 into an integer. An integer cannot be used in an SQL injection attack. Why are you passing it through HTML Purifier if it's just an integer?
HTML Purifier slows down everything, so you should only use it on fields where you want to allow HTML. If it's an integer, just do intval($var) to destroy anything that isn't a number. If it's a string that shouldn't contain HTML anyway, just do htmlspecialchars($var, ENT_COMPAT, 'UTF-8') to destroy all HTML. Both of these are much more efficient and equally secure if you don't need to allow HTML. Every field should be sanitized, but each field should be sanitized according to what it's supposed to contain.
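A sketch of that per-field approach (the field names are hypothetical):
// Each field gets the cheapest treatment that matches what it should contain.
$age   = intval($_POST['age']);                                  // integer: anything non-numeric becomes 0
$title = htmlspecialchars($_POST['title'], ENT_COMPAT, 'UTF-8'); // plain text: all HTML destroyed
$body  = $purifier->purify($_POST['body']);                      // rich text: only safe HTML kept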
Response to your additions:
I didn't mean to imply that if a variable should contain an integer, then it need not be sanitized. Sorry if my comment came across as suggesting that. What I was trying to say is that if a variable should contain an integer, it should not be sanitized with HTML Purifier. Instead, it should be validated/sanitized with a different function, such as intval() or ctype_digit(). HTML Purifier will not only use unnecessary resources in this case, but it also can't guarantee that the variable will contain an integer afterwards. intval() guarantees that the result will be an integer, and the result is equally secure because nobody can use an integer to carry out an XSS or SQL injection attack.
Similarly, if the variable should not contain any HTML in the first place, like the title of a question, you should use htmlspecialchars() or htmlentities(). HTML Purifier should only be used if you want your users to enter HTML (using a WYSIWYG editor, for example). So I didn't mean to suggest that some kinds of inputs don't need sanitization. My view is that inputs should be sanitized using different functions depending on what you want them to contain. There is no single solution that works on all types of inputs. It's perfectly possible to write a secure website without using HTML Purifier if you only ever accept plain-text comments.
"Client-side defense" is not a line of defense, it's just a convenience.
I'm also getting the nagging feeling that you're lumping XSS and SQL injection together when they are completely separate attack vectors. "XSS injection"? What's that?
You'll probably also want to add some validation to your code in addition to sanitization. Sanitization ensures that the data is safe. Validation ensures that the data is not only safe but also correct.
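For instance, a sketch of validation layered on top of sanitization (the field name and bounds are hypothetical):
// Sanitization alone would happily store a "safe" but bogus age;
// validation rejects it outright.
if (!ctype_digit($_POST['age']) || (int) $_POST['age'] > 130) {
    die('Please enter a valid age.');
}
$age = (int) $_POST['age']; // safe AND correct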