Why is this %2B string being urldecoded? - html

[This may not be precisely a programming question, but it's a puzzle that may best be answered by programmers. I tried it first on the Pro Webmasters site, to overwhelming silence]
We have an email address verification process on our website. The site first generates an appropriate key as a string
mykey
It then encodes that key as a bunch of bytes
&$dac~ʌ����!
It then base64 encodes that bunch of bytes
JiRkYWN+yoyIhIQ==
Since this key is going to be given as a querystring value of a URL that is to be placed in an HTML email, we need to first URLEncode it then HTMLEncode the result, giving us (there's no effect of HTMLEncoding in the example case, but I can't be bothered to rework the example)
JiRkYWN%2ByoyIhIQ%3D%3D
This is then embedded in HTML that is sent as part of an email, something like:
click here.
Or paste <b>http://myapp/verify?key=JiRkYWN%2ByoyIhIQ%3D%3D</b> into your browser.
When the receiving user clicks on the link, the site receives the request, extracts the value of the querystring 'key' parameter, base64 decodes it, decrypts it, and does the appropriate thing in terms of the site logic.
However on occasion we have users who report that their clicking is ineffective. One such user forwarded us the email he had been sent, and on inspection the HTML had been transformed into (to put it in terms of the example above)
click here
Or paste <b>http://myapp/verify?key=JiRkYWN+yoyIhIQ%3D%3D</b> into your browser.
That is, the %2B string - but none of the other percentage encoded strings - had been converted into a plus. (It's definitely leaving us with the right values - I've looked at the appropriate SMTP logs).
key=JiRkYWN%2ByoyIhIQ%3D%3D
key=JiRkYWN+yoyIhIQ%3D%3D
So I think that there are a couple of possibilities:
1. There's something I'm doing that's stupid, that I can't see, or
2. Some mail clients convert %2b strings to plus signs, perhaps to try to cope with the problem of people mistakenly URLEncoding plus signs
In case of 1 - what is it? In case of 2 - is there a standard, known way of dealing with this kind of scenario?
Many thanks for any help

The problem lies at this step
on inspection the HTML had been transformed into (to put it in terms of the example above)
click here
Or paste <b>http://myapp/verify?key=JiRkYWN+yoyIhIQ%3D%3D</b> into
your browser.
That is, the %2B string - but none of the other percentage encoded
strings - had been converted into a plus
Your application at "the other end" must be missing an unescaping step. Regardless of whether there is a %2B or a +, a function like Perl's uri_unescape returns the same answer:
DB<9> use URI::Escape;
DB<10> x uri_unescape("JiRkYWN+yoyIhIQ%3D%3D")
0 'JiRkYWN+yoyIhIQ=='
DB<11> x uri_unescape("JiRkYWN%2ByoyIhIQ%3D%3D")
0 'JiRkYWN+yoyIhIQ=='
Here is what should be happening. All I'm showing are the steps, using Perl in a debugger. Step 54 encodes the string to base64. Step 55 shows how the base64-encoded string could be made into a URI-escaped parameter. Steps 56 and 57 are what the receiving end should be doing to decode.
One possible workaround is to ensure that your base64 "key" does not contain any plus signs!
DB<53> $key="AB~"
DB<54> x encode_base64($key)
0 'QUJ+
'
DB<55> x uri_escape('QUJ+')
0 'QUJ%2B'
DB<56> x uri_unescape('QUJ%2B')
0 'QUJ+'
DB<57> $result=decode_base64('QUJ+')
DB<58> x $result
0 'AB~'
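One way to get a key without plus signs is the URL-safe Base64 alphabet, which uses - and _ in place of + and /. A minimal sketch in Python rather than Perl; the names encode_key and decode_key are made up for illustration, and it assumes both the sending and the receiving code can be switched to the same alphabet:

```python
import base64


def encode_key(raw: bytes) -> str:
    """Encode bytes with the URL-safe Base64 alphabet ('-' and '_' replace '+' and '/')."""
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_key(token: str) -> bytes:
    """Reverse encode_key."""
    return base64.urlsafe_b64decode(token.encode("ascii"))


raw = b"AB~"
token = encode_key(raw)     # "QUJ-" where standard Base64 gives "QUJ+"
restored = decode_key(token)
print(token, restored)
```

The trailing = padding can still appear and still needs URL-escaping, but a mail client that rewrites %2B has nothing to rewrite.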

What may be happening here is that the URLDecode is turning the %2b into a +, which is being interpreted as a space character in the URL. I was able to overcome a similar problem by first urldecoding the string, then using a replace function to replace spaces in the decoded string with + characters, and then decrypting the "fixed" string.
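The decode-then-repair approach described above can be sketched in Python. The function name extract_key is hypothetical, and the repair relies on the fact that standard Base64 output never contains a space, so any space in the decoded key can only have come from a plus sign:

```python
from urllib.parse import parse_qs


def extract_key(query: str) -> str:
    """Return the Base64 key from a query string, undoing '+' -> ' ' decoding."""
    # parse_qs applies form-encoding rules: a bare '+' is decoded as a space.
    value = parse_qs(query)["key"][0]
    # Base64 never contains spaces, so turning them back into '+' is safe.
    return value.replace(" ", "+")


intact = extract_key("key=JiRkYWN%2ByoyIhIQ%3D%3D")
damaged = extract_key("key=JiRkYWN+yoyIhIQ%3D%3D")
print(intact == damaged)
```

Both the original link and the link rewritten by the mail client then yield the same key.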

Related

Scala Play 2.4.x handling extended characters through anorm (MySQL) to Java Mail

I was under the impression that UTF-8 was the answer to everything :0
Problem: Using Play's idiomatic form handling to go from a web page (basic HTML Text Area Input field) to a MySQL database through the Anorm abstraction layer (so all properly escaped), and then reading the database to gather that data and create an email using the JavaMail APIs to send HTML email with alternate characters (accented characters like é, for example). (I'd post more but I suspect we might get strange artifacts here as well -- I'll try that in a comment below perhaps.)
I can use a moderate set of characters and create a TEXT email (edited via Atom and placed into the stream directly at the code level) and it comes through as an email with all the characters I've chosen intact.
I have not yet systematically worked through the characters; I was just using a relatively random sampling as an initial test.
I place the same set of characters into a text field and try to save them to the database, and I can only save about 1 in 5 or fewer of them.
The errors look like this:
SQLException: Incorrect string value: '\xC4\x93\x0D\x0A\x0D\x0A...' for column 'content' at row 1
I suspect I'm about to learn a ton of new information about either Play and/or UTF-8 or HTML or some part of the chain where this is going off the rails.
My question then is this: Is there an idiomatic Play example of how to handle UTF-8 end to end through Anorm and into Java Mail?
(I think I kinda expected it to be "built-in" but then I expected a LOT more to be baked into the core product as well...)
I want/need both a TEXT and an HTML path for the email portion. (I can write BOTH and they work fine -- the problem is moving alternate characters through the channels as indicated above.)
I'm currently seeing if this might be an answer:
https://objectpartners.com/2013/04/24/html-encoding-utf-8-characters/
However presently hitting this roadblock...
How to turn off specific Implicit's in Scala that prevent code from compiling due to overloaded methods?
This appears to be a hopeful candidate -- I am researching it now end to end.
import org.apache.commons.lang3._

def htmlEncode(input: String) = htmlEncode_sb(input).toString

def htmlEncode_sb(input: String, stringBuilder: StringBuilder = new StringBuilder()) = {
  stringBuilder.synchronized {
    for ((c, i) <- input.zipWithIndex) {
      if (CharUtils.isAscii(c)) {
        // Encode common HTML equivalent characters
        stringBuilder.append(StringEscapeUtils.escapeHtml4(c.toString()))
      } else {
        // Why isn't this done in escapeHtml4()?
        stringBuilder.append(s"""&#${Character.codePointAt(input, i)};""")
      }
    }
    stringBuilder
  }
}
In order to get it to work inside Play you'll need this in your build.sbt file
"org.apache.commons" % "commons-lang3" % "3.4",
This blog post led me to write that code: https://objectpartners.com/2013/04/24/html-encoding-utf-8-characters/
Update: Confirmed that it does work end to end.
Web Page Input as TextArea inside a Form saved to MySQL database escaped by Anorm, reread from database and displayed inside a TextArea on a web page with extended characters (visually) appearing precisely as input.
You'll need to call @Html(htmlContentString) inside the Twirl template to re-render this as the original HTML, but the browser (Safari 8.0.7) displayed exactly what I gave it after a round trip to and from the database.
One caveat -- it creates machine readable HTML not human readable HTML. It would be nice if it didn't encode angle brackets and such so it looks more like HTML that we expect. I'm sure a pattern match block will be added next to exclude just that :)
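For comparison, the same idea (escape the HTML-significant ASCII characters, then turn everything outside ASCII into numeric character references) can be sketched with Python's standard library. This is an illustration of the technique, not a port of the Scala code above, and the name html_encode is made up:

```python
import html


def html_encode(text: str) -> str:
    """Escape &, <, >, quotes, then replace non-ASCII characters with &#NNN; references."""
    escaped = html.escape(text, quote=True)
    # 'xmlcharrefreplace' emits one decimal reference per non-ASCII code point.
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(html_encode("caf\u00e9"))          # é is U+00E9, i.e. decimal 233
print(html_encode("<b>\u0113</b>"))      # ē is the \xC4\x93 from the SQL error
```

The result is pure ASCII, so it can pass through a database column or mail path that does not handle UTF-8 correctly.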

Find a specific string in html source

My goal is to find a predefined string in the HTML source of a specific site that I have extracted using C++, but I'm getting some errors. Here is my source code so far:
So after I connect to the internet and the site and all I have this...
addr = InternetOpenUrl...
dmbp = char dmbp[5000]
dba = DWORD dba = 0
while (InternetReadFile(addr, dmbp, 80000, &dba) && dba)
{
string str2 = dmbp;
size_t sf1 = str2.find(string1);
if (sf1!=string::npos)
{printf("found");
// manipulate it...
}else{printf("not found");}
}
My problem is that it never confirms that it found the value I need; it always says that the value is not found. Even when I statically insert the page and look at it myself, I can see the value that I need, but the search never finds it. Does anyone with experience in HTML extraction with C++ know what I'm missing or how I can get this to work?
There is nothing wrong with the string search code as far as I can see, the problem is that we don't know exactly what you are searching for.
As pure HTML can be full of special characters and entities (such as &quot; standing in for "), the string you are looking for has to account for them. Also, text can contain newlines and HTML tags (such as <b></b> within a single word), and these must be included in the search string, because string::find looks for an exact match (including any newline).
Also, I suggest debugging your code and see if the website's text/code is actually loaded into str2.
Looking at the information given that's currently the only issue I can think of why your code doesn't work.
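One more thing that may be worth checking while debugging, offered as a guess rather than a diagnosis: the loop searches each buffer as it arrives, so a string that happens to be split across two reads would never be found in either one. A small Python sketch of searching across chunk boundaries (the helper name and the sample chunks are invented for illustration):

```python
from typing import Iterable


def contains_across_chunks(chunks: Iterable[bytes], needle: bytes) -> bool:
    """Return True if needle occurs anywhere in the concatenation of chunks."""
    if not needle:
        return True
    keep = len(needle) - 1
    tail = b""
    for chunk in chunks:
        window = tail + chunk
        if needle in window:
            return True
        # Carry over the last len(needle)-1 bytes so a split match is still seen.
        tail = window[-keep:] if keep > 0 else b""
    return False


chunks = [b"<html><b>hel", b"lo</b></html>"]
print(any(b"hello" in c for c in chunks))          # per-chunk search misses it
print(contains_across_chunks(chunks, b"hello"))    # search across the boundary
```

Appending every chunk to one string and searching once after the loop would work just as well for pages of modest size.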

how to encrypt/encode url parameters in jsp

I want to encrypt a URL variable so that the user can't see or modify the information when it is passed in jsp.
This is an example URL:
localhost/somewebpage/name.jsp?id=1234&tname=Employee_March_2013
Here I want to encrypt or encode the parameters id and tname.
Could someone please help me write a short script that encodes / encrypts and then decrypts the parameters
EDIT:
I am sending this URL as an attachment in an email. When the receiver clicks on this link, their payslip information will be displayed on the web page.
One way to encode/decode in Base64 without third-party libraries is sun.misc.BASE64Encoder / sun.misc.BASE64Decoder. The snippet below instead uses the Base64 class from Apache Commons Codec. Note that Base64 only encodes the value; it does not encrypt it.
Try this snippet:
import org.apache.commons.codec.binary.Base64;

// Encoding part
String id = "1234";
byte[] bytesEncoded = Base64.encodeBase64(id.getBytes());
String encoded_id = new String(bytesEncoded);

// Decoding part: read the parameter back from the request and decode it
String id1 = request.getParameter("id");
byte[] valueDecoded = Base64.decodeBase64(id1);
String decoded_id = new String(valueDecoded);
Send 'encoded_id' as the URL parameter instead of passing 'id'.
Your question became solvable the moment we knew that you are 'sending this url as attachment in email... when receiver click on this link their payslip is confirmed'
That means there are 3 options: encrypting, hashing and using random string(s).
In this case I recommend the random strings (or hashing) instead of encrypting. The reason is 2-fold:
You are not sending out potentially private data (for google gmail to read, for example)
random string(s) (or hashing) is simpler, shorter and safer (for this case).
Assuming you have a database containing your user-data, then you'd generate a unique random string (or hash) for that specific user/transaction. Then you store this data (you could hash it again internally) together with or linked to your user-data.
Now you only send out the link with the random string(s)/hash that is uniquely linked to the user-data.
Have a look on SO for https://stackoverflow.com/search?q=[jsp]+hash
and please, for the love of [enter deity here], be sure you read Wikipedia about 'salt' etc.!!
You do not want to make mistakes with user-payments!
Now, make a choice, set it up and return with questions should you get stuck!
EDIT:
In fact, instead of hashing, a completely 'random' (fixed-length) unique string is sufficient! Better yet, use two random strings for a two-factor check: one string for identification, one for authentication.
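The two-random-strings idea can be sketched as follows, in Python rather than JSP. Everything here is an assumption made for illustration: the in-memory dictionary stands in for the database, the auth parameter name is invented, and the function names are hypothetical:

```python
import hmac
import secrets

# Hypothetical store standing in for the database: identifier -> (authenticator, user id)
_tokens = {}


def issue_tokens(user_id: str):
    """Create one random string for identification and one for authentication."""
    ident = secrets.token_urlsafe(16)
    auth = secrets.token_urlsafe(32)
    _tokens[ident] = (auth, user_id)
    return ident, auth


def build_link(ident: str, auth: str,
               base_url: str = "http://localhost/somewebpage/name.jsp") -> str:
    """Build the emailed link; it carries no user data, only the random strings."""
    return f"{base_url}?id={ident}&auth={auth}"


def resolve(ident: str, auth: str):
    """Return the user id if both strings match a stored record, else None."""
    record = _tokens.get(ident)
    if record is None:
        return None
    stored_auth, user_id = record
    # Constant-time comparison avoids leaking how many characters matched.
    if hmac.compare_digest(stored_auth.encode("utf-8"), auth.encode("utf-8")):
        return user_id
    return None
```

secrets.token_urlsafe produces only letters, digits, - and _, so the strings can go into a URL without further escaping.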
URLEncoder.encode(Encryption.encrypt(parameters), "UTF-8")
Always use the POST method.
Even with POST, the user can see the id and change it in the Network tab of the browser console, so a user could view someone else's email attachment, which is the scenario you mentioned in your comment.
So, try storing the id in the JSP session and reading it in the Java servlet code.
It is really good practice.

Actionscript 3.0 Reg Exp Find first URL, Ignore Email

I've got a bit of code where I have a loop with a string.search() to parse a string of HTML. The purpose is to seek out any valid URLs and surround each one with HREF tags appropriately while ignoring anything else, like email addresses. The problem is no matter how I modify the regular expression, it either kicks back the part of the email address after the @ sign and highlights it or it highlights or ignores everything.
An example string would be:
"</span><span class='blue'>Weaselgrease:</span><span class='magenta'>weaselgrease@weasel.grs vs weaselgrease.weasel.grs</span><span class='blue'> [12:41:33 AM]</span>"
Where 'weaselgrease.weasel.grs' would be identified as a proper URL and 'weaselgrease@weasel.grs' would be ignored.
The code I have currently is /([fh]t{1,2}ps?:\/\/)?[\w-]+\.\w{2,4}/
I know it's rather simple, but it doesn't need to be complex yet.
I've tried a conditional and gotten nowhere. I may just be missing something, but my searching and even playing legos with http://regex101.com/ has gotten me no closer.
Ultimately I'm going to have it do the following:
Identify a valid URL's index in the string
Ignore if it's an email
Ignore if it's just an IP address (no prepending http:// and no trailing slash)
But I'd be happy with just an inkling of help on what I need to do to get it to ignore email addresses.
A URL without a proper protocol (i.e. http, https, ftp) cannot be validated as such, because then almost everything that has a . (dot) in it is a valid URL.
So there is no way to properly check whether it's a URL or an e-mail address if you don't use the protocol. Example:
end of sentence.New sentence -> sentence.New is a valid URL in your case
weaselgrease@weasel.grs -> everything before @ is ignored and weasel.grs is a valid URL
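Both examples can be reproduced with the regex from the question, here in Python rather than ActionScript. The second function is only a rough sketch of one possible filter (dropping a candidate that directly follows an @); it handles a simple address like the one in the example and is not a general way to tell URLs from email addresses:

```python
import re

# The pattern from the question, unchanged apart from the unescaped slashes.
URL_RE = re.compile(r"([fh]t{1,2}ps?://)?[\w-]+\.\w{2,4}")


def find_candidates(text: str) -> list:
    """Everything the pattern treats as a URL."""
    return [m.group(0) for m in URL_RE.finditer(text)]


def find_candidates_skipping_email(text: str) -> list:
    """Same, but drop any candidate immediately preceded by '@'."""
    return [
        m.group(0)
        for m in URL_RE.finditer(text)
        if not (m.start() > 0 and text[m.start() - 1] == "@")
    ]


print(find_candidates("end of sentence.New sentence"))
print(find_candidates("weaselgrease@weasel.grs"))
print(find_candidates_skipping_email("weaselgrease@weasel.grs"))
```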

Regex not matching URL Params

I am currently working on a stub server I can plug into a webpage so I do not need to hit sagepay every time I test my payment screen. I need the server to receive a request from the web page and use the dynamic parameters contained in the URL to build the server response. The stub uses regex targets to pick out the parameters I need and add them to the response.
I am using this stub server
I built the accepted URL piece by piece, using the regex tester contained here to test each bit of logic. The expressions work separately, but when I try to join two or more of them together they refuse to work. Each parameter is separated by an ampersand (&) and the name of the parameter.
Here is a sample of the parameters:
paymentType=A&amount=147.06&policyUid=07ef493b-0000-0000-6a05-9fa4d6a5b5ad&paymentMethod=A&script=Retail/accept.py&scriptParams=uid=07ef461a-0000-0000-6a059fa44a8870bf&invokePCL=true&paymentType=A&description=New Business Payment&firstName=Adam&surname=Har&addressLine1=20 Potters Road&city=London&postalCode=EC1 4JS&payerUid=07ef3ff7-0000-0000-6a05-9fa42e92d56b&cardType=valid&continuousAuthority=true&makeCurrent=true
and in a list for ease of reading (without &'s)
paymentType=A
amount=147.06
policyUid=07ef493b-0000-0000-6a05-9fa4d6a5b5ad
paymentMethod=A
script=Retail/accept.py
scriptParams=uid=07ef461a-0000-0000-6a059fa44a8870bf&invokePCL=true&paymentType=A
description=New Business Payment
firstName=Adam
surname=Har
addressLine1=20 Chase road
city=London
postalCode=EC1 3PF
payerUid=07ef3ff7-0000-0000-6a05-9fa42e92d56b
cardType=valid
continuousAuthority=true
makeCurrent=true
And here is my accepted URL parameters with the regex logic:
paymentType=A&amount=([0-9]+.[0-9]{2})&policyUid=([A-Za-z0-9]+(?:[\s-][A-Za-z0-9]+)$)&paymentMethod=([a-zA-Z]+)&script=([a-zA-Z]+/[a-zA-Z]+.py)&scriptParams=[a-zA-Z]{3}=(([A-Za-z0-9]+(?:[\s-][A-Za-z0-9]+)))&description=([a-zA-Z0-9 ]+s)&firstName=[A-Za-z]&surname=[A-Za-z]&addressLine1=[a-zA-Z0-9 ]+&city=([a-zA-Z ]+)&postalCode=[a-zA-Z0-9 ]+&payerUid=([A-Za-z0-9]+(?:[\s-][A-Za-z0-9]+)$)&cardType=[a-zA-Z]+&continuousAuthority=[a-zA-Z]+&makeCurrent=[a-zA-Z]+
again in a list:
registerPayment?outputType=xml
country=GB
paymentType=A
amount=([0-9]+.[0-9]{2})
policyUid=([A-Za-z0-9]+(?:[\s-][A-Za-z0-9]+)*$)
paymentMethod=([a-zA-Z]+)
script=([a-zA-Z]+/[a-zA-Z]+.py)
scriptParams=[a-zA-Z]{3}=(([A-Za-z0-9]+(?:[\s-][A-Za-z0-9]+)))
description=([a-zA-Z0-9 ]+s)
firstName=[A-Za-z]
surname=[A-Za-z]
addressLine1=[a-zA-Z0-9 ]+
city=([a-zA-Z ]+)
postalCode=[a-zA-Z0-9 ]+
payerUid=([A-Za-z0-9]+(?:[\s-][A-Za-z0-9]+)*$)
cardType=[a-zA-Z]+
continuousAuthority=[a-zA-Z]+
makeCurrent=[a-zA-Z]+
My question is: why do my regex and sample match OK separately, but not when I put them all together?
Additional question:
I am using the logic (([A-Za-z0-9]+(?:[\s-][A-Za-z0-9]+))) for the whole scriptParams parameter (the &'s here are part of the parameter). If I just want to get the 'uid' part and leave the rest, what expression would I need to target this (it is made up of A-Z, a-z, 0-9 and dashes)?
thank you
UPDATE
I have tweaked your answer slightly, because the stub server I am using will not accept the (?:[\s-]) when it loads the file containing the URL templates. I have also incorporated a lot of % and 0-9 because the request is URL-encoded (percent-encoded) before it is matched (which I had not anticipated), and a few of the params have rogue spaces beyond my control. Other than that, your solution worked great :)
Here is my new version of the scriptParams regex:
&scriptParams=[a-zA-Z]{3}%3d[-A-Za-z0-9]+
This accepts the whole parameter, and works fine in the regex tester. Now when I link anything after this part, there is an unsuccessful match.
I do not understand why this is a problem as the regex seem to string together nicely otherwise. Any ideas are appreciated.
Here is the full regex:
paymentType=[-%a-zA-Z0-9 ]+&amount=[0-9]+.[0-9]{2}&policyUid=([-A-Za-z0-9]+)&paymentMethod=([%a-zA-Z0-9]+)&script=[%/.a-zA-Z0-9]+&scriptParams=[a-zA-Z]{3}%3d[-A-Za-z0-9]+&description=[%a-zA-Z0-9 ]+&firstName=[-%A-Za-z0-9]+&surname=[-%A-Za-z0-9]+&addressLine1=[-%a-zA-Z0-9 ]+&city=[-%a-zA-Z 0-9]+&postalCode=[-%a-zA-Z 0-9]+&payerUid=([-A-Za-z0-9]+)&cardType=[%A-Za-z0-9]+&continuousAuthority=[A-Za-z]+&makeCurrent=[A-Za-z]+
And here is the full set of URL params (with URL encoding present):
paymentType=A&amount=104.85&policyUid=16a9cc22-0000-0000-5a96-5654d9a31f92&paymentMethod=A%20&script=RetailQuotes%2FacceptQuote.py%20&scriptParams=uid%3d16a9c958-0000-0000-5a96-565435311d07%26invokePCL%3dtrue%26paymentType%3dA%20&description=New%2520Business%2520Payment&firstName=Adam&surname=Har%20&addressLine1=26%2520Close&city=Potters%2520Town&postalCode=EC1%25206LR%20&payerUid=16a9c24e-0000-0000-5a96-5654b3f956e0&cardType=valid%20&continuousAuthority=true&makeCurrent=true
Thank you
PS
(Solved the server problem. Was a slight mistake I was making in the usage of URL params.)
First, not all of your regexes work: some are missing quantifiers, others have a $ for some reason, and some parameters are missing altogether! Here's what they should have been:
paymentType=A
amount=([0-9]+.[0-9]{2})
policyUid=([A-Za-z0-9]+(?:[\s-][A-Za-z0-9]+)*)
paymentMethod=([a-zA-Z]+)
script=([a-zA-Z]+/[a-zA-Z]+.py)
scriptParams=[a-zA-Z]{3}=(([A-Za-z0-9]+(?:[\s-][A-Za-z0-9]+)+))
invokePCL=([a-z]+)
paymentType=A
description=([a-zA-Z0-9 ]+)
firstName=[A-Za-z]+
surname=[A-Za-z]+
addressLine1=[a-zA-Z0-9 ]+
city=([a-zA-Z ]+)
postalCode=[a-zA-Z0-9 ]+
payerUid=([A-Za-z0-9]+(?:[\s-][A-Za-z0-9]+)*)
cardType=[a-zA-Z]+
continuousAuthority=[a-zA-Z]+
makeCurrent=[a-zA-Z]+
And combined, you get:
paymentType=A&amount=([0-9]+.[0-9]{2})&policyUid=([A-Za-z0-9]+(?:[\s-][A-Za-z0-9]+)*)&paymentMethod=([a-zA-Z]+)&script=([a-zA-Z]+/[a-zA-Z]+.py)&scriptParams=[a-zA-Z]{3}=(([A-Za-z0-9]+(?:[\s-][A-Za-z0-9]+)+))&invokePCL=([a-z]+)&paymentType=A&description=([a-zA-Z0-9 ]+)&firstName=[A-Za-z]+&surname=[A-Za-z]+&addressLine1=[a-zA-Z0-9 ]+&city=([a-zA-Z ]+)&postalCode=[a-zA-Z0-9 ]+&payerUid=([A-Za-z0-9]+(?:[\s-][A-Za-z0-9]+)*)&cardType=[a-zA-Z]+&continuousAuthority=[a-zA-Z]+&makeCurrent=[a-zA-Z]+
regex101 demo
[Note, I took your regexes where they matched and ran minimal edits to them].
For your second question, I'm not sure what you mean by the Uid part and that & are part of the parameter. Given that there are 3 Uids in the url with similar format (policy, scriptparams, user), you will have to put them in the expression, unless you know a specific pattern to the scriptparams' Uid.
In the expression below, I made use of the fact that only scriptparams' uid was in lowercase:
uid=[0-9a-f]+(?:-[0-9a-f]+)+
regex101 demo
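A quick way to try that expression outside regex101 is a few lines of Python. The capturing group around the value is an addition for this sketch, and the sample is a shortened version of the parameters from the question:

```python
import re

# Lowercase "uid=" only: "policyUid=" and "payerUid=" have a capital U and do not match.
UID_RE = re.compile(r"uid=([0-9a-f]+(?:-[0-9a-f]+)+)")

sample = (
    "policyUid=07ef493b-0000-0000-6a05-9fa4d6a5b5ad&paymentMethod=A"
    "&script=Retail/accept.py"
    "&scriptParams=uid=07ef461a-0000-0000-6a059fa44a8870bf&invokePCL=true"
    "&payerUid=07ef3ff7-0000-0000-6a05-9fa42e92d56b"
)

uids = UID_RE.findall(sample)
print(uids)
```

With the percent-encoded request from the update, the literal = in the pattern would have to become %3d, as in the asker's own revised regex.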