Windows Phone deep links are not properly encoded - windows-phone-8

Let's assume I have a Windows Phone app capable of handling deep links with "my-app:" moniker. Given the following sample link:
my-app://do/stuff/?artist=Macklemore%20%26%20Ryan%20Lewis&test=1
We can visible see two query string params, artist = "Macklemore & Ryan Lewis" and test = "1".
If I create a webpage with that link on it and open the page inside Internet Explorer in the phone, this is what gets to the app UriMapper:
/Protocol?encodedLaunchUri=my-app%3A%2F%2Fdo%2Fstuff%2F%3Fartist%3DMacklemore%20%26%20Ryan%20Lewis%26test%3D1
So, it seems that none of the % encoded values got re-encoded, but yet it encoded the & just before the test parameter…
This seems to me like a platform bug, as we won’t be able to distinguish the & chars we get on the UriMapper!
So the question is if anyone know a way of using encoded ampersands (%26) in a windows phone deep link?

I suspect that the issue might be too low level to access with any of the public APIs.
As a starting point, I figured what we do have at our disposal (and that won't require changes on the user side), which is the ability to detect where the & signs are. From that we can determine if the & is part of a value or a query string delimiter. If the latter, we replace it with a random character and split on that character.
Regex rx = new Regex(#"(\b&.*?)=");
The above regex matches only the & that is followed by = (so it would match &test= below but not Macklemore & Ryan Lewis).
We then replace all instances of & that are matched by the above regex with a random character that won't be used elsewhere. For this example, I just used |.
string mapperInput = #"Protocol?encodedLaunchUri=my-app://do/stuff/?artist=Macklemore & Ryan Lewis&test=1";
string final = rx.Replace(mapperInput,
new MatchEvaluator(
new Func<Match, string>(x => x.Value.Replace('&', '|'))
));
We then take that result and put it in a collection.
//skip 2 because the first two matches include the protocol section
var values = final.Split(new char[] { '?', '|' }).Skip(2).ToArray();
The values array now contains two elements (which can be iterated and placed into a Dictionary to have Key-Value access)
artist=Macklemore & Ryan Lewis
test=1
This would have to be tested with various inputs that include characters but from a quick test, it seemed to work fine.

Related

= in URL changes into %3D

I have url with parameters in my static web page
https://www.nejlepsi-skolky.cz/list-map-of-kindergartens-seznam-mapa-skolek-jesli?locationFrom=Velk%C3%A9%20Opatovice,Dlouh%C3%A1%20429;latFrom=49.6120414733887;lonFrom=16.6786785125732
the problem is that the second and the third equals signs are changed into %3D, when I click on the URL. Is there a possibility to say to the browser not to change equals? And why the first equal works fine?
I have found some answers about nginx, but this is just static site not connected to nginx settings.
Thank you
EDIT: Problem was caused with saving the page to the local storage. During the save the & signs are being changed by firefox into ; signs
the problem is that the second and the third equals signs are changed into %3D, when I click on the URL.
This is normal. It should not be a problem if you are reading the URL with something that supports the standards.
And why the first equal works fine?
The standard format of a query string is a series of key=value pairs which are separated by & characters.
Older versions of the standard supported ; as well as &, but they are obsolete.
= signs, being the symbol that indicates the end of a key and start of a value, should be encoded as %3D when they occur inside a key or a value.
The first equal in your example is the symbol that separates the key from the value. The second is data inside the value.
Is there a possibility to say to the browser not to change equals?
Change your URL so you use & as the separator instead of ;.

Scala Play 2.4.x handling extended characters through anorm (MySQL) to Java Mail

I was under the impression that UTF-8 was the answer to everything :0
Problem: Using Play's idiomatic form handling to go from a web page (basic HTML Text Area Input field) to a MySQL database through the Anorm abstraction layer (so all properly escaped) and then reading the database to gather that data and create an email using the JavaMail API's to send HTML email with alternate characters (accented characters like é for example. (I'd post more but I suspect we might get strange artifacts here as well -- I'll try that in a comment below perhaps)
I can use a moderate set of characters and create a TEXT email (edited via Atom and placed into the stream directly at the code level) and it comes through as an email with all the characters I've chosen in tact.
I have not yet systematically worked through the characters I was just using a relatively random sampling as an initial test.
I place the same set of characters into a text field and try to save them to the database and I can only save about 1 in 5 or less of them.
The errors look like this:
SQLException: Incorrect string value: '\xC4\x93\x0D\x0A\x0D\x0A...' for column 'content' at row 1
I suspect I'm about to learn a ton of new information about either Play and/or UTF-8 or HTML or some part of the chain where this is going off the rails.
My question then is this: Is there an idiomatic Play example of how to handle UTF-8 end to end through Anorm and into Java Mail?
(I think I kinda expected it to be "built-in" but then I expected a LOT more to be baked into the core product as well...)
I want/need both a TEXT and and HTML path for the email portion. (I can write BOTH and they work fine -- the problem is moving alternate characters though the channels as indicated above).
I'm currently seeing if this might be an answer:
https://objectpartners.com/2013/04/24/html-encoding-utf-8-characters/
However presently hitting this roadblock...
How to turn off specific Implicit's in Scala that prevent code from compiling due to overloaded methods?
This appears to be a hopeful candidate -- I am researching it now end to end.
import org.apache.commons.lang3._
def htmlEncode(input: String) = htmlEncode_sb(input).toString
def htmlEncode_sb(input: String, stringBuilder: StringBuilder = new StringBuilder()) = {
stringBuilder.synchronized {
for ((c, i) <- input.zipWithIndex) {
if (CharUtils.isAscii(c)) {
// Encode common HTML equivalent characters
stringBuilder.append(StringEscapeUtils.escapeHtml4(c.toString()))
} else {
// Why isn't this done in escapeHtml4()?
stringBuilder.append(s"""&#${Character.codePointAt(input, i)};""")
}
}
stringBuilder
}
}
In order to get it to work inside Play you'll need this in your build.sbt file
"org.apache.commons" % "commons-lang3" % "3.4",
This blog post lead me to write that code: https://objectpartners.com/2013/04/24/html-encoding-utf-8-characters/
Update: Confirmed that it does work end to end.
Web Page Input as TextArea inside a Form saved to MySQL database escaped by Anorm, reread from database and displayed inside a TextArea on a web page with extended characters (visually) appearing precisely as input.
You'll need to call #Html(htmlContentString) inside the Twirl template to re-render this as the original HTML but the browser (Safari 8.0.7) displayed exactly what I gave it after a round trip to and from the database.
One caveat -- it creates machine readable HTML not human readable HTML. It would be nice if it didn't encode angle brackets and such so it looks more like HTML that we expect. I'm sure a pattern match block will be added next to exclude just that :)

Find a specific string in html source

My goal is to find a predefined string in an HTML source of a specific site that I have extracted using c++, but I'm getting some errors. Here is my source code so far:
So after I connect to the internet and the site and all I have this...
addr = InternetOpenUrl...
dmbp = char dmbp[5000]
dba = DWORD dba = 0
while (InternetReadFile(addr, dmbp, 80000, &dba) && dba)
{
string str2 = dmbp;
size_t sf1 = str2.find(string1);
if (sf1!=string::npos)
{printf("found");
// manipulate it...
}else{printf("not found");}
}
My problem is that it never actually confirms that it found the value that I need, it always says that the value is not found, but I even statically insert the page and look at myself and i can see the value that i need, it just doesnt show up. Does anyone with experience in html extraction with c++ know what I'm missing or how I can get this to work?
There is nothing wrong with the string search code as far as I can see, the problem is that we don't know exactly what you are searching for.
As pure HTML can be full of special characters (such as " or ", the string you might be looking for should deal with those characters. Also, strings can contain newlines and html tags (such as <b></b> within a single word), and they should be specified in the search string as string::find looks for an exact match (including any newline).
Also, I suggest debugging your code and see if the website's text/code is actually loaded into str2.
Looking at the information given that's currently the only issue I can think of why your code doesn't work.

Regex not matching URL Params

I am currently working on a stub server I can plug into a webpage so I do not need to hit sagepay every time I test my payment screen. I need the server to receive a request from the web page and use the dynamic parameters contained in the URL to build the server response. The stub uses regex targets to pick out the parameters I need and add them to the response.
I am using this stub server
I built the accepted URL piece by piece, using the regex tester contained here to test each bit of logic. The expressions work separately, but when I try to join two or more of them together they refuse to work. Each parameter is separated by an ampersand (&) and the name of the parameter.
Here is a sample of the parameters:
paymentType=A&amount=147.06&policyUid=07ef493b-0000-0000-6a05-9fa4d6a5b5ad&paymentMethod=A&script=Retail/accept.py&scriptParams=uid=07ef461a-0000-0000-6a059fa44a8870bf&invokePCL=true&paymentType=A&description=New Business Payment&firstName=Adam&surname=Har&addressLine1=20 Potters Road&city=London&postalCode=EC1 4JS&payerUid=07ef3ff7-0000-0000-6a05-9fa42e92d56b&cardType=valid&continuousAuthority=true&makeCurrent=true
and in a list for ease of reading (without &'s)
paymentType=A
amount=147.06
policyUid=07ef493b-0000-0000-6a05-9fa4d6a5b5ad
paymentMethod=A
script=Retail/accept.py
scriptParams=uid=07ef461a-0000-0000-6a059fa44a8870bf&invokePCL=true&paymentType=A
description=New Business Payment
firstName=Adam
surname=Har
addressLine1=20 Chase road
city=London
postalCode=EC1 3PF
payerUid=07ef3ff7-0000-0000-6a05-9fa42e92d56b
cardType=valid
continuousAuthority=true
makeCurrent=true
And here is my accepted URL parameters with the regex logic:
paymentType=A&amount=([0-9]+.[0-9]{2})&policyUid=([A-Za-z0-9]+(?:[\s-][A-Za-z0-9]+)$)&paymentMethod=([a-zA-Z]+)&script=([a-zA-Z]+/[a-zA-Z]+.py)&scriptParams=[a-zA-Z]{3}=(([A-Za-z0-9]+(?:[\s-][A-Za-z0-9]+)))&description=([a-zA-Z0-9 ]+s)&firstName=[A-Za-z]&surname=[A-Za-z]&addressLine1=[a-zA-Z0-9 ]+&city=([a-zA-Z ]+)&postalCode=[a-zA-Z0-9 ]+&payerUid=([A-Za-z0-9]+(?:[\s-][A-Za-z0-9]+)$)&cardType=[a-zA-Z]+&continuousAuthority=[a-zA-Z]+&makeCurrent=[a-zA-Z]+
again in a list:
registerPayment?outputType=xml
country=GB
paymentType=A
amount=([0-9]+.[0-9]{2})
policyUid=([A-Za-z0-9]+(?:[\s-][A-Za-z0-9]+)*$)
paymentMethod=([a-zA-Z]+)
script=([a-zA-Z]+/[a-zA-Z]+.py)
scriptParams=[a-zA-Z]{3}=(([A-Za-z0-9]+(?:[\s-][A-Za-z0-9]+)))
description=([a-zA-Z0-9 ]+s)
firstName=[A-Za-z]
surname=[A-Za-z]
addressLine1=[a-zA-Z0-9 ]+
city=([a-zA-Z ]+)
postalCode=[a-zA-Z0-9 ]+
payerUid=([A-Za-z0-9]+(?:[\s-][A-Za-z0-9]+)*$)
cardType=[a-zA-Z]+
continuousAuthority=[a-zA-Z]+
makeCurrent=[a-zA-Z]+
My question is; why does my regex and sample match ok seperately, but dont when I put them all together ?
Additional question:
I am using the logic (([A-Za-z0-9]+(?:[\s-][A-Za-z0-9]+))) for the whole ScriptParams parameter (the &'s here are part of the parameter.) If I just want to get the 'uid' part and leave the rest, what expression would I need to target this (it is made up of A-z a-z 0-9 and dashes)?
thank you
UPDATE
I have tweaked your answer slightly, because the stub server I am using will not accept the (?:[\s-]) when it loads the file containing the URL templates. I have also incorporated a lot of % and 0-9 because the request is UTF encoded before it is matched (which I had not anticipated), and a few of the params have rogue spaces beyond my control. Other than that, your solution worked great :)
Here is my new version of the scriptParams regex:
&scriptParams=[a-zA-Z]{3}%3d[-A-Za-z0-9]+
This accepts the whole parameter, and works fine in the regex tester. Now when I link anything after this part, there is an unsuccessful match.
I do not understand why this is a problem as the regex seem to string together nicely otherwise. Any ideas are appreciated.
Here is the full regex:
paymentType=[-%a-zA-Z0-9 ]+&amount=[0-9]+.[0-9]{2}&policyUid=([-A-Za-z0-9]+)&paymentMethod=([%a-zA-Z0-9]+)&script=[%/.a-zA-Z0-9]+&scriptParams=[a-zA-Z]{3}%3d[-A-Za-z0-9]+&description=[%a-zA-Z0-9 ]+&firstName=[-%A-Za-z0-9]+&surname=[-%A-Za-z0-9]+&addressLine1=[-%a-zA-Z0-9 ]+&city=[-%a-zA-Z 0-9]+&postalCode=[-%a-zA-Z 0-9]+&payerUid=([-A-Za-z0-9]+)&cardType=[%A-Za-z0-9]+&continuousAuthority=[A-Za-z]+&makeCurrent=[A-Za-z]+
And here is the full set of URL params (with UTF encoding present):
paymentType=A&amount=104.85&policyUid=16a9cc22-0000-0000-5a96-5654d9a31f92&paymentMethod=A%20&script=RetailQuotes%2FacceptQuote.py%20&scriptParams=uid%3d16a9c958-0000-0000-5a96-565435311d07%26invokePCL%3dtrue%26paymentType%3dA%20&description=New%2520Business%2520Payment&firstName=Adam&surname=Har%20&addressLine1=26%2520Close&city=Potters%2520Town&postalCode=EC1%25206LR%20&payerUid=16a9c24e-0000-0000-5a96-5654b3f956e0&cardType=valid%20&continuousAuthority=true&makeCurrent=true
Thank you
PS
(Solved the server problem. Was a slight mistake I was making in the usage of URL params.)
First, your regex not all work, some are missing quantifiers, others have a $ for some reason and some parameters are even missing! Here's what they should have been:
paymentType=A
amount=([0-9]+.[0-9]{2})
policyUid=([A-Za-z0-9]+(?:[\s-][A-Za-z0-9]+)*)
paymentMethod=([a-zA-Z]+)
script=([a-zA-Z]+/[a-zA-Z]+.py)
scriptParams=[a-zA-Z]{3}=(([A-Za-z0-9]+(?:[\s-][A-Za-z0-9]+)+))
invokePCL=([a-z]+)
paymentType=A
description=([a-zA-Z0-9 ]+)
firstName=[A-Za-z]+
surname=[A-Za-z]+
addressLine1=[a-zA-Z0-9 ]+
city=([a-zA-Z ]+)
postalCode=[a-zA-Z0-9 ]+
payerUid=([A-Za-z0-9]+(?:[\s-][A-Za-z0-9]+)*)
cardType=[a-zA-Z]+
continuousAuthority=[a-zA-Z]+
makeCurrent=[a-zA-Z]+
And combined, you get:
paymentType=A&amount=([0-9]+.[0-9]{2})&policyUid=([A-Za-z0-9]+(?:[\s-][A-Za-z0-9]+)*)&paymentMethod=([a-zA-Z]+)&script=([a-zA-Z]+/[a-zA-Z]+.py)&scriptParams=[a-zA-Z]{3}=(([A-Za-z0-9]+(?:[\s-][A-Za-z0-9]+)+))&invokePCL=([a-z]+)&paymentType=A&description=([a-zA-Z0-9 ]+)&firstName=[A-Za-z]+&surname=[A-Za-z]+&addressLine1=[a-zA-Z0-9 ]+&city=([a-zA-Z ]+)&postalCode=[a-zA-Z0-9 ]+&payerUid=([A-Za-z0-9]+(?:[\s-][A-Za-z0-9]+)*)&cardType=[a-zA-Z]+&continuousAuthority=[a-zA-Z]+&makeCurrent=[a-zA-Z]+
regex101 demo
[Note, I took your regexes where they matched and ran minimal edits to them].
For your second question, I'm not sure what you mean by the Uid part and that & are part of the parameter. Given that there are 3 Uids in the url with similar format (policy, scriptparams, user), you will have to put them in the expression, unless you know a specific pattern to the scriptparams' Uid.
In the expression below, I made use of the fact that only scriptparams' uid was in lowercase:
uid=[0-9a-f]+(?:-[0-9a-f]+)+
regex101 demo

Why is this %2B string being urldecoded?

[This may not be precisely a programming question, but it's a puzzle that may best be answered by programmers. I tried it first on the Pro Webmasters site, to overwhelming silence]
We have an email address verification process on our website. The site first generates an appropriate key as a string
mykey
It then encodes that key as a bunch of bytes
&$dac~ʌ����!
It then base64 encodes that bunch of bytes
JiRkYWN+yoyIhIQ==
Since this key is going to be given as a querystring value of a URL that is to be placed in an HTML email, we need to first URLEncode it then HTMLEncode the result, giving us (there's no effect of HTMLEncoding in the example case, but I can't be bothered to rework the example)
JiRkYWN%2ByoyIhIQ%3D%3D
This is then embedded in HTML that is sent as part of an email, something like:
click here.
Or paste <b>http://myapp/verify?key=JiRkYWN%2ByoyIhIQ%3D%3D</b> into your browser.
When the receiving user clicks on the link, the site receives the request, extracts the value of the querystring 'key' parameter, base64 decodes it, decrypts it, and does the appropriate thing in terms of the site logic.
However on occasion we have users who report that their clicking is ineffective. One such user forwarded us the email he had been sent, and on inspection the HTML had been transformed into (to put it in terms of the example above)
click here
Or paste <b>http://myapp/verify?key=JiRkYWN+yoyIhIQ%3D%3D</b> into your browser.
That is, the %2B string - but none of the other percentage encoded strings - had been converted into a plus. (It's definitely leaving us with the right values - I've looked at the appropriate SMTP logs).
key=JiRkYWN%2ByoyIhIQ%3D%3D
key=JiRkYWN+yoyIhIQ%3D%3D
So I think that there are a couple of possibilities:
There's something I'm doing that's stupid, that I can't see, or
Some mail clients convert %2b strings to plus signs, perhaps to try to cope with the problem of people mistakenly URLEncoding plus signs
In case of 1 - what is it? In case of 2 - is there a standard, known way of dealing with this kind of scenario?
Many thanks for any help
The problem lies at this step
on inspection the HTML had been transformed into (to put it in terms of the example above)
click here
Or paste <b>http://myapp/verify?key=JiRkYWN+yoyIhIQ%3D%3D</b> into
your browser.
That is, the %2B string - but none of the other percentage encoded
strings - had been converted into a plus
Your application at "the other end" must be missing a step of unescaping. Regardless of if there is a %2B or a + a function like perls uri_unescape returns consistent answers
DB<9> use URI::Escape;
DB<10> x uri_unescape("JiRkYWN+yoyIhIQ%3D%3D")
0 'JiRkYWN+yoyIhIQ=='
DB<11> x uri_unescape("JiRkYWN%2ByoyIhIQ%3D%3D")
0 'JiRkYWN+yoyIhIQ=='
Here is what should be happening. All I'm showing are the steps. I'm using perl in a debugger. Step 54 encodes the string to base64. Step 55 shows how the base64 encoded string could be made into a uri escaped parameter. Steps 56 and 57 are what the client end should be doing to decode.
One possible work around is to ensure that your base64 "key" does not contain any plus signs!
DB<53> $key="AB~"
DB<54> x encode_base64($key)
0 'QUJ+
'
DB<55> x uri_escape('QUJ+')
0 'QUJ%2B'
DB<56> x uri_unescape('QUJ%2B')
0 'QUJ+'
DB<57> $result=decode_base64('QUJ+')
DB<58> x $result
0 'AB~'
What may be happening here is that the URLDecode is turning the %2b into a +, which is being interpreted as a space character in the URL. I was able to overcome a similar problem by first urldecoding the string, then using a replace function to replace spaces in the decoded string with + characters, and then decrypting the "fixed" string.