Multilingual URLs showing as unicode in breadcrumb menu - html

I have a Norwegian URL path which looks like this /om-os/bæredygtighed/socialt-ansvar
In my breadcrumb menu, I expect to see something like this:
Om os > Bæredygtighed > Socialt-ansvar
However, the æ is appearing as %c3%a6. So my breadcrumb looks like this:
Om os > B%c3%a6redygtighed > Socialt-ansvar
I have <meta charset="utf-8"> in the head, so I'm unsure why these characters are still appearing?

I don't know how you are building the URLs, but all non-ASCII parts of a URL must be URL-encoded, also known as percent-encoded. (The domain name is the exception: it uses a different encoding, Punycode.) The browser does the encoding for you if you don't do it yourself. On the other hand, the browser will in most cases display the unencoded version of your characters, so you might not be aware that what is sent over the wire is URL-encoded.
E.g., your path is sent over the wire as /om-os/b%c3%a6redygtighed/socialt-ansvar, even if you see /om-os/bæredygtighed/socialt-ansvar in the address bar. Check it with the developer tools. If you use Firefox, you will have to look at the Headers tab of the HTTP call's details in the Network tab. Chrome, instead, will also show you the HTTP call's summary row URL-encoded. That %c3%a6 in the path is the hex value of the two bytes, C3 and A6, that make up the UTF-8 encoding of the character æ.
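As a rough illustration of where %c3%a6 comes from, the sketch below encodes æ to its UTF-8 bytes and formats each byte as a percent-escape. The helper name percentEncodeChar is made up for this example; TextEncoder and encodeURIComponent are standard JavaScript.

```javascript
// Hypothetical helper (not part of any library): percent-encode one
// character by hand, so the UTF-8 bytes become visible.
function percentEncodeChar(ch) {
  const utf8 = new TextEncoder().encode(ch); // UTF-8 bytes of the character
  return Array.from(utf8, (b) => '%' + b.toString(16).toUpperCase().padStart(2, '0')).join('');
}

const utf8Bytes = Array.from(new TextEncoder().encode('\u00E6')); // \u00E6 is æ
console.log(utf8Bytes.map((b) => b.toString(16))); // [ 'c3', 'a6' ]
console.log(percentEncodeChar('\u00E6'));          // %C3%A6
console.log(encodeURIComponent('\u00E6'));         // %C3%A6 (the built-in does the same)
```

The hex digits are case-insensitive, so %c3%a6 and %C3%A6 denote the same two bytes.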
You can even set your window.location.pathname programmatically to /om-os/bæredygtighed/socialt-ansvar, but when you read window.location.pathname afterwards, you will get it URL-encoded:
window.location.pathname = '/om-os/bæredygtighed/socialt-ansvar'
[...]
console.log(window.location.pathname)
/om-os/b%C3%A6redygtighed/socialt-ansvar
I don't know how your path flows into your breadcrumbs, but you clearly can reverse the URL-encoding before using your strings.
In JavaScript you normally do that with decodeURIComponent():
console.log(decodeURIComponent('b%c3%a6redygtighed'))
bæredygtighed
console.log(decodeURIComponent('/om-os/b%c3%a6redygtighed/socialt-ansvar'))
/om-os/bæredygtighed/socialt-ansvar
In PHP you normally do that with urldecode:
$decoded = urldecode('b%c3%a6redygtighed'); // will contain 'bæredygtighed'
But it would be better if you could make your data flow in a way that avoids the encoding and decoding steps before reaching your breadcrumbs.

If you have not figured out the fix yet, here is an addition to what walter-tross has already covered in the answer above.
For the given input, /om-os/bæredygtighed/socialt-ansvar, the output of the JavaScript encodeURI function is:
/om-os/b%C3%A6redygtighed/socialt-ansvar
and the output of encodeURIComponent is:
%2Fom-os%2Fb%C3%A6redygtighed%2Fsocialt-ansvar
Given the above, it appears that you are taking the breadcrumb input from the URL, and that what you receive matches the encodeURI output: the / separators are left intact, so you can still split on the '/' character.
The fix, as already noted, is to URL-decode the individual components with decodeURIComponent (or decodeURI) before using them as content.
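A minimal sketch of that split-then-decode approach might look like the following. The function name breadcrumbLabels and the capitalisation rule are assumptions made for this example, since the question does not show how the breadcrumb is actually built.

```javascript
// Hypothetical helper: turn a (possibly percent-encoded) pathname into
// breadcrumb labels. Capitalising the first letter is an assumed rule.
function breadcrumbLabels(pathname) {
  return pathname
    .split('/')                              // split first, so an encoded %2F cannot add segments
    .filter((segment) => segment !== '')     // drop the empty piece before the leading slash
    .map((segment) => {
      let text;
      try {
        text = decodeURIComponent(segment);  // b%c3%a6redygtighed -> bæredygtighed
      } catch (e) {
        text = segment;                      // malformed escape (URIError): fall back to raw text
      }
      return text.charAt(0).toUpperCase() + text.slice(1);
    });
}

console.log(breadcrumbLabels('/om-os/b%c3%a6redygtighed/socialt-ansvar').join(' > '));
// Om-os > Bæredygtighed > Socialt-ansvar
```

Decoding each segment separately, rather than the whole path at once, keeps a literal %2F inside a segment from being mistaken for a path separator.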

Related

What is the difference between these URL syntax?

I was sent a hyperlink to a Tableau Public link by a client. When I tried opening it, I got a 404 exception. I wrote back to the client but was told by the same that the link was working fine. I visited his profile page and was able to open the presentation there, but the URL that ended up working was slightly different than the one behind the original, non-functioning link.
Here's the anonymized URL behind the original link
https://public.tableau.com/profile/[client_name]%23!/vizhome/Project-AirportDelay/FlightPerformancesinUSA?publish=yes
And here's the URL via the profile page:
https://public.tableau.com/profile/[client_name]#!/vizhome/Project-AirportDelay/FlightPerformancesinUSA
The only differences I see are ?publish=yes and %23!. I tried appending the former, ?publish=yes, to the working URL, and it was still functional. So I suspect that it has to do with the other difference %23! vs. #!. Could the first work because he is opening it from his computer where he is likely logged onto Tableau Public? What's the difference between these syntax? Any ideas about why the original hyperlink might not be functional?
For obvious privacy reasons, I can't provide the whole URL.
?publish=yes looks like the basic URL pattern for passing parameters such as filters, and %23 is the URL-encoded representation of #.
The first # after the authority component starts the fragment component. If the # should be part of the path component or the query component, it has to be percent-encoded as %23.
As # is a reserved character, these URIs aren’t equivalent:
http://example.com/foo#bar
http://example.com/foo%23bar
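One way to see the difference is to parse both URIs with the standard URL class, as in this small sketch (the variable names are arbitrary):

```javascript
const withFragment = new URL('http://example.com/foo#bar');
const withEncodedHash = new URL('http://example.com/foo%23bar');

// '#' starts the fragment, so the path stops at /foo
console.log(withFragment.pathname, withFragment.hash);       // /foo #bar

// '%23' is just data inside the path; there is no fragment at all
console.log(withEncodedHash.pathname, withEncodedHash.hash); // /foo%23bar (hash is empty)
```

A server receiving the second form is asked for a resource whose path literally contains #, which would explain a 404.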
There are countless ways in which a URI reference could become erroneous. The culprit is often a piece of software, like a word processor, into which someone pastes the correct URI and which then incorrectly percent-encodes it (maybe assuming that the user didn’t paste the real/correct URI).
Copy-pasting the URI from the browser address bar into a plain text document should always work correctly.

How can I display ë character in the url?

I have a URL that contains the character ë. Is there any way to display it as ë in the front end while the back end receives it in its percent-encoded form, %C3%AB? When you view this question, the page URL displays the ë character, and I want the same behaviour. Thanks in advance for any suggestions.
Well, you'd do well to look at the HTML for this page then:
<a href="/questions/38720183/how-can-i-display-%c3%ab-character-in-the-url">
You must use the correctly URL-encoded version, %c3%ab. The browser may then decide to render it as "ë". That's entirely up to the browser, and it won't do it for all characters, specifically it won't decode particular lookalike characters which can be used to spoof a URL to look identical to another URL but actually be different.
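As a small sketch of producing the encoded form for an href, the built-in encodeURI can be applied to the readable path. The variable names are illustrative, and ë is written as the escape \u00EB to make the exact code point explicit.

```javascript
// Readable form of the path, as a person would type it
const readable = '/questions/38720183/how-can-i-display-\u00EB-character-in-the-url';

// Form to put into the href attribute
const forHref = encodeURI(readable);
console.log(forHref);
// /questions/38720183/how-can-i-display-%C3%AB-character-in-the-url

// Decoding gives the readable form back
console.log(decodeURI(forHref) === readable); // true
```

encodeURI emits upper-case hex digits, while the page source above uses lower case; both spellings are equivalent.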
You should use percent-encoding, which is
a mechanism for encoding information in a Uniform Resource Identifier (URI) under certain circumstances. Although it is known as URL encoding it is, in fact, used more generally within the main Uniform Resource Identifier (URI) set, which includes both Uniform Resource Locator (URL) and Uniform Resource Name (URN). As such, it is also used in the preparation of data of the application/x-www-form-urlencoded media type, as is often used in the submission of HTML form data in HTTP requests.
There is a website http://www.url-encode-decode.com/ that will do it for you

HTML Hrefs in Tomcat

I'm building a HTML string in tomcat and I notice that in my JSON object, my clickable href link is something like:
http://localhost/%22/https://myLinkHere.com/%22
This is a 2 part question. First, should it contain the http://localhost in front? And secondly, why is the %22 there?
Here is what my JSON href looks like in text:
linkDisplayName
This looks right to me, but I can't tell why the last %22 is there.
I think you won't need the localhost as long as you are supplying the relative path
%22 is the percent-encoded form of the double-quote character " (ASCII code 0x22), which does appear in your link.
HTML parsers are very lenient, which often leads to confusing behavior. Without the exact JSON it's hard to say for sure but there are a couple of obvious issues. Ultimately the issue is your HTML is malformed and/or mis-escaped.
%22 is " URL-encoded, which means that the quotes you've \-escaped are being included in the URL rather than surrounding it. That likely means that in the JSON they're double-escaped. That might mean it's \\" or something similar; try just a single backslash (\") or no backslash at all (").
Notice that the protocol (https:/) in your URL is also wrong; a URL starts with a protocol (like https) followed by a :, and generally followed by two slashes (//). Your URL follows the protocol with just a single slash, which makes it look like a relative URL rather than an absolute one. Browsers will prefix relative URLs with whatever they infer the current host to be, which in your context appears to be localhost.
The HTML should look like this:
linkDisplayName
So in summary no, the URL should probably not contain http://localhost, and it should not contain those %22s either. They're showing up because your JSON is malformed.
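The effect can be reproduced with the standard URL class. This is only a sketch under assumptions: the original JSON is not shown in the question, so the JSON string and the href values below are invented for illustration, and the base address http://localhost/ is taken from the question.

```javascript
// Assumed page address, as in the question
const base = 'http://localhost/';

// An href whose value still contains literal quote characters
// (what over-escaping can produce) is treated as a relative URL:
const broken = new URL('"https://myLinkHere.com"', base);
console.log(broken.host);  // localhost
console.log(broken.href);  // the quotes show up percent-encoded as %22

// Without the stray quotes the same text is an absolute URL:
const clean = new URL('https://myLinkHere.com', base);
console.log(clean.host);   // mylinkhere.com

// Hypothetical JSON with a single level of escaping: \" inside the JSON
// text becomes a plain " that delimits the attribute after parsing.
const json = '{"link":"<a href=\\"https://myLinkHere.com\\">linkDisplayName</a>"}';
console.log(JSON.parse(json).link);
// <a href="https://myLinkHere.com">linkDisplayName</a>
```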

A html space is showing as %2520 instead of %20

Passing a filename to the firefox browser causes it to replace spaces with %2520 instead of %20.
I have the following HTML in a file called myhtml.html:
<img src="C:\Documents and Settings\screenshots\Image01.png"/>
When I load myhtml.html into firefox, the image shows up as a broken image. So I right click the link to view the picture and it shows this modified URL:
file:///c:/Documents%2520and%2520Settings/screenshots/Image01.png
^
^-----Firefox changed my space to %2520.
What the heck? It converted my space into a %2520. Shouldn't it be converting it to a %20?
How do I change this HTML file so that the browser can find my image? What's going on here?
A bit of explaining as to what that %2520 is :
The common space character is encoded as %20 as you noted yourself.
The % character is encoded as %25.
The way you get %2520 is when your url already has a %20 in it, and gets urlencoded again, which transforms the %20 to %2520.
Are you (or any framework you might be using) double encoding characters?
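The double-encoding effect is easy to reproduce with encodeURIComponent; the sample string 'my path' is just an example value.

```javascript
const once = encodeURIComponent('my path'); // space becomes %20
const twice = encodeURIComponent(once);     // the % of %20 becomes %25

console.log(once);   // my%20path
console.log(twice);  // my%2520path

// A single decode of the double-encoded value still leaves %20 behind
console.log(decodeURIComponent(twice));                     // my%20path
console.log(decodeURIComponent(decodeURIComponent(twice))); // my path
```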
Edit:
Expanding a bit on this, especially for LOCAL links. Assuming you want to link to the resource C:\my path\my file.html:
if you provide a local file path only, the browser is expected to encode and protect all characters given (in the above, you should give it with spaces as shown, since % is a valid filename character and as such it will be encoded) when converting to a proper URL (see next point).
if you provide a URL with the file:// protocol, you are basically stating that you have taken all precautions and encoded what needs encoding, the rest should be treated as special characters. In the above example, you should thus provide file:///c:/my%20path/my%20file.html. Aside from fixing slashes, clients should not encode characters here.
NOTES:
Slash direction - forward slashes / are used in URLs, reverse slashes \ in Windows paths, but most clients will work with both by converting them to the proper forward slash.
In addition, there are 3 slashes after the protocol name, since you are silently referring to the current machine instead of a remote host (the full unabbreviated path would be file://localhost/c:/my%20path/my%20file.html), but again most clients will work without the host part (i.e. two slashes only) by assuming you mean the local machine and adding the third slash.
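The conversion described above, from a local Windows path to a file URL encoded exactly once, could be sketched as follows. windowsPathToFileUrl is a made-up helper for illustration, not a browser or library API, and it only handles plain drive-letter paths.

```javascript
// Hypothetical helper: convert a Windows path such as C:\my path\my file.html
// into a file:/// URL, percent-encoding each character exactly once.
function windowsPathToFileUrl(winPath) {
  const forward = winPath.replace(/\\/g, '/');        // backslashes -> forward slashes
  const encoded = encodeURI(forward)                  // encodes spaces, % and non-ASCII; keeps / and :
    .replace(/[?#]/g, (c) => encodeURIComponent(c));  // ? and # would otherwise start a query or fragment
  return 'file:///' + encoded;
}

console.log(windowsPathToFileUrl('C:\\my path\\my file.html'));
// file:///C:/my%20path/my%20file.html

// A literal % in a file name is data, so it becomes %25
console.log(windowsPathToFileUrl('C:\\100% done\\a.png'));
// file:///C:/100%25%20done/a.png
```

The input must be the raw path with real spaces. Passing an already encoded path through the helper would produce %2520 again.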
For some - possibly valid - reason the url was encoded twice. %25 is the urlencoded % sign. So the original url looked like:
http://server.com/my path/
Then it got urlencoded once:
http://server.com/my%20path/
and twice:
http://server.com/my%2520path/
So you should do no URL-encoding yourself in your case, as other components seem to do that for you already. Simply use a space.
When you are trying to visit a local filename through firefox browser, you have to force the file:\\\ protocol (http://en.wikipedia.org/wiki/File_URI_scheme) or else firefox will encode your space TWICE. Change the html snippet from this:
<img src="C:\Documents and Settings\screenshots\Image01.png"/>
to this:
<img src="file:\\\C:\Documents and Settings\screenshots\Image01.png"/>
or this:
<img src="file://C:\Documents and Settings\screenshots\Image01.png"/>
Then firefox is notified that this is a local filename, and it renders the image correctly in the browser, correctly encoding the string once.
Helpful link: http://support.mozilla.org/en-US/questions/900466
Try using this
file:///c:/Documents%20and%20Settings/screenshots/Image01.png
Whenever you are trying to open a local file in the browser using cmd or any html tag use "file:///" and replace spaces with %20 (url encoding of space)
The following code snippet resolved my issue. Thought this might be useful to others.
var strEnc = this.$.txtSearch.value.replace(/\s/g, "-");
strEnc = strEnc.replace(/-/g, " ");
Rather than using the default encodeURIComponent, my first line of code converts all whitespace into hyphens using the regex pattern /\s/g, and the following line does the reverse, i.e. converts all hyphens back to spaces using another regex pattern, /-/g. The g flag is what makes the replacement apply to every match rather than only the first. Note that any hyphens originally present in the value are converted to spaces as well.
When I am sending this value to my Ajax call, it traverses as normal spaces or simply %20 and thus gets rid of double-encoding.
Try this?
encodeURIComponent('space word').replace(/%20/g,'+')
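For query strings specifically, the standard URLSearchParams class already produces the + form, so the manual replace may not be needed. A short sketch, where the parameter name q is an arbitrary choice:

```javascript
// Form-style encoding: spaces become +
const params = new URLSearchParams({ q: 'space word' });
console.log(params.toString());                 // q=space+word

// encodeURIComponent uses %20 for the same space
console.log(encodeURIComponent('space word'));  // space%20word

// When parsing a query string, both spellings decode to a space
console.log(new URLSearchParams('q=space+word').get('q'));   // space word
console.log(new URLSearchParams('q=space%20word').get('q')); // space word
```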

prevent browser from evaluating %2F

I have a php script which generates a bunch of links like so
link
but when I hover over this link or click on it, it really goes to
http://localhost/explorer/index.php?repository_id=default&folder=/mypath/inner/inner2
How do I prevent this behavior and force it to go to http://localhost/explorer/index.php?repository_id=default&folder=%2Fmypath%2Finner%2Finner2
The tool which receives this input needs to have %2F instead of the /.
The hover display is often unescaped for ease of use. If you inspect the page source it should still be uri escaped.
When you use the link the GET param will still be uri escaped and get to your php script intact.
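This can be checked outside the browser's hover display. The sketch below builds the link from the question's example values with encodeURIComponent and parses it back with the standard URL class; the variable names are illustrative.

```javascript
const folder = '/mypath/inner/inner2';
const href = 'http://localhost/explorer/index.php?repository_id=default&folder='
  + encodeURIComponent(folder);   // each / in the value becomes %2F

const url = new URL(href);

// The raw query string still carries the escapes
console.log(url.search);
// ?repository_id=default&folder=%2Fmypath%2Finner%2Finner2

// Decoding happens only when the parameter value is read
console.log(url.searchParams.get('folder')); // /mypath/inner/inner2
```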
You need to encode the URL string you are using. http://php.net/manual/en/function.urlencode.php
Or manually replace %2F with %252F (the % itself encoded as %25, followed by 2F).