I am confused about encoded URLs.
For example, when I type this into my browser:
stackoverflow.com/questions
I can successfully view the page.
However, when I write:
stackoverflow.com%2Fquestions
I am unable to view it.
Since %2F means "/", I want to understand why this does not work properly.
I am asking because I receive an encoded URL and don't know how to decode it right after receiving it so that I don't end up on an error page.
The / is one of the percent-encoding reserved characters. URLs use these reserved characters to define their syntax. They need to be encoded only when they are not used in their special role inside a URL.
Percent-encoding reserved characters:
! * ' ( ) ; : @ & = + $ , / ? # [ ]
%21 %2A %27 %28 %29 %3B %3A %40 %26 %3D %2B %24 %2C %2F %3F %23 %5B %5D
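As a quick check of the table above, here is a minimal Python sketch that percent-encodes each RFC 3986 reserved character with the standard library's urllib.parse.quote. Passing safe="" is needed because quote leaves "/" unencoded by default; the variable names are only illustrative.

```python
from urllib.parse import quote

# The 18 reserved characters defined by RFC 3986.
reserved = "!*'();:@&=+$,/?#[]"

# safe="" makes quote() encode every reserved character, including "/".
encoded = [quote(ch, safe="") for ch in reserved]

print(" ".join(reserved))
print(" ".join(encoded))
```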
%2F is a URL escaped /. It means, treat / as a character, not a directory separator.
In essence, it is looking for a domain stackoverflow.com/questions, not the domain stackoverflow.com with the path questions.
%2F is what you write when you want to include a / in a parameter but don't want the browser to navigate to a different directory/route.
So if you had the file path 'root/subdirectory' passed as a querystring parameter, you would want to encode that like:
http://www.testurl.com/page.php?path=root%2Fsubdirectory
rather than
http://www.testurl.com/page.php?path=root/subdirectory
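A minimal Python sketch of that encoding step, using urllib.parse.urlencode from the standard library. The host and parameter name come from the example above; the variable names are illustrative.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# urlencode() percent-encodes "/" inside parameter values.
query = urlencode({"path": "root/subdirectory"})
url = "http://www.testurl.com/page.php?" + query
print(url)

# Parsing the query string on the receiving side restores the original value.
params = parse_qs(urlsplit(url).query)
print(params["path"][0])
```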
URL encoding is used, for example, to encode a string URL parameter coming from an HTML form that contains special characters such as '/'. Writing "stackoverflow.com%2Fquestions" is wrong: in this case the '/' is part of the URL itself and must not be encoded.
%2F is an escaped character entity - meaning it would be included in a name, rather than the character /, which denotes directory hierarchy, as specified in RFC 1630, page 8.
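To address the original question of decoding a received URL, here is a minimal Python sketch. It assumes the entire URL arrived percent-encoded as a single string, as in the question, and that it uses https; both are assumptions, and the variable names are illustrative. urllib.parse.unquote is the standard-library decoder.

```python
from urllib.parse import unquote, urlsplit

received = "stackoverflow.com%2Fquestions"

# Decode once to turn %2F back into the "/" path separator.
decoded = unquote(received)
print(decoded)

# Prefix a scheme so the standard parser can tell the host from the path.
parts = urlsplit("https://" + decoded)
print(parts.netloc, parts.path)
```

Decode only once: a second pass would also turn any intentionally escaped characters (such as a %2F inside a parameter value) into URL syntax.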
I was just wondering why my WCF REST service returns JSON that contains backslashes in the URL. It looks like this:
https:\/\/s3.amazonaws.com\/reiaustralia\/1fc00dfab25044ecb31e4882121b535e\/jpg\/download.jpg?AWSAccessKeyId=AKIAISTDESL6TBRAVM4Q&Expires=1380692091&Signature=MduuaUAjQRisadtM%2FDuVDemexLY%3D
Thanks
Forward slashes can be escaped with a backslash in JSON, but they don't have to be. So either one of the following:
{"url":"http://www.example.com/"}
or
{"url":"http:\/\/www.example.com\/"}
Will parse into an object which has a url property whose string value is http://www.example.com/.
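This is easy to confirm with any JSON parser. A minimal sketch using Python's standard json module (the variable names are illustrative):

```python
import json

# Raw strings keep the backslashes exactly as they appear in the JSON text.
plain = json.loads(r'{"url":"http://www.example.com/"}')
escaped = json.loads(r'{"url":"http:\/\/www.example.com\/"}')

print(escaped["url"])
print(plain == escaped)
```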
Some technologies will escape out the slashes when generating JSON, and some won't. PHP, for example, has an option called JSON_UNESCAPED_SLASHES which lets you control whether or not to escape your slashes.
You can see the various escape characters on the json.org home page, in the "string" section.
Because // (a double slash) in JavaScript starts a comment, and /{string}/ (a string between slashes) is a regular expression.
So, to keep the value safe when the JSON is embedded in JavaScript, the serializer puts a \ (backslash) in front of each / (slash).
They are just escape characters, and when you consume the string in your application it will be fine.
I was wondering how you handle filenames with forward slashes in them. Currently (and I have no control over this or any way to change it) our users can save files with slashes in their names. When we test this, the file on your system gets renamed to whatever comes after the last slash.
For example a file named Test10/09/2012.ppt would be renamed 2012.ppt.
What I would like to know is how you handle incoming filename strings, and how we can encode them so that you accept a filename with slashes.
The forced renaming is actually not intended behavior. It's a bug that we're currently working on fixing. Box has a set of characters that are forbidden (\, /, ", :, <, >, |, *, ?, .), and we should instead be returning an error when you send such a character in the name of a file through the API.
We should have this fixed soon.
Ideally I would want something like example.com/resources/äFg4вNгё5: the minimum number of visible characters, never mind that they have to be percent-encoded before being transmitted over HTTP.
Can you suggest a scheme that efficiently encodes 128-bit UUIDs into the fewest visible characters, without the result containing characters that break URLs?
Base-64 is good for this.
{098ef7bc-a96c-43a9-927a-912fc7471ba2}
could be encoded as
vPeOCWypqUOSepEvx0cbog
The usual equal-signs at the end could be dropped, as they always make the string-length a multiple of 4. And instead of + and /, you could use some safe characters. You can pick two from: - . _ ~
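The example above can be reproduced with a short Python sketch. One assumption is worth flagging: the string vPeOCWypqUOSepEvx0cbog corresponds to the byte order produced by .NET's Guid.ToByteArray (first three fields little-endian), which Python exposes as UUID.bytes_le. Using UUID.bytes instead would give a different, equally valid string.

```python
import base64
import uuid

guid = uuid.UUID("098ef7bc-a96c-43a9-927a-912fc7471ba2")

# bytes_le stores the first three fields little-endian.
raw = guid.bytes_le

# URL-safe alphabet ("-" and "_" instead of "+" and "/"), padding stripped.
short = base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")
print(short)
```

Sixteen bytes always encode to 22 characters once the two padding characters are removed.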
More information:
RFC 4648
Storing UUID as base64 String (Java)
guid to base64, for URL (C#)
Short GUID (C#)
I use a url-safe base64 string. The following is some Python code that does this*.
The last line removes the '=' or '==' padding that Base64 encoding puts on the end. The padding makes it harder to put the string into a URL and is only needed for decoding, which does not need to be done here.
import base64
import uuid
# get a UUID - URL safe, Base64
def get_a_Uuid():
    r_uuid = base64.urlsafe_b64encode(uuid.uuid4().bytes)
    return r_uuid.replace('=', '')
The above does not work in Python 3. This is what I'm doing instead:
    r_uuid = base64.urlsafe_b64encode(uuid.uuid4().bytes).decode("utf-8")
    return r_uuid.replace('=', '')
* This does follow the standards: base64.urlsafe_b64encode follows RFC 3548 and RFC 4648 (see https://docs.python.org/2/library/base64.html). Stripping == from Base64-encoded data of known length is allowed (see RFC 4648 §3.2). UUIDs/GUIDs are specified in RFC 4122; §4.1 Format states "The UUID format is 16 octets". The Base64 function encodes these 16 octets.
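If the short form ever needs to be turned back into a UUID, the stripped padding can be restored from the string length. A minimal Python 3 sketch; encode_uuid and decode_uuid are hypothetical helper names, not part of any library.

```python
import base64
import uuid

def encode_uuid(u):
    """Return the 22-character URL-safe Base64 form of a UUID."""
    return base64.urlsafe_b64encode(u.bytes).rstrip(b"=").decode("ascii")

def decode_uuid(s):
    """Restore the padding and rebuild the UUID."""
    padded = s + "=" * (-len(s) % 4)
    return uuid.UUID(bytes=base64.urlsafe_b64decode(padded))

original = uuid.uuid4()
short = encode_uuid(original)
print(short, decode_uuid(short) == original)
```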
I created a file named "%20%20.txt" and uploaded it to my web space.
When I try to access this file by typing the URL "http://mysite/%20%20.txt", I get an error saying that the file is not found. I know that "%20" will be decoded as a blank space.
How is it possible to access the file through URL?
The %20 that you use in the URL will be decoded, so you are looking for the file " .txt", but the %20 that you used to create the file is not decoded, so the actual name of the file is "%20%20.txt".
You need to use the URL http://mysite/%2520%2520.txt to access the file "%20%20.txt". The %25 is the encoded form of %.
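The double encoding can be illustrated with a minimal Python sketch using urllib.parse from the standard library. The host name is the placeholder from the question, and the variable names are illustrative.

```python
from urllib.parse import quote, unquote

filename = "%20%20.txt"  # the literal file name on disk

# Encoding the name turns each "%" into "%25".
path = quote(filename)
print("http://mysite/" + path)

# Decoding the path once arrives back at the literal name.
print(unquote(path) == filename)

# Requesting the name unencoded decodes to two spaces instead.
print(repr(unquote(filename)))
```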
Use %2520%2520.txt; %25 decodes as the percent sign %. You can use the table at http://www.asciitable.com/. The number after the percent sign is the hexadecimal representation of the ASCII value.
If you have a long string, you could also use Javascript's encodeURIComponent function:
prompt("Encoded:", encodeURIComponent("%20%20.txt"))
This can be executed in the JavaScript console (Ctrl + Shift + J in Firefox) and displays a dialog containing the escaped value.
If your file name really is %20%20.txt, try http://yoursite.com/%2520%2520.txt.
%25 is the percent-encoded form of the percent sign.
You need to escape those percent signs:
http://mysite/%2520%2520.txt
Suppose my web application renders the following tag:
<object type="application/x-pdf" data="http://example.com/test%2Ctest.pdf">
<param name="showTableOfContents" value="true" />
<param name="hideThumbnails" value="false" />
</object>
Should the data attribute be escaped (percent-encoded path) or not? In my example it is. I haven't found any specification.
Addendum
Actually, I'm interested in a specification of what browser plugins consuming the data attribute should expect to see there. For example, the Adobe Acrobat plugin accepts both escaped and unescaped URIs. However, QWebPluginFactory treats the data attribute as a human-readable (unescaped) URI, and that leads to double percent-encoding. I'm wondering whether this is a bug in QWebPluginFactory or not.
The data attribute expects the value to be a URI. So you should provide a value that is a syntactically valid URI.
The current specification of URIs is RFC 3986. To see whether the , in the URI’s path needs to be encoded, take a look at how the path production rule is defined:
path = path-abempty ; begins with "/" or is empty
/ path-absolute ; begins with "/" but not "//"
/ path-noscheme ; begins with a non-colon segment
/ path-rootless ; begins with a segment
/ path-empty ; zero characters
Since we have a URI with authority information, we need to take a look at path-abempty (see URI production rule):
path-abempty = *( "/" segment )
segment is zero or more pchar characters that is defined as follows (I’ve already expanded the production rules):
pchar = ALPHA / DIGIT / "-" / "." / "_" / "~" / "%" HEXDIG HEXDIG / "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "=" / ":" / "@"
And as you can see, pchar expands to a literal ,. So you don’t need to encode the , in the path component. But since you are allowed to encode any non-delimiting character using the percent-encoding without changing its meaning, it is fine to use %2C instead of ,.
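The same conclusion can be checked mechanically. The sketch below is a minimal Python illustration, not a full RFC 3986 parser: it spells out the characters that pchar allows literally (the constant names are illustrative) and shows that the encoded and unencoded forms of the path decode to the same string.

```python
import string
from urllib.parse import quote, unquote

# Characters RFC 3986 allows literally in a path segment (pchar),
# apart from percent-encoded triplets.
UNRESERVED = string.ascii_letters + string.digits + "-._~"
SUB_DELIMS = "!$&'()*+,;="
PCHAR_LITERALS = UNRESERVED + SUB_DELIMS + ":@"

print("," in PCHAR_LITERALS)

# With the pchar literals marked safe, quote() leaves the comma alone.
print(quote("test,test.pdf", safe=PCHAR_LITERALS))

# Both spellings identify the same path after decoding.
print(unquote("/test%2Ctest.pdf") == unquote("/test,test.pdf"))
```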
URLs generally can only contain specific characters. Unfortunately different specifications contain different lists of characters that are considered reserved and thus can't be used.
In your example the encoded character is a comma (,), which is a reserved character in some specifications, so it's not wrong to encode it.
Most web servers should handle unencoded and encoded commas equally; however, some may not, depending on their configuration. Because of that, it is generally a good idea to avoid special characters in filenames (as you have in your example) in the first place.
URL encoding is always needed when you have special characters in GET parameters. For example, a GET parameter that is supposed to take C&A as a value has to be written as:
http://example.com/somescript.php?value=C%26A
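To see why the encoding matters here, the following minimal Python sketch compares how the standard library's urllib.parse.parse_qs reads the encoded and the unencoded query string. Other server-side parsers may differ in detail; this only illustrates the general effect.

```python
from urllib.parse import urlencode, parse_qs

# Encoded: "&" inside the value becomes %26.
query = urlencode({"value": "C&A"})
print("http://example.com/somescript.php?" + query)
print(parse_qs(query))

# Unencoded: "&" is read as a parameter separator and the value is cut short.
print(parse_qs("value=C&A"))
```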
EDIT:
Plugins (or even the browser) don't care either way. They don't try to (or need to) decode it or anything like that. They just request the URL as entered from the server.