Referencing a URL that includes a hyphen (NOT a dash) - html

I ran into a website that used a hyphen (not a dash) to build it's URL. I want to link to this website from another HTML document using the following code:
The Course
Difficult to see, but the hyphen is the 14th character from the right in the href. (Immediately before the second TSO abbreviation.)
Is there a way to reference this character properly using an ASCII escape code similar to a %20 for a space?
Or am I stuck with sending the other web-dev kindly asking them to fix the problem? :-)

You would use %e2%80%93 to encode that hyphen character:
The Course
I used this C# code to find out the encoding:
Console.WriteLine(HttpUtility.UrlEncode("–"));

Related

Proper Way to Escape the | Character Using HTML Entities

To escape the ampersand character in HTML, I use the & HTML entity, for example:
Link
If I have the following code in my HTML, how would I escape the | character?
Link
HTML Tidy is complaining, claiming an illegal character was found in my HTML.
I tried using ¦ and several other HTML entities, but Tidy says "malformed URI reference."
You wouldn't.
The problem (as the message says) is that the character is illegal in URLs. It is perfectly fine in HTML.
You need to apply encoding for URLs which would be %7C.
I don't know why tidy is complaining about it, but this character is not problematic in HTML nor in URL. | is not a reserved character and can be used in URL as is. You can percent-encode every character, but there is really no need for it.
What I would presume Tidy might be complaining is =. You have got two of them, the second being an invalid one.
There is no need to encode this character in HTML entities. It has no special meaning in HTML.

HTML Hrefs in Tomcat

I'm building a HTML string in tomcat and I notice that in my JSON object, my clickable href link is something like:
http://localhost/%22/https://myLinkHere.com/%22
This is a 2 part question. First, should it contain the http://localhost in front? And secondly, why is the %22 there?
Here is what my JSON href looks like in text:
linkDisplayName
This looks right to me, but I can't tell why the last %22 is there.
I think you won't need the localhost as long as you are supplying the relative path
The ascii code for %22 is " which is correctly referenced in your link.
HTML parsers are very lenient, which often leads to confusing behavior. Without the exact JSON it's hard to say for sure but there are a couple of obvious issues. Ultimately the issue is your HTML is malformed and/or mis-escaped.
%22 is " URL-encoded, which means that the quotes you've \-escaped are being included in the URL rather than surrounding them. That likely means that in the JSON they're double-escaped. That might mean it's \\" or something similar; try just a single backslash (\") or no backslash at all (").
Notice that the protocol (https:/) in your URL is also wrong; a URL starts with a protocol (like https) followed by a :, and generally followed by two slashes (//). Your URL follows the protocol with just a single slash, which makes it look like a relative URL rather than an absolute one. Browsers will prefix relative URLs with whatever they infer the current host to be, which in your context appears to be localhost.
The HTML should look like this:
linkDisplayName
So in summary no, the URL should probably not contain http://localhost, and it should not contain those %22s either. They're showing up because your JSON is malformed.

Unwanted characters being added to url in HTML

I'm trying to include a simple hyperlink in a website:
...Engineers (IEEE) projects:
So that it ends up looking like "...Engineers (IEEE) projects:" with "IEEE" being the hyperlink.
When I click on copy link address and paste the address, instead of getting
http://www.ieee.ucla.edu/
I get
http://www.ieee.ucla.edu/%C3%A2%E2%82%AC%C5%BD
and when I click on the link, it takes me to a 404 page.
Check the link. These special character are added automatically by browser (URL Encoding).
Url Encoding
Use this code and it will work::
IEEE
The proper format to add hyperlink to a html is as follow
(texts to be hyperlink)
and for better understanding go through this link http://www.w3schools.com/html/html_links.asp
%C3%A2%E2%82%AC%C5%BD represents „ which is when you get when a unicode „ is being parsed as Windows-1252 data.
Use straight quotes to delimit attribute values in your real code. You are doing this in the code you have included in the question, but that won't have the effect you are seeing. Presumably your codes are being transformed at some point in your real code.
Add appropriate HTTP headers and <meta> data to tell the browser what encoding your file is really using

A html space is showing as %2520 instead of %20

Passing a filename to the firefox browser causes it to replace spaces with %2520 instead of %20.
I have the following HTML in a file called myhtml.html:
<img src="C:\Documents and Settings\screenshots\Image01.png"/>
When I load myhtml.html into firefox, the image shows up as a broken image. So I right click the link to view the picture and it shows this modified URL:
file:///c:/Documents%2520and%2520Settings/screenshots/Image01.png
^
^-----Firefox changed my space to %2520.
What the heck? It converted my space into a %2520. Shouldn't it be converting it to a %20?
How do I change this HTML file so that the browser can find my image? What's going on here?
A bit of explaining as to what that %2520 is :
The common space character is encoded as %20 as you noted yourself.
The % character is encoded as %25.
The way you get %2520 is when your url already has a %20 in it, and gets urlencoded again, which transforms the %20 to %2520.
Are you (or any framework you might be using) double encoding characters?
Edit:
Expanding a bit on this, especially for LOCAL links. Assuming you want to link to the resource C:\my path\my file.html:
if you provide a local file path only, the browser is expected to encode and protect all characters given (in the above, you should give it with spaces as shown, since % is a valid filename character and as such it will be encoded) when converting to a proper URL (see next point).
if you provide a URL with the file:// protocol, you are basically stating that you have taken all precautions and encoded what needs encoding, the rest should be treated as special characters. In the above example, you should thus provide file:///c:/my%20path/my%20file.html. Aside from fixing slashes, clients should not encode characters here.
NOTES:
Slash direction - forward slashes / are used in URLs, reverse slashes \ in Windows paths, but most clients will work with both by converting them to the proper forward slash.
In addition, there are 3 slashes after the protocol name, since you are silently referring to the current machine instead of a remote host ( the full unabbreviated path would be file://localhost/c:/my%20path/my%file.html ), but again most clients will work without the host part (ie two slashes only) by assuming you mean the local machine and adding the third slash.
For some - possibly valid - reason the url was encoded twice. %25 is the urlencoded % sign. So the original url looked like:
http://server.com/my path/
Then it got urlencoded once:
http://server.com/my%20path/
and twice:
http://server.com/my%2520path/
So you should do no urlencoding - in your case - as other components seems to to that already for you. Use simply a space
When you are trying to visit a local filename through firefox browser, you have to force the file:\\\ protocol (http://en.wikipedia.org/wiki/File_URI_scheme) or else firefox will encode your space TWICE. Change the html snippet from this:
<img src="C:\Documents and Settings\screenshots\Image01.png"/>
to this:
<img src="file:\\\C:\Documents and Settings\screenshots\Image01.png"/>
or this:
<img src="file://C:\Documents and Settings\screenshots\Image01.png"/>
Then firefox is notified that this is a local filename, and it renders the image correctly in the browser, correctly encoding the string once.
Helpful link: http://support.mozilla.org/en-US/questions/900466
Try using this
file:///c:/Documents%20and%20Settings/screenshots/Image01.png
Whenever you are trying to open a local file in the browser using cmd or any html tag use "file:///" and replace spaces with %20 (url encoding of space)
The following code snippet resolved my issue. Thought this might be useful to others.
var strEnc = this.$.txtSearch.value.replace(/\s/g, "-");
strEnc = strEnc.replace(/-/g, " ");
Rather using default encodeURIComponent my first line of code is converting all spaces into hyphens using regex pattern /\s\g and the following line just does the reverse, i.e. converts all hyphens back to spaces using another regex pattern /-/g. Here /g is actually responsible for finding all matching characters.
When I am sending this value to my Ajax call, it traverses as normal spaces or simply %20 and thus gets rid of double-encoding.
Try this?
encodeURIComponent('space word').replace(/%20/g,'+')

HTML encoding issues - "Â" character showing up instead of " "

I've got a legacy app just starting to misbehave, for whatever reason I'm not sure. It generates a bunch of HTML that gets turned into PDF reports by ActivePDF.
The process works like this:
Pull an HTML template from a DB with tokens in it to be replaced (e.g. "~CompanyName~", "~CustomerName~", etc.)
Replace the tokens with real data
Tidy the HTML with a simple regex function that property formats HTML tag attribute values (ensures quotation marks, etc, since ActivePDF's rendering engine hates anything but single quotes around attribute values)
Send off the HTML to a web service that creates the PDF.
Somewhere in that mess, the non-breaking spaces from the HTML template (the s) are encoding as ISO-8859-1 so that they show up incorrectly as an "Â" character when viewing the document in a browser (FireFox). ActivePDF pukes on these non-UTF8 characters.
My question: since I don't know where the problem stems from and don't have time to investigate it, is there an easy way to re-encode or find-and-replace the bad characters? I've tried sending it through this little function I threw together, but it turns it all into gobbledegook doesn't change anything.
Private Shared Function ConvertToUTF8(ByVal html As String) As String
Dim isoEncoding As Encoding = Encoding.GetEncoding("iso-8859-1")
Dim source As Byte() = isoEncoding.GetBytes(html)
Return Encoding.UTF8.GetString(Encoding.Convert(isoEncoding, Encoding.UTF8, source))
End Function
Any ideas?
EDIT:
I'm getting by with this for now, though it hardly seems like a good solution:
Private Shared Function ReplaceNonASCIIChars(ByVal html As String) As String
Return Regex.Replace(html, "[^\u0000-\u007F]", " ")
End Function
Somewhere in that mess, the non-breaking spaces from the HTML template (the s) are encoding as ISO-8859-1 so that they show up incorrectly as an "Â" character
That'd be encoding to UTF-8 then, not ISO-8859-1. The non-breaking space character is byte 0xA0 in ISO-8859-1; when encoded to UTF-8 it'd be 0xC2,0xA0, which, if you (incorrectly) view it as ISO-8859-1 comes out as " ". That includes a trailing nbsp which you might not be noticing; if that byte isn't there, then something else has mauled your document and we need to see further up to find out what.
What's the regexp, how does the templating work? There would seem to be a proper HTML parser involved somewhere if your strings are (correctly) being turned into U+00A0 NON-BREAKING SPACE characters. If so, you could just process your template natively in the DOM, and ask it to serialise using the ASCII encoding to keep non-ASCII characters as character references. That would also stop you having to do regex post-processing on the HTML itself, which is always a highly dodgy business.
Well anyway, for now you can add one of the following to your document's <head> and see if that makes it look right in the browser:
for HTML4: <meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
for HTML5: <meta charset="utf-8">
If you've done that, then any remaining problem is ActivePDF's fault.
If any one had the same problem as me and the charset was already correct, simply do this:
Copy all the code inside the .html file.
Open notepad (or any basic text editor) and paste the code.
Go "File -> Save As"
Enter you file name "example.html" (Select "Save as type: All Files (.)")
Select Encoding as UTF-8
Hit Save and you can now delete your old .html file and the encoding should be fixed
Problem:
Even I was facing the problem where we were sending '£' with some string in POST request to CRM System, but when we were doing the GET call from CRM , it was returning '£' with some string content. So what we have analysed is that '£' was getting converted to '£'.
Analysis:
The glitch which we have found after doing research is that in POST call we have set HttpWebRequest ContentType as "text/xml" while in GET Call it was "text/xml; charset:utf-8".
Solution:
So as the part of solution we have included the charset:utf-8 in POST request and it works.
In my case this (a with caret) occurred in code I generated from visual studio using my own tool for generating code. It was easy to solve:
Select single spaces ( ) in the document. You should be able to see lots of single spaces that are looking different from the other single spaces, they are not selected. Select these other single spaces - they are the ones responsible for the unwanted characters in the browser. Go to Find and Replace with single space ( ). Done.
PS: It's easier to see all similar characters when you place the cursor on one or if you select it in VS2017+; I hope other IDEs may have similar features
In my case I was getting latin cross sign instead of nbsp, even that a page was correctly encoded into the UTF-8. Nothing of above helped in resolving the issue and I tried all.
In the end changing font for IE (with browser specific css) helped, I was using Helvetica-Nue as a body font changing to the Arial resolved the issue .
I was having the same sort of problem. Apparently it's simply because PHP doesn't recognise utf-8.
I was tearing my hair out at first when a '£' sign kept showing up as '£', despite it appearing ok in DreamWeaver. Eventually I remembered I had been having problems with links relative to the index file, when the pages, if viewed directly would work with slideshows, but not when used with an include (but that's beside the point. Anyway I wondered if this might be a similar problem, so instead of putting into the page that I was having problems with, I simply put it into the index.php file - problem fixed throughout.
The reason for this is PHP doesn't recognise utf-8.
Here you can check it for all Special Characters in HTML
http://www.degraeve.com/reference/specialcharacters.php
Well I got this Issue too in my few websites and all i need to do is customize the content fetler for HTML entites. before that more i delete them more i got, so just change you html fiter or parsing function for the page and it worked. Its mainly due to HTML editors in most of CMSs. the way they store parse the data caused this issue (In My case). May this would Help in your case too