Spaces in links using perl and html

I have a problem with spaces using Perl. I'm taking my information from a DB and using it in a query string, but if there is a space, my page doesn't validate.
<a href="fish.cgi?fish=White Shark">White Shark</a>
Is there something I can do?

Turn your URLs into URI objects which will take care of all escaping and normalization for you. They can also be used as strings.
use URI;
# Prints fish.cgi?fish=White%20Shark
print URI->new("fish.cgi?fish=White Shark");
I'd recommend this over individually escaping URIs. Once you have a URI object, you know it's safe to use.
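For readers working outside Perl, here is a rough Python analogue of the same advice: let a library do the escaping instead of patching strings by hand. This is a sketch using the standard urllib.parse and html modules, not Perl's URI module. fish.cgi and the fish parameter come from the question; the page parameter and the helper names build_link and anchor are hypothetical, added only to show what happens with more than one parameter.

```python
import html
from urllib.parse import quote, urlencode

def build_link(script: str, params: dict) -> str:
    # quote_via=quote encodes a space as %20 (the default, quote_plus, would use "+")
    return f"{script}?{urlencode(params, quote_via=quote)}"

def anchor(script: str, params: dict, text: str) -> str:
    # Inside an HTML attribute, the "&" between parameters is written as &amp;
    href = html.escape(build_link(script, params), quote=True)
    return f'<a href="{href}">{html.escape(text)}</a>'

print(build_link("fish.cgi", {"fish": "White Shark"}))
print(anchor("fish.cgi", {"fish": "White Shark", "page": "2"}, "White Shark"))
```

The first call prints fish.cgi?fish=White%20Shark. Escaping the whole href a second time for HTML keeps the attribute well-formed once several parameters are joined with &.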

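You can replace the spaces in links with %20, which is the URL-encoded form of a space. You normally don't have to strip this out again after the request is posted: the code that reads the query string (for example, the param() method of Perl's CGI module) decodes %20 back to a space for you.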
Example: www.google.ca/this site would be encoded as www.google.ca/this%20site
So your link would be fish.cgi?fish=White%20Shark, which should work properly.
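To illustrate the decoding side, here is a small Python sketch (a stand-in for what a Perl CGI script does, not Perl code): urllib.parse.parse_qs turns the raw query string back into plain values, so %20 arrives in your handler as an ordinary space.

```python
from urllib.parse import parse_qs

# The raw query string, as a CGI script would receive it
decoded = parse_qs("fish=White%20Shark")
print(decoded)  # {'fish': ['White Shark']}

# A "+" in a query string is also read as a space
print(parse_qs("fish=White+Shark"))  # {'fish': ['White Shark']}
```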

Related

HTML Hrefs in Tomcat

I'm building an HTML string in Tomcat and I notice that in my JSON object, my clickable href link is something like:
http://localhost/%22/https://myLinkHere.com/%22
This is a two-part question. First, should it contain http://localhost at the front? And second, why is the %22 there?
Here is what my JSON href looks like in text:
linkDisplayName
This looks right to me, but I can't tell why the last %22 is there.
I don't think you need the localhost part, as long as you are supplying a relative path.
%22 is the URL encoding of the double quote character ", whose ASCII code is 0x22 (decimal 34), so the quotes in your link are being encoded as part of the URL.
HTML parsers are very lenient, which often leads to confusing behavior. Without the exact JSON it's hard to say for sure, but there are a couple of obvious issues. Ultimately, your HTML is malformed and/or mis-escaped.
%22 is a URL-encoded ", which means that the quotes you've \-escaped are being included in the URL instead of delimiting it. That likely means they're double-escaped in the JSON, perhaps as \\" or something similar; try a single backslash (\") or no backslash at all (").
Notice that the protocol (https:/) in your URL is also wrong; a URL starts with a protocol (like https) followed by a :, and generally followed by two slashes (//). Your URL follows the protocol with just a single slash, which makes it look like a relative URL rather than an absolute one. Browsers will prefix relative URLs with whatever they infer the current host to be, which in your context appears to be localhost.
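To see the difference between a relative and an absolute href, here is a Python sketch using urllib.parse.urljoin, which resolves a reference against a base URL per RFC 3986. Browsers follow the WHATWG URL Standard, which differs from urljoin in some edge cases, but these two cases behave the same. The page URL is a hypothetical stand-in for wherever the HTML is served from.

```python
from urllib.parse import urljoin

page = "http://localhost/app/page.html"  # hypothetical page the link appears on

# No scheme: treated as a path relative to the current page and host
relative = urljoin(page, "myLinkHere.com/")
# Scheme followed by "//": an absolute URL, so the page's host is ignored
absolute = urljoin(page, "https://myLinkHere.com/")

print(relative)  # http://localhost/app/myLinkHere.com/
print(absolute)  # https://myLinkHere.com/
```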
The HTML should look like this:
<a href="https://myLinkHere.com/">linkDisplayName</a>
So in summary no, the URL should probably not contain http://localhost, and it should not contain those %22s either. They're showing up because your JSON is malformed.
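The escaping problem described above can be reproduced in a few lines. This Python sketch uses json.dumps in place of whatever JSON library the Tomcat code uses (an assumption; the question doesn't say), with the link text from the question. Escaping once round-trips cleanly; escaping twice leaves \" sequences in the decoded markup, and a quote that lands inside a URL is percent-encoded as %22.

```python
import json
from urllib.parse import quote

# Markup as it should appear in the page (URL and text from the question)
markup = '<a href="https://myLinkHere.com/">linkDisplayName</a>'

# Escaping once: json.dumps puts one backslash before each quote,
# and decoding gives back exactly the original markup
once = json.dumps({"html": markup})
assert json.loads(once)["html"] == markup

# Escaping twice: the decoded value still contains \" sequences,
# so backslashes and quotes end up inside the href value
twice = json.dumps({"html": json.dumps(markup)})
print(json.loads(twice)["html"])

# A literal quote inside a URL is percent-encoded as %22
print(quote('"'))  # %22
```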

An HTML space is showing as %2520 instead of %20

Passing a filename to the Firefox browser causes it to replace spaces with %2520 instead of %20.
I have the following HTML in a file called myhtml.html:
<img src="C:\Documents and Settings\screenshots\Image01.png"/>
When I load myhtml.html into Firefox, the image shows up as broken. So I right-click it to view the picture, and it shows this modified URL:
file:///c:/Documents%2520and%2520Settings/screenshots/Image01.png
(Firefox changed my space to %2520.)
What the heck? It converted my space into a %2520. Shouldn't it be converting it to a %20?
How do I change this HTML file so that the browser can find my image? What's going on here?
A bit of explanation of what %2520 is:
The common space character is encoded as %20 as you noted yourself.
The % character is encoded as %25.
The way you get %2520 is when your url already has a %20 in it, and gets urlencoded again, which transforms the %20 to %2520.
Are you (or any framework you might be using) double encoding characters?
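Double encoding is easy to reproduce. In this Python sketch, urllib.parse.quote stands in for whatever encoder your framework applies; running it twice turns the % of %20 into %25, which is exactly the %2520 pattern.

```python
from urllib.parse import quote

once = quote("my path")  # the space becomes %20
twice = quote(once)      # the "%" of %20 becomes %25, giving %2520

print(once)   # my%20path
print(twice)  # my%2520path
```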
Edit:
Expanding a bit on this, especially for LOCAL links. Assuming you want to link to the resource C:\my path\my file.html:
If you provide only a local file path, the browser is expected to encode and protect every character it is given when converting the path to a proper URL (see the next point). In the example above, you should therefore give the path with plain spaces, as shown; % is a valid filename character, so a % in the path would itself be encoded.
If you provide a URL with the file:// protocol, you are stating that you have already encoded whatever needs encoding, so the remaining characters, in particular %, keep their special URL meaning. In the example above, you should thus provide file:///c:/my%20path/my%20file.html. Apart from fixing slashes, clients should not encode characters here.
NOTES:
Slash direction - forward slashes / are used in URLs, reverse slashes \ in Windows paths, but most clients will work with both by converting them to the proper forward slash.
In addition, there are three slashes after the protocol name because you are implicitly referring to the current machine instead of a remote host (the full, unabbreviated URL would be file://localhost/c:/my%20path/my%20file.html). Again, most clients also work without the host part (i.e. with only two slashes) by assuming you mean the local machine and adding the third slash.
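Following the rules above, a Windows path can be turned into a file:/// URL by fixing the slashes and percent-encoding exactly once. This is a minimal Python sketch; windows_path_to_file_url is a hypothetical helper name, and it assumes an absolute drive-letter path (UNC paths such as \\server\share are not handled).

```python
from urllib.parse import quote

def windows_path_to_file_url(path: str) -> str:
    # URLs use forward slashes, so convert the Windows backslashes first
    posix_style = path.replace("\\", "/")
    # Percent-encode once; keep "/" and the drive letter's ":" unescaped
    return "file:///" + quote(posix_style, safe="/:")

print(windows_path_to_file_url(r"C:\my path\my file.html"))
# file:///C:/my%20path/my%20file.html
print(windows_path_to_file_url(r"C:\Documents and Settings\screenshots\Image01.png"))
# file:///C:/Documents%20and%20Settings/screenshots/Image01.png
```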
For some - possibly valid - reason the url was encoded twice. %25 is the urlencoded % sign. So the original url looked like:
http://server.com/my path/
Then it got urlencoded once:
http://server.com/my%20path/
and twice:
http://server.com/my%2520path/
So, in your case, you should do no URL encoding yourself, since other components seem to do that for you already. Simply use a space.
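To check how many times a URL has been encoded, you can peel off one layer at a time. This Python sketch uses urllib.parse.unquote; decode_layers is a hypothetical diagnostic helper, and the limit guards against runaway loops. It is only a diagnostic: if the original text legitimately contained a %, that would be decoded as well.

```python
from urllib.parse import unquote

def decode_layers(s: str, limit: int = 5) -> list:
    # Repeatedly percent-decode until the string stops changing
    layers = [s]
    while len(layers) <= limit:
        nxt = unquote(layers[-1])
        if nxt == layers[-1]:
            break
        layers.append(nxt)
    return layers

for layer in decode_layers("http://server.com/my%2520path/"):
    print(layer)
```

For the URL above this prints three lines, ending in http://server.com/my path/, which shows the value had been encoded twice.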
When you are trying to open a local file through Firefox, you have to force the file: protocol (http://en.wikipedia.org/wiki/File_URI_scheme), or else Firefox will encode your spaces twice. Change the HTML snippet from this:
<img src="C:\Documents and Settings\screenshots\Image01.png"/>
to this:
<img src="file:\\\C:\Documents and Settings\screenshots\Image01.png"/>
or this:
<img src="file://C:\Documents and Settings\screenshots\Image01.png"/>
Then Firefox knows that this is a local file, renders the image correctly, and encodes the string only once.
Helpful link: http://support.mozilla.org/en-US/questions/900466
Try using this
file:///c:/Documents%20and%20Settings/screenshots/Image01.png
Whenever you open a local file in the browser, whether from cmd or from an HTML tag, use "file:///" and replace spaces with %20 (the URL encoding of a space).
The following code snippet resolved my issue. Thought this might be useful to others.
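var strEnc = this.$.txtSearch.value.replace(/\s/g, "-"); // every whitespace character becomes "-"
strEnc = strEnc.replace(/-/g, " "); // every "-" becomes a plain space, including hyphens the user typed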
Rather than using the default encodeURIComponent, my first line converts every whitespace character into a hyphen using the regex /\s/g, and the second line does the reverse, converting all hyphens back to spaces with /-/g. The g flag makes replace() act on every match rather than only the first. The net effect is that any whitespace character, such as a non-breaking space, becomes an ordinary space; note that hyphens already present in the input become spaces too.
When I send this value in my Ajax call, it goes through as ordinary spaces, encoded once as %20, which gets rid of the double encoding.
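The net effect of the two JavaScript lines can be checked with a direct Python transliteration. This is a sketch, not the answer's code; Python's \s and JavaScript's \s match similar but not identical sets of Unicode whitespace. The one_step function is a hypothetical single-pass equivalent.

```python
import re

def two_step(text: str) -> str:
    step1 = re.sub(r"\s", "-", text)  # every whitespace character becomes "-"
    return step1.replace("-", " ")    # every "-" becomes an ordinary space

def one_step(text: str) -> str:
    # Same net effect in a single pass
    return re.sub(r"[\s-]", " ", text)

print(two_step("white\u00a0shark"))   # white shark
print(two_step("great-white shark"))  # great white shark (the typed hyphen is lost)
```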
Try this?
encodeURIComponent('space word').replace(/%20/g,'+')
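For comparison, Python's urllib.parse offers both spellings of a space directly. quote with safe="" is only a close analogue of encodeURIComponent: Python leaves letters, digits and _.-~ unescaped, while encodeURIComponent additionally leaves !'()* alone.

```python
from urllib.parse import quote, quote_plus

print(quote("space word", safe=""))  # space%20word
print(quote_plus("space word"))      # space+word
print(quote("!'()*", safe=""))       # %21%27%28%29%2A (encodeURIComponent keeps these)
```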

Why is &#8220; not showing up as a quote on my web page?

Other HTML character codes are doing the same thing.
Just to give you some background, these codes are part of the HTML that I'm reading from WordPress blog posts. I'm porting them over to BlogEngine.NET using a little C# WinForm app I wrote. Do I need to do some kind of conversion as I port them over to BlogEngine.NET (as XML files)?
It'd sure be nice if they just displayed properly without any intervention on my part.
Here's a code fragment from one of the WordPress source pages:
<link rel="alternate" type="application/rss+xml" title="INRIX® Traffic » Taking the “E” out of your “ETA” Comments Feed" href="http://www.inrixtraffic.com/blog/2012/taking-the-e-out-of-your-eta/feed/" />
Here's the corresponding chunk of XML that's in the XML file I output during the conversion:
<title>Taking the &#8220;E&#8221; out of your &#8220;ETA&#8221;</title>
UPDATE.
Tried this, but still no dice.
writer.WriteElementString("title", string.Format("<![CDATA[{0}]]>", post.Title));
...outputs this:
<title><![CDATA[Taking the &#8220;E&#8221; out of your &#8220;ETA&#8221;]]></title>
Since the data you are getting from WordPress is already encoded, you can decode it to a regular string and then let the XmlWriter encode it properly for XML.
string input = "Taking the &#8220;E&#8221; out of your &#8220;ETA&#8221;";
string decoded = System.Net.WebUtility.HtmlDecode(input);
//decoded = Taking the “E” out of your “ETA”
This may not be very efficient, but since this sounds like a one-time conversion, I don't think it will be an issue.
A similar question was asked here: How can I decode HTML characters in C#?
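The decode-then-write approach can be sketched in Python, with html.unescape standing in for WebUtility.HtmlDecode and xml.etree.ElementTree standing in for XmlWriter (analogues, not the .NET APIs). Writing the still-encoded text makes the serializer escape the &, which is how &#8220; turns into &amp;#8220; and shows up literally in the browser.

```python
import html
import xml.etree.ElementTree as ET

raw_title = "Taking the &#8220;E&#8221; out of your &#8220;ETA&#8221;"

title = ET.Element("title")

# Wrong: the already-encoded text is escaped again, producing &amp;#8220;
title.text = raw_title
wrong = ET.tostring(title, encoding="unicode")

# Right: decode first, then let the XML writer do its own escaping
title.text = html.unescape(raw_title)
right = ET.tostring(title, encoding="unicode")

print(wrong)
print(right)  # <title>Taking the “E” out of your “ETA”</title>
```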
As I pointed out in my comment above: your problem is that the & in &#8220; gets encoded again, so the text ends up as &amp;#8220;. When you output this in the browser, it displays literally as &#8220; instead of as a quote.
I don't know how your porting works, but to fix this issue, you need to make sure that the & in the character codes doesn't get encoded to &amp;.
Any chance CDATA tags solve the issue? Just make sure the text is correct in the source XML file. You don't need the ampersand magic (in the source) if you use CDATA tags.
<some_tag><![CDATA[Taking the “ out of your ...]]></some_tag>

Extracting double quotes from HTML tags with a regex

I'm extracting some content from a website with this pattern:
([^+]+)
and it outputs
< img src=""http://www."" border=""0""/>
with double quotes. What is wrong with my query?
your problem only makes sense if you modify your regexp.
but first of all, beware:
in general, what you are trying to achieve is not feasible using regexes. they are the wrong tool for it, and you will not come up with a 100% correct solution using regexes.
having said this, try to replace ([^+]+) with (([^<!--]+([^<]|<[^!]|<![^-]|<!-[^-]))+). note that this regex assumes the following:
there are no html comments inside the message portion
there are no strings containing html comment openings inside the message portion
the message portion is a valid html fragment
(otherwise it would match e.g. <!-<!-- / message -->)
you have been warned.
btw, the doubled double quotes are probably a standard escape mechanism of the iMacros environment.
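If the doubling is indeed the environment's escape for ", a captured value can be un-escaped by collapsing each pair. A minimal Python sketch, assuming that "" always stands for a single " (an assumption about iMacros, not something the answer confirms); undouble_quotes is a hypothetical helper name.

```python
def undouble_quotes(s: str) -> str:
    # Assumption: every "" is an escaped ", so collapse each pair to one quote
    return s.replace('""', '"')

print(undouble_quotes('< img src=""http://www."" border=""0""/>'))
# < img src="http://www." border="0"/>
```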

How do I prevent the GET method from encoding HTML special characters in the URI?

I have a form using the GET method.
If values are submitted with special characters, they appear in the URI as:
?value=fudge%20and%20stuff
How do I make it clean?
I don't want to use the header function because this is happening within a page in Drupal.
A URL cannot contain spaces and many other "special characters", therefore they get encoded. Unfortunately there isn't a lot you can do about it. The most you could do is some JavaScript trickery in the form, but I don't think it's worth it.
If %20 bothers you, you can substitute (GREP replace) the + character (?value=fudge+and+stuff) for better readability. Otherwise there's not a lot you can do. Other "exotic" characters will be similarly escaped, and need to be.
URL: ?value=fudge%20and%20stuff
Decoded as: fudge and stuff (each %20 is a space)