HTML form gives odd artifact


I'm making a form and I keep getting an odd artifact next to the textbox.
So right after the <br> and right before the <textarea the page loads â€‹.
What could be causing this?
<input name="pasnr" type="text" value="<?=$pasnr ?>" size="79"><br>
Commentaar: <br>
<textarea name="comments" rows="10" cols="50"><?=$comments ?></textarea><br>
<input type="submit" name="Submit" value="Change">

This looks like an encoding issue.
The Unicode character ZERO WIDTH SPACE is at codepoint U+200B. When expressed in UTF-8 this character is represented by the three bytes E2 80 8B. If these bytes are then interpreted as characters in the CP-1252 encoding they appear as the characters â€‹.
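The byte-level explanation above can be checked in a few lines. This is only a minimal Python sketch (the asker's page is PHP); the names ZWSP and misread_as_cp1252 are made up for the illustration.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE


def misread_as_cp1252(text: str) -> str:
    """Encode text as UTF-8, then decode those bytes as CP-1252."""
    return text.encode("utf-8").decode("cp1252")


utf8_hex = ZWSP.encode("utf-8").hex()  # the three bytes E2 80 8B
mojibake = misread_as_cp1252(ZWSP)     # three visible characters instead of one invisible one

# The artifact is exactly a-circumflex, euro sign, single left angle quote.
assert mojibake == "\u00e2\u20ac\u2039"
```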
From the position these characters appear on your page it looks like you’ve somehow introduced a zero width space character in your editor (which is using utf-8) and since you’re not specifying an encoding the browser is defaulting to CP-1252.
A simple fix in this case would be to specify the encoding of the page, either by setting a Content-type header, or by adding <meta charset='utf-8'> to your page (assuming you’re using HTML5). (Alternatively just find the character in the file and delete it).
More generally you need to make sure the encodings you use throughout your application are consistent (i.e. your pages, the database, data from form submissions). If you’re new to character encodings a good place to start is Joel Spolsky’s article.

I just viewed the source of your page; there is simply some junk before the textarea.
I bet this is a UTF-8 BOM (a special three-byte sequence at the beginning of a UTF-8 encoded file). If the page source comes from a text file, check whether there is a UTF-8 BOM at the beginning of the file and save the file without a BOM.
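If you want to check a file for a BOM programmatically, something like the following would do. It is a sketch in Python under the assumption that you can read the file as raw bytes; strip_utf8_bom is a hypothetical helper name.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (bytes EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# A BOM is only removed from the very start of the data.
cleaned = strip_utf8_bom(b"\xef\xbb\xbf<textarea></textarea>")
```

Note that a BOM misread as CP-1252 shows up as ï»¿, so it can be told apart from a zero width space, which shows up as â€‹.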

Related

excelvba selenium cannot get html address

I have a problem with the HTML page below. Because the page is encrypted, I have uploaded a screenshot.
I want to get the text in the 1st field. Unfortunately, I can't get it separately from the text in the 2nd field.
<div class="col-sm-4" id="anneAdi">
<i class="feather icon-eye-off f-20 bak2" data-id="0" data-bak="anneAdi"></i>
<img src="/Common/bitMapResimGoster.aspx?BitMapResim= : SEVİM"></div>
I want to write SEVİM into an Excel cell.
<'elementler = baglan.FindElementByCss("#anneAdi > img").Attribute("src")>
<'elementler = baglan.FindElementByCss("img").Attribute("src")>
The result looks like this: %C2%A0:%C2%A0SEV%C4%B0M
I tried repeatedly but it didn't work.
Sorry for my broken English. I'm waiting for your help. Thanks.

If %C2%A0:%C2%A0SEV%C4%B0M is your problem, then that is simply how the value is URL-encoded, probably because of the spaces in the img src: /Common/bitMapResimGoster.aspx?BitMapResim= : SEVİM. Note that there are spaces after BitMapResim=, and because of that the output looks like that.
See here
A snippet from W3Schools (https://www.w3schools.com/tags/ref_urlencode.asp)
URL Encoding (Percent Encoding)
URL encoding converts characters into a format that can be transmitted
over the Internet.
URLs can only be sent over the Internet using the ASCII character-set.
Since URLs often contain characters outside the ASCII set, the URL has
to be converted into a valid ASCII format.
URL encoding replaces unsafe ASCII characters with a "%" followed by
two hexadecimal digits.
URLs cannot contain spaces. URL encoding normally replaces a space
with a plus (+) sign or with %20.
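To get the readable text back, the percent-encoded value has to be decoded. Here is a minimal sketch in Python rather than VBA, just to show the idea; extract_name is a hypothetical helper. It also shows that %C2%A0 is the UTF-8 form of a no-break space (U+00A0), not an ordinary space (%20).

```python
from urllib.parse import unquote


def extract_name(encoded: str) -> str:
    """Percent-decode the value and return the text after the colon."""
    decoded = unquote(encoded)  # decodes %XX sequences as UTF-8 by default
    # Keep what follows the first colon; strip() also removes no-break spaces.
    return decoded.split(":", 1)[-1].strip()


name = extract_name("%C2%A0:%C2%A0SEV%C4%B0M")
```

In VBA you would need an equivalent URL-decoding step before writing the value to the cell.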

HTML's handling of white-space characters depends on context - but what are the rules?

The Unicode catalogue includes a number of white-space characters, some of which don't appear to work in any context in HTML documents - but some of which, rather usefully, do.
Here is an example:
<h1 title="Hi! As a title attribute, 
I can contain horizontal tabs 
and carriage returns
and line feeds.">HTML's handling of &009; | &010; | &013;</h1>
<p>Hello. As a paragraph element, I can't contain horizontal tabs 
or carriage returns
or line feeds.</p>
<input type="submit" value="I am a value attribute and
like title I can also handle line feeds" /><br />
<input type="submit" value="I am another value attribute. Like title I can handle horizontal tabs" /><br />
<input type="submit" value="I am a third value attribute. 
Unlike title I can't handle carriage returns" />
Is there any official spec or series of guidelines which detail which white-space characters can be deployed in HTML documents and where?

It's a little unclear what you mean by work, but I'm going to assume you mean rendering, at which point what happens is really up to CSS.
https://www.w3.org/TR/CSS2/text.html#white-space-model defines how most whitespace characters are normalized away, unless you adjust the white-space property.
Note that the display of tooltips (such as from the title attribute) and form controls (such as from input elements) is not defined by any standard, leaving that effectively up to browsers.

Disclaimer: this answer was composed for the question as originally written, making explicit references to ASCII control characters. It was apparently a red herring so the information here may look confusing now.
First of all, I don't think anybody uses ASCII any more. In 2016 the only sensible encoding is UTF-8. In any case, UTF-8 is a superset of ASCII (and you can use ASCII anyway), so the question is still valid.
Secondly, your example isn't correct. All the HTML entities you mention are printable characters:
&#009; is 'CHARACTER TABULATION' (U+0009) (i.e. a tab)
&#013; is 'CARRIAGE RETURN (CR)' (U+000D) (i.e. a legacy MacOS line feed)
&#010; is 'LINE FEED (LF)' (U+000A) (i.e. a Unix line feed)
(And please note that Windows line feeds are a combination of CR+LF.)
If you're really talking about control characters:
EOT End of Transmission
ACK Acknowledgement
BEL Bell
...
... we first need to understand that HTML is meant to be plain text (as such, its MIME content type is text/html). The HTML5 Living Standard provides a definition of control character that's wider than the ASCII one, but in any case such characters don't seem to be allowed:
Any occurrences of any characters in the ranges U+0001 to U+0008,
U+000E to U+001F, U+007F to U+009F, U+FDD0 to U+FDEF, and characters
U+000B, U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, U+2FFFE, U+2FFFF, U+3FFFE,
U+3FFFF, U+4FFFE, U+4FFFF, U+5FFFE, U+5FFFF, U+6FFFE, U+6FFFF,
U+7FFFE, U+7FFFF, U+8FFFE, U+8FFFF, U+9FFFE, U+9FFFF, U+AFFFE,
U+AFFFF, U+BFFFE, U+BFFFF, U+CFFFE, U+CFFFF, U+DFFFE, U+DFFFF,
U+EFFFE, U+EFFFF, U+FFFFE, U+FFFFF, U+10FFFE, and U+10FFFF are parse
errors. These are all control characters or permanently undefined
Unicode characters (noncharacters).
Any character that is a not a Unicode character, i.e. any isolated
surrogate, is a parse error. (These can only find their way into the
input stream via script APIs such as document.write().)
If you actually refer to the characters in your example, some of them are considered exceptions in the parsing stage:
U+000D CARRIAGE RETURN (CR) characters and U+000A LINE FEED (LF)
characters are treated specially. Any LF character that immediately
follows a CR character must be ignored, and all CR characters must
then be converted to LF characters. Thus, newlines in HTML DOMs are
represented by LF characters, and there are never any CR characters in
the input to the tokenization stage.
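The newline rule quoted above boils down to two replacements. A minimal Python sketch of that preprocessing step (normalize_newlines is just an illustrative name, not something from the spec):

```python
def normalize_newlines(text: str) -> str:
    """Drop any LF that follows a CR, then turn every remaining CR into LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


normalized = normalize_newlines("one\r\ntwo\rthree\nfour")
```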
... but I suspect you are only interested in white-space collapsing:
In HTML, only the following characters are defined as white space
characters:
ASCII space (&#x0020;)
ASCII tab (&#x0009;)
ASCII form feed (&#x000C;)
Zero-width space (&#x200B;)
[...]
In particular, user agents should collapse input white space sequences
when producing output inter-word space.
[...]
The PRE element is used for preformatted text, where white space is
significant.
In other words, consecutive white space characters become a simple space (except inside <pre> tag). (I could only find a link for HTML 4 but that's something that hasn't changed significantly).
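As a rough illustration of that collapsing, here is a Python sketch. It assumes the HTML 4 character list quoted above plus CR and LF line breaks, and it ignores <pre> and the CSS white-space property, so it is an approximation of what a browser does, not a faithful model.

```python
import re

# White space per the HTML 4 quote above (space, tab, form feed,
# zero-width space), plus CR and LF line breaks.
HTML4_WHITE_SPACE = re.compile("[ \t\f\u200b\r\n]+")


def collapse_whitespace(text: str) -> str:
    """Replace each run of white space characters with a single space."""
    return HTML4_WHITE_SPACE.sub(" ", text)


collapsed = collapse_whitespace("Hello.\tAs a paragraph,\r\n   I collapse.")
```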
Is there any official spec or series of guidelines? Sure there are: you have the official W3C recommendations and the WHATWG specs, but they're basically technical documentation mostly addressed at browser vendors: extensive, comprehensive and hard to decipher into plain English ;-)

&nbsp when switch encoding type turn into weird character

I use &nbsp in my web page, and when I switch between different encodings it becomes a weird character. As you can see from the screenshot below, it turns into a Chinese character; with another encoding it becomes an accented A. For spacing I will not use &nbsp anymore, but in my JSF page I need an empty label, so I use
h:outputLabel value="&nbsp"
Now I am not sure what I can replace the code above with to keep these weird characters from showing up on my page. Please help.

You need to control the JSF view encoding via <f:view encoding>, not by manually setting HttpServletResponse#setCharacterEncoding() outside JSF's control. Otherwise JSF is still writing the response using the default UTF-8 encoding and you're basically only telling the browser afterwards that it's in a different encoding and hence the browser interprets it wrongly.
<f:view encoding="#{bean.encoding}">
As to the weird characters you're seeing: the non-breaking space exists in UTF-8 as the bytes 0xC2 and 0xA0. When the receiver (read: the web browser) incorrectly interprets those bytes as ISO-8859-1, you get, according to its code page layout, the characters Â and another non-breaking space respectively. When those bytes are incorrectly interpreted as KSC5601, you get, according to the EUC-KR code page layout, the character 혻.
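The ISO-8859-1 case is easy to reproduce. A small Python sketch (not JSF code; the variable names are only illustrative, and only the ISO-8859-1 case is shown):

```python
NBSP = "\u00a0"  # NO-BREAK SPACE, what &nbsp; stands for

utf8_bytes = NBSP.encode("utf-8")            # the two bytes C2 A0
as_latin1 = utf8_bytes.decode("iso-8859-1")  # A-circumflex followed by a no-break space

# Decoding with the encoding that was actually used restores the single character.
roundtrip = utf8_bytes.decode("utf-8")
```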

For &nbsp use the value &#160;
Please refer HTML ISO-8859-1 Reference

For starters, &nbsp should have a semicolon at the end: &nbsp;
I don't think that would be causing this problem, however.
Is there a reason you can't simply use " " or leave it blank altogether?
h:outputLabel value=" "
h:outputLabel value=""
You shouldn't be using it for styling though. Instead, you can simply:
button.myCancelButton { margin-left:40px; }
radio.myColumnsRadio { margin-left:40px; }
Also if you're wanting your radio buttons to have equal spacing between them, you'd either wrap them in a container with a fixed width or put them in a table:
span.myRadioContainer { width:100px; display:inline-block; }
Or (HTML solution, not sure how JSF works):
<table>
<tbody>
<tr>
<td><input type="radio"/></td>
<td><input type="radio"/></td>
</tr>
<tr>
<td><input type="radio"/></td>
...
</tr>
</tbody>
</table>
JSFiddle example.

dealing with utf8 encoded characters in html input tag

I'm dynamically setting the value of an input tag. The values are returned from the server, and in some cases they are UTF8 encoded. Long story short, the value of the input tag still keeps the encoded characters, rendering e.g. S&aacute;bado to the user.
In my span tags, the value is rendered as desired, meaning that 'Sábado' is outputted. I do use
<META http-equiv="Content-Type" content="text/html;charset=UTF-8">
How can I fix this?
thanks.

Those strings are not "UTF8 encoded", those are HTML entities.
Don't HTML escape the values. Possibly you're double-escaping them somewhere.
It's hard to give more concrete advice without details about your code.
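To make the difference concrete, here is a small Python sketch. It assumes the server returns an entity such as &aacute; (the exact string is a guess on my part) and uses the standard html module to show both decoding and what double-escaping does.

```python
import html

server_value = "S&aacute;bado"  # assumed example of an entity-encoded value

decoded = html.unescape(server_value)        # the entity becomes the real character
double_escaped = html.escape(server_value)   # escaping again turns & into &amp;
```

A browser shows double_escaped as the literal text S&aacute;bado. The same literal text appears if an entity-encoded string is assigned to an input's value through script, because entities are only decoded when markup is parsed.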

Post newline/carriage return as hidden field value

I need to post multi-line data via a hidden field. The data will be viewed in a textarea after post. How can I post a newline/carriage return in the html form?
I've tried \r\n but that just posts the actual "\r\n" data
<input type="hidden" name="multiline_data" value="line one\r\nline two" />
Is there a way to do this?

Instead of using
<input type="hidden">
Try using
<textarea style="visibility:hidden;position:absolute;">

While new lines (Carriage Return & Line Feed) are technically allowed in <input>'s hidden state, they should be escaped for compatibility with older browsers. You can do this by replacing all Carriage Returns (\u000D or \r) and all Line Feeds (\u000A or \n) with proprietary strings that are recognized by your application to be a Carriage Return or New Line (and also escaped, if present in the original string).
Simple character entities don't work here, because non-conforming browsers may know that &#13; and &#10; are new lines and strip them from the value.
Example
For example, in PHP, if you were to echo the passed value to a textarea, you would include the newlines (and unescaped string).
<textarea>Some text with a \ included
and a new line with \r\n as submitted value</textarea>
However, in PHP, if you were to echo the value to the value attribute of an <input> tag, you would escape the new lines with your proprietary strings (e.g. \r and \n), and escape any instances of your proprietary strings in the submitted value.
<input type="hidden" value="Some text with a \\ included\r\nand a new line\\r\\n as submitted value">
Then, before using the value elsewhere (inserting into a database, emailing, etc), be sure to unescape the submitted value, if necessary.
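The escape/unescape pair could look roughly like this. It is a sketch in Python rather than PHP, using backslash sequences as the "proprietary strings"; escape_newlines and unescape_newlines are hypothetical names.

```python
def escape_newlines(value: str) -> str:
    """Escape backslashes first, then CR and LF, so the result fits on one line."""
    return (
        value.replace("\\", "\\\\")
        .replace("\r", "\\r")
        .replace("\n", "\\n")
    )


def unescape_newlines(value: str) -> str:
    """Reverse escape_newlines by scanning the string left to right."""
    mapping = {"r": "\r", "n": "\n", "\\": "\\"}
    out = []
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and i + 1 < len(value):
            nxt = value[i + 1]
            # Unknown escape sequences are kept as written.
            out.append(mapping.get(nxt, ch + nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


original = "Some text with a \\ included\r\nand a new line with \\r\\n as submitted value"
escaped = escape_newlines(original)
```

Escaping the backslash first is what keeps a literal \r\n typed by the user distinct from a real line break after the round trip.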
Reassurance
As further reassurance, I asked the WHATWG, and Ian Hickson, editor of the HTML spec currently, replied:
bfrohs Question about <input type=hidden> -- Are Line Feeds and Carriage Returns allowed in the value? They are specifically disallowed in Text state and Search state, but no mention is made for Hidden state. And, if not, is there an acceptable HTML solution for storing form data from a textarea?
Hixie yes, they are allowed // iirc // for legacy reasons you may wish to escape them though as some browsers normalise them away // i forget if we fixed that or not // in the spec
Source

Depends on the character set really, but &#10; should be linefeed and &#13; should be carriage return. You should be able to use those in the value attribute.
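You can see what the numeric character references for line breaks decode to with a quick Python check. This only shows how an HTML parser reads the attribute text; it says nothing about what a particular browser submits.

```python
import html

attribute_text = "line one&#13;&#10;line two"
decoded_value = html.unescape(attribute_text)  # references become real CR and LF
```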

You don't say what this is for or what technology you're using, but you need to be aware that you can't trust the hidden field to remain with value="line one&#13;&#10;line two", because a hostile user can tamper with it before it gets sent back in the POST. Since you're putting the value in a <textarea> later, you will definitely be subject to, for example, cross site scripting attacks unless you verify and/or sanitize your "multiline_data" field contents before you write it back out.
When writing a value into a hidden field and reading it back, it's usually better to just keep it on the server, as an attribute of the session, or pageflow, or whatever your environment provides to do this kind of thing.
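If you do write the submitted value back out, escape it at output time. A minimal sketch in Python, assuming the page is built by string formatting; render_textarea is a hypothetical helper and your framework may already provide an equivalent.

```python
import html


def render_textarea(name: str, value: str) -> str:
    """Build a textarea whose content cannot break out of the element."""
    return '<textarea name="{}">{}</textarea>'.format(
        html.escape(name, quote=True),  # attribute value: also escape quotes
        html.escape(value),             # content: <, > and & become entities
    )


markup = render_textarea("comments", "</textarea><script>x</script>")
```

Line breaks in the value pass through unchanged, so multi-line data still displays correctly in the textarea.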
