I have a file with the following HTML code:
<p><? comment ?></p>
Curl returns a normal response:
$ curl file:///path/to/the/file.html
<p><? comment ?></p>
But when I open that file in Firefox 69 or Chrome 77, nothing is shown, because the parsed HTML is as follows:
<html><head></head><body><p><!--? comment ?--></p></body></html>
This looks very strange to me. Why does it happen?
Thanks.
That's part of the HTML tokenization rules.
The < character made your browser enter the tag open state.
12.2.5.6 Tag open state
Consume the next input character:
U+0021 EXCLAMATION MARK (!)
Switch to the markup declaration open state.
U+002F SOLIDUS (/)
Switch to the end tag open state.
ASCII alpha
Create a new start tag token, set its tag name to the empty string. Reconsume in the tag name state.
U+003F QUESTION MARK (?)
This is an unexpected-question-mark-instead-of-tag-name parse error. Create a comment token whose data is the empty string.
Reconsume in the bogus comment state.
...
So your ? character is handled as a known error, and then the parser switches to the bogus comment state, which will put everything until the next > character inside the comment token.
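As a rough illustration of those two states, here is a minimal sketch in JavaScript. It is not a real HTML tokenizer: the function names tokenizeBogusComment and serializeComment are made up for this example, and it skips details the spec handles (such as replacing NULL characters). It only shows how "<?" leads to a comment token that swallows everything up to the next >.

```javascript
// Simplified sketch of the "tag open" -> "bogus comment" path.
// Hypothetical helper names; not part of any browser API.
function tokenizeBogusComment(input) {
  // Only handle the case discussed above: "<" followed by "?".
  if (!input.startsWith("<?")) {
    return null;
  }
  // Tag open state saw "?": create a comment token with empty data,
  // then reconsume the "?" in the bogus comment state.
  let data = "";
  let i = 1; // index of the "?" that is reconsumed
  // Bogus comment state: append every character until ">" or end of input.
  while (i < input.length && input[i] !== ">") {
    data += input[i];
    i += 1;
  }
  // The ">" itself is consumed and is not part of the comment data.
  return { type: "comment", data: data, rest: input.slice(i + 1) };
}

// Serialize the token the way the DOM inspector shows a comment node.
function serializeComment(token) {
  return "<!--" + token.data + "-->";
}

const example = tokenizeBogusComment("<? comment ?></p>");
console.log(serializeComment(example)); // <!--? comment ?-->
console.log(example.rest);              // </p>
```

The serialized result matches what the browsers showed in the question: the leading and trailing ? both end up inside the comment data.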
Related
I can't find a good explanation for this. What is the code after href= doing?
The variable here$ is a URL. The variable result is a directory name.
echo "<h1><a href=$here/$result>$result</a></h1>";
I understand this is html embedded in php. The echo gives it away.
The question is, what is this:
href=$here/$result
I do not recognize this code in html.
Usually href refers to the URL of a resource; in this case it is the URL plus the directory name. The <a> tag, which represents a link in HTML, needs href to point anywhere.
This is not HTML, but PHP that writes an HTML string. You can place a variable inside a double-quoted string. PHP will parse the string and replace the variable with its value.
For example:
$example = "world";
echo "Hello, $example!";
// Outputs: "Hello, world!"
Note that single-quoted strings do not have this behaviour.
In the case of your question, $here and $result will be replaced by their values, as follows:
$here = "LOCATION"; // just for example purposes
$result = "RESULT"; // just for example purposes
echo "<h1><a href=$here/$result>$result</a></h1>";
// Outputs: "<h1><a href=LOCATION/RESULT>RESULT</a></h1>"
It will output a link (<a>) where the href attribute is the address the link points to. If the link contains no spaces or > symbols, it will work without quotation marks ("), but you can just as well write it between quotes:
echo "<h1><a href=\"$here/$result\">$result</a></h1>";
// Outputs: "<h1><a href="LOCATION/RESULT">RESULT</a></h1>"
The best way I know to understand the <a> tag and the behaviour of its href attribute is to try it in the browser and see it in action for yourself.
I'm running some code for a webpage through the W3 HTML5 validator and am getting the following error:
3 Line 29 Column 346: Bad value Iinterior-lighting?f[O]field_design_style%3A169
for attribute href on element a: Illegal character in query:
not a URL code point.
..tapi-inactive id=facetapi-link--151>Unspecified (80)<span class="element-in...
The > closing the opening <a> tag is the one that is causing the error. Here is the full line of code that is causing the error.
<div class="item-list"><ul class="facetapi-facetapi-checkbox-links facetapi-facet-field-design-style" id="facetapi-facet-search-apisearch-api-solr-index-block-field-design-style"><li class="leaf first">Unspecified (80)<span class="element-invisible"> Apply Unspecified filter </span></li>
Once again, the thing that is causing the error is the > in the opening <a> tag.
This looks right to me and I think it might just be an error with the validator. There are 175 errors just like it on this page.
Thanks
According to RFC 3986,
A host identified by an Internet Protocol literal address, version 6
[RFC3513] or later, is distinguished by enclosing the IP literal
within square brackets ("[" and "]"). This is the only place where
square bracket characters are allowed in the URI syntax.
So you should percent-encode the square brackets in the URL ([ becomes %5B and ] becomes %5D).
(Shamelessly stolen from this answer)
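If the links are built in script, something like the following sketch would produce a query string without raw brackets. The path, parameter name and value are only guesses modelled on the validator message, and buildHref is a made-up helper; the one real API used is the standard encodeURIComponent.

```javascript
// Hypothetical helper: builds an href with a percent-encoded query parameter.
function buildHref(path, key, value) {
  // encodeURIComponent percent-encodes "[", "]" and ":" (among others),
  // so none of them appear raw in the resulting query string.
  return path + "?" + encodeURIComponent(key) + "=" + encodeURIComponent(value);
}

// Assumed example values, not taken verbatim from the page in question.
const href = buildHref("/interior-lighting", "f[0]", "field_design_style:169");
console.log(href); // /interior-lighting?f%5B0%5D=field_design_style%3A169
```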
I have a simple html page with a div element in it.
The innerHTML property of the div is set from the query string.
In the query string I pass HTML strings, e.g.
<p style='font-size:20px;color:green;'> Sun rises in the east </p> etc...
I get the appropriate output.
However, if I pass a color code in the style attribute, say #00990a, no content is displayed.
Can someone help me through this?
If there's a color code that contains a #, everything after it will be treated as the fragment identifier. To avoid this you have to URL-encode your parameter value (replacing # with %23, and doing the same with other characters that have a special meaning, such as #, &, %, = and ?).
Finally your url should look like this:
PageUrl?Content=%3Cp+style%3D%27color%3A%23009900%27%3EContent%3C%2Fp%3E
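To see the difference, here is a small sketch using the standard URL class and encodeURIComponent. The host http://example.com/page is a placeholder and Content is the parameter name from the URL above; the HTML string is adapted from the question.

```javascript
// Placeholder page URL; the real one is not shown in the question.
const base = "http://example.com/page";
const html = "<p style='color:#00990a'>Sun rises in the east</p>";

// Unencoded: the "#" ends the query string and starts the fragment.
const raw = new URL(base + "?Content=" + html);
console.log(raw.searchParams.get("Content")); // <p style='color:
console.log(raw.hash.startsWith("#00990a"));  // true

// Encoded: "#" becomes %23, so the whole string stays in the query.
const encoded = new URL(base + "?Content=" + encodeURIComponent(html));
console.log(encoded.searchParams.get("Content") === html); // true
console.log(encoded.hash === "");                          // true
```

In the unencoded case the page only ever receives the part before the #, which is why nothing useful was displayed.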
Since you haven't shown us any code, I shall guess…
In a URI, # indicates the start of the fragment identifier (as ? indicates the start of the query string). Your colour code is terminating the query string and starting the fragment identifier. You need to URL-encode any character that has special meaning in URLs (# is %23).
Do make sure that you sanitise the passed HTML and CSS on the server though. It is very easy to expose yourself to XSS attacks otherwise.
I have a strange problem:
In the database, I have a literal ampersand lt semicolon:
&lt;div
Whenever it's printed into an HTML textarea tag, the source code of the page shows the &gt; as >.
How do I stop this decoding?
You can't stop entities being decoded in a textarea since the content of a textarea is not (unlike a script or style element) intrinsic CDATA, even though error recovery may sometimes give the impression that it is.
The definition of the textarea element is:
<!ELEMENT TEXTAREA - - (#PCDATA) -- multi-line text field -->
i.e. it contains PCDATA which is described as:
Document text (indicated by the SGML construct "#PCDATA"). Text may contain character references. Recall that these begin with & and end with a semicolon (e.g., "Herg&eacute;'s adventures of Tintin" contains the character entity reference for the e acute character).
This means that when you type (the invalid HTML of) "start of tag" (<) the browser corrects it to "less than sign" (&lt;) but when you type "start of entity" (&), which is allowed, no error correction takes place.
You need to write what you mean. If you want to include some HTML as data then you must convert any character with special meaning to its respective character reference.
If the data is:
&lt;div
Then the HTML must be:
<textarea>&amp;lt;div</textarea>
You can use the standard functions for converting this (e.g. PHP's htmlspecialchars or Perl's HTML::Entities module).
NB 1: If you were using XHTML (and really using it; it doesn't count if you serve it as text/html) then you could use an explicit CDATA block:
<textarea><![CDATA[&lt;div]]></textarea>
NB 2: Or if browsers implemented HTML 4 correctly
OK, but the question is: why does it decode them anyway? Say I type &lt; and save the textarea. It is saved as &lt; but displayed as <. Saving it again converts it to <, and that < is what ends up in the database. Why does the textarea decode it?
The server sends (to the browser) data encoded as HTML.
The browser sends (to the server) data encoded as application/x-www-form-urlencoded (or multipart/form-data).
Since the browser is not sending the data as HTML, the characters are not represented as HTML entities.
If you take the data received from the client and then put it into an HTML document, then you must encode it as HTML first.
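A minimal sketch of that round trip, assuming a form field named text (a made-up name) and a hand-written escapeHtml helper rather than any particular framework's function:

```javascript
// Hypothetical helper: converts characters with special meaning in HTML
// to character references. "&" must be replaced first.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// What the browser submits for a textarea containing the text &lt;div
// (application/x-www-form-urlencoded, no HTML entities involved).
const body = "text=%26lt%3Bdiv";

// What the server receives after decoding the form data.
const received = new URLSearchParams(body).get("text");
console.log(received); // &lt;div

// Encode as HTML before putting it back into the page.
const page = '<textarea name="text">' + escapeHtml(received) + "</textarea>";
console.log(page); // <textarea name="text">&amp;lt;div</textarea>
```

Skipping the escapeHtml step is what makes the stored value appear to lose one level of encoding on every save.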
In PHP, this can be done using htmlentities(). Example below.
<?php
// The string holds the literal text "&trade;", not the symbol itself
$content = "This string contains the TM symbol: &trade;";
print "<textarea>". htmlentities($content) ."</textarea>";
?>
Without htmlentities(), the textarea would interpret and display the TM symbol (™) instead of "&trade;".
http://php.net/manual/en/function.htmlentities.php
You have to be sure that this is rendered to the browser:
<textarea name="somename">&amp;lt;div</textarea>
Essentially, this means that the & in &lt; has to be HTML-encoded to &amp;. How to do it will depend on the technologies you're using.
UPDATE: Think about it like this. If you want to display <div> inside a textarea, you'll have to encode < and > because otherwise <div> would be a normal HTML element to the browser:
<textarea name="somename">&lt;div&gt;</textarea>
Having said this, if you want to display &lt;div&gt; inside a textarea, you'll have to encode the & again, because the browser decodes HTML entities when rendering HTML. It has nothing to do with your database.
You can serve your DB-content from a separate page and then place it in the textarea using a Javascript (jQuery) Ajax-call:
var request = $.ajax({
    type: "GET",
    url: "url-with-the-troubled-content.php",
    // Assigning to .value sets the text directly; it is not parsed as HTML
    success: function (data) {
        document.getElementById('id-of-text-area').value = data;
    }
});
Explained at
http://www.endtask.net/how-to-prevent-a-textarea-element-from-decoding-html-entities/
I had the same problem and I just made two replacements on the text from the database before letting it into the textarea:
myString = Replace(myString, "&", "&amp;")
myString = Replace(myString, "<", "&lt;")
Replacement no. 1 tricks the textarea into showing the codes.
Replacement no. 2: without it you cannot show the text "</textarea>" inside the textarea (it would end the textarea tag).
(ASP / VBScript code above; translate to the replace method of your language of choice.)
I found an alternative solution for reading and working with the text in the browser: simply read the element's text() using jQuery. It returns the characters as displayed, which lets me write from a textarea to a div's innerHTML via html().
With only JS and HTML...
...to answer the actual question, with a bare-minimal example:
<textarea id=myta></textarea>
<script id=mytext type=text/plain>
&trade;
</script>
<script> myta.value = mytext.innerText; </script>
Explanation:
Script tags do not render HTML or entities. By storing text in a script tag, it will remain unaltered; the problem is that the browser will try to execute it as JavaScript. So we use an empty textarea and store the text in a script tag (here, the first one).
To prevent it from executing, we change the MIME type to text/plain instead of its default, which is text/javascript.
Then to populate the textarea, we copy the script tag's content to it (here done in the second script tag).
The only caveats I have found with this are you have to use JavaScript and you cannot include script tags directly in it.
What does "a value that contains no U+000A LINE FEED (LF) or U+000D CARRIAGE RETURN (CR) characters" mean? Can someone explain in layman's terms and give an example?
I guess it means a string that doesn't contain a line feed or carriage return character, like this_one.
here_is
one_that_
does
Update
I got this info from w3.org
Please link to this. I thought it might mean "don't use them in your HTML attributes", but I just validated a page with a multiline title attribute with the W3C validator.
When you press Enter in a text editor to go to the next line, an invisible LINE FEED and/or CARRIAGE RETURN character is inserted.
According to the specification, some HTML attributes cannot have any line breaks in their values.
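If you want to check a value programmatically, a sketch like this would do it. The function name hasLineBreak is made up for the example; "\n" is U+000A LINE FEED and "\r" is U+000D CARRIAGE RETURN.

```javascript
// Hypothetical helper: true if the value contains LF (U+000A) or CR (U+000D).
function hasLineBreak(value) {
  return /[\n\r]/.test(value);
}

console.log(hasLineBreak("this_one"));                 // false
console.log(hasLineBreak("here_is\none_that_\ndoes")); // true
console.log(hasLineBreak("windows\r\nline ending"));   // true
```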
That has nothing to do with HTML attributes or values. LF and CR are end of line characters. Wikipedia has an excellent article about them. What are you trying to accomplish and where are you getting this error?
In HTML, common commands will include an element, an attribute and a value. For example, in <A HREF ="somevalue"> A is the element, HREF is the attribute and somevalue is the value.
When you say values cannot have a carriage return or a line feed, then the value statement should not look like this:
<A HREF ="somevalue ENTER
somevalue continuing after a carriage return and line feed"></A>
Avoid that. Instead, that same information should be typed, letting the code wrap around on its own.