Curly brackets in HTML - html

I stumbled upon this code:
<a href="#" class="text1"{text2}>...</a>
What does the {text2} do? Later on, this HTML is replaced with:
<a href="#" class="text1" {text2} style>...</a>
Is there a way I can retrieve the text2 value with jQuery?

Sometimes a placeholder like this is put in the markup so that a script can easily identify the line. In other cases it marks where a value retrieved from a database should be inserted.
It could also simply be invalid markup, although that is doubtful if the author knows what they are doing.
Without more information it is hard to say, but the most common use is as a hook for code written in PHP, JavaScript, or even C#, which can parse the HTML document and manipulate it. If the braces are used incorrectly, they can cause a parse error.
Hopefully that clarifies it.
Update:
Yes, jQuery can find it; jQuery is a JavaScript library. You could use something such as:
$(function() {
  // :contains() matches elements whose text content includes the string
  var foundString = $('*:contains("{text2}")');
});
There is plenty of material online that covers this in more detail.

It does nothing in HTML. It's actually invalid markup. Looks like maybe you have a template system that finds and replaces that before it gets rendered to the browser.
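To make the find-and-replace idea concrete, here is a minimal Python sketch of what such a template step might look like. The `render` function, the `{name}` placeholder pattern, and the sample value are assumptions for illustration; the actual template system behind the markup in the question is unknown.

```python
import re

# Hypothetical placeholder syntax: a word wrapped in single curly braces.
PLACEHOLDER = re.compile(r"\{(\w+)\}")

def render(template, values):
    """Replace each {name} with values[name]; leave unknown names untouched."""
    def substitute(match):
        name = match.group(1)
        # Values are inserted as-is, so they must come from a trusted source.
        return values.get(name, match.group(0))
    return PLACEHOLDER.sub(substitute, template)

template = '<a href="#" class="text1"{text2}>...</a>'
print(render(template, {"text2": ' title="Details"'}))
```

If a placeholder is still visible in the browser, a step like this probably did not run or had no value for that name.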

I know that in Jinja2, a Python templating system, curly brackets contain commands to the template engine, either as:
Hello, {{varName}}
or:
<ol>
{%for l in varList%}
<li>{{l}}</li>
{%endfor%}
</ol>
That's Jinja, but Jinja's syntax is similar to Django templates, and many other template engines probably copy Django's syntax as well.

It's used in AngularJS, where these are called expressions: {{expression}}
AngularJS is a JavaScript framework. It can be added to an HTML page with a <script> tag.
AngularJS extends HTML attributes with Directives, and binds data to HTML with Expressions.

Related

RegEx to Filter some specific tags

I'm developing ASP code that reads external websites and parses them via the HTMLDocument interface (the "HTMLFILE" object) so that I can navigate the contents through the DOM structure. But some pages throw an error:
'htmlfile error 80070057 Invalid Argument.'
After a lot of research, I've discovered that some HTML tags are, for reasons I don't know, not rendered or handled correctly by the HTMLFILE object, which gives me that error.
Because ASP is so old and there isn't much material available today to dig into, I'm convinced that I have to preprocess the page before sending it to the HTMLFILE object, and the best way I have come up with is to do it via RegEx.
But I'm facing some problems (partly because I don't have much practice).
I have to reliably locate the HTML tag blocks that HTMLFILE does not accept so that I can remove them.
For Example:
<head>
<script> ....... </script>
<style> ....... </style>
</head>
<body>
<iframe> ........ </iframe>
<div> ..... </div>
<table>.....</table>
I have to match full script block, style and iframe, leaving the rest of document intact.
Over the last few days I've done some research and have almost got it:
<(?:script|embed|object|frameset|frame|iframe|meta|style).+(.|\s)*?>$
I've also tried to match single-line tags (for example '<BR>'), but I'm totally confused now and the pattern is inconsistent; for example, some lines that close other tags are wrongly selected.
I know that the best way would be to discover why HTMLFILE is throwing the error, but the error carries no further information to debug it.
Thanks for all the time and patience.
Here is the regex candidate:
<(script|meta|style|embed|object|frameset|frame|iframe)[\s\S]*?<\/(script|meta|style|embed|object|frameset|frame|iframe)>
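As a rough illustration of the same idea outside ASP, here is a Python sketch that removes whole blocks with a lazy match. It is a variation rather than the exact pattern above: it uses a backreference (`\1`) so the closing tag must have the same name as the opening tag, and it lists only tags that have a closing tag. The `strip_blocks` name and the sample markup are made up for the example.

```python
import re

# Tags whose entire block (opening tag, content, closing tag) should go.
BLOCK_TAGS = "script|style|iframe"

# [\s\S]*? is a lazy "anything, including newlines"; \1 forces the closing
# tag to have the same name as the opening tag.
BLOCK_RE = re.compile(
    r"<(" + BLOCK_TAGS + r")\b[^>]*>[\s\S]*?</\1\s*>",
    re.IGNORECASE,
)

def strip_blocks(html):
    """Remove the listed tag blocks and leave the rest of the document intact."""
    return BLOCK_RE.sub("", html)

sample = (
    "<head><script>var a = 1;</script><style>p {}</style></head>"
    "<body><iframe src='x'></iframe><div>kept</div></body>"
)
print(strip_blocks(sample))
```

Void tags such as meta have no closing tag, so they would need a separate, simpler pattern.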
DEMO with explanation
EDIT
Update with lazy match for [\s\S]*?
Regex is not the best tool for that (take a look here), but if you really want to, I think that in simple cases you can also use one regex for all tags, including nested ones:
(?=(<([^>]+)>([\s\S]*?)<\/\2>))
DEMO
The 1st group holds the whole captured element, the 2nd group captures just the tag, and the 3rd group captures the content of the tag. The pattern doesn't actually consume any text; it only captures fragments inside a lookahead. However, you can probably get the start/end index of each match and use it as you want.
I still think you should reconsider using regex, but the syntax used above is quite useful, so it is worth knowing how to use it.
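To show how the lookahead pattern behaves, here is a small Python sketch that applies it with `re.finditer`. Because the whole pattern sits inside a lookahead, every match is zero-width and the useful data is in the groups. The `find_elements` helper and the sample string are assumptions for illustration only.

```python
import re

# Same structure as the pattern above: everything sits inside a lookahead,
# so nested elements can each be reported at their own start position.
NESTED_RE = re.compile(r"(?=(<([^>]+)>([\s\S]*?)</\2>))")

def find_elements(html):
    """Return (start index, tag, inner content) for each element found."""
    return [(m.start(), m.group(2), m.group(3)) for m in NESTED_RE.finditer(html)]

for start, tag, inner in find_elements("<div><b>x</b></div>"):
    print(start, tag, repr(inner))
```

Note that group 2 captures everything between < and >, so an opening tag with attributes will not pair up with its closing tag under this pattern.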

Why do I need XSS library while I can use Html-encode?

I'm trying to understand why I need to use an XSS library when I can simply HtmlEncode data when sending it from the server to the client.
For example, here on Stackoverflow.com, in the editor, all the SO team needs to do is save the user input and display it HTML-encoded.
This way there is never going to be an HTML tag that gets executed.
I'm probably wrong here, but can you please contradict my statement, or explain?
For example:
I know that an IMG tag, for example, can have onmouseover or onload attributes that a user can use to run malicious scripts, but the IMG won't even run in the browser as an IMG, since it's &lt;img&gt; and not <img>
So where is the problem?
HTML-encoding is itself one feature an “XSS library” might provide. This can be useful when the platform doesn't have a native HTML encoder (eg scriptlet-based JSP) or the native HTML encoder is inadequate (eg not escaping quotes for use in attributes, or ]]> if you're using XHTML, or #{} if you're worried about cross-origin-stylesheet-inclusion attacks).
There might also be other encoders for other situations, for example injecting into JavaScript strings in a <script> block or URL parameters in an href attribute, which are not provided directly by the platform/templating language.
Another useful feature an XSS library could provide might be HTML sanitisation, for when you want to allow the user to input data in HTML format, but restrict which tags and attributes they use to a safe whitelist.
Another less-useful feature an XSS library could provide might be automated scanning and filtering of input for HTML-special characters. Maybe this is the kind of feature you are objecting to? Certainly trying to handle HTML-injection (an output stage issue) at the input stage is a misguided approach that security tools should not be encouraging.
HTML encoding is only one aspect of making your output safe against XSS.
For example, if you output a string to JavaScript using this code:
<script>
var enteredName = '<%=EnteredNameVariableFromServer %>';
</script>
Here you will want to hex-escape the variable for insertion into JavaScript rather than HTML-encode it. Suppose the value of EnteredNameVariableFromServer is O'leary; the rendered code, when properly encoded, becomes:
<script>
var enteredName = 'O\x27leary';
</script>
In this case the encoding prevents the ' character from breaking out of the string and into the JavaScript code context, and it also ensures proper treatment of the variable (HTML-encoding it would result in the literal value O&#39;leary being used in JavaScript, affecting processing and display of the value).
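The difference between the two encodings can be sketched in Python. `html.escape` is the standard library's HTML encoder; `js_string_escape` is a hypothetical helper written for this example, and a real project would normally rely on a vetted encoding library instead.

```python
import html

def js_string_escape(value):
    """Escape a value for use inside a quoted JavaScript string literal.

    ASCII letters and digits pass through; every other character becomes
    a \\xHH or \\uHHHH escape, so quotes and </script> cannot break out.
    """
    out = []
    for ch in value:
        if ch.isascii() and ch.isalnum():
            out.append(ch)
        elif ord(ch) < 256:
            out.append("\\x%02x" % ord(ch))
        else:
            # Emit one \uHHHH escape per UTF-16 code unit.
            units = ch.encode("utf-16-be")
            for i in range(0, len(units), 2):
                out.append("\\u%02x%02x" % (units[i], units[i + 1]))
    return "".join(out)

name = "O'leary"
print(html.escape(name))       # right for an HTML text or attribute context
print(js_string_escape(name))  # right for a JavaScript string context
```

Each output context (HTML body, attribute, JavaScript string, URL) needs its own encoder, which is the gap such libraries fill.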
Side note:
Also, that's not quite true of Stack Overflow. Certain characters still have special meanings like in the <!-- language: lang-none --> tag. See this post on syntax highlighting if you're interested.

What kind of technique is this HTML tag?

The Facebook Like button (XFBML) uses this:
<fb:like send="true" width="450" show_faces="true"></fb:like>
Clearly the <fb></fb> is a tag; XML will accept it, but it's not HTML. So is it normal that the browser keeps it in the document?
What is this kind of programming technique called? Is it the right way to do it? Or is it just another way to create a hidden element and replace the id="fb"?
What does the :something in <fb:like> stand for? How can I access it with JavaScript?
This is XHP!
XHP is a PHP extension created by Facebook.
It makes PHP understand XML nodes, so you can write something like this (from their own example):
<?php
$href = 'http://www.facebook.com';
echo <a href={$href}>Facebook</a>;
?>
XHP also allows you to create PHP classes, which can be used in your markup. So the <fb:like /> node is actually turned into a PHP class at compile time. The definition of the class probably looks like this:
<?php
class :fb:like extends :x:element {
...
}
You can read more about it in the link to GitHub above, and on the creator's blog, which is all about XHP.
So to answer your questions:
<fb:like> will not be processed by the browser, but by XHP. XHP turns it into PHP objects, which are finally turned into valid HTML tag(s). This is true when using XHP, but it is also possible for us to use the same tag without XHP. I'm guessing this is just a matter of parsing the tag in JavaScript and sending the variables to the API, which probably uses the API to recreate the structure and send back the HTML.
It is not really a technique, but something unique that Facebook has developed to make their lives easier when working with PHP.
Again, when it is returned to the browser, it has been transformed by XHP (after being sent to Facebook through JavaScript). Try looking at the rendered version: it looks different from the simple <fb:like> tag.

How to sanitize user generated html code in ruby on rails

I am storing user-generated HTML code in the database, but some of it is broken (missing end tags), so this code messes up the rendering of the whole page.
How can I prevent this sort of behaviour with Ruby on Rails?
Thanks
It's not too hard to do this with a proper HTML parser like Nokogiri which can perform clean-up as part of the processing method:
require 'nokogiri'
bad_html = '<div><p><strong>bad</p>'
puts Nokogiri::HTML.fragment(bad_html).to_s
# <div><p><strong>bad</strong></p></div>
Once parsed properly, you should have fully balanced tags.
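To illustrate what "balancing" means, here is a minimal Python sketch built on the standard library's `html.parser`. It is not how Nokogiri works internally: the `TagBalancer` class and `balance` function are hypothetical names, and the logic is a simple stack of open tags that ignores the finer HTML nesting rules and drops comments.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never take a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

class TagBalancer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.open_tags = []  # stack of tags still waiting for a close

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <br/>: emit as-is, nothing to track.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray closing tag: drop it
        # Close everything opened after `tag`, then `tag` itself.
        while self.open_tags:
            current = self.open_tags.pop()
            self.out.append("</%s>" % current)
            if current == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))

def balance(fragment):
    parser = TagBalancer()
    parser.feed(fragment)
    parser.close()
    while parser.open_tags:  # close whatever is still open at the end
        parser.out.append("</%s>" % parser.open_tags.pop())
    return "".join(parser.out)

print(balance('<div><p><strong>bad</p>'))
```

Balancing only fixes structure; it does not remove scripts or other unsafe markup.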
My google-fu reveals surprisingly few hits, but here is the top one :)
Valid Well-formed HTML
Try using the h() escape function in your erb templates to sanitize. That should do the trick
Check out Loofah, an HTML sanitization library based on Nokogiri. This will also remove potentially unsafe HTML that could inject malicious script or embed objects on the page. You should also scrub out style blocks, which might mess up the markup on the page.

Limiting HTML Input into Text Box

How do I limit the types of HTML that a user can input into a textbox? I'm running a small forum using some custom software that I'm beta testing, but I need to know how to limit the HTML input. Any suggestions?
I'd suggest a slightly different approach:
Don't filter incoming user data (beyond preventing SQL injection). User data should be kept as pure as possible.
Filter all outgoing data from the database; this is where things like tag stripping should happen.
Keeping the stored user data unmodified gives you more flexibility in how it's displayed. Filtering all outgoing data is a good habit to get into (in line with the "never trust data" rule).
You didn't state what the forum was built with, but if it's PHP, check out:
http://htmlpurifier.org/
Library Features: Whitelist, Removal, Well-formed, Nesting, Attributes, XSS safe, Standards safe
Once the text is submitted, you could strip any/all tags that don't match your predefined set using a regex in PHP.
It would look something like the following:
find open tag (<)
if contents != allowed tag, remove tag (from <..>)
Parse the input provided and strip out all HTML tags that don't exactly match the list you are allowing. This can either be a complex regex, or you can do a stateful iteration through the char[] of the input string, building the allowed output string and stripping unwanted attributes from tags like img.
Use a different code system (BBCode, Markdown)
Find some code online that already does this, to use as a basis for your implementation. For example Slashcode must perform this, so look for its implementation in the Perl and use the regexes (that I assume are there)
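The first option above (parse the input and keep only the tags on an allowed list) can be sketched in Python with the standard library's `html.parser`, used here in place of a regex or a hand-written character loop. `WhitelistFilter` and `filter_html` are hypothetical names, and this version drops every attribute, which is stricter than most forums need.

```python
from html import escape
from html.parser import HTMLParser

class WhitelistFilter(HTMLParser):
    """Re-emit only allowed tags, without attributes; escape all text."""

    def __init__(self, allowed_tags):
        super().__init__(convert_charrefs=True)
        self.allowed = set(allowed_tags)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in self.allowed:
            self.out.append("<%s>" % tag)  # attributes are dropped

    def handle_endtag(self, tag):
        if tag in self.allowed:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))

def filter_html(text, allowed_tags):
    parser = WhitelistFilter(allowed_tags)
    parser.feed(text)
    parser.close()
    return "".join(parser.out)

post = '<p>Hi <i>there</i> <img src=x onerror=alert(1)></p>'
print(filter_html(post, {"p", "i"}))
```

Text inside a removed tag, such as the body of a script element, is kept as escaped text rather than deleted. For production use, a maintained sanitizer is a safer choice than a hand-rolled filter.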
Regardless what you use, be sure to be informed of what kind of HTML content can be dangerous.
For example, a <script> tag is pretty obvious, but a <style> tag is just as bad in IE, because it can invoke JScript commands.
In fact, any style="..." attribute can invoke script in IE.
<object> would be one more tag to be wary of.
PHP comes with a simple function, strip_tags, to strip HTML tags. It lets you specify tags that should not be stripped.
Example #1 strip_tags() example
<?php
$text = '<p>Test paragraph.</p><!-- Comment --> Other text';
echo strip_tags($text);
echo "\n";
// Allow <p> and <a>
echo strip_tags($text, '<p><a>');
?>
The above example will output:
Test paragraph. Other text
<p>Test paragraph.</p> Other text
Personally, for a forum I would use BBCode or Markdown because of the amount of support and the features they provide, such as live preview.