How to match either of two regexes?

I have two regular expressions:
selected.*(?<=value=)(['"])(.*)\1
and
(?<=value=)(['"])(.*)\1(?=.*selected)
Separately they work fine, but I need a regex that matches either the first one or the second one. The alternation operator | is of no use here.
How could I match either regex in a single one?
Inputs:
<option selected value="a">a</option>
<option value="b">b</option>
and
<option selected value="a">a</option>
<option value="b" selected>b</option>
The first regex matches 'a' in the first input and the second regex matches 'b' in the second one, but when I combine the two regexes with the alternation operator, the new regex matches nothing in either input.

This regex should work for both cases without using alternation i.e. |:
(?=.*?selected).*?(?<=value=)(['"])(.*?)\1
Or, if these tags can span multiple lines:
(?=[^>]*\sselected\b)[^>]*(?<=value=)(['"])([^>]*?)\1
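To see what the second pattern does on the two inputs from the question, here is a minimal sketch in Python (my choice of language; the question does not name a regex flavour, and the helper name selected_values is made up for illustration). Note that it returns the value of every tag that carries selected, so the second input yields both values.

```python
import re

# The second pattern from the answer above, unchanged.
PATTERN = re.compile(r"""(?=[^>]*\sselected\b)[^>]*(?<=value=)(['"])([^>]*?)\1""")


def selected_values(html):
    """Return the value attribute of every tag that carries `selected`."""
    # Group 1 is the quote character, group 2 is the attribute value.
    return [m.group(2) for m in PATTERN.finditer(html)]


first = '<option selected value="a">a</option>\n<option value="b">b</option>'
second = '<option selected value="a">a</option>\n<option value="b" selected>b</option>'

print(selected_values(first))   # ['a']
print(selected_values(second))  # ['a', 'b']
```

Because [^>] also matches newlines, the pattern needs no DOTALL flag to handle a tag that is split across lines.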

Related

HTML <sub> inside select option

I am trying to get a simple select field with two options, say a and b, each with a subscript c. Somehow, the <sub> elements inside the option elements are ignored. How can I fix it so that "c" is shown as a subscript? So far I've only tried this in Firefox.
Example:
<select>
<option>a<sub>c</sub></option>
<option>b<sub>c</sub></option>
</select>
The option element can only take text as content. You can, however, use any Unicode character inside it. Unicode has subscript characters built in, so you would have to use those.
You could use it like this:
<select name="" id="">
<option value="a">Hello</option>
<option value="a">h&#x2090</option>
</select>
Your best bet would probably be to just copy and paste whatever subscript character you need.
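If you generate the option labels in code rather than pasting characters, a small sketch like the following can do the substitution. It only covers digits, which sit in the contiguous range U+2080 to U+2089; Unicode's set of subscript letters is partial, so check that the letter you need actually exists. The function name subscript_digits is made up for illustration.

```python
# Map each ASCII digit to its Unicode subscript form (U+2080..U+2089).
SUBSCRIPT_DIGITS = {ord(str(d)): chr(0x2080 + d) for d in range(10)}


def subscript_digits(text):
    """Replace every ASCII digit in `text` with the matching subscript digit."""
    return text.translate(SUBSCRIPT_DIGITS)


# The result is plain text, so it is allowed inside an <option> element.
print("<option>" + subscript_digits("H2O") + "</option>")
```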

Cannot escape double quotes and slashes when initializing an input in html [duplicate]

I have a drop down on a web page which is breaking when the value string contains a quote.
The value is "asd, but in the DOM it always appears as an empty string.
I have tried every way I know to escape the string properly, but to no avail.
<option value=""asd">test</option>
<option value="\"asd">test</option>
<option value="&quot;asd">test</option>
<option value="&#34;asd">test</option>
How do I render this on the page so the postback message contains the correct value?
&quot; is the correct way, the third of your tests:
<option value="&quot;asd">test</option>
You can see this working below, or on jsFiddle.
alert($("option")[0].value);
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<select>
<option value="&quot;asd">Test</option>
</select>
Alternatively, you can delimit the attribute value with single quotes:
<option value='"asd'>test</option>
If you are using PHP, try calling the htmlentities or htmlspecialchars function.
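Most languages ship a comparable helper. As a sketch in Python, html.escape from the standard library turns " into &quot; (and also escapes &, <, > and '), so the value can be dropped into a double-quoted attribute. The option_tag helper is made up for illustration.

```python
from html import escape


def option_tag(value, label):
    """Build an <option> whose value and label are both HTML-escaped."""
    # quote=True (the default) also escapes " and ', which attributes need.
    return '<option value="{}">{}</option>'.format(
        escape(value, quote=True), escape(label)
    )


print(option_tag('"asd', "test"))
# <option value="&quot;asd">test</option>
```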
Per HTML syntax, and even HTML5, the following are all valid options:
<option value="&quot;asd">test</option>
<option value="&#34;asd">test</option>
<option value='"asd'>test</option>
<option value='&quot;asd'>test</option>
<option value='&#34;asd'>test</option>
<option value=&quot;asd>test</option>
<option value=&#34;asd>test</option>
Note that if you are using XML syntax the quotes (single or double) are required.
Here's a jsfiddle showing all of the above working.
Another option is to replace the double quote in the value with a single quote, if you don't mind the value changing. I don't mean this one:
<option value='"asd'>test</option>
I mean this one:
<option value="'asd">test</option>
In my case I used this solution.
If you are using JavaScript and Lodash, then you can use _.escape(), which escapes ", ', <, >, and &.
You really should only allow untrusted data into a whitelist of good attributes like: align, alink, alt, bgcolor, border, cellpadding, cellspacing, class, color, cols, colspan, coords, dir, face, height, hspace, ismap, lang, marginheight, marginwidth, multiple, nohref, noresize, noshade, nowrap, ref, rel, rev, rows, rowspan, scrolling, shape, span, summary, tabindex, title, usemap, valign, value, vlink, vspace, width
You also want to keep untrusted data out of JavaScript event handlers as well as id or name attributes (they can clobber other elements in the DOM).
Also, if you are putting untrusted data into a SRC or HREF attribute, then it is really an untrusted URL, so you should validate the URL, make sure it is NOT a javascript: URL, and then HTML entity encode it.
More details on all of this here: https://www.owasp.org/index.php/Abridged_XSS_Prevention_Cheat_Sheet
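The validate-then-encode order for URLs could be sketched like this in Python. This is only an illustration under my own assumptions: the allowlist of http and https, and the name safe_href, are not from the cheat sheet. It checks the scheme against an allowlist instead of only rejecting javascript:, and it rejects relative URLs because they have no scheme.

```python
from html import escape
from urllib.parse import urlsplit

# Assumption: only absolute http(s) links are acceptable in this application.
ALLOWED_SCHEMES = {"http", "https"}


def safe_href(untrusted_url):
    """Return an attribute-ready URL, or None if its scheme is not allowed."""
    url = untrusted_url.strip()
    # Step 1: validate. javascript:, data:, and scheme-less URLs all fail here.
    if urlsplit(url).scheme.lower() not in ALLOWED_SCHEMES:
        return None
    # Step 2: HTML entity encode so the URL cannot break out of the attribute.
    return escape(url, quote=True)


print(safe_href("javascript:alert(1)"))           # None
print(safe_href("https://example.com/?a=1&b=2"))  # https://example.com/?a=1&amp;b=2
```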

XPath for string contained in one XML element or another?

I need an XPath that can find either an <a> tag, or an <option> tag, each one containing "something".
So the XPath would be able to match either
<a attributes='value'>something</a>
or
<option attributes="value">something</option>
I tried this:
$x("//*[local-name()='a' contains(.,'something') or local-name()='option' contains(.,'something')]")
I also tried this:
$x("//*[local-name(contains(.,'something'))='a' or local-name(contains(.,'something'))='option']")
But neither of them works. In the first one, I can leave out the contains() and it finds the tags, but I need to be able to search only for those tags that contain the specified "something" text.
You really should post your input XML.
Let's say it's this:
<r>
<a>xxx something</a>
<a>yyy nothing</a>
<option>something xxx</option>
<option>nothing xxx</option>
</r>
(1) Then (if you're trying to ignore namespaces):
//*[(local-name() = 'a' or local-name() = 'option')][contains(., 'something')]
(2) or (if there are no namespaces) [credit: earlier @alecxe post]:
//*[self::option or self::a][contains(., "something")]
(3) or (if using XPath 2.0, again without namespaces):
//(a|option)[contains(., 'something')]
will select
<a>xxx something</a>
<option>something xxx</option>
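If you want to check the expected result without an XPath engine at hand, the same selection can be written as a plain filter. This is a sketch using Python's standard xml.etree.ElementTree; it does not evaluate the XPath expressions above. As far as I know ElementTree's XPath subset has no contains(), so the predicate is applied in Python. The helper name elements_containing is made up.

```python
import xml.etree.ElementTree as ET

XML = """<r>
<a>xxx something</a>
<a>yyy nothing</a>
<option>something xxx</option>
<option>nothing xxx</option>
</r>"""


def elements_containing(root, tags, needle):
    """Hand-written stand-in for //*[self::a or self::option][contains(., needle)]."""
    return [
        el
        for el in root.iter()
        # itertext() joins all descendant text, like the XPath string value of "."
        if el.tag in tags and needle in "".join(el.itertext())
    ]


root = ET.fromstring(XML)
for el in elements_containing(root, {"a", "option"}, "something"):
    print(ET.tostring(el, encoding="unicode").strip())
```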

HTML - Why boolean attributes do not have boolean value?

I noticed that some elements have attributes which are boolean. I wonder why the values are not true or false, or 1 and 0? Is there any reason why they are like this?
<option selected="selected">Ham Burger</option>
or
<input type="button" disabled="disabled" />
Thanks in advance!
In SGML, an attribute may be minimized so that its value alone is short for both the name and the value, with the only possible value for the attribute in this case obviously being the attribute's own name. HTML uses this for boolean attributes, where the presence or absence of the attribute is what's meaningful, and its value is irrelevant. But in XML, minimized attributes were disallowed, so we wound up with the awkwardness that is selected="selected" when XHTML came into vogue. If you're writing HTML rather than XHTML, you can just write selected.
The exact definition is:
Some attributes play the role of boolean variables (e.g., the selected
attribute for the OPTION element). Their appearance in the start tag
of an element implies that the value of the attribute is "true". Their
absence implies a value of "false".
Also:
Boolean attributes may legally take a single value: the name of the attribute itself [...] In HTML, boolean attributes may appear in minimized form
Basically, this implies that there are only two possible statuses for boolean attributes, true and false, but there isn't a not set status.
For the disabled attribute I think it's the presence of the attribute that disables the element regardless of its value.
I guess one of the reasons could be to allow more values than just yes/no in the future. For instance, instead of visible=true/false, you can have visibility=visible/hidden/collapsed.
The HTML standard (not XHTML) is to have simply selected instead of selected="selected".
See here: http://www.w3.org/TR/html4/interact/forms.html#adef-selected
When XHTML was created to allow better integration of XML with HTML (see http://www.w3.org/MarkUp/2004/xhtml-faq#need), the parts of HTML that did not fit XML's structural requirements were corrected. So forms like selected were turned into selected="selected" to fit the standard.
Readability. A lot of HTML is not coded by people with computer science backgrounds, so the concept of a "Boolean" would be foreign to them in those terms. It also improves readability for computer science and other technical users by providing reinforced clues as to the function of a given statement.
As vc74 has said, it doesn't matter what value you have for selected or disabled.
<option selected="selected">Ham Burger</option>
will do the same as
<option selected="sjkhdaskj">Ham Burger</option>
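The presence-only rule is easy to mirror when you process HTML yourself. Here is a sketch with Python's standard html.parser, which reports a bare attribute with the value None; the check looks only at the attribute name and ignores the value. This illustrates the rule at the parser level and is not a claim about how any browser is implemented; the class and function names are made up.

```python
from html.parser import HTMLParser


class SelectedCollector(HTMLParser):
    """Collect the label of every <option> whose start tag has `selected`."""

    def __init__(self):
        super().__init__()
        self.labels = []
        self._capture = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; a bare attribute has value None.
        if tag == "option":
            self._capture = any(name == "selected" for name, _ in attrs)

    def handle_data(self, data):
        if self._capture and data.strip():
            self.labels.append(data.strip())

    def handle_endtag(self, tag):
        if tag == "option":
            self._capture = False


def selected_labels(html):
    parser = SelectedCollector()
    parser.feed(html)
    parser.close()
    return parser.labels


print(selected_labels(
    "<option selected>A</option>"
    '<option selected="selected">B</option>'
    '<option selected="sjkhdaskj">C</option>'
    "<option>D</option>"
))
# ['A', 'B', 'C']
```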
I think this is just to make it easy for users to write the attribute value in the most human-readable form if they don't know what true/false means.
<html>
<body>
<select>
<option>1</option>
<option selected="blah">2</option>
<option >3</option>
</select>
</body>
As you can see in the code above, I have not used selected="selected"; I used an arbitrary value and the option is still selected. You can also simply use <option selected>2</option>.
