Strange html syntax [duplicate] - html

This question already has answers here:
Is it guaranteed that non-numeric attribute values on every web-page HTML are always quoted?
(1 answer)
what are data-* HTML attributes?
(1 answer)
Closed 4 years ago.
In the code I am working on I found this:
<div class="icon icon2 screen-icon" data-screen-idx=1>
What puzzles me is the last "attribute" (or whatever it is).
Is data-screen-idx=1 legal in an HTML tag?
Please note that the 1 is not quoted.
If yes, where can I find information about this?
If not, why would someone write such a thing?

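Yes, this is valid HTML. These are called data attributes: custom attributes whose names can be whatever you want, as long as they begin with data-. The unquoted value is legal too: HTML allows an attribute value without quotes when it is non-empty and contains no whitespace, quotes, =, <, >, or backticks.
See this article for more information: MDN - Using data attributes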
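To see that the unquoted value is read like any other attribute, here is a minimal sketch using Python's standard html.parser module. The AttrCollector class is a hypothetical helper written for this illustration, and Python's parser is only a stand-in for what a browser does.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Hypothetical helper: records the attributes of every start tag it sees."""

    def __init__(self):
        super().__init__()
        self.attrs = {}

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; values arrive as strings
        self.attrs.update(attrs)


collector = AttrCollector()
collector.feed('<div class="icon icon2 screen-icon" data-screen-idx=1>')
parsed = collector.attrs
print(parsed["data-screen-idx"])  # prints 1 (as the string "1")
```

The value comes back as the string "1", quoted or not: attribute values are always text, so any numeric use needs an explicit conversion.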

Related

Replace html tags using a regular expression - no script [duplicate]

This question already has answers here:
RegEx match open tags except XHTML self-contained tags
(35 answers)
Closed 1 year ago.
I've been searching for a solution for hours, but haven't found any examples that help.
I want to search a plain text file and remove all instances of <a id="pageXXX"></a> where XXX is the page number.
I have tried
(^<a id="page)(.*:?)("></a>)
(^<a id=\\"page)(.*:?)(\\"></a>)
(^<a id="page)([0-9]+)("></a>)
(^<a id=\\"page)([0-9]+)(\\"></a>)
What am I missing?
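This works correctly:
(<a id=\"page)(.*:?)(\"></a>)
The main difference from the earlier attempts is that the ^ anchor is gone. With ^, the pattern can only match a tag at the very start of a line (or of the whole input, depending on the tool), so tags in the middle of the text were never found. Note that .* is greedy, so if two such tags share a line it can swallow everything between them; [0-9]+ is the safer choice for the page number.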
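As a hedged illustration of the unanchored pattern, here is a small Python sketch. It assumes the page number is made up of digits only and that the tag is written exactly as in the question; strip_page_anchors is a name invented for this example.

```python
import re

# No ^ anchor, so the tag is matched wherever it appears in the text.
# [0-9]+ is used for the page number so the match cannot run past the tag.
PAGE_ANCHOR = re.compile(r'<a id="page[0-9]+"></a>')


def strip_page_anchors(text):
    """Remove every <a id="pageXXX"></a> marker from text."""
    return PAGE_ANCHOR.sub("", text)


sample = 'Intro<a id="page12"></a> text <a id="page13"></a>more'
cleaned = strip_page_anchors(sample)
print(cleaned)  # prints: Intro text more
```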

How to replace all html tags of one kind with another [duplicate]

This question already has answers here:
Find and replace HTML tags
(3 answers)
Closed 4 years ago.
I need to replace all HTML tags of one kind in a string with another, e.g., replace all <i> tags with <em> tag.
What's the best way to effectively change:
"<p><i>Random stuff here...</i></p>"
to the following?
"<p><em>Random stuff here...</em></p>"
There are millions of such strings, so a solution taking complexity into account would be nice.
You can use gsub with a block:
string = "<p><i>Random stuff here...</i></p>"
string.gsub(/(<\/?)i(>)/) { "#{$1}em#{$2}" }
#=> "<p><em>Random stuff here...</em></p>"
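Explanation:
The pattern matches an opening or closing i tag that has no attributes. The block rebuilds the tag with em in place of i, reusing the captured "<" or "</" and ">".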
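For comparison, the same capture-and-rebuild substitution can be sketched in Python. This is only an equivalent of the gsub call, not a faster algorithm; the pattern is compiled once because the question mentions millions of strings, and i_to_em is a name made up for this example.

```python
import re

# Group 1 captures "<" or "</", group 2 captures ">"; only the tag name changes.
I_TAG = re.compile(r"(</?)i(>)")


def i_to_em(html):
    """Replace attribute-free <i> and </i> tags with <em> and </em>."""
    return I_TAG.sub(r"\1em\2", html)


converted = i_to_em("<p><i>Random stuff here...</i></p>")
print(converted)  # prints: <p><em>Random stuff here...</em></p>
```

Tags such as `<img>` are left alone, because the pattern requires the name to be exactly i, directly between the bracket (or slash) and the closing ">".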

HTML5 elements id vs name attributes [duplicate]

This question already has answers here:
Difference between id and name attributes in HTML
(22 answers)
Closed 5 years ago.
Could anybody kindly explain the difference between, or the logic behind, the HTML5 id and name attributes of input and other elements?
Should I define both, or is one enough?
Which one is required: id or name?
Sometimes both can be used for the same purpose. Normally, though, the id attribute is used to refer to the element itself (from CSS, JavaScript, or a label), and it must be unique within the document. The name attribute is used when a form sends data to another page via POST or GET: the receiving page accesses the element's value through that name. Neither is required by HTML, but a form control without a name is left out of the submitted data.
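The role of name in form submission can be sketched roughly in Python. This is a simplified imitation of what a browser does, not the real algorithm; the FormFields class and the sample form are made up for this illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class FormFields(HTMLParser):
    """Hypothetical helper: collects (name, value) pairs the way a form submit would."""

    def __init__(self):
        super().__init__()
        self.fields = []

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            attributes = dict(attrs)
            # Only controls with a name contribute to the submitted data;
            # the id plays no part in it.
            if "name" in attributes:
                self.fields.append((attributes["name"], attributes.get("value", "")))


form = (
    '<form>'
    '<input id="user-box" name="user" value="ann">'
    '<input id="note-box" value="ignored">'
    '</form>'
)
parser = FormFields()
parser.feed(form)
body = urlencode(parser.fields)
print(body)  # prints: user=ann
```

The second input has an id but no name, so it never appears in the encoded body.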

How to extract numbers from a dataframe with html tags in R? [duplicate]

This question already has answers here:
Removing html tags from a string in R
(7 answers)
Closed 5 years ago.
I need to clean my dataframe by stripping the HTML tags from columns 2-4.
Does anyone know a simple way to do that?
df$col <- gsub("<[^>]+>", "", df$col)
Or
df$col <- gsub("<.*?>","",df$col)
This uses a regex to strip all HTML tags, which are usually enclosed in <>.
Note: using regex to strip HTML is generally not advised. In your case, however, the data set appears to contain only numbers wrapped in tags, so a regex is the simplest option.
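The same tag-stripping pattern can be tried outside R. Below is a minimal Python sketch on a plain list of strings; the sample values are invented for illustration, and strip_tags is a hypothetical helper name.

```python
import re

# Same idea as the first gsub call: "<", then anything that is not ">", then ">".
TAG = re.compile(r"<[^>]+>")


def strip_tags(cell):
    """Remove everything that looks like an HTML tag from one cell."""
    return TAG.sub("", cell)


# Made-up sample column: numbers wrapped in markup
column = ["<b>12</b>", "<span class='x'>3.5</span>", "7"]
numbers = [float(strip_tags(cell)) for cell in column]
print(numbers)  # prints: [12.0, 3.5, 7.0]
```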

Regex to delete <a name=">...</a> and leave inside text? [duplicate]

This question already has answers here:
RegEx match open tags except XHTML self-contained tags
(35 answers)
What to do Regular expression pattern doesn't match anywhere in string?
(8 answers)
Closed 8 years ago.
We have HTML code that looks like:
<h1><a name="_Toc22332223">Creating a record</a><h1>
<h1><a name="sectionB">Creating a record</a><h1>
Is there an expression we can use to find and delete the <a name=> tag and leave the text, like this: <h1>Creating a record<h1>
We also do not want to remove other hyperlinks like <a href>
I tried <a name="[0-9]*">.+</a> to no avail.
Thanks!
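As suggested by others, DOM parsing is the most reliable way.
Your attempt failed because [0-9]* only matches names made up entirely of digits, and neither _Toc22332223 nor sectionB is.
But if it has to be very simple, you can use the following regex:
<[aA]\s+name\s*=[^>]*>(.*)[^<]<\/a>
Note that [^<] sits outside the capture group, so group 1 holds the link text minus its last character. If you substitute group 1 back, use (.*?) in place of (.*)[^<].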
Example on http://rubular.com/r/cI2CTwUCy3
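Here is a hedged Python sketch of the find-and-unwrap step. It uses a lazy group, (.*?), so the whole link text is captured and the match stops at the first closing tag. It assumes name is the first attribute of the anchor, as in the samples; unwrap_named_anchors is a name invented for this example.

```python
import re

# Matches <a name=...>text</a> only; <a href=...> does not match
# because "name" must follow the whitespace after "<a".
NAMED_ANCHOR = re.compile(r"<a\s+name\s*=[^>]*>(.*?)</a>", re.IGNORECASE | re.DOTALL)


def unwrap_named_anchors(html):
    """Drop <a name=...> wrappers but keep the text inside them."""
    return NAMED_ANCHOR.sub(r"\1", html)


line = '<h1><a name="_Toc22332223">Creating a record</a><h1>'
unwrapped = unwrap_named_anchors(line)
print(unwrapped)  # prints: <h1>Creating a record<h1>
```

For anything beyond simple, regular input like this, a DOM parser remains the more reliable choice.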