Regex - Match multiline blocks of HTML code - html

I have a problem with my regular expression. I need to match blocks of HTML.
Example-Block here:
<tr class="tr-list " data-id="XX">
<td class="ip-img"><div class="gun-icon"></div><img src="https://example.com/images/stories/HCP/HCP_5.jpg"/></td>
<td class="ip-name ip-sort">Hotel Complex Project</td>
<td class="ip-price ip-sort">297.00</td>
<td class="ip-earnings ip-sort">43</td>
<td class="ip-shares ip-sort">86</td>
<td class="ip-status {'sorter':'currency'}"><img
src="/img/assets/arrow1.png" title="0.989990234375"/></td>
<td class="ip-blank-right"></td>
</tr>
Every one of these HTML blocks should be a separate match, so that I can then extract the other data from it (e.g. ip-name, ip-price, ip-earnings...).
But my current regex matches everything until the "(?=)" part is no longer true:
http://regexhero.net/tester/?id=2b491d15-ee83-4dc7-8fe9-62e624945dcf
What do I need to change to have every block as a match?
Greetings! :)
PS.: Hope it is understandable what I mean...

This should get all the tr rows:
<tr class="tr-list[\s\S]+?</tr>
This should get all the tr rows with matching groups for the columns:
<tr class="tr-list[^<]*?<td class="ip-img">(.*?)</td>\s*<td class="ip-name.*?">(.*?)</td>\s*<td class="ip-price.*?">(.*?)</td>\s*<td class="ip-earnings.*?">(.*?)</td>\s*<td class="ip-shares.*?">(.*?)</td>\s*<td class="ip-status.*?">([\s\S]*?)</td>[\s\S]+?</tr>
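To try the second pattern outside an online tester, here is a minimal Java sketch. The class name TrListRows, the helper sampleRow and the second project name are made up for illustration; the pattern itself is the one above, only split across several string literals. Both rows come back as separate matches because every quantifier that can cross a line is lazy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TrListRows {
    // The second pattern from the answer, split per column for readability.
    static final Pattern ROW = Pattern.compile(
        "<tr class=\"tr-list[^<]*?<td class=\"ip-img\">(.*?)</td>\\s*"
        + "<td class=\"ip-name.*?\">(.*?)</td>\\s*"
        + "<td class=\"ip-price.*?\">(.*?)</td>\\s*"
        + "<td class=\"ip-earnings.*?\">(.*?)</td>\\s*"
        + "<td class=\"ip-shares.*?\">(.*?)</td>\\s*"
        + "<td class=\"ip-status.*?\">([\\s\\S]*?)</td>[\\s\\S]+?</tr>");

    // Returns one String[] {name, price, earnings, shares} per matched row.
    static List<String[]> extract(String html) {
        List<String[]> rows = new ArrayList<>();
        Matcher m = ROW.matcher(html);
        while (m.find()) {
            rows.add(new String[] {m.group(2), m.group(3), m.group(4), m.group(5)});
        }
        return rows;
    }

    // Hypothetical helper: rebuilds the example block from the question.
    static String sampleRow(String name) {
        return "<tr class=\"tr-list \" data-id=\"XX\">\n"
            + "<td class=\"ip-img\"><div class=\"gun-icon\"></div>"
            + "<img src=\"https://example.com/images/stories/HCP/HCP_5.jpg\"/></td>\n"
            + "<td class=\"ip-name ip-sort\">" + name + "</td>\n"
            + "<td class=\"ip-price ip-sort\">297.00</td>\n"
            + "<td class=\"ip-earnings ip-sort\">43</td>\n"
            + "<td class=\"ip-shares ip-sort\">86</td>\n"
            + "<td class=\"ip-status {'sorter':'currency'}\"><img\n"
            + "src=\"/img/assets/arrow1.png\" title=\"0.989990234375\"/></td>\n"
            + "<td class=\"ip-blank-right\"></td>\n"
            + "</tr>\n";
    }

    public static void main(String[] args) {
        String html = sampleRow("Hotel Complex Project") + sampleRow("Second Project");
        for (String[] row : extract(html)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

This only works as long as the columns keep this exact order and markup; for anything less regular, an HTML parser is the safer choice.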

Nested HTML needs a nested result structure, which a single flat regular expression match does not give you.
You can build it with jQuery, or generate the tree manually using regular expressions.

This regular expression will capture a whole HTML block that is not self-closing:
var htmlText = "<div bar='baz'>foo</foo>";
// \1 forces the closing tag to repeat the opening tag name,
// so this sample (<div> closed by </foo>) is reported as invalid.
var pattern = /<([\w]+)( (( +)?[\w]+=['"](\w+)?['"])?)+( )?(\/)?>((([\t\n\r\s]+)?)+(((.)+)?)+((\10)?)+)+?<\/(\1)>/igm;
console.log((pattern.test(htmlText) ? 'valid' : 'invalid') + ' html block');
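The part of that pattern doing the real work is the backreference \1 to the captured tag name. Here is a much reduced sketch of the same idea in Java. It is not the pattern above: attributes are simply skipped with [^>]*, so it will break on attribute values that contain ">", and the names MatchingTags and hasBlock are made up.

```java
import java.util.regex.Pattern;

class MatchingTags {
    // Group 1 captures the opening tag name; \1 requires the closing tag
    // to repeat exactly that name (case-insensitively).
    static final Pattern BLOCK = Pattern.compile(
        "<(\\w+)[^>]*>[\\s\\S]*?</\\1>", Pattern.CASE_INSENSITIVE);

    static boolean hasBlock(String html) {
        return BLOCK.matcher(html).find();
    }

    public static void main(String[] args) {
        System.out.println(hasBlock("<div bar='baz'>foo</div>")); // true
        System.out.println(hasBlock("<div bar='baz'>foo</foo>")); // false
    }
}
```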

Related

Output is not displaying in HTML format after transforming the xml result using xslt when the attribute name contains special character

My xml has just 1 value as name =RDXXX-LOWER_DECK, value=10 mm. When this is transformed using xslt I get output correctly as below:
<table>
<tr valign="top">
<td width="200">RDXXX-LOWER_DECK</td>
<td width="200">10.000000000000 mm</td>
</tr>
</table>
But when I replace RDXXX-LOWER_DECK with RDXXX||LOWER_DECK (the hyphen is replaced with a double pipe) I don't get the output. An empty value is printed and the name is printed as "Attribute".
<table>
<tr valign="top">
<td width="200">Attribute</td>
<td width="200"></td>
</tr>
</table>
Kindly let me know how to retain || in the output.
My xml has just 1 value as name =RDXXX-LOWER_DECK, value=10 mm.
But when I replace RDXXX-LOWER_DECK as RDXXX||LOWER_DECK (hyphen is replaced with double pipe)...
If by that you mean that you have an XML like this:
<RDXXX-LOWER_DECK>10mm</RDXXX-LOWER_DECK>
and you changed it to look like this:
<RDXXX||LOWER_DECK>10mm</RDXXX||LOWER_DECK>
then you no longer have a well-formed XML document. The | character is not allowed in an element name.
... I don't get the output. Empty value is printed and name is printed as "Attribute" .
That is strange, because you should have been getting an error.
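If you want to check this yourself, here is a small sketch that feeds both variants to the JDK's built-in XML parser. It assumes the value really is used as an element name, as guessed above; the class and method names are made up.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class WellFormedCheck {
    // Returns true if the parser accepts the text as well-formed XML.
    // The parser may additionally print the parse error to stderr.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Thrown for fatal errors such as an illegal character in a name.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<RDXXX-LOWER_DECK>10mm</RDXXX-LOWER_DECK>")); // true
        System.out.println(isWellFormed("<RDXXX||LOWER_DECK>10mm</RDXXX||LOWER_DECK>")); // false
    }
}
```

If the name is carried as text or as an attribute value instead, the | character is perfectly legal there, and the problem would have to be somewhere in the stylesheet.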

Generate HTML tag with more than one attribute in SQL request

Could you please help me understand how I can generate XML/HTML with more than one attribute?
I have this SQL code
select
[td/@align] = 'center', td = format(GETDATE(),'dd.MM.yyyy'), null
for xml path('tr')
This code returns as its result:
<tr>
<td align="center">16.09.2020</td>
</tr>
and I need
<tr>
<td align="center" style="background-color: red;">16.09.2020</td>
</tr>
I can't find out how to do this...
If I try to use something like [td/@align/@style], SQL Server raises an error:
Column name 'td/@align/@style' contains an invalid XML identifier as required by FOR XML; '@'(0x0040) is the first character at fault
Are you looking for this?
select 'center' AS [td/@align]
,'background-color: red;' AS [td/@style]
,format(GETDATE(),'dd.MM.yyyy') AS [td]
for xml path('tr')
it yields this:
<tr>
<td align="center" style="background-color: red;">16.09.2020</td>
</tr>
You can think of each column of the row as either the tag's value or one of its attributes; the alias given with AS decides which. So, for more attributes, just add a new value with the corresponding alias, td/@....

Thymeleaf, get first element list of an object that is already iterating

<tr th:each="current : ${object.list}" >
<td th:text="${current.currentList...???}"></td>
...
I have an object that has a list.
"current" has also a list inside it called currentList.
currentList has only one element, called objectTwo. I want to access the attributes of objectTwo.
Is it possible?
Hope this will work for you.
<tr th:each="current : ${object.list}" >
<td th:text="${current.currentList[0].objectTwo...}"></td>
...

Order by in Thymeleaf

I need to show this loop in ascending order. How can I do it?
<tr th:each="ment : ${mentor}" th:if="${ment.jobId == job.id}">
<td th:text="${ment.id}"></td>
<td th:text="${ment.name}"></td>
<td th:text="${ment.qtyMentee}"></td>
<td th:text="${ment.jobId}"></td>
</tr>
This can be achieved with the sort utility methods for lists, described in the Thymeleaf documentation.
/*
* Sort a copy of the given list. The members of the list must implement
* comparable or you must define a comparator.
*/
${#lists.sort(list)}
${#lists.sort(list, comparator)}
Example
<tr th:each="ment : ${#lists.sort(mentor)}">
I got a stack overflow error as well; in my case I had forgotten to implement Comparable:
Comparable<E>
The error was misleading; Thymeleaf could have recognized that Comparable needs to be implemented.
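For reference, here is a minimal sketch of what the list members need for #lists.sort(list) to have an ordering. The Mentor class, its field types and the choice to order by id are assumptions based on the template in the question. In a real project the class would be public, in its own file, with public getters so the template can read it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class MentorSortDemo {
    // An explicit ordering by name, for sorting with a comparator instead.
    static final Comparator<Mentor> BY_NAME = Comparator.comparing(Mentor::getName);

    // Sorts a copy using the natural ordering defined by compareTo.
    static List<Long> sortedIds(List<Mentor> mentors) {
        List<Mentor> copy = new ArrayList<>(mentors);
        Collections.sort(copy);
        List<Long> ids = new ArrayList<>();
        for (Mentor m : copy) {
            ids.add(m.getId());
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Mentor> mentors = Arrays.asList(
            new Mentor(3, "Cy"), new Mentor(1, "Bo"), new Mentor(2, "Al"));
        System.out.println(sortedIds(mentors)); // [1, 2, 3]
    }
}

// Hypothetical model class; only id and name are modelled here.
class Mentor implements Comparable<Mentor> {
    private final long id;
    private final String name;

    Mentor(long id, String name) {
        this.id = id;
        this.name = name;
    }

    public long getId() { return id; }
    public String getName() { return name; }

    // Natural ordering: ascending by id.
    @Override
    public int compareTo(Mentor other) {
        return Long.compare(id, other.id);
    }
}
```

If you cannot change the class, exposing a comparator such as BY_NAME to the template and using the two-argument form ${#lists.sort(list, comparator)} should be the alternative.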

How to get line from table with Jsoup

I have a table without any class or id (there are more tables on the page) with this structure:
<table cellpadding="2" cellspacing="2" width="100%">
...
<tr>
<td class="cell_c">...</td>
<td class="cell_c">...</td>
<td class="cell_c">...</td>
<td class="cell">SOME_ID</td>
<td class="cell_c">...</td>
</tr>
...
</table>
I want to get only one row, which contains <td class="cell">SOME_ID</td> and SOME_ID is an argument.
UPD.
Currently I am doing it this way:
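import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

Document doc = Jsoup.connect("http://www.bank.gov.ua/control/uk/curmetal/detail/currency?period=daily").get();
Elements rows = doc.select("table tr");
// Keep only the rows whose text mentions one of the wanted currency codes.
Pattern p = Pattern.compile("^.*(USD|EUR|RUB).*$");
for (Element trow : rows) {
    Matcher m = p.matcher(trow.text());
    if (m.find()) {
        System.out.println(m.group());
    }
}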
But why do I need Jsoup if most of the work is done by the regexp? Just to download the HTML?
If you have a generic HTML structure that always is the same, and you want a specific element which has no unique ID or identifier attribute that you can use, you can use the css selector syntax in Jsoup to specify where in the DOM-tree the element you are after is located.
Consider this HTML source:
<html>
<head></head>
<body>
<table cellpadding="2" cellspacing="2" width="100%">
<tbody>
<tr>
<td class="cell">I don't want this one...</td>
<td class="cell">Neither do I want this one...</td>
<td class="cell">Still not the right one..</td>
<td class="cell">BINGO!</td>
<td class="cell">Nothing further...</td>
</tr> ...
</tbody>
</table>
</body>
</html>
We want to select and parse the text from the fourth <td> element.
We specify that we want the <td> element whose index among its siblings is 3 (counting from zero) by using td:eq(3). In the same way, we can select all <td> elements with an index below 3 by using td:lt(3). As you've probably figured out, eq stands for "equal" and lt for "less than".
Without using first() you will get an Elements object, but we only want the first one so we specify that. We could use get(0) instead too.
So, the following code
Element e = doc.select("td:eq(3)").first();
System.out.println("Did I find it? " + e.text());
will output
Did I find it? BINGO!
Some good reading in the Jsoup cookbook!