I have one xml file which has some html content like bold, paragraph and tables. I have written shell script to parse all html tags except tables. I'm using XML (R package) to parse the data.
<Root>
<Title> This is dummy xml file </Title>
<Content> This table summarises data in BMC format.
<div class="abctable">
<table border="1" cellspacing="0" cellpadding="0" width="100%" class="coder">
<tbody>
<tr>
<th width="50%">ABC</th>
<th width="50%">Weight status</th>
</tr>
<tr>
<td>are 18.5</td>
<td>arew</td>
</tr>
<tr>
<td>18.5 — 24.9</td>
<td>rweq</td>
</tr>
<tr>
<td>25.0 — 29.9</td>
<td>qewrte</td>
</tr>
<tr>
<td>30.0 and hwerqer</td>
<td>rwqe</td>
</tr>
<tr>
<td>40.0 rweq rweq</td>
<td>rqwe reqw</td>
</tr>
</tbody>
</table>
</div>
</Content>
<Section>blah blah blah</Section>
</Root>
How to parse the content of this table which in present in xml?
Well there is a function called readHTMLTable in the XML package, that seems to do just what you need ?
Here is a way to do it with the following xml file :
<Root>
<Title> This is dummy xml file </Title>
<Content>
This table summarises data in BMC format.
<div class="abctable">
<table border="1" cellspacing="0" cellpadding="0" width="100%" class="coder">
<tbody>
<tr>
<th width="50%">ABC</th><th width="50%">Weight status</th>
</tr>
<tr>
<td>are 18.5</td>
<td>arew</td>
</tr>
<tr>
<td>18.5 — 24.9</td>
<td>rweq</td>
</tr>
<tr>
<td>25.0 — 29.9</td>
<td>qewrte</td>
</tr>
<tr>
<td>30.0 and hwerqer</td>
<td>rwqe</td>
</tr>
<tr>
<td>40.0 rweq rweq</td>
<td>rqwe reqw</td>
</tr>
</tbody>
</table>
</Content>
</div>
<Section>blah blah blah</Section>
</Root>
If this is saved in a file called /tmp/data.xml then you can use the following code :
doc <- htmlParse("/tmp/data.xml")
tableNodes <- getNodeSet(doc, "//table")
tb <- readHTMLTable(tableNodes[[1]])
Which fives :
R> tb
V1 V2
1 ABC Weight status
2 are 18.5 arew
3 18.5 — 24.9 rweq
4 25.0 — 29.9 qewrte
5 30.0 and hwerqer rwqe
6 40.0 rweq rweq rqwe reqw
The best method for xml parsing would be to use xpath expressions
Xpath Tutorial
Xpath and R
How to use XPath and R stackoverflow
Related
I'm trying to pass some html tags as the value of my xsl:param name="messageBody" to get executed before generating my template. But instead of executing the html code, they are displayed as text.
This is when I am generating a template with the WSO2 ESB. Once used xsl:value-of select="$messageBody" but displayed the html text instead of executing and the same for xsl:copy-of select="$messageBody"
Below is code fragment that which include the value of that xsl:param name="messageBody"
<table border="0" cellpadding="8px" cellspacing="10" style= "width:100%;border-bottom:1px solid rgb(255, 255, 255); color:#000000" align= "center">
<tbody>
<tr class="microservicerows">
<th>Target Account</th>
<th>Amount</th>
<th>Status</th>
<th>Remarks</th>
<th>Time</th>
</tr>
<xsl:copy-of select="$messageBody"/>
</tbody>
</table>
The expected output should be displaying new table rows under the header tags but instead the it's just plain text once my template is generated.
The value of the xsl:param name="messageBody" is below
<tr class="microservicerows">
<td>1000029061</td>
<td>55,000.00</td>
<td>Unsuccessful</td>
<td>My target</td>
<td>02/09/2019</td>
</tr>
<tr class="microservicerows">
<td>1000029078</td>
<td>900.00</td>
<td>Successful</td>
<td>Start later</td>
<td>03/09/2019</td>
</tr>]
I have made a custom matrix in power bi and have an extra comment column in the last of it! The matrix will send the JSON data to server through api whenever I press enter after finish my comment [purpose is for saving all data into Database] but I have no idea to make it! Can anyone help me please?!
Below is HTML structure's Custom Matrix:
<div class="datagrid">
<table>
<tbody>
<tr>
<th></th>
<th colspan="undefined">Northeast</th>
<th colspan="undefined">Southern</th>
</tr>
<tr>
<th>California</th>
<td>5</td>
<td>6</td>
<td>
<div contenteditable="">I'm editable</div>
</td>
</tr>
<tr>
<th>Florida</th>
<td>10</td>
<td>15</td>
<td>
<div contenteditable="">I'm editable</div>
</td>
</tr>
</tbody>
</table>
</div>
I have converted successfully with help of #Eric Jones in this post: How to convert Matrix HTML to JSON in Typescript (for sending through api)
The structure of my JSON is:
[
{
"name_column_field":"Arizona",
"value":"3759959.75",
"comment":"I'm editable",
"name_row_field":"Southern"
},...
]
I want to create a table that its data is a Map< String, List < Object> >.
So the table has one header that and the rows should have the exact data.
Map.key
Object.item1
Object.item2
Object.item3
So since it is a List of Object i want one row for every Object of the List and the Map.key to be repeated.
So i need to iterate through keys like
<table>
<thead>
<tr>
<th>Code</th>
<th>Status</th>
<th>Flag</th>
<th>Message</th>
</tr>
</thead>
<tbody>
<tr th:each= "result : ${myMap}">
<td th:text="${result.key}"></td>
<td><table>
<tr th:each="obj: ${result.value}">
<td th:text="${not #lists.isEmpty(obj.errorList)}?'Error':'Warning'"></td>
<td th:text="${obj.flag}==true?'YES':'NO'"></td>
<td th:text="${not #lists.isEmpty(obj.errorList)}?${obj.warningList}:${obj.errorList}"></td>
</tr>
</table></td>
</tr>
</tbody>
</table>
but this solution places a table in a table. I want to use one header and iterate the lists and place the variables in the main table .
I think you're looking for a structure like this:
<table>
<thead>
<tr>
<th>Code</th>
<th>Status</th>
<th>Flag</th>
<th>Message</th>
</tr>
</thead>
<tbody>
<th:block th:each= "result : ${myMap}">
<tr th:each="obj: ${result.value}">
<td th:text="${result.key}" />
<td th:text="${not #lists.isEmpty(obj.errorList)}?'Error':'Warning'" />
<td th:text="${obj.flag}==true?'YES':'NO'" />
<td th:text="${not #lists.isEmpty(obj.errorList)}?${obj.warningList}:${obj.errorList}" />
</tr>
</th:block>
</tbody>
</table>
I have used the nokogiri ruby gem to webscrape an html file for only the text under the tableData class. The html code is setup like so:
<div class="table-wrap">
<table class="table">
<tbody>
<tr>
<td class="tableData"> Jane Doe</td>
<td class="tableData"> 01/01/2017</td>
<td class="tableData">01/09/2017 </td>
<td class="tableData">Vacation</td>
</tr>
<tr>
<td class="tableData">John Doe</td>
<td class="tableData"> 01/01/2017</td>
<td class="tableData">01/09/2017 </td>
<td class="tableData">Vacation</td>
</tr>
</tbody>
</table>
</div>
and the code I used to webscrape looks like this:
vt = page.css("td[class='tableData']").text
puts vt
Which gives this output:
Jane Doe 01/01/201701/09/2017 VacationJohn Doe 01/01/201701/09/2017 Vacation
I want to populate an array within an array with only the 4 text values pertaining to each person. Which should look like this:
[[Jane Doe, 01/01/2017, 01/09/2017, Vacation], [John Doe, 01/01/2017, 01/09/2017, Vacation]]
I am new to coding and I'm not sure how to create a for loop to iterate over either the html code itself or the vt variable to produce an array of arrays. I know there are some push statements involved following the for loop but its the actual structure of the for loop that I am having trouble putting together. If you could provide some explanation in your answer for how the for loop works in this situation it would be much appreciated.
This is the basic structure you need. map is needed :
html=%q(<div class="table-wrap">
<table class="table">
<tbody>
<tr>
<td class="tableData"> Jane Doe</td>
<td class="tableData"> 01/01/2017</td>
<td class="tableData">01/09/2017 </td>
<td class="tableData">Vacation</td>
</tr>
<tr>
<td class="tableData">John Doe</td>
<td class="tableData"> 01/01/2017</td>
<td class="tableData">01/09/2017 </td>
<td class="tableData">Vacation</td>
</tr>
</tbody>
</table>
</div>)
require 'nokogiri'
doc = Nokogiri::XML(html)
array = doc.xpath('//tr').map do |tr|
tr.xpath('td').map{ |td| td.text }
end
p array
# [[" Jane Doe", " 01/01/2017", "01/09/2017 ", "Vacation"], ["John Doe", " 01/01/2017", "01/09/2017 ", "Vacation"]]
Try parsing the snippet as XML, finding all "tr" elements via XPath, and collecting their "td//text()" children:
require 'nokogiri'
doc = Nokogiri::XML(get_html_snippet)
data = doc.xpath('//tr').map do |tr|
tr.xpath('td').map { |td| td.text.strip }
end
data # => [["Jane Doe", "01/01/2017", "01/09/2017", "Vacation"], ["John Doe", "01/01/2017", "01/09/2017", "Vacation"]]
I don't know HTML (HORRIBLY EMBARRASSED but didn't ever have the need to). I am pretty perspicacious when it comes to UNIX however I am horribly confused with this assignment I have. I know what I need to do but am having the hardest time ever getting started.
I have the following files in my hwk12 directory:
roster.html
roster.txt
sample.html
sample.txt
The following is the content of the roster.html file:
<html>
<body>
<table border=2>
<tr><th>Name</th><th>Username</th><th>Email</th></tr>
<tr>
<td>Nikhil Banerjee</td>
<td>nbanerje</td>
<td>zetapsi796#hotmail.com</td>
</tr>
<tr>
<td>Jeff Nazarian</td>
<td>jnazaria</td>
<td>jeff.nazarian#asu.edu</td>
</tr>
<tr>
<td>Anna Melzer</td>
<td>amelzer</td>
<td>anna.melzer#asu.edu</td>
</tr>
<tr>
<td>Jose Garcia</td>
<td>jgarcia</td>
<td>garcia-j#msn.com</td>
</tr>
<tr>
<td>Jillian Testa</td>
<td>jtesta</td>
<td>jillian.testa#asu.edu</td>
</tr>
<tr>
<td>Clayton Lengelzigich</td>
<td>clengelz</td>
<td><a href="mailto:clayton.lengel-zigich#asu.edu">clayton.lengel-
zigich#asu.edu</a></td>
</tr>
<tr>
<td>Ashley Bennett</td>
<td>abennett</td>
<td>ashley.bennett#asu.edu</td>
</tr>
<tr>
<td>Ann Frost</td>
<td>afrost</td>
<td>ann.frost#asu.edu</td>
</tr>
<tr>
<td>Timothy Whipple</td>
<td>twhipple</td>
<td>tweed#asu.edu</td>
</tr>
<tr>
<td>Wei Shen</td>
<td>wshen</td>
<td>shenwei58#hotmail.com</td>
</tr>
<tr>
<td>Cari Mahon</td>
<td>cmahon</td>
<td>cari.mahon#asu.edu</td>
</tr>
<tr>
<td>Alberto Salas</td>
<td>asalas</td>
<td>alberto2504#msn.com</td>
</tr>
<tr>
<td>Dorothy Haskett</td>
<td>dhaskett</td>
<td>dorothy.haskett#asu.edu</td>
</tr>
<tr>
<td>Criss Bradbury</td>
<td>cbradbur</td>
<td>crissbradbury#hotmaiil.com</td>
</tr>
<tr>
<td>Steve Ellermann</td>
<td>sellerma</td>
<td>cis494#ellermann.com</td>
</tr>
<tr>
<td>Zewdie Bekele</td>
<td>zbekele</td>
<td>zewdiea#aol.com</td>
</tr>
<tr>
<td>Frederic Diziere</td>
<td>fdiziere</td>
<td>fsd#asu.edu</td>
</tr>
<tr>
<td>Matt Bowes</td>
<td>mbowes</td>
<td>matt.bowes#asu.edu</td>
</tr>
<tr>
<td>Jasen Meece</td>
<td>jmeece</td>
<td>jasen.meece#sun.com</td>
</tr>
<tr>
<td>Aaron Carpenter</td>
<td>acarpent</td>
<td>aaron.carpenter#asu.edu</td>
</tr>
<tr>
<td>Binqin Xi</td>
<td>bxi</td>
<td>binqin.xi#asu.edu</td>
</tr>
<tr>
<td>Yinting Chan</td>
<td>ychan</td>
<td>yin.chen#asu.edu</td>
</tr>
<tr>
<td>Michael Evans</td>
<td>mevans</td>
<td>michael.evans#asu.edu</td>
</tr>
<tr>
<td>Herman Beringer</td>
<td>hberinge</td>
<td>jber#cox.net</td>
</tr>
<tr>
<td>Andrew Jolley</td>
<td>ajolley</td>
<td>andrew#andrewjolley.com</td>
</tr>
<tr>
<td>Michael Raby</td>
<td>mraby</td>
<td>mike1071#yahoo.com</td>
</tr>
<tr>
<td>Hajar Alaoui</td>
<td>halaoui</td>
<td>hajar6#hotmail.com</td>
</tr>
<tr>
<td>Anne Lemar</td>
<td>alemar</td>
<td>anne.lemar#asu.edu</td>
</tr>
<tr>
<td>Russell Crotts</td>
<td>rcrotts</td>
<td>Russell.Crotts#asu.edu</td>
</tr>
<tr>
<td>Dan Mazzola</td>
<td>dmazzola</td>
<td>dan.mazzola#sun.com</td>
</tr>
<tr>
<td>Bill Boyton</td>
<td>bboyton</td>
<td>boytonb#earthlink.net</td>
</tr>
</table>
</body>
</html>
The following is the content of the roster.txt file:
Whipple Timothy tweed#asu.edu Shen Wei shenwei58#hotmail.com
Mahon Cari cari.mahon#asu.edu Salas Alberto alberto2504#msn.com
Haskett Dorothy dorothy.haskett#asu.edu Bradbury Criss
crissbradbury#hotmaiil.com Ellermann Steve
cis494#ellermann.com Bekele Zewdie zewdiea#aol.com Diziere Frederic
fsd#asu.edu Bowes Matt matt.bowes#asu.edu Meece Jasen
jasen.meece#sun.com Carpenter Aaron aaron.carpenter#asu.edu
Xi Binqin binqin.xi#asu.edu Chan Yinting yin.chen#asu.edu
Evans Michael michael.evans#asu.edu Beringer Herman
jber#cox.net Jolley Andrew andrew#andrewjolley.com Raby Michael
mike1071#yahoo.com Alaoui Hajar hajar6#hotmail.com Lemar Anne
anne.lemar#asu.edu Crotts Russell Russell.Crotts#asu.edu Mazzola Dan
dan.mazzola#sun.com Boyton Bill boytonb#earthlink.net
The following is the content of the sample.html file:
<html>
<body>
<table border=2>
<tr><th>Name</th><th>Username</th><th>Email</th></tr>
<tr>
<td>Michael Raby</td>
<td>mraby</td>
<td>mike1071#yahoo.com</td>
</tr>
<tr>
<td>Hajar Alaoui</td>
<td>halaoui</td>
<td>hajar6#hotmail.com</td>
</tr>
<tr>
<td>Anne Lemar</td>
<td>alemar</td>
<td>anne.lemar#asu.edu</td>
</tr>
<tr>
<td>Russell Crotts</td>
<td>rcrotts</td>
<td>Russell.Crotts#asu.edu</td>
</tr>
<tr>
<td>Dan Mazzola</td>
<td>dmazzola</td>
<td>dan.mazzola#sun.com</td>
</tr>
<tr>
<td>Bill Boyton</td>
<td>bboyton</td>
<td>boytonb#earthlink.net</td>
</tr>
</table>
</body>
</html>
The following is the content of the sample.txt file:
Raby Michael mike1071#yahoo.com
Alaoui Hajar hajar6#hotmail.com
Lemar Anne anne.lemar#asu.edu
Crotts Russell Russell.Crotts#asu.edu
Mazzola Dan dan.mazzola#sun.com
Boyton Bill boytonb#earthlink.net
I'm not asking for someone to do this for me because I LOVE UNIX and I want to learn it myself. Everytime I look at this HTML code I am confusing the #$$#& out of myself. I need help getting started.
The homework prompt is the following:
You are to write a nawk(1) script called ~/hwk12/mk_html.awk that converts a text file (sample.txt and roster.txt) to an html page that a web browser can read. I have given you the output in the file sample.html which is reproduced below (notice how each level of indentation is two spaces deep):
Again, I don't want someone to do this for me. Im just confused as to how data in the text file will append to the HTML table without the actual HTML code. Can someone please help me get started?
Looks like you'll need to define the necessary HTML tags within your script. The meat of the html file will be these lines:
<tr>
<td>$first $last</td>
<td>$username</td>
<td>$email</td>
</tr>
These tags define a table row. You can parse the variables from the text files with awk and use them to fill in the html. The other html markup can be copy-pasted as static text into the output html file.
Edit: You can do this to grab the first and last name and print to the html file.
last = $1
first = $2
print " <tr>"
print " <td>" first " " last "</td>"
print " </tr>"
You just need to expand that to get the email and username.