Using xidel to extract a key-value pair - html

I have multiple tables on a website like so:
<table>
<tr>
<td>Name</td>
<td>foo</td>
</tr>
<tr>
<td>Count</td>
<td>15</td>
</tr>
<tr>
<td>Date</td>
<td>2014-11-17</td>
</tr>
</table>
<table>
<tr>
<td>Name</td>
<td>bar</td>
</tr>
<tr>
<td>Count</td>
<td>42</td>
</tr>
<tr>
<td>Date</td>
<td>2014-12-24</td>
</tr>
</table>
...
I want to get output like this:
foo 15
bar 42
My first attempt was xidel --xpath "//table/tr[1]/td[2]" --xpath "//table/tr[2]/td[2]", but that prints all the names first and then all the counts:
foo
bar
15
42
How can I extract two values in one line?

Use the string concatenation operator from XPath 3.0 or XQuery 3.0: //table/tbody/(tr[1]/td[2] || ' ' || tr[2]/td[2]). I think you need to request that version explicitly; at least I had to on http://videlibri.sourceforge.net/cgi-bin/xidelcgi. I parsed the input as HTML, where the parser adds a tbody element, so the path needs tbody too.

xidel-0.9.5.4998.exe -s --input-format=xml <input> ^
--xquery "//table/concat(tr[1]/td[2],' ',tr[2]/td[2])"
foo 15
bar 42
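The idea behind both answers is to evaluate the two cell paths relative to the same table and join them, rather than running two separate document-wide queries. As a minimal sketch of that idea in Python (not xidel), using only the standard library: the `<root>` wrapper and the function name are assumptions added here so that the sample parses as one XML document.

```python
import xml.etree.ElementTree as ET

# Sample markup from the question, wrapped in a hypothetical <root> element
# so that the two tables form one well-formed XML document.
DOC = """<root>
<table>
<tr><td>Name</td><td>foo</td></tr>
<tr><td>Count</td><td>15</td></tr>
<tr><td>Date</td><td>2014-11-17</td></tr>
</table>
<table>
<tr><td>Name</td><td>bar</td></tr>
<tr><td>Count</td><td>42</td></tr>
<tr><td>Date</td><td>2014-12-24</td></tr>
</table>
</root>"""


def name_count_lines(xml_text):
    """Return one 'name count' string per table."""
    root = ET.fromstring(xml_text)
    lines = []
    for table in root.iter("table"):
        # Both paths are evaluated relative to the current table,
        # so the name and the count stay paired.
        name = table.findtext("tr[1]/td[2]")
        count = table.findtext("tr[2]/td[2]")
        lines.append(f"{name} {count}")
    return lines


print("\n".join(name_count_lines(DOC)))
```

This only works on well-formed input; real-world HTML would need an HTML parser first.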

Related

Xpath - Selecting self or closest previous non-empty element

I have the following document:
<html>
<body>
<div>
<table>
<tr>
<td>390920000</td>
<td>A</td>
</tr>
<tr>
<td>390920000</td>
<td></td>
</tr>
<tr>
<td>3924100011</td>
<td>B</td>
</tr>
<tr>
<td>3924100011</td>
<td></td>
</tr>
<tr>
<td>3924100019</td>
<td></td>
</tr>
<tr>
<td>3924100019</td>
<td>C</td>
</tr>
</table>
</div>
</body>
</html>
What I would like is to use an xpath to select /html/body/div/table/tr/td[2], but for each empty element select the previous non-empty element instead. So instead of getting the values 'A','','B','','','C' I would like to get 'A','A','B','B','B','C'. Is this possible?
By the way, never mind that this is HTML and not XML. I am using HtmlAgilityPack, so I can write ordinary XPath expressions to select HTML elements.
If XPath 3 is fine, the following should work:
//table/tr ! head((., reverse(preceding-sibling::*))[normalize-space(td[2]/text()) != ""])/td[2]
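If XPath 3 is not available in your toolchain, the same "carry the last non-empty value forward" logic is easy to do after selecting the cells. A rough sketch in Python with the standard library follows; the function name is made up for illustration, and it assumes the document parses as XML, as the sample does.

```python
import xml.etree.ElementTree as ET

# The sample document from the question, condensed onto fewer lines.
DOC = """<html><body><div><table>
<tr><td>390920000</td><td>A</td></tr>
<tr><td>390920000</td><td></td></tr>
<tr><td>3924100011</td><td>B</td></tr>
<tr><td>3924100011</td><td></td></tr>
<tr><td>3924100019</td><td></td></tr>
<tr><td>3924100019</td><td>C</td></tr>
</table></div></body></html>"""


def fill_forward(xml_text):
    """Return the second-column values, replacing empty cells
    with the closest previous non-empty value."""
    root = ET.fromstring(xml_text)  # root is the <html> element
    filled = []
    last_seen = ""
    for cell in root.findall("./body/div/table/tr/td[2]"):
        text = (cell.text or "").strip()
        if text:
            last_seen = text  # remember the latest non-empty value
        filled.append(last_seen)
    return filled


print(fill_forward(DOC))
```

If the very first cell is empty there is nothing to carry forward, so this sketch emits an empty string for it.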

XPath - Duplicating results based on tag counts

I have a table that I would like to enter into a spreadsheet/database by duplicating the title based on the number of rows within a sub-table.
I would like to avoid post-processing if possible, so I'm looking for an XPath expression that does this.
For example:
<table>
<tr>
<th>Title One</th>
</tr>
<tr>
<td>
<table>
<tr>
<td>Row one</td>
</tr>
<tr>
<td>Row Two</td>
</tr>
<tr>
<td>Row Three</td>
</tr>
<tr>
<td>Row Four</td>
</tr>
</table>
</td>
</tr>
</table>
From above, is there an XPath expression that would return 'Title One' 4 times, based on the number of tr//td tags in the subtable? For example:
Title One
Title One
Title One
Title One
XPath 2.0 or 3.0 can do that in a single expression:
for $r in 1 to count(/table/tr[2]/td/table/tr/td) return /table/tr[1]/th/string()
It can also be done easily programmatically. Here I use bash, but the same logic works in any language of your choice:
# Count the sub-table cells whose text starts with "Row"
count=$(xmllint --xpath 'count(//td[starts-with(text(), "Row")])' table.html)
# Print the title once per counted cell
for ((i=0; i<count; i++)); do
  xmllint --xpath '//table/tr/th/text()' table.html
  echo
done
Output:
Title One
Title One
Title One
Title One
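For comparison, here is the same count-then-repeat logic as a small Python sketch using only the standard library. The function name is an assumption for illustration, and the input is the table from the question condensed onto fewer lines.

```python
import xml.etree.ElementTree as ET

# The table from the question; the outer <table> is the document root.
DOC = """<table>
<tr><th>Title One</th></tr>
<tr><td><table>
<tr><td>Row one</td></tr>
<tr><td>Row Two</td></tr>
<tr><td>Row Three</td></tr>
<tr><td>Row Four</td></tr>
</table></td></tr>
</table>"""


def repeated_titles(xml_text):
    """Return the title once for every cell in the nested table."""
    root = ET.fromstring(xml_text)
    title = root.findtext("tr[1]/th")
    # Same path as in the XPath answer, relative to the outer table.
    inner_cells = root.findall("tr[2]/td/table/tr/td")
    return [title] * len(inner_cells)


print("\n".join(repeated_titles(DOC)))
```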

Angularjs, two data in a single row in table

I am having trouble implementing a narrow table.
myData = { name:"Foo", age:11, sex:"M", weight:77, height:77, hobby:'gaming'}
I want a table like the one below:
<table>
<tr>
<td>name</td><td>Foo</td><td>age</td><td>11</td>
</tr>
<tr>
<td>sex</td><td>M</td><td>weight</td><td>77</td>
</tr>
<tr>
<td>height</td><td>77</td><td>hobby</td><td>gaming</td>
</tr>
</table>
Is it possible to show data like this using ngRepeat and its built-in variable?
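The question John posted would solve your problem, but I think it would be less of a hack to use ng-repeat-start and ng-repeat-end. Note that this repeats the three rows once per item, so it assumes myData is an array of such objects, e.g.: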
<table>
<tr ng-repeat-start="item in myData">
<td>name</td><td>{{item.name}}</td><td>age</td><td>{{item.age}}</td>
</tr>
<tr>
<td>sex</td><td>{{item.sex}}</td><td>weight</td><td>{{item.weight}}</td>
</tr>
<tr ng-repeat-end>
<td>height</td><td>{{item.height}}</td><td>hobby</td><td>{{item.hobby}}</td>
</tr>
</table>
If your myData looks like this:
myData = [{ name:"Foo", age:11, sex:"M", weight:77, height:77, hobby:'gaming'},{ name:"Foo", age:11, sex:"M", weight:77, height:77, hobby:'gaming'},{ name:"Foo", age:11, sex:"M", weight:77, height:77, hobby:'gaming'}]
then your table will look like this:
<table>
<tr ng-repeat="row in myData">
<td>{{row.name}}</td>
<td>{{row.age}}</td>
<td>{{row.sex}}</td>
<td>{{row.weight}}</td>
<td>{{row.height}}</td>
<td>{{row.hobby}}</td>
</tr>
</table>
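Neither snippet above builds the rows generically from the keys of a single object. One possible approach is to first reshape the object into rows of two key-value pairs and then repeat over those rows. The sketch below shows only that reshaping step, in Python rather than AngularJS; `pair_rows` and `pairs_per_row` are hypothetical names, and wiring the result into ng-repeat is left out.

```python
def pair_rows(record, pairs_per_row=2):
    """Split a mapping into rows of at most `pairs_per_row` (key, value) pairs."""
    items = list(record.items())  # dicts keep insertion order in Python 3.7+
    return [items[i:i + pairs_per_row]
            for i in range(0, len(items), pairs_per_row)]


my_data = {"name": "Foo", "age": 11, "sex": "M",
           "weight": 77, "height": 77, "hobby": "gaming"}

# Each printed line corresponds to one <tr> with four <td> cells.
for row in pair_rows(my_data):
    print(row)
```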

Selenium IDE: Why "Element X = Y not found" message, despite successful execution of test step?

Why does Selenium IDE give me the following error message even when it successfully clicks the UI button? I've tried every option I know of: click, clickAndWait, and pause (shown in the test case below).
Exact Log:
[info] Executing: |click | class=button save | |
[error] Element class=button save not found
HTML:
</tr>
<tr>
<td>clickAndWait</td>
<td>id=login</td>
<td></td>
</tr>
<tr>
<td>click</td>
<td>link=Add</td>
<td></td>
</tr>
<tr>
<td>waitForElementPresent</td>
<td>class=icon-capability</td>
<td></td>
</tr>
<tr>
<td>click</td>
<td>link=Capability</td>
<td></td>
</tr>
<tr>
<td>waitForElementPresent</td>
<td>class=btn btn-primary</td>
<td></td>
</tr>
<tr>
<td>type</td>
<td>name=name</td>
<td>secondly</td>
</tr>
<tr>
<td>click</td>
<td>name=create</td>
<td></td>
</tr>
<tr>
<td>pause</td>
<td></td>
<td></td>
</tr>
<tr>
<td>click</td>
<td>class=button save</td>
<td></td>
</tr>
</tbody></table>
</body>
</html>
My guess is that the problem is the selector. You are looking for *[class='button save'], which compares the whole class attribute as a single string.
So, depending on the element you are selecting:
// doesn't match
<button id="something" class="save button"></button>
// matches
<button id="something_else" class="button save"></button>
My guess is that something is changing the element dynamically. Try matching on something more unique than a class. If it has an ID attribute, use that. If it doesn't have one but has a name attribute, use that.
If it doesn't have anything to match on but the class, then try using CSS:
css=button.button.save
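The difference between the two selectors can be illustrated without a browser. The sketch below models, in plain Python, how an attribute selector such as [class='button save'] compares the full attribute string while a class selector such as .button.save only requires each class name to be present. Both helper functions are hypothetical and are not part of Selenium.

```python
def matches_exact_attribute(class_attr, wanted):
    """Model of [class='...']: the whole attribute value must be identical."""
    return class_attr == wanted


def matches_class_tokens(class_attr, *wanted):
    """Model of .a.b: every wanted class must appear among the
    whitespace-separated class names, in any order."""
    return set(wanted) <= set(class_attr.split())


for attr in ("save button", "button save"):
    print(attr,
          matches_exact_attribute(attr, "button save"),
          matches_class_tokens(attr, "button", "save"))
```

This is why the css= locator is the more forgiving choice if a script reorders or adds class names at runtime.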

How do I awk a unix text file to a predefined html code?

I don't know HTML (HORRIBLY EMBARRASSED, but I never had the need to). I am pretty perspicacious when it comes to UNIX; however, I am horribly confused by this assignment. I know what I need to do but am having the hardest time getting started.
I have the following files in my hwk12 directory:
roster.html
roster.txt
sample.html
sample.txt
The following is the content of the roster.html file:
<html>
<body>
<table border=2>
<tr><th>Name</th><th>Username</th><th>Email</th></tr>
<tr>
<td>Nikhil Banerjee</td>
<td>nbanerje</td>
<td>zetapsi796#hotmail.com</td>
</tr>
<tr>
<td>Jeff Nazarian</td>
<td>jnazaria</td>
<td>jeff.nazarian#asu.edu</td>
</tr>
<tr>
<td>Anna Melzer</td>
<td>amelzer</td>
<td>anna.melzer#asu.edu</td>
</tr>
<tr>
<td>Jose Garcia</td>
<td>jgarcia</td>
<td>garcia-j#msn.com</td>
</tr>
<tr>
<td>Jillian Testa</td>
<td>jtesta</td>
<td>jillian.testa#asu.edu</td>
</tr>
<tr>
<td>Clayton Lengelzigich</td>
<td>clengelz</td>
<td><a href="mailto:clayton.lengel-zigich#asu.edu">clayton.lengel-zigich#asu.edu</a></td>
</tr>
<tr>
<td>Ashley Bennett</td>
<td>abennett</td>
<td>ashley.bennett#asu.edu</td>
</tr>
<tr>
<td>Ann Frost</td>
<td>afrost</td>
<td>ann.frost#asu.edu</td>
</tr>
<tr>
<td>Timothy Whipple</td>
<td>twhipple</td>
<td>tweed#asu.edu</td>
</tr>
<tr>
<td>Wei Shen</td>
<td>wshen</td>
<td>shenwei58#hotmail.com</td>
</tr>
<tr>
<td>Cari Mahon</td>
<td>cmahon</td>
<td>cari.mahon#asu.edu</td>
</tr>
<tr>
<td>Alberto Salas</td>
<td>asalas</td>
<td>alberto2504#msn.com</td>
</tr>
<tr>
<td>Dorothy Haskett</td>
<td>dhaskett</td>
<td>dorothy.haskett#asu.edu</td>
</tr>
<tr>
<td>Criss Bradbury</td>
<td>cbradbur</td>
<td>crissbradbury#hotmaiil.com</td>
</tr>
<tr>
<td>Steve Ellermann</td>
<td>sellerma</td>
<td>cis494#ellermann.com</td>
</tr>
<tr>
<td>Zewdie Bekele</td>
<td>zbekele</td>
<td>zewdiea#aol.com</td>
</tr>
<tr>
<td>Frederic Diziere</td>
<td>fdiziere</td>
<td>fsd#asu.edu</td>
</tr>
<tr>
<td>Matt Bowes</td>
<td>mbowes</td>
<td>matt.bowes#asu.edu</td>
</tr>
<tr>
<td>Jasen Meece</td>
<td>jmeece</td>
<td>jasen.meece#sun.com</td>
</tr>
<tr>
<td>Aaron Carpenter</td>
<td>acarpent</td>
<td>aaron.carpenter#asu.edu</td>
</tr>
<tr>
<td>Binqin Xi</td>
<td>bxi</td>
<td>binqin.xi#asu.edu</td>
</tr>
<tr>
<td>Yinting Chan</td>
<td>ychan</td>
<td>yin.chen#asu.edu</td>
</tr>
<tr>
<td>Michael Evans</td>
<td>mevans</td>
<td>michael.evans#asu.edu</td>
</tr>
<tr>
<td>Herman Beringer</td>
<td>hberinge</td>
<td>jber#cox.net</td>
</tr>
<tr>
<td>Andrew Jolley</td>
<td>ajolley</td>
<td>andrew#andrewjolley.com</td>
</tr>
<tr>
<td>Michael Raby</td>
<td>mraby</td>
<td>mike1071#yahoo.com</td>
</tr>
<tr>
<td>Hajar Alaoui</td>
<td>halaoui</td>
<td>hajar6#hotmail.com</td>
</tr>
<tr>
<td>Anne Lemar</td>
<td>alemar</td>
<td>anne.lemar#asu.edu</td>
</tr>
<tr>
<td>Russell Crotts</td>
<td>rcrotts</td>
<td>Russell.Crotts#asu.edu</td>
</tr>
<tr>
<td>Dan Mazzola</td>
<td>dmazzola</td>
<td>dan.mazzola#sun.com</td>
</tr>
<tr>
<td>Bill Boyton</td>
<td>bboyton</td>
<td>boytonb#earthlink.net</td>
</tr>
</table>
</body>
</html>
The following is the content of the roster.txt file:
Whipple Timothy tweed#asu.edu Shen Wei shenwei58#hotmail.com
Mahon Cari cari.mahon#asu.edu Salas Alberto alberto2504#msn.com
Haskett Dorothy dorothy.haskett#asu.edu Bradbury Criss
crissbradbury#hotmaiil.com Ellermann Steve
cis494#ellermann.com Bekele Zewdie zewdiea#aol.com Diziere Frederic
fsd#asu.edu Bowes Matt matt.bowes#asu.edu Meece Jasen
jasen.meece#sun.com Carpenter Aaron aaron.carpenter#asu.edu
Xi Binqin binqin.xi#asu.edu Chan Yinting yin.chen#asu.edu
Evans Michael michael.evans#asu.edu Beringer Herman
jber#cox.net Jolley Andrew andrew#andrewjolley.com Raby Michael
mike1071#yahoo.com Alaoui Hajar hajar6#hotmail.com Lemar Anne
anne.lemar#asu.edu Crotts Russell Russell.Crotts#asu.edu Mazzola Dan
dan.mazzola#sun.com Boyton Bill boytonb#earthlink.net
The following is the content of the sample.html file:
<html>
<body>
<table border=2>
<tr><th>Name</th><th>Username</th><th>Email</th></tr>
<tr>
<td>Michael Raby</td>
<td>mraby</td>
<td>mike1071#yahoo.com</td>
</tr>
<tr>
<td>Hajar Alaoui</td>
<td>halaoui</td>
<td>hajar6#hotmail.com</td>
</tr>
<tr>
<td>Anne Lemar</td>
<td>alemar</td>
<td>anne.lemar#asu.edu</td>
</tr>
<tr>
<td>Russell Crotts</td>
<td>rcrotts</td>
<td>Russell.Crotts#asu.edu</td>
</tr>
<tr>
<td>Dan Mazzola</td>
<td>dmazzola</td>
<td>dan.mazzola#sun.com</td>
</tr>
<tr>
<td>Bill Boyton</td>
<td>bboyton</td>
<td>boytonb#earthlink.net</td>
</tr>
</table>
</body>
</html>
The following is the content of the sample.txt file:
Raby Michael mike1071#yahoo.com
Alaoui Hajar hajar6#hotmail.com
Lemar Anne anne.lemar#asu.edu
Crotts Russell Russell.Crotts#asu.edu
Mazzola Dan dan.mazzola#sun.com
Boyton Bill boytonb#earthlink.net
I'm not asking for someone to do this for me, because I LOVE UNIX and I want to learn it myself. Every time I look at this HTML code I confuse the #$$#& out of myself. I need help getting started.
The homework prompt is the following:
You are to write a nawk(1) script called ~/hwk12/mk_html.awk that converts a text file (sample.txt and roster.txt) to an html page that a web browser can read. I have given you the output in the file sample.html which is reproduced below (notice how each level of indentation is two spaces deep):
Again, I don't want someone to do this for me. I'm just confused as to how the data in the text file ends up in the HTML table when the text file contains no HTML code. Can someone please help me get started?
Looks like you'll need to define the necessary HTML tags within your script. The meat of the html file will be these lines:
<tr>
<td>$first $last</td>
<td>$username</td>
<td>$email</td>
</tr>
These tags define a table row. You can parse the variables from the text files with awk and use them to fill in the html. The other html markup can be copy-pasted as static text into the output html file.
Edit: You can do something like this in the main action block to grab the first and last name and print them to the HTML file:
{
# sample.txt lists the last name first
last = $1
first = $2
print " <tr>"
print " <td>" first " " last "</td>"
print " </tr>"
}
You just need to expand that to get the email and username.
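Two details in the sample files are worth planning for before writing the awk script. As shown above, records in roster.txt do not line up one per line, and the usernames in the HTML appear to be the first initial plus the last name, lowercased and cut to eight characters (that rule is inferred from the sample data, not stated in the assignment). Here is a rough Python sketch of just that logic, so the awk version is still yours to write; the function name is made up.

```python
def parse_records(text):
    """Group whitespace-separated tokens into (name, username, email) records,
    ignoring where the line breaks fall."""
    tokens = text.split()
    records = []
    usable = len(tokens) - len(tokens) % 3  # drop an incomplete trailing record
    for i in range(0, usable, 3):
        last, first, email = tokens[i:i + 3]
        # Assumed rule: first initial + last name, lowercase, at most 8 characters.
        username = (first[0] + last).lower()[:8]
        records.append((f"{first} {last}", username, email))
    return records


# A few tokens copied from roster.txt, including its line breaks.
sample = """Haskett Dorothy dorothy.haskett#asu.edu Bradbury Criss
crissbradbury#hotmaiil.com Ellermann Steve
cis494#ellermann.com"""

for record in parse_records(sample):
    print(record)
```

In awk, the equivalent of the token grouping is to stop relying on one record per line, for example by collecting fields across lines until you have three.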