I'm reading a local HTML document with Nokogiri like so:
f = File.open(local_xml)
@doc = Nokogiri::XML(f)
f.close
@doc contains a Nokogiri XML object that I can parse using at_css.
I want to modify it using Nokogiri's XML::Node, and I'm absolutely stuck. How do I take this Nokogiri XML document and work with it using node methods?
For example:
@doc.at_css('rates tr').add_next_sibling(element)
returns:
undefined method `add_next_sibling' for nil:NilClass (NoMethodError)
despite the fact that @doc.class is Nokogiri::XML::Document.
For completeness, here is the markup I'm trying to edit.
<html>
<head>
<title>Exchange Rates</title>
<link rel="stylesheet" href="style.css">
</head>
<body>
<table class="rates">
<tr>
<td class="up"><div></div></td>
<td class="date">Saturday, Jan 12</td>
<td class="rate up">3.83</td>
</tr>
<tr>
<td class="up"><div></div></td>
<td class="date">Friday, Jan 11</td>
<td class="rate up">3.70</td>
</tr>
<tr>
<td class="down"><div></div></td>
<td class="date">Thursday, Jan 10</td>
<td class="rate down">3.68</td>
</tr>
<tr>
<td class="down"><div></div></td>
<td class="date">Wednesday, Jan 9</td>
<td class="rate down">3.70</td>
</tr>
<tr>
<td class="up"><div></div></td>
<td class="date">Tuesday, Jan 8</td>
<td class="rate up">3.66</td>
</tr>
</table>
</body>
</html>
Here is an example of how to do what you're trying to do, starting with f containing a shortened version of the HTML you want to parse:
require 'nokogiri'
f = '
<html>
<head>
<title>Exchange Rates</title>
<link rel="stylesheet" href="style.css">
</head>
<body>
<table class="rates">
<tr>
<td class="up"><div></div></td>
<td class="date">Saturday, Jan 12</td>
<td class="rate up">3.83</td>
</tr>
</table>
</body>
</html>
'
doc = Nokogiri::HTML(f)
doc.at('.rates tr').add_next_sibling('<p>foobar</p>')
puts doc.to_html
Your selector 'rates tr' looks for a <rates> element, not for the class="rates" attribute on <table>, so at_css returns nil. In CSS a class is selected with a leading dot: .rates. An alternate way to do it using CSS is table[class="rates"].
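To make the difference between rates and .rates concrete, here is a toy model in JavaScript of how a simple selector is interpreted. It is only a sketch, not Nokogiri's implementation; matchesSimple and the table object are hypothetical names invented for this illustration.

```javascript
// Toy model of simple CSS selector matching (not Nokogiri's code).
// A bare word is a type selector (tag name); a leading dot is a class selector.
function matchesSimple(element, selector) {
  if (selector.startsWith(".")) {
    return element.classes.includes(selector.slice(1));
  }
  return element.tag === selector;
}

// Hypothetical stand-in for <table class="rates">.
const table = { tag: "table", classes: ["rates"] };

console.log(matchesSimple(table, "rates"));  // false: there is no <rates> element
console.log(matchesSimple(table, ".rates")); // true: the class "rates" is present
console.log(matchesSimple(table, "table"));  // true: the tag name matches
```

Because 'rates tr' matches nothing, the lookup yields nil, and calling add_next_sibling on nil raises the NoMethodError from the question.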
Your example didn't define the node you were trying to add to the HTML, so I appended <p>foobar</p>. Nokogiri will let you build a node from scratch and append it, or use markup and add that, or you could find a node from one place in the HTML, remove it, and then insert it somewhere else.
That code outputs:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Exchange Rates</title>
<link rel="stylesheet" href="style.css">
</head>
<body>
<table class="rates">
<tr>
<td class="up"><div></div></td>
<td class="date">Saturday, Jan 12</td>
<td class="rate up">3.83</td>
</tr>
<p>foobar</p>
</table>
</body>
</html>
It's not necessary to use at_css or at_xpath instead of at. Nokogiri detects whether you passed a CSS or an XPath selector and handles it. The same applies to using search instead of xpath or css. Also, at is equivalent to search('some selector').first, so it returns the first matching node.
Try loading it as HTML instead of XML: Nokogiri::HTML(f)
Without going into much detail on how Nokogiri works: XML doesn't have CSS, right? So the at_css method might not make sense there (maybe it does, I don't know). Loading it as HTML should work.
Update
Just noticed one thing. You want at_css('.rates tr') instead of at_css('rates tr'), because that's how you select a class in CSS. Maybe it works with XML now.
Related
I have two cases where the same HTML renders differently with and without the <thead></thead> tag.
SCENARIO 1 : (With <thead></thead> tag)
<!DOCTYPE html>
<meta charset = "UTF-8"/>
<head>
<title>HTML PAGE</title>
</head>
<link rel = "stylesheet" href = "TEST_CSS.CSS"> <!--Linking our .CSS sheet-->
<body>
<table border="1">
<caption>EXAMPLE TABLE</caption>
<thead>
<tr>
<th colspan = "2">NAME</th>
<th>AGE</th>
<th>SEX</th>
</tr>
</thead>
<tr>
<td>REBEL RIDER</td>
<td>RR</td>
<td>5</td>
<td>MALE</td>
</tr>
<tr>
<td>IGR</td>
<td>4:20R</td>
<td>11</td>
<td>MALE</td>
</tr>
<tr>
<td colspan = "4">END OF THE TABLE</td>
</tr>
</table>
</body>
SCENARIO 2: (Without <thead></thead>)
<!DOCTYPE html>
<meta charset = "UTF-8"/>
<head>
<title>HTML PAGE</title>
</head>
<link rel = "stylesheet" href = "TEST_CSS.CSS"> <!--Linking our .CSS sheet-->
<body>
<table border="1">
<caption>EXAMPLE TABLE</caption>
<tr>
<th colspan = "2">NAME</th>
<th>AGE</th>
<th>SEX</th>
</tr>
<tr>
<td>REBEL RIDER</td>
<td>RR</td>
<td>5</td>
<td>MALE</td>
</tr>
<tr>
<td>IGR</td>
<td>4:20R</td>
<td>11</td>
<td>MALE</td>
</tr>
<tr>
<td colspan = "4">END OF THE TABLE</td>
</tr>
</table>
</body>
I am using the same CSS for both scenarios to style every second row, as below.
tr:nth-child(2n)
{
background-color: #ccc;
}
I am getting a different output for the above 2 scenarios.
OUTPUT(Scenario 1)
OUTPUT(Scenario 2)
My question is: what role does <thead></thead> play in making the output differ? Thanks in advance.
When you have a <thead>, it contains a single odd row, and the other 3 sibling rows are children of the table (actually of the implied tbody): odd, even, odd.
Without thead all rows are siblings, so you have odd, even, odd, even.
In other words: a row is odd or even relative to its immediate parent (that is, among its siblings).
From MDN:
The :nth-child() CSS pseudo-class matches elements based on their position in a group of siblings.
tr:nth-child(2n) indicates that you want the 2nd, 4th, ... TRs with a common parent to have a gray background.
In the first table (with thead), you have split the TRs into two distinct sibling groups, the first with one child and the second with three. The thead only has one TR child, so its TR isn't gray. The other 3 TRs share a parent (the table, or strictly the tbody the browser inserts), and so the 2nd one is gray.
In the second table (without thead), all four TRs share one parent, and the 2nd and 4th TRs have gray backgrounds.
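The counting described above can be sketched in a few lines of JavaScript. This is only an illustration of the rule, assuming the row counts given in the answers (1 + 3 with thead, 4 without); evenChildPositions is a hypothetical helper, not a browser API.

```javascript
// Hypothetical helper: the 1-based positions, within ONE parent, that a
// selector such as tr:nth-child(2n) would match when every child is a <tr>.
function evenChildPositions(siblingCount) {
  const positions = [];
  for (let i = 1; i <= siblingCount; i++) {
    if (i % 2 === 0) positions.push(i);
  }
  return positions;
}

// Scenario 1: <thead> holds 1 row, the implied <tbody> holds the other 3.
const withThead = {
  thead: evenChildPositions(1),
  tbody: evenChildPositions(3),
};

// Scenario 2: all 4 rows sit in the same implied <tbody>.
const withoutThead = { tbody: evenChildPositions(4) };

console.log(withThead);    // thead: nothing gray; tbody: only position 2
console.log(withoutThead); // tbody: positions 2 and 4
```

Position 2 inside the tbody of scenario 1 is the third row on screen, which is why the gray stripe appears to shift by one row compared with scenario 2.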
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Achievements</title>
</head>
<body>
<h2>Certificates and Awards</h2>
<hr>
<ul>
<h3><li>Certificates</li></h3>
<table border="1">
<tr>
<th>Course name</th>
<th>Professor</th>
<th>University</th>
</tr>
<tr>
<td>Programming for Everybody(Getting Started with Python)</td>
<td>Charles Severance</td>
<td>University of Michigan</td>
</tr>
</table>
<h3><li>Awards</li></h3>
<table border="1">
<tr>
<th>Competition</th>
<th>Award date</th>
</tr>
<tr>
<td>The 5th MIRROR Essay Contest</td>
<td>11/23/15</td>
</tr>
</table>
</ul>
<hr>
Back to main
</body>
</html>
I'm using Eclipse Jee Mars btw, and I get warnings saying "Multiple annotations found at this line: Invalid location of tag (h3), Invalid location of tag (h3)" and "Invalid location of (table)". When I open the file through Internet explorer it works fine. What's the problem and how should I fix it?
I'm new to Thymeleaf and am trying to make a simple table using an array and an each loop.
My code looks like this:
<!DOCTYPE HTML>
<html xmlns:th="http://www.thymeleaf.org">
<head>
<title>Smoke Tests</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
</head>
<body>
<table border="1" style="width:300px">
<tr>
<td>Test Name</td>
</tr>
<tr th:each="smokeTest : ${smokeTests}">
<td>
th:text="${smokeTest.name}">A Smoke Test'
</td>
</tr>
</table>
</body>
</html>
Basically my problem is that I can't run the loop as <td>s within <tr>s. Is there any way that this code could work?
You must put th:text as an attribute of a tag, so
<tr th:each="smokeTest : ${smokeTests}">
<td th:text="${smokeTest.name}">A Smoke Test</td>
</tr>
should run.
Simple solution which comes to mind first:
<th:block th:each="smokeTest : ${smokeTests}">
<tr>
<td th:text="${smokeTest.name}">A Smoke Test</td>
</tr>
</th:block>
Details: http://www.thymeleaf.org/whatsnew21.html#bloc
Although this is a late answer, the following also works, putting th:text on an element inside the cell:
<tr th:each="smokeTest : ${smokeTests}">
<td><p th:text="${smokeTest.name}"></p></td>
</tr>
I'm stumped and can't find the bug.
Angular is working, because it's reading my {{expressions}}, however, it's not replacing them with the content I'm expecting. It's simply removing them and blanks sit in their place.
I'm sure this issue is also tied to the fact that, for some reason, my ng-repeat directive isn't working. (It's not repeating.)
Can someone help me out? I'm trying to draw a table. In this example, when it's done, it should have the respective "idea" posted multiple times across the same row, and each row should have a different "idea", as listed in the $scope.
It's creating one single row filled with blanks (rather than {{idea}} ).
index.html
<!doctype html>
<html ng-app="AppName">
<head>
<link rel="stylesheet" type="text/css" href="styles/table.css">
<script src="https://ajax.googleapis.com/ajax/libs/angularjs/1.2.15/angular.min.js"></script>
</head>
<body>
<div ng-controller="TableController">
<table>
<tr>
<td>
<table cellspacing="0">
<tr class="title_bar">
<td>Title</td>
<td>Rating</td>
<td>Votes</td>
<td>Comments</td>
<td>Post Date</td>
<td>Status</td>
</tr>
</table>
</td>
</tr>
<div ng-repeat="idea in ideas">
<tr style="color: white">
<td>{{idea}}</td>
<td>{{idea}}%</td>
<td>{{idea}}</td>
<td>{{idea}}</td>
<td>{{idea}}</td>
<td>{{idea}}</td>
</tr>
</div>
</table>
</div>
<script src="scripts/controllers/TableController.js"></script>
</body>
</html>
TableController.js
var app = angular.module('AppName', []);
app.controller('TableController', ['$scope',function($scope){
$scope.ideas = [
'wow',
'cool',
'so nice',
'amazing',
'please work'
];
}]);
I'm probably missing something obvious but I appreciate any help you could give me.
Edit: Whoops, guess I need to brush up on my HTML basics.
Well, your HTML layout looks pretty strange. You shouldn't put a <div> between <tr> elements.
Try something like this:
<table>
<tr class="title_bar">
<td>Title</td>
<td>Rating</td>
<td>Votes</td>
<td>Comments</td>
<td>Post Date</td>
<td>Status</td>
</tr>
<tr ng-repeat="idea in ideas">
<td>{{idea}}</td>
<td>{{idea}}%</td>
<td>{{idea}}</td>
<td>{{idea}}</td>
<td>{{idea}}</td>
<td>{{idea}}</td>
</tr>
</table>
Fiddle demo
What you have is invalid HTML, so it's not rendering how you expect it to. You can't put a <div> inside a <table> and have it contain elements; you can include a <div> inside a <td> element but that doesn't really help you.
If you want to use ng-repeat in a table, put it on the <tr> or the <tbody>:
<tr ng-repeat="idea in ideas">
<td>{{idea}}</td>
........
</tr>
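As a rough picture of what ng-repeat on the <tr> ends up producing, the plain JavaScript below builds one row per item in the array. It is a sketch only: renderRows is a hypothetical helper, not an AngularJS API, it ignores the % suffix from the second column, and it does no HTML escaping.

```javascript
// Rough sketch of the result of <tr ng-repeat="idea in ideas">:
// one <tr> per array item, with the item repeated in every cell.
// renderRows is a hypothetical helper, not part of AngularJS.
function renderRows(ideas, columns) {
  return ideas.map((idea) => {
    const cells = `<td>${idea}</td>`.repeat(columns);
    return `<tr>${cells}</tr>`;
  });
}

// Same data as $scope.ideas in TableController.js.
const ideas = ["wow", "cool", "so nice", "amazing", "please work"];
const rows = renderRows(ideas, 6);

console.log(rows.length); // one row per idea
console.log(rows[0]);     // first row: six cells containing "wow"
```

The key point is that the repeated element is the row itself, so nothing invalid has to sit between <table> and <tr>.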
Try adding the ng-repeat directive to the td itself.
<tr>
<td ng-repeat="idea in ideas">{{idea}}</td>
</tr>
That should iterate through your $scope.ideas array, producing one cell per idea within a single row.
I have an XML file that contains some HTML content such as bold text, paragraphs, and tables. I have written a shell script to parse all the HTML tags except tables. I'm using the XML R package to parse the data.
<Root>
<Title> This is dummy xml file </Title>
<Content> This table summarises data in BMC format.
<div class="abctable">
<table border="1" cellspacing="0" cellpadding="0" width="100%" class="coder">
<tbody>
<tr>
<th width="50%">ABC</th>
<th width="50%">Weight status</th>
</tr>
<tr>
<td>are 18.5</td>
<td>arew</td>
</tr>
<tr>
<td>18.5 — 24.9</td>
<td>rweq</td>
</tr>
<tr>
<td>25.0 — 29.9</td>
<td>qewrte</td>
</tr>
<tr>
<td>30.0 and hwerqer</td>
<td>rwqe</td>
</tr>
<tr>
<td>40.0 rweq rweq</td>
<td>rqwe reqw</td>
</tr>
</tbody>
</table>
</div>
</Content>
<Section>blah blah blah</Section>
</Root>
How do I parse the content of this table, which is embedded in the XML?
There is a function called readHTMLTable in the XML package that seems to do just what you need.
Here is a way to do it with the following XML file:
<Root>
<Title> This is dummy xml file </Title>
<Content>
This table summarises data in BMC format.
<div class="abctable">
<table border="1" cellspacing="0" cellpadding="0" width="100%" class="coder">
<tbody>
<tr>
<th width="50%">ABC</th><th width="50%">Weight status</th>
</tr>
<tr>
<td>are 18.5</td>
<td>arew</td>
</tr>
<tr>
<td>18.5 — 24.9</td>
<td>rweq</td>
</tr>
<tr>
<td>25.0 — 29.9</td>
<td>qewrte</td>
</tr>
<tr>
<td>30.0 and hwerqer</td>
<td>rwqe</td>
</tr>
<tr>
<td>40.0 rweq rweq</td>
<td>rqwe reqw</td>
</tr>
</tbody>
</table>
</div>
</Content>
<Section>blah blah blah</Section>
</Root>
If this is saved in a file called /tmp/data.xml then you can use the following code :
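library(XML)  # provides htmlParse, getNodeSet and readHTMLTable
# Parse the file leniently as HTML, then select every <table> node with XPath
doc <- htmlParse("/tmp/data.xml")
tableNodes <- getNodeSet(doc, "//table")
# Convert the first table node into a data frame
tb <- readHTMLTable(tableNodes[[1]])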
Which gives:
R> tb
                V1            V2
1              ABC Weight status
2         are 18.5          arew
3      18.5 — 24.9          rweq
4      25.0 — 29.9        qewrte
5 30.0 and hwerqer          rwqe
6   40.0 rweq rweq     rqwe reqw
The best method for XML parsing would be to use XPath expressions:
Xpath Tutorial
Xpath and R
How to use XPath and R stackoverflow