Identifying nodes on position relative to grandparent - html

I have the following HTML table:
tab2 <- '<table>
<thead>
<tr>
<th rowspan="2">a</th>
<th>b</th>
<th colspan="2" rowspan="2">c</th>
</tr>
<tr>
<td></td>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>2</td>
<td>3</td>
<td></td>
</tr>
</tbody>
</table>'
It has three rows: the first two hold header information and the last one is the body. The goal is to extract the header information using only the header row positions relative to the table node (1 and 2), i.e. without having to pay attention to whether the header nodes have a thead parent or not.
I tried
//tr[position() < 3]
but it doesn't work, because position() is evaluated relative to each tr's parent node (thead or tbody).
I am using R with the XML package (which uses XPath 1.0). This is what I get when I use the above XPath:
xpathSApply(tab2, "//tr[position() < 3]")
[[1]]
<tr><th rowspan="2">a</th>
<th>b</th>
<th colspan="2" rowspan="2">c</th>
</tr>
[[2]]
<tr><td/>
</tr>
[[3]]
<tr><td>1</td>
<td>2</td>
<td>3</td>
<td/>
</tr>
I get all three rows, which makes sense given how I understand position(): it works relative to each node's parent.
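The parent-relative behaviour is easy to reproduce outside R. Below is a minimal sketch in Python, assuming the standard library's xml.etree.ElementTree (which implements only a subset of XPath) and a trimmed-down stand-in for the table above; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for the table in the question (an assumption, not the original data).
TABLE = (
    "<table><thead><tr><th>a</th></tr><tr><td/></tr></thead>"
    "<tbody><tr><td>1</td></tr></tbody></table>"
)

table = ET.fromstring(TABLE)

# The [1] predicate is evaluated among the tr children of each parent,
# so it matches the first row of <thead> AND the first row of <tbody>.
first_per_parent = table.findall(".//tr[1]")

# ElementTree elements have no parent pointer, so build a child -> parent map.
parents = {child: parent for parent in table.iter() for child in parent}
print([parents[row].tag for row in first_per_parent])  # ['thead', 'tbody']
```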
Context
I am writing a function that allows users to parse HTML tables with the R programming language and assemble an R data structure from them. The function allows users to pass numeric values indicating which rows provide header information and which provide body information. For the above table, users should be able to say that row 1 and row 2 (in the entire table) provide header information. I need to process this input so it works on HTML tables regardless of whether the table makes use of thead and tbody elements. The problem with
//tr[position() < 3]
is that it also returns the body row (third row). Hope this makes it clear(er).

Use the following XPath expression:
/table//tr[count(preceding::tr) < 2]
It does not care whether a certain tr is inside thead or not. It just considers tr elements that are preceded by zero or one other tr element. The result is the following:
<tr>
<th rowspan="2">a</th>
<th>b</th>
<th colspan="2" rowspan="2">c</th>
</tr>
-----------------------
<tr>
<td/>
</tr>
Caveat: This simple approach only works if there is only one table in the HTML document. But as long as you are working with exactly this HTML snippet, it suffices.

This expression will work for a document with any number of tables.
//table/descendant::tr[position() < 3]
By using the descendant forward axis, the [position() < 3] subscript will select the first and second node in the set of tr descendants of the table (rather than finding their position relative to their parent node, as with //tr in your question).
http://jsfiddle.net/uutavwvk/1/
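For readers outside R, the same "first two rows in document order, per table" idea can be sketched in Python. This is only an illustration under assumptions: it uses the standard library's xml.etree.ElementTree, whose XPath subset does not cover a predicate like position() < 3, so a slice over the descendant rows stands in for it. The helper name first_rows is hypothetical.

```python
import xml.etree.ElementTree as ET

# Same structure as the table in the question (well-formed, so ElementTree can parse it).
TABLE = """<table>
<thead>
<tr><th rowspan="2">a</th><th>b</th><th colspan="2" rowspan="2">c</th></tr>
<tr><td></td></tr>
</thead>
<tbody>
<tr><td>1</td><td>2</td><td>3</td><td></td></tr>
</tbody>
</table>"""


def first_rows(table, n):
    """Return the first n <tr> descendants of `table` in document order."""
    # iter() walks descendants in document order, so thead/tbody wrappers
    # do not affect a row's index within the table.
    return list(table.iter("tr"))[:n]


table = ET.fromstring(TABLE)
header_rows = first_rows(table, 2)
header_cells = [[cell.tag for cell in row] for row in header_rows]
print(header_cells)  # [['th', 'th', 'th'], ['td']]
```

Because the helper is called once per table element, it behaves like the descendant-axis expression and is not thrown off by other tables earlier in the document.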

Related

Extract a td value using XPath

I need to extract values that are in a HTML cell table using XPath and I'd like not to use "positional" XPath string.
My code sample is something like this one
<body>
<table>
<tr>...</tr>
<tr>...</tr>
<tr>...</tr>
<tr>
<td>
<table id="GridView5">
<tr>...</tr>
<tr>...</tr>
<tr>
<td>First Title</td>
<td>1</td>
<td>2</td>
<td>3</td>
</tr>
</table>
</td>
</tr>
</table>
</body>
I'm trying to use something like this XPath expression
//*[@id="GridView5"]/*[td="First Title"]/td[3]
to extract the value "2" from the above code
Suggestions? Examples?
This XPath,
//table[@id="GridView5"]/tr[td="First Title"]/td[3]
will select the third td child of the tr that has a td child with a string value of First Title within a table with an id attribute value of GridView5.
It uses a single positional selector for the requested td element because your markup affords no other way to differentiate the td that contains 2, unless you allow an assumption that 2 comes after 1 or before 3 (treating either as a label). If the preceding or following td elements can serve as labels, then you could use the preceding-sibling or following-sibling axis instead of the [3]:
//table[@id="GridView5"]/tr[td="First Title"]/td[.='1']/following-sibling::td[1]
but even here you'll need [1] to select only the immediately following sibling.
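As a quick check of the first expression, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree, which happens to support the predicates involved ([@attr='value'], [tag='text'] and a numeric index). The markup is a hypothetical, well-formed stand-in for the question's sample, since the original rows are elided.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the markup in the question.
HTML = """<body>
<table>
<tr><td>
<table id="GridView5">
<tr><td>Other Title</td><td>7</td><td>8</td><td>9</td></tr>
<tr><td>First Title</td><td>1</td><td>2</td><td>3</td></tr>
</table>
</td></tr>
</table>
</body>"""

root = ET.fromstring(HTML)

# Table by id, then the row that has a td with the label text, then its third td.
cell = root.find(".//table[@id='GridView5']/tr[td='First Title']/td[3]")
value = cell.text
print(value)  # 2
```

The following-sibling variant is not part of ElementTree's XPath subset, so it would need a full XPath 1.0 engine.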

How to select table entry via XPath

I have the following table, and I wanted an expression to get the Percentage of the Category "OC". Is it possible to extract via XPath?
<tbody>
<tr>
<th class="textL">Category</th>
<th class="textR">No. of Items</th>
<th class="textR">Percentage</th>
</tr>
<tr class="data_row">
<td>OC</td>
<td class="textR">100</td>
<td class="textR">4.70</td>
</tr>
<tr class="data_row">
<td>FP</td>
<td class="textR">200</td>
<td class="textR">38.82</td>
</tr>
<tr class="data_row">
<td>FI</td>
<td class="textR">300</td>
<td class="textR">20.39</td>
</tr>
</tbody>
Selecting table entry based on value of another entry
To select the Percentage for the given "OC" Category:
//td[.='OC']/following-sibling::td[count(../..//th[.='Percentage']/preceding-sibling::th)]/text()
The above XPath will return
"4.70"
as requested.
Note that it will continue to work in the face of many changes, including row and column rearrangements, as long as the targeted column continues to be named "Percentage", stays to the right of the Category column, and Category remains the first column. One could generalize the expression further by taking the difference of the positions of the two columns rather than assuming that Category is the first column.
Explanation: From the td that contains "OC", go over the number of siblings equal to the position of the "Percentage" column header, and there select the text in the correct sibling td.
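The same header-driven lookup can be written procedurally. The sketch below is Python with the standard library's xml.etree.ElementTree, not XPath; it resolves both columns by header name, which corresponds to the generalization mentioned above. The function name lookup and its parameters are hypothetical.

```python
import xml.etree.ElementTree as ET

TBODY = """<tbody>
<tr><th class="textL">Category</th><th class="textR">No. of Items</th><th class="textR">Percentage</th></tr>
<tr class="data_row"><td>OC</td><td class="textR">100</td><td class="textR">4.70</td></tr>
<tr class="data_row"><td>FP</td><td class="textR">200</td><td class="textR">38.82</td></tr>
<tr class="data_row"><td>FI</td><td class="textR">300</td><td class="textR">20.39</td></tr>
</tbody>"""


def lookup(tbody, key_header, key, value_header):
    """Return the `value_header` cell of the row whose `key_header` cell equals `key`."""
    # Column positions come from the header row, not from hard-coded indexes.
    headers = [th.text for th in tbody.find("tr").findall("th")]
    key_col = headers.index(key_header)
    value_col = headers.index(value_header)
    for row in tbody.findall("tr"):
        cells = row.findall("td")
        # The header row has no td cells, so it is skipped by the length check.
        if len(cells) > max(key_col, value_col) and cells[key_col].text == key:
            return cells[value_col].text
    return None


tbody = ET.fromstring(TBODY)
percentage = lookup(tbody, "Category", "OC", "Percentage")
print(percentage)  # 4.70
```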
Another XPath, also dependent on the order of the table's columns
//td[text()='OC']/following-sibling::td[2]
(explanation: take the second td sibling among the siblings of a td that contains text 'OC')
There are multiple XPaths for that. This one will work:
/tbody[1]/tr[2]/td[3]/text()
But it depends on the current layout of the XML.
This works:
tbody/tr/td[. eq "OC"]/../td[@class eq "textR"][2]/text()
It assumes that the OC td element will be there, and that the value you want is in the 2nd td element with a class attribute of "textR". Note that the eq operator requires XPath 2.0; in XPath 1.0, use = instead.

Semantic of nested tables and table header

I have a table where elements can have child elements with the very same attributes, like:
ITEM ATTRIBUTE 1 ATTRIBUTE 2
item value value
sub value value
sub value value
item value value
From this I've created a markup like this:
<table>
<thead>
<tr>
<th>ITEM</th>
<th>ATTRIBUTE 1</th>
<th>ATTRIBUTE 2</th>
</tr>
</thead>
<tbody>
<tr>
<td>item</td>
<td>value</td>
<td>value</td>
</tr>
<tr>
<td colspan=3>
<table>
<tbody>
<tr>
<td>sub</td>
<td>value</td>
<td>value</td>
</tr>
</tbody>
</table>
</td>
</tr>
<tr>
<td>item</td>
<td>value</td>
<td>value</td>
</tr>
</tbody>
</table>
My questions are now:
Is this the best semantic solution?
Is another approach better suited? If so, which is the recommended way?
Is the table header in charge for both tables or do I have to create a new one (maybe with visibility: hidden for the nested table)?
Is this the best semantic solution?
Not really. While the act of nesting an element A within another element B can be used to indicate that A is a child of B, that isn't what you're doing here: you're nesting the table within a completely different row, so there's no implication of a parent-child relationship between A and B.
By creating a cell that spans all the columns in the table and then building another table inside that with the same number of columns, you're also effectively saying "these are some other columns, that don't relate to the ones in the outer table".
You can see the implied (lack of) relationship between the columns by adding a border to the cells in your example above:
Obviously you can fix that with CSS, but the unstyled rendering of a piece of HTML is often a good guide to its semantics.
Is another approach better suited? If so, which is the recommended way?
There's no standard way to represent hierarchical relationships between rows of a table in HTML. Cribbing from an answer I gave to a similar question, though, you can do it with extra classes, ids and data- attributes:
<table>
<thead>
<tr>
<th>ITEM</th>
<th>ATTRIBUTE 1</th>
<th>ATTRIBUTE 2</th>
</tr>
</thead>
<tbody>
<tr id=100>
<td>item</td>
<td>value</td>
<td>value</td>
</tr>
<tr id=110 data-parent=100 class=level-1>
<td>sub</td>
<td>value</td>
<td>value</td>
</tr>
<tr id=200>
<td>item</td>
<td>value</td>
<td>value</td>
</tr>
</tbody>
</table>
The parent-child relationship won't be visible in an unstyled rendering (there's no other way you could make it so without adding extra content, as far as I can see), but there are enough hooks to add the CSS required:
.level-1 > td:first-child {
padding-left: 1em;
}
... which results in this:
With a little javascript, you could also use the id and data-parent attributes to set things up so that e.g. hovering over a row causes its parent to be highlighted.
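The same id and data-parent hooks are also enough for any script to recover the hierarchy. As a rough sketch (in Python rather than JavaScript, using the standard library's html.parser, which accepts the unquoted attribute values used above), a hypothetical RowHierarchy helper could collect each parent's child rows:

```python
from collections import defaultdict
from html.parser import HTMLParser

# Shortened copy of the suggested markup (one cell per row, for brevity).
ROWS = """<table><tbody>
<tr id=100><td>item</td></tr>
<tr id=110 data-parent=100 class=level-1><td>sub</td></tr>
<tr id=200><td>item</td></tr>
</tbody></table>"""


class RowHierarchy(HTMLParser):
    """Collect child row ids under their parent row id (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.children = defaultdict(list)

    def handle_starttag(self, tag, attrs):
        if tag != "tr":
            return
        attributes = dict(attrs)
        parent = attributes.get("data-parent")
        # Rows without data-parent are top-level items and are not recorded.
        if parent is not None:
            self.children[parent].append(attributes.get("id"))


parser = RowHierarchy()
parser.feed(ROWS)
children = dict(parser.children)
print(children)  # {'100': ['110']}
```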
Is the table header in charge for both tables, or do I have to create a new one?
In your proposed solution, creating a single cell that spans all columns and then building another table inside it means that there's no implied relationship between the header cells and those of your "child" row. Obviously my suggested solution above doesn't have that problem.
This is W3C's recommendation:
At the current time, those who want to ensure consistent support across Assistive
Technologies for tables where the headers are not in the first row/column may want
to use the technique for complex tables H43: Using id and headers attributes to
associate data cells with header cells in data tables. For simple tables that have
headers in the first column or row we recommend the use of the th and td elements.
You can look at this post: Best way to construct a semantic html table
Hope that helps you get your answer.
A full discussion of semantics would take more time than answering your question allows.
For the whole picture, though, this link should help you. That page contains all the information you may be interested in. Interestingly, unlike the usual 'declarative' specs the W3C writes, it takes a 'suggestive' tone on the question in this context. You may wish to read it right from the start.
I think putting the children in a separate table is the wrong way to go. Nested tables are not like nested lists; they don't carry that same semantic hierarchy. It seems everything should be within the same table if it all lists the same information.
For example, if your table had the headers
REGION POPULATION AREA
then you could have item1 = Earth, item2 = France, item3 = Paris... and it wouldn't really matter if France were a child of Earth or if Paris were a child of France; you'd still be better off keeping it all in one table and not trying to do a parent/child relationship other than in CSS styling.
If your table is really not comprehensible without someone knowing that parent/child relationship, could you give an example of the table data so I can better understand how to structure it?

How to make HTML table cell to flow below other cells in the same row

I have a table in which I need to add another cell per column, and since that column will have a lot of elements on it, the cell content must flow below the other cells in the same row, filling all the available width on the table. This way the columns won't be stretched making it impossible to view its content.
A visual example might help: http://www.asciiflow.com/#Draw9009157520507047228
Edit
After reading all the replies I realize that perhaps I didn't provide enough information. My apologies.
The table I want to modify is the one in http://staging.locamotion.org/projects/pootle/ . Note that this table uses the sorttable JavaScript library for sorting the table.
I have to modify that table to display tagging information for each of the entries. Since every entry can have several tags, which can take up a lot of space, the new column (necessary to keep the sorting library working) must flow below the other columns to allow showing the tagging stuff, which can span one or more lines due to width constraints (keep in mind that this program is used in third world countries with old equipment and low resolution screens).
Someone asked what I've tried. I tried adding an additional row for each entry, with only one cell that has a colspan attribute, but that way the sorting library doesn't work:
<table class="sortable">
<thead>
<tr>
<th>Col 1</th>
<th>Col 2</th>
<th>Col 3</th>
<th>Col 4</th>
</tr>
</thead>
<tbody>
<tr class="even">
<td>1</td>
<td>2</td>
<td>3</td>
<td>4</td>
</tr>
<tr class="even">
<td colspan="4">First entry tagging stuff</td>
</tr>
<tr class="odd">
<td>5</td>
<td>6</td>
<td>7</td>
<td>8</td>
</tr>
<tr class="odd">
<td colspan="4">Second entry tagging stuff</td>
</tr>
<!-- More and more entries -->
</tbody>
</table>
If you have any other idea on how this UI can be achieved while keeping column ordering I would appreciate it.
I want to apologize again, and also thank the people who spent some of their time trying to answer and help me the best they could.
Table cells are ALWAYS rectangles and cannot "flow" as described in your image.
You would need to nest a table in a DIV and use CSS floats to accomplish what you're looking for.
I don't think that's possible in HTML.
You could create a div with a left-floated table in it that contains the two cells, and then have the third cell's content as the content of the div (after the table).
<div>
<table><tr><td>First</td><td>Second</td></tr></table>
Third with a lot of content that might actually span under the table but it's not part of the table
</div>
Like so: http://jsfiddle.net/9QS44/
You can merge the cells from 2 rows using rowspan and similarly for the columns using colspan. You can also combine both the rowspan and colspan. Not sure if you will be able to achieve the effect you're trying to get, but this is the only way I know that will get you close. This should also help you out. http://www.htmlcodetutorial.com/tables/index_famsupp_30.html
I agree with previous posters that HTML table cells do not exactly work the way you are wanting. Here's another way to skin the cat:
<style type="text/css" media="screen">
table {
border-collapse:collapse;
border:1px solid #FF0000;
}
table td{
border:1px solid #FF0000;
}
table#child {
width:200px; float:left;
}
table#parent {
width:300px;
}
</style>
<table id="parent">
<tr>
<td>
<table id="child">
<tr>
<td>1</td>
<td>2</td>
</tr>
</table>Text that<br />wraps<br /> and wraps.
</td>
</tr>
</table>
Might want to reduce the amount of bottom margin on the child table as well so that the text wraps a little bit closer.
If it were me, I'd explore using divs for this particular need and forget about tables altogether.

Hierarchical html table, putting last td on next line

I'm creating a simple hierarchical table with html and CSS and I'm getting into trouble with formatting the last td element with class .child to be on next line.
I want to have the nested table inside table > tr > td.child because each table can be sorted and JavaScript sorters don't implement any grouping of rows. (My problem of having a nested table could easily be solved by moving the .child > table element into the next table > tr; however, this would break the nice nesting structure.)
Is there a way to put td.child on next row with css?
html sample:
<table>
<tr>
<td>I have</td>
<td>1</td>
<td>pie</td>
<td class="child">
<table>
<tr>
<td>I have</td>
<td>1</td>
<td>pie</td>
</tr>
</table>
</td>
</tr>
</table>
You could do something like this. You'd need to be careful cross-browser though (only checked on Chrome).