XPath - getting data including HTML tags - html

I have WordPress with a Web Scraper tool (PHP in the background) that uses XPath to retrieve data from other websites.
I'm facing a problem: I get all the data I need, but the HTML tags are stripped from it.
The XPath expression I'm using:
//table/tbody/tr[td//text()[contains(., 'FFF')]]
Data I'm using:
<table id="myTable">
<thead>
<tr>
<th>#</th>
<th>First</th>
<th>Second</th>
<th>G</th>
<th>Z</th>
<th>C</th>
</tr>
</thead>
<tbody>
<tr>
<td>1.</td>
<td>D</td>
<td>FFF</td>
<td class="txt-c">6</td>
<td class="txt-c">0</td>
<td class="txt-c">0</td>
</tr>
<tr>
<td>2.</td>
<td>C</td>
<td>YYY</td>
<td class="txt-c">4</td>
<td class="txt-c">1</td>
<td class="txt-c">0</td>
</tr>
<tr>
<td>3.</td>
<td>B</td>
<td>ZZZ</td>
<td class="txt-c">4</td>
<td class="txt-c">0</td>
<td class="txt-c">0</td>
</tr>
<tr>
<td>4.</td>
<td>A</td>
<td>FFF</td>
<td class="txt-c">3</td>
<td class="txt-c">0</td>
<td class="txt-c">0</td>
</tr>
</tbody>
</table>
Result I'm getting:
1. D FFF 6 0 0 4. A FFF 3 0 0
Result I need:
<tr>
<td>1.</td>
<td>D</td>
<td>FFF</td>
<td class="txt-c">6</td>
<td class="txt-c">0</td>
<td class="txt-c">0</td>
</tr>
<tr>
<td>4.</td>
<td>A</td>
<td>FFF</td>
<td class="txt-c">3</td>
<td class="txt-c">0</td>
<td class="txt-c">0</td>
</tr>
Tool I'm using: https://wordpress.org/plugins/wp-web-scraper/
Exact shortcode I'm using in WordPress (URL changed):
[wpws url='https://myweb.comm' query='%2F%2Ftable%2Ftbody%2Ftr%5Btd%2F%2Ftext()%5Bcontains(.%2C%20%27FFF%27)%5D%5D' output='html' query_type='xpath' querydecode='1']
All I need is the same table, filtered, with its HTML tags intact.
Thank you for your answers.

Thank you for your thoughts.
I finally managed to get it working. The plugin itself works fine. The only problem was a missing pair of <table> tags in the WordPress post where the shortcode is used.
Solution:
<table>
[wpws url='https://yoururl.com' query='your query' output='html' query_type='xpath']
</table>
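To illustrate why the wrapper matters: the query returns bare <tr> elements, and rows outside a <table> are not valid HTML, so the row and cell tags tend to be discarded and only the text survives. Below is a sketch of the markup the post should end up with once the shortcode has been expanded. It assumes the plugin emits the matched <tr> elements unchanged, which I have not verified beyond the result described above.

```html
<!-- Assumed output after shortcode expansion: the two rows matching 'FFF',
     now inside the <table> wrapper written in the post. -->
<table>
  <tr>
    <td>1.</td>
    <td>D</td>
    <td>FFF</td>
    <td class="txt-c">6</td>
    <td class="txt-c">0</td>
    <td class="txt-c">0</td>
  </tr>
  <tr>
    <td>4.</td>
    <td>A</td>
    <td>FFF</td>
    <td class="txt-c">3</td>
    <td class="txt-c">0</td>
    <td class="txt-c">0</td>
  </tr>
</table>
```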

Related

How can I merge cell 28 and 29

When I use collspan=2 on cell 28, the table doesn't render properly. What I want is to merge cells 28 and 29 without breaking the layout.
<table width="100%" border="2">
<tr>
<td colspan="2" rowspan="2">1</td>
<td colspan="2">3</td>
<td colspan="2">5</td>
</tr>
<tr>
<td colspan="2" rowspan="2">9</td>
<td colspan="2" rowspan="2">11</td>
</tr>
<tr>
<td>13</td>
<td rowspan="3">14</td>
</tr>
<tr>
<td rowspan="3">19</td>
<td rowspan="3">21</td>
<td colspan="2">22</td>
<td>24</td>
</tr>
<tr>
<td rowspan="2">28</td>
<td rowspan="2">29</td>
<td rowspan="2">30</td>
</tr>
<tr>
<td>32</td>
</tr>
</table>
Simple typo: you tried collspan, but the attribute is colspan.
The full table row should look like this:
<tr>
<td rowspan="2" colspan="2">28 + 29</td>
<td rowspan="2">30</td>
</tr>
A word of advice from a very old coder:
learning this level of table trickery was important in 1999,
before CSS, flexbox and grid came along.
Now it's irrelevant.
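For comparison, here is a minimal sketch of the same kind of merge done with CSS grid instead of colspan. The class names (grid, wide) and the three-column layout are assumptions made up for illustration, not taken from the question, and a real data table may still be better served by <table> markup.

```html
<!-- Hypothetical class names; a three-column grid where one item spans two columns. -->
<style>
  .grid { display: grid; grid-template-columns: repeat(3, 1fr); gap: 2px; }
  .grid > div { border: 1px solid #000; padding: 4px; }
  .wide { grid-column: span 2; } /* plays the role of colspan="2" */
</style>
<div class="grid">
  <div>19</div>
  <div>21</div>
  <div>22</div>
  <div class="wide">28 + 29</div>
  <div>30</div>
</div>
```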
Please check the image #epascarello #bjelli

How to align <td>s from 1 table against another?

I'm new to HTML/CSS, and I'm having a hard time aligning the opening days, hours, and closing days of the Chicken shop with the Open, Hours, and Close headers from the table. I want the days and times to sit directly below each category: Open (Sun/Mon..), Hours (9-3pm), Close (Tues/Fri). My code is below; any advice would be greatly appreciated! Thank you!
<table id="shops">
<tr>
<th>Shops</th>
<th>Location</th>
<th>Store Hours</th>
<th>Products</th>
</tr> <!-- Nested table for store hours and product types-->
<tr>
<td colspan="2"></td>
<td>
<table id="hours_table">
<tr>
<th>OPEN</th>
<th>HOURS</th>
<th>CLOSE</th>
</tr>
</table>
</td>
<td>
<table id="products_table">
<tr>
<th>Animals</th>
<th>Cost</th>
<th>Items</th>
<th>Cost</th>
</tr>
</table>
</td>
</tr>
<tr>
<td id="chicken_shop">Cuckoo House Chicken Shop</td>
<td>West Natura</td>
<td>
<table id="chicken_hours">
<tr>
<td>SUN/MON/WED/THURS/SAT</td>
<td>9AM - 3PM</td>
<td>TUES/FRI</td>
</tr>
</table>
</td>
</table>
Hi, here is the solution:
<table id="shops" border='1'>
<tr>
<th>Shops</th>
<th>Location</th>
<th>Store Hours</th>
<th colspan="4">Products</th>
</tr> <!-- Nested table for store hours and product types-->
<tr>
<td id="chicken_shop">Cuckoo House Chicken Shop</td>
<td>West Natura</td>
<td>
<table width="333" id="hours_table" border='1'>
<tr>
<td>OPEN</td>
<td>HOURS</td>
<td>CLOSE</td>
</tr>
<tr>
<td>SUN/MON/WED/THURS/SAT</td>
<td>9AM - 3PM</td>
<td>TUES/FRI</td>
</tr>
</table>
</td>
<th>Animals</th>
<th>Cost</th>
<th>Items</th>
<th>Cost</th>
</tr>
</table>
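Another way to get the alignment, sketched here as an assumption rather than as the approach above, is to drop the nested tables and use one table with a two-row header. Every sub-heading then shares a column with its data cell. The product cells are left empty because the question does not give any product data.

```html
<table id="shops" border="1">
  <tr>
    <!-- Shops and Location span both header rows -->
    <th rowspan="2">Shops</th>
    <th rowspan="2">Location</th>
    <!-- Group headings span their sub-columns -->
    <th colspan="3">Store Hours</th>
    <th colspan="4">Products</th>
  </tr>
  <tr>
    <th>OPEN</th>
    <th>HOURS</th>
    <th>CLOSE</th>
    <th>Animals</th>
    <th>Cost</th>
    <th>Items</th>
    <th>Cost</th>
  </tr>
  <tr>
    <td id="chicken_shop">Cuckoo House Chicken Shop</td>
    <td>West Natura</td>
    <td>SUN/MON/WED/THURS/SAT</td>
    <td>9AM - 3PM</td>
    <td>TUES/FRI</td>
    <!-- placeholder cells for the four product columns -->
    <td></td>
    <td></td>
    <td></td>
    <td></td>
  </tr>
</table>
```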
One option is to use <td> instead of <th> for the header cells, even inside the table head. (<th> is valid there too, but browsers render it bold and centered by default.)
<table>
<thead>
<tr>
<td>Shops</td>
<td>Something</td>
<td>Something#2</td>
</tr>
</thead>
<tbody>
<tr>
<td>Something in the body of the table</td>
<td>something</td>
<td>Something</td>
</tr>
</tbody>
</table>
I suggest w3schools.com for additional info. You can also add borders if you want them around the cells.

HTML: can not align table cells

I generate automatic emails about TFS builds. I need a table in the email, so I build an HTML table and assign it to the Body property of a System.Net.Mail.MailMessage instance:
msg.Body = message + table;
My C# code works fine, but I cannot get the HTML table to align properly. This is part of the generated HTML:
<table border="1" style="border: 1px solid; ">
<tr>
<td rowspan="5">
<b>Requirement #1172660: </b>
<br/>Malicious apps weren't recognized on desktop (webroot did not respond) <br/>
</td>
</tr>
<tr>
<td valign="top">
<b>Changeset #444273:</b>
<br/>By John Smith: increase webroot external service timeout 11/05/2016 10:38:59<br/>
</td>
</tr>
<tr>
<td valign="top">
<b>Changeset #455754:</b>
<br/>By John Smith: Added retry mechanism to external service call 12/07/2016 18:19:23<br/>
</td>
</tr>
<tr>
<td valign="top">
<b>Changeset #455969:</b>
<br/>By John Smith: Increased webroot timeout to 30 sec 13/07/2016 15:10:42<br/>
</td>
</tr>
<tr>
<td valign="top">
<b>Changeset #458813:</b>
<br/>By John Smith: Fixed bug in soapfull request retry 28/07/2016 12:16:16<br/>
</td>
</tr>
<tr>
<td rowspan="5">
<b>Requirement #1172660: </b>
<br/>Malicious apps weren't recognized on desktop (webroot did not respond) <br/>
</td>
</tr>
<tr>
<td valign="top">
<b>Changeset #444273:</b>
<br/>By John Smith: increase webroot external service timeout 11/05/2016 10:38:59<br/>
</td>
</tr>
<tr>
<td valign="top">
<b>Changeset #455754:</b>
<br/>By John Smith: Added retry mechanism to external service call 12/07/2016 18:19:23<br/>
</td>
</tr>
<tr>
<td valign="top">
<b>Changeset #455969:</b>
<br/>By John Smith: Increased webroot timeout to 30 sec 13/07/2016 15:10:42<br/>
</td>
</tr>
<tr>
<td valign="top">
<b>Changeset #458813:</b>
<br/>By John Smith: Fixed bug in soapfull request retry 28/07/2016 12:16:16<br/>
</td>
</tr>
<tr>
<td rowspan="2">
<b>Requirement #1180032: </b>
<br/>Orange FR - Change text before Factory Reset / Flash <br/>
</td>
</tr>
<tr>
<td valign="top">
<b>Changeset #455265:</b>
<br/>By John Smith: 11/07/2016 10:33:46<br/></td>
</tr>
</table>
When I paste this code into an online HTML editor (http://www.w3schools.com/tags/tryit.asp?filename=tryhtml_table_span), I get a satisfactory result:
Although, as you can see, the right cell is aligned slightly toward the bottom.
Nevertheless, my headache starts when I receive the email. The generated table looks different:
As you can see, the table in the email has considerable empty space at the top of the right-hand cells.
I've never dealt with HTML in my life, and I don't have a clue how to remove this gap, align the content to the top, or keep the same look and feel as in the HTML editor (the first screenshot).
I would be grateful for any help or advice.
I found the solution: I needed to change my table structure so that the cell with rowspan shares a row with the first changeset.
Now I use this:
<table>
<tr>
<td rowspan="1">Bug 1</td >
<td>Changeset 1</td >
</tr>
<tr>
<td rowspan="2">Bug 1</td >
<td>Changeset 1</td >
</tr>
<tr>
<td >Changeset 2</td >
</tr>
<tr>
<td rowspan="3">Bug 2</td >
<td >Changeset 1</td >
</tr>
<tr>
<td >Changeset 2</td >
</tr>
<tr>
<td >Changeset 3</td >
</tr>
<tr>
<td rowspan="4">Bug 2</td >
<td >Changeset 1</td >
</tr>
<tr>
<td >Changeset 2</td >
</tr>
<tr>
<td >Changeset 3</td >
</tr>
<tr>
<td >Changeset 3</td >
</tr>
</table>
Instead of this:
<table>
<tr>
<td rowspan="3">Bug 1</td>
</tr>
<tr>
<td>Changeset 1</td>
</tr>
<tr>
<td>Changeset 2</td>
</tr>
<tr>
<th rowspan="5">Bug 2</th>
</tr>
<tr>
<th>Changeset 1</th>
</tr>
<tr>
<th>Changeset 2</th>
</tr>
<tr>
<th>Changeset 3</th>
</tr>
<tr>
<th>Changeset 4</th>
</tr>
</table>
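Applied to the data from the question, the corrected structure might look like the sketch below. The rowspan of each requirement cell equals its number of changesets, and the first changeset sits in the same row as the requirement, so no row is left with an empty right-hand column. The valign="top" on the requirement cells is my own addition and an assumption about the desired look.

```html
<table border="1" style="border: 1px solid;">
  <tr>
    <!-- rowspan = number of changesets (4), not 4 + 1 -->
    <td rowspan="4" valign="top">
      <b>Requirement #1172660: </b>
      <br/>Malicious apps weren't recognized on desktop (webroot did not respond)
    </td>
    <td valign="top">
      <b>Changeset #444273:</b>
      <br/>By John Smith: increase webroot external service timeout 11/05/2016 10:38:59
    </td>
  </tr>
  <tr>
    <td valign="top">
      <b>Changeset #455754:</b>
      <br/>By John Smith: Added retry mechanism to external service call 12/07/2016 18:19:23
    </td>
  </tr>
  <tr>
    <td valign="top">
      <b>Changeset #455969:</b>
      <br/>By John Smith: Increased webroot timeout to 30 sec 13/07/2016 15:10:42
    </td>
  </tr>
  <tr>
    <td valign="top">
      <b>Changeset #458813:</b>
      <br/>By John Smith: Fixed bug in soapfull request retry 28/07/2016 12:16:16
    </td>
  </tr>
  <tr>
    <!-- one changeset, so the default rowspan of 1 is enough -->
    <td valign="top">
      <b>Requirement #1180032: </b>
      <br/>Orange FR - Change text before Factory Reset / Flash
    </td>
    <td valign="top">
      <b>Changeset #455265:</b>
      <br/>By John Smith: 11/07/2016 10:33:46
    </td>
  </tr>
</table>
```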

Trying to layout a table in html

I'm trying to layout a table containing four columns: column 1 cell is six rows deep; column 2 cell is six rows deep; column 3 contains a cell four rows deep, and 2 cells one row deep; column 4 contains a cell three rows deep and a cell one row deep, with the last two cells in the column empty and unspecified.
ABCD
ABCD
ABCD
ABCE
ABFx
ABGx
I tried to follow what I think is the rule for doing this, namely: the first <tr> contains <td>s for everything in the first row; the second <tr> contains the <td>(s) to fill in columns for the first non-specified column(s) [in this case the cell called "E"], and the next two <tr>s contain a <td> each for "F" and "G".
The following code is my attempt:
<table border='1'>
<tr>
<td rowspan='6'>A<br/>A<br/>A<br/>A<br/>A<br>A</td>
<td rowspan='6'>B<br/>B<br/>B<br/>B<br/>B<br>B</td>
<td rowspan='4'>C<br/>C<br/>C<br/>C</td>
<td rowspan='3'>D<br/>D<br/>D</td>
</tr>
<tr>
<td>E</td>
</tr>
<tr>
<td>F</td>
</tr>
<tr>
<td>G</td>
</tr>
</table>
This gives me:
ABCDx
ABCDx
ABCDE
ABCDF
ABCGx
If I "guide it" with an unwanted column:
1ABCD
2ABCD
3ABCD
4ABCE
5ABF
6ABG
using:
<table border='1'>
<tr>
<th>1</th>
<td rowspan='6'>A<br/>A<br/>A<br/>A<br/>A<br>A</td>
<td rowspan='6'>B<br/>B<br/>B<br/>B<br/>B<br>B</td>
<td rowspan='4'>C<br/>C<br/>C<br/>C</td>
<td rowspan='3'>D<br/>D<br/>D</td>
</tr>
<tr>
<th>2</th>
</tr>
<tr>
<th>3</th>
</tr>
<tr>
<th>4</th>
<td>E</td>
</tr>
<tr>
<th>5</th>
<td>F</td>
</tr>
<tr>
<th>6</th>
<td>G</td>
</tr>
</table>
it comes out as expected. So what am I doing wrong?
Add two empty rows before the row containing E:
<table border='1'>
<tr>
<td rowspan='6'>A<br/>A<br/>A<br/>A<br/>A<br>A</td>
<td rowspan='6'>B<br/>B<br/>B<br/>B<br/>B<br>B</td>
<td rowspan='4'>C<br/>C<br/>C<br/>C</td>
<td rowspan='3'>D<br/>D<br/>D</td>
</tr>
<tr></tr>
<tr></tr>
<tr>
<td>E</td>
</tr>
<tr>
<td>F</td>
</tr>
<tr>
<td>G</td>
</tr>
</table>
First of all, for rowspan='6' to take effect, the table needs six rows, like this: http://codepen.io/Toomean/pen/dMeaqd
<table border='1'>
<tr>
<td rowspan='6'>A<br/>A<br/>A<br/>A<br/>A<br>A</td>
<td rowspan='6'>B<br/>B<br/>B<br/>B<br/>B<br>B</td>
<td rowspan='4'>C<br/>C<br/>C<br/>C</td>
<td rowspan='3'>D<br/>D<br/>D</td>
</tr>
<tr></tr>
<tr></tr>
<tr>
<td>E</td>
</tr>
<tr>
<td>F</td>
</tr>
<tr>
<td>G</td>
</tr>
</table>
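The rule behind both answers can be seen in a smaller table. Each <tr> starts a new row, and a cell is placed in the first column of that row not already occupied by a cell spanning down from above. The three-row table below is a made-up illustration, not part of the question.

```html
<table border="1">
  <tr>                      <!-- row 1 -->
    <td rowspan="3">A</td>  <!-- occupies column 1 in rows 1-3 -->
    <td rowspan="2">B</td>  <!-- occupies column 2 in rows 1-2 -->
  </tr>
  <tr></tr>                 <!-- row 2: A and B continue, no new cells start here -->
  <tr>                      <!-- row 3: column 1 is still taken by A -->
    <td>C</td>              <!-- so C lands in column 2, directly under B -->
  </tr>
</table>
```

Without the empty <tr>, C would belong to row 2, where columns 1 and 2 are both occupied, so it would be pushed into a new third column. That is the same effect as E and F ending up in a fifth column in the question.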

html table format rows and columns

I am having problems with formatting a table. For some reason, the colspans and rowspans are not working and the cells are just dropped into the first row and column available. I have made column groups specifying the width of the columns. I have the code here:
<table class = “programs” border=“1”
summary=“Lists the morning programs aired by KPAF from 5:00 a.m. to 12:00p.m.(central time).>
<caption> All Times Central </caption>
<colgroup>
<col class = “timeColumn” />
<col class = “wDayColumns” span =“5”/>
<col class = “wEndColumns” span=“2”/>
</colgroup>
<thead>
<th>Time</th>
<th>Monday</th>
<th>Tuesday</th>
<th>Wednesday</th>
<th>Thursday</th>
<th>Friday</th>
<th>Saturday</th>
<th>Sunday</th>
</thead>
<tbody>
<tr>
<th>5:00</th>
<td colspan =“5” rowspan=“4”>Dawn Air</td>
<td colspan =“1”>Dawn Air Weekends</td>
<td colspan =“1”>Sunday Magazine</td>
</tr>
<tr>
<th>5:30</th>
</tr>
<tr>
<th>6:00</th>
<td col = “1” rowspan = “2”>Weekend Reflections</td>
</tr>
<tr>
<th>6:30</th>
</tr>
<tr>
<th>7:00</th>
<td colspan=“5”> Local News</td>
<td colspan=“1” rowspan=“2”>Weekend Wrap</td>
<td colspan=“1” rowspan=“2”>Radio U</td>
</tr>
<tr>
<th>7:30</th>
<td colspan=“5”>World News Feed</td>
</tr>
<tr>
<th>8:00</th>
<td colspan=“5” rowspan=“4”>Classical Roots</td>
<td colspan=“1” rowspan=“3”>What can you say?</td>
<td colspan=“1” rowspan=“4”>University on the air</td>
</tr>
<tr>
<th>8:30</th>
</tr>
<tr>
<th>9:00</th>
</tr>
<tr>
<th>9:30</th>
<td colspan=“1” rowspan=“4”>Animal Talk</td>
</tr>
<tr>
<th>10:00</th>
<td colspan=“5” rowspan=“4”>Symphony City</td>
<td colspan=“1” rowspan=“1”>Word Play</td>
</tr>
<tr>
<th>10:30</th>
<td colspan=“1” rowspan=“1”>Brain Stew</td>
</tr>
<tr>
<th>11:00</th>
<td colspan=“1” rowspan=“3”>Opera Live from the East Coast</td>
<td colspan=“1” rowspan=“1”>The Inner Mind</td>
</tr>
<tr>
<th>11:30</th>
<td colspan=“1” rowspan=“1”> Grammar Rules!!</td>
</tr>
<tr>
<th>12:00</th>
<td colspan=“5” rowspan=“1”>Book Club</td>
<td colspan=“1” rowspan=“1”>Weekend Wrap</td>
</tr>
</tbody>
</table>
It's because you are using curly ("smart") quotes, “ and ”. You have to use straight double quotes, " (ASCII code 34).
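As a sketch, here is the start of the table from the question with straight quotes. While retyping I made three further changes that are my own assumptions about the intent: the summary attribute gets its missing closing quote, the header cells are wrapped in a <tr>, and the stray col attribute on the Weekend Reflections cell is dropped, since <td> has no such attribute. Only the first four body rows are shown, and their contents and spans are copied from the question as they are.

```html
<table class="programs" border="1"
       summary="Lists the morning programs aired by KPAF from 5:00 a.m. to 12:00 p.m. (central time).">
  <caption>All Times Central</caption>
  <colgroup>
    <col class="timeColumn" />
    <col class="wDayColumns" span="5" />
    <col class="wEndColumns" span="2" />
  </colgroup>
  <thead>
    <tr>
      <th>Time</th>
      <th>Monday</th>
      <th>Tuesday</th>
      <th>Wednesday</th>
      <th>Thursday</th>
      <th>Friday</th>
      <th>Saturday</th>
      <th>Sunday</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>5:00</th>
      <td colspan="5" rowspan="4">Dawn Air</td>
      <td>Dawn Air Weekends</td>
      <td>Sunday Magazine</td>
    </tr>
    <tr>
      <th>5:30</th>
    </tr>
    <tr>
      <th>6:00</th>
      <td rowspan="2">Weekend Reflections</td>
    </tr>
    <tr>
      <th>6:30</th>
    </tr>
  </tbody>
</table>
```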