How to find text and extract a whole section with XPath - html

I am trying to parse text from a bibliographic database that uses non-standard tables. An article's fields may or may not exist, but when they exist they always use the same tags. For example, every article has a title, but only some have a keywords section. When a section is present, it is marked up with standard tags like this:
<tr>
<td align="right" valign="top" nowrap="nowrap">Database Name: </td>
<td>Social Science Database</td>
</tr>
<tr>
<td align="right" valign="top" nowrap="nowrap">Journal: </td>
<td>Social Science and Education, 2011,8(4):29-42</td>
</tr>
<tr>
<td align="right" valign="top" nowrap="nowrap">Author: </td>
<td>James H.; Chaomei C.</td>
<td align="right" valign="top" nowrap="nowrap">Type: </td>
<td>Journal</td>
</tr>
<tr>
<td align="right" valign="top" nowrap="nowrap">Article Type: </td>
<td>Research Article</td>
</tr>
<tr>
<td align="right" valign="top" nowrap="nowrap">Retrieve Type: </td>
<td>Bibliographic</td>
</tr>
<tr><td align="right" valign="top" nowrap="nowrap">Language: </td>
<td>En</td>
</tr>
<tr>
<td align="right" valign="top" nowrap="nowrap">Abstract Language: </td>
<td>En</td>
</tr>
Here is my question. I am trying to parse this text in KNIME using XPath, but I couldn't get the result I want. I want to find the <tr> that contains specific text and take the second <td> of that row. For example:
for "Database Name:" the XPath must return "Social Science Database".
I tried this code:
.//dns:tr//text()[contains(., 'Database Name:')]
But the result contains just the first <td>, and I need the second one. I also tried this code, but it returns nothing:
.//dns:tr//text()[contains(., 'Database Name:')]/dns:td[*]

You can try this:
.//dns:tr//text()[contains(., 'Database Name:')]/../../dns:td[2]
.. takes you to the parent. The matched text node sits inside a td, so you need to go 2 levels up (text node to td, td to tr) and then take the 2nd td of that row.
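If you want to check the idea outside KNIME, here is a minimal Python sketch using only the standard library. It assumes the rows are wrapped in a <table> and parsed without a namespace (so no dns: prefix), and the helper name value_for is made up for this illustration. ElementTree has no contains() or text(), so the label match is done in Python. Instead of always taking the 2nd td, it takes the cell right after the label cell, which also covers rows like Author/Type that hold two label/value pairs.

```python
import xml.etree.ElementTree as ET

# Sample rows from the question, wrapped in a <table> so they parse as one document.
# The valign/nowrap attributes are dropped for brevity.
HTML = """
<table>
<tr><td align="right">Database Name: </td><td>Social Science Database</td></tr>
<tr><td align="right">Author: </td><td>James H.; Chaomei C.</td>
    <td align="right">Type: </td><td>Journal</td></tr>
<tr><td align="right">Article Type: </td><td>Research Article</td></tr>
</table>
"""


def value_for(root, label):
    """Return the text of the <td> right after the <td> whose text equals label."""
    for row in root.iter("tr"):
        cells = row.findall("td")
        # Pair each cell with the one that follows it in the same row.
        for label_cell, value_cell in zip(cells, cells[1:]):
            if (label_cell.text or "").strip() == label:
                return (value_cell.text or "").strip()
    return None  # this article has no such field


root = ET.fromstring(HTML)
print(value_for(root, "Database Name:"))  # Social Science Database
print(value_for(root, "Type:"))           # Journal
print(value_for(root, "Keywords:"))       # None
```

In an XPath 1.0 engine the same idea could probably be written as .//dns:td[normalize-space(.)='Type:']/following-sibling::dns:td[1], but I have not tested that in KNIME.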

Indenting Text in HTML Email Template

I am attempting to place HTML in an email template of an older vendor solution that doesn't support modern HTML5 techniques. In the code sample (JSFiddle URL below), if I resize the template and make it narrower, the text wraps to the next line without an indent.
Is there a way to keep the wrapped text indented without adding hard line breaks and indenting each line by hand?
<div>
<table style="background:#8B0000 ;color:#FFF;width:100%;font-size: 11pt;font-family: Arial;">
<tr>
<td width="10"></td>
<td height="30">Test Email</td>
<td align="right"></td>
<td width="30"></td>
</tr>
</table>
<table style="background:#D9D9D9;color:#17375E;width:100%;font-size: 11pt;font-family: Arial;">
<tr>
<td width="10"></td>
<td height="30"></td>
</tr>
</table>
<table style="background:#D9D9D9;width:100%;font-family: Arial;">
<tr style="background:#FFF">
<td style="background:#D9D9D9"></td>
<td height="30">
<p style="font-size: 10.5pt;font-weight: 700;color:#555">  Test.</p>
<div style="font-size: 10pt;color:#555">
<p>  There are pending items that require your review. Please see below are the details. The request must be approved or denied within 72 business hours or it will escalate to your manager. If you have questions regarding this email, call <b>1-555-555-5555</b>.</p>
</div>
</td>
<td style="background:#D9D9D9"></td>
</tr>
</table>
<table style="background:#D9D9D9;color:#17375E;width:100%;font-size: 11pt;font-family: Arial;">
<tr>
<td></td>
<td height="30"> </td>
</tr>
</table>
</div>
Working code sample: https://jsfiddle.net/wa1z4nvr/4/
Remove the &ensp; entity and give your paragraph a margin you like.
<p style="text-indent: 0; margin: 1em;">There are pending items that require your review. Please see below are the details. The request must be approved or denied within 72 business hours or it will escalate to your manager. If you have questions regarding this email, call <b>1-555-555-5555</b>.</p>
It looks like right now you're using en-spaces (&ensp;) as your indent.
This creates a behavior where the first line appears to be indented and the rest do not.
If you want subsequent lines of text to be lined up with your first line, then you're probably not looking for an "indent".
You can remove the &ensp; from both your paragraph and your "Test." text, and add padding-left: 1em to those elements, or to the table cell containing them.
The table containing your test text might look like this:
<table style="background:#D9D9D9;width:100%;font-family: Arial;">
<tr style="background:#FFF">
<td style="background:#D9D9D9"></td>
<td height="30" style="padding-left: 1em">
<p style="font-size: 10.5pt;font-weight: 700;color:#555">Test.</p>
<div style="font-size: 10pt;color:#555">
<p>There are pending items that require your review. Please see below are the details. The request must be approved or denied within 72 business hours or it will escalate to your manager. If you have questions regarding this email, call <b>1-555-555-5555</b>.</p>
</div>
</td>
<td style="background:#D9D9D9"></td>
</tr>
</table>

HTML table design issue

I'm generating a PDF to print a bill, but in Argentina we have a simple yet tricky layout that I cannot figure out how to build. I need to make it with HTML tables.
It is very simple, but I can't do it and it's driving me crazy!
I appreciate all the help.
Here is my code:
<table border= "1" width="50%">
<tr>
<td > Merge 1 </td>
<td colspan="2" align="center">A</td>
<td> Merge 2 </td>
</tr>
<tr>
<td colspan="2">Merge 1</td>
<td colspan="2">Merge 2</td>
</tr>
</table>
Have you considered using PDF forms? Then you can design the PDF as you want and just fill in the data you need at runtime.

How to remove blank after td

I want to display an IBAN in an HTML email. To improve readability, I want to add a gap after every 4th character. To keep it usable, I want to do this with padding, so the user can copy and paste the IBAN into their online banking system without any blanks. To achieve this I have to use a table, because Microsoft Outlook ignores padding on other elements:
<table cellpadding="0" cellspacing="0" border="0">
<tr>
<td style="padding-right:5px;">
IBAN:
</td>
<td style="padding-right:5px;">
XX01
</td>
<td style="padding-right:5px;">
2345
</td>
<td style="padding-right:5px;">
6789
</td>
<td style="padding-right:5px;">
0123
</td>
<td>
4567
</td>
</tr>
</table>
My problem: When I copy & paste the IBAN a blank is added after each td element. Is there a way to remove these blanks with html or css?

HTML validation error: table row has no cells beginning in it

<table border="1">
<tr>
<td rowspan="4">A</td>
<td colspan="5">B</td>
</tr>
<tr>
<td colspan="3">E</td>
<td rowspan="2">F</td>
<td rowspan="4">C</td>
</tr>
<tr>
<td rowspan="2">G</td>
<td>
<table border="1">
<tr>
<td>1</td><td>2</td>
</tr>
<tr>
<td>3</td><td>4</td>
</tr>
</table>
</td>
<td>
<table border="1">
<tr>
<td>1</td><td>2</td>
</tr>
<tr>
<td>3</td><td>4</td>
</tr>
</table>
</td>
</tr>
<tr>
<td colspan="3">H</td>
</tr>
<tr>
<td colspan="5">D</td>
</tr>
</table>
The W3C Validator complains: "Table column 6 established by element td has no cells beginning in it." even though cell 'C' should begin in the 6th column. It displays correctly, so could it be a bug in the validator?
This appears to be a bug in the validator. It somehow fails to analyze the table properly.
Proving this might require a detailed analysis based on the HTML5 table model, but if you just add <col><col><col><col><col><col> right after the <table ...> tag, the markup passes validation – perhaps because it tells the browser that there are six columns and this helps the validator to recognize the status of the C cell properly. I accidentally noticed this when I added the col elements in order to set background colors on columns to visualize the situation better.
Consider posting a bug report, and report back here if there is any progress on it.
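To double-check that C really does begin in column 6, here is a rough Python sketch that lays the cells out on a grid. It is a simplified reading of the HTML table model, not the validator's actual algorithm. The function name cell_start_columns and the (name, colspan, rowspan) tuple format are made up for this illustration, and T1/T2 stand in for the two cells that hold the nested tables.

```python
def cell_start_columns(rows):
    """Lay cells out on a grid and return {name: (row, first_column)}, 1-based.

    rows is a list of rows; each cell is a (name, colspan, rowspan) tuple.
    """
    occupied = set()  # (row, col) slots already taken, 0-based
    starts = {}
    for r, row in enumerate(rows):
        col = 0
        for name, colspan, rowspan in row:
            # Skip slots taken by cells that span down from earlier rows.
            while (r, col) in occupied:
                col += 1
            # Mark every slot this cell covers.
            for dr in range(rowspan):
                for dc in range(colspan):
                    occupied.add((r + dr, col + dc))
            starts[name] = (r + 1, col + 1)
            col += colspan
    return starts


# The table from the question, one inner list per <tr>.
TABLE = [
    [("A", 1, 4), ("B", 5, 1)],
    [("E", 3, 1), ("F", 1, 2), ("C", 1, 4)],
    [("G", 1, 2), ("T1", 1, 1), ("T2", 1, 1)],
    [("H", 3, 1)],
    [("D", 5, 1)],
]

starts = cell_start_columns(TABLE)
print(starts["C"])  # (2, 6): C begins in row 2, column 6
```

Under this simplified layout, column 6 is first covered by B (columns 2 to 6 in row 1), and C begins in it in row 2, which is consistent with the complaint being a false positive.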

What's wrong with my table?

Here is the code for the table:
<table align="center" width="303" height="740" border="1" cellpadding="10">
<tr>
<th width="130" height="41" scope="col">URL1 - Normal</th>
<th width="121" scope="col">URL2 - Hover</th>
</tr>
<tr>
<td height="94"><img src="http://i1018.photobucket.com/albums/af309/5416339/ad-green.png"/></td>
<td><img src="http://i1018.photobucket.com/albums/af309/5416339/ad-green-h.png" alt=""/></td>
</tr>
<tr>
<td height="124"><img src="http://i1018.photobucket.com/albums/af309/5416339/ad-blue.png" alt=""/></td>
<td><img src="http://i1018.photobucket.com/albums/af309/5416339/ad-blue-h.png" alt=""/></td>
</tr>
<tr>
<td height="147"><img src="http://i1018.photobucket.com/albums/af309/5416339/ad-grey-h.png" alt=""/></td>
<td><img src="http://i1018.photobucket.com/albums/af309/5416339/ad-grey.png" alt=""/></td>
</tr>
<tr>
<td height="137"><img src="http://i1018.photobucket.com/albums/af309/5416339/ad-pink.png" alt=""/></td>
<td><img src="http://i1018.photobucket.com/albums/af309/5416339/ad-pink-h.png" alt=""/></td>
</tr>
<tr>
<td height="132"><img src="http://i1018.photobucket.com/albums/af309/5416339/ad-red.png" alt=""/></td>
<td><img src="http://i1018.photobucket.com/albums/af309/5416339/ad-red-h.png" alt=""/></td>
</tr>
<tr>
<td height="132"><img src="http://i1018.photobucket.com/albums/af309/5416339/ad-black.png" alt=""/></td>
<td><img src="http://i1018.photobucket.com/albums/af309/5416339/ad-black-h.png" alt=""/></td>
</tr>
</table>
When I insert the table, it leaves a gap in-between the table and the text. If I remove the table, then everything is fine. What's going wrong here?
Blogspot inserts line breaks for you... and they push the table down. (I haven't found a workaround yet.)
If you view the source, you can see them:
<table align="center" width="303" height="740" border="1" cellpadding="10"><br />
<tr><br />
<th width="130" height="41" scope="col">URL1 - Normal</th><br />
<th width="121" scope="col">URL2 - Hover</th><br />
</tr><br />
<tr><br />
<td height="94"><img src="http://i1018.photobucket.com/albums/af309/5416339/ad-green.png"/></td><br />
...
Because the BRs are invalid when directly inside a TABLE, TR, or after a TH or TD, the browser pushes those elements out of and above the table when rendering the DOM.
If you take a look at the source of the page, you'll notice a TON of <br/> tags interspersed with your table (but not contained in cell elements). They are rendered above the table.
It looks like your HTML is being parsed by something, and your line-breaks are being replaced with BR tags.
Quick solution: remove all linebreaks and just have the table code on one line :)
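If doing that by hand is tedious, here is a minimal Python sketch of the same quick fix. The function name collapse_to_one_line is made up, and the sample markup is a shortened version of the table from the question. It assumes every line break falls between tags; if a line break sits inside running text, joining the lines this way would glue two words together.

```python
def collapse_to_one_line(markup):
    """Join the markup into a single line by dropping line breaks and indentation."""
    return "".join(line.strip() for line in markup.splitlines())


# Shortened version of the table from the question.
TABLE_MARKUP = """
<table align="center" width="303" border="1" cellpadding="10">
  <tr>
    <th scope="col">URL1 - Normal</th>
    <th scope="col">URL2 - Hover</th>
  </tr>
</table>
"""

one_line = collapse_to_one_line(TABLE_MARKUP)
print(one_line)
```

Paste the printed line into the post editor in place of the multi-line table. Whether the blog platform then leaves it alone is something you would have to try.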
It has nothing to do with the table. It's the fact that there are 31 <br> (line break) tags before the table, and those are what create the huge gap.
It sounds like BlogSpot (or whatever blog service you are using) is adding extra <br> tags based on how you're formatting the rest of your content. Edit the source of the page if possible and manually remove them...otherwise it becomes a support issue with whatever blog platform you're on.
This has nothing to do with anything in your table markup. Viewing the HTML source of that page shows about 30 <br> tags ahead of the table. They are obviously responsible for the extra space.
Why you get 30 <br> tags when inserting a table must have something to do with how blogspot.com is formatting your content. Your best bet is to try editing the HTML by hand to remove the <br> tags. If you can't do that, or if the <br> tags don't show up when editing the HTML, it's a question for customer service at Blogspot.