I am a newbie.
I don't know how to convert an HTML table to PDF with FPDF.
This is my HTML code:
<html>
<body>
<table border=1 align=center>
<tr>
<td> </td>
<td>PO1</td>
<td>PO2</td>
</tr>
<tr>
<td>LO1</td>
<td> "THIS WILL TAKE THE VALUE FROM DATABASE " </td>
<td> "THIS WILL TAKE THE VALUE FROM DATABASE " </td>
</tr>
<tr>
<td>LO2</td>
<td> "THIS WILL TAKE THE VALUE FROM DATABASE " </td>
<td> "THIS WILL TAKE THE VALUE FROM DATABASE " </td>
</tr>
</table>
</body>
</html>
Can anyone tell me how to do it?
Please take a look at the example at http://www.fpdf.org/en/script/script41.php
I'm using Scrapy (Python) to try to grep data from a site.
How can I grep this structure with XPath?
<div class="foo">
<h3>Need this text_1</h3>
<table class="thesamename">
<tbody>
<tr>
<td class="tmp_year">
45767
</td>
<td class="tmp_outcome">
<b>Win_1</b><br>
<span class="tmp_category">TEST_1</span>
</td>
</tr>
<tr>
<td class="tmp_year">
1232004
</td>
<td class="tmp_outcome">
<b>Win_2</b><br>
<span class="tmp_category">TEST_2</span>
</td>
</tr>
<tr>
<td class="tmp_year">
122004
</td>
<td class="tmp_outcome">
<b>Win_3</b><br>
<span class="tmp_category">TEST_3</span>
</td>
</tr>
</tbody>
</table>
<h3>Need this text_2</h3>
<table class="thesamename">
<tbody>
<tr>
<td class="tmp_year">
234
</td>
<td class="tmp_outcome">
<b>Win_E</b><br>
<span class="tmp_category">TEST_E</span>
</td>
</tr>
<tr>
<td class="tmp_year">
3476
</td>
<td class="tmp_outcome">
<b>Win_C</b><br>
<span class="tmp_category">TEST_C</span>
</td>
</tr>
</tbody>
</table>
<h3>Need this text_3</h3>
<table class="thesamename">
<tbody>
<tr>
<td class="tmp_year">
85567
</td>
<td class="tmp_outcome">
<b>Win_T</b><br>
<span class="tmp_category">TEST_T</span>
</td>
</tr>
<tr>
<td class="tmp_year">
435656
</td>
<td class="tmp_outcome">
<b>Win_A</b><br>
<span class="tmp_category">TEST_A</span>
</td>
</tr>
<tr>
<td class="tmp_year">
980
</td>
<td class="tmp_outcome">
<b>Win_Z</b><br>
<span class="tmp_category">TEST_Z</span>
</td>
</tr>
</tbody>
</table>
</div>
I would like to have output with this structure:
"Section": {
Need this text_1 :
[45767 : Win_1 : TEST_1]
[1232004 : Win_2 : TEST_2]
[122004: Win_3 : TEST_3]
,
Need this text_2:
[234 : Win_E : TEST_E]
[3476 : Win_C : TEST_C]
,
Need this text_3:
[85567 : Win_T : TEST_T]
[435656 : Win_A : TEST_A]
[980: Win_Z : TEST_Z]
}
How can I write the proper XPath selector to get this structure?
I can separately take all the "h3" tags, all the "a" tags, and then all the tags with a class, but how can I match them up?
Grep, you say?! LOL. Well, you would be entirely wrong to call it that, but for the sake of keeping the jargon clean: what you are doing is parsing/extracting. So, new to Scrapy? Or to the web dev side of things? No matter. There's no way I could expect to teach you in one answer how to XPath/regex like a pro; the only way is for you to keep at it, but I'll throw in my input.
First of all, XPath is amazingly useful when it comes to websites that aren't necessarily built to standard, which doesn't make them bad per se. But the HTML snippet you gave is structured all right, so I'd recommend CSS selectors. These are the values:
year = response.css('td.tmp_year a::text').extract()
outcome = response.css('td.tmp_outcome b::text').extract()
category = response.css('span.tmp_category::text').extract()
PRO-TIP: Whenever you find it necessary, you can save a web page as an HTML file and point scrapy shell at it by its file path. So I saved your HTML snippet to a file on my desktop, then ran:
scrapy shell file:///home/scriptso/Desktop/letsGREPlol.html
Anyway, as for XPath, since you asked, lol: piece of cake. Let's compare the XPath with the CSS, and tell me if you can see it.
response.css('td.tmp_outcome b::text').extract()
So that is a td tag whose class name is tmp_outcome, then the next node is a bold tag, which is where the text is; ::text declares that we want the text.
response.xpath('//td[@class="tmp_outcome"]/b/text()').extract()
So the XPath is basically saying: we start with a pattern anywhere in the page matching a td tag with class="tmp_outcome", then the bold tag. In XPath, /text() selects the text, and /@href is for... yeah, you guessed it.
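None of the selectors above ties the rows to their h3, which is what the question actually asks for. Here is a minimal sketch of the grouping idea in Python, assuming the markup is well-formed. It uses the standard library's ElementTree instead of Scrapy so that it runs on its own; SAMPLE and group_by_heading are made-up names, and SAMPLE is a shortened, cleaned-up copy of the question's HTML (tables closed, br self-closed). The point is to walk the children of the div in order and to run the per-row queries relative to each tr rather than against the whole page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened and well-formed version of the question's snippet.
SAMPLE = """
<div class="foo">
  <h3>Need this text_1</h3>
  <table class="thesamename"><tbody>
    <tr>
      <td class="tmp_year"> 45767 </td>
      <td class="tmp_outcome"><b>Win_1</b><br/><span class="tmp_category">TEST_1</span></td>
    </tr>
    <tr>
      <td class="tmp_year"> 1232004 </td>
      <td class="tmp_outcome"><b>Win_2</b><br/><span class="tmp_category">TEST_2</span></td>
    </tr>
  </tbody></table>
  <h3>Need this text_2</h3>
  <table class="thesamename"><tbody>
    <tr>
      <td class="tmp_year"> 234 </td>
      <td class="tmp_outcome"><b>Win_E</b><br/><span class="tmp_category">TEST_E</span></td>
    </tr>
  </tbody></table>
</div>
"""


def group_by_heading(markup):
    """Return {h3 text: [(year, outcome, category), ...]} in document order."""
    root = ET.fromstring(markup.strip())
    sections = {}
    current = None
    for child in root:  # direct children of the div, in order
        if child.tag == "h3":
            current = child.text.strip()
            sections[current] = []
        elif child.tag == "table" and current is not None:
            for row in child.iter("tr"):
                # Every query below is relative to this one row.
                year = row.find("td[@class='tmp_year']").text.strip()
                outcome = row.find("td[@class='tmp_outcome']/b").text.strip()
                category = row.find(".//span[@class='tmp_category']").text.strip()
                sections[current].append((year, outcome, category))
    return sections


for heading, rows in group_by_heading(SAMPLE).items():
    print(heading, rows)
```

In Scrapy the same idea would probably mean looping over the row selectors and calling .xpath() on each one with a path that starts with ./ so that it stays inside that row.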
I have a text file something like this.
<tbody>
<tr>
<td>
String1
</td>
<td>
String2
</td>
<td>
String3
</td>
...
...
<td>
StringN
</td>
</tr>
</tbody>
This is the output that I want.
<tbody>
<tr>
String1;String2;String3;... ...;StringN
</tr>
</tbody>
Here is my BUGGY code.
sed '{
:a
N
$!ba
s|<td.*>\(.*\)</td>|\1|
}'
I wanted to remove all <td> and </td> tags and get all the strings delimited by some string (I can filter those strings later using that as the delimiter character). I used the solution given in this URL. The output is not what I expected.
This is the actual Code
<tbody>
<tr>
<td>
120.52.72.58:80
</td>
<td>
HTTP
</td>
<td>
<span class="text-danger">Transparent</span>
</td>
<td>
<abbr title="2016-12-15 00:07:46">12h ago</abbr>
</td>
<td class="small">
<span class="text-muted">—</span>
</td>
<td>
<img src="/flags/png/cn.png" alt="China (CN)" title="China (CN)" onerror="this.style.display='none'"> <abbr title="China">CN</abbr>
</td>
<td class="small">
Beijing
</td>
<td class="small">
Beijing
</td>
<td class="small">
China Unicom IP network
</td>
<td class="small">
<span class="text-muted">—</span>
</td>
</tr>
</tbody>
Output does not come as I expected.
Your sed code does not work because the <td.*>\(.*\)</td> matches the part of the pattern space from the first <td up to the last </td> due to the greediness of the * quantifier. Unfortunately, sed doesn't support a more modern regex flavor with ungreedy quantifiers; thus, some other tool would be more appropriate.
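To see the greediness in isolation, here is a small Python sketch (Python's re module does have non-greedy quantifiers, unlike sed). The sample string and the variable names are made up for illustration; the first pattern is the one from the sed attempt.

```python
import re

# Made-up sample in the shape of the question's input.
text = "<tr>\n<td>\nString1\n</td>\n<td>\nString2\n</td>\n<td>\nString3\n</td>\n</tr>"

# Greedy, as in the sed attempt: the first .* swallows everything up to the
# opening tag of the LAST cell, so only that one cell is captured.
greedy = [s.strip() for s in re.findall(r"<td.*>(.*)</td>", text, flags=re.S)]

# Non-greedy: .*? stops at the first </td>, so every cell is captured.
lazy = [s.strip() for s in re.findall(r"<td[^>]*>(.*?)</td>", text, flags=re.S)]

print(greedy)
print(";".join(lazy))
```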
I wanted to remove all <td> and </td> tags and get all the strings delimited by some string …
If those tags are always (as in your examples) on a separate line, we can make do with a simple sed command:
sed '/<\/*td.*>/d'
All the strings are thereafter delimited by some string which is \n followed by spaces.
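If the cells contain nested tags (as in the actual table above, with its span, abbr and img elements), deleting the td lines still leaves that inner markup behind. A rough alternative, outside sed, is to let an HTML parser collect the text of each cell. This is only a sketch using Python's standard html.parser; CellText and join_cells are names invented here, and the ";" separator is just the one from the desired output.

```python
from html.parser import HTMLParser


class CellText(HTMLParser):
    """Collect the text content of every <td>, ignoring nested tags."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._buf = None  # list of text chunks while inside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._buf = []

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._buf is not None:
            self.cells.append("".join(self._buf).strip())
            self._buf = None


def join_cells(markup, sep=";"):
    """Return the text of all cells in `markup`, joined by `sep`."""
    parser = CellText()
    parser.feed(markup)
    parser.close()
    return sep.join(parser.cells)


sample = (
    "<tr>\n<td>\n120.52.72.58:80\n</td>\n<td>\nHTTP\n</td>\n"
    '<td>\n<span class="text-danger">Transparent</span>\n</td>\n</tr>'
)
print(join_cells(sample))
```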
I have a structure similar to this:
<table class="superclass">
<tr>
<td>
</td>
<td>
</td>
</tr>
<tr>
<td>
</td>
<td>
</td>
</tr>
</table>
<table cellspacing="0">
<tr>
<td>
</td>
<td>
</td>
</tr>
<tr>
<td>
</td>
<td>
</td>
</tr>
</table>
This is how I get the first table with class:
HtmlNode firstTable = document.DocumentNode.SelectSingleNode("//table[@class=\"superclass\"]");
Then I read the data. However, I don't know how to get straight to the other table and read its data too. Any ideas?
I'd rather avoid counting which table it is and then using index to that table.
XPath has a following-sibling axis, which lets you select elements that follow the current context element at the same level:
HtmlNode firstTable = document.DocumentNode.SelectSingleNode("//table[@class=\"superclass\"]");
HtmlNode nextTable = firstTable.SelectSingleNode("following-sibling::table");
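For anyone who wants to see what that axis selects without HtmlAgilityPack, here is a rough Python sketch. Python's standard ElementTree has no following-sibling axis, so the helper below emulates it by scanning the parent's children; following_sibling and DOC are hypothetical names, and DOC is a tiny stand-in for the page.

```python
import xml.etree.ElementTree as ET

# Tiny, well-formed stand-in for the page in the question.
DOC = """<body>
<table class="superclass"><tr><td>a</td></tr></table>
<p>in between</p>
<table cellspacing="0"><tr><td>b</td></tr></table>
</body>"""


def following_sibling(root, node, tag):
    """Return the first sibling after `node` with the given tag, or None.

    Emulates the XPath step following-sibling::tag[1] relative to `node`.
    """
    for parent in root.iter():
        children = list(parent)
        if node in children:
            for sibling in children[children.index(node) + 1:]:
                if sibling.tag == tag:
                    return sibling
            return None
    return None


root = ET.fromstring(DOC)
first_table = root.find(".//table[@class='superclass']")
next_table = following_sibling(root, first_table, "table")
print(next_table.attrib)
```

Note that the paragraph between the two tables is skipped: the axis looks at all later siblings, not just the immediately next one.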
If you want to access multiple nodes, consider the SelectNodes(xpath) method instead of SelectSingleNode(xpath).
Here is some sample code for reference; it may not fit your needs exactly.
var tables = htmlDocument.DocumentNode.SelectNodes("//table");
foreach (HtmlNode table in tables)
{
if (table.GetAttributeValue("class", "").Contains("superclass"))
{
//this is the table of class="superclass"
}
else
{
//this is the other table.
}
}
I need to change quite a few HTML entries in a MySQL database. My problem is that some tags need to be replaced while the surrounding code needs to stay the same. In detail: all td tags in tr tags with the class "kopf" need to be changed to th tags (along with the corresponding closing tags).
It would not be a problem without the closing tags:
update `tt_content` set `bodytext` = replace(`bodytext`,'<tr class="kopf"><td colspan="2">','<tr><th colspan="2">');
This would work.
From what I found, the %-sign is used, but how exactly?
update `tt_content` set `bodytext` = replace(`bodytext`,'<tr class="kopf"><td colspan="2">%</td></tr>','<tr><th colspan="2">%</th></tr>');
I guess this would replace all the code within the old td tags with a %-sign? How can I achieve the needed replacement?
Edit: just to clarify things, here is a possible entry in the DB:
<table class="techDat" > <tbody> <tr class="kopf"> <td colspan="2"> <p><strong>Technical data:</strong></p> </td> </tr> <tr> <td> <p>Operating time depending on battery chargeBetriebszeit je Akkuladung</p> </td> <td> <p>Approx. 4 h</p> </td> </tr> <tr> <td> <p>Maximum volume</p> </td> <td> <p>Approx. 120 dB(A)</p> </td> </tr> <tr> <td> <p>Weight</p> </td> <td> <p>Approx. 59 g</p> </td> </tr> </tbody> </table>
After the MySQL replacement it should look like this:
<table class="techDat" > <tbody> <tr> <th colspan="2"> <p><strong>Technical data:</strong></p> </th> </tr> <tr> <td> <p>Operating time depending on battery chargeBetriebszeit je Akkuladung</p> </td> <td> <p>Approx. 4 h</p> </td> </tr> <tr> <td> <p>Maximum volume</p> </td> <td> <p>Approx. 120 dB(A)</p> </td> </tr> <tr> <td> <p>Weight</p> </td> <td> <p>Approx. 59 g</p> </td> </tr> </tbody> </table>
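Try two nested replaces. Note that the outer replace changes every '</td></tr>' in the field, not only the ones in kopf rows, and that both search strings must match the stored markup exactly, including whitespace:
update `tt_content` set `bodytext` =
replace(replace(`bodytext`,
'<tr class="kopf"><td colspan="2">','<tr><th colspan="2">'),
'</td></tr>','</th></tr>');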
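If the replacement has to be limited to the kopf rows and cope with whitespace between the tags (as in the sample entry), a regular expression is one option. The following is only a sketch in Python, applied to the field value outside the database; KOPF_ROW and kopf_td_to_th are made-up names, and it assumes each kopf row holds exactly one td, as in the sample. MySQL 8.0 has REGEXP_REPLACE(), which might let a similar pattern run inside the database, but that is untested here.

```python
import re

# Matches a whole kopf row that contains a single <td>...</td> cell.
# Groups: 1 = whitespace after <tr>, 2 = td attributes, 3 = cell content,
# 4 = whitespace before </tr>.
KOPF_ROW = re.compile(
    r'<tr class="kopf">(\s*)<td([^>]*)>(.*?)</td>(\s*)</tr>',
    flags=re.S,
)


def kopf_td_to_th(bodytext):
    """Turn the td of every kopf row into a th and drop the class."""
    return KOPF_ROW.sub(r"<tr>\1<th\2>\3</th>\4</tr>", bodytext)


before = (
    '<tr class="kopf"> <td colspan="2"> <p>Technical data:</p> </td> </tr> '
    "<tr> <td> <p>Weight</p> </td> <td> <p>Approx. 59 g</p> </td> </tr>"
)
print(kopf_td_to_th(before))
```

Rows without the kopf class are left alone, which is the difference from the plain nested replace.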
Try updating your records with two queries:
1) Without the % sign:
update `tt_content` set `bodytext` = replace(`bodytext`,'<tr class="kopf"><td colspan="2">','<tr><th colspan="2">');
2) With the % sign:
update `tt_content` set `bodytext` = replace(`bodytext`,'<tr class="kopf"><td colspan="2">%</td></tr>','<tr><th colspan="2">%</th></tr>')
where instr(`bodytext`,'%') > 0;
I don't know HTML (horribly embarrassed, but I never had the need to learn it). I am pretty perspicacious when it comes to UNIX; however, I am horribly confused by this assignment. I know what I need to do but am having the hardest time getting started.
I have the following files in my hwk12 directory:
roster.html
roster.txt
sample.html
sample.txt
The following is the content of the roster.html file:
<html>
<body>
<table border=2>
<tr><th>Name</th><th>Username</th><th>Email</th></tr>
<tr>
<td>Nikhil Banerjee</td>
<td>nbanerje</td>
<td>zetapsi796#hotmail.com</td>
</tr>
<tr>
<td>Jeff Nazarian</td>
<td>jnazaria</td>
<td>jeff.nazarian#asu.edu</td>
</tr>
<tr>
<td>Anna Melzer</td>
<td>amelzer</td>
<td>anna.melzer#asu.edu</td>
</tr>
<tr>
<td>Jose Garcia</td>
<td>jgarcia</td>
<td>garcia-j#msn.com</td>
</tr>
<tr>
<td>Jillian Testa</td>
<td>jtesta</td>
<td>jillian.testa#asu.edu</td>
</tr>
<tr>
<td>Clayton Lengelzigich</td>
<td>clengelz</td>
<td><a href="mailto:clayton.lengel-zigich#asu.edu">clayton.lengel-
zigich#asu.edu</a></td>
</tr>
<tr>
<td>Ashley Bennett</td>
<td>abennett</td>
<td>ashley.bennett#asu.edu</td>
</tr>
<tr>
<td>Ann Frost</td>
<td>afrost</td>
<td>ann.frost#asu.edu</td>
</tr>
<tr>
<td>Timothy Whipple</td>
<td>twhipple</td>
<td>tweed#asu.edu</td>
</tr>
<tr>
<td>Wei Shen</td>
<td>wshen</td>
<td>shenwei58#hotmail.com</td>
</tr>
<tr>
<td>Cari Mahon</td>
<td>cmahon</td>
<td>cari.mahon#asu.edu</td>
</tr>
<tr>
<td>Alberto Salas</td>
<td>asalas</td>
<td>alberto2504#msn.com</td>
</tr>
<tr>
<td>Dorothy Haskett</td>
<td>dhaskett</td>
<td>dorothy.haskett#asu.edu</td>
</tr>
<tr>
<td>Criss Bradbury</td>
<td>cbradbur</td>
<td>crissbradbury#hotmaiil.com</td>
</tr>
<tr>
<td>Steve Ellermann</td>
<td>sellerma</td>
<td>cis494#ellermann.com</td>
</tr>
<tr>
<td>Zewdie Bekele</td>
<td>zbekele</td>
<td>zewdiea#aol.com</td>
</tr>
<tr>
<td>Frederic Diziere</td>
<td>fdiziere</td>
<td>fsd#asu.edu</td>
</tr>
<tr>
<td>Matt Bowes</td>
<td>mbowes</td>
<td>matt.bowes#asu.edu</td>
</tr>
<tr>
<td>Jasen Meece</td>
<td>jmeece</td>
<td>jasen.meece#sun.com</td>
</tr>
<tr>
<td>Aaron Carpenter</td>
<td>acarpent</td>
<td>aaron.carpenter#asu.edu</td>
</tr>
<tr>
<td>Binqin Xi</td>
<td>bxi</td>
<td>binqin.xi#asu.edu</td>
</tr>
<tr>
<td>Yinting Chan</td>
<td>ychan</td>
<td>yin.chen#asu.edu</td>
</tr>
<tr>
<td>Michael Evans</td>
<td>mevans</td>
<td>michael.evans#asu.edu</td>
</tr>
<tr>
<td>Herman Beringer</td>
<td>hberinge</td>
<td>jber#cox.net</td>
</tr>
<tr>
<td>Andrew Jolley</td>
<td>ajolley</td>
<td>andrew#andrewjolley.com</td>
</tr>
<tr>
<td>Michael Raby</td>
<td>mraby</td>
<td>mike1071#yahoo.com</td>
</tr>
<tr>
<td>Hajar Alaoui</td>
<td>halaoui</td>
<td>hajar6#hotmail.com</td>
</tr>
<tr>
<td>Anne Lemar</td>
<td>alemar</td>
<td>anne.lemar#asu.edu</td>
</tr>
<tr>
<td>Russell Crotts</td>
<td>rcrotts</td>
<td>Russell.Crotts#asu.edu</td>
</tr>
<tr>
<td>Dan Mazzola</td>
<td>dmazzola</td>
<td>dan.mazzola#sun.com</td>
</tr>
<tr>
<td>Bill Boyton</td>
<td>bboyton</td>
<td>boytonb#earthlink.net</td>
</tr>
</table>
</body>
</html>
The following is the content of the roster.txt file:
Whipple Timothy tweed#asu.edu Shen Wei shenwei58#hotmail.com
Mahon Cari cari.mahon#asu.edu Salas Alberto alberto2504#msn.com
Haskett Dorothy dorothy.haskett#asu.edu Bradbury Criss
crissbradbury#hotmaiil.com Ellermann Steve
cis494#ellermann.com Bekele Zewdie zewdiea#aol.com Diziere Frederic
fsd#asu.edu Bowes Matt matt.bowes#asu.edu Meece Jasen
jasen.meece#sun.com Carpenter Aaron aaron.carpenter#asu.edu
Xi Binqin binqin.xi#asu.edu Chan Yinting yin.chen#asu.edu
Evans Michael michael.evans#asu.edu Beringer Herman
jber#cox.net Jolley Andrew andrew#andrewjolley.com Raby Michael
mike1071#yahoo.com Alaoui Hajar hajar6#hotmail.com Lemar Anne
anne.lemar#asu.edu Crotts Russell Russell.Crotts#asu.edu Mazzola Dan
dan.mazzola#sun.com Boyton Bill boytonb#earthlink.net
The following is the content of the sample.html file:
<html>
<body>
<table border=2>
<tr><th>Name</th><th>Username</th><th>Email</th></tr>
<tr>
<td>Michael Raby</td>
<td>mraby</td>
<td>mike1071#yahoo.com</td>
</tr>
<tr>
<td>Hajar Alaoui</td>
<td>halaoui</td>
<td>hajar6#hotmail.com</td>
</tr>
<tr>
<td>Anne Lemar</td>
<td>alemar</td>
<td>anne.lemar#asu.edu</td>
</tr>
<tr>
<td>Russell Crotts</td>
<td>rcrotts</td>
<td>Russell.Crotts#asu.edu</td>
</tr>
<tr>
<td>Dan Mazzola</td>
<td>dmazzola</td>
<td>dan.mazzola#sun.com</td>
</tr>
<tr>
<td>Bill Boyton</td>
<td>bboyton</td>
<td>boytonb#earthlink.net</td>
</tr>
</table>
</body>
</html>
The following is the content of the sample.txt file:
Raby Michael mike1071#yahoo.com
Alaoui Hajar hajar6#hotmail.com
Lemar Anne anne.lemar#asu.edu
Crotts Russell Russell.Crotts#asu.edu
Mazzola Dan dan.mazzola#sun.com
Boyton Bill boytonb#earthlink.net
I'm not asking for someone to do this for me, because I LOVE UNIX and I want to learn it myself. Every time I look at this HTML code I confuse the #$$#& out of myself. I need help getting started.
The homework prompt is the following:
You are to write a nawk(1) script called ~/hwk12/mk_html.awk that converts a text file (sample.txt and roster.txt) to an html page that a web browser can read. I have given you the output in the file sample.html which is reproduced below (notice how each level of indentation is two spaces deep):
Again, I don't want someone to do this for me. I'm just confused as to how the data in the text file ends up in the HTML table when the text file contains no HTML code. Can someone please help me get started?
Looks like you'll need to define the necessary HTML tags within your script. The meat of the HTML file will be these lines:
<tr>
<td>$first $last</td>
<td>$username</td>
<td>$email</td>
</tr>
These tags define a table row. You can parse the variables from the text files with awk and use them to fill in the HTML. The other HTML markup can be copy-pasted as static text into the output HTML file.
Edit: You can do this to grab the first and last name and print them to the HTML file.
{
last = $1   # field 1 of the input line is the last name
first = $2  # field 2 is the first name
print " <tr>"
print " <td>" first " " last "</td>"
print " </tr>"
}
You just need to expand that to get the email and username.
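To make the data flow concrete without giving away the awk script, here is the same idea sketched in Python. Two things in it are assumptions read off the sample files rather than stated in the assignment: the username looks like the first initial plus the last name, lowercased and cut to 8 characters (Michael Raby becomes mraby, Clayton Lengelzigich becomes clengelz), and roster.txt wraps records across lines, so the sketch reads whitespace-separated tokens in groups of three instead of one record per line. The function names are made up, and the two-space indentation the assignment asks for is left out.

```python
def username(first, last):
    """First initial + last name, lowercased, at most 8 characters.

    This rule is inferred from the sample data; treat it as an assumption.
    """
    return (first[0] + last).lower()[:8]


def rows_from_text(text):
    """Yield (first, last, email) from 'Last First email' token triples."""
    tokens = text.split()  # records may wrap across lines
    for i in range(0, len(tokens) - 2, 3):
        last, first, email = tokens[i:i + 3]
        yield first, last, email


def to_html(text):
    """Build the HTML page: static markup plus one table row per record."""
    lines = [
        "<html>",
        "<body>",
        "<table border=2>",
        "<tr><th>Name</th><th>Username</th><th>Email</th></tr>",
    ]
    for first, last, email in rows_from_text(text):
        lines += [
            "<tr>",
            f"<td>{first} {last}</td>",
            f"<td>{username(first, last)}</td>",
            f"<td>{email}</td>",
            "</tr>",
        ]
    lines += ["</table>", "</body>", "</html>"]
    return "\n".join(lines)


# Two records, the second one wrapped across lines as in roster.txt.
sample_txt = "Raby Michael mike1071#yahoo.com Alaoui Hajar\nhajar6#hotmail.com"
print(to_html(sample_txt))
```

The static lines at the top and bottom are exactly the "copy-pasted" markup; only the part inside the loop depends on the text file.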