I have a very complex dynamic table that I need to output to PDF in Laravel 5.6. The project I inherited had Dompdf installed and is already rendering all other content, so I use it as well for compatibility.
My issue is that the table has 13 columns and an undefined number of rows. Intermittently a cell may span all 13 columns as a heading, a cell may span several rows, or there may be a colspan within the rowspan that spans 11 columns from the 3rd row. No HTML is hardcoded except the <table>, <thead>, <th> and <tbody> tags. The HTML within the tbody tag is generated dynamically from the array data.
Everything looks great in the browser: when I view() the pdf blade and press ctrl + p, it creates a nice PDF, although for some reason rowspan cells that continue onto the next page do not carry over their markup and content. As soon as I try to stream() the PDF, the table becomes warped and looks like a toppled building built by Picasso.
Here are links to the PDFs; the one I printed with ctrl + p lost its colour because I removed names.
File to view pdf printed with ctrl + p
Pdf streamed with Dompdf
Image of viewing pdf in browser
Image of pdf when streaming via Dompdf:
Html sample rendered in browser:
<tr style="background-color: #5b8969;">
<td rowspan="2" style="background-color: #F8C293; color: black;">Spray 4</td>
<td>Pollinate</td>
<td>7-10 days later</td>
<td>BENOMYL WP 25KG </td>
<td>benomyl 500g/kg</td>
<td> </td>
<td>1000</td>
<td>2.00</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Full bloom</td>
<td>Black Spot</td>
<td>WETCIT DUO 20L </td>
<td>borax 10g/orange oil 50g/l</td>
<td> </td>
<td>1000</td>
<td>25.00</td>
<td>100.0000</td>
<td>120.0000L</td>
<td>2500.0000</td>
<td></td>
<td></td>
</tr>
<tr>
<td colspan="13" style="background-color: #9fb5d3;" class="h3 font-weight-bold">ANOTHER ONE</td>
</tr>
<tr>
<td rowspan="7" style="background-color: #F8C293; color: black;">Spray 7</td>
<td>20 cm</td>
<td>African Armyworm</td>
<td>CERATO 250 EC 5L </td>
<td>pyraclostrobin 250g/l</td>
<td> </td>
<td>1000</td>
<td>2.00</td>
<td>10.0000</td>
<td></td>
<td>20.0000</td>
<td></td>
<td></td>
</tr>
Can someone please help and give me a clue on how to output such a complex table with Dompdf? I would really like to keep using only one PDF rendering library in this project.
Otherwise I am open to suggestions for another PDF library that can handle rowspans that span pages and this complex layout.
Update
This is based on a comment by Don't panic, which he subsequently deleted (he suggested validating the HTML and filling empty td tags with &nbsp;).
I rewrote the HTML as a template in my pdf.blade.php view, so now I only output the values in a loop in the view. This is easier to maintain and lets me skip the validation he suggested. I also filled every empty <td> tag with a hardcoded '&nbsp;', to see more easily why certain rows end where they should and others do not. The result is sadly still the same: a warped table. But it does seem to be a rowspan issue, not a colspan one. The 'rowspan' rows stack one after another, so maybe a td or tr is missing.
Solved rowspan stacking issue
After two weeks of testing, the only problem was that the opening tags of certain rows were not being output, which led to rows not knowing where to begin. Now the only problem left is rowspan across pages.
Update on update
So I have really tried everything I can to get Dompdf to do what it is supposed to do, which is render PDFs. I have read a bit more and found that this library has a long-standing issue of not being able to render rowspans across pages. Therefore, it is on to the next rendering library, wkhtmltopdf, or I could logically divide rowspans so that they stop at the end of a page and start again on the new page. Will have to check my watch on this one.
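One way to picture the "divide rowspans at the page boundary" idea is the sketch below, written in Python rather than Blade purely for illustration. It assumes every row has the same height, so that a page holds a fixed number of rows; real rows vary in height, so rows_left and rows_per_page are hypothetical inputs that would have to be estimated. Nothing here is Dompdf API.

```python
def split_rowspan(span, rows_left, rows_per_page):
    """Split one rowspan into chunks that each fit on a single page.

    span          -- number of rows the original cell covers
    rows_left     -- rows still free on the current page (assumed known)
    rows_per_page -- rows that fit on a full page (assumed constant)
    """
    if span < 0 or rows_left < 0 or rows_per_page <= 0:
        raise ValueError("span and rows_left must be >= 0, rows_per_page > 0")

    chunks = []
    # If the current page is already full, the group starts on a fresh page.
    free = rows_left if rows_left > 0 else rows_per_page
    while span > 0:
        take = min(span, free)
        chunks.append(take)
        span -= take
        free = rows_per_page  # every following chunk starts on a new page
    return chunks


# A 7-row group with 3 rows left on the page becomes two cells:
# rowspan="3" on this page and rowspan="4" on the next.
print(split_rowspan(7, 3, 10))
```

Each chunk would then be emitted as its own rowspan cell, repeating the label (for example "Spray 7") at the top of the new page.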
I've just been given a take home exam for a job as an email developer. My task is to try my best to create an HTML email out of a .png file. I've been using tables, and I've come to a section where I have to insert an image alongside text, and I'm crashing and burning. The header text is too far displaced from the paragraph, and the image doesn't sit well; does anyone have any ideas on how to resolve the issue? My code is as follows:
div #costume-section {
width:645px;
height: 225px;
padding-left: 05px;
background: #ff821d;
color: white;
}
<div id="costume-section">
<table>
<th id="cos-font">Costume Contest</th>
<tr>
<td>Duhh, of course - Wear it all day if you wanna. Perhaps you will be the winner of the contest? (must be present at party to be voted on) We'll hold a kid contest too!</td>
<td><img src="http://mandrill-emails.s3.amazonaws.com/melt-holidays/20151020/costumes.png" /></td>
</tr>
</table>
</div>
👋 Rachelledev
Think about this like you are working in a spreadsheet. You have set a th and a tr table row.
Since your header text is outside the row containing the 2 td's, the header text sits on top of the tr containing the text and the image.
If you rearrange the table markup, you don't really need the th in this scenario unless it's a requirement.
<div id="costume-section">
<table>
<tr>
<td>
<h1>Costume Contest</h1>
Duhh, of course - Wear it all day if you wanna. Perhaps you will be the winner of the contest? (must be present at party to be voted on) We'll hold a kid contest too!</td>
<td><img src="http://mandrill-emails.s3.amazonaws.com/melt-holidays/20151020/costumes.png" /></td>
</tr>
</table>
</div>
I'm working on a project where I've made my phpmyadmin database spit out a set of 6 images on my webpage. I've put it into a table and this is where the trouble begins - even though it sounds easy!
I need the images to be in threes, in a horizontal line.
I will have 6 images most of the time so 3 per row with good spacing/padding etc.
I've tried a lot of things and played around with the CSS but couldn't get it to work.
Here are (respectively) the actual page and how it looks, the CSS for it and the actual code/script of the table:
Actual Page
CSS for the table: table.Evidence td {
padding:0px,10px,0px,0px;
}
Script for the table:
It looks very easy but I couldn't make it work.
Any help would be much appreciated!
I'm new so please bear with me until I get used to this.
The first thing is that if you define all 4 paddings in one declaration, you have to separate them with spaces, not commas.
table.Evidence td { padding:0px 10px 0px 0px; }
It also seems that you don't use the table tags right.
With a tr you are adding a new row, and with a td you are adding a new cell.
A table of 2x2 cells would look like:
<table border="1">
<tr>
<td>
1
</td>
<td>
2
</td>
</tr>
<tr>
<td>
3
</td>
<td>
4
</td>
</tr>
</table>
Your </tr><tr> pair should come after every third image, so the body of your while loop should be:
// Assumes $i counts the images, starting at 1.
echo "<td><img something...></td>";
if ($i % 3 == 0) {
    // Close the current row and open the next one.
    echo "</tr><tr>";
}
Also you must have one <tr> opening tag directly after the <table> tag, and one </tr> closing tag directly before the </table> tag.
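The same grouping can be done by slicing the image list into chunks of three, which avoids tracking a counter and never leaves a row unclosed. Below is a minimal sketch in Python (not the PHP of the question), where images_to_table and the sample file names are hypothetical and the Evidence class is taken from the CSS above.

```python
from html import escape


def images_to_table(urls, per_row=3):
    """Build a table with at most `per_row` images in each row."""
    rows = []
    # Step through the list in slices of `per_row` items.
    for start in range(0, len(urls), per_row):
        cells = "".join(
            '<td><img src="{}"></td>'.format(escape(url, quote=True))
            for url in urls[start:start + per_row]
        )
        rows.append("<tr>" + cells + "</tr>")
    return '<table class="Evidence">' + "".join(rows) + "</table>"


# Six hypothetical image paths give two rows of three cells each.
print(images_to_table(["1.png", "2.png", "3.png", "4.png", "5.png", "6.png"]))
```

Because each row is built as a whole, the opening and closing tr tags always come in pairs, even when the number of images is not a multiple of three.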
I am attempting to scrape items from a page containing various HTML elements and a series of nested tables.
I have some code working that is successfully scraping from table X where class="ClassA" and outputting table elements into a series of items, such as company address, phone number, website address, etc.
I would like to add some extra items to the list that I am outputting. However, the other items to be scraped aren't located within the same table, and some aren't located in a table at all, e.g. the < H1 > tag in another part of the page.
How can I add other items to my output using an XPath filter and have them appear in the same array / output structure? I noticed that if I scrape extra items from another table (even when the table has the exact same CLASS name and ID), those items are output on different lines in the CSV, so the CSV structure is not kept intact :(
I'm sure there must be a way for items to remain unified in the CSV output, even if they are scraped from slightly different areas of a page. Hopefully it's just a simple fix...
----- HTML EXAMPLE PAGE BEING SCRAPED -----
<html>
<head></head>
<body>
< // huge amount of other HTML and tables NOT to be scraped >
<h2>HEADING TO BE SCRAPED - Company Name</h2>
<p>Company Description</p>
<table cellspacing="0" class="contenttable company-details">
<tr>
<th>Item Code</th>
<td>IT123</td>
</tr>
<th>Listing Date</th>
<td>12 September, 2011</td>
</tr>
<tr>
<th>Internet Address</th>
<td class="altrow">http://www.website.com/</td>
</tr>
<tr>
<th>Office Address</th>
<td>123 Example Street</td>
</tr>
<tr>
<th>Office Telephone</th>
<td>(01) 1234 5678</td>
</tr>
</table>
<table cellspacing="0" class="contenttable" id="staff">
<tr><th>Management Names</th></tr>
<tr>
<td>
Mr John Citizen (CEO)<br/>Mrs Mary Doe (Director)<br/>Dr J. Watson (Manager)<br/>
</td>
</tr>
</table>
<table cellspacing="0" class="contenttable company-details">
<tr>
<th>Contact Person</th>
<td>
Mr John Citizen<br/>
</td>
</tr>
<tr>
<th class=principal>Company Mission</th>
<td>ACME Corp is a retail sales company.</td>
</tr>
</table>
</body>
</html>
---- SCRAPY CODE EXAMPLE ----
from scrapy.spider import Spider
from scrapy.selector import Selector
from my.items import MyItem


class MySpider(Spider):
    name = "my"
    allowed_domains = ["website.com"]
    start_urls = ["http://www.website.com/ABC"]

    def parse(self, response):
        sel = Selector(response)
        sites = sel.xpath('//table[@class="contenttable company-details"]')
        items = []
        for site in sites:
            item = MyItem()
            item['Company_name'] = site.xpath('.//h1//text()').extract()
            item['Item_Code'] = site.xpath('.//th[text()="Item Code"]/following-sibling::td//text()').extract()
            item['Listing_Date'] = site.xpath('.//th[text()="Listing Date"]/following-sibling::td//text()').extract()
            item['Website_URL'] = site.xpath('.//th[text()="Internet Address"]/following-sibling::td//text()').extract()
            item['Office_Address'] = site.xpath('.//th[text()="Office Address"]/following-sibling::td//text()').extract()
            item['Office_Phone'] = site.xpath('.//th[text()="Office Telephone"]/following-sibling::td//text()').extract()
            item['Company_Mission'] = site.xpath('//th[text()="Company Mission"]/following-sibling::td//text()').extract()
            yield item
Outputting to CSV
scrapy crawl my -o items.csv -t csv
With the example code above, the [company mission] item appears on a different line in the CSV from the other items (I'm guessing because it's in a different table), even though the table has the same CLASS name and ID. Additionally, I'm unsure how to scrape the < H1 > field, since it falls outside the table structure for my current XPath sites filter.
I could expand the sites XPath filter to include more content, but won't that be less efficient and defeat the point of filtering altogether?
Here's an example of the debug log, where you can see the Company Mission is being processed twice for some reason, and the first loop is empty, which must be why it is output onto a new line in the CSV. But why?
{'Item_Code': [u'ABC'],
'Listing_Date': [u'1 January, 2000'],
'Office_Address': [u'Level 1, Some Street, SYDNEY, NSW, AUSTRALIA, 2000'],
'Office_Fax': [u'(02) 1234 5678'],
'Office_Phone': [u'(02) 1234 5678'],
'Company_Mission': [],
'Website_URL': [u'http://www.company.com']}
2014-02-06 16:32:13+1000 [my] DEBUG: Scraped from <200 http://www.website.com/Code=ABC>
{'Item_Code': [],
'Listing_Date': [],
'Office_Address': [],
'Office_Fax': [],
'Office_Phone': [],
'Company_Mission': [u'The comapany is involved in retail, food and beverage, wholesale services.'],
'Website_URL': []}
The other thing I am completely baffled about is why the items are spat out in the CSV in a completely different order to the items on the HTML page and the order I have defined in the spiders config file. Does scrapy run completely asynchronously returning items in whatever order it pleases ?
I understand you want to scrape 1 item from this page, but //table[@class="contenttable company-details"] matches 2 table elements in your HTML content, so the for site in sites: loop will run twice, creating 2 items.
And for each table, XPath expressions will be applied within the current table if they are relative -- .//th[text()="Item Code"]. Absolute XPath expressions, such as //th[text()="Company Mission"], will look for elements from the root element of your HTML document.
Your sample output shows "Company_Mission" filled in only once, while you say it appears twice. Because you're using an absolute XPath expression for it, it should indeed have appeared twice. I'm not sure the output matches the current spider code in the question.
So, first iteration of the loop,
<table cellspacing="0" class="contenttable company-details">
<tr>
<th>Item Code</th>
<td>IT123</td>
</tr>
<th>Listing Date</th>
<td>12 September, 2011</td>
</tr>
<tr>
<th>Internet Address</th>
<td class="altrow">http://www.website.com/</td>
</tr>
<tr>
<th>Office Address</th>
<td>123 Example Street</td>
</tr>
<tr>
<th>Office Telephone</th>
<td>(01) 1234 5678</td>
</tr>
</table>
in which you can scrape:
Item Code
Listing Date
Internet Address --> Website URL
Office Address
Office Telephone
and because you're using an absolute XPath expression, //th[text()="Company Mission"]/following-sibling::td//text() will look anywhere in the document, not only in this first <table cellspacing="0" class="contenttable company-details">
These extracted fields go into an item of their own.
Then comes the 2nd table matching your XPath for sites:
<table cellspacing="0" class="contenttable company-details">
<tr>
<th>Contact Person</th>
<td>
Mr John Citizen<br/>
</td>
</tr>
<tr>
<th class=principal>Company Mission</th>
<td>ACME Corp is a retail sales company.</td>
</tr>
</table>
for which a new MyItem() is instantiated. Here, no XPath expression matches except the absolute XPath for "Company Mission", so at the end of the loop iteration you've got an item with only "Company Mission".
If you're sure you expect one and only one item from this page, you can use longer XPaths like //table[@class="contenttable company-details"]//th[text()="Item Code"]/following-sibling::td//text() for each field you want, so that it will match in either the 1st or the 2nd table,
and use only 1 MyItem() instance.
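The point that one item can collect fields from several tables can be illustrated without Scrapy at all. The sketch below swaps in Python's standard html.parser instead of XPath, so it is not the spider's API; PairCollector and the trimmed PAGE sample are hypothetical names made up for the illustration.

```python
from html.parser import HTMLParser


class PairCollector(HTMLParser):
    """Collect every <th>label</th> / <td>value</td> pair into one dict."""

    def __init__(self):
        super().__init__()
        self.fields = {}    # the single "item" shared by all tables
        self._tag = None    # cell currently being read: "th", "td" or None
        self._label = None  # text of the most recent <th>
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in ("th", "td"):
            self._tag = tag
            self._buf = []

    def handle_data(self, data):
        if self._tag:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag != self._tag:
            return  # ignore tags such as <br/> inside a cell
        text = " ".join("".join(self._buf).split())
        if tag == "th":
            self._label = text
        elif self._label is not None:
            self.fields[self._label] = text
            self._label = None
        self._tag = None


PAGE = """
<table class="contenttable company-details">
  <tr><th>Item Code</th><td>IT123</td></tr>
</table>
<table class="contenttable company-details">
  <tr><th>Company Mission</th><td>ACME Corp is a retail sales company.</td></tr>
</table>
"""

collector = PairCollector()
collector.feed(PAGE)
item = collector.fields  # one dict, filled from both tables
print(item)
```

The design choice is the same as in the spider: create the item once, outside any per-table loop, and let every table contribute to it.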
Also, you can try CSS selectors that would be shorter to read and write and easier to maintain:
"Company_name" <-- sel.css('h2::text')
"Item_Code" <-- sel.css('table.company-details th:contains("Item Code") + td::text')
"Listing_Date" <-- sel.css('table.company-details th:contains("Listing Date") + td::text')
etc.
Note that :contains() is available in Scrapy via cssselect underneath, but it's not standard (it was removed from the CSS specs, but is handy). The ::text pseudo-element selector is also non-standard, a Scrapy extension, and is also handy.
"guessing because its in a different table": wrong guess. There is no correlation between tables and items; it does not matter where the data comes from, as long as you set it on the item fields.
That means you can take Company_name and Company_Mission from wherever you want.
Having said that, check what //th[text()="Company Mission"] returns and how many times it appears on the page. While the other items' XPaths are relative (they start with a .), this one is absolute (it starts with //), so it may scrape a list of items and not just one.
I need a button spanning the entire length of a table that will make a call to get the next data set, to save the user from having to select it manually.
Unfortunately my css is not that great. Any help? Thanks!
There is some confusion as to what I want, let me try to draw it.
| table |b|
| |u|
| |t|
|_______|n|
I can't give you the mark up because it's at work. Plus it's heavily templated so it wouldn't help much.
Ok, so that problem is a little trickier.
<div style="position: relative;">
<table style="">
<tr>
<td>asdasdasdssssssss sssssssssssss sssssssssss sssssssss sssssssssssssss sssssssss</td>
<td>betasssssssssssssssssssssssssssss ssssssssssssssssssssssssssss ssssssssssssssss sssssssssss sssssss sssssssssssssssss s s ss s s s</td>
</tr>
<tr>
<td>alpha</td>
<td>beta</td>
</tr>
</table>
<div style="position: absolute; top: 0pt; right: 0pt; bottom: 0pt;">
<button style="height: 100%;">c</button>
</div>
</div>
The problem with this solution is you need to give your content that sits on the right a width then add a margin to the table so that it does not hide behind the button.
There is a lot of information for you to take in over at this related question which points to some valuable resources.
<table>
<tr>
<td>alpha</td>
<td>beta</td>
</tr>
<tr>
<td colspan="2"><button style="width:100%">c</button></td>
</tr>
</table>
Some example of what you are trying to lay out and what you have tried would have been useful.
The important thing is the colspan which turns many columns into 1.
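When the table is generated from a template, the colspan value has to track the number of columns, otherwise the button cell stops short of the table edge. A small Python sketch of that bookkeeping follows; table_with_button_row is a hypothetical helper, not part of any templating library.

```python
from html import escape


def table_with_button_row(rows, label):
    """Render `rows` as a table and append one full-width button row."""
    # colspan must equal the widest row so the button cell covers them all.
    columns = max((len(row) for row in rows), default=1)
    body = "".join(
        "<tr>" + "".join("<td>{}</td>".format(escape(cell)) for cell in row) + "</tr>"
        for row in rows
    )
    button = (
        '<tr><td colspan="{}"><button style="width:100%">{}</button></td></tr>'
        .format(columns, escape(label))
    )
    return "<table>" + body + button + "</table>"


markup = table_with_button_row([["alpha", "beta"]], "c")
print(markup)
```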
I wouldn't recommend using <table>s, as they scale differently than most elements: if you have an element inside a table that gets too big (like a block, an inline-block, or a long word), the table scales, possibly throwing off your layout. That, and having to mark up each row adds more cruft to your markup.
Anyway, if you want to span multiple columns to be able to right-align an <input type="submit"/> or whatever you're trying to do, you can simply say:
<table>
<tr>
<!-- lots of other rows and stuff -->
<td colspan="4" align="right"><!-- Adjust [colspan] to # of columns in yours -->
<input type="submit" value="I'm on the right now!"/>
</td>
</tr>
</table>
The align="right" does what you'd expect - aligns inline/inline-block elements to the right (generally text and images). The colspan="4" (again, replace with your own number of columns) simply tells the table that you're going to be merging 4 columns of the table starting with you (going to the right).
So, before:
-------------------------
| 1 | 2 | 3 | 4 |
-------------------------
| <------ after ------> |
-------------------------
Hope that helps.