While there are many questions like that, none of them describes my problem:
I have this list:
<ul>
<li>Burger</li>
<li>Fries</li>
<li>Coke</li>
</ul>
The list gets its data from a database that also includes the prices.
Now I need a list that can also show the price in a second column, like:
1. Burger | 6.99$
2. Fries | 2.99$
3. Coke | 1.99$
But all the questions I find are about splitting a list into multiple columns when it is too long.
Is there a way to reach my goal?
Lists aren't designed for that. You could hack together some kind of multi-column list, or you can use a table:
<table>
<tr>
<th>Item</th>
<th>Price</th>
</tr>
<tr>
<td>Burger</td>
<td>$6.99</td>
</tr>
<tr>
<td>Fries</td>
<td>$2.99</td>
</tr>
<tr>
<td>Coke</td>
<td>$1.99</td>
</tr>
</table>
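Since the data comes from a database, the table rows would normally be generated rather than typed by hand. A minimal sketch of that step, assuming the query returns (name, price) pairs; `render_price_table` and the sample `menu` list are hypothetical names used only for illustration:

```python
from html import escape


def render_price_table(rows):
    """Return an HTML table with one row per (name, price) pair."""
    lines = ["<table>", "<tr><th>Item</th><th>Price</th></tr>"]
    for name, price in rows:
        # Escape the name so characters like & or < cannot break the markup.
        lines.append(f"<tr><td>{escape(name)}</td><td>${price:.2f}</td></tr>")
    lines.append("</table>")
    return "\n".join(lines)


# Stand-in for the rows a database query would return (assumption).
menu = [("Burger", 6.99), ("Fries", 2.99), ("Coke", 1.99)]
print(render_price_table(menu))
```

The same loop works in whatever server-side language actually renders the page; the point is only that each database row becomes one `<tr>` with two cells.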
I have the following HTML:
<ol>
<li>foo</li>
<li>bar</li>
<li>testing</li>
<li>hello world</li>
</ol>
This, of course, works and gives the following output:
1. foo
2. bar
3. testing
4. hello world
Now my problem is that I sometimes need to have different text instead of the numbers. The text should be completely customizable, so CSS :before solutions don't work for me.
Consider this example, a listing of TV series episodes:
12. The Duel
13. Prince Family Paper
14./15. Stress relief
16. Lecture Circuit: Part 1
So essentially I'm trying to create a "fake" <ol> so that all the text after the numbers is correctly aligned. Is there any way I can achieve this in HTML/CSS?
I think you will have to solve it with a table without borders. Align the first td of each row to the right and the second one to the left. I think that would most closely resemble what you want.
Something like:
css:
table {
border: none;
}
table tr td:first-child {
text-align: right;
}
html:
<table>
<tr>
<td>12.</td>
<td>The Duel</td>
</tr>
<tr>
<td>13.</td>
<td>Prince Family Paper</td>
</tr>
<tr>
<td>14./15.</td>
<td>Stress relief</td>
</tr>
<tr>
<td>16.</td>
<td>Lecture Circuit: Part 1</td>
</tr>
</table>
I have a table with the following structure:
html {
width: 400px;
}
<table>
<tr>
<td>Description:</td>
<td style="white-space: pre-wrap;">
Some text which could be quite long with some 'simple' lists created by the user like this:
- Point 1
- Point 2
- Point 3
- Another point which is a bit longer than the previous one
- Point 4
</td>
</tr>
</table>
When I view this in a browser on a narrow screen, a long bullet point wraps onto a new line, and the wrapped text starts at the very beginning of that line.
Ideally the wrapped text would have the same indentation as the first line of its bullet point, so that both lines of the same bullet point line up.
In case you're wondering, the reason why there is no ordered or unordered list element is because the user is limited to using a simple textarea to enter their content.
Could anyone tell me if this is possible, and if so, how?
Thanks in advance.
Shouldn't the code be like this:
<table>
<tr>
<td>Description:</td>
<td style="white-space: pre-wrap;">
Some text which could be quite long with some 'simple' lists created by the user like this:
<ul>
<li>Point 1</li>
<li>Point 2</li>
<li>Point 3</li>
<li>Another point which is a bit longer than the previous one</li>
<li>Point 4</li>
</ul>
</td>
</tr>
</table>
You need to use proper ul and li tags instead of -.
EDIT
If you need to populate the value from a textarea field, you could use a WYSIWYG text editor such as TinyMCE or CKEditor.
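If the input has to stay a plain textarea, another option is to convert the dash lines into list markup on the server before rendering. This is only a sketch under assumptions: it treats any line starting with "- " as a list item and wraps every other non-empty line in a paragraph, and `dashes_to_lists` is a hypothetical helper name, not part of any library.

```python
from html import escape


def dashes_to_lists(text):
    """Turn lines starting with "- " into <ul><li> markup; escape everything."""
    out = []
    in_list = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("- "):
            if not in_list:
                out.append("<ul>")
                in_list = True
            out.append(f"<li>{escape(stripped[2:].strip())}</li>")
        else:
            if in_list:
                # Any non-list line (including a blank one) ends the list.
                out.append("</ul>")
                in_list = False
            if stripped:
                out.append(f"<p>{escape(stripped)}</p>")
    if in_list:
        out.append("</ul>")
    return "\n".join(out)


user_text = "Some intro text:\n- Point 1\n- Another point which is a bit longer\n- Point 3"
print(dashes_to_lists(user_text))
```

With real `<li>` elements the browser indents wrapped lines by itself, so the `white-space: pre-wrap` style on the cell would probably no longer be needed.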
I am attempting to scrape items from a page containing various HTML elements and a series of nested tables.
I have some code working that is successfully scraping from table X where class="ClassA" and outputting table elements into a series of items, such as company address, phone number, website address, etc.
I would like to add some extra items to the list I am outputting. However, the other items to be scraped aren't located within the same table, and some aren't located in a table at all, e.g. an <h1> tag in another part of the page.
How can I add other items to my output using an XPath filter and have them appear in the same array / output structure? I noticed that if I scrape extra table items from another table (even when the table has the exact same class name and ID), those items are written to different lines in the CSV, which breaks the CSV structure :(
I'm sure there must be a way for items to remain unified in the CSV output, even if they are scraped from slightly different areas of a page. Hopefully it's just a simple fix...
----- HTML EXAMPLE PAGE BEING SCRAPED -----
<html>
<head></head>
<body>
<!-- huge amount of other HTML and tables NOT to be scraped -->
<h2>HEADING TO BE SCRAPED - Company Name</h2>
<p>Company Description</p>
<table cellspacing="0" class="contenttable company-details">
<tr>
<th>Item Code</th>
<td>IT123</td>
</tr>
<tr>
<th>Listing Date</th>
<td>12 September, 2011</td>
</tr>
<tr>
<th>Internet Address</th>
<td class="altrow">http://www.website.com/</td>
</tr>
<tr>
<th>Office Address</th>
<td>123 Example Street</td>
</tr>
<tr>
<th>Office Telephone</th>
<td>(01) 1234 5678</td>
</tr>
</table>
<table cellspacing="0" class="contenttable" id="staff">
<tr><th>Management Names</th></tr>
<tr>
<td>
Mr John Citizen (CEO)<br/>Mrs Mary Doe (Director)<br/>Dr J. Watson (Manager)<br/>
</td>
</tr>
</table>
<table cellspacing="0" class="contenttable company-details">
<tr>
<th>Contact Person</th>
<td>
Mr John Citizen<br/>
</td>
</tr>
<tr>
<th class=principal>Company Mission</th>
<td>ACME Corp is a retail sales company.</td>
</tr>
</table>
</body>
</html>
---- SCRAPY CODE EXAMPLE ----
from scrapy.spider import Spider
from scrapy.selector import Selector
from my.items import MyItem


class MySpider(Spider):
    name = "my"
    allowed_domains = ["website.com"]
    start_urls = ["http://www.website.com/ABC"]

    def parse(self, response):
        sel = Selector(response)
        # Select the company-details tables; one item is built per match
        sites = sel.xpath('//table[@class="contenttable company-details"]')
        for site in sites:
            item = MyItem()
            item['Company_name'] = site.xpath('.//h1//text()').extract()
            item['Item_Code'] = site.xpath('.//th[text()="Item Code"]/following-sibling::td//text()').extract()
            item['Listing_Date'] = site.xpath('.//th[text()="Listing Date"]/following-sibling::td//text()').extract()
            item['Website_URL'] = site.xpath('.//th[text()="Internet Address"]/following-sibling::td//text()').extract()
            item['Office_Address'] = site.xpath('.//th[text()="Office Address"]/following-sibling::td//text()').extract()
            item['Office_Phone'] = site.xpath('.//th[text()="Office Telephone"]/following-sibling::td//text()').extract()
            # Unlike the fields above, this expression starts with // instead of .//
            item['Company_Mission'] = site.xpath('//th[text()="Company Mission"]/following-sibling::td//text()').extract()
            yield item
Outputting to CSV
scrapy crawl my -o items.csv -t csv
With the example code above, the Company_Mission item appears on a different line in the CSV from the other items (I'm guessing because it's in a different table), even though the table has the same class name and ID. Additionally, I'm unsure how to scrape the <h1> field, since it falls outside the table structure matched by my current XPath sites filter.
I could expand the sites XPath filter to include more content, but won't that be less efficient and defeat the point of filtering altogether?
Here's an example of the debug log, where you can see the Company Mission is being processed twice for some reason, and it is empty in the first pass, which must be why it ends up on a new line in the CSV. But why?
{'Item_Code': [u'ABC'],
'Listing_Date': [u'1 January, 2000'],
'Office_Address': [u'Level 1, Some Street, SYDNEY, NSW, AUSTRALIA, 2000'],
'Office_Fax': [u'(02) 1234 5678'],
'Office_Phone': [u'(02) 1234 5678'],
'Company_Mission': [],
'Website_URL': [u'http://www.company.com']}
2014-02-06 16:32:13+1000 [my] DEBUG: Scraped from <200 http://www.website.com/Code=ABC>
{'Item_Code': [],
'Listing_Date': [],
'Office_Address': [],
'Office_Fax': [],
'Office_Phone': [],
'Company_Mission': [u'The comapany is involved in retail, food and beverage, wholesale services.'],
'Website_URL': []}
The other thing I am completely baffled by is why the items appear in the CSV in a completely different order from the items on the HTML page and from the order I defined in the spider's config file. Does Scrapy run completely asynchronously, returning items in whatever order it pleases?
I understand you want to scrape one item for this page, but //table[@class="contenttable company-details"] matches two table elements in your HTML content, so the for site in sites: loop will run twice, creating two items.
For each table, relative XPath expressions such as .//th[text()="Item Code"] are applied within the current table. Absolute XPath expressions, such as //th[text()="Company Mission"], look for elements starting from the root element of your HTML document.
Your sample output shows "Company_Mission" populated only once, while you say it appears twice. Because you're using an absolute XPath expression for it, it should indeed have appeared twice. I'm not sure the output matches the spider code currently in the question.
So, the first iteration of the loop handles this table:
<table cellspacing="0" class="contenttable company-details">
<tr>
<th>Item Code</th>
<td>IT123</td>
</tr>
<tr>
<th>Listing Date</th>
<td>12 September, 2011</td>
</tr>
<tr>
<th>Internet Address</th>
<td class="altrow">http://www.website.com/</td>
</tr>
<tr>
<th>Office Address</th>
<td>123 Example Street</td>
</tr>
<tr>
<th>Office Telephone</th>
<td>(01) 1234 5678</td>
</tr>
</table>
in which you can scrape:
Item Code
Listing Date
Internet Address --> Website URL
Office Address
Office Telephone
and because you're using an absolute XPath expression, //th[text()="Company Mission"]/following-sibling::td//text() will look anywhere in the document, not only in this first <table cellspacing="0" class="contenttable company-details">.
These extracted fields go into an item of their own.
Then comes the 2nd table matching your XPath for sites:
<table cellspacing="0" class="contenttable company-details">
<tr>
<th>Contact Person</th>
<td>
Mr John Citizen<br/>
</td>
</tr>
<tr>
<th class=principal>Company Mission</th>
<td>ACME Corp is a retail sales company.</td>
</tr>
</table>
for which a new MyItem() is instantiated. Here, no XPath expression matches except the absolute XPath for "Company Mission", so at the end of this loop iteration you've got an item with only "Company Mission".
If you're sure you expect exactly one item from this page, you can use longer XPaths like //table[@class="contenttable company-details"]//th[text()="Item Code"]/following-sibling::td//text() for each field you want, so that each one matches whether the field is in the 1st or the 2nd table,
and use a single MyItem() instance.
Also, you can try CSS selectors that would be shorter to read and write and easier to maintain:
"Company_name" <-- sel.css('h2::text')
"Item_Code" <-- sel.css('table.company-details th:contains("Item Code") + td::text')
"Listing_Date" <-- sel.css('table.company-details th:contains("Listing Date") + td::text')
etc.
Note that :contains() is available in Scrapy via cssselect underneath, but it's not standard (it was removed from the CSS specs, but is handy). The ::text pseudo-element selector is also non-standard, a Scrapy extension, and is also handy.
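The "one item per page, no matter how many tables hold the fields" idea can be sketched without Scrapy. The block below is not Scrapy code: it swaps in the standard-library html.parser so it runs on its own, and `HeaderValueParser`, `scrape_one_item` and the trimmed `PAGE` sample are hypothetical names made up for illustration.

```python
from html.parser import HTMLParser


class HeaderValueParser(HTMLParser):
    """Collect <th>label</th><td>value</td> pairs from every table into one dict."""

    def __init__(self):
        super().__init__()
        self.fields = {}     # a single dict for the whole page, not one per table
        self._cell = None    # cell currently being read: "th", "td" or None
        self._label = None   # latest <th> text still waiting for its <td>
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in ("th", "td"):
            self._cell = tag
            self._text = []

    def handle_data(self, data):
        if self._cell:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != self._cell:
            return
        # Collapse runs of whitespace and newlines inside the cell.
        text = " ".join("".join(self._text).split())
        if tag == "th":
            self._label = text
        elif self._label is not None:
            self.fields[self._label] = text
            self._label = None
        self._cell = None


def scrape_one_item(html):
    parser = HeaderValueParser()
    parser.feed(html)
    parser.close()
    return parser.fields


PAGE = """
<table class="contenttable company-details">
  <tr><th>Item Code</th><td>IT123</td></tr>
  <tr><th>Office Telephone</th><td>(01) 1234 5678</td></tr>
</table>
<table class="contenttable company-details">
  <tr><th>Company Mission</th><td>ACME Corp is a retail sales company.</td></tr>
</table>
"""

print(scrape_one_item(PAGE))
```

In the spider itself, the equivalent would be to create the item once, outside any loop over tables, and fill every field with expressions evaluated against the whole response.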
"guessing because its in a different table": wrong guess. There is no correlation between tables and items; it does not matter where the data comes from, as long as you set it on the item fields.
That means you can take Company_name and Company_Mission from wherever you want.
Having said that, check what //th[text()="Company Mission"] returns and how many times it appears on the page. While the other items' XPaths are relative (they start with a .), this one is absolute (it starts with //), so it may scrape a list of items and not just one.
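On the column-order part of the question: in a CSV, the order of the columns is decided by whatever writes the file, not by the order in which fields were assigned. The sketch below shows this with the standard-library csv.DictWriter rather than Scrapy's own exporter; `items_to_csv` and `FIELD_ORDER` are hypothetical names. As far as I know, newer Scrapy releases offer a FEED_EXPORT_FIELDS setting for the same purpose, so check the documentation for the version you run.

```python
import csv
import io

# Desired column order (assumption: field names taken from the spider above).
FIELD_ORDER = ["Company_name", "Item_Code", "Listing_Date", "Website_URL",
               "Office_Address", "Office_Phone", "Company_Mission"]


def items_to_csv(items, field_order):
    """Write dict-like items as CSV with columns in the order given."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=field_order,   # the column order comes from this list
        restval="",               # missing fields become empty cells
        extrasaction="ignore",    # fields not listed are dropped
        lineterminator="\n",
    )
    writer.writeheader()
    for item in items:
        writer.writerow(item)
    return buffer.getvalue()


sample = [{"Website_URL": "http://www.website.com/", "Item_Code": "IT123"}]
print(items_to_csv(sample, FIELD_ORDER))
```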
I've got a wiki which uses Textile to mark up text. I'm trying to put a list within a table cell, and I can't seem to figure out how. I'm trying to replicate the following HTML in Textile:
<table>
<tr>
<td>Cell One</td>
<td>
Cell Two
<ol>
<li>Lorem</li>
<li>Ipsum</li>
</ol>
</td>
</tr>
</table>
I tried a lot of things but couldn't get anything to work. Is it possible to do this without using <li> HTML tags in Textile?
I have the same problem currently in RubyMine. I have a solution that works in Textile; perhaps it helps you:
|Cell One|Cell Two ==<ol><li>Lorem</li>
<li>Ipsum</li></ol>== test|
It doesn't work in Redmine, though :(