date = soup.find_all("td", {"id": "utime"})
print(date)
[<td class="mstat-date" colspan="3" id="utime">23.11.2015 17:00</td>]
This is what I want:
[23.11.2015 17:00]
print(soup.date.string)
AttributeError: 'NoneType' object has no attribute 'string'
Little help please, thank you.
The find_all method always returns a list (a ResultSet), so you must index into it before you can access the attributes or methods of the elements it contains. So when you have
date = soup.find_all("td", {"id": "utime"})
You can access the text from the first tag, by typing:
date[0].text
If there are more items in the list, you can use a list comprehension, like this:
[ _.text for _ in date ]
That will give you a list of dates if the HTML you're scraping has more than one such tag.
Your BeautifulSoup instance has no date attribute here: soup.date looks for a <date> tag, finds none, and returns None, which is why soup.date.string raises the AttributeError. Use find_all (or find) to access the dates instead of soup.date.
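As a minimal sketch of the difference between find and find_all, assuming a small table built around the cell shown in the question (the surrounding table markup is made up for illustration):

```python
from bs4 import BeautifulSoup

# Sample markup: the <td> comes from the question, the table wrapper is assumed.
html = """
<table><tr>
  <td class="mstat-date" colspan="3" id="utime">23.11.2015 17:00</td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching Tag, or None when nothing matches.
cell = soup.find("td", {"id": "utime"})
first_date = cell.get_text(strip=True) if cell is not None else None

# find_all() returns a list-like ResultSet, so read the text of each element.
all_dates = [td.get_text(strip=True) for td in soup.find_all("td", {"id": "utime"})]

print(first_date)
print(all_dates)
```

Checking for None before calling get_text avoids the AttributeError from the question when the tag is missing.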
I have searched here and cannot find a solution. My problem is the following:
In my Angular HTML template, I need to pass a set of data to the metadata property of a button, but I can't find the correct way to concatenate the variable that contains the value.
This should be the HTML element:
<mati-button clientId="clientId" flowId="flowId" color="green" metadata='{"user_id":"1234778","email":"som@som.com"}'/>
I tried several ways, but I can't insert the respective values.
Example:
<mati-button metadata='{"userID": "{{user.id}}" }'></mati-button>
This was unsuccessful.
Assuming mati-button is an Angular component with metadata as Input(), you are probably looking for
<mati-button
[clientId]="clientId"
[flowId]="flowId"
color="green"
[metadata]="{ userId: '1234778', email: 'som@som.com'}"
></mati-button>
See the guide on property binding to learn more:
To bind to an element's property, enclose it in square brackets, [], which identifies the property as a target property. [...] The brackets, [], cause Angular to evaluate the right-hand side of the assignment as a dynamic expression. Without the brackets, Angular treats the right-hand side as a string literal and sets the property to that static value.
By "dynamic expression" they mean JS-expressions, i.e., a public variable available through the component's TypeScript, a boolean expression, an array, or, like in your case, a JS-object that you can construct inline.
You can try doing this
<mati-button metadata="{'userID': user.id }"></mati-button>
metadata='{" userID ": {{user.id}}}'
In the end I got it. I don't know why, but apparently the third-party script hides that parameter, so it couldn't be inspected in the console, yet it does receive the values without any problem. Thanks everyone for your help!
I'm trying to scrape text from a website, but specifically only the text that's linked to with one of two specific links, and then additionally scrape another text string that follows shortly after it.
The second text string is easy to scrape because it includes a unique class I can target, so I've already gotten that working, but I haven't been able to successfully scrape the first text (with the one of two specific links).
I found this SO question ( Find specific link w/ beautifulsoup ) and tried to implement variations of that, but wasn't able to get it to work.
Here's a snippet of the HTML code I'm trying to scrape. This pattern recurs repeatedly over the course of each page I'm scraping:
<em>[女孩]</em> 寻找2003年出生2004年失踪贵州省黔西南布依族苗族自治州贞丰县珉谷镇锅底冲 黄冬冬289179
The two parts I'm trying to scrape and then store together in a list are the two Chinese-language text strings.
The first of these, 女孩, which means female, is the one I haven't been able to scrape successfully.
This is always preceded by one of these two links:
forum.php?mod=forumdisplay&fid=191&filter=typeid&typeid=19 (Female)
forum.php?mod=forumdisplay&fid=191&filter=typeid&typeid=15 (Male)
I've tested a whole bunch of different things, including things like:
gender_containers = soup.find_all('a', href = 'forum.php?mod=forumdisplay&fid=191&filter=typeid&typeid=19')
print(gender_containers.get_text())
But for everything I've tried, I keep getting errors like:
ResultSet object has no attribute 'get_text'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?
I think that I'm not successfully finding those links to grab the text, but my rudimentary Python skills thus far have failed me in figuring out how to make it happen.
What I want to have happen ultimately is to scrape each page such that the two strings in this code (女孩 and 寻找2003年出生2004年失踪贵州省...)
<em>[女孩]</em> 寻找2003年出生2004年失踪贵州省黔西南布依族苗族自治州贞丰县珉谷镇锅底冲 黄冬冬289179
...are scraped as two separate variables so that I can store them as two items in a list, then move on to the next instance of this code, scrape those two text snippets, store them as another list, and so on. I'm building a list of lists in which each nested list contains two strings: the gender (女孩 or 男孩) and then the longer string, which has a lot more variation.
(But currently I have working code that scrapes and stores that, I just haven't been able to get the gender part to work.)
It sounds like you could use a CSS attribute selector with the $= (ends with) operator.
If there can only be one occurrence per page:
soup.select_one("[href$='typeid=19'], [href$='typeid=15']").text
This is assuming those typeid=19 or typeid=15 only occur at the end of the strings of interest. The "," between the two in the selector is to allow for matching on either.
You could additionally handle possibility of not being present as follows:
from bs4 import BeautifulSoup
html ='''<em>[女孩]</em> 寻找2003年出生2004年失踪贵州省黔西南布依族苗族自治州贞丰县珉谷镇锅底冲 黄冬冬289179'''
soup=BeautifulSoup(html,'html.parser')
gender = soup.select_one("[href$='typeid=19'], [href$='typeid=15']").text if soup.select_one("[href$='typeid=19'], [href$='typeid=15']") is not None else 'Not found'
print(gender)
Multiple values:
genders = [item.text for item in soup.select("[href$='typeid=19'], [href$='typeid=15']")]
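The snippet in the question appears to have lost its link tags, so here is a small sketch of the multiple-values case under an assumed structure: each gender label is wrapped in an <a> whose href ends in typeid=19 or typeid=15. The <a> wrappers and the third link (typeid=7) are hypothetical and only there to show what is and is not matched.

```python
from bs4 import BeautifulSoup

# Assumed markup: hypothetical <a> wrappers using the hrefs listed in the question.
html = """
<em>[<a href="forum.php?mod=forumdisplay&amp;fid=191&amp;filter=typeid&amp;typeid=19">女孩</a>]</em>
<em>[<a href="forum.php?mod=forumdisplay&amp;fid=191&amp;filter=typeid&amp;typeid=15">男孩</a>]</em>
<em>[<a href="forum.php?mod=forumdisplay&amp;fid=191&amp;filter=typeid&amp;typeid=7">other</a>]</em>
"""

soup = BeautifulSoup(html, "html.parser")

# $= matches attribute values that end with the given string;
# the comma lets either of the two endings match.
selector = "a[href$='typeid=19'], a[href$='typeid=15']"

# select() returns every match (select_one() would return only the first).
genders = [a.get_text(strip=True) for a in soup.select(selector)]
print(genders)
```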
Try the following code.
from bs4 import BeautifulSoup
data='''<em>[女孩]</em> 寻找2003年出生2004年失踪贵州省黔西南布依族苗族自治州贞丰县珉谷镇锅底冲 黄冬冬289179'''
soup=BeautifulSoup(data,'html.parser')
print(soup.select_one('em').text)
Output:
[女孩]
I'm trying to scrape restaurant names from Tripadvisor with Python 3 and lxml. The text I'm trying to retrieve is in the following element; in this case it is "Al Fresco's".
<a target="_blank" href="/Restaurant_Review-g293925-d8327527-Reviews-Al_Fresco_s-Ho_Chi_Minh_City.html" class="property_title"
onclick="ta.restaurant_list_tracking.clickDetailTitle('/Restaurant_Review-g293925-d8327527-Reviews-Al_Fresco_s-Ho_Chi_Minh_City.html','tags_category_tag_restaurants','8327527','1','0');">
Al Fresco's
</a>
The XPath reference to this element:
//*[@id="eatery_8327527"]/div[2]/div[1]/div[1]/a
I use the following simple code to retrieve the text in this element:
from lxml import html
import requests
page = requests.get('https://www.tripadvisor.nl/Restaurants-g293925-Ho_Chi_Minh_City.html')
tree = html.fromstring(page.content)
# This will create a list of names:
Name = tree.xpath('//*[@id="eatery_8327527"]/div[2]/div[1]/div[1]/a/text()')
print('Name: ', Name)
This returns an empty list: Name: []
How do I get the text I want?
Without having looked at the actual page, my guess is that your XPath is too strict. Try something like this:
//a[contains(@href,"Restaurant_Review")]/text()
If that yields too many results try adding the parent in front.
Hope that helps.
UPDATE:
After having a look at the actual page, this is probably what you are looking for:
//a[contains(@class,"property_title")]/text()
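A minimal sketch of that last expression with lxml, run against a trimmed-down, assumed copy of the element from the question rather than the live page (the real page is larger and its layout may differ):

```python
from lxml import html

# Assumed, simplified markup based on the element shown in the question.
page = """
<div id="eatery_8327527">
  <a class="property_title"
     href="/Restaurant_Review-g293925-d8327527-Reviews-Al_Fresco_s-Ho_Chi_Minh_City.html">
    Al Fresco's
  </a>
</div>
"""

tree = html.fromstring(page)

# Match on the class attribute instead of a long positional path,
# then strip the whitespace that surrounds the link text.
names = [text.strip() for text in tree.xpath('//a[contains(@class, "property_title")]/text()')]
print(names)
```

Matching on a class is less brittle than a path such as div[2]/div[1]/div[1]/a, which stops matching as soon as the page layout shifts.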
I'm trying to get working JSON output (an array with any number of objects) from a FreeMarker FTL file. If there is only one object in the "loggedInUsers" array, the code below works. If there is more than one, the JSON breaks. I know a comma should separate the objects, but the problem comes when I add one after the closing brace, because it then also appears after the last object. Any help would be greatly appreciated.
[
<#list loggedInUsers as user>
{
"user": "${user}"
}
</#list>
]
If I understand correctly, you want to add a comma after every item except the last. In that case use the #sep directive, like }<#sep>,</#sep>. (See also: http://freemarker.org/docs/ref_directive_list.html)
I'm using the Last.fm API to return data in JSON format and this works fine. I'm using the user.getTopArtists API call.
As the page loads, a DIV is created for each artist, containing relevant details from the JSON data. When a user performs an action on the DIV, I want to swap the image URL to show a bigger image size.
How can I find or reference a JSON object by matching its stored value?
For example, how would I match the artist name 'Kate Bush' and then retrieve the "extralarge" image URL?
The data structure looks like this:
{"topartists":{
"artist":[{
"name":"Kate Bush",
"playcount":"20",
"mbid":"4b585938-f271-45e2-b19a-91c634b5e396",
"url":"http:\/\/www.last.fm\/music\/Kate+Bush",
"image":[
{"#text":"http:\/\/userserve-ak.last.fm\/serve\/34\/224740.jpg","size":"small"},
{"#text":"http:\/\/userserve-ak.last.fm\/serve\/64\/224740.jpg","size":"medium"},
{"#text":"http:\/\/userserve-ak.last.fm\/serve\/126\/224740.jpg","size":"large"},
{"#text":"http:\/\/userserve-ak.last.fm\/serve\/252\/224740.jpg","size":"extralarge"},
{"#text":"http:\/\/userserve-ak.last.fm\/serve\/500\/224740\/Kate+Bush.jpg","size":"mega"}
]
}
]
}
}
What are you using to parse JSON?
Here's a jQuery example http://api.jquery.com/jQuery.parseJSON/
If it is just for this particular task, then iterating with $.each over data.topartists.artist[0].image would work.
For a more generic solution, take a look at http://goessner.net/articles/JsonPath/, which is an XPath-like query language for JSON.
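The lookup itself is just two nested searches: find the artist by name, then find the image by size. Here is a rough sketch of that logic in Python rather than jQuery, using a trimmed copy of the response from the question; the helper name image_url is made up for illustration.

```python
import json

# Trimmed copy of the response from the question (two image sizes only).
raw = r'''
{"topartists": {"artist": [{
    "name": "Kate Bush",
    "image": [
        {"#text": "http:\/\/userserve-ak.last.fm\/serve\/34\/224740.jpg", "size": "small"},
        {"#text": "http:\/\/userserve-ak.last.fm\/serve\/252\/224740.jpg", "size": "extralarge"}
    ]
}]}}
'''


def image_url(data, artist_name, size):
    """Hypothetical helper: return the image URL for an artist and size, or None."""
    for artist in data["topartists"]["artist"]:
        if artist["name"] == artist_name:
            for image in artist["image"]:
                if image["size"] == size:
                    return image["#text"]
    return None


data = json.loads(raw)
url = image_url(data, "Kate Bush", "extralarge")
print(url)
```

The same two loops translate directly to JavaScript; note that the key is "image" (singular) and that the URL lives under the "#text" key, which needs bracket notation there.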