ROBOTFRAMEWORK - Looping through all images on a page - pulling the link - html

I am working on a test that checks that all images on a page are visible. I'm running into an issue where it only pulls the src from the first img on the page and logs that same value once per loop iteration. I'm currently getting a count of all the images, and with that count I loop through and pull the img source. There are no special classes or ids; the only thing I have to go off of is the <img> tag. I'm guessing I will somehow need to parse the entire HTML, since Robot Framework only looks at what is viewable on the screen?
My end goal is to pull all img sources on a page and confirm each one returns a 200 status code.
Here is what I have now:
@{all_image_sources}=    Create List
${all_images}=    Get Element Count    //body//img
FOR    ${image}    IN RANGE    ${all_images}
    ${img_src}=    Get Element Attribute    tag:img    src
    Log    ${img_src}
    Append To List    ${all_image_sources}    ${img_src}
END
Log List    ${all_image_sources}

You might consider using Get WebElements, which will give you each image element in a list. You can then loop through the list and get each element's src attribute.
Example:
@{all_image_sources}=    Create List
@{all_images}=    Get WebElements    //body//img
FOR    ${image}    IN    @{all_images}
    ${img_src}=    Get Element Attribute    ${image}    src
    Append To List    ${all_image_sources}    ${img_src}
END
Log List    ${all_image_sources}
See the Get WebElements keyword in the SeleniumLibrary documentation for details.
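To cover the end goal of confirming each source returns a 200 status code, one option is to hand the collected URLs to a short Python script (which could also be wrapped as a custom keyword). Here is a minimal sketch using the requests library; check_image_sources is a hypothetical helper name, and it assumes the src values are absolute URLs:

import requests

def check_image_sources(image_sources):
    """Request each image URL and collect any that do not return HTTP 200."""
    broken = []
    for src in image_sources:
        # A HEAD request is usually enough to check the status without
        # downloading the whole image.
        response = requests.head(src, allow_redirects=True)
        if response.status_code != 200:
            broken.append((src, response.status_code))
    return broken

# Example usage with placeholder URLs:
print(check_image_sources(["https://example.com/a.png", "https://example.com/b.png"]))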


How to scrape text based on a specific link with BeautifulSoup?

I'm trying to scrape text from a website, but specifically only the text that's linked to with one of two specific links, and then additionally scrape another text string that follows shortly after it.
The second text string is easy to scrape because it includes a unique class I can target, so I've already gotten that working, but I haven't been able to successfully scrape the first text (the one with one of the two specific links).
I found this SO question (Find specific link w/ beautifulsoup) and tried to implement variations of that, but wasn't able to get it to work.
Here's a snippet of the HTML code I'm trying to scrape. This pattern recurs repeatedly over the course of each page I'm scraping:
<em>[<a href="forum.php?mod=forumdisplay&fid=191&filter=typeid&typeid=19">女孩</a>]</em> 寻找2003年出生2004年失踪贵州省黔西南布依族苗族自治州贞丰县珉谷镇锅底冲 黄冬冬289179
The two parts I'm trying to scrape and then store together in a list are the two Chinese-language text strings.
The first of these, 女孩, which means female, is the one I haven't been able to scrape successfully.
This is always preceded by one of these two links:
forum.php?mod=forumdisplay&fid=191&filter=typeid&typeid=19 (Female)
forum.php?mod=forumdisplay&fid=191&filter=typeid&typeid=15 (Male)
I've tested a whole bunch of different things, including things like:
gender_containers = soup.find_all('a', href = 'forum.php?mod=forumdisplay&fid=191&filter=typeid&typeid=19')
print(gender_containers.get_text())
But for everything I've tried, I keep getting errors like:
ResultSet object has no attribute 'get_text'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?
I think that I'm not successfully finding those links to grab the text, but my rudimentary Python skills thus far have failed me in figuring out how to make it happen.
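For reference, that ResultSet error happens because find_all() returns a list-like collection of tags rather than a single tag. Continuing from the find_all call above, iterating over the result (or using find() for a single match) avoids that part of the problem:

# find_all() returns a ResultSet (a list of tags), so iterate over it:
for container in gender_containers:
    print(container.get_text())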
What I want to have happen ultimately is to scrape each page such that the two strings in this code (女孩 and 寻找2003年出生2004年失踪贵州省...)
<em>[<a href="forum.php?mod=forumdisplay&fid=191&filter=typeid&typeid=19">女孩</a>]</em> 寻找2003年出生2004年失踪贵州省黔西南布依族苗族自治州贞丰县珉谷镇锅底冲 黄冬冬289179
...are scraped as two separate variables so that I can store them as two items in a list, then iterate down to the next instance of this code, scrape those two text snippets, and store them as another list, etc. I'm building a list of lists in which I want each row/nested list to contain two strings: the gender (女孩 or 男孩) and then the longer string, which has a lot more variation.
(I currently have working code that scrapes and stores the longer string; I just haven't been able to get the gender part to work.)
Sounds like you could use an [attribute$=value] CSS selector with the $ ("ends with") operator.
If there can only be one occurrence per page
soup.select_one("[href$='typeid=19'], [href$='typeid=15']").text
This is assuming those typeid=19 or typeid=15 only occur at the end of the strings of interest. The "," between the two in the selector is to allow for matching on either.
You could additionally handle the possibility of the element not being present, as follows:
from bs4 import BeautifulSoup

html = '''<em>[<a href="forum.php?mod=forumdisplay&fid=191&filter=typeid&typeid=19">女孩</a>]</em> 寻找2003年出生2004年失踪贵州省黔西南布依族苗族自治州贞丰县珉谷镇锅底冲 黄冬冬289179'''
soup = BeautifulSoup(html, 'html.parser')
match = soup.select_one("[href$='typeid=19'], [href$='typeid=15']")
gender = match.text if match is not None else 'Not found'
print(gender)
Multiple values:
genders = [item.text for item in soup.select("[href$='typeid=19'], [href$='typeid=15']")]
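To build the list of lists described in the question, one approach is to start from each matched gender link and walk to the adjacent text. This is only a sketch: it assumes the description is the text node that immediately follows the closing </em>, as in the snippet above; on the real pages, the unique class mentioned in the question would be a more robust hook for the second string:

from bs4 import BeautifulSoup

html = '''<em>[<a href="forum.php?mod=forumdisplay&fid=191&filter=typeid&typeid=19">女孩</a>]</em> 寻找2003年出生2004年失踪贵州省黔西南布依族苗族自治州贞丰县珉谷镇锅底冲 黄冬冬289179'''
soup = BeautifulSoup(html, 'html.parser')

rows = []
for link in soup.select("[href$='typeid=19'], [href$='typeid=15']"):
    gender = link.text  # 女孩 or 男孩
    em = link.find_parent('em')
    # Assumption: the longer description directly follows the </em> as plain text.
    description = em.next_sibling.strip() if em and em.next_sibling else ''
    rows.append([gender, description])

print(rows)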
Try the following code.
from bs4 import BeautifulSoup

data = '''<em>[<a href="forum.php?mod=forumdisplay&fid=191&filter=typeid&typeid=19">女孩</a>]</em> 寻找2003年出生2004年失踪贵州省黔西南布依族苗族自治州贞丰县珉谷镇锅底冲 黄冬冬289179'''
soup = BeautifulSoup(data, 'html.parser')
print(soup.select_one('em').text)
Output:
[女孩]

How to include template's value inside link or table in MediaWiki?

I searched the documentation, but I don't know exactly what to call this.
I have a template Index2Name that returns a name based on an index.
I'm trying to use that name in a link:
[[Articles/{{Index2Name|0001}}|{{Index2Name|0001}}]]
or
Image:Big-0001.png|link=Articles/{{Index2Name|0001}}|''{{Index2Name|0001}}''
In the last example (a line inside a <gallery> element), the name is printed, but the link doesn't work: the value from the template is printed, yet it is not converted to a link.
How can I make this work? And does this have a name? (For future reference.)
EDIT: Index2Name is a simple switch returning a few words depending on the id. Since I'm using subpages, I only want the name to appear (example: MyArticle), but the link target is Articles/MyArticle.
Could you clarify exactly what you want to happen, please (where you want to link and how you want it to look)?
But for example if you use:
[[Image:Big-0001.png|''{{Index2Name|0001}}'']]
It will link to the page Image:Big-0001.png, with the caption being the output of:
''{{Index2Name|0001}}''
Or if you use:
[[Image:Big-0001.png|link=Articles/{{Index2Name|0001}}]]
The image, when clicked, will take you to the Articles/ subpage named by the output of:
{{Index2Name|0001}}

xpath scraping data from the second page

I am trying to scrape data from this webpage: http://webfund6.financialexpress.net/clients/zurichcp/PortfolioPriceTable.aspx?SchemeID=33, and I specifically need data for fund number 26.
I have no problem getting data from the first page with this address (funds number 1-25), but for the life of me I can't scrape anything from the second page. Can someone help?
Thanks!
Here is the code I use in Google Sheets:
=IMPORTXML("http://webfund6.financialexpress.net/clients/zurichcp/PortfolioPriceTable.aspx?SchemeID=33","/html/body/form[@id='MainForm']/table/tr/td/div[@id='main']/div[@id='tabResult']/div[@id='Prices']/table/thead/tr[26]/td[@class='Center'][1]")
You can do two things: one is to append &PgIndex=2 onto the end of your URL, and then you can also significantly simplify your XPath to this:
//*[@id='Prices']//tr[2]/td[2]
This specifically grabs the second row of the table (tr meaning table-row) in order to bypass the header row, then grabs the second field, which is the table-data cell.
=IMPORTXML("http://webfund6.financialexpress.net/clients/zurichcp/PortfolioPriceTable.aspx?SchemeID=33&PgIndex=2","//*[@id='Prices']//tr[2]/td[2]")
To get the second page, add &PgIndex=2 to your URL. Then adjust the /table/thead/tr[26] to /table/thead/tr[2]. The result is:
=IMPORTXML("http://webfund6.financialexpress.net/clients/zurichcp/PortfolioPriceTable.aspx?SchemeID=33&PgIndex=2","/html/body/form[@id='MainForm']/table/tr/td/div[@id='main']/div[@id='tabResult']/div[@id='Prices']/table/thead/tr[2]/td[@class='Center'][1]")
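For anyone doing the same thing outside Google Sheets, here is a rough Python equivalent of the second-page fetch, a sketch using the requests and lxml libraries with the same URL and the simplified XPath from above:

import requests
from lxml import html

# Page 2 of the price table: fund 26 sits behind the PgIndex query parameter.
url = ("http://webfund6.financialexpress.net/clients/zurichcp/"
       "PortfolioPriceTable.aspx?SchemeID=33&PgIndex=2")

tree = html.fromstring(requests.get(url).content)

# Skip the header row, then take the second cell of the first data row.
print(tree.xpath("//*[@id='Prices']//tr[2]/td[2]/text()"))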

how to make next link for pagination

I am new to JSP. In my JSP I have:
href="${mainUrl}/userdata?page=1" target="_blank">Next On First Page
href="${mainUrl}/userdata?page=1" target="_blank">Next On 2nd Page
href="${mainUrl}/userdata?page=1" target="_blank">Next On 3rd Page
href="${mainUrl}/userdata?page=1" target="_blank">Next On 4rth Page
However, I want this link on every page.
Requirement: a single Next link that works on every page.
So I applied a forEach loop on the <a> tag:
<c:forEach items="${i}" var="i">
    <a href="${mainUrl}/userdata?page=${i}" target="_blank">Next</a>
</c:forEach>
${i}: it's an attribute from my controller, with int i = 0;
However, the compiler is saying it can't iterate on "i".
How can I fix that? I am implementing pagination.
Assuming userdata is your controller: make the controller set the next page's id, not the current one's, and name it something better than i, like nextPageID.
So in the controller, take the current id received in the request parameter page, if any, add one to it, and set it in the request attribute nextPageID. I guess if no page id was received, the next id would be 2. Then in your JSP:
<a href="${mainUrl}/userdata?page=${nextPageID}" target="_blank">Next</a>
Because if you are adding one in the JSP, how do you know this isn't the last page? Keep the logic in the controller; the controller will presumably know which number the last page is.
BTW, you obviously cannot iterate over a simple int; iteration means traversing an array or a list.

GxtRtl Html and HtmlContainer flips input html content

I use Gxt-2.2.5-Rtl (http://code.google.com/p/gxt-rtl/) and am trying to show HTML content through HtmlContainer's setUrl() method. Unfortunately, the result is a flipped version of my expected output. For example, suppose our input HTML contains a table whose columns run from right to left as id, name, description. What we get is a table with the columns in the same order BUT running FROM LEFT TO RIGHT!
I tried GXT's Html and GWT's HTML and HtmlPanel classes, but that didn't solve the problem.
In addition, I should say that when I use TabItem's or ContentPanel's setUrl() method, the problem disappears. But I prefer not to use that approach, because:
1- Only the last loaded iframe is visible at a time, which means navigating through the other preloaded tab items displays a blank page.
2- It gives poor control over the loaded page through GWT, e.g. for catching click events.
Expected output:
http://www.freeimagehosting.net/yow6l
Wrong output:
http://www.freeimagehosting.net/8opdt
I changed the titles to English for better communication! :)
Thanks!