Getting contents of specific div element? - html

Is it possible to get the contents of a tag from a WebBrowser control, like this: <div class="desc">contents</div>, and then strip all HTML tags from it?
Say WebBrowser1 has a website loaded into it. I want to extract the source code from it and find this:
<div class="desc"><b>these are the contents I want</b></div>
and extract it like this: these are the contents I want

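Dim divs = WebBrowser1.Document.Body.GetElementsByTagName("div")
For Each d As HtmlElement In divs
    ' The class attribute is exposed as "className" on HtmlElement
    If d.GetAttribute("className") = "desc" Then
        ' InnerText returns the content with all markup stripped
        Return d.InnerText
    End If
Next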
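For comparison, here is the same idea applied to a raw HTML string outside the WebBrowser control. This is only a minimal sketch in Python using the standard library's html.parser; the names DivTextExtractor and div_text are made up for illustration, and it assumes you already have the page source as a string and only want the first matching div.

```python
from html.parser import HTMLParser


class DivTextExtractor(HTMLParser):
    """Collect the text inside the first <div> carrying a given class."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.depth = 0      # > 0 while we are inside the target div
        self.done = False   # set once the target div has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth:
            # Track nested divs so we know which </div> ends the target.
            if tag == "div":
                self.depth += 1
        elif tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            if self.class_name in classes:
                self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == "div":
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        # Only text nodes reach this method, so markup is dropped for free.
        if self.depth:
            self.parts.append(data)


def div_text(html, class_name):
    """Return the tag-free text of the first div with the given class."""
    parser = DivTextExtractor(class_name)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()
```

Calling div_text('<div class="desc"><b>these are the contents I want</b></div>', "desc") yields the bare text, much as InnerText does in the VB version.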

Related

Python 2.7 Copy and paste hyperlinked text

I am using Python 2.7, WebDriver and Chrome. Manually, I can mouse swipe across text containing a hyperlink on a web page and copy it to the clipboard. How do I do this automatically? I have no issue finding the element containing the hyperlink. I am not trying to find the hyperlink. I am trying to paste it into a web page text box which does not process <a href="https://www.python.org/">Link</a> within an "a" tag but processes it correctly when pasted from elsewhere, i.e. "Link" with embedded href.
Even after OP clarifications, it's still hard to understand the exact issue, so I'll try to cover all possible options :)
Suppose we have an anchor element, like <a href="https://python.org">Link</a>
We can find this element in ways such as:
element = driver.find_element_by_xpath('//a[text()="Link"]')
element = driver.find_element_by_xpath('//a[@href="https://python.org"]')
depending on what information we currently know about the element and what exactly we want to scrape.
Also, we can use the index of the anchor element:
element = driver.find_elements_by_tag_name('a')[0]
1) To get value of href attribute:
value = element.get_attribute('href')
Output: https://python.org
2) To get value of text node:
value = element.text
Output: "Link"
3) To get complete HTML code of element:
value = element.get_attribute('outerHTML')
Output: <a href="https://python.org">Link</a>

Retrieve an image from a website using Ruby and Nokogiri

I am trying to get an image from this website using Ruby.
https://steamcommunity.com/market/listings/730/M4A1-S%20%7C%20Cyrex%20(Minimal%20Wear)
So far, I have successful code to get the name of the item listed on the website:
html = Nokogiri::HTML.parse(open('https://steamcommunity.com/market/listings/730/'+url2))
title = html.css('title').text
titles = title.sub(/^Steam Community Market :: Listings for / , '')
Which results in "M4A1-S | Cyrex (Minimal Wear)"
(The "url2" comes from an input box on the html page that I made)
The image on the Steam Website has a class of "market_listing_largeimage".
Is there a way to also use Nokogiri to get the image src so that I can then input it into Html?
The image does not have that class; the div that wraps the image does. That said, you can select the img inside that div and read its src:
html.at_css('.market_listing_largeimage img')['src']

display:none does not show other div

I have a code that is formatted like this:
<div id = "test" class = "invisible">
<!--I want to hide this!-->
%%GLOBAL_ProductDescription%%
</div>
<script type = "text/javascript">
//Takes the info within the div above and manipulates some information
var desc = $('#test').html();
//Put edits to new_desc
$(document).ready(function() {
document.getElementById("info").innerHTML = new_desc;
});
</script>
<div class = "stuff" id = "product">
<a id = "info"><!--receive info from script here--></a>
</div>
The code works properly in terms of the last div displaying the information and formatting that I want to have. The problem now is: the page is displaying the original information plus the edited one in the bottom. Whenever I try to hide the first div, everything else goes away!
I would manipulate the data by just assigning the contents of the global variable to my JavaScript variable, but that is sort of out of the picture right now. Can anybody tell me what I am doing wrong and why hiding this one div completely gets rid of all the other information on the page?
Note: When I type some gibberish at the beginning of the code, it shows even though there's a display:none. If I put it anywhere below that line, it does not show either.
The content changes per product. There may have been some divs in there that weren't closed properly, and that's why the latter part of the code was being pushed somewhere inside the div that holds %%GLOBAL_ProductDescription%%. I did not know it could behave like that, so I overlooked that part in my check.
I can't really go ahead and bulk edit about 4000 products so that the HTML in there is correct, so I inserted 4 </div>s before the start of the script and everything looks good. I know this may not be the most robust answer to the question, but it works for now. Thanks for all the help!
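If you want to find out which product descriptions leave divs open rather than guessing how many closing tags to append, something like the following could help. It is only a rough sketch in Python using the standard library's html.parser; DivBalance and unclosed_divs are hypothetical names, and it assumes you can export each description as a string. It counts tags naively and does not try to mimic a browser's error recovery.

```python
from html.parser import HTMLParser


class DivBalance(HTMLParser):
    """Count <div> tags that are opened but never closed in a fragment."""

    def __init__(self):
        super().__init__()
        self.open_divs = 0      # divs currently open
        self.stray_closes = 0   # </div> tags with no matching opener

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.open_divs += 1

    def handle_endtag(self, tag):
        if tag == "div":
            if self.open_divs:
                self.open_divs -= 1
            else:
                self.stray_closes += 1


def unclosed_divs(fragment):
    """Return how many divs are still open at the end of the fragment."""
    checker = DivBalance()
    checker.feed(fragment)
    checker.close()
    return checker.open_divs
```

A description for which unclosed_divs returns a non-zero number would swallow whatever follows it into the hidden div, which matches the behaviour described above.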

GetElementById/GetElementsByTagName not finding element

Bear in mind that I know only a bit about HTML:
There is a site I'm trying to interact with using a WebBrowser. The site has a textarea element as follows:
<textarea name="ctl00$ContentPlaceHolderMain$txtCallDesc" rows="2" cols="20" id="ctl00_ContentPlaceHolderMain_txtCallDesc" tabindex="205" style="width: 100%; height: 80px; font-size: 8pt"></textarea>
From what I've read, the generated ID of the textarea signifies that it's placed inside another form of some sort, and I'm not sure if this is where I'm running into my problem.
Once the page has loaded, I have something like the following in a button:
Dim theCol As HtmlElementCollection = WebBrowser1.Document.GetElementsByTagName("textarea")
For Each curElement As HtmlElement In theCol
ListBox1.Items.Add(curElement.TagName)
Next
Nothing populates in the list. I've also tried using the ID of the text box gathered by the 'inspect element' feature of Chrome:
Dim value As HtmlElement = WebBrowser1.Document.GetElementById("ctl00_ContentPlaceHolderMain_txtCallDesc")
MsgBox(value.GetAttribute("value"))
No matter what I do, I can't seem to get the program to recognize that there ARE textarea elements in the document. The source for the page is far too long to spam everyone with here, but is there anything I'm missing that I should be looking out for? Perhaps I need to get another element first, then search that for elements within it?
Edit:
The element I'm trying to get seems to be within an iFrame, but it looks like it's from the same domain so the same origin policy shouldn't come into play, should it?
<iframe id="mainFrame" width="100%" height="100%" frameborder="0" class="mainFrame" name="Main" src="/Calls/OpenCalls.aspx">
Using the code shown in Get Iframe HTML:
For i = 0 To WebBrowser1.Document.Window.Frames.Count - 1
Dim frameDoc = WebBrowser1.Document.Window.Frames(i)
Dim theCol = frameDoc.Document.GetElementsByTagName("textarea")
For Each curElement As HtmlElement In theCol
ListBox1.Items.Add(String.Format("TagName: {0} Id:{1}", curElement.TagName, curElement.Id))
Next
Next
The essential part being the use of WebBrowser1.Document.Window.Frames.
You can't reference elements inside an iframe directly since they are inside another document. So first get a reference to the document element inside the iframe and then you can query it the same way.
Dim frameDoc = WebBrowser1.Document.GetElementById("mainFrame").DomElement.contentWindow.Document
And the rest you already know...
Dim theCol = frameDoc.GetElementsByTagName("textarea")
For Each curElement In theCol
ListBox1.Items.Add(curElement.TagName)
Next

Change Element's node name?

Is it possible to change the element's node name in GWT? I mean something like this:
HTML h = new HTML();
h.getElement().setNodeName("mydiv")
while there is no setNodeName() method for Element.
I'd like to acquire <mydiv>some contents</mydiv> instead of default tag <div>some contents</div>
Thanks for any hints.
You can't change the element node name of the HTML widget. However, you can create your own tag with Document.get().createElement("mydiv") and use that to create a new Widget by extending Composite. That said, I'm not sure why you want to do this, because adding new tags to the DOM and thereby extending HTML doesn't sound like something you should want. Setting the content in this tag isn't possible via methods like innerText because they are only available for valid tags.
Change the tag name while keeping content and attributes:
function changeTagName(elm, new_tag_name) {
    var newElm = document.createElement(new_tag_name);
    var atr = elm.attributes;
    for (var i = 0; i < atr.length; i++) { // copy all attributes
        newElm.setAttribute(atr[i].name, atr[i].value);
    }
    // insert next to the original under its own parent (not necessarily document.body)
    elm.parentNode.insertBefore(newElm, elm);
    newElm.innerHTML = elm.innerHTML; // copy the content
    elm.parentNode.removeChild(elm); // remove original
}
for example:
<span id='sp1' class='cl1 cl2'> some t e x t with (\n) gaps .... and etc</span>
changeTagName(document.getElementById('sp1'),'pre');