VBA Excel IE automation: locate element by custom tag - html

I need to pick out an element by a custom HTML tag, i.e. where the custom tag would be "somecustomtag" in the following div element:
<div class="panel-one" somecustomtag="blue">
I just can't remember the syntax. I know it's something like:
Set myElements = IE.Document.getElementsbyTagName("div")
For Each ob in myElements
If ob.subTag("somecustomtag") = "blue" then ' ????????
someStringVariable = ob.innerText
exit for
End If
Next ob
I've used this a dozen times before but can't find it anywhere. What is the proper syntax for .subTag?

In your case somecustomtag is an attribute, not a tag. You can read the value of somecustomtag with the following code snippet:
ob.getAttribute("somecustomtag")

Html <pre> not formatting/rendering text correctly [duplicate]

I'm using Prototype's PeriodicalUpdater to update a div with the results of an ajax call. As I understand it, the div is updated by setting its innerHTML.
The div is wrapped in a <pre> tag. In Firefox, the <pre> formatting works as expected, but in IE, the text all ends up on one line.
Here's some sample code which illustrates the problem. In Firefox, "a b c" is on a different line than "d e f"; in IE they are on the same line.
<html>
<head>
<title>IE preformatted text sucks</title>
</head>
<body>
<pre id="test">
a b c
d e f
</pre>
<script type="text/javascript"><!--
var textContent = document.getElementById("test").innerText;
textContent = textContent.replace("a", "<span style=\"color:red;\">a</span>");
document.getElementById("test").style.whiteSpace = "pre";
document.getElementById("test").innerHTML = textContent;
--></script>
</body>
</html>
Anyone know of a way to get around this problem?
Setting innerHTML fires up an HTML parser, which ignores excess whitespace including hard returns. If you change your method to include the <pre> tag in the string, it works fine because the HTML parser retains the hard returns.
You can see this in action by doing a View Generated Source after you run your sample page:
<PRE id="test" style="WHITE-SPACE: pre"><SPAN style="COLOR: red">a</SPAN> b c d e f </PRE>
You can see here that the hard return is no longer part of the content of the <pre> tag.
Generally, you'll get more consistent results by using DOM methods to construct dynamic content, especially when you care about subtle things like whitespace normalization. However, if you're set on using innerHTML for this, there is an IE workaround: set the outerHTML property instead, and include the enclosing tags.
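var test = document.getElementById("test");
if (test.outerHTML) {
    // Replace the whole element so the parser sees the enclosing <pre> tag
    test.outerHTML = '<pre id="test">' + textContent + '</pre>';
} else {
    test.innerHTML = textContent;
}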
This workaround and more discussion can be found here: Inserting a newline into a pre tag (IE, Javascript)
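Or you could set innerText where the browser supports it:
if (typeof el.innerText !== "undefined") {
    el.innerText = val; // val is treated as plain text, not parsed as HTML
} else {
    el.innerHTML = val;
}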
Don't know if this has been suggested before, but the solution I found for preserving white space, newlines, etc when doing an innerHTML into a 'pre' tag is to insert another 'pre' tag into the text:
<pre id="pretag"></pre>
TextToInsert = "lots of text with spaces and newlines";
document.getElementById("pretag").innerHTML = "<pre>" + TextToInsert + "</pre>";
It seems IE parses the text before assigning it to innerHTML. The extra 'pre' tag causes the parser to leave the text inside it untouched, which makes sense, since that is what the parser is supposed to do. This also works in Firefox.
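It could also be rewritten as a short-circuit one-liner, 'the Python way'. The assignments need their own parentheses, otherwise the line is a syntax error:
(el.innerText && (el.innerText = val)) || (el.innerHTML = val);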

Get text in a div that doesn't have a name or id

I would like to get the text inside this code:
<div class="js-text-container"></div>
When there is an ID I use GetElementById with no problem, but in this case there is no ID, and the source shows nothing between the opening and closing tags (although something is displayed).
I found an interesting solution here and tried to adapt it to my case:
Dim divs = WebBrowser1.Document.Body.GetElementsByTagName("div")
For Each d As HtmlElement In divs
If d.GetAttribute("class") = "js-text-container" Then
TextBox1.Text = d.InnerText
End If
Next
But nothing appears in my textbox. Does anyone have an idea? I think it's because InnerText refers to nothing in this case...
I hope I was clear enough.
Thanks a lot
Instead of d.GetAttribute("class") = "js-text-container"
use
d.GetAttribute("className") = "js-text-container"
I tested the following C# locally, and I believe the same approach works in VB.NET:
foreach (HtmlElement el in webBrowser1.Document.GetElementsByTagName("div"))
if (el.GetAttribute("className") == "js-text-container")
{
textBox1.Text = el.InnerText;
}
Hope it helps!

VBS: Can I target a field by its tabIndex, and its div position?

Novice here, using VBS to help with data entry to a web input form. Would appreciate any advice. I regularly use lines like this to set the value of a field based on its name:
IE.Document.All.Item("field1").Value = "test"
However I have a set of very awkward fields whose names change with each record. Their physical positions stay the same (visually); their tabIndexes stay the same (1,2,3,4), so I wondered if it's possible to do something like this:
IE.Document.All.getElementByTabIndex(1).Value = "test"
...But I'm not sure it is? Furthermore, even if that did work, tabIndex1 is used for another field on the same webpage. The fields that I am interested in, however, are all located on a div. The ID of the div is "form_div". So I'm trying to target a field located on div "form_div" whose tabIndex is 1... do you think it is possible?
Big thanks in advance.
So you have a DIV element with tabIndex set to 1 and you don't know its Name or ID, right? Then do something like this:
Set oDivs = IE.Document.getElementsByTagName("div")
Set myDiv = Nothing
For Each od In oDivs
If od.tabIndex = "1" Then
Set myDiv = od
Exit For
End If
Next
If Not myDiv Is Nothing Then
'do what needs here...
MsgBox myDiv.Name
End If
P.S. I see two drawbacks in your design:
1. The tabIndex should be unique.
2. Searching for an element by name is not reliable in IE. If your element has only a Name and no ID, getElementsByName will fail. It is better to use the ID, which also simplifies the code:
Set myDiv = IE.Document.All.form_div
To find it by Name, it would be:
Set oDivs = IE.Document.getElementsByTagName("div")
Set myDiv = Nothing
For Each od In oDivs
If od.Name = "form_div" Then
Set myDiv = od
Exit For
End If
Next
And once you have the element...
If Not myDiv Is Nothing Then
Set nodes = myDiv.childNodes
For i = 0 To nodes.Length-1 Step 2
If nodes(i).tabIndex = "1" Then
'do what need here...
nodes(i).Value = nodes(i).tabIndex
Exit For
End If
Next
End If

Issue with ruby parsing

I'm having a slight problem parsing a website with Nokogiri in Ruby.
Here is what the site looks like:
<div id="post_message_111112" class="postcontent">
Hee is text 1
here is another
</div>
<div id="post_message_111111" class="postcontent">
Here is text 2
</div>
Here is my code to parse it
doc = Nokogiri::HTML(open(myNewLink))
myPost = doc.xpath("//div[@class='postcontent']/text()").to_a()
ii=0
while ii!=myPost.length
puts "#{ii} #{myPost[ii].to_s().strip}"
ii+=1
end
My problem is with the output: because of the new line after "Hee is text 1", to_a splits the first div into separate entries, like so:
myPost[0] = hee is text 1
myPost[1] = here is another
myPost[2] = here is text 2
I want each div to be its own message. like
myPost[0] = hee is text 1 here is another
myPost[1] = here is text 2
How would I solve this? Thanks.
UPDATED
I tried
myPost = doc.xpath("//div[@class='postcontent']/text()").to_a()
myPost.each_with_index do |post, index|
puts "#{index} #{post.to_s().gsub(/\n/, ' ').strip}"
end
I used post.to_s().gsub because it was complaining that gsub is not a method of post. But I still have the same issue. I know I'm doing something wrong; I'm just wrecking my head over it.
UPDATE 2
Forgot to say that the new line is <br /> and even with
doc.search('br').each do |n|
n.replace('')
end
or
doc.search('br').remove
The issue is still there
If you look at the myPost array, you will see that each div is in fact its own message. The first just happens to include a newline-character \n. To replace it with a space, use #gsub(/\n/, ' '). So your loop looks like this:
myPost.each_with_index do |post, index|
puts "#{index} #{post.to_s.gsub(/\n/, ' ').strip}"
end
Edit:
According to my limited understanding of it, xpath can only find nodes. The child nodes are <br />, so either you have multiple texts between them or you have the div tag included in your search. There sure is a way to join the texts between the <br /> nodes, but I don't know it.
Until you find it, here is something that works:
Replace your xpath match with "//div[@class='postcontent']".
Adjust your loop to delete the div tags:
myPost.each_with_index do |post, index|
post = post.to_s
post.gsub!(/\n/, ' ')
post.gsub!(/^<div[^>]*>/, '') # delete opening div tag
post.gsub!(%r|</\s*div[^>]*>|, '') # delete closing div tag
puts "#{index} #{post.strip}"
end
Here, let me clean that up for you:
doc.search('div.postcontent').each_with_index do |div, i|
puts "#{i} #{div.text.gsub(/\s+/, ' ').strip}"
end
# 0 Hee is text 1 here is another
# 1 Here is text 2
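The whitespace handling in that last snippet can be tried on its own, without Nokogiri. The following is only a minimal sketch: normalize_whitespace is a hypothetical helper name, and raw_text is an assumed stand-in for what div.text might return for the first post, not output captured from the real page.

```ruby
# Collapse every run of whitespace (spaces, tabs, newlines) into a single
# space, then trim whatever is left at both ends.
def normalize_whitespace(text)
  text.gsub(/\s+/, ' ').strip
end

# Hypothetical stand-in for the text of the first div (an assumption).
raw_text = "\nHee is text 1\n  here is another\n"

puts normalize_whitespace(raw_text)
# prints: Hee is text 1 here is another
```

Matching /\s+/ rather than /\n/ also absorbs any indentation that follows the line break, so the joined message has single spaces throughout.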

How to get value of a hidden element? (Watir)

just wondering, how can I get the value of a hidden element using watir? This is the element:
<input type="hidden" value="randomstringhere" id="elementid" name="elementname" />
And this is my code atm:
require "rubygems"
require "watir-webdriver"
$browser = Watir::Browser.new :ff
$browser.goto("http://www.site.com")
$grabelement = $browser.hiddens(:id, "elementid")
$blah = $grabelement.attribute_value("value")
puts $blah
This gets stuck at the last line, where it returns
code.rb:6:in `<main>': undefined method `attribute_value' for #<Watir::HiddenCollection:0x8818adc> (NoMethodError)
Sorry for the basic question, I've had a search and couldn't find anything.
Thanks in advance!
Problem
Your code is quite close. The problem is the line:
$grabelement = $browser.hiddens(:id, "elementid")
This line says to get a collection (i.e. all) of the hidden elements that have id "elementid". As the error message says, the collection does not have the attribute_value method. Only elements (i.e. the objects in the collection) have the method.
Solution (assuming single hidden with matching id)
Assuming that there is only one, you should just get the first match by using hidden instead of hiddens (i.e. drop the s):
$grabelement = $browser.hidden(:id, "elementid")
$blah = $grabelement.value
puts $blah
#=> "randomstringhere"
Note that for the value attribute, you can just do .value instead of .attribute_value('value').
Solution (if there are multiple hiddens with matching id)
If there actually are multiple, then you can iterate over the collection or just get the first, etc:
# Iterate over each hidden that matches
$browser.hiddens(:id, "elementid").each { |hidden| puts hidden.value }
# Get just the first hidden in the collection
$browser.hiddens(:id, "elementid").first.value
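The collection-versus-element distinction behind the original error can be illustrated in plain Ruby, without a browser. This is only a rough sketch: HiddenStub and matching_hiddens are hypothetical stand-ins invented for the illustration, and an ordinary Array plays the role of the collection. None of this is Watir's actual implementation.

```ruby
# Hypothetical stand-in for a hidden input element; not part of Watir.
HiddenStub = Struct.new(:id, :value)

# Stand-in for a plural lookup: it always returns a collection (an Array here),
# even when only one element matches.
def matching_hiddens(elements, id)
  elements.select { |element| element.id == id }
end

page = [
  HiddenStub.new("elementid", "randomstringhere"),
  HiddenStub.new("other", "ignored")
]

collection = matching_hiddens(page, "elementid")

# The collection itself has no value method; only its members do.
puts collection.respond_to?(:value)   # prints: false
puts collection.first.value           # prints: randomstringhere
collection.each { |hidden| puts hidden.value }
```

Calling value directly on the collection would raise NoMethodError, which is the same kind of failure as the attribute_value error in the question.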