Retrieve <TD> text using WATIR

I am using Watir for automated testing, and I need to store the value of a rate in a variable. In the example below (from the webpage source code), I need the variable myrate to hold the value 2.595. I know how to retrieve a value from an <input> or <span> (see below), but not directly from a <td>. Any help? Thanks
<TABLE>
<TR>
<TD></TD>
<TD>Rate</TD>
<TD>2.595</TD>
</TR>
</TABLE>
For a <span> I use this code:
raRetrieved = browser.span(:name => 'myForm.raNumber').text

Try this: find the row you want by using a regular expression to match a row that contains the word 'Rate', then get the text of the third cell in that row.
myrate = browser.tr(:text, /Rate/).td(:index => 2).text
#or you can use the more user-friendly aliases for those tags
myrate = browser.row(:text, /Rate/).cell(:index => 2).text
If the word 'Rate' might appear in other text in that table, but is always the only entry in the second cell of the row you want, then find the cell with that exact text, use the parent method to locate the row that holds that cell, and then get the text from the third cell.
myrate = browser.cell(:text, 'Rate').parent.cell(:index => 2).text
Whether you use .cell and .row or .td and .tr is up to you: some people prefer the tag names, others like the more descriptive names. Use whatever makes the code most readable for you and for others who will work with it.
Note: the code above presumes Watir-Webdriver or Watir 2.x, both of which use zero-based indexing. For older versions of Watir, change the index values to 3.
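The lookup logic of the last variant (find the label cell, step up to its row, take the third cell) can be tried without a browser. What follows is only a sketch under stated assumptions: it uses Ruby's REXML library in place of Watir, it treats the question's table as well-formed XML, and the helper name rate_from is made up for illustration.

```ruby
require 'rexml/document'

# The table from the question, treated here as well-formed XML (an assumption;
# a real page normally needs a browser or an HTML parser).
TABLE_SOURCE = <<~HTML
  <TABLE>
  <TR>
  <TD></TD>
  <TD>Rate</TD>
  <TD>2.595</TD>
  </TR>
  </TABLE>
HTML

# Hypothetical helper, not part of Watir: returns the text of the cell at
# zero-based position `index` in the row whose label cell is exactly `label`.
def rate_from(source, label, index)
  doc = REXML::Document.new(source)
  label_cell = REXML::XPath.match(doc, '//TD').find { |td| td.text == label }
  return nil unless label_cell

  row = label_cell.parent               # plays the role of .parent in Watir
  row.get_elements('TD')[index].text    # zero-based, like Watir 2.x
end

myrate = rate_from(TABLE_SOURCE, 'Rate', 2)
puts myrate
```

With one-based indexing, as in older Watir versions, the same cell would be number 3.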
And for the record, I totally agree with the comments of others about the lack of testability of the code sample you posted. It's horrid. Asking for something to locate the proper elements by, such as ID values or names, is not out of line in terms of making the page easier to test.

Try this:
browser.td(how, what).text
The problem here is that table, tr and td tags do not have any attributes. You can try something like this (not tested):
browser.table[0][2].text

In case this helps anyone who is having the same issue, this is what works for me:
browser.td(:text => "Rate").parent.cell(:index, 2).text
Thank you all

Related

Web scraping without id VBA

I'm trying to scrape a web page. Some elements were easy to get, but I have a problem with those that have no id, like this one:
<TABLE class=DisplayMain1 cellSpacing=1 cellPadding=0><TBODY>
<TR class=TitleLabelBig1>
<TD class=Title1 colSpan=100><SPAN style="FONT-FAMILY: arial narrow; FONT-WEIGHT: normal">Tool & </SPAN><BR>PE311934-1-1 </TD></TR></TBODY></TABLE>
I want this: PE311934-1-1
I tried "document.getElementsByClassName", but VBA gave me an error.
Any tips?

Use Regular Expressions and the XMLHttpRequest object in VBA
I made an add-in some time ago that does just that:
http://www.analystcave.com/excel-tools/excel-scrape-html-add/
If you just want the source code, it is here (GetElementByRegex function):
http://www.analystcave.com/excel-scrape-html-element-id/
Now the actual regex will be quite simple:
</SPAN><BR>(.*?)</TD></TR></TBODY></TABLE>
If it captures too many items, simply expand the regex.
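To see what that pattern captures, here is a small sketch written in Ruby rather than VBA, with the pattern itself unchanged. The HTML string is copied from the question; the variable names are illustrative, and stripping the trailing space from the capture is an extra step that the answer does not mention.

```ruby
# Markup copied from the question, joined onto one line.
html = '<TD class=Title1 colSpan=100><SPAN style="FONT-FAMILY: arial narrow; ' \
       'FONT-WEIGHT: normal">Tool & </SPAN><BR>PE311934-1-1 </TD></TR></TBODY></TABLE>'

# Same pattern as in the answer; %r{} avoids escaping the forward slashes.
pattern = %r{</SPAN><BR>(.*?)</TD></TR></TBODY></TABLE>}

match = html.match(pattern)
captured = match ? match[1].strip : nil   # nil when the markup does not match
puts captured
```

The lazy group (.*?) stops at the first closing </TD>, which is why the capture stays limited to the text after the <BR>.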

You don't specify the error and there is not enough HTML to know how many elements there are on the page.
You may have forgotten to use an index with document.getElementsByClassName("Title1"), as it returns a collection
For example, the first item would be: document.getElementsByClassName("Title1")(0)
In the same way, you could use a CSS querySelector such as .Title1
Which says the same thing i.e. select the elements with ClassName "Title1".
For the first instance simply use:
document.querySelector(".Title1")
For a nodeList of all matching elements, use:
document.querySelectorAll(".Title1")
and then iterate over its length.
You would access the .innerText property of the element, generally, to retrieve the required string.
For the snippet shown, assuming the item is the first .Title1 on the page the CSS selector retrieves the following from your HTML
The resultant string can then be processed for what you want. This method, and regex, are fragile at best considering how easily an updated source page can break these methods.
In your example above, you can use the class name, .Title1, and then use Replace() to remove the "Tool & " part.

Can I use knitr to apply CSS styles to individual table cells?

Is it possible to apply a class attribute to individual table cells using knitr? I have successfully applied a class attribute to the section heading that contains a knitr::kable generated table and used that to format the entire table. However, I would like to be able to conditionally format individual cells which would require being able to apply a class to specific <td> elements.
My current workaround is to programmatically wrap the cell contents in a pair of <span> tags and pass that on to knitr::kable. This approach only allows me to format the text inside the cell versus the entire cell (e.g. setting the cell background color). Here's an example of what I'm currently using:
## Read in the report, process the data, send to kable
rpt <- generate.report()
mutate(rpt, Col2 = ifelse(abs(Col2) > Threshold,
                          paste('<span class="warning">',
                                sprintf("%.2f", Col2), '</span>'),
                          sprintf("%.2f", Col2))) %>%
  knitr::kable(format="markdown", align = c("l", rep("r", 4)),
               col.names = gsub("\\.", "<br>", colnames(.)))
Which results in the following example HTML output:
<td align="right"><span class="warning"> -1.74 </span></td>
I would like to be able to have knitr::kable generate something like this:
<td align="right" class="warning"> -1.74 </td>
That way I could apply CSS styles to the <td> tag instead of the <span> tag.

The ReporteRs package may help. Have a look at FlexTable.
You can then get the corresponding HTML code with the as.html function and reuse it within your knitr code.

OK, this may not be the answer, but it may point you in the right direction. I had a similar problem formatting individual cells in knitr to prepare a PDF. In the end, I used xtable and wrote a function that relied on a logical matrix to decide whether or not a cell in the output table would be formatted.
I couldn't quite get it to work smoothly by myself so I had to post it on here and with the help of ivyleavedtoadflax I was able to develop a reasonably easy to use function to apply formatting to certain cells in an xtable in knitr.
Here's the link to my post
As I say, it's not the exact solution to your problem but it may point you in the right direction.

Selenium automation- finding best xpath

I am looking to avoid positional XPaths. The reason is that such an XPath can change and fail an automation test if a new object is added to the page and shifts the expected position.
But on some web pages, this is the only xpath I can find. For example, I am looking to click a tab called 'FooBar'.
If I use the Selenium IDE FireFox plugin, I get:
//td[12]/a/font
If I use the FirePath Firefox plugin, I get:
html/body/form/table[2]/tbody/tr/td[12]/font
If a new tab called "Hello, World" is added to the web page (before FooBar tab) then FooBar tab will change and have an xpath position of
//td[13]/a/font
What would you suggest to do?
TY!

Instead of using an absolute XPath, you could use a relative XPath, which is shorter and more reliable.
Say you have:
<td id="FooBar" name="FooBar">FooBar</td>
By.id("FooBar");
By.name("FooBar");
By.xpath("//td[text()='FooBar']") //exact match
By.xpath("//td[@id='FooBar']") //with any attribute value
By.xpath("//td[contains(text(),'oBar')]") //partial match with contains function
By.xpath("//td[starts-with(text(),'FooB')]") //partial match with starts-with function
This blog post may be useful for you.
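The difference between a positional locator and a text-anchored one can be checked offline. This is a sketch only: it uses Ruby's REXML library rather than Selenium, and both the tab markup (td > a > font) and the tab_row helper are assumptions modeled on the paths in the question, not the real page.

```ruby
require 'rexml/document'

# Hypothetical helper: builds a one-row table with one tab per label, using
# the td/a/font nesting suggested by the XPaths in the question.
def tab_row(labels)
  cells = labels.map { |label| "<td><a><font>#{label}</font></a></td>" }.join
  REXML::Document.new("<table><tr>#{cells}</tr></table>")
end

POSITIONAL = '/table/tr/td[2]/a/font'      # depends on the order of the tabs
BY_TEXT    = "//font[text()='FooBar']"     # depends only on the tab label

before = tab_row(['Home', 'FooBar'])
after  = tab_row(['Home', 'Hello, World', 'FooBar'])

[before, after].each do |doc|
  positional = REXML::XPath.first(doc, POSITIONAL).text
  by_text    = REXML::XPath.first(doc, BY_TEXT).text
  puts "positional: #{positional.inspect}, by text: #{by_text.inspect}"
end
```

Once the extra tab is inserted, the positional path lands on the new tab, while the text-based path still finds FooBar.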

A relative XPath is a good idea. A relative CSS selector is even better (faster).
If possible, suggest or request an id for the element.
Also check Chrome -> inspect element -> copy CSS/XPath.

Using //td is not a good idea because it will return all your td nodes. Any predicate such as //td[25] will be a very fragile selection because any td added to any previous table will change its result. Using plugins to generate XPath is great for quickly finding what you want, but it's always best to use the result just as a starting point, and then analyze the structure of the file to write a locator that will be harder to break when changes occur.
The best locators are anchored to invariant values or attributes. Plugins usually won't suggest id or attribute anchors. They usually use absolute positional expressions. If you can rewrite your locator path in terms of invariant structures in the file, you can then select the elements or text that you want relative to it.
For example, suppose you have
<body> ...
... lots of code....
<h1>header that has a special word</h1>
... other tags and text but not `h1` ...
<table id="some-id">
...
<td>some-invariant-text</td>
<td>other text</td>
<td>the field that you want</td>
...
The table has an ID. That's the best anchor. Now you can select the table as
//table[@id='some-id']
But many times you don't have the id, or even some other invariant attribute. You can still try to discover a pattern. For example, suppose that the last <h1> before the table you want contains a word you can match. You could still find the table using:
//table[preceding::h1[1][contains(.,'word')]]
Once you have the table, you can use relative axes to find the other nodes. Let's assume you want a td but there are no attributes on any tbody, tr, etc. You can still look for some invariant text. Tables usually have headers, or some fixed text which you can match. In the example above, if you find a td that is 2 fields before the one that you want, you could use:
//table[preceding::h1[1][contains(.,'word')]]//td[preceding-sibling::td[2][.='some-invariant-text']]
This is a simple example. If you apply some of these suggestions to the file you are working on, you can improve your XPath expression and make your selection code more robust.

Excel VBA: get content from online HTML table

Can anybody please show me the part of VBA code that will get the text "hello" from this example online HTML table? The first node will be found by its ID (id="something").
...
<table id="something">
<tr>
<td><TABLE><TR><TD></TD></TR><TR><TD></TD></TR></TABLE></td><td></td>
</tr>
<tr>
<td></td><td></td><td>hello</td>
</tr>
...
I think it will be something like child->sibling->child->sibling->sibling->child, but I don't know the exact way.
EDIT
Updated: the nested table's tags are in CAPITALS. So if I use getElementById("something").getElementsByTagName('tr'), does the collection get only two tr tags, or four (including the tags which are deeper children)?

If you did search for an answer, you might want to broaden your scope next time. There are plenty of questions and answers that deal with DOM stuff and VBA.
Use getElementById on HTMLElement instead of HTMLDocument
While the question (and answers) aren't exactly what you want, it will show you how to create something you can work with.
You'll need to use a mixture of getElementById() and getElementsByTagName() to retrieve your desired "hello".
e.g.: Document.getElementById("something").getElementsByTagName("tr")(3).getElementsByTagName("td")(2).innerText
Get the element "something"
Inside "something" get all "tr" tags (specifically the one at index 3; getElementsByTagName returns every descendant, so the two rows of the nested table count as indexes 1 and 2)
Inside the returned tr tag get all "td" tags (specifically the one at index 2)
Get the innerText of the previous result
These objects use a 0-based array, so the first item is item(0).
Update
document.getElementById() will return a single IHTMLElement (which will include all of its children), or nothing/null if it does not exist.
document.getElementsByTagName() will return a collection of IHTMLElement (again, each element will include all of its children), or an empty collection if none exist.
document.getElementsByTagName("tr") will return all tr elements inside the "document" element.
document.getElementsByTagName("tr")(0) will return the first (single) IHTMLElement from the collection (note the index at the end).
There is no "sibling" feature (that I could find) of the InternetExplorer object in VBA, so you'd have to do it manually using the child index.
Using the DOM functions is the clean way to do it. It's much clearer than a chain like "Element.Children(0).children(1).children(2)", where you have no idea what each index means without manually looking it up.
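The "two or four" part of the question comes down to descendants versus direct children, and that distinction can be checked offline. The following is a sketch under stated assumptions: it uses Ruby's REXML library as a stand-in for the Internet Explorer DOM, the elided parts of the question's markup are left out, and the nested table's tags are lowercased because XML, unlike HTML, is case-sensitive.

```ruby
require 'rexml/document'

# The table from the question, with the nested tags lowercased (an adjustment
# made only so that an XML parser treats them as the same element names).
html = <<~HTML
  <table id="something">
  <tr>
  <td><table><tr><td></td></tr><tr><td></td></tr></table></td><td></td>
  </tr>
  <tr>
  <td></td><td></td><td>hello</td>
  </tr>
  </table>
HTML

table = REXML::Document.new(html).root

all_rows = REXML::XPath.match(table, './/tr')   # every descendant row
own_rows = table.get_elements('tr')             # direct child rows only

puts all_rows.size
puts own_rows.size
puts own_rows[1].get_elements('td')[2].text
```

A descendant search, which is what getElementsByTagName performs, sees four rows. In a browser DOM the collection is in document order, so the row holding "hello" is the last of the four.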

I looked all over for the answer to this question, too. I finally found the solution by talking to a coworker, and it was actually to record a macro.
I know, you all think you are above this, but it is actually the best way. See the full post here: http://automatic-office.com/?p=344
In short, you want to record the macro, go to Data --> From Web, navigate to your website, and select the table you want.
I have used the "get element by id" type of solutions above in the past, and they are great for a few elements, but if you want a whole table and you aren't super experienced, just record a macro.
Don't tell your friends, and then reformat it to look like your own work so no one knows you used the macro tool ;)

How can I hide/remove/disable "forums views" in vbulletin?

Does anyone have an idea how to do this?
I need to get rid of forum views, whether by hiding, deleting, disabling, or any other way.

I assume you mean THREAD views in the text below:
Do a template search for $thread[views], and there should be a template called threadbit. If you want to quickly and easily obscure the views, just delete $thread[views] and replace it with asterisks or whatever you'd like.
If you want to remove the whole <td> it becomes more complicated. First you remove that <td>, and then in FORUMDISPLAY template you have to remove the <td> that contains $vbphrase[views] (do a search for it if you can't find it).
But I believe there may be some issue with removing that entire column, and any of the hardcoded colspan attributes among the templates. If so then you would have to reduce the colspan number by one. I'm not sure about the colspan part, it's been a long time since I edited the FORUMDISPLAY and threadbit templates.
Also, you will need to remove the Views from another location in the threadbit template:
title="<phrase 1="$thread[replycount]" 2="$thread[views]"
This shows up when you hover on top of the Last Post column. Just delete $thread[views] and it will show up blank.

I need 50 points to comment, so sorry for using an answer again.
I was thinking of going one step further and swapping the word "hidden" for a picture.
I used the word "hidden" just as a test to see if it would work, which it does.
