Get certain row from html table with JMeter - html

I have a table in HTML. In one row, I have numerous data like name, email, URL, date, etc:
<td>
15
</td>
<td>2017-07-18 11:00</td>
<td>Teszt SK</td>
<td>Test Trevor<small></small></td>
<td>2017-07-18 12:00</td>
<td><span class="label label-primary">Already in</span></td>
I wish to get the last part of the URL where the name equals my ${name} variable.
The URL looks like: http://mywebsite.com/event/123
So what I wish to get is the 123 and put it into my ${eventid} variable.
The problem is that users usually have more than one event, so I want JMeter to stop at the first match. How do I do that?

The relevant XPath Expression would be something like:
substring-after(//td[text()='Test Trevor']/parent::*/td/a/@href,'events/')
References:
XSLT, XPath, and XQuery Functions
XPath substring-after function reference
Using the XPath Extractor in JMeter
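To check the idea outside JMeter, here is a minimal Python sketch of the same lookup: find the first row with a cell equal to the name, then keep what follows 'events/' in the link. The markup is a hypothetical reconstruction (the question does not show the <a> tag), and the standard-library parser has no substring-after(), so str.partition stands in for it.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the question does not show the link cell,
# so the <a href> structure here is an assumption.
page = """
<table>
  <tr>
    <td><a href="http://mywebsite.com/events/123">15</a></td>
    <td>Test Trevor<small></small></td>
  </tr>
  <tr>
    <td><a href="http://mywebsite.com/events/456">16</a></td>
    <td>Test Trevor<small></small></td>
  </tr>
</table>
"""

name = "Test Trevor"
event_id = None

root = ET.fromstring(page)
for row in root.iter("tr"):
    # Does any cell in this row hold exactly the name we are after?
    if any((td.text or "").strip() == name for td in row.iter("td")):
        link = row.find(".//a")
        if link is not None:
            # Emulates substring-after(@href, 'events/')
            event_id = link.get("href", "").partition("events/")[2]
            break  # stop at the first matching row

print(event_id)
```

The break is what gives the "first found" behaviour the question asks for.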

Add regular expression as post processor of request
regular expression: mywebsite.com/events/(\d+)" (.*)Test Trevor
Template $1$
Match No. 1
It will return the first row (checked)
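A quick way to sanity-check that pattern before putting it into JMeter is to run it in Python. This is only a sketch: the response body below is hypothetical (the real page may lay out attributes differently), and Python's regex engine is not the one JMeter uses, although this simple pattern should behave the same way in both.

```python
import re

# Hypothetical response body with two events for the same user.
html = (
    '<tr><td><a href="http://mywebsite.com/events/123" class="btn">15</a></td>'
    '<td>Test Trevor<small></small></td></tr>\n'
    '<tr><td><a href="http://mywebsite.com/events/456" class="btn">16</a></td>'
    '<td>Test Trevor<small></small></td></tr>'
)

name = "Test Trevor"
pattern = re.compile(r'mywebsite\.com/events/(\d+)" (.*)' + re.escape(name))

# search() returns the leftmost match, which plays the role of "Match No. 1".
match = pattern.search(html)
event_id = match.group(1) if match else None

print(event_id)
```

Group 1 corresponds to the $1$ template, so event_id is what would land in the JMeter variable.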

Well, after a few tries, I've come up with a rather complicated answer, but it works. Some minor tweaks are still needed. Here are some screenshots.

Related

How do I get rid of the tags in XPath

I have a bunch of html files with tons of data in it and I want to extract the important parts of it.
The files are all very similar; I've to search for a <tr> which contains a certain keyword. The third column of this table row always contains the name of the "block" I'm searching for (it's a few table rows).
//body/table/tbody/tr[td = "Deployed to"]/td[3]/div//span[text()]
With this XPath query I get the names (maybe one, maybe more).
The problem is, how do I get rid of the tags around the data?
Right now my output is something like this:
<span class="log_entry_text">Name1</span><span class="log_entry_text">Name2</span><span class="log_entry_text">Name3</span>
I want to have something like that: Name1 Name2 Name3
So I can use it for extracting these blocks more easily.
With string() I can only extract the first element (the result would be: Name1).
Thanks for helping me!
Just wrap your XPath in the data() function, like data(//body/table/tbody/tr[td = "Deployed to"]/td[3]/div//span[text()]), to retrieve the text.
Your XPath expression asks to retrieve span elements and that's what it has returned. If you're seeing tags with angle brackets in the output, that's because of the way the XPath result is being processed and rendered by the receiving application.
If you're in XPath 2.0+ or XQuery 1.0+ you can combine the several span elements into a single string using
string-join(//path/span, ' ')
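If the processor only supports XPath 1.0, the joining can be done in the host language instead. A minimal Python sketch, assuming the spans sit inside a wrapper <div> (a hypothetical addition so the fragment parses as one document):

```python
import xml.etree.ElementTree as ET

# The spans come from the question; the wrapping <div> is an assumption.
fragment = (
    '<div>'
    '<span class="log_entry_text">Name1</span>'
    '<span class="log_entry_text">Name2</span>'
    '<span class="log_entry_text">Name3</span>'
    '</div>'
)

root = ET.fromstring(fragment)

# Take the text of every span and join with a space,
# mirroring string-join(//path/span, ' ').
names = [span.text for span in root.iter("span") if span.text]
joined = " ".join(names)

print(joined)
```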

Web scraping without id VBA

I'm trying to scrape a web page. Some elements were easy to get, but I have a problem with those that have no id, like this one:
<TABLE class=DisplayMain1 cellSpacing=1 cellPadding=0><TBODY>
<TR class=TitleLabelBig1>
<TD class=Title1 colSpan=100><SPAN style="FONT-FAMILY: arial narrow; FONT-WEIGHT: normal">Tool & </SPAN><BR>PE311934-1-1 </TD></TR></TBODY></TABLE>
I want this ---►PE311934-1-1
I tried "document.getElementsByClassName", but VBA gave me an error.
Any tips?
Use Regular Expressions and the XMLHttpRequest object in VBA
I made an add-in some time ago that does just that:
http://www.analystcave.com/excel-tools/excel-scrape-html-add/
If you just want the source code then here (GetElementByRegex function):
http://www.analystcave.com/excel-scrape-html-element-id/
Now the actual regex will be quite simple:
</SPAN><BR>(.*?)</TD></TR></TBODY></TABLE>
If it captures too many items, simply make the regex more specific.
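The pattern can be tried against the snippet from the question before wiring it into VBA. Here is a minimal sketch in Python rather than VBA; the variable names are illustrative only.

```python
import re

# HTML copied from the question.
html = (
    '<TABLE class=DisplayMain1 cellSpacing=1 cellPadding=0><TBODY>\n'
    '<TR class=TitleLabelBig1>\n'
    '<TD class=Title1 colSpan=100><SPAN style="FONT-FAMILY: arial narrow; '
    'FONT-WEIGHT: normal">Tool & </SPAN><BR>PE311934-1-1 </TD></TR></TBODY></TABLE>'
)

# Lazy group: capture as little as possible between the <BR> and the closing tags.
match = re.search(r"</SPAN><BR>(.*?)</TD></TR></TBODY></TABLE>", html)

# The captured text carries a trailing space in this snippet, so strip it.
value = match.group(1).strip() if match else None

print(value)
```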
You don't specify the error and there is not enough HTML to know how many elements there are on the page.
You may have forgotten to use an index with document.getElementsByClassName("Title1"), as it returns a collection
For example, the first item would be: document.getElementsByClassName("Title1")(0)
In the same way, you could use a CSS querySelector such as .Title1
Which says the same thing i.e. select the elements with ClassName "Title1".
For the first instance simply use:
document.querySelector(".Title1")
For a NodeList of all matching elements, use
document.querySelectorAll(".Title1")
and then iterate over its length.
You would access the .innerText property of the element, generally, to retrieve the required string.
For the snippet shown, assuming the item is the first .Title1 on the page the CSS selector retrieves the following from your HTML
The resultant string can then be processed for what you want. This method, and regex, are fragile at best considering how easily an updated source page can break these methods.
In your above example, you can use the class name, .Title1, and then use Replace() to remove the Tool & .

Having trouble selecting some specific xpath... (html table, scrapy, xpath)

I'm trying to scrape data (using scrapy) from tables that can be found here:
http://www.bettingtools.co.uk/tipster-table/tipsters
My spider functions when I parse response within the following xpath:
//*[@id="imagetable"]/tbody/tr
Every table on the page shares that id, so I'm basically grabbing all the table data.
However, I only want the table data for the current month (tables in the right column).
When I try and be more specific with my xpath, I get an invalid xpath error even though it seems to be correct. I've tried:
- //*[@id="content"]/[contains(@class, "column2")]/[contains(@class, "table3")]/[@id="imagetable"]/tbody/tr
- //*[@id="content"]/div[contains(@class, "column2")]/div[contains(@class, "table3")]/[@id="imagetable"]/tbody/tr
- //*[@id="content"]/div[2]/div[1]/[@id="imagetable"]/tbody/tr
Also, when I try to select the xpath of a specific table on the page with chrome I just get //*[@id="imagetable"].
Am I missing something obvious here? Why are the 3 above xpath examples I've tried not valid?
Thanks
What makes those 3 XPath expressions invalid is the part with this pattern:
/[predicate expression here]
That step does not select a node for the predicate to apply to. It should look like this instead:
/*[predicate expression here]
Here are some examples of valid ones:
1. /table[@id="imagetable"]
2. /div[contains(@class, "column2")]
3. /*[contains(@class, "table3")]
For this specific task, you can try the following XPath, which selects rows from the table inside <div class="column2">:
//div[@class='column2']//table[@id="imagetable"]/tbody/tr
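That last expression can be tried offline with Python's standard library, which supports a small XPath subset (exact attribute matches work, contains() does not). The page structure below is a hypothetical, simplified stand-in for the real site, not a copy of it.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: two columns, each holding a table with the same id.
page = """
<div id="content">
  <div class="column1"><div class="table3">
    <table id="imagetable"><tbody>
      <tr><td>all-time A</td></tr>
      <tr><td>all-time B</td></tr>
    </tbody></table>
  </div></div>
  <div class="column2"><div class="table3">
    <table id="imagetable"><tbody>
      <tr><td>this month A</td></tr>
    </tbody></table>
  </div></div>
</div>
"""

root = ET.fromstring(page)

# Every step names a node (div, table, tbody, tr) before its predicate.
rows = root.findall(".//div[@class='column2']//table[@id='imagetable']/tbody/tr")
cells = [row.findtext("td") for row in rows]

print(cells)
```

Only the row from the column2 table comes back, even though both tables share the same id.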
Check my answer to "Selenium automation- finding best xpath". In short: let the browser generate a unique locator for you, then verify it.

Powershell modifying HTML from ConvertTo-HTML

I have a script that generates an array of objects that I want to email out in HTML format. That part works fine. I am trying to modify the HTML string to make certain rows a different font color.
Part of the html string looks like this (2 rows only):
<tr>
<td>ABL - Branch5206 Daily OD Report</td>
<td>'\\CTB052\Shared_Files\FIS-BIC Reporting\Report Output Files\ABL\Operations\Daily\ABL - Branch5206 Daily OD Report.pdf'</td>
<td>13124</td>
<td>4/23/2013 8:05:34 AM</td>
<td>29134</td>
<td>0</td>
<td>Delivered</td>
</tr>
<tr>
<td>ABL - Branch5206 Daily OD Report</td>
<td>'\\CTB052\Shared_Files\FIS-BIC Reporting\Report Output Files\ABL\Operations\Daily\ABL - Branch5206 Daily OD Report.xls'</td>
<td>15716</td>
<td>4/23/2013 8:05:34 AM</td>
<td>29134</td>
<td>0</td>
<td>Delivered</td>
</tr>
I tried regex to add a font color to the beginning and end of the rows where the row ends with "Delivered":
$email = [regex]::Replace($email, "<tr><td>(.*?)Delivered</td></tr>", '<tr><font color = green><td>$1Delivered</td></font></tr>')
This didn't work (I am not sure if you can set font color for a whole row like that).
Any ideas on how to do this easily/efficiently? I have to do it on several different statuses (like Delivered)
Disclaimer: HTML cannot be parsed by regular expression parser. A regular expression will NOT provide a general solution to this problem. If your HTML structure is well known and you don't have any other <tr></tr> elements, though, the following might work. On that note, though, is there some reason you can't modify the HTML generation to do this then instead of waiting until the HTML is already generated?
Try this command:
PS > $email = $email -replace '(?s)<tr>(.*?)<td>Delivered</td>(.*?)</tr>','<tr style="color: #FF0000">$1<td>Delivered</td>$2</tr>'
The first string is the pattern. The (?s) tells the parser to allow . to accept newlines; this is called "single line" mode. Then it grabs a <tr> element that contains the string <td>Delivered</td>. The two capture groups grab everything else in the <tr> element around the <td>Delivered</td> string. Take note of the question marks following the *s. * by itself is greedy and matches as much text as possible; *? matches as little text as possible. If we just used * here, it would treat your entire string as one match and only replace the first <tr>.
The second string is the replacement. It puts the <tr> element and its contents back in place with an added style attribute; $1 and $2 reinsert the text captured on either side of the <td>Delivered</td> cell.
One other minor note is the quoting. I tend toward single quotes anyway, but in this case, you're likely to have double quotes in the replacement string. So single quotes are probably the way to go.
As for how you could do this for different statuses, regular expressions really aren't designed for conditional content like that; it's like trying to use a screwdriver as a drill. You can hard code several replaces or loop over status/color pairs and build your pattern and replace strings from them. A full blown HTML parser would be more efficient if you can find one for .NET; you might try to get away with an XML parser if you can guarantee it's valid XML. Or, going back to my question at the beginning, you could modify the HTML generation. If your e-mails are few in number, though, this may not be a bottleneck worth addressing. Development time spent is also costly. See if it's fast enough and try a different route if not.
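For several statuses, one option is a lookup table of status/color pairs. The sketch below is in Python rather than PowerShell and is a variation on the approach above, not the same command: it matches one <tr> at a time and looks up the text of the row's last cell, so a lazy match cannot spill from one row into the next when statuses are mixed. The status names other than Delivered, the colors, and the sample rows are all hypothetical.

```python
import re

# Hypothetical status-to-color mapping.
STATUS_COLORS = {"Delivered": "green", "Failed": "red"}

ROW = re.compile(r"<tr>(.*?)</tr>", re.DOTALL)        # one table row at a time
LAST_CELL = re.compile(r"<td>([^<]*)</td>\s*$")       # last <td> in that row

def color_rows(html, status_colors):
    """Add a style attribute to rows whose last cell is a known status."""
    def restyle(match):
        body = match.group(1)
        cell = LAST_CELL.search(body)
        color = status_colors.get(cell.group(1).strip()) if cell else None
        if color is None:
            return match.group(0)  # unknown status or header row: leave as-is
        return f'<tr style="color: {color}">{body}</tr>'
    return ROW.sub(restyle, html)

sample = (
    "<tr>\n<td>Report A</td>\n<td>Failed</td>\n</tr>\n"
    "<tr>\n<td>Report B</td>\n<td>Delivered</td>\n</tr>\n"
    "<tr>\n<td>Report C</td>\n<td>Queued</td>\n</tr>"
)
result = color_rows(sample, STATUS_COLORS)
print(result)
```

The same caveat applies as before: this relies on the HTML having a known, regular shape.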
Credit where it's due: I took the HTML style attribute from @FrankieTheKneeMan.

Retrieve <TD> text using WATIR

I am using WATIR for automated testing, and I need to copy the value of a rate into a variable. In the example below (from the webpage source code), I need the variable myrate to have the value 2.595. I know how to retrieve a value from an <input> or <span> (see below), but not directly from a <td>. Any help? Thanks
<TABLE>
<TR>
<TD></TD>
<TD>Rate</TD>
<TD>2.595</TD>
</TR>
</TABLE>
For a <span> I use this code:
raRetrieved = browser.span(:name => 'myForm.raNumber').text
Try this: find the row you want using a regular expression to match a row that contains the word 'Rate', then get the text of the third cell in the row.
myrate = browser.tr(:text, /Rate/).td(:index => 2).text
#or you can use the more user-friendly aliases for those tags
myrate = browser.row(:text, /Rate/).cell(:index => 2).text
If the word 'Rate' might appear elsewhere in other text in that table, but is always the only entry in the second cell of the row you want, then find the cell with that exact text, use the parent method to locate the row that holds that cell, and then get the text from the third cell.
myrate = browser.cell(:text, 'Rate').parent.cell(:index => 2).text
Use of .cell and .row vs .td and .tr is up to you; some people prefer the tags, others like the more descriptive names. Use whatever makes the code most readable for you or others who will work with it.
Note: the code above presumes Watir-Webdriver or Watir 2.x, both of which use zero-based indexing. For older versions of Watir, change the index values to 3.
And for the record, I totally agree with the comments of others about the lack of testability of the code sample you posted. It's horrid. Asking for something to locate the proper elements by, such as ID values or names, is not out of line in terms of making the page easier to test.
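The row-then-cell lookup can also be illustrated without a browser. This is a minimal Python sketch against the table from the question, not Watir code; it only shows the logic of "find the row holding an exact 'Rate' cell, then read the third cell with a zero-based index of 2".

```python
import xml.etree.ElementTree as ET

# Table copied from the question.
table = ET.fromstring(
    "<TABLE><TR><TD></TD><TD>Rate</TD><TD>2.595</TD></TR></TABLE>"
)

myrate = None
for row in table.iter("TR"):
    texts = [(cell.text or "").strip() for cell in row.findall("TD")]
    # Exact match on a cell's text, like cell(:text, 'Rate') in the answer.
    if "Rate" in texts and len(texts) > 2:
        myrate = texts[2]  # zero-based index 2 is the third cell
        break

print(myrate)
```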
Try this:
browser.td(how, what).text
The problem here is that table, tr and td tags do not have any attributes. You can try something like this (not tested):
browser.table[0][2].text
In case this helps anyone who is having the same issue, it is working for me like this:
browser.td(:text => "Rate").parent.cell(:index, 2).text
Thank you all