XPath not getting content - HTML

I've tried looking through a bunch of answers already related to this, but I'm very unfamiliar with XPath and I'm a bit stuck.
I'm trying to grab some information from a website, but I keep getting "Imported content is empty" when I try to use IMPORTXML in Google Sheets.
Here's an example of the page I'm trying to read from (it's a college football simulator for running games; this call is Alabama vs Oklahoma using the 2019 teams):
http://www.ncaagamesim.com/FB_GameSimulator.asp?HomeTeam=Alabama&HomeYear=2019&AwayTeam=Oklahoma&AwayYear=2019&hs=1&hSchedule=0
I'm trying to grab the two teams' scores from the above link.
The first team's score's xpath is supposedly /html/body/div[3]/div/div/div[2]/div/div[1]/center/div[3]/div[1]/table/tbody/tr[1]/td[2]
but I keep getting an empty response.
I'm trying to use IMPORTXML in Google Sheets to get the data.
This formula returns quite a bit, but it doesn't appear to have the info I need:
=IMPORTXML("http://www.ncaagamesim.com/FB_GameSimulator.asp?HomeTeam=Alabama&HomeYear=2019&AwayTeam=Oklahoma&AwayYear=2019&hs=1&hSchedule=0", "//div[contains(@class,gs_score)]")
If I put gs_score in quotes, it doesn't return anything.
Would appreciate any help with this. Thanks!
Edit: The XPath fails at /html/body/div[3]. If I change this to div[2], it returns some of the page data, but not the part I'm looking for.
According to an article I found:
"Unfortunately, ImportXML doesn't load JavaScript, so you won't be able to use this function if the content of the document is generated by JavaScript (jQuery, etc.)"
Not sure if this is relevant...
Edit 2:
I noticed the values I need are in an HTML table, so I tried using this:
=IMPORTHTML("http://www.ncaagamesim.com/FB_GameSimulator.asp?HomeTeam=Alabama&HomeYear=2019&AwayTeam=Oklahoma&AwayYear=2019&hs=1&hSchedule=0", "table",1)
I'm still getting no content, no matter what table number I put in that formula.
If I copy the selector in the inspector, we get:
body > div.container > div > div > div.container > div > div.col-lg-9 > center > div:nth-child(3) > div.col-sm-6.col-xs-12.gs_score.gs_borderright.rightalign > table > tbody > tr:nth-child(1) > td:nth-child(2)
which seems to be the same as the XPath.

Part of the answer: 'gs_score' needs to be in quotes - it's a string literal, not an element name. As an element name it selects nothing, and every string contains the empty string, so the predicate is always true.
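With that fix applied, the formula would look something like the line below (a sketch only; note that if the scores are injected by JavaScript after the page loads, IMPORTXML still won't see them, as the quoted article warns):
=IMPORTXML("http://www.ncaagamesim.com/FB_GameSimulator.asp?HomeTeam=Alabama&HomeYear=2019&AwayTeam=Oklahoma&AwayYear=2019&hs=1&hSchedule=0", "//div[contains(@class,'gs_score')]")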

Related

Extract values from HTML when parent div contains a specific word (multi-nested divs)

I copied the HTML of a "multi-select" list from a page and then pasted the HTML version (after beautifying it online) into a Notepad++ tab.
I now want to use regex to extract the lines that are enabled (i.e. selected) in that list. In other words, I want to see which options I had selected from that dropdown. There are many lines and it is impossible to scroll through and find them all. So, the best way in my mind is to take that HTML and search for the divs that mark a row as selected; the inner divs should then have the values I am looking for.
The HTML is shown below:
<div class="ui-multiselect-option-row" data-value="1221221111">
<div class="ui-multiselect-checkbox-wrapper">
<div class="ui-multiselect-checkbox"></div>
</div>
<div class="ui-multiselect-option-row-text">(BASE) OneOneOne (4222512512)</div>
</div>
<div class="ui-multiselect-option-row ui-multiselect-option-row-selected" data-value="343333434334">
<div class="ui-multiselect-checkbox-wrapper">
<div class="ui-multiselect-checkbox"></div>
<div class="ui-multiselect-checkbox-selected">✔</div>
</div>
<div class="ui-multiselect-option-row-text">(BASE) TwoTwoTwo (5684641230)</div>
</div>
The outcome should return the following value only (based on the above):
(BASE) TwoTwoTwo (5684641230)
So far, I have tried using the following regex in Notepad++:
<div class="ui-multiselect-option-row ui-multiselect-option-row-selected"(.*?)(?=<div class="ui-multiselect-option-row")
but it is impossible to mark all the lines at the same time and remove the unmarked ones - Notepad++ only marks the first line of the entire selection. So I am wondering whether there is a better way, perhaps a more complex regex that can parse the value directly. In short, I want to either:
a) make the above work with another regex in Notepad++ (I am open to Visual Studio if that makes it faster), or
b) find an easier way to parse the selected values using the Chrome console. I would still like to see a regex solution, but for the Chrome console I have an update below.
Update 1:
I used this line $('div.ui-multiselect-option-row-selected > div:nth-child(2)')
and all I need now, as I am not that familiar with exporting from the Chrome console, is to get the innerHTML from the following lines:
Update 2:
for (var b in $('div.ui-multiselect-option-row-selected > div:nth-child(2)')){
    console.log($('div.ui-multiselect-option-row-selected > div:nth-child(2)')[b].innerHTML);
}
which works, and I now only have to export the outcome.
Open up Chrome's Console tab and execute this:
$x('//div[contains(@class, "ui-multiselect-option-row-selected")]/div[@class="ui-multiselect-option-row-text"]/text()')
Here is how it should look using your limited HTML sample but duplicated.
If you have multiple multi-selects and no unique identifier then count which one you need to target (notice the [1]):
$x('//div[contains(@class, "ui-multiselect-option-row-selected")][1]/div[@class="ui-multiselect-option-row-text"]/text()')
All you have to do is use a CSS selector followed by a .map to get all the elements' innerHTML in a list:
[...$('div.ui-multiselect-option-row-selected > div:nth-child(2)')].map(n => n.innerHTML)
The css selector is div.ui-multiselect-option-row-selected > div:nth-child(2) - which, as I've already mentioned in my comment, selects the 2nd immediate child of all divs with the ui-multiselect-option-row-selected class.
Then we just use some JavaScript to spread the result into an array and .map over it to extract all the innerHTML values, as you asked.
If the list is sufficiently big, you might consider storing the result of [...$('div.ui-multiselect-option-row-selected > div:nth-child(2)')].map(n => n.innerHTML) in a variable using
const e = [...$('div.ui-multiselect-option-row-selected > div:nth-child(2)')].map(n => n.innerHTML);
and then doing
copy(e);
This will copy the list to your clipboard, so wherever you press Ctrl+V now, you'll end up pasting the list.

Html selector using Regex

There is a page on which I want to perform some action with Puppeteer. The problem is that there is a text area I want to type into, however its id is:
id="pin-draft-title-13a10e18-5a1e-49b9-893c-c5e028dc63e1"
As you might have guessed, for some reason only pin-draft-title stays the same, while the whole number part changes on every refresh, so Puppeteer can't find it. I tried deleting the id and copying the selector itself (the whole #_Root div > div etc.), but that seems to change after some time as well. So the main question is: is there any way I can select it using just the pin-draft-title part, so that no matter what numbers follow, it still gets selected?
You can use [id^=pin-draft-title-]
In JavaScript, if you want to select all elements whose ID starts with a specified string or pattern, in your case "pin-draft-title", then consider using the following syntax:
document.querySelectorAll('[id^="pin-draft-title"]');
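In Puppeteer itself, a minimal sketch using that same attribute selector might look like this (inside an async function; the title text is just a placeholder, not taken from the original page):
// wait for any element whose id starts with "pin-draft-title", then type into it
const selector = '[id^="pin-draft-title"]';
await page.waitForSelector(selector);
await page.type(selector, 'My draft title'); // placeholder text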

Is there a way to access the first element in a column on a website using VBA?

Here is a screenshot of a column on a website page, located as shown:
As you can see, all the rows have a 'Completed' button you can press, followed by a number of lines. These rows refer to exports, so the column is not static and is constantly changing.
However, every time I run the macro I want to access the first row of the column.
Here is a sample of the HTML code of the first 'Completed' button in the screenshot above:
I have many elements with the same class name. Look at the highlighted rows as an example in the picture below:
I really have no idea how to write VBA code that always accesses the first 'Completed' button in this column.
PS: In the HTML code, in the "a" tag, the onclick="....." is constantly changing, so I cannot use it to access the desired field and click the desired button.
Please, if anyone could help me figure out how to do this, I would really be happy.
Thank you :)
If you want to click the 'Completed' button in the first row, you can use the code below:
Set doc = objIE.Document
doc.getElementsByTagName("tr")(0).getElementsByTagName("td")(0).getElementsByTagName("a")(0).Click
The code gets the first <tr>, then the first <td> inside it, then the <a> inside that, and clicks it.
<tr> tags are rows, <td> tags are cells inside those rows. You did not provide enough code to show the entire table, but generally speaking to access the first row of a table, you would need to refer to the collection object and use the index number you want.
.getElementsByTagName("tr")(0)
This will refer to the first row of a table. Same with getting the first column in the first row of your table:
.getElementsByTagName("tr")(0).getElementsByTagName("td")(0)
Once you have tracked down the particular cell, you want to click the link. You can use the same method as above.
.getElementsByTagName("tr")(0).getElementsByTagName("td")(0).getElementsByTagName("a")(0).Click
And a final note, the first row of a table could be a header, so you may actually want the 2nd row (1) instead.
Thanks for updating with more HTML code. I am going to slightly switch gears and use querySelector() to grab the main table.
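' Note: these collections are zero-based, so (3) below refers to the 4th <tr> and 4th <td>.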
doc.querySelector("#divPage > table.advancedSearch_table > tbody"). _
getElementsByTagName("tr")(3).getElementsByTagName("td")(3).Children(0).Click
See if this works for you.

xpath scraping data from the second page

I am trying to scrape data from this webpage: http://webfund6.financialexpress.net/clients/zurichcp/PortfolioPriceTable.aspx?SchemeID=33, and I specifically need data for fund number 26.
I have no problem getting data from the first page at this address (funds number 1-25), but for the life of me I can't scrape anything from the second page. Can someone help?
Thanks!
Here is the formula I use in Google Sheets:
=IMPORTXML("http://webfund6.financialexpress.net/clients/zurichcp/PortfolioPriceTable.aspx?SchemeID=33","/html/body/form[@id='MainForm']/table/tr/td/div[@id='main']/div[@id='tabResult']/div[@id='Prices']/table/thead/tr[26]/td[@class='Center'][1]")
You can do two things - one is to append &PgIndex=2 to the end of your URL, and then you can also significantly simplify your XPath to this:
//*[@id='Prices']//tr[2]/td[2]
This specifically grabs the second row on the table (tr which means table-row), in order to bypass the header row, then grabs the second field which is the table-data cell.
=IMPORTXML("http://webfund6.financialexpress.net/clients/zurichcp/PortfolioPriceTable.aspx?SchemeID=33&PgIndex=2","//*[#id='Prices']//tr[2]/td[2]")
To get the second page, add &PgIndex=2 to your url. Then adjust the /table/thead/tr[26] to /table/thead/tr[2]. The result is:
=IMPORTXML("http://webfund6.financialexpress.net/clients/zurichcp/PortfolioPriceTable.aspx?SchemeID=33&PgIndex=2","/html/body/form[#id='MainForm']/table/tr/td/div[#id='main']/div[#id='tabResult']/div[#id='Prices']/table/thead/tr[2]/td[#class='Center'][1]")

How to find the index of HTML child tag in Selenium WebDriver?

I am trying to find a way to return the index of an HTML child tag based on its XPath.
For instance, on the right rail of a page, I have three elements:
//*[#id="ctl00_ctl50_g_3B684B74_3A19_4750_AA2A_FB3D56462880"]/div[1]/h4
//*[#id="ctl00_ctl50_g_3B684B74_3A19_4750_AA2A_FB3D56462880"]/div[2]/h4
//*[#id="ctl00_ctl50_g_3B684B74_3A19_4750_AA2A_FB3D56462880"]/div[3]/h4
Assume that I've found the first element, and I want to return the number inside the tag div, which is 1. How can I do it?
I referred to this previous post (How to count HTML child tag in Selenium WebDriver using Java) but still cannot figure it out.
You can get the number using regex:
var regExp = /div\[([^)]+)\]/;
var matches = regExp.exec("//*[@id=\"ctl00_ctl50_g_3B684B74_3A19_4750_AA2A_FB3D56462880\"]/div[2]/h4");
console.log(matches[1]); // returns 2
You can select preceding siblings in XPath to get all the reports before your current one, like this:
//h4[contains(text(),'hello1')]/preceding-sibling::h4
Now you only have to count how many you found, add one for the current element, and you have your index.
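For illustration, the count can be taken straight in Chrome's console with the $x helper; the same idea in Selenium would be driver.findElements(...).size() + 1:
// 1-based index of the 'hello1' report = number of preceding h4 siblings + 1
$x("//h4[contains(text(),'hello1')]/preceding-sibling::h4").length + 1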
Another option would be to select all the reports at once and loop over them checking for their content. They always come in the same order they are in the dom.
For Java, it could look like this:
List<WebElement> reports = driver.findElements(By.xpath("//*[@id='ctl00_ctl50_g_3B684B74_3A19_4750_AA2A_FB3D56462880']/div/h4"));
for (WebElement element : reports) {
    if (element.getText().contains("report1")) {
        return reports.indexOf(element) + 1;
    }
}
Otherwise you will have to parse the XPath yourself to extract the value (see LG3527118's answer for this).