How to retrieve specific data from html using XPath?

Hey guys, I am having a hard time getting a stock price from a site using XPath.
The HTML is this:
<span class=" price">
<meta content="14.400" itemprop="price">
14.400
<span itemprop="priceCurrency"> BRL</span>
</span>
The paths I tried for retrieving the 14.400 value (all of them returned null) were:
@"//span[@class=' price']";
@"/span[@class=' price']";
@"span[@class=' price']";
@"//meta[@itemprop='price']";
@"/html/body/div[2]/div/div/div/div[2]/span/meta";
@"//html/body/div[2]/div/div/div/div[2]/span/meta";
After many more attempts, the closest I got to what I need was this XPath:
@"//span[@class=' price']/meta";
which produces this log:
2014-02-07 13:50:39.616 manejoderisco[2838:60b] {
nodeAttributeArray = (
{
attributeName = itemprop;
nodeContent = price;
},
{
attributeName = content;
nodeContent = "14.280";
}
);
nodeName = meta;
}
But I still get a null value...

I finally managed to build the correct XPath, which is this one:
@"//span/meta/@content";

The HTML you are trying to parse isn't well-formed XML, since there is no closing tag for meta.
However, if you are indeed able to match the meta tag, you probably want to select its content attribute:
//span[@class=' price']/meta/@content
Or, if you need the first text node,
//span[@class=' price']//text()[1]
might work as well.
Don't forget that //span/meta selects the meta node itself, i.e. <meta content="14.400" itemprop="price">14.400 (where it ends depends on what is evaluating your XPath, since the markup is malformed). If you want the value, you need to select either the @content attribute or the text node with text().
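To see the difference between selecting the meta element and selecting its attribute or text, here is a small sketch in Java using the JDK's built-in javax.xml.xpath API rather than the Objective-C parser from the question. It assumes the snippet has been made well-formed by self-closing the meta tag (an XML parser rejects the original), and the PriceXPathDemo class and eval helper are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; the markup is the question's snippet with <meta/> self-closed
// so that an XML parser accepts it.
class PriceXPathDemo {
    static final String SNIPPET =
        "<html><body><span class=\" price\">"
        + "<meta content=\"14.400\" itemprop=\"price\"/>"
        + "14.400"
        + "<span itemprop=\"priceCurrency\"> BRL</span>"
        + "</span></body></html>";

    // Evaluates an XPath expression and returns the string-value of the first match
    // (an empty string if nothing matches).
    static String eval(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return (String) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.STRING);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // The meta element has no text content, so its string-value is empty: prints []
        System.out.println("[" + eval(SNIPPET, "//span[@class=' price']/meta") + "]");
        // Selecting the attribute returns the price: prints 14.400
        System.out.println(eval(SNIPPET, "//span[@class=' price']/meta/@content"));
        // The first text node under the span is the visible price: prints 14.400
        System.out.println(eval(SNIPPET, "//span[@class=' price']//text()[1]"));
    }
}
```

The empty first line is the same symptom as the null in the question: reading the meta node's text yields nothing, while @content carries the value.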

Related

SwiftSoup - Extracting specific div tags/elements

I'm not very experienced with scraping data from websites, so apologies in advance. I have loaded the HTML file locally into my project so that I have a reference and breakdown of the elements:
<div class="price">99</div>
<div class="size">M</div>
I want to select both of these div classes, price and size, and extract their values, 99 and M respectively. How can I do this? I looked at SwiftSoup's example:
let elements = try doc.select("[name=transaction_id]") // query
let transaction_id = try elements.get(0) // select first element
let value = try transaction_id.val() // get value
But that gave me an error. I can see that you can select <p> tags (paragraphs), but how do I select a div by its class?
Once again, apologies if this is a beginner question.
Thank you.
Edit - The data I wish to parse:
var pstats = {att1:85,att2:92,att3:91,att4:95,att5:38,att6:65,acceleration:91,agility:91,balance:95,jumping:68,reactions:94,sprintspeed:80,stamina:72,strength:69,aggression:44,positioning:93,tactaware:40,vision:95,ballcontrol:96,crossing:85,curve:93,dribbling:96,finishing:95,fkacc:94,headingacc:70,longpass:91,longshot:94,marking:32,penalties:75,shortpass:91,shotpower:86,slidetackle:24,standingtackle:35,volleys:88,composure:96};
Edit 2 - New data I want to parse:
<div style="display: none;" id="player_stats_json">{"test":0,"ppace":85,"pshooting":92,"ppassing":91,"pdribbling":95,"pdefending":38,"pphysical":65,"acceleration":91,"sprintspeed":80,"agility":91,"balance":95,"reactions":94,"ballcontrol":96,"dribbling":96,"positioning":93,"finishing":95,"shotpower":86,"longshotsaccuracy":94,"volleys":88,"penalties":75,"interceptions":40,"headingaccuracy":70,"marking":32,"standingtackle":35,"slidingtackle":24,"vision":95,"crossing":85,"freekickaccuracy":94,"shortpassing":91,"longpassing":91,"curve":93,"jumping":68,"stamina":72,"strength":69,"aggression":44,"composure":96}</div>

If these tags have unique classes, you can use the getElementsByClass(_:) function and take the first match, like this:
let price = try doc.getElementsByClass("price").first()?.text()
let size = try doc.getElementsByClass("size").first()?.text()
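For comparison, the same class-based lookup can be written as an XPath query, the approach from the first question. This is a Java sketch using the JDK's javax.xml.xpath rather than SwiftSoup; it assumes the markup is well-formed XML (a wrapper element is added so the two divs share a single root), and DivByClassDemo and textOfFirstDivWithClass are hypothetical names. The contains(concat(...)) idiom matches a class even when the attribute lists several classes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DivByClassDemo {
    // The two divs from the question, wrapped in a root element so the XML is well-formed.
    static final String HTML =
        "<body><div class=\"price\">99</div><div class=\"size\">M</div></body>";

    // Returns the trimmed text of the first div whose class list contains className,
    // or an empty string if none matches.
    static String textOfFirstDivWithClass(String xml, String className) {
        // Pad the class attribute with spaces so "price" matches class="price sale"
        // but not class="oldprice". Assumes className contains no quote characters.
        String expression = "normalize-space((//div[contains(concat(' ', normalize-space(@class), ' '), ' "
            + className + " ')])[1])";
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return (String) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.STRING);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(textOfFirstDivWithClass(HTML, "price")); // 99
        System.out.println(textOfFirstDivWithClass(HTML, "size"));  // M
    }
}
```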

React: How to provide procedurally generated <li> elements distinct HTML id values?

I'm rendering a list of items retrieved from a database and filtered by the value of an input field (held in state). On click, I want to set the input field's state to the value stored in the clicked list item. I figured that document.getElementById().innerHTML would let me retrieve the content of the appropriate tag and set it to state, which does work; the issue is that it only ever retrieves the innerHTML of the first item rendered in the map.
I've tried solutions ranging from applying UUIDs to making the mapped content available on window and transferring the state of the individual objects, but each of them only moves the value of the first item to state. Any ideas?
Rendered Content:
window.filteredItems = this.state.items.filter(
(item) => {
return item.companyNameObj.toLowerCase().indexOf(this.state.search.toLowerCase()) !== -1;
}
);
<div className="fixed-width">
<div className="search-container">
<form>
<input type="text" name="search" className="search-bar" placeholder="Search: " onChange={this.handleChange} value={this.state.search} />
</form>
<ul className="search-results">
{window.filteredItems.map((item) => {
return (
<div className="distinct-result-container">
<li key={item.id}>
<div className="image-container">
<img src={item.imageObj} alt={item.companyNameObj + " logo."}/>
</div>
<div className="company-container">
<span onClick={this.stateTransfer}><h3 id={"ID"}>{item.companyNameObj}</h3></span>
<p>Owned by: {item.ownerNameObj}</p>
</div>
</li>
</div>
)
})}
</ul>
</div>
<Footer />
</div>
);
stateTransfer()
stateTransfer(id) {
var search = this.state.search;
var uniqueID = document.getElementById("ID").innerHTML;
this.setState({
search: uniqueID
});
}
The current body of stateTransfer() isn't a serious attempt at a solution; it's just the minimum needed to move the innerHTML content into the input field's value.
EDIT: I've clarified the task at hand and a potential solution in the comments that follow; I'm just hoping someone can help me with the actual implementation.
@DILEEPTHOMAS The list is made up of data pulled from a Firebase Realtime Database and is rendered by mapping the filtered list against a search query; that functionality works fine. What I need is to be able to click any individual li and have its innerHTML (the text stored in that li's item.companyNameObj) moved into the input field's value, so users can navigate the search results without re-typing.
@JoshuaLink I can't really configure the list items any further, as they're just data pulled from an external database. I believe the appropriate solution is to somehow give each newly rendered li a unique HTML id and pass the selected id to stateTransfer(), where it can be set as the input field's value; I'm just struggling with the actual implementation.
EDIT 2: I've managed to figure out a solution to both parts of the problem as described above - I'll post it as an answer below.

I managed to solve both parts of my problem:
The key issue, moving the text stored in each li into the input's value, turned out to be easy to solve: make stateTransfer() accept the click event and read the h3's text from event.target.innerText (I had assumed I would have to use .innerHTML, which would require giving each li a unique generated ID):
stateTransfer(e) {
  // e.target is the element that was clicked (the h3 inside the span);
  // stateTransfer must be bound to the component, e.g. in the constructor
  const innerText = e.target.innerText;
  this.setState({
    search: innerText
  });
}
The secondary issue (which I wrongly assumed was essential to solving my question), assigning unique HTML ids to my procedurally generated li elements, I solved with a for loop in componentDidUpdate() that iterates over the current list and assigns each element an id with the loop index appended:
componentDidUpdate() {
  // assumes each rendered company name carries className="companyNames"
  const searchCompanyNames = document.querySelectorAll('.companyNames');
  for (let i = 0; i < searchCompanyNames.length; i++) {
    searchCompanyNames[i].id = 'companyName-' + i;
  }
}
Although I didn't need unique IDs on the li elements for the final solution, it's a useful trick worth noting nonetheless.

How to insert HTML inside template literal strings?

In React, I want to style individual words within a string that is defined in a variable using a template literal.
For that I am using a <span> to style just that word.
I am getting an HTMLIntrinsic usage error.
Note: the solutions given on SO to related questions do not solve my issue. Please check the code.
How can I get around this problem?
I tried using dangerouslySetInnerHTML, but it is not a recommended solution.
//Actual code
const temperature = "22";
const list = {
item: `The temperature is ${temperature}`
}
//To style it-
const temperature = "22";
const list = {
item: `The temperature is <span style={{color:'red'}}>${temperature}</span>`
}
//And the above list.item is inserted inside JSX like -
return (
<div>{list.item}</div>
)
The temperature (22) needs to be styled.

Instead of the template string, you can use JSX elements for generating HTML as usual, placed next to your text elements. Example:
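item: (
  <>
    The temperature is{' '}
    <span style={{color:'red'}}>
      {temperature}
    </span>
  </>
)
I'm using a Fragment to wrap the text and elements together, but you can use something else like a div if you wish to style the wrapper too. The {' '} is needed because JSX drops whitespace at line breaks; without it the output would read "The temperature is22".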

You can't use React elements inside a template literal like that, because a React element is an object; interpolating it into a template literal yields the string [object Object]. So I recommend another approach, for example the solution by @richardo.

You can make this as simple as the following, if you are OK with not keeping the object you defined:
return(
<div>The temperature is <span style={{color: 'red'}}>{temperature}</span></div>
)

How to use the text between HTML tags to access an element - Selenium WebDriver

I have the following HTML code.
<span class="ng-binding" ng-bind="::result.display">All Sector ETFs</span>
<span class="ng-binding" ng-bind="::result.display">China Macro Assets</span>
<span class="ng-binding" ng-bind="::result.display">Consumer Discretionary (XLY)</span>
<span class="ng-binding" ng-bind="::result.display">Consumer Staples (XLP)</span>
As you can see, the tags are identical on every line except for the text between them.
How can I access each of the lines above separately, based on the text between the tags?

Use the following XPath:
//span[text()='All Sector ETFs']

You can use the XPath function text() for that.
For example,
//span[text()="All Sector ETFs"]
finds the first span.
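Because text()='...' is an exact comparison, each of the four labels selects exactly one span. A minimal Java sketch using the JDK's javax.xml.xpath to check this outside the browser; it assumes the spans are wrapped in a single root element so the markup parses as XML, and SpanByTextDemo and select are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SpanByTextDemo {
    // The four spans from the question, wrapped in a root element (assumed well-formed).
    static final String HTML = "<div>"
        + "<span class=\"ng-binding\" ng-bind=\"::result.display\">All Sector ETFs</span>"
        + "<span class=\"ng-binding\" ng-bind=\"::result.display\">China Macro Assets</span>"
        + "<span class=\"ng-binding\" ng-bind=\"::result.display\">Consumer Discretionary (XLY)</span>"
        + "<span class=\"ng-binding\" ng-bind=\"::result.display\">Consumer Staples (XLP)</span>"
        + "</div>";

    // Returns every node matched by the expression.
    static NodeList select(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String[] labels = {"All Sector ETFs", "China Macro Assets",
            "Consumer Discretionary (XLY)", "Consumer Staples (XLP)"};
        for (String label : labels) {
            // The labels contain no single quotes, so plain concatenation is safe here.
            NodeList hits = select(HTML, "//span[text()='" + label + "']");
            System.out.println(label + " -> " + hits.getLength() + " match(es)");
        }
        // A partial label matches nothing with text()=: prints 0
        System.out.println(select(HTML, "//span[text()='Consumer']").getLength());
    }
}
```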

You can use the following XPath to find the desired element based on its text:
String text = "Your text";
// text may be: All Sector ETFs, China Macro Assets, Consumer Discretionary (XLY), Consumer Staples (XLP)
String xPath = "//*[contains(text(),'" + text + "')]";
This way you can find each element.
Hope it helps. :)
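Building the expression by string concatenation breaks as soon as the text contains a single quote (for example an apostrophe). When you evaluate XPath yourself, you can pass the text as an XPath variable instead. A sketch with the JDK's javax.xml.xpath, where the $needle variable name, the ContainsTextDemo class and the extra "Investor's Picks" span are hypothetical; Selenium's By.xpath only accepts a plain expression string, so this applies to standalone XPath evaluation rather than to WebDriver locators.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ContainsTextDemo {
    static final String HTML = "<div>"
        + "<span class=\"ng-binding\">All Sector ETFs</span>"
        + "<span class=\"ng-binding\">China Macro Assets</span>"
        + "<span class=\"ng-binding\">Consumer Discretionary (XLY)</span>"
        + "<span class=\"ng-binding\">Consumer Staples (XLP)</span>"
        + "<span class=\"ng-binding\">Investor's Picks</span>" // hypothetical entry with an apostrophe
        + "</div>";

    // Counts elements whose first text node contains `needle`. The value is bound to the
    // XPath variable $needle, so quotes inside it need no escaping.
    static int countContaining(String xml, String needle) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Resolve $needle to the Java string; any other variable name is unknown.
            xpath.setXPathVariableResolver(name ->
                "needle".equals(name.getLocalPart()) ? needle : null);
            NodeList hits = (NodeList) xpath.evaluate(
                "//*[contains(text(), $needle)]", doc, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed for: " + needle, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countContaining(HTML, "Consumer"));   // 2
        System.out.println(countContaining(HTML, "Investor's")); // 1
    }
}
```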

Hi, please do it like this:
Way One
import java.util.List;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;

public class SpanTags {
    public static void main(String[] args) {
        WebDriver driver = new FirefoxDriver();
        // load the page first with driver.get(...); the URL isn't given in the question
        // every span from the question shares this ng-bind attribute
        List<WebElement> mySpanTags = driver.findElements(By.xpath("//span[@ng-bind='::result.display']"));
        System.out.println("Count the number of total tags : " + mySpanTags.size());
        // print the value of the tags one by one
        // or do whatever you want to do with a specific tag
        for (int i = 0; i < mySpanTags.size(); i++) {
            System.out.println("Value in the tag is : " + mySpanTags.get(i).getText());
            // either perform the next operation inside this for loop
            if (mySpanTags.get(i).getText().equals("Consumer Staples (XLP)")) {
                mySpanTags.get(i).click(); // clicks on the span tag
            }
        }
        // or perform the next operations on a span tag here, outside the for loop;
        // in this case use the index of a specific tag (e.g. below)
        mySpanTags.get(3).click(); // clicks on the 4th span tag
        driver.quit();
    }
}
Way Two
Find the tag directly: //span[text()='Consumer Staples (XLP)']

How to exclude a nested span class in XPath

From the following HTML, I want to select only the text of the outer span:
<span class="item_amount order_minibasket_amount order_full_minibasket">10
<span class="article">Article
<i class="icon"> >
</i>
</span>
</span>
This is my current XPath:
//span[contains(@class,'order_minibasket_amount')]
When I use it in my Selenium test, I get the whole span text, like:
10 Article >
I just want to get the article amount, "10".
AMOUNT(new PageElement(By.xpath("//span[contains(@class,'order_minibasket_amount')]/text()[1]"), "not such Element...."))
public String getAmount() {
return amount = PageObjectUtil.findAndInitElementInside(webElement, PageElements.AMOUNT.pe, amount, String.class);
}
Many thanks in advance,
Cheers,
koko

What you want cannot be done directly: Selenium locators must resolve to elements, and a text() node is not an element, so you have to resort to string manipulation. Something like:
String completeString = driver.findElement(By.className("item_amount")).getText();
String endString = driver.findElement(By.className("article")).getText();
String beginString = completeString.replace(endString, "").trim();

You could add /text() to the end:
//span[contains(@class,'order_minibasket_amount')]/text()
Instead of selecting the span element node, this XPath selects the set of text nodes that are direct children of the span element. This should be a set of two nodes: one containing "10", a newline and four spaces (the text between the opening tag of the target span and the opening tag of the nested span), and the other containing just a newline (between the closing tags of the two spans). If you only want the first text node child (10, newline, spaces), use
//span[contains(@class,'order_minibasket_amount')]/text()[1]
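Selenium's By.xpath cannot hand back text nodes, but the expressions themselves work in a plain XPath engine. The sketch below evaluates them in Java with the JDK's javax.xml.xpath on the question's markup; the indentation (and therefore the whitespace text nodes) is reconstructed as four-space steps, which is an assumption, and DirectTextDemo is a hypothetical name.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DirectTextDemo {
    // The question's markup with assumed indentation.
    static final String HTML =
        "<span class=\"item_amount order_minibasket_amount order_full_minibasket\">10\n"
        + "    <span class=\"article\">Article\n"
        + "        <i class=\"icon\"> &gt;\n"
        + "        </i>\n"
        + "    </span>\n"
        + "</span>";

    static final String TARGET = "//span[contains(@class,'order_minibasket_amount')]";

    // Evaluates an expression against HTML with the requested XPathConstants return type.
    static Object eval(String expression, QName returnType) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(HTML)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc, returnType);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Whole span, whitespace collapsed: prints 10 Article >
        System.out.println(eval("normalize-space(" + TARGET + ")", XPathConstants.STRING));
        // Direct text children only ("10\n    " and "\n"): prints 2
        NodeList direct = (NodeList) eval(TARGET + "/text()", XPathConstants.NODESET);
        System.out.println(direct.getLength());
        // First direct text node, trimmed: prints 10
        System.out.println(eval("normalize-space(" + TARGET + "/text()[1])", XPathConstants.STRING));
    }
}
```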

For now I am using a workaround, but I am not happy with it. :-(
public String getAmount() {
String tempAmount = PageObjectUtil.waitFindAndInitElement(PageElements.AMOUNT.pe).getText();
String output = tempAmount.replaceAll("[a-zA-Z->]", "");
return amount = output.trim();
}
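If the whole text has to come from getText(), a slightly more targeted variant of this workaround is to capture the leading number instead of deleting letters and ">" characters, which would let through any digits appearing later in the text. A minimal Java sketch, assuming the amount is always a non-negative integer at the start of the text; LeadingAmount and leadingAmount are hypothetical names.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LeadingAmount {
    // A run of digits at the very start of the text, after optional whitespace.
    private static final Pattern LEADING_NUMBER = Pattern.compile("^\\s*(\\d+)");

    // Returns the number at the start of the element text, e.g. "10" from "10 Article >",
    // or an empty Optional if the text does not start with a number.
    static Optional<String> leadingAmount(String elementText) {
        Matcher m = LEADING_NUMBER.matcher(elementText);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(leadingAmount("10 Article >").orElse("none")); // 10
        System.out.println(leadingAmount("Article >").orElse("none"));    // none
    }
}
```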
cheers,
KoKo
