Cheerio: Add Link To Text

Using Cheerio, I'm trying to search an HTML document for certain text and add a link to it, but only if it is not already linked.
For example:
<div>
<p> this is an example link </p>
</div>
I want to transform it into:
<div>
<p> this is an example link </p>
</div>
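One possible approach is sketched below with Python's standard-library html.parser instead of Cheerio, so that it runs without dependencies. It re-emits the markup unchanged, tracks whether the parser is currently inside an <a> element, and wraps the phrase in a new link only in text that is outside every anchor. The names LinkInserter and add_link, the phrase and the href are hypothetical placeholders. Comments and doctype declarations are dropped, script/style content is not special-cased, and a phrase split across tags (e.g. example <b>link</b>) is not matched.

```python
from html import escape
from html.parser import HTMLParser


class LinkInserter(HTMLParser):
    """Re-emit HTML, wrapping `phrase` in a link wherever it occurs outside an <a>."""

    def __init__(self, phrase, href):
        super().__init__(convert_charrefs=True)
        self.phrase = escape(phrase, quote=False)
        # Hypothetical link markup; the href is whatever the caller passes in.
        self.link = f'<a href="{escape(href)}">{self.phrase}</a>'
        self.anchor_depth = 0  # > 0 while we are inside an existing <a>
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.anchor_depth += 1
        self.out.append(self.get_starttag_text())  # keep the tag exactly as written

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # self-closing tags such as <br/>

    def handle_endtag(self, tag):
        if tag == "a" and self.anchor_depth > 0:
            self.anchor_depth -= 1
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Entities were decoded by the parser, so re-escape before emitting.
        text = escape(data, quote=False)
        if self.anchor_depth == 0:
            text = text.replace(self.phrase, self.link)
        self.out.append(text)


def add_link(html, phrase, href):
    """Return `html` with unlinked occurrences of `phrase` wrapped in <a href=...>."""
    parser = LinkInserter(phrase, href)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    page = '<div><p> this is an example link </p><p>an <a href="/x">example link</a></p></div>'
    print(add_link(page, "example link", "https://example.com/"))
```

Because text inside an existing anchor is left alone, running add_link a second time on its own output changes nothing.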

Related

Is there a way to use the same ID multiple times using CSS?

<p id = "formatOne">My text here</p>
<!--insert bunch of other stuff in between-->
<!--Reuse formatOne so I don't have to copy-paste the entire CSS formatting again for each use
(similar to functions in most programming languages: write once, call whenever)-->
<p id = "formatOne">Different text here</p> <!--like this, but this is obviously wrong since ID must be unique-->
Is there a way to make CSS IDs that are callable like functions?
An id should be unique per HTML element. If you want to apply the same style to different HTML elements, create a class and apply that class to each of them.
.formatOne {
color:blue;
}
<p class="formatOne">My text here</p>
<p class="formatOne">Different text here</p>
This is not possible with IDs. If you want to reuse the same styles multiple times in CSS, only a class allows you to do so.
<body>
<div class="asd">
<h1> First </h1>
<p> First line Statement </p>
</div>
<div class="asd">
<h1> Second </h1>
<p> Second line Statement </p>
</div>
<style>
.asd {color : red;
font-size: 20px}
</style>
</body>

Getting text within <a > tag inside <p> tag

Hi, I have been trying to get all the text within the div's p tags up to the hr tag, and somebody gave me this XPath:
//div[@class="entry"]/*[not(preceding-sibling::hr | self::hr)]/text()
It works fine, but it ignores the text inside the <a> tag within the p tag.
Any ideas on how to grab that text as well?
<div class="entry">
<p> some text</p>
<p> some text2</p>
<p> some text3</p>
<p> some text4
<a href='somelink'> this text here i want to get through xpath</a>
some text5
</p>
<hr> <!-- up to this hr tag -->
<p> some text5</p>
<hr>
<p> some text6</p>
</div>
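One way might be //div[@class="entry"]/*[not(preceding-sibling::hr | self::hr)]//text(), which selects all descendant text nodes (including those inside the a element) rather than only direct children, though I might prefer to simply select the elements with //div[@class="entry"]/*[not(preceding-sibling::hr | self::hr)] and use their string value.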
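To see the difference between /text() and //text() outside an XPath engine, here is a small sketch using Python's standard-library xml.etree.ElementTree. ElementTree supports only a limited XPath subset (no text() node test and no preceding-sibling:: axis), so both are emulated by hand. It assumes the snippet has been made well-formed XML by self-closing the <hr> tags; the helper names are illustrative.

```python
import xml.etree.ElementTree as ET

# The question's markup, made well-formed (<hr/>) so ElementTree can parse it.
SNIPPET = """<div class="entry">
<p> some text</p>
<p> some text2</p>
<p> some text3</p>
<p> some text4
<a href='somelink'> this text here i want to get through xpath</a>
some text5
</p>
<hr/>
<p> some text5</p>
<hr/>
<p> some text6</p>
</div>"""


def before_first_hr(div):
    """Emulate *[not(preceding-sibling::hr | self::hr)]: children before the first <hr>."""
    selected = []
    for child in div:
        if child.tag == "hr":
            break
        selected.append(child)
    return selected


def direct_text(elem):
    """Emulate elem/text(): only text nodes that are direct children of elem."""
    nodes = [elem.text] + [child.tail for child in elem]
    return [t.strip() for t in nodes if t and t.strip()]


def descendant_text(elem):
    """Emulate elem//text(): all descendant text nodes, including those inside <a>."""
    return [t.strip() for t in elem.itertext() if t.strip()]


div = ET.fromstring(SNIPPET)
for p in before_first_hr(div):
    print(direct_text(p), descendant_text(p))
```

For the fourth paragraph, direct_text misses the link text while descendant_text includes it, which is the same difference as between /text() and //text().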
You can simply pull the data paragraph by paragraph with XPath (XPath positions start at 1, so the first paragraph is p[1]):
//div[@class="entry"]/p[1]
//div[@class="entry"]/p[2]
//div[@class="entry"]/p[3]
//div[@class="entry"]/p[4]
//div[@class="entry"]/p[5]
//div[@class="entry"]/p[6]

XPath for parent's sibling descendants

I have the following HTML I need to scrape, but the only reliable handle is a stable description of a text field. From there, I need to go to its parent, find that parent's next sibling, and then get the descendants (unfortunately, the data-automation-id selector repeats in every iteration of this snippet on the site). I put together the XPath below, but my RPA tool is unable to find it in the document.
XPath
div[contains(text(),'STABLE TEXT HANDLE')]/following-sibling::div/div/div/span[data-automation-id="SOMETHING"]
HTML:
<ul>
<li>
<div>
<label>STABLE TEXT HANDLE</label>
</div>
<div>
<div>
<div>
<span></span>
<span data-automation-id="something">
<div>
<div>
<div>
DYNAMIC TEXT I WANT TO SCRAPE
</div>
</div>
</div>
</span>
<span data-automation-id="somethingelse">
<div>
<div>
<div>
DYNAMIC TEXT I WANT TO SCRAPE
</div>
</div>
</div>
</span>
</div>
</div>
</div>
</li>
</ul>
EDIT:
After further testing, it seems the issue starts with contains(text(),'STABLE TEXT HANDLE'), which fails to find that particular node (be it the label or its parent div).
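Please try this:
//label[contains(text(),'STABLE TEXT HANDLE')]/../..//span[@data-automation-id="something"]
Compared with the original expression, this one starts with // so it searches the whole document, puts the text test on the label (the div's own text nodes contain only whitespace), climbs to the li with /../.., and uses the @ prefix for the attribute test. Attribute values are case-sensitive, so "SOMETHING" would not match "something".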
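For a quick offline check of the same navigation, here is a sketch using Python's standard-library xml.etree.ElementTree rather than the RPA tool's XPath engine. ElementTree implements only a subset of XPath (no contains() and no following-sibling:: axis), so this version matches the label's exact text with [.='...'] (available since Python 3.7) and then climbs with "..". The markup is a simplified copy of the question's, and the two dynamic values are made-up placeholders.

```python
import xml.etree.ElementTree as ET

# Simplified copy of the question's markup; the dynamic values are made-up placeholders.
SNIPPET = """<ul><li>
<div><label>STABLE TEXT HANDLE</label></div>
<div><div><div>
<span></span>
<span data-automation-id="something"><div><div><div>DYNAMIC TEXT A</div></div></div></span>
<span data-automation-id="somethingelse"><div><div><div>DYNAMIC TEXT B</div></div></div></span>
</div></div></div>
</li></ul>"""

# Match the label by its exact text, climb label -> div -> li with "../..",
# then search the li's descendants for the span with the wanted attribute value.
PATH = (".//label[.='STABLE TEXT HANDLE']/../.."
        "//span[@data-automation-id='something']")

root = ET.fromstring(SNIPPET)
values = ["".join(span.itertext()).strip() for span in root.findall(PATH)]
print(values)
```

The span with data-automation-id="somethingelse" is not selected, because the attribute comparison is an exact string match.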

html hyperlink - adds automatic break

My button "TEST" is not on the same line as my text. The hyperlink does not come directly after the word "hyperlink" in my text; instead, an automatic line break is added, and I don't understand why.
<div style="margin-top: 30px;">
<div class="col d-flex flex-column">
<h3>TITLE!</h3>
<p class="mb-3">
Just some text
<br>
Some more text
</p>
<br>
<h3>Second Title</h3>
<p class="mb-3">
Some text that will contain a hyperlink
<br>
HYPERLINK
Test
some more following text
</p>
<br>
</div>
</div>
Your question should contain much more information (i.e. the relevant code), but basically the a tag that contains the "TEST" text needs to be an inline element so that subsequent text can stay on the same line (which it isn't, judging from the behaviour you describe).
Apply a class or ID to it and create a CSS rule for that class or ID which contains display: inline-block.

Use AND and NOT in xPath

I have the following issue at hand: I need to get part of the text without including a tag. To be more clear, I have the following code:
<div class="field-item even">
<p> text text text</p>
<p> text text text <a class="people-articles">text text</a> text text</p>
<p> text text text</p>
</div>
So I'm trying to get the text inside the p tag but not the a class="people-articles". Here is what I've done so far, but it's not working:
//div[@class="field-item even"] and [not a(@class='people-articles')]
Can someone tell me what I am doing wrong, and how to obtain p without a?
This should be straightforward:
//div[@class="field-item even"]/p[not(a[@class="people-articles"])]/text()
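The question can be read two ways, and the sketch below shows both using Python's standard-library xml.etree.ElementTree, which cannot evaluate not() or text() directly, so they are emulated with find() and the .text/.tail attributes. The first list mirrors the answer's expression: it skips any paragraph that contains the people-articles link. If the goal is instead to keep every paragraph but leave out the link's own text, //div[@class="field-item even"]/p/text() already does that, because text() selects only a paragraph's direct child text nodes; the second list mirrors that reading. The helper name direct_text is illustrative.

```python
import xml.etree.ElementTree as ET

SNIPPET = """<div class="field-item even">
<p> text text text</p>
<p> text text text <a class="people-articles">text text</a> text text</p>
<p> text text text</p>
</div>"""


def direct_text(elem):
    """Emulate elem/text(): direct child text nodes only (elem.text plus child tails)."""
    nodes = [elem.text] + [child.tail for child in elem]
    return [t.strip() for t in nodes if t and t.strip()]


# Wrap in a container so the path can start at the div, as in the XPath.
root = ET.fromstring(f"<body>{SNIPPET}</body>")
paras = root.findall("div[@class='field-item even']/p")

# p[not(a[@class="people-articles"])]/text(): drop paragraphs that contain the link.
without_link_paras = [t for p in paras
                      if p.find("a[@class='people-articles']") is None
                      for t in direct_text(p)]

# p/text(): keep every paragraph; the link's own text is never a direct child of p.
without_link_text = [t for p in paras for t in direct_text(p)]

print(without_link_paras)
print(without_link_text)
```

Note that the second reading still returns the text that follows the link (" text text"), because that text node is a direct child of the paragraph.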