What XPath can I write for the following href: 00730073?

I'm new to using XPath expressions. Using the following href:
00730073
How would I write the XPath?
I'm using XPath Helper, and when I use the following XPath:
//a[contains(@href,'8000')]
it returns the wrong result. I was expecting '00730073', but I'm getting '00730075'.

Using the ID as your locator is usually your safest bet, since an id is supposed to be unique within the page.
//a[@id="8000j0000000XlzAAE"]
But you could also access it by the text (hyperlink):
//a[text()="00730073"]
or finally, you could access it by the href itself:
//a[@href="/8000j0000000XlzAAE"]
The text and href are more likely to change over time, and thus break your script, so that's another reason to use ID whenever available. That can change too, but it rarely does.
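To see why the substring test in the question can pick up the wrong link, here is a minimal sketch using Python's standard-library ElementTree. The markup is hypothetical: the first link reuses the id and href quoted above, while the second link's id and href are invented for illustration. ElementTree has no contains(), so the loose match is imitated with a plain substring test.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the '00730073' link uses the id/href quoted in the answer;
# the '00730075' link's id and href are invented for this example.
PAGE = """
<div>
  <a id="8000j0000000XmBAAE" href="/8000j0000000XmBAAE">00730075</a>
  <a id="8000j0000000XlzAAE" href="/8000j0000000XlzAAE">00730073</a>
</div>
"""
root = ET.fromstring(PAGE.strip())

# Imitates contains(@href,'8000'): a substring test matches every link
# whose href happens to contain '8000', not just the one you want.
loose = [a.text for a in root.iter("a") if "8000" in a.get("href", "")]

# Exact matches on id, href, and link text each select a single element.
# The [.='text'] predicate needs Python 3.7 or later.
by_id = root.findall(".//a[@id='8000j0000000XlzAAE']")
by_href = root.findall(".//a[@href='/8000j0000000XlzAAE']")
by_text = root.findall(".//a[.='00730073']")

print(loose)
print([a.text for a in by_id])
```

If both links share the '8000' prefix, as assumed here, the loose match returns both and the first hit may well be '00730075'.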

Please try matching on the text of the link:
//a[contains(.,'00730073')]
I hope this helps.

Related

Trying to extract data from Google results pages for a specific domain

I'm trying to extract the URL, title, and description from the SERPs for just one domain. This means I have to use the URL in some sort of "contains" function in order to get the corresponding title and description, right?
Since Google has the URL and the title within the same element, I could get these easily via XPath.
My issue right now is the description, which is outside the initial element that holds the URL. So far I have tried XPath as well as regex and couldn't find a way to make it work.
Here is some code that didn't work:
XPath:
//div[@class="jtfYYd"]/a[starts-with(@href,'https://www.example.com')]//*[@class="NJo7tc Z26q7c uUuwM"]/div
Regex:
A: ["']href="https://www.example.com["']<div class="NJo7tc Z26q7c uUuwM"["']>(.*?)
B: (?=["']https://www.example.com["'])(?=["']NJo7tc Z26q7c uUuwM["'])(.*?)
I can only use XPath 1.0 since the tool (Screaming Frog) doesn't support XPath 2.0. I hope someone has a solution.
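One possible approach, sketched under assumptions rather than tested against live Google markup: filter on the container div instead of the link, then descend to the description. In XPath 1.0 that idea might look like //div[@class="jtfYYd"][.//a[starts-with(@href,"https://www.example.com")]]//*[@class="NJo7tc Z26q7c uUuwM"]/div. The Python sketch below walks the same logic over hypothetical, simplified markup in which the description is a sibling of the link inside the same container; the function name and the sample titles are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified result markup. Real Google markup differs and changes often.
SERP = """
<body>
  <div class="jtfYYd">
    <a href="https://www.example.com/page"><h3>Example title</h3></a>
    <div class="NJo7tc Z26q7c uUuwM"><div>Example description</div></div>
  </div>
  <div class="jtfYYd">
    <a href="https://www.other.com/"><h3>Other title</h3></a>
    <div class="NJo7tc Z26q7c uUuwM"><div>Other description</div></div>
  </div>
</body>
"""

def descriptions_for(root, prefix):
    """Return descriptions from result blocks whose link starts with prefix."""
    found = []
    for block in root.findall(".//div[@class='jtfYYd']"):
        # Step 1: keep the block only if it contains a link to the wanted domain.
        if not any(a.get("href", "").startswith(prefix) for a in block.iter("a")):
            continue
        # Step 2: from the block (not from the link), descend to the description.
        for desc in block.findall(".//*[@class='NJo7tc Z26q7c uUuwM']/div"):
            found.append(desc.text)
    return found

root = ET.fromstring(SERP.strip())
print(descriptions_for(root, "https://www.example.com"))
```

The key design choice is that the domain test acts as a filter on the shared ancestor, so the description no longer has to sit inside the link.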

AEM Rich Text Source Editor Anchor Tag Stripping href formed like Sightly tag

In my AEM project, we have client-side dynamic variable functionality which checks for any strings that are formed inside of a ${ } wrapper. The dynamic variable values are coming from our cookies. Replacing this with a more friendly format that does not conflict with Sightly is not an option at the moment, so please don't tell me to do that :)
When creating an anchor tag in the source editor of the Text core component, I am setting the href as the following: href="/content/en/opt-in.html?hash=${/profile/hash}". The AntiSamy configuration is blocking the href attribute from being rendered on this element, but I have tried to add the following to the overlaid file /apps/cq/xssprotection/config.xml:
<regexp name="expressionURLWithSpecialCharacters" value="(\$\{(\w|\/|:)+\})"/>
<regexp-list>
<regexp name="onsiteURL"/>
<regexp name="offsiteURL"/>
<regexp name="expressionURL"/>
<regexp name="expressionURLWithSpecialCharacters"/>
</regexp-list>
^ inside of the <attribute name="href"> block of common-attributes. Is there something else I need to do in order to make this not be filtered out so that it can be correctly parsed by the global variable replacement? Thanks!
There are two issues here:
1. The RTE will encode your URL and turn hash=${/profile/hash} into hash=$%7B/profile/hash%7D when storing into JCR.
2. Even if you get past issue 1, the expression you are trying to use will only match EXACTLY the URL of ${/profile/hash}. You would need to expand the expression to include everything else (scheme, domain/host, path, query etc.). Think onsiteURL and offsiteURL but allowing your expression as well in query parameters. Have a look at https://github.com/apache/sling-org-apache-sling-xss/blob/master/src/main/java/org/apache/sling/xss/impl/XSSFilterImpl.java#L115 to get a starting point.
Have you tried adding disableXSSFiltering="{Boolean}true"?
Vlad, your second point was helpful in that I hadn't considered that one of the regular expressions in the XSS Protection configuration href attribute block needed to match the ${/profile/hash} in addition to the rest of the URL preceding and following it. Although to your first point, the RTE actually did save the special characters as-is into the JCR and did not encode them, probably since I was using the source editor mode and not the inline text editor.
What I ended up doing was creating a new regular expression as follows:
<regexp name="onsiteURLWithVariableExpression"
value="(?!\s*javascript(?::|&colon;))(?:(?://(?:(?:(?:(?:\p{L}\p{M}*)|[\p{N}-._~])|(?:%\p{XDigit}\p{XDigit})|(?:[!$&&apos;()*+,;=]))*#)?(?:\[(?:(?:(?:\p{XDigit}{1,4}:){6}(?:(?:\p{XDigit}{1,4}:\p{XDigit}{1,4})|(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])))|(?:::(?:\p{XDigit}{1,4}:){5}(?:(?:\p{XDigit}{1,4}:\p{XDigit}{1,4})|(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])))|(?:(?:\p{XDigit}{1,4}){0,1}::(?:\p{XDigit}{1,4}:){4}(?:(?:\p{XDigit}{1,4}:\p{XDigit}{1,4})|(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])))|(?:(?:(?:\p{XDigit}{1,4}:){0,1}\p{XDigit}{1,4})?::(?:\p{XDigit}{1,4}:){3}(?:(?:\p{XDigit}{1,4}:\p{XDigit}{1,4})|(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])))|(?:(?:(?:\p{XDigit}{1,4}:){0,2}\p{XDigit}{1,4})?::(?:\p{XDigit}{1,4}:){2}(?:(?:\p{XDigit}{1,4}:\p{XDigit}{1,4})|(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])))|(?:(
?:(?:\p{XDigit}{1,4}:){0,3}\p{XDigit}{1,4})?::(?:\p{XDigit}{1,4}:){1}(?:(?:\p{XDigit}{1,4}:\p{XDigit}{1,4})|(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])))|(?:(?:(?:\p{XDigit}{1,4}:){0,4}\p{XDigit}{1,4})?::(?:(?:\p{XDigit}{1,4}:\p{XDigit}{1,4})|(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])))|(?:(?:(?:\p{XDigit}{1,4}:){0,5}\p{XDigit}{1,4})?::(?:\p{XDigit}{1,4}))|(?:(?:(?:\p{XDigit}{1,4}:){0,6}\p{XDigit}{1,4})?::))]|(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])|(?:(?:(?:\p{L}\p{M}*)|[\p{N}-._~])*|(?:%\p{XDigit}\p{XDigit})*|(?:[!$&&apos;()*+,;=])*))(?::\p{Digit}+)?(?:/|(/(?:(?:\p{L}\p{M}*)|[\p{N}-._~]|%\p{XDigit}\p{XDigit}|[!$&&apos;()*+,;=]|:|#)+/?)*))|(?:/(?:(?:(?:\p{L}\p{M}*)|[\p{N}-._~]|%\p{XDigit}\p{XDigit}|[!$&&apos;()*+,;=]|:|#)+(?:/|(/(?:(?:\p{L}\p{M}*)|[\p{N}-._~]|%\p{XDigit}\p{XDigit}|[!$&&apos;()*+,;=]|:|#)+/?)*))?)|(?:(?:(?:\p{L}\p{M}*)|[\p{N}-._~]|%\p{XDigit}\p{XDigit}|[!$&&apos;()*+,;=]|:|#)+(?:/|(/(?:(?:\p{L}\p{M}*)|[\p{N}-._~]|%\p{XDigit}\p{XDigit}|[!$&&apos;()*+,;=]|:|#)+)*)))?(?:\?(?:(?:\p{L}\p{M}*)|(\$\{(\w|\/|:)+\})|[\p{N}-._~]|%\p{XDigit}\p{XDigit}|[!$&&apos;()*+,;=]|:|#|/|\?)*)?(?:#(?:(?:\p{L}\p{M}*)|[\p{N}-._~]|%\p{XDigit}\p{XDigit}|[!$&&apos;()*+,;=]|:|#|/|\?)*)?"/>
which is just the onsiteURL with my original expressionURLWithSpecialCharacters: (\$\{(\w|\/|:)+\}) value added as a group in the query string parameter section. This enabled AEM to accept this as an href value in my anchor tag.
I appreciate everyone's help!
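The whole-value matching described above can be illustrated with a small Python sketch. It assumes, as the answer states, that an expression must match the entire href value. VARIABLE_ONLY is the expression from the question; PATH_WITH_VARIABLE is a hypothetical, heavily simplified stand-in for "onsiteURL plus the variable group in the query string" and is far looser than the real AEM expression.

```python
import re

# The variable-only expression from the question.
VARIABLE_ONLY = re.compile(r"(\$\{(\w|\/|:)+\})")

# Hypothetical simplification: an absolute path, then an optional query string
# whose characters may include ${...} variable groups. Not the real onsiteURL.
PATH_WITH_VARIABLE = re.compile(r"/[\w./-]*(?:\?(?:[\w=&.-]|\$\{[\w/:]+\})*)?")

href = "/content/en/opt-in.html?hash=${/profile/hash}"

# fullmatch() requires the pattern to cover the entire string.
print(bool(VARIABLE_ONLY.fullmatch("${/profile/hash}")))  # the bare variable
print(bool(VARIABLE_ONLY.fullmatch(href)))                # the full href
print(bool(PATH_WITH_VARIABLE.fullmatch(href)))           # the combined pattern
```

Only the combined pattern accepts the complete href, which mirrors why the variable group had to be embedded in the URL expression rather than listed on its own.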

Replace double slash with single slash in import.io XPath selector

I am using import.io to scrape some pages. I came across a page that uses internal hrefs like this: http://domain.com//Event - notice the double slash after the domain name. From my research, this is done for SEO purposes but I need to get the url without those double slashes, so it returns http://domain.com/Event.
I am trying to use XPath (which I'm very new to) and I can get the link fine with: //a[contains(@class, 'event-info-btn')]//@href.
My next step was to try fn:replace() with this: fn:replace(//a[contains(@class, 'event-info-btn')]//@href, 'http://domain.com//', 'http://domain.com/'). This isn't working - nothing is returned.
I'm not sure if my implementation is bad, or if import.io just doesn't support this.
I'll also note the reason why I'm trying to do this: import.io is failing on all of the urls. If I manually remove the slash and try again, it works fine.
Note that import.io claims to support XPath 2.0.
Problem
You probably mean /@href rather than //@href, but that's not the real problem.
Your XPath is returning a sequence of href attributes where replace() is expecting a string.
Solution
For this HTML,
<div>
<a class="event-info-btn" href="http://domain.com//1">one</a>
<a class="event-info-btn" href="http://domain.com//2">one</a>
<a class="event-info-btn" href="http://domain.com//3">one</a>
</div>
this XPath,
for $href in //a[contains(@class, 'event-info-btn')]/@href
return replace($href, 'http://domain.com//', 'http://domain.com/')
will return
http://domain.com/1
http://domain.com/2
http://domain.com/3
as requested.
Update
This doesn't work in import.io and I'm having trouble finding a fiddle-like site to test it.
You can see this working here.
Import.io, it seems, only allows you to input one line of xpath.
You might try putting the XPath on a single line, then:
for $href in //a[contains(@class, 'event-info-btn')]/@href return replace($href, 'http://domain.com//', 'http://domain.com/')
If that doesn't work, then import.io's claim that they support XPath 2.0 is not correct.
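If the tool turns out not to support the XPath 2.0 for expression, the same per-attribute replacement can be done outside XPath. The following is a minimal Python sketch over the sample HTML above, assuming the scraped markup can be post-processed; the function name fixed_hrefs is made up, and the class test is a plain substring check because ElementTree has no contains().

```python
import xml.etree.ElementTree as ET

# The sample markup from the answer above.
HTML = """
<div>
<a class="event-info-btn" href="http://domain.com//1">one</a>
<a class="event-info-btn" href="http://domain.com//2">one</a>
<a class="event-info-btn" href="http://domain.com//3">one</a>
</div>
"""

def fixed_hrefs(root):
    """Collect each matching href, then apply the replacement to each string in turn."""
    hrefs = [
        a.get("href")
        for a in root.iter("a")
        if "event-info-btn" in a.get("class", "") and a.get("href")
    ]
    # The replacement runs once per string, never on the whole sequence at once.
    return [h.replace("http://domain.com//", "http://domain.com/") for h in hrefs]

root = ET.fromstring(HTML.strip())
for href in fixed_hrefs(root):
    print(href)
```

This mirrors the shape of the for ... return expression: iterate over the attributes first, then call replace on one string at a time.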

How to retrieve text from DIV tag using xpath extractor for jmeter

I am writing some load tests using JMeter. Now I want to retrieve DetailIds, modeOrKey, and previouskey="d9d3f801-12fa-439f-924a-3ca3d9b4182c" from the tag. I am using the XPath Extractor to extract it. Can you please guide me on how to retrieve that data from the following tag?
<div id="divBasket" class="basket" previousaction="reload" previousdata="DetailIds=5528e3e6-be52-4fe5-97be-ac2ba8f60956,426e0bfb-cd08-4420-8af4-364e352a7b79&modeOrKey=dd18682c-40bc-4f5e-9fc7-09d8ce77566f" previouskey="d9d3f801-12fa-439f-924a-3ca3d9b4182c">
Please guide.
I think the CSS/JQuery Extractor is easier.
CSS/JQuery expression: div#divBasket
Then you can get the attributes one by one. You still need to split DetailIds and modeOrKey yourself.
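The splitting step mentioned above can be sketched in Python as follows. This illustrates the logic only and is not JMeter code. It assumes the tag is closed and that the ampersand in previousdata is written as &amp;amp; so the snippet parses as XML. In JMeter itself, an XPath such as //div[@id='divBasket']/@previousdata would be one way to pull out the raw attribute before splitting it.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs

# The tag from the question, closed and with '&' escaped so it is well-formed XML.
BASKET = (
    '<div id="divBasket" class="basket" previousaction="reload" '
    'previousdata="DetailIds=5528e3e6-be52-4fe5-97be-ac2ba8f60956,'
    '426e0bfb-cd08-4420-8af4-364e352a7b79&amp;'
    'modeOrKey=dd18682c-40bc-4f5e-9fc7-09d8ce77566f" '
    'previouskey="d9d3f801-12fa-439f-924a-3ca3d9b4182c"></div>'
)

div = ET.fromstring(BASKET)

# previouskey is a plain attribute and needs no further parsing.
previous_key = div.get("previouskey")

# previousdata is shaped like a query string, so split it on '&' and '='.
fields = parse_qs(div.get("previousdata"))
detail_ids = fields["DetailIds"][0].split(",")  # comma-separated list of ids
mode_or_key = fields["modeOrKey"][0]

print(previous_key)
print(detail_ids)
print(mode_or_key)
```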

Automatically make links to URLs

Basically at the moment I have:
http://link.com - plain text
I would like:
http://link.com - as a clickable link
Is it possible to automatically add 'a href'?
If so how should I go about doing it?
Yes, you could parse your text on load with javascript. By using the appropriate regular expression, just replace each link with the anchored link version.
text.replace(**link regular expression**, '<a href="$1">$1</a>');
Note: The syntax is probably not right, but you get the idea.
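The same replace-with-a-captured-group idea, written out as a minimal Python sketch rather than JavaScript. The URL pattern is a deliberately simple assumption (http or https followed by non-space characters) and the name linkify is made up. It will, for example, swallow a trailing full stop, and it does not escape other HTML in the text.

```python
import re

# Simplistic URL pattern (an assumption): scheme, then anything up to
# whitespace, an angle bracket, or a quote.
URL_PATTERN = re.compile(r"(https?://[^\s<>\"']+)")

def linkify(text):
    """Wrap every URL found in text in an anchor tag pointing at itself."""
    return URL_PATTERN.sub(r'<a href="\1">\1</a>', text)

print(linkify("http://link.com - plain text"))
```

In the browser, the JavaScript counterpart would apply the same pattern with the global flag so that every URL in the text is replaced.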