Find contact us pages using Google IMPORTXML XPath - HTML

I am trying to pull links to contact pages from a list of URLs in column B. I have tried the following but I get an error:
=IMPORTXML(B10,"//a[contains('contact')]/@href")
I want to get the href value for every a element whose anchor text contains the word "contact".
Any help would be appreciated.

Without seeing the URL you import, I can only comment on the XPath expression. Your expression is not valid: contains() always takes two arguments. Use
=IMPORTXML(B10,"//a[contains(.,'contact')]/@href")
If that does not give the expected result, you have to tell us the URL of the document you are importing.
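If the anchor text might be capitalized ("Contact", "CONTACT US"), XPath 1.0's translate() function can lower-case it before the comparison. A sketch, not tested against any particular page:
=IMPORTXML(B10,"//a[contains(translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), 'contact')]/@href")
translate() maps each listed upper-case letter to its lower-case counterpart, so the contains() check becomes case-insensitive.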

Related

Trying to extract data from Google results pages for a specific domain

So I'm trying to extract the URL, title and description from the SERPs for just one domain. This means I have to use the URL in some sort of "contains" function in order to get the corresponding title and description, right?
Since Google has the URL and the title within the same element, I could get these easily via XPath.
My issue right now is the description, which sits outside the initial element where the URL is. So far I have tried XPath as well as regex and couldn't find a way to make it work.
Here is some code that didn't work:
XPath:
//div[@class="jtfYYd"]/a[starts-with(@href,'https://www.example.com')]//*[@class="NJo7tc Z26q7c uUuwM"]/div
Regex:
A: ["']href="https://www.example.com["']<div class="NJo7tc Z26q7c uUuwM"["']>(.*?)
B: (?=["']https://www.example.com["'])(?=["']NJo7tc Z26q7c uUuwM["'])(.*?)
I can only use XPath 1.0 since the tool (Screaming Frog) doesn't support XPath 2.0. I hope someone has a solution.
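One XPath 1.0 pattern that may help: instead of drilling into the <a> (the description is not inside it), select the anchor for the domain first, then step back up to the surrounding result block with the ancestor:: axis and down again to the snippet. A sketch that reuses the class names from the attempt above; Google's SERP markup changes often, so the classes and the exact container relationship need to be verified against the live HTML:
//a[starts-with(@href,'https://www.example.com')]/ancestor::div[@class="jtfYYd"]/following::*[@class="NJo7tc Z26q7c uUuwM"][1]/div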

SPFX Content Query [Handlebars] Unable to get a valid URL from list Hyperlink column

I am using the Content Query web part for SharePoint Online and am trying to query a hyperlink column from a list.
I am querying my "URL" column for the Bing address.
<div class="url">{{URL.htmlValue}}</div>
But this returns
https://sharepoint-tenant/sites/%3Ca%20href=%22http://bing.com%22%3EBING%3C/a%3E
How can I just get the direct URL?
Not sure I understand your problem correctly, but if the value you get is already encoded, you can use triple braces to avoid double-encoding with Handlebars:
<div class="url">{{{URL.htmlValue}}}</div>
How do you get the value for the URL? As far as I know, the value of a hyperlink column should have a Description property and a Url property. You could just use the Url property to get the direct URL.
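For example, assuming the field really does expose that Url property in the template context (an assumption based on the description above, not verified against the web part):
<div class="url">{{URL.Url}}</div>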

Google Apps Script: how can I get the hyperlink name?

I'm having trouble extracting a URL from text in a Google document. I found the method getLinkUrl(offset) and it works fine, but if I add a link not directly to the text but via the menu item Insert -> Link, I can give the link a display name: for example, the text shown in the document might be StackOverflow while the link is https://stackoverflow.com. When I use getLinkUrl() I get the hyperlink but not the name. I need the position of this link in the text, i.e. the word StackOverflow, but I didn't find a corresponding method for this.
Does anyone know how I can get this word, or its start and end indexes?
I know about the text attribute indices, which contain the offsets where special formatting of the text starts, and they could be used here, but if the link is the last word in the paragraph there is no further attribute index in the array, so I cannot determine the length of the word.
I would appreciate any advice.
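A possible approach in Apps Script is to walk a Text element's getTextAttributeIndices() and, for each offset that carries a link, take the next attribute index as the end of the run, falling back to the end of the paragraph when the link is the last run. A minimal sketch under those assumptions (it only looks at the first paragraph of the active document):

function findLinkRanges() {
  var body = DocumentApp.getActiveDocument().getBody();
  var text = body.getParagraphs()[0].editAsText(); // assumption: the link is in the first paragraph
  var indices = text.getTextAttributeIndices();    // offsets where formatting changes
  var content = text.getText();
  for (var i = 0; i < indices.length; i++) {
    var start = indices[i];
    var url = text.getLinkUrl(start);
    if (url) {
      // The link runs until the next attribute change, or to the end of the
      // paragraph when the link is the last run (the case described above).
      var end = (i + 1 < indices.length) ? indices[i + 1] - 1 : content.length - 1;
      Logger.log(content.substring(start, end + 1) + ' -> ' + url + ' [' + start + '-' + end + ']');
    }
  }
}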

Can't get XPath code right in Google Spreadsheet

I'd start by saying that I have very little knowledge (sadly) of HTML since I do not work in IT. Anyway, I am trying to get a Google spreadsheet to display data from this website:
http://www.oddsportal.com/soccer/england/premier-league/results/#/page/2/
I would like to display data from the central table, where results and odds are displayed. Using Inspect, the XPath for this table is:
//*[@id="tournamentTable"]
But typing it into the IMPORTXML function gives me an error:
=importxml("http://www.oddsportal.com/soccer/england/premier-league/results/#/page/2/"; "//*[@id="tournamentTable"]")
After reading here and there on this site, I tried editing the formula and substituting " with ' inside the query, but it gives a blank response.
Could anybody help me? Am I referencing the wrong XPath, or did I type something wrong in the formula?
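For reference, a syntactically valid form of the formula swaps the inner double quotes for single quotes (the argument separator, comma or semicolon, depends on the spreadsheet locale). A sketch, not tested against the live page:
=IMPORTXML("http://www.oddsportal.com/soccer/england/premier-league/results/#/page/2/"; "//*[@id='tournamentTable']")
If the corrected formula still returns an empty result, the table is most likely built by JavaScript after the page loads; IMPORTXML only sees the static HTML the server returns, so it cannot reach content rendered client-side.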

Extracting a value from a <ul> with specific text using HtmlAgilityPack

I'm trying to extract a link from http://www.raws.dri.edu/cgi-bin/rawLIST.pl?idIAN1+id
This site contains an unordered list and I want to get the link for Daily Summary.
So far I've tried an XPath string of "//ul/li/a" with the .SelectNodes() method. Doing so returns only the first item in the list, which is what I want for now, but in the future I may want to get the link to a different page, so I need to be able to specify which link to retrieve.
If you use //ul/li/a, you should get all the <a> links, not one.
If you want to extract the links that contain some text (e.g. Time Series Graph), you can do:
//ul/li/a[contains(text(), 'Time Series Graph')]
Similar, if you're looking for some specific text in the href attribute:
//ul/li/a[contains(@href, 'Time Series Graph')]
By the way, I see you have asked many questions pointing to the same website, etc. My suggestion is: Learn a little bit of XPath, the basics, and read a tutorial about how HtmlAgilityPack works (pretty simple once you understand the basics of XPath), and then start working on that scraper.
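For completeness, this is roughly how the selection could look in C# with HtmlAgilityPack; a minimal sketch, assuming the "Daily Summary" anchor text from the question and the default HtmlWeb loader:

using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Load the page and select every <a> in a list whose text mentions "Daily Summary".
        var doc = new HtmlWeb().Load("http://www.raws.dri.edu/cgi-bin/rawLIST.pl?idIAN1+id");
        var links = doc.DocumentNode.SelectNodes("//ul/li/a[contains(text(), 'Daily Summary')]");
        if (links != null) // SelectNodes returns null when nothing matches
        {
            foreach (var a in links)
                Console.WriteLine(a.GetAttributeValue("href", string.Empty));
        }
    }
}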