I think the xpath is not standard but I do not know how to fetch data from this dam site
IMPORTXML("http://www.tsetmc.com/loader.aspx?ParTree=151311&i=36773155987365094#","/html/body/div[4]/form/div[3]/div[9]/span/div2/div2
")
As on this page you need to click on a tab to open what you want, it would be impossible to go through it.
So I had to open Google's Dev Tools, under Network and selected Fetch/XHR:
After that, I copied the text شركت سرمايه گذاري البرز-سهامي عام-, press CONTROL + F to search all url's and searched the text, it was found in a single instance:
http://www.tsetmc.com/Loader.aspx?Partree=15131T&c=IRO3ETLZ0007
Equipped with this url, when opening it, it is fixedly only that tab:
Well, then the hardest part is gone, now just create the desired path:
=IMPORTXML(
"http://www.tsetmc.com/Loader.aspx?Partree=15131T&c=IRO3ETLZ0007",
"//tr[#class='sh']"
)
Which resulted in what you want:
Related
Is there any possible solution to extract the file from any website when there is no specific file uploaded using download.file() in R.
I have this url
https://www.fangraphs.com/leaders.aspx?pos=all&stats=bat&lg=all&qual=y&type=8&season=2016&month=0&season1=2016&ind=0
there is a link to export csv file to my working directory, but when i right click on the export data hyperlink on the webpage and select the link address
it turns to be the following script
javascript:__doPostBack('LeaderBoard1$cmdCSV','')
instead of the url which give me access to the csv file.
Is there any solution to tackle this problem.
You can use RSelenium for jobs like this. The script below works for me exactly as is, and it should for you as well with minor edits noted in the text. The solution uses two packages: RSelenium to automate Chrome, and here to select your active directory.
library(RSelenium)
library(here)
Here's the URL you provided:
url <- paste0(
"https://www.fangraphs.com/leaders.aspx",
"?pos=all",
"&stats=bat",
"&lg=all",
"&qual=y",
"&type=8",
"&season=2016",
"&month=0",
"&season1=2016",
"&ind=0"
)
Here's the ID of the download button. You can find it by right-clicking the button in Chrome and hitting "Inspect."
button_id <- "LeaderBoard1_cmdCSV"
We're going to automate Chrome to download the file, and it's going to go to your default download location. At the end of the script we'll want to move it to your current directory. So first let's set the name of the file (per fangraphs.com) and your download location (which you should edit as needed):
filename <- "FanGraphs Leaderboard.csv"
download_location <- file.path(Sys.getenv("USERPROFILE"), "Downloads")
Now you'll want to start a browser session. I use Chrome, and specifying this particular Chrome version (using the chromever argument) works for me. YMMV; check the best way to start a browser session for you.
An rsDriver object has two parts: a server and a browser client. Most of the magic happens in the browser client.
driver <- rsDriver(
browser = "chrome",
chromever = "74.0.3729.6"
)
server <- driver$server
browser <- driver$client
Using the browser client, navigate to the page and click that button.
Quick note before you do: RSelenium may start looking for the button and trying to click it before there's anything to click. So I added a few lines to watch for the button to show up, and then click it once it's there.
buttons <- list()
browser$navigate(url)
while (length(buttons) == 0) {
buttons <- browser$findElements(button_id, using = "id")
}
buttons[[1]]$clickElement()
Then wait for the file to show up in your downloads folder, and move it to the current project directory:
while (!file.exists(file.path(download_location, filename))) {
Sys.sleep(0.1)
}
file.rename(file.path(download_location, filename), here(filename))
Lastly, always clean up your server and browser client, or RSelenium gets quirky with you.
browser$close()
server$stop()
And you're on your merry way!
Note that you won't always have an element ID to use, and that's OK. IDs are great because they uniquely identify an element and using them requires almost no knowledge of website language. But if you don't have an ID to use, above where I specify using = "id", you have a lot of other options:
using = "xpath"
using = "css selector"
using = "name"
using = "tag name"
using = "class name"
using = "link text"
using = "partial link text"
Those give you a ton of alternatives and really allow you to identify anything on the page. findElements will always return a list. If there's nothing to find, that list will be of length zero. If it finds multiple elements, you'll get all of them.
XPath and CSS selectors in particular are super versatile. And you can find them without really knowing what you're doing. Let's walk through an example with the "Sign In" button on that page, which in fact does not have an ID.
Start in Chrome by pretty Control+Shift+J to get the Developer Console. In the upper left corner of the panel that shows up is a little icon for selecting elements:
Click that, and then click on the element you want:
That'll pull it up (highlight it) over in the "Elements" panel. Right-click the highlighted line and click "Copy selector." You can also click "Copy XPath," if you want to use XPath.
And that gives you your code!
buttons <- browser$findElements(
"#linkAccount > div > div.label-account",
using = "css selector"
)
buttons[[1]]$clickElement()
Boom.
I have an application by mean-stack that hosts a website and an Excel add-in. html5 is enabled, and it has
<script src="https://appsforoffice.microsoft.com/lib/1/hosted/office.js"></script>
<script src="https://cdn.rawgit.com/devote/HTML5-History-API/master/history.js"></script>
In the Excel add-in, I have a button that opens a page in the website by Dialog API:
$scope.openDialog = function () {
Office.context.ui.displayDialogAsync("https://localhost:3000/preview/tmp/6wr-4PqdBrYQwjp3AAAD", {}, function () {})
}
When I click on this button in Excel Online in Chrome, it opens a dialog box with the following url (note that # and several %2F are systematically appended), but it does not prevent from well displaying the page.
https://localhost:3000/preview/tmp/6wr-4PqdBrYQwjp3AAAD?_host_Info=excel|web|16.00|en-us|b6f37f78-e519-7d03-0069-b9c4317a362c|isDialog#%2Fpreview%2Ftmp%2F6wr-4PqdBrYQwjp3AAAD%3F_host_Info=excel%7Cweb%7C16.00%7Cen-us%7Cb6f37f78-e519-7d03-0069-b9c4317a362c%7CisDialog
However, when I click on this button in Excel Online in Firefox, the url changes quickly to the following url, which turns out to display the homepage of the website:
https://localhost:3000/home#%2Fpreview%2Ftmp%2F6wr-4PqdBrYQwjp3AAAD%3F_host_Info=excel%7Cweb%7C16.00%7Cen-gb%7C919fff78-e51f-dc20-0c3c-871b7d0ec25d%7CisDialog
So my questions are:
1) Why does Office.context.ui.displayDialogAsync add systematically # and %2F to the url of my website? Is it possible to prevent this from happening?
2) Is it possible to make a url (regardless of # and %2F) that Firefox accept?
Edit 1: It seems that if I don't use html5-history-api, it will work. So does not know how to disable html5-history-api for that displayDialogAsync? (The reason why I have to use html5-history-api after office.js is here.)
It is not displayDialogAsync that is doing this. It is a frequent issue with Angular routing and Angular location strategy. Search for "angular app is adding hash to my urls" and you will find a lot of information about it and solutions.
In another question a user used this url(1) which contains a data table and somehow converted the code into this url(2) in order to scrape using json and beautiful soup.
My question is how do you get the second url which is scrape friendly given the first url?
The user which somehow got the 2nd url was asked how he got it and it has been a while and he never responded..Here is a link to the original thread.
You can do that by using Google Chrome Developer Tools (and with other browsers as well).
Open Google Chrome browser and navigate to the first URL
Open Developer Tools (⌘ + option + i)
Head to "Network" tab
Click on "Preserve log" and on "XHR" (since it's an XMLHttpRequest)
Reload the page and you'll see the XMLHttpRequest to the second URL
Note: In this case I guessed that it was loaded by an XHR but I'd recommend to click "All" instead of "XHR" next time. You'll see more results and you'll need to filter and/or take some more time to find the call/request on the matter, but it'll be more accurate.
I need "fid" of google place if I have "cid"
CID: 13044706322416579131
FID: 0x395e84f3f650771d:0xb5081b4952eeb23b
As I found 2nd portion (0xb5081b4952eeb23b) of fid is hex of cid (13044706322416579131).
But I can't find how to get 1st portion (0x395e84f3f650771d)
First, you would start by locating the Google Plus Local page.
You then click the ‘Edit business details’ link on that G+ Local Page. That URL can then be transposed into a Google Maps (Google Places) URL such as in this example:
visit a Google+ Local page, such as: https://plus.google.com/109929267125901378263/
click ‘Edit business details’ on the G+ Local page
copy the first part of the URL on the page that appears after clicking on ‘Edit business details’.
paste the URL into a text file and you end up with a URL similar to this (but longer): https://maps.google.com/maps/place?cid=4851990417008427082&mode=edit […]
remove the text: /place … and remove anything after the CID # (meaning: remove ‘&mode’ and anything after it)
You end up with a Google Maps URL such as this, which you can bookmark for reference sake: https://maps.google.com/maps?cid=4851990417008427082. You would then finish the process by doing this:
copy the part that says cid=################### and paste it after this partial URL: www.google.com/mapmaker?gw=90&
the result is a Google Map Maker URL such as: www.google.com/mapmaker?gw=90&cid=4851990417008427082 (I call that the ‘gw 90’ GMM POI URL.)
However, it is often useful to have another version of the URL in the format of: www.google.com/mapmaker?gw=39&fid=. (I call that the ‘GW 39’ or the ‘FID’ version of the GMM POI) such as this: www.google.com/mapmaker?gw=39&fid=0x8804fe4c222357a7:0x692b02e73b5dc33b
Referred from here.
I am able to generate public URLs for iCloud files. e.g. https://www.icloud.com/documents/dl/?p=3&t=BAKsXkcDP-p8sdTS8NgBLWRQxE281oe4hogA
Accessing such a URL from a browser, I see a landing page, and shorty afterwards the file downloads automatically. Fine.
However, I want to be able to download this file from my iOS app (with NSURLConnection). How can I do this? Maybe...
a) process the html headers to somehow determine the direct URL?
b) intercept the redirect/refresh that triggers the download on a browser?
c) somehow imitate a browser in order to trigger a download?
Thanks
PS. please give me the idiot's answer- I'm clueless about html etc.
Here is the html response I'm getting for the indirect URL above:
var SC_benchmarkPreloadEvents={headStart:new Date().getTime()}; -->iCloud - Loading ...window.SC=window.SC||{MODULE_INFO:{},LAZY_INSTANTIATION:{}};SC.buildMode="production";
SC.buildNumber="1FCS22.32292";SC.buildLocale="en-us";String.preferredLanguage="en-us";window.SC=window.SC||{MODULE_INFO:{},LAZY_INSTANTIATION:{}};SC._detectBrowser=function(userAgent,language){var version,webkitVersion,browser={};
userAgent=(userAgent||navigator.userAgent).toLowerCase();language=language||navigator.language||navigator.browserLanguage;
version=browser.version=(userAgent.match(/.*(?:rv|chrome|webkit|opera|ie)/: ([ );]|$)/)||[])[1];
webkitVersion=(userAgent.match(/webkit/(.+?) /)||[])[1];browser.windows=browser.isWindows=!!/windows/.test(userAgent);
browser.mac=browser.isMac=!!/macintosh/.test(userAgent)||(/mac os x/.test(userAgent)&&!/like mac os x/.test(userAgent));
browser.lion=browser.isLion=!!(/mac os x 10_7/.test(userAgent)&&!/like mac os x 10_7/.test(userAgent));
browser.iPhone=browser.isiPhone=!!/iphone/.test(userAgent);browser.iPod=browser.isiPod=!!/ipod/.test(userAgent);
browser.iPad=browser.isiPad=!!/ipad/.test(userAgent);browser.iOS=browser.isiOS=browser.iPhone||browser.iPod||browser.iPad;
browser.android=browser.isAndroid=!!/android/.test(userAgent);browser.opera=/opera/.test(userAgent)?version:0;
browser.isOpera=!!browser.opera;browser.msie=/msie/.test(userAgent)&&!browser.opera?version:0;
browser.isIE=!!browser.msie;browser.isIE8OrLower=!!(browser.msie&&parseInt(browser.msie,10)<=8);
browser.mozilla=/mozilla/.test(userAgent)&&!/(compatible|webkit|msie)/.test(userAgent)?version:0;
browser.isMozilla=!!browser.mozilla;browser.webkit=/webkit/.test(userAgent)?webkitVersion:0;
browser.isWebkit=!!browser.webkit;browser.chrome=/chrome/.test(userAgent)?version:0;
browser.isChrome=!!browser.chrome;browser.mobileSafari=/apple.*mobile/.test(userAgent)&&browser.iOS?webkitVersion:0;
browser.isMobileSafari=!!browser.mobileSafari;browser.iPadSafari=browser.iPad&&browser.isMobileSafari?webkitVersion:0;
browser.isiPadSafari=!!browser.iPadSafari;browser.iPhoneSafari=browser.iPhone&&browser.isMobileSafari?webkitVersion:0;
browser.isiPhoneSafari=!!browser.iphoneSafari;browser.iPodSafari=browser.iPod&&browser.isMobileSafari?webkitVersion:0;
browser.isiPodSafari=!!browser.iPodSafari;browser.isiOSHomeScreen=browser.isMobileSafari&&!/apple.*mobile.*safari/.test(userAgent);
browser.safari=browser.webkit&&!browser.chrome&&!browser.iOS&&!browser.android?webkitVersion:0;
browser.isSafari=!!browser.safari;browser.language=language.split("-",1)[0];browser.current=browser.msie?"msie":browser.mozilla?"mozilla":browser.chrome?"chrome":browser.safari?"safari":browser.opera?"opera":browser.mobileSafari?"mobile-safari":browser.android?"android":"unknown";
return browser};SC.browser=SC._detectBrowser();if(typeof SC_benchmarkPreloadEvents!=="undefined"){SC.benchmarkPreloadEvents=SC_benchmarkPreloadEvents;
SC_benchmarkPreloadEvents=undefined}else{SC.benchmarkPreloadEvents={headStart:new Date().getTime()}
}SC.setupBodyClassNames=function(){var el=document.body;if(!el){return}var browser,platform,shadows,borderRad,classNames,style;
browser=SC.browser.current;platform=SC.browser.windows?"windows":SC.browser.mac?"mac":"other-platform";
style=document.documentElement.style;shadows=(style.MozBoxShadow!==undefined)||(style.webkitBoxShadow!==undefined)||(style.oBoxShadow!==undefined)||(style.boxShadow!==undefined);
borderRad=(style.MozBorderRadius!==undefined)||(style.webkitBorderRadius!==undefined)||(style.oBorderRadius!==undefined)||(style.borderRadius!==undefined);
classNames=el.className?el.className.split(" "):[];if(shadows){classNames.push("box-shadow")
}if(borderRad){classNames.push("border-rad")}classNames.push(browser);if(browser==="chrome"){classNames.push("safari")
}classNames.push(platform);var ieVersion=parseInt(SC.browser.msie,10);if(ieVersion){if(ieVersion===7){classNames.push("ie7")
}else{if(ieVersion===8){classNames.push("ie8")}else{if(ieVersion===9){classNames.push("ie9")
}}}}if(SC.browser.mobileSafari){classNames.push("mobile-safari")}if("createTouch" in document){classNames.push("touch")
}el.className=classNames.join(" ")};(function(){var styles=[];if(window.devicePixelRatio==2||window.location.search.indexOf("2x")>-1){styles=["/applications/documents/download/en-us/1FCS22.32292/stylesheet#2x-packed.css"];
SC.APP_IMAGE_ASSETS=["/applications/documents/sproutcore/desktop/en-us/1FCS22.32292/stylesheet-no-repeat#2x.png","/applications/documents/coreweb/views/en-us/1FCS22.32292/stylesheet-no-repeat#2x.png","/applications/documents/sproutcore/ace/en-us/1FCS22.32292/stylesheet-no-repeat#2x.png","/applications/documents/sproutcore/ace/en-us/1FCS22.32292/stylesheet-repeat-x#2x.png","/applications/documents/sproutcore/ace/en-us/1FCS22.32292/stylesheet-repeat-y#2x.png","/applications/documents/download/en-us/1FCS22.32292/stylesheet-no-repeat#2x.png","/applications/documents/download/en-us/1FCS22.32292/stylesheet-repeat-x#2x.png"]
}else{styles=["/applications/documents/download/en-us/1FCS22.32292/stylesheet-packed.css"];
SC.APP_IMAGE_ASSETS=["/applications/documents/sproutcore/desktop/en-us/1FCS22.32292/stylesheet-no-repeat.png","/applications/documents/coreweb/views/en-us/1FCS22.32292/stylesheet-no-repeat.png","/applications/documents/sproutcore/ace/en-us/1FCS22.32292/stylesheet-no-repeat.png","/applications/documents/sproutcore/ace/en-us/1FCS22.32292/stylesheet-repeat-x.png","/applications/documents/sproutcore/ace/en-us/1FCS22.32292/stylesheet-repeat-y.png","/applications/documents/download/en-us/1FCS22.32292/stylesheet-no-repeat.png","/applications/documents/download/en-us/1FCS22.32292/stylesheet-repeat-x.png"]
}var head=document.getElementsByTagName("head")[0],len=styles.length,idx,css;for(idx=0;
idxSC.benchmarkPreloadEvents.headEnd=new Date().getTime();SC.benchmarkPreloadEvents.bodyStart=new Date().getTime();if(SC.setupBodyClassNames){SC.setupBodyClassNames()};SC.benchmarkPreloadEvents.bodyEnd=new Date().getTime();
As of July 2012, the following seems to work. But there's no guarantee that apple won't change their scheme for generating these, and it's possible that they would regard this as a private API and reject your app. So use at your own risk.
The URL has two important parameters, p and t. The first seems to identify a server, while the second identifies the actual file. The direct download link is made by plugging these values into this URL:
https://p[p]-ubiquityws.icloud.com/ws/file/[t]
Looking at your example:
https://www.icloud.com/documents/dl/?p=3&t=BAKsXkcDP-p8sdTS8NgBLWRQxE281oe4hogA
p is 3, and t is BAKsXkcDP-p8sdTS8NgBLWRQxE281oe4hogA. So your direct download link would be
https://p3-ubiquityws.icloud.com/ws/file/BAKsXkcDP-p8sdTS8NgBLWRQxE281oe4hogA
Whenever I've published a link to iCloud, p has been 01; so it's possible that you might need to zero-pad your value in which case your URL would be
https://p03-ubiquityws.icloud.com/ws/file/BAKsXkcDP-p8sdTS8NgBLWRQxE281oe4hogA
It would be great to know whether that's necessary.
In iCloud Drive / iOS8 the links are different, but you can still get a direct link to the files.
Original link:
https://www.icloud.com/attachment?u=https%3A%2F%2Fms-eu-ams-103-prod.digitalhub.com%2FB%2FATmkKK8ju8SRwQqDoEFKJzbRsxiuAXQ3PBcJBXw1Qot9jz68TkqjiiNu%2F%24%7Bf%7D%3Fo%3DAtenENR8OcvlNq6JMa331mr-8gCreXxwcfgQ26B5gFKo%26v%3D1%26x%3D3%26a%3DBclucinSeKmFAy2GJg%26e%3D1413787013%26k%3D%24%7Buk%7D%26r%3D567CC38A-FD1B-4DE6-B11B-4166A5669E1B-1%26z%3Dhttps%253A%252F%252Fp03-content.icloud.com%253A443%26s%3DlO5SolOouS9qhYz1oIxKDoGtMpo%26hs%3DovfPXj3b9XXz9lWKChBmyNq_cug&uk=OXDCcLTETbvUcOKdJ-vTdQ&f=Testdatei.vrphoto&sz=1212622
URL decoded to be more readable:
https://www.icloud.com/attachment?u=https://ms-eu-ams-103-prod.digitalhub.com/B/ATmkKK8ju8SRwQqDoEFKJzbRsxiuAXQ3PBcJBXw1Qot9jz68TkqjiiNu/${f}?o=AtenENR8OcvlNq6JMa331mr-8gCreXxwcfgQ26B5gFKo&v=1&x=3&a=BclucinSeKmFAy2GJg&e=1413787013&k=${uk}&r=567CC38A-FD1B-4DE6-B11B-4166A5669E1B-1&z=https%3A%2F%2Fp03-content.icloud.com%3A443&s=lO5SolOouS9qhYz1oIxKDoGtMpo&hs=ovfPXj3b9XXz9lWKChBmyNq_cug&uk=OXDCcLTETbvUcOKdJ-vTdQ&f=Testdatei.vrphoto&sz=1212622
Save the text between '?u=' and '&uk=' as a NSMutableString
Save the information after 'uk=' and 'f=' as NSStrings
In the first string replace the text '${f}' with the 'f=' string and replace the text '${uk}' whith the 'uk=' string
If you need the files size for any reason, it's the number after 'sz=', but this is not needed for the final link
Voila, here is your direct link to the file:
https://ms-eu-ams-103-prod.digitalhub.com/B/ATmkKK8ju8SRwQqDoEFKJzbRsxiuAXQ3PBcJBXw1Qot9jz68TkqjiiNu/Testdatei.vrphoto?o=AtenENR8OcvlNq6JMa331mr-8gCreXxwcfgQ26B5gFKo&v=1&x=3&a=BclucinSeKmFAy2GJg&e=1413787013&k=OXDCcLTETbvUcOKdJ-vTdQ&r=567CC38A-FD1B-4DE6-B11B-4166A5669E1B-1&z=https%3A%2F%2Fp03-content.icloud.com%3A443&s=lO5SolOouS9qhYz1oIxKDoGtMpo&hs=ovfPXj3b9XXz9lWKChBmyNq_cug
It looks like the heavy lifting is done by the file referenced there:
https://www.icloud.com/applications/documents/download/en-us/1FCS22.32292/javascript-packed.js
I'd start there looking for the file name etc.