Scrape data after login with rvest (HTML)

Please help me.
I'm trying to scrape the splits table, but I can't get it to work and I don't understand why.
This is the url:
https://www.strava.com/activities/1983801964
This is the credential to login:
email=trytest@tiscali.it
password=12345678
This is my code:
library(rvest)
pgsession<-html_session("https://www.strava.com/login")
pgform<-html_form(pgsession)[[1]]
filled_form<-set_values(pgform, email="trytest@tiscali.it", password="12345678")
# use the session returned by submit_form() for the next request
pgsession<-submit_form(pgsession, filled_form)
page<-jump_to(pgsession, "https://www.strava.com/activities/1983801964")
page%>%html_nodes(xpath='//*[@id="contents"]')
And I get {xml_nodeset (0)}.
I also tried
page%>%html_nodes("body")%>%html_text()
but I still can't get this information. Please help!
Thanks in advance

I cannot find the split data in the HTML. Therefore, it may not be possible to scrape the splits from the HTML like this.
Alternatively, you can download the raw activity data. Link: https://support.strava.com/hc/en-us/articles/216918437-Exporting-your-Data-and-Bulk-Export
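If you go the export route, the splits can be recomputed from the track itself. A minimal sketch, assuming the activity was exported as a GPX 1.1 file with a <time> element on every trackpoint; the function names are hypothetical and not part of any Strava tool. It sums haversine distances and reports the elapsed seconds of each completed kilometre, closing a split at the first trackpoint past the mark (no interpolation), so the numbers will be close to, but not necessarily identical with, Strava's own splits:

```python
import math
import xml.etree.ElementTree as ET
from datetime import datetime

GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}
EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in metres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def read_trackpoints(gpx_text):
    """Return [(lat, lon, datetime), ...] for every <trkpt> in a GPX 1.1 document."""
    root = ET.fromstring(gpx_text)
    points = []
    for trkpt in root.iterfind(".//gpx:trkpt", GPX_NS):
        stamp = trkpt.find("gpx:time", GPX_NS).text
        when = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
        points.append((float(trkpt.get("lat")), float(trkpt.get("lon")), when))
    return points


def km_splits(gpx_text, split_m=1000.0):
    """Elapsed seconds for each completed split of `split_m` metres."""
    points = read_trackpoints(gpx_text)
    splits = []
    distance = 0.0
    split_start = points[0][2]
    next_mark = split_m
    for (lat1, lon1, _), (lat2, lon2, t2) in zip(points, points[1:]):
        distance += haversine_m(lat1, lon1, lat2, lon2)
        # A split closes at the first trackpoint at or past the mark
        # (no interpolation between trackpoints).
        while distance >= next_mark:
            splits.append((t2 - split_start).total_seconds())
            split_start = t2
            next_mark += split_m
    return splits
```

Usage would be something like km_splits(open("activity.gpx", "rb").read()), where "activity.gpx" stands for whatever file the export produced.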
Edit: you may also be able to use this method to download Strava data: https://scottpdawson.com/export-strava-workout-data/
Edit 2: The splits are contained in a DIV called "splits-container". However, the source HTML is likely modified by JavaScript after the page loads, and rvest only sees the HTML as the server sends it, before any JavaScript runs. This means you will probably not be able to scrape the data without running the JavaScript first (for example, in a real browser). Hope this helps.
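One way to confirm this is to check whether that DIV has any content in the HTML as served (which is what rvest sees). A minimal Python sketch, assuming "splits-container" may be either the element's id or one of its classes, since the answer doesn't say which; container_text is a hypothetical helper. It returns None if no such element exists and an empty string if the element exists but is empty, which would point to JavaScript filling it in later:

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the nesting depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class ContainerText(HTMLParser):
    """Collects the text inside the first element whose id or class equals `target`."""

    def __init__(self, target):
        super().__init__()
        self.target = target
        self.depth = 0  # > 0 while we are inside the target element
        self.found = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        if self.depth:
            self.depth += 1
            return
        attrs = dict(attrs)
        names = (attrs.get("class") or "").split() + [attrs.get("id") or ""]
        if not self.found and self.target in names:
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if tag not in VOID and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def container_text(html, target="splits-container"):
    """None if the element is missing, else its whitespace-normalised text ('' if empty)."""
    parser = ContainerText(target)
    parser.feed(html)
    parser.close()
    if not parser.found:
        return None
    return " ".join(" ".join(parser.chunks).split())
```

Feeding it the page source fetched without a browser and getting "" or None back is consistent with the data being injected client-side.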

Related

IMPORTHTML() doesn't work on this webpage

I want to import data from a table from the following page:
https://basketballmonster.com/playerrankings.aspx
When I do so with the "All Players" filter selected, only the top players are imported into my Google Sheet. Can someone help me import all of them? Thanks in advance.
I have linked the Google Sheet below for your review:
https://docs.google.com/spreadsheets/d/1uvhNp6gBnnEvs8CBb4K7onccew_doFp96wmFEsYyLBk/edit?usp=sharing
Google Sheets can't know what your browser shows, so it doesn't know which filter you selected. You have to make Sheets receive the same HTML that your browser displays, which means including the filter in your request.
Since the controls don't appear to be passed as parameters in a GET request, it's unfortunately not as simple as appending ?PlayerFilterControl=TopPlayers to the URL.
You have to POST it as a form payload, e.g. { 'PlayerFilterControl': 'AllPlayers' }.
Unfortunately, IMPORTHTML() in Google Sheets doesn't support POST requests yet, so you'll have to move to Apps Script: send the request yourself and parse the returned HTML/XML.
I suggest you check out these references:
https://developers.google.com/apps-script/reference/url-fetch/url-fetch-app
https://developers.google.com/apps-script/reference/xml-service
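The POST described above can be sketched as follows. This is a minimal Python illustration, not Apps Script (in Apps Script the equivalent call would be UrlFetchApp.fetch(url, {method: 'post', payload: {...}})). The field name PlayerFilterControl comes from the answer; the hidden fields are an assumption: ASP.NET WebForms pages usually expect values such as __VIEWSTATE (and often __EVENTVALIDATION) copied from a prior GET of the page, and the real form may use a longer, prefixed control name. build_filter_request is hypothetical and only builds the request without sending it:

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_filter_request(url, hidden_fields, player_filter="AllPlayers"):
    """Build (but do not send) a form-encoded POST selecting a player filter.

    `hidden_fields` is assumed to hold values such as __VIEWSTATE scraped
    from a previous GET of the same page (typical for ASP.NET WebForms).
    """
    form = dict(hidden_fields)
    form["PlayerFilterControl"] = player_filter  # field name taken from the answer
    body = urlencode(form).encode("ascii")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

Passing the result to urllib.request.urlopen() would send it; the returned HTML then has to be parsed for the table, which is the step IMPORTHTML() normally does for you.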

Using R-selenium to scrape data from an aspx webpage

I am pretty new to R and Selenium, so hopefully I can express my question clearly.
I want to scrape some data off a website (.aspx). I need to type in a chemical code to pull out information on the next page, using RSelenium to fill in the input and click the button. So far I have written a short script that gets me through the first step, i.e. it brings up the page I want. But I had a lot of trouble finding a good way to scrape the data (the chemical information in the table) off this website, mainly because the website does not assign a new URL: it gives me the same .aspx address for every chemical I search. I plan to overcome this and then build a loop so I can scrape more information automatically. Does anyone have good ideas on how I should get the data after clickElement()? I need the chemical information table on the second page.
Thanks heaps in advance!
Here is the code I have written so far; the next step is to scrape the table from the next page:
library("RSelenium")
# checkForServer() and startServer() are defunct in current RSelenium;
# rsDriver() starts a Selenium server and returns an already-opened client
rD <- rsDriver(browser = "firefox")
mybrowser <- rD$client
mybrowser$navigate("http://limitvalue.ifa.dguv.de/")
wxbox <- mybrowser$findElement(using = 'css selector', "#Tbox_cas")
wxbox$sendKeysToElement(list("64-19-7"))
wxbutton <- mybrowser$findElement(using = 'css selector', "#Butsearch")
wxbutton$clickElement()
First of all, your tool choice is wrong: you don't need a browser (Selenium) for this, a plain HTTP client is enough.
Secondly, in your case the search works like this:
POST the form to the "permanent" URL
the server answers with a 302 redirect to a new URL, which is http://limitvalue.ifa.dguv.de/WebForm_ueliste2.aspx in your case
GET the new URL
Thirdly, what's the final output you are after?
It really depends on how much data you need. If it's only a few chemicals, just do it by hand.
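The POST, 302, GET sequence above can be reproduced with an ordinary HTTP client. A self-contained Python sketch that runs against a local stand-in server rather than the real site; the form field name Tbox_cas is hypothetical (borrowed from the input's CSS id), and a real ASP.NET form will most likely also need its hidden __VIEWSTATE fields. Python's urllib follows a 302 after a POST by issuing a GET, so resp.geturl() reports the redirected URL, and the table cells are then read with the standard-library HTML parser:

```python
import threading
from html.parser import HTMLParser
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlencode
from urllib.request import ProxyHandler, build_opener

RESULT_PATH = "/WebForm_ueliste2.aspx"


class FakeSite(BaseHTTPRequestHandler):
    """Local stand-in: POST / answers 302 to RESULT_PATH; GET returns a table."""

    def do_POST(self):
        self.rfile.read(int(self.headers.get("Content-Length", 0)))  # consume the form
        self.send_response(302)
        self.send_header("Location", RESULT_PATH)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        body = b"<table><tr><th>CAS</th><td>64-19-7</td></tr></table>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass


class CellCollector(HTMLParser):
    """Collects the text of every <td>/<th> cell, in document order."""

    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            self.in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.cells[-1] += data


def search(base_url, cas):
    """POST the search form; the default redirect handler follows the 302 with a GET."""
    data = urlencode({"Tbox_cas": cas}).encode()  # hypothetical field name
    opener = build_opener(ProxyHandler({}))  # ignore *_proxy env vars for this local demo
    with opener.open(base_url + "/", data=data) as resp:
        final_url = resp.geturl()
        html = resp.read().decode("utf-8")
    collector = CellCollector()
    collector.feed(html)
    return final_url, [c.strip() for c in collector.cells]


def demo():
    server = HTTPServer(("127.0.0.1", 0), FakeSite)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return search(f"http://127.0.0.1:{server.server_address[1]}", "64-19-7")
    finally:
        server.shutdown()
        server.server_close()
```

Looping over several codes is then just repeated calls to search(), with no browser involved.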

Scrape CSS to bulk-check responsiveness

I have a list of web domains and would like to check whether they are built to be mobile-responsive. A fairly reliable way to check this manually is to see whether there are "@media" queries in the style.css.
I've used XPath (IMPORTXML) before to bulk-check for strings on webpages, but I don't see an obvious way of importing the CSS files in bulk and searching for a string within them. Is there a way to do this? Ideally, I'd like to accomplish it in Google Sheets or with Google Apps Script.
Thank you!
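The manual check described in the question amounts to a search over each stylesheet's text. A minimal Python sketch of that heuristic (has_media_queries and check_sites are hypothetical names); it strips /* ... */ comments first so commented-out rules don't count. It only inspects the CSS text you give it, so media queries in other stylesheets or in <link media="..."> attributes would be missed. In Apps Script, UrlFetchApp.fetch(cssUrl).getContentText() would give you the CSS text to run a similar check on:

```python
import re

# "@media" followed by its query and the opening brace of the block.
MEDIA_RULE = re.compile(r"@media\b[^{]*\{", re.IGNORECASE)
CSS_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)


def has_media_queries(css_text):
    """True if the stylesheet contains at least one @media rule outside comments."""
    return MEDIA_RULE.search(CSS_COMMENT.sub("", css_text)) is not None


def check_sites(stylesheets):
    """Map each domain to True/False, given {domain: css_text} fetched elsewhere."""
    return {domain: has_media_queries(css) for domain, css in stylesheets.items()}
```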
You can use Google's Mobile-Friendly Test if you want to use a GUI.
If you want to use a REST API, try this (replace the url parameter with the page you want to test):
https://www.googleapis.com/pagespeedonline/v3beta1/mobileReady?url=http://facebook.com
This returns a JSON object with lots of useful information, but if you are just looking for mobile-friendliness, look for the true/false result here:
"ruleGroups": {
  "USABILITY": {
    "pass": true
  }
}
Hope that helps!
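Building the request URL for each domain and reading that flag back takes only a few lines. A minimal Python sketch, assuming the response has exactly the shape shown above (only that fragment comes from the answer); note that v3beta1 is a beta endpoint and may no longer be served, so the fetch itself is left out here. api_url percent-encodes the page URL, which is safer than pasting it in raw, and usability_pass returns None when the field is absent rather than guessing:

```python
import json
from urllib.parse import urlencode

API = "https://www.googleapis.com/pagespeedonline/v3beta1/mobileReady"


def api_url(page_url):
    """Request URL for one page, with the page URL percent-encoded."""
    return API + "?" + urlencode({"url": page_url})


def usability_pass(response_text):
    """Return ruleGroups.USABILITY.pass from the API response, or None if absent."""
    data = json.loads(response_text)
    return data.get("ruleGroups", {}).get("USABILITY", {}).get("pass")
```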

Include html pages with Google Closure

I'm working with Google Closure. I'm trying to include some HTML files in another one, for example A.html importing B.html and C.html, but I don't understand how to do that.
Could anyone point me in the right direction, please?
Thanks in advance.
As far as I know, you can't "include" HTML pages like that. Your options are:
1: Use Ajax to fetch the content
http://docs.closure-library.googlecode.com/git/closure_goog_net_xhrio.js.html
http://www.googleclosure.com/google-closure-ajax/
2: Google closure templates
https://developers.google.com/closure/templates/?csw=1
3: Use a server-side language like PHP to include your file.
http://www.php.net/manual/en/function.include-once.php
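Option 3 names PHP; the same server-side idea sketched in Python looks roughly like this. A minimal sketch, assuming a made-up marker syntax <!-- include: B.html --> (hypothetical, not a Closure or PHP feature); render replaces each marker with the referenced file's contents recursively and rejects include cycles. Unlike include_once, it includes a file every time it is referenced, and file_loader does no path sanitising, so it should only point at a trusted template directory:

```python
import re
from pathlib import Path

# Hypothetical marker syntax: <!-- include: B.html -->
INCLUDE = re.compile(r"<!--\s*include:\s*([\w./-]+)\s*-->")


def render(name, load, _stack=()):
    """Expand include markers in template `name`, using `load(name) -> str`."""
    if name in _stack:
        raise ValueError("include cycle: " + " -> ".join(_stack + (name,)))
    text = load(name)
    return INCLUDE.sub(lambda m: render(m.group(1), load, _stack + (name,)), text)


def file_loader(root):
    """Load templates from a trusted directory (no path sanitising is done)."""
    base = Path(root)
    return lambda name: (base / name).read_text(encoding="utf-8")
```

On a server, render("A.html", file_loader("templates")) would produce the combined page before it is sent to the browser.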
I'm not sure I understand the question.
1) Do you already have the HTML in JS and don't know how to insert it into the page?
Try goog.dom.appendChild(parent, child).
2) Don't you know how to get it into JS?
You have to send it from the server, or, if I were in your shoes, use Soy templates.

What is VHTML? How does it work? Where can I find information about it?

The code below continues for many lines until it ends with the expected </veotherwise> and </vechoose> closing tags. I recently started working at a development firm that uses this HTML variant called VHTML. I have searched the web, but it returns different definitions for VHTML. I have seen some posts about VHTML in Joomla, but they don't look like the code below. I was hoping for a pointer on how to understand the language.
It looks very similar to normal HTML, with very similar commands, or maybe Smalltalk. But I just can't decipher it. Any help will be appreciated. Please post comments if you want more information.
<vechoose>
  <vewhen criteria='isPortalEdit'>
    widget: practices-landing-page
  </vewhen>
  <veotherwise>
    <veinclude src='private/webportal/webtemplate-content.vhtml'>
      <vesection name='content-body'>
        <% // Determine portlet visibility %>
        <vecalc expression='isEmpty = false' output='none' />
        <vechoose>
          <vewhen criteria='isEmpty'>
            <veif criteria='portlet.ifEmptyDo == "Hide"'>
              <script>getTag( 'portlet_<%=portlet.order%>' ).style.display = "none";</script>
            </veif>
            <veif criteria='portlet.ifEmptyDo == "Show Message"'>
              <%#portlet.ifEmptyMessage%>
            </veif>
          </vewhen>
          ...
I managed to find this: http://vitrage.sibweb.ru/english/ It looks like it could be an Apache module called VITRAGE. There isn't much available in English, however, so I'm really unsure whether it's a match.
Reading the code sample you posted, it looks like an XML-styled procedural language. Are you sure it's available elsewhere, or could it be something that was developed internally?
I think this is an internal language used to bring server-side logic into the page displayed in the browser. I have been unable to find documentation on it, and I don't think Vitrage explains it. The server uses Coyote (Tomcat's HTTP connector) as the web server, Tomcat as the servlet container and Java as the backend.
If you find any new information, please post it.