URLdecode unsuccessfully creating link - HTML

I have a website URL that I am trying to turn into a hyperlink that displays the first 25 characters, using R, for a Shiny page. The code I use to do that is below.
val <- "https://www.google.com/"
sprintf(paste0('<a href="', URLdecode(val), '" target="_blank">', substr(val, 1, 25), '</a>'))
If val is instead set to the login page for National Instruments (where, in my Chrome browser, a saved username and password autofill from one of my past logins):
val <- "https://lumen.ni.com/nicif/us/LMS_LOGIN/content.xhtml?du=http%3A%2F%2Fsine.ni.com%3A80%2Fmyni%2Fself-paced-training%2Fapp%2Fmain.xhtml%3Fsessionid%3D3-E63B1535-F81F-46C9-A867-E3176E756971%26requestedurl%3Dlearncenter%252Easp%253Fid%253D178409%2526page%253D1"
the sprintf function throws the error:
Error in sprintf(paste0("<a href=\"", URLdecode(val), "\" target=\"_blank\">", : too few arguments
The issue appears to be towards the end of the URL: if the link is truncated before the %252Easp, the sprintf call works as intended. This is the first time I have worked with HTML, and as far as my initial research goes, the R function URLdecode should take care of special characters so that this doesn't happen. If someone could explain why this throws the error it does, and how to fix it, I would greatly appreciate it.
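A likely explanation, judging from how sprintf treats its first argument: URLdecode turns each %25 into a literal %, so the decoded URL contains sequences such as %2E, which sprintf then reads as format specifiers that expect matching arguments, hence "too few arguments". (Truncating before %252Easp removes exactly those sequences, which is why the shorter link works.) Since nothing is being substituted into the string here, one fix is to drop sprintf entirely; a minimal sketch, with val as defined above:
# paste0 treats "%" as an ordinary character, so the decoded URL passes through untouched
link <- paste0('<a href="', URLdecode(val), '" target="_blank">',
               substr(val, 1, 25), '</a>')
If sprintf is needed for other reasons, escaping each % in the decoded string first, e.g. gsub("%", "%%", URLdecode(val), fixed = TRUE), should also avoid the error.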

Related

Seeing "internal error: Huge input lookup [1]" error when trying to create a large powerpoint file using OFFICER package

I don't have a great way to give a reproducible example, but here's my best description. I'm running a loop that generates 60 different PowerPoint slides with officer and collects them into a list, which results in a "pptx document with 60 slides" in my R environment. However, when I try to print this list, I see the following error:
Error in read_xml.raw(charToRaw(enc2utf8(x)), "UTF-8", ..., as_html = as_html, :
internal error: Huge input lookup [1]
I tried running the list with only 10 PowerPoint slides, and the print works, creating a slide deck of 10 slides. So I guess 60 is beyond the level that is considered "huge." Is there a way to override this? I saw some other posts about how you can add a HUGE override, but I'm not exactly sure where I would do that.
set options = c("HUGE") for read_xml()

R - Twitter Extraction - Error in .subset2(x, i, exact=exact)

I am making an R script to get all of the mentions (#username) of a specific set of users.
My first issue isn't a big deal: I work from home as well as at the office. At work, the code runs fine. At home, I get Error 32 - Could not authenticate you from OAuth, using the exact same code, key, secret, and token. I have tried resetting my secret key/token; same thing. Not a problem, since I can log in remotely, but it's frustrating.
The REAL issue here...
I construct a URL (ex: final_url = "https://api.twitter.com/1.1/search/tweets.json?q=#JimFKenney&until=2015-10-25&result_type=recent&count=100")
Then I search Twitter for my query of #usernameDesired to get all the comments where they were mentioned.
mentions = GET(final_url, sig)
This works fine, but then I want my data in a usable format so I do...
library(rjson)
#install.packages("jsonlite", repos="http://cran.rstudio.com/")
library(jsonlite)
#install.packages("bit64", repos="http://cran.rstudio.com/")
json = content(mentions)
I then get the following error -
$statuses
Error in .subset2(x, i, exact = exact) : subscript out of bounds
I don't have even the first idea of what can be causing this.
Any help is greatly appreciated.
EDIT 1: For clarity, I get the error when trying to see what is in json. The line json = content(mentions) executes fine; I then type json to see what is in the variable, and I get the above error that starts with $statuses.
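Two things may be worth checking, offered as guesses rather than a definitive diagnosis. First, the # in q=#JimFKenney is not percent-encoded; in a URL, # begins the fragment, so everything after it may never reach the server, leaving the query empty. Second, since the error appears only when the parsed list is auto-printed, inspecting the response status and structure first can narrow things down:
library(httr)
# encode the hash so the query actually reaches the API ("#" becomes "%23")
q <- URLencode("#JimFKenney", reserved = TRUE)
final_url <- paste0("https://api.twitter.com/1.1/search/tweets.json?q=", q,
                    "&until=2015-10-25&result_type=recent&count=100")
mentions <- GET(final_url, sig)   # sig as defined in the original script
http_status(mentions)             # did the request succeed at all?
json <- content(mentions)
str(json, max.level = 1)          # inspect the structure without auto-printing it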

R: getting data from website, method POST, dropdown menu options change

I'm trying to use R to extract data from a website where I have to select information from 5 dropdown menus and then click on an export or consult button (http://200.20.53.7/dadosaguaweb/default.aspx). I found this excellent thread: Getting data in R as dataframe from web source, but it didn't answer my question because of some differences:
1) The website's form uses the POST method, not GET;
I tried using the RHTMLForms package together with RCurl, in a way that should work for either POST or GET. Namely:
baseURL <- "http://200.20.53.7/dadosaguaweb/default.aspx"
forms <- getHTMLFormDescription(baseURL)
form1 <- forms$form1
dadosAgua <- createFunction(form1)
dadosDef <- dadosAgua(75, "PS0421", 1979, 2015, 6309)
2) The website is one of those where the list of options in the second dropdown menu changes according to what you selected in the first one, and so on. Therefore, when I set the first input parameter to "75", it does not accept the second one as "PS0421", because that option is not available while the first parameter is at its default value.
So, I tried a step-by-step approach, changing one parameter at a time, like this:
baseURL <- "http://200.20.53.7/dadosaguaweb/default.aspx"
forms1 <- getHTMLFormDescription(baseURL)
form1 <- forms1$form1
dadosAgua1 <- createFunction(form1)
dadosDef1 <- dadosAgua1(75)
forms2 <- getHTMLFormDescription(dadosDef1)
form2 <- forms2$form1
dadosAgua2 <- createFunction(form2)
dadosDef2 <- dadosAgua2(75, "PS0421")
And I get the error message:
Error in function (type, msg, asError = TRUE) : Empty reply from server
Now I'm completely stuck.
I think what you're trying to do is navigation scripting, i.e. getting code to interact with a web page. That may be complicated to do programmatically, because for the fields in the form to change in response to what you click, the page's scripts actually have to run in a web browser.
An alternative might be to use a tool that can do that for you, like CasperJS, which drives a headless browser, so the page's fields can change based on the behaviour you script. I don't know how comfortable you are with JavaScript, and I don't know of any R packages that do what CasperJS does, so I can't recommend anything else.
Edit:
Take a look at RSelenium
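RSelenium drives a real browser from R, so the dependent dropdowns repopulate exactly as they do for a human user. A rough sketch, not tested against this site; the XPath is a placeholder that would need to match the page's actual elements:
library(RSelenium)
rD <- rsDriver(browser = "firefox")   # starts a Selenium server and a browser
remDr <- rD$client
remDr$navigate("http://200.20.53.7/dadosaguaweb/default.aspx")
# clicking an option fires the page's own JavaScript, so the next
# dropdown refills before you select from it
opt <- remDr$findElement(using = "xpath", "//select[1]/option[@value = '75']")
opt$clickElement()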

httr: retrieving data with POST()

Disclaimer: while I have managed to grab data from another source using httr's POST function, let it be known that I am a complete n00b with regards to httr and HTML forms in general.
I would like to bring some data directly into R from a website using httr. My first attempt involved passing a named list to the body argument (as shown in this vignette). However, I noticed square brackets in the form input names (at least I think that's what they are). So instead, I tried passing the body as a string, formatted the way I think it should appear in the request body:
url <- 'http://research.stlouisfed.org/fred2/series/TOTALSA/downloaddata'
query <- paste('form[native_frequency]=Monthly', 'form[units]=lin',
               'form[frequency]=Monthly', 'form[obs_start_date]="1976-01-01"',
               'form[obs_end_date]="2014-11-01"', 'form[file_format]=txt',
               sep = '&')
response <- POST(url, body = query)
In any case, the above code just returns the web page's source code, and I cannot figure out how to properly submit the form so that it returns the same data as manually clicking the form's 'Download Data' button.
In Developer Tools/Network in Chrome, the Content-Disposition response header shows that a text file attachment containing the data comes back when I manually click the 'Download Data' button on the form. No such header appears on the response object in the code above. Why isn't this file returned by the POST request, and where is the file with the data going?
Feels like I'm missing something obvious. Anyone care to help me connect the dots?
Generally, if you're going to use httr, you let it build and encode the data for you; you just pass in the information as a list of form values. Try
url<-"http://research.stlouisfed.org/fred2/series/TOTALSA/downloaddata"
query <- list('form[native_frequency]'="Monthly",
'form[units]'="lin",
'form[frequency]'="Monthly",
'form[obs_start_date]'="1996-01-01",
'form[obs_end_date]'="2014-11-01",
'form[file_format]'="txt")
response <- POST(url, body = query)
content(response, "text")
and the return looks something like
[1] "Title: Total Vehicle Sales\r\nSeries ID: TOTALSA\r\nSource:
US. Bureau of Economic Analysis\r\nRelease: Supplemental Estimates, Motor
Vehicles\r\nSeasonal Adjustment: Seasonally Adjusted Annual Rate\r\nFrequency: Monthly\r\nUnits:
Millions of Units\r\nDate Range: 1996-01-01 to 2014-11-
01\r\nLast Updated: 2014-12-05 7:16 AM CST\r\nNotes: \r\n\r\nDATE
VALUE\r\n1996-01-01 14.8\r\n1996-02-01 15.6\r\n1996-03-01 16.0\r\n1996-04-01 15.5\r\n1996-05-01
16.0\r\n1996-06-01 15.3\r\n1996-07-01 15.1\r\n1996-08-01 15.5\r\n1996-09-01 15.5\r\n1996-10-01 15.3\r
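If the goal is a file on disk rather than text in the console, one option (a sketch using standard httr tools; the file name is a placeholder) is to stream the response body straight to a file, or save the retrieved text afterwards:
library(httr)
# write_disk() streams the response body to a local file as it downloads
response <- POST(url, body = query, write_disk("TOTALSA.txt", overwrite = TRUE))
# alternatively, save the already-retrieved text:
writeLines(content(response, "text"), "TOTALSA.txt")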

Find a specific string in HTML source

My goal is to find a predefined string in the HTML source of a specific site that I have fetched using C++, but I'm getting some errors. Here is my source code so far:
So after I connect to the internet and the site and all, I have this...
HINTERNET addr = InternetOpenUrl(/* ... */);
char dmbp[5000];
DWORD dba = 0;
// read at most sizeof(dmbp) - 1 bytes, not 80000, so the buffer cannot overflow
while (InternetReadFile(addr, dmbp, sizeof(dmbp) - 1, &dba) && dba)
{
    dmbp[dba] = '\0';               // null-terminate the chunk actually read
    string str2 = dmbp;
    size_t sf1 = str2.find(string1);
    if (sf1 != string::npos) {
        printf("found");
        // manipulate it...
    } else {
        printf("not found");
    }
}
My problem is that it never actually confirms that it found the value I need; it always says the value is not found. But I have even looked at the page source myself, and I can see the value I need; it just doesn't show up. Does anyone with experience in HTML extraction with C++ know what I'm missing or how I can get this to work?
There is nothing wrong with the string search code as far as I can see; the problem is that we don't know exactly what you are searching for.
As raw HTML can be full of special characters (a quote, for example, may appear in the source as the entity &quot;), the string you are looking for has to account for them. Also, strings can contain newlines and HTML tags (such as <b></b> within a single word), and these have to be included in the search string, since string::find looks for an exact match (including any newline).
Also, I suggest debugging your code to see whether the website's text/code is actually loaded into str2.
Given the information provided, these are the only issues I can think of that would explain why your code doesn't work.