Error in http_statuses - subscript out of bounds - html

Can someone explain why session2 gives me the following error:
library("rvest")
uastring = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36"
session = html_session("https://www.linkedin.com/job/", user_agent(uastring))
session2 = html_session("https://www.linkedin.com/job/")
Error in http_statuses[[as.character(status)]] : subscript out of bounds
I took this example from https://stat4701.github.io/edav/2015/04/02/rvest_tutorial/
How can I check which value of uastring I have to pass to html_session (for different sites)? I'm not asking about this specific site; I only used it because it comes from the tutorial.
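For what it's worth: html_session() delegates the request to httr, and the "subscript out of bounds" comes from httr's internal lookup table of status-code descriptions (the http_statuses in the error). LinkedIn is known to answer requests it considers automated with the non-standard status code 999, which has no entry in that table, so turning the response into a message fails. A minimal sketch with httr to compare the two requests and see the raw status code (assuming the site still behaves this way):
library(httr)
uastring = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36"
# Without a browser-like User-Agent the server may answer 999,
# a code html_session() cannot translate into a message:
res = GET("https://www.linkedin.com/job/")
status_code(res)
# With the User-Agent header set, the request looks like a browser:
res2 = GET("https://www.linkedin.com/job/", user_agent(uastring))
status_code(res2)  # 200 if the request was accepted
As for which uastring to use: there is no per-site value to look up. Any User-Agent string from a real, current browser generally works; the simplest approach is to copy your own browser's string (e.g. from its developer tools) and reuse it everywhere.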

Why can't this text be parsed by fastjson2?

import com.alibaba.fastjson2.JSONArray
JSONArray.parseArray(str).toString()
I parse this JSON string with fastjson2's parseArray and call toString on the result, but I encounter this error:
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024
at com.alibaba.fastjson2.JSONWriterUTF16JDK8.writeString(JSONWriterUTF16JDK8.java:183)
at com.alibaba.fastjson2.writer.ObjectWriterImplMap.write(ObjectWriterImplMap.java:428)
at com.alibaba.fastjson2.writer.ObjectWriterImplMap.write(ObjectWriterImplMap.java:457)
at com.alibaba.fastjson2.writer.ObjectWriterImplList.write(ObjectWriterImplList.java:278)
at com.alibaba.fastjson2.JSONArray.toString(JSONArray.java:871)
Similar strings work fine; I really can't figure out which special character causes this.
My str is:
[{"response_info":{"header":"Content-Length: 388\r\nContent-Type: application/octet-stream\r\nUser-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.84 Safari/537.36\r\nHost: 180.102.211.212\r\n","body":"\u0000\u0000\u0000\u0003seq\u0000\u0000\u0000\u000241\u0000\u0000\u0000\u0003ver\u0000\u0000\u0000\u00011\u0000\u0000\u0000\tweixinnum\u0000\u0000\u0000\n1429629729\u0000\u0000\u0000\u0007authkey\u0000\u0000\u0000D0B\u0002\u0001\u0001\u0004;09\u0002\u0001\u0002\u0002\u0001\u0001\u0002\u0004U6k!\u0002\u0003\u000fBA\u0002\u0004\u0015zXu\u0002\u0004\ufffd\ufffdf\ufffd\u0002\u0003\u000fU\ufffd\u0002\u0003\u0006\u0000\u0000\u0002\u0004U6k!\u0002\u0004d=\u001eS\u0002\u0004\ufffd\ufffd7\u0019\u0004\u0000\u0000\u0000\u0000\u0006rsaver\u0000\u0000\u0000\u00011\u0000\u0000\u0000\brsavalue\u0000\u0000\u0000\ufffd\ufffd\ufffd\ufffd\ufffd\u0006\u001d\ufffd_;\ufffdi\ufffdT.\ufffd\ufffd\"CK\ufffd/\u00169\u0018\u0015bI\ufffd\ufffd`<n\ufffd\ufffd\ufffdw\ufffd\ufffd\ufffd!\ufffd\u001a\u0003\ufffdHh\ufffdP%i$\ufffd$\ufffd\u0005\ufffd<\ufffd8\ufffd\ufffd\ufffd\ufffd\n\ufffd$\u0016A-O5\ufffd`\r\ufffd\ufffdc\ufffd\ufffd\u001b\ufffd\ufffd\r3\ufffd\ufffd`\ufffd)\ufffd\ufffdV\ufffdf \ufffd`\t\ufffd%\u0010\ufffd\ufffd\ufffdJ\ufffd\u001aCu\u0010\u000b\ufffd\u0001X\ufffd\ufffd\u01b7\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd.\u0000\u0000\u0000\u0007filemd5\u0000\u0000\u0000 0d65f9a4beb26b55874965490344abef\u0000\u0000\u0000\bfiletype\u0000\u0000\u0000\u00015\u0000\u0000\u0000\u0006touser\u0000\u0000\u0000\u00101688854880368629"}}]
The fastjson2 version is 2.0.10.

How to enable a proper description/reason for a N1QL query failure in Couchbase audit logs, or any exception id (Icode)?

Following is one failure log entry for the same audit id; it is marked with status "success" or "errors" depending on the case.
Here, in a failure case, the description just says "A N1QL EXPLAIN statement was executed" but gives no proper exception id or detailed description:
{"clientContextId":"INTERNAL-b8d19563-94a1-442d-9a09-dde36743fb7d","description":"A
N1QL EXPLAIN statement was
executed","id":28673,"isAdHoc":true,"metrics":{"elapsedTime":"11.921ms","executionTime":"11.764ms","resultCount":1,"resultSize":649},"name":"EXPLAIN
statement","node":"127.0.0.1:8091","real_userid":{"domain":"builtin","user":"Administrator"},"remote":{"ip":"127.0.0.1","port":44695},"requestId":"958a7e12-d5a6-4d7b-bd40-ac9bb60cf4a3","statement":"explain
INSERT INTO `Guardium` (KEY, VALUE) \nVALUES ( "id::5554\n", { "Emp
Name": "Test4", "Emp Company" : "GS Lab", "Emp Country" :
"India"} )\nRETURNING
*;","status":"errors","timestamp":"2021-01-07T09:37:00.486Z","userAgent":"Mozilla/5.0
(Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko)
Chrome/87.0.4280.88 Safari/537.36 (Couchbase Query Workbench
(6.6.1-9213-enterprise))"}
Please provide input on this; I want a proper description of why this N1QL statement failed in the audit logs.
Thank you.

PowerSchool Login Form Data

I'm trying to log in to PowerSchool to scrape my grades. Whenever I run the code it gives me the login page's HTML instead of the secured page's HTML.
Question 1: How do I get the values of the three fields labeled 'this changes' in the code below, and submit them with the POST?
Question 2: Am I required to add anything in the code for my password, which gets hashed on each POST?
https://ps.lphs.net/public/home.html <--- link to the login page's HTML.
Picture of the form data in Chrome
import requests

payload = {
    'pstoken': 'this changes',
    'contextData': 'this changes',
    'dbpw': 'this changes',
    'translator_username': '',
    'translator_password': '',
    'translator_ldappassword': '',
    'serviceName': ' PS Parent Portal',
    'serviceTicket': '',
    'pcasServerUrl': ' /',
    'credentialType': 'User Id and Password Credential',
    'account': '200276',
    'pw': 'my password',
    'translatorpw': ''
}
head = {'User-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3180.0 Safari/537.36'}

with requests.Session() as s:
    p = s.post('https://ps.lphs.net/public/', data=payload, headers=head)
    r = s.get('https://ps.lphs.net/guardian/home.html')
    print(r.text)
EDIT 1:
s.headers = {
    'User-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3180.0 Safari/537.36'}
p = s.get('https://ps.lphs.net/guardian/home.html')
print(p.text)
r = s.post('https://ps.lphs.net/guardian/home.html', data=payload,
           headers={'Content-Type': 'application/x-www-form-urlencoded',
                    'Referer': 'https://ps.lphs.net/public/home.html'})
print(r.text)
Give this a shot. It should fetch you the valid response:
import requests

payload = {
    'pstoken': 'this changes',
    'contextData': 'this changes',
    'dbpw': 'this changes',
    'translator_username': '',
    'translator_password': '',
    'translator_ldappassword': '',
    'serviceName': ' PS Parent Portal',
    'serviceTicket': '',
    'pcasServerUrl': ' /',
    'credentialType': 'User Id and Password Credential',
    'account': '200276',
    'pw': 'my password',
    'translatorpw': ''
}

with requests.Session() as s:
    s.headers = {'User-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3180.0 Safari/537.36'}
    r = s.post('https://ps.lphs.net/guardian/home.html', data=payload,
               headers={'Content-Type': 'application/x-www-form-urlencoded',
                        'Referer': 'https://ps.lphs.net/public/home.html'})
    print(r.text)
Btw, change the parameter in payload (if needed) to get logged in.
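On Question 1: pstoken, contextData and dbpw are regenerated on every page load, so they cannot be hard-coded; they have to be read out of the login page within the same session before posting. As an illustration of the idea, here is a sketch in R with rvest (the library used elsewhere on this page), whose form helpers carry hidden <input> values automatically; the Python equivalent is to GET the login page with the session and parse the hidden inputs with an HTML parser before building the payload:
library(rvest)
# Sketch only, using rvest's (pre-1.0) session helpers.
session = html_session("https://ps.lphs.net/public/home.html")
form = html_form(session)[[1]]  # the login form; pstoken/contextData arrive pre-filled
# Only the visible credentials need to be set by hand:
form = set_values(form, account = "200276", pw = "my password")
logged_in = submit_form(session, form)
On Question 2: if dbpw is computed by JavaScript from your password and contextData, no plain HTTP client will fill it in for you; you would have to replicate that hash in code or drive a real browser instead.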

Get JSON data from a URL in R

I'm using R and I would like to get JSON information from a URL. I have around 5000 user agents to send to this API (http://www.useragentstring.com/pages/api.php).
I use this code to build the URL and concatenate the user agent:
url_1<-paste(" \"http://www.useragentstring.com/?uas=",uaelenchi[11,1],"&getJSON=all\"",sep = '');
json_data2<-fromJSON(readLines(cat(url_1)))
But I receive this error:
Error in readLines(cat(url_1)) : 'con' is not a connection
Any suggestions would be really appreciated! Thanks
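Two separate things go wrong in that snippet: cat() prints its argument and returns NULL, so readLines() receives NULL rather than a connection, and the escaped quotes plus the leading space in the paste() call corrupt the URL itself. A minimal corrected sketch, assuming uaelenchi is your table of user-agent strings and using jsonlite, whose fromJSON() accepts a URL directly:
library(jsonlite)
url_1 <- paste0("http://www.useragentstring.com/?uas=",
                URLencode(uaelenchi[11, 1], reserved = TRUE),  # encode spaces etc.
                "&getJSON=all")
json_data2 <- fromJSON(url_1)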
I use rjson::fromJSON(file = paste(your_url)). If you make a reproducible example, I could check whether it works in your case.
library(httr)
library(jsonlite)
library(purrr)
uas <- c("Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:17.0) Gecko/20100101 Firefox/17.0",
"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:17.0) Gecko/20100101 Firefox/17.0",
"Mozilla/5.0 (Windows NT 5.1; rv:31.0) Gecko/20100101 Firefox/31.0",
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.6 Safari/537.11",
"Mozilla/5.0 (X11; OpenBSD amd64; rv:28.0) Gecko/20100101 Firefox/28.0",
"Mozilla/5.0 (Windows NT 5.1; rv:31.0) Gecko/20100101 Firefox/31.0",
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.6 Safari/537.11",
"Mozilla/5.0 (X11; OpenBSD amd64; rv:28.0) Gecko/20100101 Firefox/28.0",
"Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:14.0) Gecko/20120405 Firefox/14.0a1",
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.75.14 (KHTML, like Gecko) Version/7.0.3 Safari/7046A194A",
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1944.0 Safari/537.36",
"Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:14.0) Gecko/20120405 Firefox/14.0a1",
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.75.14 (KHTML, like Gecko) Version/7.0.3 Safari/7046A194A",
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1944.0 Safari/537.36")
parse_uas <- function(uas) {
  res <- GET("http://www.useragentstring.com/", query = list(uas = uas, getJSON = "all"))
  stop_for_status(res)
  content(res, as = "text", encoding = "UTF-8") %>%
    fromJSON(flatten = TRUE) %>%  # the piped text is the input; don't pass `res` again
    as.data.frame(stringsAsFactors = FALSE)
}
map_df(uas, parse_uas)
To save API calls you should add a caching layer to the parse_uas() function, which could be done pretty easily with the memoise package:
library(memoise)
.parse_uas <- function(uas) {
  res <- GET("http://www.useragentstring.com/", query = list(uas = uas, getJSON = "all"))
  stop_for_status(res)
  content(res, as = "text", encoding = "UTF-8") %>%
    fromJSON(flatten = TRUE) %>%
    as.data.frame(stringsAsFactors = FALSE)
}
parse_uas <- memoise(.parse_uas)
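Since memoise() caches on the function's arguments, duplicate user agents in the 5000-row list then cost only one API call each:
results <- map_df(uas, parse_uas)  # repeated UA strings are answered from the cache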
Also, if you're on Linux, you can try this package (it doesn't compile well on macOS and not at all on Windows, IIRC), which will do all the processing locally.

In R, getURL() returns a page saying too many requests, but that page is viewable in a browser

I am trying to get the page from www.dotabuff.com.
library(RCurl)
url <- "http://www.dotabuff.com/heroes/abaddon/matchups"
webpage <- getURL(url,verbose = TRUE)
The result is a page from Dotabuff complaining about too many requests. I was expecting an HTML page with a table, like the one viewable in a web browser. I have tried http, https, getURLContent, etc.
I think this has something to do with the type of request getURL sends, or maybe something tricky about that website.
Add a header to the request...
library(RCurl)
url <- "http://www.dotabuff.com/heroes/abaddon/matchups"
options(RCurlOptions = list(verbose = TRUE, useragent="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.A.B.C Safari/525.13"))
webpage <- getURL(url,verbose = TRUE)
* Trying 23.235.40.64...
* Connected to www.dotabuff.com (23.235.40.64) port 80 (#0)
> GET /heroes/abaddon/matchups HTTP/1.1
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.A.B.C Safari/525.13
Host: www.dotabuff.com
Accept: */*
< HTTP/1.1 200 OK
...
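If you'd rather not change the global RCurlOptions, the same libcurl option can be passed per call; getURL() forwards extra arguments as curl options, so a per-request sketch looks like this:
library(RCurl)
ua <- "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.A.B.C Safari/525.13"
# The user agent applies to this request only, leaving global options untouched:
webpage <- getURL("http://www.dotabuff.com/heroes/abaddon/matchups", useragent = ua)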