Read an HTML table via URL with R

I'm trying to load this data https://datahub.io/core/country-list/r/0.html into R as a data frame, but I can't find a function that reads a table from an HTML file.
(Here is the link to the post about the data: https://datahub.io/core/country-list#readme)
read_csv and read.table don't work. I also tried the XML package but got an error:
readHTMLTable("http://datahub.io/core/country-list/r/0.html", header = c(Country_Name, Code))
Error: failed to load external entity "http://datahub.io/core/country-list/r/0.html"
I got the same error using htmlParse.
I want to add this data to my df and use it in a project. I will appreciate your help!
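One hedged sketch of a possible approach (not from the original post): the XML package cannot fetch https URLs, which is the usual cause of the "failed to load external entity" error, so an alternative is to let rvest download and parse the page. This assumes the page really exposes the country list as a plain <table>:
library(rvest)
url <- "https://datahub.io/core/country-list/r/0.html"
page <- read_html(url)          # rvest handles the https download and parses the HTML
tables <- html_table(page)      # all <table> elements on the page, as a list of data frames
countries <- tables[[1]]        # assumption: the first table holds the country list
names(countries) <- c("Country_Name", "Code")
head(countries)
Alternatively, downloading the file first (e.g. with download.file()) and then pointing readHTMLTable() at the local copy sidesteps the https limitation.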

Related

"Unable to infer schema for JSON." error in PySpark?

I have a JSON file with about 1,200,000 records.
I want to read this file with PySpark as:
spark.read.option("multiline","true").json('file.json')
But it causes this error:
AnalysisException: Unable to infer schema for JSON. It must be specified manually.
When I create a JSON file with a smaller number of records from the main file, this code can read it.
I can read this JSON file with pandas when I set the encoding to utf-8-sig:
pd.read_json("file.json", encoding = 'utf-8-sig')
How can I solve this problem?
Try this out:
spark.read.option("multiline","true").option("inferSchema", "true").json('file.json')
Since adding the encoding helps, maybe the following is what you need:
spark.read.json("file.json", multiLine=True, encoding="utf8")

Python - How to update a value in a json file?

I hate json files. They are unwieldy and hard to handle :( Please tell me why the following doesn't work:
with open('data.json', 'r+') as file_object:
    data = json.load(file_object)[user_1]['balance']
    am_nt = 5
    data += int(am_nt['amount'])
    print(data)
    file_object[user_1]['balance'] = data
Through trial and error (and many print statements), I have discovered that it opens the file, goes to the correct place, and then actually adds the am_nt, but I can't make the original json file update. Please help me :( :( . I get:
2000
TypeError: '_io.TextIOWrapper' object is not subscriptable
json is fun to work with as it is similar to python data structures.
The error is: object is not subscriptable
This error is for this line:
file_object[user_1]['balance'] = data
file_object is the open file handle, not JSON/dictionary data that can be updated like that; hence the error.
First read the JSON data:
data = json.load(file_object)
Then manipulate data as a Python dictionary and save it back to the file (for example with json.dump()).

reading .csv file + JSON with Matlab

So I have a .CSV file that contains dataset information; the data seems to be encoded as JSON. I want to read it with MATLAB. One example line (of 7,000 total) of the data:
imagename.jpg,"[[{""name"":""nose"",""position"":[2911.68,1537.92]},{""name"":""left eye"",""position"":[3101.76,544.32]},{""name"":""right eye"",""position"":[2488.32,544.32]},{""name"":""left ear"",""position"":null},{""name"":""right ear"",""position"":null},{""name"":""left shoulder"",""position"":null},{""name"":""right shoulder"",""position"":[190.08,1270.08]},{""name"":""left elbow"",""position"":null},{""name"":""right elbow"",""position"":[181.44,3231.36]},{""name"":""left wrist"",""position"":[2592,3093.12]},{""name"":""right wrist"",""position"":[2246.4,3965.76]},{""name"":""left hip"",""position"":[3006.72,3360.96]},{""name"":""right hip"",""position"":[155.52,3412.8]},{""name"":""left knee"",""position"":null},{""name"":""right knee"",""position"":null},{""name"":""left ankle"",""position"":[2350.08,4786.56]},{""name"":""right ankle"",""position"":[1460.16,5019.84]}]]","[[{""segment"":[[0,17.28],[933.12,5175.36],[0,5166.72],[0,2306.88]]}]]",https://imageurl.jpg,
If I use the Import functionality/tool, I am able to separate the data into four columns using the comma as the delimiter:
Image File Name,Key Points,Segmentation,Image URL,
imagename.jpg,
"[[{""name"":""nose"",""position"":[2911.68,1537.92]},{""name"":""left eye"",""position"":[3101.76,544.32]},{""name"":""right eye"",""position"":[2488.32,544.32]},{""name"":""left ear"",""position"":null},{""name"":""right ear"",""position"":null},{""name"":""left shoulder"",""position"":null},{""name"":""right shoulder"",""position"":[190.08,1270.08]},{""name"":""left elbow"",""position"":null},{""name"":""right elbow"",""position"":[181.44,3231.36]},{""name"":""left wrist"",""position"":[2592,3093.12]},{""name"":""right wrist"",""position"":[2246.4,3965.76]},{""name"":""left hip"",""position"":[3006.72,3360.96]},{""name"":""right hip"",""position"":[155.52,3412.8]},{""name"":""left knee"",""position"":null},{""name"":""right knee"",""position"":null},{""name"":""left ankle"",""position"":[2350.08,4786.56]},{""name"":""right ankle"",""position"":[1460.16,5019.84]}]]",
"[[{""segment"":[[0,17.28],[933.12,5175.36],[0,5166.72],[0,2306.88]]}]]",
https://imageurl.jpg,
But I have trouble using the tool to decompose the data any further. Of course, the ideal would be to do the separation in code.
I hope someone can point me to an approach or the tools I need to use. I have seen other questions, but they don't seem to fit my particular case.
Thank you very much!!
You can read a JSON string into a MATLAB structure using the following command:
structure1 = matlab.internal.webservices.fromJSON(json_string)
You can create a JSON string from a MATLAB structure using the following command:
json_string = matlab.internal.webservices.toJSON(structure1)
JSONlab is what you want. It has a loadjson function which takes a char array of JSON data and returns a struct with all the data.

Twitter user posts, retrieved with smappR and stored in JSON format, are not being read into R

I am using the smappR package to retrieve Twitter user posts, specifically the getTimeline() function.
However, the problem is that the retrieved data, which is stored in JSON format, is not subsequently being read back into R as JSON.
The image below denotes the command and the corresponding error -
Was wondering if there is any other way we can read the files back into R for further processing?
Any help will be appreciated.
Edit 1: Funnily enough, the file does not appear to be read in even when I attempt the same in Python (2.7).
The Python Code is as follows -
with open('C:/Users/ABC/Downloads/twitter/profile/bethguido3.JSON') as data_file:
    data = json.load(data_file)
The error that appeared is -
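One hedged way to read such files back into R (a sketch, not from the original thread): getTimeline() typically writes one JSON object per line, and a plain fromJSON() call on the whole file cannot parse that layout; jsonlite's streaming reader is built for it:
library(jsonlite)
# stream_in() parses line-delimited JSON (one object per line) into a data frame
tweets <- stream_in(file("C:/Users/ABC/Downloads/twitter/profile/bethguido3.JSON"))
str(tweets)
If the file turns out not to be line-delimited, readLines() followed by fromJSON() on the pieces is a more manual fallback.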

Read a Text File into R

I apologize if this has been asked previously, but I haven't been able to find an example online or elsewhere.
I have a very dirty data file in a text file (it may be JSON). I want to analyze the data in R, and since I am still new to the language, I want to read in the raw data and manipulate it as needed from there.
How would I go about reading in JSON from a text file on my machine? Additionally, if it isn't JSON, how can I read in the raw data as is (not parsed into columns, etc.) so I can go ahead and figure out how to parse it as needed?
Thanks in advance!
Use the rjson package. In particular, look at the fromJSON function in the documentation.
If you want further pointers, then search for rjson at the R Bloggers website.
If you want to use the packages related to JSON in R, there are a number of other posts on SO answering this. I presume you have already searched for JSON [r] on this site; there is plenty of info there.
If you just want to read in the text file line by line and process it later, you can use either scan() or readLines(). They appear to do the same thing, but there's an important difference between them.
scan() lets you define what kind of objects you want to find, how many, and so on. Read the help file for more info. You can use scan to read in every word/number/sign as an element of a vector, using e.g. scan(filename, ""). You can also use specific delimiters to separate the data. See also the examples in the help files.
To read line by line, you use readLines(filename) or scan(filename, "", sep="\n"). Either gives you a vector with the lines of the file as elements, which again allows you to do custom processing of the text. Then again, if you really have to do this often, you might want to consider doing it in Perl.
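A minimal illustration of the two approaches just described (the file name is a placeholder):
# every whitespace-separated token as an element of a character vector
tokens <- scan("dirty_data.txt", what = "")
# one element per line
lines <- readLines("dirty_data.txt")
# the equivalent line-by-line read with scan()
lines2 <- scan("dirty_data.txt", what = "", sep = "\n")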
If your file is in JSON format, you may try the packages jsonlite, RJSONIO, or rjson. All three packages provide the function fromJSON.
To install a package you use the install.packages function. For example:
install.packages("jsonlite")
And once the package is installed, you can load it using the library function.
library(jsonlite)
Generally, line-delimited JSON has one object per line, so you need to read the file line by line and collect the objects. For example:
con <- file('myBigJsonFile.json')
open(con)
objects <- list()
index <- 1
while (length(line <- readLines(con, n = 1, warn = FALSE)) > 0) {
    objects[[index]] <- fromJSON(line)
    index <- index + 1
}
close(con)
After that, you have all the data in the objects variable. With that variable you may extract the information you want.
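As a hedged alternative (not part of the original answer), jsonlite also provides a streaming reader that performs the same line-by-line loop internally and returns a data frame:
library(jsonlite)
# stream_in() reads one JSON object per line and binds the results into a data frame
objects_df <- stream_in(file("myBigJsonFile.json"))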