R json: incomplete final line found

My problem: R json, incomplete final line found
My effort: I followed the question "'Incomplete final line' warning when trying to read a .csv file into R".
I used this site to check my file's validity. The data comes from my Facebook news feed, collected using the Graph API.
My code:
library("rjson")
work<-"C:/ContainingFolder/"
json_data <- fromJSON(paste(readLines(paste0(work,"SunwayFB.txt")), collapse=""))
My error:
Warning message:
In readLines(paste0(work, "SunwayFB.txt")) :
incomplete final line found on 'C:/ContainingFolder/SunwayFB.txt'

It works without any errors if you read the file with fromJSON instead of readLines.
fp <- file.path(work, "SunwayFB.txt")
json_data <- fromJSON(file = fp)
By the way: for the readLines approach, you have to add a newline at the end of the file.

You can ignore the warning message. Instead of
readLines(paste0(work,"SunwayFB.txt"))
pass the warn argument to suppress it:
readLines(paste0(work,"SunwayFB.txt"), warn=FALSE)

In most cases, "incomplete final line" warnings can be avoided by appending a newline to the file you are trying to open. Go to the end of the file, press Enter, save the file, and re-run whatever command you use to load it in R; the warning should no longer appear.
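If there are many files to fix, the manual step above can be scripted. Here is a minimal sketch in Python (not R), assuming the files are small enough to read into memory; ensure_trailing_newline is a hypothetical helper name, not part of any library:

```python
from pathlib import Path


def ensure_trailing_newline(path):
    """Append a newline to the file if its last byte is not one.

    Returns True if the file was modified, False otherwise.
    """
    p = Path(path)
    data = p.read_bytes()
    if not data or data.endswith(b"\n"):
        return False  # empty file, or the final line is already complete
    with p.open("ab") as f:  # append mode leaves existing content untouched
        f.write(b"\n")
    return True
```

Calling it a second time on the same file does nothing, so it should be safe to run over a whole folder.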

Related

error finding and uploading a file in octave

I tried converting my .csv file to .dat format and tried to load the file into Octave. It throws an error:
unable to find file filename
I also tried to load the file in .csv format using the syntax
x = csvread(filename)
and it throws the error:
'filename' undefined near line 1 column 13.
I also tried opening the file in the editor and loading it from there, and now it shows me
warning: load: 'filepath' found by searching load path
error: load: unable to determine file format of 'Salary_Data.dat'.
How can I load my data?
>> load Salary_Data.dat
error: load: unable to find file Salary_Data.dat
>> Salary_Data
error: 'Salary_Data' undefined near line 1 column 1
>> Salary_Data
error: 'Salary_Data' undefined near line 1 column 1
>> Salary_Data
error: 'Salary_Data' undefined near line 1 column 1
>> x = csvread(Salary_Data)
error: 'Salary_Data' undefined near line 1 column 13
>> x = csvread(Salary_Data.csv)
error: 'Salary_Data' undefined near line 1 column 13
>> load Salary_Data.dat
warning: load: 'C:/Users/vaith/Desktop\Salary_Data.dat' found by searching load path
error: load: unable to determine file format of 'Salary_Data.dat'
>> load Salary_Data.csv
warning: load: 'C:/Users/vaith/Desktop\Salary_Data.csv' found by searching load path
error: load: unable to determine file format of 'Salary_Data.csv'
Salary_Data.csv
YearsExperience,Salary
1.1,39343.00
1.3,46205.00
1.5,37731.00
2.0,43525.00
2.2,39891.00
2.9,56642.00
3.0,60150.00
3.2,54445.00
3.2,64445.00
3.7,57189.00
3.9,63218.00
4.0,55794.00
4.0,56957.00
4.1,57081.00
4.5,61111.00
4.9,67938.00
5.1,66029.00
5.3,83088.00
5.9,81363.00
6.0,93940.00
6.8,91738.00
7.1,98273.00
7.9,101302.00
8.2,113812.00
8.7,109431.00
9.0,105582.00
9.5,116969.00
9.6,112635.00
10.3,122391.00
10.5,121872.00
Ok, you've stumbled through a whole pile of issues here.
It would help if you didn't give us error messages without the commands that produced them.
The first message means you were telling Octave to open something called filename and it couldn't find anything called filename. Did you define the variable filename? Your second command and the error message suggests you didn't.
Do you know what Octave's working directory is? Is it the same as where the file is located? From the response to your load commands, I'd guess not. The file is located at C:/Users/vaith/Desktop. Octave's working directory is probably somewhere else.
(Try the pwd command and see what it tells you. Use the file browser or the cd command to navigate to the same location as the file. help pwd and help cd commands would also provide useful information.)
The load command, used in command form (load file.txt), accepts a filename with or without quotes. In function form (load('file.txt') or csvread('file.txt')), the input must be a string, hence the quotes around file.txt. So all of your csvread calls were read as variable names, not filenames.
Last, the fact that load couldn't read your data isn't overly surprising. Octave is trying to guess what kind of file it is and how to load it. I assume you tried help load to see what the different command options are? You can give it different options to help Octave figure it out. If it actually is a csv file though, and is all numbers not text, then csvread might still be your best option if you use it correctly. help csvread would be good information for you.
It looks from your data like you have a header line that is probably confusing the load command. For data that simply formatted, the csvread command can bring in the data. It will replace your header text with zeros.
So, first, navigate to the location of the file:
>> cd C:/Users/vaith/Desktop
then open the file:
>> mydata = csvread('Salary_Data.csv')
mydata =
       0.00000       0.00000
       1.10000   39343.00000
       1.30000   46205.00000
       1.50000   37731.00000
       2.00000   43525.00000
   ...
If you plan to reuse the filename, you can assign it to a variable, then open the file:
>> myfile = 'Salary_Data.csv'
myfile = Salary_Data.csv
>> mydata = csvread(myfile)
mydata =
       0.00000       0.00000
       1.10000   39343.00000
       1.30000   46205.00000
       1.50000   37731.00000
       2.00000   43525.00000
   ...
Notice how the filename is stored and used as a string with quotation marks, but the variable name is not. Also, csvread converted the non-numeric header data to zeros. The help for csvread and dlmread shows how to change that to something other than zero, or to skip a certain number of rows. If you want to preserve the text, you'll have to use some other input function.

Display html report in jupyter with R

The qa() function of the ShortRead bioconductor library generates quality statistics from fastq files. The report() function then prepares a report of the various measures in an html format. A few other questions on this site have recommended using the display_html() function of IRdisplay to show html in jupyter notebooks using R (irkernel). However it only throws errors for me when trying to display an html report generated by the report() function of ShortRead.
library("ShortRead")
sample_dir <- system.file(package="ShortRead", "extdata", "E-MTAB-1147") # A sample fastq file
qa_object <- qa(sample_dir, "*fastq.gz$")
qa_report <- report(qa_object, dest="test") # Makes a "test" directory containing 'image/', 'index.html' and 'QA.css'
library("IRdisplay")
display_html(file = "test/index.html")
Gives me:
Error in read(file, size): unused argument (size)
Traceback:
1. display_html(file = "test/index.html")
2. display_raw("text/html", FALSE, data, file, isolate_full_html(list(`text/html` = data)))
3. prepare_content(isbinary, data, file)
4. read_all(file, isbinary)
Is there another way to display this report in jupyter with R?
It looks like there's a bug in the IRdisplay code. A quick fix is to clone the GitHub repo, open ./IRdisplay/R/utils.r, and on line 38 change the line from:
read(file,size)
to
read(size)
Save the file, switch to the parent directory, and create a new tarball, e.g.
tar -zcf IRdisplay.tgz IRdisplay/
and then re-install your new version, e.g. after restarting R, type:
install.packages("IRdisplay.tgz", repos = NULL)

Scraping HTM into R data tables

I am trying to read a local .htm file using the code below:
rawHTML <- paste(readLines("HARVANSH CHAWLA.htm"),collapse = "\n")
but it shows the following warning messages:
Warning messages:
1: In readLines("HARVANSH CHAWLA.htm") :
line 670 appears to contain an embedded nul
2: In readLines("HARVANSH CHAWLA.htm") :
incomplete final line found on 'HARVANSH CHAWLA.htm'
I have also attached the file for reference.
https://1drv.ms/f/s!AgSrwi8NCr0BgxjmFpQ0vOdF6Wpg
Many thanks

how to read 7z json file in R

I cannot find out how to load a 7z file in R. I can't use this:
s <- system("7z e -o <path> <archive>")
because of error 127. Maybe that's because I'm on Windows? However, 7z opens when I click in TotalCommander.
I'm trying something like this:
con <- gzfile(path, 'r')
ff <- readLines(con, encoding = "UTF-8")
h <- fromJSON(ff)
I have Error:
Error: parse error: trailing garbage
7z¼¯' ãSp‹ Ë:ô–¦ÐÐY#4U¶å¿ç’
(right here) ------^
The content is clearly not being decoded as text; when I load the uncompressed file it reads fine without specifying the encoding. Moreover, it's 2x longer. I have thousands of 7z files and need to process them one by one in a loop: read, analyze, and move on. Could anyone give me some hints on how to do this efficiently?
When uncompressed it easily works using:
library(jsonlite)
f <- read_json(path, simplifyVector = T)
EDIT
There are many json files in one 7z file. The above error is probably caused by parser which reads raw data of whole file. I don't know how to link these files or specify the connection attributes.

Python3 csv writer failing, exiting on error "TypeError: 'newline' is an invalid keyword argument for this function"

What I'm trying to do:
I'm trying to change the formatting on a csv file from space delimited to comma delimited.
What I've done:
I can ingest the csv file just fine, and print the output row-by-row to the console. That code looks like this:
with open(txtpath, mode='r', newline='') as f:
    fReader = csv.reader(f)
    for rows in fReader:
        print(rows)
This does exactly what it's supposed to, and spot checking the output confirms that the rows are being read correctly.
The Problem:
According to the official Python3 Documentation on csv.writer, "If csvfile is a file object, it should be opened with newline=''." My code looks like this:
with open(csvpath, 'w') as g:
    gWriter = csv.writer(g, newline='')
    gWriter.writerows(rows)
so all together, it looks like this:
with open(txtpath, mode='r', newline='') as f:
    fReader = csv.reader(f)
    for rows in fReader:
        print(rows)
    with open(csvpath, 'w') as g:
        gWriter = csv.writer(g, newline='')
        gWriter.writerows(rows)
However, when I run the code with both Pycharm (Anacondas 3.4 selected as project interpreter) and from the console with python3 mycode.py, both results tell me that newline "is an invalid keyword argument for this function" and references line 42, which is where my writer object is instantiated. I ran it through the debugger and it craps out as soon as I try to create the writer object. If I don't add the newline argument it asks for a dialect specification, so that doesn't work, either.
I'm sure there's something blindingly obvious that I'm missing, but I can't see it.
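To avoid this error in Python 2, open the file in 'wb' mode instead of 'w' mode. This eliminates the need for newline. (In Python 3, csv.writer requires a text-mode file, so keep 'w' and pass newline='' to open() instead.)
Corrected code for Python 2 is as follows:
with open(csvpath, 'wb') as g:
    gWriter = csv.writer(g)
    gWriter.writerows(rows)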
The wording is just a bit vague. The file should be opened with newline='', but newline is not a valid option for csv.writer() itself.
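To make that concrete, here is a minimal Python 3 sketch of the space-to-comma conversion with newline='' passed to open() rather than to csv.writer(). It assumes fields are separated by exactly one space; space_to_comma is a hypothetical helper name, not something from the question:

```python
import csv


def space_to_comma(txtpath, csvpath):
    """Rewrite a space-delimited file as a comma-delimited CSV (Python 3)."""
    # newline='' is an argument of open(), not of csv.reader() or csv.writer().
    with open(txtpath, mode="r", newline="") as src, \
         open(csvpath, mode="w", newline="") as dst:
        # Assumption: fields are separated by a single space.
        reader = csv.reader(src, delimiter=" ")
        writer = csv.writer(dst)  # comma is the default delimiter
        writer.writerows(reader)  # stream every parsed row to the output
```

Passing the reader straight to writerows() writes every row. The loop in the question reuses the name rows for a single row, so it is worth checking which rows actually reach the writer.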
For me, newline does not work in with open('output.csv', 'a', newline='') as fp: (this is how Python 2's built-in open() behaves). It returns an error:
'newline' is an invalid keyword argument for this function
I used 'ab' mode and it worked, without blank lines between the rows:
with open('output.csv', 'ab') as fp:
I met the same problem and the solution may sound funny but it does work.
I actually changed the position of the newline argument to be in front of mode as below:
with open("database.csv", newline="", mode="a") as database2:
You can test and see the result.
The following will do
with open('filename_inCSV.csv',mode='r') as csvfile:
or
with open('filename_inCSV.csv','r') as csvfile: