readHTMLTable in R throws warning within for loop

Hi, I have 5 HTML sources, and I want to run readHTMLTable on each one and store the results. I can do this individually using:
readHTMLTable(iso.content[1],which=6)
readHTMLTable(iso.content[2],which=6)
...
However, when I put this into a for loop, I get:
library(XML)
> iso.table<-NULL
> for (i in 1:nrow(gene.iso)) {
+ iso.table[i]<-readHTMLTable(iso.content[i],which=6)
+ }
Warning messages:
1: In iso.table[i] <- readHTMLTable(iso.content[i], which = 6) :
number of items to replace is not a multiple of replacement length
2: In iso.table[i] <- readHTMLTable(iso.content[i], which = 6) :
number of items to replace is not a multiple of replacement length
3: In iso.table[i] <- readHTMLTable(iso.content[i], which = 6) :
number of items to replace is not a multiple of replacement length
4: In iso.table[i] <- readHTMLTable(iso.content[i], which = 6) :
number of items to replace is not a multiple of replacement length
5: In iso.table[i] <- readHTMLTable(iso.content[i], which = 6) :
number of items to replace is not a multiple of replacement length
So I can do this individually, but not in a for loop. I am not trying to replace the current data with the next iteration, so I am unsure why the warning says this.
Any ideas?

The warning has nothing to do with readHTMLTable really; it's all about iso.table. It starts out as NULL, and single brackets assign elements, so iso.table[i] <- ... tries to squeeze every column of the returned data.frame into one slot. R keeps only the first column and warns. If you want to store a bunch of data.frames, you're going to need a list, and when you're assigning objects to a list, you want to place them with [[ ]], not [ ]. Try:
library(XML)
iso.table <- list()
for (i in 1:nrow(gene.iso)) {
  iso.table[[i]] <- readHTMLTable(iso.content[i], which = 6)
}
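To see the difference without downloading any pages, here is a minimal base-R sketch. The two small data frames are made-up stand-ins for what readHTMLTable() returns; readHTMLTable itself is not called. The names tables, bad and good are purely illustrative.

```r
# Made-up stand-ins for two readHTMLTable() results (hypothetical data).
tables <- list(
  data.frame(len = c(10, 20), gene = c("A", "B")),
  data.frame(len = c(30, 40), gene = c("C", "D"))
)

# Single brackets on a NULL object: each 2-column data frame is squeezed
# into one slot, only its first column is kept, and R warns
# "number of items to replace is not a multiple of replacement length".
bad <- NULL
for (i in seq_along(tables)) {
  bad[i] <- tables[[i]]
}

# Double brackets on a list: each slot holds the whole data frame.
good <- list()
for (i in seq_along(tables)) {
  good[[i]] <- tables[[i]]
}

# The same list built without an explicit loop.
good2 <- lapply(tables, identity)
```

With the real objects, lapply(iso.content, readHTMLTable, which = 6) should build the same list as the loop in one step. This assumes iso.content is a character vector with one HTML source per row of gene.iso.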

Related

R: fetch data from a MySQL DB in a loop

I successfully fetch data from my MySQL DB using R:
library(RMySQL)
mydb = dbConnect(MySQL(), user='user', password='pass', dbname='fib', host='myhost')
rs = dbSendQuery(mydb, 'SELECT distinct(DATE(date)) as date, open,close FROM stocksng WHERE symbol = "FIB7F";')
data <- fetch(rs, n=-1)
dbHasCompleted(rs)
So now I have an object that is a list:
> print (typeof(data))
[1] "list"
Each element is a tuple(?) like date (chars), open (long), close (long).
OK, now my problem: I want to get a vector of the percentage difference between the close (x) and the next day's open (x+1) until the end, BUT I can't properly access the items!
Example: (open/close*100)-100
I tried:
for (item in data){
print (item[2])
}
and all possible combinations like:
for (item in data){
print (item[][2])
}
but I cannot access the right element. Can anyone help?
You have a bigger problem than this in your MySQL query, because you did not specify an ORDER BY clause. Consider using the following query:
SELECT DISTINCT
DATE(date) AS date,
open,
close
FROM stocksng
WHERE
symbol = "FIB7F"
ORDER BY
date
Here we order the result set by date, so that it makes sense to speak of the current and next open or close. Now, with a proper query in place, if you want the percentage difference between the current close and the next day's open, you could try:
library(dplyr)
with(data, (lead(open, 1) / close * 100) - 100)
Or using base R:
with(data, (open[2:(length(open) + 1)] / close * 100) - 100)
Naive version:
for (row in seq_len(nrow(data) - 1)) {
  next_open <- data[row + 1, "open"]
  close_price <- data[row, "close"]
  pct <- (next_open / close_price * 100) - 100
  print(pct)
}
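The for loop in the question does not give rows, and a small base-R sketch shows why. fetch() returns a data.frame, whose typeof() is "list", and a for loop over a data.frame walks through its columns. The data frame below is a made-up stand-in for the query result. Single values are reached with data[row, "column"], and the whole next-day calculation can be done on the columns at once.

```r
# Made-up stand-in for the data frame returned by fetch(rs, n = -1).
data <- data.frame(
  date  = c("2017-01-02", "2017-01-03", "2017-01-04"),
  open  = c(100, 102, 101),
  close = c(101, 100, 103),
  stringsAsFactors = FALSE
)

# A data frame is a list of columns, so this loop prints the date,
# open and close columns in turn, not one row at a time.
for (item in data) {
  print(item)
}

# A single value: row 2 of the "open" column.
second_open <- data[2, "open"]

# Percentage difference between each close and the next row's open.
# The last row has no next open, so its result is NA.
next_open <- c(data$open[-1], NA)
pct <- (next_open / data$close * 100) - 100
```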

Error while trying to parse JSON into R

I have recently started using R and have a task that involves parsing JSON in R into a non-JSON format. For this, I am using the fromJSON() function. I have tried to parse the JSON from a text file. It runs successfully when I do it with just a single row entry, but when I try it with multiple row entries, I get the following errors:
fromJSON("D:/Eclairs/Printing/test3.txt")
Error in feed_push_parser(readBin(con, raw(), n), reset = TRUE) :
lexical error: invalid char in json text.
[{'CategoryType':'dining','City':
(right here) ------^
> fromJSON("D:/Eclairs/Printing/test3.txt")
Error in feed_push_parser(readBin(con, raw(), n), reset = TRUE) :
parse error: trailing garbage
"mumbai","Location":"all"}] [{"JourneyType":"Return","Origi
(right here) ------^
> fromJSON("D:/Eclairs/Printing/test3.txt")
Error in feed_push_parser(readBin(con, raw(), n), reset = TRUE) :
parse error: after array element, I expect ',' or ']'
:"mumbai","Location":"all"} {"JourneyType":"Return","Origin
(right here) ------^
The errors above come from three different formats in which I tried to write the JSON text. The result was the same each time; only the location pointed to by the error changed.
Please help me identify the cause of this error, or suggest a more efficient way of performing the task.
The original file is an Excel sheet with multiple columns, one of which contains JSON text. What I tried so far is extracting just the JSON column, converting it to tab-separated text, and then parsing it with:
fromJSON("D:/Eclairs/Printing/test3.txt")
Please also suggest whether this can be done more efficiently. I also need to map all the other columns in the Excel sheet to the non-JSON output.
Example:
[{"CategoryType":"dining","City":"mumbai","Location":"all"}]
[{"CategoryType":"reserve-a-table","City":"pune","Location":"Kothrud,West Pune"}]
[{"Destination":"Mumbai","CheckInDate":"14-Oct-2016","CheckOutDate":"15-Oct-2016","Rooms":"1","NoOfPax":"3","NoOfAdult":"3","NoOfChildren":"0"}]
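Each error has its own cause:
- JSON strings must use double quotes, not single quotes (the "invalid char" error).
- Objects inside one array must be separated by commas (the "I expect ',' or ']'" error).
- Several arrays one after another are not a single valid JSON document (the "trailing garbage" error).
Since each line of your file is a complete JSON array on its own, consider reading the text line by line with readLines(), parsing each line and saving the resulting data frames to a growing list: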
library(jsonlite)
con <- file("C:/Path/To/Jsons.txt", open="r")
jsonlist <- list()
while (length(line <- readLines(con, n=1, warn = FALSE)) > 0) {
jsonlist <- append(jsonlist, list(fromJSON(line)))
}
close(con)
jsonlist
# [[1]]
# CategoryType City Location
# 1 dining mumbai all
# [[2]]
# CategoryType City Location
# 1 reserve-a-table pune Kothrud,West Pune
# [[3]]
# Destination CheckInDate CheckOutDate Rooms NoOfPax NoOfAdult NoOfChildren
# 1 Mumbai 14-Oct-2016 15-Oct-2016 1 3 3 0
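For the other half of the question (keeping the rest of the Excel columns), one option is to skip the intermediate text file and parse the JSON column cell by cell. This sketch assumes jsonlite is installed and the sheet has already been read into a data frame. The names sheet, id and json are hypothetical, and the two rows reuse examples from the question. jsonlite's validate() is also handy for spotting the problems behind the errors above.

```r
library(jsonlite)

# validate() returns FALSE for text that is not one valid JSON document.
single_quotes <- validate("[{'City':'mumbai'}]")                    # strings need double quotes
two_arrays    <- validate('[{"City":"mumbai"}] [{"City":"pune"}]')  # two top-level values
one_array     <- validate('[{"City":"mumbai"}]')                    # valid

# Hypothetical stand-in for the spreadsheet: an ordinary column plus a JSON column.
sheet <- data.frame(
  id   = c(1, 2),
  json = c('[{"CategoryType":"dining","City":"mumbai","Location":"all"}]',
           '[{"CategoryType":"reserve-a-table","City":"pune","Location":"Kothrud,West Pune"}]'),
  stringsAsFactors = FALSE
)

# Parse each cell on its own; each call returns a one-row data frame.
parsed <- lapply(sheet$json, fromJSON)

# Put the spreadsheet's id next to each parsed row, then stack the rows.
combined <- do.call(rbind, Map(function(id, df) cbind(id = id, df), sheet$id, parsed))
```

do.call(rbind, ...) only works when every row has the same keys. The hotel search in the third example has different keys. For mixed rows like that, dplyr::bind_rows() fills the missing columns with NA, or the parsed data frames can simply be kept as a list alongside sheet.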

Iteratively read a fixed number of lines into R

I am working with a JSON file that contains multiple JSON objects. R is unable to read the file as a whole, but since each object occurs at regular intervals, I would like to iteratively read a fixed number of lines into R.
There are a number of SO questions on reading single lines into R, but I have been unable to extend these solutions to a fixed number of lines. For my problem I need to read 16 lines into R at a time (e.g. 1-16, 17-32, etc.).
I have tried using a loop but can't seem to get the syntax right:
## File
file <- "results.json"
## Create connection
con <- file(description=file, open="r")
## Loop over a file connection
for(i in 1:1000) {
tmp <- scan(file=con, nlines=16, quiet=TRUE)
data[i] <- fromJSON(tmp)
}
The file contains over 1000 objects of this form:
{
"object": [
[
"a",
0
],
[
"b",
2
],
[
"c",
2
]
]
}
With @tomtom's inspiration, I was able to find a solution.
## File
library(jsonlite)  # assumption: the post does not say which package provides fromJSON()
file <- "results.json"
## Loop over a file
for (i in 1:1000) {
  tmp <- paste(scan(file = file, what = "character", sep = "\n", nlines = 16,
                    skip = (i - 1) * 16, quiet = TRUE), collapse = " ")
  assign(x = paste("data", i, sep = "_"), value = fromJSON(tmp))
}
I couldn't use a connection: each time I tried, the connection would close before the file had been completely read, so I got rid of that step.
I had to include the what="character" argument, because scan() expects numbers by default.
I included sep="\n", paste() and collapse=" " to create a single string rather than the character vector that scan() returns.
Finally, I replaced the plain assignment with assign() to have a bit more control over the names of the output objects.
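The scan() details described above can be checked on a throwaway file. This is a base-R sketch on a hypothetical five-line file written with tempfile(). The final string is what would be handed to fromJSON().

```r
# A hypothetical five-line JSON fragment written to a temporary file.
path <- tempfile(fileext = ".json")
writeLines(c("{", '"object": [', '["a", 0]', "]", "}"), path)

# With the default what = double(), scan() fails on non-numeric text.
default_scan <- tryCatch(scan(path, quiet = TRUE),
                         error = function(e) "error")

# what = "character" with sep = "\n" returns one element per line; with
# sep = "\n" scan() also does no quote processing, so the JSON quotes survive.
lines <- scan(path, what = "character", sep = "\n", quiet = TRUE)

# paste(..., collapse = " ") joins the lines into a single string.
txt <- paste(lines, collapse = " ")
```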
This might help:
EDITED to make it use a list and Reduce into one file
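## Loop over a file connection, 16 lines at a time
library(jsonlite)  # assumption: the package providing fromJSON() is not stated
con <- file("results.json", open = "r")
data <- list()
for (i in 1:1000) {
  tmp <- readLines(con, n = 16)  # an open connection keeps its position, so no skip is needed
  if (length(tmp) == 0) break    # stop once the end of the file is reached
  data[[i]] <- fromJSON(paste(tmp, collapse = " "))
}
close(con)
# Combine into one object; assumes each parsed object holds an "object" element of the same width
df <- Reduce(rbind, lapply(data, function(d) d$object))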
You would have to make sure that you don't reach further than the end of the file though ;-)

How to check whether each value in a CSV file is greater or less than zero using Python?

I want to check each value in one column and, based on the value, write a label (trend) in the next column: positive if the value is greater than zero, same if it equals zero, and negative if it is less than zero.
My input file looks like this:
Weightage (column name)
0.000555
0.002333
0
-0.22222
And I want my output file to look like this:
Weightage Labels (column names)
0.000555 positive
0.002333 positive
0 same
-0.22222 negative
Can anyone help me?
My code is:
print(results)
for r in results:
    if r > 0:
        print("test")
        label = "positive"
        print(label)
    elif r == 0.0:
        label = "equal"
        print(label)
    else:
        print("nothing")
I have a problem with r in the for loop.
The following error occurs:
Traceback (most recent call last):
File "C:\Python34\col.py", line 23, in <module>
if r >0:
TypeError: unorderable types: tuple() > int()
At first glance, it looks like you are confusing rows and columns. Each r in results is a whole row (a tuple), and Python 3 cannot compare a tuple with an int. I suggest using more explicit names; they help avoid confusion. Also, do not compare strings to numeric types like integers. That gives surprising results in Python 2, and in Python 3 it is an error.
for row in results:
    column = row[0]  # The first column of this row.
    value = float(column)  # The csv module returns strings, so we should
                           # turn them into floats for numeric comparison.
    if value > 0:
        print("positive")
    elif value < 0:
        print("negative")
    else:
        print("same")

Trying to append col.name onto a vector

I have several functions that I am trying to implement in R (RStudio). I will show the simplest one. I am trying to append names onto a vector for later use as col.names.
# Initialize
headerA <- vector(mode="character",length=20)
headerA[1]="source";headerA[2]="matches"
# Function - add on new name
h <- function(df, compareA, compareB) {
new_header <- paste(compareA,"Vs",compareB,sep="_")
data.frame(df,new_header)
}
# Comparison 1:
compareA <-"AA"
compareB <-"BB"
headers <- (headerA, compareA, compareB)
But I am getting this error, and it is very puzzling. I have googled it, but the search is too vague/broad.
When I run it, I get:
headers <- (headerA, compareA, compareB)
Error: unexpected ',' in "headers <- (headerA,"
The second error for the other function is similar...
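It looks like you're missing the call to your function h. Without a function name in front, (headerA, compareA, compareB) is not valid R, which is why the parser stops at the first ','. Add h before the opening parenthesis: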
headers <- h(headerA, compareA, compareB)
Results in:
df new_header
1 source AA_Vs_BB
2 matches AA_Vs_BB
3 AA_Vs_BB
4 AA_Vs_BB
...
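The goal may be to add names to the preallocated headerA vector itself, rather than to build a data frame. In that case, a minimal base-R sketch could write each new name into the first empty slot. add_header is a hypothetical helper, not part of the original code.

```r
# Preallocate as in the question.
headerA <- vector(mode = "character", length = 20)
headerA[1] <- "source"
headerA[2] <- "matches"

# Hypothetical helper: put "A_Vs_B" into the first empty slot,
# or grow the vector if no empty slot is left.
add_header <- function(header, compareA, compareB) {
  new_header <- paste(compareA, "Vs", compareB, sep = "_")
  slot <- which(header == "")[1]
  if (is.na(slot)) {
    header <- c(header, new_header)
  } else {
    header[slot] <- new_header
  }
  header
}

headerA <- add_header(headerA, "AA", "BB")
headerA <- add_header(headerA, "AA", "CC")
```

Without preallocation, headerA <- c(headerA, new_header) appends directly. Once the vector has exactly one name per column, it can be assigned with colnames(df) <- headerA.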