Find Jaccard distance of tweets and cluster in Kmeans - json

This is a follow up question to a problem I've been working on for a while. I have two questions. One regards an algorithm that works on two tweets, that I revised to measure 10 tweets. I'm wondering what my revision is measuring. I get result, but I want it to measure several tweet's jaccard distances, not just return one value. Since it's returning one value, I think it's just adding everything up. The other question is about my attempt to create a For Loop and assign clusters.
I'm trying to find the Jaccard distance between a dataset of tweets, then cluster those tweets with the Kmeans algorithm.
This is where I'm retrieving the data from:
http://www3.nd.edu/~dwang5/courses/spring15/assignments/A2/Tweets.json
What I have so far is this
install.packages("rjson")
library("rjson")
#download JSON File and put into a dataframe
download.file("http://www3.nd.edu/~dwang5/courses/spring15/assignments/A2/Tweets.json", tf<-tempfile());library(jsonlite);json_alldata <- fromJSON(sprintf("[%s]", paste(readLines(file(tf)),collapse=",")))
# get rid of geo column
tweet.features = json_alldata
tweet.features$geo <- NULL
# *Works. Compares two tweets and measures Jaccard Distance
tweetText <- list(tweet1 = tweet.features$text[1]:tweet.features$text[2])
jaccard_i <- function(tw1, tw2){
tw1 <- unlist(strsplit(tw1, " |\\."))
tw2 <- unlist(strsplit(tw2, " |\\."))
i <- length(intersect(tw1, tw2))
u <- length(union(tw1, tw2))
list(i=i, u=u, j=i/u)
}
jaccard_i(tweetText[[1]], tweetText[[2]])
All of that measures the jaccard distance of two specified tweets. Which is great.
But now I'm trying to modify to compare the distances between several tweets. This time 10 random tweets which I retrieved from the Sample command in R.
# Generates two sets of 5 random tweets
tweetText <- list(sample(tweet.features$text, replace = FALSE, size = 5), sample(tweet.features$text, replace = FALSE, size = 5))
jaccard_i <- function(tw1, tw2){
tw1 <- unlist(strsplit(tw1, " |\\."))
tw2 <- unlist(strsplit(tw2, " |\\."))
i <- length(intersect(tw1, tw2))
u <- length(union(tw1, tw2))
list(i=i, u=u, j=i/u)
}
jaccard_i(tweetText[[1]], tweetText[[2]])
This gives me results, but it can't be correct.
I'm trying to build an algorithm that can measure all the tweets, compare their jaccard distance, then cluster based on the Jaccard distance with Kmeans.
So for another attempt, I thought to make a For Loop.
I decided to make 10 cluster centers with the 10 random tweets
c <- sample(tweet.features$text, replace = FALSE, size = 10)
Now I did a For Loop hoping to measure the tweets which I figured I could assign to an array and cluster
#Algorithm attempt
for(i in tweet.features$text){
for (j in c){
i <- length(intersect(i, j))
u <- length(union(i, j))
j = i/u
}
#assign(my.array)
}
I don't believe that's doing anything useful, but it's an attempt to create a loop to measure the Jaccard distance.
I'm sorry that this is a loaded question. Any help would be appreciated as I'm a bit lost.

In your first function, you are doing unlist of your lists of words in your tweets, thus in tw1 and tw2 have the global lists of words and you cannot use them for your tweet-by-tweet Jaccard. You could accomplish it with a removing the unlist, then tw1 and tw2 are lists of lists of terms, and you can compare them using mapply. Something as follows.
jaccard_i <- function(tw1, tw2){
tw1 <- strsplit(tw1, " |\\.")
tw2 <- strsplit(tw2, " |\\.")
i <- mapply(function(tw1, tw2) {
length(intersect(tw1, tw2))
}, tw1=tw1, tw2=tw2)
u <- mapply(function(tw1, tw2) {
length(union(tw1, tw2))
}, tw1=tw1, tw2=tw2)
list(i=i, u=u, j=i/u)
}
Silly example:
> tw1 = c("we yes you no", "we are the people")
> tw2= c("we are the people", "we yes you no")
> tweetText = list(tw1, tw2)
> jaccard_i(tweetText[[1]], tweetText[[2]])
$i
[1] 1 1
$u
[1] 7 7
$j
[1] 0.1428571 0.1428571
As for the second part of your question, the double loop, an easy way to start addressing it would be like this,
tw = c("we yes you no", "we are the people")
lapply(tw, function(tweet1) {
lapply(tw, function(tweet2) {
jaccard_i(tweet1, tweet2)
})
})
With results that look like this,
[[1]]
[[1]][[1]]
[[1]][[1]]$i
[1] 4
[[1]][[1]]$u
[1] 4
[[1]][[1]]$j
[1] 1
[[1]][[2]]
[[1]][[2]]$i
[1] 1
[[1]][[2]]$u
[1] 7
[[1]][[2]]$j
[1] 0.1428571
[[2]]
[[2]][[1]]
[[2]][[1]]$i
[1] 1
[[2]][[1]]$u
[1] 7
[[2]][[1]]$j
[1] 0.1428571
[[2]][[2]]
[[2]][[2]]$i
[1] 4
[[2]][[2]]$u
[1] 4
[[2]][[2]]$j
[1] 1
where you should skip the values in the diagonals of course -just a starting point as I said.
Hope it helps.

Related

Loop Returning Indices Instead of Actual Results?

I have a list of url's in R. These are stored in a list object called "url".
I also have this loop here that I am using for webscraping. In this loop, if a specific url takes longer than 1 second to finish the webscraping, the loop automatically moves to the next url:
library(R.utils)
res <- vector("list", length = 10L)
for (i in 1:10) {
print(i)
tryCatch({
res[[i]] <- withTimeout({
url_i = url[i]
r_i <- data.frame(fromJSON(url_i))
res[[i]] <- r_i
print(i)
}, timeout = 1)
}, TimeoutException = function(ex) {
message("Timeout. Skipping.")
res[[i]] <- NULL
})
}
This loop runs, but the problem is that it seems like the loop is only returning the "indices" instead of the actual webscraping results. For example:
> res
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3
Have I written this loop wrong? And what can I do to fix this?

rbind fromJSON page: duplicate rowname error

I was trying to rbind some json data scraped from api
library(jsonlite)
pop_dat <- data.frame()
for (i in 1:3) {
# Generate url for each page
url <- paste0('http://api.worldbank.org/v2/countries/all/indicators/SP.POP.TOTL?format=json&page=',i)
# Get json data from each page and transform it into dataframe
dat <- as.data.frame(fromJSON(url)[2],flatten = TRUE, row.names = NULL)
pop_dat <- rbind(pop_dat, dat)
}
However, it returns the following error:
Error in row.names<-.data.frame(*tmp*, value = value) :
duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘1’, ‘10’, ‘11’, ‘12’, ‘13’, ‘14’, ‘15’, ‘16’, ‘17’, ‘18’, ‘19’, ‘2’, ‘20’, ‘21’, ‘22’, ‘23’, ‘24’, ‘25’, ‘26’, ‘27’, ‘28’, ‘29’, ‘3’, ‘30’, ‘31’, ‘32’, ‘33’, ‘34’, ‘35’, ‘36’, ‘37’, ‘38’, ‘39’, ‘4’, ‘40’, ‘41’, ‘42’, ‘43’, ‘44’, ‘45’, ‘46’, ‘47’, ‘48’, ‘49’, ‘5’, ‘50’, ‘6’, ‘7’, ‘8’, ‘9’
Changing the row.names to null doesn't work. I heard from someone it is due to the fact that some data are stored as lists here, which I don't quite understand.
I understand that there is an alternative package WDI to access this data and it works well, but I want to know how to resolve the duplicates row name problem here in general so that I can deal with similar situation where no alternative package is available.
I heard from someone it is due to the fact that some data are stored as lists...
This is correct. The solution is fairly simple, but I find it really easy to get tripped up by this. Right now you're using:
dat <- as.data.frame(fromJSON(url)[2],flatten = TRUE, row.names = NULL)
The problem comes from fromJSON(url)[2]. This should be fromJSON(url)[[2]] instead. According to the documentation, the key difference between [ and [[ is a single bracket can select multiple elements whereas [[ selects only one.
You can see how this works with some fake data.
foo <- list(
a = rnorm(100),
b = rnorm(100),
c = rnorm(100)
)
With [, you can select multiple values inside this list.
foo[c("a", "b")]
length(foo["a"]) # Result is 1 not 100 like you might expect.
With [[ the results are different.
foo[[c("a", "b")]] # Raises a subscript error.
foo[["a"]] #This works.
length(foo[["a"]]) # Result is 100.
So, your answer will depend on which subset operator you're using. For your problem, you'll want to use [[ to select a single data.frame inside of the list. Then, you should be able to use rbind correctly.
final <- data.frame()
for (i in 1:10) {
url <- paste0(
'http://api.worldbank.org/v2/countries/all/indicators/SP.POP.TOTL?format=json&page=',
i
)
res <- jsonlite::fromJSON(url, flatten = TRUE)[[2]]
final <- rbind(final, res)
}
Alternative solution with lapply:
urls <- sprintf(
'http://api.worldbank.org/v2/countries/all/indicators/SP.POP.TOTL?format=json&page=%s',
1:10
)
resl <- lapply(urls, jsonlite::fromJSON, flatten = TRUE)
resl <- lapply(resl, "[[", 2) # Use lapply to select the 2 element from each list element.
resl <- do.call(rbind, resl) # This takes all the elements of the list and uses those elements as the arguments for rbind.

Flatten deep nested json in R

I am trying to use R to convert a nested JSON file into a two dimensional dataframe.
My JSON file has a nested structure. But, the names and properties are the same across levels.
{"name":"A", "value":"1", "c":
[{"name":"a1", "value":"11", "c":
[{"name":"a11", "value":"111"},
{"name":"a12", "value":"112"}]
},
{"name":"a2", "value":"12"}]
}
The desired dataset would look like this. Although the exact column names can be different.
name value c__name c_value c_c_name c_c_value
A 1 a1 11 a11 111
A 1 a1 11 a12 112
A 1 a2 12
The code I have so far flattens the data, but it only seems to work for the first level (see the screenshot of the output).
library(jsonlite)
json_file <- ' {"name":"A", "value":"1", "c":
[{"name":"a1", "value":"11", "c":
[{"name":"a11", "value":"111"},
{"name":"a12", "value":"112"}]
},
{"name":"a2", "value":"12"}]
}'
data <- fromJSON(json_file, flatten = TRUE)
View(data)
I tried multiple packages, including jsonlite and RJSONIO, I spent the last 5 hours 5 hours debugging this and trying various online tutorial, but without success. Thanks for your help!
Firstly, that is some ugly JSON; if you have a way of avoiding it, do so. Consequently, what follows is also pretty ugly—to the degree that I normally wouldn't post it, but I am doing so now in the hope that some of the approaches may be of use. If it offends your eyes, let me know and I'll delete it.
library(jsonlite) # for fromJSON
library(reshape2) # for melt
library(dplyr) # for inner_join, select
jlist <- fromJSON(json_file)
jdf <- as.data.frame(jlist)
jdf$c.value <- as.numeric(jdf$c.value) # fix type
jdf$L1 <- as.integer(factor(jdf$c.name)) # for use as a key with an artifact of melt later *urg, sorry*
ccdf <- melt(jdf$c.c) # get nested list into usable form
names(ccdf)[1:2] <- c('c.c.name', 'c.c.value') # fix names so they won't cause problems with the join
df3 <- inner_join(jdf[, -5], ccdf) # join, take out nested column
df3$c.c.value <- as.numeric(df3$c.c.value) # fix type
df3 <- df3 %>% select(-L1, -c) # get rid of useless columns
which leaves you with
> df3
name value c.name c.value c.c.name c.c.value
1 A 1 a1 11 a11 111
2 A 1 a1 11 a12 112
3 A 1 a2 12 <NA> NA
with reasonably sensible types. The packages used are avoidable, if you like.
Is this scalable? Well, not really, without more of the same mess. If anybody else has a less nasty and more scalable approach for dealing with nasty JSON, please post it; I'd be as grateful as the OP.
I think I figured out a way to do this. It seems to work with larger trees. The idea is to unlist the JSON and use the names attribute of the unlisted elements. In this example, if a node has one parent, the name attribute will start with "c.", if it has a parent and a "grandparent", it will list it as "c.c."...etc. So, the code below uses this structure to find the level of nesting and placing the node in the appropriate columns. The rest of the code adds the attributes of the parent nodes and deletes extra rows generated. I know it is not elegant, but I thought it might be useful for others.
library(stringr)
library(jsonlite)
json_file <- ' {"name":"A", "value":"1", "c":
[{"name":"a1", "value":"11", "c":
[{"name":"a11", "value":"111"},
{"name":"a12", "value":"112"}]
},
{"name":"a2", "value":"12"}]
}'
nestedjson <- fromJSON(json_file, simplifyVector = F) #read the json
nAttrPerNode <- 2 #number of attributes per node
strChild <- "c." #determines level of nesting
unnestedjson <- unlist(nestedjson) #convert JSON to unlist
unnestednames <- attr(unnestedjson, "names") #get the names of the cells
depthTree <- (max(str_count(unnestednames, strChild)) + 1) * nAttrPerNode #maximum tree depth
htTree <- length(unnestednames) / nAttrPerNode #maximum tree height (number of branches)
X <- array("", c(htTree, depthTree))
for (nodeht in 1:htTree){ #iterate through the branches and place the nodes based on the count of strChild in the name attribute
nodeIndex <- nodeht * nAttrPerNode
nodedepth <- str_count(unnestednames[nodeIndex], strChild) + 1
X[nodeht, nodedepth * nAttrPerNode - 1] <- unnestedjson[nodeIndex - 1]
X[nodeht, nodedepth * nAttrPerNode] <- unnestedjson[nodeIndex]
}
for (nodeht in 2:htTree){ #repeat the parent node attributes for the children
nodedepth <- 0
repeat{
nodedepth <- nodedepth + 1
startcol <- nodedepth * nAttrPerNode - 1
endcol <- startcol + nAttrPerNode - 1
if (X[nodeht, startcol] == "" & nodedepth < depthTree/nAttrPerNode){
X[nodeht, startcol:endcol] <- X[nodeht-1, startcol:endcol]
} else {
break()
}
}
}
deleteRows <- NULL #Finally delete the rows that only have the parent attributes for nodes that have children
strBranches <- apply(X, 1, paste, collapse="")
for (nodeht in 1:(htTree-1)){
branch2sub <- substr(strBranches[nodeht+1], 1, nchar(strBranches[nodeht]))
if (strBranches[nodeht]==branch2sub){
deleteRows <- c(deleteRows, nodeht)
}
}
deleteRows
X <- X[-deleteRows,]

non-conformable arguments error from lmer when trying to extract information from the model matrix

I have some longitudinal data from which I'd like to get the predicted means at specified times. The model includes 2 terms, their interaction and a spline term for the time variable. When I try to obtain the predicted means, I get "Error in mm %*% fixef(m4) : non-conformable arguments"
I've used the sleep data set from lmer to illustrate my problem. First, I import the data and create a variable "age" for my interaction
sleep <- as.data.frame(sleepstudy) #get the sleep data
# create fake variable for age with 3 levels
set.seed(1234567)
sleep$age <- as.factor(sample(1:3,length(sleep),rep=TRUE))
Then I run my lmer model
library(lme4)
library(splines)
m4 <- lmer(Reaction ~ Days + ns(Days, df=4) + age + Days:age + (Days | Subject), sleep)
Finally, I create the data and matrix needed to obtain predicted means
#new data frame for predicted means
d <- c(0:9) # make a vector of days = 0 to 9 to obtain predictions for each day
newdat <- as.data.frame(cbind(Days=d, age=rep(c(1:3),length(d))))
newdat$Days <- as.numeric(as.character(newdat$Days))
newdat$age <- as.factor(newdat$age)
# create a matrix
mm<-model.matrix(~Days + ns(Days, df=4) + age + Days:age, newdat)
newdat$pred<-mm%*%fixef(m4)
It's at this point that I get the error:
Error in mm %*% fixef(m4) : non-conformable arguments
I can use predict to get the means
newdat$pred <- predict(m4, newdata=newdat, re.form=NA)
which works fine, but I want to be able to calculate a confidence interval, so I need a conformable matrix.
I read somewhere that the problem may be that lmer creates aliases (I can't find that post). This comment was made with regards to not being able to use effect() for a similar task. I couldn't quite understand how to overcome this problem. Moreover, I recall that post was a little old and hoped the alias problem may no longer be relevant.
If anyone has a suggestion for what I may be doing wrong, I'd appreciate the feedback. Thanks.
There are a couple of things here.
you need to drop columns to make your model matrix commensurate with the fixed effect vector that was actually fitted (i.e., commensurate with the model matrix that was actually used for fitting, after dropping collinear columns)
for additional confusion, you happened to only sample ages 2 and 3 (out of a possible {1,2,3})
I've cleaned up the code a little bit ...
library("lme4")
library("splines")
sleep <- sleepstudy #get the sleep data
set.seed(1234567)
## next line happens to sample only 2 and 3 ...
sleep$age <- as.factor(sample(1:3,length(sleep),rep=TRUE))
length(levels(sleep$age)) ## 2
Fit model:
m4 <- lmer(Reaction ~ Days + ns(Days, df=4) +
age + Days:age + (Days | Subject), sleep)
## message; fixed-effect model matrix is
## rank deficient so dropping 1 column / coefficient
Check fixed effects:
f1 <- fixef(m4)
length(f1) ## 7
f2 <- fixef(m4,add.dropped=TRUE)
length(f2) ## 8
We could use this extended version of the fixed effects (which has an NA value in it), but this would just mess us up by propagating NA values through the computation ...
Check model matrix:
X <- getME(m4,"X")
ncol(X) ## 7
(which.dropped <- attr(getME(m4,"X"),"col.dropped"))
## ns(Days, df = 4)4
## 6
New data frame for predicted means
d <- 0:9
## best to use data.frame() directly, avoid cbind()
## generate age based on *actual* levels in data
newdat <- data.frame(Days=d,
age=factor(rep(levels(sleep$age),length(d))))
Create a matrix:
mm <- model.matrix(formula(m4,fixed.only=TRUE)[-2], newdat)
mm <- mm[,-which.dropped] ## drop redundant columns
## newdat$pred <- mm%*%fixef(m4) ## works now
Added by sianagh: Code to obtain confidence intervals and plot the data:
predFun <- function(x) predict(x,newdata=newdat,re.form=NA)
newdat$pred <- predFun(m4)
bb <- bootMer(m4,
FUN=predFun,
nsim=200)
## nb. this produces an error message on its first run,
## but not on subsequent runs (using the development version of lme4)
bb_ci <- as.data.frame(t(apply(bb$t,2,quantile,c(0.025,0.975))))
names(bb_ci) <- c("lwr","upr")
newdat <- cbind(newdat,bb_ci)
Plot:
plot(Reaction~Days,sleep)
with(newdat,
matlines(Days,cbind(pred,lwr,upr),
col=c("red","green","green"),
lty=2,
lwd=c(3,2,2)))
The error is caused due to the drift component, if you put
allowdrift=FALSE
into your auto.arima prediction it will be fixed.

Substring in Data Frame R

I have data from GPS log like this : (this data in rows of data frame columns)
{"mAccuracy":20.0,"mAltitude":0.0,"mBearing":0.0,"mElapsedRealtimeNanos":21677339000000,"mExtras":{"networkLocationSource":"cached","networkLocationType":"wifi","noGPSLocation":{"mAccuracy":20.0,"mAltitude":0.0,"mBearing":0.0,"mElapsedRealtimeNanos":21677339000000,"mHasAccuracy":true,"mHasAltitude":false,"mHasBearing":false,"mHasSpeed":false,"mIsFromMockProvider":false,"mLatitude":35.1811956,"mLongitude":126.9104909,"mProvider":"network","mSpeed":0.0,"mTime":1402801381486},"travelState":"stationary"},"mHasAccuracy":true,"mHasAltitude":false,"mHasBearing":false,"mHasSpeed":false,"mIsFromMockProvider":false,"mLatitude":35.1811956,"mLongitude":126.9104909,"mProvider":"network","mSpeed":0.0,"mTime":1402801381486,"timestamp":1402801665.512}
The problem is I only need Latitude and longitude value, so I think i can use substring and sappy for applying to all data in dataframe.
But I am not sure this way is handsome because when i use substring ex: substr("abcdef", 2, 4) so I need to count who many chars from beginning until "mLatitude" , so anybody can give suggestion the fast way to processing it?
Thank you to #mnel for answering question, it's work , but i still have problem
From mnel answer I've created function like this :
fgps <- function(x) {
out <- fromJSON(x)
c(out$mExtras$noGPSLocation$mLatitude,
out$mExtras$noGPSLocation$mLongitude)
}
and then this is my data :
gpsdata <- head(dfallgps[,4],2)
[1] "{\"mAccuracy\":23.128,\"mAltitude\":0.0,\"mBearing\":0.0,\"mElapsedRealtimeNanos\":76437488000000,\"mExtras\":{\"networkLocationSource\":\"cached\",\"networkLocationType\":\"wifi\",\"noGPSLocation\":{\"mAccuracy\":23.128,\"mAltitude\":0.0,\"mBearing\":0.0,\"mElapsedRealtimeNanos\":76437488000000,\"mHasAccuracy\":true,\"mHasAltitude\":false,\"mHasBearing\":false,\"mHasSpeed\":false,\"mIsFromMockProvider\":false,\"mLatitude\":35.1779956,\"mLongitude\":126.9089661,\"mProvider\":\"network\",\"mSpeed\":0.0,\"mTime\":1402894224187},\"travelState\":\"stationary\"},\"mHasAccuracy\":true,\"mHasAltitude\":false,\"mHasBearing\":false,\"mHasSpeed\":false,\"mIsFromMockProvider\":false,\"mLatitude\":35.1779956,\"mLongitude\":126.9089661,\"mProvider\":\"network\",\"mSpeed\":0.0,\"mTime\":1402894224187,\"timestamp\":1402894517.425}"
[2] "{\"mAccuracy\":1625.0,\"mAltitude\":0.0,\"mBearing\":0.0,\"mElapsedRealtimeNanos\":77069916000000,\"mExtras\":{\"networkLocationSource\":\"cached\",\"networkLocationType\":\"cell\",\"noGPSLocation\":{\"mAccuracy\":1625.0,\"mAltitude\":0.0,\"mBearing\":0.0,\"mElapsedRealtimeNanos\":77069916000000,\"mHasAccuracy\":true,\"mHasAltitude\":false,\"mHasBearing\":false,\"mHasSpeed\":false,\"mIsFromMockProvider\":false,\"mLatitude\":35.1811881,\"mLongitude\":126.9084072,\"mProvider\":\"network\",\"mSpeed\":0.0,\"mTime\":1402894857416},\"travelState\":\"stationary\"},\"mHasAccuracy\":true,\"mHasAltitude\":false,\"mHasBearing\":false,\"mHasSpeed\":false,\"mIsFromMockProvider\":false,\"mLatitude\":35.1811881,\"mLongitude\":126.9084072,\"mProvider\":\"network\",\"mSpeed\":0.0,\"mTime\":1402894857416,\"timestamp\":1402894857.519}"
When run sapply why the data still shows in the result not just the results values.
sapply(gpsdata, function(gpsdata) fgps(gpsdata))
{"mAccuracy":23.128,"mAltitude":0.0,"mBearing":0.0,"mElapsedRealtimeNanos":76437488000000,"mExtras":{"networkLocationSource":"cached","networkLocationType":"wifi","noGPSLocation":{"mAccuracy":23.128,"mAltitude":0.0,"mBearing":0.0,"mElapsedRealtimeNanos":76437488000000,"mHasAccuracy":true,"mHasAltitude":false,"mHasBearing":false,"mHasSpeed":false,"mIsFromMockProvider":false,"mLatitude":35.1779956,"mLongitude":126.9089661,"mProvider":"network","mSpeed":0.0,"mTime":1402894224187},"travelState":"stationary"},"mHasAccuracy":true,"mHasAltitude":false,"mHasBearing":false,"mHasSpeed":false,"mIsFromMockProvider":false,"mLatitude":35.1779956,"mLongitude":126.9089661,"mProvider":"network","mSpeed":0.0,"mTime":1402894224187,"timestamp":1402894517.425}
[1,] 35.178
[2,] 126.909
{"mAccuracy":1625.0,"mAltitude":0.0,"mBearing":0.0,"mElapsedRealtimeNanos":77069916000000,"mExtras":{"networkLocationSource":"cached","networkLocationType":"cell","noGPSLocation":{"mAccuracy":1625.0,"mAltitude":0.0,"mBearing":0.0,"mElapsedRealtimeNanos":77069916000000,"mHasAccuracy":true,"mHasAltitude":false,"mHasBearing":false,"mHasSpeed":false,"mIsFromMockProvider":false,"mLatitude":35.1811881,"mLongitude":126.9084072,"mProvider":"network","mSpeed":0.0,"mTime":1402894857416},"travelState":"stationary"},"mHasAccuracy":true,"mHasAltitude":false,"mHasBearing":false,"mHasSpeed":false,"mIsFromMockProvider":false,"mLatitude":35.1811881,"mLongitude":126.9084072,"mProvider":"network","mSpeed":0.0,"mTime":1402894857416,"timestamp":1402894857.519}
[1,] 35.18119
[2,] 126.90841
I want the result looks like :
[1] 35.178 126.909
[2] 35.18119 126.90841
Thank you
It would appear that your data is in JSON format. Therefore, use a RJSONIO::fromJSON to read the file.
E.g.:
txt <- "{\"mAccuracy\":20.0,\"mAltitude\":0.0,\"mBearing\":0.0,\"mElapsedRealtimeNanos\":21677339000000,\"mExtras\":{\"networkLocationSource\":\"cached\",\"networkLocationType\":\"wifi\",\"noGPSLocation\":{\"mAccuracy\":20.0,\"mAltitude\":0.0,\"mBearing\":0.0,\"mElapsedRealtimeNanos\":21677339000000,\"mHasAccuracy\":true,\"mHasAltitude\":false,\"mHasBearing\":false,\"mHasSpeed\":false,\"mIsFromMockProvider\":false,\"mLatitude\":35.1811956,\"mLongitude\":126.9104909,\"mProvider\":\"network\",\"mSpeed\":0.0,\"mTime\":1402801381486},\"travelState\":\"stationary\"},\"mHasAccuracy\":true,\"mHasAltitude\":false,\"mHasBearing\":false,\"mHasSpeed\":false,\"mIsFromMockProvider\":false,\"mLatitude\":35.1811956,\"mLongitude\":126.9104909,\"mProvider\":\"network\",\"mSpeed\":0.0,\"mTime\":1402801381486,\"timestamp\":1402801665.512}"
Then process:
library(RJSONIO)
out <- fromJSON(txt)
out$$mLongitude
#[1] 126.9105
out$mLatitude
#[1] 35.1812
# to process multiple values
tt <- rep(txt,2)
myData <- lapply(tt, fromJSON)
latlong <- do.call(rbind,lapply(myData, `[` ,c('mLatitude','mLongitude')))
# or using rbind list
library(data.table)
latlong <- rbindlist(lapply(myData, `[` ,c('mLatitude','mLongitude')))