ggplot2 is not plotting all the points I need in R - JSON

I am trying to replicate the following script: San Francisco Crime Classification.
Here is my code:
library(dplyr)
library(ggmap)
library(ggplot2)
library(readr)
library(rjson)
library(RCurl)
library(RJSONIO)
library(jsonlite)
train=jsonlite::fromJSON("/home/felipe/Templates/Archivo de prueba/databritanica.json")
counts <- summarise(group_by(train, Crime_type), Counts=length(Crime_type))
#counts <- counts[order(-counts$Crime_type),]
# This removes the "Other Offenses" category
top12 <- train[train$Crime_type %in% counts$Crime_type[c(1,3:13)],]
map<-get_map(location=c(lon = -2.747770, lat = 53.389499) ,zoom=12,source="osm")
p <- ggmap(map) +
  geom_point(data=top12, aes(x=Longitude, y=Latitude, color=factor(Crime_type)), alpha=0.05) +
  guides(colour = guide_legend(override.aes = list(alpha=1.0, size=6.0),
                               title="Type of Crime")) +
  scale_colour_brewer(type="qual",palette="Paired") +
  ggtitle("Top Crimes in Britain") +
  theme_light(base_size=20) +
  theme(axis.line=element_blank(),
        axis.text.x=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks=element_blank(),
        axis.title.x=element_blank(),
        axis.title.y=element_blank())
ggsave("united kingdom_top_crimes_map.png", p, width=14, height=10, units="in")
I am reading the data from a JSON file and trying to plot points over the map. Each point is a crime that has been committed, and its location is given by two fields: Longitude and Latitude.
The problem is that the points are not being plotted: the script generates a new map without the points it is supposed to show.
This is the original map:
And this is the result:
Any ideas?
Here is an example of the data contained in the JSON file:
[
{"Month":"2014-05","Longitude":-2.747770,"Latitude":53.389499,"Location":"On or near Cronton Road","LSOA_name":"Halton 001B","Crime_type":"Other theft"},
{"Month":"2014-05","Longitude":-2.799099,"Latitude":53.354676,"Location":"On or near Old Higher Road","LSOA_name":"Halton 008B","Crime_type":"Anti-social behaviour"},
{"Month":"2014-05","Longitude":-2.804451,"Latitude":53.352456,"Location":"On or near Higher Road","LSOA_name":"Halton 008B","Crime_type":"Anti-social behaviour"}
]
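A quick diagnostic before changing aesthetics: ggplot2 typically drops points that fall outside the map's extent (with a "Removed n rows containing missing values" warning), so it is worth confirming the coordinates lie inside the downloaded map. Below is a minimal base-R sketch; in_bbox() is a hypothetical helper, the bounding box values are made up for illustration, and the fourth point is an invented out-of-range example. With a real ggmap object, the box should be available as attr(map, "bb"), with columns ll.lat, ll.lon, ur.lat and ur.lon.

```r
# Hypothetical helper: TRUE for each point inside a lon/lat bounding box.
# `bb` needs ll.lat, ll.lon, ur.lat, ur.lon (the layout of attr(map, "bb") in ggmap).
in_bbox <- function(lon, lat, bb) {
  lon >= bb$ll.lon & lon <= bb$ur.lon & lat >= bb$ll.lat & lat <= bb$ur.lat
}

# Made-up bounding box, roughly +/- 0.1 degrees around the map centre
bb <- data.frame(ll.lat = 53.29, ll.lon = -2.85, ur.lat = 53.49, ur.lon = -2.65)

# The three points from the question plus one invented point outside the box
pts <- data.frame(
  Longitude = c(-2.747770, -2.799099, -2.804451, -3.5),
  Latitude  = c(53.389499, 53.354676, 53.352456, 53.0)
)

inside <- in_bbox(pts$Longitude, pts$Latitude, bb)
print(inside)
cat("Points outside the map extent:", sum(!inside), "\n")
```

If every point is inside the box, the coordinates are not the problem and the aesthetics (alpha, size) are the next thing to check.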

Short answer:
Your alpha = 0.05 is making the points practically invisible on the colorful map background, as @aosmith mentioned.
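To see why 0.05 is so faint, here is a small worked sketch assuming simple alpha compositing, where each point lets (1 - alpha) of whatever is underneath show through. The original San Francisco script plots hundreds of thousands of incidents, so overlapping points add up to visible blobs; with only a handful of points, each one stays about 95% transparent. The function names are illustrative only and are not part of ggplot2.

```r
# Opacity of k identical points stacked on top of each other, each drawn with
# the given alpha (simple "over" compositing: each layer lets (1 - alpha) of
# what is underneath show through).
stacked_opacity <- function(alpha, k) 1 - (1 - alpha)^k

# Smallest number of stacked points needed to reach a target opacity.
points_needed <- function(alpha, target = 0.5) {
  ceiling(log(1 - target) / log(1 - alpha))
}

stacked_opacity(0.05, 1)     # one isolated point at alpha = 0.05
stacked_opacity(0.05, 100)   # 100 overlapping points
points_needed(0.05)          # overlapping points needed before a spot is half opaque
points_needed(0.75)          # with alpha = 0.75 a single point already suffices
```

With alpha = 0.05, about 14 points have to overlap at the same spot before it reaches 50% opacity, which a small dataset rarely provides.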
Longer answer:
I suggest the following changes to your geom_point():
Increase alpha to a more visible value (e.g. 0.75)
Increase the size of the points
Optionally, use a point shape that has both a border and a fill, for better visibility
This requires mapping Crime_type to fill instead of colour in aes(), and switching scale_colour_brewer to scale_fill_brewer (and guides(colour = ...) to guides(fill = ...))
Example:
# Load required packages
library(dplyr)
library(ggplot2)
library(ggmap)
library(jsonlite)
# Example data provided in question, with one manually entered entry with
# Crime_type = "Other Offenses"
'[
{"Month":"2014-05","Longitude":-2.747770,"Latitude":53.389499,"Location":"On or near Cronton Road","LSOA_name":"Halton 001B","Crime_type":"Other theft"},
{"Month":"2014-05","Longitude":-2.799099,"Latitude":53.354676,"Location":"On or near Old Higher Road","LSOA_name":"Halton 008B","Crime_type":"Anti-social behaviour"},
{"Month":"2014-05","Longitude":-2.804451,"Latitude":53.352456,"Location":"On or near Higher Road","LSOA_name":"Halton 008B","Crime_type":"Anti-social behaviour"},
{"Month":"2014-05","Longitude":-2.81,"Latitude":53.36,"Location":"On or near Higher Road","LSOA_name":"Halton 008B","Crime_type":"Other Offenses"}
]' -> example_json
train <- fromJSON(example_json)
# Process the data, the dplyr way
counts <- train %>%
  group_by(Crime_type) %>%
  summarise(Counts = n())
# This removes the "Other Offenses" category
top12 <- train %>%
  filter(Crime_type != "Other Offenses")
# Get the map
map <- get_map(location=c(lon = -2.747770, lat = 53.389499), zoom=12, source="osm")
# Plotting code
p <- ggmap(map) +
  # Changes made to geom_point.
  # I increased the alpha and size, and I used a shape that has
  # a black border and a fill determined by Crime_type.
  geom_point(data=top12, aes(x=Longitude, y=Latitude, fill=factor(Crime_type)),
             shape = 21, alpha = 0.75, size = 3.5, color = "black") +
  guides(fill = guide_legend(override.aes = list(alpha=1.0, size=6.0),
                             title="Type of Crime")) +
  # Changed scale_color_brewer to scale_fill_brewer
  scale_fill_brewer(type="qual", palette="Paired") +
  ggtitle("Top Crimes in Britain") +
  theme_light(base_size=20) +
  theme(axis.line=element_blank(),
        axis.text.x=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks=element_blank(),
        axis.title.x=element_blank(),
        axis.title.y=element_blank())
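One more thing worth checking in the question's code: counts comes from summarise(group_by(...)), which is ordered by Crime_type rather than by count (the reordering line is commented out), so counts$Crime_type[c(1,3:13)] drops whichever category sorts second, not "Other Offenses". Here is a minimal base-R sketch of an order-independent alternative; top_n_categories() is a hypothetical helper and the small data frame is invented for illustration.

```r
# Hypothetical helper: keep the rows belonging to the n most frequent
# categories in column `col`, after excluding any categories listed in `drop`.
top_n_categories <- function(df, col, n = 12, drop = character()) {
  counts <- sort(table(df[[col]]), decreasing = TRUE)  # most frequent first
  keep <- head(setdiff(names(counts), drop), n)
  df[df[[col]] %in% keep, , drop = FALSE]
}

# Invented example data
df <- data.frame(
  Crime_type = c("A", "A", "B", "C", "C", "C", "Other Offenses"),
  stringsAsFactors = FALSE
)
res <- top_n_categories(df, "Crime_type", n = 2, drop = "Other Offenses")
print(res)

# With the question's data this would be, for example:
# top12 <- top_n_categories(train, "Crime_type", n = 12, drop = "Other Offenses")
```

Selecting by name rather than by position keeps the result correct regardless of how the counts table happens to be sorted.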

Related

Extracting data from NetCDF

I have downloaded the sea surface temperature (SST) data for January from https://oceancolor.gsfc.nasa.gov/l3/ and imported it into R.
I know how to crop using extent(xmin, xmax, ymin, ymax), but I can't figure out how to get the values for just one station (53.9°S, 174.1°W), or the grid cell nearest to that coordinate. Is there a way to extract the data for just one station?
val <- extract(174.1,53.9)
Error in .local(x, y, ...) : extents do not overlap
SST_Jan <- brick("~https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/A20021822018212.L3m_MC_SST_sst_9km.nc", stopIfNotEqualSpaced = FALSE, varname = "sst")
print(SST_Jan)
val<-extract(174.1, 53.9)
SST_Jan_station <- extract(SST_Jan, val)
I would like to be able to plot the changes in SST at that particular location over the 12 months.
Thank you,
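extract() needs the raster as its first argument, and it does not treat a bare numeric vector as coordinates (a numeric second argument is interpreted as cell numbers).
You can put the coordinates in a two-column (x, y) matrix instead: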
pnt = matrix(c(174.1, 53.9), ncol = 2)
pnt
## [,1] [,2]
## [1,] 174.1 53.9
Then extract() works:
extract(SST_Jan, pnt)
## layer
## [1,] 8.24
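Two details may matter here. First, 53.9°S, 174.1°W is longitude -174.1, latitude -53.9 in signed decimal degrees, so a matrix built from the positive values points at 174.1°E, 53.9°N instead; for the station as described, matrix(c(-174.1, -53.9), ncol = 2) is probably what is wanted. Second, extract() with point coordinates returns the value of the cell containing the point, which on a regular grid is the cell whose centre is nearest. The base-R sketch below illustrates both; it assumes the 9 km product is a global 1/12-degree grid (4320 x 2160 cells), and to_signed() and nearest_index() are hypothetical helpers.

```r
# Convert a coordinate written with a hemisphere letter to signed decimal degrees
to_signed <- function(value, hemisphere) {
  ifelse(toupper(hemisphere) %in% c("S", "W"), -abs(value), abs(value))
}

# Index of the grid-cell centre closest to a target coordinate
nearest_index <- function(target, centres) which.min(abs(centres - target))

lon <- to_signed(174.1, "W")   # -174.1
lat <- to_signed(53.9, "S")    # -53.9

# Assumed global grid with 1/12-degree cells (cell centres, not edges)
res <- 1 / 12
lon_centres <- seq(-180 + res / 2, 180 - res / 2, by = res)
lat_centres <- seq(90 - res / 2, -90 + res / 2, by = -res)

i <- nearest_index(lon, lon_centres)
j <- nearest_index(lat, lat_centres)
cat("Nearest cell centre:", lon_centres[i], lat_centres[j], "\n")
```

For the 12-month plot, passing the same point matrix to extract() on a stack of the monthly rasters should return one value per layer, which can then be plotted against month.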

How to extract an adjacency matrix of a giant component of a graph using R?

I would like to extract an adjacency matrix of a giant component of a graph using R.
For example, I can create an Erdos-Renyi G(n, p) graph:
library(igraph)
n = 100
p = 1.5/n
g = erdos.renyi.game(n, p)
coords = layout.fruchterman.reingold(g)
plot(g, layout=coords, vertex.size = 3, vertex.label=NA)
# Get the components of an undirected graph
cl = clusters(g)
# How many components?
cl$no
# How big are these (the first row is size, the second is the number of components of that size)?
table(cl$csize)
cl$membership
# Get the giant component
nodes = which(cl$membership == which.max(cl$csize))
# Color in red the nodes in the giant component and in sky blue the rest
V(g)$color = "SkyBlue2"
V(g)[nodes]$color = "red"
plot(g, layout=coords, vertex.size = 3, vertex.label=NA)
Here, I only want to extract the adjacency matrix of the red nodes.
It's easy to extract the giant component as a new graph and then get its adjacency matrix:
library(igraph)
g <- erdos.renyi.game(100, .015, directed = TRUE)
# if you have directed graph, decide if you want
# strongly or weakly connected components
co <- components(g, mode = 'STRONG')
gi <- induced.subgraph(g, which(co$membership == which.max(co$csize)))
# for undirected graphs, get.adjacency()'s type argument ("both", "upper", "lower")
# controls whether the upper triangle, the lower triangle, or both are filled
ad <- get.adjacency(gi)
But you might want to keep the vertex IDs of the original graph. In this case just subset the adjacency matrix:
g <- erdos.renyi.game(100, .015)
co <- components(g)
gi_vids <- which(co$membership == which.max(co$csize))
gi_ad <- get.adjacency(g)[gi_vids, gi_vids]
# you can even add the names of the nodes
# as row and column names.
# generating dummy node names:
V(g)$name <- sapply(
seq(vcount(g)),
function(i){
paste(letters[ceiling(runif(5) * 26)], collapse = '')
}
)
rownames(gi_ad) <- V(g)$name[gi_vids]
colnames(gi_ad) <- V(g)$name[gi_vids]
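To make the subsetting idea concrete without relying on igraph internals, here is a base-R sketch that labels connected components with a breadth-first search over a dense 0/1 adjacency matrix and keeps the largest one, preserving row and column names. It assumes an undirected graph (a directed graph's strongly connected components need a different algorithm); component_labels() and giant_adjacency() are hypothetical helpers, and the 5-node example is invented. An igraph adjacency matrix can be made dense with as.matrix(get.adjacency(g)).

```r
# Label the connected components of an undirected graph given as a 0/1
# adjacency matrix, using breadth-first search from each unvisited vertex.
component_labels <- function(adj) {
  n <- nrow(adj)
  labels <- integer(n)                 # 0 = not yet visited
  current <- 0L
  for (start in seq_len(n)) {
    if (labels[start] != 0L) next
    current <- current + 1L
    labels[start] <- current
    queue <- start
    while (length(queue) > 0) {
      v <- queue[1]
      queue <- queue[-1]
      nbrs <- which(adj[v, ] != 0 & labels == 0L)   # unvisited neighbours
      labels[nbrs] <- current
      queue <- c(queue, nbrs)
    }
  }
  labels
}

# Adjacency matrix of the largest component, keeping the original vertex names
giant_adjacency <- function(adj) {
  labels <- component_labels(adj)
  giant <- which(labels == which.max(tabulate(labels)))
  adj[giant, giant, drop = FALSE]
}

# Invented example: edges a-b, b-c and d-e
adj <- matrix(0, 5, 5, dimnames = list(letters[1:5], letters[1:5]))
edges <- rbind(c(1, 2), c(2, 3), c(4, 5))
adj[edges] <- 1
adj[edges[, 2:1]] <- 1
gi_ad_base <- giant_adjacency(adj)
print(gi_ad_base)
```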

Post local image into a popup using leaflet and R

I've been trying hard to add local images (image files on my computer) to my leaflet map in R. I have plotted around 500 coordinates extracted from some images, and I want each marker's popup to show the corresponding image when clicked.
leaflet(pics) %>%
addTiles() %>%
addCircleMarkers(
fillOpacity = 0.8, radius = 5,
lng = ~GPSLongitude, lat =~GPSLatitude,
color = ~pal(Married),
popup = ~SourceFile, # WISH TO ADD EMBEDDED LOCAL IMAGE IN HERE
label = mapply(function(x, y) {
HTML(sprintf("<em>%s</em></br> %s", htmlEscape(x), htmlEscape(y)))},
pics$Address, pics$DateTimeOriginal, SIMPLIFY = F),
labelOptions = lapply(1:nrow(pics), function(x) {
labelOptions(direction='auto')
}))
I am attaching two screenshots: one while hovering over a marker and one after clicking on it. Ideally, I'd like to show the image and its file name when I click on each marker. Is that possible?
Here is an RPubs page with the example: http://rpubs.com/laresbernardo/photomap
Hope you can help me. Thanks!
_________________________ UPDATE _________________________
Here is all the code used for this example. I scan all images for geotags, reverse-geocode each location to get an address for the label, and then plot all coordinates. When I click on a point, I want to see that picture.
wd <- "/Users/bernardo/Dropbox (Personal)/Documentos/R/R Mapping/GPS Photos"
# ------------------------------------------- get the pics with geotags
library(exifr)
library(dplyr)
library(lubridate)
library(beepr)
library(maps)
time <- Sys.time(); print(time)
setwd("/Users/bernardo/Dropbox (Personal)/Imágenes")
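# pattern is a regular expression (not a glob): file names ending in .jpg or .png, any case
files <- list.files(pattern = "\\.(jpg|png)$", ignore.case = TRUE, recursive = TRUE)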
exif <- read_exif(files, tags = c("SourceFile", "DateTimeOriginal", "GPSLongitude", "GPSLatitude"))
pics <- exif %>% filter(!is.na(GPSLongitude)) %>%
mutate(DateTimeOriginal = ymd_hms(DateTimeOriginal))
pics$Owner <- ifelse(grepl("iPhone Maru", pics$SourceFile), "Maru", "Ber")
pics$Married <- ifelse(as.Date(pics$DateTimeOriginal) >= '2016-04-30', TRUE, FALSE)
pics$Country <- maps::map.where(database="world", pics$GPSLongitude, pics$GPSLatitude)
#lares::freqs(pics %>% filter(!is.na(Country)), Country)
# Save pics with geotags
setwd(wd)
write.csv(pics, "with_geotags.csv", row.names = F)
print(Sys.time() - time)
beepr::beep()
# ------------------------------------------- get the addresses from files
# GET ALL ADDRESSES
library(ggmap)
options(warn=-1)
setwd(wd)
pics <- read.csv("with_geotags.csv")
addresses <- read.csv("with_address.csv")
pics_to_search <- pics %>% filter(!SourceFile %in% addresses$SourceFile)
print(paste0("Without address: ",round(100 * nrow(pics_to_search)/nrow(pics), 2),"% | ", nrow(pics_to_search)))
out <- data.frame()
for (i in 1:nrow(pics_to_search)) {
Address <- revgeocode(cbind(pics_to_search$GPSLongitude, pics_to_search$GPSLatitude)[i,], output="address")[1]
if (!is.na(Address)) {
out <- rbind(out, cbind(SourceFile=as.character(pics_to_search$SourceFile[i]), Address))
print(paste(i, Address, sep=" - "))
}
}
# Save pics with geotags
pics_with_address <- rbind(out, addresses)
write.csv(pics_with_address, "with_address.csv", row.names = F)
# ------------------------------------------- Map all coordinates with leaflet
setwd(wd)
library(leaflet)
library(htmltools)
library(mapview)
pics <- read.csv("with_geotags.csv")
address <- read.csv("with_address.csv")
pal <- colorFactor(c("green4", "navy"), domain = c(FALSE, TRUE))
pics <- left_join(pics, address, by=c("SourceFile"))
pics$Content <- paste("Address:","<em>", pics$Address,"</em>", "<br/> Date:", as.Date(pics$DateTimeOriginal))
leaflet(pics) %>%
addTiles() %>%
addCircleMarkers(
fillOpacity = 0.8, radius = 5,
lng = ~GPSLongitude, lat =~GPSLatitude,
color = ~pal(Married),
popup = popupImage(as.character(pics$SourceFile), src = "local"),
label = mapply(function(x, y) {
HTML(sprintf("<em>%s</em></br> %s", htmlEscape(x), htmlEscape(y)))},
pics$Address, pics$DateTimeOriginal, SIMPLIFY = F),
labelOptions = lapply(1:nrow(pics), function(x) {
labelOptions(direction='auto')
}))
But...
I even installed the latest development version with devtools::install_github("r-spatial/mapview@develop").
Without a reproducible example it is hard to say, but take this for instance:
library(leaflet)
library(mapview)
# make-up dataset
data_df <- data.frame(lat = c(35.68705, 35.88705), long = c(51.38, 53.35))
# Loaded random pictures on my laptop
images <- c("/PathToImage1/download.jpeg",
"/PathToImage2/download1.jpeg")
leaflet(data_df) %>%
addTiles() %>%
addCircleMarkers(
fillOpacity = 0.8, radius = 5,
lng = ~long, lat =~lat,
popup = popupImage(images)
)
Click on each point to see a different image. Make sure your images are listed in the same order as the rows of your data frame.
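Since the question also asks for the file name in the popup, one alternative to popupImage() is to build the popup HTML yourself, because leaflet popups accept arbitrary HTML strings. This is a minimal base-R sketch under a loud assumption: the image paths must be reachable from wherever the HTML page is served (for example, relative paths when the map is saved next to the images), which is not the case in every viewer. html_escape() and image_popups() are hypothetical helpers, and the file names are invented.

```r
# Escape the characters that are special in HTML text and attribute values
html_escape <- function(x) {
  x <- gsub("&", "&amp;", x, fixed = TRUE)
  x <- gsub("<", "&lt;", x, fixed = TRUE)
  x <- gsub(">", "&gt;", x, fixed = TRUE)
  gsub('"', "&quot;", x, fixed = TRUE)
}

# Hypothetical helper: one popup per image, with the file name above the picture.
# Assumes `paths` resolve relative to the saved HTML page.
image_popups <- function(paths, width = 300) {
  src <- vapply(paths, utils::URLencode, character(1), USE.NAMES = FALSE)
  sprintf('<b>%s</b><br/><img src="%s" width="%d"/>',
          html_escape(basename(paths)),
          html_escape(src),
          as.integer(width))
}

popups <- image_popups(c("photos/IMG 001.jpg", "photos/beach & sun.jpg"))
print(popups)

# Then, for example: addCircleMarkers(..., popup = image_popups(images))
```

URL-encoding the path handles spaces in file names, and escaping the displayed name keeps characters such as & from breaking the popup markup.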
Finally, after many hours spent on this problem, I managed to fix the issue. Thanks to @MLavoie and @TimSalabim3 (via Twitter) for the support.
This was it: if you are running macOS, you need to have GDAL installed. I literally just installed it, ran the original script, and it worked. I don't know exactly what GDAL (the Geospatial Data Abstraction Library) does here, but it really did the job!

R: Selecting certain features from a JSON file

I've imported a JSON file into R from http://eric.clst.org/wupl/Stuff/gz_2010_us_040_00_20m.json and I'm trying to select only the counties in Kansas.
Right now I have all the data in one variable, and I want to create a subset containing only the Kansas counties. I'm not sure how to go about this.
What you have there is GeoJSON, which library(sf) can read directly into an sf object. An sf object is also a data.frame, so you can use the usual data.frame subsetting operations:
library(sf)
sf <- sf::read_sf("http://eric.clst.org/wupl/Stuff/gz_2010_us_040_00_20m.json")
sf[sf$NAME == "Kansas", ]
# Simple feature collection with 1 feature and 5 fields
# geometry type: MULTIPOLYGON
# dimension: XY
# bbox: xmin: -102.0517 ymin: 36.99308 xmax: -94.58993 ymax: 40.00316
# epsg (SRID): 4326
# proj4string: +proj=longlat +datum=WGS84 +no_defs
# GEO_ID STATE NAME LSAD CENSUSAREA geometry
# 30 0400000US20 20 Kansas 81758.72 MULTIPOLYGON(((-99.541116 3...
Since you want the individual counties, you need the counties data set instead (20 is the STATE FIPS code for Kansas):
sf_counties <- sf::read_sf("http://eric.clst.org/wupl/Stuff/gz_2010_us_050_00_500k.json")
sf_counties[sf_counties$STATE == 20, ]
To stay with a JSON workflow, you can try jqr:
library(jqr)
url <- 'http://eric.clst.org/wupl/Stuff/gz_2010_us_040_00_20m.json'
download.file(url, (f <- tempfile(fileext = ".json")))
res <- paste0(readLines(f), collapse = " ")
out <- jq(res, '.features[] | select(.properties.NAME == "Kansas")')
You can then map the result easily:
library(leaflet)
leaflet() %>%
addTiles() %>%
addGeoJSON(out) %>%
setView(-98, 38, 6)
Alternatively, with rjson:
library(rjson)
lst=fromJSON(file = 'http://eric.clst.org/wupl/Stuff/gz_2010_us_040_00_20m.json')
index = which(sapply(lapply(lst$features,"[[",'properties'),'[[','NAME')=='Kansas')
subdata = lst$features[[index]]
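The rjson approach above assumes exactly one match: lst$features[[index]] errors when index is empty and does recursive indexing, rather than selecting several features, when index has more than one element (for example, when selecting all counties with STATE == "20"). A minimal base-R sketch using Filter() keeps every matching feature; select_features() is a hypothetical helper and the feature list is invented for illustration.

```r
# Hypothetical helper: keep the GeoJSON features (as parsed into nested lists)
# whose property `key` equals `value`; values are compared as character strings.
select_features <- function(features, key, value) {
  Filter(function(f) identical(as.character(f$properties[[key]]), as.character(value)),
         features)
}

# Invented features mimicking the structure of lst$features
features <- list(
  list(type = "Feature", properties = list(NAME = "Kansas", STATE = "20")),
  list(type = "Feature", properties = list(NAME = "Texas",  STATE = "48")),
  list(type = "Feature", properties = list(NAME = "Allen",  STATE = "20"))
)

state_20 <- select_features(features, "STATE", 20)
vapply(state_20, function(f) f$properties$NAME, character(1))

# With the real file this would be, for example:
# subdata <- select_features(lst$features, "NAME", "Kansas")
```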

How to plot a learning curve in R?

I want to plot a learning curve in my application.
A sample curve image is shown below.
A learning curve plots:
X-axis: number of samples (training set size)
Y-axis: error (RSS / J(theta) / cost function)
It helps diagnose whether a model suffers from high bias or high variance.
Is there any package in R which can help in getting this plot?
You can make such a plot using the excellent caret package. The section on customizing the tuning process in its documentation will be very helpful.
Also, you can check out the well-written blog posts on R-bloggers by Joseph Rickert, titled "Why Big Data? Learning Curves" and "Learning from Learning Curves".
UPDATE
I just posted an answer to the question "Plot learning curves with caret package and R", which I think will be more useful to you. For convenience, I have reproduced it here. I used the popular caret package to train the model and compute the RMSE on the training and test sets.
library(caret)
# set seed for reproducibility
set.seed(7)
# randomize mtcars
mtcars <- mtcars[sample(nrow(mtcars)),]
# split mtcars data into training and test sets
mtcarsIndex <- createDataPartition(mtcars$mpg, p = .625, list = F)
mtcarsTrain <- mtcars[mtcarsIndex,]
mtcarsTest <- mtcars[-mtcarsIndex,]
# create empty data frame
learnCurve <- data.frame(m = integer(21),
trainRMSE = integer(21),
cvRMSE = integer(21))
# test data response feature
testY <- mtcarsTest$mpg
# Run algorithms using 10-fold cross validation with 3 repeats
trainControl <- trainControl(method="repeatedcv", number=10, repeats=3)
metric <- "RMSE"
# loop over training examples
for (i in 3:21) {
learnCurve$m[i] <- i
# train learning algorithm with size i
fit.lm <- train(mpg~., data=mtcarsTrain[1:i,], method="lm", metric=metric,
preProc=c("center", "scale"), trControl=trainControl)
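  # note: this is caret's resampled (cross-validated) RMSE on the first i rows,
  # not the in-sample training error
  learnCurve$trainRMSE[i] <- fit.lm$results$RMSE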
# use trained parameters to predict on test data
prediction <- predict(fit.lm, newdata = mtcarsTest[,-1])
rmse <- postResample(prediction, testY)
learnCurve$cvRMSE[i] <- rmse[1]
}
pdf("LinearRegressionLearningCurve.pdf", width = 7, height = 7, pointsize=12)
# plot learning curves of training set size vs. error measure
# for training set and test set
plot(log(learnCurve$trainRMSE), type = "o", col = "red", xlab = "Training set size",
     ylab = "log(RMSE)", main = "Linear Model Learning Curve")
lines(log(learnCurve$cvRMSE), type = "o", col = "blue")
legend('topright', c("Train error", "Test error"), lty = c(1,1), lwd = c(2.5, 2.5),
col = c("red", "blue"))
dev.off()
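For comparison, here is a base-R sketch of the same idea without caret. It fits lm() on the first m training rows and records the in-sample training RMSE alongside the held-out test RMSE, which is the classic learning-curve pairing. Two assumptions differ from the answer above: the smaller formula mpg ~ wt + hp is a hypothetical choice so the model can be fit on as few as 5 rows (the full mpg ~ . model has 11 coefficients), and the training error is the true in-sample error rather than caret's resampled RMSE. learning_curve() and rmse() are illustrative helpers.

```r
# Root mean squared error
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

# Fit `formula` on the first m rows of `train` for each m in `sizes`, and record
# training (in-sample) RMSE and test (held-out) RMSE.
learning_curve <- function(formula, train, test, sizes) {
  response <- all.vars(formula)[1]
  out <- data.frame(m = sizes, train_rmse = NA_real_, test_rmse = NA_real_)
  for (k in seq_along(sizes)) {
    sub <- train[seq_len(sizes[k]), , drop = FALSE]
    fit <- lm(formula, data = sub)
    out$train_rmse[k] <- rmse(sub[[response]], fitted(fit))
    out$test_rmse[k]  <- rmse(test[[response]], predict(fit, newdata = test))
  }
  out
}

set.seed(7)
idx <- sample(nrow(mtcars), 20)            # 20 training rows, 12 test rows
lc <- learning_curve(mpg ~ wt + hp, mtcars[idx, ], mtcars[-idx, ], sizes = 5:20)

matplot(lc$m, lc[, c("train_rmse", "test_rmse")], type = "o", pch = 1, lty = 1,
        col = c("red", "blue"), xlab = "Training set size", ylab = "RMSE",
        main = "Linear Model Learning Curve")
legend("topright", c("Train error", "Test error"), col = c("red", "blue"), lty = 1)
```

Typically the training error rises and the test error falls as m grows; a large persistent gap suggests high variance, while two curves converging at a high error suggest high bias.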
The output plot is as shown below: