NetLogo - using BehaviorSpace get all turtles locations as the result of each repetition

NetLogo - using BehaviorSpace get all turtles locations as the result of each repetition - csv

I am using BehaviorSpace to run the model hundreds of times with different parameters. But I need to know the locations of all turtles as a result instead of only the number of turtles. How can I achieve it with BehaviorSpace?
Currently, I output the results in a csv file by this code:
to-report get-locations
report (list xcor ycor)
end
to generate-output
file-open "model_r_1.0_locations.csv"
file-print csv:to-row get-locations
file-close
end
but all results are popped into same csv file, so I can't tell the condition of each running.

Seth's suggestion of incorporating behaviorspace-run-number in the filename of your csv output is one alternative. It would allow you to associate that file with the summary data in your main BehaviorSpace output file.
Another option is to include list reporters as "measures" in your behavior space experiment definition. For example, in your case:
map [ t -> [ xcor ] of t ] sort turtles
map [ t -> [ ycor ] of t ] sort turtles
You can then parse the resulting list "manually" in your favourite data analysis language. I've used the following function for this before, in Julia:
parselist(strlist, T = Float64) = parse.(T, split(strlist[2:end-1]))
I'm sure you can easily write some equivalent code in Python or R or whatever language you're using.
In the example above, I've outputted separate lists for the xcor and the ycor of turtles. You could also output a single "list of lists", but the parsing would be trickier.
Edit: How to do this using the csv extension and R
Coincidentally, I had to do something similar today for a different project, and I realized that a combination of the csv extension and R can make this very easy.
The general idea is the following:
In NetLogo, use csv:to-string to encode list data into a string and then write that string directly in the BehaviorSpace output.
In R, use purrr::map and readr::read_csv, followed by tidyr::unnest, to unpack everything in a neat "one observation per row" dataframe.
In other words: we like CSV, so we put CSV in our CSV so we can parse while we parse.
Here is a full-fledged example. Let's say we have the following NetLogo model:
extensions [ csv ]
to setup
clear-all
create-turtles 2 [ move-to one-of patches ]
reset-ticks
end
to go
ask turtles [ forward 1 ]
tick
end
to-report positions
let coords [ (list who xcor ycor) ] of turtles
report csv:to-string fput ["who" "x" "y"] coords
end
We then define the following tiny BehaviorSpace experiment, with only two repetitions and a time limit of two, using our positions reporter as an output:
The R code to process this is pleasantly straightforward:
library(tidyverse)
df <- read_csv("experiment-table.csv", skip = 6) %>%
mutate(positions = map(positions, read_csv)) %>%
unnest()
Which results in the following dataframe, all neat and tidy:
> df
# A tibble: 12 x 5
`[run number]` `[step]` who x y
<int> <int> <int> <dbl> <dbl>
1 1 0 0 16 10
2 1 0 1 10 -2
3 1 1 1 9.03 -2.24
4 1 1 0 -16.0 10.1
5 1 2 1 8.06 -2.48
6 1 2 0 -15.0 10.3
7 2 0 1 -14 1
8 2 0 0 13 15
9 2 1 0 14.0 15.1
10 2 1 1 -13.7 0.0489
11 2 2 0 15.0 15.1
12 2 2 1 -13.4 -0.902
The same thing in Julia:
using CSV, DataFrames
df = CSV.read("experiment-table.csv", header = 7)
cols = filter(col -> col != :positions, names(df))
df = by(df -> CSV.read(IOBuffer(df[:positions][1])), df, cols)

Related

rjson::fromJSON returns only the first item

I have a sqlite database file with several columns. One of the columns has a JSON dictionary (with two keys) embedded in it. I want to extract the JSON column to a data frame in R that shows each key in a separate column.
I tried rjson::fromJSON, but it reads only the first item. Is there a trick that I'm missing?
Here's an example that mimics my problem:
> eg <- as.vector(c("{\"3x\": 20, \"6y\": 23}", "{\"3x\": 60, \"6y\": 50}"))
> fromJSON(eg)
$3x
[1] 20
$6y
[1] 23
The desired output is something like:
# a data frame for both variables
3x 6y
1 20 23
2 60 50
or,
# a data frame for each variable
3x
1 20
2 60
6y
1 23
2 50

What you are looking for is actually a combination of lapply and some application of rbind or related.
I'll extend your data a little, just to have more than 2 elements.
eg <- c("{\"3x\": 20, \"6y\": 23}",
"{\"3x\": 60, \"6y\": 50}",
"{\"3x\": 99, \"6y\": 72}")
library(jsonlite)
Using base R, we can do
do.call(rbind.data.frame, lapply(eg, fromJSON))
# X3x X6y
# 1 20 23
# 2 60 50
# 3 99 72
You might be tempted to do something like Reduce(rbind, lapply(eg, fromJSON)), but the notable difference is that in the Reduce model, rbind is called "N-1" times, where "N" is the number of elements in eg; this results in a LOT of copying of data, and though it might work alright with small "N", it scales horribly. With the do.call option, rbind is called exactly once.
Notice that the column labels have been R-ized, since data.frame column names should not start with numbers. (It is possible, but generally discouraged.)
If you're confident that all substrings will have exactly the same elements, then you may be good here. If there's a chance that there will be a difference at some point, perhaps
eg <- c(eg, "{\"3x\": 99}")
then you'll notice that the base R solution no longer works by default.
do.call(rbind.data.frame, lapply(eg, fromJSON))
# Error in (function (..., deparse.level = 1, make.row.names = TRUE, stringsAsFactors = default.stringsAsFactors()) :
# numbers of columns of arguments do not match
There may be techniques to try to normalize the elements such that you can be assured of matches. However, if you're not averse to a tidyverse package:
library(dplyr)
eg2 <- bind_rows(lapply(eg, fromJSON))
eg2
# # A tibble: 4 × 2
# `3x` `6y`
# <int> <int>
# 1 20 23
# 2 60 50
# 3 99 72
# 4 99 NA
though you cannot call it as directly with the dollar-method, you can still use [[ or backticks.
eg2$3x
# Error: unexpected numeric constant in "eg2$3"
eg2[["3x"]]
# [1] 20 60 99 99
eg2$`3x`
# [1] 20 60 99 99

Using to-reports while creating agents from a csv file

My question is a bit long. I really appreciate it if you could please read it all and I'd be very grateful for any piece of advice.
I have data related to 2 consumers as turtles who have rated laptops' features. The laptops have 2 kinds of features : size of the screen and battery life. Each has some levels. For example battery life has 5 hours, 12 hours, 24 hours, 30 hours. Data is stored in a csv file.
12 13.5 14 15 5 12 24 30
1 1 2 1 3 2 2 4 5
2 4 3 1 2 1 1 2 3
I want to sum the rates of 2 levels of feature. For example for consumer 1 and 2, what is:
sum rate of screen size of 13.5 + rate of battery life 24
Since 13.5 and 24 may change later and I want to know for example the sum of rates of size of 14 and battery life of 5, I defined two functions using "to-reports" . Also since for example the value "12" is in the header row twice and represents both a size and a battery life, I made 2 subsets of the csv file one for screen size , the other for battery.
12 13.5 14 15
1 1 2 1 3
2 4 3 1 2
5 12 24 30
1 2 2 4 5
2 1 1 2 3
First in the main program, the csv file is read and a turtle is assigned to each row, expecting to have two consumers.
to setup
ca
reset-ticks
file-close-all
file-open "turtle_details.csv"
let headerrow csv:from-row file-read-line
set Sc 13.5 ; at the beginning I want the rate of this screen size
set Bat 24
while [ not file-at-end? ] [
let data csv:from-row file-read-line
create-consumers 1 [
set shape "person"
set k k + 1
print obtain-Sc (Sc) + obtain-Bat (Bat)
]
]
file-close-all
end
I assumed, first, row one is read and a consumer is generated. Now it goes to to-report to find obtain-Screen(13.5) which is 2, but I thought next time that obrain_Screen is called, again the csv opens and the cursor would still be at the beginning, but I want it to read the second row. And to extend it, I might need it to go further and further. To solve this, I defined a counter k that for example the first time this condition is checked : idx = 0 < k =1 so the first row is read. Then idx = idx + 1 , so nothing else is done.
to-report obtain-Screen[Sc]
file-close-all ; close all open files
file-open "turtle_detailsSc.csv"
let headings csv:from-row file-read-line
ifelse is-number? position Sc headings
[
while [idx < k ]
[ set fdata csv:from-row file-read-line
set idx idx + 1
]
report item position Sc headings fdata
]
[report 0.000000009]
Something similar for Bat. But it does not work and has errors. Any ideas how to improve to-reports?
Thanks
Edit
Considering the data set to be like this :
size12 size13.5 size14 size15 Battery5 Battery12 Battery24 Battery30
1 1 *2* 1 3 2 2 *4* 5
2 4 3 3 2 1 1 2 3
I can now access the data set and for each consumer find their evaluation of a laptop they have purchased. For example, consumer 1 has a laptop with size of 13.5 and battery life 24.
Consumer 1 evaluation of size 13.5 = 2
Consumer 1 evaluation of battery 24 = 4
Overall evaluation of laptop = 2 + 4 = 6
I have defined a procedure "Find Eval" that when I need to know the evaluation of different consumers, it enables me to access the dataset and find the values.
To explain the data in the table more, a consumer has a laptop so can evaluate it pretty well, but for other features like how he evaluates a laptop with the screen size of 15, he might have some exposure or just by hearing from others he has filled in the table.
I want to keep these 2 consumers and monitor their attitudes about laptop features during 20 years. In year 2, consumer 1 possibly updated his system, so now his battery life is 30. This time what I need to do is to access the dataset and calculate
Consumer 1 evaluation of size 13.5 = 2
plus
Consumer 1 evaluation of battery 30 = 5
Overall evaluation of laptop = 2 + 5 = 7
This time, I need to go to find the value for battery 30.
I think when I repeat my code for 20 years, create-consumer will be repeated each time that I want to work with the dataset, so instead of keeping 2 consumers, each year, some new consumers will be created and the previous ones will be totally replaced.
The question is how I can create consumers once, but can access any data point in the data set for many times?
Thanks a lot,

In response to your comment- got it, thought that might be the case. I will present an alternative approach that may or may not suit you, but it's probably how I would tackle this. The main difference is that I would store the lists for screen and battery ratings in a turtle variable, so you can easily access them after the fact without having to track counters. Given these variable declarations:
extensions [ csv ]
globals [ screen-headings battery-headings]
turtles-own [
turtle-screen-list
turtle-battery-list
turtle-screen-eval
turtle-bat-eval
turtle-sum-eval
turtle-row-number
]
I would split headings of the csv file to the screen-headings and battery-headings globals.
Then, I would iterate through the next lines of the csv, as you did below, and split the lines in a similar way but into the turtles-own variables turtle-screen-list and turtle-battery-list. That way, each turtle is aware of its own ratings for each screen and battery so you can modify your evaluated screen or battery as needed.
Set the screen and battery of interest with screen-to-evaluate and battery-to-evaluate (or use a Chooser on the interface), then use a reporter (similar to what you had set up) that checks the position of the battery and screen of interest to return the current turtle's rating for each- this will set turtle-screen-eval and turtle-bat-eval. Finally, sum those last two values.
to setup
ca
reset-ticks
file-close-all
file-open "turtle_details.csv"
let headings csv:from-row file-read-line
set screen-headings sublist headings 0 4
set battery-headings sublist headings 4 length headings
let screen-to-evaluate 13.5
let battery-to-evaluate 24
while [ not file-at-end? ] [
let data csv:from-row file-read-line
create-turtles 1 [
set turtle-screen-list sublist data 0 4
set turtle-battery-list sublist data 4 length data
set turtle-screen-eval turtle-screen-rating screen-to-evaluate
set turtle-bat-eval turtle-battery-rating battery-to-evaluate
set turtle-sum-eval turtle-screen-eval + turtle-bat-eval
]
]
file-close-all
end
to-report turtle-screen-rating [sc]
let pos position sc screen-headings
let turt-screen-rate-value item pos turtle-screen-list
report turt-screen-rate-value
end
to-report turtle-battery-rating [bc]
let pos position bc battery-headings
let turt-bat-rate-value item pos turtle-battery-list
report turt-bat-rate-value
end
Of course, you can condense some of this as needed. I tested this setup with 4 rows (patterned after your example data) and it seemed to work ok. If you have a huge number of rows it might not work so well.
Edit:
If you end up replacing screen-to-evaluate and battery-to-evaluate with interface choosers (which I recommend- it allows you to quickly see the different values), you can update to the new values with something like:
to update-vals
ask turtles [
set turtle-screen-eval turtle-screen-rating scrn
set turtle-bat-eval turtle-battery-rating batr
set turtle-sum-eval turtle-screen-eval + turtle-bat-eval
]
end
where scrn and batr are the names of the choosers.
Or, if you want them to dynamically update, you can make an interface monitor that reports the following reporter :
to-report update
ask turtles [
set turtle-screen-eval turtle-screen-rating scrn
set turtle-bat-eval turtle-battery-rating batr
set turtle-sum-eval turtle-screen-eval + turtle-bat-eval
]
report "Dynamically updating."
end
With that one, as soon as you change the Chooser value, the turtles should immediately update their turtle-bat-eval, turtle-screen-eval, and turtle-sum-eval. Fun!
Edit 2
If your csv includes a column for row numbers like this:
row 12 13.5 14 15 5 12 24 30
1 1 2.0 1 3 2 2 4 5
2 4 3.0 1 2 1 1 2 3
I would recommend creating a turtle variable to store the row number used. I've now included one called turtle-row-number. Then, you would just have to include a line to update that variable from the first item in the list, which is the row number, and then modify your sublist values accordingly, to something like:
to setup-row-nums
ca
reset-ticks
file-close-all
file-open "turtle_details_2.csv"
let headings csv:from-row file-read-line
set screen-headings sublist headings 1 5
set battery-headings sublist headings 5 length headings
while [ not file-at-end? ] [
let data csv:from-row file-read-line
create-turtles 1 [
set turtle-row-number first data
set turtle-screen-list sublist data 1 5
set turtle-battery-list sublist data 5 length data
]
]
update-vals
file-close-all
end
Where update-vals is as shown above.

Importing/Conditioning a file.txt with a "kind" of json structure in R

I wanted to import a .txt file in R but the format is really special and it's looks like a json format but I don't know how to import it. There is an example of my data:
{"datetime":"2015-07-08 09:10:00","subject":"MMM","sscore":"-0.2280","smean":"0.2593","svscore":"-0.2795","sdispersion":"0.375","svolume":"8","sbuzz":"0.6026","lastclose":"155.430000000","companyname":"3M Company"},{"datetime":"2015-07-07 09:10:00","subject":"MMM","sscore":"0.2977","smean":"0.2713","svscore":"-0.7436","sdispersion":"0.400","svolume":"5","sbuzz":"0.4895","lastclose":"155.080000000","companyname":"3M Company"},{"datetime":"2015-07-06 09:10:00","subject":"MMM","sscore":"-1.0057","smean":"0.2579","svscore":"-1.3796","sdispersion":"1.000","svolume":"1","sbuzz":"0.4531","lastclose":"155.380000000","companyname":"3M Company"}
To deal with this is used this code:
test1 <- read.csv("C:/Users/test1.txt", header=FALSE)
## Import as 5 observations (5th is all empty) of 1700 variables
#(in fact 40 observations of 11 variables). In fact when I imported the
#.txt file, it's having one line (5th obs) empty, and 4 lines of data and
#placed next to each other 4 lines of data of 11 variables.
# Get the different lines
part1=test1[1:10]
part2=test1[11:20]
part3=test1[21:30]
part4=test1[31:40]
...
## Remove the empty line (there were an empty line after each)
part1=part1[-5,]
part2=part2[-5,]
part3=part3[-5,]
...
## Rename the columns
names(part1)=c("Date Time","Subject","Sscore","Smean","Svscore","Sdispersion","Svolume","Sbuzz","Last close","Company name")
names(part2)=c("Date Time","Subject","Sscore","Smean","Svscore","Sdispersion","Svolume","Sbuzz","Last close","Company name")
names(part3)=c("Date Time","Subject","Sscore","Smean","Svscore","Sdispersion","Svolume","Sbuzz","Last close","Company name")
...
## Assemble data to have one dataset
data=rbind(part1,part2,part3,part4,part5,part6,part7,part8,part9,part10)
## Formate Date Time
times <- as.POSIXct(data$`Date Time`, format='{datetime:%Y-%m-%d %H:%M:%S')
data$`Date Time` <- times
## Keep only the Date
data$Date <- as.Date(times)
## Formate data - Remove text
data$Subject <- gsub("subject:", "", data$Subject)
data$Sscore <- gsub("sscore:", "", data$Sscore)
...
So My code is working to reinstate the data but it's maybe very difficult and more long I know there is better ways to do it, so if you could help me with that I would be very grateful.

There are many packages that read JSON, e.g. rjson, jsonlite, RJSONIO (they will turn in up a google search) - just pick one and give it a go.
e.g.
library(jsonlite)
json.text <- '{"datetime":"2015-07-08 09:10:00","subject":"MMM","sscore":"-0.2280","smean":"0.2593","svscore":"-0.2795","sdispersion":"0.375","svolume":"8","sbuzz":"0.6026","lastclose":"155.430000000","companyname":"3M Company"},{"datetime":"2015-07-07 09:10:00","subject":"MMM","sscore":"0.2977","smean":"0.2713","svscore":"-0.7436","sdispersion":"0.400","svolume":"5","sbuzz":"0.4895","lastclose":"155.080000000","companyname":"3M Company"},{"datetime":"2015-07-06 09:10:00","subject":"MMM","sscore":"-1.0057","smean":"0.2579","svscore":"-1.3796","sdispersion":"1.000","svolume":"1","sbuzz":"0.4531","lastclose":"155.380000000","companyname":"3M Company"}'
x <- fromJSON(paste0('[', json.text, ']'))
datetime subject sscore smean svscore sdispersion svolume sbuzz lastclose companyname
1 2015-07-08 09:10:00 MMM -0.2280 0.2593 -0.2795 0.375 8 0.6026 155.430000000 3M Company
2 2015-07-07 09:10:00 MMM 0.2977 0.2713 -0.7436 0.400 5 0.4895 155.080000000 3M Company
3 2015-07-06 09:10:00 MMM -1.0057 0.2579 -1.3796 1.000 1 0.4531 155.380000000 3M Company
I paste the '[' and ']' around your JSON because you have multiple JSON elements (the rows in the dataframe above) and for this to be well-formed JSON it needs to be an array, i.e. [ {...}, {...}, {...} ] rather than {...}, {...}, {...}.

Scraping data from tables on multiple web pages in R (football players)

I'm working on a project for school where I need to collect the career statistics for individual NCAA football players. The data for each player is in this format.
http://www.sports-reference.com/cfb/players/ryan-aplin-1.html
I cannot find an aggregate of all players so I need to go page by page and pull out the bottom row of each passing scoring Rushing & receiving etc. html table
Each player is catagorized by their last name with links to each alphabet going here.
http://www.sports-reference.com/cfb/players/
For instance, each player with the last name A is found here.
http://www.sports-reference.com/cfb/players/a-index.html
This is my first time really getting into data scraping so I tried to find similar questions with answers. The closest answer I found was this question
I believe I could use something very similar where I switch page number with the collected player's name. However, I'm not sure how to change it to look for player name instead of page number.
Samuel L. Ventura also gave a talk about data scraping for NFL data recently that can be found here.
EDIT:
Ben was really helpful and provided some great code. The first part works really well, however when I attempt to run the second part I run into this.
> # unlist into a single character vector
> links <- unlist(links)
> # Go to each URL in the list and scrape all the data from the tables
> # this will take some time... don't interrupt it!
> all_tables <- lapply(links, readHTMLTable, stringsAsFactors = FALSE)
Error in UseMethod("xmlNamespaceDefinitions") :
no applicable method for 'xmlNamespaceDefinitions' applied to an object of class "NULL"
> # Put player names in the list so we know who the data belong to
> # extract names from the URLs to their stats page...
> toMatch <- c("http://www.sports-reference.com/cfb/players/", "-1.html")
> player_names <- unique (gsub(paste(toMatch,collapse="|"), "", links))
Error: cannot allocate vector of size 512 Kb
> # assign player names to list of tables
> names(all_tables) <- player_names
Error: object 'player_names' not found
> fix(inx_page)
Error in edit(name, file, title, editor) :
unexpected '<' occurred on line 1
use a command like
x <- edit()
to recover
In addition: Warning message:
In edit.default(name, file, title, editor = defaultEditor) :
deparse may be incomplete
This could be an error due to not having sufficient memory (only 4gb on computer I am currently using). Although I do not understand the error
> all_tables <- lapply(links, readHTMLTable, stringsAsFactors = FALSE)
Error in UseMethod("xmlNamespaceDefinitions") :
no applicable method for 'xmlNamespaceDefinitions' applied to an object of class "NULL"
Looking through my other datasets my players really only go back to 2007. If there would be some way to pull just people from 2007 onwards that may help shrink the data. If I had a list of people whose names I wanted to pull could I just replace the lnk in
links[[i]] <- paste0("http://www.sports-reference.com", lnk)
with only the players that I need?

Here's how you can easily get all the data in all the tables on all the player pages...
First make a list of the URLs for all the players' pages...
require(RCurl); require(XML)
n <- length(letters)
# pre-allocate list to fill
links <- vector("list", length = n)
for(i in 1:n){
print(i) # keep track of what the function is up to
# get all html on each page of the a-z index pages
inx_page <- htmlParse(getURI(paste0("http://www.sports-reference.com/cfb/players/", letters[i], "-index.html")))
# scrape URLs for each player from each index page
lnk <- unname(xpathSApply(inx_page, "//a/#href"))
# skip first 63 and last 10 links as they are constant on each page
lnk <- lnk[-c(1:63, (length(lnk)-10):length(lnk))]
# only keep links that go to players (exclude schools)
lnk <- lnk[grep("players", lnk)]
# now we have a list of all the URLs to all the players on that index page
# but the URLs are incomplete, so let's complete them so we can use them from
# anywhere
links[[i]] <- paste0("http://www.sports-reference.com", lnk)
}
# unlist into a single character vector
links <- unlist(links)
Now we have a vector of some 67,000 URLs (seems like a lot of players, can that be right?), so:
Second, scrape all the tables at each URL to get their data, like so:
# Go to each URL in the list and scrape all the data from the tables
# this will take some time... don't interrupt it!
# start edit1 here - just so you can see what's changed
# pre-allocate list
all_tables <- vector("list", length = (length(links)))
for(i in 1:length(links)){
print(i)
# error handling - skips to next URL if it gets an error
result <- try(
all_tables[[i]] <- readHTMLTable(links[i], stringsAsFactors = FALSE)
); if(class(result) == "try-error") next;
}
# end edit1 here
# Put player names in the list so we know who the data belong to
# extract names from the URLs to their stats page...
toMatch <- c("http://www.sports-reference.com/cfb/players/", "-1.html")
player_names <- unique (gsub(paste(toMatch,collapse="|"), "", links))
# assign player names to list of tables
names(all_tables) <- player_names
The result looks like this (this is just a snippet of the output):
all_tables
$`neli-aasa`
$`neli-aasa`$defense
Year School Conf Class Pos Solo Ast Tot Loss Sk Int Yds Avg TD PD FR Yds TD FF
1 *2007 Utah MWC FR DL 2 1 3 0.0 0.0 0 0 0 0 0 0 0 0
2 *2010 Utah MWC SR DL 4 4 8 2.5 1.5 0 0 0 1 0 0 0 0
$`neli-aasa`$kick_ret
Year School Conf Class Pos Ret Yds Avg TD Ret Yds Avg TD
1 *2007 Utah MWC FR DL 0 0 0 0 0 0
2 *2010 Utah MWC SR DL 2 24 12.0 0 0 0 0
$`neli-aasa`$receiving
Year School Conf Class Pos Rec Yds Avg TD Att Yds Avg TD Plays Yds Avg TD
1 *2007 Utah MWC FR DL 1 41 41.0 0 0 0 0 1 41 41.0 0
2 *2010 Utah MWC SR DL 0 0 0 0 0 0 0 0 0
Finally, let's say we just want to look at the passing tables...
# just show passing tables
passing <- lapply(all_tables, function(i) i$passing)
# but lots of NULL in here, and not a convenient format, so...
passing <- do.call(rbind, passing)
And we end up with a data frame that is ready for further analyses (also just a snippet)...
Year School Conf Class Pos Cmp Att Pct Yds Y/A AY/A TD Int Rate
james-aaron 1978 Air Force Ind QB 28 56 50.0 316 5.6 3.6 1 3 92.6
jeff-aaron.1 2000 Alabama-Birmingham CUSA JR QB 100 182 54.9 1135 6.2 6.0 5 3 113.1
jeff-aaron.2 2001 Alabama-Birmingham CUSA SR QB 77 148 52.0 828 5.6 4.3 4 6 99.8

Is it possible to write a table to a file in JSON format in R?

I'm making word frequency tables with R and the preferred output format would be a JSON file. sth like
{
"word" : "dog",
"frequency" : 12
}
Is there any way to save the table directly into this format? I've been using the write.csv() function and convert the output into JSON but this is very complicated and time consuming.

set.seed(1)
( tbl <- table(round(runif(100, 1, 5))) )
## 1 2 3 4 5
## 9 24 30 23 14
library(rjson)
sink("json.txt")
cat(toJSON(tbl))
sink()
file.show("json.txt")
## {"1":9,"2":24,"3":30,"4":23,"5":14}
or even better:
set.seed(1)
( tab <- table(letters[round(runif(100, 1, 26))]) )
a b c d e f g h i j k l m n o p q r s t u v w x y z
1 2 4 3 2 5 4 3 5 3 9 4 7 2 2 2 5 5 5 6 5 3 7 3 2 1
sink("lets.txt")
cat(toJSON(tab))
sink()
file.show("lets.txt")
## {"a":1,"b":2,"c":4,"d":3,"e":2,"f":5,"g":4,"h":3,"i":5,"j":3,"k":9,"l":4,"m":7,"n":2,"o":2,"p":2,"q":5,"r":5,"s":5,"t":6,"u":5,"v":3,"w":7,"x":3,"y":2,"z":1}
Then validate it with http://www.jsonlint.com/ to get pretty formatting. If you have multidimensional table, you'll have to work it out a bit...
EDIT:
Oh, now I see, you want the dataset characteristics sink-ed to a JSON file. No problem, just give us a sample data, and I'll work on a code a bit. Practically, you need to carry out the data into desirable format, hence convert it to JSON. list should suffice. Give me a sec, I'll update my answer.
EDIT #2:
Well, time is relative... it's a common knowledge... Here you go:
( dtf <- structure(list(word = structure(1:3, .Label = c("cat", "dog",
"mouse"), class = "factor"), frequency = c(12, 32, 18)), .Names = c("word",
"frequency"), row.names = c(NA, -3L), class = "data.frame") )
## word frequency
## 1 cat 12
## 2 dog 32
## 3 mouse 18
If dtf is a simple data frame, yes, data.frame, if it's not, coerce it! Long story short, you can do:
toJSON(as.data.frame(t(dtf)))
## [1] "{\"V1\":{\"word\":\"cat\",\"frequency\":\"12\"},\"V2\":{\"word\":\"dog\",\"frequency\":\"32\"},\"V3\":{\"word\":\"mouse\",\"frequency\":\"18\"}}"
I though I'll need some melt with this one, but simple t did the trick. Now, you only need to deal with column names after transposing the data.frame. t coerces data.frames to matrix, so you need to convert it back to data.frame. I used as.data.frame, but you can also use toJSON(data.frame(t(dtf))) - you'll get X instead of V as a variable name. Alternatively, you can use regexp to clean the JSON file (if needed), but it's a lousy practice, try to work it out by preparing the data.frame.
I hope this helped a bit...

These days I would typically use the jsonlite package.
library("jsonlite")
toJSON(mydatatable, pretty = TRUE)
This turns the data table into a JSON array of key/value pair objects directly.

RJSONIO is a package "that allows conversion to and from data in Javascript object notation (JSON) format". You can use it to export your object as a JSON file.
library(RJSONIO)
writeLines(toJSON(anobject), "afile.JSON")

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008

NetLogo - using BehaviorSpace get all turtles locations as the result of each repetition - csv

Related

rjson::fromJSON returns only the first item

Using to-reports while creating agents from a csv file

Importing/Conditioning a file.txt with a "kind" of json structure in R

Scraping data from tables on multiple web pages in R (football players)

Is it possible to write a table to a file in JSON format in R?

Categories

Resources