Get specific data from a JSON array in Erlang using rfc4627

This is the output I get after running the following two commands:
Wc = os:cmd("curl -s -k -X GET 'http://10.210.12.158:10065/iot/get/task_id?id=1'"),
WW = decode_json(Wc),
Output:
{ok,{obj,[{"status",200},
          {"data",
           [{obj,[{"id",1},
                  {"task",
                   <<"Turn on the bulb when the temperature is greater than 28 ">>},
                  {"working_condition",1},
                  {"depending_value",<<"Temperature">>},
                  {"device",<<" BulbOnly">>},
                  {"turning_point",28},
                  {"config_id",null}]}]}]}}
I want to extract each of these values separately, e.g.
Task = Turn on the bulb when the temperature is greater than 28
How can I do this?

Since the structure returned seems to be a list of tasks, each task being a proplist, you can do something like:
-module(test).
-export([input/0, get_tasks/1]).

input() ->
    {ok,{obj,[{"status",200},
              {"data",
               [{obj,[{"id",1},
                      {"task",
                       <<"Turn on the bulb when the temperature is greater than 28 ">>},
                      {"working_condition",1},
                      {"depending_value",<<"Temperature">>},
                      {"device",<<" BulbOnly">>},
                      {"turning_point",28},
                      {"config_id",null}]}]}]}}.

get_tasks({ok, {obj, [_Status, {"data", Tasks} | _Tail]}}) ->
    [ get_task_description(T) || T <- Tasks ].

get_task_description({obj, Proplist}) ->
    proplists:get_value("task", Proplist).
When running it in a shell, you would get:
1> test:get_tasks(test:input()).
[<<"Turn on the bulb when the temperature is greater than 28 ">>]

Is there a way to look up a value from a CSV in Nextflow? Or, alternatively, reuse a CSV?

I have a simple csv created as part of a workflow, like below:
sample,value
A,1
B,0.5
Separately, I have another channel with file names matching the sample names. I'd like to be able to use the values associated with each sample name within a new process.
I've tried splitting the CSV using .splitCsv but (unsurprisingly) sometimes the incorrect value gets used with a sample, although it does run the correct number of times. I've also tried just using awk within the script to pull out the corresponding value and save it to a variable, and this causes the correct value to be used, but it consumes the CSV file and so only one sample gets processed.
Super simplified nextflow (DSL2) script:
#!/usr/bin/env nextflow
nextflow.enable.dsl=2
process foo {
    input:
    path input_file

    output:
    path 'file.csv', emit: csv

    """
    script that creates csv
    """
}

process bar {
    input:
    path input_file2

    output:
    path 'file.bam', emit: bam

    """
    script that creates bam files
    """
}

process help_me {
    input:
    path csv
    path bam

    output:
    path 'result'

    """
    script that uses value from csv on associated bam file
    """
}

workflow {
    foo(params.input)
    bar(params.input2)
    help_me(foo.out.csv, bar.out.bam)
}
Thanks!!
Edit: In essence, is there a way to synchronize two channels such that I can use a csv's individual rows with associated files?
If you have a value channel, you can reuse a file (like a CSV) an unlimited number of times without consuming the channel. For example:
workflow {
    input1 = file( params.input1 )
    input2 = file( params.input2 )

    foo( input1 )
    bar( input2 )
    help_me(foo.out.csv, bar.out.bam)
}
Here, both input1 and input2 are value channels. Also, from the docs (emphasis mine):
A value channel is implicitly created by a process when an input
specifies a simple value in the from clause. Moreover, a value channel
is also implicitly created as output for a process whose inputs are
only value channels.
This means that both foo.out.csv and bar.out.bam are also value channels. Additionally, help_me.out is also a value channel. If input2 were instead a queue channel, you can see that input1 can be re-used an unlimited number of times:
$ mkdir -p ./path/to/bams
$ touch ./path/to/bams/{A,B,C}.bam
$ touch ./foo.txt
params.input1 = './foo.txt'
params.input2 = './path/to/bams/*.bam'

workflow {
    input1 = file( params.input1 )
    input2 = Channel.fromPath( params.input2 )

    foo( input1 )
    bar( input2 )
    help_me(foo.out.csv, bar.out.bam)
}
Results:
$ nextflow run script.nf
N E X T F L O W ~ version 22.04.0
Launching `script.nf` [trusting_allen] DSL2 - revision: 75209e4c85
executor > local (7)
[24/d459f7] process > foo [100%] 1 of 1 ✔
[04/a903e4] process > bar (2) [100%] 3 of 3 ✔
[24/7a9a1d] process > help_me (3) [100%] 3 of 3 ✔
Note that bar.out.bam and help_me.out are now queue channels.
If instead you have one CSV per sample (or a similar configuration), you will need some way to join these channels beforehand and adjust your new process's input declaration accordingly. What you want to avoid is declaring two (or more) queue channels in your input block. This part of the docs is well worth the time investment: Understand how multiple input channels work. It explains why you saw the incorrect value being associated with a particular sample when consuming the splitCsv output. To join these channels, you can use the join operator. For example, given your simple CSV (as 'foo.csv') and the test bams created previously:
nextflow.enable.dsl=2

params.input1 = './foo.csv'
params.input2 = './path/to/bams/*.bam'

process help_me {
    debug true

    input:
    tuple val(sample), val(myval), path(bam)

    output:
    path 'result'

    """
    echo -n "sample: ${sample}, myval: ${myval}, bam: ${bam}"
    touch result
    """
}

workflow {
    Channel.fromPath( params.input1 ) \
        | splitCsv( header:true ) \
        | map { row -> tuple( row.sample, row.value ) } \
        | set { rows_ch }

    Channel.fromPath( params.input2 ) \
        | map { bam -> tuple( bam.baseName, bam ) } \
        | join( rows_ch ) \
        | map { sample, bam, myval -> tuple( sample, myval, bam ) } \
        | help_me
}
Results:
$ nextflow run script.nf
N E X T F L O W ~ version 22.04.0
Launching `script.nf` [lethal_mayer] DSL2 - revision: 395732babc
executor > local (2)
[c5/e96085] process > help_me (1) [100%] 2 of 2 ✔
sample: B, myval: 0.5, bam: B.bam
sample: A, myval: 1, bam: A.bam
If your CSV has more than one value for a particular sample, specified on separate lines, you probably want the combine operator instead. For example, if your 'foo.csv' contains:
sample,value
A,1
B,0.5
B,2
Then replace join( rows_ch ) with combine( rows_ch, by:0 ) in the above example. Results:
$ nextflow run script.nf
N E X T F L O W ~ version 22.04.0
Launching `script.nf` [festering_miescher] DSL2 - revision: f8de1e0d20
executor > local (3)
[ee/8af543] process > help_me (3) [100%] 3 of 3 ✔
sample: A, myval: 1, bam: A.bam
sample: B, myval: 0.5, bam: B.bam
sample: B, myval: 2, bam: B.bam

Difference-in-Differences in R (Callaway & Sant'Anna)

I'm trying to implement the DiD package by Callaway and Sant'Anna in my master's thesis, but I'm coming across errors when I run the DiD code and when I try to view the summary.
did1 <- att_gt(yname = "countgreen",
               gname = "signing_year",
               idname = "investorid",
               tname = "dealyear",
               data = panel8)
This code warns me that:
"Be aware that there are some small groups in your dataset.
Check groups: 2006,2007,2008,2011. Dropped 109 observations that had missing data.overlap condition violated for 2009 in time period 2001Not enough control units for group 2009 in time period 2001 to run specified regression"
This error is repeated several hundred times.
Does this mean I need to re-match my treatment firms to control firms using a 1:3 ratio (treat:control) rather than the 1:1 I used previously?
Then when I run this code:
summary(did1)
I get this message:
Error in Math.data.frame(list(`mpobj$group` = c(2009L, 2009L, 2009L, 2009L, : non-numeric variable(s) in data frame: mpobj$att
I'm really not too sure what this means.
Can anyone help trouble shoot?
Thanks,
Rory
I don't know the DiD package, but I can try to answer the part about summary(did1).
If you do str(did1), you should see something like this:
'data.frame': 6 obs. of 7 variables:
$ cluster : int 1 2 3 4 5 6
$ price_scal : num -0.572 -0.132 0.891 1.091 -0.803 ...
$ hd_scal : num -0.778 0.63 0.181 -0.24 0.244 ...
$ ram_scal : num -0.6937 0.00479 0.46411 0.00653 -0.31204 ...
$ screen_scal: num -0.457 2.642 -0.195 2.642 -0.325 ...
$ ads_scal : num 0.315 -0.889 0.472 0.47 -0.822 ...
$ trend_scal : num -0.604 1.267 -0.459 -0.413 1.156 ...
But in your case, you probably have a variable mpobj$att that is a factor or a string column rather than numeric, which is what Math.data.frame is complaining about. Converting it to numeric may also make the DiD code run.
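For illustration, here is a minimal sketch (with made-up data, not your actual did object) of why the math behind summary() fails on a non-numeric column, and how converting it helps:
# Hypothetical data: a column that should be numeric was read as character
df <- data.frame(group = c(2009L, 2009L, 2010L),
                 att   = c("0.12", "0.34", "-0.05"))
str(df)             # 'att' is chr, not num
# Math on the data frame fails while 'att' is non-numeric; convert it first
# (wrap it in as.character() first if the column is a factor)
df$att <- as.numeric(df$att)
summary(df$att)     # now works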

NetLogo - using BehaviorSpace get all turtles locations as the result of each repetition

I am using BehaviorSpace to run the model hundreds of times with different parameters. But I need to know the locations of all turtles as a result instead of only the number of turtles. How can I achieve it with BehaviorSpace?
Currently, I output the results in a csv file by this code:
to-report get-locations
  report (list xcor ycor)
end

to generate-output
  file-open "model_r_1.0_locations.csv"
  file-print csv:to-row get-locations
  file-close
end
but the results from all runs are dumped into the same csv file, so I can't tell which run each row came from.
Seth's suggestion of incorporating behaviorspace-run-number in the filename of your csv output is one alternative. It would allow you to associate that file with the summary data in your main BehaviorSpace output file.
Another option is to include list reporters as "measures" in your behavior space experiment definition. For example, in your case:
map [ t -> [ xcor ] of t ] sort turtles
map [ t -> [ ycor ] of t ] sort turtles
You can then parse the resulting list "manually" in your favourite data analysis language. I've used the following function for this before, in Julia:
parselist(strlist, T = Float64) = parse.(T, split(strlist[2:end-1]))
I'm sure you can easily write some equivalent code in Python or R or whatever language you're using.
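For instance, a rough R equivalent of the Julia function above (a sketch, assuming BehaviorSpace writes NetLogo lists in their usual bracketed, space-separated form, e.g. "[16 10 -2.24]"):
parselist <- function(strlist) {
  # strip the surrounding brackets, split on whitespace, parse as numeric
  as.numeric(strsplit(gsub("^\\[|\\]$", "", strlist), "\\s+")[[1]])
}
parselist("[16 10 -2.24]")
# [1] 16.00 10.00 -2.24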
In the example above, I've outputted separate lists for the xcor and the ycor of turtles. You could also output a single "list of lists", but the parsing would be trickier.
Edit: How to do this using the csv extension and R
Coincidentally, I had to do something similar today for a different project, and I realized that a combination of the csv extension and R can make this very easy.
The general idea is the following:
In NetLogo, use csv:to-string to encode list data into a string and then write that string directly in the BehaviorSpace output.
In R, use purrr::map and readr::read_csv, followed by tidyr::unnest, to unpack everything in a neat "one observation per row" dataframe.
In other words: we like CSV, so we put CSV in our CSV so we can parse while we parse.
Here is a full-fledged example. Let's say we have the following NetLogo model:
extensions [ csv ]

to setup
  clear-all
  create-turtles 2 [ move-to one-of patches ]
  reset-ticks
end

to go
  ask turtles [ forward 1 ]
  tick
end

to-report positions
  let coords [ (list who xcor ycor) ] of turtles
  report csv:to-string fput ["who" "x" "y"] coords
end
We then define a tiny BehaviorSpace experiment, with only two repetitions and a time limit of two, using our positions reporter as an output.
The R code to process this is pleasantly straightforward:
library(tidyverse)

df <- read_csv("experiment-table.csv", skip = 6) %>%
  mutate(positions = map(positions, read_csv)) %>%
  unnest()
Which results in the following dataframe, all neat and tidy:
> df
# A tibble: 12 x 5
   `[run number]` `[step]`   who      x       y
            <int>    <int> <int>  <dbl>   <dbl>
 1              1        0     0  16    10
 2              1        0     1  10    -2
 3              1        1     1   9.03 -2.24
 4              1        1     0 -16.0  10.1
 5              1        2     1   8.06 -2.48
 6              1        2     0 -15.0  10.3
 7              2        0     1 -14     1
 8              2        0     0  13    15
 9              2        1     0  14.0  15.1
10              2        1     1 -13.7   0.0489
11              2        2     0  15.0  15.1
12              2        2     1 -13.4  -0.902
The same thing in Julia:
using CSV, DataFrames
df = CSV.read("experiment-table.csv", header = 7)
cols = filter(col -> col != :positions, names(df))
df = by(df -> CSV.read(IOBuffer(df[:positions][1])), df, cols)

Data left out when reading json online to R

I am trying to read online JSON data into R using the code below:
library('jsonlite')
address<-'https://data.cityofchicago.org/resource/qnmj-8ku6.json'
sample<-fromJSON(address)
The code runs and returns results in the right format of a table, but it only produced 1,000 observations, while the original city portal database has more than 200,000. I am not sure what needs to be fixed to download the whole dataset. Please help.
You're using the wrong link to get the data. You can see the correct link by going to 'Export':
library(jsonlite)
address <- "https://data.cityofchicago.org/api/views/qnmj-8ku6/rows.json?accessType=DOWNLOAD"
sample <- fromJSON(address)
length(sample)
# [1]
length(sample[[2]])
# [1] 274228
Although you may want to get it as a .csv to make it easier to work with straight away:
address <- "https://data.cityofchicago.org/api/views/qnmj-8ku6/rows.csv?accessType=DOWNLOAD"
sample_csv <- read.csv(address)
nrow(sample_csv)
# [1] 274228
str(sample_csv)
# 'data.frame': 274228 obs. of 22 variables:
# $ ID : int 10512552 10517063 10517120 10518590 10518648
# $ Case.Number : Factor w/ 274219 levels "HA107183","HA156050",..
# $ Date : Factor w/ 112977 levels "01/01/2014 01:00:00 AM",..
# $ Block : Factor w/ 27499 levels "0000X E 100TH PL",..
# $ IUCR : Factor w/ 331 levels "0110","0141",..
# $ Primary.Type : Factor w/ 33 levels "ARSON","ASSAULT",..
# $ Description : Factor w/ 310 levels "$500 AND UNDER",..
# ... etc
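Alternatively, if you want to keep using the original /resource/ endpoint: it is served by the Socrata API, which returns only 1,000 rows by default, and that default would explain the cap you saw. A sketch using the $limit parameter (the exact maximum allowed may depend on the dataset and API version):
library(jsonlite)
# Assumption: the /resource/ endpoint honours Socrata's $limit parameter;
# without it the API returns only 1,000 rows by default.
address <- "https://data.cityofchicago.org/resource/qnmj-8ku6.json?$limit=300000"
sample_full <- fromJSON(address)
nrow(sample_full)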

Importing/conditioning a .txt file with a "kind of" JSON structure in R

I want to import a .txt file into R, but the format is quite special: it looks like JSON, and I don't know how to import it. Here is an example of my data:
{"datetime":"2015-07-08 09:10:00","subject":"MMM","sscore":"-0.2280","smean":"0.2593","svscore":"-0.2795","sdispersion":"0.375","svolume":"8","sbuzz":"0.6026","lastclose":"155.430000000","companyname":"3M Company"},{"datetime":"2015-07-07 09:10:00","subject":"MMM","sscore":"0.2977","smean":"0.2713","svscore":"-0.7436","sdispersion":"0.400","svolume":"5","sbuzz":"0.4895","lastclose":"155.080000000","companyname":"3M Company"},{"datetime":"2015-07-06 09:10:00","subject":"MMM","sscore":"-1.0057","smean":"0.2579","svscore":"-1.3796","sdispersion":"1.000","svolume":"1","sbuzz":"0.4531","lastclose":"155.380000000","companyname":"3M Company"}
To deal with this I used this code:
test1 <- read.csv("C:/Users/test1.txt", header=FALSE)
## Imported as 5 observations (the 5th all empty) of 1700 variables
## (in fact 40 observations of 11 variables). When I imported the
## .txt file, there was one empty line (the 5th observation) and 4 lines
## of data, with the 11-variable records placed next to each other
## along each line.
# Get the different lines
part1=test1[1:10]
part2=test1[11:20]
part3=test1[21:30]
part4=test1[31:40]
...
## Remove the empty line (there was an empty line after each)
part1=part1[-5,]
part2=part2[-5,]
part3=part3[-5,]
...
## Rename the columns
names(part1)=c("Date Time","Subject","Sscore","Smean","Svscore","Sdispersion","Svolume","Sbuzz","Last close","Company name")
names(part2)=c("Date Time","Subject","Sscore","Smean","Svscore","Sdispersion","Svolume","Sbuzz","Last close","Company name")
names(part3)=c("Date Time","Subject","Sscore","Smean","Svscore","Sdispersion","Svolume","Sbuzz","Last close","Company name")
...
## Assemble data to have one dataset
data=rbind(part1,part2,part3,part4,part5,part6,part7,part8,part9,part10)
## Format Date Time
times <- as.POSIXct(data$`Date Time`, format='{datetime:%Y-%m-%d %H:%M:%S')
data$`Date Time` <- times
## Keep only the Date
data$Date <- as.Date(times)
## Format data - remove text
data$Subject <- gsub("subject:", "", data$Subject)
data$Sscore <- gsub("sscore:", "", data$Sscore)
...
So my code works to reconstruct the data, but it is complicated and longer than it needs to be. I know there are better ways to do it, so if you could help me with that I would be very grateful.
There are many packages that read JSON, e.g. rjson, jsonlite, RJSONIO (they will turn up in a Google search) - just pick one and give it a go.
e.g.
library(jsonlite)
json.text <- '{"datetime":"2015-07-08 09:10:00","subject":"MMM","sscore":"-0.2280","smean":"0.2593","svscore":"-0.2795","sdispersion":"0.375","svolume":"8","sbuzz":"0.6026","lastclose":"155.430000000","companyname":"3M Company"},{"datetime":"2015-07-07 09:10:00","subject":"MMM","sscore":"0.2977","smean":"0.2713","svscore":"-0.7436","sdispersion":"0.400","svolume":"5","sbuzz":"0.4895","lastclose":"155.080000000","companyname":"3M Company"},{"datetime":"2015-07-06 09:10:00","subject":"MMM","sscore":"-1.0057","smean":"0.2579","svscore":"-1.3796","sdispersion":"1.000","svolume":"1","sbuzz":"0.4531","lastclose":"155.380000000","companyname":"3M Company"}'
x <- fromJSON(paste0('[', json.text, ']'))
             datetime subject  sscore  smean svscore sdispersion svolume  sbuzz     lastclose companyname
1 2015-07-08 09:10:00     MMM -0.2280 0.2593 -0.2795       0.375       8 0.6026 155.430000000  3M Company
2 2015-07-07 09:10:00     MMM  0.2977 0.2713 -0.7436       0.400       5 0.4895 155.080000000  3M Company
3 2015-07-06 09:10:00     MMM -1.0057 0.2579 -1.3796       1.000       1 0.4531 155.380000000  3M Company
I paste the '[' and ']' around your JSON because you have multiple JSON elements (the rows in the dataframe above) and for this to be well-formed JSON it needs to be an array, i.e. [ {...}, {...}, {...} ] rather than {...}, {...}, {...}.
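Applied to your original file, a sketch along the same lines might look like this (the path is taken from your question, and I assume the objects in the file are already comma-separated, as in your excerpt):
library(jsonlite)

raw  <- readLines("C:/Users/test1.txt")   # path assumed from the question
raw  <- raw[nzchar(raw)]                  # drop any empty lines
data <- fromJSON(paste0("[", paste(raw, collapse = ""), "]"))

# jsonlite reads every field as character here; convert the numeric columns
num_cols <- c("sscore", "smean", "svscore", "sdispersion",
              "svolume", "sbuzz", "lastclose")
data[num_cols] <- lapply(data[num_cols], as.numeric)
data$datetime  <- as.POSIXct(data$datetime, format = "%Y-%m-%d %H:%M:%S")
str(data)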