How to sort nested dictionaries in TCL - tcl

I have a nested dictionary in TCL where the values at a particular index are unsorted at a subindex value, for example in this dictionary, I want to sort 55,21,36 and move the subkeys while sorting, and I also want to sort 103,344,3 while moving around their subkeys as well
I am very confused how one goes about doing something like this. I cannot use lsort -index -stride since my sub Dictionary isn't a single flattened list
a1,a2 {0,6 {103 55} 1,5 {344 21} 6,7 {3 36}}
Expected Output 1 [Sort based on 21,36,55 [final subvalues], 55 goes all the way towards the end with its key]
a1,a2 {1,5 {344 21} 6,7 {3 36} 0,6 {103 55}}
Expected Output 2 [Sort based on 344,3,103, 344 goes all the way towards the end along with its key]
a1,a2 {6,7 {3 36} 0,6 {103 55} 1,5 {344 21}}

Related

Getting number of duplicate elements in a list (in tcl)

I have a list which looks like
list = {ab bc 8 ab d1 10 xy uv bc ab xy 10 d1}
I would like to know how often each element of the list occurs inside it, that is, I need a result like this:
ab 3
bc 2
8 1
d1 2
....
I prefer a single line argument (if such exists) instead of a proc. I need to work with both: list elements and their frequency in the list.
Any advice is welcome.
Thank you!
Assuming that counter is the name of the dictionary where you want to collect this information (and is either currently unset or set to the empty string):
foreach item $list {dict incr counter $item}
You can then print that out in approximately the form you gave with:
dict for {item count} $counter {puts [format "%6s %-3d" $item $count]}
Note that this second line is about displaying the data, not actually finding it out.

NetLogo - using BehaviorSpace get all turtles locations as the result of each repetition

I am using BehaviorSpace to run the model hundreds of times with different parameters. But I need to know the locations of all turtles as a result instead of only the number of turtles. How can I achieve it with BehaviorSpace?
Currently, I output the results in a csv file by this code:
to-report get-locations
report (list xcor ycor)
end
to generate-output
file-open "model_r_1.0_locations.csv"
file-print csv:to-row get-locations
file-close
end
but all results are popped into same csv file, so I can't tell the condition of each running.
Seth's suggestion of incorporating behaviorspace-run-number in the filename of your csv output is one alternative. It would allow you to associate that file with the summary data in your main BehaviorSpace output file.
Another option is to include list reporters as "measures" in your behavior space experiment definition. For example, in your case:
map [ t -> [ xcor ] of t ] sort turtles
map [ t -> [ ycor ] of t ] sort turtles
You can then parse the resulting list "manually" in your favourite data analysis language. I've used the following function for this before, in Julia:
parselist(strlist, T = Float64) = parse.(T, split(strlist[2:end-1]))
I'm sure you can easily write some equivalent code in Python or R or whatever language you're using.
In the example above, I've outputted separate lists for the xcor and the ycor of turtles. You could also output a single "list of lists", but the parsing would be trickier.
Edit: How to do this using the csv extension and R
Coincidentally, I had to do something similar today for a different project, and I realized that a combination of the csv extension and R can make this very easy.
The general idea is the following:
In NetLogo, use csv:to-string to encode list data into a string and then write that string directly in the BehaviorSpace output.
In R, use purrr::map and readr::read_csv, followed by tidyr::unnest, to unpack everything in a neat "one observation per row" dataframe.
In other words: we like CSV, so we put CSV in our CSV so we can parse while we parse.
Here is a full-fledged example. Let's say we have the following NetLogo model:
extensions [ csv ]
to setup
clear-all
create-turtles 2 [ move-to one-of patches ]
reset-ticks
end
to go
ask turtles [ forward 1 ]
tick
end
to-report positions
let coords [ (list who xcor ycor) ] of turtles
report csv:to-string fput ["who" "x" "y"] coords
end
We then define the following tiny BehaviorSpace experiment, with only two repetitions and a time limit of two, using our positions reporter as an output:
The R code to process this is pleasantly straightforward:
library(tidyverse)
df <- read_csv("experiment-table.csv", skip = 6) %>%
mutate(positions = map(positions, read_csv)) %>%
unnest()
Which results in the following dataframe, all neat and tidy:
> df
# A tibble: 12 x 5
`[run number]` `[step]` who x y
<int> <int> <int> <dbl> <dbl>
1 1 0 0 16 10
2 1 0 1 10 -2
3 1 1 1 9.03 -2.24
4 1 1 0 -16.0 10.1
5 1 2 1 8.06 -2.48
6 1 2 0 -15.0 10.3
7 2 0 1 -14 1
8 2 0 0 13 15
9 2 1 0 14.0 15.1
10 2 1 1 -13.7 0.0489
11 2 2 0 15.0 15.1
12 2 2 1 -13.4 -0.902
The same thing in Julia:
using CSV, DataFrames
df = CSV.read("experiment-table.csv", header = 7)
cols = filter(col -> col != :positions, names(df))
df = by(df -> CSV.read(IOBuffer(df[:positions][1])), df, cols)

How to find intersection or subset of two CSV files

I have 2 CSV files containing two columns and a large number of rows. The first column is the id, and the second is the set of paired values.
e.g.:
CSV1:
1 {[1,2],[1,4],[5,6],[3,1]}
2 {[2,4] ,[6,3], [8,3]}
3 {[3,2], [5,2], [3,5]}
CSV2:
1 {[2,4] ,[6,3], [8,3]}
2 {[3,4] ,[3,3], [2,3]}
3 {[1,4],[5,6],[3,1],[5,5]}
Now I need to get a CSV file which contains either exact matching items or subset which belongs to both CSVs.
Here the result should be:
{[2,4] ,[6,3], [8,3]}
{[1,4],[5,6],[3,1]}
Can anyone suggest python code to do this?
As suggested by this answer you can use set.intersection to get the intersection of two sets, however this does not work with lists as items. Instead you can also use filter (comparable to this answer):
>>> l1 = [[1,2],[1,4],[5,6],[3,1]]
>>> l2 = [[1,4],[5,6],[3,1],[5,5]]
>>> filter(lambda q: q in l2, l1)
[[1, 4], [5, 6], [3, 1]]
In Python 3 you should convert it to list since there filter returns an iterable:
>>> list(filter(lambda x: x in l2,l1))
You can load CSV files (if they are really comma [or some other character] separated files) with csv.reader or pandas.read_csv for example.

R convert json to list to data.table

I have a data.table where one of the columns contains JSON. I am trying to extract the content so that each variable is a column.
library(jsonlite)
library(data.table)
df<-data.table(a=c('{"tag_id":"34","response_id":2}',
'{"tag_id":"4","response_id":1,"other":4}',
'{"tag_id":"34"}'),stringsAsFactors=F)
The desired result, that does not refer to the "other" variable:
tag_id response_id
1 "34" 2
2 "4" 1
3 "34" NA
I have tried several versions of:
parseLog <- function(x){
if (is.na(x))
e=c(tag_id=NA,response_id=NA)
else{
j=fromJSON(x)
e=c(tag_id=as.integer(j$tag_id),response_id=j$response_id)
}
e
}
that seems to work well to retrieve a list of vectors (or lists if c is replaced by list) but when I try to convert the list to data.table something doesn´t work as expected.
parsed<-lapply(df$a,parseLog)
rparsed<-do.call(rbind.data.frame,parsed)
colnames(rparsed)<-c("tag_id","response_id")
Because of the missing value in the third row. How can I solve it in a R-ish clean way? How can I make that my parse method returns an NA for the missing value. Alternative, Is there a parameter "fill" like there is for rbind that can be used in rbind.data.frame or analogous method?
The dataset I am using has 11M rows so performance is important.
Additionally, there is an equivalent method to rbind.data.frame to obtain a data.table. How would that be used? When I check the documentation it refers me to rbindlist but it complains the parameter is not used and if call directly(without do.call it complains about the type of parsed):
rparsed<-do.call(rbindlist,fill=T,parsed)
EDIT: The case I need to cover is more general, in a set of 11M records all the possible circumstances happen:
df<-data.table(a=c('{"tag_id":"34","response_id":2}',
'{"trash":"34","useless":2}',
'{"tag_id":"4","response_id":1,"other":4}',
NA,
'{"response_id":"34"}',
'{"tag_id":"34"}'),stringsAsFactors=F)
and the output should only contain tag_id and response_id columns.
There might be a simpler way but this seems to be working:
library(data.table)
library(jsonlite)
df[, json := sapply(a, fromJSON)][, rbindlist(lapply(json, data.frame), fill=TRUE)]
#or if you need all the columns :
#df[, json := sapply(a, fromJSON)][,
# c('tag_id', 'response_id') := rbindlist(lapply(json, data.frame), fill=TRUE)]
Output:
> df[, json := sapply(a, fromJSON)][, rbindlist(lapply(json, data.frame), fill=TRUE)]
tag_id response_id
1: 34 2
2: 4 1
3: 34 NA
EDIT:
This solution comes after the edit of the question with additional requests.
There are lots of ways to do this but I find the simplest one is at the creation of the data.frame like this:
df[, json := sapply(a, fromJSON)][,
rbindlist(lapply(json, function(x) data.frame(x)[-3]), fill=TRUE)]
# tag_id response_id
#1: 34 2
#2: 4 1
#3: 34 NA

Check whether procedure return list or list with sub list

I am facing problem how to check whether the list returned by the procedure consist of a single list or may have sub list inside.
#simple list
set a { 1 2 3 4}
# list consisting of sub list
set a { {1 2 3 4} {5 6 7 7} }
As above some times the variable a will have a list and sometime proc will return list consisting of sub list.
Update part
set a [mysqlsel $db "SELECT * FROM abc" -list]
I do not know weather query will return a single list or list consisting of sublist
You should really rethink your approach: since Tcl is typeless, you can't really tell if {{1 2 3 4} {5 6 7 8}} is a list of two lists or a list of two strings or a literal string {1 2 3 4} {5 6 7 8}, because all these propositions are true depending on how you make Tcl interpret this value.
Another thing, is that even if you were to try something like catch {lindex $element 0} or string is list $element on each top-level element to see if it can be interpreted as a list, that would qualify as being non-lists only strings that really can't be parsed as lists, like aaa { bbb. And string foo is also a proper list (of length 1, containing "foo" as its sole element).
One approach you can consider using is wrapping the returned value in another value which has some sort of "tag" attached to it--the trick routinely used in some other typeless languages like LISP and Erlang. That would look like this:
If you need to return 1 2 3 4, return {flat {1 2 3 4}} instead.
If you need to return {1 2 3 4} {5 6 7 8}, return {nested {{1 2 3 4} {1 2 3 4 5}}}.
Then in the client code do switch on the "tag" element and decapsulate the payload:
lassign [your_procedure ...] tag payload
switch -- $tag {
flat {
# do something with $payload
}
nested {
# do something with $payload
}
}