Parse CSV into key value pair - tcl

I have a csv file which has hostname and attached serial numbers. I want to create a key value pair with key being hostname and value being the list of serial numbers. The serial numbers can be one or many.
For example:
A, 1, 2, 3, 4
B, 5, 6
C, 7, 8, 9
D, 10
I need to access key A and get {1 2 3 4} as output, and if I access D I should get {10}.
How should I do this? The version of Tcl I am using doesn't ship with packages such as csv, and I can't install any because the script runs on a server, so I am looking for a solution that doesn't rely on packages.
For now, I split the file on \n and process each line: I split the line on "," to get the hostname and serial numbers in a list, then use index 0 of the list as the hostname and the remaining values as the serial numbers. Is there a cleaner solution?

I'd do something like:
#!/usr/bin/env tclsh
package require csv
package require struct::queue
set filename "file.csv"
set fh [open $filename r]
set q [struct::queue]
csv::read2queue $fh $q
close $fh
set data [dict create]
while {[$q size] > 0} {
set values [lassign [$q get] hostname]
dict set data $hostname [lmap elem $values {string trimleft $elem}]
}
dict for {key value} $data {
puts "$key => $value"
}
then
$ tclsh csv.tcl
A => 1 2 3 4
B => 5 6
C => 7 8 9
D => 10

The repeated recommendation given here is to use the csv package for this purpose. See also the answer by @glenn-jackman. If the package is unavailable, your time is better invested in getting it installed on your server.
To get you started, however, you might want to adopt something along the lines of:
set dat {
A, 1, 2, 3, 4
B, 5, 6
C, 7, 8, 9
D, 10
}
set d [dict create]
foreach row [split [string trim $dat] \n] {
set row [lassign [split $row ,] key]
dict set d [string trim $key] [concat {*}$row]
}
dict get $d A
dict get $d D
Be warned, however, such hand-knitted solutions typically only serve their purpose when you have full control of the data being processed and its representation. Again, time is better invested by obtaining the CSV package.
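To illustrate the warning, here is a minimal sketch of where comma splitting goes wrong. The proc name naiveFields and the sample rows are made up for this example; the second row assumes a CSV field that is quoted because it contains commas, which plain split knows nothing about.

```tcl
# Hypothetical helper: split on commas and trim each field,
# the same idea as the hand-written parser above.
proc naiveFields {row} {
    set fields {}
    foreach f [split $row ,] {
        lappend fields [string trim $f]
    }
    return $fields
}

set plain  {A, 1, 2, 3}
set quoted {"host,with,comma", 1, 2}

puts [naiveFields $plain]             ;# prints: A 1 2 3
puts [llength [naiveFields $quoted]]  ;# prints: 5, although the row has 3 CSV fields
```

As long as the data never contains quoted fields, embedded commas, or embedded newlines, the simple approach is adequate.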

I tried this way and got it working. Thanks again for your inputs. Yes, I know csv package would be easy but I cannot install it in server/product.
set multihost "host_slno.csv"
set fh1 [open $multihost r]
set data [read -nonewline $fh1]
close $fh1
set hostslnodata [ split $data "\n" ]
set hostslno [dict create];
foreach line $hostslnodata {
set line1 [join [split $line ", "] ]
puts "$line1"
if {[regexp {([A-Za-z0-9_\-]+)\s+(.*)} $line1 match hostname serial_numbers]} {
dict lappend hostslno $hostname $serial_numbers
}
}
puts [dict get $hostslno]

The source code of the csv package is available. If you are unable to install the full csv package, you can include the code from here:
http://core.tcl.tk/tcllib/artifact/2898cd911697ecdb
If you still can't use that option, then stripping out all the whitespace and splitting on "," is required.
An alternative to the earlier answers is using string map:
set row [split [string map {" " ""} $row ] ,]
The string map will remove all spaces, and then split on ","
Once you have converted the lines of text into valid tcl lists:
A 1 2 3 4
B 5 6
C 7 8 9
D 10
Then you can use the lindex and lrange commands to pluck off all the pieces.
foreach row $data {
set server [lindex $row 0]
set serial_numbers [lrange $row 1 end]
dict set ...
}

One possibility:
set hostslno [dict create]
set multihost "host_slno.csv"
set fh1 [open $multihost]
while {[gets $fh1 line] >= 0} {
set numbers [lassign [regexp -inline -all {[^\s,]+} $line] hostname]
dict set hostslno $hostname $numbers
}
close $fh1
puts [dict get $hostslno A]

Related

print dictionary keys and values in one column in tcl

I am new to the Tcl scripting language and am using Tcl version 8.5. I read a text file in a Tcl script and count how often each word occurs. I used a loop and a dictionary to count the words and their frequencies, but the program prints its output like this: alpha 4 beta 2 gamma 1 delta 1
I want to print each key/value pair of the dictionary on its own line instead. My script and its output follow.
set f [open input.txt]
set text [read $f]
foreach word [split $text] {
dict incr words $word
}
puts $words
Output of the above script:
alpha 4 beta 2 gamma 1 delta 1
You would do:
dict for {key value} $words {
puts "$key $value"
}
When reading the dict documentation, take care about which subcommands require a dictionaryVariable (like dict incr) and which require a dictionaryValue (like dict for)
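A small sketch of that distinction, using a made-up word list instead of a file: dict incr is given the variable name (words), while dict for and dict get are given the value ($words).

```tcl
set words [dict create]
foreach w {alpha beta alpha gamma alpha} {
    dict incr words $w          ;# variable NAME: the dict is modified in place
}

dict for {key value} $words {   ;# dictionary VALUE: note the $
    puts "$key $value"
}

puts [dict get $words alpha]    ;# dictionary VALUE again; prints 3
```

Passing $words to dict incr would make Tcl look for a variable whose name is the dictionary's contents, which is almost never what is intended.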
For nice formatting, as suggested by Donal, here's a very terse method:
set maxWid [tcl::mathfunc::max {*}[lmap w [dict keys $words] {string length $w}]]
dict for {word count} $words {puts [format "%-*s = %s" $maxWid $word $count]}
Or, look at the source code for the parray command for further inspiration:
parray tcl_platform ;# to load the proc
info body parray

Addition or Subtraction of a number from each element of a list using Tcl Script

I have an input file named "input.dat" with the following values:
7 0
9 9
0 2
2 1
3 4
4 6
5 7
5 6
And I want to add/subtract any number from column 2 by converting it into a list using Tcl Script. I have written the Tcl Script as follows:
set input [open "input.dat" r]
set data [read $input]
set values [list]
foreach line [split $data \n] {
if {$line eq ""} {break}
lappend values [lindex [split $line " "] 1]
}
puts "$values-2"
close $input
But the output is: 0 9 2 1 4 6 7 6-2
Can anybody tell me what the error in the script is and how to fix it? A corrected script would also be helpful.
I'm still not 100% sure what you want, but the options all seem to be solvable with the lmap command, which is for applying an operation to each element of a list.
Here's how to concatenate each element with -2:
set values [lmap val $values {
string cat $val "-2"
}]
Here's how to subtract 2 from each element:
set values [lmap val $values {
expr {$val - 2}
}]
puts treats "$values-2" as plain string substitution; to compute a value you have to call expr on each element, e.g. [expr {$val - 2}].
NOTE: If it doesn't work, a value may not be a clean number (this depends on how the values were read). In that case you can extract the integer with scan first:
scan $val %d tmp
set newval [expr {$tmp - 2}]
puts $newval
This parses an integer out of the string before the arithmetic is applied. You can similarly parse a float by using %f in scan instead of %d.
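Putting the pieces together, a minimal end-to-end sketch might look like this. The data is held in a string rather than read from input.dat so that the example stands alone, and lmap requires Tcl 8.6.

```tcl
# Stand-in for the contents of input.dat (first four rows only).
set data "7 0\n9 9\n0 2\n2 1\n"

# Collect column 2; string trim drops the trailing newline so no empty line appears.
set values {}
foreach line [split [string trim $data] \n] {
    lappend values [lindex $line 1]
}

# Subtract 2 from every element.
set shifted [lmap v $values {expr {$v - 2}}]
puts $shifted   ;# prints: -2 7 0 -1
```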

Converting Columns in a List in Tcl Script

I want to convert a column of a file into a list using a Tcl script. I have a file named "input.dat" with the data in two columns as follows:
7 0
9 9
0 2
2 1
3 4
And I want to convert the first column into a list and I wrote the Tcl Script as follows:
set input [open "input.dat" r]
set data [read $input]
set values [list]
foreach line [split $data \n] {
lappend values [lindex [split $line " "] 0]
}
puts "$values"
close $input
The result shows as: 7 9 0 2 3 {} {}
Now, my question is: what are these two extra "{}", what is the error in my script that produces them, and how can I solve this problem?
Can anybody help me?
Those empty braces indicate empty strings. The file you used most probably had a couple empty lines at the end.
You could avoid this situation by checking a line before lappending the first column to the list of values:
foreach line [split $data \n] {
# if the line is not equal to blank, then lappend it
if {$line ne ""} {
lappend values [lindex [split $line " "] 0]
}
}
You can also remove those empty strings after getting the result list, but it would mean you'll be having two loops. Still can be useful if you cannot help it.
For example, using lsearch to get all the values that are not blank (probably simplest in this situation):
set values [lsearch -all -inline -not $values ""]
Or lmap to achieve the same (a bit more complex IMO but gives more flexibility when you have more complex situations):
set values [lmap n $values {if {$n != ""} {set n}}]
The first {} is caused by the blank line after 3 4.
The second {} is caused by a blank line which indicates end of file.
If the last blank line is removed from the file, then there will be only one {}.
If the loop is then coded in the following way, then there will be no {}.
foreach line [split $data \n] {
if { $line eq "" } { break }
lappend values [lindex [split $line " "] 0]
}
@Jerry has a better solution.
Unless intermittent empty strings carry some meaning important to your program's task, you may also use a transformation from a Tcl list (with empty-string elements) to a string that prunes empty-string elements (at the ends, and in-between):
concat {*}[split $data "\n"]
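Note that this expression flattens every field of every line into one list, not just the first column. A sketch of how the first column could then be picked out, assuming each non-empty line holds exactly two fields (the sample data below is a shortened stand-in for input.dat with two trailing newlines):

```tcl
set data "7 0\n9 9\n0 2\n\n"

# Empty lines vanish because concat discards empty elements.
set flat [concat {*}[split $data "\n"]]
puts $flat      ;# prints: 7 0 9 9 0 2

# Walk the flat list two fields at a time and keep the first of each pair.
set values {}
foreach {first second} $flat {
    lappend values $first
}
puts $values    ;# prints: 7 9 0
```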

how to split a file to list of lists TCL

I'm coding in Tcl and I would like to split a file into two lists of lists.
The file contains:
(1,2) (3,4) (5,6)
(7,8) (9,10) (11,12)
and I would like to get two lists,
one for each line, each containing lists of two numbers,
for example:
puts $list1 #-> {1 2} {3 4} {5 6}
puts [lindex $list1 0] #-> 1 2
puts [lindex $list2 2] #-> 11 12
I tried to use regexp and split, but without success.
The idea of using regexp is good, but you'll need to do some post-processing on its output.
# This is what you'd read from a file
set inputdata "(1,2) (3,4) (5,6)\n(7,8) (9,10) (11,12)\n"
foreach line [split $inputdata "\n"] {
# Skip empty lines.
# (I often put a comment format in my data files too; this is where I'd handle it.)
if {$line eq ""} continue
# Parse the line.
set bits [regexp -all -inline {\(\s*(\d+)\s*,\s*(\d+)\s*\)} $line]
# Example results of regexp:
# (1,2) 1 2 (3,4) 3 4 (5,6) 5 6
# Post-process to build the lists you really want
set list([incr idx]) [lmap {- a b} $bits {list $a $b}]
}
Note that this is building up an array; long experience says that calling variables list1, list2, …, when you're building them in a loop is a bad idea, and that an array should be used, effectively giving variables like list(1), list(2), …, as that yields a much lower bug rate.
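One practical benefit of the array is that all rows can be processed in a loop without knowing in advance how many there are. A minimal sketch, which rebuilds the array with plain foreach so that it also runs on Tcl 8.5 (the simplified regexp here assumes no whitespace inside the parentheses):

```tcl
set inputdata "(1,2) (3,4) (5,6)\n(7,8) (9,10) (11,12)\n"
set idx 0
foreach line [split $inputdata "\n"] {
    if {$line eq ""} continue
    set row {}
    foreach {- a b} [regexp -all -inline {\((\d+),(\d+)\)} $line] {
        lappend row [list $a $b]
    }
    set list([incr idx]) $row
}

# Visit every row in numeric order, however many lines the input had.
foreach i [lsort -integer [array names list]] {
    puts "list($i) = $list($i)"
}
puts [array size list]   ;# prints: 2
```

With separate variables list1, list2, and so on, the same loop would need to construct variable names dynamically, which is where bugs tend to creep in.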
An alternate approach is to use a simpler regexp and then have scan parse the results. This can be more effective when the numbers aren't just digit strings.
foreach line [split $inputdata "\n"] {
if {$line eq ""} continue
set bits [regexp -all -inline {\([^()]+\)} $line]
set list([incr idx]) [lmap substr $bits {scan $substr "(%d,%d)"}]
}
If you're not using Tcl 8.6, you won't have lmap yet. In that case you'd do something like this instead:
foreach line [split $inputdata "\n"] {
if {$line eq ""} continue
set bits [regexp -all -inline {\(\s*(\d+)\s*,\s*(\d+)\s*\)} $line]
set list([incr idx]) {}
foreach {- a b} $bits {
lappend list($idx) [list $a $b]
}
}
# The scan-based variant, also without lmap:
foreach line [split $inputdata "\n"] {
if {$line eq ""} continue
set bits [regexp -all -inline {\([^()]+\)} $line]
set list([incr idx]) {}
foreach substr $bits {
lappend list($idx) [scan $substr "(%d,%d)"]
# In *very* old Tcl you'd need this:
# scan $substr "(%d,%d)" a b
# lappend list($idx) [list $a $b]
}
}
You have an answer already, but it can actually be done a little bit simpler (or at least without regexp, which is usually a good thing).
Like Donal, I'll assume this to be the text read from a file:
set lines "(1,2) (3,4) (5,6)\n(7,8) (9,10) (11,12)\n"
Clean it up a bit, removing the parentheses and any white space before and after the data:
% set lines [string map {( {} ) {}} [string trim $lines]]
1,2 3,4 5,6
7,8 9,10 11,12
One way to do it with good old-fashioned Tcl, resulting in a cluster of variables named lineN, where N is an integer 1, 2, 3...:
set idx 0
foreach lin [split $lines \n] {
set res {}
foreach li [split $lin] {
lappend res [split $li ,]
}
set line[incr idx] $res
}
A doubly iterative structure like this (a number of lines, each having a number of pairs of numbers separated by a single comma) is easy to process using one foreach within the other. The variable res is used for storing result lines as they are assembled. At the innermost level, the pairs are split and list-appended to the result. For each completed line, a variable is created to store the result: its name consists of the string "line" and an increasing index.
As Donal says, it's not a good idea to use clusters of variables. It's much better to collect them into an array (same code, except for how the result variable is named):
set idx 0
foreach lin [split $lines \n] {
set res {}
foreach li [split $lin] {
lappend res [split $li ,]
}
set line([incr idx]) $res
}
If you have the results in an array, you can use the parray utility command to list them in one fell swoop:
% parray line
line(1) = {1 2} {3 4} {5 6}
line(2) = {7 8} {9 10} {11 12}
(Note that this is printed output, not a function return value.)
You can get whole lines from this result:
% set line(1)
{1 2} {3 4} {5 6}
Or you can access pairs:
% lindex $line(1) 0
1 2
% lindex $line(2) 2
11 12
If you have the lmap command (or the replacement linked to below), you can simplify the solution somewhat (you don't need the res variable):
set idx 0
foreach lin [split $lines \n] {
set line([incr idx]) [lmap li [split $lin] {
split $li ,
}]
}
Still simpler is to let the result be a nested list:
set lineList [lmap lin [split $lines \n] {
lmap li [split $lin] {
split $li ,
}
}]
You can access parts of the result similar to above:
% lindex $lineList 0
{1 2} {3 4} {5 6}
% lindex $lineList 0 0
1 2
% lindex $lineList 1 2
11 12
Documentation:
array,
foreach,
incr,
lappend,
lindex,
lmap (for Tcl 8.5),
lmap,
parray,
set,
split,
string
The following code works on Windows.
The Tcl file is:
proc captureImage {} {
#open the image config file.
set configFile [open "C:/main/image_config.txt" r]
#To retrive the values from the config file.
while {![eof $configFile]} {
set part [split [gets $configFile] "="]
set props([string trimright [lindex $part 0]]) [string trimleft [lindex $part 1]]
}
close $configFile
set time [clock format [clock seconds] -format %Y%m%d_%H%M%S]
set date [clock format [clock seconds] -format %Y%m%d]
#create the folder with the current date
set folderPath $props(folderPath)
append folderDate $folderPath "" $date "/"
set FolderCreation [file mkdir $folderDate]
while {0} {
if { [file exists $date] == 1} {
}
break
}
#camera selection to capture image.
set camera "video"
append cctv $camera "=" $props(cctv)
#set the image resolution (XxY).
set resolutionX $props(resolutionX)
set resolutionY $props(resolutionY)
append resolution $resolutionX "x" $resolutionY
#set the name to the save image
set imagePrefix $props(imagePrefix)
set imageFormat $props(imageFormat)
append filename $folderDate "" $imagePrefix "_" $time "." $imageFormat
set logPrefix "Image_log"
append logFile $folderDate "" $logPrefix "" $date ".txt"
#ffmpeg command to capture image in background
exec ffmpeg -f dshow -benchmark -i $cctv -s $resolution $filename >& $logFile &
after 3000
}
captureImage
The text file (image_config.txt) is:
cctv=Integrated Webcam
resolutionX=1920
resolutionY=1080
imagePrefix=ImageCapture
imageFormat=jpg
folderPath=c:/test/
//camera=video=Integrated Webcam,Logitech HD Webcam C525
This code works for me, except for the line in the text file where a list of parameters is passed.

How to get selective data from a file in TCL?

I am trying to parse selected data from a file based on certain keywords using Tcl. For example, I have a file like this:
...
...
..
...
data_start
30 abc1 xyz
90 abc2 xyz
214 abc3 xyz
data_end
...
...
...
How do I capture only the 30, 90 and 214 between "data_start" and "data_end"? This is what I have so far (I'm a Tcl newbie):
proc get_data_value{ data_file } {
set lindex 0
set fp [open $data_file r]
set filecontent [read $fp]
while {[gets $filecontent line] >= 0} {
if { [string match "data_start" ]} {
#Capture only the first number?
#Use regex? or something else?
if { [string match "data_end" ] } {
break
} else {
##Do Nothing?
}
}
}
close $fp
}
If your file is small, you can use the read command to slurp the whole file into a variable and then apply regexp to extract the required information.
input.txt
data_start
30 abc1 xyz
90 abc2 xyz
214 abc3 xyz
data_end
data_start
130 abc1 xyz
190 abc2 xyz
1214 abc3 xyz
data_end
extractNumbers.tcl
set fp [open input.txt r]
set data [read $fp]
close $fp
set result [regexp -inline -all {data_start.*?\n(\d+).*?\n(\d+).*?\n(\d+).*?data_end} $data]
foreach {whole_match number1 number2 number3} $result {
puts "$number1, $number2, $number3"
}
Output :
30, 90, 214
130, 190, 1214
Update :
Reading a large file's content into a single variable may cause the program to crash, depending on the memory of your PC. When I tried to read an 890MB file with the read command on a Win7 laptop with 8GB RAM, I got an "unable to realloc 531631112 bytes" error message and tclsh crashed. After some benchmarking I found that it was able to read a file of 500,015,901 bytes, but the program then consumes 500MB of memory since it has to hold the data.
Holding that much data in one variable is also inefficient when it comes to extracting the information via regexp. In such cases it is better to read the content line by line.
Read more about this here.
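A minimal sketch of the line-by-line approach, which keeps only one line in memory at a time. The proc name extractBlocks and the demo file name are made up for this example; the sketch assumes the markers appear alone on their lines and that the wanted number is the first thing on each data line.

```tcl
# Hypothetical helper: returns one list of leading numbers
# per data_start/data_end block read from the channel.
proc extractBlocks {chan} {
    set blocks {}
    set current {}
    set inside 0
    while {[gets $chan line] >= 0} {
        set line [string trim $line]
        if {$line eq "data_start"} {
            set inside 1
            set current {}
        } elseif {$line eq "data_end"} {
            if {$inside} {lappend blocks $current}
            set inside 0
        } elseif {$inside && [regexp {^\d+} $line num]} {
            lappend current $num
        }
    }
    return $blocks
}

# Write a small demo file so the example is self-contained.
set path demo_input.txt
set fp [open $path w]
puts $fp "header\ndata_start\n30 abc1 xyz\n90 abc2 xyz\n214 abc3 xyz\ndata_end"
puts $fp "data_start\n130 abc1 xyz\n190 abc2 xyz\n1214 abc3 xyz\ndata_end"
close $fp

set fp [open $path r]
set blocks [extractBlocks $fp]
close $fp
file delete $path

foreach block $blocks {
    puts [join $block ", "]
}
```

For the demo file this prints the same two lines as the regexp version: 30, 90, 214 and 130, 190, 1214.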
Load all the data from the file into a variable. Set start and end tokens and seek to those positions. Process the item line by line. Tcl uses lists of strings separated by white space so we can process the items in the line with foreach {a b c} $line {...}.
tcl:
set data {...
...
..
...
data_start
30 abc1 xyz
90 abc2 xyz
214 abc3 xyz
data_end
...
...
...}
set i 0
set start_str "data_start"
set start_len [string length $start_str]
set end_str "data_end"
set end_len [string length $end_str]
while {[set start [string first $start_str $data $i]] != -1} {
set start [expr {$start + $start_len}]
set end [string first $end_str $data $start]
set end [expr {$end - 1}]
set item [string range $data $start $end]
set lines [split $item "\n"]
foreach {line} $lines {
foreach {a b c} $line {
puts "a=$a, b=$b, c=$c"
}
}
set i [expr {$end + $end_len}]
}
output:
a=30, b=abc1, c=xyz
a=90, b=abc2, c=xyz
a=214, b=abc3, c=xyz
I'd write that as
set fid [open $data_file]
set p 0
while {[gets $fid line] != -1} {
switch -regexp -- $line {
{^data_end} {set p 0}
{^data_start} {set p 1}
default {
if {$p && [regexp {^(\d+)\M} $line -> num]} {
lappend nums $num
}
}
}
}
close $fid
puts $nums
or, even
set nums [exec sed -rn {/data_start/,/data_end/ {/^([[:digit:]]+).*/ s//\1/p}} $data_file]
puts $nums
My favorite method would be to declare procs for each of the acceptable tokens and utilize the unknown mechanism to quietly ignore the unacceptable ones.
proc 30 args {
... handle 30 $args
}
proc 90 args {
... process 90 $args
}
rename unknown original_unknown
proc unknown args {
# This space was deliberately left blank
}
source datafile.txt
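# Remove the temporary handler first; rename fails if the target name still exists.
rename unknown {}
rename original_unknown unknown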
You'll be using Tcl's built-in parsing, which should be considerably faster. It also looks better in my opinion.
You can also put the line-handling logic into your unknown-procedure entirely:
rename unknown original_unknown
proc unknown {first args} {
process $first $args
}
source input.txt
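# Remove the temporary handler first; rename fails if the target name still exists.
rename unknown {}
rename original_unknown unknown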
Either way, the trick is that Tcl's own parser (implemented in C) will be breaking up the input lines into tokens for you -- so you don't have to implement the parsing in Tcl yourself.
This does not always work -- if, for example, the input is using multi-line syntax (without { and }) or if the tokens are separated with something other than white space. But in your case it should do nicely.
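For the data in the question, a runnable sketch of the second variant could look like the following. It uses eval on an in-memory string instead of source so that it needs no file, and the global variables nums and inside are names chosen for this example. Keep in mind that any line whose first word is a real Tcl command would be executed, so this is only suitable for input you trust.

```tcl
set data "...\n..\ndata_start\n30 abc1 xyz\n90 abc2 xyz\n214 abc3 xyz\ndata_end\n...\n"

set nums {}
set inside 0

rename unknown original_unknown
proc unknown {first args} {
    global nums inside
    # Every input line arrives here because its first word is not a command.
    switch -- $first {
        data_start {set inside 1}
        data_end   {set inside 0}
        default {
            if {$inside && [string is integer -strict $first]} {
                lappend nums $first
            }
        }
    }
    return
}

# Let Tcl's parser tokenize the data; remember any error until the
# original handler has been restored.
set code [catch {eval $data} err]
rename unknown {}
rename original_unknown unknown
if {$code} {error $err}

puts $nums   ;# prints: 30 90 214
```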