Remove double quotes from a 'string with comma' inside csv

I'm converting XLS to CSV. Because a single column contains commas, I'm getting CSV like the below:
AMP FAN,Yes,Shichi,PON Seal,,"Brass, Silver"
AMP FAN,Yes,Shichi,PON Seal,,"Platinum, Gel"
As you can see, the last column is wrapped in double quotes because it contains a comma. I'm reading this CSV in a Tcl script and sending the values to my target system, where the value gets saved with the double quotes (exactly like "Brass, Silver"). The user doesn't want those double quotes, so I want to set the value as Brass, Silver. Is there any way to avoid the double quotes? Below is the script I'm currently using.
while {[gets $fileIn sLine] >= 0} {
#using regex to handle multiple commas in a single column
set matches [regexp -all -inline -- {("[^\"]+"|[^,]*)(?:$|,)} $sLine]
set lsLine {}
foreach {a b} $matches {lappend lsLine $b}
set sType [lindex $lsLine 0]
set sIsOk [lindex $lsLine 1]
set sMaterial [lindex $lsLine 5]
#later i'm setting sMaterial to some attribute
}
Kindly help me.
Note: I cannot use the csv package, as the user doesn't have it in their environment and I can't add it there myself.

You can remove them from the token after getting each element, like this:
while {[gets $fileIn sLine] >= 0} {
#using regex to handle multiple commas in a single column
set matches [regexp -all -inline -- {("[^\"]+"|[^,]*)(?:$|,)} $sLine]
set lsLine {}
foreach {a b} $matches {
# Remove the quotes here
lappend lsLine [string map {\" {}} $b]
}
set sType [lindex $lsLine 0]
set sIsOk [lindex $lsLine 1]
set sMaterial [lindex $lsLine 5]
#later i'm setting sMaterial to some attribute
}
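One caveat with string map is that it removes every double quote in the field, including any that belong to the value itself. A minimal sketch that strips only the enclosing quotes instead, using string trim; the proc name splitCsvLine is made up for illustration, and it assumes the same simple CSV layout as the question:

```tcl
# Hypothetical helper (not from the answer): split one CSV line with the
# same regexp, then strip only the leading/trailing double quotes.
proc splitCsvLine {sLine} {
    set lsLine {}
    foreach {whole field} [regexp -all -inline -- {("[^\"]+"|[^,]*)(?:$|,)} $sLine] {
        # string trim removes quote characters from both ends only
        lappend lsLine [string trim $field "\""]
    }
    return $lsLine
}

set fields [splitCsvLine {AMP FAN,Yes,Shichi,PON Seal,,"Brass, Silver"}]
puts [lindex $fields 5]
```

With the sample line this should leave sMaterial-style values such as Brass, Silver without the surrounding quotes, while the empty fifth column stays an empty string.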

% set input {AMP FAN,Yes,Shichi,PON Seal,,"Brass, Silver"}
AMP FAN,Yes,Shichi,PON Seal,,"Brass, Silver"
% regsub -all \" $input {}
AMP FAN,Yes,Shichi,PON Seal,,Brass, Silver
%

Related

Inserting single curly braces to Tcl list elements

I have a report file having multiple lines in this form:
str1 num1 num2 ... numN str2
N is not the same across lines. The numbers represent coordinates, so I need to enclose each point in curly braces, like this:
{num1 num2} {num3 num4} and so on...
I have tried this piece of code:
set file_r [open file.rpt r]
set lines [split [read $file_r] "\n"]
close $file_r
foreach line $lines {
set items [split $line]
set str1 [lindex $items 0]
set str2 [lindex $items [expr [llength $items] - 1]]
set box [lrange $items 1 [expr [llength $items] - 2]]
foreach coord $box {
set index [lsearch $box $coord]
set index_rem [expr $index % 2]
if {index_rem == 0} {
set box [lreplace $box $index $index "{$coord"]
} else {
set box [lreplace $box $index $index "$coord}"]
}
}
puts "box: $box"
}
This gives me a syntax error that a close-brace is missing. And if I try "\{$coord" the back-slash character gets typed in the $box.
Any ideas to overcome this?
There are a few things you could improve to have better and simpler Tcl style.
You usually don't need to use split to form a list from a line if the line is already space separated. Space separated strings can almost always be used directly in list commands.
The exceptions are when the string contains characters that are special in list syntax, such as {, }, " or \.
lindex and lrange can take end and end-N arguments.
This plus Donal's comment to use lmap will result in this:
set file_r [open file.rpt r]
set lines [split [read $file_r] "\n"]
close $file_r
foreach line $lines {
set str1 [lindex $line 0]
set str2 [lindex $line end]
set numbers [lrange $line 1 end-1]
set boxes [lmap {a b} $numbers {list $a $b}]
foreach box $boxes {
puts "box: {$box}"
}
}
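For report lines that might hit the exceptions mentioned above, a defensive variant could check the line first. This is only a sketch: the proc name words is hypothetical, and string is list needs Tcl 8.5 or later.

```tcl
# Hypothetical helper (not from the answer): return the whitespace-separated
# words of a line as a proper Tcl list, even when the raw line is not one.
proc words {line} {
    if {[string is list $line]} {
        # The string already parses as a list; use it directly.
        return [lrange $line 0 end]
    }
    # Unbalanced braces or quotes: fall back to plain word matching.
    return [regexp -all -inline {\S+} $line]
}

puts [llength [words "str1 1 2 3 4 str2"]]
puts [llength [words "str1 \{1 2 str2"]]
```

Note that the two branches differ for lines that are valid lists with braced groups: a {b c} d counts as three elements in the first branch, whereas plain word matching would see four words.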

Need to write specific columns in output file using tcl

I am trying to read a file with 5 columns( separated using space delimiter)
#text tag x y data_lay
bad bad1 10.0 10.0 L1
good goodn 13.0 11.0 L1
And trying to output the specific columns with a prefix on the first column in a new file. Output format should be like following
Add_obj bad 10.0 10.0 L1
Add_obj good 13.0 11.0 L1
I tried the following but have not been able to get the expected output. Here is a snippet of the code:
set fp [open [lindex $argv 0] r]
set colData {}
while {[gets $fp line]>=0} {
if {[llength $line] ==4 } {
set colData [split $line “ “]
puts “Add_obj [lindex $colData 0] [lindex $colData 2] [lindex $colData 3] [lindex $colData 4]”
}
}
close $fp
Could you please help with a sample code?
Thanks.
There's no need to split $line by a space. As long as $line can be used as a proper list, then you can use lindex on $line.
I think you want to print only when llength is 5 (not 4).
I noticed in your sample code that there are non-ascii double quotes “ and ”. You need to have regular double quotes ".
set fp [open a.txt]
while {[gets $fp line]>=0} {
if {[llength $line] == 5 } {
# Skip header?
if {[string match "#*" $line]} {
continue
}
puts "Add_obj [lindex $line 0] [lindex $line 2] [lindex $line 3] [lindex $line 4]"
}
}
close $fp
You might want to also print a formatted string, prepared with the format command.
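As a rough illustration of the format suggestion, the sketch below aligns the columns with fixed widths. The proc name formatRow and the chosen widths are assumptions, not part of the answer; lassign requires Tcl 8.5 or later.

```tcl
# Hypothetical helper: build one aligned output row from a 5-column line.
proc formatRow {line} {
    # Columns are: text tag x y data_lay; the tag column is not printed.
    lassign $line text tag x y layer
    return [format "Add_obj %-6s %6.1f %6.1f %s" $text $x $y $layer]
}

puts [formatRow {bad bad1 10.0 10.0 L1}]
puts [formatRow {good goodn 13.0 11.0 L1}]
```

Here %-6s left-justifies the name in six characters and %6.1f right-justifies each coordinate with one decimal place, so the rows line up regardless of name length (up to six characters).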

Setting String with comma into a single variable in CSV file using TCL

I'm having a Csv file like below:
Production,FALSE,Other Line,Release,UOF-919 BASE,3A001A11,9X999,PC,"Jap,Ind",006
And i'm reading and setting the values to a variable using tcl like below:
set fileIn [open "C:/myfile.csv" r]
while {[gets $fileIn sLine] >= 0} {
set lsLine [split $sLine ","]
set sType "Hardware"
set sName [lindex $lsLine 1]
set Sdev [lindex $lsLine 2]
set spara [lindex $lsLine 3]
set sDescription [lindex $lsLine 4]
set sManage [lindex $lsLine 5]
set sconnect [lindex $lsLine 6]
set sUOM [lindex $lsLine 7]
set sCountry [lindex $lsLine 8]
#my operations
}
flush $fileId
close $fileId
}
Here i'm not able to set "Jap,Ind" to sCountry because it already has one more comma inside the quotes. Can anybody help me to set that? I'm new in TCL.
You can use the csv package (it is part of tcllib, which has been bundled with many Tcl distributions for a while now):
set fileIn [open "C:/myfile.csv" r]
package require csv
while {[gets $fileIn sLine] >= 0} {
set lsLine [::csv::split $sLine] ;# I'd use the -alternate
# switch if you can have empty elements
set sType "Hardware"
set sName [lindex $lsLine 0]
set Sdev [lindex $lsLine 1]
set spara [lindex $lsLine 2]
set sDescription [lindex $lsLine 3]
set sManage [lindex $lsLine 4]
set sconnect [lindex $lsLine 5]
set sUOM [lindex $lsLine 6]
set sPin [lindex $lsLine 7]
set sCountry [lindex $lsLine 8]
#my operations
}
close $fileIn
Note that I also changed the indices. Tcl lists are 0-based, meaning that the first element of a list has the index 0. [lindex $lsLine 0] thus gives the first element from the list $lsLine.
And maybe if you want to make the code shorter, you could use lassign (as of Tcl 8.5)
set fileIn [open "C:/myfile.csv" r]
package require csv
while {[gets $fileIn sLine] >= 0} {
set lsLine [::csv::split $sLine]
set sType "Hardware"
lassign $lsLine sName Sdev spara sDescription sManage sconnect sUOM sPin sCountry
#my operations
}
close $fileIn
Alternate solution if csv is not available which works for most cases (Tcl 8.6):
set lsLine [lmap {a b} [regexp -all -inline -- {("[^\"]+"|[^,]*)(?:$|,)} $sLine] {set b}]
Tcl 8.5:
set matches [regexp -all -inline -- {("[^\"]+"|[^,]*)(?:$|,)} $sLine]
set lsLine {}
foreach {a b} $matches {lappend lsLine $b}
The \" can be replaced with a simple " but I usually insert it if there's an issue with the code editor's syntax highlighting.
For more complex cases where escaped characters with a backslash can be involved (Tcl 8.6):
set lsLine [lmap {a b} [regexp -all -inline -- {("(?:\\.|[^\"])+"|(?:\\.|[^,])*)(?:$|,)} $sLine] {set b}]
Tcl 8.5:
set matches [regexp -all -inline -- {("(?:\\.|[^\"])+"|(?:\\.|[^,])*)(?:$|,)} $sLine]
set lsLine {}
foreach {a b} $matches {lappend lsLine $b}
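If the script has to run both where csv is installed and where it is not, one option is to pick the parser at load time. The following is a sketch only: parseCsvLine is a hypothetical name, the fallback reuses the simple regexp above, and it additionally trims the enclosing quotes so that both branches return Jap,Ind for the sample line.

```tcl
# Hypothetical wrapper: prefer tcllib's csv package, otherwise fall back
# to the regexp-based split (Tcl 8.5 compatible).
if {[catch {package require csv}]} {
    proc parseCsvLine {sLine} {
        set lsLine {}
        foreach {whole field} [regexp -all -inline -- {("[^\"]+"|[^,]*)(?:$|,)} $sLine] {
            # Drop the enclosing quotes so the result resembles csv::split
            lappend lsLine [string trim $field "\""]
        }
        return $lsLine
    }
} else {
    proc parseCsvLine {sLine} {
        return [::csv::split $sLine]
    }
}

set lsLine [parseCsvLine {Production,FALSE,Other Line,Release,UOF-919 BASE,3A001A11,9X999,PC,"Jap,Ind",006}]
puts [lindex $lsLine 8]
```

The fallback is not a full CSV parser (for example, doubled quotes inside a quoted field are not handled), so it is best treated as a stopgap for simple files.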
Since you are using split to extract the values (your input is comma-separated, so that is the obvious approach), a field's value must not contain a comma. To get around that, we can temporarily replace the embedded comma and restore it where needed.
set fileIn [open "file.csv" r]
while {[gets $fileIn sLine] >= 0} {
# Replacing the 'comma' with 'colon' and saving it into the same variable
regsub {"(.*?),(.*?)"} $sLine {\1:\2} sLine
set lsLine [split $sLine ","]
# Your other indices can be processed and saved here
# Getting the country values
set sCountry [lindex $lsLine 8]; # Yes, the country value is available in '8th' index only, not on 9th. (Index starts with '0')
# Replacing the 'colon' with 'comma' back again
regsub : $sCountry , sCountry
puts "Country Value : $sCountry"
}
close $fileIn
package require csv
set fileIn [open C:/myfile.csv r]
while {[gets $fileIn sLine] >= 0} {
set lsLine [csv::split $sLine]
lassign $lsLine - sName Sdev spara sDescription sManage sConnect sUOM sCountry
}
close $fileIn
Never try to parse CSV data with split. It will end in tears.
Note that this assumes that you want to assign to sName from index 1, i.e. the second element. The first element is assigned to the dummy variable -.
For sparse assignment, you can either use (assuming you want #8 rather than #9)
lassign $lsLine - - Sdev - - sManage sConnect sUOM sCountry
or
foreach idx {2 5 6 7 8} name {Sdev sManage sConnect sUOM sCountry} {
set $name [lindex $lsLine $idx]
}
If you don't have csv installed, you can use teacup install csv from the command line to get it.
Documentation: close, csv package, gets, lassign, open, package, set, while

TCL String Manipulation and Extraction

I have a string xxxxxxx-s12345ab7_0_0_xx2.log and need to have an output like AB700_xx2 in TCL.
ab will be the delimiter and need to extract from ab to . (including ab) and also have to remove only the first two underscores.
Tried string trim, string trimleft and string trimright, but not much use. Is there anything like string split in TCL?
The first stage is to extract the basic relevant substring; the easiest way to do that is actually with a regular expression:
set inputString "xxxxxxx-s12345ab7_0_0_xx2.log"
if {![regexp {ab[^.]+} $inputString extracted]} {
error "didn't match!"
}
puts "got $extracted"
# ===> got ab7_0_0_xx2
Then, we want to get rid of those nasty underscores with string map:
set final [string map {"_" ""} $extracted]
puts "got $final"
# ===> got ab700xx2
Hmm, not quite what we wanted! We wanted to keep the last underscore and to up-case the first part.
set pieces [split $extracted "_"]
set final [string toupper [join [lrange $pieces 0 2] ""]]_[join [lrange $pieces 3 end] "_"]
puts "got $final"
# ===> got AB700_xx2
(The split command divides a string up into “records” by an optional record specifier — which defaults to any whitespace character — that we can then manipulate easily with list operations. The join command does the reverse, but here I'm using an empty record specifier on one half which makes everything be concatenated. I think you can guess what the string toupper and lrange commands do…)
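To make the split and join behaviour described above concrete, here is a small standalone sketch using the extracted substring from this answer; the last line is an extra illustration of the default whitespace splitting.

```tcl
# split turns the string into a list of "records" at each underscore
set pieces [split "ab7_0_0_xx2" "_"]
puts $pieces              ;# ab7 0 0 xx2

# join with an empty separator concatenates the elements
puts [join $pieces ""]    ;# ab700xx2

# join with another separator re-assembles them differently
puts [join $pieces "-"]   ;# ab7-0-0-xx2

# With no separator given, split breaks at every whitespace character,
# so two consecutive spaces produce an empty element in between.
puts [llength [split "a  b"]]   ;# 3
```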
set a "xxxxxxx-s12345ab7_0_0_xx2.log"
set a [split $a ""]
set trig 0
set extract ""
for {set i 0} {$i < [llength $a]} {incr i} {
if {"ab" eq "[lindex $a $i][lindex $a [expr $i+1]]"} {
set trig 1
}
if {$trig == 1} {
append extract [lindex $a $i]
}
}
set extract "[string toupper [join [lrange [split [lindex [split $extract .] 0] _] 0 end-1] ""]]_[lindex [split [lindex [split $extract .] 0] _] end]"
puts $extract
regexp alone does most of the work; string toupper then upper-cases the first part.
set string "xxxxxxx-s12345ab7_0_0_xx2.log"
regexp {(ab)(.*)_(.*)_(.*)_(.*)\.} $string -> s1 s2 s3 s4 s5
set rstring "[string toupper $s1$s2$s3$s4]_$s5"
puts $rstring

splitting input line with varying formats in tcl with

Good afternoon,
I am attempting to write a tcl script which given the input file
input hreadyin;
input wire htrans;
input wire [7:0] haddr;
output logic [31:0] hrdata;
output hreadyout;
will produce
hreadyin(hreadyin),
htrans(htrans),
haddr(haddr[7:0]),
hrdata(hrdata[31:0]),
hready(hreadyout)
In other words, the format is:
<input/output> <wire/logic optional> <width, optional> <paramName>;
with the number of whitespaces unrestricted between each of them.
I have no problem reading from the input file and was able to put each line in a $line element. Now I have been trying things like:
set param0 [split $line "input"]
set param1 [lindex $param0 1]
But since not all lines contain "input", I am unable to get the elements I want (the name and the width, if it exists).
Is there another command in tcl capable for doing this kind of parsing?
The regexp command is useful to find words separated by arbitrary whitespace:
while {[gets $fh line] != -1} {
# get all whitespace-separated words in the line, ignoring the semi-colon
set i [string first ";" $line]
set fields [regexp -inline -all {\S+} [string range $line 0 $i-1]]
switch -exact -- [llength $fields] {
2 - 3 {
set name [lindex $fields end]
puts [format "%s(%s)," $name $name]
}
4 {
lassign $fields - - width name
puts [format "%s(%s%s)," $name $name $width]
}
}
}
I think you should look at something like
# Compress all multiple spaces to single spaces
set compressedLine [regsub -all { +} $line " "]
set items [split [string range $compressedLine 0 end-1] " "]
switch [llength $items] {
2 {
# Handle case where neither wire/logic nor width is specified
set inputOutput [lindex $items 0]
set paramName [lindex $items 1]
.
.
.
}
4 {
# Handle case where both wire/logic and width are specified
set inputOutput [lindex $items 0]
set wireLogic [lindex $items 1]
set width [lindex $items 2]
set paramName [lindex $items 3]
.
.
.
}
default {
# Don't know how to handle other cases - add them in if you know
puts stderr "Can't handle $line"
}
}
I hope it's not legal to have exactly one of wire/logic and width specified - you'd need to work hard to determine which is which.
(Note the [string range...] fiddle to discard the semicolon at the end of the line)
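If lines with exactly one of wire/logic and width do occur, one way to tell them apart is to look at the shape of each token rather than the token count. This is a sketch under the assumption that a width always looks like [msb:lsb]; the proc name portMap is hypothetical.

```tcl
# Hypothetical helper: build "name(name[width])" for one declaration line.
proc portMap {line} {
    set i [string first ";" $line]
    if {$i < 0} {
        return ""   ;# no semicolon, not a declaration
    }
    # All whitespace-separated words before the semicolon
    set fields [regexp -inline -all {\S+} [string range $line 0 [expr {$i - 1}]]]
    set name [lindex $fields end]
    set width ""
    # Any middle token wrapped in square brackets is taken as the width
    foreach f [lrange $fields 1 end-1] {
        if {[string match {\[*\]} $f]} {
            set width $f
        }
    }
    return "$name\($name$width\)"
}

puts [portMap {input   wire [7:0]   haddr;}]
puts [portMap {output [31:0] hrdata;}]
puts [portMap {input hreadyin;}]
```

Because the width is recognised by its brackets, the wire/logic keyword can be present or absent without changing the result.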
Or if you can write up a regex that catches the right data, you can do this with this:
set data [open "file.txt" r]
set output [open "output.txt" w]
while {[gets $data line] != -1} {
regexp -- {(\[\d+:\d+\])?\s*(\w+);} $line - width params
puts $output "$params\($params$width\),"
}
close $data
close $output
This one will also print the comma you have inserted in your expected output, but will insert it in the last line as well so you get:
hreadyin(hreadyin),
htrans(htrans),
haddr(haddr[7:0]),
hrdata(hrdata[31:0]),
hready(hreadyout),
If you don't want that trailing comma and the file is not too large (apparently the limit is 2147483672 bytes for a list, which is what I'm going to use), you could collect the lines in a list and join them at the end:
set data [open "file.txt" r]
set output [open "output.txt" w]
set listing "" ;# Empty list
while {[gets $data line] != -1} {
regexp -- {(\[\d+:\d+\])?\s*(\w+);} $line - width params
lappend listing "$params\($params$width\)" ;# Appending to list instead
}
puts $output [join $listing ",\n"] ;# Join all in a single go
close $data
close $output
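One thing the loop above does not do is check whether regexp actually matched, so a blank or unrelated line would reuse the values from the previous line. A minimal sketch of a guarded version, working on a string instead of a file so it can be tried in isolation; the proc name portList is made up for illustration:

```tcl
# Hypothetical helper: turn declaration text into a comma-joined port list,
# skipping any line the regexp does not match.
proc portList {text} {
    set listing {}
    foreach line [split $text "\n"] {
        # regexp returns 1 only when the line matched
        if {[regexp -- {(\[\d+:\d+\])?\s*(\w+);} $line - width name]} {
            lappend listing "$name\($name$width\)"
        }
    }
    return [join $listing ",\n"]
}

puts [portList "input hreadyin;\n\ninput wire \[7:0\] haddr;\noutput hreadyout;"]
```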