lsort doesn't sort file names properly - tcl

I capture a list of pictures in a repository folder like this:
foreach image [lsort [glob -nocomplain -directory $image_path -type f *]] {
puts $image
}
All the images come back sorted because of the lsort, but a few images simply don't get sorted, and I haven't been able to figure out why.
The order returned from the folder is:
Repository/Unsorted/3.jpg
Repository/Unsorted/30.jpg
Repository/Unsorted/33.jpg
Repository/Unsorted/6.jpg
Repository/Unsorted/9.jpg
Expected:
Repository/Unsorted/3.jpg
Repository/Unsorted/6.jpg
Repository/Unsorted/9.jpg
Repository/Unsorted/30.jpg
Repository/Unsorted/33.jpg
Update:
When I use the -dictionary switch it returns the correct order. Can someone elaborate on why?
foreach image [lsort -dictionary [glob -nocomplain -directory $image_path -type f *]] {
puts $image
}

The lsort command has several ways in which it can decide the ordering of elements.
-ascii (the default mode for various reasons, slightly misnamed) just compares the two strings character by character, in order, using the numeric code of each pair of characters. It's the sort of thing you'd expect with the C strcmp()… if that was Unicode-aware.
-dictionary is the same… except that digit sequences in the string (so just the characters 0 through 9; not - so no negative numbers) are compared as numbers. This gives an ordering that feels a lot more like what you get in a dictionary; it was specifically made to produce a pleasing order of filenames in Tk's file selection dialogs.
And, for completeness, -integer and -real parse the strings as integers and floating-point numbers respectively and then sort those, and -command lets you provide your own ordering command (slower, but it gives you total control).
The only reason that Tcl knows in what order to sort values is because you tell it.
Thus, when comparing Repository/Unsorted/3.jpg and Repository/Unsorted/30.jpg, in -ascii mode the . (Unicode U+002E) comes before 0 (Unicode U+0030), so 3.jpg sorts first; 30.jpg and 33.jpg then sort before 6.jpg simply because the character 3 comes before the character 6. In -dictionary mode the 3 and the 30 digit sequences are parsed as integers and compared that way (because the non-numeric part before that was identical).
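The difference is easy to reproduce without touching the filesystem. Here is a minimal sketch that uses the five file names from the question as a literal list (the variable names are only for illustration):

```tcl
# File names from the question, deliberately shuffled.
set names {
    Repository/Unsorted/30.jpg
    Repository/Unsorted/9.jpg
    Repository/Unsorted/3.jpg
    Repository/Unsorted/33.jpg
    Repository/Unsorted/6.jpg
}

# Default (-ascii) mode: plain character-by-character comparison.
set asciiOrder [lsort $names]
# -dictionary mode: embedded digit runs are compared as numbers.
set dictOrder [lsort -dictionary $names]

puts "ascii:      $asciiOrder"
puts "dictionary: $dictOrder"
```

The first line lists 3.jpg, 30.jpg, 33.jpg, 6.jpg, 9.jpg (the order from the question); the second lists 3.jpg, 6.jpg, 9.jpg, 30.jpg, 33.jpg.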

Related

apparent inconsistency read/write variable

I'm learning about Tcl just now. I've seen just a bit of it, I see for instance to create a variable (and initialize it) you can do
set varname value
I am familiarizing myself with the fact that basically everything is a string, such as "value" above, but "varname" gets kind of a special treatment, I guess because of the "set" built-in function, so varname is not interpreted as a string but rather as a name.
I can later on access the value with $varname, and this is fine to me, it is used to specify varname is not to be considered as a string.
I'm now reading about lists and a couple commands make me a bit confused
set colors {"aqua" "maroon" "cyan"}
puts "list length is [llength $colors]"
lappend colors "purple"
So clearly "lappend" is another one of those functions, like set, that interprets its first argument as a name and not a string, but then why didn't they make llength work the same way (no need for $)?
I'm thinking that it's just a convention that, in general, when you "read" a variable you need the $ while you don't for "writing".
A different look at the question: what Tcl commands are appropriate for list literals?
It's valid to count the elements of a list literal:
llength {my dog has fleas}
But it doesn't make sense to append a new element to a literal:
lappend {my dog has fleas} and ticks
(That is actually valid Tcl, but it sets the odd variable ${my dog has fleas})
This is more sensible:
set mydog {my dog has fleas}
lappend mydog and ticks
Names are strings. Or rather a string is a name because it is used as a name. And $ in Tcl means “read this variable right now”, unlike in some other languages where it really means “here is a variable name”.
The $blah syntax for reading from a variable is convenient syntax that approximately stands in for doing [set blah] (with just one argument). For simple names, they become the same bytecode, but the $… form doesn't handle all the weird edge cases (usually with generated names) that the other one does. If a command (such as set, lappend, unset or incr) takes a variable name, it's because it is going to write to that variable and it will typically be documented to take a varName (variable name, of course) or something like that. Things that just read the value (e.g., llength or lindex) will take the value directly and not the name of a variable, and it is up to the caller to provide the value using whatever they want, perhaps $blah or [call something].
In particular, if you have:
proc ListRangeBy {from to {by 1}} {
set result {}
for {set x $from} {$x <= $to} {incr x $by} {
lappend result $x
}
return $result
}
then you can do:
llength [ListRangeBy 3 77 8]
and
set listVar [ListRangeBy 3 77 8]
llength $listVar
and get exactly the same value out of the llength. The llength doesn't need to know anything special about what is going on.
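To make the name-versus-value distinction concrete, here is a small sketch (all variable names are made up for illustration). It shows that $name and [set name] read the same value, and that the [set ...] form still works when the name is computed at run time or contains spaces.

```tcl
set colors {aqua maroon cyan}

# Reading: both forms hand the *value* to llength.
set n1 [llength $colors]
set n2 [llength [set colors]]

# Writing: lappend is given the *name*, so there is no $ here.
lappend colors purple
set n3 [llength $colors]

# A computed variable name: plain $ cannot say "the variable whose name
# is stored in varName", but one-argument set can.
set varName colors
set n4 [llength [set $varName]]

# A name containing spaces needs ${...} or [set {...}] to be read.
set {my dog has fleas} yes
set v1 ${my dog has fleas}
set v2 [set {my dog has fleas}]

puts "$n1 $n2 $n3 $n4 $v1 $v2"
```

This prints 3 3 4 4 yes yes.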

Set a variable to a filename using glob

I have a directory that contains another directory named ABC_<version number>
I'd like to set my path to whatever ABC_<version number> happens to be (in a modulefile)
How do I use glob in TCL to get the name of the directory I want and put it into a TCL variable?
Thanks!
The glob command expands wildcards, but produces a Tcl list of everything that matches, so you need to be a bit careful. What's more, the order of the list is “random” — it depends on the raw order of entries in the OS's directory structure, which isn't easily predicted in general — so you really need to decide what you want. Also, if you only want a single item of the list, you must use lindex (or lassign in a degenerate operation mode) to pick it out: otherwise your code will blow up when it encounters a user who puts special characters (space, or one of a small list of other ones) in a pathname. It pays to be safe from the beginning.
For example, if you want to only match a single element and error out otherwise, you should do this:
set thePaths [glob -directory $theDir -type d ABC_*]
if {[llength $thePaths] != 1} {
error "ambiguous match for ABC_* in $theDir"
}
set theDir [lindex $thePaths 0]
If instead you want to sort by the version number and pick the (presumably) newest, you can use lsort -dictionary. That's pretty magical internally (seriously; read the docs if you want to see what it really does), but does the right thing with all sane version number schemes.
set thePaths [glob -directory $theDir -type d ABC_*]
set theSortedPaths [lsort -dictionary -decreasing $thePaths]
set theDir [lindex $theSortedPaths 0]
You could theoretically make a custom sort by the actual date on the directories, but that's more complex and can sometimes surprise when you're doing system maintenance.
Notice the use of -type d in glob. That's a type filter, which is great in this case where you're explicitly only wanting to get directory names back. The other main useful option there (in general) is -type f to get only real files.
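To see why -dictionary matters for the version-number sort above, here is a sketch on a literal list standing in for the glob result (the directory names are invented for illustration):

```tcl
# Stand-in for: glob -directory $theDir -type d ABC_*
set thePaths {ABC_1.9 ABC_1.10 ABC_1.2}

# Plain lsort compares character by character, so 1.9 beats 1.10.
set newestAscii [lindex [lsort -decreasing $thePaths] 0]
# -dictionary compares the digit runs as numbers, so 1.10 beats 1.9.
set newestDict [lindex [lsort -dictionary -decreasing $thePaths] 0]

puts "ascii picks $newestAscii, dictionary picks $newestDict"
```

This prints: ascii picks ABC_1.9, dictionary picks ABC_1.10.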
Turns out the answer was:
set abc_path [glob -directory $env(RELDIR) ABC_*]
No need for quotes around the path. The -directory controls where you look.
Later in the modulefile
append-path PATH $abc_path
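One caveat with this one-liner: glob returns a list, so abc_path holds a list rather than a plain path. For a single match without special characters the two look identical, but a path containing a space shows the difference. Here is a sketch with a made-up path standing in for the glob result:

```tcl
# Stand-in for a glob result with exactly one match whose path has a space.
set matches [list "/opt/my tools/ABC_2.1"]

# Used directly, the list quoting leaks into the string.
set asList $matches
# lindex extracts the element itself.
set asPath [lindex $matches 0]

puts "list form: $asList"
puts "path form: $asPath"
```

The list form prints with braces around it ({/opt/my tools/ABC_2.1}) while the path form does not, which suggests wrapping the glob call in lindex ... 0 before passing the result to append-path.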

command to get filelist ordered by age (mtime) in Tcl

I'm trying to simply process a list of files in a directory using Tcl, but want to process them in age order (oldest mtime to newest). I expected some sort of argument in glob or lsort to sort by file mtime, but I don't see such option.
I am trying to avoid creating a custom function to do this
Is there an option which I am missing that will do this built-in?
None that I know of, but you could of course exec your system's file listing command with the appropriate options.
Taking the mtime is a fairly expensive operation, so applications that use it typically take shortcuts to avoid querying for it. Making it portable also adds overhead.
Anyway, it's easy to implement it:
set files [glob x*]
set fileAndMTime [lmap name $files {list $name [file mtime $name]}]
lmap item [lsort -integer -index 1 $fileAndMTime] {lindex $item 0}
The last line gives you a list of filenames, sorted in order of least mtime to greatest mtime (use -decreasing to reverse order, and note that the sort is stable).
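For interpreters without lmap, the same pair-up, sort, and strip approach can be written with foreach. This is only a sketch: filesByMTime is a hypothetical helper name, and adding -nocomplain and -type f is an assumption about what is wanted (no error on an empty match, regular files only).

```tcl
# Hypothetical helper: return the files matching pattern, oldest first.
proc filesByMTime {pattern} {
    # Pair every file name with its modification time.
    set pairs {}
    foreach name [glob -nocomplain -type f -- $pattern] {
        lappend pairs [list $name [file mtime $name]]
    }
    # Sort on the mtime (index 1), then keep only the names.
    set result {}
    foreach pair [lsort -integer -index 1 $pairs] {
        lappend result [lindex $pair 0]
    }
    return $result
}

foreach name [filesByMTime x*] {
    puts $name
}
```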
Documentation:
file,
glob,
lindex,
lmap (for Tcl 8.5),
lmap,
lsort

Split camelcase value with TCL

I have this TCL expression:
[string toupper [join [lrange [file split [value [topnode].file]] 1 1]]]
This retrieves companyName value from c:/companyName... and I need to split that value before the first capital letter into Company Name. Any ideas?
Thanks in advance.
That's rather more in one word than I would consider a good idea. It makes the whole thing quite opaque! Let's split it up.
Firstly, I would expect the base company name to be better retrieved with lindex from the split filename.
set companyName [lindex [file split [value [topnode].file]] 1]
Now, we need to process that to get the human-readable version out of it. Alas, that's going to be a bit difficult without knowing what's been done to it, but if we use as our example fooBarBoo_grill then we can see what we can do. First, we get the pieces with some regular expressions (this part might need tweaking if there are non-ASCII characters involved, or if certain critical characters need special treatment):
# set companyName "fooBarBoo_grill"
set pieces [regexp -all -inline {[a-z]+|[A-Z][a-z]*} $companyName]
# pieces = foo Bar Boo grill
Next, we need to capitalise. I'll assume you're using Tcl 8.6 and so have lmap as it is perfect for this task. The string totitle command has been around for a very long time.
set pieces [lmap word $pieces {string totitle $word}]
# pieces = Foo Bar Boo Grill
That list might need a bit more tweaking, or it might be OK as it is. An example of tweaking that might be necessary is if you've got an Irish name like O'Hanrahan, or if you need to insert a comma before and period after Inc.
Finally, we properly ought to set companyName [join $pieces] to get back a true string, but that doesn't have a noticeable effect with a list of words made purely out of letters. Also, more complex joins with regular expressions might be needed if you've done insertion of prefixing punctuation (the , Inc. case).
If I was doing this for real, I'd try to have the proper company name expressed directly elsewhere rather than relying on the filename. Much simpler to get right!
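Put together, the steps above might look like the following sketch. humanizeName is a hypothetical helper name, and the code assumes Tcl 8.6 (for lmap) and ASCII-only names:

```tcl
# Hypothetical helper: turn a camelCase name into space-separated
# title-cased words.
proc humanizeName {name} {
    # Split into runs of lowercase letters, or a capital followed by
    # lowercase letters; anything else (such as _) acts as a separator.
    set pieces [regexp -all -inline {[a-z]+|[A-Z][a-z]*} $name]
    # Capitalise the first letter of every piece.
    set pieces [lmap word $pieces {string totitle $word}]
    # Join the list back into an ordinary string.
    return [join $pieces]
}

puts [humanizeName companyName]
puts [humanizeName fooBarBoo_grill]
```

This prints Company Name and then Foo Bar Boo Grill.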
To begin with, try using
lindex [file split [value [topnode].file]] 1
The lrange command will return a list, which might cause problems with some directory names. The join command should be pointless if you don't use lrange, and string toupper removes the information you need to do the operation you want to do.
To split before uppercase letters, you can use repetitive matches of either (?:[a-z]+|[A-Z][a-z]+) (ASCII / English alphabet letters only) or (?:[[:lower:]]+|[[:upper:]][[:lower:]]+) (any Unicode letters).
% regexp -all -inline {(?:[a-z]+|[A-Z][a-z]+)} camelCaseWord
camel Case Word
Use string totitle to change the first letter of the first word to upper case.
Documentation:
file,
lindex,
regexp,
string,
Syntax of Tcl regular expressions

How to grep parameters inside square brackets?

Could you please help me with the following script?
It is a Tcl script which Synopsys IC Compiler II will source.
set_dont_use [get_lib_cells */*CKGT*0P*] -power
set_dont_use [get_lib_cells */*CKTT*0P*] -setup
May I know how to take only */*CKGT*0P* and */*CKTT*0P* and assign these to a variable.
Of course you can treat a Tcl script as something you search through; it's just a file with text in it after all.
Let's write a script to select the text out. It'll be a Tcl script, of course. For readability, I'm going to put the regular expression itself in a global variable; treat it like a constant. (In larger scripts, I find it helps a lot to give names to REs like this, as those names can be used to remind me of the purpose of the regular expression. I'll call it “RE” here.)
set f [open theScript.tcl]
# Even with 10 million lines, modern computers will chew through it rapidly
set lines [split [read $f] "\n"]
close $f
# This RE will match the sample lines you've told us about; it might need tuning
# for other inputs (and knowing what's best is part of the art of RE writing)
set RE {^set_dont_use \[get_lib_cells ([\w*/]+)\] -\w+$}
foreach line $lines {
if {[regexp $RE $line -> term]} {
# At this point, the part you want is assigned to $term
puts "FOUND: $term"
}
}
The key things in the RE above? It's in braces to reduce backslash-itis. Literal square brackets are backslashed. The bit in parentheses is the bit we're capturing into the term variable. [\w*/]+ matches a sequence of one or more characters from a set consisting of “standard word characters” plus * and /.
The use of regexp has -> as a funny name for a variable that is ignored. I could have called it dummy instead; it's going to have the whole matched string in it when the RE matches, but we already have that in $term as we're using a fully-anchored RE. But I like using -> as a mnemonic for “assign the submatches into these”. Also, the formal result of regexp is the number of times the RE matched; without the -all option, that's effectively a boolean that is true exactly when there was a match, which is useful. Very useful.
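To try the RE without a script file on disk, here is a self-contained sketch that feeds it the two sample lines from the question plus one unrelated line (the third line and the patterns variable are made up for illustration). It collects the captured terms into a list and also shows the counting behaviour of -all:

```tcl
set RE {^set_dont_use \[get_lib_cells ([\w*/]+)\] -\w+$}

# The two sample lines from the question, plus one that should not match.
set lines [list \
    {set_dont_use [get_lib_cells */*CKGT*0P*] -power} \
    {set_dont_use [get_lib_cells */*CKTT*0P*] -setup} \
    {puts hello}]

set patterns {}
foreach line $lines {
    # regexp returns 1 on a match and 0 otherwise; -> receives the whole
    # match (ignored here) and term receives the parenthesised part.
    if {[regexp $RE $line -> term]} {
        lappend patterns $term
    }
}
puts $patterns

# With -all (and no -inline), the result is the number of matches.
set count [regexp -all {o} "foo"]
puts $count
```

This prints */*CKGT*0P* */*CKTT*0P* on the first line and 2 on the second.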
To assign the output of any command <command> to a variable with a name <name>, use set <name> [<command>]:
> set hello_length [string length hello]
5
> puts "The length of 'hello' is $hello_length."
The length of 'hello' is 5.
In your case, maybe this is what you want? (I still don't quite understand the question, though.)
set ckgt_cells [get_lib_cells */*CKGT*0P*]
set cktt_cells [get_lib_cells */*CKTT*0P*]