How to manipulate expect out buffer - tcl

I have written a Tcl/Expect application which spawns minicom and
sends and receives data via the serial port. Sometimes the
application receives control characters along with the human-readable data.
A search on the internet tells me that the control characters are
VT terminal escape codes; however, I could not find a way
to filter them out.
I am attaching a sample of the expected buffer and the actual buffer.
Expected :-
Microsoft Windows [Version 6.2.9200]
(c) 2012 Microsoft Corporation. All rights reserved.
C:\Windows\system32>
Actual:-
[1;1H[37m[40m [2;1H [3;1H [4;1H [5;1H [6;1H [7;1H [8;1H [9;1H [10;1H [11;1H [12;1H [13;1H [14;1H [15;1H [16;1H [17;1H [18;1H [19;1H [20;1H [21;1H [22;1H [23;1H [24;1H [1;1HMicrosoft Windows [Version 6.2.9200][2;1H(c) 2012 Microsoft Corporation. All rights reserved.[4;1HC:Windowssystem32>

You don't want to filter them out when doing the initial match; they're useful for ensuring that you get exactly what you want (a prompt/banner). But when you're extracting the sense, you most certainly do want to filter them out first. Luckily, it's fairly easy with regsub magic:
regsub -all {\u001b\[[\d;]*[A-Za-z]} $expect_out(buffer) "" filtered
That replaces all escape sequences (ESC, i.e. \u001b, followed by an open bracket, any number of digits and semicolons, and a letter) in the Expect match buffer (pick something else if you have a better candidate) with the empty string, and then stores the result in the filtered variable.
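As a self-contained sketch (the sample buffer below is an invented stand-in for a real expect_out(buffer)), the same regsub strips the escape sequences:

```tcl
# Invented stand-in for expect_out(buffer): banner text with VT100/ANSI
# CSI sequences (cursor moves, colours) mixed in.
set buffer "\u001b\[1;1H\u001b\[37mMicrosoft Windows \[Version 6.2.9200\]\u001b\[2;1H(c) 2012 Microsoft"

# ESC, then '[', then any digits/semicolons, then the terminating letter.
regsub -all {\u001b\[[\d;]*[A-Za-z]} $buffer "" filtered

puts $filtered
# -> Microsoft Windows [Version 6.2.9200](c) 2012 Microsoft
```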

Related

octave plot() not recognizing fmt string

According to the documentation for plot(), I should be able to pass a format argument to control the style of the graph. However, Octave seems to be misinterpreting this as an incomplete property specification, rather than a format string:
$ octave-cli
GNU Octave, version 4.4.1
Copyright (C) 2018 John W. Eaton and others.
This is free software; see the source code for copying conditions.
There is ABSOLUTELY NO WARRANTY; not even for MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. For details, type 'warranty'.
Octave was configured for "x86_64-pc-linux-gnu".
Additional information about Octave is available at https://www.octave.org.
Please contribute if you find this software useful.
For more information, visit https://www.octave.org/get-involved.html
Read https://www.octave.org/bugs.html to learn how to submit bug reports.
For information about changes from previous versions, type 'news'.
1> t = linspace(0,5,1001);
2> plot(t, sin(pi*t), "g_;sin(πt)");
error: plot: properties must appear followed by a value
error: called from
__plt__ at line 90 column 15
plot at line 223 column 10
Am I doing something wrong or is this a bug?
Converting my comment into an answer as requested
From the documentation:
The fmt format argument can also be used to control the plot style. It is a string composed of four optional parts: "<linestyle><marker><color><;displayname;>"
where, valid linestyles are:
‘-’ Use solid lines (default).
‘--’ Use dashed lines.
‘:’ Use dotted lines.
‘-.’ Use dash-dotted lines.
Based on the above, you have two typos in the format string.
The first is that you used an 'underscore' (_) instead of a 'dash' (-) as your linestyle specifier.
The second is that you didn't 'enclose' the "displayname" in semicolons as the syntax expects; you only put the left semicolon but forgot the right one.
So the correct format string would be:
plot( t, sin( pi * t ), "g-;sin(πt);" );

In relative terms, how fast should TCL on Windows 10 be?

I have the latest TCL build from ActiveState installed on a desktop and laptop both running Windows 10. I'm new to TCL and a novice developer and my reason for learning TCL is to enhance my value on the F5 platform. I figured a good first step would be to stop the occasional work I do in VBScript and port that to TCL. Learning the language itself is coming along alright, but I'm worried my project isn't viable due to performance. My VBScripts absolutely destroy my TCL scripts in performance. I didn't expect that outcome, as my understanding was that TCL was so "fast" and that's why it was chosen by F5 for iRules etc.
So the question is, am I doing something wrong? Is the port for Windows just not quite there? Perhaps I misunderstood the way in which TCL is fast and it's not fast for file parsing applications?
My test application is a firewall log parser. Take a log with 6 million hits and find the unique src/dst/port/policy entries and count them, split up into accept and deny. Opening the file and reading the lines is fine: TCL processes 18k lines/second while VBScript does 11k. As soon as I do anything with the data, the tide turns. I need to break the four pieces of data noted above out of the line read and put them in an array. I've "split" the line and done a for-next to read and match each part of the line; that's the slowest. I've done a regexp with subvariables that extracts all four elements in a single pass, and that's much faster, but it's twice as slow as doing four regexps with a single variable each and then cleaning the excess data from the match away with trims. But even this method is four times slower than VBScript with ad-hoc splits/for-next matching and trims. On my desktop, I get 7k lines/second with TCL and 25k with VBScript.
Then there's the array, I assume because my 3-dimensional array isn't a real array that searching through 3x as many lines is slowing it down. I may try to break up the array so it's looking through a third of the data currently. But the truth is, by the time the script gets to the point where there's a couple hundred entries in the array, it's dropped from processing 7k lines/second to less than 2k. My VBscript drops from about 25k lines to 22k lines. And so I don't see much hope.
I guess what I'm looking for in an answer, for those with TCL experience and general programming experience, is TCL natively slower than VB and other scripts for what I'm doing? Is it the port for Windows that's slowing it down? What kind of applications is TCL "fast" at or good at? If I need to try a different kind of project than reading and manipulating data from files I'm open to that.
edited to add code examples as requested:
while { [gets $infile line] >= 0 } {
    # some other commands I'm cutting out for the sake of space; they don't contribute to slowness
    regexp {srcip=(.*)srcport.*dstip=(.*)dstport=(.*)dstint.*policyid=(.*)dstcount} $line -> srcip dstip dstport policyid
    # the above was unexpectedly slow. the fastest way to extract data I've found so far:
    regexp {srcip=(.*)srcport} $line srcip
    set srcip [string trim $srcip "cdiloprsty="]
    regexp {dstip=(.*)dstport} $line dstip
    set dstip [string trim $dstip "cdiloprsty="]
    regexp {dstport=(.*)dstint} $line dstport
    set dstport [string trim $dstport "cdiloprsty="]
    regexp {policyid=(.*)dstcount} $line a policyid
    set policyid [string trim $policyid "cdiloprsty="]
Here is the array search that really bogs down after a while:
set start [array startsearch uList]
while {[array anymore uList $start]} {
    incr f
    # "key" returns the NAME of the association and uList(key) the VALUE associated with that name
    set key [array nextelement uList $start]
    if {$uCheck == $uList($key)} {
        ## puts "$key CONDITION MET"
        set flag true
        adduList $uCheck $key $flag2
        set flag2 false
        break
    }
}
Your question is still a bit broad in scope.
F5 has published some comments on why they chose Tcl and how it is fast for their specific use cases. That situation is actually quite different from a log-parsing use case: they do all the heavy lifting in C code (via custom commands) and use Tcl mostly as a fast dispatcher and for a bit of flow control, and Tcl is really good at that compared to various other languages.
For things like log parsing, Tcl is often beaten in performance by languages like Python and Perl in simple benchmarks. There are a variety of reasons for that; here are some of them:
Tcl uses a different regexp style (DFA), which is more robust for nasty patterns but slower for simple patterns.
Tcl has a more abstract I/O layer than, for example, Python, and usually converts the input to Unicode, which adds some overhead unless you disable it (via fconfigure).
Tcl has proper multithreading instead of a global interpreter lock, which costs around 10-20% performance for single-threaded use cases.
So how to get your code fast(er)?
Try a more specific regular expression, those greedy .* patterns are bad for performance.
Try to use string commands instead of regexp, some string first commands followed by string range could be faster than a regexp for these simple patterns.
Use a different structure for that array, you probably want either a dict or some form of nested list.
Put your code inside a proc rather than at the top level of the script, and use local variables instead of globals so the bytecode compiler can do a better job.
If you want, use one thread for reading lines from file and multiple threads for extracting data, like a typical producer-consumer pattern.
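A sketch of the string-based extraction and a flat dict as counter, following the suggestions above (the field layout matches the asker's log format; parseLine, countHits and the sample line are invented for illustration):

```tcl
proc parseLine {line} {
    # Pull each value out with string first / string range instead of
    # greedy regexps; each field sits between "key=" and the next key.
    set fields {}
    foreach {key stop} {srcip= srcport dstip= dstport dstport= dstint policyid= dstcount} {
        set s [expr {[string first $key $line] + [string length $key]}]
        set e [string first $stop $line $s]
        lappend fields [string trim [string range $line $s [expr {$e - 1}]]]
    }
    return $fields
}

proc countHits {lines} {
    # One flat dict keyed by the src/dst/port/policy tuple: O(1) lookup
    # instead of scanning every array element for each line.
    set counts [dict create]
    foreach line $lines {
        dict incr counts [parseLine $line]
    }
    return $counts
}

set sample {srcip=10.0.0.1 srcport=1234 dstip=10.0.0.2 dstport=443 dstint=wan1 policyid=7 dstcountry=US}
puts [parseLine $sample]
# -> 10.0.0.1 10.0.0.2 443 7
```

There is no error handling here; a real parser should check for -1 from string first when a field is missing.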

MySQL on remote machine accessed via chromebook terminal returns nonsense unicode which persists after I leave MySQL

I am using the terminal in a chromebook to ssh into a remote server. When I run a MySQL (5.6) select query, sometimes one of the fields will return nonsense unicode (when the field should return an email address) and change the MySQL prompt from:
mysql>
to
└≤⎽─┌>
and whatever text I type is converted into weird unicode. The problem persists even after I exit MySQL.
One of the values in your database happened to have the sequence of bytes 0x1B, 0x28, 0x30 (ESC ( 0) in it. When you did the query, MySQL printed this byte sequence directly to your console. You can reproduce the effect from Python by typing:
>>> print('\x1B\x28\x30')
Consoles use control characters (in particular 0x1B, ESC) as a way to allow applications to control aspects of the console other than pure text, such as colours and cursor movements. This behaviour is inherited from the old dumb-terminal devices that they are pretending to be (which is why they are also known as terminal emulators), along with some weirder tricks that we probably don't need any more. One of those is to switch permanently between different character sets (considered encodings, now, but this long predates Unicode).
One of those alternative character sets is the DEC Special Graphics Character Set which it looks like you have here. In this character set the byte 0x6D, usually used in ASCII for m, comes out as the graphical character └.
You could in principle reset your terminal to normal ASCII by printing the byte sequence 0x1B, 0x28, 0x42 (ESC ( B), but this tends to be a pain to arrange when your console is displaying rubbish.
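For instance, from a shell you can blindly emit the charset-reset sequence (a sketch; the octal \033 is ESC):

```shell
# Emit ESC ( B to switch the G0 character set back to US-ASCII.
printf '\033(B'
```

If the terminal is too far gone, the `reset` command (from ncurses) reinitializes it entirely.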
There are potentially other ways your console can become confused; it's not, in general, safe to print arbitrary binary data to the console. There even used to be nastier things you could do with the console by faking keyboard input, which made this a security problem, but today it's just an annoyance factor.
However, one wouldn't normally expect to have any control codes in an e-mail address field. I suggest the application using the database should be doing some validation on the input it receives, and dropping or blocking all control codes (other than potentially newlines where necessary).
As a quick hack to clean this field for the specific case of the ESC character, you could do something like:
UPDATE things SET email=REPLACE(email, CHAR(0x1B), '');

What does "the composition of UNIX byte streams" mean?

In the opening page of the book of "Lisp In Small Pieces", there is a paragraph goes like this:
Based on the idea of "function", an idea that has matured over
several centuries of mathematical research, applicative languages are
omnipresent in computing; they appear in various forms, such as the
composition of Un*x byte streams, the extension language for the Emacs
editor, as well as other scripting languages.
Can anyone elaborate a bit on "the composition of Unix byte streams"? What does it mean, and how is it related to applicative/functional programming?
Thanks,
/bruin
My guess is that this is a reference to something like a pipe under linux.
cal | wc
The | symbol creates a pipe between two applications; a pipe is a feature provided by the kernel, so pipes work between any applications written against that kernel API.
In this example, cal is a utility that prints a calendar, and wc is a utility that counts lines, words and characters in the input you pass to it; here, that input is the output of cal. Piping makes things easier for you because it's more functional in style: you only care about what each application does, not about argument names or about allocating a temporary file to store the intermediate output.
Without pipes you would have to do something like
cal > temp.txt
wc temp.txt
rm temp.txt
to obtain pretty much the same information. This second solution can also cause problems: what if temp.txt already exists? What rationale should your script follow to pick a name for the temporary file? And what if another process modifies the file between the two calls to cal and wc?
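If you do need the temp-file variant, a safer sketch uses mktemp, which picks a unique name for you and sidesteps the collision questions above:

```shell
# mktemp creates a uniquely named temporary file and prints its path.
tmp=$(mktemp)
cal > "$tmp"
wc "$tmp"
rm -f "$tmp"
```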

Excel does not display currency symbol(for example ¥) generated in my tcl code

I am generating an MS Excel file containing currency values. If you look at the file I generated (tinyurl.com/currencytestxls), opening it in a text editor shows the correct symbol, but somehow MS Excel does not display it. I am guessing there is some issue with the encoding. Any thoughts?
Here is my tcl code to generate the symbol:
set yen_val [format %c 165]
Firstly, this does produce a Yen symbol (I put the format string in double quotes here just for clarity):
format "%c" 165
You can then pass it around just fine. The problem is likely to come when you try to output it; when Tcl writes a string to the outside world (with the possible exception of the terminal on Windows, as that's tricky) it encodes that string into a definite byte sequence. The default encoding is the one reported by:
encoding system
But you can see what it is and change it for any channel (if you pass in the new name):
fconfigure $theChannel -encoding $theEncoding
For example, on my system (which uses UTF-8, which can handle any character):
% fconfigure stdout -encoding
utf-8
% puts [format %c 165]
¥
If you use an encoding that cannot represent a particular character, the replacement character for that encoding is used instead. For many encodings, that's a “?”. When you are sending data to another program (including to a web server or to a browser over the internet) it is vital that both sides agree on what the encoding of the data is. Sometimes this agreement is by convention (e.g., the system encoding), sometimes it is defined by the protocol (HTTP headers have this clearly defined), and sometimes this is done by explicitly transferred metadata (HTTP content).
If you're writing a CSV file to be ingested by Excel, use either the “unicode” or the “utf-8” encoding and make sure you put the byte-order mark in correctly. Tcl doesn't write BOMs automatically (because it's the wrong thing to do in some cases). To write a BOM, do this as the first thing when you start writing the file:
puts -nonewline $channel "\ufeff"
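Putting the pieces together, a minimal sketch (the file name and CSV content are invented) that writes a UTF-8 CSV with a BOM and a Yen sign:

```tcl
# Write a small CSV that Excel should detect as UTF-8.
set f [open "prices.csv" w]
fconfigure $f -encoding utf-8
puts -nonewline $f "\ufeff"        ;# byte-order mark first, before any data
puts $f "item,price"
puts $f "widget,[format %c 165]1500"
close $f
```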