When I try to print a tab character (ASCII hex 0x09) from Tcl, it gets converted to spaces (ASCII hex 0x20). How do I avoid this behavior and get the tab character itself in the terminal window?
% puts "aa\t\tbb"
aa bb
The space between aa and bb is wrongly filled with 0x20 characters rather than two 0x09 characters.
% puts "aa\u0009\u0009bb\n"
aa bb
Same result. The string itself does seem to contain 0x09, though:
% set a "aa\t\tbb"
aa bb
% string length $a
6
so this is only a puts behavior.
Background:
I want the tab character to be present so I can copy-paste a list of numbers from the terminal into a spreadsheet, where the numbers end up in different spreadsheet cells.
Tcl itself will be writing real tabs; the puts command doesn't translate tabs to anything else in any mode that I'm aware of, but it is up to the terminal to determine what to do with them. It seems that your terminal is replacing the tabs with spaces; this happens after the character has left Tcl's control, so there's not much Tcl can do about it.
You can verify that what I say is true by redirecting the output of your code to a file and looking at it directly with a tool like a text editor or od -c (if you've got a set of POSIX tools). The tab will be there. Tcl does not change behaviour significantly between writing to a file and writing to a terminal; that would be a horrible bug if it were true.
(It's technically possible that there is an output filter on the stdout channel that is doing the transform, but that's pretty unlikely! What's more, it'd be even more unlikely for it to be there without you knowing it.)
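The redirect test described above can be sketched outside Tcl as well; here is a minimal Python check (the file name tabs.txt is made up) showing that tab characters written to a file really do survive as 0x09 bytes:

```python
# Minimal cross-check (Python, file name tabs.txt is made up):
# write a string containing tabs, then inspect the raw bytes.
with open("tabs.txt", "w") as f:
    f.write("aa\t\tbb\n")

with open("tabs.txt", "rb") as f:
    data = f.read()

print(data)   # b'aa\t\tbb\n' -- the 0x09 bytes are really there
```

If the terminal collapses tabs but the file contains them, the translation is happening on the terminal side.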
Related
When I try to read a CSV file in Octave, the very first value in it is converted to zero. I tried both csvread and dlmread and I receive no errors. I am able to open the file in a plain text editor and I can see the correct value there. From what I can tell, there are no funny hidden characters, spacings, or similar in the CSV file. The files also contain only numbers. The only thing that I feel might be important is that I have five columns/groups that each have a different number of values in them.
I went through the commands' documentation on Octave Forge and I do not know what may be causing this. Does anyone have an idea what I can troubleshoot?
To try to illustrate the issue, if I try to load a file with the contents:
1.1,2.1,3.1,4.1,5.1
,2.2,3.2,4.2,5.2
,2.3,3.3,4.3,
,,3.4,4.4
,,3.5,
Command window will return:
0.0,2.1,3.1,4.1,5.1
,2.2,3.2,4.2,5.2
,2.3,3.3,4.3,
,,3.4,4.4
,,3.5,
(with additional trailing zeros after the decimal point).
Command syntaxes I'm using are:
dt = csvread("FileName.csv")
and
dt = dlmread("FileName.csv",",")
and they both return the same.
Your csv file contains a Byte Order Mark right before the first number. You can confirm this by opening the file in a hex editor; you will see the byte sequence EF BB BF before the numbers start.
This causes the first entry to be interpreted as a 'string', and since strings are parsed based on the numbers at the 'front' of the string sequence, this one is parsed as the number zero. (See also this answer for more details on how csv entries are parsed.)
In my text editor, if I start at the top left of the file and press the right arrow key once, I can tell that the cursor hasn't moved (meaning I've just gone over the invisible byte order mark, which takes no visible space). Pressing backspace at this point deletes the byte order mark and allows the csv to be read properly. Alternatively, you may have to fix your file in a hex editor, or find some other way to convert it to a plain ASCII file (or UTF-8 without the byte order mark).
Also, it may be worth checking how this file was produced; if you have any control in that process, perhaps you can find why this mark was placed in the first place and prevent it. E.g., if this was exported from Excel, you can choose plain 'csv' format instead of 'utf-8 csv'.
UPDATE
In fact, this issue seems to have already been submitted as a bug and fixed in the development branch of Octave. See #58813 :)
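If fixing the file by hand is impractical, the BOM can also be stripped programmatically; here is a minimal Python sketch (data.csv is a made-up file name, and the first write only simulates the problematic export):

```python
BOM = b"\xef\xbb\xbf"

# Simulate the problematic export: a UTF-8 BOM followed by the numbers.
with open("data.csv", "wb") as f:
    f.write(BOM + b"1.1,2.1,3.1\n")

# Read the raw bytes, strip the BOM if present, and rewrite the file.
with open("data.csv", "rb") as f:
    raw = f.read()
cleaned = raw[len(BOM):] if raw.startswith(BOM) else raw
with open("data.csv", "wb") as f:
    f.write(cleaned)

print(cleaned[:3])   # b'1.1' -- the BOM is gone
```

After this, csvread/dlmread should see the first value as a plain number.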
I am trying to port my first app I ever wrote from old Borland Pascal to FreePascal and run it in Linux unicode shell.
Unfortunately, the app uses CRT unit and writes non-standard ASCII graphical characters. So I tried to rewrite statements like these:
gotoxy(2,3); write(#204);
writeln('3. Intro');
to these:
gotoxy(2,3); write('╠');
write('3. Intro', #10);
Two notes:
I use unicode characters directly in code because I did not find out how to write unicode characters via their code.
I used write procedure instead of writeln to make sure that unix line endings will be produced.
But after replacing all non-standard ASCII characters and getting rid of all writeln statements, it became even worse.
Before changes:
After changes:
Why does it end up like this? What can I do better?
After some time here is an update what I found out.
1) I cannot port it
As user #dmsc rightly pointed out, CRT does not support UTF-8. He suggested a hack, but it did not work for me.
2) When you can't port it, emulate environment.
The graphical characters I needed were part of CP-437. There is a program called luit that is made to convert application output from the locale's encoding into UTF-8. Unfortunately, this did not work for me; it simply erased the characters:
# Via iconv, everything is OK:
$ printf "top right corner in CP437: \xbf \n" | iconv -f CP437 -t UTF-8
top right corner in CP437: ┐
# But not via luit, which simply omits the character:
$ luit -gr g2 -g2 'CP 437' printf "top right corner in CP437: \xbf \n"
top right corner in CP437:
So my solution is to run gnome-terminal, add and set Hebrew (IBM862) encoding (tutorial here) and enjoy your app!
The CRT unit does not currently work with UTF-8, as it assumes that each character on the screen is exactly one byte; see http://www.freepascal.org/docs-html-3.0.0/rtl/crt/index.html
But, simple applications can be made to work by "tricking" GotoXY to always do a full cursor positioning, by doing:
GotoXY(1,1);
GotoXY(x, y);
To replace all the strings in your source file, you can use recode, in a terminal type:
recode cp437..u8 < original.pas > fixed.pas
Then, you need to replace all the numeric characters (like your #204 example) with the equivalent UTF-8, you can use:
echo -e '\xCC' | recode cp437/..u8
The 'CC' is hexadecimal for 204, and as a result the character '╠' will be printed.
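For a one-off character, the same CP437-to-Unicode mapping that recode performs can be cross-checked with Python's built-in cp437 codec (shown here purely as a sanity check, not as part of the porting workflow):

```python
# CP437 code point 204 (0xCC) maps to the double box-drawing character.
ch = bytes([0xCC]).decode("cp437")
print(ch)   # ╠

# 0xBF is the top-right corner character from the luit example above.
print(bytes([0xBF]).decode("cp437"))   # ┐
```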
I am navigating a Java-based CLI menu on a remote machine with expect inside a bash script and I am trying to extract something from the output without leaving the expect session.
Expect command in my script is:
expect -c "
spawn ssh user@host
expect \"#\"
send \"java cli menu command here\r\"
expect \"java cli prompt\"
send \"java menu command\"
"
###I want to extract a specific string from the above output###
Expect output is:
Id Name
-------------------
abcd 12 John Smith
I want to extract abcd 12 from the above output into another expect variable for further use within the expect script. So that's the 3rd line, first field, using a double-space delimiter. The awk equivalent would be: awk -F'  ' 'NR==3 {print $1}'
The big issue is that the environment through which I am navigating with Expect is, as I stated above, a Java CLI based menu so I can't just use awk or anything else that would be available from a bash shell.
Getting out from the Java menu, processing the output and then getting in again is not an option as the login process lasts for 15 seconds so I need to remain inside and extract what I need from the output using expect internal commands only.
You can use a regexp in expect itself directly with the -re flag. Thanks to Donal for pointing out the single-quote and double-quote issues. I have given solutions using both approaches.
I have created a file with the content as follows,
Id Name
-------------------
abcd 12 John Smith
This is nothing but your java program's console output. I tested this on my system by simulating your program's output with cat; just replace the cat command with your program's commands. Simple. :)
Double Quotes :
#!/bin/bash
expect -c "
spawn ssh user@domain
expect \"password\"
send \"mypassword\r\"
expect {\\\$} { puts matched_literal_dollar_sign}
send \"cat input_file\r\"; # Replace this code with your java program commands
expect -re {-\r\n(.*?)\s\s}
set output \$expect_out(1,string)
#puts \$expect_out(1,string)
puts \"Result : \$output\"
"
Single Quotes :
#!/bin/bash
expect -c '
spawn ssh user@domain
expect "password"
send "mypasswordhere\r"
expect "\\\$" { puts matched_literal_dollar_sign}
send "cat input_file\r"; # Replace this code with your java program commands
expect -re {-\r\n(.*?)\s\s}
set output $expect_out(1,string)
#puts $expect_out(1,string)
puts "Result : $output"
'
As you can see, I have used {-\r\n(.*?)\s\s}. Here the braces prevent any variable substitution. In your output, the 2nd line is full of hyphens, then comes a newline, then your 3rd-line content. Let's decode the regex.
-\r\n matches a literal hyphen and a newline together. This consumes the last hyphen of the 2nd line plus the newline, which brings us to the 3rd line. Then .*? matches the required output (i.e. abcd 12) until it encounters a double space, which is matched by \s\s.
You might be wondering why I need the parentheses: they capture the sub-match patterns.
In general, expect saves the whole matched string in expect_out(0,string) and buffers all the matched/unmatched input in expect_out(buffer). Each sub-match is saved at the subsequent index: expect_out(1,string), expect_out(2,string), and so on.
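The behaviour of the capture can be reproduced in Python, which accepts the same regular expression; the sample text below assumes a double space between the two fields, as the question states:

```python
import re

# A literal hyphen + \r\n anchors the match at the end of the dashed
# 2nd line, then the lazy (.*?) grabs text up to the first double space.
text = "Id Name\r\n-------------------\r\nabcd 12  John Smith\r\n"
m = re.search(r"-\r\n(.*?)\s\s", text)
print(m.group(1))   # abcd 12
```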
As Donal pointed out, it is better to use single quote's approach since it looks less messy. :)
It is not required to escape the \r with the backslash in case of double quotes.
Update :
I have changed the regexp from -\r\n(\w+\s+\w+)\s\s to -\r\n(.*?)\s\s.
This matches your requirement: any number of characters and single spaces, up to the first occurrence of a double space in the output.
Now, let's come to your question. You mentioned that you tried -\r\n(\w+)\s\s. The problem here is with \w+: remember that \w+ will not match a space character, and your output contains single spaces before the double space.
The use of regexp will matter based on your requirements on the input string which is going to get matched. You can customize the regular expressions based on your needs.
Update version 2 :
What is the significance of .*? ? In regular expressions, * is a greedy operator and ? is our life saver here. Let us consider the string
Stackoverflow is already overflowing with number of users.
Now, see the effect of the regular expression .*flow as below.
.* matches any number of characters; more precisely, it matches the longest string possible while still allowing the pattern itself to match. So .* in the pattern matched the characters Stackoverflow is already over, and flow in the pattern matched the text flow in the string.
Now, to make the .* match only up to the first occurrence of the string flow, we add the ? to it. It makes the pattern behave in a non-greedy manner.
Now, coming back to your question: if we had used .*\s\s, it would match the whole line, since it tries to match as much as possible. This is the normal greedy behavior of regular expressions.
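Here is the greedy/non-greedy difference shown concretely (in Python, purely for illustration; the semantics are the same in Tcl's regexp engine):

```python
import re

s = "Stackoverflow is already overflowing with number of users."

# Greedy: .* grabs as much as possible, so the match runs to the LAST "flow".
print(re.search(r".*flow", s).group())    # Stackoverflow is already overflow

# Non-greedy: .*? stops at the FIRST "flow".
print(re.search(r".*?flow", s).group())   # Stackoverflow
```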
Update version 3:
Structure your code in the following way.
x=$(expect -c "
spawn ssh user@host
expect \"password\"
send \"password\r\"
expect {\\\$} { puts matched_literal_dollar_sign}
send \"cat input\r\"
expect -re {-\r\n(.*?)\s\s}
if {![info exists expect_out(1,string)]} {
puts \"Match did not happen :(\"
exit 1
}
set output \$expect_out(1,string)
#puts \$expect_out(1,string)
puts \"Result : \$output\"
")
y=$?
# $x now contains the output from the 'expect' command, and $y contains the
# exit status
echo $x
echo $y
If the flow happened properly, the exit code will be 0; otherwise it will be 1. This way, you can check the return value in the bash script.
Have a look here to learn about the info exists command.
I am actually generating an MS Excel file with the currencies, and if you look at the file I generated (tinyurl.com/currencytestxls), opening it in a text editor shows the correct symbol, but somehow MS Excel does not display the symbol. I am guessing there is some issue with the encoding. Any thoughts?
Here is my tcl code to generate the symbol:
set yen_val [format %c 165]
Firstly, this does produce a Yen symbol (I put the format string in double quotes here just for clarity of formatting):
format "%c" 165
You can then pass it around just fine. The problem is likely to come when you try to output it; when Tcl writes a string to the outside world (with the possible exception of the terminal on Windows, as that's tricky) it encodes that string into a definite byte sequence. The default encoding is the one reported by:
encoding system
But you can see what it is and change it for any channel (if you pass in the new name):
fconfigure $theChannel -encoding $theEncoding
For example, on my system (which uses UTF-8, which can handle any character):
% fconfigure stdout -encoding
utf-8
% puts [format %c 165]
¥
If you use an encoding that cannot represent a particular character, the replacement character for that encoding is used instead. For many encodings, that's a “?”. When you are sending data to another program (including to a web server or to a browser over the internet) it is vital that both sides agree on what the encoding of the data is. Sometimes this agreement is by convention (e.g., the system encoding), sometimes it is defined by the protocol (HTTP headers have this clearly defined), and sometimes this is done by explicitly transferred metadata (HTTP content).
If you're writing a CSV file to be ingested by Excel, use either the “unicode” or the “utf-8” encoding and make sure you put the byte-order mark in correctly. Tcl doesn't write BOMs automatically (because it's the wrong thing to do in some cases). To write a BOM, do this as the first thing when you start writing the file:
puts -nonewline $channel "\ufeff"
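As a cross-check of the BOM recipe, here is the same idea sketched in Python, whose utf-8-sig codec writes the byte-order mark automatically (out.csv is a hypothetical file name):

```python
# "utf-8-sig" prepends the UTF-8 BOM (EF BB BF), which is what Excel
# looks for when deciding how to decode a CSV file.
with open("out.csv", "w", encoding="utf-8-sig") as f:
    f.write("price\n\u00a5165\n")   # \u00a5 is the Yen sign

with open("out.csv", "rb") as f:
    raw = f.read()
print(raw[:3])   # b'\xef\xbb\xbf' -- the byte-order mark
```

Either way, the point is the same: the BOM must be the very first thing in the file, written exactly once.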
I have a set of unit tests in C. Their form is: test_<filename>.c and when compiled they are test_<filename>.
I am trying to have new *.c files show up when hg status is displayed, but have the binary files (test_<filename>) suppressed.
What I have now is:
src/project/test/.+/test_.+[^\.][^c]$
this works fine except for one case: when <filename> itself ends with a c (e.g., test_func, from test_func.c).
Then test_func is displayed with a status of '? test_func'
I am a moderate regex guy and have searched for a couple of weeks without finding a solution, which I assume will be easy once I see it.
This is a bit hairy, but it seems to work, using a negative lookbehind (that's the (?<!a)b part):
src/project/test/.+/test.+([^c]|(?<!\.)c)$
To expand the part I changed — that is, ([^c]|(?<!\.)c):
(
[^c] // the last character can be anything other than c
| // or, if it is c
(?<!\.)c // it cannot be preceded by .
)
The extra \ in the negative lookbehind ("c not preceded by .") is needed to escape the ., which otherwise means "any character".
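Python's re module uses the same negative-lookbehind syntax, so the tail of the pattern can be sanity-checked there (paths are shortened to just the file-name part for this sketch):

```python
import re

# Tail of the hgignore pattern: the last character is either not "c",
# or it is a "c" that is NOT preceded by a dot (i.e. not a ".c" file).
pat = re.compile(r"test_.+([^c]|(?<!\.)c)$")

print(bool(pat.search("test_func")))     # True  -- binary, gets ignored
print(bool(pat.search("test_main")))     # True  -- binary, gets ignored
print(bool(pat.search("test_func.c")))   # False -- source file stays visible
```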