How to treat a Tcl string as a hex number and convert it to binary?

I have a string set in a Tcl variable. I want to treat it as hex and convert it to binary. Can anybody help me achieve this?
Here is what I am doing:
$ /usr/bin/tclsh8.5
% set a a1a2a3a4a5a6
a1a2a3a4a5a6
% set b [ string range $a 0 3 ]
a1a2
Now I want the a1a2 value of variable "b" to be treated as 0xa1a2, so that I can convert it to binary. Please help me solve this.

If you are using Tcl 8.6, then binary decode hex is the best choice:
binary decode hex $b
If you are using an older version of Tcl, you have to use binary format with the H format specifier:
binary format H* $b
You can write the resulting byte array to a file or send it through a socket, etc., but if you want to display it as text, I suggest converting it to a string first:
encoding convertfrom utf-8 [binary format H* $b]
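For example, here is a minimal sketch of the whole flow on Tcl 8.5, writing the decoded bytes to a file (out.bin is a hypothetical filename, not from the question):
set a a1a2a3a4a5a6
set b [string range $a 0 3]      ;# "a1a2"
set bytes [binary format H* $b]  ;# two bytes, 0xA1 0xA2
set f [open out.bin wb]          ;# "wb" opens the channel in binary mode (Tcl 8.5+)
puts -nonewline $f $bytes
close $f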

Related

Converting character to hex via Tcl

I am trying to convert some characters into hex using Tcl.
I would usually do something like this: [binary format a* 'o'], which returns 111, the int representation of 'o' that can then be converted.
However, the way that I retrieve the character, [value string_split], returns "o" instead of 'o', causing the function to throw an error; it is essentially like doing [binary format a* "o"], which returns "ERROR: Nothing is named "o"".
So, what is the difference between "o" and 'o' in a Tcl context, and how can I get my [binary format a* [value string_split]] call to return 111 like [binary format a* 'o'] would do?
It should be noted that I am using TheFoundry's Nuke to do this and I don't know exactly what version of Tcl they are using, but it is a rather old one.
You can use scan with a format of %c to get the Unicode codepoint value of a character, and then format to print it as hex:
#!/usr/bin/env tclsh
set o_str o
scan $o_str %c o_value
puts $o_value ;# 111
puts [format 0x%x $o_value] ;# 0x6f
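If you need this for more than one character, the same scan/format pair generalizes to a small helper (char2hex is a hypothetical name, not a built-in):
proc char2hex {s} {
    set out {}
    foreach ch [split $s ""] {
        scan $ch %c code                  ;# Unicode code point of the character
        lappend out [format 0x%02x $code]
    }
    return $out
}
puts [char2hex "ok"] ;# 0x6f 0x6b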

Convert binary numbers into hexadecimal using shell or Perl

I have a file which contains a binary number on each line.
I have to convert each binary number in the file to hexadecimal.
The file is like:
10101010101010101111
11010101010111110011
11011111110000000000
10010101111110010010
Convert these numbers into hexadecimal.
If there is a gvim command for this, that would also be useful.
Thank you.
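No answer was recorded for this one. As a sketch in Tcl (the language of the rest of this page) rather than the shell or Perl the asker wanted, and assuming the numbers live in a file named numbers.txt:
set in [open numbers.txt r]
while {[gets $in line] >= 0} {
    # the 0b prefix (Tcl 8.5+) makes expr parse the line as a binary integer literal
    puts [format %x [expr 0b$line]]
}
close $in
For the sample data above, this prints aaaaf, d55f3, dfc00 and 95f92.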

How to insert ASCII control characters into a TCL string?

I need to create a TCL script that contains ASCII control characters. The full list of these characters is in the ASCII table, but I am only interested in putting in the "start of text" value 2 and the "end of text" value 3.
You can enter a hex code in a string by writing \xnn where nn is the code, e.g.
set start_of_text "\x02"
set end_of_text "\x03"
See the documentation at https://www.tcl-lang.org/man/tcl8.6/TclCmd/Tcl.htm#M27
You can also use format with the %c code (which might be more useful if you don't know the relevant number until run-time because it's in a variable or whatever):
set ascii(STX) [format %c 2]
set ascii(ETX) [format %c 3]
If I'm going to be wrapping text in a control sequence (often for things like applying a colouring) then I'll make a procedure to do the job:
proc wrapped {string} {
    # These use Unicode escapes
    return "\u0002$string\u0003"
}
puts [wrapped "this is some test text"]
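As a quick sanity check (a hypothetical snippet, not part of the original answer), scan %c confirms the embedded control codes:
set framed [wrapped "this is some test text"]
scan [string index $framed 0] %c first
scan [string index $framed end] %c last
puts "$first $last" ;# prints: 2 3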

Decode or unescape \u00f0\u009f\u0091\u008d to 👍

We all know UTF-8 is hard. I exported my messages from Facebook, and the resulting JSON file escaped all non-ASCII characters to Unicode code points.
I am looking for an easy way to unescape these Unicode code points back to regular old UTF-8. I would also love to use PowerShell.
I tried
$str = "\u00f0\u009f\u0091\u008d"
[Regex]::Replace($str, "\\[Uu]([0-9A-Fa-f]{4})", `
{[char]::ToString([Convert]::ToInt32($args[0].Groups[1].Value, 16))} )
but that only gives me ð as a result, not 👍.
I also tried using Notepad++ and I found this SO post: How to convert escaped Unicode (e.g. \u0432\u0441\u0435) to UTF-8 chars (все) in Notepad++. The accepted answer also results in exactly the same as the example above: ð.
I found the decoding solution here: the UTF8.js library that decodes the text perfectly and you can try it out here (with \u00f0\u009f\u0091\u008d as input).
Is there a way in PowerShell to decode \u00f0\u009f\u0091\u008d to receive 👍? I'd love to have real UTF-8 in my exported Facebook messages so I can actually read them.
Bonus points for helping me understand what \u00f0\u009f\u0091\u008d actually represents (besides it being some UTF-8 hex representation). Why is it the same as U+1F44D or \uD83D\uDC4D in C++?
The Unicode code point of the 👍 character is U+1F44D.
Using the variable-length UTF-8 encoding, the following 4 bytes (expressed as hex numbers) are needed to represent this code point: F0 9F 91 8D.
While these bytes are recognizable in your string,
$str = "\u00f0\u009f\u0091\u008d"
they shouldn't be represented as \u escape codes, because they're not Unicode code units / code points; they're bytes.
With a 4-hex-digit escape sequence (UTF-16), the proper representation would require 2 16-bit Unicode code units, a so-called surrogate pair, which together represent the single non-BMP code point U+1F44D:
$str = "\uD83D\uDC4D"
If your JSON input used such proper Unicode escapes, PowerShell would process the string correctly; e.g.:
'{ "str": "\uD83D\uDC4D" }' | ConvertFrom-Json > out.txt
If you examine file out.txt, you'll see something like:
str
---
👍
(The output was sent to a file because console windows wouldn't render the 👍 char correctly, at least not without additional configuration; note that if you used PowerShell Core on Linux or macOS, terminal output would work.)
Therefore, the best solution would be to correct the problem at the source and use proper Unicode escapes (or even use the characters themselves, as long as the source supports any of the standard Unicode encodings).
If you really must parse the broken representation, try the following workaround (PSv4+), building on your own [regex]::Replace() technique:
$str = "A \u00f0\u009f\u0091\u008d for Mot\u00c3\u00b6rhead."
[regex]::replace($str, '(?:\\u[0-9a-f]{4})+', { param($m)
$utf8Bytes = (-split ($m.Value -replace '\\u([0-9a-f]{4})', '0x$1 ')).ForEach([byte])
[text.encoding]::utf8.GetString($utf8Bytes)
})
This should yield A 👍 for Motörhead.
The above translates sequences of \u... escapes into the byte values they represent and interprets the resulting byte array as UTF-8 text.
To save the decoded string to a UTF-8 file, use ... | Set-Content -Encoding utf8 out.txt
Alternatively, in PSv5+, as Dennis himself suggests, you can make Out-File, and therefore its virtual alias >, default to UTF-8 via PowerShell's global parameter-defaults hashtable:
$PSDefaultParameterValues['Out-File:Encoding'] = 'utf8'
Note, however, that on Windows PowerShell (as opposed to PowerShell Core) you'll get a UTF-8 file with a BOM in both cases; avoiding that requires direct use of the .NET framework: see Using PowerShell to write a file in UTF-8 without the BOM.
ISO-8859-1 is, very often, an intermediate step in operations with UTF-8:
$text=[regex]::Unescape("A \u00f0\u009f\u0091\u008d for Mot\u00c3\u00b6rhead.")
Write-Host "[regex]::Unescape(utf-8) = $text"
$encTo=[System.Text.Encoding]::GetEncoding('iso-8859-1') # Change it to yours (iso-8859-2, for example)
$bytes = $encTo.GetBytes($Text)
$text=[System.Text.Encoding]::UTF8.GetString($bytes)
Write-Host "utf8_DecodedFrom_8859_1 = $text"
[regex]::Unescape(utf-8) = A ð for MotÃ¶rhead.
utf8_DecodedFrom_8859_1 = A 👍 for Motörhead.
What is pleasing in mklement0's example: it is easy to produce an encoded string of this type.
What is bad: the line will be huge. (The first two nibbles, '00', are a waste.)
I must admit, the mklement0 example is charming.
The code for encoding is one line only:
$emoji='A 👍 for Motörhead.'
[Reflection.Assembly]::LoadWithPartialName("System.Web") | Out-Null
$str=(([System.Web.HttpUtility]::UrlEncode($emoji)) -replace '%','\u00') -replace '\+',' '
$str
You can decode this the standard URL way:
$str="A \u00f0\u009f\u0091\u008d for Mot\u00c3\u00b6rhead."
$str=$str -replace '\\u00','%'
[Reflection.Assembly]::LoadWithPartialName("System.Web") | Out-Null
[System.Web.HttpUtility]::UrlDecode($str)
A 👍 for Motörhead.

Python 2.7: writing string elements (characters) to a binary file

I am using Python 2.7 to access an API that returns JSON with a single key="ringtone_file" and an associated value that is an mp3 file encoded for transport via HTTP. I created a bogus mp3 file consisting of 256 bytes in order from 0x00 through 0xff and the returned file appears below.
{"ringtone_file":"\u0000\u0001\u0002\u0003\u0004\u0005\u0006\u0007\b\t\n\u000b\f\r\u000e\u000f\u0010\u0011\u0012\u0013\u0014\u0015\u0016\u0017\u0018\u0019\u001a\u001b\u001c\u001d\u001e\u001f !\"#$%&'()*+,-./0123456789:;<=>?#ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~ ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ"}
I accessed the API using the following code, minus exception handling code:
import requests
response = requests.get(url)
dict = response.json()
print dict
This yields the following output
{u'ringtone_file': u'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'}
What I want to do is write each character (hex value) of this string to a file in binary format. The result should be a file of size 256 bytes where the first byte has value 0 and the last byte has value 255. I can't change the API. Can someone suggest a reasonable way of accomplishing this with Python 2.7?
I attempted the obvious: opening a file for writing in binary mode and then writing the Unicode string to it. The error message from the codec indicates I can't write values between 128 and 255 inclusive.
Since the string value is Unicode, you have to encode the string to write it to a file. The latin1 codec directly maps to the first 256 Unicode characters, so use .encode('latin1') on the string.
Example:
>>> s=u'\x00\x01\x02\xfd\xfe\xff'
>>> s
u'\x00\x01\x02\xfd\xfe\xff' # Unicode string
>>> s.encode('latin1')
'\x00\x01\x02\xfd\xfe\xff' # Now a byte string.
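To finish the job from the question, a minimal sketch in this thread's Python 2.7 (ringtone.mp3 is a hypothetical output path; response is the requests response from the question's code):
data = response.json()['ringtone_file'].encode('latin1')  # unicode -> bytes, 1:1 for U+0000..U+00FF
with open('ringtone.mp3', 'wb') as f:  # binary mode, so the bytes are written verbatim
    f.write(data)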