How to create a Tcl variable of type bytearray

I am using TCL 8.4.20.
So I have the following code:
set a [binary format H2 1]
set b [binary format H2 2]
set c [binary format H2 3]
set bytes $a
append bytes $a
append bytes $b
append bytes $c
puts $bytes
I set a breakpoint at the Tcl_PutsObjCmd() function in Tcl's C source code, and I see that its argument, $bytes, is of type string, while I expected it to be bytearray.
Question 1: Why is that? From the first assignment to the final append, "bytes" receives nothing but binary data.
The reason for this experiment is that we have a Tcl extension command written in C that expects its argument to be of byte array type: it checks that the value's typePtr is tclByteArrayType. My Tcl code currently fails on this command because the data passed to it is of type string, just as demonstrated above.
I googled around, and it seems the "right" way to make a byte array object is to have every byte ready first and then use a single "binary format" command to put them all together. But that would be a fairly big change to my current Tcl code.
Question 2: Given that I already have a Tcl variable whose data is all binary (created using "binary format" for each byte and put together using "append") but whose type is string, how can I change its internal type to "bytearray" through some Tcl maneuvering?

Technically, the internal type is not a guaranteed property. Everything is a string. The code may shimmer a type away whenever it feels like it, and code that depends on the internal type is usually very brittle or outright broken.
So your C code should call Tcl_GetByteArrayFromObj() instead of peeking at the argument's internals. That does the proper conversion if the object does not yet have a byteArray representation.
About your questions:
Why doesn't append of two byte arrays keep the byte array type?
It does, at least for 8.6, if you do it right and never trigger the creation of a string rep.
Running this in tkcon, the append turns the value into a string:
() 98 % set a [binary format H2 1]

() 99 % set b [binary format H2 1]

() 100 % ::tcl::unsupported::representation $a
value is a bytearray with a refcount of 2, object pointer at 0000000005665420, internal representation 000000000587B280:0000000005665240, string representation ""
() 101 % ::tcl::unsupported::representation $b
value is a bytearray with a refcount of 2, object pointer at 000000000564EEB0, internal representation 000000000587B4A0:00000000056590E0, string representation ""
() 102 % set x $a

() 103 % ::tcl::unsupported::representation $x
value is a bytearray with a refcount of 4, object pointer at 0000000005665420, internal representation 000000000587B280:0000000005665240, string representation ""
() 104 % append x $b

() 105 % ::tcl::unsupported::representation $x
value is a string with a refcount of 3, object pointer at 0000000005663F50, internal representation 0000000005896BA0:000000000564F030, string representation ""
This happens because the bytearray has had a string rep created (due to tkcon echoing the value). The append optimization only works for 'pure' bytearrays, i.e. bytearrays that do not have a string rep. This is similar to some optimizations for 'pure' lists.
So it works like this, with the result echo that triggers the shimmering suppressed:
() 106 % set b [binary format H2 1]; puts "pure"
pure
() 107 % set a [binary format H2 1]; puts "pure"
pure
() 108 % set x $a; puts "pure"
pure
() 109 % ::tcl::unsupported::representation $a
value is a bytearray with a refcount of 3, object pointer at 0000000005658780, internal representation 000000000587B320:0000000005658CF0, no string representation
() 110 % ::tcl::unsupported::representation $b
value is a bytearray with a refcount of 2, object pointer at 000000000564ED60, internal representation 000000000587B500:0000000005658750, no string representation
() 111 % ::tcl::unsupported::representation $x
value is a bytearray with a refcount of 3, object pointer at 0000000005658780, internal representation 000000000587B320:0000000005658CF0, no string representation
() 112 % append x $b; puts "pure"
pure
() 113 % ::tcl::unsupported::representation $x
value is a bytearray with a refcount of 2, object pointer at 0000000005658690, internal representation 00000000058A5C60:0000000005658960, no string representation
Note the "no string representation" part.
How to turn a string into a bytearray
Just do a binary format:
set x [binary format a* $x]

Related

How can I convert a bitstring to the binary form in Julia

I am using bitstring to perform an xor operation on the ith bit of a string:
string = bitstring(string ⊻ 1 <<i)
However the result will be a string, so I cannot continue with other i.
So I want to know how do I convert a bitstring (of the form “000000000000000000000001001”) to (0b1001)?
Thanks
You can use parse to create an integer from the string, and then use string (or bitstring) to go the other way. Examples:
julia> str = "000000000000000000000001001";
julia> x = parse(UInt, str; base=2) # parse as UInt from input in base 2
0x0000000000000009
julia> x == 0b1001
true
julia> string(x; base=2) # stringify in base 2
"1001"
julia> bitstring(x) # stringify as bits (64 bits since UInt64 is 64 bits)
"0000000000000000000000000000000000000000000000000000000000001001"
Don't use bitstring. You can do the math with either a BitVector or just a UInt; there is no reason to bring a String into it.
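A minimal sketch of that advice, written in C rather than Julia: keep the value as an unsigned integer, flip bit i with XOR, and only build a bit string when the value has to be displayed. The names toggle_bit and to_bitstring are made up for this illustration.

```c
#include <stdint.h>

/* Flip bit i (0 = least significant) of x. The result is still an integer,
 * so it can be passed straight into the next toggle. i must be in 0..63. */
uint64_t toggle_bit(uint64_t x, unsigned i)
{
    return x ^ (UINT64_C(1) << i);
}

/* Write the 64 bits of x into buf as '0'/'1' characters, most significant
 * bit first, followed by a terminating NUL. buf must hold at least 65 bytes. */
void to_bitstring(uint64_t x, char *buf)
{
    for (int bit = 63; bit >= 0; bit--) {
        *buf++ = ((x >> bit) & 1) ? '1' : '0';
    }
    *buf = '\0';
}
```

Because toggle_bit returns an integer, it can be called in a loop over several values of i without any string conversion in between; to_bitstring is only needed for output.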

Shouldn't TclInvalidateStringRep() reset length?

I have a doubt on the following code in TCL 8.6.8 source tclInt.h:
#define TclInvalidateStringRep(objPtr) \
    if (objPtr->bytes != NULL) { \
        if (objPtr->bytes != tclEmptyStringRep) { \
            ckfree((char *) objPtr->bytes); \
        } \
        objPtr->bytes = NULL; \
    }
This macro is called by Tcl_InvalidateStringRep() in tclObj.c.
My doubt is: why doesn't the Tcl_Obj's length get reset to zero?
Here is the relevant part of the definition of Tcl_Obj:
typedef struct Tcl_Obj {
    int refCount;       /* When 0 the object will be freed. */
    char *bytes;        /* This points to the first byte of the
                         * object's string representation. The array
                         * must be followed by a null byte (i.e., at
                         * offset length) but may also contain
                         * embedded null characters. The array's
                         * storage is allocated by ckalloc. NULL means
                         * the string rep is invalid and must be
                         * regenerated from the internal rep. Clients
                         * should use Tcl_GetStringFromObj or
                         * Tcl_GetString to get a pointer to the byte
                         * array as a readonly value. */
    int length;         /* The number of bytes at *bytes, not
                         * including the terminating null. */
So you can see that length is tightly coupled with bytes. When bytes is cleared, shouldn't length be reset as well?
My doubt comes from the following code in TclCreateLiteral(), in tclLiteral.c:
200 for (globalPtr=globalTablePtr->buckets[globalHash] ; globalPtr!=NULL;
201 globalPtr = globalPtr->nextPtr) {
202 objPtr = globalPtr->objPtr;
203 if ((globalPtr->nsPtr == nsPtr)
204 && (objPtr->length == length) && ((length == 0)
205 || ((objPtr->bytes[0] == bytes[0])
206 && (memcmp(objPtr->bytes, bytes, (unsigned) length) == 0)))) {
So when objPtr->length is non-zero and matches while objPtr->bytes is NULL, the check at line 204 passes and line 205 dereferences the NULL pointer, so the program crashes.
My product includes the Tcl source, and I found the above problem while tracing a program crash. I have put a workaround in our code, but I would like to confirm with the community whether this is indeed a vulnerability.
Your approach seems to be wrong somewhere.
Calling TclInvalidateStringRep is basically only allowed for objects with no references (refCount == 0) or with exactly one reference (so refCount <= 1), and then only if you are sure that this one reference is your own.
A shared Tcl object may switch its internal representation, but its string representation remains immutable. Otherwise you break the basic principles of Tcl (like EIAS, "everything is a string").
The simplest example that explains this:
set k 0x7f
dict set d $k test
expr {$k}; # ==> 127 (obj is integer now, but...)
puts $k; # ==> 0x7f (... still remains the string-representation)
puts [dict get $d $k]; # ==> test
# some code that fouls it up (despite the two references: var `k` and the key in dict `d`):
magic_happens_here $k; # string representation gets lost.
# and hereafter:
puts $k; # ==> 127 (representation is now 127, so...)
puts [dict get $d $k]; # ==> ERROR: key "127" not known in dictionary
As you can see, resetting or altering the string representation of a shared object is wrong by design.
Please avoid this in Tcl.
I've had a think about this, and while I believe that the code that is purging the representation is wrong to do so (since the object should in principle be shared and so shouldn't be observed to change) I certainly think that it is extremely difficult to actually prove that that can't happen. For sure, TclCreateLiteral in tclLiteral.c shouldn't blow up if it happens!
The fix I'm using is to make TclCreateLiteral use TclGetStringFromObj (the Tcl-internal macro-ized version of Tcl_GetStringFromObj) to get the bytes and length fields instead of using them directly, so that the correct constraints are preserved. This should make the string representation exist once more if it is removed. If the code continues to crash, the problem is your code that is calling TclInvalidateStringRep on a literal (and setting a type that can't have a string generated for it; Tcl has some of those, but that's because it never purges the original string from them).
Remember, a Tcl_Obj should only have its string rep purged when it becomes wrong, not just when it gains a non-string representation. The fact a value has been interpreted as an integer doesn't mean that it shouldn't be interpretable as a list (quite the reverse!) and if the internal representation is never updated to a different value (in-place modifications should only ever happen to unshared objects) it should never need to lose that string representation at all.
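The failure mode and the guard described above can be sketched with a stripped-down stand-in for Tcl_Obj. Everything here (MiniObj, mini_invalidate, mini_get_string, mini_matches) is hypothetical: it only models the bytes/length pair with an integer internal rep and is not Tcl's actual code.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for Tcl_Obj: a string rep that may be
 * invalid (bytes == NULL) plus an integer internal rep. */
typedef struct MiniObj {
    char *bytes;    /* string rep, or NULL when it must be regenerated */
    int length;     /* only meaningful while bytes != NULL */
    long value;     /* internal rep */
} MiniObj;

/* Models TclInvalidateStringRep: drops bytes but leaves length stale. */
void mini_invalidate(MiniObj *obj)
{
    free(obj->bytes);
    obj->bytes = NULL;
}

/* Models the role of Tcl_GetStringFromObj: regenerate the string rep on
 * demand, so callers never see a NULL pointer or a stale length. */
const char *mini_get_string(MiniObj *obj, int *lengthPtr)
{
    if (obj->bytes == NULL) {
        char tmp[32];
        int n = snprintf(tmp, sizeof tmp, "%ld", obj->value);
        obj->bytes = malloc((size_t) n + 1);
        if (obj->bytes == NULL) {
            abort();
        }
        memcpy(obj->bytes, tmp, (size_t) n + 1);
        obj->length = n;
    }
    *lengthPtr = obj->length;
    return obj->bytes;
}

/* Comparison in the style of the TclCreateLiteral loop, but going through
 * the accessor instead of reading obj->bytes and obj->length directly. */
int mini_matches(MiniObj *obj, const char *bytes, int length)
{
    int objLength;
    const char *objBytes = mini_get_string(obj, &objLength);
    return objLength == length
        && (length == 0 || memcmp(objBytes, bytes, (size_t) length) == 0);
}
```

In this model an object whose string rep was "0x7f" comes back as "127" after mini_invalidate, which is the same visible change as in the dict example: the lookup no longer crashes, but the value no longer matches its original spelling, so purging the string rep of a shared value is still a bug in the caller.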

How to losslessly convert a double to a string and back in Octave

When saving a double to a string there is some loss of precision. Even if you use a very large number of digits the conversion may not be reversible, i.e. if you convert a double x to a string sx and then convert back you will get a number x' which may not be bitwise equal to x. This may cause problems, for instance when checking for differences in a battery of tests. One possibility is to use a binary format (for instance the native binary format, or HDF5), but I want to store the number in a text file, so I need a conversion to a string. I have a working solution, but I would like to know whether there is a standard for this or a better solution.
In C/C++ you could access the bytes of the double through a char* pointer and then convert each byte to two hexadecimal digits with printf("%02x",c[j]). Then, for instance, pi would be converted to a string of length 16: 54442d18400921fb. The problem with this is that reading the hexadecimal gives you no idea of which number it is. So I would be interested in some mix, for instance pi -> 3.14{54442d18400921fb}. The first part is a (probably low precision) decimal representation of the number (typically I would use a "%g" output conversion) and the string in braces is the lossless hexadecimal representation.
EDIT: I have posted the code as an answer.
Following the ideas already suggested in the post, I wrote the following functions, which seem to work.
function s = dbl2str(d);
  z = typecast(d,"uint32");
  s = sprintf("%.3g{%08x%08x}\n",d,z);
endfunction

function d = str2dbl(s);
  k1 = index(s,"{");
  k2 = index(s,"}");
  ## Check that there is a balanced {} or not at all
  assert((k1==0) == (k2==0));
  if k1>0; assert(k2>k1); endif
  if (k1==0);
    ## If there is not {hexa} part convert with loss
    d = str2double(s);
  else
    ## Convert lossless
    ss = substr(s,k1+1,k2-k1-1);
    z = uint32(sscanf(ss,"%8x",2));
    d = typecast(z,"double");
  endif
endfunction
Then I have
>> spi=dbl2str(pi)
spi = 3.14{54442d18400921fb}
>> pi2 = str2dbl(spi)
pi2 = 3.1416
>> pi2-pi
ans = 0
>> snan = dbl2str(NaN)
snan = NaN{000000007ff80000}
>> nan1 = str2dbl(snan)
nan1 = NaN
A further improvement would be to use another type of encoding, for instance Base64 (as suggested by @CrisLuengo in a comment), which would reduce the length of the binary part from 16 to 11 bytes.
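Since the question mentions C, here is a rough C counterpart of the same round trip, offered as a sketch rather than a reference implementation. It assumes a platform where double is a 64-bit IEEE-754 value; the function names simply mirror the Octave ones. Unlike the Octave version, it prints the bit pattern as one 64-bit number, so the hex digits appear most significant first (for pi: 3.14{400921fb54442d18}).

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Format d as "<approx>{<16 hex digits>}". The hex part is the raw bit
 * pattern of the double, assumed here to be 64 bits wide. Returns the
 * snprintf result (the number of characters needed). */
int dbl2str(double d, char *buf, size_t size)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);   /* copy the bits without aliasing issues */
    return snprintf(buf, size, "%.3g{%016" PRIx64 "}", d, bits);
}

/* Parse a string produced by dbl2str. With a {hex} part the conversion is
 * lossless; without one it falls back to strtod, which may lose bits.
 * Validation is minimal: returns -1 unless exactly 16 characters were
 * consumed as hex and are followed by '}', otherwise 0. */
int str2dbl(const char *s, double *out)
{
    const char *open = strchr(s, '{');
    if (open == NULL) {
        *out = strtod(s, NULL);
        return 0;
    }
    char *end;
    uint64_t bits = strtoull(open + 1, &end, 16);
    if (end != open + 17 || *end != '}') {
        return -1;
    }
    memcpy(out, &bits, sizeof *out);
    return 0;
}
```

Comparing the result with memcmp rather than == is the safer check in tests, because it also distinguishes NaN payloads and the sign of zero.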

Tcl API to get type of Tcl_obj*

Is there any Tcl API that tells the type of a Tcl_Obj, i.e. whether it is a string, an int or a list?
I want to check whether it is a list so that I can put {} around it.
You're not supposed to do that. Tcl regards types as things that it is allowed to freely change behind the scenes, and makes no guarantee at all that what it is doing is what you expect.
That said, in Tcl 8.6 you can use tcl::unsupported::representation to get a description of a value that includes the current type.
% tcl::unsupported::representation [expr 1]
value is a int with a refcount of 1, object pointer at 0x100870c10, internal representation 0x1:0x0, no string representation
% tcl::unsupported::representation [expr 123456789123456789123456789]
value is a bignum with a refcount of 1, object pointer at 0x100871a20, internal representation 0x100882b90:0x20004, no string representation
% tcl::unsupported::representation [expr 1.5]
value is a double with a refcount of 1, object pointer at 0x1008713f0, internal representation 0x3ff8000000000000:0x100871420, no string representation
% tcl::unsupported::representation abc
value is a pure string with a refcount of 3, object pointer at 0x100874b10, string representation "abc"
% tcl::unsupported::representation [list a b c]
value is a list with a refcount of 3, object pointer at 0x100870e80, internal representation 0x100902f50:0x0, string representation "a b c"
% tcl::unsupported::representation [dict create a b c d]
value is a dict with a refcount of 3, object pointer at 0x1008717e0, internal representation 0x1008d4f10:0x0, string representation "a b c d"
% tcl::unsupported::representation [set s abc;string length $s;set s]
value is a string with a refcount of 5, object pointer at 0x100874b10, internal representation 0x1008a2ed0:0x1008710f0, string representation "abc"
% tcl::unsupported::representation {}
value is a bytecode with a refcount of 19, object pointer at 0x100870b50, internal representation 0x1008d2510:0x0, string representation ""
(That last one surprised me.)
% tcl::unsupported::representation [set c stdin;gets $c;set c]
value is a channel with a refcount of 5, object pointer at 0x100871810, internal representation 0x1008b9a10:0x100829a10, string representation "stdin"
At the C level, the types are in the nullable typePtr field of the Tcl_Obj structure. (The “pure string” example above has a null typePtr.) The typePtr points to a static Tcl_ObjType structure, which in turn has a name field that should have a human readable type name in it. The types themselves are not normally exposed to third-party code, though they might be possible to look up using Tcl_GetObjType(). Not all types are registered, by policy.
You should not make your code behave differently according to what type you receive. That is not the Tcl Way Of Doing Things. We really do mean this.

How to force expr to address a value as a string and not a number?

When TCL gets a string that starts with a 0 as its return value, it'll treat it as an octal number and will return the decimal value of the octal number. Is there a way to circumvent it and force expr to address the value as a string?
I encounter this problem because I have a line:
set val [expr {( $obj == "" ) ? "" : [$obj data]}]
And one of the results of the [$obj data] operation is a binary string starting with 0, and the expr turns it into another number. Is there a way to fix this without turning the expr into an if?
The expr command is defined to convert its result to a number if it is legal to do so. It's been this way since… well, since at least Tcl 7.0 and probably since the first version of Tcl to have an expr command (which takes it a hugely long way back). This means that if you return a valid octal number (which 09 isn't), expr will convert it.
If this behaviour isn't desired, don't use expr for conditionals; use if. In your case, this works quite nicely (and I think it's clearer this time with the then and else pseudo-keywords).
set val [if {$obj == ""} then {} else {$obj data}]
(At the bytecode level, this generates almost the identical bytecode to what your original does, except it omits a call to the tryCvtToNumeric operation; that's the one you say you don't want!)
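The "convert the result to a number if it is legal to do so" rule can be modelled roughly in C. This is only an analogy, not Tcl's implementation: it assumes C's strtol with base 0 (leading 0 means octal, 0x means hex) as a stand-in for the integer parsing described above and ignores floating point and other numeric forms. The name numeric_or_string is made up.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical model: if all of s parses as an integer (base 0, so a
 * leading 0 selects octal and 0x selects hex), write its decimal form to
 * out; otherwise copy s through unchanged. */
void numeric_or_string(const char *s, char *out, size_t size)
{
    char *end;
    errno = 0;
    long v = strtol(s, &end, 0);
    if (end != s && *end == '\0' && errno == 0) {
        snprintf(out, size, "%ld", v);   /* legal number: normalised */
    } else {
        snprintf(out, size, "%s", s);    /* not a number: left alone */
    }
}
```

Under these rules "010" is normalised to "8", while "09" is not a valid octal number and is passed through as a string.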
[warning: not an answer but a comment requiring formatting]
I'm not observing the same results
% proc obj {args} {return "09"} ;#specifically using invalid octal value
% set obj obj
obj
% $obj data
09
% set val [expr {($obj == "") ? "" : [$obj data]}]
09
% info patchlevel
8.6.1
What Tcl version are you using?