Fast string replace - tcl

After building up a potentially very large string, I'm going to do a lot of changing single characters in it (or bytes, if necessary), to another char.
Actually, my script is building a crossword puzzle, so the string won't be very long, but my question is general:
How can I use the fact that I'm not altering the strings (or whatever data type is better) length, to speed things up?
I guess part of what I'm looking for is a way to send a pointer or reference to the string, or in Tcl's case the variable name.
My other question is what happens internally in the C code.
Will this call copy the entire string zero, one or even two times?
set index [expr {$row * $width + $col}]
set puzzle [string replace $puzzle $index $index "E"]

The string replace operation will do an in-place change provided two conditions are satisfied:
The string being inserted must be the same length as the string being excised. I assume this one is obvious to you.
The string must be in an unshared reference, so that nothing else can observe the value being modified. (This is a critical part of how all Tcl references work; shared references cannot be modified in-place.)
That call, as written, will copy. This is predictable based on simple examination of the reference handling for the string; the issue is that the old version of the string remains in puzzle until after the string replace completes (the set needs the result to work). To fix that, we do this slightly strange thing:
set puzzle [string replace $puzzle[set puzzle {}] $index $index "E"]
Yes, this is weird but it works well because concatenation with a known-empty string is an explicitly optimised case, assuming you're dealing with untraced variables here. (It'll work with traced variables, but the double write is observable and traces could do tricky things so you lose optimisation opportunities.)
If you were doing extensive changes that sometimes change the length of things, switching to using lists and lset would be more efficient. The equivalent operations on lists all use the same general reference and in-place semantics, but work on list elements instead of characters.
Disassembly
The optimisation I'm talking about is in the strcat opcode, and strreplace knows to do in-place when it can but you don't see the information at the bytecode level; virtually all operations know that.
% tcl::unsupported::disassemble lambda {{puzzle index} {
set puzzle [string replace $puzzle[set puzzle {}] $index $index "E"]
}}
ByteCode 0x0x7fbff6021c10, refCt 1, epoch 17, interp 0x0x7fbff481e010 (epoch 17)
Source "\n set puzzle [string replace $puzzle[set puzzle {}]..."
Cmds 3, src 74, inst 18, litObjs 2, aux 0, stkDepth 4, code/src 0.00
Proc 0x0x7fbff601cc90, refCt 1, args 2, compiled locals 2
slot 0, scalar, arg, "puzzle"
slot 1, scalar, arg, "index"
Commands 3:
1: pc 0-16, src 5-72 2: pc 0-14, src 17-71
3: pc 2-5, src 40-52
Command 1: "set puzzle [string replace $puzzle[set puzzle {}] $inde..."
Command 2: "string replace $puzzle[set puzzle {}] $index $index \"E..."
(0) loadScalar1 %v0 # var "puzzle"
Command 3: "set puzzle {}..."
(2) push1 0 # ""
(4) storeScalar1 %v0 # var "puzzle"
(6) strcat 2
(8) loadScalar1 %v1 # var "index"
(10) loadScalar1 %v1 # var "index"
(12) push1 1 # "E"
(14) strreplace
(15) storeScalar1 %v0 # var "puzzle"
(17) done

Related

Comma separated binary arguments? - elixir

I've been learning elixir this month, and was in a situation where I wanted to convert a binary object into a list of bits, for pattern matching.
My research led me here, to an article showing a method for doing so. However, I don't fully understand one of the arguments passed to the extract function.
I could just copy and paste the code, but I'd like to understand what's going on under the hood here.
The argument is this: <<b :: size(1), bits :: bitstring>>.
What I understand
I understand that << x >> denotes a binary object x. Logically to me, it looks as though this is similar to performing: [head | tail] = list on a List, to get the first element, and then the remaining ones as a new list called tail.
What I don't understand
However, I'm not familiar with the syntax, and I have never seen :: in elixir, nor have I ever seen a binary object separated by a comma: ,. I also, haven't seen size(x) used in Elixir, and have never encountered a bitstring.
The Bottom Line
If someone, could explain exactly how the syntax for this argument breaks down, or point me towards a resource I would highly appreciate it.
For your convenience, the code from that article:
defmodule Bits do
# this is the public api which allows you to pass any binary representation
def extract(str) when is_binary(str) do
extract(str, [])
end
# this function does the heavy lifting by matching the input binary to
# a single bit and sends the rest of the bits recursively back to itself
defp extract(<<b :: size(1), bits :: bitstring>>, acc) when is_bitstring(bits) do
extract(bits, [b | acc])
end
# this is the terminal condition when we don't have anything more to extract
defp extract(<<>>, acc), do: acc |> Enum.reverse
end
IO.inspect Bits.extract("!!") # => [0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1]
IO.inspect Bits.extract(<< 99 >>) #=> [0, 1, 1, 0, 0, 0, 1, 1]
Elixir pattern matching seems mind blowingly easy to use for
structured binary data.
Yep. You can thank the erlang inventors.
According to the documentation, <<x :: size(y)>> denotes a bitstring,
whos decimal value is x and is represented by a string of bits that is
y in length.
Let's dumb it down a bit: <<x :: size(y)>> is the integer x inserted into y bits. Examples:
<<1 :: size(1)>> => 1
<<1 :: size(2)>> => 01
<<1 :: size(3)>> => 001
<<2 :: size(3)>> => 010
<<2 :: size(4)>> => 0010
The number of bits in the binary type is divisible by 8, so a binary type has a whole number of bytes (1 byte = 8 bits). The number of bits in a bitstring is not divisible by 8. That's the difference between the binary type and the bitstring type.
I understand that << x >> denotes a binary object x. Logically to me,
it looks as though this is similar to performing: [head | tail] = list
on a List, to get the first element, and then the remaining ones as a
new list called tail.
Yes:
defmodule A do
def show_list([]), do: :ok
def show_list([head|tail]) do
IO.puts head
show_list(tail)
end
def show_binary(<<>>), do: :ok
def show_binary(<<char::binary-size(1), rest::binary>>) do
IO.puts char
show_binary(rest)
end
end
In iex:
iex(6)> A.show_list(["a", "b", "c"])
a
b
c
:ok
iex(7)> "abc" = <<"abc">> = <<"a", "b", "c">> = <<97, 98, 99>>
"abc"
iex(9)> A.show_binary(<<97, 98, 99>>)
a
b
c
:ok
Or you can interpret the integers in the binary as plain old integers:
def show(<<>>), do: :ok
def show(<<ascii_code::integer-size(8), rest::binary>>) do
IO.puts ascii_code
show(rest)
end
In iex:
iex(6)> A.show(<<97, 98, 99>>)
97
98
99
:ok
The utf8 type is super useful because it will grab as many bytes as required to get a whole utf8 character:
def show(<<>>), do: :ok
def show(<<char::utf8, rest::binary>>) do
IO.puts char
show(rest)
end
In iex:
iex(8)> A.show("ۑ")
8364
235
:ok
As you can see, the uft8 type returns the unicode codepoint of the character. To get the character as a string/binary:
def show(<<>>), do: :ok
def show(<<codepoint::utf8, rest::binary>>) do
IO.puts <<codepoint::utf8>>
show(rest)
end
You take the codepoint(an integer) and use it to create the binary/string <<codepoint::utf8>>.
In iex:
iex(1)> A.show("ۑ")
€
ë
:ok
You can't specify a size for the utf8 type, though, so if you want to read multiple utf8 characters, you have to specify multiple segments.
And of course, the segment rest::binary, i.e. a binary type with no size specified, is super useful. It can only appear at the end of a pattern, and rest::binary is like the greedy regex: (.*). The same goes for rest::bitstring.
Although the elixir docs don't mention it anywhere, the total number of bits in a segment, where a segment is one of those things:
| | |
v v v
<< 1::size(8), 1::size(16), 1::size(1) >>
is actually unit * size, where each type has a default unit. The default type for a segment is integer, so the type for each segment above defaults to integer. An integer has a default unit of 1 bit, so the total number of bits in the first segment is: 8 * 1 bit = 8 bits. The default unit for the binary type is 8 bits, so a segment like:
<< char::binary-size(6)>>
has a total size of 6 * 8 bits = 48 bits. Equivalently, size(6) is just the number of bytes. You can specify the unit just like you can the size, e.g. <<1::integer-size(2)-unit(3)>>. The total bit size of that segment is: 2 * 3 bits = 6 bits.
However, I'm not familiar with the syntax
Check this out:
def bitstr2bits(bitstr) do
for <<bit::integer-size(1) <- bitstr>>, do: bit
end
In iex:
iex(17)> A.bitstr2bits <<1::integer-size(2), 2::integer-size(2)>>
[0, 1, 1, 0]
Equivalently:
iex(3)> A.bitstr2bits(<<0b01::integer-size(2), 0b10::integer-size(2)>>)
[0, 1, 1, 0]
Elixir tends to abstract away recursion with library functions, so usually you don't have to come up with your own recursive definitions like at your link. However, that link shows one of the standard, basic recursion tricks: adding an accumulator to the function call to gather results that you want the function to return. That function could also be written like this:
def bitstr2bits(<<>>), do: []
def bitstr2bits(<<bit::integer-size(1), rest::bitstring>>) do
[bit | bitstr2bits(rest)]
end
The accumulator function at the link is tail recursive, which means it takes up a constant (small) amount of memory--no matter how many recursive function calls are needed to step through the bitstring. A bitstring with 10 million bits? Requiring 10 million recursive function calls? That would only require a small amount of memory. In the old days, the alternate definition I posted could potentially crash your program because it would take up more and more memory for each recursive function call, and if the bitstring were long enough the amount of memory needed would be too large, and you would get stackoverflow and your program would crash. However, erlang has optimized away the disadvantages of recursive functions that are not tail recursive.
You can read about all these here, short answer:
:: is similar as guard, like a when is_integer(a), in you case size(1) expect a 1 bit binary
, is a separator between matching binaries, like | in [x | []] or like comma in [a, b]
bitstring is a superset over binaries, you can read about it here and here, any binary can be respresented as bitstring
iex> ?h
104
iex> ?e
101
iex> ?l
108
iex> ?o
111
iex> <<104, 101, 108, 108, 111>>
"hello"
iex> [104, 101, 108, 108, 111]
'hello'
but not vice versa
iex> <<1, 2, 3>>
<<1, 2, 3>>
After some research, I realized I overlooked some important information located at: elixir-lang.
According to the documentation, <<x :: size(y)>> denotes a bitstring, whos decimal value is x and is represented by a string of bits that is y in length.
Furthermore, <<binary>> will always attempt to conglomerate values in a left-first direction, into bytes or 8-bits, however, if the number of bits is not divisible by 8, there will by x bytes, followed by a bitstring.
For example:
iex> <<3::size(5), 5::size(6)>> # <<00011, 000101>>
<<24, 5::size(3)>> # automatically shifted to: <<00011000(24) , 101>>
Now, elixir also lets us pattern match binaries, and bitstrings like so:
iex> <<3 :: size(2), b :: bitstring>> = <<61 :: size(6)>> # [11] [1101]
iex> b
<<13 :: size(4)>> # [1101]
So, i completly misunderstood binaries and biststrings, and pattern matching between the two.
Not really the answer to the question stated, but I’d put it here for the sake of formatting. In elixir we usually use Kernel.SpecialForms.for/1 comprehension for bitstring generation.
for << b :: size(1) <- "!!" >>, do: b
#⇒ [0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1]
for << b :: size(1) <- <<99>> >>, do: b
#⇒ [0, 1, 1, 0, 0, 0, 1, 1]
I wanted to use the bits, in an 8 bit binary to toggle conditions. So
[b1, b2, ...] = extract(<<binary>>)
I then wanted to say:
if b1, do: x....
if b2, do: y...
Is there a better way to do what I'm trying to do, instead of pattern
matching?
First of all, the only terms that evaluate to false in elixir are false and nil (just like in ruby):
iex(18)> x = 1
1
iex(19)> y = 0
0
iex(20)> if x, do: IO.puts "I'm true."
I'm true.
:ok
iex(21)> if y, do: IO.puts "I'm true."
I'm true.
:ok
Although, the fix is easy:
if b1 == 1, do: ...
Extracting the bits into a list is unnecessary because you can just iterate the bitstring:
def check_bits(<<>>), do: :ok
def check_bits(<<bit::integer-size(1), rest::bitstring>>) do
if bit == 1, do: IO.puts "bit is on"
check_bits(rest)
end
In other words, you can treat a bitstring similarly to a list.
Or, instead of performing the logic in the body of the function to determine whether the bit is 1, you can use pattern matching in the head of the function:
def check_bits(<<>>), do: :ok
def check_bits(<< 1::integer-size(1), rest::bitstring >>) do
IO.puts "The bit is 1."
check_bits(rest)
end
def check_bits(<< 0::integer-size(1), rest::bitstring >>) do
IO.puts "The bit is 0."
check_bits(rest)
end
Instead of using a variable, bit, for the match like here:
bit::integer-size(1)
...you use a literal value, 1:
1::integer-size(1)
The only thing that can match a literal value is the literal value itself. As a result, the pattern:
<< 1::integer-size(1), rest::bitstring >>
will only match a bitstring where the first bit, integer-size(1), is 1. The literal matching employed there is similar to doing the following with a list:
def list_contains_4([4|_tail]) do
IO.puts "found a 4"
true #end the recursion and return true
end
def list_contains_4([head|tail]) do
IO.puts "#{head} is not a 4"
list_contains_4(tail)
end
def list_contains_4([]), do: false
The first function clause tries to match the literal 4 at the head of the list. If the head of the list is not 4, there's no match; so elixir moves on to the next function clause, and in the next function clause the variable head will match anything in the list.
Using pattern matching in the head of a function rather than performing logic in the body of a function is considered more stylish and efficient in erlang.

Shouldn't TclInvalidateStringRep() reset length?

I have a doubt on the following code in TCL 8.6.8 source tclInt.h:
4277 #define TclInvalidateStringRep(objPtr) \
4278 if (objPtr->bytes != NULL) { \
4279 if (objPtr->bytes != tclEmptyStringRep) { \
4280 ckfree((char *) objPtr->bytes); \
4281 } \
4282 objPtr->bytes = NULL; \
4283 }
This macro is called by Tcl_InvalidateStringRep() in tclObj.c.
My doubt is, why doesn't tclObj's length get reset to zero?
Here is from definition of Tcl_Obj:
808 typedef struct Tcl_Obj {
809 int refCount; /* When 0 the object will be freed. */
810 char *bytes; /* This points to the first byte of the
811 * object's string representation. The array
812 * must be followed by a null byte (i.e., at
813 * offset length) but may also contain
814 * embedded null characters. The array's
815 * storage is allocated by ckalloc. NULL means
816 * the string rep is invalid and must be
817 * regenerated from the internal rep. Clients
818 * should use Tcl_GetStringFromObj or
819 * Tcl_GetString to get a pointer to the byte
820 * array as a readonly value. */
821 int length; /* The number of bytes at *bytes, not
822 * including the terminating null. */
So you can see length is tightly coupled with bytes, when bytes is cleared, shouldn't we reset length?
My doubt comes from the following code, TclCreateLiteral() in tclLiteral.c:
200 for (globalPtr=globalTablePtr->buckets[globalHash] ; globalPtr!=NULL;
201 globalPtr = globalPtr->nextPtr) {
202 objPtr = globalPtr->objPtr;
203 if ((globalPtr->nsPtr == nsPtr)
204 && (objPtr->length == length) && ((length == 0)
205 || ((objPtr->bytes[0] == bytes[0])
206 && (memcmp(objPtr->bytes, bytes, (unsigned) length) == 0)))) {
So at line 204, when length is not zero while bytes is NULL, the program crashes.
My product includes TCL source and I find the above problem when I trace a program crash. I put the workaround in our code, but like to confirm with the community if it indeed is a vulnerability.
Your aproach seems to be wrong somewhere.
The call of TclInvalidateStringRep is basically allowed for objects with no references (refCount == 0) or with exactly one reference (so refCount <= 1) and then only if you are sure, that this 1 reference is your own reference only.
Tcl's shared objects could switch its internal representation, but the string representation remains immutable. Otherwise you will break the basic principles of Tcl (like EIAS, etc).
Simplest example that can explain this:
set k 0x7f
dict set d $k test
expr {$k}; # ==> 127 (obj is integer now, but...)
puts $k; # ==> 0x7f (... still remains the string-representation)
puts [dict get $d $k]; # ==> test
# some code that fouls it up (despite of two references var `k` and key in dict `d`):
magic_happens_here $k; # string representation gets lost.
# and hereafter:
puts $k; # ==> 127 (representation is now 127, so...)
puts [dict get $d $k]; # ==> ERROR: key "127" not known in dictionary
As you can see, reset resp. altering of the string representation of shared object is wrong by design.
Please avoid this in Tcl.
I've had a think about this, and while I believe that the code that is purging the representation is wrong to do so (since the object should in principle be shared and so shouldn't be observed to change) I certainly think that it is extremely difficult to actually prove that that can't happen. For sure, TclCreateLiteral in tclLiteral.c shouldn't blow up if it happens!
The fix I'm using is to make TclCreateLiteral use TclGetStringFromObj (the Tcl-internal macro-ized version of Tcl_GetStringFromObj) to get the bytes and length fields instead of using them directly, so that the correct constraints are preserved. This should make the string representation exist once more if it is removed. If the code continues to crash, the problem is your code that is calling TclInvalidateStringRep on a literal (and setting a type that can't have a string generated for it; Tcl has some of those, but that's because it never purges the original string from them).
Remember, a Tcl_Obj should only have its string rep purged when it becomes wrong, not just when it gains a non-string representation. The fact a value has been interpreted as an integer doesn't mean that it shouldn't be interpretable as a list (quite the reverse!) and if the internal representation is never updated to a different value (in-place modifications should only ever happen to unshared objects) it should never need to lose that string representation at all.

How to create a TCL variable of type bytearray

I am using TCL 8.4.20.
So I have the following code:
set a [binary format H2 1]
set b [binary format H2 2]
set c [binary format H2 3]
set bytes $a
append bytes $a
append bytes $b
append bytes $c
puts $bytes
I set a breakpoint at Tcl_PutsObjCmd() function in TCL's C source code and I see its argument, $bytes, is of type string while I expect it to be bytearray.
Question 1:Why is that? From the first assignment to the final appending, "bytes" accepts nothing but binary data.
The reason I do this experiment is, we have a TCL extension command in C, it expects the command argument is of byte array type - it has a check the value's typePtr should be tclByteArrayType. My TCL code currently fails on this command because the data passed to the command is of type string, just as demo'ed above.
I googled around, seems the "right" way to make a byte array object is to have every byte ready first and finally use one "binary format" command to put all into one. But it is a fairly big change to my current TCL code.
Question 2: Given that I already have a TCL variable whose data are all binaries (created using "binary format" for each byte and put together using "append") while its type is string, How can I change its internal type to "bytearray" through some TCL maneuvering?
Technically, the internal type is not a guaranteed property. Everything is a string. The code may shimmer a type away whenever it feels like. And code that depends on the internal type is usually very brittle or outright broken.
So your C code should call Tcl_GetByteArrayFromObj() instead of peeking at the arguments internals. That does the proper conversion if the object has not yet a byteArray representation.
About your questions:
Why doesn't append of two byte arrays keep the byte array type?
It does, at least for 8.6, if you do it right and never trigger the creation of a string rep.
Running this in tkcon, the append turns the value into a string:
() 98 % set a [binary format H2 1]

() 99 % set b [binary format H2 1]

() 100 % ::tcl::unsupported::representation $a
value is a bytearray with a refcount of 2, object pointer at 0000000005665420, internal representation 000000000587B280:0000000005665240, string representation ""
() 101 % ::tcl::unsupported::representation $b
value is a bytearray with a refcount of 2, object pointer at 000000000564EEB0, internal representation 000000000587B4A0:00000000056590E0, string representation ""
() 102 % set x $a

() 103 % ::tcl::unsupported::representation $x
value is a bytearray with a refcount of 4, object pointer at 0000000005665420, internal representation 000000000587B280:0000000005665240, string representation ""
() 104 % append x $b

() 105 % ::tcl::unsupported::representation $x
value is a string with a refcount of 3, object pointer at 0000000005663F50, internal representation 0000000005896BA0:000000000564F030, string representation ""
this happens, because the bytearray has a string rep (due to Tkcon echoing the value) created. The append optimization only works for 'pure' bytearrays, e.g. bytearrays that do not have a string rep. This is similar to some optimizations for 'pure' lists.
So it works like this, preventing the shimmering result echo:
() 106 % set b [binary format H2 1]; puts "pure"
pure
() 107 % set a [binary format H2 1]; puts "pure"
pure
() 108 % set x $a; puts "pure"
pure
() 109 % ::tcl::unsupported::representation $a
value is a bytearray with a refcount of 3, object pointer at 0000000005658780, internal representation 000000000587B320:0000000005658CF0, no string representation
() 110 % ::tcl::unsupported::representation $b
value is a bytearray with a refcount of 2, object pointer at 000000000564ED60, internal representation 000000000587B500:0000000005658750, no string representation
() 111 % ::tcl::unsupported::representation $x
value is a bytearray with a refcount of 3, object pointer at 0000000005658780, internal representation 000000000587B320:0000000005658CF0, no string representation
() 112 % append x $b; puts "pure"
pure
() 113 % ::tcl::unsupported::representation $x
value is a bytearray with a refcount of 2, object pointer at 0000000005658690, internal representation 00000000058A5C60:0000000005658960, no string representation
Note the no string representation part.
How to turn a string into a bytearray
Just do a binary format:
set x [binary format a* $x]

How to force expr to address a value as a string and not a number?

When TCL gets a string that starts with a 0 as its return value, it'll treat it as an octal number and will return the decimal value of the octal number. Is there a way to circumvent it and force expr to address the value as a string?
I encounter this problem because I have a line:
set val [expr {( $obj == "" ) ? "" : [$obj data]}]
And one the results of the [$obj data] operation is a binary string starting with 0, and the expr turns it into another number. Is there a way to fix this without turning the expr into an if?
The expr command is defined to convert its result to a number if it is legal to do so. It's been this way since… well, since at least Tcl 7.0 and probably since the first version of Tcl to have an expr command (which takes it a hugely long way back). This means that if you return a valid octal number (which 09 isn't), expr will convert it.
If this behaviour isn't desired, don't use expr for conditionals; use if. In your case, this works quite nicely (and I think it's clearer this time with the then and else pseudo-keywords).
set val [if {$obj == ""} then {} else {$obj data}]
(At the bytecode level, this generates almost the identical bytecode to what your original does, except it omits a call to the tryCvtToNumeric operation; that's the one you say you don't want!)
[warning: not an answer but a coment requiring formatting]
I'm not observing the same results
% proc obj {args} {return "09"} ;#specifically using invalid octal value
% set obj obj
obj
% $obj data
09
% set val [expr {($obj == "") ? "" : [$obj data]}]
09
% info patchlevel
8.6.1
What Tcl version are you using?

binary scan format 8-bit character code (c) in TCL

I am confused about the binary scan of types "c", as it is said that "c" refers to 8 bit character code. I have the following code to be maintained
puts ""
set b_val2 "0000021002020a042845245d868a8d9900081b000315aef0010c105d39b4f7c9a083a65e7d000306140508063024"
set msg2 [binary format H* $b_val2]
set msg2 [string range $msg2 1 end]
while {[binary scan $msg2 cc id2 len2] == 2} {
puts ""
#puts "SECOND ID is $id2 and SECOND LENGTH is $len2"
set id2 [expr {$id2 & 0xff}]
set len2 [expr {$len2 & 0xff}]
set val2 [string range $msg2 2 [expr {1+$len2}]]
switch -exact -- $id2 {
0 {
puts ""
if {$val2 == "\x10\x04"} {
puts "val is found for 04 "
} elseif {$val2 == "\x10\x02"} {
puts "ID is found for 02! CORRECT "
} else {
puts "not supported"
}
}
}
}
the idea is to take the value of "10 02" from the given hex. and this code work just fine until I Change the given Input of b_val as
00021002020903e845245ca0d29858081c00031ef7c001106d3931e7d3414e6d3df26831030614051608045c22
for the first given hex code "len" is "3" and it parses the binary correctly, but for the second hex Input, the "len2" is 16, hence parsing the wrong bytes.
I read that the binary scan cc will give back two variable of type 8 bit character code, but the above failure does not make any sense to me at all, as what i understand that what is the previous author tries to aim with the above code (expecially the set val2 where it tries to take the range) and why it fails for the second input
For starters: your code snippet never modifies msg2 within the while loop, so the scan returns the same result every loop, and you have an infinite loop. I tossed a break in to only loop once, but that leaves me uncertain I have the right behavior.
That said, the obvious issue is that when you go from your original message to the replacement, you've dropped the first byte (value of 00). Starting with line 2 (ignoring blank lines), where you set
set b_val2 "0000021002020a..."
let's parse by hand. Line 3 converts it to hex, and line 4 drops the first byte so that we start with a string of bytes with hex values of \x00 \x02 \x10 \x02 \x02 \x0a .... The binary scan on line 5 sets id2 to the first byte and len2 to the second byte; line 10 sets val2 to a string with values \x10 \x02, which matches your criteria. Success.
Now reparse with input of
set b_val2 "00021002020903e84..."
from your second input line. Again, the first byte is DISCARDED on line 4, leaving you with \x02 \x10 \x02 \x02 \x09 \x03.... Line 5 sets id2 to 2 and len2 to \x10, or decimal 16, which is what you see. That means val2 is very different from what you expected, but that's due to you dropping a byte from your input.
Byte parsers are EXTREMELY sensitive to initial position in the string. Once you mess that up, you'd better have a robust resynchronization mechanism or it's all over bar the shouting. This is one major reason that wire protocols are difficult. :)