Maximum size of xattr in OS X - json

I would like to use xattr to store some metadata directly on my files. These are essentially tags that I use to categorize files when I search for them. My goal is to extend the usual Mac OS X tags by associating more information with each tag, for instance the date the tag was added and maybe other things.
I was thinking of adding an xattr to the files using xattr -w. My first guess would be to store something like JSON in the xattr value, but I was wondering:
1) what is the size limit for the value I can store in an xattr? (the xattr man page is vague and refers to something called _PC_XATTR_SIZE_BITS which I cannot locate anywhere)
2) anything wrong with storing a JSON formatted string as an xattr?

According to man pathconf, there is a “configurable system limit or option variable” called _PC_XATTR_SIZE_BITS which is
the number of bits used to store maximum extended attribute size in bytes. For
example, if the maximum attribute size supported by a file system is 128K, the value
returned will be 18. However a value 18 can mean that the maximum attribute size can be
anywhere from (256KB - 1) to 128KB. As a special case, the resource fork can have much
larger size, and some file system specific extended attributes can have smaller and preset
size; for example, Finder Info is always 32 bytes.
You can determine the value of this parameter using this small command line tool written in Swift 4:
import Foundation
let args = CommandLine.arguments.dropFirst()
guard let pathArg = args.first else {
print ("File path argument missing!")
exit (EXIT_FAILURE)
}
let v = pathconf(pathArg, _PC_XATTR_SIZE_BITS)
print ("_PC_XATTR_SIZE_BITS: \(v)")
exit (EXIT_SUCCESS)
I get:
31 bits for HFS+ on OS X 10.11
64 bits for APFS on macOS 10.13
as the number of bits used to store maximum extended attribute size. These imply that the actual maximum xattr sizes are somewhere in the ranges
1 GiB ≤ maximum < 2 GiB for HFS+ on OS X 10.11
8 EiB ≤ maximum < 16 EiB for APFS on macOS 10.13
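The arithmetic behind those ranges can be sketched in a few lines of Python. This is only an illustration of the rule quoted from man pathconf (a value that needs b bits lies between 2^(b-1) and 2^b - 1); the function name xattr_size_range is made up for this example.

```python
def xattr_size_range(size_bits):
    """Return (lowest, highest) possible maximum xattr size in bytes.

    A value that needs exactly `size_bits` bits to store lies between
    2**(size_bits - 1) and 2**size_bits - 1 inclusive.
    """
    if size_bits < 1:
        raise ValueError("size_bits must be at least 1")
    return 2 ** (size_bits - 1), 2 ** size_bits - 1


# 18 is the man page example; 31 and 64 are the values measured above.
for bits in (18, 31, 64):
    low, high = xattr_size_range(bits)
    print(f"{bits} bits: {low} <= maximum <= {high} bytes")
```

For 18 bits this gives 131072 to 262143 bytes, which matches the "128KB to 256KB - 1" wording in the man page.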

I seem to be able to write at least 260kB. Here I generate 260kB of nulls and convert them to the letter a so I can see them:
xattr -w myattr "$(dd if=/dev/zero bs=260000 count=1|tr '\0' a)" fred
1+0 records in
1+0 records out
260000 bytes transferred in 0.010303 secs (25235318 bytes/sec)
And then read them back with:
xattr -l fred
myattr: aaaaaaaaaaaaaaaaaa...aaa
And check the length returned:
xattr -l fred | wc -c
260009
I suspect that the ceiling I hit here is actually ARG_MAX, the limit on the size of command-line arguments, rather than an xattr limit:
sysctl kern.argmax
kern.argmax: 262144
Also, just because you can store 260kB in an xattr, that does not mean it is advisable. I don't know about HFS+, but on some Unixy filesystems, the attributes can be stored directly in the inode, but if you go over a certain limit, additional space has to be allocated on disk for the data.
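As for storing JSON in the value: a rough Python sketch of what that could look like is below. It only builds the xattr -w command line and checks the payload size against the kern.argmax figure shown above. The attribute name com.example.tags, the tag names and the dates are all made up for illustration, and the "half of ARG_MAX" headroom is an arbitrary choice, not a documented rule.

```python
import json

ARG_MAX_OBSERVED = 262144  # kern.argmax value reported above; may differ per system


def build_xattr_command(path, attr_name, tags):
    """Return an argv list for `xattr -w`, storing `tags` as compact JSON."""
    payload = json.dumps(tags, separators=(",", ":"), sort_keys=True)
    size = len(payload.encode("utf-8"))
    # Leave headroom: ARG_MAX also has to cover the other arguments
    # and the environment, so use only half of it (arbitrary margin).
    if size >= ARG_MAX_OBSERVED // 2:
        raise ValueError(f"payload is {size} bytes, too large for a command line")
    return ["xattr", "-w", attr_name, payload, path]


# Hypothetical tags, each carrying the date it was added.
tags = {"invoice": {"added": "2018-01-15"}, "urgent": {"added": "2018-02-01"}}
cmd = build_xattr_command("fred", "com.example.tags", tags)
print(cmd)
# On macOS this could then be run with subprocess.run(cmd, check=True).
```

Compact separators keep the value small, and sort_keys makes the stored string stable between writes, which helps if you ever compare values.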
——-
With the advent of High Sierra and APFS to replace HFS+, be sure to test on both filesystems - also make sure that Time Machine backs up and restores the data as well and that utilities such as ditto, tar and the Finder propagate them when copying/moving/archiving files.
Also consider what happens when you email a tagged file, or copy it to a FAT-formatted USB memory stick.
I also tried setting multiple attributes on a single file and the following script successfully wrote 1,000 attributes (called attr-1, attr-2 ... attr-1000) each of 260kB to a single file - meaning that the file effectively carries 260MB of attributes:
#!/bin/bash
for ((a=1;a<=1000;a++)) ; do
echo Setting attr-$a
xattr -w attr-$a "$(dd if=/dev/zero bs=260000 count=1 2> /dev/null | tr '\0' a)" fred
if [ $? -ne 0 ]; then
echo "ERROR: Failed to set attr-$a"
exit 1
fi
done
These can all be seen and read back too - I checked.

How long is this memory section specified in this .dtb file?

I feel like I'm not understanding how to interpret the format of dtb/dts files, and was hoping you could help. After running these commands:
qemu-system-riscv64 -machine virt -machine dumpdtb=riscv64-virt.dtb
dtc -I dtb -O dts -o riscv-virt.dts riscv64-virt.dtb
The resulting riscv-virt.dts contains the definition of the memory for the machine:
/dts-v1/;
/ {
#address-cells = <0x02>;
#size-cells = <0x02>;
compatible = "riscv-virtio";
model = "riscv-virtio,qemu";
...other memory definitions...
memory@80000000 {
device_type = "memory";
reg = <0x0 0x80000000 0x0 0x8000000>;
};
};
I have a few questions:
Why are there multiple pairs of reg definitions? Based on this link, it appears the second 0x0 0x8000000 overwrites what was just set in the previous pair, 0x0 0x80000000.
How long is this memory bank? Which value tells me this?
The first line says memory@80000000, but then the reg values start at 0x0. Does the memory start at 0x0 or 0x80000000?
Basically, I just feel like I don't understand how to interpret this. In plain English, what is being defined here?
This is partly covered in the devicetree specification (p. 13). reg is given as (address, length) pairs. In your case the address and the length are each 64 bits wide, which is expressed by using two 32-bit values per field.
Thus the address is 0x80000000 and the size is 0x8000000 (128 MiB).
Edit:
The properties #address-cells and #size-cells specify how many cells (32-bit values) are used for the address and for the size, respectively. In a hand-written dts they are always specified in the parent node of the device. In your decompiled dts they appear in the root node, where both are set to 2.
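To make the pairing concrete, here is a small Python sketch that splits a reg property into (address, size) tuples using the two cell counts. The helper decode_reg is written just for this answer and is not part of dtc or any devicetree library.

```python
def decode_reg(cells, address_cells, size_cells):
    """Split a flat list of 32-bit cells into (address, size) tuples."""
    stride = address_cells + size_cells
    if len(cells) % stride != 0:
        raise ValueError("reg length is not a multiple of address+size cells")

    def join(parts):
        # Cells are stored most significant first.
        value = 0
        for part in parts:
            value = (value << 32) | part
        return value

    return [
        (join(cells[i:i + address_cells]),
         join(cells[i + address_cells:i + stride]))
        for i in range(0, len(cells), stride)
    ]


# reg = <0x0 0x80000000 0x0 0x8000000> with #address-cells = 2, #size-cells = 2
regions = decode_reg([0x0, 0x80000000, 0x0, 0x8000000], 2, 2)
for address, size in regions:
    print(f"start={address:#x} size={size:#x} ({size // 2**20} MiB)")
```

With two cells each, the four values form a single region: the first two cells are the start address and the last two are the length.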

Trying to reproduce mirbase results locally with BLAST

I am trying to reproduce locally what I get when running a BLAST search on the mirbase website. The 'search sequences' option there is 'mature miRNAs', so I downloaded that file to my computer and turned it into a BLAST database with this command:
./makeblastdb -in /home/marianoavino/Downloads/mature.fa -dbtype 'nucl' -out /home/marianoavino/Downloads/mature
On mirbase I see that they use an e-value of 10, which I use locally as well.
At the end of the analysis, mirbase reports these parameter settings:
Search parameters
Search algorithm: BLASTN
Sequence database: mature
Evalue cutoff: 10
Max alignments: 100
Word size: 4
Match score: +5
Mismatch penalty: -4
and this is the command line I use on my computer for BLAST
./blastn -db /home/marianoavino/Downloads/mature -evalue 10 -word_size 4 -query /home/marianoavino/Downloads/testinputblast.fasta -task "blastn" -out /home/marianoavino/Downloads/testBLast.out
The results of the two analyses are different, with mirbase finding many more hits than local BLAST.
Do you have any idea which parameters I should use on the local blast command line to match the mirbase parameters listed above and get the same answer?
There can be lots of reasons for different results, including the version of BLAST you are using versus the one they used, the parameters (as you said), and differences in the databases (remember, database size is used to calculate things like the e-value, so you may end up with different results).
Exact replication of results may be difficult, but the question is: are the differences meaningful? Just because an alignment passes some e-value cutoff (and 10 is unusually high) does not mean it is meaningful. For a given sequence, if the searches yield different numbers of alignments but the same number of high-quality alignments (high bitscore, low e-value, full alignment between query and subject sequences), does it matter?
I would try to compare the results to see where these differences are, then move forward.

What is the max number of files to select in an HTML5 [multiple] file input?

I have 64000 small images I want to upload to my website (using existing validation, so no FTP etc). I've created an HTML5 [multiple] type=file input for this a while back to be used for a hundred or hundreds of images. Hundreds is not a problem. The images are batched and sent to the server.
But when I select a folder of ~ 16000 images, the file input's FileList is empty... The onchange event triggers, but the file list is empty. The browser (or file system or OS?) seems to have a problem selecting this many files.
I've created a very small tool to help determine what could be the max: http://jsfiddle.net/rudiedirkx/Ehhk5/1/show/
$inp.onchange = function(e) {
var l = 0, b = 0;
for (var i=0, F=this.files, L=F.length; i<L; i++) {
l += F[i].name.length;
b += F[i].size;
}
$nf.innerHTML += this.files.length + ' files: ' + (b/1000/1000) + ' MB / ' + l + ' chars of filename<br>';
};
All it does is count:
the number of files
the combined number of characters in all file names
the total file size in MB
When I try this, the most I ever get is:
1272 files: 176.053987 MB / 31469 chars of filename
(On 32 & 64 bit Win7, Chrome 26-52)
The next image (which fails) would be:
1273 images, which is not an obvious cut-off
between 176 and 177 MB filesize, also not an obvious cut-off
less than 32000 chars of filenames, also not an obvious cut-off, although it sort-of maybe looks like 32k...
In my calc, 1 MB = 1000^2 Bytes, not 1024^2. (That would be a MiB, but maybe my OS/filesystem/browser disagrees.)
My question would be: why this many files? Why this max? Is it OS dependent or browser dependent? Where do I find the specs for that? Is it JS' fault? Searching for "file input max files" and the like only turns up the [max] attribute, which is irrelevant.
More test results:
In Firefox the max seems to be much higher. At least "2343 files: 310.66553999999996 MB / 60748 chars of filename" (that's all the files I have right here)
In Firefox also: "16686 files: 55.144415 MB / 146224 chars of filename" (much smaller, but more files)
Update
Chrome 52 canary Windows is still 32k of file name
Firefox (44+) Windows is still unlimited
why this many files?
The number of files depends on the combined number of characters in all the file names.
Why this max?
In the Windows API, the maximum path length (MAX_PATH) is 260 chars; with the Unicode version of the API it is 32,767 chars.
Chrome simply sizes its buffer to the maximum path length of the Unicode API, so the limit is about 32k chars, as you observed.
Check this fix: https://code.google.com/p/chromium/issues/detail?id=44068
Firefox dynamically allocates a buffer big enough to hold the size of multiple selected files, which could handle much larger path length.
Check this fix: https://bugzilla.mozilla.org/show_bug.cgi?id=660833
Is it OS dependent or browser dependent?
Both.
Where do I find the specs for that?
For Windows API usage and reference:
http://msdn.microsoft.com/en-us/library/aa365247.aspx (Maximum Path Length Limitation)
http://msdn.microsoft.com/en-us/library/ms646839(VS.85).aspx
Is it JS' fault?
No.
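The numbers from the question fit this explanation quite well. The Python sketch below assumes the layout that the Windows multi-select file dialog uses for its result buffer (the directory, then each file name, each followed by one NUL separator, with an extra NUL at the very end); whether Chrome's buffer was filled exactly this way is an assumption on my part. The helper name selection_buffer_chars is made up.

```python
UNICODE_PATH_LIMIT = 32767  # max path length of the Unicode Windows API, per the answer


def selection_buffer_chars(directory, filenames):
    """Estimate the characters needed for a multi-select result.

    Assumed layout: directory, then each file name, each followed by one
    NUL separator, plus one extra NUL terminating the whole list.
    """
    return len(directory) + 1 + sum(len(name) + 1 for name in filenames) + 1


# Figures from the question: 1272 files whose names total 31469 characters.
name_chars, file_count = 31469, 1272
used_by_names = name_chars + file_count      # one separator per file name
remaining = UNICODE_PATH_LIMIT - used_by_names
print(used_by_names)   # characters consumed by names and separators
print(remaining)       # what is left for the directory path and terminators
```

Under that assumption the names and separators alone take 32741 characters, leaving 26 for the directory path, so one more file would not fit.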

Searching through very large rainbow table file

I am looking for the best way to search through a very large rainbow table file (13GB file). It is a CSV-style file, looking something like this:
1f129c42de5e4f043cbd88ff6360486f; somestring
78f640ec8bf82c0f9264c277eb714bcf; anotherstring
4ed312643e945ec4a5a1a18a7ccd6a70; yetanotherstring
... you get the idea - there are about ~900 Million lines, always with a hash, semicolon, clear text string.
So basically, the program should check whether a specific hash is listed in this file.
What's the fastest way to do this?
Obviously, I can't read the entire file into memory and then run strstr() on it.
So what's the most efficient way to do this?
read the file line by line, always doing a strstr();
read a larger chunk of the file (e.g. 10,000 lines) and do a strstr() on it
Or would it be more efficient to import all this data into a MySQL database and then search for the hash via SQL queries?
Any help is appreciated
The best way to do it would be to sort it and then use a binary search-like algorithm on it. After sorting it, it will take around O(log n) time to find a particular entry where n is the number of entries you have. Your algorithm might look like this:
1) Keep a start offset and an end offset. Initialize the start offset to zero and the end offset to the file size.
2) If start = end, there is no match.
3) Read some data from the offset (start + end) / 2.
4) Skip forward until you see a newline. (You may need to read more, but if you pick an appropriate size (bigger than most of your records) to read in step 3, you probably won't have to read any more.)
5) If the hash you're on is the hash you're looking for, go on to step 6. Otherwise, if the hash you're on is less than the hash you're looking for, set start to the current position and go to step 2. If the hash you're on is greater than the hash you're looking for, set end to the current position and go to step 2.
6) Skip to the semicolon and trailing space. The unhashed data will be from the current position to the next newline.
This can be easily converted into a while loop with breaks.
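A minimal Python sketch of that idea follows. It is a variant of the steps above rather than a literal transcription: it searches for the lowest byte offset whose following line is not smaller than the wanted hash, which avoids missing the first line of the file. It assumes the file has already been sorted bytewise (for example with LC_ALL=C sort) and that every line has the "hash; text" shape from the question. The function name lookup is made up.

```python
import os


def lookup(path, wanted_hash):
    """Binary-search a bytewise-sorted 'hash; text' file; return text or None."""
    wanted = wanted_hash.encode("ascii")

    def line_after(f, offset):
        # Return the first complete line that starts after `offset`
        # (or the very first line when offset is 0); b"" at end of file.
        f.seek(offset)
        if offset > 0:
            f.readline()  # discard the (possibly partial) line we landed in
        return f.readline()

    with open(path, "rb") as f:
        lo, hi = 0, os.path.getsize(path)
        while lo < hi:
            mid = (lo + hi) // 2
            line = line_after(f, mid)
            # Past the end, or at a hash that is not smaller: look left.
            if not line or line.split(b";", 1)[0] >= wanted:
                hi = mid
            else:
                lo = mid + 1
        line = line_after(f, lo)

    key, _, text = line.partition(b";")
    if line and key == wanted:
        return text.strip().decode("utf-8")
    return None
```

Each probe reads at most two lines, so a lookup in ~900 million lines needs only a few dozen small reads.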
Importing it into MySQL with appropriate indices and such would use a similarly (or more, since it's probably packed nicely) efficient algorithm.
Your last solution might be the easiest one to implement as you move the whole performance optimizing to the database (and usually they are optimized for that).
strstr is not useful here as it searches a whole string, whereas you know the specific format and can jump and compare in a more goal-oriented way. Think about strncmp and strchr.
The overhead for reading a single line at a time would be really high (as is often the case for file IO). So I'd recommend reading a larger chunk and performing your search on that chunk. I'd even think about parallelizing the search by reading the next chunk in another thread and doing the comparison there as well.
You can also think about using memory mapped IO instead of the standard C file API. Using this you can leave the whole contents loading to the operating system and don't have to care about caching yourself.
Of course restructuring the data for faster access would help you too. For example insert padding bytes so all datasets are equally long. This will provide you "random" access to your data stream as you can easily calculate the position of the nth entry.
I'd start by splitting the single large file into 65536 smaller files, so that if the hash begins with 0000 it's in the file 00/00data.txt, if the hash begins with 0001 it's in the file 00/01data.txt, etc. If the full file was 13 GiB then each of the smaller files would be (on average) 208 KiB.
Next, separate the hash from the string; such that you've got 65536 "hash files" and 65536 "string files". Each hash file would contain the remainder of the hash (the last 12 digits only, because the first 4 digits aren't needed anymore) and the offset of the string in the corresponding string file. This would mean that (instead of 65536 files at an average of 208 KiB each) you'd have 65536 hash files at maybe 120 KiB each and 65536 string files at maybe 100 KiB each.
Next, the hash files should be in a binary format. 12 hexadecimal digits costs 48 bits (not 12*8=96-bits). This alone would halve the size of the hash files. If the strings are aligned on a 4 byte boundary in the strings file then a 16-bit "offset of the string / 4" would be fine (as long as the string file is less than 256 KiB). Entries in the hash file should be sorted in order, and the corresponding strings file should be in the same order.
After all these changes; you'd use the highest 16-bits of the hash to find the right hash file, load the hash file and do a binary search. Then (if found) you'd get the offset for the start of the string (in the strings file) from entry in the hash file, plus get the offset for the next string from next entry in the hash file. Then you'd load data from the strings file, starting at the start of the correct string and ending at the start of the next string.
Finally, you'd implement a "hash file cache" in memory. If your application can allocate 1.5 GiB of RAM, then that'd be enough to cache half of the hash files. In this case (half the hash files cached) you'd expect that half the time the only thing you'd need to load from disk is the string itself (e.g. probably less than 20 bytes) and the other half the time you'd need to load the hash file into the cache first (e.g. 60 KiB); so on average for each lookup you'd be loading about 30 KiB from disk. Of course more memory is better (and less is worse); and if you can allocate more than about 3 GiB of RAM you can cache all of the hash files and start thinking about caching some of the strings.
A faster way would be to have a reversible encoding, so that you can convert a string into an integer and then convert the integer back into the original string without doing any sort of lookup at all. For an example; if all your strings use lower case ASCII letters and are a max. of 13 characters long, then they could all be converted into a 64-bit integer and back (as 26^13 < 2^63). This could lead to a different approach - e.g. use a reversible encoding (with bit 64 of the integer/hash clear) where possible; and only use some sort of lookup (with bit 64 of the integer/hash set) for strings that can't be encoded in a reversible way. With a little knowledge (e.g. carefully selecting the best reversible encoding for your strings) this could slash the size of your 13 GiB file down to "small enough to fit in RAM easily" and be many orders of magnitude faster.
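As a rough illustration of such a reversible encoding, here is a Python sketch for lower case ASCII strings of up to 13 characters. It uses bijective base 26 (digits 1 to 26 instead of 0 to 25) so that strings of different lengths, such as "a" and "aa", get distinct integers; that detail is my choice for the example and is not spelled out above. The names encode and decode are made up.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz"
MAX_LEN = 13


def encode(text):
    """Map a lower case string (at most 13 chars) to a unique integer."""
    if len(text) > MAX_LEN or any(c not in ALPHABET for c in text):
        raise ValueError("only lower case ASCII strings up to 13 chars")
    n = 0
    for c in text:
        # Digits run from 1 to 26, so no string is a "leading zero"
        # variant of another one.
        n = n * 26 + ALPHABET.index(c) + 1
    return n


def decode(n):
    """Inverse of encode()."""
    chars = []
    while n > 0:
        n, rem = divmod(n - 1, 26)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


print(encode("somestring"))
print(decode(encode("somestring")))
```

The largest value, for thirteen z characters, still fits below 2^63, so the top bit of a 64-bit integer stays free to flag entries that need a real lookup.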

Convert Tektronix's RIBinary data in TCL

I am pulling data from a Tektronix oscilloscope in Tektronix' RIBinary format using a TCL script, and then within the script I need to convert that to a decimal value.
I have done very little with binary conversions in the first place, but to add to my frustration the documentation on this binary format is also very vague in my opinion. Anyway, here's my current code:
proc ::Scope::CaptureWaveform {VisaAlias Channel} {
# Apply scope settings
::VISA::Write $VisaAlias "*WAI"
::VISA::Write $VisaAlias "DATa:STARt 1"
::VISA::Write $VisaAlias "DATa:STOP 4000"
::VISA::Write $VisaAlias "DATa:ENCdg RIBinary"
::VISA::Write $VisaAlias "DATa:SOUrce $Channel"
# Download waveform
set RIBinaryWaveform [::VISA::Query $VisaAlias "CURVe?"]
# Parse out leading label from scope output
set RIBinaryWaveform [string range $RIBinaryWaveform 11 end]
# Convert binary data to a binary string usable by TCL
binary scan $RIBinaryWaveform "I*" TCLBinaryWaveform
set TCLBinaryWaveform
# Convert binary data to list
}
Now, this code pulls the following data from the machine:
-1064723993 -486674282 50109321 -6337556 70678 8459972 143470359 1046714383 1082560884 1042711231 1074910212 1057300801 1061457453 1079313832 1066305613 1059935120 1068139252 1066053580 1065228329 1062213553
And this is what the machine pulls when I just take regular ASCII data (i.e. what the above data should look like after the conversion):
-1064723968 -486674272 50109320 -6337556 70678 8459972 143470352 1046714368 1082560896 1042711232 1074910208 1057300800 1061457472 1079313792 1066305600 1059935104 1068139264 1066053568 1065228352 1062213568
Finally, here is a reference to the RIBinary specification from Tektronix since I don't think it is a standard data type:
http://www.tek.com/support/faqs/how-binary-data-represented-tektronix-oscilloscopes
I've been looking for a while now on the Tektronix website for more information on converting the data and the above URL is all I've been able to find, but I'll comment or edit this post if I find any more information that might be useful.
Updates
Answers don't necessarily have to be in TCL. If anyone can help me logically work through this on a high level I can hash out the TCL details (this I think would be more helpful to others as well)
The reason I need to transfer the data in binary and then convert it afterwards is for the purpose of optimization. Due to this I can't have the device perform the conversion before the transfer as it will slow down the process.
I updated my code some and now my results are maddeningly close to the actual results. I assume it may have something to do with the commas that are in the data originally.
Below are now examples of the raw data sent from the device without any of my parsing.
On a suggestion from @kostix, I made a second script with code he gave me that I modified to fit my data set. It can be seen below; however, the results are exactly the same as with my code above.
ASCIi:
:CURVE -1064723968,-486674272,50109320,-6337556,70678,8459972,143470352,1046714368,1082560896,1042711232,1074910208,1057300800,1061457472,1079313792,1066305600,1059935104,1068139264,1066053568,1065228352,1062213568
RIBinary:
:CURVE #280ÀçâýðüÿKì
Note on RIBinary - ":CURVE #280" is all part of the header that I need to parse out, but the #280 part of it can vary depending on the data I'm collecting. Here's some more info from Tektronix on what the #280 means:
block is the waveform data in binary format. The waveform is formatted
as: #<x><yyy><data> where <x> is the number of y bytes. For
example, if <yyy> = 500, then <x> = 3. <yyy> is the number of bytes to
transfer including checksum.
So, for my current data set x = 2 and yyy = 80. I am just really unfamiliar with converting binary data, so I'm not sure what to do programmatically to deal with the block format.
On a suggestion from @kostix I made a second script with code he gave me that I modified to fit my data set:
set RIBinaryWaveform [::VISA::Query ${VisaAlias} "CURVe?"]
binary scan $RIBinaryWaveform a8a curv nbytes
encoding convertfrom ascii ${curv}
scan $nbytes %u n
set n
set headerlen [expr {$n + 9}]
binary scan $RIBinaryWaveform @9a$n nbytes
scan $nbytes %u n
set n
set numints [expr {$n / 4}]
binary scan $RIBinaryWaveform @${headerlen}I${numints} data
set data
The output of this code is the same as the code I provided above.
According to the documentation you link to, RIBinary is signed big-endian. Thus, you convert the binary data to integers with binary scan $data "I*" someVar (I* means “as many big-endian 4-byte integers as you can”). You use the same conversion with RPBinary (if you've got that) but you then need to chop each value to the positive 32-bit integer range by doing & 0xFFFFFFFF (assuming at least Tcl 8.5). For FPBinary, use R* (requires 8.5). SRIBinary, SRPBinary and SFPBinary are the little-endian versions, for which you use lower-case format characters.
Getting conversions correct can take some experimentation.
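Since answers do not have to be in Tcl, here is the same conversion sketched in Python with the standard struct module, where ">" plays the role of the upper-case (big-endian) format characters. The function names are made up for this example, and the sample bytes are packed locally from three of the values in the question rather than read from a scope.

```python
import struct


def decode_ribinary(payload, width=4):
    """Decode signed big-endian integers of `width` bytes each (1, 2 or 4)."""
    code = {1: "b", 2: "h", 4: "i"}[width]
    count = len(payload) // width
    # ">" selects big-endian byte order; a trailing partial value is ignored.
    return list(struct.unpack(f">{count}{code}", payload[:count * width]))


def to_unsigned32(values):
    """Reinterpret signed 32-bit values as unsigned, as described for RPBinary."""
    return [v & 0xFFFFFFFF for v in values]


sample = struct.pack(">3i", -1064723993, 70678, 8459972)
print(decode_ribinary(sample))
print(to_unsigned32([-1]))
```

For the little-endian variants (SRIBinary and friends) the only change would be "<" instead of ">" in the format string.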
I have no experience with this stuff but like googling. Here are my findings.
This document, in the section titled "Formatted I/O Operations" tells that the viQueryf() standard C API function combines viPrintf() (writing to a device) with viScanf() (reading from a device), and examples include calls like viQueryf (io, ":CURV?\n", "%#b", &totalPoints, rdBuffer); (see the section «IEEE-488.2 Binary Data—"%b"»), where the third argument to the function specifies the desired format.
The VISA::Query procedure from your Tcl library pretty much resembles that viQueryf() in my eyes, so I'd expect it to accept the third (optional) argument which specifies the format you want the data to be in.
If there's nothing like it, let's look at your ASCII data. Your FAQ entry and the document I found both specify that the opaque data might come in the form of a series of integers of different size and endianness. The "RIBinary" format states it should be big-endian signed integers.
The binary scan Tcl command is able to scan 16-bit and 32-bit big-endian integers from a byte stream — use the S* and I* formats, correspondingly.
Your ASCII data clearly looks like 32-bit integers, so I'd try scanning using I*.
Also see this doc — it appears to have much in common with the PDF guide I linked above, but might be handy anyway.
TL;DR
Try studying your API to find a way to explicitly tell the device the data format you want. This might produce a more robust solution in the case the device might be somehow reconfigured externally to change its default data format effectively pulling the rug under the feet of your code which relies on certain (guessed) default.
Try interpreting the data as outlined above and see if the interpretation looks sensible.
P.S.
This might mean nothing at all, but I failed to find any example which has "e" between the "CURV" and the "?" in the calls to viQueryf().
Update (2013-01-17, in light of the new discoveries about the data format): to binary scan the data of varying types, you might employ two techniques:
binary scan accepts as many specifiers in a row as you like; they are processed from left to right as binary scan reads the supplied data.
You can do multiple runs of binary scanning over a chunk of your binary data, either by cutting pieces out of this chunk (string manipulation Tcl commands understand they're operating on a byte array and behave accordingly) or by using the @offset term in the binary scan format string to make it start scanning from the specified offset.
Another technique worth employing here is to first train yourself on a toy example. This is best done in an interactive Tcl shell: tkcon is the best bet, but plain tclsh is also OK, especially if called via rlwrap (POSIX systems only).
For instance, you could create some fake data for yourself like this:
% set b [encoding convertto ascii ":CURVE #224"]
:CURVE #224
% append b [binary format S* [list 0 1 2 -3 4 -5 6 7 -8 9 10 -11]]
:CURVE #224............
Here we first created a byte array containing the header and then created another byte array containing twelve 16-bit integers packed MSB first, and then appended it to the first array, essentially creating a data block our device is supposed to return (well, there are fewer integers than the device returns). encoding convertto takes the name of a character encoding and a string and produces a binary array of that string converted to the specified encoding. binary format is told to consume a list of arbitrary size (* in the format list) and interpret it as a list of 16-bit integers to be packed in the big-endian format (the S format character).
Now we can scan it back like this:
% binary scan $b a8a curv nbytes
2
% encoding convertfrom ascii $curv
:CURVE #
% scan $nbytes %u n
1
% set n
2
% set headerlen [expr {$n + 9}]
11
% binary scan $b @9a$n nbytes
1
% scan $nbytes %u n
1
% set n
24
% set numints [expr {$n / 2}]
12
% binary scan $b @${headerlen}S${numints} data
1
% set data
0 1 2 -3 4 -5 6 7 -8 9 10 -11
Here we proceeded like this:
Interpret the header:
Read the first eight bytes of the data as ASCII characters (a8) — this should read our :CURVE # prefix. We convert the header prefix from the packed ASCII form to the Tcl's internal string encoding using encoding convertfrom.
Read the next byte (a) which is then interpreted as the length, in bytes, of the next field, using the scan command.
We then calculate the length of the header read so far to use it later. This value is saved to the "headerlen" variable. The length of the header amounts to the 9 fixed bytes plus a variable number of bytes (2 in our case) specifying the length of the following data.
Read the next field which will be interpreted as the "number of data bytes" value.
To do this, we offset the scanner by 9 (the length of ":CURVE #2") and read as many ASCII bytes as obtained in the previous step, so we use @9a$n for the format: $n just obtains the value of a variable named "n", and it will be 2 in our case. Then we scan the obtained value and finally get the number of raw data bytes that follow.
Since we will read 16-bit integers, not bytes, we divide this number by 2 and store the result in the "numints" variable.
Read the data. To do this, we have to offset the scanner by the length of the header. We use @${headerlen}S${numints} for the format string. Tcl expands those ${varname} before passing the string to binary scan, so the actual string in our case will be @11S12, which means "offset by 11 bytes, then scan 12 16-bit big-endian integers".
binary scan puts a list of integers to the variable which name is passed, so no additional decoding of those integers is needed.
Note that in the real program you should probably do certain sanity checks:
* After the first step check that the static part of the header is really ":CURVE #".
* Check the return value of binary scan and scan after each invocation and check that it equals the number of variables passed to the command (which means the command was able to parse the data).
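For readers who prefer to see the whole header walk with those sanity checks in one place, here is a sketch in Python rather than Tcl. It parses the same toy block (":CURVE #224" followed by twelve 16-bit integers). The function name parse_curve_block is made up, and the sketch treats every announced byte as measurement data, so it does not handle a checksum if the device appends one.

```python
import struct


def parse_curve_block(raw, width=2, prefix=b":CURVE "):
    """Parse b':CURVE #<x><yyy><data>' and return the decoded integers."""
    # Sanity check 1: the static part of the header must be present.
    if not raw.startswith(prefix + b"#"):
        raise ValueError("missing ':CURVE #' header")
    pos = len(prefix) + 1
    x = raw[pos:pos + 1]
    if not x.isdigit() or x == b"0":
        raise ValueError("bad length-of-length digit")
    ndigits = int(x)
    # Sanity check 2: the byte-count field must be all digits.
    length_field = raw[pos + 1:pos + 1 + ndigits]
    if len(length_field) != ndigits or not length_field.isdigit():
        raise ValueError("bad byte-count field")
    nbytes = int(length_field)
    start = pos + 1 + ndigits
    data = raw[start:start + nbytes]
    # Sanity check 3: we must have received as many bytes as announced.
    if len(data) != nbytes:
        raise ValueError("payload shorter than announced")
    code = {1: "b", 2: "h", 4: "i"}[width]
    usable = nbytes - nbytes % width
    return list(struct.unpack(f">{usable // width}{code}", data[:usable]))


values = [0, 1, 2, -3, 4, -5, 6, 7, -8, 9, 10, -11]
block = b":CURVE #224" + struct.pack(">12h", *values)
print(parse_curve_block(block))
```

Passing width=4 would decode 32-bit values instead, which is what the scope data in the question appears to need.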
One more insight. The manual you cited says:
<yyy> is the number of bytes to transfer including checksum.
so it's quite possible that not all of those data bytes represent measures, but some of them represent the checksum. I don't know what format (and hence length) and algorithm and position of this checksum is. But if the data does indeed include a checksum, you can't interpret it all using S*. Instead, you will probably take another approach:
Extract the measurement data using string range and save it to a variable.
binary scan the checksum field.
Calculate the checksum on the data obtained on the first step, verify it.
Use binary scan on the extracted data to get back your measurements.
Checksumming procedures are available in tcllib.
# Download waveform
set RIBinaryWaveform [::VISA::Query ${VisaAlias} "CURVe?"]
# Extract block format data
set ResultCount [expr {[string range ${RIBinaryWaveform} 2 [expr {[string index ${RIBinaryWaveform} 1] + 1}]] / 4}]
# Parse out leading label from Tektronix block format
set RIBinaryWaveform [string range ${RIBinaryWaveform} [expr [string index ${RIBinaryWaveform} 1] + 2] end]
# Convert binary data to integer values
binary scan ${RIBinaryWaveform} "I${ResultCount}" Waveform
set Waveform
Okay, the code above does the magic trick. This is very similar to all the things discussed on this page, but I figured I needed to clear up the confusion about the numbers from the binary conversion being different from the numbers received in ASCII.
After troubleshooting with a Tektronix application specialist we discovered that the data I had been receiving after the binary conversion (the numbers that were off by a few digits) were actually the true values captured by the scope.
The ASCII values are wrong because of the binary-to-ASCII conversion done by the instrument; the scope then passes those incorrect values to TCL.
So, we had it right a few days ago. The instrument was just throwing me for a loop.