I'm importing some data from a CSV into Mathematica. The first few lines of the CSV look like this:
"a_use","tstart","tend"
"bind items on truck to prevent from flying off",1328661514469,1328661531032
"hang laundry on",1328661531035,1328661541700
"tie firewood with",1328661541702,1328661554940
"anchor tent",1328661554942,1328661559797
Mathematica handles this almost perfectly:
data = Import["mystuff.csv"]
The problem is that those big timestamps get converted into scientific notation, and the precision is lost:
In[283]:= data[[2,2]]
Out[283]= 1.32866*10^12
As you can see, even though 1328661531035 is not the same as 1328661541700, the imported data is no longer precise enough to tell the two apart, since both get imported as 1.32866*10^12. I know Mathematica can handle integers of arbitrary length, so how can I get it to import these numbers as (large) integers instead of converting them into this lossy scientific notation?
What version are you using? No problem on Mma 8.0.1.
If you are creating the CSV file in Excel, set the format of the timestamp columns to Number with zero decimal places (via More Number Formats...).
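If it helps to confirm that the file itself is fine, here is a quick Python check (outside Mathematica; the filename is just the one from the question) showing the timestamps parse as exact, distinct integers, so any rounding would be happening at import time rather than in the CSV:
import csv

# Quick sanity check outside Mathematica: the CSV itself holds the timestamps as
# exact integers (Python ints are arbitrary precision), so nothing is lost in the file.
with open("mystuff.csv", newline="") as f:
    rows = list(csv.reader(f))

for use, tstart, tend in rows[1:]:   # skip the "a_use","tstart","tend" header
    print(use, int(tstart), int(tend), int(tend) - int(tstart))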
This might be a trivial question... Or might not be. When I serialize an object to JSON, how are numbers represented?
Specifically, I need to know how efficiently they are encoded to binary. There are 2 ways:
- Transform the number to its decimal string representation and then encode that string to binary.
- Encode the number directly to binary.
Which is the case?
That is a big difference: let's say the serialized object contains the number 12345678. Encoded the first way, it will take 8 B to transfer; encoded the second way, only 4 B. When it comes to lots of big numbers (my case), in the first case I would be better off using base64 as a pre-processing step before serialization.
I can imagine that this might depend on the serializer (though I really hope it does not). If it matters, I am using the Firebase Realtime Database SDK.
JSON is a textual notation. So the number 12345678 is sent as those eight characters, 1, 2, 3, etc. Depending on your text encoding, that's probably eight bytes (e.g., UTF-8 or Windows-1252; but if you were using UTF-16, for instance, it would be 16 bytes).
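You can see this directly by serializing the number and measuring the encoded text; a minimal Python sketch (any standard JSON library behaves the same way for this):
import json

payload = json.dumps({"n": 12345678})
print(payload)                               # {"n": 12345678} - the digits are literal text
print(len("12345678".encode("utf-8")))       # 8 bytes in UTF-8, one per character
print(len("12345678".encode("utf-16-le")))   # 16 bytes if the transport used UTF-16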
There have been various "binary JSON" proposals over the years, but I don't think any of them really caught on outside of specific applications (for instance, BSON in MongoDB).
I am working with SAP HANA, and I would like to know if it is possible to get the disk size of CSV data with SQL.
The CSV data that I mean is the file \index\SCHEMA NAME\CL\TABLE_NAME\data.csv after export.
Best Regards
Houssem
Nope, there is no way to generate this directly.
You could, however, do a rough estimation by looking at M_CS_COLUMNS to see the estimated uncompressed size for each column.
Then you could add six bytes (double-byte encoding) per column per record to account for the enclosing quotation marks and the separators between columns.
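Just to make the arithmetic concrete, a rough sketch in Python (the column sizes would come from M_CS_COLUMNS; the figures below are made up):
# Hypothetical figures: per-column uncompressed sizes (bytes) as reported by
# M_CS_COLUMNS, plus the table's record count.
uncompressed_column_sizes = [1200000, 350000, 350000]
record_count = 100000

# Rough CSV size: raw column data plus ~6 bytes per column per record for the
# enclosing quotation marks and the field separators.
estimate = sum(uncompressed_column_sizes) + 6 * len(uncompressed_column_sizes) * record_count
print(estimate, "bytes (rough estimate)")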
I'm working with some binary waveform files from various early-to-mid-'90s HP scopes. I am trying to do a bulk conversion (we have over 5000) of the files to CSVs and then upload them into a database. I've tried hexdump, xxd, od, strings, etc., and none of them seem to work. I did hunt down a programmer's manual, but it's not making a whole lot of sense.
The files have a preamble line as ASCII text, but then the data points are in binary, and for some reason nothing I try can decode them. The preamble gives the data necessary to use the binary values and calculate the correct results. It also states that the data is in WORD format.
:WAV:PRE 2,1,32768,1,+4.000000E-08,-4.9722700001108E-06,0,+2.460630E-04,+2.500000E+00,16384;:WAV:DATA #800065536^W�^W�^W�^
I'm pretty confused.
Have a look at
http://www.naic.edu/~phil/hardware/oscilloscopes/9000A_Programmer_Reference.pdf
specifically page 1-21. After ":WAV:DATA", I think the rest of the chunk above will have 65536 8-bit data bytes (the start of which is represented above by �). The ^W is probably a delimiter, so you would have to parse that out. Just a thought.
UPDATE: I'm new to oscilloscope data collection and am trying to figure the whole thing out from scratch. So, on further digging, it looks like the data you have provided shows this:
PREamble:
- WORD format (16-bit signed integers split into 2 8-bit bytes)
- If there is a WAV:BYT section, that would specify byte order for each pair
- RAW data
- 32768 data points
- COUNT = 1 (I'm not clear on the meaning of this)
- Next 3 should be X increment, origin, reference
- Next 3 should be Y increment, origin, reference, although the manual that I pointed you at above has many more fields than just these, so you might want to consult your specific scope manual.
DATA:
- On closer examination, I don't think the ^W is a delimiter; I think it is the first byte of the pair (0x17, binary 00010111). The � character is apparently a standard "I don't know how to represent this character" web representation. You would need to look at that character as 8 bits as well.
- 32768 byte pairs (65536 bytes) of data
I'm not finding a utility that will do this for you. I think you're going to have to write or acquire some code (Perl, C, Java, Python, VB, etc.) to get this done.
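As a starting point, here is a rough, untested Python sketch of what that code might look like. It assumes the layout deduced above (ASCII preamble up to ":WAV:DATA", a "#8" block header with an 8-digit byte count, then 16-bit signed words) and the usual HP-style scaling using the Y increment/origin/reference from the preamble; the function name, byte order, and field positions are assumptions, so check them against your scope's manual:
import struct
import csv

def hp_word_to_csv(path, out_path, big_endian=True):
    with open(path, "rb") as f:
        raw = f.read()

    # ASCII preamble runs up to ":WAV:DATA"; the data block follows as "#8" + 8-digit byte count.
    marker = raw.index(b":WAV:DATA")
    preamble = raw[:marker].decode("ascii", errors="replace")
    hash_pos = raw.index(b"#8", marker)
    nbytes = int(raw[hash_pos + 2 : hash_pos + 10])       # e.g. "00065536" -> 65536 data bytes
    data = raw[hash_pos + 10 : hash_pos + 10 + nbytes]

    # Assumed preamble field order: format, type, points, count, xinc, xorigin, xref, yinc, yorigin, yref
    fields = preamble.split("PRE", 1)[1].strip().rstrip(";").split(",")
    points = int(fields[2])
    xinc, xorg, xref = float(fields[4]), float(fields[5]), float(fields[6])
    yinc, yorg, yref = float(fields[7]), float(fields[8]), float(fields[9])

    # WORD format: 16-bit signed integers; ">h" is big-endian, "<h" little-endian - verify which applies.
    fmt = (">" if big_endian else "<") + "h"
    words = [struct.unpack_from(fmt, data, i)[0] for i in range(0, points * 2, 2)]

    with open(out_path, "w", newline="") as out:
        w = csv.writer(out)
        w.writerow(["time", "value"])
        for i, word in enumerate(words):
            # Assumed HP-style scaling: y = (word - yref) * yinc + yorigin, x = (i - xref) * xinc + xorigin
            w.writerow([(i - xref) * xinc + xorg, (word - yref) * yinc + yorg])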
Below are the converted values of the same number in different bases, i.e. Hexadecimal, Decimal, and Binary.
Hexadecimal - 33161fa59009c58000006198
Decimal - 15810481316372905437683540376
Binary - 1100110001011000011111101001011001000000001001110001011000000000000000000000000110000110011000
I have achieved this correctly in Java, but for the project I need to do this kind of conversion in MySQL. I found the CONV() function (http://dev.mysql.com/doc/refman/5.1/en/mathematical-functions.html#function_conv), which seems to work for small numbers but not for big ones such as the one given above.
Kindly help me if there is any workaround to get these desired results.
Regards,
Amit
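For reference, here is a quick Python check of the numbers above plus the chunked arithmetic a workaround would need, since CONV() works with 64-bit precision; in MySQL you would have to reproduce the second part with something along the lines of CONV() on 64-bit pieces combined with DECIMAL(65,0) arithmetic (a sketch, not a tested solution):
h = "33161fa59009c58000006198"

# Direct conversion - Python integers are arbitrary precision, so this reproduces
# the decimal and binary values listed above.
n = int(h, 16)
print(n)            # 15810481316372905437683540376
print(bin(n)[2:])

# The same value from 64-bit-sized chunks: high digits shifted by 16^16, plus the low 16 hex digits.
high, low = h[:-16], h[-16:]
print(int(high, 16) * 16 ** 16 + int(low, 16))   # same number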
I've been asked to process some files serialized as binary (not text/JSON unfortunately) Thrift objects, but I don't have access to the program or programmer that created the files, so I have no idea of their structure, field order, etc. Is there a way using the Thrift libraries to open a binary file and analyze it, getting a list of the field types, values, nesting, etc.?
Unfortunately, it appears that Thrift's binary protocol does not do very much tagging of data at all; to decode it, the protocol appears to assume you have the .thrift file in hand, so you know that, say, the next 4 bytes are supposed to be an integer and aren't actually the first half of a float. So it appears you are stuck with, basically, looking at the files in a hex editor (or equivalent) and trying to deduce fields based on the exact patterns you're seeing.
There are a very few helpful bits:
- Each file begins with a version, a protocol identifier string, and a sequence number.
- Maps begin with 6 bytes that identify the key and value types (the first two bytes, as integer codes) plus the number of elements as a 4-byte integer.
- The type codes appear to be standard (the canonical location of their definitions seems to be TProtocol.h in the Thrift sources; for instance, a boolean value is specified by type code 2, a UTF-8 string by type code 16, and so on).
- Strings are prefixed by a 4-byte integer length field, and lists are prefixed by the element type (1 byte) and a 4-byte length.
- It looks like all integer fields are saved big-endian, and floating points are saved in IEEE format (which should make doubles relatively easy to find, at least).
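To make that concrete, here is a rough Python sketch of the kind of blind walker this implies, assuming the standard TBinaryProtocol field layout (each struct field is a 1-byte type code and a 2-byte field ID followed by the value, with a 0 byte marking the end of the struct); it's a guess-and-check aid, not a substitute for the real spec:
import struct

# Type codes from TProtocol.h: STOP=0, BOOL=2, BYTE=3, DOUBLE=4, I16=6, I32=8,
# I64=10, STRING=11, STRUCT=12, MAP=13, SET=14, LIST=15
def read_value(buf, pos, ttype):
    if ttype in (2, 3):                                # bool / byte
        return buf[pos], pos + 1
    if ttype == 4:                                     # double, IEEE 754 big-endian
        return struct.unpack_from(">d", buf, pos)[0], pos + 8
    if ttype == 6:
        return struct.unpack_from(">h", buf, pos)[0], pos + 2
    if ttype == 8:
        return struct.unpack_from(">i", buf, pos)[0], pos + 4
    if ttype == 10:
        return struct.unpack_from(">q", buf, pos)[0], pos + 8
    if ttype == 11:                                    # string/binary: 4-byte length prefix
        n = struct.unpack_from(">i", buf, pos)[0]
        return buf[pos + 4 : pos + 4 + n], pos + 4 + n
    if ttype == 12:                                    # nested struct
        return read_struct(buf, pos)
    if ttype == 13:                                    # map: key type, value type, 4-byte count
        ktype, vtype = buf[pos], buf[pos + 1]
        n = struct.unpack_from(">i", buf, pos + 2)[0]
        pos += 6
        items = []
        for _ in range(n):
            k, pos = read_value(buf, pos, ktype)
            v, pos = read_value(buf, pos, vtype)
            items.append((k, v))
        return items, pos
    if ttype in (14, 15):                              # set / list: element type, 4-byte count
        etype = buf[pos]
        n = struct.unpack_from(">i", buf, pos + 1)[0]
        pos += 5
        items = []
        for _ in range(n):
            v, pos = read_value(buf, pos, etype)
            items.append(v)
        return items, pos
    raise ValueError("unknown type code %d at offset %d" % (ttype, pos))

def read_struct(buf, pos):
    fields = {}
    while True:
        ttype = buf[pos]
        if ttype == 0:                                 # STOP: end of struct
            return fields, pos + 1
        field_id = struct.unpack_from(">h", buf, pos + 1)[0]
        value, pos = read_value(buf, pos + 3, ttype)
        fields[field_id] = (ttype, value)

# Usage is guesswork: where the first struct actually starts depends on whether the
# file carries the version/name/sequence-number header mentioned above.
# with open("data.bin", "rb") as f:
#     print(read_struct(f.read(), some_offset))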
The TBinaryProtocol* files in Thrift have a few more helpful details; on the plus side, there are a number of different implementations so you can read the ones implemented in the language you are most comfortable with.
Sorry, I know this probably isn't that helpful, but it really does appear this is all the information the Thrift binary format provides; clearly the binary format was designed with the intent that you would always know the exact protocol spec already, and that the goal was to minimize wire space rather than to make it at all easy to decode blindly.