Is there a way to increase BinaryArray capacity in ray/pyarrow?

Is there a way to increase the BinaryArray limit in pyarrow? I'm hitting this exception when using ray.get:
Capacity error: BinaryArray cannot contain more than 2147483646 bytes, have 2147483655
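As far as I know there is no setting that raises this cap: a plain BinaryArray stores 32-bit offsets, so it tops out just under 2 GiB. If you build the Arrow data yourself, pyarrow's large_binary type (64-bit offsets) avoids the limit; whether ray's own serialization path can take advantage of it depends on your ray version, so treat the following as a sketch of the pyarrow side only (the payload is made up):

import pyarrow as pa

# Made-up payload, only to illustrate going past the 2 GiB BinaryArray cap.
chunk = b"x" * (64 * 1024 * 1024)               # 64 MiB per value
values = [chunk] * 40                           # ~2.5 GiB in total

arr = pa.array(values, type=pa.large_binary())  # 64-bit offsets, no 2 GiB cap
print(arr.type, arr.nbytes)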

Related

Need a formula to get total LUN size using lunSizeLow and lunSizeHigh SNMP objects

I have 2 SNMP Objects/OIDs. Below are the details:
Object1:
Name: lunSizeLow
OID: 1.3.6.1.4.1.43906.1.4.3.2.3.1.9
Description: `LUN` size in bytes - low order bytes
Object2:
Name: lunSizeHigh
OID: 1.3.6.1.4.1.43906.1.4.3.2.3.1.10
Description: `LUN` size in bytes - high order bytes
My requirement:
I want to monitor LUN size through a script, but I didn't find any SNMP object that gives the total LUN size directly. I found two separate objects (lunSizeLow and lunSizeHigh), so I need a formula to combine these low-order and high-order SNMP objects into the total LUN size.
I have gone through many articles on the internet and found a couple of formulas on community.hpe.com.
But I'm not sure which one is correct.
Formula 1:
The maximum unsigned number that can be stored in a 32-bit counter is 4294967295.
Total size would be: LOW_ORDER_BYTES + HIGH_ORDER_BYTES * 4294967296
Formula 2:
Total size in GB is LOW_ORDER_BYTES / 1073741824 + HIGH_ORDER_BYTES * 4
Could anyone help me find the correct formula?
Most languages will have a bit-shift operator, allowing you to do something similar to the below (pseudo-Java):
long myBigInteger = lunSizeHigh;
myBigInteger = myBigInteger << 32; // shift the high bits 32 positions to the left, into the high half of the long
myBigInteger = myBigInteger + lunSizeLow;
This has two advantages over multiplying:
Bit shifting is often faster than multiplication, even though most compilers would optimize that particular multiplication into a bit shift anyway.
It is easier to read the code and understand why this would provide the correct answer, given the description from the MIB. Magic numbers should be avoided where possible.
That aside, putting some numbers into the Windows Calculator (using Programmer Mode) and trying formula 1, we can see that it works.
Now, you don't specify what language or environment you're working in, and in some languages you won't have a number type that supports the size of the numbers you want to manipulate. (That is the same reason this value had to be split into two counters in the first place: it's larger than the largest number representation available on some primitive platforms.) Whether you shift or multiply, you'll have to make sure your implementation language has an integer type large enough to hold the result.
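For completeness, here is the same combination in Python, with made-up counter values purely for illustration; it also shows that the bit shift and formula 1 agree:

# Hypothetical counter readings, purely for illustration.
lun_size_low = 305419896                                     # value read from lunSizeLow
lun_size_high = 2                                            # value read from lunSizeHigh

total_shift = (lun_size_high << 32) + lun_size_low           # bit-shift approach
total_formula1 = lun_size_low + lun_size_high * 4294967296   # formula 1
assert total_shift == total_formula1

total_gb = total_shift / 1073741824                          # formula 2 expresses the same size in GB
print(total_shift, total_gb)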

Cassandra .csv import error:batch too large

I'm trying to import data from a .csv file into Cassandra 3.2.1 via the COPY command. The file has only 299 rows with 14 columns. I get the error:
Failed to import 299 rows: InvalidRequest - code=2200 [Invalid query] message="Batch too large"
I used the following COPY command and tried to increase the batch size:
copy table (Col1,Col2,...) from 'file.csv'
with delimiter = ';' and header = true and MAXBATCHSIZE = 5000;
I think 299 rows is not too much to import into Cassandra, or am I wrong?
Adding the CHUNKSIZE keyword resolved the problem for me.
e.g.
copy event_stats_user from '/home/kiren/dumps/event_stats_user.csv' with CHUNKSIZE=1;
The error you're encountering is a server-side error message, saying that the size (in terms of byte count) of your batch insert is too large.
This batch size is defined in the cassandra.yaml file:
# Log WARN on any batch size exceeding this value. 5kb per batch by default.
# Caution should be taken on increasing the size of this threshold as it can lead to node instability.
batch_size_warn_threshold_in_kb: 5
# Fail any batch exceeding this value. 50kb (10x warn threshold) by default.
batch_size_fail_threshold_in_kb: 50
If you insert a lot of big columns (in terms of size) you may quickly reach this threshold. Try reducing MAXBATCHSIZE, for example to 200.
More info on COPY options here

error-correcting code checksum

Question: Adding all bytes together gives 118h.
Drop the carry nibble to give you 18h. I don't understand this term 'carry nibble'.
If I compute the checksum for the single byte 10010101 (95 hex), is the checksum 4 (04 hex)?
source : http://www.asic-world.com/digital/numbering4.html#Error_Detecting_and_Correction_Codes
"
The parity method is calculated over byte, word or double word. But when errors need to be checked over 128 bytes or more (basically blocks of data), then calculating parity is not the right way. So we have checksum, which allows to check for errors on block of data. There are many variations of checksum.
Adding all bytes
CRC
Fletcher's checksum
Adler-32
The simplest form of checksum, which simply adds up the asserted bits in the data, cannot detect a number of types of errors. In particular, such a checksum is not changed by:
Reordering of the bytes in the message
Inserting or deleting zero-valued bytes
Multiple errors which sum to zero
Example of Checksum : Given 4 bytes of data (can be done with any number of bytes): 25h, 62h, 3Fh, 52h
Adding all bytes together gives 118h.
Drop the Carry Nibble to give you 18h.
Get the two's complement of the 18h to get E8h. This is the checksum byte.
To Test the Checksum byte simply add it to the original group of bytes. This should give you 200h.
Drop the carry nibble again, giving 00h. Since it is 00h, this means the bytes were probably not changed."
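A short Python sketch of that add-and-complement checksum (the function name is mine, for illustration only):

def simple_checksum(data):
    # Add all bytes, drop the carry (keep only the low byte), then take the two's complement.
    total = sum(data) & 0xFF
    return (-total) & 0xFF

data = bytes([0x25, 0x62, 0x3F, 0x52])
cs = simple_checksum(data)              # 0xE8, as in the example above
assert (sum(data) + cs) & 0xFF == 0     # adding the checksum back gives 00h
print(hex(cs))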

getThreads of Very Large Label

I have 900+ threads in a label. I would like to fetch them all to work out some metrics in a script. getThreads() seems to max out at 500 threads, which aligns with what the documentation says:
This call will fail when the size of all threads is too large for the system to handle. Where the thread size is unknown, and potentially very large, please use the 'paged' call, and specify ranges of the threads to retrieve in each call.
So now the problem is when I do
GmailApp.getUserLabelByName("Huge Label").getThreads(501, 1000).length;
I get the message: "Argument max cannot exceed 500." Any suggestions on how to process a label with a very large thread count?
The signature of the getThreads() method is
getThreads(start, max)
So you must use
GmailApp.getUserLabelByName("Huge Label").getThreads(501, 500).length;
That will return you threads from 501 to 1000.

SQL Msg 1701, Level 16, State 1 while making a very wide table for testing purposes

I am making a huge table to simulate a very rough scenario in SQL (a table with 1024 attributes and, in case you wonder, a lot of rows); the data type for each attribute is float.
To do so, I am using another table which has 300 attributes, and I am doing something like:
SELECT [x1]
,[x2]
,[x3]
,[x4]
,[x5]
,[x6]
,[x7]
,[x8]
,[x9]
,[x10]
,[x11]
,[x12]
,[x13]
,[x14]
...
,[x300]
,x301= x1
,x302= x2
...
,x600= x300
,x601= x1
,x602= x2
...
,x900= x300
,x901= x1
,x902= x2
...
,x1000= x100
,x1001= x101
,x1002= x102
,x1003= x103
,x1004= x104
...
,x1024= x124
INTO test_1024
FROM my_300;
However, I get the following error:
Msg 1701, Level 16, State 1, Line 2
Creating or altering table 'test_1024' failed because the minimum row size
would be 8326, including 134 bytes of internal overhead. This exceeds the
maximum allowable table row size of 8060 bytes.
How can I overcome this issue? (I know SQL Server can handle 1024 columns...)
You will have to change your data types to either varchar, nvarchar, varbinary or text to circumvent this error - or break the input into several tables (or better yet, find a better way to structure your data...which I know isn't always possible depending on constraints).
To read more about the 'why' - check out this article which explains it better than I could: http://blog.sqlauthority.com/2007/06/23/sql-server-2005-row-overflow-data-explanation/
Let's have a look at the figures in the error message.
'8326, including 134 bytes of internal overhead' means that the data alone takes 8326-134=8192 bytes.
Given that the number of columns is 1024, it's exactly 8192÷1024=8 bytes per column.
Moving on to the overhead, of those 134 bytes, your 1024 columns require 1024÷8=128 bytes for the NULL bitmap.
As for the remaining 134-128=6 bytes, I am not entirely sure but we can very well consider that size a constant overhead.
Now, let's try to estimate the maximum possible number of float columns per table in theory.
The maximum row size is said to be 8060 bytes.
Taking off the constant overhead, it's 8060-6=8054 bytes.
As we now know, one float column takes 8 bytes of data plus 1 bit in the bitmap, which is 8×8+1=65 bits.
The data + NULL bitmap size in bits is 8054×8=64432.
The estimated maximum number of float columns per table is therefore 64432÷65≈991 columns.
So, commenting out 33 columns in your script should result in successful creation of the table.
To verify, uncommenting one back should produce the error again.
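A quick Python re-check of that arithmetic, using the numbers from the error message above:

row_size, overhead, columns = 8326, 134, 1024

data_bytes = row_size - overhead              # 8192 bytes of float data
bytes_per_column = data_bytes // columns      # 8 bytes per float column
null_bitmap = columns // 8                    # 128 bytes for the NULL bitmap
constant_overhead = overhead - null_bitmap    # 6 bytes of fixed overhead

max_row = 8060
usable_bits = (max_row - constant_overhead) * 8   # 64432 bits for data + NULL bitmap
bits_per_float = 8 * 8 + 1                        # 65 bits per float column
print(usable_bits // bits_per_float)              # 991 columns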
SQL Server limits row size to approximately 8 KB. Certain column types are excluded from this total, but the value of each individual column must still fit within the 8 KB limit, and a certain amount of data is kept in the row itself as a pointer. If you are exceeding this limit, you should step back and reconsider your schema; you do NOT need 300 columns in a table.