My goal is to store a 256-bit object (actually a bitfield) in MySQL and be able to perform bitwise operations and comparisons on it.
Ideally, I would use BIT(256) for the type, but MySQL limits BIT columns to 64 bits.
My proposed solution is to use the binary string type BINARY(32) for this field. I can store my objects that way, but I have found no way to operate on them.
My table structure is
CREATE TABLE `test` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`bin` binary(32) NOT NULL,
PRIMARY KEY (`id`)
)
but then the query
SELECT
bit_count( bin ) AS fullBin,
bit_count( substring( bin, 0, 4 ) ) AS partialBin
FROM test
always returns 0, because MySQL converts neither my binary string nor the substring into a number for bit_count to operate on.
I am looking for a way to extract parts of the binary string as BIGINT or some other type that I can operate on (I only need bitwise AND and bit_count() operations).
Performance-wise, I would prefer a solution that does not involve creating strings and parsing them.
I would also accept any proposal for storing my data as another type, but the obvious solution of splitting my bin column into four columns of type BIT(64) is not an option, as I must preserve the table naming structure.
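To make the requested operations concrete, here is a minimal Python sketch (client-side, not MySQL) that treats a 32-byte value as four 64-bit words and computes a bit count and a bitwise AND without building hex strings. The function names are hypothetical, and the big-endian word order is an assumption about how the bitfield is laid out.

```python
import struct

def split_words(blob: bytes) -> tuple:
    """Split a 32-byte value into four unsigned 64-bit words (big-endian assumed)."""
    if len(blob) != 32:
        raise ValueError("expected exactly 32 bytes")
    return struct.unpack(">4Q", blob)

def bit_count(blob: bytes) -> int:
    """Count the set bits across all four 64-bit words."""
    return sum(bin(word).count("1") for word in split_words(blob))

def bit_and(a: bytes, b: bytes) -> bytes:
    """Bitwise AND of two equally long byte strings."""
    if len(a) != len(b):
        raise ValueError("operands must have the same length")
    return bytes(x & y for x, y in zip(a, b))
```

The same split into 64-bit pieces is what a server-side solution would need as well, since that is the widest integer the bit functions in MySQL 5.7 operate on.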
Related
In MySQL 5.7, we have a table defined as follows:
CREATE TABLE `person` (
`person_id` bigint(20) NOT NULL AUTO_INCREMENT,
`name` varchar(64) DEFAULT NULL,
PRIMARY KEY (`person_id`),
KEY `ix_name` (`name`)
) ENGINE=InnoDB CHARSET=utf8
We then prepared two records for testing; the values of the name field (of type varchar) are
123456789123456789
1
respectively.
Case 1
select * from person where name = 123456789123456789-1;
Note that we are using a number instead of a string inside the where clause. The record with name 123456789123456789 is returned, and it seems that the -1 at the end is ignored!
Furthermore, if we add another record with name = 123456789123456788, the above select returns two records: both 123456789123456789 and 123456789123456788.
The output looks so strange!
Case 2
select * from person where name = 123456789123456789-123456789123456788;
We get the record with name 1, so in this case the - seems to act as a minus operator.
Why is the behavior of - so different in the two cases?
I can't immediately tell you what the type of 123456789123456789-1 is, but for the comparison operation we're almost certainly falling through most of the more "normal" data type conversion rules for MySQL and ending up at:
In all other cases, the arguments are compared as floating-point (real) numbers.
Because one of the arguments to the comparison (name) is a string type and the other is numeric, nothing else matches. So both get converted to floating-point values, and a DOUBLE carries only about 15 to 17 significant decimal digits. That is fewer than the 18 required to represent 123456789123456789 and 123456789123456788 as two different numbers.
Look here:
SELECT person_id, name, name + 0.0, 123456789123456789-1 + 0.0, name = 123456789123456789-1
FROM person
ORDER BY person_id;
Before comparing name = 123456789123456789-1, MySQL apparently converts both name and 123456789123456789-1 to DOUBLE, as in the SELECT above, so some digits are lost.
Demo.
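The precision loss can be reproduced outside MySQL. The following Python sketch relies on Python's float being the same IEEE 754 double that MySQL's DOUBLE uses; the helper name is hypothetical, and the sketch illustrates the arithmetic only, not MySQL's exact conversion code path.

```python
def equal_as_double(x: int, y: int) -> bool:
    """Compare two integers after converting both to IEEE 754 doubles."""
    return float(x) == float(y)

a = 123456789123456789
b = 123456789123456788

# Case 1: a and a - 1 are different integers but collapse to the same double.
case1_matches_a = equal_as_double(a, a - 1)
case1_matches_b = equal_as_double(b, a - 1)

# Case 2: the subtraction is done exactly on integers first, giving 1,
# which a double represents without loss.
case2_value = a - b
case2_matches_one = float(case2_value) == float("1")

# The nearest double to both a and b, written back as an integer.
nearest_double = int(float(a))
```

Doubles of this magnitude are spaced 16 apart, which is why both names map to the same value and the -1 appears to be ignored.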
I've got the following table in MySQL (MySQL Server 5.7):
CREATE TABLE IF NOT EXISTS SIMCards (
SIMCardID INTEGER UNSIGNED PRIMARY KEY AUTO_INCREMENT,
ICCID VARCHAR(50) UNIQUE NOT NULL,
MSISDN BIGINT UNSIGNED UNIQUE);
INSERT INTO SIMCards (ICCID, MSISDN) VALUES
(89441000154687982548, 905511528749),
(89441000154687982549, 905511528744),
(89441000154687982547, 905511528745);
I then run the following query:
SELECT SIMCardID FROM SIMCards WHERE ICCID = 89441000154687982549;
However, rather than returning just the relevant row, it returns all of them. If I surround the ICCID in quotes, it works fine, e.g.:
SELECT SIMCardID FROM SIMCards WHERE ICCID = '89441000154687982549';
Why does the first SELECT query not work as I expected?
An INT in MySQL has a maximum (unsigned) value of 4294967295. Your IDs are substantially larger than that. As a result, if you select from your table by number, the comparison cannot be relied on, because the number you are selecting by cannot be represented exactly.
I'm not sure exactly why you are getting the results that you are getting; most likely MySQL compares the string column and the numeric literal as floating-point numbers, and a DOUBLE cannot distinguish 20-digit values that differ only in the last digit. Either way, selecting by a number when your data can't be represented by that numeric type will not work reliably.
Edit to add a detail I forgot: even a BIGINT UNSIGNED (maximum 18446744073709551615) is not large enough to represent your IDs, so make sure to always use strings.
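As a rough check of the ranges involved, this Python sketch compares the three ICCIDs from the question against the MySQL integer limits and shows what happens to them as doubles. The variable names are made up for illustration, and Python's float stands in for MySQL's DOUBLE.

```python
INT_UNSIGNED_MAX = 2**32 - 1      # 4294967295
BIGINT_UNSIGNED_MAX = 2**64 - 1   # 18446744073709551615

iccids = [
    89441000154687982548,
    89441000154687982549,
    89441000154687982547,
]

# None of the values fits even the widest MySQL integer type.
fits_bigint = [value <= BIGINT_UNSIGNED_MAX for value in iccids]

# As doubles the three values are indistinguishable; as text they stay distinct.
distinct_as_double = len({float(value) for value in iccids})
distinct_as_text = len({str(value) for value in iccids})
```

This matches the behavior in the question: the unquoted literal matches every row, while the quoted one matches exactly one.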
I am looking into storing a "large" amount of data and not sure what the best solution is, so any help would be most appreciated. The structure of the data is
450,000 rows
11,000 columns
My requirements are:
1) Need as fast access as possible to a small subset of the data e.g. rows (1,2,3) and columns (5,10,1000)
2) Needs to be scalable: I will be adding columns every month, but the number of rows is fixed.
My understanding is that often it's best to store as:
id| row_number| column_number| value
but this would create 4,950,000,000 entries? I have tried storing as just rows and columns as is in MySQL but it is very slow at subsetting the data.
Thanks!
Build the giant matrix table
As N.B. said in the comments, there's no cleaner way than using one MySQL row for each matrix value.
You can do it without the id column:
CREATE TABLE `stackoverflow`.`matrix` (
`rowNum` MEDIUMINT NOT NULL ,
`colNum` MEDIUMINT NOT NULL ,
`value` INT NOT NULL ,
PRIMARY KEY ( `rowNum`, `colNum` )
) ENGINE = MYISAM ;
You may add a UNIQUE INDEX on (colNum, rowNum), or only a non-unique INDEX on colNum, if you often access the matrix by column (the PRIMARY KEY is on ( `rowNum`, `colNum` ), note the order, so it is inefficient for selecting a whole column).
You'll probably need more than 200 GB to store the 450,000 x 11,000 values, including indexes.
Inserting data may be slow (because there are two indexes to rebuild, and 450,000 entries [one per row] to add when adding a column).
Edits should be very fast, as the indexes wouldn't change and value is of fixed size.
If you often access the same subsets (rows + cols), you may be able to use PARTITIONing of the table if you need something "faster" than what MySQL provides by default.
After years of experience (20201 edit)
Re-reading myself years later, I would say the "cache" ideas are totally dumb, as it's MySQL's role to handle this sort of caching (the data should actually already be in the InnoDB buffer pool).
A better idea, if the matrix is full of zeroes, is to not store the zero values and to treat 0 as the "default" in the client code. That way, you may lighten the storage (if needed: MySQL should actually respond pretty quickly to queries even on such a 5-billion-row table).
Another option, if storage is an issue, is to use a single ID to identify both row and col: you say the number of rows is fixed (450000), so you may replace (row, col) with a single value (id = 450000*col+row), though it needs BIGINT, so it may be no better than two columns.
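The single-ID idea and its BIGINT requirement can be checked with a few lines of Python. This is a sketch under the assumption that row and column numbers are 0-based; the function names are hypothetical.

```python
NUM_ROWS = 450_000
NUM_COLS = 11_000
INT_UNSIGNED_MAX = 2**32 - 1  # largest value of a MySQL INT UNSIGNED

def encode(row: int, col: int) -> int:
    """Combine a 0-based (row, col) pair into one id = 450000*col + row."""
    return NUM_ROWS * col + row

def decode(cell_id: int) -> tuple:
    """Recover the (row, col) pair from a combined id."""
    col, row = divmod(cell_id, NUM_ROWS)
    return row, col

total_cells = NUM_ROWS * NUM_COLS
largest_id = encode(NUM_ROWS - 1, NUM_COLS - 1)
needs_bigint = largest_id > INT_UNSIGNED_MAX
```

With the current 11,000 columns the largest id already exceeds the INT UNSIGNED range, and the gap grows as columns are added each month.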
Don't do like below: don't reinvent MySQL cache
Add a cache (actually no)
Since you said you add values and don't seem to edit matrix values, a cache can speed up frequently requested rows/columns.
If you often read the same rows/columns, you can cache their result in another table (same structure to make it easier):
CREATE TABLE `stackoverflow`.`cachedPartialMatrix` (
`rowNum` MEDIUMINT NOT NULL ,
`colNum` MEDIUMINT NOT NULL ,
`value` INT NOT NULL ,
PRIMARY KEY ( `rowNum`, `colNum` )
) ENGINE = MYISAM ;
That table will be empty at the beginning, and each SELECT on the matrix table will feed the cache. When you want to get a column / row:
SELECT the row/column from that caching table
If the SELECT returns an empty/partial result (no data returned or not enough data to match the expected row/column count) then do the SELECT on the matrix table
Save the result of the SELECT on the matrix table to cachedPartialMatrix
If the caching table gets too big, clear it (the bigger the cached matrix is, the slower it becomes)
Smarter cache (actually, no)
You can make it even smarter with a third table to count how many times a selection is done:
CREATE TABLE `stackoverflow`.`requestsCounter` (
`isRowSelect` BOOLEAN NOT NULL ,
`index` INT NOT NULL ,
`count` INT NOT NULL ,
`lastDate` DATETIME NOT NULL,
PRIMARY KEY ( `isRowSelect` , `index` )
) ENGINE = MYISAM ;
When you do a request on your matrix (one may use TRIGGERS) for the Nth-row or Kth-column, increment the counter. When the counter gets big enough, feed the cache.
lastDate can be used to remove some old values from the cache (take care: if you remove the Nth column from the cache because its lastDate is old enough, you may break other cached entries) or to regularly clear the cache and keep only the recently selected values.
Based on the answer to the question "UUID performance in MySQL", the person who answered suggests storing the UUID as a number and not as a string. I'm not sure how that can be done. Could anyone suggest something? How would my Ruby code deal with that?
If I understand correctly, you're using UUIDs in your primary column? People will say that a regular (integer) primary key will be faster, but there's another way using MySQL's dark side. In fact, MySQL is faster using binary than anything else when indexes are required.
Since a UUID is 128 bits and is written as hexadecimal, it's very easy to store it compactly and speed things up.
First, in your programming language remove the dashes
From 110E8400-E29B-11D4-A716-446655440000 to 110E8400E29B11D4A716446655440000.
Now it's 32 chars (like an MD5 hash, which this also works with).
Since each byte of a BINARY column is 8 bits, BINARY(16) is exactly the size of a UUID (8*16 = 128).
You can insert using:
INSERT INTO Table (FieldBin) VALUES (UNHEX("110E8400E29B11D4A716446655440000"))
and query using:
SELECT HEX(FieldBin) AS FieldBin FROM Table
Now in your programming language, re-insert the dashes at positions 9, 14, 19 and 24 to match your original UUID. If the positions are not always the same, you could store that info in a second field.
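The application-side steps (remove the dashes, convert to 16 raw bytes, later re-insert the dashes) might look like this. The sketch uses Python rather than Ruby purely for illustration, and the function names are hypothetical.

```python
def uuid_to_bytes(text: str) -> bytes:
    """Strip the dashes and turn the 32 hex characters into 16 raw bytes."""
    raw = bytes.fromhex(text.replace("-", ""))
    if len(raw) != 16:
        raise ValueError("a UUID must contain exactly 32 hex digits")
    return raw

def bytes_to_uuid(raw: bytes) -> str:
    """Hex-encode 16 bytes and re-insert dashes in the 8-4-4-4-12 layout."""
    if len(raw) != 16:
        raise ValueError("expected exactly 16 bytes")
    digits = raw.hex().upper()
    groups = (digits[0:8], digits[8:12], digits[12:16], digits[16:20], digits[20:32])
    return "-".join(groups)

original = "110E8400-E29B-11D4-A716-446655440000"
stored = uuid_to_bytes(original)    # what would go into the BINARY(16) column
restored = bytes_to_uuid(stored)    # what the application shows again
```

Passing the raw bytes as a bound parameter is an alternative to calling UNHEX() and HEX() in SQL, assuming the database driver in use supports binary parameters.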
Full example :
CREATE TABLE `test_table` (
`field_binary` BINARY( 16 ) NULL ,
PRIMARY KEY ( `field_binary` )
) ENGINE = INNODB ;
INSERT INTO `test_table` (
`field_binary`
)
VALUES (
UNHEX( '110E8400E29B11D4A716446655440000' )
);
SELECT HEX(field_binary) AS field_binary FROM `test_table`
If you want to use this technique with any hex string, always use length / 2 for the field length. So for SHA-512, the field would be BINARY(64), since a hex-encoded SHA-512 digest is 128 characters long.
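The length / 2 rule follows from two hex characters encoding one byte. A short Python check, with a hypothetical helper name and an arbitrary input string:

```python
import hashlib

def binary_column_length(hex_string: str) -> int:
    """Bytes needed to store a hex string in binary form (two hex chars per byte)."""
    if len(hex_string) % 2 != 0:
        raise ValueError("a hex string must have an even number of characters")
    return len(hex_string) // 2

sha512_hex = hashlib.sha512(b"example").hexdigest()
sha512_length = binary_column_length(sha512_hex)
uuid_length = binary_column_length("110E8400E29B11D4A716446655440000")
```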
I don't think that it's a good idea to use a binary column.
Let's say that you want to query some value:
SELECT HEX(field_binary) AS field_binary FROM `test_table`
If we are returning several values then we are calling the HEX function several times.
However, the main problem is the next one:
SELECT * FROM `test_table`
where field_binary=UNHEX('110E8400E29B11D4A716446655440000')
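Using a function on the column inside the WHERE clause prevents MySQL from using the index. (Here UNHEX() wraps a constant rather than the column, so the index on field_binary should still be usable; the concern applies to conditions such as HEX(field_binary) = '...'.)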
Also
SELECT * FROM `test_table`
where field_binary=x'skdsdfk5rtirfdcv##*#(&##$9'
This could lead to many problems.
I have a MySQL table where I want to get the count of rows where a given VARCHAR column has a numeric value (convertible to a number, you know). Right now, I'm doing a simple REGEXP check on this field. Since this table is very large, I'm using a series of indexes to REGEXP as few rows as possible.
But this VARCHAR column is also indexed. Is there a clever hack of the MySQL indexing algorithm that I can exploit to scan even fewer rows? :-/ This is an InnoDB table.
You may not like this, as you are probably already trying to avoid it, but rather than trying to do some clever trick, when I have had situations like this, I add an additional column that stores the varchar value in a numeric column (updated using a trigger), and query on that.
But, there is a way I can see to do it (though I have never had a reason to do this in production), which is to exploit the fact that indexing will put the values in order, such that all that begin with a number are sequenced together.
Assuming a table like this:
CREATE TABLE `test_1` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`text_or_number` varchar(255),
PRIMARY KEY (`id`),
KEY `test_1_idx` (`text_or_number`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;
You can select only values starting with numbers by exploiting the order of UTF-8 characters - http://en.wikipedia.org/wiki/UTF-8#Examples
The character immediately before "0" in UTF-8 is "/", and the character immediately after "9" is ":", so this should extract only values that start with a digit:
select cast(text_or_number as unsigned)
from test_1
where text_or_number < ':'
and text_or_number > '/'
and cast(text_or_number as unsigned) > 0;
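That could still match values that start with a digit but do not end with one, which is why I added the cast(...) > 0 clause. Be aware, though, that CAST('12abc' AS UNSIGNED) evaluates to 12 (with a warning), so this clause mainly filters out values whose numeric prefix is 0 rather than every partially numeric string. I think MySQL will use the index for the two range conditions first, so hopefully it will only run the cast on the subset of rows that start with a numeric character.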
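The range trick can be tried out in Python, which compares strings by code point. This is only an approximation of MySQL, where the column's collation decides the ordering; the function names are hypothetical.

```python
def starts_in_digit_range(value: str) -> bool:
    """Mirror of: text_or_number > '/' AND text_or_number < ':'."""
    return "/" < value < ":"

def is_all_digits(value: str) -> bool:
    """Stricter check: every character is an ASCII digit."""
    return value.isascii() and value.isdigit()

samples = ["123", "12abc", "abc", "", "007", ":x"]
in_range = [s for s in samples if starts_in_digit_range(s)]
fully_numeric = [s for s in in_range if is_all_digits(s)]
```

The range test narrows the candidates to strings that begin with a digit, and a second, stricter check is still needed to drop values such as "12abc".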