Store UUID v4 in MySQL

I'm generating UUIDs using PHP, per the function found here
Now I want to store that in a MySQL database. What is the best/most efficient MySQL field format for storing UUID v4?
I currently have varchar(256), but I'm pretty sure that's much larger than necessary. I've found lots of almost-answers, but they're generally ambiguous about what form of UUID they're referring to, so I'm asking for the specific format.

Store it as VARCHAR(36) if you're looking for an exact fit, or VARCHAR(255), which works out to the same storage cost anyway. There's no reason to fuss over bytes here.
Remember VARCHAR fields are variable length, so the storage cost is proportional to how much data is actually in them, not how much data could be in them.
Storing it as BINARY is extremely annoying: the values are unprintable and can show up as garbage when running queries. There's rarely a reason to use the literal binary representation. Human-readable values can be copy-pasted and worked with easily.
Some other platforms, like Postgres, have a proper UUID column which stores it internally in a more compact format, but displays it as human-readable, so you get the best of both approaches.
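As a minimal sketch of that approach (table and column names here are just placeholders, not anything from the question):
CREATE TABLE widgets (
    id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    uuid VARCHAR(36) NOT NULL,
    UNIQUE KEY uk_widgets_uuid (uuid)
);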

If you always have a UUID for each row, you could store it as CHAR(36) and save 1 byte per row over VARCHAR(36).
uuid CHAR(36) CHARACTER SET ascii
In contrast to CHAR, VARCHAR values are stored as a 1-byte or 2-byte
length prefix plus data. The length prefix indicates the number of
bytes in the value. A column uses one length byte if values require no
more than 255 bytes, two length bytes if values may require more than
255 bytes.
https://dev.mysql.com/doc/refman/5.7/en/char.html
Though be careful with CHAR: it will always consume the full defined length even if the field is left empty. Also, make sure to use ASCII as the character set, as CHAR would otherwise reserve space for the worst-case scenario (i.e. 3 bytes per character in utf8, 4 in utf8mb4).
[...] MySQL must reserve four bytes for each character in a CHAR
CHARACTER SET utf8mb4 column because that is the maximum possible
length. For example, MySQL must reserve 40 bytes for a CHAR(10)
CHARACTER SET utf8mb4 column.
https://dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8mb4.html

The question is about storing a UUID in MySQL.
Since MySQL 8.0 you can use BINARY(16) with automatic conversion via the UUID_TO_BIN/BIN_TO_UUID functions:
https://mysqlserverteam.com/mysql-8-0-uuid-support/
Be aware that MySQL also has a fast way to generate UUIDs as a primary key:
INSERT INTO t VALUES(UUID_TO_BIN(UUID(), true))
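For completeness, a minimal sketch of the full round trip (the table name t and the column layout are just illustrative):
CREATE TABLE t (
    id BINARY(16) NOT NULL PRIMARY KEY
);
-- the second argument swaps the time-low and time-high parts of the v1 UUID produced by UUID(),
-- so stored values are roughly time-ordered and inserts land near the end of the index
INSERT INTO t VALUES (UUID_TO_BIN(UUID(), true));
-- pass the same flag when converting back, otherwise the reassembled string is scrambled
SELECT BIN_TO_UUID(id, true) FROM t;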

Most efficient is definitely BINARY(16): storing the human-readable characters uses over double the storage space, and means bigger indices and slower lookup. If your data is small enough that storing it as text doesn't hurt performance, you probably don't need UUIDs over boring integer keys. Storing raw is really not as painful as others suggest, because any decent db admin tool will display/dump the octets as hexadecimal rather than literal bytes of "text". You shouldn't need to be looking up UUIDs manually in the db; if you have to, HEX() and x'deadbeef01' literals are your friends. It is trivial to write a function in your app – like the one you referenced – to deal with this for you. You could probably even do it in the database as virtual columns and stored procedures so the app never bothers with the raw data.
I would separate the UUID generation logic from the display logic to ensure that existing data are never changed and errors are detectable:
function guidv4($prettify = false)
{
    static $native = function_exists('random_bytes');
    $data = $native ? random_bytes(16) : openssl_random_pseudo_bytes(16);
    $data[6] = chr(ord($data[6]) & 0x0f | 0x40); // set version to 0100
    $data[8] = chr(ord($data[8]) & 0x3f | 0x80); // set bits 6-7 to 10
    if ($prettify) {
        return guid_pretty($data);
    }
    return $data;
}

function guid_pretty($data)
{
    return strlen($data) == 16 ?
        vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($data), 4)) :
        false;
}

function guid_ugly($data)
{
    $data = preg_replace('/[^[:xdigit:]]+/', '', $data);
    return strlen($data) == 32 ? hex2bin($data) : false;
}
Edit: If you only need the column to be pretty when reading the database, a statement like the following is sufficient:
ALTER TABLE test ADD uuid_pretty CHAR(36) GENERATED ALWAYS AS (CONCAT_WS('-', LEFT(HEX(uuid_ugly), 8), SUBSTR(HEX(uuid_ugly), 9, 4), SUBSTR(HEX(uuid_ugly), 13, 4), SUBSTR(HEX(uuid_ugly), 17, 4), RIGHT(HEX(uuid_ugly), 12))) VIRTUAL;

This works like a charm for me in MySQL 8.0.26
create table t (
    uuid BINARY(16) default (UUID_TO_BIN(UUID()))
);
When querying you may use
select BIN_TO_UUID(uuid) uuid from t;
The result is:
# uuid
'8c45583a-0e1f-11ec-804d-005056219395'

The most space-efficient would be BINARY(16) or two BIGINT UNSIGNED.
The former might give you headaches because manual queries do not (in a straightforward way) give you readable/copyable values.
The latter might give you headaches because of having to map between one value and two columns.
If this is a primary key, I would definitely not waste any space on it, as it becomes part of every secondary index as well. In other words, I would choose one of these types.
For performance, the randomness of random (v4) UUIDs will hurt severely. This applies when the UUID is your primary key or if you do a lot of range queries on it. Your insertions into the primary index will be all over the place rather than all at (or near) the end. Your data loses temporal locality, which can be a helpful property in various cases.
My main improvement would be to use something similar to a UUID v1, which uses a timestamp as part of its data, and ensure that the timestamp is in the highest bits. For example, the UUID might be composed something like this:
Timestamp | Machine Identifier | Counter
This way, we get a locality similar to auto-increment values.
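As a hedged sketch of that idea done directly in MySQL (this is not a standard UUID, the field widths are arbitrary, and it omits the machine identifier and counter the layout above mentions):
-- roughly time-ordered 16-byte identifier: timestamp in the high bits, randomness in the low bits
SELECT UNHEX(CONCAT(
    LPAD(HEX(CAST(UNIX_TIMESTAMP(NOW(6)) * 1000000 AS UNSIGNED)), 14, '0'),  -- ~7 bytes of microsecond timestamp
    HEX(RANDOM_BYTES(9))                                                     -- 9 random bytes
)) AS ordered_id;
The result fits a BINARY(16) column; MySQL 8.0's UUID_TO_BIN(UUID(), 1) achieves a similar effect by moving the v1 UUID timestamp to the most significant bytes.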

This could be useful if you use the BINARY(16) data type:
INSERT INTO table (UUID) VALUES
(UNHEX(REPLACE(UUID(), "-","")))

I just found a nice article going in more depth on these topics: https://www.xaprb.com/blog/2009/02/12/5-ways-to-make-hexadecimal-identifiers-perform-better-on-mysql/
It covers the storage of values, with the same options already expressed in the different answers on this page:
One: watch out for character set
Two: use fixed-length, non-nullable values
Three: Make it BINARY
But also adds some interesting insight about indexes:
Four: use prefix indexes
In many but not all cases, you don’t need to index the full length of
the value. I usually find that the first 8 to 10 characters are
unique. If it’s a secondary index, this is generally good enough. The
beauty of this approach is that you can apply it to existing
applications without any need to modify the column to BINARY or
anything else—it’s an indexing-only change and doesn’t require the
application or the queries to change.
Note that the article doesn't tell you how to create such a "prefix" index. Looking at MySQL documentation for Column Indexes we find:
[...] you can create an index that uses only the first N characters of the
column. Indexing only a prefix of column values in this way can make
the index file much smaller. When you index a BLOB or TEXT column, you
must specify a prefix length for the index. For example:
CREATE TABLE test (blob_col BLOB, INDEX(blob_col(10)));
[...] the prefix length in
CREATE TABLE, ALTER TABLE, and CREATE INDEX statements is interpreted
as number of characters for nonbinary string types (CHAR, VARCHAR,
TEXT) and number of bytes for binary string types (BINARY, VARBINARY,
BLOB).
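Applied to the UUID columns discussed here, such a prefix index might look like this (table and column names are only placeholders):
-- index only the first 8 characters of a hex/text UUID column
ALTER TABLE my_table ADD INDEX idx_uuid_prefix (uuid_col(8));
-- or declared at table-creation time
CREATE TABLE my_table2 (
    uuid_col CHAR(36) CHARACTER SET ascii NOT NULL,
    INDEX idx_uuid_prefix (uuid_col(8))
);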
Five: build hash indexes
What you can do is generate a checksum of the values and index that.
That’s right, a hash-of-a-hash. For most cases, CRC32() works pretty
well (if not, you can use a 64-bit hash function). Create another
column. [...] The CRC column isn’t guaranteed to be unique, so you
need both criteria in the WHERE clause or this technique won’t work.
Hash collisions happen quickly; you will probably get a collision with
about 100k values, which is much sooner than you might think—don’t
assume that a 32-bit hash means you can put 4 billion rows in your
table before you get a collision.
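A minimal sketch of that hash-index technique (the urls example mirrors the linked article; all names are illustrative):
CREATE TABLE urls (
    id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    url     VARCHAR(255) NOT NULL,
    url_crc INT UNSIGNED NOT NULL,
    INDEX idx_url_crc (url_crc)
);
-- keep the CRC in sync from the application, via triggers, or as a generated column
INSERT INTO urls (url, url_crc) VALUES ('http://example.com/', CRC32('http://example.com/'));
-- both conditions are required, because CRC32 values collide
SELECT id FROM urls WHERE url_crc = CRC32('http://example.com/') AND url = 'http://example.com/';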

This is a fairly old post but still relevant and comes up in search results often, so I will add my answer to the mix. Since you already have to use a trigger or your own call to UUID() in your query, here are a pair of functions that I use to keep the UUID as text for easy viewing in the database while reducing the footprint from 36 down to 24 characters (a 33% savings).
delimiter //
DROP FUNCTION IF EXISTS `base64_uuid`//
DROP FUNCTION IF EXISTS `uuid_from_base64`//
CREATE DEFINER='root'@'localhost' FUNCTION base64_uuid() RETURNS varchar(24)
DETERMINISTIC
BEGIN
    /* converting into base 64 is easy: just turn the uuid into binary and base64-encode it */
    return to_base64(unhex(replace(uuid(),'-','')));
END//
CREATE DEFINER='root'@'localhost' FUNCTION uuid_from_base64(base64_uuid varchar(24)) RETURNS varchar(36)
DETERMINISTIC
BEGIN
    /* getting the uuid back from the base 64 version requires a little more work, as we need to put the dashes back */
    set @hex = hex(from_base64(base64_uuid));
    return lower(concat(substring(@hex,1,8),'-',substring(@hex,9,4),'-',substring(@hex,13,4),'-',substring(@hex,17,4),'-',substring(@hex,-12)));
END//
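Usage is then straightforward (a sketch; remember to switch the delimiter back after creating the functions):
delimiter ;
SET @short = base64_uuid();                 -- 24-character base64 form
SELECT @short, uuid_from_base64(@short);    -- round-trips back to the 36-character form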

Related

What are the potential risks of increasing column size in SQL?

Suppose I have a column called ShortDescription in a table called Ticket.
ShortDescription varchar (16) NOT NULL
Now, suppose I increase the size like this -
alter table Ticket modify ShortDescription varchar (32) NOT NULL;
What are the potential risks of doing this? One potential risk is that if some other applications have statically set any of their fields to size 16 based on the previous size of ShortDescription, then those applications may not behave correctly with data of greater size.
SQL is a query language, not a specific DB implementation, so your mileage may vary, but ...
Assuming 'SQL' means a MySQL DB: on the DB side you've nothing to worry about for storage and performance, beyond the fact that if you store a bunch of 32-byte strings you'll use more memory and disk working with them; but if what you actually store in them stays around 16 characters, the conversion to VARCHAR(32) is a wash.
Within MySQL, for VARCHAR there is no impact (assuming you keep the NOT NULL). If the column is used in a composite primary key, you may hit a size limit, but otherwise each VARCHAR entry only takes the size of its data plus 1 byte to store.
If the column is referenced as a foreign key in some other table, you'll need to grow that column to VARCHAR(32) as well, or you may experience truncation of the extra 16 characters if you try to jam a 32-character string into a 16-character column.
If not MySQL, implementations could differ across DB technologies. However, VARCHAR implementations tend to be similar, using just the size of the stored data plus a constant amount to mark the end of the data. Hence you usually have the option between a static CHAR and a dynamic VARCHAR type in many DB systems.
As you noted in your post, external systems relying on static data size have to be considered.
Note: please excuse the above fast-and-loose swapping of the terms byte and character; I'm assuming UTF8 or ASCII. If you're using some multibyte encoding, substitute appropriately.
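For the foreign-key point above, a hedged sketch of keeping a referencing column in step (TicketAudit is a made-up referencing table):
SET foreign_key_checks = 0;  -- temporarily, so the two ALTERs don't trip over the constraint
ALTER TABLE Ticket      MODIFY ShortDescription VARCHAR(32) NOT NULL;
ALTER TABLE TicketAudit MODIFY ShortDescription VARCHAR(32) NOT NULL;  -- hypothetical referencing column
SET foreign_key_checks = 1;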

SHA1 sum as a primary key?

I am going to store filenames and other details in a table where I am planning to use the SHA1 hash of the filename as the PK.
Q1. A SHA1 PK will not be a sequentially increasing/decreasing number. So, will it be more resource-consuming for the database to maintain and search an index on that key if I decide to keep it in the database as a 40-char value?
Q2. I read here:
https://stackoverflow.com/a/614483/986818 about storing the data as a
binary(20) field. Can someone advise me in this regard:
a) Do I have to create this column as: TYPE=integer, LENGTH=20, COLLATION=binary, ATTRIBUTES=binary?
b) How do I convert the SHA1 value in MySQL or Perl to store it in the table?
c) Is there a danger of duplicates for this 20-char value?
UPDATE:
The requirement is to search the table on filename. The user supplies a filename; I search the table and, if the filename is not there, add it. So either I index on a varchar(100) filename field, or I generate a column with the SHA1 of the filename, hoping it would be easier for MySQL to index than a varchar field. I can also search using the SHA1 value from my program against the SHA1 column. What say? Primary key or just an indexed key: I chose PK because DBIx likes using a PK, and a PK or INDEX+UNIQUE would be the same amount of overhead for the system (or so I thought).
OK, then use a very short hash on the filename and accept collisions. Use an integer type for it (that's much faster!). E.g. you can use md5(filename), then take the first 8 characters and convert them to an integer. The SQL could look like this:
CREATE TABLE files (
id INT auto_increment,
hash INT unsigned,
filename VARCHAR(100),
PRIMARY KEY(id),
INDEX(hash)
);
Then you can use:
SELECT id FROM files WHERE hash=<hash> AND filename='<filename>';
The hash is then used for sorting out most other files (normally all other files) and then the filename is for selecting the right entry out of the few hash collisions.
For generating an integer hash key in Perl I suggest using md5() and pack().
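If you want to do the same thing on the MySQL side rather than in Perl, a rough equivalent (the filename is just an example) would be:
-- first 8 hex digits of the MD5, converted to a 32-bit integer
SELECT CONV(LEFT(MD5('some/file/name.txt'), 8), 16, 10) AS hash;
-- CRC32() is a simpler built-in that also yields an unsigned 32-bit value
SELECT CRC32('some/file/name.txt') AS hash;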
If i decide to keep it in database as 40 char value.
Using a character sequence as a key will degrade performance for obvious reasons.
Also, the PK is supposed to be unique. Although it is probably unlikely that you will end up with collisions, theoretically, using such a function to create the PK seems inappropriate.
Additionally, anyone knowing the filename and the hash you use would know all your database ids. I am not sure whether this is something you need to consider.
Q1: Yes, it will need to build up a B-tree of nodes that contain not only 1 integer (4 bytes) but a CHAR(40). Speed would be approximately the same, as long as the index is kept in memory. As the entries are about 10 times bigger, you need 10 times more memory to keep it in memory. BUT: you probably want to look up by the hash anyway, so you'll need to have it either as the primary key OR as an index.
Q2: Just create a Table field like CREATE TABLE test (ID BINARY(40), ...); later you can use INSERT INTO test (ID, ..) VALUES (UNHEX('4D7953514C'), ...);
-- Regarding: Is there a danger of duplicates for this 20 char value?
The chance is 1 in 2^(8*20), i.e. about 1 in 1.46 * 10^48. So a collision is extremely improbable.
There is no reason to use a cryptographically secure hash here. Instead, if you do this, use an ordinary hash. See here: https://softwareengineering.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed
The hash is NOT a 40 char value! It's a 160-bit number, and you should store it that way (as a 20-byte binary field). Edit: I see you mentioned that in comment 2. Yes, you should definitely do that. But I can't tell you how since I don't know what programming language you are using. Edit2: I see it's Perl - sorry, I don't know how to convert it in Perl, but look for "pack" functions.
No, do not create it as type integer. The maximum integer is 128 bits which doesn't hold the entire thing. Although you could really just truncate it to 128 bits without real harm.
It's better to use a simpler hash anyway. You could risk it and ignore collisions, but if you do it properly you kind of have to handle them.
I would stick with the standard auto-incrementing integer for the primary key. If uniqueness of file names is important (which it sounds like it is), then you can add a UNIQUE constraint on the file name itself or some derived, canonical version of the file name. Most languages/frameworks have some sort of method for getting a canonical version of a path (relative to absolute, standardized case, etc).
If you implement my suggestion or pursue your original plan, then you should be aware that multiple strings can map to the same filename/path. Both versions will have different hashes/pass the uniqueness constraint but will actually both refer to the same file. This depends on operating system and may or may not be a problem for you. Just something to keep in mind.
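A minimal sketch of that suggestion (column sizes and names are assumptions):
CREATE TABLE files (
    id       INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    filename VARCHAR(255) NOT NULL,   -- ideally the canonicalized path
    UNIQUE KEY uk_files_filename (filename)
);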

Which is better: storing the hash value or the bigint variable from which the hash value is generated?

I have a table in which a column stores an image src, which is a hash value, and that hash value is generated from microtime(). Now I have two choices: storing the hash value directly in the database, or storing the bigint microtime from which the image name is derived. Which would make my db faster?
We have to analyze this from all sides to assess what speed penalties are incurred.
I will make a few assumptions:
this data will be used as an identifier (primary key, unique key, composite key);
this data is used for searches and joins;
you are using a hashing algorithm such as SHA1 that yields a 40-character string of hex-encoded data (MD5 yields a 32-character string of hex-encoded data; all said below can be adapted to MD5 if that's what you're using);
you may be interested in converting the hex values of the hash into binary to reduce the storage required by half and to improve comparison speed;
Inserting and Updating on the application side:
As @Namphibian stated, this is composed of 2 operations for the BIGINT versus 3 operations for the CHAR.
But the speed difference in my opinion really isn't that big. You can run 10,000,000 continuous calculations (in a while loop) and benchmark them to find out the real difference between them.
Also a speed difference in the application code affects users linearly, while speed differences in the DB affect users nonlinearly when traffic increases because overlapping writes have to wait for each other and some reads have to wait for writes to finish.
Inserting and Updating on the DB side:
Is almost the same for a BIGINT as it is for a CHAR(40) or a BINARY(20) because the more serious time consumption is done waiting for access to the disk rather than actually writing to it.
Selecting and Joining on the DB side:
This is always faster for a BIGINT compared to a CHAR(40) or a BINARY(20) for two reasons:
BIGINT is stored in 8 bytes while CHAR(40) is stored in 40 bytes and BINARY(20) in 20 bytes;
BIGINT's serially increasing nature makes it predictable and easy to compare and sort.
Second best option is the BINARY(20) because it saves some space and it is easier to compare due to reduced length.
Both BINARY(20) and CHAR(40) are the result of the hashing mechanism and are randomized, hence comparing and sorting takes a longer time on average because randomized data in indexes (for a btree index) needs more tree traversals to fetch (I mean that in the context of multiple values, not for one single value).
An important scientific principle may apply here: don't lose the original data. You never know what you may need it for.
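If you do keep the original value, one hedged way to also have the hash, without maintaining it by hand, is a generated column (MySQL 5.7+; all names here are illustrative):
CREATE TABLE images (
    created_us BIGINT UNSIGNED NOT NULL,                        -- original microtime-derived value
    src_hash   BINARY(20) AS (UNHEX(SHA1(created_us))) STORED,  -- derived from the original, never typed by hand
    INDEX idx_src_hash (src_hash)
);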

How to choose optimized datatypes for columns [innodb specific]?

I'm learning about the usage of datatypes for databases.
For example:
Which is better for email? varchar[100], char[100], or tinyint (joking)
Which is better for username? should I use int, bigint, or varchar?
Explain. Some of my friends say that if we use int, bigint, or another numeric datatype it will be better (Facebook does it), like u=123400023 referring to user 123400023 rather than user=thenameoftheuser, since numbers take less time to fetch.
Which is better for phone numbers? Posts (like in blogs or announcements)? Or maybe dates (I use datetime for that)? Maybe some of you have done research that you would like to share.
Product price (I use decimal(11,2), don't know about you guys)?
Or anything else that you have in mind, like, "I use serial datatype for blablabla".
Why do I mention innodb specifically?
Unless you are using the InnoDB table
types (see Chapter 11, "Advanced
MySQL," for more information), CHAR
columns are faster to access than
VARCHAR.
InnoDB has some differences that I don't know about.
I read that from here.
Brief Summary:
(just my opinions)
for email address - VARCHAR(255)
for username - VARCHAR(100) or VARCHAR(255)
for id_username - use INT (unless you plan on over 2 billion users in your system)
phone numbers - INT or VARCHAR or maybe CHAR (depends on if you want to store formatting)
posts - TEXT
dates - DATE or DATETIME (definitely include times for things like posts or emails)
money - DECIMAL(11,2)
misc - see below
As far as using InnoDB because VARCHAR is supposed to be faster, I wouldn't worry about that, or speed in general. Use InnoDB because you need to do transactions and/or you want to use foreign key constraints (FK) for data integrity. Also, InnoDB uses row level locking whereas MyISAM only uses table level locking. Therefore, InnoDB can handle higher levels of concurrency better than MyISAM. Use MyISAM to use full-text indexes and for somewhat less overhead.
More importantly for speed than the engine type: put indexes on the columns that you need to search on quickly. Always put indexes on your ID/PK columns, such as the id_username that I mentioned.
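As a trivial sketch (table and column names assumed):
-- the PRIMARY KEY on id_username is already an index; add one for name lookups
ALTER TABLE users ADD INDEX idx_users_username (username);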
More details:
Here's a bunch of questions about MySQL datatypes and database design (warning, more than you asked for):
What DataType should I pick?
Table design question
Enum datatype versus table of data in MySQL?
mysql datatype for telephne number and address
Best mysql datatype for grams, milligrams, micrograms and kilojoule
MySQL 5-star rating datatype?
And a couple questions on when to use the InnoDB engine:
MyISAM versus InnoDB
When should you choose to use InnoDB in MySQL?
I just use tinyint for almost everything (seriously).
Edit - How to store "posts:"
Below are some links with more details, but here's the short version. For storing "posts," you need room for a long text string. CHAR max length is 255, so that's not an option, and of course CHAR would waste unused characters versus VARCHAR, which is a variable-length CHAR.
Prior to MySQL 5.0.3, the VARCHAR max length was 255, so you'd be left with TEXT. However, in newer versions of MySQL, you can use VARCHAR or TEXT. The choice comes down to preference, but there are a couple of differences. The max length of both VARCHAR and TEXT is now 65,535, but you can set your own max on VARCHAR. Let's say you think your posts will only need 2000 max; you can set VARCHAR(2000). If you ever run into the limit, you can ALTER your table later and bump it to VARCHAR(3000). On the other hand, TEXT actually stores its data in a BLOB (1). I've heard that there may be performance differences between VARCHAR and TEXT, but I haven't seen any proof, so you may want to look into that more, but you can always change that minor detail in the future.
More importantly, searching this "post" column using a Full-Text Index instead of LIKE would be much faster (2). However, you have to use the MyISAM engine to use full-text index because InnoDB doesn't support it. In a MySQL database, you can have a heterogeneous mix of engines for each table, so you would just need to make your "posts" table use MyISAM. However, if you absolutely need "posts" to use InnoDB (for transactions), then set up a trigger to update the MyISAM copy of your "posts" table and use the MyISAM copy for all your full-text searches.
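A rough sketch of that MyISAM-copy-plus-trigger idea (only the insert path is shown; update and delete triggers would be needed too, and all names here are made up):
CREATE TABLE posts (
    id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    body TEXT NOT NULL
) ENGINE=InnoDB;
CREATE TABLE posts_search (
    post_id INT UNSIGNED NOT NULL PRIMARY KEY,
    body    TEXT NOT NULL,
    FULLTEXT KEY ft_body (body)
) ENGINE=MyISAM;
delimiter //
CREATE TRIGGER posts_after_insert AFTER INSERT ON posts
FOR EACH ROW
BEGIN
    INSERT INTO posts_search (post_id, body) VALUES (NEW.id, NEW.body);
END//
delimiter ;
-- full-text searches then go against the MyISAM copy
SELECT post_id FROM posts_search WHERE MATCH(body) AGAINST('some words');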
See bottom for some useful quotes.
MySQL Data Type Chart (outdated)
MySQL Datatypes (outdated)
Chapter 10. Data Types (better details)
The BLOB and TEXT Types (1)
11.9. Full-Text Search Functions (2)
10.4.1. The CHAR and VARCHAR Types (3)
(3) "Values in VARCHAR columns are
variable-length strings. The length
can be specified as a value from 0 to
255 before MySQL 5.0.3, and 0 to
65,535 in 5.0.3 and later versions.
Before MySQL 5.0.3, if you need a data
type for which trailing spaces are not
removed, consider using a BLOB or TEXT
type.
When CHAR values are stored, they are
right-padded with spaces to the
specified length. When CHAR values are
retrieved, trailing spaces are
removed.
Before MySQL 5.0.3, trailing spaces
are removed from values when they are
stored into a VARCHAR column; this
means that the spaces also are absent
from retrieved values."
Lastly, here's a great post about the pros and cons of VARCHAR versus TEXT. It also speaks to the performance issue:
VARCHAR(n) Considered Harmful
There are multiple angles to approach your question.
From a design POV it is always best to choose the datatype which best expresses the quantity you want to model. That is, get the data domain and data size right so that illegal data cannot be stored in the database in the first place. But that is not where MySQL is strong, especially not with the default sql_mode (http://dev.mysql.com/doc/refman/5.1/en/server-sql-mode.html). If it works for you, try the TRADITIONAL sql_mode, which is a shorthand for many desirable flags.
From a performance POV, the question is entirely different. For example, regarding the storage of email bodies, you might want to read http://www.mysqlperformanceblog.com/2010/02/09/blob-storage-in-innodb/ and then think about that.
Removing redundancies and having short keys can be a big win. For example, in a project that I have seen, a log table has been storing http User-Agent information. By simply replacing each user agent string in the log table with a numeric id of a user agent string in a lookup table, data set size was considerably (more than 60%) reduced. By parsing the user agent further and then storing a bunch of ids (operating system, browser type, version index) data set size was reduced to 1% of the original size.
Finally, there is a number of rules that can help you spot errors in schema design.
For example, anything that has id in the name and is not an unsigned integer type is probably a bug (especially in the context of innodb).
For example, anything that has price or cost in the name and is not unsigned is a potential source of fraud (fraudster creates article with negative price, and buys that).
For example, anything that works on monetary data and is not using the DECIMAL data type of the appropriate size is probably doing math wrong (DECIMAL is doing BCD, decimal paper math with correct precision and rounding, DOUBLE and FLOAT do not).
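A quick way to see the DECIMAL versus floating-point difference in MySQL itself:
SELECT 0.1e0 + 0.2e0;                                            -- DOUBLE math: 0.30000000000000004
SELECT CAST(0.1 AS DECIMAL(10,2)) + CAST(0.2 AS DECIMAL(10,2));  -- DECIMAL math: 0.30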
SQLyog has a "Calculate optimal datatype" feature which helps in finding the optimal datatype based on the records inserted in a table.
It uses the query
SELECT * FROM `table_name` PROCEDURE ANALYSE(1, 10);
to find the optimal datatype.

MySQL binary against non-binary for hash IDs

Assuming that I want to use a hash as an ID instead of a numeric one. Would it be a performance advantage to store them as BINARY over non-binary?
CREATE TABLE `test`.`foobar` (
`id` CHAR(32) BINARY CHARACTER SET ascii COLLATE ascii_bin NOT NULL,
PRIMARY KEY (`id`)
)
CHARACTER SET ascii;
Yes. Often a hash digest is stored as the ASCII representation of hex digits, for example MD5 of the word 'hash' is:
0800fc577294c34e0b28ad2839435945
This is a 32-character ASCII string.
But MD5 really produces a 128-bit binary hash value. This should require only 16 bytes to be stored as binary values instead of hex digits. So you can gain some space efficiency by using binary strings.
CREATE TABLE test.foobar (
id BINARY(16) NOT NULL PRIMARY KEY
);
INSERT INTO test.foobar (id) VALUES (UNHEX(MD5('hash')));
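When you do need to look at such values by hand, HEX() and UNHEX() keep the round trip readable:
SELECT HEX(id) FROM test.foobar;                                 -- shows 0800FC577294C34E0B28AD2839435945
SELECT * FROM test.foobar WHERE id = UNHEX('0800fc577294c34e0b28ad2839435945');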
Re. your comments that you are more concerned about performance than space efficiency:
I don't know of any reason that the BINARY data type would be speedier than CHAR.
Being half as large can be an advantage for performance if you use cache buffers effectively. That is, a given amount of cache memory can store twice as many rows worth of BINARY data if the string is half the size of the CHAR needed to store the same value in hex. Likewise the cache memory for the index on that column can store twice as much.
The result is a more effective cache, because a random query has a greater chance of hitting the cached data or index, instead of requiring a disk access. Cache efficiency is important for most database applications, because usually the bottleneck is disk I/O. If you can use cache memory to reduce frequency of disk I/O, it's a much bigger bang for the buck than the choice between one data type or another.
As for the difference between a hash string stored in BINARY versus a BIGINT, I would choose BIGINT. The cache efficiency will be even greater, and also on 64-bit processors integer arithmetic and comparisons should be very fast.
I don't have measurements to support the claims above. The net benefit of choosing one data type over another depends a lot on data patterns and types of queries in your database and application. To get the most precise answer, you must try both solutions and measure the difference.
Re. your supposition that binary string comparison is quicker than default case-insensitive string comparison, I tried the following test:
mysql> SELECT BENCHMARK(100000000, 'foo' = 'FOO');
1 row in set (5.13 sec)
mysql> SELECT BENCHMARK(100000000, 'foo' = BINARY 'FOO');
1 row in set (4.23 sec)
So binary string comparison is 17.5% faster than case-insensitive string comparison. But notice that after evaluating this expression 100 million times, the total difference is still less than 1 second. While we can measure the relative difference in speed, the absolute difference in speed is really insignificant.
So I'll reiterate:
Measure, don't guess or suppose. Your educated guesses will be wrong a lot of the time. Measure before and after every change you make, so you know how much it helped.
Invest your time and attention where you get the greatest bang for the buck.
Don't sweat the small stuff. Of course, a tiny difference adds up with enough iterations, but given those iterations, a performance improvement with greater absolute benefit is still preferable.
From the manual:
The BINARY and VARBINARY types are similar to CHAR and VARCHAR, except
that they contain binary strings rather than non-binary strings. That is,
they contain byte strings rather than character strings. This means that
they have no character set, and sorting and comparison are based on the
numeric values of the bytes in the values.
Since CHAR(32) BINARY causes a BINARY(32) column to be created under the hood, the benefit is that it will take less time to sort by that column, and probably less time to find corresponding rows if the column is indexed.