I was taking a look at this question regarding the field length and type to use for bcrypt hashes. Several the answers mention using the BINARY MYSQL column type. However, when reading from this column with the mysql node.js module, it reads BINARY columns into a buffer type rather than a string. The bcrypt compare function bcrypt.compare(password, hash, callback) does not like the buffer type:
Error: data and hash must be strings
at node_modules/bcrypt/bcrypt.js:150:16
This leads me to two questions:
First, I assume that what I want to do is hash_buffer.toString(), but I notice in the documentation that there are different character encodings that can be used. I'm not sure what the correct encoding to use is since the data doesn't really represent actual characters. Since I want the binary data to remain unchanged, I would guess ASCII. Can anyone confirm this?
Second, I don't understand why not to use the CHAR data type. The hash is specifically made to be a printable string. I understand that the MYSQL comparisons might not be made as expected, but there is no appropriate time to search for or sort by a password hash anyways.
Generally speaking, it makes sense to use BINARY columns to store bcrypt hashes if MySQL is the one doing the comparison. A binary collation will prevent unwanted comparison results, e.g. 'A' being equal to 'a'; this makes a big difference in Base64 encoding.
However, if the comparison is exclusively performed in the application, you can save yourself the trouble and use a regular CHAR column for storing the hash.
Related
I have a Tableau data source (original source is MariaDB) containing a number of email addresses. When doing some quality checks on the data source I wanted to identify a variety of possible data entry issues. One of these is whether the string is stored in all lower case.
While emails are not case sensitive, the data entry should ensure they are stored in a standardised way and not just as free text. But historic data has not imposed such checks. So I wanted to identify where emails have upper-case characters which is one (small) signal that the data entry has not been careful.
But with Tableau 2020.2 this seems hard to do (either directly from the source table or as extracts). For example testing whether lower([Email])=[Email] simply returns true for all emails regardless of capitalisation.
Is there any easy way to force case sensitivity in comparisons inside Tableau or to check for upper-case characters in a string?
NB: this question is about Tableau not about solving the problem in simple DB queries. I use Tableau connected to a variety of data sources and would prefer a generic solution inside Tableau so details of DB setups are irrelevant.
REGEX_MATCH([email],'[A-Z]') should be case-sensitive regardless of the data source, since by definition the pattern [A-Z] matches only uppercase letters.
Is there any easy way to force case sensitivity in comparisons or to check for upper-case characters in a string?
Basically, you want binary. This conditions evaluates as true if the string has any upper case character:
where lower(email) <> binary email
You could also use a regex:
where email rlike binary '[A-Z]'
Your question indicates that you are storing your data in case-insensitive character set / collation. If the case is meaningful, then you should use case-sensitive settings, which would remove the need for binary, hence making your queries more straight-forward, less error-prone, and potentially more efficient.
Which datatype can I use to store really big integer in SQL. I am using phpmyAdmin to view data and java program for storing and retrieving values. Actually I am working with Bilinear Maps which uses random numbers generated from Zp where p is very large prime number and then "raised to" operations on those number.
I want to store some numbers in database like public keys. What data type can I use for table columns in SQL for such values?
You could store them as strings of decimal digits using type CHARACTER. While this does waste some space, an advantage is that the database will be easier for humans to understand.
You could store them as raw binary big-endian values using type BLOB. This is the most efficient for software to access and takes up the least space. However, humans will not be able to easily query the database for these values or understand them in dumps.
Personally, I would opt for the blob unless there's a real need for the database to be understandable by humans using standard query tools. If you can't get around needing to administer the database with tools that don't understand your data format, then just use decimal values in text.
For MySQL, VARCHAR(300) CHARACTER SET ascii.
VAR, assuming the numbers won't always be exactly 300.
CHAR -- no big advantage in BLOB.
ascii -- no need for utf8 involvement.
DECIMAL won't work because there is a 64-digit limit.
The space taken will be 2+length bytes (302 in your example), where the 2 is for length for VAR.
I have a piece of information which is encoded using aes-256-cbc encryption. How should I store it in the database? Currently I'm using VARCHAR(255) utf8_bin. Is this OK or should I use other field type like VARBINARY(255)? Is there a possibility of losing some data using VARCHAR in this case? Thanks.
The possible (in)appropriateness of storing encrypted (as opposed to hashed) passwords in a database notwithstanding, AES ciphertext is binary data, and therefore should be stored as such, i.e. in a BINARY / VARBINARY column or a BLOB.
It's also possible to encode the ciphertext e.g. as base64, and then store it in a text (i.e. CHAR / VARCHAR / TEXT) column. This is less space-efficient, but it may sometimes be more convenient, e.g. when inspecting the data visually or passing it between programs that may have trouble dealing with fields containing arbitrary binary data.
I'm not sure how password hashing works (will be implementing it later), but need to create database schema now.
I'm thinking of limiting passwords to 4-20 characters, but as I understand after encrypting hash string will be of different length.
So, how to store these passwords in the database?
Update: Simply using a hash function is not strong enough for storing passwords. You should read the answer from Gilles on this thread for a more detailed explanation.
For passwords, use a key-strengthening hash algorithm like Bcrypt or Argon2i. For example, in PHP, use the password_hash() function, which uses Bcrypt by default.
$hash = password_hash("rasmuslerdorf", PASSWORD_DEFAULT);
The result is a 60-character string similar to the following (but the digits will vary, because it generates a unique salt).
$2y$10$.vGA1O9wmRjrwAVXD98HNOgsNpDczlqm3Jq7KnEd1rVAGv3Fykk1a
Use the SQL data type CHAR(60) to store this encoding of a Bcrypt hash. Note this function doesn't encode as a string of hexadecimal digits, so we can't as easily unhex it to store in binary.
Other hash functions still have uses, but not for storing passwords, so I'll keep the original answer below, written in 2008.
It depends on the hashing algorithm you use. Hashing always produces a result of the same length, regardless of the input. It is typical to represent the binary hash result in text, as a series of hexadecimal digits. Or you can use the UNHEX() function to reduce a string of hex digits by half.
MD5 generates a 128-bit hash value. You can use CHAR(32) or BINARY(16)
SHA-1 generates a 160-bit hash value. You can use CHAR(40) or BINARY(20)
SHA-224 generates a 224-bit hash value. You can use CHAR(56) or BINARY(28)
SHA-256 generates a 256-bit hash value. You can use CHAR(64) or BINARY(32)
SHA-384 generates a 384-bit hash value. You can use CHAR(96) or BINARY(48)
SHA-512 generates a 512-bit hash value. You can use CHAR(128) or BINARY(64)
BCrypt generates an implementation-dependent 448-bit hash value. You might need CHAR(56), CHAR(60), CHAR(76), BINARY(56) or BINARY(60)
As of 2015, NIST recommends using SHA-256 or higher for any applications of hash functions requiring interoperability. But NIST does not recommend using these simple hash functions for storing passwords securely.
Lesser hashing algorithms have their uses (like internal to an application, not for interchange), but they are known to be crackable.
Always use a password hashing algorithm: Argon2, scrypt, bcrypt or PBKDF2.
Argon2 won the 2015 password hashing competition. Scrypt, bcrypt and PBKDF2 are older algorithms that are considered less preferred now, but still fundamentally sound, so if your platform doesn't support Argon2 yet, it's ok to use another algorithm for now.
Never store a password directly in a database. Don't encrypt it, either: otherwise, if your site gets breached, the attacker gets the decryption key and so can obtain all passwords. Passwords MUST be hashed.
A password hash has different properties from a hash table hash or a cryptographic hash. Never use an ordinary cryptographic hash such as MD5, SHA-256 or SHA-512 on a password. A password hashing algorithm uses a salt, which is unique (not used for any other user or in anybody else's database). The salt is necessary so that attackers can't just pre-calculate the hashes of common passwords: with a salt, they have to restart the calculation for every account. A password hashing algorithm is intrinsically slow — as slow as you can afford. Slowness hurts the attacker a lot more than you because the attacker has to try many different passwords. For more information, see How to securely hash passwords.
A password hash encodes four pieces of information:
An indicator of which algorithm is used. This is necessary for agility: cryptographic recommendations change over time. You need to be able to transition to a new algorithm.
A difficulty or hardness indicator. The higher this value, the more computation is needed to calculate the hash. This should be a constant or a global configuration value in the password change function, but it should increase over time as computers get faster, so you need to remember the value for each account. Some algorithms have a single numerical value, others have more parameters there (for example to tune CPU usage and RAM usage separately).
The salt. Since the salt must be globally unique, it has to be stored for each account. The salt should be generated randomly on each password change.
The hash proper, i.e. the output of the mathematical calculation in the hashing algorithm.
Many libraries include a pair functions that conveniently packages this information as a single string: one that takes the algorithm indicator, the hardness indicator and the password, generates a random salt and returns the full hash string; and one that takes a password and the full hash string as input and returns a boolean indicating whether the password was correct. There's no universal standard, but a common encoding is
$algorithm$parameters$salt$output
where algorithm is a number or a short alphanumeric string encoding the choice of algorithm, parameters is a printable string, and salt and output are encoded in Base64 without terminating =.
16 bytes are enough for the salt and the output. (See e.g. recommendations for Argon2.) Encoded in Base64, that's 21 characters each. The other two parts depend on the algorithm and parameters, but 20–40 characters are typical. That's a total of about 82 ASCII characters (CHAR(82), and no need for Unicode), to which you should add a safety margin if you think it's going to be difficult to enlarge the field later.
If you encode the hash in a binary format, you can get it down to 1 byte for the algorithm, 1–4 bytes for the hardness (if you hard-code some of the parameters), and 16 bytes each for the salt and output, for a total of 37 bytes. Say 40 bytes (BINARY(40)) to have at least a couple of spare bytes. Note that these are 8-bit bytes, not printable characters, in particular the field can include null bytes.
Note that the length of the hash is completely unrelated to the length of the password.
You can actually use CHAR(length of hash) to define your datatype for MySQL because each hashing algorithm will always evaluate out to the same number of characters. For example, SHA1 always returns a 40-character hexadecimal number.
You might find this Wikipedia article on salting worthwhile. The idea is to add a set bit of data to randomize your hash value; this will protect your passwords from dictionary attacks if someone gets unauthorized access to the password hashes.
You should use TEXT (storing unlimited number of characters) for the sake of forward compatibility. Hashing algorithms (need to) become stronger over time and thus this database field will need to support more characters over time. Additionally depending on your migration strategy you may need to store new and old hashes in the same field, so fixing the length to one type of hash is not recommended.
As a fixed length string (VARCHAR(n) or however MySQL calls it).
A hash has always a fixed length of for example 12 characters (depending on the hash algorithm you use). So a 20 char password would be reduced to a 12 char hash, and a 4 char password would also yield a 12 char hash.
Hashes are a sequence of bits (128 bits, 160 bits, 256 bits, etc., depending on the algorithm). Your column should be binary-typed, not text/character-typed, if MySQL allows it (SQL Server datatype is binary(n) or varbinary(n)). You should also salt the hashes. Salts may be text or binary, and you will need a corresponding column.
It really depends on the hashing algorithm you're using. The length of the password has little to do with the length of the hash, if I remember correctly. Look up the specs on the hashing algorithm you are using, run a few tests, and truncate just above that.
I've always tested to find the MAX string length of an encrypted string and set that as the character length of a VARCHAR type. Depending on how many records you're going to have, it could really help the database size.
for md5 vARCHAR(32) is appropriate. For those using AES better to use varbinary.
I'm not sure how password hashing works (will be implementing it later), but need to create database schema now.
I'm thinking of limiting passwords to 4-20 characters, but as I understand after encrypting hash string will be of different length.
So, how to store these passwords in the database?
Update: Simply using a hash function is not strong enough for storing passwords. You should read the answer from Gilles on this thread for a more detailed explanation.
For passwords, use a key-strengthening hash algorithm like Bcrypt or Argon2i. For example, in PHP, use the password_hash() function, which uses Bcrypt by default.
$hash = password_hash("rasmuslerdorf", PASSWORD_DEFAULT);
The result is a 60-character string similar to the following (but the digits will vary, because it generates a unique salt).
$2y$10$.vGA1O9wmRjrwAVXD98HNOgsNpDczlqm3Jq7KnEd1rVAGv3Fykk1a
Use the SQL data type CHAR(60) to store this encoding of a Bcrypt hash. Note this function doesn't encode as a string of hexadecimal digits, so we can't as easily unhex it to store in binary.
Other hash functions still have uses, but not for storing passwords, so I'll keep the original answer below, written in 2008.
It depends on the hashing algorithm you use. Hashing always produces a result of the same length, regardless of the input. It is typical to represent the binary hash result in text, as a series of hexadecimal digits. Or you can use the UNHEX() function to reduce a string of hex digits by half.
MD5 generates a 128-bit hash value. You can use CHAR(32) or BINARY(16)
SHA-1 generates a 160-bit hash value. You can use CHAR(40) or BINARY(20)
SHA-224 generates a 224-bit hash value. You can use CHAR(56) or BINARY(28)
SHA-256 generates a 256-bit hash value. You can use CHAR(64) or BINARY(32)
SHA-384 generates a 384-bit hash value. You can use CHAR(96) or BINARY(48)
SHA-512 generates a 512-bit hash value. You can use CHAR(128) or BINARY(64)
BCrypt generates an implementation-dependent 448-bit hash value. You might need CHAR(56), CHAR(60), CHAR(76), BINARY(56) or BINARY(60)
As of 2015, NIST recommends using SHA-256 or higher for any applications of hash functions requiring interoperability. But NIST does not recommend using these simple hash functions for storing passwords securely.
Lesser hashing algorithms have their uses (like internal to an application, not for interchange), but they are known to be crackable.
Always use a password hashing algorithm: Argon2, scrypt, bcrypt or PBKDF2.
Argon2 won the 2015 password hashing competition. Scrypt, bcrypt and PBKDF2 are older algorithms that are considered less preferred now, but still fundamentally sound, so if your platform doesn't support Argon2 yet, it's ok to use another algorithm for now.
Never store a password directly in a database. Don't encrypt it, either: otherwise, if your site gets breached, the attacker gets the decryption key and so can obtain all passwords. Passwords MUST be hashed.
A password hash has different properties from a hash table hash or a cryptographic hash. Never use an ordinary cryptographic hash such as MD5, SHA-256 or SHA-512 on a password. A password hashing algorithm uses a salt, which is unique (not used for any other user or in anybody else's database). The salt is necessary so that attackers can't just pre-calculate the hashes of common passwords: with a salt, they have to restart the calculation for every account. A password hashing algorithm is intrinsically slow — as slow as you can afford. Slowness hurts the attacker a lot more than you because the attacker has to try many different passwords. For more information, see How to securely hash passwords.
A password hash encodes four pieces of information:
An indicator of which algorithm is used. This is necessary for agility: cryptographic recommendations change over time. You need to be able to transition to a new algorithm.
A difficulty or hardness indicator. The higher this value, the more computation is needed to calculate the hash. This should be a constant or a global configuration value in the password change function, but it should increase over time as computers get faster, so you need to remember the value for each account. Some algorithms have a single numerical value, others have more parameters there (for example to tune CPU usage and RAM usage separately).
The salt. Since the salt must be globally unique, it has to be stored for each account. The salt should be generated randomly on each password change.
The hash proper, i.e. the output of the mathematical calculation in the hashing algorithm.
Many libraries include a pair functions that conveniently packages this information as a single string: one that takes the algorithm indicator, the hardness indicator and the password, generates a random salt and returns the full hash string; and one that takes a password and the full hash string as input and returns a boolean indicating whether the password was correct. There's no universal standard, but a common encoding is
$algorithm$parameters$salt$output
where algorithm is a number or a short alphanumeric string encoding the choice of algorithm, parameters is a printable string, and salt and output are encoded in Base64 without terminating =.
16 bytes are enough for the salt and the output. (See e.g. recommendations for Argon2.) Encoded in Base64, that's 21 characters each. The other two parts depend on the algorithm and parameters, but 20–40 characters are typical. That's a total of about 82 ASCII characters (CHAR(82), and no need for Unicode), to which you should add a safety margin if you think it's going to be difficult to enlarge the field later.
If you encode the hash in a binary format, you can get it down to 1 byte for the algorithm, 1–4 bytes for the hardness (if you hard-code some of the parameters), and 16 bytes each for the salt and output, for a total of 37 bytes. Say 40 bytes (BINARY(40)) to have at least a couple of spare bytes. Note that these are 8-bit bytes, not printable characters, in particular the field can include null bytes.
Note that the length of the hash is completely unrelated to the length of the password.
You can actually use CHAR(length of hash) to define your datatype for MySQL because each hashing algorithm will always evaluate out to the same number of characters. For example, SHA1 always returns a 40-character hexadecimal number.
You might find this Wikipedia article on salting worthwhile. The idea is to add a set bit of data to randomize your hash value; this will protect your passwords from dictionary attacks if someone gets unauthorized access to the password hashes.
You should use TEXT (storing unlimited number of characters) for the sake of forward compatibility. Hashing algorithms (need to) become stronger over time and thus this database field will need to support more characters over time. Additionally depending on your migration strategy you may need to store new and old hashes in the same field, so fixing the length to one type of hash is not recommended.
As a fixed length string (VARCHAR(n) or however MySQL calls it).
A hash has always a fixed length of for example 12 characters (depending on the hash algorithm you use). So a 20 char password would be reduced to a 12 char hash, and a 4 char password would also yield a 12 char hash.
Hashes are a sequence of bits (128 bits, 160 bits, 256 bits, etc., depending on the algorithm). Your column should be binary-typed, not text/character-typed, if MySQL allows it (SQL Server datatype is binary(n) or varbinary(n)). You should also salt the hashes. Salts may be text or binary, and you will need a corresponding column.
It really depends on the hashing algorithm you're using. The length of the password has little to do with the length of the hash, if I remember correctly. Look up the specs on the hashing algorithm you are using, run a few tests, and truncate just above that.
I've always tested to find the MAX string length of an encrypted string and set that as the character length of a VARCHAR type. Depending on how many records you're going to have, it could really help the database size.
for md5 vARCHAR(32) is appropriate. For those using AES better to use varbinary.