The issue is to create an index on the table 'visits_visit' (Django visits app), because every query takes at least 60 ms and is only going to get worse.
CREATE INDEX resource ON visits_visit (object_app(200), object_model(200), object_id(200));
It returns:
ERROR 1071 (42000): Specified key was too long; max key length is 1000 bytes
What can I do? The structure of the table is shown in the screenshot.
See the reference in the comments under your question to a possible duplicate that has already been answered, arguably a canonical duplicate target if this question does get closed. That said, there is not much in that reference about storage engines or character sets.
In your case the character set factors in with the use of string-type columns in your composite index.
A side note is certainly performance. Don't expect much from what you are attempting: your index is very wide and may well not even be used as intended. Indexes and their benefit need careful scrutiny, which can be done with MySQL's EXPLAIN. See the article below, in particular the General Comments section.
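As a sketch of that scrutiny, you can run EXPLAIN against a representative query to see whether the composite index would be chosen at all (the column values here are hypothetical):

```sql
-- Check whether the optimizer would actually use the composite index
EXPLAIN SELECT *
FROM visits_visit
WHERE object_app   = 'blog'
  AND object_model = 'post'
  AND object_id    = '42';
```

If the `key` column of the output does not show your index, the width and column order need rethinking before worrying about the 1071 error.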
Please see the article Using Innodb_large_prefix to Avoid ERROR 1071; below is an excerpt.
The character limit depends on the character set you use. For example
if you use latin1 then the largest column you can index is
varchar(767), but if you use utf8 then the limit is varchar(255).
There is also a separate 3072 byte limit per index. The 767 byte limit
is per column, so you can include multiple columns (each 767 bytes or
smaller) up to 3072 total bytes per index, but no column longer than
767 bytes. (MyISAM is a little different. It has a 1000 byte index
length limit, but no separate column length limit within that). One
workaround for these limits is to only index a prefix of the longer
columns, but what if you want to index more than 767 bytes of a column
in InnoDB? In that case you should consider using innodb_large_prefix,
which was introduced in MySQL 5.5.14 and allows you to include columns
up to 3072 bytes long in InnoDB indexes. It does not affect the index
limit, which is still 3072 bytes.
Also see the Min and Max section from the Mysql Manual Page Limits on InnoDB Tables
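Applying the prefix workaround from the excerpt to the original statement might look like this (the prefix length of 50 is an arbitrary choice; tune it to your data):

```sql
-- 50 chars x 3 bytes/char (utf8) x 3 columns = 450 bytes,
-- safely under both the 1000-byte MyISAM and 767-byte InnoDB limits
CREATE INDEX resource ON visits_visit (object_app(50), object_model(50), object_id(50));
```

A prefix index is only useful if the leading characters are selective, so check the data before picking the length.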
The 'right' answer is to shorten the fields and/or normalize them.
Do you really have 200-character-long apps, models, etc? If not, shorten the fields.
Is model repeated in the table a lot? If so, normalize it into its own table and replace the column with the id that results.
You seem to be using MyISAM; you could (and should) switch to InnoDB. That will change the error message, or it might make it go away entirely.
Are you using utf8 characters? Are you doing everything in English? Changing the CHARACTER SET could make 200 characters mean 200 bytes, not 600 (utf8) or 800 (utf8mb4).
Changing the character set for ip_address would shrink its footprint from 15 * (bytes per character) down to 15 bytes, as would changing from CHAR to VARCHAR. Note also that 15 characters is insufficient to handle IPv6.
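The normalization suggestion above can be sketched as follows (table and column names are hypothetical):

```sql
-- Lookup table for the repeated model names
CREATE TABLE models (
  model_id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
  name     VARCHAR(100) NOT NULL,
  PRIMARY KEY (model_id),
  UNIQUE KEY (name)
) ENGINE=InnoDB;

-- visits_visit then stores a 2-byte id instead of a 200-char string
ALTER TABLE visits_visit ADD COLUMN model_id SMALLINT UNSIGNED NOT NULL;
```

An index on `model_id` is then tiny compared with a prefix index on a 200-character utf8 column.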
Related
My main issue is that after we expanded CHAR(50) to CHAR(64), we started receiving timeouts on internal backup queries. The record size is a few KB and the database is very big, so this column, which is a primary key, must be the cause of our trouble.
I searched the internet, but I found only advice on selecting key types and comparisons of CHAR vs VARCHAR: nothing about the optimal size.
For example, is there some special optimization in MySQL whereby indexes smaller than, say, 60 bytes use some form of caching, while larger ones start swapping?
Any help would be appreciated, even answers suggesting there is no difference and that the time spent on the join simply increased in proportion to the index size.
EDIT
THIS IS NOT THE ANSWER TO THE QUESTION, but I have found the reason our change caused a HUGE performance hit.
We expanded column using
ALTER TABLE table MODIFY job_id CHAR(64);
This caused the CHARACTER SET to fall back to the default (utf8mb4), dropping the previous latin1.
That concludes my research, but I will leave this question open for anyone able to explain the impact of resizing a key column.
This question is also looking for suggestions on a type change.
Thank you all for your time and input!
Short answer: There is no caching/swapping/optimal/etc size.
Long answer:
Don't use CHAR unless the data for that column really is fixed length -- such as country_code, postal_code, UUID, SSN, etc. Furthermore, use the minimal charset needed, such as ascii for those. CHAR wastes space by padding with spaces.
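As a minimal sketch of that advice (the table and columns are hypothetical):

```sql
-- Fixed-length codes suit CHAR with a minimal charset; 2 ascii bytes per row
CREATE TABLE countries (
  country_code CHAR(2) CHARACTER SET ascii NOT NULL,
  name         VARCHAR(100) NOT NULL,
  PRIMARY KEY (country_code)
) ENGINE=InnoDB;
```

Had `country_code` been CHAR(2) in utf8mb4, every value would reserve 8 bytes instead of 2, in the row and in every index that includes it.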
There is no such cutoff. There is no inherent problem in having a long PRIMARY KEY except that ...
Every secondary key has a copy of the PK. (This says the break-even is at 2 indexes; for more than 2, the extra bulk in secondary keys adds up for long PKs.)
Columns in other tables that need to JOIN to this PK (with or without a FOREIGN KEY declared) will be bulkier than an INT.
Many users (or third-party software that generates SQL) blindly use 8-byte BIGINT for ids. Even the 4-byte INT is usually overkill; see the smaller INT types.
Indexes are limited, but many things factor in:
767 / 1000 / 3072 bytes depending on engine and version
Character set of char/varchar: CHAR/VARCHAR(50) may take 50 / 100 / 150 / 200 bytes, depending on charset.
InnoDB's buffer pool, whose size is capped by innodb_buffer_pool_size (which should be set to something like 70% of RAM), provides the caching. This implies that the bigger a table or index is, the more I/O is likely to be done.
Bottom line: Your timeouts are coming from other things. Consider increasing the timeout.
Also
When doing ALTER TABLE ... MODIFY COLUMN ..., you must specify all the characteristics of the column, specifically including the ones you are not changing. These include CHARACTER SET, COLLATION, [NOT] NULL, DEFAULT, etc.
I like to do SHOW CREATE TABLE to get the current definition for the columns, then copy the one I want to change into my fresh ALTER, modifying the one thing I am changing.
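That workflow can be sketched as follows (table and column names are hypothetical; the charset and collation must be copied from your actual SHOW CREATE TABLE output):

```sql
-- 1. Inspect the current definition; note the CHARACTER SET, COLLATION,
--    [NOT] NULL and DEFAULT attributes of the column you want to widen
SHOW CREATE TABLE jobs;

-- 2. Repeat every attribute you are NOT changing, so nothing
--    falls back to the server defaults
ALTER TABLE jobs
  MODIFY job_id CHAR(64) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL;
```

Leaving out `CHARACTER SET latin1` here is exactly what silently converted the column to utf8mb4 in the question's EDIT.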
We have a table where we store tokens for users (i.e. access tokens).
The problem is that tokens can sometimes be longer than 255 characters, and MySQL/MariaDB cannot store them in a table that has a unique index on that column.
We need unique indexes, so one solution is to add an additional column holding a hash of the token (at most 255 characters long) and put the unique index on it. Any search/save goes through this hash; after a match, we select the whole token and send it back. After a lot of thinking and googling, this is probably the only viable solution for this use case (but feel free to suggest another).
Every single token we generate is at least partially random, so a slight chance of hash collision is acceptable: the user is not stuck forever, and the next request should pass.
Do you know any good modern method in 2017? Having some statistical data about hash collision for this method would be appreciated.
The hash is only for internal use; we don't need it to be secure (a fast, insecure hash is best for us). It should be long enough to have a low chance of collision, but must never exceed the 255-character limit.
PS: Setting up special version of database/table that allows more length is not viable, we need it also in some older system without migration.
Are these access tokens representable with 8-bit characters? That is, are all the characters in them taken from the ASCII or iso-8859-1 character sets?
If so, you can get a longer unique index than 255 by declaring the access-token column with COLLATE latin1_bin. The limit of an index prefix is 767 bytes, but utf8 characters in VARCHAR columns take 3 bytes per character.
So a column with 767 unique latin1 characters should be uniquely indexable. That may solve your problem if your unique hashes all fit in about 750 bytes.
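A sketch of that approach, assuming the tokens really are pure ASCII (table and column names are hypothetical):

```sql
-- latin1 = 1 byte per character, so 750 characters fit under the
-- 767-byte index prefix limit
ALTER TABLE user_tokens
  MODIFY access_token VARCHAR(750) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
  ADD UNIQUE KEY uq_access_token (access_token);
```

The binary collation also makes the uniqueness check case-sensitive, which is usually what you want for tokens.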
If not ...
You've asked for a hash function for your long tokens with a "low" risk of collision. SHA1 is pretty good, and is available as a function in MySQL. SHA512 is even better, but doesn't work in all MySQL servers. But the question is this: What is the collision risk of taking the first, or last, 250 characters of your long tokens and using them as a hash?
Why do I ask? Because your spec calls for a unique index on a column that's too long for a MySQL unique index. You're proposing to solve that problem by using a hash function that is also not guaranteed to be unique. That gives you two choices, both of which require you to live with a small collision probability.
Add a hash column that's computed by SHA2('token', 512) and live with the tiny probability of collision.
Add a hash column that's computed by LEFT('token', 255) and live with the tiny probability of collision.
You can implement the second choice simply by removing the unique constraint on your index on the token column. (In other words, by doing very little.)
The SHA hash family has well-known collision characteristics. Evaluating some other hash function would require knowing the collision characteristics of your long tokens, and you haven't told us those.
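Both choices can be sketched in DDL (table and column names are hypothetical; the hash column would be maintained by the application or a trigger):

```sql
-- Choice 1: SHA-512 hash column; the hex digest is exactly 128 chars
ALTER TABLE user_tokens
  ADD COLUMN token_hash CHAR(128) CHARACTER SET ascii NOT NULL,
  ADD UNIQUE KEY uq_token_hash (token_hash);
UPDATE user_tokens SET token_hash = SHA2(access_token, 512);

-- Choice 2: effectively LEFT(token, 255), expressed as a unique prefix index
ALTER TABLE user_tokens
  ADD UNIQUE KEY uq_token_prefix (access_token(255));
```

Choice 2 is the "do very little" option: uniqueness is enforced only on the first 255 characters, which is fine if your tokens are random from the start.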
Comments on HASHing
UNHEX(MD5(token)) fits in 16 bytes - BINARY(16).
As for collisions: Theoretically, there is only one chance in 9 trillion that you will get a collision in a table of 9 trillion rows.
For SHA() in BINARY(20) the odds are even less. Bigger shas are, in my opinion, overkill.
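A sketch of the MD5 variant (hypothetical table and column names; the hash would be kept in sync by the application):

```sql
-- 16 raw bytes instead of a 32-char hex string
ALTER TABLE user_tokens
  ADD COLUMN token_md5 BINARY(16) NOT NULL,
  ADD UNIQUE KEY uq_token_md5 (token_md5);
UPDATE user_tokens SET token_md5 = UNHEX(MD5(access_token));
```

UNHEX halves the storage versus keeping the hex digest, and a 16-byte unique index is far cheaper than one on the token itself.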
Going beyond the 767 limit to 3072
⚈ Upgrade to 5.7.7 (MariaDB 10.2.2?) for 3072 byte limit -- but your cloud may not provide this;
⚈ Reconfigure (if staying with 5.6.3 - 5.7.6 (MariaDB 10.1?)) -- 4 things to change: Barracuda + innodb_file_per_table + innodb_large_prefix + dynamic or compressed.
Later versions of 5.5 can probably perform the 'reconfigure'.
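A sketch of that 'reconfigure' route for 5.6.3 - 5.7.6 (configuration names are real; the demo table is hypothetical):

```sql
-- my.cnf, [mysqld] section:
--   innodb_file_format    = Barracuda
--   innodb_file_per_table = ON
--   innodb_large_prefix   = ON

-- The table must also use a Barracuda row format:
CREATE TABLE wide_index_demo (
  txt VARCHAR(1000) CHARACTER SET utf8,
  INDEX (txt)          -- 3000 bytes, allowed under the 3072 limit
) ENGINE=InnoDB ROW_FORMAT=DYNAMIC;
```

Without all four pieces in place, the same CREATE TABLE fails (or silently truncates the index to 767 bytes, depending on version and sql_mode).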
Similar Question: Does MariaDB allow 255 character unique indexes?
I usually use the maximum characters possible for VARCHAR fields, so in most cases I set 255 even though only 16 characters are actually used in the column...
Does this decrease performance for my database?
When it comes to storage, a VARCHAR(255) column will take up 1 byte to store the length of the actual value plus the bytes required to store the actual value.
For a latin1 VARCHAR(255) column, that's at most 256 bytes. For a UTF8 column, where each character can take up to 3 bytes (though rarely), the maximum size is 766 bytes. As we know the maximum index length for a single column in bytes in InnoDB is 767 bytes, hence perhaps the reason some declare 255 as the maximum supported column length.
So, again, when storing the value, it only takes up as much room as is actually needed.
However, if the column is indexed, the index automatically allocates the maximum possible size so that each node in the index has enough room to store any possible value. When searching through an index, MySQL loads the nodes in specific byte size chunks at a time. Large nodes means less nodes per read, which means it takes longer to search the index.
MySQL will also use the maximum size when storing the values in a temp table for sorting.
So, even if you aren't using indexes, but are ever performing a query that can't utilize an index for sorting, you will get a performance hit.
Therefore, if performance is your goal, setting any VARCHAR column to 255 characters should not be a rule of thumb. Instead, you should use the minimum required.
There may be edge cases where you'd rather suffer the performance every day so that you never have to lock a table completely to increase the size of a column, but I don't think that's the norm.
One possible exception is if you are joining on a VARCHAR column between two tables. MySQL says:
MySQL can use indexes on columns more efficiently if they are declared
as the same type and size.
In that case, you might use the max size between the two.
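A sketch of matching the two sides of such a join (hypothetical tables; the point is that type, size, and charset agree):

```sql
CREATE TABLE customers (
  customer_code VARCHAR(32) CHARACTER SET latin1 NOT NULL,
  PRIMARY KEY (customer_code)
) ENGINE=InnoDB;

CREATE TABLE orders (
  order_id      INT UNSIGNED NOT NULL AUTO_INCREMENT,
  customer_code VARCHAR(32) CHARACTER SET latin1 NOT NULL,
  PRIMARY KEY (order_id),
  INDEX (customer_code)   -- same declaration as customers.customer_code
) ENGINE=InnoDB;
```

If one side were utf8 and the other latin1, MySQL would have to convert during the join and could lose the benefit of the index.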
Whenever you're talking about "performance" you can only find out one way: Benchmarking.
In theoretical terms there's no difference between VARCHAR(20) and VARCHAR(255) if they're both populated with the same data. Keep in mind that if you get your length wrong, you will have massive truncation problems, and MySQL does not warn you before it starts chopping data to fit.
I try to avoid setting limits on VARCHAR columns unless the data would be completely invalid if it were longer. For instance, two-character ISO country codes can be stored in VARCHAR(2) because longer strings are meaningless. For other things, especially names or phone numbers, limiting the length is likely to be harmful.
Still, you will want to test any schema you create to be sure it meets your performance requirements. I expect you'd have a hard time detecting any difference at all between VARCHAR(25) and VARCHAR(16).
There are two ways in which this will decrease performance:
If you're loading those columns many, many times, performing a join on the column, or otherwise accessing it a large number of times. How many times depends on your machine, but think on the order of millions.
If you're always filling the field (using all 20 chars in a VARCHAR(20)), the length checks add a little overhead on every insert.
The best way to determine this, though, is to benchmark your database.
I am reading that MySQL 5.6 can only index the first 767 bytes of a varchar (or other text-based types). My schema character set is utf-8, so each character can be stored on up to 3 bytes. Since 767/3 = 255.66, this would indicate that the maximum length for a text column that needs to be indexed is 255 characters. Experience seems to confirm this, as the following goes through:
create table gaga (
  val varchar(255),
  index(val)
) engine = InnoDB;
But changing the definition of val to varchar(256) yields an "Error Code: 1071. Specified key was too long; max key length is 767 bytes".
In this day and age, a limit of 255 characters seems awfully low, so: is this correct? If it is, what is the best way to get larger pieces of text indexed with MySQL? (Should I avoid it? Store a SHA? Use another sort of index? Use another database character encoding?)
Though the limitation might seem ridiculous, it makes you consider whether you really need an index on such a long varchar field. Even at 767 bytes the index grows very fast and, for a large table (where it would be most useful), most probably won't fit into memory.
On the other hand, the only frequent case, at least in my experience, where I needed to index a long varchar field was a unique constraint. In all those cases a composite index of some group id and an MD5 of the varchar field was sufficient. The only problem is mimicking a case-insensitive collation (which considers accented and non-accented characters equal), though in all my cases I used a binary collation anyway, so it was not a problem.
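A sketch of that composite unique index (hypothetical schema; the MD5 column is maintained by the application):

```sql
CREATE TABLE documents (
  group_id INT UNSIGNED NOT NULL,
  body     VARCHAR(5000) NOT NULL,
  body_md5 BINARY(16) NOT NULL,   -- UNHEX(MD5(body))
  UNIQUE KEY uq_group_body (group_id, body_md5)
) ENGINE=InnoDB;
```

The unique key is 20 bytes wide regardless of how long `body` grows, at the cost of a theoretical collision risk.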
UPD. Another frequent case for indexing a long varchar is ordering. For this I usually define a separate indexed sorter field that is a prefix of 5-15 characters, depending on data distribution. For me, a compact index is preferable even at the cost of occasionally inaccurate ordering.
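The sorter-field idea can be sketched as follows (hypothetical table; the sorter is kept in sync by the application):

```sql
ALTER TABLE articles
  ADD COLUMN title_sorter VARCHAR(10) NOT NULL DEFAULT '',
  ADD INDEX idx_title_sorter (title_sorter);

-- e.g. maintained as a 10-character prefix of the full title
UPDATE articles SET title_sorter = LEFT(title, 10);

-- Sort by the compact indexed column first, falling back to the full one
SELECT * FROM articles ORDER BY title_sorter, title;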
Most of the time I define varchar(255) automatically.
But now I am wondering what varchar length would be best for these utf8 fields:
password
username
email
If these fields should be defined shorter than varchar(255), how much would that improve performance?
Thanks
'password' should be CHAR(40) if you use SHA1 hashes. It can have a binary collation if you are sure the case of the hash is always the same; this gives better performance. If you're not sure, use latin1, but don't use utf8.
'email'... use 255, you cannot know how long someone's email address is.
For the username I'd just use whatever your max username length is. 20 or 30 would probably be good.
If you have an index on a character field (especially if it's part of the PK), choose the length very carefully, because longer and longer indexes can reduce performance heavily (and increase memory usage).
Also, if you use a UTF8 char field in an index, be aware that MySQL reserves 3 times more bytes than the actual character length of the field, preparing for the worst case (UTF8 may store certain characters on 3 bytes). This can also cause a lack of memory.
If you index any of those fields (and you don't use a prefix as the index), bear in mind that MySQL will index the field as though it were CHAR rather than VARCHAR, and each index record will use the maximum potential space (so 3n bytes for a VARCHAR(n), since a UTF8 character can be up to 3 bytes long). That could mean the index will be larger than necessary. To get around this, make the field smaller, or index on a prefix.
(I should say: I'm sure I've read that this is the case somewhere in the MySQL documentation, but I couldn't find it when I looked just now.)
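The prefix workaround mentioned above can be sketched like this (hypothetical table; the prefix length is an arbitrary choice):

```sql
-- Index records are at most 10 chars * 3 bytes = 30 bytes in utf8,
-- rather than the full declared width of the column
CREATE INDEX idx_email ON users (email(10));
```

This trades a smaller index for the inability to use it as a covering index, since the full value is not stored in the index.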
Changing that won't have a big effect on performance (depending on how many rows are in the table; probably you won't notice any effect), but it may make your database use less disk space. (I use a length of 30 for user names, 64 for passwords (the length of the hash), and 50 for email addresses.)