Varchar type and performance issues - mysql

I'm modelling a database and I want to ask about the Varchar type.
Is there any performance difference between Varchar(50) and Varchar(100)?
For example, I have a varchar(50) field. One user saved 5 characters of data to this field and another user saved 25 characters, but no one has saved 50 characters. Should I change this field to varchar(25) for maximum performance?

Assuming SQL Server (since you also listed MySQL), no.
The overhead is the same as long as the data in the field is the same.

No, there's no performance penalty if you define varchar(50) and you only store a max of 10 characters, to give you an example. Varchar will always store the data in row.

EDIT: For MySQL: No. At least, not unless you have some strings between 50 and 100 characters long.
Normally VARCHARs are stored as a length prefix plus the data; only the data actually supplied is stored. A varchar(100) may need one more byte for the length than a varchar(50) (its maximum is 300 bytes assuming utf8 at 3 bytes per character, which is over 255, so the length takes 2 bytes instead of 1), but that's not enough to care about.
It's the same for the indexes. It's only as much as you really store.
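To make the length-prefix point concrete, here is a minimal Python sketch of the storage arithmetic. It assumes utf8 with a maximum of 3 bytes per character and counts only the length prefix plus the encoded data; row headers, NULL flags and page overhead are ignored. The function names are hypothetical, not part of any MySQL API.

```python
# Simplified model of MySQL VARCHAR storage: length prefix + encoded data.
# Assumption: max 3 bytes per character (utf8/utf8mb3); row headers,
# NULL flags and page overhead are ignored.

def length_prefix_bytes(declared_chars: int, max_bytes_per_char: int = 3) -> int:
    """1 length byte if the column's maximum byte length fits in 255, else 2."""
    return 1 if declared_chars * max_bytes_per_char <= 255 else 2


def varchar_storage_bytes(value: str, declared_chars: int,
                          max_bytes_per_char: int = 3) -> int:
    """Bytes used by one stored value: length prefix plus the UTF-8 data."""
    if len(value) > declared_chars:
        raise ValueError("value is longer than the declared VARCHAR length")
    data = len(value.encode("utf-8"))
    return length_prefix_bytes(declared_chars, max_bytes_per_char) + data


if __name__ == "__main__":
    for n in (25, 50, 100):
        short = varchar_storage_bytes("a" * 5, n)
        longer = varchar_storage_bytes("a" * 25, n)
        print(f"VARCHAR({n}): 5 chars -> {short} bytes, 25 chars -> {longer} bytes")
```

Under these assumptions VARCHAR(25) and VARCHAR(50) use the same bytes for the same data; only VARCHAR(100) adds one length byte, because its 300-byte maximum exceeds 255.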

The column length should reflect the maximum length of the data your users want to store in that column. If your application has been designed on the basis of well-researched requirements then there ought to be some justification for that length; just because nobody is storing strings > 25 characters in that column yet doesn't mean that they won't someday.
To answer the other aspect of your question, there is no performance gain to be made from reducing the length of the column. As the docs have it, the column only takes up storage to fit the data assigned. So a VARCHAR(100) column with only 25 bytes of data won't take up any more storage than a VARCHAR(25) column with the same data.
Is storage a performance issue? It can be, because shorter records on disk mean more records retrieved per I/O operation.
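The I/O argument is easy to see with a little arithmetic. This sketch assumes InnoDB's default 16 KB page and ignores page and row header overhead, so real pages hold fewer rows; the 100- and 200-byte row sizes are hypothetical.

```python
# Rough model: how many rows fit in a page, and how many pages a full scan reads.
# Assumes InnoDB's default 16 KB page; page/row header overhead is ignored.

PAGE_SIZE = 16 * 1024  # bytes


def rows_per_page(row_bytes: int, page_size: int = PAGE_SIZE) -> int:
    return page_size // row_bytes


def pages_to_scan(total_rows: int, row_bytes: int, page_size: int = PAGE_SIZE) -> int:
    per_page = rows_per_page(row_bytes, page_size)
    return -(-total_rows // per_page)  # ceiling division


if __name__ == "__main__":
    for row_bytes in (100, 200):  # hypothetical average row sizes
        print(row_bytes, rows_per_page(row_bytes), pages_to_scan(1_000_000, row_bytes))
```

Halving the average row size roughly halves the number of pages a full scan has to read.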

For most implementations, any performance difference between VARCHAR(50) and VARCHAR(25) is inconsequential and probably not worth worrying about.
As a rule of thumb, make the field as large as you think you'll ever need.
Here is some relevant discussion:
What are the optimum varchar sizes for MySQL?
From that discussion, here is the relevant MySQL manual page:
http://dev.mysql.com/doc/refman/5.0/en/char.html

Related

What are the potential risks of increasing column size in SQL?

Suppose I have a column called ShortDescription in a table called Ticket.
ShortDescription varchar (16) NOT NULL
Now, suppose I increase the size like this -
alter table Ticket modify ShortDescription varchar (32) NOT NULL;
What are the potential risks of doing this? One potential risk is that if some other applications have statically set any of their fields to size 16 based on the previous size of ShortDescription, then those applications may not behave correctly with data of greater size.
SQL is a query language, not a specific DB implementation, so your mileage may vary, but ...
Assuming 'SQL' means a MySQL DB on the DB side, you have nothing to worry about in terms of storage and performance, apart from the fact that if you store a bunch of 32-byte strings you'll use more memory and disk working with them. If what you actually store in them is still 16 characters, the conversion to VARCHAR(32) is a wash.
Within MySQL, there is no impact on VARCHAR storage (assuming you keep the NOT NULL). If the column is used in a composite primary key, you may hit a key size limit, but otherwise each VARCHAR entry only takes the size of its data plus 1 length byte to store.
If the column is referenced as a foreign key in some other table, you'll need to grow that column to VARCHAR(32) as well, or you may experience truncation of the extra 16 characters if you try to jam a 32-character string into a 16-character column.
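What "truncation" means in practice depends on the SQL mode. As a hedged sketch: with strict mode enabled (e.g. STRICT_TRANS_TABLES), MySQL rejects an over-long value with a "Data too long for column" error, and without strict mode it cuts the value to the column length and issues a warning. The Python below only models that behaviour; `store_varchar` and `DataTooLongError` are hypothetical names.

```python
import warnings


class DataTooLongError(Exception):
    """Hypothetical stand-in for MySQL's 'Data too long for column' error."""


def store_varchar(value: str, max_chars: int, strict: bool = True) -> str:
    """Model writing `value` into a VARCHAR(max_chars) column."""
    if len(value) <= max_chars:
        return value
    if strict:
        # Strict SQL mode: the statement fails instead of losing data.
        raise DataTooLongError(f"{len(value)} chars do not fit in VARCHAR({max_chars})")
    # Non-strict mode: the value is silently shortened, with only a warning.
    warnings.warn(f"value truncated to {max_chars} characters")
    return value[:max_chars]


if __name__ == "__main__":
    description = "x" * 32
    print(len(store_varchar(description, 32)))  # fits in the widened column
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        print(len(store_varchar(description, 16, strict=False)))  # cut to 16
```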
If not MySQL, implementations could differ across DB technologies. However, VARCHAR implementations tend to be similar, using just the size of the stored data plus a constant amount of overhead to mark its length or end. This is why many DB systems give you a choice between a fixed-length CHAR and a variable-length VARCHAR type.
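A minimal sketch of the CHAR versus VARCHAR trade-off, assuming a single-byte character set such as latin1 and a column of at most 255 characters (so VARCHAR needs a 1-byte length prefix). Engine-specific details are ignored and the function names are hypothetical.

```python
# CHAR(n) vs VARCHAR(n) storage for one value, single-byte character set assumed.

def char_bytes(value: str, n: int) -> int:
    """CHAR(n) is right-padded with spaces, so it always takes n bytes."""
    if len(value) > n:
        raise ValueError("value longer than column")
    return n


def varchar_bytes(value: str, n: int) -> int:
    """VARCHAR(n) takes a length prefix plus only the bytes actually stored."""
    if len(value) > n:
        raise ValueError("value longer than column")
    prefix = 1 if n <= 255 else 2
    return prefix + len(value)


if __name__ == "__main__":
    for value in ("", "FooBar", "x" * 45):
        print(repr(value), char_bytes(value, 45), varchar_bytes(value, 45))
```

For values that always fill the column, CHAR saves the length byte; for anything shorter, VARCHAR is smaller.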
As you noted in your post, external systems relying on static data size have to be considered.
Note: please excuse the above fast and loose swapping of the terms byte and character; I'm assuming UTF8 or ASCII. If you're using some multibyte encoding, substitute appropriately.

Does significantly increasing the length of varchar fields impact on performance?

At the moment, we have a varchar(255) field. One of our users has requested that the field be increased about 10-fold to around varchar(2048). Given his use-case, it appears to be a reasonable request.
The field in question is not indexed, it is not part of any joins and it is never included in any where clauses.
I realize that smaller increases (say from 50 to 100) have no impact, but does this still hold at much larger increases in size?
Apparently not, with one caveat:
"In storage, VARCHAR(255) is smart enough to store only the length you need on a given row, unlike CHAR(255) which would always store 255 characters.
But since you tagged this question with MySQL, I'll mention a MySQL-specific tip: when your query implicitly generates a temporary table, for instance while sorting or GROUP BY, VARCHAR fields are converted to CHAR to gain the advantage of working with fixed-width rows. If you use a lot of VARCHAR(255) fields for data that doesn't need to be that long, this can make the temporary table very large.
It's best to define the column based on the type of data that you intend to store. It's hard to know what the longest postal address is, of course, which is why many people choose a long VARCHAR that is certainly longer than any address. And 255 is customary because it may have been the maximum length of a VARCHAR in some databases in the dawn of time (as well as PostgreSQL until more recently)."
by Bill Karwin
So it basically depends on how that field is used: if you don't use it in GROUP BY (or anything else that implicitly builds a temporary table, such as sorting), then there's no problem.
For your case there is no difference between varchar(255) and varchar(2048).
In MySQL, temporary tables and MEMORY tables store a VARCHAR column as a fixed-length column, padded out to its maximum length. If you design VARCHAR columns much larger than the greatest size you need, you will consume more memory than you have to. This affects cache efficiency, sorting speed, etc.
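To put numbers on the temporary-table point, here is a rough estimate under the assumption stated in the answer (each VARCHAR padded out to its full declared width), using utf8mb4's 4-byte maximum per character and ignoring the length prefix and per-row engine overhead. The row count is hypothetical. MySQL 8.0 defaults to the TempTable engine for internal temporary tables, which may behave differently, so treat this as a model of the MEMORY-engine case.

```python
# Memory for one VARCHAR column in a fixed-width (padded) temporary table.
# Assumes every row is padded to the declared maximum byte length;
# engine overhead and the length prefix are ignored.

def padded_column_bytes(rows: int, declared_chars: int, max_bytes_per_char: int = 4) -> int:
    return rows * declared_chars * max_bytes_per_char


if __name__ == "__main__":
    rows = 100_000  # hypothetical number of rows in the temporary table
    for n in (255, 2048):
        mib = padded_column_bytes(rows, n) / 2**20
        print(f"VARCHAR({n}) utf8mb4, {rows} rows: about {mib:.1f} MiB")
```

The padded size grows linearly with the declared length, so an 8x larger declaration means roughly 8x the temporary-table memory for that column, regardless of how short the stored strings actually are.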

MySQL Storage and Optimization

I'm looking at a db schema for a project I'm inheriting. There are many instances of binary answers being stored as INT(11) rather than TINYINT(1), which is the way I've normally handled this type of storage.
I've checked the data and everything is either "1" or "0". Is there any reason to or not to change the datatype to TinyInt(1) Unsigned for all of these instances?
Similarly, for something like "last_name", if the current column is varchar(255), would switching to varchar(100) create any gains? I'm more interested in performance/efficiency than in just limiting data storage at this point.
I would say definitely go ahead with the changes to the boolean columns. (Note: Actually if you're using MySQL 5+, I would use the bit datatype instead of tinyint).
As far as the varchar columns go, changing the length from 255 to 100 doesn't actually make a difference.
From the MySQL docs:
A column uses one length byte if values require no more than 255 bytes, two length bytes if values may require more than 255 bytes.
So as long as it's under 255 (bytes, not characters), you're really not gaining much in terms of storage.
That being said, by limiting the size of the names, less data needs to be transferred between your SQL server and your application.
Switching to TINYINT would save you 3 bytes I believe, which doesn't seem like a lot to me, although it's certainly a little more efficient.
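The 3-byte figure comes from MySQL's fixed integer sizes: INT is 4 bytes and TINYINT is 1, and the number in parentheses (INT(11), TINYINT(1)) is only a display width that does not change storage. A quick sketch of the savings, with hypothetical row and column counts and ignoring per-row overhead:

```python
# Per-value storage of MySQL integer types in bytes.
# The display width in INT(11) or TINYINT(1) does not affect these sizes.
INT_TYPE_BYTES = {"TINYINT": 1, "SMALLINT": 2, "MEDIUMINT": 3, "INT": 4, "BIGINT": 8}


def bytes_saved(rows: int, columns: int, old: str = "INT", new: str = "TINYINT") -> int:
    """Raw data bytes saved by converting `columns` columns on `rows` rows."""
    return rows * columns * (INT_TYPE_BYTES[old] - INT_TYPE_BYTES[new])


if __name__ == "__main__":
    saved = bytes_saved(rows=10_000_000, columns=5)  # hypothetical table
    print(f"{saved} bytes (~{saved / 2**20:.0f} MiB)")
```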
I always try and make VARCHAR columns as small as I can get away with. I would personally focus on any gains you can get from that.
The main reason I can think of to avoid any of these changes is if you have so much data that running an ALTER TABLE would cause significant downtime.
Whether any of this will help your app perform better is open to debate. In theory, with VARCHARs, MySQL will only send the actual data over the wire, so if all your last names are 40 bytes long, it's only sending 40 bytes. If the column isn't being used in lookups, it shouldn't really have any impact on your performance. There are a couple of relevant questions like this one on SO covering this issue already.

How to choose optimized datatypes for columns [innodb specific]?

I'm learning about the usage of datatypes for databases.
For example:
Which is better for email? varchar[100], char[100], or tinyint (joking)
Which is better for username? should I use int, bigint, or varchar?
Explain. Some of my friends say that using int, bigint, or another numeric datatype is better (Facebook does it), like u=123400023 referring to user 123400023 rather than user=thenameoftheuser, since numbers take less time to fetch.
Which is better for phone numbers? Posts (like in blogs or announcements)? Or maybe dates (I use datetime for that)? Maybe someone has done some research they'd like to share.
Product price (I use decimal(11,2), don't know about you guys)?
Or anything else that you have in mind, like, "I use serial datatype for blablabla".
Why do I mention innodb specifically?
Unless you are using the InnoDB table types (see Chapter 11, "Advanced MySQL," for more information), CHAR columns are faster to access than VARCHAR.
InnoDB has some differences here that I don't know about.
I read that here.
Brief Summary:
(just my opinions)
for email address - VARCHAR(255)
for username - VARCHAR(100) or VARCHAR(255)
for id_username - use INT (unless you plan on over 2 billion users in your system)
phone numbers - INT or VARCHAR or maybe CHAR (depends on whether you want to store formatting)
posts - TEXT
dates - DATE or DATETIME (definitely include times for things like posts or emails)
money - DECIMAL(11,2)
misc - see below
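The "over 2 billion users" cut-off follows from INT being a 4-byte type. This sketch derives each MySQL integer type's range from its storage size and picks the smallest type whose upper bound covers a given maximum id; `smallest_int_type` is a hypothetical helper, not a MySQL feature.

```python
from typing import Tuple

# Storage size in bytes of each MySQL integer type, smallest first.
INT_TYPE_BYTES = {"TINYINT": 1, "SMALLINT": 2, "MEDIUMINT": 3, "INT": 4, "BIGINT": 8}


def int_range(type_name: str, unsigned: bool = False) -> Tuple[int, int]:
    """(min, max) value for a MySQL integer type."""
    bits = 8 * INT_TYPE_BYTES[type_name]
    if unsigned:
        return 0, 2**bits - 1
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def smallest_int_type(max_value: int, unsigned: bool = True) -> str:
    """Smallest type whose upper bound covers max_value (lower bound not checked)."""
    for name in INT_TYPE_BYTES:  # dicts preserve insertion order
        if int_range(name, unsigned)[1] >= max_value:
            return name
    raise ValueError("value does not fit in BIGINT")


if __name__ == "__main__":
    print(int_range("INT"))                 # signed INT
    print(int_range("INT", unsigned=True))  # unsigned INT
    print(smallest_int_type(3_000_000_000))
```

An UNSIGNED INT id therefore goes up to about 4.29 billion, which also fits the advice further down that id columns should be unsigned.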
As far as using InnoDB because VARCHAR is supposed to be faster, I wouldn't worry about that, or speed in general. Use InnoDB because you need to do transactions and/or you want to use foreign key constraints (FK) for data integrity. Also, InnoDB uses row level locking whereas MyISAM only uses table level locking. Therefore, InnoDB can handle higher levels of concurrency better than MyISAM. Use MyISAM to use full-text indexes and for somewhat less overhead.
More importantly for speed than the engine type: put indexes on the columns that you need to search on quickly. Always put indexes on your ID/PK columns, such as the id_username that I mentioned.
More details:
Here's a bunch of questions about MySQL datatypes and database design (warning, more than you asked for):
What DataType should I pick?
Table design question
Enum datatype versus table of data in MySQL?
mysql datatype for telephone number and address
Best mysql datatype for grams, milligrams, micrograms and kilojoule
MySQL 5-star rating datatype?
And a couple questions on when to use the InnoDB engine:
MyISAM versus InnoDB
When should you choose to use InnoDB in MySQL?
I just use tinyint for almost everything (seriously).
Edit - How to store "posts:"
Below are some links with more details, but here's the short version. For storing "posts," you need room for a long text string. CHAR max length is 255, so that's not an option, and of course CHAR would waste unused characters versus VARCHAR, which is variable length CHAR.
Prior to MySQL 5.0.3, VARCHAR max length was 255, so you'd be left with TEXT. However, in newer versions of MySQL, you can use VARCHAR or TEXT. The choice comes down to preference, but there are a couple of differences. The max length of both VARCHAR and TEXT is now 65,535 bytes, but you can set your own max on VARCHAR. Let's say you think your posts will only need to be 2000 characters max; you can set VARCHAR(2000). If you ever run into the limit, you can ALTER your table later and bump it to VARCHAR(3000). On the other hand, TEXT actually stores its data in a BLOB (1). I've heard that there may be performance differences between VARCHAR and TEXT, but I haven't seen any proof, so you may want to look into that more, but you can always change that minor detail in the future.
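The 65,535 figure is a limit on the whole row, not on one column, and it is counted in bytes. A hedged sketch for a table with a single NOT NULL VARCHAR column: a column that large needs a 2-byte length prefix, which also counts toward the limit, so the largest declared length depends on the character set's maximum bytes per character. Nullable columns, other columns and engine-specific limits are ignored.

```python
# Largest VARCHAR(n) for a single NOT NULL column under MySQL's 65,535-byte
# row-size limit. Only the data bytes and the 2-byte length prefix are counted.

ROW_SIZE_LIMIT = 65_535
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8mb3": 3, "utf8mb4": 4}


def max_single_varchar(charset: str) -> int:
    length_prefix = 2  # a column this long always needs a 2-byte prefix
    return (ROW_SIZE_LIMIT - length_prefix) // MAX_BYTES_PER_CHAR[charset]


if __name__ == "__main__":
    for cs in MAX_BYTES_PER_CHAR:
        print(cs, max_single_varchar(cs))
```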
More importantly, searching this "post" column using a Full-Text Index instead of LIKE would be much faster (2). However, you have to use the MyISAM engine to use a full-text index because InnoDB doesn't support it (InnoDB only gained FULLTEXT indexes in MySQL 5.6). In a MySQL database, you can have a heterogeneous mix of engines for each table, so you would just need to make your "posts" table use MyISAM. However, if you absolutely need "posts" to use InnoDB (for transactions), then set up a trigger to update the MyISAM copy of your "posts" table and use the MyISAM copy for all your full-text searches.
See bottom for some useful quotes.
MySQL Data Type Chart (outdated)
MySQL Datatypes (outdated)
Chapter 10. Data Types (better details)
The BLOB and TEXT Types (1)
11.9. Full-Text Search Functions (2)
10.4.1. The CHAR and VARCHAR Types (3)
(3) "Values in VARCHAR columns are variable-length strings. The length can be specified as a value from 0 to 255 before MySQL 5.0.3, and 0 to 65,535 in 5.0.3 and later versions. Before MySQL 5.0.3, if you need a data type for which trailing spaces are not removed, consider using a BLOB or TEXT type.
When CHAR values are stored, they are right-padded with spaces to the specified length. When CHAR values are retrieved, trailing spaces are removed.
Before MySQL 5.0.3, trailing spaces are removed from values when they are stored into a VARCHAR column; this means that the spaces also are absent from retrieved values."
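The quoted padding rules can be modelled in a few lines. This is only an illustration in Python (`store_char`, `retrieve_char` and `store_varchar` are hypothetical names), and it assumes the default behaviour in which trailing spaces are stripped from retrieved CHAR values.

```python
# Model of the CHAR/VARCHAR trailing-space rules quoted above.

def store_char(value: str, n: int) -> str:
    """CHAR(n): right-pad with spaces to exactly n characters."""
    if len(value) > n:
        raise ValueError("value longer than column")
    return value.ljust(n)


def retrieve_char(stored: str) -> str:
    """CHAR values come back with trailing spaces removed."""
    return stored.rstrip(" ")


def store_varchar(value: str, pre_5_0_3: bool = False) -> str:
    """VARCHAR keeps trailing spaces from 5.0.3 on; older versions stripped them."""
    return value.rstrip(" ") if pre_5_0_3 else value


if __name__ == "__main__":
    original = "ab "  # note the trailing space
    print(repr(retrieve_char(store_char(original, 5))))
    print(repr(store_varchar(original)))
    print(repr(store_varchar(original, pre_5_0_3=True)))
```

The round trip through CHAR loses the original trailing space, which is why the manual suggests BLOB or TEXT (or, from 5.0.3 on, VARCHAR) when trailing spaces matter.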
Lastly, here's a great post about the pros and cons of VARCHAR versus TEXT. It also speaks to the performance issue:
VARCHAR(n) Considered Harmful
There are multiple angles to approach your question.
From a design POV it is always best to choose the datatype which best expresses the quantity you want to model. That is, get the data domain and data size right so that illegal data cannot be stored in the database in the first place. But that is not where MySQL is strong, especially not with the default sql_mode (http://dev.mysql.com/doc/refman/5.1/en/server-sql-mode.html). If it works for you, try the TRADITIONAL sql_mode, which is shorthand for many desirable flags.
From a performance POV, the question is entirely different. For example, regarding the storage of email bodies, you might want to read http://www.mysqlperformanceblog.com/2010/02/09/blob-storage-in-innodb/ and then think about that.
Removing redundancies and having short keys can be a big win. For example, in a project that I have seen, a log table has been storing http User-Agent information. By simply replacing each user agent string in the log table with a numeric id of a user agent string in a lookup table, data set size was considerably (more than 60%) reduced. By parsing the user agent further and then storing a bunch of ids (operating system, browser type, version index) data set size was reduced to 1% of the original size.
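The effect of moving a repeated string into a lookup table can be estimated with simple arithmetic. In this sketch every input (10 million rows, 150-byte average User-Agent, 50,000 distinct values, 4-byte INT ids) is hypothetical, and indexes and row overhead are ignored; the percentages in the answer come from that project's real data, not from this model.

```python
# Estimate of raw data size before and after moving a repeated string
# (such as a User-Agent) into a lookup table referenced by an integer id.
# Ignores indexes and row overhead; all inputs are hypothetical.

def inline_bytes(rows: int, avg_string_bytes: int) -> int:
    """Every log row carries its own copy of the string."""
    return rows * avg_string_bytes


def normalized_bytes(rows: int, distinct_strings: int, avg_string_bytes: int,
                     id_bytes: int = 4) -> int:
    """Log rows store only an id; each distinct string is stored once with its id."""
    return rows * id_bytes + distinct_strings * (avg_string_bytes + id_bytes)


if __name__ == "__main__":
    before = inline_bytes(10_000_000, 150)
    after = normalized_bytes(10_000_000, 50_000, 150)
    print(before, after, f"{after / before:.1%} of the original")
```

The savings grow with how often each distinct string repeats; with few repeats, the extra id column and lookup table can eat most of the gain.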
Finally, there are a number of rules that can help you spot errors in schema design.
For example, anything that has id in the name and is not an unsigned integer type is probably a bug (especially in the context of innodb).
For example, anything that has price or cost in the name and is not unsigned is a potential source of fraud (fraudster creates article with negative price, and buys that).
For example, anything that works on monetary data and is not using the DECIMAL data type of the appropriate size is probably doing math wrong (DECIMAL does exact decimal arithmetic, like paper math with correct precision and rounding; DOUBLE and FLOAT are binary floating-point types and do not).
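A small Python analogy for the DECIMAL point: binary floating point cannot represent 0.10 exactly, while a decimal type can. Python's `decimal` module stands in for MySQL's DECIMAL here; it is not the same implementation, and the rounding to two places only loosely mimics a DECIMAL(11,2) result.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import List

# Binary floating point (like FLOAT/DOUBLE) vs exact decimal arithmetic.


def total_float(prices: List[float]) -> float:
    return sum(prices)


def total_decimal(prices: List[str]) -> Decimal:
    """Sum exact decimal strings and round to 2 places, like a DECIMAL(11,2) result."""
    total = sum((Decimal(p) for p in prices), Decimal("0"))
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    print(total_float([0.10, 0.10, 0.10]))  # not exactly 0.3
    print(total_decimal(["0.10", "0.10", "0.10"]))
```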
SQLyog has Calculate optimal datatype feature which helps in finding out optimal datatype based on records inserted in a table.
It uses the query
SELECT * FROM `table_name` PROCEDURE ANALYSE(1, 10);
to find the optimal datatype. (PROCEDURE ANALYSE was deprecated in MySQL 5.7 and removed in MySQL 8.0.)

What are the optimum varchar sizes for MySQL?

How does MySQL store a varchar field? Can I assume that the following pattern represents sensible storage sizes :
1,2,4,8,16,32,64,128,255 (max)
A clarification via example: let's say I have a varchar field of 20 characters. When creating this field, does MySQL basically reserve space for 32 bytes (not sure if they are bytes or not) but only allow 20 to be entered?
I guess I am worried about optimising disk space for a massive table.
To answer the question: on disk, MySQL uses 1 byte plus the size of the data actually stored in the field (so if the column was declared varchar(45) and the value was "FooBar", it would use 7 bytes on disk, unless you were using a character set in which those characters take more than one byte each, such as ucs2, where it would use 13 bytes). So however you declare your columns, it won't make a difference on the storage end (you stated you are worried about disk optimization for a massive table). However, it does make a difference in queries, as VARCHARs are converted to CHARs when MySQL makes a temporary table (for ORDER BY, GROUP BY, etc.), and the more records you can fit into a single page, the less memory you use and the faster your table scans will be.
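The character-set caveat can be checked with Python codecs standing in for MySQL character sets: latin-1 for latin1, utf-8 for utf8/utf8mb4 and utf-16-le for ucs2. This is an approximation of MySQL's encodings, not MySQL itself, and it assumes a 1-byte length prefix, which holds here because 45 characters at no more than 4 bytes each is 180 bytes, under the 255-byte threshold.

```python
# Bytes "FooBar" takes in a VARCHAR(45): 1-byte length prefix + encoded data.

def stored_bytes(value: str, codec: str, length_prefix: int = 1) -> int:
    return length_prefix + len(value.encode(codec))


if __name__ == "__main__":
    for codec in ("latin-1", "utf-8", "utf-16-le"):
        print(codec, stored_bytes("FooBar", codec))
```

Plain ASCII letters stay at one byte each in utf8, so only a fixed two-byte encoding such as ucs2 roughly doubles the data part.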
MySQL stores a varchar field as a variable length record, with either a one-byte or a two-byte prefix to indicate the record size.
Having a pattern of storage sizes doesn't really make any difference to how MySQL will function when dealing with variable length record storage. The length specified in a varchar(x) declaration will simply determine the maximum length of the data that can be stored. Basically, a varchar(16) is no different disk-wise than a varchar(128).
This manual page has a more detailed explanation.
Edit: With regard to your updated question, the answer is still the same. A varchar field will only use up as much space on disk as the data you store in it (plus a one- or two-byte overhead). So it doesn't matter whether you have a varchar(16) or a varchar(128): if you store a 10-character string in it, you're only going to use 10 bytes (with a single-byte character set), plus 1 or 2, of disk space.