MySQL: Why use VARCHAR(20) instead of VARCHAR(255)? [duplicate] - mysql

In MySQL you can choose a length for the VARCHAR field type. Possible values are 1-255.
But what is the advantage of choosing VARCHAR(20) rather than the maximum, VARCHAR(255)? As far as I know, the size of an entry depends only on the actual length of the inserted string:
size (bytes) = length + 1
So if you have the word "Example" in a VARCHAR(255) field, it takes 8 bytes. If you have it in a VARCHAR(20) field, it takes 8 bytes, too. What is the difference?
I hope you can help me. Thanks in advance!

Check out: Reference for Varchar
In short, there isn't much difference unless the column can hold values longer than 255 bytes, in which case the length prefix needs a second byte.
The length is more of a constraint on the data stored in the column than anything else. It also caps the MAXIMUM storage size for the column. IMHO, the length should make sense with respect to the data. If you're storing a Social Security number, it makes no sense to set the length to 128, even though it doesn't cost you anything in storage if all you actually store is an SSN.
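To make the length-prefix rule concrete, here is a rough Python sketch of the byte count. The function name and parameters are made up for illustration; it assumes the documented rule of one prefix byte when the column can never need more than 255 bytes and two otherwise, and it ignores any per-row overhead added by the storage engine.

```python
def varchar_storage_bytes(value, declared_chars, max_bytes_per_char=1, encoding="latin-1"):
    """Approximate bytes used by one VARCHAR value: data bytes plus length prefix.

    Hypothetical helper for illustration only; not a MySQL API.
    """
    if len(value) > declared_chars:
        raise ValueError("value is longer than the declared column length")
    # The prefix size depends on the largest value the column could hold,
    # not on the value actually stored.
    max_column_bytes = declared_chars * max_bytes_per_char
    prefix_bytes = 1 if max_column_bytes <= 255 else 2
    return len(value.encode(encoding)) + prefix_bytes


print(varchar_storage_bytes("Example", 20))    # same cost as the next line
print(varchar_storage_bytes("Example", 255))
print(varchar_storage_bytes("Example", 1000))  # one extra byte for the prefix
```

Note that with a multi-byte character set the two-byte prefix kicks in earlier, because the limit is counted in bytes rather than characters.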

There are many valid reasons for choosing a value smaller than the maximum that are not related to performance. Setting a size helps indicate the type of data you are storing and can also act as a last-gasp form of validation.
For instance, if you are storing a UK postcode then you only need 8 characters. Setting this limit helps make clear the type of data you are storing. If you chose 255 characters it would just confuse matters.

I don't know about MySQL, but SQL Server will let you define fields such that the total number of bytes declared is greater than the total number of bytes that can actually be stored in a record. This is a bad thing. Sooner or later you will get a row where the limit is reached and you cannot insert the data.
It is far better to design your database structure to consider row size limits.
Additionally yes, you do not want people to put 200 characters in a field where the maximum value should be 10. If they do, it is almost always bad data.
You say, well I can limit that at the application level. But data does not get into the database just from one application. Sometimes multiple applications use it, sometimes data is imported and sometimes it is fixed manually from the query window (update all the records to add 10% to the price for instance). If any of these other sources of data don't know about the rules you put in your application, you will have bad, useless data in your database. Data integrity must be enforced at the database level (which doesn't stop you from also checking before you try to enter data) or you have no integrity. Plus it has been my experience that people who are too lazy to design their database are often also too lazy to actually put the limits into the application and there is no data integrity check at all.
They have a word for databases with no data integrity - useless.

There is a semantic difference (and I believe that's the only difference): if you try to put 30 non-space characters into a varchar(20), it will produce an error (in strict SQL mode; otherwise the value is truncated with a warning), whereas it will succeed for varchar(255). So it is primarily an additional constraint.
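As a rough illustration of that constraint, the following Python sketch mimics how an over-long value is treated. It is a simulation, not MySQL itself: `store_varchar` is a hypothetical helper, and it assumes the documented behaviour that excess trailing spaces are trimmed with a warning in any SQL mode, while other excess characters raise an error in strict mode and are truncated with a warning otherwise.

```python
def store_varchar(value, max_chars, strict=True):
    """Return (stored_value, truncated_flag) for a simulated VARCHAR(max_chars).

    Hypothetical helper for illustration only; not a MySQL API.
    """
    if len(value) <= max_chars:
        return value, False
    excess = value[max_chars:]
    # Excess made up only of spaces is trimmed regardless of SQL mode.
    if excess.strip(" ") == "":
        return value[:max_chars], True
    if strict:
        raise ValueError("Data too long for column")
    # Non-strict mode: the value is cut to fit and a warning is raised.
    return value[:max_chars], True


print(store_varchar("x" * 30, 255))                # fits, stored unchanged
print(store_varchar("x" * 30, 20, strict=False))   # cut to 20 characters
```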

Well, you might want to allow for a larger entry, or perhaps limit the entry size.
For example, you may have first_name as a VARCHAR(20), but street_address as a VARCHAR(50), since 20 may not be enough space. At the same time, you may want to control how large that value can get.
In other words, you set a ceiling on how large a particular value can be, in theory to prevent the table (and potentially the index entries) from getting too large.
You could also use CHAR, which has a fixed width. Unlike VARCHAR, whose stored values can be shorter than the declared length, CHAR pads its values (although this can make for quicker SQL access).

From a database performance perspective, I do not believe there is going to be a difference.
However, I think a lot of the decision on the length to use comes down to what you are trying to accomplish and documenting the system to accept just the data that it needs.

Related

MySQL best way to store long strings

I'm looking for some advice from the MySQL experts on the best way to store long strings of data.
I have a general purpose table which is used to store any kind of data, by which I mean it should be able to hold alphanumeric and numeric data.
Currently, the table structure is simple with an ID and the actual data stored in a single column as follows:
id INT(11)
data VARCHAR(128)
I now have a requirement to store a larger amount of data (up to 500 characters) and am wondering whether the best way would be to simply increase the varchar column size, or whether I should add a new column (a TEXT type column?) for the times I need to store longer strings.
If any experts out there have any advice, I'm all ears!
My preferred method would be to simply increase the varchar column, but that's because I'm lazy.
The MySQL version I'm running is 5.0.77.
I should mention the new 500 character requirement will only be for the odd record; most records in the table will be no longer than 50 characters.
I thought I'd be future-proofing by making the column 128. Shows how much I knew!
Generally speaking, this is not a question that has a "correct" answer. There is no "infinite length" text storage type in MySQL. You could use LONGTEXT, but that still has an (absurdly high) upper limit. Yet if you do, you're kicking your DBMS in the teeth for having to deal with that absurd blob of a column for your 50-character text. Not to mention the fact that you hardly do anything with it.
So, most futureproofness(TM) is probably offered by LONGTEXT. But it's also a very bad method of resolving the issue. Honestly, I'd revisit the application requirements. Storing strings that have no "domain" (as in, being well-defined in their application) and arbitrary length is not one of the strengths of RDBMS.
If I wanted to solve this on the "application design" level, I'd use a NoSQL key-value store for this (and I'm as anti-NoSQL-hype as they get, so you know it's serious), even though I recognize it's a rather expensive change for such a minor requirement. But if this is an indication of what your DBMS is eventually going to hold, it might be more prudent to switch now to avoid this same problem a hundred times in the future. Data domain is very important in an RDBMS, whereas it's explicitly sidelined in non-relational solutions, which seems to be what you're trying to solve here.
Stuck with MySQL? Just increase it to VARCHAR(1000). If you have no requirements for your data, it's irrelevant what you do anyway.
Be careful when using TEXT. TEXT data may be stored separately from the rest of the row, so querying it can require extra disk reads, which is slower than CHAR and VARCHAR, and a TEXT column can only be indexed on a prefix. A NoSQL database may be a better way to store long strings.
We can use varchar(<maximum_limit>). The maximum limit that we can pass is 65535 bytes.
Note: The effective maximum length of a VARCHAR depends on the maximum row size (65,535 bytes, shared among all columns, with TEXT/BLOB columns counting only a few bytes each) and on the character set used.
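To see how the row-size limit and the character set interact, here is a small Python sketch of the arithmetic. It assumes a table with a single VARCHAR column, two bytes for the length prefix, and one extra byte when the column is nullable; the function name is made up, and a storage engine may impose tighter limits of its own.

```python
ROW_SIZE_LIMIT = 65535  # MySQL's maximum row size in bytes


def max_varchar_chars(max_bytes_per_char, nullable=False):
    """Largest declarable VARCHAR length for a single-column table (illustrative)."""
    # Reserve 2 bytes for the length prefix and 1 byte for the NULL flag if needed.
    available = ROW_SIZE_LIMIT - 2 - (1 if nullable else 0)
    return available // max_bytes_per_char


print(max_varchar_chars(1))  # single-byte character set such as latin1
print(max_varchar_chars(3))  # 3-byte utf8 (utf8mb3)
print(max_varchar_chars(4))  # utf8mb4
```

Every additional column in the table eats into the same budget, so the real ceiling is usually lower.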

MySQL Storage and Optimization

I'm looking at a db schema for a project I'm inheriting. There are many instances of binary answers being stored as INT(11) rather than TinyInt(1), which is the way I've normally handled this type of storage.
I've checked the data and everything is either "1" or "0". Is there any reason to or not to change the datatype to TinyInt(1) Unsigned for all of these instances?
Similarly, for something like "last_name", if the current column allows varchar(255), would switching to varchar(100) create any gains? I'm more interested in performance/efficiency than in just limiting data storage at this point.
Thanks,
D.
I would say definitely go ahead with the changes to the boolean columns. (Note: Actually if you're using MySQL 5+, I would use the bit datatype instead of tinyint).
As far as the varchar columns, it doesn't actually make a difference changing 255 to 100 length.
From the MySQL docs:
"A column uses one length byte if values require no more than 255 bytes, two length bytes if values may require more than 255 bytes."
So as long as it's under 255, you're really not gaining much in terms of storage.
That being said, by limiting the size of the names, less data needs to be transferred between your SQL server and your application.
Switching to TINYINT would save you 3 bytes I believe, which doesn't seem like a lot to me, although it's certainly a little more efficient.
I always try and make VARCHAR columns as small as I can get away with. I would personally focus on any gains you can get from that.
The main reason I can think of to avoid any of these changes is if you have so much data that running an ALTER TABLE would cause significant downtime.
Whether any of this will help your app perform better is open to debate. In theory, with VARCHARs, MySQL will only send the actual data over the wire, so if all your last names are 40 bytes long, it's only sending 40 bytes. If the column isn't being used in lookups, it shouldn't really have any impact on your performance. There are a couple of relevant questions on SO covering this issue already.

MySQL: Fields length. Does it really matter?

I'm working with some database abstraction layers, and most of them use attributes like "String", which is VARCHAR 250, or INTEGER, which has a length of 11 digits. But suppose I have something that will be less than 250 characters long. Should I make the column smaller? Does it really make any meaningful difference?
Thanks in advance!
INT length does nothing. All INTs are 4 bytes. The number you can set is only used for zerofill (and who uses that!?).
VARCHAR length does more. It's the max length of the field. VARCHAR is saved so that only the actual data is stored, so the declared length doesn't matter for storage. These days, you can have VARCHARs bigger than 255 bytes (up to 256^2-1 = 65,535). The difference is the number of bytes used to store the field length. VARCHAR(100), VARCHAR(8) and VARCHAR(255) use 1 byte to save the field length. VARCHAR(1000) uses 2.
Hope that helps =)
edit
I almost always make my VARCHARs 250 long. Actual length should be checked in the app anyway. For bigger fields I use TEXT (and those are stored differently, so can be much much longer).
edit
I don't know how current this is, but it used to help me (understand): http://help.scibit.com/Mascon/masconMySQL_Field_Types.html
First, remember that the database is meant to store facts and is designed to protect itself against bad data. Thus, the reason you do not want to allow a user to enter 250 characters for a first name is that a user will put all kinds of data in there that is not a first name. They'll put their whole name, their underwear size, a novel about what they did last summer and so on. Thus, you want to strive to enforce that the data is as correct as possible. It is a mistake to assume that the application is the sole protector against bad data. You want users to tell you that they had a problem stuffing War and Peace into a given column.
Thus, the most important question is, "What is the most appropriate value for the data being stored?" Ideally, you would use an int and a check constraint to ensure that the values have an appropriate range (e.g. greater than zero, less than a billion etc.). Unfortunately, this is one of MySQL's greatest weaknesses: it does not honor check constraints (versions before 8.0.16 parse them but silently ignore them). That simply means you must implement those integrity checks in triggers, which admittedly is more cumbersome.
Will choosing an int (4 bytes) over a tinyint (1 byte) make an appreciable difference? Obviously, it depends on the amount of data. If you will have no more than 10 rows, the answer is obviously no. If you will have 10 billion rows, the answer is obviously "Yes". However, IMO, this is premature optimization. It is far better to focus on ensuring correctness first.
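The scale argument is simple arithmetic. A quick Python sketch, counting only the raw column bytes and ignoring indexes, row overhead and page layout (the function name is invented for the example):

```python
def bytes_saved(rows, old_width=4, new_width=1):
    """Raw bytes saved by narrowing one column, e.g. INT (4) to TINYINT (1)."""
    return rows * (old_width - new_width)


print(bytes_saved(10))               # a handful of bytes: irrelevant
print(bytes_saved(10_000_000_000))   # tens of gigabytes: worth thinking about
```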
For text, you should ask whether your data should support Chinese, Japanese or non-ANSI values (i.e., should you use nvarchar or varchar)? Does this value represent a real world code like a currency code, or bank code which has a specific specification?
Not so sure in MySQL, but in MS SQL it only makes a difference for sufficiently large databases. Typically, I like to use smaller fields for a) the space saving (it never hurts to practice good habits) and b) for the implied validation (if you know a certain field should never be more than 10 characters, why allow eleven, let alone 250?).
I think Rudie is wrong, not all integer types are 4 bytes... in MySQL you have:
tinyint = 1 byte,
smallint = 2 bytes,
mediumint = 3 bytes,
int = 4 bytes,
bigint = 8 bytes.
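I think Rudie refers to the "display width", which is the number you put between parentheses when you are creating a column, e.g.:
age INT(3)
This only tells the RDBMS the width to use when displaying the value (it matters with ZEROFILL); it does not change the storage size or the range of values the column accepts.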
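The value ranges follow directly from the byte widths listed above. A short Python sketch, with the table and function names chosen only for this example:

```python
# Storage width in bytes for MySQL's integer types, as listed above.
INT_TYPE_BYTES = {
    "TINYINT": 1,
    "SMALLINT": 2,
    "MEDIUMINT": 3,
    "INT": 4,
    "BIGINT": 8,
}


def int_range(type_name, unsigned=False):
    """Return (min, max) for an integer type of the given width."""
    bits = INT_TYPE_BYTES[type_name] * 8
    if unsigned:
        return 0, 2 ** bits - 1
    # Signed types use two's complement, so the range is asymmetric.
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


for name in INT_TYPE_BYTES:
    print(name, int_range(name), int_range(name, unsigned=True))
```

The display width in something like INT(3) plays no part in this calculation.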
And VARCHARs are variable-length character strings, so if you declare, let's say, name varchar(5000) and you store a name like "Mario", you are only using 7 bytes (5 for the data and 2 for the length of the value).
The correct field size serves to limit the bad data that can be put in. For instance suppose you have a phone number field. If you allow 250 characters, you will often end up with things like the following in the phone field (an example not taken at random):
Call the good-looking blonde secretary instead.
So first limiting the length is part of how we enforce data integrity rules. As such it is critical.
Second, there is only so much space on a data page, and while some databases will allow you to create tables where the potential record is longer than the width of the data page, they often will not allow you to actually exceed it when storing the data. This can lead to some very hard-to-find bugs when suddenly one record can't be saved. I don't know whether MySQL does this, but I know SQL Server does, and it is very hard to figure out what is wrong. So making data the correct size can be critical to preventing bugs.

Field size for user registration form in MySQL?

When I add user info to MySQL through a PHP registration form, there are limits on the data fields (e.g. name is 20 max chars, email 18 chars, additional info 200, pass 12 chars, etc.)
Should I create fields of exactly the same size in the MySQL table, or should I define longer fields?
Are there any benefits to doing so rather than just creating all string fields e.g. 500 characters long?
When storing age as an integer, should I use a small int (i.e. with max 256) or not?
In general, it doesn't really matter. The important part is how you validate the information on the server side.
Make sure the entered data does not exceed the size of the column. If you don't, you can run into issues where MySQL will auto-truncate the data.
Don't limit the password size. If someone wants to enter a 200 character password, let them. You should be storing it as a strong hash and not in plain text, so the exact length shouldn't make a difference.
Always store your data types properly. If you expect an integer age, store it in an integer column. There's no real reason to store it in a string column type.
As far as the rest of your limits, it's really application dependent more than anything. If you expect 200 character info limit, then store it in a VARCHAR(200). But if you're just assuming, store it in a TEXT type so that the user can enter as much as they'd like. But that's more application and use-case dependent than anything else...
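On the password point above: a hash has a fixed size no matter how long the input is, which is why the column width does not depend on the password length. A minimal Python sketch using the standard library's PBKDF2; the function name, the salt size and the iteration count are placeholder choices for illustration, and a real application would more likely use a dedicated password-hashing library.

```python
import hashlib
import os


def hash_password(password, salt=None, iterations=100_000):
    """Return (salt, hex_digest); the digest is always 64 hex characters."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, digest.hex()


_, short_hash = hash_password("abc")
_, long_hash = hash_password("x" * 200)
print(len(short_hash), len(long_hash))  # both the same length
```

So a fixed-width column sized for the digest (plus somewhere to keep the salt) is enough, whatever the user types.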
Suggest you be liberal with your database column lengths (for your varchar), but strict on your application in enforcing size/lengths.
The business logic may change over time. Your application tier will be the keeper and enforcer of those rules.
Your database shouldn't have to adjust often to the changing business rules regarding length. Defining a column of type varchar(100) doesn't cost you anything today. The length is variable up to 100, so your performance and storage won't suffer at all.
Application and database changes/maintenance are expensive; database storage is cheap.
Some other detailed suggestions, if you will:
don't store age. Derive it from a date (birthdate) by using math (Today-Birthdate).
passwords shouldn't be stored as such (store a hash instead) or have a max length!
all your string fields -- define them as varchar(256) or 1024 and be done with them. Let your application enforce the business rules of the day.
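For the first suggestion in the list above, deriving age from a birthdate is a few lines. A Python sketch, with `age_on` as a name invented for the example:

```python
from datetime import date


def age_on(birthdate, today):
    """Whole years elapsed between birthdate and today."""
    years = today.year - birthdate.year
    # Subtract one if this year's birthday has not happened yet.
    if (today.month, today.day) < (birthdate.month, birthdate.day):
        years -= 1
    return years


print(age_on(date(1990, 6, 15), date(2020, 6, 14)))  # day before the birthday
print(age_on(date(1990, 6, 15), date(2020, 6, 15)))  # on the birthday
```

Storing the birthdate keeps the fact that never changes and lets the value that does change be computed on demand.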
If you're using the MyISAM table type it will be a bit more efficient for queries if you keep the record length static. So if you can use char fields of a fixed size instead of varchar it's better (as long as all the fields in a record are static). However, the whole number of characters you specify will be blocked out in memory so you need to decide if memory usage is more important.
Unless your application is huge and your DB is going to be massive, this should matter very little in terms of performance. I would say that you should give yourself a bit of extra room in the MySQL fields as it is a lot easier to change the registration form max lengths than the MySQL max lengths later. Using smallint for age is fine. Or not. Generally you shouldn't allow for fields to take up more space than they are going to need, but I would give myself some padding just in case. Again, though, it shouldn't make much of a difference.

When setting MySQL schema, why use certain types?

When I'm setting up a MySQL table, it asks me to define the name of the column, type of input, and length. My assumption, without having read anything about it, is that it's for minimization. Specify the smallest possible int/smallint/tinyint for your needs, and it will reduce overhead of some sort. If it's all positives, make it unsigned to double your space, etc.
What happens if I just make every field a varchar-200 characters? When/why is this bad, what will I miss out on, and when will any inefficiencies manifest themselves? 100k records?
I think about this every time I set up a DB, but I haven't built anything at enough scale to find out whether my schema was set up inappropriately, either too "strict/small" or too "loose/big". Can someone confirm that I'm making good assumptions about speed and efficiency?
Thanks!
Data types not only optimize storage, but how data is indexed. As your databases get bigger, it will become apparent that it's quicker to search for all the records that have a 1 in an integer field than those that have a "1" in a varchar field. This becomes especially important when you're joining data from more than one table and your database engine is having to do this sort of thing repeatedly. (Daren also rightly points out below that it's important that the types of the fields you're matching on are identical as well.)
The level at which these inefficiencies become an issue depends greatly on your hardware and your application design. We have big enough iron these days that if you're building moderate-scale apps, you may not see an appreciable difference. (Aside from feeling a little bit guilty about your database design!) But establishing good habits on small projects makes the bigger ones easier when they come along.
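If you have two varchar columns holding the values 10 and 20 and add them, you may get 1020 instead of the 30 you'd likely expect, wherever + means string concatenation (for example in SQL Server or in application code). MySQL itself converts numeric strings to numbers for +, but string columns still sort and compare as text, so '10' sorts before '9'.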
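The same type confusion is easy to reproduce in application code. This Python sketch is only an analogy (it is not how MySQL evaluates expressions), showing what happens when numbers travel around as strings:

```python
as_text = ["9", "10", "100"]
as_numbers = [int(v) for v in as_text]

# "Adding" strings concatenates them; adding integers sums them.
print("10" + "20")
print(10 + 20)

# Text sorts character by character, so "10" and "100" come before "9".
print(sorted(as_text))
print(sorted(as_numbers))
```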
Sure, you could save everything as VARCHAR strings. But you'd be giving up a lot of functionality provided by the database engine.
You should choose the database type that most closely matches the intended use of the column. For example, using DATE or DATETIME to store dates provides you with all sorts of date/time functions that you don't get with basic VARCHAR types.
Likewise, fields used to count things or provide simple unique IDs should be INT or one of its related types. Also bear in mind that an INT occupies only 4 bytes, whereas a 9-digit string uses at least 9 bytes.
For character data, it's wise to use NVARCHAR for internationalized values that users in any locale are going to enter (esp. names and locations). If you know the text is limited to US or internal use only, VARCHAR is safe.