What should be the typical length of user's Full Name in database [duplicate] - mysql

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
List of standard lengths for database fields
Simple as that, what should be the typical length of allowed "Full Name" of a user in database?
When I create users table, I usually set it as varchar 31 or 32 (according to performance). What do you guys use and what's standard/typical convention.
Sidenote: I never face problem in email length (as I set it 254) and password (hash, 32 length).

The maximum your average varchar field allows (254?).
You are not winning anything by making it arbitrarily shorter. The fine-grained size controls on numbers and chars are more or less a relic from the past, when every byte mattered. It can matter today - if you are dealing with tens or hundreds of millions of rows, or thousands of queries per sec. For your average database (i.e. 99% of them) performance comes from proper indexing and querying, NOT making your rows a couple of bytes smaller.
Only restrict the length of a field when there is some formal specification that defines a maximum length, like 13 digits for an EAN code or 12 characters for an ISIN.

Full name is always a computed column composed of first, middle, last, prefix, suffix, degree, family name, etc in my designs. The list of individual columns are determined by the targeted local of the app. The display length of 'Full Name' is normall contained within the app design not the database. There is not any space savings in SQL Server between varchar(32) and varchar(256). Varchar(256) is my choice.
I never want to be in the meeting when someone says "Your db design will not hold all our data".
You are always assigning an ID to the user so you can join and do look-ups using the ID instead of the FullName, correct?

I would recommend at least 128.

Well you can just put it at 255 if you want.
varchars is a Variable length storage type. This means theres 1 byte which stores the actual length of the string, varchars dont use up more bites then needed so storage wise it really does not matter. This is described on the mysql page
Description can be found here http://dev.mysql.com/doc/refman/5.0/en/char.html
It is illustrated halfway the page check the table.
VARCHAR values are not padded when
they are stored. Handling of trailing
spaces is version-dependent. As of
MySQL 5.0.3, trailing spaces are
retained when values are stored and
retrieved, in conformance with
standard SQL. Before MySQL 5.0.3,
trailing spaces are removed from
values when they are stored into a
VARCHAR column; this means that the
spaces also are absent from retrieved
values.
Conclusion:
Storage wise you could always go for 255 because it wont use up additional space and you wont get intro trouble with string getting cut off.
Greetz

Related

Permance using varchar(750) or text [duplicate]

I've got a messages table in MySQL which records messages between users. Apart from the typical ids and message types (all integer types) I need to save the actual message text as either VARCHAR or TEXT. I'm setting a front-end limit of 3000 characters which means the messages would never be inserted into the db as longer than this.
Is there a rationale for going with either VARCHAR(3000) or TEXT? There's something about just writing VARCHAR(3000) that feels somewhat counter-intuitive. I've been through other similar posts on Stack Overflow but would be good to get views specific to this type of common message storing.
TEXT and BLOB may by stored off the table with the table just having a pointer to the location of the actual storage. Where it is stored depends on lots of things like data size, columns size, row_format, and MySQL version.
VARCHAR is stored inline with the table. VARCHAR is faster when the size is reasonable, the tradeoff of which would be faster depends upon your data and your hardware, you'd want to benchmark a real-world scenario with your data.
Can you predict how long the user input would be?
VARCHAR(X)
Max Length: variable, up to 65,535 bytes (64KB)
Case: user name, email, country, subject, password
TEXT
Max Length: 65,535 bytes (64KB)
Case: messages, emails, comments, formatted text, html, code, images, links
MEDIUMTEXT
Max Length: 16,777,215 bytes (16MB)
Case: large json bodies, short to medium length books, csv strings
LONGTEXT
Max Length: 4,294,967,29 bytes (4GB)
Case: textbooks, programs, years of logs files, harry potter and the goblet of fire, scientific research logging
There's more information on this question.
Just to clarify the best practice:
Text format messages should almost always be stored as TEXT (they end up being arbitrarily long)
String attributes should be stored as VARCHAR (the destination user name, the subject, etc...).
I understand that you've got a front end limit, which is great until it isn't. *grin* The trick is to think of the DB as separate from the applications that connect to it. Just because one application puts a limit on the data, doesn't mean that the data is intrinsically limited.
What is it about the messages themselves that forces them to never be more then 3000 characters? If it's just an arbitrary application constraint (say, for a text box or something), use a TEXT field at the data layer.
Short answer: No practical, performance, or storage, difference.
Long answer:
There is essentially no difference (in MySQL) between VARCHAR(3000) (or any other large limit) and TEXT. The former will truncate at 3000 characters; the latter will truncate at 65535 bytes. (I make a distinction between bytes and characters because a character can take multiple bytes.)
For smaller limits in VARCHAR, there are some advantages over TEXT.
"smaller" means 191, 255, 512, 767, or 3072, etc, depending on version, context, and CHARACTER SET.
INDEXes are limited in how big a column can be indexed. (767 or 3072 bytes; this is version and settings dependent)
Intermediate tables created by complex SELECTs are handled in two different ways -- MEMORY (faster) or MyISAM (slower). When 'large' columns are involved, the slower technique is automatically picked. (Significant changes coming in version 8.0; so this bullet item is subject to change.)
Related to the previous item, all TEXT datatypes (as opposed to VARCHAR) jump straight to MyISAM. That is, TINYTEXT is automatically worse for generated temp tables than the equivalent VARCHAR. (But this takes the discussion in a third direction!)
VARBINARY is like VARCHAR; BLOB is like TEXT.
A table with several 'large' VARCHARs could hit a limit of 64KB for the whole table definition; switching to TEXT is a simple and practical fix. (Example: (42000) Row size too large, from an Oracle dump to a MySQL dump )
Rebuttal to other answers
The original question asked one thing (which datatype to use); the accepted answer answered something else (off-record storage). That answer is now out of date.
When this thread was started and answered, there were only two "row formats" in InnoDB. Soon afterwards, two more formats (DYNAMIC and COMPRESSED) were introduced.
The storage location for TEXT and VARCHAR() is based on size, not on name of datatype. For an updated discussion of on/off-record storage of large text/blob columns, see this .
Disclaimer: I'm not a MySQL expert ... but this is my understanding of the issues.
I think TEXT is stored outside the mysql row, while I think VARCHAR is stored as part of the row. There is a maximum row length for mysql rows .. so you can limit how much other data you can store in a row by using the VARCHAR.
Also due to VARCHAR forming part of the row, I suspect that queries looking at that field will be slightly faster than those using a TEXT chunk.
The preceding answers don't insist enough on the main problem: even in very simple queries like
(SELECT t2.* FROM t1, t2 WHERE t2.id = t1.id ORDER BY t1.id)
a temporary table can be required, and if a VARCHAR field is involved, it is converted to a CHAR field in the temporary table. So if you have in your table say 500 000 lines with a VARCHAR(65000) field, this column alone will use 6.5*5*10^9 byte. Such temp tables can't be handled in memory and are written to disk. The impact can be expected to be catastrophic.
Source (with metrics): https://nicj.net/mysql-text-vs-varchar-performance/
(This refers to the handling of TEXT vs VARCHAR in "standard"(?) MyISAM storage engine. It may be different in others, e.g., InnoDB.)
Varchar is for small data like email addresses, while Text is for much bigger data like news articles, Blob for binary data such as images.
The performance of Varchar is more powerful because it runs completely from memory, but this will not be the case if data is too big like varchar(4000) for example.
Text, on the other hand, does not stick to memory and is affected by disk performance, but you can avoid that by separating text data in a separate table and apply a left join query to retrieve text data.
Blob is much slower so use it only if you don't have much data like 10000 images which will cost 10000 records.
Follow these tips for maximum speed and performance:
Use varchar for name, titles, emails
Use Text for large data
Separate text in different tables
Use Left Join queries on an ID such as a phone number
If you are going to use Blob apply the same tips as in Text
This will make queries cost milliseconds on tables with data >10 M and size up to 10GB guaranteed.
There is a HUGE difference between VARCHAR and TEXT. While VARCHAR fields can be indexed, TEXT fields cannot. VARCHAR type fields are stored inline while TEXT are stored offline, only pointers to TEXT data is actually stored in the records.
If you have to index your field for faster search, update or delete than go for VARCHAR, no matter how big. A VARCHAR(10000000) will never be the same as a TEXT field bacause these two data types are different in nature.
If you use you field only for archiving
you don't care about data
speed retrival
you care about speed but you will use the operator
'%LIKE%' in your search query so indexing will not help much
you
can't predict a limit of the data length
than go for TEXT.
Just a correction to so many answers here, even if it is a bit late to the party.
Text fields can be fully indexed by MySQL as per their documentation.
Link provided
https://dev.mysql.com/doc/refman/5.6/en/column-indexes.html
Overall Varchar fields are longer to write to then Text fields, but it only matters if you have multitude of write requests

MySQL: Fields length. Does it really matter?

I'm working with some database abstraction layers and most of them are using attributes like "String" which is VARCHAR 250 or INTEGER which has length of 11 digits. But for example I have something that will be less than 250 characters long. Should I go and make it less? Does it really makes any valuable difference?
Thanks in advance!
INT length does nothing. All INTs are 4 bytes. The number you can set, is only used for zerofill (and who uses that!?).
VARCHAR length does more. It's the maxlength of the field. VARCHAR is saved so that only the actual data is stored, so the length doesn't mattter. These days, you can have bigger VARCHARs than 255 bytes (being 256^2-1). The difference is the bytes that are used for the field length. VARCHAR(100) and VARCHAR(8) and VARCHAR(255) use 1 byte to save the field length. VARCHAR(1000) uses 2.
Hope that helps =)
edit
I almost always make my VARCHARs 250 long. Actual length should be checked in the app anyway. For bigger fields I use TEXT (and those are stored differently, so can be much much longer).
edit
I don't know how current this is, but it used to help me (understand): http://help.scibit.com/Mascon/masconMySQL_Field_Types.html
First, remember that the database is meant to store facts and is designed to protect itself against bad data. Thus, the reason you do not want to allow a user to enter 250 characters for a first name is that a user will put all kinds of data in there that is not a first name. They'll put their whole name, their underwear size, a novel about what they did last summer and so on. Thus, you want to strive to enforce that the data is as correct as possible. It is a mistake to assume that the application is the sole protector against bad data. You want users to tell you that they had a problem stuffing War in Peace into a given column.
Thus, the most important question is, "What is the most appropriate value for the data being stored?" Ideally, you would use an int and a check constraint to ensure that the values have an appropriate range (e.g. greater than zero, less than a billion etc.). Unfortunately, this is one of MySQL's greatest weakness: it does not honor check constraints. That simply means you must implement those integrity checks in triggers which admittedly is more cumbersome.
Will the difference between an int (4 bytes) make an appreciable difference to a tinyint (1 byte)? Obviously, it depends on the amount of data. If you will have no more than 10 rows, the answer is obviously no. If you will have 10 billion rows, the answer is obviously "Yes". However, IMO, this is premature optimization. It is far better to focus on ensuring correctness first.
For text, you should ask whether your data should support Chinese, Japanese or non-ANSI values (i.e., should you use nvarchar or varchar)? Does this value represent a real world code like a currency code, or bank code which has a specific specification?
Not so sure in MySQL, but in MS SQL it only makes a difference for sufficiently large databases. Typically, I like to use smaller fields for a) the space saving (it never hurts to practice good habits) and b) for the implied validation (if you know a certain field should never be more than 10 characters, why allow eleven, let alone 250?).
I thinks Rudie is wrong, not all INTs are 4 bytes... in MySQL you have:
tinyint = 1 byte,
smallint = 2 bytes,
mediumint = 3 bytes,
int = 4 bytes,
bigint = 8 bytes.
I think Rudie refers to the "display with" that is the number you put between parenthesis when you are creating a column, e.g.:
age INT(3)
You're telling to the RDBMS just to SHOW no more than 3 numbers.
And VARCHARs are (variable length charcter string) so if you declare let's say name varchar(5000) and you store a name like "Mario" you only are using 7 bytes (5 for the data and 2 for the length of the value).
The correct field size serves to limit the bad data that can be put in. For instance suppose you have a phone number field. If you allow 250 characters, you will often end up with things like the following in the phone field (an example not taken at random):
Call the good-looking blonde secretary instead.
So first limiting the length is part of how we enforce data integrity rules. As such it is critical.
Second, there is only so much space on a datapage and while some databases will allow you to create tables where the potential record is longer than the width of the data page, they often will not allow you to actually exceed it when storing the data. This can lead to some very hard to find bugs when suddenly one record can't be saved. I don't know about MySql and whether it does this but I know SQL Server does and it is very hard to figure out what is wrong. So making data the correct size can be critical to preventing bugs.

Field size for user registration form in MySQL?

When I add user info to MySQL through a PHP registration form, there are with limits on the data fields (e.g. name is 20 max chars, email 18 chars, additional info 200, pass 12 chars, etc.)
Should I create exact same fields in the MySQL table, or I should define longer fields?
Is there any benefits of doing so rather than just creating all string fields e.g. 500 characters long?
When storing age as an integer, should I use a small int (i.e. with max 256) or not?
In general, it doesn't really matter. The important part is how you validate the information on the server side.
Make sure the entered data does not exceed the size of the column. If you don't, you can run into issues where mysql will auto-truncate the data.
Don't limit the password size. If someone wants to enter a 200 character password, let them. You should be storing it in a storing hash and not in plain text, so the exact length shouldn't make a difference.
Always store your data types properly. If you expect an integer age, store it in an integer column. There's no real reason to store it in a string column type.
As far as the rest of your limits, it's really application dependent more than anything. If you expect 200 character info limit, then store it in a VARCHAR(200). But if you're just assuming, store it in a TEXT type so that the user can enter as much as they'd like. But that's more application and use-case dependent than anything else...
Suggest you be liberal with your database column lengths (for your varchar), but strict on your application in enforcing size/lengths.
The business logic may change over time. Your application tier will be the keeper and enforcer of those rules.
Your database shouldn't have to adjust often to the changing business rules regarding length. Defining a column of type varchar(100) doesn't cost you anything today. The length is variable up to 100, so your performance and storage won't suffer at all.
Application and database changes/maintenance are expensive; database storage is cheap.
Some other detailed suggestions, if you will:
don't store age. Derive it from a date (birthdate) by using math (Today-Birthdate).
passwords shouldn't be stored or have a max length!
all your string fields -- define them as varchar(256) or 1024 and be done with them. Let your application enforce the business rules of the day.
If you're using the MyISAM table type it will be a bit more efficient for queries if you keep the record length static. So if you can use char fields of a fixed size instead of varchar it's better (as long as all the fields in a record are static). However, the whole number of characters you specify will be blocked out in memory so you need to decide if memory usage is more important.
Unless your application is huge and your DB is going to be massive, this should matter very little in terms of performance. I would say that you should give yourself a bit of extra room in the MySQL fields as it is a lot easier to change the registration form max lengths than the MySQL max lengths later. Using smallint for age is fine. Or not. Generally you shouldn't allow for fields to take up more space than they are going to need, but I would give myself some padding just in case. Again, though, it shouldn't make much of a difference.

MySQL: Why use VARCHAR(20) instead of VARCHAR(255)? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Are there disadvantages to using a generic varchar(255) for all text-based fields?
In MYSQL you can choose a length for the VARCHAR field type. Possible values are 1-255.
But what are its advantages if you use VARCHAR(255) that is the maximum instead of VARCHAR(20)? As far as I know, the size of the entries depends only on the real length of the inserted string.
size (bytes) = length+1
So if you have the word "Example" in a VARCHAR(255) field, it would have 8 bytes. If you have it in a VARCHAR(20) field, it would have 8 bytes, too. What is the difference?
I hope you can help me. Thanks in advance!
Check out: Reference for Varchar
In short there isn't much difference unless you go over the size of 255 in your VARCHAR which will require another byte for the length prefix.
The length indicates more of a constraint on the data stored in the column than anything else. This inherently constrains the MAXIMUM storage size for the column as well. IMHO, the length should make sense with respect to the data. If your storing a Social Security # it makes no sense to set the length to 128 even though it doesn't cost you anything in storage if all you actually store is an SSN.
There are many valid reasons for choosing a value smaller than the maximum that are not related to performance. Setting a size helps indicate the type of data you are storing and also can also act as a last-gasp form of validation.
For instance, if you are storing a UK postcode then you only need 8 characters. Setting this limit helps make clear the type of data you are storing. If you chose 255 characters it would just confuse matters.
I don't know about mySQL but in SQL Server it will let you define fields such that the total number of bytes used is greater than the total number of bytes that can actually be stored in a record. This is a bad thing. Sooner or later you will get a row where the limit is reached and you cannot insert the data.
It is far better to design your database structure to consider row size limits.
Additionally yes, you do not want people to put 200 characters in a field where the maximum value should be 10. If they do, it is almost always bad data.
You say, well I can limit that at the application level. But data does not get into the database just from one application. Sometimes multiple applications use it, sometimes data is imported and sometimes it is fixed manually from the query window (update all the records to add 10% to the price for instance). If any of these other sources of data don't know about the rules you put in your application, you will have bad, useless data in your database. Data integrity must be enforced at the database level (which doesn't stop you from also checking before you try to enter data) or you have no integrity. Plus it has been my experience that people who are too lazy to design their database are often also too lazy to actually put the limits into the application and there is no data integrity check at all.
They have a word for databases with no data integrity - useless.
There is a semantical difference (and I believe that's the only difference): if you try to fill 30 non-space characters into varchar(20), it will produce an error, whereas it will succeed for varchar(255). So it is primarily an additional constraint.
Well, if you want to allow for a larger entry, or limit the entry size perhaps.
For example, you may have first_name as a VARCHAR 20, but perhaps street_address as a VARCHAR 50 since 20 may not be enough space. At the same time, you may want to control how large that value can get.
In other words, you have set a ceiling of how large a particular value can be, in theory to prevent the table (and potentially the index/index entries) from getting too large.
You could just use CHAR which is a fixed width as well, but unlike VARCHAR which can be smaller, CHAR pads the values (although this makes for quicker SQL access.
From a database perspective performance wise I do not believe there is going to be a difference.
However, I think a lot of the decision on the length to use comes down to what you are trying to accomplish and documenting the system to accept just the data that it needs.

What are the optimum varchar sizes for MySQL?

How does MySQL store a varchar field? Can I assume that the following pattern represents sensible storage sizes :
1,2,4,8,16,32,64,128,255 (max)
A clarification via example. Lets say I have a varchar field of 20 characters. Does MySQL when creating this field, basically reserve space for 32 bytes(not sure if they are bytes or not) but only allow 20 to be entered?
I guess I am worried about optimising disk space for a massive table.
To answer the question, on disk MySql uses 1 + the size that is used in the field to store the data (so if the column was declared varchar(45), and the field was "FooBar" it would use 7 bytes on disk, unless of course you where using a multibyte character set, where it would be using 14 bytes). So, however you declare your columns, it wont make a difference on the storage end (you stated you are worried about disk optimization for a massive table). However, it does make a difference in queries, as VARCHAR's are converted to CHAR's when MySql makes a temporary table (SORT, ORDER, etc) and the more records you can fit into a single page, the less memory and faster your table scans will be.
MySQL stores a varchar field as a variable length record, with either a one-byte or a two-byte prefix to indicate the record size.
Having a pattern of storage sizes doesn't really make any difference to how MySQL will function when dealing with variable length record storage. The length specified in a varchar(x) declaration will simply determine the maximum length of the data that can be stored. Basically, a varchar(16) is no different disk-wise than a varchar(128).
This manual page has a more detailed explanation.
Edit: With regards to your updated question, the answer is still the same. A varchar field will only use up as much space on disk as the data you store in it (plus a one or two byte overhead). So it doesn't matter if you have a varchar(16) or a varchar(128), if you store a 10-character string in it, you're only going to use 10 bytes (plus 1 or 2) of disk space.