Suggested character set for non-utf8 columns in MySQL

Currently I'm using VARCHAR/TEXT with utf8_general_ci for all character columns in MySQL. Now I want to improve database layout/performance.
What I have figured out so far:
use CHAR instead of VARCHAR for fixed-length columns such as GUIDs or session ids
also use CHAR for small columns with a length of 1 or maybe 2
I do not want to go so far as to store my GUIDs as BINARY(16), because of handling issues, so I'd rather store them as CHAR(32), especially to improve keys. (I would even save 2/3 of the space by switching from utf8 to some 1-byte charset.)
So what is the best character set for such columns? ASCII? latin1? BINARY? And which collation?
What charset/collation should I use for other columns where I do not need utf8 support but do need proper sorting? Would BINARY fail there?
Is it good practice to mix different character sets in the same MySQL (InnoDB) table? Or do I get better performance when all columns have the same charset within the same table, or even database?

GUID/UUID/MD5/SHA1 values are all hex digits and dashes. For them, use
CHAR(..) CHARACTER SET ascii COLLATE ascii_general_ci
That will allow A=a when comparing hex strings.
For Base64 strings, use either of
CHAR(..) CHARACTER SET ascii COLLATE ascii_bin
BINARY(..)
since A is not semantically the same as a.
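A minimal sketch of both patterns; the table, column names, and lengths are invented for illustration:
CREATE TABLE tokens (
    -- hex UUID with dashes stripped: case-insensitive, so 'ABCD...' matches 'abcd...'
    uuid       CHAR(32) CHARACTER SET ascii COLLATE ascii_general_ci NOT NULL,
    -- Base64 session id: case-sensitive, since 'QQ==' and 'qq==' decode differently
    session_id CHAR(28) CHARACTER SET ascii COLLATE ascii_bin NOT NULL,
    PRIMARY KEY (uuid)
);

-- The two collations differ exactly on case:
SELECT _ascii 'ABCD' = _ascii 'abcd' COLLATE ascii_general_ci;  -- 1 (equal)
SELECT _ascii 'ABCD' = _ascii 'abcd' COLLATE ascii_bin;         -- 0 (different)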
Further notes...
utf8 spits at you if you give it an invalid 8-bit value.
ascii spits at you for any 8-bit value.
latin1 accepts anything -- thereby leading to problems down the road
It is quite OK to have different columns in a table having different charsets and/or collations.
The charset/collation on the table is just a default, ripe for overriding at the column definition.
BINARY may be a tiny bit faster than any _bin collation, but not enough to notice.
Use CHAR for columns that are truly fixed length; don't mislead the user by using it for other cases.
%_bin is faster than %_general_ci, which is faster than other collations. Again, you would be hard-pressed to measure a difference.
Never use TINYTEXT or TINYBLOB.
For proper encoding, use the appropriate charset.
For "proper sorting", use the appropriate collation. See example below.
For "proper sorting" where multiple languages are represented, and you are using utf8mb4, use utf8mb4_unicode_520_ci (or utf8mb4_900_ci if using version 8.0). The 520 and 900 refer to Unicode standards; new collations are likely to come in the future.
If you are entirely in Czech, then consider these charsets and collations. I list them in preferred order:
mysql> show collation like '%czech%';
+------------------+---------+-----+---------+----------+---------+
| Collation        | Charset | Id  | Default | Compiled | Sortlen |
+------------------+---------+-----+---------+----------+---------+
| utf8mb4_czech_ci | utf8mb4 | 234 |         | Yes      |       8 |  -- opens up the world
| utf8_czech_ci    | utf8    | 202 |         | Yes      |       8 |  -- opens up most of the world
| latin2_czech_cs  | latin2  |   2 |         | Yes      |       4 |  -- kinda like latin1
The rest are "useless":
| cp1250_czech_cs  | cp1250  |  34 |         | Yes      |       2 |
| ucs2_czech_ci    | ucs2    | 138 |         | Yes      |       8 |
| utf16_czech_ci   | utf16   | 111 |         | Yes      |       8 |
| utf32_czech_ci   | utf32   | 170 |         | Yes      |       8 |
+------------------+---------+-----+---------+----------+---------+
7 rows in set (0.00 sec)
More
The reason for using smaller datatypes (where appropriate) is to shrink the dataset, which leads to less I/O, which leads to things being more cacheable, which makes the program run faster. This is especially important for huge datasets; it is less important for small- or medium-sized datasets.
ENUM is 1 byte, yet acts like a string. So you get the "best of both worlds". (There are drawbacks, and there is a 'religious war' among advocates for ENUM vs TINYINT vs VARCHAR.)
Usually columns that are "short" are always the same length. A country_code is always 2 letters, always ascii, always could benefit from case insensitive collation. So CHAR(2) CHARACTER SET ascii COLLATE ascii_general_ci is optimal. If you have something that is sometimes 1-char, sometimes 2, then flip a coin; whatever you do won't make much difference.
VARCHAR (up to 255) has an extra 1-byte length attached to it. So, if your strings vary in length at all, VARCHAR is at least as good as CHAR. So simplify your brain processing: "variable length --> VARCHAR".
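To make the last two points concrete, a hedged sketch (table and column names invented for the example):
CREATE TABLE place (
    country_code CHAR(2)      CHARACTER SET ascii COLLATE ascii_general_ci NOT NULL,  -- always exactly 2 letters
    city_name    VARCHAR(100) CHARACTER SET utf8mb4 NOT NULL                          -- varies in length, so VARCHAR
);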
BIT, depending on version, may be implemented as a 1-byte TINYINT UNSIGNED. If you have only a few bits in your table, it is not worth worrying about.
One of my Rules of Thumb says that if you aren't likely to get a 10% improvement, move on to some other optimization. Much of what we are discussing here is under 10% (space in this case). Still, get in the habit of thinking about it when writing CREATE TABLE. I often see tables with BIGINT and DOUBLE (each 8 bytes) that could easily use smaller columns. Sometimes saving more than 50% (space).
How does "space" translate into "speed". Tiny tables -> a tiny percentage. Huge tables -> In some cases 10x. (That's 10-fold, not 10%.) (UUIDs are one way to get really bad performance on huge tables.)
ENUM
Acts and feels like a string, yet takes only one byte. (One byte translates, indirectly, into a slight speed improvement.)
Practical when there are fewer than, say, 10 different values.
Impractical if you are frequently adding a new value -- that requires an ALTER TABLE, though it can be done "in place".
Suggest starting the list with 'unknown' (or something like that) and making the column NOT NULL (versus NULL).
The character set for the enum needs to be whatever is otherwise being used for the connection. The choice does not matter much unless you have options that collate equal (eg, A versus a).
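A small sketch of those recommendations; the table and the status values are invented for the example:
CREATE TABLE article (
    id     INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    -- 'unknown' listed first and used as the default, per the advice above
    status ENUM('unknown', 'draft', 'published') NOT NULL DEFAULT 'unknown'
);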

Related

html input text field with data enter of maxlength characters exceeds database column width due to multibyte characters

We allow multibyte input into user profiles on our website. We have run into an issue when the user enters the maximum allowed field width, but includes multibyte characters that cause the field to exceed the maximum width of the database column. Other than truncating the data, is there a general solution for this?
I was surprised after searching on this topic and not finding anything good about it, although there is a lot of discussion about character sets and related topics.
We are using MariaDB. The documentation seems to suggest that a utf8mb4 column should be able to hold the number of characters specified. However, I have proven in our system that this is not correct.
For example, in a varchar(60) field, we can store the 30-character string made up of 30 repetitions of ç. If one more character is added, an error is returned when writing to the database. The html form has maxlength of 60 and the db table has the column defined as follows:
companyname varchar(60) CHARACTER SET utf8mb4 NOT NULL DEFAULT '',
So possibly something is wrong with the way we are setting it up. Or perhaps the documentation and much online discussion is wrong; but I hardly think so.
What would be a general way to handle this situation? It seems defining all such columns to be double or quadruple the form field maximum (depending upon which character set is supported) is a messy way to accommodate the odd multibyte character. And counting the multibyte characters by scanning the bitwise representation of each field also seems to be a messy approach.
If nobody can point out the error of our ways to solve the dilemma, I would probably opt for the combination approach of (1) increasing the size of the character fields in the database by 10% or so while leaving the html maxlength as at present; (2) relying on the existing routine that truncates to the maximum database column width as we already do for defensive reasons.
EDIT:
I tried to put the code in as "code", but it would not format it properly.
Column definition:
companyname varchar(60) CHARACTER SET utf8mb4 NOT NULL DEFAULT '',
A value from table:
select userid, companyname from user where userid = 855;
| userid | companyname                    |
|    855 | çççççççççççççççççççççççççççççç |
Display of value on screen:
Company Name
çççççççççççççççççççççççççççççç
Attempt to add one character to the companyname:
update user
-> set companyname = concat(companyname, 'a')
-> where userid = 855;
ERROR 1406 (22001): Data too long for column 'companyname' at row 1
The above shows that a varchar(60) CHARACTER SET utf8mb4 column with 30 2-byte characters is "full". So what is wrong with my setup?
EDIT 2:
I looked into this a bit further. In fact, I can write a 60-character string of ç characters directly into the database via SQL.
However, when I enter a string of ç characters into the web form, each one is doubled in the database into the two-character sequence Ã§. I can only enter 30 ç characters because the expansion causes them to take 60 characters. So the problem is the encoding difference.
I have in php.ini default_charset = 'UTF-8'. The definition of the column in question in the database is utf8mb4 as shown above in the original question.
So what is missing that is causing the expansion of the 30-character string in PHP UTF-8 into a 60-character string in the database?
EDIT 3:
I found the page at https://www.toptal.com/php/a-utf-8-primer-for-php-and-mysql helpful. Most of the points are already in place. However one point is not--adding several lines covering character set in my.cnf, such as default-character-set=UTF-8.
I will be working with the suggestions from that page.
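(For the record, the per-connection equivalent of those my.cnf lines is a single statement, assuming utf8mb4 is the intended connection charset:)
-- Declare what the client sends and expects; if the connection stays at the
-- default (often latin1), the two UTF-8 bytes of ç are stored as the two
-- characters Ã§ -- the doubling described above.
SET NAMES utf8mb4;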
VARCHAR(60) allows 60 characters in whatever CHARACTER SET it is declared to hold. For utf8mb4, that could take up to 240 bytes.
If you want to hold as much as someone might want to type, consider TEXT, which can hold 64K bytes, which is between 16K and 64K utf8mb4 characters.
The disk space consumed by VARCHAR and TEXT is only the number of bytes needed for the value, plus some overhead.
If you have lots of VARCHAR or TEXT columns, you might be hitting a limit on the row size. But that does not seem to be the case.
Example:
mysql> CREATE TABLE c ( x VARCHAR(60) CHARSET utf8mb4 );
Query OK, 0 rows affected (0.01 sec)
mysql> INSERT INTO c (x) VALUES ( REPEAT('ç', 60) );
Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO c (x) VALUES ( REPEAT('ç', 61) );
Query OK, 1 row affected, 1 warning (0.00 sec)
mysql> SHOW WARNINGS;
+---------+------+----------------------------------------+
| Level   | Code | Message                                |
+---------+------+----------------------------------------+
| Warning | 1265 | Data truncated for column 'x' at row 1 |
+---------+------+----------------------------------------+
1 row in set (0.00 sec)
mysql> SELECT LENGTH(x), CHAR_LENGTH(x), x FROM c;
+-----------+----------------+--------------------------------------------------------------+
| LENGTH(x) | CHAR_LENGTH(x) | x                                                            |
+-----------+----------------+--------------------------------------------------------------+
|       120 |             60 | çççççççççççççççççççççççççççççççççççççççççççççççççççççççççççç |
|       120 |             60 | çççççççççççççççççççççççççççççççççççççççççççççççççççççççççççç |
+-----------+----------------+--------------------------------------------------------------+

MySQL query returns wrong data when column contains utf-8 characters

I have a MySQL table with a column 'title' whose type is varchar(255), character set utf8mb4, and collation utf8mb4_general_ci.
Let's say I have a few records whose titles may or may not contain diacritics:
id | title
-----------
1 | zolc
2 | żółć
3 | żołc
4 | zólć
I can insert those diacritics correctly, and they are also displayed properly when the table is selected. But when I try something like this:
SELECT *
FROM my_table
WHERE title LIKE "%zolc%";
I get:
id | title
-----------
1 | zolc
4 | zólć
As you can see, I asked for the version without any diacritics, but also got the row with id 4. Selecting żółć returns rows with ids 2 (as expected) and 3. Querying for zołć returns rows 2 and 3, where I would expect nothing to be returned. There are many combinations like this where some "wrong" rows are returned by a query (I also tried with ą and ę, and they act just as strangely).
At first I thought it was a problem with the configuration of my technology stack (a Java web application on top of Spring Boot), but I got exactly the same results when executing queries from MySQL Workbench against a local db on a Windows machine, and when executing queries over ssh against a remote db running on an Ubuntu machine. There is also no difference whether the query is written as title LIKE "value" or as WHERE title = "value".
I couldn't find an explanation for this - note that it does not simply return all rows that "match" the query parameter stripped of special characters. I'm trying to enable search by title, and I would like it to be 1:1, so that when I use "ż" in my query parameter, only rows where "ż" is actually present are returned.
Thanks in advance for any help.
Your query uses the table/column collation, and since that collation considers all those characters equivalent, you aren't really asking for the values you think. Your choices are to either use the proper cultural settings (e.g. utf8mb4_polish_ci) or use none (e.g. utf8mb4_bin). Which option to choose depends on your use case, but both are probably better than some arbitrary setting: utf8mb4_general_ci is a kind of one-size-fits-all collation designed for speed rather than correctness.
It's also worth noting that MySQL allows setting the collation at different levels:
Table
Column
Specific string
Once more, which one to choose will depend on your specific needs. Here's a little example of the last case (the other ones are straightforward):
SELECT
CASE WHEN 'zolc' COLLATE utf8mb4_general_ci ='zólć' THEN 'equal' ELSE 'different' END AS General,
CASE WHEN 'zolc' COLLATE utf8mb4_unicode_ci ='zólć' THEN 'equal' ELSE 'different' END AS Unicode,
CASE WHEN 'zolc' COLLATE utf8mb4_polish_ci ='zólć' THEN 'equal' ELSE 'different' END AS Polish,
CASE WHEN 'zolc' COLLATE utf8mb4_bin ='zólć' THEN 'equal' ELSE 'different' END AS BinaryCollation,
CASE WHEN BINARY 'zolc'='zólć' THEN 'equal' ELSE 'different' END AS BinaryOperator;
General | Unicode | Polish    | BinaryCollation | BinaryOperator
------- | ------- | --------- | --------------- | --------------
equal   | equal   | different | different       | different
(I've assumed the text is in Polish, sorry if it's not.)
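If the column-level option fits your case, changing the stored collation once is cleaner than decorating every query. A sketch against the question's table, assuming the name my_table and that Polish rules are wanted:
ALTER TABLE my_table
    MODIFY title VARCHAR(255)
    CHARACTER SET utf8mb4
    COLLATE utf8mb4_polish_ci;
Any index on title is rebuilt, and subsequent comparisons (=, LIKE) use the new collation.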
utf8mb4_general_ci fails to implement all of the Unicode sorting rules, which can result in undesirable sorting in some situations, such as when using particular languages or characters.
Try changing the collation from "utf8mb4_general_ci" to "utf8mb4_bin" (a utf8mb4 column cannot take a utf8_* collation such as utf8_bin).

MySQL - UNHEX(HEX(UTF-8)) issue

I've got a database with UTF-8 characters in it which are improperly displayed. I figured that I could use the condition UNHEX(HEX(column)) != column to find which fields have UTF-8 characters in them. The results are rather interesting:
id        | content | HEX(content) | UNHEX(HEX(content)) LIKE '%c299%' | UNHEX(HEX(content)) LIKE '%FFF%' | UNHEX(HEX(content))
----------|---------|--------------|-----------------------------------|----------------------------------|--------------------
49829102  |         | C299         | 0                                 | 0                                | c299
874625485 | FFF     | 464646       | 0                                 | 1                                | FFF
How is this possible and, possibly, how can I find the row with this character in it?
-- edit(2): since my edit has been removed (probably when JamWaffles was fixing my beautiful data table), here it is again: as the editor strips out UTF-8 characters, the content in the first row is \uc299 (if that's not clear ;) )
-- edit(3): I've figured out what the issue is - the actual representation of UNHEX(HEX(content)) is WRONG - to display my multibyte character I had to do the following: SELECT UNHEX(SUBSTR(HEX(content),1)). Sadly UNHEX('C299') doesn't behave like UNHEX('C2')+UNHEX('99'), so it's back to the drawing board.
There are two ways to determine whether a string contains UTF-8-specific (non-ASCII) characters. The first is to see if the string has values outside the ASCII character set:
SELECT _utf8 'amńbcd' REGEXP '[^[.NUL.]-[.DEL.]]';
The second is to compare the binary and character lengths:
SELECT LENGTH(_utf8 'amńbcd') <> CHAR_LENGTH(_utf8 'amńbcd');
Both return TRUE.
See http://sqlfiddle.com/#!2/d41d8/9811
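To answer the "how can I find the row" part, either test can be run against the table itself. A sketch, assuming the question's column is content in a table named t:
-- rows whose content contains at least one multi-byte (non-ASCII) character;
-- LENGTH() counts bytes, CHAR_LENGTH() counts characters
SELECT id, content
FROM   t
WHERE  LENGTH(content) <> CHAR_LENGTH(content);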

How to store URLs in MySQL

I need to store potentially hundreds of millions of URLs in a database. Every URL should be unique, hence I will use ON DUPLICATE KEY UPDATE and count the duplicate URLs.
However, I am not able to create an index on the URL field, as my varchar field is 400 characters. MySQL complains: "#1071 - Specified key was too long; max key length is 767 bytes". (A VARCHAR(400) in utf8 can take 1200 bytes.)
What is the best way to do this if you need to process a minimum of 500,000 URLs per day on a single server?
We are already thinking of using MongoDB for the same application, so we could simply query MongoDB to find the duplicate URL and update the row. However, I am not in favor of solving this problem with MongoDB; I'd like to use just MySQL at this stage, as I'd like to be as lean as possible in the beginning and finish this section of the project much faster. (We haven't played with MongoDB yet and don't want to spend time on it at this stage.)
Is there any other way to do this using fewer resources and less time? I was thinking of computing the MD5 hash of each URL and storing that as well. Then I could make that field UNIQUE instead. I know there will be collisions, but it is OK to have 5-10-20 duplicates among 100 million URLs, if that's the only problem.
Do you have any suggestions? I also don't want to spend 10 seconds inserting a single URL, as it will process 500k URLs per day.
What would you suggest?
Edit: As per the request this is the table definition. (I am not using MD5 at the moment, it is for testing)
mysql> DESC url;
+-------------+-----------------------+------+-----+-------------------+-----------------------------+
| Field       | Type                  | Null | Key | Default           | Extra                       |
+-------------+-----------------------+------+-----+-------------------+-----------------------------+
| url_id      | int(11) unsigned      | NO   | PRI | NULL              | auto_increment              |
| url_text    | varchar(400)          | NO   |     |                   |                             |
| md5         | varchar(32)           | NO   | UNI |                   |                             |
| insert_date | timestamp             | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
| count       | mediumint(9) unsigned | NO   |     | 0                 |                             |
+-------------+-----------------------+------+-----+-------------------+-----------------------------+
5 rows in set (0.00 sec)
According to the DNS spec, the maximum length of a domain name is:
The DNS itself places only one restriction on the particular labels
that can be used to identify resource records. That one restriction
relates to the length of the label and the full name. The length of
any one label is limited to between 1 and 63 octets. A full domain
name is limited to 255 octets (including the separators).
255 * 3 = 765 < 767 (Just barely :-) )
However, notice that each component can only be 63 characters long.
So I would suggest chopping the URL into its component parts, as sketched below.
Using http://foo.example.com/a/really/long/path?with=lots&of=query&parameters=that&goes=on&forever&and=ever
Probably this would be adequate:
protocol flag ["http" -> 0 ] ( store "http" as 0, "https" as 1, etc. )
subdomain ["foo" ] ( 255 - 63 = 192 characters : I could subtract 2 more because min tld is 2 characters )
domain ["example"], ( 63 characters )
tld ["com"] ( 4 characters to handle "info" tld )
path [ "a/really/long/path" ] ( as long as you want -store in a separate table)
queryparameters ["with=lots&of=query&parameters=that&goes=on&forever&and=ever" ] ( store in a separate key/value table )
portnumber / authentication stuff that is rarely used can be in a separate keyed table if actually needed.
This gives you some nice advantages:
The index is only on the parts of the url that you need to search on (smaller index! )
queries can be limited to the various url parts ( find every url in the facebook domain for example )
any url that has too long a subdomain/domain is bogus
easy to discard query parameters.
easy to do case insensitive domain name/tld searching
discard the syntax sugar ( "://" after protocol, "." between subdomain/domain, domain/tld, "/" between tld and path, "?" before query, "&" "=" in the query)
Avoids the major sparse table problem. Most urls will not have query parameters, nor long paths. If these fields are in a separate table then your main table will not take the size hit. When doing queries more records will fit into memory, therefore faster query performance.
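A minimal sketch of such a decomposed schema; all names and sizes are invented for illustration, and the separate path/query tables are only hinted at:
CREATE TABLE url (
    url_id    INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    protocol  TINYINT UNSIGNED NOT NULL,        -- 0 = http, 1 = https, ...
    subdomain VARCHAR(190) CHARACTER SET ascii NOT NULL DEFAULT '',
    domain    VARCHAR(63)  CHARACTER SET ascii NOT NULL,
    tld       VARCHAR(4)   CHARACTER SET ascii NOT NULL,
    path_id   INT UNSIGNED NULL,                -- FK into a separate path table
    UNIQUE KEY uniq_url (domain, tld, subdomain, path_id)
);
With 1-byte-per-char ascii columns, the whole unique key is roughly 265 bytes -- comfortably under the 767-byte limit.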
To index a field up to 767 chars wide, its charset must be ascii or similar; it can't be utf8, because utf8 uses 3 bytes per char, so the maximum width for an indexed utf8 field is 255 chars.
Of course, a 767-char ascii url field exceeds your initial 400-char spec, and some urls exceed even the 767 limit. Perhaps you can store and index the first 735 chars plus the md5 hash. You can also keep a full_url TEXT field to preserve the original value.
Notice that the ascii charset is good enough for urls.
A well-formed URL can only contain characters within the ASCII range - other characters need to be encoded. So, assuming the URLs you intend to store are well formed (and if they are not, you may want to fix them before inserting them into the database), you could define your url_text column with the ASCII character set (or latin1 in MySQL). With ASCII, one char is one byte, and you will be able to index the whole 400 characters as you want.
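Applied to the table from the question (a sketch; the charset change and the resulting index are the only point):
ALTER TABLE url
    MODIFY url_text VARCHAR(400) CHARACTER SET ascii NOT NULL,
    ADD UNIQUE KEY uniq_url (url_text);   -- 400 bytes in ascii, well under the 767-byte limit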
The odds of a spurious collision with MD5 (128 bits) can be phrased this way:
"If you have 9 Trillion different items, there is only one chance in 9 Trillion that two of them have the same MD5."
To phrase it another way: you are more likely to be hit by a meteor while winning the mega-lottery.
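That phrasing is the birthday approximation. As a back-of-the-envelope check, the chance of at least one collision among n random 128-bit hashes is about
    P ≈ n(n-1)/2 / 2^128 ≈ (9×10^12)^2 / 2 / (3.4×10^38) ≈ 1.2×10^-13 ≈ 1 in 9 trillion
so for 100 million URLs (n = 10^8), the odds shrink further, to around 1.5×10^-23.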
You can change url_text from VARCHAR(400) to TEXT; then you can add a FULLTEXT index against it, allowing you to search for the existence of the URL before you insert it.

Common MySQL fields and their appropriate data types

I am setting up a very small MySQL database that stores first name, last name, email and phone number, and I am struggling to find the 'perfect' datatype for each field. I know there is no such thing as a perfect answer, but there must be some sort of common convention for commonly used fields such as these. For instance, I have determined that an unformatted US phone number is too big to be stored as an unsigned int; it would have to be at least a bigint.
Because I am sure other people would probably find this useful, I don't want to restrict my question to just the fields I mentioned above.
What datatypes are appropriate for common database fields? Fields like phone number, email and address?
Someone's going to post a much better answer than this, but I just wanted to make the point that personally I would never store a phone number in any kind of integer field, mainly because:
You don't need to do any kind of arithmetic with it, and
Sooner or later someone's going to try to (do something like) put brackets around their area code.
In general though, I seem to almost exclusively use:
INT(11) for anything that is either an ID or references another ID
DATETIME for time stamps
VARCHAR(255) for anything guaranteed to be under 255 characters (page titles, names, etc)
TEXT for pretty much everything else.
Of course there are exceptions, but I find that covers most eventualities.
Here are some common datatypes I use (I am not much of a pro though):
| Column           | Data type     | Note                                                            |
| ---------------- | ------------- | --------------------------------------------------------------- |
| id               | INTEGER       | AUTO_INCREMENT, UNSIGNED                                        |
| uuid             | CHAR(36)      | or CHAR(16) binary                                              |
| title            | VARCHAR(255)  |                                                                 |
| full name        | VARCHAR(70)   |                                                                 |
| gender           | TINYINT       | UNSIGNED                                                        |
| description      | TINYTEXT      | often may not be enough; use TEXT instead                       |
| post body        | TEXT          |                                                                 |
| email            | VARCHAR(255)  |                                                                 |
| url              | VARCHAR(2083) | MySQL version < 5.0.3 - use TEXT                                |
| salt             | CHAR(x)       | randomly generated string, usually of fixed length (x)          |
| digest (md5)     | CHAR(32)      |                                                                 |
| phone number     | VARCHAR(20)   |                                                                 |
| US zip code      | CHAR(5)       | use CHAR(10) if you store extended codes                        |
| US/Canada p.code | CHAR(6)       |                                                                 |
| file path        | VARCHAR(255)  |                                                                 |
| 5-star rating    | DECIMAL(3,2)  | UNSIGNED                                                        |
| price            | DECIMAL(10,2) | UNSIGNED                                                        |
| date (creation)  | DATE/DATETIME | usually displayed as the initial date of a post                 |
| date (tracking)  | TIMESTAMP     | can be used for tracking changes in a post                      |
| tags, categories | TINYTEXT      | comma-separated values                                          |
| status           | TINYINT(1)    | 1 = published, 0 = unpublished; or use ENUM for readable values |
| json data        | JSON          | or LONGTEXT                                                     |
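Pulled together for the fields in the question, a hedged sketch (sizes follow the table above; nothing here is canonical):
CREATE TABLE contact (
    id         INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    first_name VARCHAR(70)  NOT NULL,
    last_name  VARCHAR(70)  NOT NULL,
    email      VARCHAR(255) NOT NULL,
    phone      VARCHAR(20)  NOT NULL  -- text, not a number: keeps leading zeros, '+', '(' and 'ext.'
);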
In my experience, first name/last name fields should be at least 48 characters -- there are names from some countries such as Malaysia or India that are very long in their full form.
Phone numbers and postcodes you should always treat as text, not numbers. The normal reason given is that there are postcodes that begin with 0, and in some countries, phone numbers can also begin with 0. But the real reason is that they aren't numbers -- they're identifiers that happen to be made up of numerical digits (and that's ignoring countries like Canada that have letters in their postcodes). So store them in a text field.
In MySQL you can use VARCHAR fields for this type of information. Whilst it sounds lazy, it means you don't have to be too concerned about the right minimum size.
Since you're going to be dealing with data of variable length (names, email addresses), you'd want to use VARCHAR. The amount of space taken up by a VARCHAR field is [field length] + 1 bytes, up to a max length of 255, so I wouldn't worry too much about trying to find a perfect size. Take a look at what you'd imagine the longest length might be, then double it and set that as your VARCHAR limit. That said...:
I generally set email fields to be VARCHAR(100) - i haven't come up with a problem from that yet. Names I set to VARCHAR(50).
As the others have said, phone numbers and zip/postal codes are not actually numeric values, they're strings containing the digits 0-9 (and sometimes more!), and therefore you should treat them as a string. VARCHAR(20) should be well sufficient.
Note that if you were to store phone numbers as integers, many systems will assume that a number starting with 0 is an octal (base 8) number! Therefore, the perfectly valid phone number "0731602412" would get put into your database as the decimal number "124192010"!!
Any Table ID
Use: INT(11).
MySQL indexes scan through a list of ints fastest.
Anything Security
Use: BINARY(x), or BLOB(x).
You can store security tokens, etc., as hex directly in BINARY(x) or BLOB(x). To retrieve from binary-type, use SELECT HEX(field)... or SELECT ... WHERE field = UNHEX("ABCD....").
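A small sketch of that round trip; the table and the token value are invented for the example:
CREATE TABLE secrets (
    token BINARY(16) NOT NULL          -- raw bytes, written and read as hex
);

INSERT INTO secrets (token) VALUES ( UNHEX('00112233445566778899AABBCCDDEEFF') );

SELECT HEX(token) FROM secrets
WHERE  token = UNHEX('00112233445566778899AABBCCDDEEFF');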
Anything Date
Use: DATETIME, DATE, or TIME.
Always use DATETIME if you need to store both date and time (instead of a pair of fields), as a DATETIME index is more amenable to date comparisons in MySQL.
Anything True-False
Use: BIT(1). Otherwise, use BOOLEAN.
BOOLEAN is actually just an alias of TINYINT(1), which actually stores -128 to 127 (not exactly a true/false, is it?).
Anything You Want to call `SUM()`, `MAX()`, or similar functions on
Use: INT(11).
VARCHAR and other string types won't give meaningful results with the SUM(), etc., functions.
Anything Over 1,000 Characters
Use: TEXT.
Max limit is 65,535.
Anything Over 65,535 Characters
Use: MEDIUMTEXT.
Max limit is 16,777,215.
Anything Over 16,777,215 Characters
Use: LONGTEXT.
Max limit is 4,294,967,295.
FirstName, LastName
Use : VARCHAR(255).
UTF-8 characters can take up to three bytes per visible character (four in utf8mb4), and some cultures do not distinguish firstname and lastname. Additionally, cultures may disagree about which name is first and which name is last. You should name these fields Person.GivenName and Person.FamilyName.
Email Address
Use : VARCHAR(256).
The definition of an e-mail path is set in RFC821 in 1982. The maximum limit of an e-mail was set by RFC2821 in 2001, and these limits were kept unchanged by RFC5321 in 2008. (See the section: 4.5.3.1. Size Limits and Minimums.) RFC3696, published 2004, mistakenly cites the email address limit as 320 characters, but this was an "info-only" RFC that explicitly "defines no standards" according to its intro, so disregard it.
Phone
Use: VARCHAR(255).
You never know when the phone number will be in the form of "1800...", or "1-800", or "1-(800)", or if it will end with "ext. 42", or "ask for susan".
ZipCode
Use: VARCHAR(10).
You'll get data like 12345 or 12345-6789. Use validation to cleanse this input.
URL
Use: VARCHAR(2000).
Official standards allow URLs much longer than this, but few modern browsers support URLs over 2,000 characters. See this SO answer: What is the maximum length of a URL in different browsers?
Price
Use: DECIMAL(11,2).
It goes up to 11.
I am doing about the same thing, and here's what I did.
I used separate tables for name, address, email, and numbers, each with a NameID column that is a foreign key on everything except the Name table, on which it is the primary clustered key. I used MainName and FirstName instead of LastName and FirstName to allow for business entries as well as personal entries, but you may not have a need for that.
The NameID column gets to be a smallint in all the tables because I'm fairly certain I won't make more than 32000 entries. Almost everything else is varchar(n) ranging from 20 to 200, depending on what you wanna store (Birthdays, comments, emails, really long names). That is really dependent on what kind of stuff you're storing.
The Numbers table is where I deviate from that. I set it up to have five columns labeled NameID, Phone#, CountryCode, Extension, and PhoneType. I already discussed NameID. Phone# is varchar(12) with a check constraint looking something like this: CHECK (Phone# like '[0-9][0-9][0-9]-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]'). This ensures that only what I want makes it into the database and the data stays very consistent. The extension and country codes I called nullable smallints, but those could be varchar if you wanted to. PhoneType is varchar(20) and is not nullable.
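In MySQL syntax, that Numbers table might look like the sketch below. Note the [0-9] LIKE pattern above is SQL Server's dialect; a MySQL CHECK would use REGEXP instead, and MySQL only enforces CHECK constraints from 8.0.16 on. The names follow the description (Phone# is spelled PhoneNum, since # isn't valid in an unquoted MySQL identifier):
CREATE TABLE Numbers (
    NameID      SMALLINT UNSIGNED NOT NULL,
    PhoneNum    VARCHAR(12) NOT NULL,
    CountryCode SMALLINT UNSIGNED NULL,
    Extension   SMALLINT UNSIGNED NULL,
    PhoneType   VARCHAR(20) NOT NULL,
    CHECK (PhoneNum REGEXP '^[0-9]{3}-[0-9]{3}-[0-9]{4}$'),  -- ###-###-####
    FOREIGN KEY (NameID) REFERENCES Name (NameID)
);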
Hope this helps!