I'm currently having a discussion in my class about the datatypes char and int in MySQL.
We have a phone number made up of 8 digits. However, we can't find any clear pros and cons for either type.
Should we use the char type or the int type, and why?
If you have a fixed size, you can use char(8), which takes 8 bytes; MySQL works faster with fixed-size fields. But if you choose int, which takes 4 bytes, you will save space, and an index on that field will work faster than one on the char.
You can do a simple benchmark to test the speed of writes and reads using char(8) and int.
There is no point in using a variable-length type like varchar or varbinary if you have a fixed size, because performance will decrease.
Depends on how you want to represent the phone number, do you need area codes, country codes and stuff like that, and do you want to save it as a single column or do you want to split it up?
Personally, I would choose to represent area codes, country codes, and the phone number as 3 columns with the datatype int, as this would make it easier to find all phone numbers in one area, and so on. But if its only purpose is to be like a string value, char would be sufficient; however, I would consider using varchar instead if you have phone numbers for several countries.
Don't know about the US exactly, but in Europe the national access code is a leading 0, and the international access code is a leading 00 or +. So that's a con to using INT, as the leading 0's would be lost. Furthermore, you also have phone numbers that contain names, and even though these names can be converted to numbers, it would probably be nice to keep them as a name. That's a second con to using INT. The last con is that you can also have extension numbers, etc. All of this goes in favor of VARCHAR.
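To make the leading-zero point concrete, here is a small Python sketch of what happens when a phone number is forced through an integer. The sample numbers are made up for illustration, and Python's int() only stands in for an integer column here; it is not MySQL itself.

```python
# Sketch: information lost when a phone number is treated as an integer.
# The sample numbers below are invented for illustration.

national = "0201234567"          # leading 0 = national access code
international = "0031201234567"  # leading 00 = international access code

as_int = int(national)
print(as_int)            # 201234567: the leading zero is gone
print(len(str(as_int)))  # 9 digits instead of the original 10

# Converting back cannot recover how many zeros were dropped.
round_trip_national = str(int(national))
round_trip_international = str(int(international))
print(round_trip_national == national)  # False
print(round_trip_international)         # 31201234567

# A leading "+" is parsed as a sign and silently discarded as well.
plus_form = int("+31201234567")
print(plus_form)  # 31201234567


def is_plain_integer(text: str) -> bool:
    """Return True if text can be parsed as an integer at all."""
    try:
        int(text)
        return True
    except ValueError:
        return False


# A number with an extension is not an integer in the first place.
print(is_plain_integer("8005431212x543"))  # False
```

With a string column, all three forms survive exactly as entered.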
I started using VARCHAR for storing phone numbers because it gives you more range in formats of different countries. Storage is cheap.
I'm taking the Meta Data Engineer Professional Certificate and I was just given this prompt in a lab:
Mr. Carl needs to have a new table to store the contact details of each customer including customer account number, customer phone number and customer email address.
You are required to choose a relevant data type for each of the columns.
Solution:
Account number: INTEGER
Phone number: INTEGER
Email: VARCHAR
Prior to reading the solution I selected VARCHAR(10) as the datatype for storing phone numbers as I thought they should be treated as string data. My reasoning is that there's no reason to perform any sort of mathematical operation on a phone number, and they're often typed with other characters like "(" or "-".
Is there any compelling reason for storing a phone number as an INT? Do you agree with the solution to this prompt? What is the best practice for storing phone numbers?
Is "Meta Data Engineer Professional Certificate" aimed at MySQL?
General Professional: If not MySQL-specific, then you need to understand that "INTEGER" is implemented in different ways by different database engines.
MySQL Professional: INTEGER, in MySQL, maps to INT SIGNED, which is limited to about 2 billion, so only 9 digits are guaranteed to fit. I don't know what the max phone number length is worldwide, but I know that 10 digits are needed.
BIGINT gives you about 18 digits (in 8 bytes), but that seems silly. For the reasons already mentioned VARCHAR(...) is reasonable. (Perhaps a limit of 20 would be quite sufficient.) In that case, a 10-digit number would take 11 bytes (1 for length, plus 10 for the number.)
Arguably, you could say, for example DECIMAL(15) to allow up to 15 digits in a 7-byte column.
(I prefer VARCHAR, in spite of it taking the most space.)
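The byte counts above can be reproduced with a bit of arithmetic. This is only a rough Python sketch: it assumes a single-byte character set for VARCHAR, a DECIMAL with no fractional digits, and the packing rule of 4 bytes per group of 9 digits plus a smaller amount for leftover digits. The helper names are mine, not MySQL functions.

```python
# Sketch of per-value storage for the column types discussed above.
# Assumptions: single-byte character set, DECIMAL with scale 0.

INT_BYTES, BIGINT_BYTES = 4, 8
INT_MAX = 2**31 - 1     # signed INT upper bound: 2147483647
BIGINT_MAX = 2**63 - 1  # signed BIGINT upper bound (19 digits long)


def varchar_bytes(value: str, declared_max: int) -> int:
    """Length prefix (1 byte up to 255, else 2) plus one byte per character."""
    prefix = 1 if declared_max <= 255 else 2
    return prefix + len(value)


def decimal_bytes(digits: int) -> int:
    """4 bytes per full group of 9 digits, fewer bytes for leftover digits."""
    leftover_bytes = [0, 1, 1, 2, 2, 3, 3, 4, 4]  # index = leftover digit count
    full_groups, rest = divmod(digits, 9)
    return full_groups * 4 + leftover_bytes[rest]


phone = "8005431212"  # a 10-digit number

print(int(phone) <= INT_MAX)       # False: does not fit a signed INT
print(int(phone) <= BIGINT_MAX)    # True
print(len(str(BIGINT_MAX)) - 1)    # 18: every 18-digit number fits in BIGINT
print(varchar_bytes(phone, 20))    # 11
print(decimal_bytes(15))           # 7
```

So for a 10-digit number the spread is 7 bytes (DECIMAL(15)), 8 bytes (BIGINT) or 11 bytes (VARCHAR(20)), which is why the space argument is a weak one.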
Either way: It is a bad test question if it does not understand the two cases I present here.
Non digits: 'typed with other characters like "(" or "-"' -- That brings up a different issue. It comes under the general heading of GIGO. Cleanse the data before storing it into the database.
If you ever needed to compare two phone numbers for equality, you would wish you had removed all non-digits. (Or added them in some canonical way, such as US: "(800)543-1212".)
User input: If you ever create a UI for entering phone numbers, dates, SSNs (or other numbers with some structure), DO NOT require the user to follow some punctuation rules. DO allow a variety of typical formats. (OK, dates are tricky because there are incompatible orderings. But if I type "1-1-2021", will you spit at me for not having the leading zeroes?)
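The cleansing step described above might look something like this in Python. It is a minimal sketch: normalize_phone is a hypothetical helper, and treating anything after an "x" as an extension is my own assumption about the input, not a standard.

```python
import re


def normalize_phone(raw: str) -> str:
    """Reduce a typed phone number to digits, keeping an optional extension.

    Assumption: an extension, if present, follows the first 'x'
    (as in "x543" or "ext. 543").
    """
    main, _, ext = raw.lower().partition("x")
    digits = re.sub(r"\D", "", main)     # drop everything except digits
    ext_digits = re.sub(r"\D", "", ext)
    return digits + ("x" + ext_digits if ext_digits else "")


# Different ways of typing the same number compare equal after cleansing.
variants = ["(800)543-1212", "800-543-1212", "800 543 1212", "800.543.1212"]
canonical = {normalize_phone(v) for v in variants}
print(canonical)  # {'8005431212'}

print(normalize_phone("(800)543-1212x543"))       # 8005431212x543
print(normalize_phone("800 543 1212 ext. 543"))   # 8005431212x543
```

The UI can accept whatever punctuation the user likes; only the canonical form is stored and compared. Note that this sketch also drops a leading "+", so an international prefix would need its own handling.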
Indexing: VARCHAR, DECIMAL, INT, etc are all indexable. Any speed difference is not significant.
Extensions: Without VARCHAR, how would you represent the "extension" in "(800)543-1212x543"? Might this point be the deciding factor in favor of VARCHAR? And you should write a bug report against that 'Certification' test?
Duplicate?: Which is best data type for phone number in MySQL and what should Java type mapping for it be? covers most of what I have said, and hints that [perhaps] VARCHAR(20) is sufficient. (The quoted 15, excludes the international prefix.)
In my opinion, there is no absolute best choice here. Both have pros and cons. Personally, I'm in favor of using varchar. Special characters like hyphens can cause dupes when mishandled (a rare case, and arguably the user's fault, since input should be verified before submitting), but they do have the merit of formatting the phone number, which improves readability, e.g. area_code-tel: xxx-xxxxxxxx (without the hyphen it's nearly impossible to separate the area code from the phone number, as both can vary in length). About indexing: though numerics do have advantages over strings, I'm not sure a phone number would be used as an index. There are more worthy candidates such as ID or date, but what would a phone number do? Usually we look up the phone based on an indexed column such as ID, but how often do we fetch something based on a phone number? Unless we want to list all phones from a particular area, we don't really need it to be indexed. And in that case it would actually be more fitting to use special characters like hyphens to help determine the area part.
P.S Like Ken White kindly suggested, there are cases when phone numbers should be indexed, especially when they are more suitable to be an identifier.
Storing phone numbers as strings can be a disaster. The first things that come to my mind are:
You can get dupes easily: one user types the number with ( and/or -, another user types the same number without those characters, and long story short you end up with a duplicate.
Normalizing the phone number as an integer makes more sense in terms of normalization and avoiding duplication.
Also think about a search in the scenario above: what would you use? A LIKE? A numeric operator? Casts spread everywhere? Messy...
Now comes the important thing, and it is related to indexing: the int will be faster. The longer the varchar, the slower it gets, although you are limiting its length.
The validation can be done in the UI with a field mask, or with a regex in the logic, whichever makes more sense for you.
Hope I helped a little bit :)
I made a table for storing the contact records of my website's users. It also contains a 10-digit mobile no.
Table structure is like this:
CREATE TABLE contact_user
(
    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    contact INT(10)
);
Now the problem is that if I insert a 10-digit mobile no. (e.g. 9595891256) using phpMyAdmin into the contact field, it inserts some random value and shows a warning saying "data out of column range".
But if I insert a simple 10-digit no. (e.g. 4561327894), then it works well and no warning is shown.
So please tell me, what is the issue with inserting a mobile no. in this column?
I am using mysql 5.1 on ubuntu 11.04 and also using phpmyadmin.
INT(10) does not mean a 10-digit number, it means an integer with a display width of 10 digits. Whether you put INT(2) or INT(10), MySQL still only stores a 4-byte INT, which has a maximum value of 2147483647 when signed (as in this case, since the column is not declared UNSIGNED) or 4294967295 when unsigned.
You can use a BIGINT instead of INT to store it as a numeric. I wouldn't recommend this, however, because it won't allow for international numbers. If you are certain your application will only use US numbers, using BIGINT will save you 3 bytes per row over VARCHAR(10) -- if that sort of thing concerns you.
Since it is a phone number (and therefore you won't be doing numeric calculations against it), try using a VARCHAR(20). This allows you the ability to store international phone numbers properly, should that need arise.
The maximum value for an INT in MySQL is 2147483647 (or 4294967295 if unsigned), according to the MySQL documentation. If your value exceeds this limit, it is out of range; in non-strict SQL mode MySQL clips it to the maximum and issues a warning, hence the not-that-random value.
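A quick way to see where the "random" value comes from is to mimic the range check in Python. This is a sketch under the assumption that the server runs in non-strict SQL mode, where out-of-range values are clipped to the nearest end of the range; in strict mode the INSERT would be rejected with an error instead. store_in_signed_int is a made-up helper, not a MySQL function.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1  # signed 4-byte INT range
UINT_MAX = 2**32 - 1                  # unsigned 4-byte INT maximum


def store_in_signed_int(value: int) -> int:
    """Mimic non-strict mode: clip out-of-range values to the range ends."""
    return max(INT_MIN, min(INT_MAX, value))


# The two numbers from the question, plus one that actually fits.
for number in (9595891256, 4561327894, 1234567890):
    stored = store_in_signed_int(number)
    status = "ok" if stored == number else "clipped"
    print(number, "->", stored, status)

# 9595891256 -> 2147483647 clipped
# 4561327894 -> 2147483647 clipped
# 1234567890 -> 1234567890 ok

# Neither number from the question fits an unsigned INT either.
print(9595891256 <= UINT_MAX, 4561327894 <= UINT_MAX)  # False False
```

Any 10-digit number above 2147483647 ends up stored as the same value, so two different customers would appear to share one phone number.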
Also, INT is not the best solution for storing phone numbers. You will lose any leading zeros. Also, international phone numbers start with a + sign. Consider using a VARCHAR. This will end up using more space in the database, but will provide more consistency.
This is because of the maximum size of the INT type; you need to use a different type to hold a number that large. Try using BIGINT.
I'm working with tons of phone numbers, and many are international.
I've changed my phone numbers table structure to have 5 columns:
`phonenumbers`.`phoneID`
`phonenumbers`.`countrycode`
`phonenumbers`.`areacode`
`phonenumbers`.`phonenumber`
`phonenumbers`.`ext`
At the moment the phoneID is the only column that's an INT, since it's the primary key.
Should I change the other columns to integers? I've heard indexes work best with numeric values, and I'm only storing digits in each of the columns (no dashes, parentheses, spaces, etc.).
I'm still learning how MySQL works with indexes, so I'm curious how others work with searching for numbers. In this case, I'm sure I'll be searching for numbers that start with a certain known areacode and part of a known phonenumber, or an entire phonenumber.
The part that gets me with indexing and table columns like phone numbers is that I don't always know how long a phonenumber will be, since countries have different lengths for areacodes and phonenumbers.
In summary, INT vs VARCHAR indexing with numbers.
Phone numbers are not integers, so don't store them as one, it'll just cause you trouble. The obvious cases are when you have to handle phone numbers too big to fit in an int, or phone numbers starting with a 0.
Moreover, as you want to do prefix matches (phonenumber like '800%'), mysql will be able to use indexes if you're using varchar columns.
You have to figure out how you're querying this data. If you're frequently doing queries like where countrycode='1' and areacode='123' and phonenumber like '2%', you'd want a compound index on (countrycode, areacode, phonenumber). If you're also often doing queries on only the phonenumber, you'd want an additional index on just the phonenumber column. This is something you have to work out depending on the amount of data you have and the queries you run; work with EXPLAIN to learn how your indexes are used and where they are needed.
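To get an intuition for why a prefix match can use an index on a string column, it helps to picture the index as a sorted list of keys. The Python sketch below is only an analogy, not how MySQL implements its B-tree indexes; the sample rows and helper names are invented, and "\uffff" is used as an assumed upper sentinel that sorts after any digit.

```python
from bisect import bisect_left

UPPER = "\uffff"  # assumed sentinel that sorts after every character we store


def prefix_range(sorted_keys, prefix):
    """All keys starting with prefix, found with two binary searches."""
    lo = bisect_left(sorted_keys, prefix)
    hi = bisect_left(sorted_keys, prefix + UPPER)
    return sorted_keys[lo:hi]


def compound_prefix(sorted_rows, country, area, prefix):
    """Equality on the first two key parts, prefix match on the third."""
    lo = bisect_left(sorted_rows, (country, area, prefix))
    hi = bisect_left(sorted_rows, (country, area, prefix + UPPER))
    return sorted_rows[lo:hi]


# Single-column "index" on phonenumber: LIKE '800%' becomes one contiguous range.
phone_index = sorted(
    ["2125550100", "8005431212", "8001234567", "8015550199", "0201234567"]
)
print(prefix_range(phone_index, "800"))
# ['8001234567', '8005431212']

# Compound "index" on (countrycode, areacode, phonenumber).
rows = sorted([
    ("1", "123", "2000000"),
    ("1", "123", "2999999"),
    ("1", "123", "3000000"),
    ("1", "124", "2000000"),
    ("44", "20", "2000000"),
])
print(compound_prefix(rows, "1", "123", "2"))
# [('1', '123', '2000000'), ('1', '123', '2999999')]
```

Because strings sort character by character, every key with the same prefix sits next to its neighbours, so the lookup is a range scan. Integers sort by magnitude, so numbers of different lengths that share leading digits are not adjacent, and a prefix search cannot be answered the same way.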
Use varchar for representing phone numbers NOT integers. Otherwise you will find your design decision will come back to bite you.
Also: "I've heard indexes work best with numeric values" - well, that's not strictly accurate: yes the index will take up less space, and more rows will fit per page etc, but an index on a varchar column works perfectly well.
Worry about index size and performance when (1) you have a huge amount of data and (2) when you have measured a performance problem.
In my opinion you have a lot of attributes that you don't need. For phone numbers I usually use an auto-increment key for the id and a varchar for the phone number. This makes validation with a programming language easier. It's just my opinion...
Use a BIGINT UNSIGNED, simply because this forces you to normalize your data. Force your users to store the phonenumber at root level, that is, at country level. You could store the country prefix in a separate column to ease usage.
Everybody types phone-numbers in different ways and this makes it almost impossible to search the data.
E.g. %020123456% will not match 02 0123456. Are you going to search all combinations or just parse it?
This I know from experience: we had to manually fix about 1,000 phonenumbers which we could not script out when installing an auto-dialer.
I'm working with some database abstraction layers, and most of them use attributes like "String", which is VARCHAR 250, or INTEGER, which has a length of 11 digits. But suppose, for example, I have something that will be less than 250 characters long. Should I go and make the column smaller? Does it really make any meaningful difference?
Thanks in advance!
INT length does nothing. All INTs are 4 bytes. The number you can set is only used for zerofill (and who uses that!?).
VARCHAR length does more. It's the maximum length of the field. VARCHAR is stored so that only the actual data takes up space, so the declared length doesn't matter for storage. These days, you can have VARCHARs bigger than 255 bytes (up to 256^2-1 = 65535). The difference is the number of bytes used to store the field length: VARCHAR(100), VARCHAR(8) and VARCHAR(255) use 1 byte for the length; VARCHAR(1000) uses 2.
Hope that helps =)
edit
I almost always make my VARCHARs 250 long. Actual length should be checked in the app anyway. For bigger fields I use TEXT (and those are stored differently, so can be much much longer).
edit
I don't know how current this is, but it used to help me (understand): http://help.scibit.com/Mascon/masconMySQL_Field_Types.html
First, remember that the database is meant to store facts and is designed to protect itself against bad data. Thus, the reason you do not want to allow a user to enter 250 characters for a first name is that a user will put all kinds of data in there that is not a first name. They'll put their whole name, their underwear size, a novel about what they did last summer and so on. Thus, you want to strive to enforce that the data is as correct as possible. It is a mistake to assume that the application is the sole protector against bad data. You want users to tell you that they had a problem stuffing War and Peace into a given column.
Thus, the most important question is, "What is the most appropriate value for the data being stored?" Ideally, you would use an int and a check constraint to ensure that the values have an appropriate range (e.g. greater than zero, less than a billion etc.). Unfortunately, this is one of MySQL's greatest weaknesses: it does not honor check constraints (they are parsed but ignored in versions before 8.0.16). That simply means you must implement those integrity checks in triggers, which admittedly is more cumbersome.
Will choosing an int (4 bytes) over a tinyint (1 byte) make an appreciable difference? Obviously, it depends on the amount of data. If you will have no more than 10 rows, the answer is obviously no. If you will have 10 billion rows, the answer is obviously "Yes". However, IMO, this is premature optimization. It is far better to focus on ensuring correctness first.
For text, you should ask whether your data should support Chinese, Japanese or non-ANSI values (i.e., should you use nvarchar or varchar)? Does this value represent a real world code like a currency code, or bank code which has a specific specification?
Not so sure in MySQL, but in MS SQL it only makes a difference for sufficiently large databases. Typically, I like to use smaller fields for a) the space saving (it never hurts to practice good habits) and b) for the implied validation (if you know a certain field should never be more than 10 characters, why allow eleven, let alone 250?).
I think Rudie is wrong, not all INTs are 4 bytes... in MySQL you have:
tinyint = 1 byte,
smallint = 2 bytes,
mediumint = 3 bytes,
int = 4 bytes,
bigint = 8 bytes.
I think Rudie refers to the "display width", which is the number you put between parentheses when you are creating a column, e.g.:
age INT(3)
You're only telling the RDBMS the width to use when DISPLAYING the value (3 digits); it does not limit the values the column can store.
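Both points, the byte sizes and the display width, can be checked with a few lines of Python. This is just an illustrative sketch: the ranges are derived from the byte counts listed above, and zerofill is a made-up helper that imitates the padding you get from a ZEROFILL column, not a MySQL function.

```python
# Byte sizes of MySQL's integer types, as listed above.
INT_TYPE_BYTES = {"tinyint": 1, "smallint": 2, "mediumint": 3, "int": 4, "bigint": 8}


def signed_range(nbytes: int) -> tuple:
    """Smallest and largest value of a signed integer with nbytes bytes."""
    bits = 8 * nbytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def unsigned_max(nbytes: int) -> int:
    """Largest value of an unsigned integer with nbytes bytes."""
    return 2 ** (8 * nbytes) - 1


def zerofill(value: int, width: int) -> str:
    """Pad to the display width; wider values are shown in full, never cut."""
    return str(value).zfill(width)


for name, nbytes in INT_TYPE_BYTES.items():
    print(name, signed_range(nbytes), unsigned_max(nbytes))

print(zerofill(7, 3))      # 007
print(zerofill(12345, 3))  # 12345
```

The range depends only on the type (tinyint, int, bigint, ...), never on the number in parentheses, so age INT(3) can still hold values far beyond 999.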
And VARCHARs are variable-length character strings, so if you declare, let's say, name varchar(5000) and you store a name like "Mario", you are only using 7 bytes (5 for the data and 2 for the length of the value).
The correct field size serves to limit the bad data that can be put in. For instance suppose you have a phone number field. If you allow 250 characters, you will often end up with things like the following in the phone field (an example not taken at random):
Call the good-looking blonde secretary instead.
So first limiting the length is part of how we enforce data integrity rules. As such it is critical.
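As a rough illustration of length acting as an integrity rule, here is a Python sketch of the check a suitably sized column performs for you. MAX_PHONE_LEN and check_phone_field are hypothetical names, and the limit of 20 is only an assumed column size; a real database in strict mode would raise its own error rather than this one.

```python
MAX_PHONE_LEN = 20  # hypothetical limit, mirroring a VARCHAR(20) column


def check_phone_field(value: str) -> str:
    """Reject values longer than the column allows instead of storing them."""
    if len(value) > MAX_PHONE_LEN:
        raise ValueError(
            f"phone value too long ({len(value)} > {MAX_PHONE_LEN} characters)"
        )
    return value


def is_accepted(value: str) -> bool:
    """Convenience wrapper: True if the value would be stored."""
    try:
        check_phone_field(value)
        return True
    except ValueError:
        return False


print(is_accepted("+31 20 123 4567"))                                   # True
print(is_accepted("Call the good-looking blonde secretary instead."))  # False
```

A length limit does not prove the value is a phone number, but it does stop the free-text notes that a 250-character field invites.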
Second, there is only so much space on a datapage and while some databases will allow you to create tables where the potential record is longer than the width of the data page, they often will not allow you to actually exceed it when storing the data. This can lead to some very hard to find bugs when suddenly one record can't be saved. I don't know about MySql and whether it does this but I know SQL Server does and it is very hard to figure out what is wrong. So making data the correct size can be critical to preventing bugs.