Database schema for room of a building - mysql

I have to create a booking software and I've started to design the database.
The room may be in the any place of the world so I would like to have advice for address.
Reading a lot of similary question I have designed this schema, what do you think about? Is it enough to store all type of addresses?
I followed this structure.

Don't use a 4-byte INT for "country_id", use the standard 2-char "country codes"; it is smaller and avoids a JOIN.
There is not really much need to have "city" split out from address -- it does not save enough space to matter, nor is it is useful for "normalization".
So, combine Country and City into Address.
It is rarely useful to have 1:1 relationships; it is almost always better to merge the two tables into one. I am thinking about Building and Address.
Might you have a billion "floors"? Don't use 4-byte INT when 1-byte TINYINT UNSIGNED is more appropriate. See also SMALLINT UNSIGNED (2 bytes, range of 0..65535).
DECIMAL(6,4)/(7,4) is all that is needed for lat/long. 8 decimal places is getting into microscopic distances.
Pick the appropriate CHARACTER SET. country_code and (I suspect) postal_code can be ascii anywhere in the world. (I could be wrong -- Bangladesh uses non-Arabic numerals on license plates.) Other VARCHARs should probably be utf8mb4 if you are truly international.
It is good that you read a lot of similar questions to come up with the schema. It is unfortunate that they were novices.

Related

mySQL data type for storing phone numbers

I'm taking the Meta Data Engineer Professional Certificate and I was just given this prompt in a lab:
Mr. Carl needs to have a new table to store the contact details of each customer including customer account number, customer phone number and customer email address.
You are required to choose a relevant data type for each of the columns.
Solution:
Account number: INTEGER
Phone number: INTEGER
Email: VARCHAR
Prior to reading the solution I selected VARCHAR(10) as the datatype for storing phone numbers as I thought they should be treated as string data. My reasoning is that there's no reason to perform any sort of mathematical operation on a phone number, and they're often typed with other characters like "(" or "-".
Is there any compelling reason for storing a phone number as an INT? Do you agree with the solution to this prompt? What is the best practice for storing phone numbers?
Is "Meta Data Engineer Professional Certificate" aimed at MySQL?
General Professional: If not MySQL-specific, then you need to understand that "INTEGER" is implemented in different ways by different database engines.
MySQL Professional: INTEGER, in MySQL, maps to INT SIGNED, which is limited to about 2 billion--That is only 9 digits. I don't know what the max phone number is worldwide, but I know that 10 is needed.
BIGINT gives you about 18 digits (in 8 bytes), but that seems silly. For the reasons already mentioned VARCHAR(...) is reasonable. (Perhaps a limit of 20 would be quite sufficient.) In that case, a 10-digit number would take 11 bytes (1 for length, plus 10 for the number.)
Arguably, you could say, for example DECIMAL(15) to allow up to 15 digits in a 7-byte column.
(I prefer VARCHAR, in spite of it taking the most space.)
Either way: It is a bad test question if it does not understand the two cases I present here.
Non digits: 'typed with other characters like "(" or "-"' -- That brings up a different issue. It comes under the general heading of GIGO. Cleanse the data before storing it into the database.
If you ever needed to compare two phone numbers for equality, you would wish you had removed all non-digits. (Or added them in some canonical way, such as US: "(800)543-1212"
User input: If you ever create a UI for entering phone numbers, dates, SSNs, (or other numbers with some structure), DO NOT require the user to follow some punctuation rules. DO allow a variety of typical formats. (OK, Dates are tricky because there are incompatible orderings. But what if I type "1-1-2021", will you spit at me not having the leading zeroes?
Indexing: VARCHAR, DECIMAL, INT, etc are all indexable. Any speed difference is not significant.
Extensions: Without VARCHAR, how would you represent the "extension" in "(800)543-1212x543"? Might this point be the deciding factor in favor of VARCHAR? And you should write a bug report against that 'Certification' test?
Duplicate?: Which is best data type for phone number in MySQL and what should Java type mapping for it be? covers most of what I have said, and hints that [perhaps] VARCHAR(20) is sufficient. (The quoted 15, excludes the international prefix.)
In my opinion, there is no absolute best choice in this. Both have pros and cons. Personally, I'm in favor of using varchar. Though special characters like hyphen can cause dupes when mishandled (it's a rare case and it's the user to blame as it's required to verify the input before submitting),it does have the merit of formatting the phone which improves the readability. e.g area_code-tel: xxx-xxxxxxxx (without it it's near impossible to separate the area code and the phone number as both can have a varied length). About indexing,though numerics does have advantages over strings, I'm not sure if a phone number would be used as an index. There are more worthy candidates such as ID or date, but what would a phone number do? Usually we look for the phone based on indexed column such as ID, but how often do we get something based on phone number? Unless we want to list all phones from a particular area, we don't really need it to be indexed. Then it actually would be more fitting to use special characters like hyphen to help determine the area part.
P.S Like Ken White kindly suggested, there are cases when phone numbers should be indexed, especially when they are more suitable to be an identifier.
Storing phone numbers as strings can be a disaster, the first things coming up to my mind are:
You can get dupes easily, maybe someone types the number with (
and/or - and another user does type the same number without those
characters, long story short you end up with a duplicate.
Thinking about a way to normalize the phone number using an integer
makes too more sense in terms of normalization and non duplication.
Also think about a search with the scenario above, what would you use ? a like a numeric operator ? spread casts ? Messy...
Now comes the important thing and it is related to the indexing, the
int will be faster. The longer is the varchar the slower it gets
however you are limiting its length.
The validation can be on the UI with a field mask, or using a regex on the logic whatever makes more sense for you.
Hope i helped a little bit :)

MySQL: using char(n) vs decimal(n) with zero fill

I was asked to use a database in which most of the primary keys, and other fields as well, uses char(n) to store numeric values with padding, for example:
product_id: char(8) [00005677]
user_id: char(6) [000043]
category_id: char(2) [05]
The reason they want to use it like that, is to be able to use characters (in the far future) if they want. However they have many rules based in numbers, for example, category_id from 01 to 79 correspond to a general category and from 80 to 89 is a special category and 90 to 99 is user defined category.
I personally think that using char(n) to store numbers is a bad practice. My reasons are:
using char, " " != 0, 0 != 00, 05 != 5, 00043 != 000043, and so on. For that reason,
the values have to be constantly checked (to prevent data corruption).
If I pad a number: 0 -> 00, then I have to pay attention not to pad
a character (A -> 0A)
If characters are used, then ranges become strange, something like:
from 01 to 79 and AB and RX and TZ and S, etc...
Indexing numbers instead of chars result in a performance gain
I'm proposing to change it to decimal(n) with zerofill to make it more "error-proof", as this information is modified by different sources (web, windows client, upload csv). If they want to add more categories, for example, then updating from decimal(2) to decimal(3) will be easier.
My question then is: Am I wrong? can char(n) be trusted for this task? If "chars" are evil with numbers, then which other disadvantages am I missing in the above list (I may need better reasons if I want to win my case)?
TIA (any comment/answer will be appreciated).
If this was SQL Server or Oracle or any other RDBMS, I would recommend enforcing a check constraint on those columns so that the data always matched the full capacity of the column - this would ensure your identifiers are uniform.
Unfortunately MySQL doesn't support this.
While it wouldn't stop the annoyance of having to pad things coming into the database or in search routines, on the client or in procs in the database, it would guarantee you that the fields were clean at the lowest level.
I find using constraints like this help avoid things getting badly out of hand.
As far as the optimization by using numbers, if they have to accommodate non-numeric characters in the future, that's not going to be an option.
It is very common to have natural keys (which could be candidates for a primary key) with varchar/char data, but yet instead enforce referential integrity on surrogate keys (usually some kind of autonumbering integer which is simply an internal reference, and often the clustered index and primary key).
Quoting your question:
...store numeric values with padding...
You did not show any examples of numeric data, only character data that happens to consist of numbers. If you had said that their OrderTotal column was a char(10), then I'd start to worry.
Just treat this as character data and you will be fine. I can see no business or technical case to change the database (unless you are beginning a near-total rewrite).
Regarding performance... If this is actually a concern, then you most likely have far bigger issues to deal with. MySQL is fast and accurate.
--
Write a function somewhere that will zerofill user inputted ID's for the purpose of querying. Use this function everywhere you need to accept user input. NEVER EVER use a numeric data type to store your data (if PHP, never use +, always use . to concat, etc...)
Remember, this is no different than Item_Number = "SHIRT123" or any other string ID you may encounter.
Take care

mysql indexing question

I have a "best practices" question, concerning indexing.
I have to index phone numbers, which I normally format the column for an integer. I could separate the number into multiple columns: areacode, suffix, prefix, country code. But since I have to account for international numbers, and the numbers get kinda funny in certain countries, I prefer to keep one column.
So my question is, should I keep the column data saved for integers, characters, or varchars?
I do strip out anything non-int related, so varchar is probably not needed.
I have to provide the searching ability for my clients, thus I need to index the number.
If all the phone numbers were from the US, then I'd separate the columns, but I'm catering to the international too.
So I'm curious about the indexing part, and other people's practices in this arena. Is it best to index with integers (for something like this), or does it even matter.
As a side note, the phone numbers are not going to be all the same length. Which is why I ask about formatting the column structure in char or varchar.
thanks guys!!
How large is the table expected to be? The reason I ask is that indexes on ints are going to be smaller, obviously, but on a small table this isn't a major consideration. Using varchar gives you more flexibility to do things like "...WHERE phonenumber like '415%' etc., at the cost of a larger index. If the table is quite large, and the box it runs on is at all memory-constrained, you can run into the situation where your index doesn't fit into memory, sending queries against that index into swap hell. This can be exacerbated by your choice of storage engine: InnoDB prefixes every index with the primary key, for example, which can bloat your indexes if your PK is on a wide field or fields.
Phone numbers can include # and *, so I would recommend against using integers.
Also the international prefix is + this is to support international prefixes regardless of country you are in.
e.g. in South Africa you need to prefix the country code with 09; in Europe the prefix is 00.
To make the numbers work everywhere you replace the international prefix with + and your cellphone will replace this with the local prefix for dialing abroad.
I'd use a varchar for phone numbers.
Furthermore, I'd use an integer auto_increment as primary key and use the phone number as secondary key, in order to keep performance on InnoDB snappy.
Also remember that people can 'share' a phone number, so it's not guaranteed be by unique.

MySQL: Fields length. Does it really matter?

I'm working with some database abstraction layers and most of them are using attributes like "String" which is VARCHAR 250 or INTEGER which has length of 11 digits. But for example I have something that will be less than 250 characters long. Should I go and make it less? Does it really makes any valuable difference?
Thanks in advance!
INT length does nothing. All INTs are 4 bytes. The number you can set, is only used for zerofill (and who uses that!?).
VARCHAR length does more. It's the maxlength of the field. VARCHAR is saved so that only the actual data is stored, so the length doesn't mattter. These days, you can have bigger VARCHARs than 255 bytes (being 256^2-1). The difference is the bytes that are used for the field length. VARCHAR(100) and VARCHAR(8) and VARCHAR(255) use 1 byte to save the field length. VARCHAR(1000) uses 2.
Hope that helps =)
edit
I almost always make my VARCHARs 250 long. Actual length should be checked in the app anyway. For bigger fields I use TEXT (and those are stored differently, so can be much much longer).
edit
I don't know how current this is, but it used to help me (understand): http://help.scibit.com/Mascon/masconMySQL_Field_Types.html
First, remember that the database is meant to store facts and is designed to protect itself against bad data. Thus, the reason you do not want to allow a user to enter 250 characters for a first name is that a user will put all kinds of data in there that is not a first name. They'll put their whole name, their underwear size, a novel about what they did last summer and so on. Thus, you want to strive to enforce that the data is as correct as possible. It is a mistake to assume that the application is the sole protector against bad data. You want users to tell you that they had a problem stuffing War in Peace into a given column.
Thus, the most important question is, "What is the most appropriate value for the data being stored?" Ideally, you would use an int and a check constraint to ensure that the values have an appropriate range (e.g. greater than zero, less than a billion etc.). Unfortunately, this is one of MySQL's greatest weakness: it does not honor check constraints. That simply means you must implement those integrity checks in triggers which admittedly is more cumbersome.
Will the difference between an int (4 bytes) make an appreciable difference to a tinyint (1 byte)? Obviously, it depends on the amount of data. If you will have no more than 10 rows, the answer is obviously no. If you will have 10 billion rows, the answer is obviously "Yes". However, IMO, this is premature optimization. It is far better to focus on ensuring correctness first.
For text, you should ask whether your data should support Chinese, Japanese or non-ANSI values (i.e., should you use nvarchar or varchar)? Does this value represent a real world code like a currency code, or bank code which has a specific specification?
Not so sure in MySQL, but in MS SQL it only makes a difference for sufficiently large databases. Typically, I like to use smaller fields for a) the space saving (it never hurts to practice good habits) and b) for the implied validation (if you know a certain field should never be more than 10 characters, why allow eleven, let alone 250?).
I thinks Rudie is wrong, not all INTs are 4 bytes... in MySQL you have:
tinyint = 1 byte,
smallint = 2 bytes,
mediumint = 3 bytes,
int = 4 bytes,
bigint = 8 bytes.
I think Rudie refers to the "display with" that is the number you put between parenthesis when you are creating a column, e.g.:
age INT(3)
You're telling to the RDBMS just to SHOW no more than 3 numbers.
And VARCHARs are (variable length charcter string) so if you declare let's say name varchar(5000) and you store a name like "Mario" you only are using 7 bytes (5 for the data and 2 for the length of the value).
The correct field size serves to limit the bad data that can be put in. For instance suppose you have a phone number field. If you allow 250 characters, you will often end up with things like the following in the phone field (an example not taken at random):
Call the good-looking blonde secretary instead.
So first limiting the length is part of how we enforce data integrity rules. As such it is critical.
Second, there is only so much space on a datapage and while some databases will allow you to create tables where the potential record is longer than the width of the data page, they often will not allow you to actually exceed it when storing the data. This can lead to some very hard to find bugs when suddenly one record can't be saved. I don't know about MySql and whether it does this but I know SQL Server does and it is very hard to figure out what is wrong. So making data the correct size can be critical to preventing bugs.

When setting MySQL schema, why use certain types?

When I'm setting up a MySQL table, it asks me to define the name of the column, type of input, and length. My assumption, without having read anything about it, is that it's for minimization. Specify the smallest possible int/smallint/tinyint for your needs, and it will reduce overhead of some sort. If it's all positives, make it unsigned to double your space, etc.
What happens if I just make every field a varchar-200 characters? When/why is this bad, what will I miss out on, and when will any inefficiencies manifest themselves? 100k records?
I think about this every time I set up a DB, but I haven't built anything to scale enough where I've ever had my scheme setup inappropriately, either too "strict/small" or "loose/big". Can someone confirm that I'm making good assumptions about speed and efficiency?
Thanks!
Data types not only optimize storage, but how data is indexed. As your databases get bigger, it will become apparent that it's quicker to search for all the records that have a 1 in an integer field than those that have a "1" in a varchar field. This becomes especially important when you're joining data from more than one table and your database engine is having to do this sort of thing repeatedly. (Daren also rightly points out below that it's important that the types of the fields you're matching on are identical as well.)
The level at which these inefficiencies become an issue depends greatly on your hardware and your application design. We have big enough iron these days that if you're building moderate-scale apps, you may not see an appreciable difference. (Aside from feeling a little bit guilty about your database design!) But establishing good habits on small projects makes the bigger ones easier when they come along.
If you have two columns as varchar and put in the values 10 and 20 and add them, you'll get 1020 instead of 30 which you'd likely expect.
Sure, you could save everything as VARCHAR strings. But you'd be giving up a lot of functionality provided by the database engine.
You should choose the database type that most closely matches the intended use of the column. For example, using DATE or DATETIME to store dates provides you with all sorts of date/time functions that you don't get with basic VARCHAR types.
Likewise, fields used to count things or provide simple unique IDs should be INT or one of its related types. Also bear in mind that an INT occupies only 4 bytes, whereas a 9-digit string uses at least 9 bytes.
For character data, it's wise to use NVARCHAR for internationalized values that users in any locale are going to enter (esp. names and locations). If you know the text is limited to US or internal use only, VARCHAR is safe.