MySQL: Table optimization for a column holding the word 'guest' or a memberid

This question is for MySQL (it allows many NULLs in the column which is UNIQUE, so the solution for my question could be slightly different).
There are two tables: members and Table2.
Table members has:
memberid char(20), the primary key. (Please do not recommend using int(11) instead of char(20) for memberid; I can't change it, as it contains exactly 20 symbols.)
Table2 has:
CREATE TABLE IF NOT EXISTS `Table2` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `memberid` varchar(20) NOT NULL,
  `Time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `status` tinyint(4) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;
Table2.memberid is either the word 'guest' (which can repeat many times) or a value from members.memberid (which can also repeat many times). Any value in the Table2.memberid column (if not 'guest') exists in the members.memberid column. Again, members.memberid is unique; Table2.memberid, even excluding the 'guest' rows, is not unique.
So, Table2.memberid column looks like:
'guest'
'lkjhasd3lkjhlkjg8sd9'
'kjhgbkhgboi7sauyg674'
'guest'
'guest'
'guest'
'lkjhasd3lkjhlkjg8sd9'
Table2 receives INSERTs and UPDATEs only. The updates touch only status. The criterion for updating status is: set status=0 WHERE memberid='' and status=1. So a row can be updated once or not at all. As a result, the number of UPDATEs is less than or equal to (by statistics, about half) the number of INSERTs.
The question is only about optimization.
The question can be split as:
1) Do you HIGHLY recommend replacing the word 'guest' with NULL, or with a special 'xxxxxyyyyyzzzzz00000' (20 symbols, a 'very special and reserved' string), so that char(20) can be used for Table2.memberid, because all values would then be char(20)?
2) What about using a foreign key? I can't use it because of the value 'guest'. That value can NOT be in members.memberid column.
In other words, I need some help to decide:
whether I can use 'guest' (I like that word) -vs- choosing a 20-char reserved string so I can use char(20) instead of varchar(20) -vs- keeping NULLs instead of 'guest';
all values except 'guest' are actually foreign keys. Is there any possible way to use this information to increase performance?
That table is used pretty often so I have to build Table2 as good as I can. Any idea is highly appreciated.
Thank you.
Added:
Well... I think I have found a good solution, that allows me to treat memberid as a foreign key.

1) Do you HIGHLY recommend to replace the word 'guest' to NULL or to a
special 'xxxxxyyyyyzzzzz00000' (20 symbols like a 'very special and
reserved' string) so you can use chars(20) for Table2.memberid,
because all values are char(20)?
Mixing values from different domains always causes trouble. The best thing to do is fix the underlying structural problem. Bad design can be really expensive to work around, and really expensive to fix.
Here's the issue in a nutshell. The simplest data integrity constraint for this kind of issue is a foreign key constraint. You can't use one, because "guest" isn't a memberid. (Member ids are from one domain; "guest" isn't part of that domain; you're mixing values from two domains.) Using NULL to identify a guest doesn't help much; you can't distinguish guests from members whose memberid is missing. (Using NULL to identify anything is usually a bad idea.)
If you can use a special 20-character member id to identify all guests, it might be wise to do so. You might be lucky, in that "guest" is five letters. If you can use "guestguestguestguest" for the guests without totally screwing your application logic, I'd really consider that first. (But, you said that seems to treat guests as logged in users, which I think makes things break.)
Retrofitting a "users" supertype is possible, I think, and this might prove to be the best overall solution. The supertype would let you treat members and guests as the same sometimes (because they're not utterly different), and as different at other times (because they're not entirely the same). A supertype also allows both individuals (members) and aggregate users (guests all lumped together) without undue strain. And it would unify the two domains, so you could use foreign key constraints for members. But it would require changing the program logic.
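A minimal sketch of that supertype, under my own assumptions about names (`users`, `usertype`) and using the reserved 20-character guest id suggested earlier; one possible shape, not a drop-in fix:

```sql
-- Hypothetical supertype: one row per member, plus one reserved row for guests.
CREATE TABLE users (
  userid   CHAR(20) NOT NULL,
  usertype ENUM('member', 'guest') NOT NULL,
  PRIMARY KEY (userid)
) ENGINE=InnoDB;

-- members becomes a subtype of users.
CREATE TABLE members (
  memberid CHAR(20) NOT NULL,
  PRIMARY KEY (memberid),
  FOREIGN KEY (memberid) REFERENCES users (userid)
) ENGINE=InnoDB;

-- A single reserved row stands in for all guests.
INSERT INTO users (userid, usertype) VALUES ('guestguestguestguest', 'guest');

-- Table2.memberid can now be a real foreign key
-- (after converting it to CHAR(20) to match users.userid,
--  and rewriting existing 'guest' rows to the reserved id).
ALTER TABLE Table2
  MODIFY memberid CHAR(20) NOT NULL,
  ADD FOREIGN KEY (memberid) REFERENCES users (userid);
```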
In Table2 (and do find a better name than that, please), an index on memberid or a composite index on memberid and status will perform just about as well as you can expect. I'm not sure whether a composite index will help; "status" only has two values, so it's not very selective.
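For reference, the two indexing options just mentioned look like this; it's worth benchmarking both against the actual UPDATE criterion (memberid plus status=1):

```sql
-- Single-column index on memberid:
CREATE INDEX idx_t2_memberid ON Table2 (memberid);

-- Or a composite index matching "WHERE memberid = ? AND status = 1":
CREATE INDEX idx_t2_memberid_status ON Table2 (memberid, status);
```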
all values, except 'guest' are actually foreign keys. Is there any
possible way to use this information for increasing the performance?
No, they're not foreign keys. (See above.) True foreign keys would help with data integrity, but not with SELECT performance.
"Increasing the performance" is pretty much meaningless. Performance is a balancing act. If you want to increase performance, you need to specify which part you want to improve. If you want faster inserts, drop indexes and integrity constraints. (Don't do that.) If you want faster SELECT statements, build more indexes. (But more indexes slow the INSERTs.)
You can speed up all database performance by moving to hardware that speeds up all database performance. (ahem) Faster processor, faster disks, faster disk subsystem, more memory (usually). Moving critical tables or indexes to a solid-state disk might blow your socks off.
Tuning your server can help. But keep an eye on overall performance. Don't get so caught up in speeding up one query than you degrade performance in all the others. Ideally, write a test suite and decide what speed is good enough before you start testing. For example, say you have one query that takes 30 seconds. What's acceptable improvement? 20 seconds? 15 seconds? 2 milliseconds sounds good, but is an unlikely target for a query that takes 30 seconds. (Although I've seen that kind of performance increase by moving to better table and index structure.)


How to design table with primary key, index, unique in SQL [duplicate]

Here we go again, the old argument still arises...
Would we better have a business key as a primary key, or would we rather have a surrogate id (i.e. an SQL Server identity) with a unique constraint on the business key field?
Please, provide examples or proof to support your theory.
Just a few reasons for using surrogate keys:
Stability: Changing a key because of a business or natural need will negatively affect related tables. Surrogate keys rarely, if ever, need to be changed because there is no meaning tied to the value.
Convention: Allows you to have a standardized Primary Key column naming convention rather than having to think about how to join tables with various names for their PKs.
Speed: Depending on the PK value and type, a surrogate key of an integer may be smaller, faster to index and search.
Both. Have your cake and eat it.
Remember there is nothing special about a primary key, except that it is labelled as such. It is nothing more than a NOT NULL UNIQUE constraint, and a table can have more than one.
If you use a surrogate key, you still want a business key to ensure uniqueness according to the business rules.
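As a sketch, with hypothetical names: the surrogate id is the PRIMARY KEY, and the business key sits alongside it as NOT NULL UNIQUE:

```sql
CREATE TABLE customers (
  customer_id INT UNSIGNED NOT NULL AUTO_INCREMENT,  -- surrogate key
  tax_number  VARCHAR(20)  NOT NULL,                 -- business key
  name        VARCHAR(100) NOT NULL,
  PRIMARY KEY (customer_id),
  UNIQUE KEY uq_customers_tax_number (tax_number)    -- business uniqueness still enforced
) ENGINE=InnoDB;
```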
It appears that no one has yet said anything in support of non-surrogate (I hesitate to say "natural") keys. So here goes...
A disadvantage of surrogate keys is that they are meaningless (cited as an advantage by some, but...). This sometimes forces you to join a lot more tables into your query than should really be necessary. Compare:
select sum(t.hours)
from timesheets t
where t.dept_code = 'HR'
and t.status = 'VALID'
and t.project_code = 'MYPROJECT'
and t.task = 'BUILD';
against:
select sum(t.hours)
from timesheets t
join departments d on d.dept_id = t.dept_id
join timesheet_statuses s on s.status_id = t.status_id
join projects p on p.project_id = t.project_id
join tasks k on k.task_id = t.task_id
where d.dept_code = 'HR'
and s.status = 'VALID'
and p.project_code = 'MYPROJECT'
and k.task_code = 'BUILD';
Unless anyone seriously thinks the following is a good idea?:
select sum(t.hours)
from timesheets t
where t.dept_id = 34394
and t.status_id = 89
and t.project_id = 1253
and t.task_id = 77;
"But" someone will say, "what happens when the code for MYPROJECT or VALID or HR changes?" To which my answer would be: "why would you need to change it?" These aren't "natural" keys in the sense that some outside body is going to legislate that henceforth 'VALID' should be re-coded as 'GOOD'. Only a small percentage of "natural" keys really fall into that category - SSN and Zip code being the usual examples. I would definitely use a meaningless numeric key for tables like Person, Address - but not for everything, which for some reason most people here seem to advocate.
See also: my answer to another question
A surrogate key will NEVER have a reason to change. I cannot say the same about natural keys. Last names, emails, ISBN numbers - they can all change one day.
Surrogate keys (typically integers) have the added value of making your table relations faster, and more economical in storage and update speed (even better, foreign keys do not need to be updated when using surrogate keys, in contrast with business key fields, which do change now and then).
A table's primary key should be used for identifying uniquely the row, mainly for join purposes. Think a Persons table: names can change, and they're not guaranteed unique.
Think Companies: you're a happy Merkin company doing business with other companies in Merkia. You are clever enough not to use the company name as the primary key, so you use Merkia's government's unique company ID in its entirety of 10 alphanumeric characters.
Then Merkia changes the company IDs because they thought it would be a good idea. It's ok, you use your db engine's cascaded updates feature, for a change that shouldn't involve you in the first place. Later on, your business expands, and now you work with a company in Freedonia. Freedonian company id are up to 16 characters. You need to enlarge the company id primary key (also the foreign key fields in Orders, Issues, MoneyTransfers etc), adding a Country field in the primary key (also in the foreign keys). Ouch! Civil war in Freedonia, it's split in three countries. The country name of your associate should be changed to the new one; cascaded updates to the rescue. BTW, what's your primary key? (Country, CompanyID) or (CompanyID, Country)? The latter helps joins, the former avoids another index (or perhaps many, should you want your Orders grouped by country too).
All these are not proof, but an indication that a surrogate key to uniquely identify a row for all uses, including join operations, is preferable to a business key.
I hate surrogate keys in general. They should only be used when there is no quality natural key available. It is rather absurd when you think about it, to think that adding meaningless data to your table could make things better.
Here are my reasons:
When using natural keys, tables are clustered in the way that they are most often searched thus making queries faster.
When using surrogate keys you must add unique indexes on logical key columns. You still need to prevent logical duplicate data. For example, you can’t allow two Organizations with the same name in your Organization table even though the pk is a surrogate id column.
When surrogate keys are used as the primary key it is much less clear what the natural primary keys are. When developing you want to know what set of columns make the table unique.
In one to many relationship chains, the logical key chains. So for example, Organizations have many Accounts and Accounts have many Invoices. So the logical-key of Organization is OrgName. The logical-key of Accounts is OrgName, AccountID. The logical-key of Invoice is OrgName, AccountID, InvoiceNumber.
When surrogate keys are used, the key chains are truncated: each table has a foreign key only to its immediate parent. For example, the Invoice table does not have an OrgName column; it only has a column for the AccountID. If you want to search for invoices for a given organization, then you will need to join the Organization, Account, and Invoice tables. If you use logical keys, you can query the Invoice table directly.
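The chain described above can be sketched like this (column types are my assumption); note that the invoice query needs no joins:

```sql
CREATE TABLE organizations (
  org_name VARCHAR(50) NOT NULL,
  PRIMARY KEY (org_name)
);

CREATE TABLE accounts (
  org_name   VARCHAR(50) NOT NULL,
  account_id VARCHAR(20) NOT NULL,
  PRIMARY KEY (org_name, account_id),
  FOREIGN KEY (org_name) REFERENCES organizations (org_name)
);

CREATE TABLE invoices (
  org_name       VARCHAR(50) NOT NULL,
  account_id     VARCHAR(20) NOT NULL,
  invoice_number INT NOT NULL,
  PRIMARY KEY (org_name, account_id, invoice_number),
  FOREIGN KEY (org_name, account_id) REFERENCES accounts (org_name, account_id)
);

-- Invoices for an organization, no joins required:
SELECT invoice_number FROM invoices WHERE org_name = 'Some Org';
```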
Storing surrogate key values of lookup tables causes tables to be filled with meaningless integers. To view the data, complex views must be created that join to all of the lookup tables. A lookup table is meant to hold a set of acceptable values for a column. It should not be codified by storing an integer surrogate key instead. There is nothing in the normalization rules that suggest that you should store a surrogate integer instead of the value itself.
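A lookup table keyed by the value itself might look like this (a hypothetical example); rows in the referencing table stay human-readable:

```sql
CREATE TABLE order_statuses (
  status VARCHAR(10) NOT NULL,
  PRIMARY KEY (status)
);

INSERT INTO order_statuses (status) VALUES ('NEW'), ('SHIPPED'), ('CANCELLED');

CREATE TABLE orders (
  order_id INT NOT NULL AUTO_INCREMENT,
  status   VARCHAR(10) NOT NULL,
  PRIMARY KEY (order_id),
  FOREIGN KEY (status) REFERENCES order_statuses (status)
);

-- No join needed to read the status:
-- SELECT order_id, status FROM orders;
```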
I have three different database books. Not one of them shows using surrogate keys.
I want to share my experience with you on this endless war :D on natural vs surrogate key dilemma. I think that both surrogate keys (artificial auto-generated ones) and natural keys (composed of column(s) with domain meaning) have pros and cons. So depending on your situation, it might be more relevant to choose one method or the other.
As it seems that many people present surrogate keys as the almost perfect solution and natural keys as the plague, I will focus on the other point of view's arguments:
Disadvantages of surrogate keys
Surrogate keys are:
Source of performance problems:
They are usually implemented using auto-incremented columns, which means:
A round-trip to the database each time you want a new id (I know this can be improved using caching or [seq]hilo-like algorithms, but those methods still have their own drawbacks).
If one day you need to move your data from one schema to another (it happens quite regularly in my company, at least), then you might encounter id collision problems. And yes, I know you can use UUIDs, but those require 32 hexadecimal digits! (If you care about database size, that can be an issue.)
If you are using one sequence for all your surrogate keys then - for sure - you will end up with contention on your database.
Error prone. A sequence has a max_value limit, so - as a developer - you have to pay attention to the following points:
You must cycle your sequence (when the max value is reached, it goes back to 1, 2, ...).
If you are using the sequence to order your data over time, then you must handle the case of cycling (a row with id 1 might be newer than a row with id max_value - 1).
Make sure that your code (and even your client interfaces, which should not happen since it is supposed to be an internal id) supports the 32-bit/64-bit integers you use to store your sequence values.
They don't guarantee non-duplicated data. You can always have two rows with all the same column values but different generated key values. For me, this is THE problem with surrogate keys from a database design point of view.
More in Wikipedia...
Myths on natural keys
Composite keys are less efficient than surrogate keys. No! It depends on the database engine used (Oracle and MySQL behave differently here, for example).
Natural keys don't exist in real life. Sorry, but they do exist! In the aviation industry, for example, the following tuple will always be unique for a given scheduled flight: (airline, departureDate, flightNumber, operationalSuffix). More generally, when a set of business data is guaranteed to be unique by a given standard, that set of data is a [good] natural-key candidate.
Natural keys "pollute the schema" of child tables. For me this is more a feeling than a real problem. Having a four-column primary key of 2 bytes each might be more efficient than a single column of 11 bytes. Besides, the four columns can be used to query the child table directly (with those four columns in a WHERE clause) without joining to the parent table.
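The aviation tuple above, expressed as DDL (the column types are my assumption):

```sql
CREATE TABLE scheduled_flights (
  airline            CHAR(2)  NOT NULL,  -- e.g. IATA carrier code
  departure_date     DATE     NOT NULL,
  flight_number      SMALLINT NOT NULL,
  operational_suffix CHAR(1)  NOT NULL DEFAULT '',
  PRIMARY KEY (airline, departure_date, flight_number, operational_suffix)
);
```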
Conclusion
Use natural keys when it is relevant to do so and use surrogate keys when it is better to use them.
Hope that this helped someone!
Always use a key that has no business meaning. It's just good practice.
EDIT: I was trying to find a link to it online, but I couldn't. However, 'Patterns of Enterprise Application Architecture' [Fowler] has a good explanation of why you shouldn't use anything other than a key with no meaning beyond being a key. It boils down to the fact that it should have one job and one job only.
Surrogate keys are quite handy if you plan to use an ORM tool to handle/generate your data classes. While you can use composite keys with some of the more advanced mappers (read: hibernate), it adds some complexity to your code.
(Of course, database purists will argue that even the notion of a surrogate key is an abomination.)
I'm a fan of using uids for surrogate keys when suitable. The major win with them is that you know the key in advance e.g. you can create an instance of a class with the ID already set and guaranteed to be unique whereas with, say, an integer key you'll need to default to 0 or -1 and update to an appropriate value when you save/update.
UIDs have penalties in terms of lookup and join speed though so it depends on the application in question as to whether they're desirable.
Using a surrogate key is better in my opinion as there is zero chance of it changing. Almost anything I can think of which you might use as a natural key could change (disclaimer: not always true, but commonly).
An example might be a DB of cars - at first glance, you might think the licence plate could be used as the key. But plates can be changed, so that would be a bad idea. You wouldn't really want to find that out after releasing the app, when someone comes to you wanting to know why they can't change their number plate to their shiny new personalised one.
Always use a single-column surrogate key if at all possible. This makes joins, as well as inserts/updates/deletes, much cleaner because you're only responsible for tracking a single piece of information to maintain the record.
Then, as needed, stack your business keys as unique constraints or indexes. This will keep your data integrity intact.
Business logic and natural keys can change, but the physical key of a table should NEVER change.
Case 1: Your table is a lookup table with less than 50 records (50 types)
In this case, use manually named keys, according to the meaning of each record.
For Example:
Table: JOB with 50 records
CODE (primary key) NAME DESCRIPTION
PRG PROGRAMMER A programmer is writing code
MNG MANAGER A manager is doing whatever
CLN CLEANER A cleaner cleans
...............
joined with
Table: PEOPLE with 100000 records
foreign key JOBCODE in table PEOPLE
looks at
primary key CODE in table JOB
Case 2: Your table is a table with thousands of records
Use surrogate/autoincrement keys.
For Example:
Table: ASSIGNMENT with 1000000 records
joined with
Table: PEOPLE with 100000 records
foreign key PEOPLEID in table ASSIGNMENT
looks at
primary key ID in table PEOPLE (autoincrement)
In the first case:
You can select all programmers in table PEOPLE without use of join with table JOB, but just with: SELECT * FROM PEOPLE WHERE JOBCODE = 'PRG'
In the second case:
Your database queries are faster because your primary key is an integer
You don't need to bother yourself with finding the next unique key because the database itself gives you the next autoincrement.
Surrogate keys can be useful when business information can change or be identical. Business names don't have to be unique across the country, after all. Suppose you deal with two businesses named Smith Electronics, one in Kansas and one in Michigan. You can distinguish them by address, but that'll change. Even the state can change; what if Smith Electronics of Kansas City, Kansas moves across the river to Kansas City, Missouri? There's no obvious way of keeping these businesses distinct with natural key information, so a surrogate key is very useful.
Think of the surrogate key like an ISBN number. Usually, you identify a book by title and author. However, I've got two books titled "Pearl Harbor" by H. P. Willmott, and they're definitely different books, not just different editions. In a case like that, I could refer to the looks of the books, or the earlier versus the later, but it's just as well I have the ISBN to fall back on.
In a data warehouse scenario, I believe it is better to follow the surrogate-key path. Two reasons:
You are independent of the source system, and changes there --such as a data type change-- won't affect you.
Your DW will need less physical space since you will use only integer data types for your surrogate keys. Also your indexes will work better.
As a reminder, it is not good practice to place clustered indexes on random surrogate keys, i.e. GUIDs that read XY8D7-DFD8S, as SQL Server has no way to physically sort such data. You should instead place unique indexes on these columns, though it may also be beneficial to simply run SQL Profiler on the main table operations and then feed that data into the Database Engine Tuning Advisor.
See thread # http://social.msdn.microsoft.com/Forums/en-us/sqlgetstarted/thread/27bd9c77-ec31-44f1-ab7f-bd2cb13129be
This is one of those cases where a surrogate key pretty much always makes sense. There are cases where you either choose what's best for the database or what's best for your object model, but in both cases, using a meaningless key or GUID is a better idea. It makes indexing easier and faster, and it is an identity for your object that doesn't change.
In the case of a point-in-time database, it is best to have a combination of surrogate and natural keys. For example, you need to track member information for a club. Some attributes of a member never change, e.g. date of birth, but a name can change.
So create a Member table with a member_id surrogate key and a column for DOB.
Create another table called person_name with columns for member_id, member_fname, member_lname, and date_updated. In this table the natural key would be member_id + date_updated.
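A sketch of that split, with assumed column types:

```sql
CREATE TABLE member (
  member_id INT  NOT NULL AUTO_INCREMENT,  -- surrogate key
  dob       DATE NOT NULL,                 -- never changes
  PRIMARY KEY (member_id)
);

CREATE TABLE person_name (
  member_id    INT         NOT NULL,
  member_fname VARCHAR(50) NOT NULL,
  member_lname VARCHAR(50) NOT NULL,
  date_updated DATETIME    NOT NULL,
  PRIMARY KEY (member_id, date_updated),   -- the natural key: member + time
  FOREIGN KEY (member_id) REFERENCES member (member_id)
);
```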
Horses for courses. To state my bias: I'm a developer first, so I'm mainly concerned with giving the users a working application.
I've worked on systems with natural keys, and had to spend a lot of time making sure that value changes would ripple through.
I've worked on systems with only surrogate keys, and the only drawback has been a lack of denormalised data for partitioning.
Most traditional PL/SQL developers I have worked with didn't like surrogate keys because of the number of tables per join, but our test and production databases never broke a sweat; the extra joins didn't affect application performance. With database dialects that don't support clauses like "X inner join Y on X.a = Y.b", or developers who don't use that syntax, the extra joins for surrogate keys do make queries harder to read and longer to type and check: see Tony Andrews' post. But if you use an ORM or any other SQL-generation framework, you won't notice it. Touch-typing also mitigates the extra typing.
Maybe not completely relevant to this topic, but here's a headache I have dealing with surrogate keys. Oracle's pre-delivered analytics creates auto-generated SKs on all of its dimension tables in the warehouse, and it also stores those on the facts. So, any time the dimensions need to be reloaded (as new columns are added, or need to be populated for all items in the dimension), the SKs assigned during the update fall out of sync with the original values stored on the fact, forcing a complete reload of all fact tables that join to it. I would prefer that, even if the SK is a meaningless number, there were some way it could not change for original/old records. As many know, out-of-the-box rarely serves an organization's needs, and we have to customize constantly. We now have 3 years' worth of data in our warehouse, and complete reloads from the Oracle Financial systems are very large. So in my case they are not generated from data entry, but added in a warehouse to help reporting performance. I get it, but ours do change, and it's a nightmare.

MySQL: does it matter which column I COUNT() by?

I've been wondering about this for quite some time. Is it better to do this where the primary key ticket_id is counted:
SELECT COUNT(ticket_id)
FROM tickets
WHERE ticket_country_id = 238
Or to do this:
SELECT COUNT(ticket_country_id)
FROM tickets
WHERE ticket_country_id = 238
In this case ticket_country_id is an indexed foreign key, but we could also assume it's just a non-indexed column (perhaps the answer would be different for non-indexed columns)
In other words, does it matter that I am calling on another column for the COUNT()?
Obviously the performance saving would probably be small, but I like to do things the best way.
Yes, it can matter. SELECT COUNT(*) allows the DB to use whatever resources make sense and are most efficient. It can do a table scan, or use the primary key or another index to answer your question.
COUNT(something-else) means count the non-NULL values. Again, the DB can use several methods, such as indexes, if such things are available - but you are then asking a different question.
As is often the case with SQL, it's better to ask the question you want answered than to play silly games trying to game the system for a few milliseconds here and there.
That helps your future colleagues too, by clearly stating what you are trying to do.
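To make the semantic difference concrete: COUNT(*) counts rows, while COUNT(col) counts non-NULL values of col. With the WHERE clause from the question, the two coincide, because the filtered column cannot be NULL in the matched rows:

```sql
SELECT COUNT(*)                 FROM tickets;  -- every row
SELECT COUNT(ticket_id)         FROM tickets;  -- rows where ticket_id IS NOT NULL
SELECT COUNT(ticket_country_id) FROM tickets;  -- rows where ticket_country_id IS NOT NULL

-- Under WHERE ticket_country_id = 238, all three return the same number,
-- since every matched row has a non-NULL ticket_country_id.
```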

Email address as select index in mysql for huge table query speed

I'm wondering about using emails for indexing. I realise that this is sub-optimal and that it's better to use an auto-incremented primary key. But in this case I'm trying to develop a lite application that does not require account registration to use.
SELECT account_id, account_balance, account_payments_received
FROM accounts
WHERE account_email = ?
LIMIT 1
This works ok at the moment with few users. But I'm concerned about when it reaches a million or more. Is there any way to index emails quickly?
I was thinking maybe I could use the first and second characters as keys? Maybe develop an index number for a=1, b=2, c=3 and so on.....
What do you guys suggest?
1) You should keep a primary key with auto_increment, because it will give you efficiency when joining with other tables.
2) Make the account_email field varchar(255) instead of char(255), so you get the unused bytes back. Even varchar(100) will be enough.
3) Create a prefix index on this field, as below:
alter table accounts add index idx_account_email(account_email(50));
Note: a 50-character prefix will cover almost all (99%+) email addresses.
I think you will find that any modern database will be able to perform this query (particularly if it does NOT use LIKE), even on a table with a million rows, in a fraction of a second. Just make sure you have an index on the column. I would also add an auto-increment field, though, as it will always be simpler and quicker to use an integer to fetch a row.
What you are engaged in is premature optimisation.

web application user table primary key: surrogate key vs username vs email vs customer Id

I am trying to design an ecommerce web application in MySQL, and I am having problems choosing the correct primary key for the user table. The example given is just a sample for illustration.
user table have following definition
CREATE TABLE IF NOT EXISTS `mydb`.`user` (
`id` INT NOT NULL ,
`username` VARCHAR(25) NOT NULL ,
`email` VARCHAR(25) NOT NULL ,
`external_customer_id` INT NOT NULL ,
`subscription_end_date` DATETIME NULL ,
`column_1` VARCHAR(45) NULL ,
`column_2` VARCHAR(45) NULL ,
`colum_3` VARCHAR(45) NULL ,
PRIMARY KEY (`id`) ,
UNIQUE INDEX `username_UNIQUE` (`username` ASC) ,
UNIQUE INDEX `email_UNIQUE` (`email` ASC) ,
UNIQUE INDEX `customer_id_UNIQUE` (`external_customer_id` ASC) )
ENGINE = InnoDB
I am facing following issues with the primary key candidate columns:
Id column
Pros
No business meaning (stable primary key)
faster table joins
compacter index
cons
not a "natural" key
All attribute table(s) must be joined with the "master" user table, thus non-joining direct queries are not possible
causes less "natural" SQL queries
Leaks information: a user can figure out the number of registered users if the start value is 0 (changing the start value sorts this out)
A user who registers a profile as user_A at time_X and some time later as user_B at time_Y will easily be able to estimate the number of registrations over that period: ((id for user_B) - (id for user_A)) / (time_Y - time_X)
email column
Pros
None
Cons
a user should be able to change the email address. Not suitable for primary key
username column
Pros
a "natural" primary key
Less table joins
simpler and more "natural" queries
Cons
varchar column is slower when joining tables
an index on a varchar column is less compact than int column index
very difficult to change a username, since foreign keys depend on the value. Solution: "sync" all foreign keys in the application, or don't allow a user to change the username, e.g. a user should delete the profile and register a new one
external_customer column
pros
can be used as an external reference for a customer and holds no information (maybe non-editable username can be used instead? )
cons
might leak information if it is auto-incremental (if possible)
problematic to generate a unique value if an auto-incremental surrogate id is already in use, since MySQL's InnoDB engine does not allow multiple auto_increment columns in the same table
What are the common practices when choosing user-table primary keys for a
scalable ecommerce web application? All feedback appreciated.
I don't have anything to say about some of your analysis. If I've cut some of your pros or cons, that only means I don't think I have anything useful to add.
Id column
Pros
No business meaning (stable primary key)
faster table joins
compacter index
First, any column or set of columns declared NOT NULL UNIQUE has all the properties of a primary key. You can use any of them as the target for a foreign key reference, which is what all this is really about.
In your case, your structure allows 4 columns to be targets of a foreign key reference: id, username, email, and external_customer_id. You don't have to use the same one all the time. It might make sense to use id for 90% of your FK references, and email for 10% of them.
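For example (a hypothetical child table): standard SQL allows a foreign key to reference any NOT NULL UNIQUE column set, not just the primary key, so referencing the user's email is legal:

```sql
-- Hypothetical table referencing user.email instead of user.id:
CREATE TABLE newsletter_subscriptions (
  email     VARCHAR(25) NOT NULL,
  list_name VARCHAR(45) NOT NULL,
  PRIMARY KEY (email, list_name),
  FOREIGN KEY (email) REFERENCES `user` (email)
) ENGINE=InnoDB;
```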
Stability doesn't have anything to do with whether a column has business meaning. Stability has to do with how often, and under what circumstances, a value might change. "Stable" doesn't mean "immutable" unless you're running Oracle. (Oracle can't do ON UPDATE CASCADE.)
Depending on your table structure and indexing, a natural key might perform faster. Natural keys make some joins unnecessary. I did tests before I built our production database. It's probably going to be decades before we reach the point that joins on ID numbers will outperform fewer joins and natural keys. I've written about those tests either on SO or on DBA.
You have three other unique indexes. (Good for you. I think at least 90% of the people who build a database don't get that right.) So it's not just that an index on an ID number is more compact than any of those three; it's also an additional index. (In this table.)
email column
Pros
None
An email address can be considered stable and unique. You can't stop people from sharing email addresses, regardless of whether it's the target for a foreign key reference.
But email addresses can be "lost". In the USA, most university students lose their *.edu email addresses within a year or so of graduation. If your email address comes through a domain you're paying for, and you stop paying, the address goes away. I imagine it's possible for addresses like those to be given to new users. Whether that creates an unbearable burden is application-dependent.
Cons
A user should be able to change the email address. Not suitable for a primary key.
All values in a SQL database can be changed. It's only unsuitable if your environment doesn't let your dbms honor an ON UPDATE CASCADE declaration in a timely manner. My environment does. (But I run PostgreSQL on decent, unshared hardware.) YMMV.
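As a sketch of what honoring such a declaration looks like (the table and column names here are illustrative, not from the poster's schema):

```sql
-- Hypothetical natural-key design: members keyed by email address.
CREATE TABLE members (
    email    VARCHAR(254) NOT NULL PRIMARY KEY,
    realname VARCHAR(100) NOT NULL
) ENGINE=InnoDB;

CREATE TABLE orders (
    order_no INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    email    VARCHAR(254) NOT NULL,
    FOREIGN KEY (email) REFERENCES members (email)
        ON UPDATE CASCADE  -- an address change propagates to every order row
) ENGINE=InnoDB;

-- Changing the "unstable" value is then a single statement; the dbms
-- rewrites the matching rows in orders for you:
UPDATE members SET email = 'new@example.com' WHERE email = 'old@example.com';
```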
username column
Pros
a "natural" primary key
Fewer table joins
simpler and more "natural" queries
Fewer joins is an important point. I have been on consulting gigs where the mindless use of ID numbers made people write queries with 40+ joins. Judicious use of natural keys eliminated up to 75% of them.
It's not important to always use surrogate keys as the target for your foreign keys (unless Oracle) or to always use natural keys as the target. It's important to think.
Cons
varchar column is slower when joining tables
an index on a varchar column is less compact than an index on an int column
You can't really say that joining on a varchar() is slower without qualifying that claim. The fact is that, although most joins on varchar() are slower than joins on id numbers, they're not necessarily so slow that you can't use them. If a query takes 4ms with id numbers, and 6ms with varchar(), I don't think that's a good reason to disqualify the varchar(). Also, using a natural key will eliminate a lot of joins, so overall system response might be faster. (Other things being equal, 40 4ms joins will underperform 10 6ms joins.)
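To make the join-elimination point concrete, here is a hypothetical pair of queries (neither table is from the original post):

```sql
-- Surrogate-key design: displaying the member's email requires a join.
SELECT o.order_no, m.email
FROM   orders  o
JOIN   members m ON m.id = o.member_id;

-- Natural-key design: orders.email is itself the foreign key,
-- so the same report needs no join at all.
SELECT order_no, email
FROM   orders;
```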
I can't recall any case in my database career (25+ years) where the width of an index was the deciding factor in choosing the target for a foreign key.
external_customer_id column
Pros
can be used as an external reference for a customer and holds no information (maybe a non-editable username can be used instead?)
There are actually few systems that let me change my username. Most will let me change my real name (I think), but not my username. I think an uneditable username is completely reasonable.
In general, web applications try to keep their database schema away from the customer - including primary keys. I think you're conflating your schema design with authentication methods - there's nothing stopping you from allowing users to log in with their email address, even if your database design uses an integer to uniquely identify them.
Whenever I've designed systems like this, I've used an ID column - either integer or GUID for the primary key. It's fast, doesn't change due to pesky real life situations, and is a familiar idiom to developers.
I've then worked out the best authentication scheme for the app in hand - most people expect to login with their email address these days, so I'd stick with that. Of course, you could also let them login with their Facebook, Twitter, or Google accounts. Has nothing to do with my primary key, though...
I think the username column also has this con:
A user should be able to change the username. Not suitable for primary key.
So for the same reason that you won't use the email I won't use the username. For me the internal user integer id is the best approach.

Can you have multiple Keys in SQL and why would you want that?

Is there a reason why you would want to have multiple KEYs in a TABLE? What is the point of having multiple KEYs in one table?
Here is an example that I found:
CREATE TABLE orders(
id INT UNSIGNED NOT NULL AUTO_INCREMENT,
user_id INT UNSIGNED NOT NULL,
transaction_id VARCHAR(19) NOT NULL,
payment_status VARCHAR(15) NOT NULL,
payment_amount DECIMAL(15) NOT NULL,
payment_time TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY(id),
KEY(user_id)
);
Also, you'll notice the database programmer doesn't make transaction_id a KEY. Is there a reason for this?
KEY in MySQL is an alternate syntax for "index".
Indexes are common across databases, but they aren't covered by the ANSI standard as of yet -- it's a pure miracle things are as similar as they are. It is common to have more than one index associated with a table, because indexes improve data retrieval at the cost of update/delete/insert speed.
Be aware that MySQL (5.x?) automatically creates an index if one doesn't already exist for the primary key of a table.
There are several possible reasons.
Enhance search performance (queries whose WHERE clause filters on indexed columns run faster)
Constrain table contents (a UNIQUE key on a column means the same value can't appear in it twice or more)
These are the most common reasons.
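For instance (an illustrative table, not from the question), a second UNIQUE key alongside the PRIMARY KEY does both jobs at once -- it speeds lookups on email and rejects duplicates:

```sql
CREATE TABLE users (
    id    INT UNSIGNED NOT NULL AUTO_INCREMENT,
    email VARCHAR(254) NOT NULL,
    PRIMARY KEY (id),
    UNIQUE KEY uq_email (email)  -- second key: lookup speed + uniqueness
) ENGINE=InnoDB;

INSERT INTO users (email) VALUES ('a@example.com');  -- ok
INSERT INTO users (email) VALUES ('a@example.com');  -- fails with a duplicate-key error (1062)
```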
In SQL you may only have one PRIMARY KEY per table.
KEY(foo) is bogus syntax in standard SQL. In MySQL KEY foo is a poorly named synonym for INDEX foo and does not impose a UNIQUE constraint. (It is poorly named because it does not actually relate to the functioning of a key.)
There may be multiple UNIQUE INDICES which can play the role of "candidate keys". (A unique constraint can be specified without an associated INDEX, but the point of a "key" is generally a quick look-up.) The point of a PRIMARY KEY is to uniquely identify a single record and is almost exclusively INDEX-backed and may even be directly related to the clustering of the data.
Only the minimal amount of [unique] INDICES required to ensure data validity and meet performance requirements should be used -- they impose performance penalties on the query engine as well as have additional update and maintenance costs.
Non-unique INDEXes (INDEX foo, or in the case of MySQL, KEY foo) are purely to allow the database to optimize queries. They do not really map to "keys" at all; when such an index happens to contain every column a query needs, it is called a "covering index", and if selected by the query planner it can aid performance even though it adds nothing to the logical model itself. (For performance reasons, a database engine may require that FOREIGN KEYs are covered by indices.)
In this case, creating an INDEX (don't think "KEY"!) on user_id will generally (greatly) speed up queries with clauses like:
... WHERE user_id = {somenumber}
Without the INDEX, the above query would require a FULL TABLE SCAN (i.e. reading through every record).
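Roughly, assuming orders started out without the KEY(user_id) shown above, you can watch the plan change (EXPLAIN output abbreviated to the relevant columns):

```sql
EXPLAIN SELECT * FROM orders WHERE user_id = 42;
-- type: ALL               (full table scan)

ALTER TABLE orders ADD KEY (user_id);

EXPLAIN SELECT * FROM orders WHERE user_id = 42;
-- type: ref, key: user_id (index lookup instead of a scan)
```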
While I do not know why transaction_id is not made an index, it might not be required (or even detrimental for the given access patterns) -- consider the case where every query that needs to be fast either:
Does not use transaction_id or;
Also has a user_id = ... or other restriction that can utilize an INDEX. That is, in a case like WHERE user_id = ... AND transaction_id = ..., the query planner will likely first find the records for the matched user and then look for the matching transaction_id -- it still has to do a scan, but only over a much smaller data set than the original table. Only a plain WHERE transaction_id = ... would necessarily require a FULL TABLE SCAN.
If in doubt, use EXPLAIN -- or other query analyzer -- and see what MySQL thinks it should do. As a last note, sometimes estimated query execution plans may differ from actual execution plans and outdated statistics can make the query planner choose a non-ideal plan.
Happy coding.
"Keys" might refer to one of:
Index (search optimization)
Constraint (e.g. foreign key, primary key)
You may want multiple because you need to implement more than one of these features in a single table. It's actually quite common.
In database theory, a key is a constraint that enforces uniqueness. A relvar (analogous to a SQL table) may indeed have more than one candidate key. A classic example is the set of chemical elements, for which name, symbol, atomic number and atomic weight are all keys (and folk will still want to add their own surrogate key to it ;)
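Sketching that example as MySQL DDL (the column types are my own guesses), each candidate key becomes its own UNIQUE declaration, with one of them promoted to primary:

```sql
CREATE TABLE elements (
    atomic_number TINYINT UNSIGNED NOT NULL,  -- 1..118 fits easily
    symbol        VARCHAR(3)   NOT NULL,
    name          VARCHAR(20)  NOT NULL,
    atomic_weight DECIMAL(9,5) NOT NULL,
    PRIMARY KEY (atomic_number),  -- one candidate key chosen as primary
    UNIQUE KEY  (symbol),         -- the remaining candidate keys
    UNIQUE KEY  (name),
    UNIQUE KEY  (atomic_weight)
) ENGINE=InnoDB;
```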
MySQL continues a long tradition of abuse of the word KEY by making it a synonym for INDEX. Clearly, a MySQL table may have as many indexes as deemed necessary for performance for a given set of circumstances.
From your SQL DDL, it seems clear the ID is a surrogate key, so we must look for a natural key with not much to go on. While transaction_id may be a candidate, it is not inconceivable that an order can involve more than one transaction. In practical terms, I think an auditor would be suspicious of multiple orders made by the same user simultaneously, therefore suggest the compound of user_id and payment_time should be a key. However, if transaction_id is not a key then the table would not be fully normalized. Therefore, I'd give the designer the benefit of the doubt and assume transaction_id is also a key.