Strings as Primary Keys in MYSQL Database [closed] - mysql

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 2 years ago.
This post was edited and submitted for review 10 days ago.
Improve this question
I am not very familiar with databases and the theories behind how they work. Is it any slower from a performance standpoint (inserting/updating/querying) to use Strings for Primary Keys than integers?
For Example I have a database that would have about 100 million row like mobile number, name and email. mobile number and email would be unique. so can I have the mobile number or email as a primary key,
well it effect my query performance when I search based on email or mobile number. similarly the primary key well be used as foreign key in 5 to 6 tables or even more.
I am using MySQL database

Technically yes, but if a string makes sense to be the primary key then you should probably use it. This all depends on the size of the table you're making it for and the length of the string that is going to be the primary key (longer strings == harder to compare). I wouldn't necessarily use a string for a table that has millions of rows, but the amount of performance slowdown you'll get by using a string on smaller tables will be minuscule to the headaches that you can have by having an integer that doesn't mean anything in relation to the data.

Another issue with using Strings as a primary key is that because the index is constantly put into sequential order, when a new key is created that would be in the middle of the order the index has to be resequenced... if you use an auto number integer, the new key is just added to the end of the index.

Inserts to a table having a clustered index where the insertion occurs in the middle of the sequence DOES NOT cause the index to be rewritten. It does not cause the pages comprising the data to be rewritten. If there is room on the page where the row will go, then it is placed in that page. The single page will be reformatted to place the row in the right place in the page. When the page is full, a page split will happen, with half of the rows on the page going to one page, and half going on the other. The pages are then relinked into the linked list of pages that comprise a tables data that has the clustered index. At most, you will end up writing 2 pages of database.

Strings are slower in joins and in real life they are very rarely really unique (even when they are supposed to be). The only advantage is that they can reduce the number of joins if you are joining to the primary table only to get the name. However, strings are also often subject to change thus creating the problem of having to fix all related records when the company name changes or the person gets married. This can be a huge performance hit and if all tables that should be related somehow are not related (this happens more often than you think), then you might have data mismatches as well. An integer that will never change through the life of the record is a far safer choice from a data integrity standpoint as well as from a performance standpoint. Natural keys are usually not so good for maintenance of the data.
I also want to point out that the best of both worlds is often to use an autoincrementing key (or in some specialized cases, a GUID) as the PK and then put a unique index on the natural key. You get the faster joins, you don;t get duplicate records, and you don't have to update a million child records because a company name changed.

Too many variables. It depends on the size of the table, the indexes, nature of the string key domain...
Generally, integers will be faster. But will the difference be large enough to care? It's hard to say.
Also, what is your motivation for choosing strings? Numeric auto-increment keys are often so much easier as well. Is it semantics? Convenience? Replication/disconnected concerns? Your answer here could limit your options. This also brings to mind a third "hybrid" option you're forgetting: Guids.

It doesn't matter what you use as a primary key so long as it is UNIQUE. If you care about speed or good database design use the int unless you plan on replicating data, then use a GUID.
If this is an access database or some tiny app then who really cares. I think the reason why most of us developers slap the old int or guid at the front is because projects have a way of growing on us, and you want to leave yourself the option to grow.

Don't worry about performance until you have got a simple and sound design that agrees with the subject matter that the data describes and fits well with the intended use of the data. Then, if performance problems emerge, you can deal with them by tweaking the system.
In this case, it's almost always better to go with a string as a natural primary key, provide you can trust it. Don't worry if it's a string, as long as the string is reasonably short, say about 25 characters max. You won't pay a big price in terms of performance.
Do the data entry people or automatic data sources always provide a value for the supposed natural key, or is sometimes omitted? Is it occasionally wrong in the input data? If so, how are errors detected and corrected?
Are the programmers and interactive users who specify queries able to use the natural key to get what they want?
If you can't trust the natural key, invent a surrogate. If you invent a surrogate, you might as well invent an integer. Then you have to worry about whther to conceal the surrogate from the user community. Some developers who didn't conceal the surrogate key came to regret it.

Indices imply lots of comparisons.
Typically, strings are longer than integers and collation rules may be applied for comparison, so comparing strings is usually more computationally intensive task than comparing integers.
Sometimes, though, it's faster to use a string as a primary key than to make an extra join with a string to numerical id table.

Two reasons to use integers for PK columns:
We can set identity for integer field which incremented automatically.
When we create PKs, the db creates an index (Cluster or Non Cluster) which sorts the data before it's stored in the table. By using an identity on a PK, the optimizer need not check the sort order before saving a record. This improves performance on big tables.

Yes, but unless you expect to have millions of rows, not using a string-based key because it's slower is usually "premature optimization." After all, strings are stored as big numbers while numeric keys are usually stored as smaller numbers.
One thing to watch out for, though, is if you have clustered indices on a any key and are doing large numbers of inserts that are non-sequential in the index. Every line written will cause the index to re-write. if you're doing batch inserts, this can really slow the process down.

What is your reason for having a string as a primary key?
I would just set the primary key to an auto incrementing integer field, and put an index on the string field.
That way if you do searches on the table they should be relatively fast, and all of your joins and normal look ups will be unaffected in their speed.
You can also control the amount of the string field that gets indexed. In other words, you can say "only index the first 5 characters" if you think that will be enough. Or if your data can be relatively similar, you can index the whole field.

From performance standpoint - Yes string(PK) will slow down the performance when compared to performance achieved using an integer(PK), where PK ---> Primary Key.
From requirement standpoint - Although this is not a part of your question still I would like to mention. When we are handling huge data across different tables we generally look for the probable set of keys that can be set for a particular table. This is primarily because there are many tables and mostly each or some table would be related to the other through some relation ( a concept of Foreign Key ). Therefore we really cannot always choose an integer as a Primary Key, rather we go for a combination of 3, 4 or 5 attributes as the primary key for that tables. And those keys can be used as a foreign key when we would relate the records with some other table. This makes it useful to relate the records across different tables when required.
Therefore for Optimal Usage - We always make a combination of 1 or 2 integers with 1 or 2 string attributes, but again only if it is required.

I would probably use an integer as your primary key, and then just have your string (I assume it's some sort of ID) as a separate column.
create table sample (
sample_pk INT NOT NULL AUTO_INCREMENT,
sample_id VARCHAR(100) NOT NULL,
...
PRIMARY KEY(sample_pk)
);
You can always do queries and joins conditionally on the string (ID) column (where sample_id = ...).

There could be a very big misunderstanding related to string in the database are. Almost everyone has thought that database representation of numbers are more compact than for strings. They think that in db-s numbers are represented as in the memory. BUT it is not true. In most cases number representation is more close to A string like representation as to other.
The speed of using number or string is more dependent on the indexing then the type itself.

By default ASPNetUserIds are 128 char strings and performance is just fine.
If the key HAS to be unique in the table it should be the Key. Here's why;
primary string key = Correct DB relationships, 1 string key(The primary), and 1 string Index(The Primary).
The other option is a typical int Key, but if the string HAS to be unique you'll still probably need to add an index because of non-stop queries to validate or check that its unique.
So using an int identity key = Incorrect DB Relationships, 1 int key(Primary), 1 int index(Primary), Probably a unique string Index, and manually having to validate the same string doesn't exist(something like a sql check maybe).
To get better performance using an int over a string for the primary key, when the string HAS to be unique, it would have to be a very odd situation. I've always preferred to use string keys. And as a good rule of thumb, don't denormalize a database until you NEED to.

Related

MySQL query and insertion optimisation with varchar(255) UUIDs

I think this question has been asked in some way shape or form but I couldn't find a question that had asked exactly what I wish to understand so I thought I'd put the question here
Problem statement
I have built a web application with a MySQL database of say customer records with an INT(11) id PK AI field and a VARCHAR(255) uuid field. The uuid field is not indexed nor set as unique. The uuid field is used as a public identifier so its part of URLs etc. - e.g. https://web.com/get_customer/[uuid]. This was done because the UUID is 'harder' to guess for a regular John Doe - but understand that it is certainly not 'unguessable' in theory. But the issue now is that as the database is growing larger I have observed that the query to retrieve a particular customer record is taking longer to complete.
My thoughts on how to solve the issue
The solution that is coming to mind is to make the uuid field unique and also index the same. But I've been doing some reading in relation to this and various blog posts, StackOverflow answers on this have described putting indices on UUIDs as being really bad for performance. I also read that it will also increase the time it takes to insert a new customer record into the database as the MySQL database will take time to find the correct location in which to place the record as a part of the index.
The above mentioned https://web.com/get_customer/[uuid] can be accessed without having to authenticate which is why I'm not using the id field for the same. It is possible for me to consider moving to integer based UUIDs (I don't need the UUIDs to be universally unique - they just need to be unique for that particular table) - will that improve the the indicing performance and in turn the insertion and querying performance?
Is there a good blog post or information page on how to best set up a database for such a requirement - Need the ability to store a customer record which is 'hard' to guess, easy to insert and easy to query in a large data set.
Any assistance is most appreciated. Thank you!
The received wisdom you mention about putting indexes on UUIDs only comes up when you use them in place of autoincrementing primary keys. Why? The entire table (InnoDB) is built behind the primary key as a clustered index, and bulk loading works best when the index values are sequential.
You certainly can put an ordinary index on your UUID column. If you want your INSERT operations to fail in the astronomically unlikely event you get a random duplicate UUID value you can use an index like this.
ALTER TABLE customer ADD UNIQUE INDEX uuid_constraint (uuid);
But duplicate UUIDv4s are very rare indeed. They have 122 random bits, and most software generating them these days uses cryptographic-quality random number generators. Omitting the UNIQUE index is, I believe, an acceptable risk. (Don't use UUIDv1, 2, 3, or 5: they're not hard enough to guess to keep your data secure.)
If your UUID index isn't unique, you save time on INSERTs and UPDATEs: they don't need to look at the index to detect uniqueness constraint violations.
Edit. When UUID data is in a UNIQUE index, INSERTs are more costly than they are in a similar non-unique index. Should you use a UNIQUE index? Not if you have a high volume of INSERTs. If you have a low volume of INSERTs it's fine to use UNIQUE.
This is the index to use if you omit UNIQUE:
ALTER TABLE customer ADD UNIQUE INDEX uuid (uuid);
To make lookups very fast you can use covering indexes. If your most common lookup query is, for example,
SELECT uuid, givenname, surname, email
FROM customer
WHERE uuid = :uuid
you can create this so-called covering index.
ALTER TABLE customer
ADD INDEX uuid_covering (uuid, givenname, surname, email);
Then your query will be satisfied directly from the index and therefore be faster.
There's always an extra cost to INSERT and UPDATE operations when you have more indexes. But the cost of a full table scan for a query is, in a large table, far far greater than the extra INSERT or UPDATE cost. That's doubly true if you do a lot of queries.
In computer science there's often a space / time tradeoff. SQL indexes use space to save time. It's generally considered a good tradeoff.
(There's all sorts of trickery available to you by using composite primary keys to speed things up. But that's a topic for when you have gigarows.)
(You can also save index and table space by storing UUIDs in BINARY(16) columns and use UUID_TO_BIN() and BIN_TO_UUID() functions to convert them. )

MYSQL: What is the impact of varchar length on performance when used a primary key? [duplicate]

What would be the performance penalty of using strings as primary keys instead of bigints etc.? String comparison is much more expensive than integer comparison, but on the other hand I can imagine that internally a DBMS will compute hash keys to reduce the penalty.
An application that I work on uses strings as primary keys in several tables (MySQL). It is not trivial to change this, and I'd like to know what can be gained performance wise to justify the work.
on the other hand I can imagine that
internally a DBMS will compute hash
keys to reduce the penalty.
The DB needs to maintain a B-Tree (or a similar structure) with the key in a way to have them ordered.
If the key is hashed and stored it in the B-Tree that would be fine to check rapidly the uniqueness of the key -- the key can still be looked up efficiently. But you would not be able to search efficient for range of data (e.g. with LIKE) because the B-Tree is no more ordered according to the String value.
So I think most DB really store the String in the B-Tree, which can (1) take more space than numeric values and (2) require the B-Tree to be re-balanced if keys are inserted in arbitrary order (no notion of increasing value as with numeric pk).
The penalty in practice can range from insignificant to huge. It all depends on the usage, the number of rows, the average size of the string key, the queries which join table, etc.
In our product we use varchar(32) for primary keys (GUIDs) and we haven't met performance issues of this. Our product is a web site with extreme overload and is critical to be stable.
We use SQL Server 2005.
Edit: In our biggest tables we have more than 3 000 000 records with lots of inserts and selects from them. I think in general, the benefit of migrating to int key will be very low, but the problems while migrating very high.
One thing to watch out for is page splits (I know this can happen in SQL Server - probably the same in MySQL).
Primary keys are physically ordered. By using an auto-increment integer you guarantee that each time you insert you are inserting the next number up, so there is no need for the db to reorder the keys. If you use strings however, the pk you insert may need to be placed in the middle of the other keys to maintain the pk order. That process of reordering the pks on the insert can get expensive.
It depends on several factors: RDBMS, number of indexes involving those columns but in general it will be more efficient using ints, folowed by bigints.
Any performance gains depend on usage, so without concrete examples of table schema and query workload it is hard to say.
Unless it makes sense in the domain (I'm thinking unique something like social security number), a surrogate integer key is a good choice; referring objects do not need to have their FK reference updated when the referenced object changes.

What are the merits of using numeric row IDs in MySQL?

I'm new to SQL, and thinking about my datasets relationally instead of hierarchically is a big shift for me. I'm hoping to get some insight on the performance (both in terms of storage space and processing speed) versus design complexity of using numeric row IDs as a primary key instead of string values which are more meaningful.
Specifically, this is my situation. I have one table ("parent") with a few hundred rows, for which one column is a string identifier (10-20 characters) which would seem to be a natural choice for the table's primary key. I have a second table ("child") with hundreds of thousands (or possibly millions or more) of rows, where each row refers to a row in the parent table (so I could create a foreign key constraint on the child table). (Actually, I have several tables of both types with a complex set of references among them, but I think this gets the point across.)
So I need a column in the child table that gives an identifier to rows in the parent table. Naively, it seems like creating the column as something like VARCHAR(20) to refer to the "natural" identifier in the first table would lead to a huge performance hit, both in terms of storage space and query time, and therefore I should include a numeric (probably auto_increment) id column in the parent table and use this as the reference in the child. But, as the data that I'm loading into MySQL don't already have such numeric ids, it means increasing the complexity of my code and more opportunities for bugs. To make matters worse, since I'm doing exploratory data analysis, I may want to muck around with the values in the parent table without doing anything to the child table, so I'd have to be careful not to accidentally break the relationship by deleting rows and losing my numeric id (I'd probably solve this by storing the ids in a third table or something silly like that.)
So my question is, are there optimizations I might not be aware of that mean a column with hundreds of thousands or millions of rows that repeats just a few hundred string values over and over is less wasteful than it first appears? I don't mind a modest compromise of efficiency in favor of simplicity, as this is for data analysis rather than production, but I'm worried I'll code myself into a corner where everything I want to do takes a huge amount of time to run.
Thanks in advance.
I wouldn't be concerned about space considerations primarily. An integer key would typically occupy four bytes. The varchar will occupy between 1 and 21 bytes, depending on the length of the string. So, if most are just a few characters, a varchar(20) key will occupy more space than an integer key. But not an extraordinary amount more.
Both, by the way, can take advantage of indexes. So speed of access is not particularly different (of course, longer/variable length keys will have marginal effects on index performance).
There are better reasons to use an auto-incremented primary key.
You know which values were most recently inserted.
If duplicates appear (which shouldn't happen for a primary key of course), it is easy to determine which to remove.
If you decide to change the "name" of one of the entries, you don't have to update all the tables that refer to it.
You don't have to worry about leading spaces, trailing spaces, and other character oddities.
You do pay for the additional functionality with four more bytes in a record devoted to something that may not seem useful. However, such efficiencies are premature and probably not worth the effort.
Gordon is right (which is no surprise).
Here are the considerations for you not to worry about, in my view.
When you're dealing with dozens of megarows or less, storage space is basically free. Don't worry about the difference between INT and VARCHAR(20), and don't worry about the disk space cost of adding an extra column or two. It just doesn't matter when you can buy decent terabyte drives for about US$100.
INTs and VARCHARS can both be indexed quite efficiently. You won't see much difference in time performance.
Here's what you should worry about.
There is one significant pitfall in index performance, that you might hit with character indexes. You want the columns upon which you create indexes to be declared NOT NULL, and you never want to do a query that says
WHERE colm IS NULL /* slow! */
or
WHERE colm IS NOT NULL /* slow! */
This kind of thing defeats indexing. In a similar vein, your performance will suffer bigtime if you apply functions to columns in search. For example, don't do this, because it too defeats indexing.
WHERE SUBSTR(colm,1,3) = 'abc' /* slow! */
One more question to ask yourself. Will you uniquely identify the rows in your subsidiary tables, and if so, how? Do they have some sort of natural compound primary key? For example, you could have these columns in a "child" table.
parent varchar(20) pk fk to parent table
birthorder int pk
name varchar(20)
Then, you could have rows like...
parent birthorder name
homer 1 bart
homer 2 lisa
homer 3 maggie
But, if you tried to insert a fourth row here like this
homer 1 badbart
you'd get a primary key collision because (homer,1) is occupied. It's probably a good idea to work how you'll manage primary keys for your subsidiary tables.
Character strings containing numbers sort funny. For example, '2' comes after '101'. You need to be on the lookout for this.
The main benefit you get from numeric values that that they are easier to 'index'. Indexing is a process that MySQL uses to make it easier to find a value.
Typically, if you want to find a value in a group, you have to loop through the group looking for your value. That is slow and has a worst case of O(n). If instead your data was in a nice, searchable format -- like a binary search tree, if could be found in O(lon n), much faster.
Indexing is the process MySQL uses to prepare data to be searched, it generates search trees and other clever do-bobs that will make finding data quick. It makes many searches much faster. However, to do it, it has to compare the value you are searching for to various 'key' values to determine if your value is greater than or less than the key.
This comparison can be done on non-numeric values. However, comparing non-numeric values is much slower. If you want to be able to quickly look up data, your best bet is you have a integer 'key' that you use.
Numeric row id's have many advantages over a string based id.
Most of them are mentioned in other answers.
1. One of them is indexing. Primary keys are by default indexed in a relational database. So, having a numeric key is always more efficient.
2. Numeric fields are stored much more efficiently
2. Joins are much faster with numeric keys.
3. A row id could be a foreign key. Numeric id's are compact to store, making them efficient
4. I think using a auto-increment on primary key has its advantages too
-Thanks
_san

Should a primary key necessarily be auto-incrementing when I'm sure it is and will always be unique?

I've looked for a satisfying answer a tad more specific to my particular problem for a while now, but to avail. Whether I'm just not looking at the right places or not, I don't know, but here goes:
I'm pulling data from an application that afterwards is manipulated and sent to my own server. Amongst the data pulled is an, originally in the application's database, auto-incremented identifier. An example of this identifier I just now retrieved is 955534861. Isn't it better and more effective design to not auto-increment my primary key and just use the value I know is and will always stay unique, or should I look into concepts such as surrogate keys?
Thanks in advance.
The situation you describe resembles my primary job which is maintaining a data warehouse. We get data from other systems and store it.
Something that happens to us is that these "other systems" change. That leads to possibilities that the new version of the "other system" will duplicate the unique identifier from the previous system. We deal with this by adding something to that record in our data warehouse to guarantee it's uniqueness. It might be a field to identify the source system or it might be a date. It is never an autogenerated number.
If there is any chance of this this happening to you, you might want to expand your options.
If there is a natural key in your model, you cannot replace it by creating a surrogate key.
You can only add a surrogate key and keep the existing natural key, which has its pros and cons, as described here.
This'll get a little nerdy, but bear with me:
As long as a key value is unique, it'll serve its function. But for performance, you ideally want that key value to be as short as possible.
GUIDs are commonly used, because they are statistically highly unlikely to ever be repeated. But that comes at the expense of size: they are 128 bits long, which makes them longer than a machine word. To compare two GUIDs (as must be repeatedly done when sorting, or migrating down a b-tree for indexes) will take multiple processor intructions to load and compare the values. And they will consume more memory when cached into memory.
The advantage of auto-incrementing key values is that
They are guaranteed to be unique. Proxy index values are only predicted to be unique.
Because they will have full value coverage over the range of their underlying datatype, the most compact possible type may be used. This makes for smaller indexes and more efficient compare operations
Because the smallest possible type can be used, more index values can be stored on a single database page, which means you're more likely to get a cache hit when searching or joining on that value. That means that peformance will be--all other things being equal--somewhat better.
On most databases, auto-incrementing keys are worked into the database engine, so there is very small overhead in generating them.
If you employ a clustered index on your key value, new record inserts are less likely to require a random disk seek, and more likely to be read during read-ahead, so if you do any kind of sequential processing or lookup based on that key, it'll probably run faster.
The primary key, typically an auto-incrementing ID, is what MySQL uses as a row identifier as well, so it should be left alone. If you need a secondary key that's generated by your application for some other purpose, you may want to add that as another column with a UNIQUE index on it.
In other databases where there's a proper row identifier mechanism, this is less of an issue.

SQL GUID Vs Integer

I have recently started a new job and noticed that all the SQL tables use the GUID data type for the primary key.
In my previous job we used integers (Auto-Increment) for the primary key and it was a lot more easier to work with in my opinion.
For example, say you had two related tables; Product and ProductType - I could easily cross check the 'ProductTypeID' column of both tables for a particular row to quickly map the data in my head because its easy to store the number (2,4,45 etc) as opposed to (E75B92A3-3299-4407-A913-C5CA196B3CAB).
The extra frustration comes from me wanting to understand how the tables are related, sadly there is no Database diagram :(
A lot of people say that GUID's are better because you can define the unique identifer in your C# code for example using NewID() without requiring SQL SERVER to do it - this also allows you to know provisionally what the ID will be.... but I've seen that it is possible to still retrieve the 'next auto-incremented integer' too.
A DBA contractor reported that our queries could be up to 30% faster if we used the Integer type instead of GUIDS...
Why does the GUID data type exist, what advantages does it really provide?... Even if its a choice by some professional there must be some good reasons as to why its implemented?
GUIDs are good as identity fields in certain cases:
When you have multiple instances of SQL (different servers) and you need to combine the different updates later on without affecting referential integrity
Disconnected clients that create data - this way they can create data without worrying that the ID field is already taken
GUIDs are generated to be globally unique, which is why they are suited for such scenarios.
Contrary to what most folks here seem to preach, I see GUID's as more of a plague than a blessing. Here's why:
GUIDs may seem to be a natural choice for your primary key - and if you really must, you could probably argue to use it for the PRIMARY KEY of the table. What I'd strongly recommend not to do is use the GUID column as the clustering key, which SQL Server does by default, unless you specifically tell it not to.
You really need to keep two issues apart:
the primary key is a logical construct - one of the candidate keys that uniquely and reliably identifies every row in your table. This can be anything, really - an INT, a GUID, a string - pick what makes most sense for your scenario.
the clustering key (the column or columns that define the "clustered index" on the table) - this is a physical storage-related thing, and here, a small, stable, ever-increasing data type is your best pick - INT or BIGINT as your default option.
By default, the primary key on a SQL Server table is also used as the clustering key - but that doesn't need to be that way! I've personally seen massive performance gains when breaking up the previous GUID-based Primary / Clustered Key into two separate key - the primary (logical) key on the GUID, and the clustering (ordering) key on a separate INT IDENTITY(1,1) column.
As Kimberly Tripp - the Queen of Indexing - and others have stated a great many times - a GUID as the clustering key isn't optimal, since due to its randomness, it will lead to massive page and index fragmentation and to generally bad performance.
Yes, I know - there's newsequentialid() in SQL Server 2005 and up - but even that is not truly and fully sequential and thus also suffers from the same problems as the GUID - just a bit less prominently so. Plus, you can only use it as a default for a column in your table - you cannot get a new sequential GUID in T-SQL code (like a trigger or something) - another major drawback.
Then there's another issue to consider: the clustering key on a table will be added to each and every entry on each and every non-clustered index on your table as well - thus you really want to make sure it's as small as possible. Typically, an INT with 2+ billion rows should be sufficient for the vast majority of tables - and compared to a GUID as the clustering key, you can save yourself hundreds of megabytes of storage on disk and in server memory.
Quick calculation - using INT vs. GUID as Primary and Clustering Key:
Base Table with 1'000'000 rows (3.8 MB vs. 15.26 MB)
6 nonclustered indexes (22.89 MB vs. 91.55 MB)
TOTAL: 25 MB vs. 106 MB - and that's just on a single table!
Some more food for thought - excellent stuff by Kimberly Tripp - read it, read it again, digest it! It's the SQL Server indexing gospel, really.
GUIDs as PRIMARY KEY and/or clustered key
The clustered index debate continues
Ever-increasing clustering key - the Clustered Index Debate..........again!
Marc
INT
Advantage:
Numeric values (and specifically integers) are better for performance when used in joins, indexes and conditions.
Numeric values are easier to understand for application users if they are displayed.
Disadvantage:
If your table is large, it is quite possible it will run out of it and after some numeric value there will be no additional identity to use.
GUID
Advantage:
Unique across the server.
Disadvantage:
String values are not as optimal as integer values for performance when used in joins, indexes and conditions.
More storage space is required than INT.
credit goes to : http://blog.sqlauthority.com/2010/04/28/sql-server-guid-vs-int-your-opinion/
There are a ton of Google-able articles on using GUIDs as PKs and almost all of them say the same thing your DBA contractor says -- queries are faster without GUIDs as keys.
The primary use I've seen in practice (we've never used them as PKs) is with replication. The MSDN page for uniqueidentifier says about the same.
It is globally unique, so that each record in your table has a GUID that is shared by no other item of any kind in the world. Handy if you need this kind of exclusive identification (if you are replicating the database, or combining data from multiple source). Otherwise, your dba is correct - GUIDs are much larger and less efficient that integers, and you could speed up your db (30%? maybe...)
They basically save you from more sometimes complicated logic of using
set #InsertID = scope_identity()