Understanding keys in databases - mysql

This question is geared towards MySQL, since that is what I'm using -- but I think that it's probably the same or similar for almost every major database implementation.
How do keys work in a database? By that I mean, when you set a field to 'primary key', 'unique key' or an 'index' -- what does each of these do, and when should I use each one?
Right now I have a table containing a few fields, one of them being a GUID (minus the { and } around it). I set the GUID field to the primary key and I see that it created a BTREE index. So it improves search performance -- but what differentiates that from other types of keys?
I realize this may not really be programming related (although it is development related) -- I wasn't sure where exactly to ask this but SO is what I use the most so I'll ask here. Migrate as necessary

There are probably hundreds of references for this elsewhere on the web, so a bit of Googling will help you get deep into understanding DB design. That said, the basic gist is:
primary key: a field or combination of fields which must be unique for each row, and which is/are indexed to provide rapid lookup of a row given a key value; cannot contain NULL, and a table can only have one primary key. Generally indexed in a clustered index, which means that the data in the table is reordered to match the order of the index, a process that greatly improves serial data retrieval. (This is the main reason a table can only have one primary key -- the order of the data can't match the order of more than one index!)
unique key: same as a primary key, but on some DB platforms, can contain NULL values so long as they don't violate the uniqueness constraint. (In other words, if the unique key contains a single column, there can only be one row in the table with NULL in that column; if the key contains more than one column, then the table can only contain rows with NULLs in the columns such that there's no non-unique duplication of NULL values across the columns in the key.) On other platforms (including MySQL), unique constraints can contain multiple NULLs; the uniqueness constraint only applies to non-NULL values of the referenced columns. There can be more than one of these per table. Indexed in a non-clustered index.
index: a field or combination of fields which are pre-indexed for more rapid retrieval given a value for the field(s) in the index. A table can have more than one index.
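The three definitions above can be tried out in a few lines. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL (SQLite, like MySQL, lets a UNIQUE column hold several NULLs), and the table and column names (account, guid, email, city) are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (
        guid  TEXT NOT NULL PRIMARY KEY,  -- one per table, unique, no NULLs
        email TEXT UNIQUE,                -- unique, but NULL is allowed
        city  TEXT
    );
    -- A plain index: speeds up lookups by city, duplicates are fine.
    CREATE INDEX idx_account_city ON account (city);
""")

conn.execute("INSERT INTO account VALUES ('a1', 'ann@example.com', 'Oslo')")
conn.execute("INSERT INTO account VALUES ('b2', NULL, 'Oslo')")
conn.execute("INSERT INTO account VALUES ('c3', NULL, 'Oslo')")  # second NULL email is accepted

def try_insert(row):
    """Return True if the row is accepted, False if a key constraint rejects it."""
    try:
        conn.execute("INSERT INTO account VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

dup_pk = try_insert(("a1", "other@example.com", "Rome"))   # repeats a primary key value
dup_email = try_insert(("d4", "ann@example.com", "Rome"))  # repeats a unique key value
row_count = conn.execute("SELECT COUNT(*) FROM account").fetchone()[0]
```

Both duplicate inserts are rejected, while the three rows sharing the same city (and the two rows with a NULL email) are stored without complaint.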

When you define a primary key, the database creates an index based on that key, and the key needs to be unique. In general, you can also create an index to speed up access to data based on non-unique query data. Indexed retrieval through a unique index should be at least as fast as through a non-unique one, so I try to use unique indexes where possible.

At the most basic, a primary key often determines how the records are physically ordered on disk (in MySQL's InnoDB, for instance, the table is clustered on it). You would want the unique field you're going to search on the most to be the primary key, as this will greatly reduce searching.
Unique keys are fields that can only contain unique values.
An index is a specialized "map" to the database file that queries can reference.
These are extremely simplified answers, but I think that's the gist of it.

One more thing: any index is essentially a separate table, sorted by the key, whose entries point directly to the row(s) that match the key.
A BTree-style index is stored in a balanced tree: a tree whose branches are kept at roughly equal depth, and in which traveling left leads to smaller values and traveling right to larger ones.
      5
   3     7
  2 4   6 8
Would be an example of a balanced tree. The other major type is a Hash, where a mathematical expression turns the key into the relative memory location of the key.
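The practical difference between the two index types can be sketched in Python. This is not how a database implements either structure: Python has no built-in B-tree, so a sorted list searched with bisect stands in for the ordered index, and a dict stands in for the hash index. The keys are the ones from the tree above; the "row-N" values are placeholders.

```python
import bisect

# Ordered index: keys kept in sorted order, as a B-tree would keep them.
sorted_keys = [2, 3, 4, 5, 6, 7, 8]

def range_lookup(keys, low, high):
    """Return every key in [low, high] using two binary searches."""
    start = bisect.bisect_left(keys, low)
    end = bisect.bisect_right(keys, high)
    return keys[start:end]

# Hash index: the key is hashed straight to its slot, with no ordering at all.
hash_index = {key: f"row-{key}" for key in sorted_keys}

in_range = range_lookup(sorted_keys, 3, 6)  # ordered structures handle ranges well
exact = hash_index.get(5)                   # hashes handle exact matches well
```

An ordered index can answer "everything between 3 and 6" by walking neighbouring entries, whereas a hash only helps when the exact key is known.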

In order to really understand keys, you have to understand them at three levels: conceptual, logical, and physical. I'm going to reverse my habitual order, and discuss physical first.
Most programmers tend to think at the physical level. At the physical level, a key is a surrogate (stand-in) for the address of a row. When a row is to be referenced, a copy of the key can be used to specify the row. When a reference to a row is made in another row, the copy is known as a foreign key.
Most experienced programmers have a thorough understanding of pointers and addresses, and would understand exactly how the data structure worked if only it used pointers and addresses. Before the relational databases became dominant, there were in fact databases that used pointers to records embedded in other records to tie the data together.
A disadvantage to using keys instead of pointers is that the DBMS has to use an index to translate a key reference back to a pointer in order to retrieve the row in question. An advantage is that the level of indirection allows the DBMS to shuffle all the rows in a table for whatever purpose, as long as the DBMS updates all the relevant indexes accordingly.
Viewed at this level, keys might as well be simple, integer, and autoincremented. These work faster than other kinds of keys, and they sidestep certain data management issues that arise when user supplied data is missing or inconsistent. However, sidestepping data management issues at this level can create a minefield at the two higher levels.
At the logical level, a key is a minimal subset of the data in a tuple (row) that allows a single matching tuple to be specified, and when the DBMS retrieves the container for that tuple, all the attributes in the tuple are now available. Every relation has at least one candidate key. In the worst case, the entire tuple is the only candidate key. When multiple candidate keys exist for a single relation (table), common practice is to choose one candidate key as the primary key, and to make all references via this primary key.
(Actually, relation and table are not synonymous, but I'm simplifying here. Likewise, tuple and row are not synonymous, although they look identical at first glance.)
The primary reason to declare a primary key is to rule out duplicate keys or missing keys.
Sometimes database people choose to leave duplicate and missing key avoidance up to the programmers whose applications write to the database. More commonly, a primary key constraint serves to reflect an error back to a program that violates a primary key constraint.
When a DBMS sets up a primary key constraint, it also builds an index on the primary key. This allows the DBMS to find duplicates quickly, and it also speeds up certain queries that use the key column(s).
At the conceptual level, keys are the means by which the user community identifies instances of entities, whether those entities are persons (employees, travellers, etc.), things (bank accounts, hotel rooms, etc.) or whatever. The key is data and the entity identified by the key is not data. The key can thus be seen as a surrogate for the entity in the database.
At the conceptual level, keys are always natural, and never automatically supplied by the system. However, in the real world, keys are often mismanaged, and the consequences of mismanagement are overcome by what is called "common sense". Instilling common sense into an automated system is generally not feasible.
I never really described an index in the above, but it's implicit in what I said. An index is a data structure that serves to map from a key to a pointer. In all the databases you are likely to use, indexes are declared by the database builder (or perhaps a DBA) and managed by the DBMS.
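To see the "index maps a key to a pointer" idea at work, one can ask the database how it plans to find a row. The sketch below uses Python's sqlite3 module rather than MySQL (where EXPLAIN plays a similar role); the employee table and its columns are invented for the example, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (badge TEXT NOT NULL PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [(f"B{i:04d}", f"name-{i}") for i in range(1000)],
)

def plan(sql, params):
    """Return the planner's one-line description of how it will find the rows."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return rows[0][-1]  # the last column holds the human-readable detail

# Lookup by key: the planner goes through the index built for the primary key.
by_key = plan("SELECT * FROM employee WHERE badge = ?", ("B0500",))
# Lookup by an unindexed column: the planner has to scan the whole table.
by_name = plan("SELECT * FROM employee WHERE name = ?", ("name-500",))
```

The first plan is reported as a SEARCH using an index, the second as a SCAN of the table, which is the translation step (key to row location) described above versus its absence.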

Related

Should one combine foreign keys that point to the same table if all columns are required?

I encounter this situation frequently. An example,
A user is uniquely identified by appId, externalUserId.
Table xxxContract has a foreign key (fileUploadId, appId, externalUserId) to table fileUpload that ensures the file upload belongs to the specified user.
Table xxxContract has a foreign key (businessId, appId, externalUserId) to table business that ensures the business belongs to the specified user.
With the above two, we guarantee user A's file upload won't be used as a contract for user B's business.
xxxContract also has a fileTypeId column that is STORED GENERATED to a certain value that says "This contract is of file type XXX_CONTRACT"
Table xxxContract also has a foreign key (fileUploadId, fileTypeId) to table fileUpload.
This guarantees we only use XXX_CONTRACT file uploads for xxxContract, and not accidentally use other file types.
Given the above, we have this situation where we have two foreign keys that point to the same table fileUpload, and even have overlapping columns,
(fileUploadId, appId, externalUserId)
(fileUploadId, fileTypeId)
And all the columns are NOT NULL.
So, it seems to me like it's safe to combine the foreign keys into one larger foreign key,
(fileUploadId, appId, externalUserId, fileTypeId)
And we'll still have the same guarantees as before.
My gut feeling is that I should not combine the foreign keys because separating them by meaning and giving the FKs meaningful names helps with maintainability.
But I've never had a formal education with these things so I'd like to know what the industry standard is.
Related, is there a performance benefit to combining them vs. separating them?
"But I've never had a formal education with these things so I'd like to know what the industry standard is."
The standard is, that there is no standard.
As you already noted, you can use multiple columns to define a primary key. When those columns are real-world attributes, this is called a natural primary key; for instance, a user can be uniquely identified by first name, last name, and birthdate (at least almost always).
Keys of this kind are often called composite keys, because no single column is enough on its own; only combined do they form a primary key.
Surrogate (or artificial) primary keys are also well known: an id column using auto-increment.
So, as to your question: Yes, if you have 3 columns that already form a natural primary key, it is completely safe to add more columns. Since the 3 columns already present will uniquely identify the row, there is no harm in adding a 4th, 5th or even 6th column to the key.
Whether you use natural or surrogate primary keys depends on personal preference, I'd say. I never use natural keys, even on tables where this is possible.
Keep in mind that whenever you need to delete or update something, you always need to know the primary key. Hence, with natural keys, you need to pass multiple values through many method calls, while surrogate keys offer the advantage of having just "one" id to uniquely identify a row. No more information is required.
Performance-wise, I assume that (integer-based) surrogate primary keys tend to be faster than (string-based) natural primary keys. There are also fewer columns to consider when writing queries and/or designing indexes.
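The two-foreign-key layout from the question can be sketched as follows, keeping the constraints separate and named by meaning. Several things here are assumptions made for the sake of a runnable example: Python's sqlite3 stands in for MySQL, the business table and its foreign key are left out, a DEFAULT plus CHECK replaces the STORED GENERATED column, and the file type value 1 is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE fileUpload (
        fileUploadId   INTEGER PRIMARY KEY,
        appId          INTEGER NOT NULL,
        externalUserId TEXT    NOT NULL,
        fileTypeId     INTEGER NOT NULL,
        UNIQUE (fileUploadId, appId, externalUserId),  -- target of the first FK
        UNIQUE (fileUploadId, fileTypeId)              -- target of the second FK
    );
    CREATE TABLE xxxContract (
        fileUploadId   INTEGER NOT NULL,
        appId          INTEGER NOT NULL,
        externalUserId TEXT    NOT NULL,
        fileTypeId     INTEGER NOT NULL DEFAULT 1 CHECK (fileTypeId = 1),
        CONSTRAINT fk_upload_belongs_to_user
            FOREIGN KEY (fileUploadId, appId, externalUserId)
            REFERENCES fileUpload (fileUploadId, appId, externalUserId),
        CONSTRAINT fk_upload_is_contract_type
            FOREIGN KEY (fileUploadId, fileTypeId)
            REFERENCES fileUpload (fileUploadId, fileTypeId)
    );
    INSERT INTO fileUpload VALUES (10, 1, 'userA', 1);  -- user A, contract file
    INSERT INTO fileUpload VALUES (11, 1, 'userA', 2);  -- user A, some other file type
""")

def accepted(file_upload_id, app_id, user_id):
    """Try to create a contract row; report whether the constraints allow it."""
    try:
        conn.execute(
            "INSERT INTO xxxContract (fileUploadId, appId, externalUserId) VALUES (?, ?, ?)",
            (file_upload_id, app_id, user_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False

ok = accepted(10, 1, "userA")          # right user, right file type
wrong_user = accepted(10, 1, "userB")  # user B pointing at user A's upload
wrong_type = accepted(11, 1, "userA")  # the upload is not a contract file
```

Each rejected insert fails one specific, named constraint, which is the maintainability argument for keeping the two foreign keys apart; a single four-column foreign key would reject the same rows but could not say which rule was broken.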

MySQL is there a point to having a primary key on a lookup table which refers to a primary key on another table which is indexed?

I'm just doing some basic normalisation but I don't have the answer for this, wondering if you guys can give me some info on right/wrong, do's/dont's etc.
So if I have:
I've always set a primary key (unique auto incrementer on lookup tables), in the image the lookup tables would be "page_downloads" and "page_includes" but I can guarantee those columns will never get used as they will only be queried via the page_id, same for so many definition tables.
So my question is: "Is there any point? What is the best practice thing to do? Always create the primary key even though it will never be used or don't bother creating it as it is fine to use the indexed int column which refers to a primary key in another table. Eg the relationship in the picture (page_id to page_id). Thoughts?"
Thanks
D
No. While every table should have a PRIMARY KEY, it need not be a surrogate. In this instance, (page_id,file_id) is a valid compound PRIMARY KEY (as is (file_id,page_id)).
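A minimal sketch of that compound key, using Python's sqlite3 in place of MySQL. The table name page_downloads comes from the question and the columns from the answer above; everything else is illustrative. It also shows that the index behind the compound key already serves lookups by page_id alone, because page_id is its leading column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE page_downloads (
        page_id INTEGER NOT NULL,
        file_id INTEGER NOT NULL,
        PRIMARY KEY (page_id, file_id)   -- no extra auto-increment column
    )
""")
conn.executemany(
    "INSERT INTO page_downloads VALUES (?, ?)",
    [(1, 10), (1, 11), (2, 10)],
)

# The usual query: everything attached to one page.
files_for_page_1 = [row[0] for row in conn.execute(
    "SELECT file_id FROM page_downloads WHERE page_id = ? ORDER BY file_id", (1,)
)]

# Ask the planner how it finds those rows (wording varies by SQLite version).
detail = conn.execute(
    "EXPLAIN QUERY PLAN SELECT file_id FROM page_downloads WHERE page_id = ?", (1,)
).fetchone()[-1]

# The key also stops the same file being attached to the same page twice.
try:
    conn.execute("INSERT INTO page_downloads VALUES (1, 10)")
    duplicate_accepted = True
except sqlite3.IntegrityError:
    duplicate_accepted = False
```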
To add some info to Strawberry's valid observations.
There's no absolute answer or best practice regarding the surrogate keys and usually this boils down to individual preference. There are both advantages and disadvantages to using surrogate keys. Among the advantages, one could consider:
Immutability: Surrogate keys do not change while the row exists. This has the following advantages: Applications cannot lose their reference to a row in the database (since the identifier never changes). The primary or natural key data can always be modified, even with databases that do not support cascading updates across related foreign keys.
Requirement changes: Attributes that uniquely identify an entity might change, which might invalidate the suitability of natural keys. Consider the following example: An employee's network user name is chosen as a natural key. Upon merging with another company, new employees must be inserted. Some of the new network user names create conflicts because their user names were generated independently (when the companies were separate). In these cases, generally a new attribute must be added to the natural key (for example, an original_company column). With a surrogate key, only the table that defines the surrogate key must be changed. With natural keys, all tables (and possibly other, related software) that use the natural key will have to change. Some problem domains do not clearly identify a suitable natural key. Surrogate keys avoid choosing a natural key that might be incorrect.
Performance: Surrogate keys tend to be a compact data type, such as a four-byte integer. This allows the database to query the single key column faster than it could multiple columns. Furthermore a non-redundant distribution of keys causes the resulting b-tree index to be completely balanced. Surrogate keys are also less expensive to join (fewer columns to compare) than compound keys.
Compatibility: While using several database application development systems, drivers, and object-relational mapping systems, such as Ruby on Rails or Hibernate, it is much easier to use an integer or GUID surrogate keys for every table instead of natural keys in order to support database-system-agnostic operations and object-to-row mapping.
Uniformity: When every table has a uniform surrogate key, some tasks can be easily automated by writing the code in a table-independent way.
Validation: It is possible to design key-values that follow a well-known pattern or structure which can be automatically verified. For instance, the keys that are intended to be used in some column of some table might be designed to "look differently from" those that are intended to be used in another column or table, thereby simplifying the detection of application errors in which the keys have been misplaced. However, this characteristic of the surrogate keys should never be used to drive any of the logic of the applications themselves, as this would violate the principles of Database normalization.

Why we should have an ID column in the table of users?

It's obvious that we already have another unique piece of information about each user, and that is the username. So why do we need another unique thing for each user? Why should we also have an id for each user? What would happen if we omitted the id column?
Even if your username is unique, there are a few advantages to having an extra id column instead of using the varchar as your primary key.
Some people prefer to use an integer column as the primary key, to serve as a surrogate key that never needs to change, even if other columns are subject to change. Although there's nothing preventing a natural primary key from being changeable too, you'd have to use cascading foreign key constraints to ensure that the foreign keys in related tables are updated in sync with any such change.
The primary key being a 32-bit integer instead of a varchar can save space. The choice between an int or a varchar foreign key column in every other table that references your user table can be a good reason.
Inserting into the primary key index is a little bit more efficient if you add new rows to the end of the index, compared to wedging them into the middle of the index. Indexes in MySQL tables are usually B+Tree data structures, and you can study these to understand how they perform.
Some application frameworks prefer the convention that every table in your database has a primary key column called id, instead of using natural keys or compound keys. Following such conventions can make certain programming tasks simpler.
None of these issues are deal-breakers. And there are also advantages to using natural keys:
If you look up rows by username more often than you search by id, it can be better to choose the username as the primary key, and take advantage of the index-organized storage of InnoDB. Make your primary lookup column be the primary key, if possible, because primary key lookups are more efficient in InnoDB (you should be using InnoDB in MySQL).
As you noticed, if you already have a unique constraint on username, it seems a waste of storage to keep an extra id column you don't need.
Using a natural key means that foreign keys contain a human-readable value, instead of an arbitrary integer id. This allows queries to use the foreign key value without having to join back to the parent table for the "real" value.
The point is that there's no rule that covers 100% of cases. I often recommend that you should keep your options open, and use natural keys, compound keys, and surrogate keys even in a single database.
I cover some issues of surrogate keys in the chapter "ID Required" in my book SQL Antipatterns Volume 1: Avoiding the Pitfalls of Database Programming.
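The point about a changeable natural key needing cascading foreign keys can be sketched like this. It assumes Python's sqlite3 in place of MySQL (InnoDB accepts the same ON UPDATE CASCADE clause), and the users and posts tables are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE users (
        username TEXT NOT NULL PRIMARY KEY      -- natural key, may change later
    );
    CREATE TABLE posts (
        post_id INTEGER PRIMARY KEY,
        author  TEXT NOT NULL
                REFERENCES users (username) ON UPDATE CASCADE
    );
    INSERT INTO users VALUES ('bill');
    INSERT INTO posts (author) VALUES ('bill');
""")

# Renaming the user rewrites the foreign key value in every referencing row.
conn.execute("UPDATE users SET username = 'william' WHERE username = 'bill'")
authors = [row[0] for row in conn.execute("SELECT author FROM posts")]
```

With an integer surrogate key the rename would touch only the users row; with the natural key the database has to follow the change into every child table, which works but is extra write activity.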
This identifier is known as a Surrogate Key. The page I linked lists both the advantages and disadvantages.
In practice, I have found them to be advantageous because even superkey data can change over time (i.e. a user's email address may change and thus any corresponding relations must change), but a surrogate key never needs to change for the data it identifies because its value is meaningless to the relation.
It's also nice from a JOIN standpoint because it can be an integer with a smaller key length than a varchar.
I can say that in practice I prefer to use them. I have been bitten too many times by having multiple-column primary keys or a data-representative superkey used across tables having to become non-unique later due to changing requirements during development, and that is not a situation you want to deal with.
In my opinion, every table should have a unique, auto-incremented id.
Here are some practical reasons. If you have duplicate rows, you can readily determine which row to delete. If you want to know the order that rows were inserted, you have that information in the id. As for users, there's more than one "John Smith" in the world. An id provides a key for foreign references.
Finally, just about anything that might describe a user -- a name, an address, a telephone number, an email address -- could change over time.
In MySQL we have:
1. index fields, 2. unique fields, and 3. PK fields.
An index makes a field quick to look up.
Unique means a value can appear in only one row of the table.
PK = index + unique (and NOT NULL).
A table may have many unique fields, like
username, passport code, or email.
But you still need a field like ID that is both unique and indexed (= PK): first, it always refers to one thing and never changes; second, it is unique; and third, it is simple (because it is usually a number).
One reason to have a numeric id is that creating an index on it is leaner than on a text field, reducing index size and the processing time required to look up a specific user. Also, it's fewer bytes to store when cross-referencing a user (relational database) from a different table.

Why are composite primary keys still around?

I'm assigned to migrate a database to a mid-class ERP.
The new system uses composite primary keys here and there, and from a pragmatic point of view, why?
Compared to autogenerated IDs, I can only see negative aspects;
Foreign keys become blurry
Harder migration or db-redesigns
Inflexible as the business changes. (My car has no reg.plate..)
Same integrity better achieved with constraints.
It's falling back to the design concept of candidate keys, which I don't see the point of either.
Is it a habit/artifact from the floppy-days (minimizing space/indexes), or am I missing something?
//edit//
Just found good SO-post: Composite primary keys versus unique object ID field
//
Composite keys are required when your primary keys are non-surrogate and inherently, um, composite, that is, breakable into several non-related parts.
Some real-world examples:
Many-to-many link tables, in which the primary keys are composed of the keys of the entities related.
Multi-tenant applications when tenant_id is a part of primary key of each entity and the entities are only linkable within the same tenant (constrained by a foreign key).
Applications processing third-party data (with already provided primary keys)
Note that logically, all this can be achieved using a UNIQUE constraint (additional to a surrogate PRIMARY KEY).
However, there are some implementation specific things:
Some systems won't let a FOREIGN KEY refer to anything that is not a PRIMARY KEY.
Some systems would only cluster a table on a PRIMARY KEY, hence making the composite the PRIMARY KEY would improve performance of the queries joining on the composite.
Personally I prefer the use of surrogate keys. However, in joining tables that consist only of the ids from two other tables (to create a many-to-many relationship), composite keys are the way to go, and thus taking them out would make things more difficult.
There is a school of thought that surrogate keys are always bad and that if you don't have uniqueness to record through the use of natural keys you have a bad design. I strongly disagree with this (if you aren't storing SSN or some other unique value I defy you to come up with a natural key for a person table for instance.) But many people feel that it is necessary for proper normalization.
Sometimes having a composite key reduces the need to join to another table. Sometimes it doesn't. So there are times when a composite key can boost performance as well as times when it can harm performance. If the key is relatively stable, you may be fine with faster performance on select queries. However, if it is something that is subject to change like a company name, you could be in a world of hurt when company A changes its name and you have to update a million associated records.
There is no one size fits all in database design. There are times when composite keys are helpful and times when they are horrible. There are times when surrogate keys are helpful and times when they are not.
Composite primary keys provide better performance when they are used as foreign keys in other tables, and they reduce table reads; sometimes they can be life savers. If you use surrogate keys, you have to go to that table to get natural key information.
For example (pure example, so we are not talking DB design here), let's say you have an ORDER table and ORDER_ITEM. If you use ProductId and LineNumber (UPDATE: and, as Pedro mentioned, OrderId or even better OrderNumber) as the composite primary key in ORDER_ITEM, then in your cross table for SHIPPING you would be able to have ProductId in SHIPPING_ORDERITEM. This can massively boost your performance if, for example, you have run out of that product and need to find all items of that ProductId that need to be shipped, without a need to join.
On the other hand, if you use a surrogate key, you have to join and you end up with a very inefficient SQL execution plan where it has to do bookmark lookup on several indexes.
See more on bookmark lookups, which become a major issue when using surrogate keys.
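The ORDER_ITEM example can be sketched as below, with Python's sqlite3 standing in for the real database. The lower-case table and column names, the shipping_id column, and the sample rows are assumptions added for illustration; only the idea of carrying ProductId inside the composite key comes from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE order_item (
        order_id    INTEGER NOT NULL,
        product_id  INTEGER NOT NULL,
        line_number INTEGER NOT NULL,
        PRIMARY KEY (order_id, product_id, line_number)
    );
    CREATE TABLE shipping_orderitem (
        shipping_id INTEGER NOT NULL,
        order_id    INTEGER NOT NULL,
        product_id  INTEGER NOT NULL,
        line_number INTEGER NOT NULL,
        PRIMARY KEY (shipping_id, order_id, product_id, line_number),
        FOREIGN KEY (order_id, product_id, line_number)
            REFERENCES order_item (order_id, product_id, line_number)
    );
    INSERT INTO order_item VALUES (100, 7, 1), (100, 8, 2), (101, 7, 1);
    INSERT INTO shipping_orderitem VALUES (1, 100, 7, 1), (1, 100, 8, 2), (2, 101, 7, 1);
""")

# product_id travels with the composite key, so no join back to order_item is needed.
shipments_for_product_7 = conn.execute(
    "SELECT shipping_id, order_id FROM shipping_orderitem "
    "WHERE product_id = ? ORDER BY shipping_id",
    (7,),
).fetchall()
```

Whether this is actually faster in a given system still depends on having a suitable index on product_id; the sketch only shows that the join becomes unnecessary.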
Natural primary keys are brittle.
Suppose we have built a system around a natural PK on (CountryCode, PhoneNumber), and several years down the road we need to add Extension, or change the PK to one column: Email. If these PK columns are propagated to all child tables, this becomes very expensive.
A few years ago there were some systems that were built assuming that Social Security Number is a natural PK, and had to be redesigned to use identities, when the SSN became non-unique and nullable.
Because we cannot predict the future, we don't know if later on some change will render obsolete what used to be a perfectly correct and complete model.
The very simple answer is data integrity. If the data is to be useful and accurate then the keys are presumably required. Having an "autogenerated id" doesn't remove the requirement for other keys as well. The alternative is not to enforce uniqueness and accept that data will be duplicated and almost inevitably contain anomalies and lead to errors as a result. Why would you want that?
In short, the purpose of composite keys is to use the database to enforce one or more business rules. In other words: protect the integrity of your data.
Ex. You have a list of parts that you buy from suppliers. You could create your supplier and parts tables like so:
SUPPLIER
    SupplierId
    SupplierName
PART
    PartId
    PartName
    SupplierId
Uh oh. The parts table allows for duplicate data. Since you used a surrogate key that was autogenerated, you're not enforcing the fact that a part from a supplier should only be entered once. Instead, you should create the PART table like such:
PART
    SupplierId
    SupplierPartId
    PartName
In this example, your parts come from specific suppliers and you want to enforce the rule: "A single supplier can only supply a single part once" in the PART table. Hence, the composite key. Your composite key prevents accidental duplicate entry of a part.
You can always leave business rules out of your database and leave them to your application, but by keeping the rule in the database (via a composite key), you ensure that the business rule is enforced everywhere, especially if you should ever decide to allow multiple applications to access the data.
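The difference between the two PART designs can be checked with a short script. It is a sketch under a few assumptions: Python's sqlite3 replaces whichever DBMS is in use, the table names part_surrogate and part_composite are invented to hold both designs side by side, and the sample values are arbitrary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Design 1: autogenerated surrogate key only.
    CREATE TABLE part_surrogate (
        part_id     INTEGER PRIMARY KEY,
        part_name   TEXT    NOT NULL,
        supplier_id INTEGER NOT NULL
    );
    -- Design 2: the supplier and the supplier's part number identify the row.
    CREATE TABLE part_composite (
        supplier_id      INTEGER NOT NULL,
        supplier_part_id TEXT    NOT NULL,
        part_name        TEXT    NOT NULL,
        PRIMARY KEY (supplier_id, supplier_part_id)
    );
""")

# Design 1 happily stores the same part from the same supplier twice.
for _ in range(2):
    conn.execute(
        "INSERT INTO part_surrogate (part_name, supplier_id) VALUES (?, ?)",
        ("Widget", 1),
    )
surrogate_rows = conn.execute("SELECT COUNT(*) FROM part_surrogate").fetchone()[0]

# Design 2 rejects the second copy.
conn.execute("INSERT INTO part_composite VALUES (1, 'W-1', 'Widget')")
try:
    conn.execute("INSERT INTO part_composite VALUES (1, 'W-1', 'Widget')")
    second_copy_accepted = True
except sqlite3.IntegrityError:
    second_copy_accepted = False
```

A surrogate key combined with a UNIQUE constraint on (supplier_id, supplier_part_id) would enforce the same rule; the point is only that some key has to cover those columns.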
Just as functions encapsulate a set of instructions, or database views abstract base table connections, so too do surrogate keys abstract the meaning of the entity they are placed on.
If, for example, you have a table that holds vehicle data, applying a surrogate VehicleId abstracts what it means to be a vehicle from a data point of view. When you reference VehicleId = 1, you are most surely talking about a vehicle of some sort, but do we know if it is a 2008 Chevy Impala, or a 1991 Ford F-150? No. Can the underlying data of whatever Vehicle #1 is change at any time? Yes.
Short answer: Multi-column foreign keys naturally refer to multi column primary keys. There can still be an autogenerated id column that is part of the primary key.
Philosophical answer: The primary key is the identity of the row. If there is a bit of information that is an intrinsic part of the identity of the row (such as which customer the article belongs to, in a multi-customer wiki), that information should be part of the primary key.
An example: System for organizing LAN parties
The system supports several LAN parties with the same people and organizers attending thus:
CREATE TABLE users ( users_id serial PRIMARY KEY, ... );
And there are several parties:
CREATE TABLE parties ( parties_id serial PRIMARY KEY, ... );
But most of the other stuff needs to carry the information about which party it is linked to:
CREATE TABLE ticket_types (
    ticket_types_id serial,
    parties_id integer REFERENCES parties,
    name text,
    ....
    PRIMARY KEY(ticket_types_id, parties_id)
);
...this is because we want to refer to primary keys. Foreign key on table attendances points to table ticket_types.
CREATE TABLE attendances (
    attendances_id serial,
    parties_id integer REFERENCES parties,
    ticket_types_id integer,
    PRIMARY KEY (attendances_id, parties_id),
    -- the composite foreign key targets ticket_types, whose primary key is (ticket_types_id, parties_id)
    FOREIGN KEY (ticket_types_id, parties_id) REFERENCES ticket_types
);
While I prefer surrogate keys, I use composite keys in a few cases. The composite key may consist entirely or partially of surrogate key fields.
Many to many join tables. These usually require a unique key on the key pair anyway. In some cases additional columns may be included in the key.
Weak child tables. Things like order lines do not stand on their own. In this case I use the parent (orders) table's primary key in the composite key.
When there are multiple weak tables related to an entity, it may be possible to eliminate a table from the join set when querying child data. In the case of grandchild tables, it is possible to join the grandparent to grandchild without involving the table in the middle.

Mysql auto increment primary key id's

I have some mysql tables that have auto incrementing id's that are primary keys, but I notice that I never actually use them... I used to think that every table must have a primary key so I guess that is why I created them before. Should I remove them all if I don't use them at all?
Unless you are running into space problems I wouldn't remove them.
They are a life saver in case you by mistake (or oversight) populate the database with repeated/wrong data.
They also help to have related tables, where you reference the content on one table through the autogenerated id.
This is assuming you have indexes for the other columns you use to actually query the data (if you don't, then more reason to keep the autoincrement ids and use them!).
No.
You should keep them; a database always needs something that differentiates a row from another row (a "Key" of some sort).
If you have something that is guaranteed to be unique for each row, then you can use that as a key; otherwise keep the Primary Key and the Auto generated ID.
I'd personally keep them. They will be especially useful at a later date if you expand the database design and need to reference this table.
Interesting!...
I seem to hold a minority opinion here, getting both upvoted and downvoted to currently an even 0, yet no one in the majority opinion (see responses above) seems to make much of a case for keeping the id field, and the downvoters didn't even bother leaving comments hinting at why doing away with the id is such a bad idea.
In their defense, my own original response did not include any strong argument as to why it is ok to do away with the id attribute in some cases (which seem to apply to the OP). Maybe such a gratuitous response is, in and of itself, downvotable.
Please do educate me, and the OP, by leaving comments for or against the _systematic_ (and I stress "systematic") need to include auto-incremented non-semantic primary keys in all tables. As promised, I returned and added to my response a list of reasons why it may be detrimental to [again, systematically] impose an auto-incremented PK.
My original response:
You bet! you can remove these!
Before you do anything to the database, make sure you have a backup, in particular if the DB size is significant.
Use the ALTER TABLE statement to remove the id in the tables where you want to remove it. Specifically
ALTER TABLE myTable DROP COLUMN id
(you also need to remove the PK constraint before removing the id, if the table has such a constraint)
EDIT (Added later)
There are many cases where it just doesn't make sense to carry along an autoincremented ID key, regardless of the relatively little extra storage these keys require.
In all these cases, the underlying implication is that
either the data itself supplies a primary key,
or, the application manages the key generation
The key supplied "natively" in the data doesn't necessarily need to be a single-column key; it can be a composite key, although in these cases one may wish to study the situation more closely, particularly if the overall key is a bit long.
Here are some of the drawbacks of using an auto-incremented primary key in lieu of a native or application-supplied key:
The effective data integrity may go unchecked
i.e. the server may allow record insertions or updates which create a duplicated [native] key (even though the artificial, autoincremented primary key hides this reality)
When relying on the auto-incremented PK for the support of joins between tables, when part of the [native] key values have to be updated...
...we either create the need of deleting the record in full and re-inserting it with the new values,
...or the risk of keeping outdated/incorrect links.
A common "follow-up" with auto-incremented keys is to create a clustered index on the table for this key.
This does make sense for tables without a native or application-supplied primary key, not so much for data sets that have such keys.
Effectively this prevents choosing a key for the clustered index which may be more beneficial for the most common query patterns.
Migrating tables with an auto-incremented key can be made more difficult depending on the DBMS (need to declare the underlying column as plain integer prior to the copy, then need to restart the autoincrement...)
For narrow tables, i.e. tables with a few columns only, the relative cost of the auto-incremented PK can be significant, and impact performance in a non negligible fashion.
When inserting new records along with associated records in related tables, the auto-incremented key needs to be obtained after the insertion of the main record, before the related records can be inserted; the logic is simpler when the column values supporting the link are known ahead of time.
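The last drawback in the list, having to fetch the generated key before the related rows can be written, looks like this in practice. The sketch uses Python's sqlite3, where the generated value is read from the cursor's lastrowid attribute (in MySQL the corresponding function is LAST_INSERT_ID()); the orders and order_line tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY AUTOINCREMENT,
        customer TEXT NOT NULL
    );
    CREATE TABLE order_line (
        order_id INTEGER NOT NULL REFERENCES orders (order_id),
        line_no  INTEGER NOT NULL,
        sku      TEXT    NOT NULL,
        PRIMARY KEY (order_id, line_no)
    );
""")

# Step 1: insert the parent row and let the database pick the key.
cursor = conn.execute("INSERT INTO orders (customer) VALUES (?)", ("acme",))

# Step 2: read the generated key back; the child rows cannot be built without it.
new_order_id = cursor.lastrowid

# Step 3: only now can the related rows be inserted.
conn.executemany(
    "INSERT INTO order_line VALUES (?, ?, ?)",
    [(new_order_id, 1, "bolt"), (new_order_id, 2, "nut")],
)
line_count = conn.execute(
    "SELECT COUNT(*) FROM order_line WHERE order_id = ?", (new_order_id,)
).fetchone()[0]
```

With a native or application-supplied key, all three inserts could be prepared up front because the linking value is known before anything is written.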
To summarize, the idea that so long as the storage can carry the [relatively minimal] extra "weight" of the artificial primary key, we should include and use such a key, is not without drawbacks of its own.
A final consideration is that just like it is rather easy to remove such keys when we don't need them, they too can be easily added, post-facto, when/if it becomes apparent that they are useful in a particular situation. Neither form of refactoring (adding vs. removing the auto-incremented columns) is risk free, but neither is a major production either.
Yes, if you can figure out another primary key.
There is obviously a flaw in your table design. For example, suppose you had a table like
relation_id(PK), parent_id, child_id .
If it is known that the combination of parent_id and child_id is unique, then you can make the primary key parent_id + child_id, and then drop the column relation_id.
There may be endless other possible cases, but just bear in mind that a primary key helps you locate data quickly, as well as helping your design make sense.