Database storing multiple types of data, but need unique IDs globally - MySQL

A while ago, I asked about how to implement a REST API. I have since made headway with that, but am trying to wrap my head around another idea.
In my API, I will have multiple types of data, such as people, events, news, etc.
Now, with REST, everything should have a unique ID. This ID, I take it, should be unique to the whole system, and not just to each type of data.
For instance, there should not be a person with ID #1 and a news item with ID #1. Ultimately, these two things would be given different IDs altogether: person #1 with unique ID #1 and news item #1 with unique ID #2, since #1 was taken by a person.
In a database, I know that you can create primary keys that automatically increment. The problem is that you usually have a table for each data "type", and if you set the auto increment for each table individually, you will get "duplicate" IDs (yes, the IDs are still unique within their own table, but not across the whole DB).
Is there an easy way to do this? For instance, can all of these tables be set to work off of one incrementer (the only way I could think of to put it), or would it require creating a table that holds these global IDs and ties each one to a table and the unique ID in that table?
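The second idea in the question, a shared table that hands out every ID, can be sketched roughly like this. It is only an illustration: SQLite stands in for MySQL, and the table names `entity`, `person` and `news` are hypothetical. The shared table's auto-increment is the single "incrementer", and each type table reuses the reserved ID as its own primary key (in MySQL you would read it back with `LAST_INSERT_ID()` rather than `cur.lastrowid`).

```python
import sqlite3

# SQLite stands in for MySQL; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entity (
        id   INTEGER PRIMARY KEY AUTOINCREMENT,  -- the one shared counter
        kind TEXT NOT NULL                        -- which table owns this id
    );
    CREATE TABLE person (
        id   INTEGER PRIMARY KEY REFERENCES entity(id),
        name TEXT NOT NULL
    );
    CREATE TABLE news (
        id    INTEGER PRIMARY KEY REFERENCES entity(id),
        title TEXT NOT NULL
    );
""")

def insert_person(name):
    with conn:  # reserve the id and insert the row in one transaction
        cur = conn.execute("INSERT INTO entity (kind) VALUES ('person')")
        conn.execute("INSERT INTO person (id, name) VALUES (?, ?)", (cur.lastrowid, name))
    return cur.lastrowid

def insert_news(title):
    with conn:
        cur = conn.execute("INSERT INTO entity (kind) VALUES ('news')")
        conn.execute("INSERT INTO news (id, title) VALUES (?, ?)", (cur.lastrowid, title))
    return cur.lastrowid

person_id = insert_person("Alice")
news_id = insert_news("Launch day")
print(person_id, news_id)  # 1 2
```

The cost is an extra insert per row and a table that every other table depends on, which is part of why the answers below suggest other approaches.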

You could use a GUID, they will be unique everywhere (for all intents and purposes anyway).
http://en.wikipedia.org/wiki/Globally_unique_identifier

+1 for UUIDs (note that GUID is Microsoft's particular implementation of the UUID standard).
MySQL has a built-in function, UUID(), that generates a UUID as text. You could prefix it with the table name so that you can easily recognize it later.
Each call to UUID() generates a fresh value (as text). So with the above method of prefixing, the INSERT query may look like this:
INSERT INTO my_table VALUES (CONCAT('my_table-', UUID()), ...)
And don't forget to make this column a VARCHAR large enough to hold the prefix plus the 36-character UUID, and of course create an index on it.
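A rough sketch of the prefixed-UUID idea, assuming SQLite as a stand-in for MySQL and hypothetical `person`/`news` tables. Python's `uuid.uuid4()` replaces MySQL's `UUID()` here; the former is random and the latter time-based, but both produce a 36-character string.

```python
import sqlite3
import uuid

def new_id(table):
    # uuid4() is random; MySQL's UUID() is time-based (version 1).
    # Either way the textual form is 36 characters long.
    return f"{table}-{uuid.uuid4()}"

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- VARCHAR(64) leaves room for the prefix plus 36 UUID characters;
    -- the PRIMARY KEY gives the column its index.
    CREATE TABLE person (id VARCHAR(64) PRIMARY KEY, name TEXT);
    CREATE TABLE news   (id VARCHAR(64) PRIMARY KEY, title TEXT);
""")

person_id = new_id("person")
news_id = new_id("news")
conn.execute("INSERT INTO person VALUES (?, ?)", (person_id, "Alice"))
conn.execute("INSERT INTO news VALUES (?, ?)", (news_id, "Launch day"))
print(person_id, news_id)
```

SQLite ignores the VARCHAR length; in MySQL the declared size matters, so size it for your longest prefix.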

"now, with REST, everything should have a unique id. this id, i take it, should be unique to the whole system, and not just each type of data."
That's simply not true. Every resource needs to have a unique identifier, yes, but in an HTTP system, for example, that means a unique URI. /people/1 and /news/1 are unique URIs. There is no benefit (and in fact quite a lot of pain, as you are discovering) in constraining the system such that /news/1 has to instead be /news/0983240-2309843-234802/ in order to avoid conflict.
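To make the point concrete, here is a small sketch (SQLite standing in for MySQL, hypothetical `people`/`news` tables and `uri` helper): each table keeps its own auto-increment, both first rows get ID 1, and the resource identifiers still differ because the collection name is part of the URI.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT);
    CREATE TABLE news   (id INTEGER PRIMARY KEY AUTOINCREMENT, title TEXT);
""")
person_id = conn.execute("INSERT INTO people (name) VALUES ('Alice')").lastrowid
news_id = conn.execute("INSERT INTO news (title) VALUES ('Launch day')").lastrowid

def uri(collection, row_id):
    # The resource identifier is the whole path, not just the number.
    return f"/{collection}/{row_id}"

person_uri = uri("people", person_id)
news_uri = uri("news", news_id)
print(person_id == news_id, person_uri, news_uri)  # True /people/1 /news/1
```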


What is the best practice - a new column or a new table?

I have a users table that contains many attributes like email, username, password, phone, etc.
I would like to save a new type of data (an integer), let's call it "superpower", but only very few users will have it. The users table contains 10K+ records, while fewer than 10 users will have a superpower (for all others it will be null).
So my question is which of the following options is more correct and better in terms of performance:
1. Add another column in the users table called "superpower", which will be null for almost all users.
2. Have a new table called users_superpower, which will contain at most 10 records and will map users to superpowers.
Some things I have thought about:
a. The first option seems wasteful of space, but it's really just an integer...
b. The second option will require a left join every time I query the users...
c. Will the answer change if the "superpower" data were 5 columns, for example?
Note: I'm using Hibernate and MySQL, if it changes the answer.
This might be a matter of opinion. My viewpoint on this follows:
If superpower is an attribute of users and you are not in the habit of adding attributes, then you should add it as a column. 10,000 * 4 additional bytes (about 40 KB) is not very much overhead.
If superpower is just one attribute and you might add others, then I would suggest using a JSON column or an EAV (entity-attribute-value) table to store the value.
If superpower is really a new type of user with other attributes and dates and so on, then create another table. In this table, the primary key can be the user_id, making the joins between the tables even more efficient.
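A minimal sketch of the separate-table option with `user_id` as the primary key, assuming SQLite as a stand-in for MySQL and the table names from the question. Users without a superpower have no row in `users_superpower`, and the LEFT JOIN simply returns NULL for them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id       INTEGER PRIMARY KEY,
        username TEXT NOT NULL
    );
    -- One row per user that actually has a superpower; user_id doubles as the PK.
    CREATE TABLE users_superpower (
        user_id    INTEGER PRIMARY KEY REFERENCES users(id),
        superpower INTEGER NOT NULL
    );
    INSERT INTO users (id, username) VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO users_superpower (user_id, superpower) VALUES (2, 7);
""")

rows = conn.execute("""
    SELECT u.id, u.username, s.superpower
    FROM users AS u
    LEFT JOIN users_superpower AS s ON s.user_id = u.id
    ORDER BY u.id
""").fetchall()
print(rows)  # [(1, 'alice', None), (2, 'bob', 7)]
```

Because the join is on the other table's primary key, each lookup is a single index probe.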
I would go with just adding a new boolean field in your user entity which keeps track of whether or not that user has superpowers.
Appreciate that adding a new table and linking it requires a foreign key column referencing your users table, and this key will be another column taking up space. So it doesn't really avoid the storage cost. If you just want a really small column to store whether a user has superpowers, you can use a boolean field, which would map to a MySQL BIT(1) column. Because this is a fixed-width column, NULL values would still take up a single bit of space, but this is most likely not a big storage concern compared with the rest of your table.

Do I need a primary key for every table in MS Access?

I am new to MS Access, so I'm not sure about this: do I have to have a primary key for every single table in my database? I have one table which looks something like this:
(http://i108.photobucket.com/albums/n32/lurker3345/ACCESSHELP.png?t=1382688844)
In this case, every field/column contains repeated values. I have tried assigning the primary key to every field, but it returns an error saying that there is a repeated field.
How do I go about this?
Strictly speaking, Yes, every row in a relational database should have a Primary Key (a unique identifier). If doing quick-and-dirty work, you may be able to get away without one.
Internal Tracking ID
Some databases generate a primary key under the covers if you do not assign one explicitly. Every database needs some way to internally track each row.
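SQLite is one example of this, used here purely as an illustration (Access behaves differently): an ordinary table declared without a primary key still gets a hidden `rowid`, so even fully duplicated rows remain distinguishable internally. The `readings` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No PRIMARY KEY declared on purpose.
conn.execute("CREATE TABLE readings (station TEXT, value REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?)",
                 [("north", 1.5), ("north", 1.5), ("south", 2.0)])

# The two identical 'north' rows still have different hidden rowids.
rows = conn.execute("SELECT rowid, station, value FROM readings ORDER BY rowid").fetchall()
print(rows)  # [(1, 'north', 1.5), (2, 'north', 1.5), (3, 'south', 2.0)]
```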
Natural Key
A natural key is an existing field with meaningful data that happens to identify each row uniquely. For example, if you were tracking people assigned to teams, you might have an "employee_id" column on the "person" table.
Surrogate Key
A surrogate key is an extra column you add to a table just to assign an arbitrary value as the unique identifier. You might assign a serial number (1, 2, 3, …), or a UUID if your database (such as Postgres) supports that data type. Assigning a serial number or UUID is so common that nearly every database engine provides a built-in facility to help you automatically create such a value and assign it to new rows.
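Both kinds of surrogate key in a short sketch, assuming SQLite (which has no native UUID type, so the UUID is stored as text and generated in the application). The `person` and `device` tables are hypothetical. Note that two people with the same name still get distinct keys.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Serial-number surrogate key: the engine assigns 1, 2, 3, ...
    CREATE TABLE person (
        id   INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT NOT NULL
    );
    -- UUID surrogate key, stored as text because SQLite has no UUID type.
    CREATE TABLE device (
        id    TEXT PRIMARY KEY,
        label TEXT NOT NULL
    );
""")
first = conn.execute("INSERT INTO person (name) VALUES ('Ann')").lastrowid
second = conn.execute("INSERT INTO person (name) VALUES ('Ann')").lastrowid

device_id = str(uuid.uuid4())  # generated by the application, not the engine
conn.execute("INSERT INTO device (id, label) VALUES (?, ?)", (device_id, "sensor"))
print(first, second)  # 1 2
```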
My Advice
In my experience, any serious long-term project should always use a surrogate key because every natural key I've ever been tempted to use eventually changes. People change their names (get married, etc.). Employee IDs change when a company gets acquired by another.
If, on the other hand, you are doing a quick-and-dirty job, such as analyzing a single batch of data to produce a chart once and never again, and your data happens to have a natural key then use it. Beware: One-time jobs often have a way of becoming recurring jobs.
Further advice… When importing data from a source outside your control, assign your own identifier even if the import contains a candidate key.
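One way to follow that advice, sketched with SQLite and a hypothetical supplier feed: keep the source's key as an ordinary (unique) column for matching, and make your own auto-generated ID the primary key. When the source renumbers, only that one column changes.

```python
import sqlite3

# Hypothetical rows from an external feed, each with the supplier's own code.
incoming = [("SUP-001", "Widget"), ("SUP-002", "Gadget")]

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE product (
        id          INTEGER PRIMARY KEY AUTOINCREMENT,  -- our own identifier
        source_code TEXT UNIQUE,                        -- supplier's key, kept for matching
        name        TEXT NOT NULL
    )
""")
conn.executemany("INSERT INTO product (source_code, name) VALUES (?, ?)", incoming)

# The supplier renames a code: one column changes, our ids stay put.
conn.execute("UPDATE product SET source_code = 'SUP-9001' WHERE source_code = 'SUP-001'")
rows = conn.execute("SELECT id, source_code, name FROM product ORDER BY id").fetchall()
print(rows)  # [(1, 'SUP-9001', 'Widget'), (2, 'SUP-002', 'Gadget')]
```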
Composite Key
Some database engines offer a composite key feature, also called a compound key, where two or more columns in the table are combined to create a single value which, once combined, should prove unique. For example, in a "person" table, the "first_name", "last_name", and "phone_number" fields might be unique when considered together. Unless two married people share the same home phone number while also happening to both be named "Alex" with a shared last name! Because of such collisions, as well as the tendency for meaningful data to change and the overhead of calculating such combined values, it is advisable to stick with simple (single-column) keys unless you have a special situation.
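The "two Alexes" collision is easy to reproduce. A sketch using SQLite and the column names from the paragraph above: with a composite primary key on the three columns, the second person is rejected outright.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE person (
        first_name   TEXT NOT NULL,
        last_name    TEXT NOT NULL,
        phone_number TEXT NOT NULL,
        PRIMARY KEY (first_name, last_name, phone_number)
    )
""")
conn.execute("INSERT INTO person VALUES ('Alex', 'Smith', '555-0100')")
try:
    # The second Alex in the same household collides with the first.
    conn.execute("INSERT INTO person VALUES ('Alex', 'Smith', '555-0100')")
    collided = False
except sqlite3.IntegrityError:
    collided = True
print(collided)  # True
```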
If the data doesn't naturally have a unique field to use as the primary key, add an auto-generated integer column called "Id" or similar.
Read the "how to organize my data" section of this page:
http://www.htmlgoodies.com/primers/database/article.php/3478051
This page shows you how to create one (under "add an autonumber primary key"):
http://office.microsoft.com/en-gb/access-help/create-or-remove-a-primary-key-HA010014099.aspx
If you use a DataAdapter and a CurrencyManager, your tables must have a primary key in order to push updates, additions and deletions back to the database. Otherwise, they will not register and you will receive an error.
I lost a week figuring that one out until I added MsgBox(er.ToString) to the Try-Catch-End Try block, and the message mentioned "key". From there, I figured it out.
(NB : Having a primary key was not a requisite in VB6)
Not having a primary key usually means your data is poorly structured. However, it looks like you're dealing with summary/aggregate data there, so it probably doesn't matter.

I want to reuse the gaps of the deleted rows

I have an auto-increment primary key on one of my tables. If I have 3 rows and, for example, delete the third row, I'm left with two. However, if I insert a new row, its ID is automatically 4 and the IDs are 1, 2 and 4.
How can I re-use the deleted ID and have the ID of the newly inserted record to be 3 automatically?
Really, you shouldn't. Primary keys should be purely technical, meaningless values. Their value, and the monotonicity of their generation, shouldn't matter at all.
Moreover, since it's the PK of the row, you'll have potentially dozens (or thousands) of other rows in other tables referencing this ID (foreign keys), so changing it in the table would not be enough: you would have to change it everywhere.
And there's a good chance that this ID is also referenced in other applications (for example, it could be part of a bookmarked URL in a browser), and changing its value would make all these references invalid.
You should never change a primary key. It should be immutable, forever.
EDIT: I misread the question. You actually want to reuse an old ID. This is also a bad idea. Existing references would reference something other than they initially referenced. This is what happens when you change your phone number and it's being reused by someone else, who starts receiving lots of calls from people who still think this phone number is yours. Very annoying. You want to avoid this situation.
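The phone-number problem can be shown directly. This sketch uses SQLite because it conveniently offers both behaviours: with `AUTOINCREMENT` a deleted ID is never handed out again, while a plain `INTEGER PRIMARY KEY` picks max + 1 and so reuses 3, silently pointing an old reference at a different person. The table, names and `stale_ref` value are hypothetical; MySQL's AUTO_INCREMENT normally behaves like the first case.

```python
import sqlite3

def reinsert_after_delete(autoincrement):
    conn = sqlite3.connect(":memory:")
    pk = "INTEGER PRIMARY KEY AUTOINCREMENT" if autoincrement else "INTEGER PRIMARY KEY"
    conn.execute(f"CREATE TABLE person (id {pk}, name TEXT)")
    conn.executemany("INSERT INTO person (name) VALUES (?)", [("Ann",), ("Bob",), ("Cy",)])
    stale_ref = 3  # e.g. a bookmarked /person/3 URL that still means Cy
    conn.execute("DELETE FROM person WHERE id = ?", (stale_ref,))
    new_id = conn.execute("INSERT INTO person (name) VALUES ('Dee')").lastrowid
    # What does the old reference resolve to now?
    target = conn.execute("SELECT name FROM person WHERE id = ?", (stale_ref,)).fetchone()
    return new_id, target

print(reinsert_after_delete(autoincrement=True))   # (4, None)
print(reinsert_after_delete(autoincrement=False))  # (3, ('Dee',))
```

In the first case the stale reference fails loudly (nothing found); in the second it quietly returns the wrong person, which is exactly what you want to avoid.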

MySQL: Table entry representing all 'ids'?

I have a single value in a table that I want selected every time a query is made on the table.
Let me break it down.
I have the following entry:
Instead of making a new entry for every different user_id, can I use some kind of primitive to represent ALL user_ids instead of specific IDs? Example below:
For reasons that I would rather not take the time to explain, this is what I need. Is there any way to do this?
Thanks in advance!
If I'm correct in assuming that means you want tag_id linked to every user_id (as some sort of catch-all clause), you have a few ways of going about it. Depending on your application, you can simply have it add a row for tag_id = 1 whenever you add a user. If, however, you want to do it in a single row, well... it kind of misrepresents the relational model.
You could presumably use the special NULL "value" (essentially, store the row without a user_id) and then check in your application logic with
WHERE user_id = [uid] OR user_id IS NULL
or some such. I'd prefer keeping the relations intact with the former approach, however; if you don't, you lose foreign keys (although using NULL won't violate the constraint) and similar constraints.
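A sketch of the NULL catch-all, assuming SQLite and a hypothetical `user_tag` table (the question's actual tables are not shown). The `IS NULL` test is needed because `user_id = NULL` is never true in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_tag (user_id INTEGER, tag_id INTEGER NOT NULL);
    INSERT INTO user_tag VALUES (5, 2);     -- an ordinary per-user tag
    INSERT INTO user_tag VALUES (NULL, 1);  -- catch-all row: applies to every user
""")

def tags_for(user_id):
    rows = conn.execute(
        "SELECT tag_id FROM user_tag WHERE user_id = ? OR user_id IS NULL ORDER BY tag_id",
        (user_id,),
    ).fetchall()
    return [tag for (tag,) in rows]

print(tags_for(5))  # [1, 2]
print(tags_for(9))  # [1]
```

The row-per-user alternative would instead insert `(new_user_id, 1)` whenever a user is created, keeping every row a real relation that a foreign key can check.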

MySQL updating 'categories' linking table

I have a table that holds bibliography entries, with a bibID primary key. I also have a table that holds a list of categories that can be assigned to the bibliography entries with a categoryID primary key. A table links these two tables as bibID:categoryID, so that each bibID can be associated with multiple categoryIDs.
Categories associated with bibliography entries can be edited via a form with checkboxes that represent all possible categories.
What is the most efficient way to update this relationship? I could just delete all relationships from the linking table associated with an entry and then reinsert whatever the form says, but this seems inefficient.
Efficiency is a slippery term. It can mean different things to different people.
However in most cases it means "performance", so I will assume that is what you mean for now.
I suspect the reality is that this is the most efficient (performant) way.
Other methods may appear more elegant, as they will preserve existing data, and only add missing data, but they will (potentially) require more database accesses and (definitely) more complicated SQL. One database call to delete and one to add should fix you up.
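The delete-then-reinsert approach, sketched with SQLite and a hypothetical linking table `bib_category` using the question's `bibID`/`categoryID` columns. Wrapping both statements in one transaction means a failure can't leave an entry with its categories half-removed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE bib_category (
        bibID      INTEGER NOT NULL,
        categoryID INTEGER NOT NULL,
        PRIMARY KEY (bibID, categoryID)
    )
""")
conn.executemany("INSERT INTO bib_category VALUES (?, ?)", [(1, 10), (1, 11), (2, 10)])
conn.commit()

def save_categories(bib_id, checked_ids):
    # One transaction: either the whole replacement happens or none of it does.
    with conn:
        conn.execute("DELETE FROM bib_category WHERE bibID = ?", (bib_id,))
        conn.executemany(
            "INSERT INTO bib_category (bibID, categoryID) VALUES (?, ?)",
            [(bib_id, cat) for cat in checked_ids],
        )

save_categories(1, [11, 12])  # the form now has categories 11 and 12 checked
rows = conn.execute("SELECT * FROM bib_category ORDER BY bibID, categoryID").fetchall()
print(rows)  # [(1, 11), (1, 12), (2, 10)]
```

In MySQL the executemany step could equally be a single multi-row INSERT.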
The only exception may be where there are large numbers of entries and the changes are small (or negligible). In this case you may need to reconsider.
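For that large-and-mostly-unchanged case, one alternative is to diff the current and submitted sets and touch only the rows that differ. This is a sketch under the same assumptions as before (SQLite, hypothetical `bib_category` table); it costs an extra SELECT but avoids rewriting hundreds of unchanged rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE bib_category (
        bibID      INTEGER NOT NULL,
        categoryID INTEGER NOT NULL,
        PRIMARY KEY (bibID, categoryID)
    )
""")
conn.executemany("INSERT INTO bib_category VALUES (?, ?)", [(1, c) for c in range(1, 501)])
conn.commit()

def sync_categories(bib_id, checked_ids):
    cur = conn.execute("SELECT categoryID FROM bib_category WHERE bibID = ?", (bib_id,))
    current = {cat for (cat,) in cur}
    wanted = set(checked_ids)
    to_delete = current - wanted  # unchecked on the form
    to_insert = wanted - current  # newly checked on the form
    with conn:
        conn.executemany("DELETE FROM bib_category WHERE bibID = ? AND categoryID = ?",
                         [(bib_id, c) for c in sorted(to_delete)])
        conn.executemany("INSERT INTO bib_category VALUES (?, ?)",
                         [(bib_id, c) for c in sorted(to_insert)])
    return len(to_delete), len(to_insert)

# Uncheck category 3 and check a new category 501: only two rows change.
result = sync_categories(1, [c for c in range(1, 502) if c != 3])
print(result)  # (1, 1)
```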