In MySQL, one can create an index on a (non-unique) column as part of the table definition, e.g.
create table orders(
  orderid varchar(20) not null unique,
  customerid varchar(20),
  index(customerid)
);
Having not found a corresponding option in Oracle, i.e. a way to create the index at table creation rather than as a separate command afterwards, I suspect it is not possible. Is this correct? If so, what is the reason behind this? Efficiency, as discussed for example in Insertion of data after creating index on empty table or creating unique index after inserting data on oracle?
Thanks in advance!
Other than indexes defined as part of a primary or unique constraint, there does not appear to be a way to define an index as part of a CREATE TABLE statement in Oracle. Although the USING INDEX clause is part of the constraint-state element of the CREATE TABLE statement, a "missing right parenthesis" error is issued if you try to include a USING INDEX clause in any constraint definition except a PRIMARY KEY or UNIQUE constraint - see this db<>fiddle for examples.
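For illustration, a minimal sketch of what does and does not parse (the table and index names here are hypothetical):

-- Accepted: USING INDEX attached to a UNIQUE (or PRIMARY KEY) constraint
create table orders (
  orderid    varchar2(20) not null,
  customerid varchar2(20),
  constraint orders_uq unique (orderid)
    using index (create unique index orders_uq_ix on orders (orderid))
);

-- A plain non-unique index has to be its own statement afterwards:
create index orders_cust_ix on orders (customerid);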
As to "why" - that's a question only someone on the architecture team at Oracle could answer. From my personal user-oriented point of view, I see no particular value to being able to create an index as part of the CREATE TABLE statement, but then I'm accustomed to how Oracle works and have my thought patterns oriented in that particular direction. YMMV.
The root reason for the difference is that MySQL and Oracle are two distinctly different products, developed at different times by different software engineering teams. The fact that MySQL is owned by Oracle means nothing in this case: MySQL was a separate and separately developed product which was subsequently purchased by Oracle. As for why the two separate and distinct design teams made the decisions they did ... you'd have to ask them. But I'm pretty certain it has nothing to do with operational efficiency as you suggest. Once a table and index are created, there is no difference between having created the index as part of the CREATE TABLE vs. creating the index separately, and so there would be no difference in the efficiency of any DML on said table.
Related
I have to say that this is my first question here. Currently I have a table `journals` in a MySQL server with an ID autogenerated with AUTO_INCREMENT:
`id` int(10) NOT NULL AUTO_INCREMENT,
-- more columns
PRIMARY KEY (`id`)
I'm thinking about modifying this ID to a custom one, because the client would like to generate the ID in the following way:
ABC0000001, where `ABC` is the acronym of a department and `0000001` would auto-increment. We already have different acronyms for different departments.
I can get this ID with a trigger (a similar example: How to make MySQL table primary key auto increment with some prefix), but I have doubts about performance and efficiency, because I could get the same "result" with the initial solution and another column to store the department's acronym. In the backend I would create a method to unify both columns before returning the result to the frontend, and another method to split the ID when searching the table.
Has anybody faced a similar situation who could guide me on this issue in terms of performance or efficiency? Thanks in advance.
Creating custom-generated ids is fun, but masochistic. What is your subconscious goal?
AUTO_INCREMENT is well optimized; anything else is somewhere between slightly and seriously less efficient. For example, a trigger might double the cost of an INSERT, but if INSERTs are not a performance problem, then "so what".
Semi-related: UUIDs are terribly inefficient for huge tables -- due to their randomness.
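To make the two-column alternative from the question concrete, a minimal sketch (the table layout and the 7-digit padding are assumptions based on the ABC0000001 example); the client-facing ID is assembled at query time:

CREATE TABLE journals (
  id   INT NOT NULL AUTO_INCREMENT,
  dept CHAR(3) NOT NULL,  -- department acronym, e.g. 'ABC'
  -- more columns
  PRIMARY KEY (id),
  KEY (dept)
);

-- Compose the display ID on the way out:
SELECT CONCAT(dept, LPAD(id, 7, '0')) AS display_id
FROM journals;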
I'm helping with a Rails application; the intent is for that application to be multi-tenanted. What this means is that there will be data from multiple users/organisations in the database tables, and often the access path will be along the lines of "get me all the data for my organisation".
We're using MySQL as the database.
Rails by default creates a primary key on the table using the id column. The id column is auto-incremented. This is nice in some ways - rows are always added at the end of the table. However, consider the following situation:
- An object called foo. A foo has an id, and always has an organisation_id.
- Over time each organisation creates foos in the database; these foos are interleaved throughout the table (they are stored in id sequence).
- A use case that involves listing all foos for this organisation.
The problem I have is that the foos for an organisation are not located closely together in the database; in fact, they're spread around very sub-optimally. Ideally I'd create a primary key of (organisation_id, id) on the table, which would result in all foos for a given organisation being side by side in the table.
Unfortunately, when I do this Rails gives me an 'Unknown primary key for table foos in model Foo' error. I think I could deal with this by using the composite_primary_keys gem for Rails, but it seems like there should be some way to make this transparent at the database level.
Is there an alternate approach?
For reference, the commands on the database to change my index were:
ALTER TABLE foos ADD KEY (id); # needed because the id column is auto-increment
ALTER TABLE foos DROP PRIMARY KEY, ADD PRIMARY KEY(organisation_id, id);
EDIT 1: A blog post that indicates success doing exactly this with the composite_primary_keys gem, which gives me a bit more confidence in that approach. The problem is that it's from 2008, so things may have moved on: http://www.joehruska.com/?p=6
EDIT 2: Another option I was considering was partitioning instead - the number of organisations probably wouldn't exceed the maximum number of partitions, and I could probably group them a bit without losing too much benefit. Unfortunately, the key quote is "every unique key on the table must use every column in the table's partitioning expression (this also includes the table's primary key)" - from the MySQL manual: http://dev.mysql.com/doc/refman/5.6/en/partitioning-limitations-partitioning-keys-unique-keys.html
So I'm still back needing a composite primary key again. I'm a little surprised that Rails cares so much about the primary key, rather than simply that a key is present.
If you don't want to use composite_primary_keys then you may be stuck just relying on a standard index on :organisation_id or [:organisation_id, :id].
My understanding is that Rails cares about primary keys so much because of the assumptions it makes with relationships between models. Perhaps it should be improved; you could always suggest it as a future feature.
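For what it's worth, a minimal sketch of that fallback at the database level, assuming the default integer primary key stays in place (the index name is hypothetical):

-- Keep Rails' default PRIMARY KEY (id) and add a secondary index
-- covering the "all foos for my organisation" access path:
ALTER TABLE foos ADD INDEX idx_org_id (organisation_id, id);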
I think I should be counted as a database newbie, so read the question as a newbie question. I currently create a table which holds environment variables for a number of hosts, like this:
create table envs (
  host varchar(255),
  envname varchar(255),
  envvalue varchar(8192),
  PRIMARY KEY(host, envname)
);
Very simple: one table holding all the data I need. A common operation is to get all the environment variables for a given host; another is to get a given environment variable for a given host; a third example operation would be to get a given environment variable for all hosts and list duplicates.
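For concreteness, hedged sketches of those three queries against the table above; the host name 'web01' and the variable name 'PATH' are made-up examples, and the duplicate listing is one plausible interpretation:

-- All environment variables for a given host:
SELECT envname, envvalue FROM envs WHERE host = 'web01';

-- One variable for one host:
SELECT envvalue FROM envs WHERE host = 'web01' AND envname = 'PATH';

-- One variable across all hosts, listing values shared by several hosts:
SELECT envvalue, COUNT(*) AS hostcount
FROM envs
WHERE envname = 'PATH'
GROUP BY envvalue
HAVING COUNT(*) > 1;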
Performance is not expected to be an issue, it's going to be maybe tens of hosts, dozens of variables per host, average max 1 query per second.
Now I've read that having a composite primary key is not necessarily a good idea. Is this true for the above use case? If it is true, how should I change the database design? If not, is the above one-table database fine for the purposes I listed above?
I don't see a problem here with the primary key. The semantics of a primary key are to uniquely identify the non-key attribute values for the key values. As I assume that for one host and one envname there is at most one envvalue, the primary key makes perfect sense.
It could be that some people argue against composite primary keys because they are afraid of performance issues. However, performance considerations should never influence the choice of the primary key. Many database systems automatically create an index structure for the primary key; the choice of this index structure can influence performance. However, this choice can usually be changed manually, and should only be revisited later if you really have performance issues.
Your one-table design and choice of primary key is fine.
Now I've read that having a composite primary key is not necessarily a good idea. Is this true for the above use case?
No. Use a composite primary key on (host, envname).
If it is true, how should I change the database design?
N/A.
If not, is the above one-table database fine for the purposes I listed above?
Yes: it's known as the Entity–Attribute–Value model.
It's a bad idea, because you store unique values (host, envname) several times.
What if you were to change the hostname from srv01 to srv01_new? You'd have to change every occurrence of srv01 in your table. And what if, some day, you decide you need to create a new table that holds additional information about every single host?
Now, if you change the hostname, you have to change that information as well.
To get to your question: It's not an issue of performance, but of normalization.
Databases should generally be normalized as far as possible. If you are intrigued enough, read on.
You should create one table for your hosts, having a unique id (int) as primary key and a unique (index) name as the hostname.
Your table should then only reference the id of the host, not the name. This way, your hostname is only stored once in your whole database and can be altered to whatever you desire, without breaking other tables.
If your environment names are unique, too, you should create another table for those, having the same layout as the hosts table (id, name).
Your combination table then stores the id of the host and the id of the environment, along with the value. You must of course keep the combined primary key, so every combination of host/environment is unique and easily indexable.
Then you have a many-to-many relationship with additional attributes, and perfect normalization.
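A minimal sketch of that layout, with illustrative names:

CREATE TABLE hosts (
  id   INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(255) NOT NULL UNIQUE
);

CREATE TABLE envnames (
  id   INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(255) NOT NULL UNIQUE
);

-- The combination table: one row per (host, environment name) pair.
CREATE TABLE envs (
  host_id    INT NOT NULL,
  envname_id INT NOT NULL,
  envvalue   VARCHAR(8192),
  PRIMARY KEY (host_id, envname_id),
  FOREIGN KEY (host_id)    REFERENCES hosts(id),
  FOREIGN KEY (envname_id) REFERENCES envnames(id)
);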
I am concerned with the performance of a database table I have to use to store data related to a customer survey application.
I have a database table storing customer responses from a survey. Since the survey questions change according to the customer, I thought that instead of defining the table schema using each question id as a column, I would define it as follows:
customerdata(customerid varchar,
             partkey varchar,
             questionkey varchar,
             value varchar,
             version int,
             lastupdate timestamp)
Where:
partkey: the shortcode of the part (part1, part2, ...)
questionkey: the shortcode of the question, e.g. age, gender, etc.
Since some customers fill in the survey twice, thrice, etc., I have added the version column.
With this design customerid, partkey, questionkey and version are primary keys.
I am concerned about the performance of such a design. Should I define the other primary keys as indexes? Would that help? So far, for 30 customers, I have 7000 records. I expect to have a maximum of 300-500. What do you think?
Sounds like a pretty small database. I doubt you'll have performance issues but if you detect any when querying on partkey, questionkey, or version later on you can always add one or more indexes to solve the problem at that time. There's no need to solve a performance problem you don't have and probably never will have.
Performance issues will arise only if you have to perform time-sensitive queries that don't use the customerid field as the primary filter. I suspect you'll have some queries like that (when you want to aggregate data across customers) but I doubt they'll be time-sensitive enough to be impacted by the one second or less response time I would expect to see from such a small collection of data. If they are, add the index(es) then.
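If it comes to that, adding an index is a single statement; a hedged sketch (the index name and column choice are assumptions):

-- Speeds up cross-customer aggregation on a given question:
CREATE INDEX idx_question ON customerdata (questionkey, partkey);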
Also, note that a table only has a single PRIMARY KEY. That key can use more than one column, so you can say that columns customerid, partkey, questionkey, and version are part of the PRIMARY KEY, but you can't say they're all "primary keys".
Row-count-wise, I have experienced MySQL databases with over 100,000 rows that run just fine, so you should be okay.
Although it's a different case if you run complicated queries; that depends more on database design than on row count.
I have a question that involves database design. For an application that I am building, a certain set of unique identifiers needs to be related to a variable amount of data. The solution that I built involves two tables.
The first table has an auto-incrementing primary key ID, along with two columns that are both in a unique index (which work to identify the specific set of data). The second table then references the primary key, and stores data along with this key.
Using this technique, I am able to link the two identifiers that are contained in the unique index in the first table with a variable amount of rows in the second table.
I know that this will work, but my question involves the viability of this structure. Is it poor database design to have the entire first table contained in indexes? Can anyone think of any better solution that does not involve duplicating the identifiers used in the first table?
I am using MySQL with InnoDB, if it is pertinent to the question.
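A minimal sketch of the structure being described, with hypothetical table and column names:

CREATE TABLE identifiers (
  id      INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  ident_a VARCHAR(50) NOT NULL,
  ident_b VARCHAR(50) NOT NULL,
  UNIQUE KEY (ident_a, ident_b)
) ENGINE=InnoDB;

CREATE TABLE data_rows (
  identifier_id INT NOT NULL,
  payload       VARCHAR(255),
  KEY (identifier_id),
  FOREIGN KEY (identifier_id) REFERENCES identifiers(id)
) ENGINE=InnoDB;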
Is it poor database design to have the entire first table contained in indexes?
Not in your case. The real question should probably be, "What are the candidate keys in the second table?"
In your case, you can think of your "first" table as an enumeration of the valid values implied in a hypothetical CHECK() constraint.
Have you ever heard of domain-key normal form (DKNF)? The more familiar 3NF, BCNF, 4NF, and 5NF are special cases of DKNF.