There are four regions with more than one million records total. Should I create a table with a region column or a table for each region and combine them to get the top ranks?
If I combine all four regions, none of my columns will be unique, so I will also need to add an id column as my primary key. Otherwise (with one table per region), name, accountId and characterId would be candidate keys. Or should I just add an id column anyway?
Table:
----------------------------------------------------------------
| name | accountId | iconId | level | characterId | updateDate |
----------------------------------------------------------------
Edit:
Should I look into partitioning the table by region_id?
Because all records are related to a particular region, a single database table in 3NF (e.g. All-Regions) containing a regionId along with the other attributes should work.
The correct answer, as usual with database design, is "It depends".
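A minimal sketch of that single-table layout, written with Python's built-in sqlite3 module so it can be run anywhere. The table name characters, the region_id values, the sample rows and the assumption that characterId is unique within a region are all made up for illustration; only the column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE characters (
        id          INTEGER PRIMARY KEY,   -- surrogate key owned by the database
        region_id   INTEGER NOT NULL,      -- hypothetical: one value per region
        name        TEXT    NOT NULL,
        accountId   INTEGER NOT NULL,
        characterId INTEGER NOT NULL,
        level       INTEGER NOT NULL,
        UNIQUE (region_id, characterId)    -- assumed: ids are unique within a region
    )
""")
# Index that supports "top ranks" queries, both per region and overall.
conn.execute("CREATE INDEX idx_region_level ON characters (region_id, level)")

rows = [
    (1, "Ann", 10, 100, 55),
    (1, "Bob", 11, 101, 70),
    (2, "Cid", 10, 100, 90),  # same accountId/characterId as Ann, other region
    (3, "Dee", 12, 102, 65),
]
conn.executemany(
    "INSERT INTO characters (region_id, name, accountId, characterId, level) "
    "VALUES (?, ?, ?, ?, ?)",
    rows,
)

def top_ranks(limit, region_id=None):
    """Return (name, level) pairs ordered by level, optionally for one region."""
    if region_id is None:
        sql = "SELECT name, level FROM characters ORDER BY level DESC, id LIMIT ?"
        return conn.execute(sql, (limit,)).fetchall()
    sql = ("SELECT name, level FROM characters WHERE region_id = ? "
           "ORDER BY level DESC, id LIMIT ?")
    return conn.execute(sql, (region_id, limit)).fetchall()
```

With one table, the cross-region ranking is a plain ORDER BY, and the per-region ranking only adds a WHERE clause; with one table per region, the first query would need a UNION over all four tables.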
First of all, (IMHO) a good primary key should belong to the database, not to the users :)
So, if accountId and characterId are user-editable or prominently displayed to the user, they should not be used for the primary key of the table(s) anyway. And using name (or any other user-generated string) for a key is just asking for trouble.
As for the regions, try to divine how the records will be used.
Will most of the queries use only a single region, or will most of them use data across regions?
Is there a possibility that the schemas for different regions might diverge?
Will there be different usage scenarios for similar data? (e.g. different phone number patterns for different regions)
Bottom line: both approaches will work. Let your data tell you which approach will be more manageable.
Related
We are having trouble designing the primary keys for our new data-intensive project.
Please explain to us which PK design is better for our data-intensive database.
The database is data intensive and persistent.
At least 3000 users access it per second.
Please tell us which type of PK is technically better for our database, given that the tables are unlikely to change in the future.
1. INT/BIGINT auto-increment column as PK
2. Composite keys.
3. Unique varchar PK.
I would go for option 1, using a BIGINT autoincrement column as the PK. The reason is simple: each new row is written to the end of the current page, so inserting new rows is very fast. If you use a composite key, the rows must be kept in key order, and unless you insert in the order of the composite key, pages have to be split to make room. For example, imagine this table:
A | B | C
---+---+---
1 | 1 | 4
1 | 4 | 5
5 | 1 | 2
Where the primary key is a composite key on (A, B, C), suppose I want to insert (2, 2, 2), it would need to be inserted as follows:
A | B | C
---+---+---
1 | 1 | 4
1 | 4 | 5
2 | 2 | 2 <----
5 | 1 | 2
So that the clustered key maintains its order. If the page you are inserting into is already full, then MySQL will need to split the page, moving some of the data to a new page to make room for the new data. These page splits are quite costly. Unless you know you are inserting sequential data, using an autoincrement column as the clustering key means that, as long as you don't mess around with the increments, you should never have to split a page.
You could still add a unique index on the columns that would otherwise be the primary key to maintain integrity. You would still have the same problem with splits on that index, but since the index is narrower than a clustered index, the splits would be less frequent because more entries fit on a page.
More or less the same argument applies against a unique varchar column, unless you have some kind of process that ensures the varchar is sequential, but generating a sequential varchar is more costly than an autoincrement column, and I can see no immediate advantage.
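As a rough sketch of the schema shape described above (autoincrement surrogate key plus a unique index on the natural columns), the following uses Python's built-in sqlite3 module. SQLite's storage engine is not InnoDB, so this only illustrates the constraints, not the page-split behaviour; the table name t and the helper insert_row are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE t (
        id INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key, grows with each insert
        a  INTEGER NOT NULL,
        b  INTEGER NOT NULL,
        c  INTEGER NOT NULL,
        UNIQUE (a, b, c)                       -- natural key kept as a unique index
    )
""")

def insert_row(a, b, c):
    """Insert one row; return its new id, or None if (a, b, c) already exists."""
    try:
        cur = conn.execute("INSERT INTO t (a, b, c) VALUES (?, ?, ?)", (a, b, c))
        return cur.lastrowid
    except sqlite3.IntegrityError:
        return None

# The rows from the example arrive out of key order, yet ids are handed out
# in arrival order, so every new row goes to the end of the id sequence.
ids = [insert_row(*row) for row in [(1, 1, 4), (1, 4, 5), (5, 1, 2), (2, 2, 2)]]
duplicate = insert_row(2, 2, 2)  # rejected by the unique index
```

The unique index still has to place (2, 2, 2) between existing entries, which is the residual cost the answer mentions.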
This is not easy to answer.
To start with, using composite keys as primary keys is the straightforward way. IDs come in handy when the database structure changes.
Say you have products in different sizes sold in different countries. The primary keys are product_no; (product_no, size); (product_no, country_isocode); and (product_no, size, country_isocode).
product (product_no, name, supplier_no, ...)
product_size (product_no, size, ean, measures, ...)
product_country (product_no, country_isocode, translated_name, ...)
product_size_country (product_no, size, country_isocode, vat, ...)
It is very easy to write data, because you are dealing with natural keys, which is what users work with. The DBMS guarantees data consistency.
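The natural-key variant can be sketched as follows, again with Python's sqlite3 module. The column types, the sample products and the helper try_insert are assumptions for illustration; the point is that composite foreign keys let the database itself reject a size and a country that belong to different products.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE product (
        product_no TEXT PRIMARY KEY,
        name       TEXT NOT NULL
    );
    CREATE TABLE product_size (
        product_no TEXT NOT NULL REFERENCES product (product_no),
        size       TEXT NOT NULL,
        PRIMARY KEY (product_no, size)
    );
    CREATE TABLE product_country (
        product_no      TEXT NOT NULL REFERENCES product (product_no),
        country_isocode TEXT NOT NULL,
        PRIMARY KEY (product_no, country_isocode)
    );
    CREATE TABLE product_size_country (
        product_no      TEXT NOT NULL,
        size            TEXT NOT NULL,
        country_isocode TEXT NOT NULL,
        vat             REAL,
        PRIMARY KEY (product_no, size, country_isocode),
        FOREIGN KEY (product_no, size)
            REFERENCES product_size (product_no, size),
        FOREIGN KEY (product_no, country_isocode)
            REFERENCES product_country (product_no, country_isocode)
    );
    INSERT INTO product VALUES ('P1', 'Shirt'), ('P2', 'Hat');
    INSERT INTO product_size VALUES ('P1', 'M');
    INSERT INTO product_country VALUES ('P1', 'DE'), ('P2', 'FR');
""")

def try_insert(product_no, size, country_isocode, vat):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO product_size_country VALUES (?, ?, ?, ?)",
            (product_no, size, country_isocode, vat),
        )
        return True
    except sqlite3.IntegrityError:
        return False

ok = try_insert("P1", "M", "DE", 0.19)     # size and country both belong to P1
mixed = try_insert("P1", "M", "FR", 0.20)  # FR is only registered for P2
```

Because product_no is part of both foreign keys, the mixed row cannot be stored at all.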
Now the same with technical IDs (each table's primary key is its leading _id column):
product (product_id, product_no, name, supplier_no, ...)
product_size (product_size_id, size, product_id, ean, measures, ...)
product_country (product_country_id, product_id, country_id, translated_name, ...)
product_size_country (product_size_country_id, product_size_id, country_id, vat, ...)
Looking up the IDs is now an additional step when inserting data. And you must still ensure that product_no is unique, so the unique constraint on product_id doesn't replace the constraint on product_no but adds to it. The same goes for product_size, product_country and product_size_country. Moreover, product_size_country may now link to a product_country and a product_size of different products. The DBMS can no longer guarantee data consistency.
However, natural keys have their weakness when changes to the database structure must be made. Let's say that a new company is introduced in the database and product numbers are only unique per company. With the ID-based database you would simply add a company ID to the products table and be done. In the natural-key-based database you would have to add the company to all primary keys, which is much more work. (However, how often must such changes be made to a database? In many databases, never.)
What more is there to consider? When the database gets big, you might want to partition tables. With natural keys, you could partition your tables by said company, assuming that you will usually want to select data from one company or the other. With IDs, what would you partition the tables by to enhance access?
Well, both concepts certainly have pros and cons. As to your third option to create a unique varchar, I see no benefit in this over using integer IDs.
I'm making a site that will be a subscription based service that will provide users several courses based on whatever they signed up for. A single user can register in multiple courses.
Currently the db structure is as follows:
User
------
user_id | pwd | start | end
Courses
-------
course_id | description
User_course_subscription
------------------------
user_id | course_id | start | end
course_chapters
---------------
course_id | title | description | chapter_id | url |
The concern is that with the user_course_subscription table I don't know (at least at the moment) how I can have one user with multiple course subscriptions, unless I enter the same user_id multiple times with a different course_id each time. Alternatively I would add many columns in the format calculus_1, chem_1, etc., but that would give me a ton of columns as the list of courses grows.
I was wondering if entering the user_id multiple times is the best way to do this? Or is there another way to structure the table (or maybe I'd have to restructure all the tables)?
Your database schema looks fine. Don't worry, you're on the right track. As for the User_course_subscription table, user_id and course_id together form the primary key. This is called a composite (joint) primary key and is basically fine.
Values are still unique because no user subscribes to the same course twice. Your business logic code should ensure this anyway. For the database part: You might want to look up in your database system's manual how to set joint primary keys up properly when creating the table (syntax might differ).
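A minimal sketch of such a composite primary key, using Python's sqlite3 module. The columns start and end are renamed start_date and end_date here only to avoid keyword clashes; the sample users, courses and the helpers subscribe and courses_for are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE user (user_id INTEGER PRIMARY KEY, pwd TEXT);
    CREATE TABLE courses (course_id INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE user_course_subscription (
        user_id    INTEGER NOT NULL REFERENCES user (user_id),
        course_id  INTEGER NOT NULL REFERENCES courses (course_id),
        start_date TEXT,
        end_date   TEXT,
        PRIMARY KEY (user_id, course_id)   -- one row per (user, course) pair
    );
    INSERT INTO user VALUES (1, 'x'), (2, 'y');
    INSERT INTO courses VALUES (10, 'Calculus 1'), (20, 'Chem 1');
""")

def subscribe(user_id, course_id):
    """Add a subscription; return False if it already exists or is invalid."""
    try:
        conn.execute(
            "INSERT INTO user_course_subscription (user_id, course_id) VALUES (?, ?)",
            (user_id, course_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False

def courses_for(user_id):
    """List the course descriptions a user is subscribed to."""
    sql = ("SELECT c.description FROM user_course_subscription s "
           "JOIN courses c ON c.course_id = s.course_id "
           "WHERE s.user_id = ? ORDER BY c.course_id")
    return [row[0] for row in conn.execute(sql, (user_id,))]

subscribe(1, 10)
subscribe(1, 20)   # same user_id again, different course_id: allowed
subscribe(2, 10)
```

The same user_id appears in several rows, which is exactly how a many-to-many subscription is meant to be stored; only a repeated (user_id, course_id) pair is refused.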
If you don't like this idea, you can also create a pseudo primary key, that is having:
user_course_subscription
------------------------
user_course_subscription_id | user_id | course_id | start | end
...where user_course_subscription_id is just an auto-incremented integer. This way, you can use user_course_subscription_id to identify records. This might make things easier in some places of your code, because you don't always have to use two values.
As for having calculus_1, chem_1 etc. - don't do this. You might want to read up on database normalization, as mike pointed out. Especially 1NF through 3NF are very common in database design.
The only reason not to follow normal forms is performance, and then again, in most cases optimization is premature. If you're concerned, stress-test the prototype of your application under realistic (expected) conditions and measure response times to get some hard evidence.
I don't know what's the meaning of the start and end columns in the user table. But you seem to have no redundancy.
You should check out the Wikipedia article on Boyce-Codd normal form. There is a useful example.
I'm having a hard time representing the following situation in the database:
A user can declare multiple addresses (such as Home, Office, Mailing etc. as requested by client).
I have an auto-incremented primary key called UserID that represents one user account. I've been thinking of making a BelongsToUserID column to represent each user's form field to look like:
I can't do this because each row can only be occupied by UserID row.
Any thoughts on how to achieve this?
You want a separate table holding the addresses. Perhaps something like:
| id(primary key) | type(enum home/work/etc.) | userID | address |
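A sketch of that separate address table with Python's sqlite3 module. SQLite has no enum type, so a CHECK constraint stands in for it; the users table layout, the sample rows and the helper addresses_for are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE users (
        UserID INTEGER PRIMARY KEY AUTOINCREMENT,
        name   TEXT NOT NULL
    );
    CREATE TABLE addresses (
        id      INTEGER PRIMARY KEY AUTOINCREMENT,
        type    TEXT NOT NULL CHECK (type IN ('home', 'office', 'mailing')),
        userID  INTEGER NOT NULL REFERENCES users (UserID),
        address TEXT NOT NULL
    );
    INSERT INTO users (name) VALUES ('alice');
    INSERT INTO addresses (type, userID, address) VALUES
        ('home',   1, '1 Main St'),
        ('office', 1, '2 Work Ave');
""")

def addresses_for(user_id):
    """Return every (type, address) pair declared by one user."""
    sql = "SELECT type, address FROM addresses WHERE userID = ? ORDER BY id"
    return conn.execute(sql, (user_id,)).fetchall()
```

Each address is its own row that points back at the user, so a user can declare as many addresses as needed without adding columns to the users table.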
You can do this in two ways.
The first is simple but not advised: don't create a separate primary key, treat the composite key pair as the candidate key, and choose the primary key from that. Because the table then has no primary key of its own, it is not advised.
The second approach is better, and it is the one I use: make a master table that serves as the relation table, and use another table to actually store the data.
In the master table you can have id, userid, address_bit, and in the second table you can have id, address_bit, address.
Please tell me if you find any other solution. It might help me learn something new :)
I need to sell items on my fictitious website and have come up with a couple of tables. I was wondering if anyone could let me know whether this is plausible and, if not, where I might change things?
I am thinking along the lines of:
Products table : ID, Name, Cost, mediaType(FK)
Media: Id, Name(book, cd, dvd etc)
What is confusing me is that a user might have / own many products, but how would you store an array of product id's in a single column?
Thanks
You could do something like store a JSON array in a text or varchar field and let the application handle parsing it.
MySQL doesn't have a native array type, unlike say PostgreSQL, but in general I find if you're trying to store an array you're probably doing something wrong. Of course every rule has its exceptions.
What you probably want is a user table and then a table that correlates products to users. If a product is only going to relate to one user then you can add a user ID column to your Products table. If not, then you'll want another lookup table which handles the many-to-many relationship. It would look something like this:
------------------------
| user_id | product_id |
------------------------
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 2 | 2 |
| 3 | 1 |
| 3 | 5 |
------------------------
I think one way of storing all the products a user has in one column is to store them as a string in which the product ids are separated by a delimiter such as a comma. That is not the way you want to solve this, though. The best way to solve this problem would be to have a separate user table and then a user-product table in which you associate each userid with a product id. You could then use a simple query to get the list of all the products owned by a particular userid.
As a starting point, try to think of the system in terms of the major parts - you would have a 'warehouse', so you need a table to list the products you have, and you are going to possibly have users who register their details with you for regular visits - so an account per user. You would generally hold all details of a single product in the same row of the same table (unless you have a really complex product to detail, but not likely). If you're going to keep track of products bought per user account, there's always the option of keeping the order history as a delimited list in a large text field. For example: date,id,id,id,id;date,id,id. Or you could simply refer to order numbers and have a separate table for orders placed [by any customer].
What is confusing me is that a user might have / own many products, but how would you store an array of product id's in a single column?
This is called a "many-to-many" relationship. In essence you would have a table for users, a table for products, and a table to map them like this:
[table] Users
- id
- name
[table] Products
- id
- name
- price
[table] Users_Products
- user_id
- product_id
Then when you want to know what products a user has, you could perform a query like:
SELECT product_id FROM Users_Products WHERE user_id=23;
Of course, user id 23 is fictitious, for example's sake. The resulting recordset would contain the ids of all the products the user owns.
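To go one step further than the id-only query, the mapping table can be joined back to Products to get names and prices directly. This is a sketch using Python's sqlite3 module; the sample rows and the helper products_owned_by are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE Users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE Products (id INTEGER PRIMARY KEY, name TEXT NOT NULL, price REAL);
    CREATE TABLE Users_Products (
        user_id    INTEGER NOT NULL REFERENCES Users (id),
        product_id INTEGER NOT NULL REFERENCES Products (id),
        PRIMARY KEY (user_id, product_id)
    );
    INSERT INTO Users VALUES (23, 'dana'), (24, 'eli');
    INSERT INTO Products VALUES (1, 'Book', 9.5), (2, 'CD', 12.0), (3, 'DVD', 15.0);
    INSERT INTO Users_Products VALUES (23, 1), (23, 3), (24, 1);
""")

def products_owned_by(user_id):
    """Return (name, price) for every product mapped to the given user."""
    sql = ("SELECT p.name, p.price "
           "FROM Users_Products up "
           "JOIN Products p ON p.id = up.product_id "
           "WHERE up.user_id = ? "
           "ORDER BY p.id")
    return conn.execute(sql, (user_id,)).fetchall()
```

Product 1 is owned by both users here, which a single user_id column on Products could not express.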
You wouldn't store an array of things into a single column. In fact you usually wouldn't store them in separate columns either.
You need to step away from design for a bit and go investigate third normal form. That should be your starting point and, in the vast majority of cases, your ending point for designing database schemas.
The correct way of handling variable size "arrays" is with two tables with a many to one relationship, something like:
Users
User ID (primary key)
Name
Other user info
Objects:
Object Id (primary key)
User id (foreign key, references Users(User id))
Other object info
That's the simplest form where one object is tied to a specific user, but a specific user may have any number of objects.
Where an object can be "owned" by multiple users (I say an object meaning (for example) the book "Death of a Salesman", but obviously each user has their own copy of an object), you use a joining table:
Users
User ID (primary key)
Name
Other user info
Objects:
Object Id (primary key)
Other object info
UserObjects:
User id (foreign key, references Users(User id))
Object id (foreign key, references Objects(Object id))
Count
primary key (User id, Object id)
Similarly, you can handle one or more by adding an object id to the Users table.
But, until you've nutted out the simplest form and understand 3NF, they won't generally matter to you.
Say I have the following table:
TABLE: product
============================================================
| product_id | name | invoice_price | msrp |
------------------------------------------------------------
| 1 | Widget 1 | 10.00 | 15.00 |
------------------------------------------------------------
| 2 | Widget 2 | 8.00 | 12.00 |
------------------------------------------------------------
In this model, product_id is the PK and is referenced by a number of other tables.
I have a requirement that each row be unique. In the example above, a row is defined to be the name, invoice_price, and msrp columns. (Different tables may have varying definitions of which columns define a "row".)
QUESTIONS:
1. In the example above, should I make name, invoice_price, and msrp a composite key to guarantee uniqueness of each row?
2. If the answer to #1 is "yes", this would mean that the current PK, product_id, would not be defined as a key; rather, it would be just an auto-incrementing column. Would that be enough for other tables to use to create relationships to specific rows in the product table?
3. Note that in some cases, the table may have 10 or more columns that need to be unique. That'll be a lot of columns defining a composite key! Is that a bad thing?
I'm trying to decide if I should try to enforce such uniqueness in the database tier or the application tier. I feel I should do this in the database level, but I am concerned that there may be unintended side effects of using a non-key as a FK or having so many columns define a composite key.
When you have a lot of columns that you need to create a unique key across, create your own "key" using the data from the columns as the source. This would mean creating the key in the application layer, but the database would "enforce" the uniqueness. A simple method would be to use the md5 hash of all the sets of data for the record as your unique key. Then you just have a single piece of data you need to use in relations.
md5 is not guaranteed to be unique, but it may be good enough for your needs.
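A rough sketch of the hash-as-key idea with Python's hashlib and sqlite3 modules. The column name row_md5, the separator character and the price formatting are assumptions; the application computes the hash, and the database enforces uniqueness on that single column.

```python
import hashlib
import sqlite3

def row_hash(name, invoice_price, msrp):
    """MD5 over the columns that define a 'row', joined with a separator."""
    # The unit separator keeps ('ab', 'c') and ('a', 'bc') from hashing alike.
    payload = "\x1f".join([name, f"{invoice_price:.2f}", f"{msrp:.2f}"])
    return hashlib.md5(payload.encode("utf-8")).hexdigest()

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE product (
        product_id    INTEGER PRIMARY KEY,
        name          TEXT NOT NULL,
        invoice_price REAL NOT NULL,
        msrp          REAL NOT NULL,
        row_md5       TEXT NOT NULL UNIQUE  -- hypothetical column holding the hash
    )
""")

def add_product(name, invoice_price, msrp):
    """Insert a product; return False if an identical row is already stored."""
    try:
        conn.execute(
            "INSERT INTO product (name, invoice_price, msrp, row_md5) "
            "VALUES (?, ?, ?, ?)",
            (name, invoice_price, msrp, row_hash(name, invoice_price, msrp)),
        )
        return True
    except sqlite3.IntegrityError:
        return False

first = add_product("Widget 1", 10.00, 15.00)
repeat = add_product("Widget 1", 10.00, 15.00)  # same data, same hash
other = add_product("Widget 2", 8.00, 12.00)
```

One trade-off of this design is that the hash must be recomputed by the application whenever any of the hashed columns is updated.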
First off, your intuition to do it in the DB layer is correct if you can do it easily. This means even if your application logic changes, your DB constraints are still valid, lowering the chance of bugs.
But, are you sure you want uniqueness on that? I could easily see the same widget having different prices, say for sale items or what not.
I would recommend against enforcing uniqueness unless there's a real reason to.
You might have something like this (obviously, don't use * in production code; active and sale_price are columns this example assumes, they are not in the table shown in the question)
-- get the lowest price for an item that's currently active
select *
from product p
where p.name = 'widget 1' -- a non-primary index on product.name would be advised
and p.active
order by sale_price asc
limit 1;
You can define composite primary keys and also unique indexes. As long as your requirement is met, defining composite unique keys is not a bad design. Clearly, the more columns you add, the slower it becomes to update and search the keys, but if the business requirement needs this, I don't think it is a negative, because database engines have very optimized routines for this.
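One way to combine both ideas, sketched with Python's sqlite3 module: product_id stays the primary key that other tables reference, while a separate unique constraint covers the columns that define a "row". The order_line table and the helper accepted are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE product (
        product_id    INTEGER PRIMARY KEY,       -- still the key other tables use
        name          TEXT NOT NULL,
        invoice_price REAL NOT NULL,
        msrp          REAL NOT NULL,
        UNIQUE (name, invoice_price, msrp)       -- uniqueness of the whole "row"
    );
    CREATE TABLE order_line (                    -- hypothetical referencing table
        order_line_id INTEGER PRIMARY KEY,
        product_id    INTEGER NOT NULL REFERENCES product (product_id)
    );
    INSERT INTO product VALUES (1, 'Widget 1', 10.00, 15.00),
                               (2, 'Widget 2',  8.00, 12.00);
    INSERT INTO order_line (product_id) VALUES (1);
""")

def accepted(sql, params):
    """Run one statement; return False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

insert_product = "INSERT INTO product (name, invoice_price, msrp) VALUES (?, ?, ?)"
exact_duplicate = accepted(insert_product, ("Widget 1", 10.00, 15.00))
new_price = accepted(insert_product, ("Widget 1", 9.00, 15.00))
dangling_fk = accepted("INSERT INTO order_line (product_id) VALUES (?)", (99,))
```

Foreign keys stay narrow (a single integer) no matter how many columns the unique constraint has to cover.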