How to properly index - MySQL

I'm creating a table on a database that has different poll options. There is another table with polls.
The idea is that given a poll_id I want to get as fast as possible all its options.
These are the table columns: opt_id, poll_id, opt_text, opt_votes.
I would like opt_id not to be an auto_increment but just the id (1 to N options) within the poll, so to me the primary key is given by both poll_id and opt_id, right?
What I want is to have a proper index so that a query such as SELECT * FROM options WHERE poll_id=X takes as little time as possible, but I don't know whether setting the primary key to these two fields is enough, or whether I have to set an index somewhere.

For SELECT * FROM options WHERE poll_id=X, INDEX(poll_id) is optimal. If your PRIMARY KEY starts with poll_id, as in PRIMARY KEY(poll_id, opt_id), then that is sufficient, since an index is used leftmost-first. (A PRIMARY KEY is a UNIQUE KEY, which is an INDEX.)
See the Index Cookbook.
Please provide SHOW CREATE TABLE; there is too much hand-waving in your description of the tables.
And show us any other SELECTs; they may need other indexes.
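In the meantime, here is a minimal sketch of such a table, assuming InnoDB and the column list from the question (types are illustrative assumptions):

CREATE TABLE options
(
    poll_id   INT UNSIGNED NOT NULL,
    opt_id    TINYINT UNSIGNED NOT NULL,  -- 1..N within the poll, assigned by the application
    opt_text  VARCHAR(255) NOT NULL,
    opt_votes INT UNSIGNED NOT NULL DEFAULT 0,
    PRIMARY KEY (poll_id, opt_id)         -- leftmost column poll_id serves WHERE poll_id = X
) ENGINE=InnoDB;

-- Served entirely by the primary key; no extra index needed:
SELECT * FROM options WHERE poll_id = 42;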


Composite primary keys vs auto increment primary key in sql workbench

I need the advice of someone who has greater experience.
I have an associative entity in my database, like this:
Table2-> CustomerID, ServiceID, DateSub
Since the same customer (with PK, for example 1111) can require the same service (with PK, for example 3) more than once, but never on the same date, the composite PK of Table 2 can't be just (CustomerID, ServiceID).
Now I have 2 options:
1- DateSub will also be part of the primary key, so the PK of Table 2 will be (CustomerID, ServiceID, DateSub)
2- Create a specific PK for the associative entity (for example Table2ID, so CustomerID and ServiceID would be FKs)
Which of the two approaches would you follow, and why? Thank you
First of all, you need to decide whether your requirement is to make the combination of the CustomerID, ServiceID, and DateSub columns unique. If so, then you should go for the first option.
Otherwise I would go for the second option.
With the first option, if DateSub is of the DATE data type you will not be able to insert the same service for a customer twice on the same day. If it's DATETIME, then it's doable.
If you want to use this (composite) primary key in any other table as a foreign key, then you need to use all three columns there too.
I tend to prefer that the PK be "natural". You have 3 columns that, together, uniquely define each row. I would consider using it.
The next question is what order to put the 3 columns in. This depends on the common queries. Please provide them.
An index (including the PK) is used only leftmost first. It may be desirable to have some secondary key(s), for efficient access to other columns. Again, let's see the queries.
If you have a lot of secondary indexes, it may be better to have a surrogate, AUTO_INCREMENT "id" as the PK. Again, let's see the queries.
If you ever use a date range, then it is probably best to have DateSub last in any index. (There are rare exceptions.)
How many rows in the table?
The table is ENGINE=InnoDB, correct?
Reminder: The PRIMARY KEY is a Unique key, which is an INDEX.
DateSub is of datatype DATE, correct?
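Assuming the answers are "yes", a sketch of the first option might look like this (the referenced Customer and Service table names are my assumptions):

CREATE TABLE Table2
(
    CustomerID INT UNSIGNED NOT NULL,
    ServiceID  INT UNSIGNED NOT NULL,
    DateSub    DATE NOT NULL,
    PRIMARY KEY (CustomerID, ServiceID, DateSub),  -- DateSub last, so date ranges stay efficient;
                                                   -- with DATE, one row per customer/service/day
    FOREIGN KEY (CustomerID) REFERENCES Customer (CustomerID),
    FOREIGN KEY (ServiceID)  REFERENCES Service (ServiceID)
) ENGINE=InnoDB;

-- Uses the leftmost PK prefix (CustomerID, ServiceID), then a range on DateSub:
SELECT * FROM Table2
WHERE CustomerID = 1111
  AND ServiceID = 3
  AND DateSub >= '2024-01-01';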

Storing key value where key repeats and using primary keys

I am in a situation where I have to store key -> value pairs in a table which records users who have voted for certain products.
UserId  ProductID
1       2345
1       1786
6       657
2       1254
1       2187
As you can see, UserId keeps repeating, and so can ProductID. I wanted to know the best way to represent this data. Also, is there a necessity of using a primary key here? I've searched a lot but am not able to find an exact answer to my problem. Any help would be appreciated. Thank you.
If you want to enforce that a given user can vote for a given product at most once, create a unique constraint over both columns:
ALTER TABLE mytable ADD UNIQUE INDEX (UserId, ProductID);
Although you can use these two columns together as a key, your app code is often simpler if you define a separate, typically auto-increment, key column; the decision to do this depends on which app code language/library you use.
If you have any tables that hold a foreign key reference to this table, and you intend to use referential integrity, those tables and the SQL used to define the relationship will also be simpler if you create a separate key column - otherwise you end up carting multiple columns around instead of just one.
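For comparison, both designs might look like this (a sketch only; the table names are illustrative):

-- Natural composite key: also enforces "one vote per user per product"
CREATE TABLE product_votes
(
    UserId    INT UNSIGNED NOT NULL,
    ProductID INT UNSIGNED NOT NULL,
    PRIMARY KEY (UserId, ProductID)
) ENGINE=InnoDB;

-- Surrogate key plus the unique constraint from above
CREATE TABLE product_votes2
(
    id        INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    UserId    INT UNSIGNED NOT NULL,
    ProductID INT UNSIGNED NOT NULL,
    UNIQUE KEY (UserId, ProductID)
) ENGINE=InnoDB;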

MySQL 3-way 1..n tables relation

1 database with 3 tables: user - photo - vote
- A user can have many photos.
- A photo can have many votes.
- A user can vote on many photos.
- A vote records:
. the result as an int (-1/disliked, 0/neutral, 1/liked)
. the id of the user who voted.
Here is what I have (all FKs are cascade on delete and update):
http://grab.by/iZYE (sid = surrogate id)
My question is: this doesn't seem right; I've been looking at it for 2 days already and can't confidently move on. How can I optimize this, or am I completely wrong?
MySQL/InnoDB tables are always clustered.
Since the primary key also acts as the clustering key¹, using a surrogate primary key means you are physically sorting the table in an order that has no useful meaning for the client applications and cannot be utilized for querying.
Furthermore, secondary indexes in clustered tables can be "fatter" than in heap-based tables and may require double lookup.
For these reasons, you'd want to avoid surrogates and use more "natural" keys, similar to this:
({USER_ID, PICTURE_NO} in table VOTE references the same-named fields in PICTURE. The VOTE.VOTER_ID references USER.USER_ID. Use integers for *_ID and *_NO fields if you can.)
This physical model will enable extremely efficient querying for:
Pictures of the given user (a simple range scan on PICTURE primary/clustering index).
Votes on the given picture (a simple range scan on VOTE primary/clustering index). Depending on circumstances, this may actually be fast enough so you don't have to cache the sum in PICTURE.
If you need votes of the given user, change the VOTE PK to: {VOTER_ID, USER_ID, PICTURE_NO}. If you need both (votes of picture and votes of user), keep the existing PK, but create a covering index on {VOTER_ID, USER_ID, PICTURE_NO, VOTE_VALUE}.
¹ In InnoDB. There are DBMSes (such as MS SQL Server) where the clustering key can differ from the primary key.
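Here is one possible rendering of that model in SQL; a sketch under my reading of the description, with illustrative names and types:

CREATE TABLE user
(
    user_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name    VARCHAR(100) NOT NULL
) ENGINE=InnoDB;

CREATE TABLE picture
(
    user_id    INT UNSIGNED NOT NULL,             -- owner
    picture_no INT UNSIGNED NOT NULL,             -- per-user number
    PRIMARY KEY (user_id, picture_no),            -- clusters a user's pictures together
    FOREIGN KEY (user_id) REFERENCES user (user_id)
        ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB;

CREATE TABLE vote
(
    user_id    INT UNSIGNED NOT NULL,             -- picture owner
    picture_no INT UNSIGNED NOT NULL,
    voter_id   INT UNSIGNED NOT NULL,             -- the user who voted
    vote_value TINYINT NOT NULL,                  -- -1 / 0 / 1
    PRIMARY KEY (user_id, picture_no, voter_id),  -- clusters a picture's votes together
    FOREIGN KEY (user_id, picture_no) REFERENCES picture (user_id, picture_no)
        ON DELETE CASCADE ON UPDATE CASCADE,
    FOREIGN KEY (voter_id) REFERENCES user (user_id)
        ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB;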
The first thing I see is that you have duplicate unique IDs on the tables. You don't need the sid columns; just use user_id, photo_id, and photo_user_id (maybe rename this one to vote_id). Those ID columns should also be INT type, definitely not VARCHARs. You probably don't need the vote total columns on photo; you can just run a query to get the total when you need it and not worry about keeping both tables in sync.
Assuming that you will only allow one vote per user on each photo, the structure of the vote table can be modified so the only columns are user_id, photo_id, and vote_result. You would then make the primary key a composite index on (user_id, photo_id). However, since you're using foreign keys, that makes this table a bit more complicated.

MySQL table - designing efficient table

I'm designing a db table that will save a list of user's favorited food items.
I created favorite table with the following schema
id, user_id, food_id
user_id and food_id will be foreign key linking to another table.
I'm just wondering if this is efficient and scalable, because if a user has multiple favorite things then it needs multiple rows of data.
i.e. if a user has 5 favorited food items, then it takes five rows to store the list for that user.
Is this efficient? And scalable? What's the best way to optimize this schema?
Thanks in advance!
tl;dr: This is called a "join table" and is the correct and scalable approach to model M-M relationships in a relational database. (Depending upon the constraints used, it can also model 1-M/1-1 relationships in a "no NULL FK" schema.)
However, I contend that the id column should be omitted here so that the table is only user_id, food_id. The PK will be (user_id, food_id) in this case.
Unlike other tables, where surrogate (aka auto-increment) PKs are sometimes argued for, a surrogate PK generally only adds clutter in a join table as it has a very natural compound PK.
While the PK itself is compound in this case, each "joined" table only relates back by part of the PK. Depending upon queries performed it might also be beneficial to add covering indices on food_id or (food_id, user_id).
Eliminate Surrogate Key: Unless you have a specific reason for the surrogate key id, exclude it from the table.
Fine-tune Indexing: At this point, you just have a composite primary key that is the combination of the two foreign keys. In which order should the PK fields be?
If your application(s) predominantly execute queries such as: "for given user, give me foods", then PK should be {user_id, food_id}.
If the predominant query is "for given food, give me users", then the PK should be {food_id, user_id}.
If both query "directions" are common, add a UNIQUE INDEX that has the same fields as the PK, but in the opposite order. So you'll have the PK on {user_id, food_id} and an index on {food_id, user_id}.
Note that InnoDB tables are clustered, which eliminates the (in this case unnecessary) table heap. Yet the secondary index discussed above will not cause a double lookup (since it fully covers the query), nor carry the hidden overhead of PK fields (since it indexes the same fields as the PK, just in the opposite order).
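Under those assumptions, the resulting table might look like this (a sketch; the referenced users and foods tables are placeholders, and the SMALLINT UNSIGNED types follow the sizing advice in a later answer):

CREATE TABLE favorite
(
    user_id SMALLINT UNSIGNED NOT NULL,
    food_id SMALLINT UNSIGNED NOT NULL,
    PRIMARY KEY (user_id, food_id),     -- "for a given user, give me foods"
    UNIQUE KEY  (food_id, user_id),     -- "for a given food, give me users"
    -- assumes users.user_id and foods.food_id are also SMALLINT UNSIGNED
    FOREIGN KEY (user_id) REFERENCES users (user_id),
    FOREIGN KEY (food_id) REFERENCES foods (food_id)
) ENGINE=InnoDB;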
In my opinion, you can optimize your table in the following ways:
As a relation table with 2 foreign keys, you don't need the "id" field.
Use the InnoDB engine for the table.
Name your relation table "user_2_food", which makes its purpose clearer.
Use datatypes as small as possible, i.e. "smallint" is better than "int", and don't forget the "UNSIGNED" attribute.
Creating the three tables below will result in an efficient design.
users : userId, username, userdesc
foods : foodId, foodname, fooddesc
userfoodmapping : ufid, userid, foodid, rowstate
The significance of rowstate is that if the user later stops liking that food, its state becomes -1.
You have 2 options in my opinion:
Get rid of the id field, but in that case make your other two columns (combined) the primary key.
Keep your id field as the primary key for your table.
In either case, I think this is a proper approach. If you ever run into an inefficiency problem, you can then look at techniques such as loading only part of the table. This will do for now.

Which has better performance?

I am thinking about a database schema for posts and their comments, in the context of a social networking application, and I'm wondering which of these two would give better performance:
I am storing comments of a post in the "Comments" table and posts in the "Posts" table.
Now my schema for the comments table looks like this:
postId commentId postedBy Date CommentBody
Since in order to retrieve the comments of a post I would have to search all comments whose postId matches the postId of that specific post, and postId alone cannot be the primary key (it is not unique within the column, since a post has several comments), I was thinking I could merge postId and commentId into one single commentId (which becomes the primary key) from which the postId can also be recovered. This is how I am thinking:
CommentId would be generated as postId*100+i (where i is the ith comment on the post).
Thus, in order to retrieve comments for a post (say with postId=8452), I would search all comments with commentId (the primary key) lying between 845200 and 845299, instead of searching all comments with postId=8452 (of course this limits the maximum number of comments to 100). But will this lead to any performance gains?
Here's what you do. Load up a database with representative data at (for example) twice the size you ever expect it to get.
Then run your queries and test them against both versions of the schema.
Then, and this is the good bit, retest this every X weeks with new up-to-date data to ensure the situation hasn't changed.
That's what being a DBA is all about. Unless your data will never change, database optimisation is not a set-and-forget operation. And the only way to be sure is to test under representative conditions.
Everything else is guesswork. Educated guesswork, don't get me wrong, but I'd rather have a deterministic answer in preference to anyone's guess, especially since the former will adapt to changes.
My favorite optimisation mantra is "Measure, don't guess!"
I'd recommend:
Use a two-table structure with a composite key in comments, for uniqueness in the index.
100 comments per article is a bad limitation that may come back to bite you.
Don't use different tables for comments on videos/pictures etc.
If you have huge amounts of comments, add a comment-archive table and move old comments there. The most requested comments (the newest) will then sit in a smaller, more efficient table.
Do save blobs (pictures and videos) on a different partition and not in the db. The db will be smaller and less fragmented at the file level.
If you are going to have a big volume, you should have a Posts table and a separate Comments table in order to keep each table smaller :). And don't forget to use indexes and partitions on them.
Use a composite key. Or, if you're using some framework that only allows single-column keys, add a secondary index on postId.
If CommentId is not unique, you can create a composite PRIMARY KEY on (postId, commentId):
CREATE TABLE Comment
(
postId INT NOT NULL,
commentId INT NOT NULL,
…,
PRIMARY KEY (postId, commentId)
)
If your table is MyISAM, you can mark commentId as AUTO_INCREMENT, which will assign it with a per-post UNIQUE incrementing value.
If it is unique, you can create a PRIMARY KEY on CommentId and create a secondary index on (PostId, CommentId):
CREATE TABLE Comment
(
commentId INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
postId INT NOT NULL,
…,
KEY (postId, commentId)
)
CommentId would be generated as postId*100+i (where i is the ith comment on the post).
Thus, in order to retrieve comments for a post (say with postId=8452), I would search all comments with commentId (the primary key) lying between 845200 and 845299, instead of searching all comments with postId=8452 (of course this limits the maximum number of comments to 100). But will this lead to any performance gains?
This will likely give much worse performance than a query based on a postId foreign key column, but the only way to be sure is to try both techniques (as suggested by paxdiablo) and measure the performance.
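For concreteness, the two access patterns under discussion look like this (a sketch; each assumes the corresponding key layout described above):

-- Option A: plain lookup, a range scan on the (postId, commentId) primary key
SELECT * FROM Comment WHERE postId = 8452;

-- Option B: arithmetic encoding, a range scan on a commentId-only primary key,
-- with a hard 100-comment cap and the ID packing pushed into application code
SELECT * FROM Comment WHERE commentId BETWEEN 845200 AND 845299;

With a suitable index on postId, option A is already a single range scan, so the encoding adds constraints without an obvious win; still, as noted above, measure rather than guess.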