I have this table:
// votes
+----+---------+---------+
| id | user_id | post_id |
+----+---------+---------+
| 1 | 12345 | 12 |
| 2 | 12345 | 13 |
| 3 | 52344 | 12 |
+----+---------+---------+
Also this is a part of my query:
EXISTS (select 1 from votes v where u.id = v.user_id and p.id = v.post_id)
To make my query more efficient, I have added a composite index on user_id and post_id:
ALTER TABLE `votes` ADD INDEX `user_id,post_id` (`user_id`, `post_id`)
My question: I also want to prevent duplicate votes from one user on one post, so I need a unique index on user_id and post_id too. Should I create another index, or is the unique index alone enough, in which case I should remove the previous index?
You do not need two indexes serving the same purpose. Only one of them would be used during a SELECT, and both would have to be maintained on every INSERT, UPDATE and DELETE, which is unnecessary overhead. Go with the unique index, since it serves both purposes. An index lookup is almost guaranteed when the uniquely indexed columns are used in a WHERE clause.
EDIT:
The name given to the index type does not matter. When you create an index, a B-tree structure is built by selecting a convenient root node and rearranging the column values. If all entries in the indexed columns are going to be unique anyway, a normal index would be the same size as a unique index and would give the same performance.
A primary key is also a unique index, except that it does not allow NULL values. NULL values are permitted in a unique index.
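To illustrate the point that a single unique index can both speed up the lookup and reject duplicate votes, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example runs without a server); the index name uq_votes_user_post and the helper try_vote are hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE votes ("
    " id INTEGER PRIMARY KEY,"
    " user_id INTEGER NOT NULL,"
    " post_id INTEGER NOT NULL)"
)
# One unique index: it enforces one vote per (user, post) and serves lookups.
conn.execute("CREATE UNIQUE INDEX uq_votes_user_post ON votes (user_id, post_id)")
conn.executemany(
    "INSERT INTO votes (user_id, post_id) VALUES (?, ?)",
    [(12345, 12), (12345, 13), (52344, 12)],
)


def try_vote(user_id, post_id):
    """Insert a vote; return False if the unique index rejects it."""
    try:
        conn.execute(
            "INSERT INTO votes (user_id, post_id) VALUES (?, ?)",
            (user_id, post_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


first_attempt = try_vote(52344, 13)   # new (user, post) pair
second_attempt = try_vote(52344, 13)  # the same pair again

# Ask the engine how it would run the lookup used inside the EXISTS clause.
plan_rows = conn.execute(
    "EXPLAIN QUERY PLAN SELECT 1 FROM votes WHERE user_id = ? AND post_id = ?",
    (12345, 12),
).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan_rows)
print(first_attempt, second_attempt)
print(plan_text)
```

In MySQL the equivalent check would be running EXPLAIN on the real query and looking at the key column.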
If you're trying to prevent multiple votes from the same user_id on the same post_id, why not use a UNIQUE constraint?
ALTER TABLE votes
ADD CONSTRAINT uc_votes UNIQUE (user_id,post_id)
With regard to whether you should remove your index, review the EXPLAIN output to see which execution path the query plan takes and how it performs. I suspect it will be better to keep them, but it will require testing.
In MySQL:
A PRIMARY KEY is a UNIQUE key.
A UNIQUE key is an INDEX.
"index" and "key" are synonyms.
I am using Server version: 5.5.28-log MySQL Community Server (GPL).
I have a big table, A, consisting of 279703655 records. I have to join this table with one of my changelog tables, B, and then insert the matching records into a new temporary table, C.
Table B has an index on the column type.
Table A consists of prod_id, his_id and other columns. Table A has an index on both columns prod_id and history_id.
When I run the following query
INSERT INTO C(prod,his_id,comm)
SELECT DISTINCT a.product_id,a.history_id,comm
FROM B as b INNER JOIN A as a ON a.his_id = b.his_id AND b.type="applications"
GROUP BY prod_id
ON DUPLICATE KEY UPDATE
`his_id` = VALUES(`his_id`);
it takes 7 to 8 minutes to insert the records.
Even a simple count on table A takes 15 minutes.
I have also tried a procedure that inserts the records in batches using LIMIT, but because the count query alone takes 15 minutes, it is even slower than before.
BEGIN
DECLARE n INT DEFAULT 0;
DECLARE i INT DEFAULT 0;
SELECT COUNT(*) FROM A INTO n;
SET i=5000000;
WHILE i<n DO
INSERT INTO C(product_id,history_id,comments)
SELECT a.product_id,a.history_id,a.comments FROM B as b
INNER JOIN (SELECT * FROM A LIMIT i,1) as a ON a.history_id=b.history_id;
SET i = i + 5000000;
END WHILE;
End
But the above code also takes 15 to 20 minutes to execute.
Please suggest how I can make it faster.
Below is the EXPLAIN result:
+----+-------------+-------+--------+---------------+---------+---------+-----------------+--------------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+---------+---------+-----------------+--------------+-------------+
| 1 | SIMPLE | a | ALL | (NULL) | (NULL) | (NULL) | (NULL) | 279703655 | |
| 1 | SIMPLE | b | eq_ref | PRIMARY | PRIMARY | 8 | DB.a.history_id | 1 | Using index |
+----+-------------+-------+--------+---------------+---------+---------+-----------------+--------------+-------------+
(from Comment)
CREATE TABLE B (
history_id bigint(20) unsigned NOT NULL AUTO_INCREMENT,
history_hash char(32) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
type enum('products','brands','partnames','mc_partnames','applications') NOT NULL,
stamp timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (history_id),
UNIQUE KEY history_hash (history_hash),
KEY type (type),
KEY stamp (stamp)
);
Let's first look at the tables.
What you call table B is really a history table. Its primary key is the history_id.
What you call table A is really a product table with one product per row and product_id its primary key. Each product also has a history_id. Thus you have created a 1:n relation. A product has one history row; one history row relates to multiple products.
You are selecting the product table rows that have an 'applications' type history entry. This should be written as:
select product_id, history_id, comm
from product
where history_id in
(
select history_id
from history
where type = 'applications'
);
(A join would work just as well, but isn't as clear. As there is only one history row per product, you can't get duplicates. Both GROUP BY and DISTINCT are completely superfluous in your query and should be removed so as not to give the DBMS unnecessary work to do. But as mentioned: better not to join at all. If you want rows from table A, select from table A. If you want to look up rows in table B, look them up in the WHERE clause, where all criteria belong.)
Now, we would have to know how many rows may be affected. If only 1% of all history rows are 'applications', then an index should be used. Preferably
create index idx1 on history (type, history_id);
… which finds rows by type and gets their history_id right away.
If, say, 20% of all history rows are 'applications', then reading the table sequentially might be more efficient.
Then, how many product rows may we get? Even with a single history row, we might get millions of related product rows. Or vice versa, with millions of history rows we might get no product row at all. Again, we can provide an index, which may or may not be used by the DBMS:
create index idx2 on product (history_id, product_id, comm);
This is about as fast as it gets: two indexes offered and a properly written query without an unnecessary join. There were times when MySQL had performance problems with IN, and people rewrote the clause with EXISTS then. I don't think this is still necessary.
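As a small illustration of the rewrite above, the following sketch builds toy product and history tables with the two suggested indexes and checks that the IN form returns the same rows as a join, without needing DISTINCT or GROUP BY. It runs on Python's built-in sqlite3 module as a stand-in for MySQL (an assumption); the sample rows are made up, and it says nothing about timing on a 279-million-row table.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE history (
    history_id INTEGER PRIMARY KEY,
    type       TEXT NOT NULL
);
CREATE TABLE product (
    product_id INTEGER PRIMARY KEY,
    history_id INTEGER NOT NULL,
    comm       TEXT
);
-- The two indexes suggested in the answer.
CREATE INDEX idx1 ON history (type, history_id);
CREATE INDEX idx2 ON product (history_id, product_id, comm);

-- Made-up sample data: history rows 1 and 3 are 'applications'.
INSERT INTO history VALUES (1, 'applications'), (2, 'brands'), (3, 'applications');
INSERT INTO product VALUES (10, 1, 'a'), (11, 1, 'b'), (12, 2, 'c'), (13, 3, 'd');
""")

# Look the history rows up in the WHERE clause, as recommended.
in_rows = conn.execute("""
    SELECT product_id, history_id, comm
    FROM product
    WHERE history_id IN (SELECT history_id FROM history WHERE type = 'applications')
    ORDER BY product_id
""").fetchall()

# The equivalent join, for comparison only.
join_rows = conn.execute("""
    SELECT p.product_id, p.history_id, p.comm
    FROM product p
    JOIN history h ON h.history_id = p.history_id
    WHERE h.type = 'applications'
    ORDER BY p.product_id
""").fetchall()

print(in_rows)
print(in_rows == join_rows)
```

Because history_id is the primary key of history, each product matches at most one history row, which is why neither form produces duplicates.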
As of MySQL 8.0.3, you can create histogram statistics for tables.
analyze table history update histogram on type;
analyze table product update histogram on history_id;
This is an important step to help the optimizer to find the optimal way to select the data.
Indexes needed (assuming it is history_id, not his_id):
B: INDEX(type, history_id) -- in this order. Note: "covering"
A: INDEX(history_id, product_id, comm)
What column or combination of columns provides the uniqueness constraint that IODKU needs?
Really-- Provide SHOW CREATE TABLE.
I am doing MySQL list partitioning. My table data is as below:
+----+--------------+--------+--------------+
| id | unique_token | city | student_name |
+----+--------------+--------+--------------+
| 1 | xyz | mumbai | sanjay |
| 2 | abc | mumbai | vijay |
| 3 | def | pune | ajay |
+----+--------------+--------+--------------+
In the above table, the unique_token column has a unique key, and I want to do list partitioning on the city column. As per the MySQL documentation, every partitioning column must be part of every unique key of the table, so in order to partition by city I have to create a new unique key as unique_key(unique_token,city).
The issue is that unique_token should be unique on its own. If I insert two rows such as ('xyz','banglore') and ('xyz','pune'), both will be accepted, and unique_token won't be unique at all.
How can I do list partitioning on this table without allowing duplicate data in the unique_token column?
There are limitations in MySQL's PARTITION implementation. In particular, no FOREIGN KEYs and no UNIQUE keys unless they happen to include the "partition key". These limitations exist because of the unacceptable cost of implementing them. This, in turn, is caused by each partition being essentially a separate 'table', with its own indexes. There is no "index" that spans the entire set of partitions. Such a 'global index' would make FKs and UNIQUE keys viable and efficient. This may come in version 5.8.
Meanwhile, let me change your question from "How to do LIST partitioning..." to "Why do LIST partitioning at all?". I know of no utility -- not performance, not convenience, not anything else, for PARTITION BY LIST. If you have a reason for wanting to do it, please explain. I would be happy to change my rather negative attitude toward partitioning. (I know of only 4 use cases for PARTITION BY RANGE, but that is another topic.)
It is better to use a composite primary key on the (unique_token, city) columns:
alter table table_name add constraint constraint_name primary key (unique_token, city);
I have a set of records that I want to add a unique index to, however some existing records conflict with that index, so I want to identify them and remove them in order that the constraint can be placed on the data.
Is there a way I can write a SELECT query based around any record that contradicts the unique index?
Example:
Table has columns
id | user | question_id | response | is_current
I want a unique index such that
user | question_id | response |is_current
is not duplicated.
Is it possible to SELECT all records where that set of values is not unique?
Show non-unique:
select user,question_id,response,is_current,count(*) as theCount
from tablename
group by user,question_id,response,is_current
having theCount>1
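The question also asks about removing the conflicting rows before adding the index. The sketch below shows one way that could go, using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption). The table name responses, the index name uq_responses, the sample rows, and the policy of keeping the row with the lowest id in each duplicate group are all illustrative assumptions, not something stated in the question.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE responses (
    id          INTEGER PRIMARY KEY,
    user        INTEGER,
    question_id INTEGER,
    response    TEXT,
    is_current  INTEGER
);
-- Made-up sample data containing two duplicate groups.
INSERT INTO responses (user, question_id, response, is_current) VALUES
    (1, 10, 'yes', 1),
    (1, 10, 'yes', 1),
    (1, 11, 'no',  1),
    (2, 10, 'yes', 1),
    (2, 10, 'yes', 1),
    (2, 10, 'yes', 1);
""")

# Step 1: list every combination that occurs more than once.
duplicates = conn.execute("""
    SELECT user, question_id, response, is_current, COUNT(*) AS theCount
    FROM responses
    GROUP BY user, question_id, response, is_current
    HAVING COUNT(*) > 1
    ORDER BY user
""").fetchall()

# Step 2 (assumed policy): keep the lowest id per group, delete the rest.
conn.execute("""
    DELETE FROM responses
    WHERE id NOT IN (
        SELECT MIN(id)
        FROM responses
        GROUP BY user, question_id, response, is_current
    )
""")

# Step 3: the unique index can now be created without conflicts.
conn.execute(
    "CREATE UNIQUE INDEX uq_responses "
    "ON responses (user, question_id, response, is_current)"
)
remaining = conn.execute("SELECT COUNT(*) FROM responses").fetchone()[0]
print(duplicates)
print(remaining)
```

Note that MySQL does not accept a DELETE whose subquery reads directly from the table being deleted from, so there the inner SELECT would need to be wrapped in a derived table or replaced with a self-join.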
Please note that I have asked this question on dba.stackexchange.com, but I thought I'd post it here too:
In MySQL, I have two basic tables - Posts and Followers:
CREATE TABLE Posts (
id int(11) NOT NULL AUTO_INCREMENT,
posted int(11) NOT NULL,
body varchar(512) NOT NULL,
authorId int(11) NOT NULL,
PRIMARY KEY (id),
KEY posted (posted),
KEY authorId (authorId,posted)
) ENGINE=InnoDB;
CREATE TABLE Followers (
userId int(11) NOT NULL,
followerId int(11) NOT NULL,
PRIMARY KEY (userId,followerId),
KEY followerId (followerId)
) ENGINE=InnoDB;
I have the following query, which seems to be optimized enough:
SELECT p.*
FROM Posts p
WHERE p.authorId IN (SELECT f.userId
FROM Followers f
WHERE f.followerId = 9
ORDER BY authorId)
ORDER BY posted
LIMIT 0, 20
EXPLAIN output:
+------+--------------------+-------+-----------------+--------------------+---------+---------+------------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+--------------------+-------+-----------------+--------------------+---------+---------+------------+------+--------------------------+
| 1 | PRIMARY | p | index | NULL | posted | 4 | NULL | 20 | Using where |
| 2 | DEPENDENT SUBQUERY | f | unique_subquery | PRIMARY,followerId | PRIMARY | 8 | func,const | 1 | Using index; Using where |
+------+--------------------+-------+-----------------+--------------------+---------+---------+------------+------+--------------------------+
When followerId is a valid id (meaning, it actually exists in both tables), the query execution is almost immediate. However, when the id is not present in the tables, the query only returns results (empty set) after a 7 second delay.
Why is this happening? Is there some way to speed up this query for cases where there are no matches (without having to do a check ahead of time)?
Is there some way to speed up this query ...???
Yes. You should do two things.
First, you should use EXISTS instead of IN (cross reference: SQL Server IN vs. EXISTS Performance). It will speed up the cases where there is a match, which will come in handy as your data set grows. It may be fast enough now, but that doesn't mean you shouldn't follow best practices, and in this case EXISTS is a better practice than IN.
Second, you should modify the keys on your second table just a little bit. You were off to a good start using the compound key on (userId,followerId), but in terms of optimizing this particular query, you need to keep in mind the "leftmost prefix" rule of MySQL indices, e.g.:
If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to find rows. http://dev.mysql.com/doc/refman/5.6/en/multiple-column-indexes.html
What your query execution plan from EXPLAIN is telling you is that MySQL thinks it makes more sense to join Followers to Posts (using the primary key on Followers) and filter the results for a given followerId off of that index. Think of it like saying "Show me all the possible matches, then reduce that down to just the ones that match followerId = {}".
If you replace your followerId key with a compound key (followerId,userId), you should be able to quickly zoom in to just the user ids associated with a given followerID and do the existence check against those.
I wish I knew how to explain this better... it's kind of a tough concept to grasp until you have an "Aha!" moment and it clicks. But if you look into the leftmost prefix rules on indices, and also change the key on followerId to be a key on (followerId,userId), I think it'll speed it up quite a bit. And if you use EXISTS instead of IN, that'll help you maintain that speed even as your data set grows.
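The two suggestions above, rewriting IN as EXISTS and keying Followers on (followerId, userId), can be sketched as follows. This uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption), so it only demonstrates that the rewritten query returns the expected rows, including an empty result for an unknown follower; it does not reproduce the 7-second delay or MySQL's optimizer choices. The index names, the sample rows and the helper feed are hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Posts (
    id       INTEGER PRIMARY KEY,
    posted   INTEGER NOT NULL,
    body     TEXT NOT NULL,
    authorId INTEGER NOT NULL
);
CREATE INDEX posts_author_posted ON Posts (authorId, posted);

CREATE TABLE Followers (
    userId     INTEGER NOT NULL,
    followerId INTEGER NOT NULL,
    PRIMARY KEY (userId, followerId)
);
-- Compound key with followerId as the leftmost column, as suggested.
CREATE INDEX followers_follower_user ON Followers (followerId, userId);

-- Made-up sample data: follower 9 follows users 1 and 2.
INSERT INTO Posts (posted, body, authorId) VALUES
    (100, 'a', 1), (200, 'b', 2), (300, 'c', 3), (400, 'd', 1);
INSERT INTO Followers VALUES (1, 9), (2, 9), (3, 7);
""")

EXISTS_QUERY = """
    SELECT p.id
    FROM Posts p
    WHERE EXISTS (
        SELECT 1
        FROM Followers f
        WHERE f.followerId = ? AND f.userId = p.authorId
    )
    ORDER BY p.posted
    LIMIT 20
"""


def feed(follower_id):
    """Return ids of the first 20 posts by users that follower_id follows."""
    return [row[0] for row in conn.execute(EXISTS_QUERY, (follower_id,))]


known_follower_feed = feed(9)
unknown_follower_feed = feed(12345)
print(known_follower_feed, unknown_follower_feed)
```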
Try this one:
SELECT p.*
FROM Posts p
inner join Followers f On f.userId = p.authorId
WHERE f.followerId = 9
ORDER BY posted
LIMIT 0, 20
Choice 1:
comments {commentid,replyto,comment}
//replyto will be null on many posts
Choice 2:
comments {commentid,comment}
replies {replyid, replyto, reply}
It looks like a matter of choice rather than linear benefit analysis at the moment.
The first option looks like a simple one, but the problem is that you're building a tree structure in SQL, and SQL does not support hierarchical data.
Not recommended - ever
CREATE TABLE comment (
  id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  reply_to INT UNSIGNED,
  comment TEXT,
  FOREIGN KEY FK_comment_reply_to (reply_to) REFERENCES comment (id)
    ON UPDATE CASCADE ON DELETE CASCADE
);
Recommended - if you want a tree 2 levels deep
If you build it using 2 tables
CREATE TABLE main_post (
  id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  body TEXT
);
CREATE TABLE reply (
  id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  reply_to INT UNSIGNED,
  body TEXT,
  FOREIGN KEY FK_reply_reply_to (reply_to) REFERENCES main_post (id)
    ON UPDATE CASCADE ON DELETE CASCADE
);
Then you are building a much simpler structure that can be easily queried in SQL because the tree is only 1 level deep.
For this reason I'd recommend choice number 2.
Alternatives for deeper trees
If you want a hierarchical structure, I'd look at nested sets instead, see:
http://www.pure-performance.com/2009/03/managing-hierarchical-data-in-sql/
In fact, this is not only a matter of choice but a deliberate decision. Relational databases are not good at solving problems of a hierarchical nature. There have been tons of discussions, articles, and even books about that, so let's narrow the problem to your case.
The second choice would work fine ONLY if you allow replies to comments but not replies to replies, so that the tree has at most 2 levels. That might be OK, but in that case a better solution would be to keep everything in the COMMENTS table and add two columns: THREAD_ID (all comments with the same THREAD_ID belong to the same thread) and SEQ_NUM (or simply a DATE that tells us which comment came first). A similar way of organising comments is implemented here on SO.
The first choice is quite simple and generic, but it implements recursion with all its cons. Let's stop a bit and think: note that we are actually NOT building a tree, but a 'forest'. We will have many comment threads, and every single thread will be a separate tree, so each tree holds a relatively small amount of data to organise. In that case I would add a THREAD_ID column to the COMMENTS table and use only that table (it would also be good to create a composite index on the COMMENTS table containing the THREAD_ID and COMMENTID columns, in exactly that order).
So, based on the above, I would choose "choice 1".
The next decision is where to do the processing and construct the comment tree. I would just get all the comments from the table and organise them on the controller (MVC) side, i.e. in Java or C++. Traversing the list of comments and building the tree in main memory (using objects and pointers or hash tables) would be easy. It is also a good option because of the small number of nodes (comments and replies within one thread).
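Building the tree in memory from a flat list of rows might look like the sketch below. It is written in Python rather than Java or C++ purely for brevity; the function name build_forest, the row layout (comment_id, reply_to, text) and the sample rows are assumptions for illustration.

```python
from collections import defaultdict


def build_forest(rows):
    """Turn flat (comment_id, reply_to, text) rows into nested dicts.

    reply_to is None for a top-level comment. Replies are ordered by
    comment_id, which is assumed to reflect posting order.
    """
    # Hash table: parent id -> list of (comment_id, text) children.
    children = defaultdict(list)
    for comment_id, reply_to, text in sorted(rows, key=lambda r: r[0]):
        children[reply_to].append((comment_id, text))

    def attach(parent_id):
        # Recursion depth equals thread depth, which is small in practice.
        return [
            {"id": cid, "text": text, "replies": attach(cid)}
            for cid, text in children.get(parent_id, [])
        ]

    return attach(None)


# Made-up rows for one thread, as they might come back from a single SELECT.
rows = [
    (1, None, "first comment"),
    (2, 1, "reply to 1"),
    (3, 1, "another reply to 1"),
    (4, 2, "reply to a reply"),
    (5, None, "second top-level comment"),
]
forest = build_forest(rows)
print([node["id"] for node in forest])
```

Each row is visited once while filling the hash table and once while attaching it to its parent, so the cost grows linearly with the number of comments in the thread.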
I would say it depends very much on what you're trying to achieve. From what I can understand, if you want a tree at most 2 levels deep you should go with choice 2; if you want a deeper tree, go with choice 1 with the following modification:
Choice 1: comments {commentid,toplevelcommentid | thread | (whatever parent this comment and possibly other comments is/are linked to so you can easily recreate the structure afterwards),replyto,comment}
When displaying results, select everything that has commentid or toplevelcommentid equal to a given value and order by commentid, so you can easily recreate the structure with a single SELECT query.
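A minimal sketch of that single-select approach follows, using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption). The column types, the index name comments_thread, the sample rows and the helper load_thread are illustrative; a top-level comment is assumed to have NULL in both toplevelcommentid and replyto.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE comments (
    commentid         INTEGER PRIMARY KEY,
    toplevelcommentid INTEGER,   -- NULL for the top-level comment itself
    replyto           INTEGER,   -- direct parent, NULL for top-level
    comment           TEXT NOT NULL
);
CREATE INDEX comments_thread ON comments (toplevelcommentid, commentid);

-- Made-up sample data: two threads rooted at comments 1 and 3.
INSERT INTO comments VALUES
    (1, NULL, NULL, 'root A'),
    (2, 1,    1,    'reply to A'),
    (3, NULL, NULL, 'root B'),
    (4, 1,    2,    'nested reply in A'),
    (5, 3,    3,    'reply to B');
""")


def load_thread(top_id):
    """Fetch a whole thread, root included, with one query."""
    return conn.execute(
        "SELECT commentid, replyto, comment FROM comments "
        "WHERE commentid = ? OR toplevelcommentid = ? "
        "ORDER BY commentid",
        (top_id, top_id),
    ).fetchall()


thread_a = load_thread(1)
thread_b = load_thread(3)
print(thread_a)
```

The replyto values in the result are enough to rebuild the nesting on the application side.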
1) Queries against the TEXT table were always 3 times slower than those against the VARCHAR table (averages: 0.10 seconds for the VARCHAR table, 0.29 seconds for the TEXT table). The difference is 100% repeatable.
CREATE TABLE varcharTable (a varchar(255) NOT NULL, PRIMARY KEY (a)) ENGINE=MyISAM;
CREATE TABLE textTable (a text NOT NULL, PRIMARY KEY (a(255))) ENGINE=MyISAM;
mysql> explain SELECT SQL_NO_CACHE count(*) from varcharTable where a LIKE "n%";
+----+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------+
| 1 | SIMPLE | varcharTable | range | PRIMARY | PRIMARY | 257 | NULL | 5882 | Using where; Using index |
+----+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------+
1 row in set (0.00 sec)
mysql> explain SELECT SQL_NO_CACHE count(*) from T where a LIKE "n%";
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | T | range | PRIMARY | PRIMARY | 257 | NULL | 5882 | Using where |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
1 row in set (0.00 sec)
The Extra column shows "Using index" (the query is answered from the index alone) for the VARCHAR table, but not for the TEXT table.
2) Searching is not required on the comments table, so it does not need to be queried that way, and since comments are long, TEXT is the preferred type.
And since it is TEXT, you cannot search on it. So put the comments (non-searchable and affecting performance) and the replies in separate tables, so that the replies table performs well and the comments table is kept just for storage, with no searches performed on it.
Conclusion: put the comments in a separate table.