My MySQL database has a table with this structure:
CREATE TABLE `Table` (
`value1` VARCHAR(50) NOT NULL DEFAULT '',
`value2` VARCHAR(50) NOT NULL DEFAULT '',
`value3` TEXT NULL,
`value4` VARCHAR(50) NULL DEFAULT NULL,
`value5` VARCHAR(50) NULL DEFAULT NULL,
PRIMARY KEY (`value1`, `value2`)
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB
ROW_FORMAT=DEFAULT
The first and second columns are VARCHAR(50), and together they form the primary key. The third column is TEXT. The table contains about 1,000,000 records, and when I search by the first column it takes minutes to find a specific item.
How can I index this table to speed up my search, and what index type should I use?
A primary key of 50+50 characters? What does it contain? Are you sure that the table is in 3rd normal form? It sounds as if the key itself might contain some information, which is an alarm bell to me.
If you can change your primary key to something else much shorter and more manageable, there are a few things you can try (see the sketch after this list):
externalise value3 (the TEXT column) to a different table, matched by the new primary key
analyse your table to determine a more optimised length than 50 chars, with SELECT * FROM `Table` PROCEDURE ANALYSE()
change the size of the fields accordingly, and if you can afford the extra space, change VARCHAR to CHAR
add an index on value1, which probably shouldn't be part of the primary key
Always check the performance of the changes, to see whether they were worth it.
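A rough sketch of those steps, assuming the question's table name `Table` and a new, hypothetical surrogate key id; note that PROCEDURE ANALYSE is only available up to MySQL 5.7:
-- let MySQL suggest tighter column definitions (review the output before changing anything)
SELECT value1, value2 FROM `Table` PROCEDURE ANALYSE();

-- move the TEXT column into its own table, matched by the new, shorter primary key
CREATE TABLE Table_text (
  id     INT UNSIGNED NOT NULL,   -- hypothetical new surrogate key of `Table`
  value3 TEXT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB;

-- index the column you actually search on
ALTER TABLE `Table` ADD INDEX idx_value1 (value1);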
What is the actual query you're executing? The index will only help if you're searching for a prefix (or exact) match. For example:
SELECT * FROM Table WHERE value1 LIKE 'Foo%'
will find anything that starts with Foo, and should use the index and be relatively quick. On the other hand:
SELECT * FROM Table WHERE value1 LIKE '%Foo%'
will not use the index and you'll be forced to do a full table scan. If you need to do that, you should use a full-text index and query: http://dev.mysql.com/doc/refman/5.5/en/fulltext-search.html
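If you do need the full-text route, here is a hedged sketch (the index name ft_value1 is illustrative; FULLTEXT on InnoDB needs MySQL 5.6+, on 5.5 it is MyISAM-only):
ALTER TABLE `Table` ADD FULLTEXT INDEX ft_value1 (value1);

-- word-based search that can use the full-text index, unlike LIKE '%Foo%'
SELECT * FROM `Table`
WHERE MATCH(value1) AGAINST('Foo' IN NATURAL LANGUAGE MODE);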
The only thing I can see that might possibly improve things would be to add a unique index to the first column. This obviously does not work if the first column is not actually unique, and it is questionable whether it would be at all more efficient than the already existing primary key. The way I thought this might possibly help is that if the unique index on the first column is smaller than the primary key, index scans would be quicker.
Also, you might be able to create an index on just part of your first column, maybe only the first 5 or 10 characters; that could be more efficient.
Also, after deleting and/or inserting lots of values, remember to run ANALYZE TABLE on the affected table, or even OPTIMIZE TABLE. That way, the stats for the MySQL query optimizer are updated.
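For example (the prefix length of 10 is just a guess; derive it from your data's selectivity):
-- index only the first 10 characters of value1
CREATE INDEX idx_value1_prefix ON `Table` (value1(10));

-- refresh the optimizer statistics after bulk changes
ANALYZE TABLE `Table`;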
It is always a bad idea to use such long strings as indexes, but if you really need to search that way, consider how you are filtering the query. MySQL can't use an index for a leading-wildcard LIKE, so conditions like WHERE value1 LIKE "%mytext%" will never use indexes. Instead, try searching for a shorter string so MySQL can turn the operation into an equality one, for example value1 = "XXXXX" where "XXXXX" is the leading part of the string. To determine the best length for the comparison string, analyze the selectivity of your value1 field.
Consider too that a multi-column index like (value1, value2) won't use the second column unless the first is matched exactly. That doesn't make it a bad index; it's just so you know and understand how it works.
If that doesn't work, another solution could be to store value1 and value2 in a new table (table2, for example) with an auto-increment id field, then add a foreign key from Table to table2 using that id (e.g. my_long_id), and finally create an index on table2 such as my_idx (value1, value2). The search would be something like:
SELECT t1.*
FROM
table2 as t2
INNER JOIN Table as t1 ON (t1.my_long_id = t2.id)
WHERE
t2.value1 = "your_string"
Ensure that table2 has an index like (value1, value2) and that Table has a primary index on (my_long_id).
As a final recommendation, add an id field with AUTO_INCREMENT as the PRIMARY KEY and keep (value1, value2) as a unique/regular key. This helps a lot because a B-tree stores its entries in sorted order, so using a string of ~100 chars wastes I/O on that sorting: InnoDB has to find the best position for the new entry at insert time, and will probably need to move some entries to other pages to make room. With an auto-increment value this is easier and cheaper, because it never needs to do such moves.
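A sketch of that restructuring, assuming nothing else references the old composite key (column and key names are illustrative):
ALTER TABLE `Table`
  DROP PRIMARY KEY,
  ADD COLUMN id INT UNSIGNED NOT NULL AUTO_INCREMENT FIRST,
  ADD PRIMARY KEY (id),
  ADD UNIQUE KEY uq_value1_value2 (value1, value2);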
But why are you searching for a unique item on a non-unique column? Why can't you make queries based on your primary key? If for some reason you cannot, then I would index value1, the column you are searching on.
CREATE INDEX `index_name`
ON `table` (`column_name`);
This is my table structure:
CREATE TABLE `channel_play_times_bar_chart` (
`playing_date` datetime NOT NULL,
`channel_report_tag` varchar(50) NOT NULL,
`country_code` varchar(50) NOT NULL,
`device_report_tag` int(11) NOT NULL,
`greater_than_30_minutes` decimal(10,0) NOT NULL,
`15_to_30_minutes` decimal(10,0) NOT NULL,
`0-15_minutes` decimal(10,0) NOT NULL,
PRIMARY KEY (`country_code`,`device_report_tag`,`channel_report_tag`,`playing_date`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
When I run the following query:
EXPLAIN EXTENDED
SELECT
channel_report_tag,
SUM(`greater_than_30_minutes`) AS '>30 minutes',
SUM(`15_to_30_minutes`) AS '15-30 Minutes',
SUM(`0-15_minutes`) AS '0-15 Minutes'
FROM
channel_play_times_bar_chart USE INDEX (ABCDE)
WHERE country_code = 'US'
AND device_report_tag = 14
AND channel_report_tag = 'DUNYA NEWS'
AND playing_date BETWEEN '2016-09-01'
AND '2016-09-13'
GROUP BY channel_report_tag
ORDER BY SUM(`greater_than_30_minutes`) DESC
LIMIT 10
This is the output I get:
The index was defined as:
CREATE INDEX ABCDE
ON channel_play_times_bar_chart (
`country_code`,
`device_report_tag`,
`channel_report_tag`,
`playing_date`,
`greater_than_30_minutes`
)
I am a bit confused here; the key column shows ABCDE being used as the index, yet the ref column shows NULL. What does this mean? Is the index actually being used? If not, what did I do wrong?
It is using the key you are showing in the create index, that is, ABCDE.
It would be nice if you did a
show create table channel_play_times_bar_chart
and just showed it all at once. That key might not be of much use to you as it replicates most of what your rather wide Primary Key already gives you.
Once the query has used the key up through the 3rd segment of the composite key, it continues with a WHERE range on playing_date in that composite and finds 8 rows.
Note EXPLAIN is an estimate.
Further, I would reconsider the strategy behind your PRIMARY KEY (PK), especially considering that you more or less duplicated it when creating ABCDE. That means you are maintaining two indexes with little if anything gained from the second one (you added just one column to the secondary index ABCDE).
The PK is rather WIDE (118 bytes, I believe). It dictates the physical ordering, and that idea can easily be a bad one if used throughout the way you architect things. Changes made to data via UPDATE that impact the columns in the PK force a reshuffle of the physical ordering of the table. That fact is a good indication of why id INT AUTO_INCREMENT PRIMARY KEY is often used as a best practice: it never endures a reshuffle and is THIN (4 bytes).
The width of keys and their strategy with referencing (other) tables (in Foreign Key Constraints) impact key sizes and performance for lookups. Wide keys can measurably slow down that process.
This is not to suggest that you shouldn't have a key on those columns like your secondary index ABCDE. But in general that is not a good idea for the PK.
Note that it could be argued that ABCDE never gives you any benefit over your PK: the range condition on playing_date stops effective use of the index at that point, so the extra trailing column is never reached. Just a thought.
A nice read and rather brief is the article Using EXPLAIN to Write Better MySQL Queries.
Your query does use the ABCDE index. As the MySQL documentation on EXPLAIN Output Format explains:
key (JSON name: key)
The key column indicates the key (index) that MySQL actually decided
to use. If MySQL decides to use one of the possible_keys indexes to
look up rows, that index is listed as the key value.
The ref field of the explain output is primarily used when joins are present to show the fields / constants / expressions the index was compared with.
I am curious about this: does the order of the columns in a composite primary key make any difference? For example, is there any difference between the two tables' primary keys below, or does the key order make no difference to the table?
CREATE TABLE `Q3` (
`user_id` VARCHAR(20) NOT NULL,
`retweet_id` VARCHAR(20) NOT NULL,
PRIMARY KEY (`user_id`,`retweet_id`)
)
vs
CREATE TABLE `Q3` (
`user_id` VARCHAR(20) NOT NULL,
`retweet_id` VARCHAR(20) NOT NULL,
PRIMARY KEY (`retweet_id`,`user_id`)
)
It would make a difference in the index structure.
In a composite index, the index entry consists of several values that go one after another, and the order determines which queries can be optimized using this particular index.
For example:
For the index created as
PRIMARY KEY (`user_id`,`retweet_id`)
A query like WHERE user_id = 42 can be optimized (not guaranteed, but technically possible), whereas a query like WHERE retweet_id = 4242 cannot.
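You can check this with EXPLAIN; a sketch (the exact access type depends on your data and MySQL version):
-- filters on the leftmost column of PRIMARY KEY (user_id, retweet_id): can use the index
EXPLAIN SELECT * FROM Q3 WHERE user_id = '42';

-- filters only on the second column: typically ends up as a full scan
EXPLAIN SELECT * FROM Q3 WHERE retweet_id = '4242';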
PS: it's a good idea to always have an artificial primary key, like a sequence (or an auto-increment column in the case of MySQL), instead of using natural primary keys. This is because the primary key is a clustered key, which means it defines how rows are physically stored in pages on disk, so it's good for a PK to be monotonically increasing (or decreasing, it doesn't matter which).
The order does affect how the index is used in queries. When you use multiple columns, each column is a sub-tree of the preceding column.
In your first case (user_id, retweet_id) - if you searched the index for user_id 1, you then have all the retweet_ids under that.
Consequently, if you wish to search for only retweet_id=7 (for all users), the index cannot be used, because you would first need to step through each user's entries in the index.
So if you wish to query for user_id, or retweet_id individually (without the other), put that column first. If you need both you could consider adding a secondary index.
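For example, with PRIMARY KEY (user_id, retweet_id), a secondary index covers the reverse lookup (the index name is illustrative; InnoDB implicitly appends the PK columns to every secondary index, so this effectively indexes (retweet_id, user_id)):
ALTER TABLE Q3 ADD INDEX idx_retweet_id (retweet_id);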
There are also limitations for range scans: a range condition can only be used effectively on the last column of the index that the query filters on. You can read more about all of this here:
http://dev.mysql.com/doc/refman/5.6/en/multiple-column-indexes.html
Additionally if using InnoDB, the tables are stored in order of the PRIMARY KEY. This might matter for performance depending on how you query your data.
I am implementing a friends list for users in my database, where the list will store the friends accountID.
I already have a similar structure in my database for achievements, where a separate table holds pairs of accountID and achievementID, but my concern with this approach is that it is inefficient: if there are 1 million users with 100 achievements each, there are 100 million entries in this table. Then trying to get every achievement for a user with a certain accountID would be a linear scan of the table (I think).
I am considering having a comma-separated string of accountIDs for my friends-list table. I realize how annoying it will be to deal with the data as a string, but at least it would be guaranteed to have log(n) search time, with the user's accountID as the primary key and the list string as the second column.
Am I wrong about the search time for these two different structures?
MySQL can make effective use of appropriate indexes, for queries designed to use those indexes, avoiding a "scan" operation on the table.
If you are ALWAYS dealing with the complete set of achievements for a user, retrieving the entire set, and storing the entire set, then a comma separated list in a single column can be a workable approach.
HOWEVER... that design breaks down when you want to deal with individual achievements. For example, if you want to retrieve a list of users that have a particular achievement. Now, you're doing expensive full scans of all achievements for all users, doing "string searches", dependent on properly formatted strings, and MySQL is unable to use an index scan to efficiently retrieve that set.
So, the rule of thumb, if you NEVER need to individually access an achievement, and NEVER need to remove an achievement from user in the database, and NEVER need to add an individual achievement for a user, and you will ONLY EVER pull the achievements as an entire set, and only store them as an entire set, in and out of the database, the comma separated list is workable.
I hesitate to recommend that approach, because it never turns out that way. Inevitably, you'll want a query to get a list of users that have a particular achievement.
With the comma separated list column, you're into some ugly SQL:
SELECT a.user_id
FROM user_achievement_list a
WHERE CONCAT(',',a.list,',') LIKE '%,123,%'
ugly in the sense that MySQL can't use an index range scan to satisfy the predicate; MySQL has to look at EVERY SINGLE list of achievements, and then do a string scan on each and every one of them, from the beginning to the end, to find out if a row matches or not.
And it's downright excruciating if you want to use the individual values in that list to do a join operation, to "lookup" a row in another table. That SQL just gets horrendously ugly.
And declarative enforcement of data integrity is impossible; you can't define any foreign key constraints that restrict the values that are added to the list, or remove all occurrences of a particular achievement_id from every list it occurs in.
Basically, you're "giving up" the advantages of a relational data store, so don't expect the database to be able to do any work with that type of column. As far as the database is concerned, it's just a blob of data; it might as well be a .jpg image stored in that column. MySQL isn't going to help with retrieving or maintaining the contents of that list.
On the other hand, if you go with a design that stores the individual rows, each achievement for each user as a separate row, and you have an appropriate index available, the database can be MUCH more efficient at returning the list, and the SQL is more straightforward:
SELECT a.user_id
FROM user_achievements a
WHERE a.achievement_id = 123
A covering index would be appropriate for that query:
... ON user_achievements (achievement_id, user_id)
An index with user_id as the leading column would be suitable for other queries:
... ON user_achievements (user_id, achievement_id)
FOLLOWUP
Use EXPLAIN SELECT ... to see the access plan that MySQL generates.
For your example, retrieving all achievements for a given user, MySQL can do a range scan on the index to quickly locate the set of rows for the one user. MySQL doesn't need to look at every page in the index, the index is structured as a tree (at least, in the case of B-Tree indexes) so it can basically eliminate a whole boatload of pages it "knows" that the rows you are looking for can't be. And with the achievement_id also in the index, MySQL can return the resultset right from the index, without a need to visit the pages in the underlying table. (For the InnoDB engine, the PRIMARY KEY is the cluster key for the table, so the table itself is effectively an index.)
With a two column InnoDB table (user_id, achievement_id), with those two columns as the composite PRIMARY KEY, you would only need to add one secondary index, on (achievement_id, user_id).
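A sketch of that layout (table and index names are illustrative):
CREATE TABLE user_achievements
( user_id        BIGINT UNSIGNED NOT NULL
, achievement_id BIGINT UNSIGNED NOT NULL
, PRIMARY KEY (user_id, achievement_id)               -- clustered: "all achievements for one user"
, KEY user_achievements_IX1 (achievement_id, user_id) -- covers "all users with a given achievement"
) Engine=InnoDB;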
FOLLOWUP
Q: By secondary index, do you mean a 3rd column that contains the key for the composite (userID, achievementID) table? My create table query looks like this:
CREATE TABLE `UserFriends`
(`AccountID` BIGINT(20) UNSIGNED NOT NULL
,`FriendAccountID` BIGINT(20) UNSIGNED NOT NULL
,`Key` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT
, PRIMARY KEY (`Key`)
, UNIQUE KEY `AccountID` (`AccountID`, `FriendAccountID`)
);
A: No, I don't mean the addition of a third column. If the only two columns in the table are foreign keys to another table (it looks like they both refer to the same table), both columns are NOT NULL, there is a UNIQUE constraint on the combination of the columns, and there are no other attributes on the table, then I would consider not using a surrogate as the primary key at all. I would make the UNIQUE KEY the PRIMARY KEY.
Personally, I would be using InnoDB, with the innodb_file_per_table option enabled. And my table definition would look something like this:
CREATE TABLE user_friend
( account_id BIGINT(20) UNSIGNED NOT NULL COMMENT 'PK, FK ref account.id'
, friend_account_id BIGINT(20) UNSIGNED NOT NULL COMMENT 'PK, FK ref account.id'
, PRIMARY KEY (account_id, friend_account_id)
, UNIQUE KEY user_friend_UX1 (friend_account_id, account_id)
, CONSTRAINT FK_user_friend_user FOREIGN KEY (account_id)
REFERENCES account (id) ON UPDATE CASCADE ON DELETE CASCADE
, CONSTRAINT FK_user_friend_friend FOREIGN KEY (friend_account_id)
REFERENCES account (id) ON UPDATE CASCADE ON DELETE CASCADE
) Engine=InnoDB;
Let's say I have a MySQL table which contains three columns, id, a and b, and the column named id is an AUTO_INCREMENT field. If I pass a query like the following to MySQL, it works fine:
REPLACE INTO `table` (`id`, `a`, `b`) VALUES (1, 'A', 'B')
But if I skip the field id, it no longer works, which is expected.
I want to know if there is a way to omit some fields from the REPLACE query, so the above query could be something like this:
REPLACE INTO `table` (`a`, `b`) VALUES ('A', 'B')
Why do I need such a thing?
Sometimes I need to check the database with a SELECT query to see whether a row exists. If it exists, I need to UPDATE the existing row; otherwise I need to INSERT a new row. I'm wondering if I could achieve a similar (but not identical) result with a single REPLACE query.
Why couldn't it be the same result? Simply because REPLACE will DELETE the existing row and INSERT a new row, which loses the current primary key and increases the auto-increment values. In contrast, in an UPDATE query, the primary key and the AUTO_INCREMENT fields are untouched.
MySQL REPLACE.
That's not how you're supposed to use REPLACE.
Use REPLACE only when you know the primary key (or unique key) values.
Manual:
Note that unless the table has a PRIMARY KEY or UNIQUE index, using a
REPLACE statement makes no sense. It becomes equivalent to INSERT,
because there is no index to be used to determine whether a new row
duplicates another.
What if you have multiple rows that match the fields?
Consider adding a key that you can match on, and use either INSERT IGNORE or INSERT ... ON DUPLICATE KEY UPDATE. Both work slightly differently from REPLACE.
INSERT IGNORE is very fast but can have some invisible side effects.
INSERT... ON DUPLICATE KEY UPDATE
Which has fewer side effects but is probably much slower, especially for MyISAM, heavy write loads, or heavily indexed tables.
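A hedged sketch of that second option, assuming the table has a UNIQUE key on `a` so the duplicate can be detected without knowing `id`:
INSERT INTO `table` (`a`, `b`) VALUES ('A', 'B')
ON DUPLICATE KEY UPDATE `b` = VALUES(`b`);
-- the existing row is updated in place, so its `id` is preserved
-- (the AUTO_INCREMENT counter may still advance, but nothing is deleted and re-inserted)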
For more details on the side effects, see:
https://stackoverflow.com/a/548570/1301627
Using INSERT IGNORE seems to work well for very fast lookup MyISAM tables with few columns (maybe just a VARCHAR field).
For example,
create table cities (
city_id int not null auto_increment,
city varchar(200) not null,
primary key (city_id),
unique key city (city))
engine=myisam default charset=utf8;
insert ignore into cities (city) values ("Los Angeles");
In this case, repeatedly re-inserting "Los Angeles" will not result in any actual changes to the table at all and will prevent a new auto_increment ID from being generated, which can help prevent ID field exhaustion (using up all the available auto_increment range on heavily churned tables).
For even more speed, compute a small hash (spooky hash, for example) before inserting, put it in a separate column carrying the unique key, and then the VARCHAR won't need to be indexed at all.
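A hedged sketch of that idea, assuming the 64-bit hash (spooky hash, xxHash, ...) is computed in the application before the insert; the column name and the literal hash value are illustrative, and keep in mind that any hash can collide:
create table cities (
  city_id int not null auto_increment,
  city varchar(200) not null,
  city_hash bigint unsigned not null,   -- 64-bit hash of `city`, computed by the application
  primary key (city_id),
  unique key city_hash (city_hash))     -- the unique key sits on the small fixed-width hash, not the varchar
engine=myisam default charset=utf8;

insert ignore into cities (city, city_hash) values ("Los Angeles", 1234567890123456789);  -- placeholder hash value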
Here is my current table:
CREATE TABLE `linkler` (
`link` varchar(256) NOT NULL,
UNIQUE KEY `link` (`link`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
I will only use these two queries on the table: SELECT EXISTS (SELECT 1 FROM linkler WHERE link = ?) and INSERT INTO linkler (link) VALUES (?).
I don't know much about indexing databases. Since I won't be adding the same thing twice, I thought marking it unique would be a good idea. Is there anything I can do to increase performance? For example, can I do something so that the rows are always kept sorted, allowing MySQL to do a binary search or something similar?
Adding a unique index is perfect. Also, since you have a unique index, you don't need to check for existence before you do an insert. You can simply use INSERT IGNORE to insert the row if it doesn't exist (or ignore the error if it does):
INSERT IGNORE INTO linkler (link) VALUES (?)
Whether that will be faster than doing a SELECT/INSERT combination depends on how often you expect to have duplicates.
ETA: If that is the only column in this table, you might want to make it a PRIMARY KEY instead of just a UNIQUE KEY although I don't think it really matters much other than for clarity.
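A sketch of that variant:
CREATE TABLE `linkler` (
  `link` varchar(256) NOT NULL,
  PRIMARY KEY (`link`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
-- with InnoDB the primary key is the clustered index, so rows are stored in `link` order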