MySQL - How to optimize table for speed

First of all, I am still very new to PHP/MySQL, so excuse my question if it is too simple for you :)
Here is my issue: I am currently working on storing a lot of data in a MySQL database. It's basically a directory-like archive for my son's school.
The structure is basically:
id          INT(11)
keyword     VARCHAR(255)
title       VARCHAR(255)
description TEXT
url         VARCHAR(255)
rank        INT(2)
hash        VARCHAR(50)
We plan to insert about 10 million rows containing the fields above and my mission is being able to query the database as fast as possible.
My query is always for an exact keyword.
For example:
select * from table where keyword = "keyword" limit 10
I really just need to query the keyword and not the title or description or anything else. There are a maximum of 10 results for each keyword - never more.
I have no real clue about mysql indexes and stuff but I read that it can improve speed if you have some indexes.
Now here is where I need help from a pro. My goal is to run the fastest possible query, so that looking up a keyword doesn't take too long. Since I am only looking up the keyword field, I am sure there is a way to make sure results come back quickly even with millions of rows.
What would you suggest I do? Should I set the keyword field to INDEX, or do I have to watch out for anything else? Since I have no real clue about indexes, your help is appreciated - I don't know whether I should use indexes at all, or whether I have to use them for everything like keyword, title, description and so on...
The database is updated frequently - in case it matters.
Do you think it's even possible to store millions of rows and doing a query in less than a second?
Any other suggestions such as custom my.cnf settings etc would be also helpful.
Your help is greatly appreciated.

Your intuition is correct - if you are only filtering on keyword in your WHERE clause, it should be indexed, and you will likely see some improvement in execution speed if you do.
CREATE INDEX `idx_keyword` ON `yourtable` (`keyword`)
You may be using a client like phpMyAdmin, which makes index creation easier than running commands by hand, but either way, review the MySQL documentation on CREATE INDEX. Yours is a very run-of-the-mill case, so you won't need any special options.
Although this isn't the case for you (as you said there would be up to 10 rows per keyword), note that if you already had a unique constraint, PRIMARY KEY, or FOREIGN KEY defined on keyword, it would function as an index as well.

Add an index on the keyword column. It will increase the speed significantly. Then it should be no problem to query the data in milliseconds.
In general you should put an index on the fields you use in your WHERE clause. That way the database can narrow down the data really fast and return the results.
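To see that the index is actually used, you can inspect the query plan. Below is a minimal sketch using SQLite through Python purely for illustration (the table and index names are made up); in MySQL you would prefix the query with EXPLAIN in the same spirit:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE archive (
    id INTEGER PRIMARY KEY,
    keyword TEXT, title TEXT, description TEXT,
    url TEXT, rank INTEGER, hash TEXT)""")
con.execute("CREATE INDEX idx_keyword ON archive (keyword)")
con.execute("INSERT INTO archive (keyword, title) VALUES ('school', 'Homepage')")

# An exact-match lookup on the indexed column is a SEARCH (index lookup),
# not a SCAN (full table read):
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM archive WHERE keyword = 'school' LIMIT 10"
).fetchall()
print(plan)  # the plan row mentions idx_keyword
```

With the index in place, the lookup cost grows roughly with the logarithm of the row count, which is why equality queries stay fast even at millions of rows.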

Related

Is this strategy for fast substring search in MySQL fast enough?

I have a USER table with millions of rows. I am implementing a search function that allows someone to look for a user by typing in a username. This autocomplete feature needs to be blazingly fast. Given that, in MySQL, column indexes speed up queries using LIKE {string}%, is the following approach performant enough to return within 200ms? (Note: memory overhead is not an issue here; usernames are at most 30 characters.)
Create a USERSEARCH table that has a foreign key to the user table and an indexed ngram username column:
USERSEARCH
user_id  username_ngram
-----------------------
1        crazyguy23
1        razyguy23
1        azyguy23
1        zyguy23
...
The query would then be:
SELECT user_id FROM myapp.usersearch WHERE username_ngram LIKE {string}%
LIMIT 10
I am aware that third party solutions exist, but I would like to stay away from them at the moment for other reasons. Is this approach viable in terms of speed? Am I overestimating the power of indexes if the db would need to check all O(30n) rows where n is the number of users?
Probably not. The UNION DISTINCT is going to process each subquery to completion.
If you just want arbitrary rows, phrase this as:
(SELECT user_id
FROM myapp.usersearch
WHERE username_1 LIKE {string}%
LIMIT 10
) UNION DISTINCT
(SELECT user_id
FROM myapp.usersearch
WHERE username_2 LIKE {string}%
LIMIT 10
)
LIMIT 10;
This will at least save you lots of time for common prefixes -- say 'S'.
That said, this just returns an arbitrary list of 10 user_ids when there might be many more.
I don't know if the speed will be fast enough for your application. You have to make that judgement by testing on an appropriate set of data.
Assuming SSDs, that should be blazing fast, yes.
Here are some further optimizations:
I would add a DISTINCT to your query, since there is no point in returning the same user_id multiple times. Especially when searching for a very common prefix, such as a single letter.
Also consider searching only for at least 3 letters of input. Less tends to be meaningless (since hopefully your usernames are at least 3 characters long) and is a needless hit on your database.
If you're not adding any more columns (I hope you're not, since this table is meant for blazing-fast search!), we can do better. Swap the columns: make the primary key (username_ngram, user_id). This way you're searching directly on the primary key. (Note the added benefit of alphabetical ordering of the results! Well... alphabetical on the matching suffixes, that is, not the full usernames.)
Make sure you have an index on user_id, to be able to replace everything for a user if you ever need to change a username. (To do so, just delete all rows for that user_id and insert brand new ones.)
Perhaps we can do even better. Since this is just for fast searching, you could use an isolation level of READ UNCOMMITTED. That avoids placing any read locks, if I'm not mistaken, and should be even faster. It can read uncommitted data, but so what... Afterwards you'll just look up any resulting user_ids in another table and perhaps not find them, if that user was still being created. You haven't lost anything. :)
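For illustration, the suffix rows ("ngrams") in the question could be generated with a short helper like this sketch (the function name and the minimum length of 3, matching the advice above, are my own choices):

```python
def suffix_ngrams(username, min_len=3):
    """Return every suffix of `username` that is at least `min_len`
    characters long, longest first - one row per suffix goes into
    the search table."""
    return [username[i:] for i in range(len(username) - min_len + 1)]

# One (user_id, ngram) row per suffix, as in the USERSEARCH example:
rows = [(1, ngram) for ngram in suffix_ngrams("crazyguy23")]
```

Each prefix search LIKE {string}% then matches any username containing {string}, because every possible starting position is stored as its own indexed row.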
I think you need to use a MySQL full-text index to improve performance.
You will need to change your syntax to use the full-text index.
Create full text index:
CREATE FULLTEXT INDEX ix_usersearch_username_ngram ON usersearch(username_ngram);
The official MySQL documentation on how to use a full-text index: https://dev.mysql.com/doc/refman/8.0/en/fulltext-search.html

How can I improve the response time on my query when using ibatis/spring/mysql?

I have a database with 2 tables, and I must run a simple query:
select *
from tableA, tableB
where tableA.user = tableB.user
and tableA.email LIKE "%USER_INPUT%"
where USER_INPUT is the substring of tableA.email that has to match.
The problem:
The table will hold about 10 million records, and the query is taking a while. The iBatis cache (as far as I know) is only used if the new query looks exactly the same as the previous one. For example, with USER_INPUT = john_doe, if the second query is john_doe again the cache will work, but if it is john_do it will not (that is, as I said, as far as I know).
Currently, the tableA structure is like this:
id int(11) not_null auto_increment
email varchar(255) not_null
many more fields...
I don't know if email, a varchar of 255, might be too long and take more time because of that. If I decreased it to 150 characters, for example, would the response time be shorter?
Right now the query is taking too long... I know I could add more memory to the servers, but I would like to know if there is another way to improve this.
tableA and tableB have about 30 fields each and they are related by the ID on a relational schema.
I'm going to create an index for tableA.email.
Ideas?
I'd recommend running an execution plan for that query in your DB. That'll tell you how the DB plans to execute your query, and what you're looking for is something like a "full table scan". I'd guess you'll see just that, due to the LIKE clause, and an index on the email field won't help that part.
If you need to search by substrings of email addresses, you might want to reconsider the granularity of how you store your data. For example, instead of storing each email address in a single field as usual, you could split it into two fields (or maybe more), where everything before the '@' is in one field and the domain name is in another. Then you could search by either component without needing a LIKE, and indexes would speed things up significantly. For example, you could do this to search:
WHERE tableA.email_username = 'USER_INPUT' OR tableA.email_domain = 'USER_INPUT'
Of course you then have to concatenate the two fields to recreate the email address, but I think iBatis will let you add a method to your data object to do that in a single place instead of all over your app (been a while since I used iBatis, though, so I could be wrong).
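As a sketch of that split-field idea (the function names are hypothetical): you would store the two parts at insert time and concatenate them when reading the row back:

```python
def split_email(email):
    """Split an address into (local part, domain) on the last '@',
    so each part can live in its own indexed column."""
    local, _, domain = email.rpartition("@")
    return local, domain

def join_email(local, domain):
    """Recreate the original address from the two stored columns."""
    return f"{local}@{domain}"
```

Equality searches on either column (email_username = ? or email_domain = ?) can then use an ordinary index, which a leading-wildcard LIKE never can.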
MySQL cannot utilize indexes on LIKE queries where the wildcard precedes the search string (%query).
You can try a Full-Text search instead. You'll have to add a FULLTEXT index to your email column:
ALTER TABLE tablea
ADD FULLTEXT(email);
From there you can revise your query:
SELECT *
FROM tableA,tableB
WHERE tableA.user = tableB.user
AND MATCH (tablea.email) AGAINST ('+USER_INPUT' IN BOOLEAN MODE)
You'll have to make sure you can use full-text indexes:
Full-text indexes can be used only with MyISAM tables. (In MySQL 5.6 and up, they can also be used with InnoDB tables.)
http://dev.mysql.com/doc/refman/5.5/en/fulltext-search.html
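To see the idea in action outside MySQL, SQLite's FTS5 module (included in most Python builds - an assumption worth checking on your platform) offers an analogous full-text MATCH. This sketch is only an illustration of tokenized search; the actual MySQL syntax is the ALTER TABLE / MATCH ... AGAINST shown above:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# FTS5's default tokenizer splits on punctuation, so 'john_doe@example.com'
# yields the tokens john, doe, example, com - each searchable directly,
# with no leading-wildcard LIKE and no full scan.
con.execute("CREATE VIRTUAL TABLE email_ft USING fts5(email)")
con.execute("INSERT INTO email_ft (email) VALUES ('john_doe@example.com')")
hits = con.execute(
    "SELECT email FROM email_ft WHERE email_ft MATCH 'example'"
).fetchall()
```

The trade-off is that full-text search matches whole tokens, not arbitrary substrings, so a search for 'xampl' would find nothing.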

what is mysql indexing and how do you create an index?

Okay, mysql indexing. Is indexing nothing more than having a unique ID for each row that will be used in the WHERE clause?
When indexing a table does the process add any information to the table? For instance, another column or value somewhere.
Does indexing happen on the fly when retrieving values or are values placed into the table much like an insert or update function?
Any more information to clearly explain MySQL indexing would be appreciated. And please don't just post a link to the MySQL documentation; it is confusing, and it is always better to get a personal response from a professional.
Lastly, why is indexing different from telling MySQL to look for values between two values? For example: WHERE create_time >= 'AweekAgo'
I'm asking because one of my tables is 220,000+ rows and it takes more than a minute to return values with a very simple mysql select statement and I'm hoping indexing will speed this up.
Thanks in advance.
You were downvoted because you didn't make an effort to read or search for what you are asking about. A simple Google search could have shown you the benefits and drawbacks of a database index. Here is a related question on Stack Overflow. I am sure there are numerous questions like that.
To simplify the jargon: it is easier to locate books in a library if you arrange them on shelves numbered according to their area of specialization. You can easily tell somebody to go to a specific location and pick up the book - that is what an index does.
Another example: imagine an alphabetically ordered admission list. If your name starts with Z, you just skip A to Y and go straight to Z - faster, right? Otherwise, you would have to search and search, and you might not even find it if you didn't look carefully.
A database index is a data structure that improves the speed of operations in a table. Indexes can be created using one or more columns, providing the basis for both rapid random lookups and efficient ordering of access to records.
You can create an index this way:
CREATE INDEX index_name
ON table_name ( column1, column2,...);
You might be working on a more complex database, so it's good to remember a few simple rules.
Indexes slow down inserts and updates, so be careful about using them on columns that are FREQUENTLY updated.
Indexes speed up where clauses and order by.
For further detail, you can read :
http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html
http://www.tutorialspoint.com/mysql/mysql-indexes.htm
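On the question of whether indexing adds anything to the table: an index does not add a column. It is stored as a separate structure alongside the table and is maintained automatically on every INSERT and UPDATE. A small sketch using SQLite from Python (illustration only; MySQL behaves the same way, and SHOW INDEX FROM table lists a table's indexes there):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, create_time INTEGER)")
con.execute("CREATE INDEX idx_create_time ON posts (create_time)")

# The table still has exactly the two columns it was created with...
cols = [row[1] for row in con.execute("PRAGMA table_info(posts)")]
# ...while the index appears as a separate database object:
objects = [(row[0], row[1]) for row in
           con.execute("SELECT type, name FROM sqlite_master")]
```

This also answers the range-query question: an index on create_time is exactly what makes a filter like WHERE create_time >= 'AweekAgo' fast, because the ordered index lets the engine jump straight to the first qualifying value.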
There are many kinds of indexes, for example a hash, a trie, or a spatial index. It depends on the values. Most likely it's a hash or a balanced search tree (MySQL's default index type is a B-tree). Nothing really fancy, because the fancy things are usually expensive.

Should I create an Index on my table?

I have a table with 7 columns, with a primary key on the first column and another index (a foreign key).
My app does:
SELECT `comment_vote`.`ip`, `comment_vote`.`comment_id`, COUNT(*) AS `nb` FROM `comment_vote`
SELECT `comment_vote`.`type` FROM `comment_vote` WHERE (comment_id = 123) AND (ip = "127.0.0.1")
Is it worth adding an index on the ip column? It is often used in my SELECT queries.
By the way, is there anything I can do to speed up those queries? Sometimes they take a long time and lock the table, preventing other queries from running.
If you are searching by ip quite often, then yes, you can create an index. However, your inserts/updates might take a bit longer because of it. Not sure how your data is structured, but if the data is collected by ip then maybe you can consider partitioning the table by ip.
A good rule of thumb: If a column appears in the WHERE clause, there should be an index for it. If a query is slow, there's a good chance an index could help, particularly one that contains all fields in the WHERE clause.
In MySQL, you can use the EXPLAIN keyword to see an approximate query plan for your query, including indexes used. This should help you find out where your queries spend their time.
Yes, do create an index on ip if you're using it in other queries.
The second query uses the comment_id and ip columns, so I'd create an index on that combination. An index on ip alone won't help that query.
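A composite index along those lines, sketched in SQLite via Python (illustrative only; the CREATE INDEX syntax is the same in MySQL, and the index name is made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE comment_vote (comment_id INTEGER, ip TEXT, type TEXT)")
con.execute("CREATE INDEX idx_comment_ip ON comment_vote (comment_id, ip)")
con.execute("INSERT INTO comment_vote VALUES (123, '127.0.0.1', 'up')")

# Both WHERE columns are covered by the composite index, so the plan
# shows an index SEARCH rather than a full-table SCAN:
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT type FROM comment_vote "
    "WHERE comment_id = 123 AND ip = '127.0.0.1'"
).fetchall()
```

Column order matters in a composite index: (comment_id, ip) also serves queries filtering on comment_id alone, but not queries filtering on ip alone.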
YES! Almost always add an INDEX - or two or three (multi-column indexes?) - to every column.
If it is not in a WHERE clause today, you can bet it will be tomorrow.
Most data is WORM (written once, read many times), so making reads as efficient as possible is where you will get the most value. And, as many have pointed out, the argument about having to maintain the index during a write is just plain silly.

MySQL: a huge table. can't query, even a simple select!

I have a table with about 200,000 records.
It takes a long time to do even a simple SELECT query, and I am confused, because I am running on a 4-core CPU with 4 GB of RAM.
How should I write my query?
Or does it have something to do with INDEXING?
Important note: my table is static (its data won't change).
What are your solutions?
PS
1 - my table has a primary key id
2 - my table has a unique key serial
3 - I want to query over the other fields, like where param_12 not like '%I.S%'
or where param_13 = '1'
4 - 200,000 is not big, and this is exactly why I am surprised.
5 - I even have a problem when adding a simple field: my question
6 - can I create an INDEX for BOOL fields? (and is it useful?)
PS - and thanks for the answers
7 - my select should return the rows that contain 'I.S' (or those that do not):
select * from `table` where `param_12` like '%I.S%'
This is all I want. It seems no index helps here, huh?
Indexing will help. Please post your table definition and select query.
Add an index for all "=" columns in the where clause.
Yes, you'll want/need to index this table, and partitioning would be helpful as well. Doing this properly is something you will need to provide more information for. You'll want to use EXPLAIN and look over your queries to determine which columns to index and how.
Another aspect to consider is whether or not your table is normalized. Normalized tables tend to give better performance due to lowered I/O.
I realize this is vague, but without more information that's about as specific as we can be.
BTW: a table of 200,000 rows is relatively small.
Here is another SO question you may find useful
1 - my table has a primary key id: not really useful unless you use some scheme which requires a numeric primary key.
2 - my table has a unique key serial: the id is also unique by definition; why not use serial as the primary key? This one is automatically indexed because you defined it as unique.
3 - i want to query over the other fields like where param_12 not like '%I.S%' or where param_13 = '1': a LIKE '%something%' query cannot really use an index. Is there some way you can restructure param_12 so that the search string becomes a known prefix, e.g. a column that starts with 'I.S', making the condition LIKE 'I.S%'? An index can be used on a LIKE statement if the starting string is known.
4 - 200,000 is not big and this is exactly why i am surprised: yep, 200,000 is not that much. But without good indexes, good queries and/or an adequate cache size, MySQL will need to read all the data from disk for comparison, which is slow.
5 - i even have problem when adding a simple field: my question
6 - can i create an INDEX for BOOL fields? Yes you can, but an index that matches half the rows is fairly useless. An index is used to limit, as much as possible, the number of records MySQL has to load fully; if an index does not dramatically limit that number - as is often the case with booleans in a 50-50 distribution - using the index only requires more disk I/O and can slow searching down. So unless you expect something like an 80-20 distribution or better, creating the index will cost time rather than save it.
The index on param_13 might be used, but not the one on param_12 in this example, since the leading wildcard in LIKE '%...' negates the index use.
If you're querying data with LIKE '%asdasdasd%' then no index can help you; it will have to do a full scan every time. The problem here is the leading %, because it means the substring you are looking for can be anywhere in the field, so every row has to be checked.
Possibly you might look into full-text indexing, but depending on your needs that might not be appropriate.
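The difference is easy to see in a query plan. Here is a sketch with SQLite from Python (illustration only; in MySQL you would compare the two EXPLAIN outputs instead). Note the PRAGMA: SQLite only applies its LIKE prefix optimization when LIKE is case-sensitive:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA case_sensitive_like = ON")
con.execute("CREATE TABLE t (param_12 TEXT)")
con.execute("CREATE INDEX idx_param_12 ON t (param_12)")

# Known prefix: the optimizer rewrites this as an index range SEARCH.
prefix_plan = str(con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM t WHERE param_12 LIKE 'I.S%'").fetchall())
# Leading %: the match can start anywhere, so every row must be read (SCAN).
substring_plan = str(con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM t WHERE param_12 LIKE '%I.S%'").fetchall())
```

The same asymmetry holds in MySQL: an index on param_12 serves LIKE 'I.S%' but is useless for LIKE '%I.S%'.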
Firstly, ensure your table has a primary key.
To answer in any more detail than that, you'll need to provide more information about the structure of the table and the types of queries you are running.
I don't believe that the keys you have will help. You have to index the columns used in your WHERE clauses.
I'd also suspect that the LIKE requires table scans regardless of indexes. The minute you use a leading wildcard like that, you lose the value of the index, because you have to check each and every row.
You're right: 200K isn't a huge table. EXPLAIN will help here. If you see a full table scan, redesign.