Implementing a composite index - MySQL

I've been reading about how a composite index can improve performance but am still unclear on a few things. I have an InnoDB database with over 20 million entries of 8 data points each. Its performance has dropped substantially in the past few months. The server has 6 cores with 4 GB of memory, which will be increased soon, but there's no indication on the server that I'm running low on memory. The InnoDB settings in my.cnf have been changed to:
innodb_buffer_pool_size = 1000M
innodb_log_file_size = 147M
These settings have helped in the past. So, my understanding is that many factors can contribute to the performance decrease, including the fact that I originally had no indexing at all, and that indexing strategy should be chosen based on the type of queries that are run. So, this is my table:
cdr_records | CREATE TABLE `cdr_records` (
`dateTimeOrigination` int(11) DEFAULT NULL,
`callingPartyNumber` varchar(50) DEFAULT NULL,
`originalCalledPartyNumber` varchar(50) DEFAULT NULL,
`finalCalledPartyNumber` varchar(50) DEFAULT NULL,
`pkid` varchar(50) NOT NULL DEFAULT '',
`duration` int(11) DEFAULT NULL,
`destDeviceName` varchar(50) DEFAULT NULL,
PRIMARY KEY (`pkid`),
KEY `dateTimeOrigination` (`dateTimeOrigination`),
KEY `callingPartyNumber` (`callingPartyNumber`),
KEY `originalCalledPartyNumber` (`originalCalledPartyNumber`),
KEY `finalCalledPartyNumber` (`finalCalledPartyNumber`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
So, typically, a query will take a value and search callingPartyNumber, originalCalledPartyNumber, and finalCalledPartyNumber to find any entries related to it in the table. So, it wouldn't make much sense to use individual indexes like those illustrated above, because I typically don't run queries like that. However, I have another job in the evenings that is basically:
select * from cdr_records;
In this case, it sounds like it would be a good idea to have another composite index with all columns in it. Am I understanding this correctly?

The benefit of a composite index comes when you need to select, sort, or group by multiple columns in the same fashion.
I remember a very good phone book analogy I read somewhere. Since the names in a phone book are ordered alphabetically, it is very easy to scan through them and find the one you need based on the letters of the name, from left to right. You can think of that as a composite index on the letters of the names.
If the names were ordered only by the first letter and the subsequent letters were chaotic (a single-column index), you would have to go through all the records after finding the first letter, which would take a lot of time.
With a composite index, you can go from left to right and very easily find the record you are looking for. This is also the reason why you can't use, say, only the second or third column of a composite index: you need the previous columns for it to work. Imagine trying to find all names whose third letter is "a" in the phone book; it would be a nightmare. You would need a separate index just for that, which is exactly what you have to do if you want to use a column from a composite index without the columns before it.
Bear in mind that the phone book example assumes each letter of a name is a separate column, which can be a little confusing.
The other great strength of composite indexes is unique composite indexes, which let you apply stronger logical restrictions on your data; very handy when you need it. That has nothing to do with performance, but I thought it was worth mentioning.
In your question, your SQL has no criteria (no WHERE clause), so no index will be used. It is always recommended to use EXPLAIN to see what is going on; you can never be sure!
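As a quick sketch of that EXPLAIN advice, using the cdr_records table from the question (the lookup value is made up for illustration):

```sql
-- With no WHERE clause, EXPLAIN will show a full table scan; no index can help.
EXPLAIN SELECT * FROM cdr_records;

-- A query with a criterion can use one of the single-column indexes;
-- EXPLAIN's "key" column should name the callingPartyNumber index here.
EXPLAIN SELECT *
FROM cdr_records
WHERE callingPartyNumber = '5551234';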

No, it's not a good idea to create a composite index over all fields.
Which fields you put into one or more indexes depends on your queries.
Note:
MySQL generally uses only one index per table per query, and it can use a composite index only if the fields are used starting from the left side.
You need not use all of the fields.
Example:
if you have an index on the fields (name, street, number), this index will be used when your query (in WHERE) filters on
name, or
name and street, or
name, street, and number,
but not if you search only on
street, or
street and number.
To find out whether your index works well with your query, put EXPLAIN before the query and you will see which indexes are used by it.
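The example above can be sketched in SQL (the `addresses` table name is made up for illustration):

```sql
-- Sketch of the leftmost-prefix rule with a composite index on (name, street, number)
CREATE INDEX idx_name_street_number ON addresses (name, street, number);

-- These can use the index (each starts from the leftmost column):
SELECT * FROM addresses WHERE name = 'Smith';
SELECT * FROM addresses WHERE name = 'Smith' AND street = 'Main St';
SELECT * FROM addresses WHERE name = 'Smith' AND street = 'Main St' AND number = 12;

-- These cannot use it (they skip the leftmost column):
SELECT * FROM addresses WHERE street = 'Main St';
SELECT * FROM addresses WHERE street = 'Main St' AND number = 12;
```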

Related

Email address as select index in mysql for huge table query speed

I'm wondering about using emails for indexing. I realise that this is sub-optimal and that it's better to use an auto-incremented primary key. But in this case I'm trying to develop a lite application that does not require account registration to use.
SELECT account_id, account_balance, account_payments_received
FROM accounts
WHERE account_email = ?
LIMIT 1
This works ok at the moment with few users. But I'm concerned about when it reaches a million or more. Is there any way to index emails quickly?
I was thinking maybe I could use the first and second characters as keys? Maybe develop an index number for a=1, b=2, c=3 and so on.....
What do you guys suggest?
1) You should keep a primary key with auto_increment, because it will make joins with other tables more efficient.
2) Make the account_email field varchar(255) instead of char(255), so you get the unused bytes back. Even varchar(100) will be enough.
3) Create a prefix index on this field with the command below:
alter table accounts add index idx_account_email(account_email(50));
Note: a 50-character prefix will cover almost 99% of emails.
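Before creating the prefix index, you could sanity-check how selective a 50-character prefix actually is on your data. A minimal sketch (against the accounts table from the question):

```sql
-- Ratio of distinct 50-char prefixes to distinct full emails.
-- The closer this is to 1.0, the less the prefix index loses vs. a full index.
SELECT COUNT(DISTINCT LEFT(account_email, 50)) / COUNT(DISTINCT account_email)
           AS prefix_selectivity
FROM accounts;
```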
I think you will find that any modern database will be able to perform this query (particularly since it does NOT use LIKE), even on a table with a million rows, in a fraction of a second. Just make sure you have an index on the column. I would also add an auto-increment field, though, as it will always be simpler and quicker to use an integer to fetch a row.
What you are engaged in is premature optimisation.

MySQL fulltext inadequate. Creating an index would exceed 1000 bytes limit. What to do then?

Here are the columns involved:
order_id int(8)
tracking_number char(13)
carrier_status varchar(128)
recipient_name varchar(128)
recipient_address varchar(256)
recipient_state varchar(32)
recipient_country_descr varchar(128)
recipient_zipcode varchar(32)
Getting this error when I try to create an index of all these columns:
MySQL #1071 - Specified key was too long; max key length is 1000 bytes
What can I do given that this index is very important for the database?
I need to allow users to search all fields from a form with a single search field. So I'm using this query:
WHERE CONCAT(tracking_number, recipient_name, recipient_address, recipient_state, recipient_country_descr, recipient_zipcode, order_id, carrier_status) LIKE '$keyword1'
I've also considered using fulltext MATCH() AGAINST(). The problem was that it doesn't allow searches like *keyword, so I scratched it and am doing it with a simple LIKE '%keyword1%' AND LIKE '%keyword2%' for each keyword.
But now I've run into another problem: the query may be slow because I cannot create a single index containing all the columns that will be searched.
What to do in this situation?
A conventional B-tree index cannot help when you're searching for a keyword that may be in multiple columns, or may be in the middle of a string.
Think of a telephone book. It's like an index on (last name, first name). If I ask you to search for a certain last name, it helps a lot that the book is already sorted that way. If I ask you to search for a specific last name and first name combination, the sort order of the book is also helpful.
But if I ask you to search for someone only by a specific first name "Bill", the fact that the book is sorted is not helpful. Occurrences of "Bill" could be found anywhere in the book, so you basically have to read it cover-to-cover.
Likewise if I ask you to search for anyone's name that contains a certain substring in the middle of the name or at the end. For example, anyone's last name that ends in "-son".
Your example of using CONCAT() over a bunch of columns and comparing that to a keyword in a LIKE pattern has the same problem.
The solution is to use a fulltext search engine, which does offer the ability to search for words anywhere in the middle of strings. It indexes in a completely different way than the one-dimensional B-tree sorting.
If you don't find that MySQL's FULLTEXT index is flexible enough (and I wouldn't blame you because MySQL's implementation is pretty rudimentary), then I suggest you look at a more specialized search technology. Here are a few free software options:
Apache Solr
Sphinx Search
Xapian
This may mean that you have to copy the searchable text from MySQL to the search engine, and keep copying it incrementally as changes are made to your MySQL data. A lot of developers do this.
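For completeness, MySQL's built-in FULLTEXT option from the question could be sketched like this before reaching for an external engine (the table name `orders_shipping` is made up, and InnoDB FULLTEXT support depends on your MySQL version):

```sql
-- A FULLTEXT index is not subject to the 1000-byte B-tree key length limit
ALTER TABLE orders_shipping
  ADD FULLTEXT INDEX ft_search (tracking_number, recipient_name, recipient_address,
                                recipient_state, recipient_country_descr,
                                recipient_zipcode);

-- Boolean mode allows trailing wildcards (keyword*) but, as noted in the
-- question, not leading wildcards (*keyword)
SELECT *
FROM orders_shipping
WHERE MATCH(tracking_number, recipient_name, recipient_address,
            recipient_state, recipient_country_descr, recipient_zipcode)
      AGAINST ('+keyword1 +keyword2*' IN BOOLEAN MODE);
```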

MySQL: Table optimization - the word 'guest' or a memberid in the column

This question is for MySQL (it allows many NULLs in the column which is UNIQUE, so the solution for my question could be slightly different).
There are two tables: members and Table2.
Table members has:
memberid char(20); it's the primary key. (Please do not recommend using int(11) instead of char(20) for memberid. I can't change it; it contains exactly 20 symbols.)
Table2 has:
CREATE TABLE IF NOT EXISTS `Table2` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
memberid varchar(20) NOT NULL,
`Time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
status tinyint(4) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB;
Table2.memberid is either the word 'guest' (which can be repeated many times) or a value from members.memberid (which can also be repeated many times). Any value in the Table2.memberid column (if not 'guest') exists in the members.memberid column. Again, members.memberid is unique; Table2.memberid, even excluding the word 'guest', is not.
So, Table2.memberid column looks like:
'guest'
'lkjhasd3lkjhlkjg8sd9'
'kjhgbkhgboi7sauyg674'
'guest'
'guest'
'guest'
'lkjhasd3lkjhlkjg8sd9'
Table2 receives only INSERTs and UPDATEs, and an UPDATE changes only status. The criterion for updating status is: set status=0 WHERE memberid='' and status=1. So a row may be updated once or not at all. As a result, the number of UPDATEs is less than or equal to (by statistics, about half) the number of INSERTs.
The question is only about optimization.
The question can be split into:
1) Do you HIGHLY recommend replacing the word 'guest' with NULL, or with a special 'xxxxxyyyyyzzzzz00000' (20 symbols, as a 'very special and reserved' string), so that char(20) can be used for Table2.memberid because all values would then be char(20)?
2) What about using a foreign key? I can't use one because of the value 'guest'. That value canNOT be in the members.memberid column.
In other words, I need some help to decide:
whether I can use 'guest' (I like that word) -vs- choosing a 20-char reserved string so I can use char(20) instead of varchar(20) -vs- keeping NULLs instead of 'guest';
all values except 'guest' are actually foreign keys. Is there any possible way to use this information to increase performance?
That table is used pretty often, so I have to build Table2 as well as I can. Any ideas are highly appreciated.
Thank you.
Added:
Well... I think I have found a good solution that allows me to treat memberid as a foreign key.
1) Do you HIGHLY recommend to replace the word 'guest' to NULL or to a
special 'xxxxxyyyyyzzzzz00000' (20 symbols like a 'very special and
reserved' string) so you can use chars(20) for Table2.memberid,
because all values are char(20)?
Mixing values from different domains always causes trouble. The best thing to do is fix the underlying structural problem. A bad design can be really expensive to work around, and it can be really expensive to fix.
Here's the issue in a nutshell. The simplest data integrity constraint for this kind of issue is a foreign key constraint. You can't use one, because "guest" isn't a memberid. (Member ids are from one domain; "guest" isn't part of that domain; you're mixing values from two domains.) Using NULL to identify a guest doesn't help much; you can't distinguish guests from members whose memberid is missing. (Using NULL to identify anything is usually a bad idea.)
If you can use a special 20-character member id to identify all guests, it might be wise to do so. You might be lucky, in that "guest" is five letters. If you can use "guestguestguestguest" for the guests without totally screwing your application logic, I'd really consider that first. (But, you said that seems to treat guests as logged in users, which I think makes things break.)
Retrofitting a "users" supertype is possible, I think, and this might prove to be the best overall solution. The supertype would let you treat members and guests as the same sometimes (because they're not utterly different), and as different at other times (because they're not entirely the same). A supertype also allows both individuals (members) and aggregate users (guests all lumped together) without undue strain. And it would unify the two domains, so you could use foreign key constraints for members. But it would require changing the program logic.
In Table2 (and do find a better name than that, please), an index on memberid or a composite index on memberid and status will perform just about as well as you can expect. I'm not sure whether a composite index will help; "status" only has two values, so it's not very selective.
all values, except 'guest' are actually foreign keys. Is there any
possible way to use this information for increasing the performance?
No, they're not foreign keys. (See above.) True foreign keys would help with data integrity, but not with SELECT performance.
"Increasing the performance" is pretty much meaningless. Performance is a balancing act. If you want to increase performance, you need to specify which part you want to improve. If you want faster inserts, drop indexes and integrity constraints. (Don't do that.) If you want faster SELECT statements, build more indexes. (But more indexes slows the INSERTS.)
You can speed up all database performance by moving to hardware that speeds up all database performance. (ahem) Faster processor, faster disks, faster disk subsystem, more memory (usually). Moving critical tables or indexes to a solid-state disk might blow your socks off.
Tuning your server can help. But keep an eye on overall performance. Don't get so caught up in speeding up one query than you degrade performance in all the others. Ideally, write a test suite and decide what speed is good enough before you start testing. For example, say you have one query that takes 30 seconds. What's acceptable improvement? 20 seconds? 15 seconds? 2 milliseconds sounds good, but is an unlikely target for a query that takes 30 seconds. (Although I've seen that kind of performance increase by moving to better table and index structure.)
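One possible shape of the "users" supertype described above, as a minimal sketch (all table and column names beyond those in the question are illustrative):

```sql
-- Supertype: every member, plus one aggregate row for all guests,
-- so Table2.memberid can become a real foreign key.
CREATE TABLE users (
    userid   CHAR(20) NOT NULL PRIMARY KEY,
    usertype ENUM('member', 'guest') NOT NULL
) ENGINE=InnoDB;

-- Subtype: members only; its key references the supertype.
CREATE TABLE members (
    memberid CHAR(20) NOT NULL PRIMARY KEY,
    FOREIGN KEY (memberid) REFERENCES users (userid)
) ENGINE=InnoDB;

-- The single guest row ("guestguestguestguest" reuses the 20-character idea above):
INSERT INTO users VALUES ('guestguestguestguest', 'guest');
```

With this in place, Table2.memberid can carry `FOREIGN KEY (memberid) REFERENCES users (userid)` and the 'guest'/member mixing problem disappears.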

MySQL large index integers for few rows performance

A developer of mine was making an application and came up with the following schema
purchase_order int(25)
sales_number int(12)
fulfillment_number int(12)
purchase_order is the index in this table. (There are other fields, but they are not relevant to this issue.) purchase_order is a concatenation of sales_number + fulfillment_number.
Instead i proposed an auto_incrementing field of id.
The current format could be essentially 12-15 characters long and randomly generated (though always unique, as sales_number + fulfillment_number would always be unique).
My question here is:
if I have 3 rows, each with a random but unique ID, i.e. 983903004, 238839309, 288430274, vs. three rows with the IDs 1, 2, 3, is there a performance hit?
As an aside, my other argument (for those interested) against this schema was that it makes little sense on the grounds of data redundancy (you can easily do a SELECT CONCAT(sales_number, fulfillment_number) ... rather than storing the two columns together in a third).
The problem as I see it is not bigint vs int (an auto-increment column can be bigint as well; there is nothing wrong with that) but the random value for the primary key. If you use the InnoDB engine, the primary key is at the same time a clustered key, which defines the physical order of the data. Inserting random values can potentially cause more page splits and, as a result, greater fragmentation, which in turn slows down not only INSERT/UPDATE queries but also SELECTs.
Your argument about concatenating makes sense, but executing CONCAT also has its cost. (Unfortunately, MySQL doesn't support calculated persistent columns, so in some cases it's OK to store the result of the concatenation in a separate column.)
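The redundancy argument from the question can be sketched like this (the table name `purchase_orders` is made up for illustration):

```sql
-- Sketch: derive the purchase order reference at query time instead of
-- storing it as a third column (CONCAT has a small per-row cost, as noted).
SELECT id,
       sales_number,
       fulfillment_number,
       CONCAT(sales_number, fulfillment_number) AS purchase_order
FROM purchase_orders;
```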
AFAIK, integers are stored and compared as integers, so the comparisons should take the same length of time.
Concatenating two ints (32-bit) into one bigint (64-bit) may have a performance hit that is hardware dependent.
Having incremental IDs will put records that were created around the same time near each other on disk. This might make some queries faster if the ID is the primary key on InnoDB, or for an index that these IDs are used in.
Incremental records can sometimes be inserted a little bit quicker; test to see.
You'll need to make sure that the random ID is unique, so you'll need an extra lookup.
I don't know if these points are material for your application.

Can you have multiple Keys in SQL and why would you want that?

Is there a reason why you would want to have multiple KEYs in a TABLE? What is the point of having multiple KEYs in one table?
Here is an example that I found:
CREATE TABLE orders(
id INT UNSIGNED NOT NULL AUTO_INCREMENT,
user_id INT UNSIGNED NOT NULL,
transaction_id VARCHAR(19) NOT NULL,
payment_status VARCHAR(15) NOT NULL,
payment_amount DECIMAL(15) NOT NULL,
payment_time TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY(id),
KEY(user_id)
)
Also, you'll notice the database programmer doesn't make transaction_id a KEY. Is there a reason for this?
KEY in MySQL is an alternate syntax for "index".
Indexes are common across databases, but they aren't covered by the ANSI standard as of yet -- it's a pure miracle things are as similar as they are. It is common to have more than one index on a table, because indexes improve data retrieval at the cost of update/delete/insert speed.
Be aware that MySQL (5.x?) automatically creates an index if one doesn't already exist for the primary key of a table.
There are several possible reasons.
Enhance search performance (when the WHERE clause uses KEY fields, it performs faster)
Restrain table contents (when using a UNIQUE key for a column, it can't have the same value twice or more on it)
These are the most common reasons.
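Both reasons can be shown in one small sketch (the table and column names here are illustrative, not from the question):

```sql
-- Sketch: one UNIQUE key as a logical restriction, one plain KEY for search speed
CREATE TABLE members (
    id    INT UNSIGNED NOT NULL AUTO_INCREMENT,
    email VARCHAR(255) NOT NULL,
    city  VARCHAR(100) NOT NULL,
    PRIMARY KEY (id),
    UNIQUE KEY uq_email (email),  -- the same email can never appear twice
    KEY idx_city (city)           -- speeds up WHERE city = ...
) ENGINE=InnoDB;
```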
In SQL you may only have one PRIMARY KEY per table.
KEY(foo) is bogus syntax in standard SQL. In MySQL KEY foo is a poorly named synonym for INDEX foo and does not impose a UNIQUE constraint. (It is poorly named because it does not actually relate to the functioning of a key.)
There may be multiple UNIQUE INDICES which can play the role of "candidate keys". (A unique constraint can be specified without an associated INDEX, but the point of a "key" is generally a quick look-up.) The point of a PRIMARY KEY is to uniquely identify a single record and is almost exclusively INDEX-backed and may even be directly related to the clustering of the data.
Only the minimal amount of [unique] INDICES required to ensure data validity and meet performance requirements should be used -- they impose performance penalties on the query engine as well as have additional update and maintenance costs.
Non-unique indexes (INDEX foo, or in the case of MySQL, KEY foo) exist purely to allow the database to optimize queries; they do not really map to "keys" at all. If selected by the query planner, these indices can aid performance even though they add nothing to the logical model itself. (For performance reasons, a database engine may require that FOREIGN KEYs are covered by indices.)
In this case, creating an INDEX (don't think "KEY"!) on user_id will generally (greatly) speed up queries with clauses like:
... WHERE user_id = {somenumber}
Without the INDEX, the above query would require a FULL TABLE SCAN (i.e. reading through all records).
While I do not know why transaction_id is not made an index, it might not be required (or even detrimental for the given access patterns) -- consider the case where every query that needs to be fast either:
Does not use transaction_id or;
Also has a user_id = ... or other restriction that can utilize an INDEX. That is, in a case like WHERE user_id = ... AND transaction_id = ... the query planner will likely first find the records for the matched user and then look for the matching transaction_id -- it still has to do a SCAN, but only over a much smaller data-set than the original table. Only a plain WHERE transaction_id = ... would necessarily require a FULL TABLE SCAN.
If in doubt, use EXPLAIN -- or other query analyzer -- and see what MySQL thinks it should do. As a last note, sometimes estimated query execution plans may differ from actual execution plans and outdated statistics can make the query planner choose a non-ideal plan.
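A minimal sketch of that EXPLAIN workflow against the orders table from the question (the literal values are made up):

```sql
-- Compare plans; actual output depends on your data and MySQL version
EXPLAIN SELECT * FROM orders WHERE user_id = 42;
EXPLAIN SELECT * FROM orders WHERE transaction_id = '1AB23456CD789012E';

-- If plain transaction_id lookups turn out to matter, an index would avoid
-- the full table scan described above:
CREATE INDEX idx_transaction_id ON orders (transaction_id);
```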
Happy coding.
"Keys" might refer to one of:
Index (search optimization)
Constraint (e.g. foreign key, primary key)
You may want multiple because you need to implement more than one of these features in a single table. It's actually quite common.
In database theory, a key is a constraint that enforces uniqueness. A relvar (analogous to a SQL table) may indeed have more than one candidate key. A classic example is the set of chemical elements, for which name, symbol, atomic number and atomic weight are all keys (and folk will still want to add their own surrogate key to it ;)
MySQL continues a long tradition of abuse of the word KEY by making it a synonym for INDEX. Clearly, a MySQL table may have as many indexes as deemed necessary for performance for a given set of circumstances.
From your SQL DDL, it seems clear the ID is a surrogate key, so we must look for a natural key with not much to go on. While transaction_id may be a candidate, it is not inconceivable that an order can involve more than one transaction. In practical terms, I think an auditor would be suspicious of multiple orders made by the same user simultaneously, therefore suggest the compound of user_id and payment_time should be a key. However, if transaction_id is not a key then the table would not be fully normalized. Therefore, I'd give the designer the benefit of the doubt and assume transaction_id is also a key.