MySQL InnoDB inner join with LONGTEXT very slow

I migrated all MySQL tables of one project from MyISAM to InnoDB last week, in order to support transactions. I used ALTER TABLE for this.
Most things work fine, but one particular query runs very, very slowly, and it always gives the error Incorrect key file for table '/tmp/#sql_xxxx_x.MYI'.
Later I narrowed the problem down to the inner join of two tables, the user table and the agreement table. The inner join is between the foreign key field of user (i.e. agreement_id) and the primary key field of agreement (i.e. id).
The user table has only 50,000 rows of data, and the agreement table has, well, one single row. And we have set up an index on the agreement_id of user.
In any case, this seems to be a very lightweight query, but it turns out to be the whole bottleneck.
Here is the full schema of agreement:
CREATE TABLE IF NOT EXISTS `agreement` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`remark` varchar(200) NOT NULL,
`content` longtext NOT NULL,
`is_active` tinyint(1) NOT NULL,
`date_stamp` datetime NOT NULL,
PRIMARY KEY (`id`)
) DEFAULT CHARSET=utf8 AUTO_INCREMENT=2 ;
One thing I suspect is the longtext field content inside the agreement table, but we did NOT use that field in the inner join; in fact, the query is slow even if we do NOT select content in the query result.
Finally, we converted the agreement table from InnoDB back to MyISAM, and then everything became normal again; the query finishes in less than 1 second.
Now, my question is: what is actually going on here? Does it mean that once an InnoDB table contains any text field, the table cannot be used in an inner join?
I wish I could know the real reason, so that I can avoid the same problem in the future.
thanks a lot.

This is a famous and tricky one. The most likely cause is that you're out of space in /tmp.
Here is a link I keep in my bookmarks that may help you: http://www.mysqlperformancetuning.com/a-fix-for-incorrect-key-file-for-table-mysql
In my experience, limited though it is, the primary reason for seeing
this error message is because your tmpdir has run out of space. Like
me you'll check how much free space you have: 1Gb, 2Gb, 4Gb. It may
not be enough. And here's why: MySQL can create temporary tables
bigger than that in a matter of seconds, quickly filling up any free
space. Depending on the nature of the query and the size of the
database naturally.
You may also try a REPAIR on your table, but to me it is as useful as breakdancing :/
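A quick way to check whether this is the case (a hedged sketch; the exact temp path depends on your configuration) is to ask MySQL where it writes its temporary tables and then watch the free space on that filesystem while the slow query runs:

SHOW VARIABLES LIKE 'tmpdir';
-- then, on the server shell, something like: df -h /tmp
-- if the free space drops to zero while the query runs, that is the source of the "Incorrect key file" error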

InnoDB has its own settings for buffer sizes etc. Check them out and, if you can, adjust them. Just as a test, try doubling them; if that helps, you may want to tune them properly. It can make a big difference.
Some links that may help:
http://www.mysqlperformanceblog.com/2007/11/03/choosing-innodb_buffer_pool_size/
http://dev.mysql.com/doc/refman/5.5/en/innodb-buffer-pool.html
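For reference, a hedged sketch of what the relevant my.cnf lines might look like; the values are placeholders and should be sized against the available RAM (the first link above discusses how to choose innodb_buffer_pool_size):

[mysqld]
# often the single most important InnoDB setting; a large share of RAM on a dedicated DB server
innodb_buffer_pool_size = 1G
# larger redo logs reduce checkpoint pressure; on MySQL 5.5 and earlier, changing this
# requires a clean shutdown and moving the old ib_logfile* files out of the way first
innodb_log_file_size = 256M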

Maybe the problem here is the remark field defined as varchar(200)? Remember that temporary and MEMORY tables store VARCHAR columns with a fixed length. So 50k rows with varchar(200) can consume a lot of memory even if the values are all empty.
If this is the problem then you can try one of several things:
If only a few rows have a value in the column, then create the varchar(200) column with NULL allowed and always use NULL instead of an empty string
Change varchar(200) to TEXT (there is of course a drawback: it will always force the temporary table onto disk)
Maybe you don't need 200 characters? Try a smaller VARCHAR size
Try to adjust tmp_table_size and max_heap_table_size, so you can handle larger temporary tables in memory (see the sketch after this list) http://dev.mysql.com/doc/refman/5.1/en/internal-temporary-tables.html
Use Percona Server, as it supports a dynamic row format for MEMORY tables http://www.mysqlperformanceblog.com/2011/09/06/dynamic-row-format-for-memory-tables/
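A hedged example of raising the in-memory temporary-table limits from the point above; both variables have to be raised together because the smaller of the two is what actually limits in-memory temporary tables, and 64M is only an illustrative value:

SET GLOBAL tmp_table_size = 64 * 1024 * 1024;
SET GLOBAL max_heap_table_size = 64 * 1024 * 1024;
-- add the same values to my.cnf so they survive a server restart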

What is the purpose of your query? I see from your comment that you only list the user information and nothing from agreement, leading me to believe you are looking for users that have an agreement?
Since you are converting between engines, it leads me to think you are doing cleanup before adding constraints. If so, consider a left join from the user table instead, like:
select user.* from user left join agreement on user.agreement_id = agreement.id where user.agreement_id != 0;
If it's not cleanup, but you are simply looking for users with an agreement, we can make it simpler:
select user.* from user where user.agreement_id != 0;
If the purpose is something else, consider adding an index on user.agreement_id, since an inner join may need it for speed; a sketch follows below. Let us know the real purpose and you may get better help.
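A hedged sketch of that index plus a quick EXPLAIN check; the join below is reconstructed from the question's description (the question states such an index already exists, so the ALTER TABLE is shown only for completeness):

ALTER TABLE user ADD INDEX idx_agreement_id (agreement_id);
EXPLAIN SELECT user.*
FROM user
INNER JOIN agreement ON user.agreement_id = agreement.id;
-- the EXPLAIN output should list idx_agreement_id (or the existing index) under possible_keys/key for user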

Related

Email address as select index in mysql for huge table query speed

I'm wondering about using emails for indexing. I realise that this is sub-optimal and that it's better to use an auto-incremented primary key. But in this case I'm trying to develop a lite application that does not require account registration to use.
SELECT account_id, account_balance, account_payments_received
FROM accounts
WHERE account_email = ?
LIMIT 1
This works ok at the moment with few users. But I'm concerned about when it reaches a million or more. Is there any way to index emails quickly?
I was thinking maybe I could use the first and second characters as keys? Maybe develop an index number for a=1, b=2, c=3 and so on.....
What do you guys suggest?
1) You should keep a primary key with auto_increment, because it will give you efficiency when joining with other tables.
2) Make the account_email field varchar(255) instead of char(255), so that you get the unused bytes back. Even varchar(100) will be enough.
3) Create a prefix (partial) index on this field with the command below.
alter table accounts add index idx_account_email(account_email(50));
Note: a 50-character prefix will cover almost 99% of emails.
I think you will find that any modern database will be able to perform this query (particularly if it does NOT use LIKE), even on a table with a million rows, in a fraction of a second. Just make sure you have an index on the column. I would also add an auto_increment field, though, as it will always be simpler and quicker to use an integer to get a row.
What you are engaged in is premature optimisation.
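For completeness, a hedged sketch of the plain index this answer refers to, together with an EXPLAIN to confirm it is used; the index name and the email value are invented for illustration:

CREATE INDEX idx_account_email_full ON accounts (account_email);
EXPLAIN SELECT account_id, account_balance, account_payments_received
FROM accounts
WHERE account_email = 'user@example.com'
LIMIT 1;
-- EXPLAIN should report a ref (or const, for a UNIQUE index) lookup on the email index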

Implementing a composite index

I've been reading about how a composite index can improve performance but am still a little unclear on a few things. I have an InnoDB database that has over 20 million entries with 8 data points each. Its performance has dropped substantially in the past few months. The server has 6 cores with 4 GB of memory, which will be increased soon, but there's no indication on the server that I'm running low on memory. The InnoDB settings have been changed in my.cnf to:
innodb_buffer_pool_size = 1000M
innodb_log_file_size = 147M
These settings have helped in the past. So, my understanding is that many factors can contribute to the performance decrease, including the fact that originally I had no indexing at all. Indexing methods are predicated on the type of queries that are run. So, this is my table:
cdr_records | CREATE TABLE `cdr_records` (
`dateTimeOrigination` int(11) DEFAULT NULL,
`callingPartyNumber` varchar(50) DEFAULT NULL,
`originalCalledPartyNumber` varchar(50) DEFAULT NULL,
`finalCalledPartyNumber` varchar(50) DEFAULT NULL,
`pkid` varchar(50) NOT NULL DEFAULT '',
`duration` int(11) DEFAULT NULL,
`destDeviceName` varchar(50) DEFAULT NULL,
PRIMARY KEY (`pkid`),
KEY `dateTimeOrigination` (`dateTimeOrigination`),
KEY `callingPartyNumber` (`callingPartyNumber`),
KEY `originalCalledPartyNumber` (`originalCalledPartyNumber`),
KEY `finalCalledPartyNumber` (`finalCalledPartyNumber`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
So, typically, a query will take a value and search callingPartyNumber, originalCalledPartyNumber, and finalCalledPartyNumber to find any entries related to it in the table. So, it wouldn't make any sense to use individual indexes like I have illustrated above because I typically don't run queries like this. However, I have another job in the evenings that is basically;
select * from cdr_records;
In this case, it sounds like it would be a good idea to have another composite index with all columns in it. Am I understanding this correctly?
The benefit of the composite index comes when you need to select/sort/group based on multiple columns, in the same fashion.
I remember there was a very good example with a phone book analogy I read somewhere. As the names in a phone book are ordered alphabetically it is very easy for you to sort through them and find the one you need based on the letters of the name from left to right. You can imagine that is a composite index of the letters in the names.
If the names were ordered only by the first letter and subsequent letters were chaotic (single column index) you would have to go through all records after you find the first letter, which will take a lot of time.
With a composite index, you can start from left to right and very easily find the record you are looking for. This is also the reason why you can't use, for example, only the second or third column of the composite index: you need the preceding ones in order for it to work. Imagine trying to find all names whose third letter is "a" in the phone book; it would be a nightmare, and you would need a separate index just for that, which is exactly what you have to do if you want to use a column from a composite index without using the columns before it.
Bear in mind that the phone book example assumes that each letter of the names is a separate column, that could be a little confusing.
The other great strength of composite indexes is unique composite indexes, which allow you to apply stricter logical restrictions on your data, which is very handy when you need it. This has nothing to do with performance, but I thought it was worth mentioning.
In your question, your SQL has no criteria, so no index will be used. It is always recommended to use EXPLAIN to see what is going on; you can never be sure!
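As an illustration of that advice, a hedged example of running EXPLAIN against one of the single-column lookups the question describes; the phone number is invented:

EXPLAIN SELECT * FROM cdr_records WHERE callingPartyNumber = '5551234567';
-- with the existing KEY `callingPartyNumber`, this should show a ref lookup on that index;
-- the nightly `select * from cdr_records` has no WHERE clause, so it will always be a full
-- table scan and no index (composite or otherwise) will help it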
No, it's not a good idea to set a composite index over all fields.
Which fields you put into one or more indexes depends on your queries.
Note:
MySQL can generally use only one index per table in a query, and it can use a composite index only if the leading (leftmost) fields are used.
You do not have to use all of the fields, but the leading ones must be present.
Example:
If you have an index on the fields name, street, number, this index will be used when you query (in WHERE)
name, or
name and street, or
name, street and number,
but not if you search only for
street, or
street and number.
To find out whether your index works well with your query, put EXPLAIN before the query; it will show which indexes your query uses (a sketch of this leftmost-prefix rule follows below).
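A hedged sketch of that leftmost-prefix rule on an invented addresses table (the table, column and index names are made up for illustration; they are not from the question):

CREATE INDEX idx_name_street_number ON addresses (name, street, number);

-- can use the composite index (leading column present):
EXPLAIN SELECT * FROM addresses WHERE name = 'Smith' AND street = 'Main St';

-- cannot use the composite index (leading column missing):
EXPLAIN SELECT * FROM addresses WHERE street = 'Main St' AND number = 42;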

Is that a good practice to put several longtext into a same table mysql?

I am creating a MySQL table which contains several LONGTEXT columns. I am expecting a lot of users to enter a lot of text. Should I split them into individual tables or put them together in one table? I am concerned about speed: will this layout affect the speed when I query the results, and what about if I want to transfer the data in the future? I am using InnoDB, or should I use MyISAM?
CREATE TABLE MyGuests (
id INT(6) UNSIGNED AUTO_INCREMENT PRIMARY KEY,
diet longtext NOT NULL,
run longtext NOT NULL,
faith longtext,
apple longtext
);
The main concern over speed you'd have with this database layout is if your query is a SELECT *, while the page only uses one of the fields. (Which is a very common performance degrader.) Also, if you intend to display multiple texts per page in a listing of available texts etc., you'd probably want to have a separate description column (that has a truncated version of the complete text if nothing else), and only fetch those instead of fetching the full text only to then truncate it in PHP.
If you intend to provide search functionality, you should definitely use fulltext indexes to keep your performance in the clear. If your MySQL version is 5.6.4 or later, you can use both InnoDB and MyISAM for full text search. Otherwise, only MyISAM provides that in earlier versions.
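If you do go the full-text route, a hedged sketch of what that looks like on the question's MyGuests table (assuming MySQL 5.6.4+ for the InnoDB variant, per the note above; the index name and search term are invented):

ALTER TABLE MyGuests ADD FULLTEXT INDEX ft_diet (diet);

SELECT id FROM MyGuests
WHERE MATCH(diet) AGAINST('vegetarian' IN NATURAL LANGUAGE MODE);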
You also have a third choice, between the all-in-one table and separate-tables-for-each, which might be the way to go, presuming you may end up adding more text types in the future. That is:
Have a second table with a reference to the ID of the first table, a column (ENUM would be most efficient, but really a marginal concern as long as you index it) indicating the type of text (diet, run, etc.), and a single longtext column that contains the text.
Then you can effortlessly add more text types in the future without the hassle of more dramatic edits to your table layouts (or code), and it will also be simple to fetch only texts of a particular type. An indexed join that combines the main entry-table (which might also hold some relevant metadata like author id, entry date, etc.) and the texts shouldn't be a performance concern.
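A hedged sketch of that second-table layout; the table, column, and index names are invented for illustration:

CREATE TABLE MyGuestTexts (
id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
guest_id INT(6) UNSIGNED NOT NULL, -- references MyGuests.id
text_type ENUM('diet','run','faith','apple') NOT NULL,
body LONGTEXT NOT NULL,
KEY idx_guest_type (guest_id, text_type)
);

-- fetch only one type of text for a given guest:
SELECT t.body
FROM MyGuests g
JOIN MyGuestTexts t ON t.guest_id = g.id
WHERE g.id = 42 AND t.text_type = 'diet';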

Partitioning a mySQL table

I am considering partitioning a MySQL table that has the potential to grow very big. The table as it stands goes like this:
DROP TABLE IF EXISTS `uidlist`;
CREATE TABLE IF NOT EXISTS `uidlist` (
`uid` varchar(9) CHARACTER SET ascii COLLATE ascii_bin NOT NULL,
`chcs` varchar(16) NOT NULL DEFAULT '',
UNIQUE KEY `uid` (`uid`)
) ENGINE=InnoDB DEFAULT CHARSET=ascii;
where
uid is a 9-character id string starting with a lowercase letter
chcs is a checksum that is used internally.
I suspect that the best way to partition this table would be based on the first letter of the uid field. This would give
Partition 1
abcd1234,acbd1234,adbc1234...
Partition 2
bacd1234,bcad1234,bdac1234...
However, I have never done partitioning before, so I have no idea how to go about it. Is the partitioning scheme I have outlined possible? If so, how do I go about implementing it?
I would much appreciate any help with this.
Check out the manual for a start :)
http://dev.mysql.com/tech-resources/articles/partitioning.html
MySQL is pretty feature-rich when it comes to partitioning, and choosing the correct strategy depends on your use case (can partitioning help your sequential scans?) and on the way your data grows, since you don't want any single partition to become too large to handle.
If your data tends to grow somewhat steadily over time, you might want to use a creation-date based partitioning scheme so that (for example) all records generated in a single year end up in the last partition and previous partitions are never written to; for this to happen you may have to introduce another column to regulate it (a sketch follows below), see http://dev.mysql.com/doc/refman/5.1/en/partitioning-hash.html.
An added optimization benefit of this approach is that you can keep the most recent partition on a disk with fast writes (a solid-state drive, for example) and the older partitions on a cheaper disk with decent read speed.
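A hedged sketch of such a creation-date scheme; the created_at column and the partition layout are invented to illustrate the idea, not taken from the question. Note that MySQL requires the partitioning column to be part of every unique key, which is why created_at is added to the key and the plain UNIQUE on uid is lost:

CREATE TABLE uidlist_by_year (
uid varchar(9) CHARACTER SET ascii COLLATE ascii_bin NOT NULL,
chcs varchar(16) NOT NULL DEFAULT '',
created_at date NOT NULL,
UNIQUE KEY uid_created (uid, created_at)
) ENGINE=InnoDB DEFAULT CHARSET=ascii
PARTITION BY RANGE (YEAR(created_at))
(
PARTITION p2013 VALUES LESS THAN (2014),
PARTITION p2014 VALUES LESS THAN (2015),
PARTITION pmax VALUES LESS THAN MAXVALUE
);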
Anyway, knowing more about your use case would help people give you more concrete answers (possibly including SQL code).
EDIT, also, check out http://www.tokutek.com/products/tokudb-for-mysql/
The main question you need to ask yourself before partitioning is "why". What is the goal you are trying to achieve by partitioning the table?
Since all of the table's data will still exist on a single MySQL server and, I assume, new rows will be arriving in "random" order (random with respect to which partition they'll be inserted into), you won't gain much by partitioning. Your point-select queries might be slightly faster, but likely not by much.
The main benefit I've seen using MySQL partitioning is for data that needs to be purged according to a set retention policy. Partitioning data by week or month makes it very easy to delete old data quickly.
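For example, with a date-based RANGE scheme like the sketch above, purging a whole period is a single metadata operation instead of a long DELETE (the table and partition names are the illustrative ones from that sketch):

ALTER TABLE uidlist_by_year DROP PARTITION p2013;
-- drops all rows in that partition almost instantly, with no row-by-row deletes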
It sounds more likely to me that you want to be sharding your data (spreading it across many servers), and since your data design as shown is really just key-value then I'd recommend looking at database solutions that include sharding as a feature.
I have upvoted both of the answers here since they both make useful points. @bbozo - a move to TokuDB is planned, but there are constraints that stop it from being made right now.
I am going off the idea of partitioning the uidlist table as I had originally wanted to do. However, for the benefit of anyone who finds this thread whilst trying to do something similar, here is the "how to":
DROP TABLE IF EXISTS `uidlist`;
CREATE TABLE IF NOT EXISTS `uidlist` (
`uid` varchar(9) CHARACTER SET ascii COLLATE ascii_bin NOT NULL ,
`chcs` varchar(16) NOT NULL DEFAULT '',
UNIQUE KEY `uid` (`uid`)
) ENGINE=InnoDB DEFAULT CHARSET=ascii
PARTITION BY RANGE COLUMNS(uid)
(
PARTITION p0 VALUES LESS THAN('f%'),
PARTITION p1 VALUES LESS THAN('k%'),
PARTITION p2 VALUES LESS THAN('p%'),
PARTITION p3 VALUES LESS THAN('u%')
);
which creates four partitions.
I suspect that the long-term solution here is to use a key-value store as suggested by @tmcallaghan rather than just stuffing everything into a MySQL table. I will probably post back in due course once I have established the right way to accomplish that.

MySQL query very slow because of BLOB field (that can't be moved in another table)

I am developing PyQt software based on a MySQL database. The database contains some recorded electrical signals and all the information describing these signals (sampling rate, date of recording, etc.).
To give an idea, one database contains between 10,000 and 100,000 rows, and the total size is >10 GB. All these data are stored on a dedicated server. In fact, most of the data is the signal itself, which is in a BLOB field called analogsignal.signal (see below).
here is the architecture of the database : http://packages.python.org/OpenElectrophy/_images/simple_diagram1.png
I can't change it (I can add columns and indexes, but I can not move or delete existing columns).
In the software, I need to list all the analogsignal columns (id, name, channel, t_start, sampling_rate) except analogsignal.signal, which is fetched later via analogsignal.id. So I'm doing the following query:
SELECT block.id, block.datetime, segment.id, analogsignal.id, analogsignal.name, analogsignal.channel, analogsignal.sampling_rate, block.fileOrigin, block.info
FROM segment, block, analogsignal
WHERE block.id=segment.id_block
AND segment.id=analogsignal.id_segment
ORDER BY analogsignal.id
The problem is, my queries are very slow (> 10 min if the request is not in cache) because of the presence of the analogsignal.signal column. If I understand correctly what's happening, the table is read row by row, including analogsignal.signal, even though analogsignal.signal is not in the SELECT list.
Does anyone have an idea how to optimize the database or the query without moving the BLOB to another table (which I agree would be more logical, but I do not control this point)?
Thank you!
Here's the CREATE TABLE command for the AnalogSignal table (pulled/formatted from comment)
CREATE TABLE analogsignal
( id int(11) NOT NULL AUTO_INCREMENT,
id_segment int(11) DEFAULT NULL,
id_recordingpoint int(11) DEFAULT NULL,
name text,
channel int(11) DEFAULT NULL,
t_start float DEFAULT NULL,
sampling_rate float DEFAULT NULL,
signal_shape varchar(128) DEFAULT NULL,
signal_dtype varchar(128) DEFAULT NULL,
signal_blob longblob,
Tag text,
PRIMARY KEY (id),
KEY ix_analogsignal_id_recordingpoint (id_recordingpoint),
KEY ix_analogsignal_id_segment (id_segment)
) ENGINE=MyISAM AUTO_INCREMENT=34798 DEFAULT CHARSET=latin1 ;
EDIT: Problem solved. Here are the key points:
- I had to add a multiple-column index (type INDEX) on all the SELECT fields in the analogsignal table.
- The columns of TEXT type blocked the use of the index. I converted these TEXT fields to VARCHAR(xx). For this I used this simple command:
SELECT MAX(LENGTH(field_to_query)) FROM table_to_query
to check the longest existing value before conversion, to be sure that I would not lose any data, then
ALTER TABLE table_to_query CHANGE field_to_query field_to_query VARCHAR(24)
I first used VARCHAR(8000), but with that setting VARCHAR behaved like a TEXT field and indexing didn't work. There was no such problem with VARCHAR(24). If I'm right, the total key length (all indexed fields included) must not exceed 1000 bytes.
Then I indexed all the columns as said above, with no size parameter in the index.
Finally, using a better query structure (thank you DRapp) also improved the query.
I went from 215 s to 0.016 s for the query, with no cache...
In addition to trying to shrink your "blob" column requirements by putting the data in an external physical file and just storing the path/file name in the corresponding record, I would try the following as an alternative...
I would reverse the query and put your AnalogSignal table first, as it is the basis of the ORDER BY clause, and work the query backwards to the blocks. Also, to avoid having to read every literal row of data, if you build a compound index on all the columns you want in your output, it makes a larger index, but the query can then pull the values directly from the index instead of reading back to the actual rows of data.
create index KeyDataOnly on AnalogSignal ( id, id_segment, name, channel, sampling_rate )
SELECT STRAIGHT_JOIN
block.id,
block.datetime,
segment.id,
analogsignal.id,
analogsignal.name,
analogsignal.channel,
analogsignal.sampling_rate,
block.fileOrigin,
block.info
FROM
analogsignal
JOIN Segment
on analogsignal.id_segment = segment.id
JOIN block
on segment.id_block = block.id
ORDER BY
analogsignal.id
If you cannot delete the BLOB column, do you have to fill it? You could add a column for storing the path/filename of your signal and then put all your signal files in the appropriate directory(ies). Once that's done, set your BLOB field values to NULL.
It's probably breaking the spirit of the restrictions you're under. But arbitrary restrictions often need to be circumvented.
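A hedged sketch of that idea in SQL; the signal_path column name is invented, and the actual export of each blob to a file would be done by the application:

ALTER TABLE analogsignal ADD COLUMN signal_path VARCHAR(255) DEFAULT NULL;
-- once the application has written each signal to a file and stored its path:
UPDATE analogsignal SET signal_blob = NULL WHERE signal_path IS NOT NULL;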
So, according to the comments, I'm sure your problem is caused by the MyISAM storage engine and its behavior when storing the data. toxicate20 is right: MySQL has to skip over those big BLOBs anyway, which is not efficient. You can change the storage engine to InnoDB, which will help a lot with this problem; it will only read the BLOB data if you explicitly ask for it in the SELECT ... part.
ALTER TABLE analogsignal ENGINE=InnoDB;
This will take a while but helps a lot with performance. You can read more about InnoDB file formats here:
http://dev.mysql.com/doc/innodb/1.1/en/innodb-row-format-antelope.html
http://dev.mysql.com/doc/innodb/1.1/en/innodb-row-format-dynamic.html
Disclaimer: If you use fulltext search (MATCH ... AGAINST http://dev.mysql.com/doc/refman/5.5/en//fulltext-search.html) on any of the columns in the table you cannot change it to InnoDB.
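A hedged footnote on row formats: on MySQL 5.5, fully off-page BLOB storage (the Barracuda DYNAMIC format described in the two links above) also needs the settings below; on 5.7 and later, DYNAMIC is the default and these extra steps are unnecessary:

SET GLOBAL innodb_file_per_table = 1;
SET GLOBAL innodb_file_format = 'Barracuda';
ALTER TABLE analogsignal ENGINE=InnoDB ROW_FORMAT=DYNAMIC;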
As the analog signal column is pretty large, the query takes a long time because it has to skip over (or jump over, if you see it metaphorically) those values when doing a select query. What I would do is the following: instead of having a blob in the database, generate binary files via
$fh = fopen("analogfile.spec", 'w') or die("can't open file");
$data = $yourAnalogDataFromSomewhere;
fwrite($fh, $data);
fclose($fh);
The filename would be given by the ID of the row, for example. Instead of the blob, you would just store the file path within your server's directory structure.
This way your query will run very fast as it does not have to skip the big chunks of data in the blob column.