MySQL - WAMP - Huge table is very slow (20 million rows)

So I posted a question yesterday and got a perfect answer, which required running this code first: ALTER TABLE mytable AUTO_INCREMENT=10000001;
I ran it several times, but restarted WAMP after a couple of hours of it not finishing. After running overnight (12 hours), the statement still hadn't completed.
I am wondering if my database table size is past the limits of mysql or my computer or both.
However, I have a sneaky suspicion that proper indexing or some other factor could greatly impact my performance. I know 20 million is a lot of rows, but is it too much?
I don't know much about indexes, except that they are important. I attempted to add them to the name and state fields, which I believe I did successfully.
Incidentally, I am trying to add a unique ID field, which is what my post yesterday was all about.
So, the question is: Is 20 million rows outside the scope of MySQL? If not, am I missing an index or some other setting that would help MySQL work better with these 20 million rows? Can I put indexes on all the columns and make it super fast?
As always, thanks in advance...
Here are the specs:
My PC runs Windows XP with WAMPSERVER, Win32 NTFS, Intel Core 2 Duo T9300 @ 2.50GHz (currently running at 1.17 GHz), 1.98 GB of RAM
DB: 1 table, 20 million rows
The size of the table is:
Data 4.4 Gigs, Indexes 1.3 Gigs, Total 5.8 Gigs
The indexes are set up on the 'BUSINESS NAME' and 'STATE' fields
The table fields are like this:
`BUSINESS NAME` TEXT NOT NULL,
`ADDRESS` TEXT NOT NULL,
`CITY` TEXT NOT NULL,
`STATE` TEXT NOT NULL,
`ZIP CODE` TEXT NOT NULL,
`COUNTY` TEXT NOT NULL,
`WEB ADDRESS` TEXT NOT NULL,
`PHONE NUMBER` TEXT NOT NULL,
`FAX NUMBER` TEXT NOT NULL,
`CONTACT NAME` TEXT NOT NULL,
`TITLE` TEXT NOT NULL,
`GENDER` TEXT NOT NULL,
`EMPLOYEE` TEXT NOT NULL,
`SALES` TEXT NOT NULL,
`MAJOR DIVISION DESCRIPTION` TEXT NOT NULL,
`SIC 2 CODE DESCRIPTION` TEXT NOT NULL,
`SIC 4 CODE` TEXT NOT NULL,
`SIC 4 CODE DESCRIPTION` TEXT NOT NULL

Some answers:
20 million rows is well within the capability of MySQL. I work on a database that has over 500 million rows in one of its tables. It can take hours to restructure a table, but ordinary queries aren't a problem as long as they're assisted by an index.
Your laptop is pretty out of date and underpowered for use as a high-scale database server. It's going to take a long time to do a table restructure. The low amount of memory and the typically slow laptop disk are probably constraining you. You're probably also using default settings for MySQL, which are designed to work on very old computers.
I wouldn't recommend using TEXT data type for every column. There's no reason you need TEXT for most of those columns.
Don't create an index on every column, especially if you insist on using TEXT data types. You can't even index a TEXT column unless you define a prefix index. In general, choose indexes to support specific queries.
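For example, a prefix index is the only way to index one of the question's TEXT columns; a minimal sketch (the table and index names here are placeholders):
-- Index only the first 50 characters of the TEXT column; pick a prefix
-- long enough to be selective for your data.
ALTER TABLE mytable ADD INDEX idx_business_name (`BUSINESS NAME`(50));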
You probably have many other questions based on the above, but there's too much to cover in a single StackOverflow post. You might want to take training or read a book if you're going to work with databases.
I recommend High Performance MySQL, 2nd Edition.
Re your followup questions:
For MySQL tuning, here's a good place to start: http://www.mysqlperformanceblog.com/2006/09/29/what-to-tune-in-mysql-server-after-installation/
Many ALTER TABLE operations cause a table restructure, which basically means locking the table, making a copy of the whole table with the changes applied, then swapping in the new table and dropping the old one. If the table is very large, this can take a long time.
A TEXT data type can store up to 64KB, which is overkill for a phone number or a state. I would use CHAR(10) for a typical US phone number. I would use CHAR(2) for a US state. In general, use the most compact and thrifty data type that supports the range of data you need in a given column.
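As a rough sketch of what that could look like against the question's table (this assumes US-only data, and on a 20-million-row table each of these changes is itself a full table rebuild):
ALTER TABLE mytable
  MODIFY `STATE` CHAR(2) NOT NULL,          -- two-letter US state code; if an existing prefix index on STATE is longer than 2, drop and re-create it around this change
  MODIFY `ZIP CODE` CHAR(5) NOT NULL,       -- CHAR keeps leading zeros
  MODIFY `PHONE NUMBER` CHAR(10) NOT NULL;  -- digits only, no formatting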

It's going to take a long time because you've only got 2GB of RAM and 6GB of data/indexes, so it's going to force a ton of swapping between RAM and disk. There's not much you can do about that, though.
You could try running this in batches.
Create a separate empty table with the auto_increment column included in it. Then insert your records a certain amount at a time (say, 1 state at a time). That might help it go faster since you should be able to handle those smaller datasets completely in memory instead of paging to disk.
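A rough sketch of that batch approach, assuming the original table is called mytable (the names and the state value are placeholders):
-- New table with the same columns plus an auto-increment id up front.
CREATE TABLE mytable_new LIKE mytable;
ALTER TABLE mytable_new
  ADD COLUMN id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;
-- Copy one state's worth of rows at a time; NULL lets the id auto-fill.
INSERT INTO mytable_new
  SELECT NULL, t.* FROM mytable AS t WHERE t.`STATE` = 'AK';
-- Repeat for each remaining state, then swap the tables:
RENAME TABLE mytable TO mytable_old, mytable_new TO mytable;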
You'll probably get a lot better responses for this if it's on dba.stackexchange.com also.

I believe the hardware is fine, but you need to be far more economical with your resources.
Db structure optimization!
Do not use TEXT!
For phone numbers use BIGINT UNSIGNED. Any signs or alpha characters must be parsed out and converted.
For any other alphanumeric column use e.g. VARCHAR(32) to VARCHAR(256).
ZIP code is of course MEDIUMINT UNSIGNED.
Gender should be enum('Male','Female')
Sales could be an int unsigned
State should be enum('Alaska',...)
Country should be enum('Albania',...)
When building a large index, the fastest way is to create a new table and do INSERT INTO ... SELECT FROM ... rather than ALTER TABLE ....
Changing the State and Country fields to ENUM will drastically reduce your index size.
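A condensed sketch of the kind of restructured table these suggestions point to (column names shortened, the enum list abbreviated, and sizes guessed; adjust to your real data):
CREATE TABLE business_new (
  id            INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  business_name VARCHAR(128) NOT NULL,
  state         ENUM('Alabama','Alaska','Arizona') NOT NULL,  -- list all 50 states in practice
  zip_code      MEDIUMINT UNSIGNED NOT NULL,
  phone_number  BIGINT UNSIGNED NOT NULL,
  gender        ENUM('Male','Female') NOT NULL,
  sales         INT UNSIGNED NOT NULL,
  KEY idx_name  (business_name),
  KEY idx_state (state)
) ENGINE=InnoDB;
-- Populate it with INSERT INTO ... SELECT instead of ALTERing the old table in place.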

Related

Is that a good practice to put several longtext into a same table mysql?

I am creating a MySQL table which contains several longtext columns. I am expecting a lot of users to enter a lot of text. Should I split the columns into separate tables, or keep them together in one table? I am concerned about speed: will this affect query speed, and what about transferring the data in the future? I am using InnoDB; or should I use MyISAM?
CREATE TABLE MyGuests (
id INT(6) UNSIGNED AUTO_INCREMENT PRIMARY KEY,
diet longtext NOT NULL,
run longtext NOT NULL,
faith longtext,
apple longtext
);
The main concern over speed you'd have with this database layout is if your query is a SELECT *, while the page only uses one of the fields. (Which is a very common performance degrader.) Also, if you intend to display multiple texts per page in a listing of available texts etc., you'd probably want to have a separate description column (that has a truncated version of the complete text if nothing else), and only fetch those instead of fetching the full text only to then truncate it in PHP.
If you intend to provide search functionality, you should definitely use fulltext indexes to keep your performance in the clear. If your MySQL version is 5.6.4 or later, you can use both InnoDB and MyISAM for full text search. Otherwise, only MyISAM provides that in earlier versions.
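For example (assuming MySQL 5.6.4 or later, so InnoDB supports it), a full-text index on one of those columns and a matching query might look like:
ALTER TABLE MyGuests ADD FULLTEXT INDEX ft_diet (diet);
SELECT id FROM MyGuests
WHERE MATCH(diet) AGAINST('vegetarian' IN NATURAL LANGUAGE MODE);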
You also have a third choice between an all-in-one table and separate-tables-for-each, which might be the way of choice, presuming you may end up adding more text types in the future. That is:
Have a second table with a reference to the ID of the first table, a column (ENUM would be most efficient, but really a marginal concern as long as you index it) indicating the type of text (diet, run, etc.), and a single longtext column that contains the text.
Then you can effortlessly add more text types in the future without the hassle of more dramatic edits to your table layouts (or code), and it will also be simple to fetch only texts of a particular type. An indexed join that combines the main entry-table (which might also hold some relevant metadata like author id, entry date, etc.) and the texts shouldn't be a performance concern.
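A hypothetical sketch of that layout (all names invented for illustration, assuming both tables are InnoDB):
CREATE TABLE guest_texts (
  guest_id  INT(6) UNSIGNED NOT NULL,
  text_type ENUM('diet','run','faith','apple') NOT NULL,
  body      LONGTEXT NOT NULL,
  PRIMARY KEY (guest_id, text_type),
  CONSTRAINT fk_guest_texts_guest FOREIGN KEY (guest_id) REFERENCES MyGuests (id)
) ENGINE=InnoDB;
-- Fetch only one kind of text for a guest:
SELECT body FROM guest_texts WHERE guest_id = 42 AND text_type = 'diet';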

Optimal way to store BLOBs larger than max_allowed_packet in MySQL InnoDB

Maybe this question should be asked on https://dba.stackexchange.com/ instead, I'm not sure. Please advise in comments or move it there.
For this project I'm using MySQL 5.6.19 hosted at Amazon RDS.
Summary
I'm going to store photos in the database in a BLOB column in an InnoDB table, and I would like to know the optimal way to do it. I'm looking for official documentation or some method that would allow me to compare different variants.
When searching for this topic there are a lot of discussions and questions about whether it is better to store binary files in the database BLOB or in the file system with the database having only file paths and names. Such discussion is beyond the scope of this question. For this project I need consistency and referential integrity, so files are going to be stored in BLOB, the question is in details of how exactly to do it.
Database schema
Here is the relevant part of the schema (so far). There is a table Contracts with some general information about each contract and primary ID key.
For each Contract there can be several (~10) photos taken, so I have a table ContractPhotos:
CREATE TABLE `ContractPhotos` (
`ID` int(11) NOT NULL,
`ContractID` int(11) NOT NULL,
`PhotoDateTime` datetime NOT NULL,
PRIMARY KEY (`ID`),
KEY `IX_ContractID` (`ContractID`),
CONSTRAINT `FK_ContractPhotos_Contracts` FOREIGN KEY (`ContractID`) REFERENCES `Contracts` (`ID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
For each photo I will store original full resolution image plus few scaled down versions, so I have a table ContractPhotoVersions:
CREATE TABLE `ContractPhotoVersions` (
`ID` int(11) NOT NULL,
`ContractPhotoID` int(11) NOT NULL,
`PhotoVersionTypeID` int(11) NOT NULL,
`PhotoWidth` int(11) NOT NULL,
`PhotoHeight` int(11) NOT NULL,
`FileSize` int(11) NOT NULL,
`FileMD5` char(32) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
PRIMARY KEY (`ID`),
KEY `IX_ContractPhotoID` (`ContractPhotoID`),
CONSTRAINT `FK_ContractPhotoVersions_ContractPhotos` FOREIGN KEY (`ContractPhotoID`) REFERENCES `ContractPhotos` (`ID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
Finally, there is a table that holds the actual binary data of all images. I know that MySQL allows storing up to 4GB in a LONGBLOB column, but during my search I came across another MySQL limitation: max_allowed_packet. On my instance of MySQL this variable is 4MB. My understanding of this variable after reading the docs is that, effectively, a single row can't exceed 4MB. It is pretty normal to have a photo that is more than 4MB, so in order to be able to INSERT and SELECT such files I intend to split each file into small chunks:
CREATE TABLE `PhotoChunks` (
`ID` int(11) NOT NULL,
`ContractPhotoVersionID` int(11) NOT NULL,
`ChunkNumber` int(11) NOT NULL,
`ChunkSize` int(11) NOT NULL,
`ChunkData` blob NOT NULL,
PRIMARY KEY (`ID`),
UNIQUE KEY `IX_ContractPhotoVersionID_ChunkNumber` (`ContractPhotoVersionID`,`ChunkNumber`),
CONSTRAINT `FK_PhotoChunks_ContractPhotoVersions` FOREIGN KEY (`ContractPhotoVersionID`) REFERENCES `ContractPhotoVersions` (`ID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
Besides, I will be able to upload large photos into the database a few chunks at a time and resume the upload when the connection drops.
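For illustration only (the IDs and sizes below are made-up placeholders), inserting one chunk and reading a photo back in order might look like:
-- Each chunk is inserted separately, so no single statement exceeds max_allowed_packet.
INSERT INTO PhotoChunks (ID, ContractPhotoVersionID, ChunkNumber, ChunkSize, ChunkData)
VALUES (1, 10, 0, 4, 0xFFD8FFE0);
-- Read the chunks back in order and concatenate them on the client side.
SELECT ChunkData FROM PhotoChunks
WHERE ContractPhotoVersionID = 10
ORDER BY ChunkNumber;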
Data volume
The estimated volume of data is 40,000 full-resolution photos at ~5MB each => 200GB. The scaled-down versions will most likely be 800x600 at ~120KB each => an extra ~5GB. Images will not be UPDATEd. They will eventually be deleted, after several years.
Question
There are many ways to split a file into smaller chunks: you can split it into 4KB, 8KB, 64KB, etc. What would be the optimal chunk size when using the InnoDB storage engine, optimizing for minimal wasted space first and overall performance second?
I found these docs: http://dev.mysql.com/doc/refman/5.6/en/innodb-file-space.html, but there is not much detail about BLOB. It says that page size is 16KB.
The maximum row length, except for variable-length columns (VARBINARY,
VARCHAR, BLOB and TEXT), is slightly less than half of a database
page. That is, the maximum row length is about 8000 bytes.
I really expected the official documentation to be more precise than "about 8000 bytes". The following paragraph is the most interesting:
If a row is less than half a page long, all of it is stored locally
within the page. If it exceeds half a page, variable-length columns
are chosen for external off-page storage until the row fits within
half a page. For a column chosen for off-page storage, InnoDB stores
the first 768 bytes locally in the row, and the rest externally into
overflow pages. Each such column has its own list of overflow pages.
The 768-byte prefix is accompanied by a 20-byte value that stores the
true length of the column and points into the overflow list where the
rest of the value is stored.
Considering the above there can be at least these strategies:
choose a chunk size such that each chunk is stored locally within the page, without involving off-page storage.
choose a chunk size such that the whole BLOB is stored off-page.
I don't like the idea of storing a BLOB partially within the page and partially off-page. But, hey, maybe I'm wrong.
I also came across this doc https://dev.mysql.com/doc/refman/5.6/en/innodb-row-format-dynamic.html and at this point I realised that I want to ask this question. It is too overwhelming for me now and I hope that there is somebody who has had a practical experience with this topic.
I don't want to end up wasting half of the disk space by inadvertently choosing a poor chunk size and row format. My concern is that if I choose to store 8000 bytes per chunk, plus 16 bytes for the 4 INT columns in the same row of the PhotoChunks table, it would exceed that magic half of the page size, and I'd end up spending 16KB per row for only 8000 bytes of data.
Is there a way to check how much space is actually wasted in this way? In the Amazon RDS environment I'm afraid there is no way to have a look at the actual files that the InnoDB table consists of. Otherwise, I would simply try different variants and see the final file size.
So far I can see that there are two parameters: the row format and chunk size. Maybe there are other things to consider.
Edit
Why I don't consider changing the max_allowed_packet variable. From the doc:
Both the client and the server have their own max_allowed_packet
variable, so if you want to handle big packets, you must increase this
variable both in the client and in the server.
I use the MySQL C API to work with this database, and the same C++ application talks to 200 other MySQL servers (completely unrelated to this project) using the same libmysql.dll. Some of these servers are still MySQL 3.23, so my app has to work with all of them. Frankly speaking, I didn't look into the docs on how to change the max_allowed_packet variable on the client side of the MySQL C API.
Edit 2
@akostadinov pointed out that there is mysql_stmt_send_long_data() to send BLOB data to the server in chunks, and people have said that they managed to INSERT BLOBs larger than max_allowed_packet. Still, even if I manage to INSERT, say, a 20MB BLOB with max_allowed_packet=4MB, how do I SELECT it back? I don't see how I can do it.
I would appreciate it if you pointed me to the right direction.
I stand by my answer on forums.mysql.com from 2 years ago. Some further notes:
16M is likely to work for max_allowed_packet; however, I have no evidence that it works beyond that.
In an application I worked on several years ago, it seemed that a chunk size of about 50KB was 'optimal'.
max_allowed_packet can be set in /etc/my.cnf, but if you don't have access to that, you are stuck with its value. You can get it in any(?) version by doing SHOW VARIABLES LIKE 'max_allowed_packet'. (I'm reasonably sure back to 4.0, but not sure about 3.23.) So that could be an upper limit on your chunk size.
InnoDB will split big BLOB/TEXT fields into 16KB blocks. Probably each block has some overhead, so you don't get exactly 16KB.
Antelope versus Barracuda, and other settings control whether 767 bytes of the BLOB is stored in the record. If none is stored there, there is a 20-byte pointer to the off-block storage.
Today, 16MB may seem like a reasonable limit for picture size; tomorrow it won't.
If you are running a new enough version of MySQL, innodb_page_size can be raised from 16K to 32K or 64K. (And the ~8000 goes up to ~16000, but not ~32000.)
If replication is involved, chunking becomes more important. But there can be some extra tricky business with the 'sequence number' for the chunks. (Ask me if you need to go this direction.)
Adding the above comments together, I suggest a chunk size of MIN(64700, max_allowed_packet) bytes as a reasonable compromise, even if you can't control innodb_file_format. Only 1-2% of disk space will be wasted inside this "photos" table (assuming pictures of, say, about 1MB).
Compression is useless; JPGs are already compressed.
Most of the time is in I/O; second most is in network chatter between client and server. The point here is... C vs PHP will not make much difference when it comes to performance.
The ~8000 bytes per record is irrelevant in this discussion. That applies to a table with lots of columns -- they can't add up to more than ~8K. Most of the BLOB will go off-page, leaving only 60-800 bytes per row, hence 15-200 rows per 16KB block (average, after other types of overhead).
PARTITION is unlikely to be of any use.
Is "chunking a premature optimization"? It is not an "optimization" if you are hitting a brick wall because of max_allowed_packet.
One approach to try is using long send as described here:
Is there any way to insert a large value in a mysql DB without changing max_allowed_packet?
Another approach, as you suggest, is to split data into chunks. See one possible approach in this thread:
http://forums.mysql.com/read.php?20,601656,601656
Another is, given that you set some maximum image size in your user interface, to increase the packet size accordingly. Do you allow images larger than 16MB?
If you ask me, I'd avoid implementing chunking as it looks more like a premature optimization instead of letting DB do its own optimizations.

Optimizing MySQL Table Structure and impact of row size

One of my database tables has grown quite large, to the point where I think it is impacting the performance on my site (it is definitely making backups a lot slower).
It has ~13,000,000 rows and is 4.2 GiB in size, of which 1.2 GiB is data.
The structure looks like this:
CREATE TABLE IF NOT EXISTS `t1` (
`id` int(10) unsigned NOT NULL,
`int2` int(10) unsigned NOT NULL,
`int3` int(10) unsigned NOT NULL,
`int4` int(10) unsigned NOT NULL,
`char1` varchar(255) NOT NULL,
`int5` int(10) NOT NULL,
`char2` varchar(1024) DEFAULT NULL,
`char3` varchar(1024) NOT NULL,
PRIMARY KEY (`id`,`int2`,`int3`,`int4`),
KEY `key1` (`id`,`int2`,`char1`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Common operations in this table are insert and selects, rows are never updated and rarely deleted. int2 is a running version number, which means usually only the rows with the highest value of int2 for that id are selected.
I have been thinking of several ways of optimizing this, and I was wondering which one would be the best to pursue:
char1 (which is in the index) actually only contains about 40,000 different strings. I could move the strings into a second table (idchar -> char) and then just save the id in my main table, at the cost of an additional id lookup step during inserts and selects.
char2 and char3 are often empty. I could move them to a separate table that I would then do a LEFT JOIN on in selects.
Even if char2 and char3 contain data they are usually shorter than 1024 chars. I could probably shorten these to ~200.
Which one of these do you think is the most promising? Does decreasing the row size (either by making char1 into an integer or by removing/resizing columns) in MySQL InnoDB tables actually have a big impact on performance?
Thanks
There are several options. From what you say, moving char1 to another table seems quite reasonable. The additional lookup could, under some circumstances, even be faster than storing the raw data in the tables. (This occurs when the repeated values cause the table to be larger than necessary, especially when the larger table might be larger than available memory.) And, this would save space both in the data table and the corresponding index.
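A hypothetical sketch of that lookup-table approach (names invented; ~40,000 distinct strings fit comfortably in a MEDIUMINT key):
CREATE TABLE char1_lookup (
  char1_id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  char1    VARCHAR(255) NOT NULL,
  UNIQUE KEY uq_char1 (char1)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
-- t1 would then store a 3-byte char1_id instead of the string,
-- and key1 would become (id, int2, char1_id); a join recovers the text:
SELECT t.*, cl.char1
FROM t1 AS t
JOIN char1_lookup AS cl ON cl.char1_id = t.char1_id;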
The exact impact on performance is hard to say, without understanding much more about your system and the query load.
Moving char2 and char3 to another table will have minimal impact. The overhead of the link to the other table would eat up any gains in space. You could save a couple of bytes per record by storing them as varchar(255) rather than varchar(1024).
If you have a natural partitioning key, then partitioning is definitely an option, particularly for reducing the time for backups. This is very handy for a transaction-style table, where records are inserted and never or very rarely modified. If, on the other hand, the records contain customer records and any could be modified at any time, then you would still need to back up all the partitions.
There are several factors that could affect the performance of your DB. Partitioning is definitely the best option, but it cannot always be done. If you are searching on char1 before an insert, then partitioning can be a problem because you have to search all the partitions for the key. You must analyze how the data is generated and, most importantly, how you query this table. That is the key, so you should post your queries against this table. In the case of char2 and char3, moving them to another table won't make any difference. You should also mention the physical distribution of your data. Are you using a single data file? Are the data files on the same physical disk as the OS? Give more details so we can give you more help.

MySQL: Which is smaller, storing 2 sets of similar data in 1 table vs 2 tables (+indexes)?

I was asked to optimize (size-wise) statistics system for a certain site and I noticed that they store 2 sets of stat data in a single table. Those sets are product displays on search lists and visits on product pages. Each row has a product id, stat date, stat count and stat flag columns. The flag column indicates if it's a search list display or page visit stat. Stats are stored per day and product id, stat date (actually combined with product ids and stat types) and stat count have indexes.
I was wondering if it's better (size-wise) to store those two sets as separate tables or keep them in a single one. I presume that the parts which would make a difference are the flag column (let's say it's a 1-byte TINYINT) and the indexes. I'm especially interested in how the space taken by the indexes would change in the 2-table scenario. The table in question already has a few million records.
I'll probably do some tests when I have more time, but I was wondering if someone had already challenged a similar problem.
Ordinarily, if two kinds of observations are conformable, it's best to keep them in a single table. By "conformable," I mean that their basic data is the same.
It seems that your observations are indeed conformable.
Why is this?
First, you can add more conformable observations trivially easily. For example, you could add sales to search-list and product-page views, by adding a new value to the flag column.
Second, you can report quite easily on combinations of the kinds of observations. If you separate these things into different tables, you'll be doing UNIONs or JOINs when you want to get them back together.
Third, when indexing is done correctly the access times are basically the same.
Fourth, the difference in disk space usage is small. You need indexes in either case.
Fifth, the difference in disk space cost is trivial. You have several million rows, or in other words, a dozen or so gigabytes. The highest-quality Amazon Web Services storage costs about US$1.00 per year per gigabyte. That's less than what heating your office will cost for the day you spend refactoring this stuff. Let it be.
Finally I got a moment to conduct a test. It was just a small scale test with 12k and 48k records.
The table that stored both types of data had the following structure:
CREATE TABLE IF NOT EXISTS `stat_test` (
`id_off` int(11) NOT NULL,
`stat_date` date NOT NULL,
`stat_count` int(11) NOT NULL,
`stat_type` tinyint(11) NOT NULL,
PRIMARY KEY (`id_off`,`stat_date`,`stat_type`),
KEY `id_off` (`id_off`),
KEY `stat_count` (`stat_count`)
) ENGINE=InnoDB DEFAULT CHARSET=latin2;
The other two tables had this structure:
CREATE TABLE IF NOT EXISTS `stat_test_other` (
`id_off` int(11) NOT NULL,
`stat_date` date NOT NULL,
`stat_count` int(11) NOT NULL,
PRIMARY KEY (`id_off`,`stat_date`),
KEY `id_off` (`id_off`),
KEY `stat_count` (`stat_count`)
) ENGINE=InnoDB DEFAULT CHARSET=latin2;
In the case of 12k records, the 2 separate tables were actually slightly bigger than the one storing everything, but in the case of 48k records, the two tables were smaller, and by a noticeable amount.
In the end I didn't split the data into two tables to solve my initial space problem. I managed to considerably reduce the size of the database by removing the redundant id_off index and adjusting the data types (in most cases unsigned SMALLINT was more than enough to store all the values I needed). Note that originally stat_type was also of type int, and for this column unsigned TINYINT was enough. All in all, this reduced the size of the database from 1.5GB to 600MB (and my limit was just 2GB for the database). Another advantage of this solution was that I didn't have to modify a single line of code to make everything work (since the site was written by someone else, I didn't have to spend hours trying to understand the source code).
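For illustration, the adjustments described above might have looked roughly like this (exact integer sizes depend on the real value ranges):
ALTER TABLE stat_test
  DROP INDEX id_off,                              -- redundant: the primary key already starts with id_off
  MODIFY id_off     SMALLINT UNSIGNED NOT NULL,
  MODIFY stat_count SMALLINT UNSIGNED NOT NULL,
  MODIFY stat_type  TINYINT UNSIGNED NOT NULL;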

mysql innodb inner join with longtext very slow

I migrated all the MySQL tables of one project from MyISAM to InnoDB last week, in order to support transactions. I used ALTER TABLE for this.
Most things work fine; however, one particular query runs very, very slowly, and it always gives the error Incorrect key file for table '/tmp/#sql_xxxx_x.MYI'
Later I narrowed the problem down to the inner join of 2 tables, the user table and the agreement table. The inner join takes place between the foreign key field of user (i.e. agreement_id) and the primary key field of agreement (i.e. id).
The user table has only 50,000 rows of data, and the agreement table has, well, one single row. And we have set up the index for the agreement_id of user.
In any case, this seems to be a very lightweight query, but it turns out to be the whole bottleneck.
Here is the full schema of agreement:
CREATE TABLE IF NOT EXISTS `agreement` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`remark` varchar(200) NOT NULL,
`content` longtext NOT NULL,
`is_active` tinyint(1) NOT NULL,
`date_stamp` datetime NOT NULL,
PRIMARY KEY (`id`)
) DEFAULT CHARSET=utf8 AUTO_INCREMENT=2 ;
One thing I am suspicious of is the longtext field (content) in the agreement table, but we did NOT use that field for the inner join; in fact, the query is slow even if we do NOT select it in the query result.
Finally, we converted the agreement table from InnoDB back to MyISAM, and then everything became normal again: the query finishes in less than 1 second.
Now, my question is: what is actually going on here? Does it mean that once an InnoDB table contains a text field, the table cannot be used in an inner join?
I wish I could know the real reason so that I could avoid the same problems in the future.
thanks a lot.
This is a famous and tricky one. The most likely cause is that you're out of space in /tmp.
Here is a link I keep in my bookmarks that may help you: http://www.mysqlperformancetuning.com/a-fix-for-incorrect-key-file-for-table-mysql
In my experience, limited though it is, the primary reason for seeing
this error message is because your tmpdir has run out of space. Like
me you'll check how much free space you have: 1Gb, 2Gb, 4Gb. It may
not be enough. And here's why: MySQL can create temporary tables
bigger than that in a matter of seconds, quickly filling up any free
space. Depending on the nature of the query and the size of the
database naturally.
You may also try a REPAIR on your table, but to me it is about as useful as breakdancing :/
InnoDB has its own settings for buffer sizes etc. Check those out, and if you can adjust them, go ahead. Just for a test, try doubling them; if it helps, you may want to optimize further. It can make a big difference.
Some links that may help:
http://www.mysqlperformanceblog.com/2007/11/03/choosing-innodb_buffer_pool_size/
http://dev.mysql.com/doc/refman/5.5/en/innodb-buffer-pool.html
Maybe the problem here is the remark field defined as varchar(200)? Remember that temporary and MEMORY tables store varchar columns with a fixed length. So 50k rows with varchar(200) can consume a lot of memory even if they are all empty.
If this is the problem then you can try one of several things:
If only a few rows have a value in the column, then make the varchar(200) column NULLable and always use NULL instead of an empty string
Change varchar(200) to TEXT (there is of course a drawback: it will always use an on-disk temporary table)
Maybe you don't need 200 characters? Try a smaller VARCHAR size
Try adjusting tmp_table_size and max_heap_table_size so you can handle larger temporary tables in memory (see the sketch after this list) http://dev.mysql.com/doc/refman/5.1/en/internal-temporary-tables.html
Use Percona Server, as it supports a dynamic row format for MEMORY tables http://www.mysqlperformanceblog.com/2011/09/06/dynamic-row-format-for-memory-tables/
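A sketch of bumping those limits for the current session (the 256MB value is purely illustrative; both variables should be raised together, since the smaller of the two is what counts):
SET SESSION tmp_table_size      = 256 * 1024 * 1024;
SET SESSION max_heap_table_size = 256 * 1024 * 1024;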
What is the purpose of your query? I see from your comment that you only list the user information and nothing from agreement, leading me to believe you are looking for users that have an agreement?
Since you are converting between engines it leads me to think you are doing cleanup before adding constraints. If so, consider a left join from the user table instead like:
select user.* from user left join agreement on user.agreement_id = agreement.id where user.agreement_id != 0;
If it's not cleanup, but you are simply looking for users with an agreement, we can make it simpler:
select user.* from user where user.agreement_id != 0;
If the purpose is something else, consider adding an index on user.agreement_id since an inner join may need it for speed. Let us know the real purpose and you may get better help.
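If that index were missing (the question says it already exists), adding it would be a one-liner; the index name here is arbitrary:
ALTER TABLE user ADD INDEX idx_agreement_id (agreement_id);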