Exporting Large MySQL Table - mysql

I have a table in MySQL that I manage using phpMyAdmin. Currently it's sitting at around 960,000 rows.
I have a boss who likes to look at the data in Excel, which means that every week I have to export the data into Excel.
I am looking for a more efficient way to do this. I can't export the entire table at once because it times out, so I have been stuck 'chunking' the table into smaller queries and exporting it like that.
I have tried connecting Excel (and Access) directly to my database, but I hit the same problem: it times out. Is there any way to extend the connection limit?

Is your boss Rain Man? Does he just spot 'information' in raw 'data'?
Or does he build functions in Excel rather than ask for what he really needs?
Sit with him for an hour and see what he's actually doing with the data. What questions are being asked? What patterns are being detected (manually)? Write a real tool to find that information and then plug it into your monitoring/alerting system.
Or, get a new boss. Seriously. You can tell him I said so.

Honestly, for this size of data, I would suggest doing a mysqldump and then importing the table back into another copy of MySQL installed somewhere else, maybe on a virtual machine dedicated to this task. From there, you can set timeouts and such as high as you want and not worry about resource limitations blowing up your production database. Using nice on Unix-based OSes or process priorities on Windows-based systems, you should be able to do this without too much impact on the production system.
Alternatively, you can set up a replica of your production database and pull the data from there. Having a so-called "reporting database" that replicates various tables or even entire databases from your production system is actually a fairly common practice in large environments to ensure that you don't accidentally kill your production database pulling numbers for someone. As an added advantage, you don't have to wait for a mysqldump backup to complete before you start pulling data for your boss; you can do it right away.
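If you do go the replication route, the reporting copy is a one-time setup. A minimal sketch of the classic statements involved, with purely hypothetical host, credentials, and binary log coordinates (the real values come from SHOW MASTER STATUS on production):
-- run on the reporting server; every value below is a placeholder
CHANGE MASTER TO
    MASTER_HOST = 'prod-db.example.com',
    MASTER_USER = 'repl',
    MASTER_PASSWORD = 'secret',
    MASTER_LOG_FILE = 'mysql-bin.000001',
    MASTER_LOG_POS = 4;
START SLAVE;
From then on the weekly export runs against this copy with timeouts set as high as you like, and production never feels it.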

That's not too many rows. Run a query to export your data into a CSV file:
SELECT column, column2 INTO OUTFILE '/path/to/file/result.txt'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
FROM yourtable WHERE columnx = 'condition';

One option would be to use SELECT ... INTO OUTFILE or mysqldump to export your table to CSV, which Excel can then directly open.
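If the file needs to open cleanly in Excel with column headers, one common trick is to UNION a header row onto the export; a rough sketch, with made-up column names and path (the server writes the file, so the FILE privilege is needed):
-- header row first, then the data
SELECT 'id', 'name', 'created_at'
UNION ALL
SELECT id, name, created_at
FROM yourtable
INTO OUTFILE '/tmp/yourtable.csv'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n';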

Related

400MB of CSV data into MySQL in under 2 hours

I need to import 400MB of data into a MySQL table from 10 different .txt files loosely formatted in a CSV manner. I have 4 hours tops to do it.
Is this AT ALL possible?
Is it better to chop the files into smaller files?
Is it better to upload them all simultaneously or to upload them sequentially?
Thanks
EDIT:
I currently uploaded some of the files to the server, and am trying to use "load data infile" from phpMyAdmin to achieve the import, using the following syntax & parameters:
load data infile 'http://example.com/datafile.csv' replace into table database.table fields terminated by ',' lines terminated by ';' (`id`,`status`);
It throws an Access denied error; I was wondering if I could achieve this through a PHP file instead, as mentioned here.
Thanks again
FINAL EDIT
(or My Rookie Mistakes)
In the end, it could be done. Easily.
Paranoia was winning the war in my mind as I wrote this, due to a time limit I thought impossible. I wasn't reading the right things; I was paying attention to nothing but the pressure.
The first mistake, an easy one to make and solve, was keeping the indexes while trying to import that first batch of data. It was pointed out in the comments, and I read it in many other places, but I dismissed it as unimportant; it wouldn't change a thing. Well, it definitely does.
Still, the BIG mistake was using LOAD DATA INFILE on a table running on the InnoDB engine. I eventually ran into a post somewhere (whose link I've since lost) explaining that the command doesn't perform well on InnoDB tables, giving errors such as lock wait timeouts and the like.
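For anyone hitting the same wall, a rough sketch of the pattern that avoids both mistakes (paths, database, table, and index names are all made up; on InnoDB the usual advice is to relax the per-session checks during the load and add secondary indexes afterwards):
-- placeholders throughout; run in one session
SET SESSION unique_checks = 0;
SET SESSION foreign_key_checks = 0;
LOAD DATA INFILE '/var/lib/mysql-files/datafile.csv'
REPLACE INTO TABLE mydb.mytable
FIELDS TERMINATED BY ','
LINES TERMINATED BY ';'
(`id`, `status`);
SET SESSION unique_checks = 1;
SET SESSION foreign_key_checks = 1;
-- only once every file is in, add the secondary indexes back
ALTER TABLE mydb.mytable ADD INDEX idx_status (`status`);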
I'm guessing this should probably be declared a duplicate of some InnoDB vs MyISAM question (if only somewhat related), or someone could perhaps provide a more elaborate answer explaining what I only mention and know superficially, and I'd gladly select it as the correct answer (B-B-BONUS points for adding the relative size of indexes compared to table data or something of the sort).
Thanks to all those who were involved.
PS: I've replaced HeidiSQL with SQLyog since I read the recommendation in the comments and a later answer; it's pretty decent and a bit faster/lighter. If I can get past the SQL Server-ish interface I might keep it as my default DB manager.

Fastest way to copy a large MySQL table?

What's the best way to copy a large MySQL table in terms of speed and memory use?
Option 1. Using PHP, select X rows from old table and insert them into the new table. Proceed to next iteration of select/insert until all entries are copied over.
Option 2. Use MySQL INSERT INTO ... SELECT without row limits.
Option 3. Use MySQL INSERT INTO ... SELECT with a limited number of rows copied over per run.
EDIT: I am not going to use mysqldump. The purpose of my question is to find the best way to write a database conversion program. Some tables have changed, some have not. I need to automate the entire copy over / conversion procedure without worrying about manually dumping any tables. So it would be helpful if you could answer which of the above options is best.
There is a program that was written specifically for this task called mysqldump.
mysqldump is a great tool in terms of simplicity and careful handling of all types of data, but it is not as fast as load data infile
If you're copying on the same database, I like this version of Option 2:
a) CREATE TABLE foo_new LIKE foo;
b) INSERT INTO foo_new SELECT * FROM foo;
I've got lots of tables with hundreds of millions of rows (like 1/2B) AND InnoDB AND several keys AND constraints. They take many many hours to read from a MySQL dump, but only an hour or so by load data infile. It is correct that copying the raw files with the DB offline is even faster. It is also correct that non-ASCII characters, binary data, and NULLs need to be handled carefully in CSV (or tab-delimited files), but fortunately, I've pretty much got numbers and text :-). I might take the time to see how long the above steps a) and b) take, but I think they are slower than the load data infile... which is probably because of transactions.
Of the three options listed above, I would select the second if you have a unique constraint on at least one column, so the script won't create duplicate rows if it has to be run multiple times to finish its task after a server timeout.
Otherwise your third option would be the way to go, while manually taking any server timeouts into account to determine your insert/select limits.
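As a sketch of what that chunked variant can look like (table and column names are hypothetical; it assumes the new table already exists and the source has an indexed auto-increment id, so each run simply resumes where the last one stopped):
-- find where the previous run stopped, then copy the next batch
SET @last_id = (SELECT COALESCE(MAX(id), 0) FROM new_table);
INSERT INTO new_table
SELECT *
FROM old_table
WHERE id > @last_id
ORDER BY id
LIMIT 50000;
Repeat (from a cron job or a small script) until the insert affects zero rows.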
Use a stored procedure
Option two must be fastest, but it's going to be a mighty long transaction. You should look into making a stored procedure that does the copy. That way you could offload some of the data parsing/handling from the MySQL engine.
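A rough sketch of such a procedure, copying in keyed batches so no single transaction grows enormous; it reuses the hypothetical foo/foo_new names from the answer above and assumes an auto-increment id column on the source table:
DELIMITER //
CREATE PROCEDURE copy_in_batches()
BEGIN
    DECLARE last_id BIGINT DEFAULT 0;
    DECLARE copied INT DEFAULT 1;
    WHILE copied > 0 DO
        -- copy the next slice of rows above the highest id copied so far
        INSERT INTO foo_new
        SELECT * FROM foo
        WHERE id > last_id
        ORDER BY id
        LIMIT 50000;
        SET copied = ROW_COUNT();
        SELECT COALESCE(MAX(id), last_id) INTO last_id FROM foo_new;
    END WHILE;
END //
DELIMITER ;
After CREATE TABLE foo_new LIKE foo; you would just CALL copy_in_batches();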
MySQL's load data query is faster than almost anything else, however it requires exporting each table to a CSV file.
Pay particular attention to escape characters and representing NULL values/binary data/etc in the CSV to avoid data loss.
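A sketch of a matched export/import pair that round-trips NULLs, quotes, and embedded delimiters (paths and table names are made up; both files live on the database server's own filesystem):
-- export: NULLs come out as \N, special characters get backslash-escaped
SELECT * FROM foo
INTO OUTFILE '/tmp/foo.csv'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' ESCAPED BY '\\'
LINES TERMINATED BY '\n';
-- import with exactly the same options so nothing is misread
LOAD DATA INFILE '/tmp/foo.csv'
INTO TABLE foo_new
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' ESCAPED BY '\\'
LINES TERMINATED BY '\n';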
If possible, the fastest way will be to take the database offline and simply copy data files on disk.
Of course, this has some requirements:
you can stop the database while copying.
you are using a storage engine that stores each table in individual files; MyISAM does this.
you have privileged access to the database server (root login or similar)
Ah, I see you have edited your post, then I think this DBA-from-hell approach is not an option... but still, it's fast!
The best way I've found so far is creating dump files (.txt) with SELECT ... INTO OUTFILE, then using LOAD DATA INFILE in MySQL to load the same data into the target database.

Transfer MySQL from development to production

I need to sync the development MySQL DB with the production one.
Production db gets updated by user clicks and other data generated via web.
Development db gets updated with processing data.
What's the best practice to accomplish this?
I found some diff tools (e.g. MySQL diff), but they don't manage updated records.
I also found some application solution: http://www.isocra.com/2004/10/dumptosql/
but I'm not sure it's a good practice, as in this case I need to retest my code each time I add new InnoDB-related tables.
Any ideas?
Take a look at mysqldump. It may serve you well enough for this.
Assuming your tables are all indexed with some sort of unique key, you could do a dump and have it leave out the 'drop/create table' bits. Have it run as 'insert ignore' and you'll get the new data without affecting the existing data.
Another option would be to use the query part of mysqldump to dump only the new records from the production side. Again - have mysqldump leave off the 'drop/create' bits.
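If the production dump is first loaded into a staging copy on the development server, the merge step itself is just an insert-ignore (table names are hypothetical; it relies on a primary or unique key so rows dev already has are skipped):
-- clicks_prod holds the rows dumped from production
INSERT IGNORE INTO clicks
SELECT * FROM clicks_prod;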

Can I use multiple servers to increase mysql's data upload performance?

I am in the process of setting up a MySQL server to store some data, but realized (after reading a bit this weekend) I might have a problem uploading the data in time.
I basically have multiple servers generating daily data and then sending it to a shared queue to process/analyze. The data is about 5 billion rows (although it's very small data: an ID number in one column and a dictionary of ints in another). Most of the performance reports I have seen show insert speeds of 60 to 100k rows/second, which would take over 10 hours. We need the data in very quickly so we can work on it that day, and then we may discard it (or archive the table to S3 or something).
What can I do? I have 8 servers at my disposal (in addition to the database server); can I somehow use them to make the uploads faster? At first I was thinking of using them to push data to the server at the same time, but I'm also thinking maybe I can load the data onto each of them and then somehow try to merge all the separated data into one server?
I was going to use MySQL with InnoDB (I can use any other settings if it helps), but it's not finalized, so if MySQL doesn't work, is there something else that will? (I have used HBase before, but was looking for a MySQL solution first in case I have problems; it seems more widely used and easier to get help with.)
Wow. That is a lot of data you're loading. It's probably worth quite a bit of design thought to get this right.
Multiple MySQL server instances won't help with loading speed. What will make a difference is fast processor chips and very fast disk IO subsystems on your MySQL server. If you can use a 64-bit processor and provision it with a LOT of RAM, you may be able to use a MEMORY access method for your big table, which will be very fast indeed. (But if that will work for you, a gigantic Java HashMap may work even better.)
Ask yourself: Why do you need to stash this info in a SQL-queryable table? How will you use your data once you've loaded it? Will you run lots of queries that retrieve single rows or just a few rows of your billions? Or will you run aggregate queries (e.g. SUM(something) ... GROUP BY something_else) that grind through large fractions of the table?
Will you have to access the data while it is incompletely loaded? Or can you load up a whole batch of data before the first access?
If all your queries need to grind the whole table, then don't use any indexes. Otherwise do. But don't throw in any indexes you don't need. They are going to cost you load performance, big time.
Consider using MyISAM rather than InnoDB for this table; MyISAM's lack of transaction semantics makes it faster to load. MyISAM will do fine at handling either aggregate queries or few-row queries.
You probably want to have a separate table for each day's data, so you can "get rid" of yesterday's data by either renaming the table or simply accessing a new table.
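A sketch of that per-day layout under the assumptions above (throwaway names, MyISAM, no indexes defined while loading), with RENAME TABLE used to swap the new day in atomically:
-- a bare table for the incoming day's data
CREATE TABLE daily_data_new (
    id BIGINT NOT NULL,
    payload TEXT NOT NULL
) ENGINE=MyISAM;
-- ... bulk load into daily_data_new here ...
-- swap it in (assumes daily_data already exists), then drop yesterday's data
RENAME TABLE daily_data TO daily_data_old,
             daily_data_new TO daily_data;
DROP TABLE daily_data_old;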
You should consider using the LOAD DATA INFILE command.
http://dev.mysql.com/doc/refman/5.1/en/load-data.html
This command causes the MySQL server to read a file from the MySQL server's file system and bulk-load it directly into a table. It's way faster than doing INSERT commands from a client program on another machine. But it's also trickier to set up in production: your shared queue needs access to the MySQL server's file system to write the data files for loading.
You should consider disabling indexing, then loading the whole table, then re-enabling indexing, but only if you don't need to query partially loaded tables.
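On a MyISAM table that carries indexes you still need afterwards, that looks roughly like this (file path and table name are placeholders; DISABLE KEYS only suspends non-unique indexes):
ALTER TABLE daily_data DISABLE KEYS;
LOAD DATA INFILE '/var/lib/mysql-files/day_2011_05_01.tsv'
INTO TABLE daily_data
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n';
-- rebuilds the non-unique indexes in one pass at the end
ALTER TABLE daily_data ENABLE KEYS;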

MySQL optimization script file

I'm looking at having someone do some optimization on a database. If I gave them a similar version of the DB with different data, could they create a script file to run all the optimizations on my database (i.e. create indexes, etc.) without them ever seeing or touching the actual database? I'm looking at MySQL but would be open to other DBs if necessary. Thanks for any suggestions.
EDIT:
What if it were an identical copy with transformed data? Along with a couple of sample queries that approximated what the DB was used for (i.e. OLAP vs. OLTP)? Would a script be able to contain everything, or would they need hands-on access to the actual DB?
EDIT 2:
Could I create a copy of the DB, transform the data to make it unrecognizable, create a backup file of the DB, give it to the vendor, and have them give me a script file to run on my DB?
Why are you concerned that they should not access the database? You will get better optimization if they have the actual data, as they can consider table sizes, which queries run the slowest, whether to denormalize where necessary, putting small tables completely in memory, and so on.
If it is an issue of confidentiality, you can always anonymize the data by replacing names.
If it's just adding indices, then yes. However, there are a number of things to consider when "optimizing". Which are the slowest queries in your database? How large are certain tables? How can certain things be changed/migrated to make those certain queries run faster? It could be harder to see this with sparse sample data. You might also include a query log so that this person could see how you're using the tables/what you're trying to get out of them, and how long those operations take.
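The deliverable could simply be a plain .sql script that you review and run yourself; a hypothetical sketch of what such a script tends to contain (index and table names are invented):
-- capture slow queries so the next tuning round has real evidence
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;
-- indexes suggested from the sample schema and the query log
ALTER TABLE orders ADD INDEX idx_orders_customer (customer_id, created_at);
ALTER TABLE sessions ADD INDEX idx_sessions_user (user_id);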