I always use MySQL for storing and working with data.
But this time I have a CSV dataset of around 4 GB.
I imported it into MySQL, and the import took about 2-3 hours.
It's a single table with about 7,500,000 rows and a few columns.
The import time was long, and running MySQL queries on this dataset is slow too.
Is MySQL really the right tool for this?
Maybe I should use something like a NoSQL database instead? Or a serverless database?
I don't know whether I'm doing this the proper way.
What should I do with it? How should I work with this dataset?
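For context on why imports like this can take hours: inserting one row per statement pays per-row round-trip and commit overhead millions of times over. A minimal sketch of the batched alternative, using Python's stdlib `sqlite3` as a stand-in for MySQL (the table name, columns, and tiny in-memory CSV are illustrative assumptions, not the poster's actual schema):

```python
import csv
import io
import sqlite3

# Tiny in-memory CSV standing in for the 4 GB file (assumed id,value columns).
csv_text = "id,value\n1,a\n2,b\n3,c\n"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INTEGER, value TEXT)")

reader = csv.reader(io.StringIO(csv_text))
next(reader)  # skip the header row

# executemany sends all rows inside one transaction instead of paying
# per-row statement and commit overhead.
conn.executemany("INSERT INTO data (id, value) VALUES (?, ?)", reader)
conn.commit()

print(conn.execute("SELECT COUNT(*) FROM data").fetchone()[0])  # → 3
```

With MySQL itself, the server-side `LOAD DATA INFILE` statement typically beats any driver-level loop for bulk CSV imports, and disabling secondary indexes for the duration of the load helps further.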
We have a need to do the initial data copy on a table that has 4+ billion records to target SQL Server (2014) from source MySQL (5.5). The table in question is pretty wide with 55 columns, however none of them are LOB. I'm looking for options for copying this data in the most efficient way possible.
We've tried loading via Attunity Replicate (which has worked wonderfully for tables not this large), but if the initial data copy with Attunity Replicate fails, it starts over from scratch, losing whatever time was spent copying the data. With patching and the possibility of this table taking 3+ months to load, Attunity wasn't the solution.
We've also tried smaller batch loads with a linked server. This is working but doesn't seem efficient at all.
Once the data is copied we will be using Attunity Replicate to handle CDC.
For something like this I think SSIS would be the simplest option. It's designed for large inserts, even at the 1 TB scale. In fact, I'd recommend the MSDN article We loaded 1TB in 30 Minutes and so can you.
Doing simple things like dropping indexes and performing other optimizations like partitioning would make your load faster. While 30 minutes isn't a feasible time to shoot for, it would be a very straightforward task to have an SSIS package run outside of business hours.
My business doesn't have a load on the scale yours does, but we do refresh databases of more than 100M rows nightly, and it takes no more than 45 minutes, even though it's poorly optimized.
One of the most efficient ways to load huge volumes of data is to read them in chunks.
I have answered many similar questions for SQLite, Oracle, DB2 and MySQL. You can refer to one of them to get more information on how to do that using SSIS:
Reading Huge volume of data from Sqlite to SQL Server fails at pre-execute (SQLite)
SSIS failing to save packages and reboots Visual Studio (Oracle)
Optimizing SSIS package for millions of rows with Order by / sort in SQL command and Merge Join (MySQL)
Getting top n to n rows from db2 (DB2)
On the other hand, there are many other suggestions, such as dropping indexes on the destination table and recreating them after the insert, creating the needed indexes on the source table, and using the fast-load option to insert data.
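The chunked-read pattern described above can be sketched as follows. This is a hypothetical illustration using Python's stdlib `sqlite3` for both source and destination (in SSIS the chunking would be configured in the data-flow task instead; the table layout and chunk size are invented for the example):

```python
import sqlite3

CHUNK = 2  # tiny chunk for illustration; real loads use tens of thousands of rows

# Source table with a few rows standing in for the 4-billion-row source.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE t (id INTEGER, val TEXT)")
src.executemany("INSERT INTO t VALUES (?, ?)", [(i, f"row{i}") for i in range(5)])

# Destination table, created without indexes so the load is cheap.
dst = sqlite3.connect(":memory:")
dst.execute("CREATE TABLE t (id INTEGER, val TEXT)")

cur = src.execute("SELECT id, val FROM t")
while True:
    rows = cur.fetchmany(CHUNK)  # read a bounded chunk, never the whole table
    if not rows:
        break
    dst.executemany("INSERT INTO t VALUES (?, ?)", rows)
    dst.commit()  # commit per chunk: bounded memory, restartable progress

# Recreate indexes only after the load, as suggested above.
dst.execute("CREATE INDEX idx_t_id ON t(id)")

print(dst.execute("SELECT COUNT(*) FROM t").fetchone()[0])  # → 5
```

Committing per chunk is what makes a failed load resumable from the last committed chunk instead of starting over, which was the Attunity pain point in the question.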
I have a MySQL database with fixed data that will never change,
be edited, or be queried with complex queries.
It just has 2 columns:
Id|Data
It contains about 50k rows and has a size of around 70 MB.
I was thinking maybe I should create 50k static files, each
named Id.xml, and read them that way. For example:
file_get_contents('2232.xml');
versus querying the mysql database
select data from table where id = 2232
Is it better to do it this way, for quicker performance and
less RAM usage? Or would 50k inodes be bad for the system?
Go for the static files. One benefit is not having another database in the system. The system can definitely handle 50k inodes (if it were in the millions, you may need to reconsider).
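As a sketch of the file-per-id layout being proposed (the directory, ids, and XML payloads here are invented placeholders, and a temporary directory stands in for the real data path):

```python
import tempfile
from pathlib import Path

# Hypothetical data directory; in production this would be a fixed path.
data_dir = Path(tempfile.mkdtemp())

# Write one file per row, named <Id>.xml as proposed in the question.
rows = {2232: "<data>hello</data>", 2233: "<data>world</data>"}
for row_id, payload in rows.items():
    (data_dir / f"{row_id}.xml").write_text(payload)

# A lookup is a single open+read — the equivalent of
# file_get_contents('2232.xml') in the question's PHP.
print((data_dir / "2232.xml").read_text())  # → <data>hello</data>
```

If directory size ever becomes a concern, a common mitigation is to shard the files into subdirectories keyed by a prefix of the id, but at 50k files on a modern filesystem that is unlikely to be necessary.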
I am making a web app using the Django framework and would like some opinion regarding which database to use.
PostgreSQL works very nicely with Django and I think (please do correct me if I'm wrong) MySQL requires a bit more time and effort to work with Django.
My database will have around 60 million entries in a single table. Each request does one read and one write, but will sometimes require two reads, making it, I guess, somewhat more read-heavy.
Total expected DB size: around 10 tables each with around ~50 million entries.
My question is: will PostgreSQL suffice for such a large number of entries while performing both reads and writes, or should I switch to MySQL, since I've heard MySQL is more advantageous for read-heavy workloads?
Thank You.
Both MySQL and PostgreSQL are free to download and install. Install them, tune the servers for the expected load, insert 100 million rows of random data, and take some measurements.
PostgreSQL, when configured correctly for your hardware, will perform fine. (PostgreSQL's default settings are very conservative.) Its query optimizer and indexing options are far superior to MySQL.
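To illustrate what "configured correctly" means here, these are a few of the real postgresql.conf parameters most often raised from their conservative defaults. The values below are purely illustrative assumptions; the right numbers depend entirely on your RAM and workload:

```
# postgresql.conf — illustrative values only, tune to your hardware
shared_buffers = 4GB          # default is only 128MB
effective_cache_size = 12GB   # hint to the planner about available OS cache
work_mem = 64MB               # per-sort / per-hash working memory
maintenance_work_mem = 1GB    # speeds up CREATE INDEX and VACUUM
```

Benchmarking with defaults, as the paragraph above warns, will make PostgreSQL look far slower than it actually is on real hardware.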
Can anyone guide me with my question? I'm building an application for the banking sector that uses fuzzy logic. I have to import a table with 100 million rows daily, and I am using MySQL for this application, which is processing it slowly. Is there another database server that could handle my database faster?
We roughly load about half that many rows a day in our RDBMS (Oracle) and it would not occur to me to implement such a thing without access to DBA knowledge about my RDBMS. We fine-tune this system several times a month and we still encounter new issues all the time. This is such a non-trivial task that the only valid answer is:
Don't play around, have your managers get a DBA who knows their business!
Note: Our system has been in place for 10 years now. It hasn't been built in a day...
100 million rows daily?
You have to be realistic. I doubt any single instance of any database out there can handle this kind of throughput efficiently. You should probably look at clustering options and other optimization techniques, such as splitting the data across two different DBs (sharding).
MySQL Enterprise has a bunch of built-in features that could ease and monitor the clustering process, but I think the MySQL Community Edition supports it too.
Good luck!
How are you doing it?
One huge transaction?
Perhaps try to make small transactions in chunks of 100 or 1000 rows.
Is there an index on that table? Drop the index before starting the import (if that is possible given unique constraints etc.) and rebuild the index after the import.
Another option would perhaps be an in-memory database.
Well, it seems your business' main logic does not depend on importing those 100 million rows into a database, otherwise you wouldn't be stuck doing this by yourself, right? (Correct me if I'm wrong.)
Are you sure you need to import those rows into a database when the main business doesn't need it? What kinds of questions are you going to ask of the data? Couldn't you keep the log files on a bunch of servers and query them with e.g. Hadoop? Or could you aggregate the information contained in the log files and only store a condensed version?
I'm also asking this because it sounds like you're planning to perform some at least moderately sophisticated analysis on the data and the trouble with this amount of data won't stop once you have it in a DB.
I have a MySQL database that contains millions of rows per table and there are 9 tables in total. The database is fully populated, and all I am doing is reads i.e., there are no INSERTs or UPDATEs. Data is stored in MyISAM tables.
Given this scenario, which Linux file system would work best? Currently I have XFS, but I read somewhere that XFS has horrible read performance. Is that true? Should I move the database to an ext3 file system?
Thanks
What about a RAM disk?
It's not about the FS, but it can improve your SELECTs. Also, did you evaluate MySQL table partitioning?