I am currently importing a huge CSV file from my iPhone to a rails server. In this case, the server will parse the data and then start inserting rows of data into the database. The CSV file is fairly large and would take a lot time for the operation to end.
Since I am doing this asynchronously, my iPhone is then able to go to other views and do other stuff.
However, when it requests another query in another table.. this will HANG because the first operation is still trying to insert the CSV's information into the database.
Is there a way to resolve this type of issue?
As long as the phone doesn't care when the database insert is complete, you might want to try storing the CSV file in a tmp directory on your server and then have a script write from that file to the database. Or simply store it in memory. That way, once the phone has posted the CSV file, it can move on to other things while the script handles the database inserts asynchronously. And yes, #Barmar is right about using an InnoDB engine rather than MyISAM (which may be default in some configurations).
Or, you might want to consider enabling "low-priority updates" which will delay write calls until all pending read calls have finished. See this article about MySQL table locking. (I'm not sure what exactly you say is hanging: the update, or reads while performing the update…)
Regardless, if you are posting the data asynchronously from your phone (i.e., not from the UI thread), it shouldn't be an issue as long as you don't try to use more than the maximum number of concurrent HTTP connections.
Related
I'm developing one app that store data locally and send CSV when internet is available.
The goal is to populate an on-line MySql db.
The question is: considering an environment with very low quality internet connection, is better to send data as CSV to check and then put in db or directly to db?
I suppose directly populate db is less expensive in terms of connection but is safe for db integrity and consistency if the data connection get lost?
If you can insert the data with a single SQL statement, it will be treated as an atomic operation and either fail completely or succeed completely. If you need multiple insert statements, wrap the whole thing in a transaction so it gets treated as an atomic operation.
I'm developing an app in which I'll need to collect, from a MySQL server, a 5 years daily data (so, approximately 1825 rows of a table with about 6, 7 columns).
So, for handling this data, I can, after retrieving it, store it in a local SQLite database, or just keep it in memory.
I admit that, so far, the only advantage I could find for storing it in a local database, instead of just using what's already loaded, would be to have the data accessible in a next time the user were to open the app.
But I think I might not be taking into account all important factors.
Which factors should I take into account to decide between storing data in a local database or keep it in memory?
Best regards,
Nicolas Reichert
With respect, you're overthinking this. You're talking about a small amount of data: 2K rows is nothing for a MySQL server.
Therefore, I suggest you keep your app simple. When you need those rows in your app fetch them from MySQL. If you run the app again tomorrow, run the query again and fetch them again.
Are the rows the result of some complex query? To keep things simple you might consider creating a VIEW from the query. On the other hand, you can just as easily keep the query in your app.
Are the rows the result of a time-consuming query? In that case you could create a table in MySQL to hold your historical data. That way you'd only have to do the time-consuming query on your newer data.
At any rate, adding some alternative storage tech to your app (be it RAM or be it a local sqlite instance) isn't worth the trouble IMHO. Keep It Simple™.
If you're going to store the data locally, you have to figure out how to make it persistent. sqlite does that. It's not clear to me how RAM would do that unless you dump it to the file system.
I haven't got a lot of ETL experience but I haven't found the answer to my question either, although I guess it may be a no-brainer if you've worked with it. We're currently looking into creating a simple data warehouse (simple as in "copy most columns from most tables" and not OLAP-style) and it seems we're leaning towards SQL Server (2008) for a few reasons.
SSIS seems to be the tool for this kind of tasks when it comes to SQL Server, but I can't find anything about how it is affecting the source database cache, if at all, when loading data. Some of our installations are very sensitive performance-wise when it comes to having a usage-style-cache.
But if SSIS runs a "select *"-ish query and the cache is altered, then the performance for the users may degrade to unacceptable levels until it is rebuild from those queries again.
So my question is, does SSIS (or is there a way to avoid) affect the database cache when loading data from a SQL Server database?
Part of the problem is also that the source database could be both an Oracle or SQL Server database, so if there is a way to avoid the cache-affecting part for Oracle, that would be good input as well. (I guess the Attunity connector is the way to go?)
(Some additional info: We have considered plain files as well, but then export-import probably takes longer time than SSIS-transfer? I also guess change data capture is something we'll also look into, so if that is relevant to this question, feel free to include possible issues/benefits.)
Any other relevant suggestions are also welcome!
Thanks!
Tackling the SQL Server side:
First off, SSIS doesn't do anything special to avoid the buffer pool, or the plan cache.
Simple test (on a NON-production instance!):
Create a new SSIS package with a single connection manaager, and a single data flow containing one OLE DB Source, pointing to a table, similar to:
Clear the buffer pool, from SSMS: DBCC DROPCLEANBUFFERS
Verify that the cache has been cleared using the glorified dm_os_buffer_descriptors query at the top of this page: I get this:
Run the package
Re-run the query from step (2), and note that the data pages for the table (BOM_PIECE in my example) have been loaded into the cache:
Note that most SSIS components allow you to provide your own query, so if you have a way to avoid the buffer pool (I don't know that this is possible - I'd defer to someone who knows more about it), you could insert that into a query. So in the above example, instead of selecting Table or view in the OLE DB Source, you would select SQL command, or SQL command from variable if your command requires dynamic text.
Finally, I can imagine why you want to eliminate the cache load - but are you sure you want to do this? SQL Server is fairly good at managing memory, and what you're doing is swapping memory load for disk I/O load, which (depending on your use case) may have a negative impact on other users. This question has a discussion on SQL Server caching.
Read this article about Attunity regarding reading data from oracle
What do you mean "affect the database cache when loading data from a SQL Server database". SQL Server does not cache data, it caches execution plans. The fact that you are using SSIS wont affect your Server (other than the overhead of reading the data of course). Just use a propper transaction isolation level.
Also, read about the fast load property on SSIS components
About change data capture, I don't see how it can replace SSIS. You can use CDC to select the rows that will be loaded, but it wont do the loading for you.
I've been searching all over the place for streaming a file into MySQL using C, and I can't find anything. This is pretty easy to do in C++, C#, and many other languages, but I can't find anything for straight C.
Basically, I have a file, and I want to read that file into a TEXT or BLOB column in my MySQL database. This can be achieved pretty easily by looping through the file and using subsequent CONCAT() calls to append the data to the column. However, I don't think this is as elegant as a solution, and is probably very error prone.
I've looked into the prepared statements using mysql_stmt_init() and all the binds, etc, but it doesn't seem to accept a FILE pointer to read the data into the database.
It is important to note I am working with very large files that cannot be stored in RAM, so reading the entire file into a temporary variable is out of the question.
Simply put: how can I read a file from disk into a MySQL database using C? And keep in mind, there needs to be some type of buffer (ie, BUFSIZ due to the size of the files). Has anyone achieved this? Is it possible? And I'm looking for a solution that works both with text and binary files.
Can you use LOAD DATA INFILE in a call to mysql_query()?
char statement[STMT_SIZE];
snprintf(statement, STMT_SIZE, "LOAD DATA INFILE '%s' INTO TABLE '%s'",
filename, tablename);
mysql_query(conn, statement);
See
http://dev.mysql.com/doc/refman/5.6/en/load-data.html and http://dev.mysql.com/doc/refman/5.6/en/mysql-query.html for the corresponding pages in the MySQL docs.
You can use a loop to read through the file, but instead of using a function like fgets() that reads one line at a time, use a lower-level function like read() or fread() that will fill an arbitrary-sized buffer at a time:
allocate large buffer
open file
while NOT end of file
fill buffer
CONCAT to MySQL
close file
release buffer
I don't like answering my own questions, but I feel the need in case someone else is looking for a solution to this down the road.
Unless I'm missing something, my research and testing has shown me that I have three general options:
Decent Solution: use a LOAD DATA INFILE statement to send the file
pros: only one statement will ever be needed. Unlike loading the entire file into memory, you can tune the performance of LOAD DATA on both the client and the server to use a given buffer size, and you can make that buffer much smaller, which will give you "better" buffer control without making numerous calls
cons: First of all, the file absolutely MUST be in a given format, which can be difficult to do with binary blob files. Also, this takes a fair amount of work to set up, and requires a lot of tuning. By default, the client will try to load the entire file into memory, and use swap-space for the amount of the file that does not fit into memory. It's very easy to get terrible performance here, and every time you wish to make a change you have to restart the mysql server.
Decent Solution: Have a buffer (eg, char buf[BUFSIZ]), and make numerous queries with CONCAT() calls to update the content
pros: uses the least amount of memory, and gives the program better control over how much memory is being used
cons: takes up A LOT of processing time because you are making numerous mysql calls, and the server has to find the given row, and then append a string to it (which takes time, even with caching)
Worst Solution: Try to load the entire file into memory (or as much as possible), and make only one INSERT or UPDATE call to mysql
pros: limits the amount of processing performance needed on the client, as only a minimum number of calls (preferably one) will need to be buffered and executed.
cons: takes up a TON of memory. If you have numerous clients making these large calls simultaneously, the server will run out of memory quickly, and any performance gains will turn to losses very quickly.
In a perfect world, MySQL would implement a feature which allowed for buffering queries, something akin to buffering a video: you open a MySQL connection, then within that open a 'query connection' and stream the data in buffered sets, then close the 'query connection'
However, this is NOT a perfect world, and there is no such thing in MySQL. this leaves us with the three options shown above. I decided to stick with the second, where I make numerous CONCAT() calls because my current server has plenty of processing time to spare, and I'm very limited on memory in the clients. For my unique situation, trying to beat my head around tuning LOAD DATA INFILE doesn't make sense. Every application, however, will have to analyze it's own problem.
I'll stress none of these are "perfect" for me, but you can only do the best with what you have.
Points to Adam Liss for giving the LOAD DATA INFILE direction.
I am creating a site which will make lots of searches and I need to log data about every search that is made for later analysis.
I anticipate ultimately having load distributed between a number of servers, then each month I will download and import all logs into a single mysql database at my end for analysis.
At the moment I've been looking at setting every server up as a mysql 'master' which will live update the slave analysis server and essentially also act as a backup.
However I'm aiming for efficiency. Obviously the benefits of mysql replication are that I always have logs centrally available and don't have to import and reset log files on each server every month.
How much more efficient would it be to log in a plaintext file and just dump this logfile every month and import into mysql centrally? Is a plaintext dump much, if any, more efficient/faster than mysql?
Thanks for your thoughts!
Databases are strong for doing more than inserts. They are strong for locking mechanisms, transaction management, fast searches, connections pooling, and the list goes on.
On the other hand, if all you need to do in general is writing a chunk of data to the disk, a database would be a huge overhead.
Given the above, and since you only want to write stuff all month long, I would recommend you use logs, and once a month - take the logs, merge them together and analyze them. You could then decide if you want to merge all of them into a database (if it makes sense and gives you some added value), or you just want to merge the text together.
BTW, you may want to save the INSERT statements into this log, and then use it as a script to load everything into the database. Give it a thought :-)