I am trying to implement a function that uploads around 40 million records to a MySQL database hosted on AWS. However, my write statement gets stuck at 94% indefinitely.
This is the command I'm using to upload:
df_intermediate.write.mode("append").jdbc(jdbcUrl, "user", connectionProperties)
with rewriteBatchedStatements and useServerPrepStmts enabled in the connection properties.
This statement works for a small number of records (50,000) but cannot handle this much larger volume. I've also increased the maximum number of connections on the MySQL side.
EDIT: I'm running this on GCP n1-standard-16 machines.
What could be the reasons that the write is stuck at 94%?
I don't think this has anything to do with Scala, really; you are just saying you want to add many, many rows to a DB. The quick answer would be to not have all of these in one transaction, and to commit them, let's say, 100 at a time. Try this on a non-production SQL database first to see if that works.
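As a rough sketch of that idea (the repartition count, batch size, and the "user" table name are assumptions to adjust to your setup), something like this writes the rows in many smaller JDBC batches instead of one enormous transaction:

// Sketch only: df_intermediate, jdbcUrl and connectionProperties are the ones from the question.
df_intermediate
  .repartition(64)                       // assumed partition count; tune to your cluster and MySQL instance
  .write
  .mode("append")
  .option("batchsize", "10000")          // rows per JDBC batch (Spark's default is 1000)
  .jdbc(jdbcUrl, "user", connectionProperties)
// Note: in Connector/J, rewriteBatchedStatements generally only rewrites client-side prepared
// statements, so it may also be worth testing with useServerPrepStmts=false.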
I'm trying to figure out what could account for the very large performance difference between my dev environment (a 5-year-old laptop) and our stage server (Azure cloud). The table in question is a log table of web service requests for a service that processes XML. One of the columns in the table is the XML passed to the web service.
On my local computer it basically doesn't matter how many rows are in the table; performance is great. On the deployed server, once there are more than a couple hundred rows, performance starts tanking quickly. A "select count(*)" on this table when it has 2,000 rows takes 0.0017 seconds locally but close to 20 seconds on the server. Even a simple insert of a new row takes a significant amount of time, on the order of whole seconds.
I found an article while researching the problem, an explanation of MySQL BLOB performance. That makes sense to me, and I'd be happy to implement the 1-to-1 solution, but I don't want to do it until I understand why it works fine locally and tanks on the server.
Are there some MySQL settings I can check to find the differences? I'd really like to get my local machine to show the same performance issue as the deployed server so I can validate that the fix will work.
EDIT:
The CREATE TABLE statements are identical. The MySQL versions are 5.7.23 and 5.7.22. I did notice that the buffer pool is 16x bigger on my local machine. I'm going to try updating the server to the setting my local machine has and see if that resolves the issue.
The solution was updating the buffer pool size like Rick suggested.
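For anyone hitting the same thing, here is a minimal sketch (connection details are placeholders) of how the setting can be checked and, on MySQL 5.7, resized at runtime over JDBC:

import java.sql.DriverManager

// Placeholder URL/credentials; adjust to your server.
val conn = DriverManager.getConnection("jdbc:mysql://stage-server:3306/mydb", "dbuser", "dbpass")
val stmt = conn.createStatement()

// See what the server is currently using.
val rs = stmt.executeQuery("SHOW VARIABLES LIKE 'innodb_buffer_pool_size'")
while (rs.next()) println(s"${rs.getString(1)} = ${rs.getString(2)}")

// MySQL 5.7 can resize the buffer pool online; 4 GB here is only an example value,
// and the statement needs the SUPER privilege. Persist it in my.cnf as well.
stmt.execute("SET GLOBAL innodb_buffer_pool_size = 4294967296")
conn.close()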
I am executing a MySQL INSERT query containing a very large amount of LINESTRING data, but it is taking forever to execute. (The INSERT query is so big that it takes more than 5 minutes just to paste into the terminal.)
My Content-Length is 10 MB as per the request header, so I thought it must be due to the MySQL query size, which is 34 MB. I also increased the connection timeout to infinite, but it still does not execute.
Insert into table (id,data) value (1,LinestringFromText('LINESTRING(0 9,8 0... so on and it goes on)'))
If I try the same query with a small amount of data it executes smoothly, but with a large amount of data it gets stuck.
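For reference, this is roughly how the same statement could be sent from code rather than pasted into the terminal (a sketch only; the connection details, file name, and table name are placeholders):

import java.sql.DriverManager

// Placeholder connection; adjust URL, credentials and table name.
val conn = DriverManager.getConnection("jdbc:mysql://localhost:3306/mydb", "dbuser", "dbpass")

// Assume the full 'LINESTRING(0 9,8 0, ...)' text has been saved to a file.
val wkt: String = scala.io.Source.fromFile("linestring.wkt").mkString

val ps = conn.prepareStatement(
  "INSERT INTO table_name (id, data) VALUES (?, LineStringFromText(?))")
ps.setInt(1, 1)
ps.setString(2, wkt)       // the 34 MB of text travels as a bound parameter, not pasted SQL
ps.executeUpdate()
conn.close()
// The server's limits (e.g. max_allowed_packet) still have to accommodate a statement this large.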
You shouldn't use an App Engine application to host and run a MySQL server; the reason being that you shouldn't rely on an App Engine instance to hold and serve dynamic data (see how instances are managed). However, if having your MySQL server in App Engine is a must, you can try changing the instance class of your application in the app.yaml configuration file to one of the following. A machine with more resources should translate to better performance.
However, I would recommend you use Cloud SQL to hold and serve your data, and connect to it from your application. See this tutorial for an overview of how you can do this, which is based on this source code. This example also uses the mysql package that you linked.
I'm developing an app in which I'll need to collect, from a MySQL server, five years of daily data (so, approximately 1,825 rows of a table with about 6 or 7 columns).
So, for handling this data, I can, after retrieving it, store it in a local SQLite database, or just keep it in memory.
I admit that, so far, the only advantage I could find for storing it in a local database, instead of just using what's already loaded, would be having the data accessible the next time the user opens the app.
But I think I might not be taking into account all important factors.
Which factors should I take into account to decide between storing the data in a local database and keeping it in memory?
Best regards,
Nicolas Reichert
With respect, you're overthinking this. You're talking about a small amount of data: 2K rows is nothing for a MySQL server.
Therefore, I suggest you keep your app simple. When you need those rows in your app fetch them from MySQL. If you run the app again tomorrow, run the query again and fetch them again.
Are the rows the result of some complex query? To keep things simple you might consider creating a VIEW from the query. On the other hand, you can just as easily keep the query in your app.
Are the rows the result of a time-consuming query? In that case you could create a table in MySQL to hold your historical data. That way you'd only have to do the time-consuming query on your newer data.
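As a small illustration of that last point (every table, column, and connection detail here is made up), the expensive historical part can be materialized once and only the recent window computed per request:

import java.sql.DriverManager

// Hypothetical names throughout; this only sketches the "materialize the old data" idea.
val conn = DriverManager.getConnection("jdbc:mysql://localhost:3306/mydb", "dbuser", "dbpass")
val stmt = conn.createStatement()

// One-off: snapshot the time-consuming historical aggregation into its own table.
stmt.execute(
  "CREATE TABLE IF NOT EXISTS daily_history AS " +
  "SELECT day, metric_a, metric_b FROM expensive_source WHERE day < CURDATE()")

// Per request: cheap read of the snapshot plus only the newest rows.
val rs = stmt.executeQuery(
  "SELECT day, metric_a, metric_b FROM daily_history " +
  "UNION ALL " +
  "SELECT day, metric_a, metric_b FROM expensive_source WHERE day >= CURDATE()")
conn.close()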
At any rate, adding some alternative storage tech to your app (be it RAM or a local SQLite instance) isn't worth the trouble IMHO. Keep It Simple™.
If you're going to store the data locally, you have to figure out how to make it persistent. SQLite does that. It's not clear to me how RAM would do that unless you dump it to the file system.
I'm not sure if caching would be the correct term for this, but my objective is to build a website that will display data from my database.
My problem: There is a high probability of a lot of traffic and all data is contained in the database.
My hypothesized solution: Would it be faster if I created a separate program (in Java, for example) to connect to the database every couple of seconds and update the HTML files (where the data is displayed) with the new data? (This would also increase security, since users would never connect to the database directly.) Or should I just have each user create a connection to MySQL (using PHP) and get the data?
If you've had any experience in a similar situation, please share. I'm sorry if I didn't word the title correctly; this is a pretty specific question and I'm not even sure I explained myself clearly.
Here are some thoughts for you to think about.
First, I do not recommend that you create files; trust MySQL instead. However, work on configuring your environment to support your traffic and application.
You should understand your data a little more. (How much does the data in your tables change? What kinds of queries are you running against the data? Are your queries optimized?)
Make sure your tables are optimized and indexed correctly. Make sure all your queries run fast (nothing causing long row locks).
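As a quick illustration (the table, column, and connection details are placeholders), EXPLAIN tells you whether a query is using an index, and adding one is a single statement:

import java.sql.DriverManager

// Placeholder connection; the point is just the EXPLAIN / index workflow.
val conn = DriverManager.getConnection("jdbc:mysql://localhost:3306/mydb", "dbuser", "dbpass")
val stmt = conn.createStatement()

// If the "key" column comes back NULL, the query is scanning the whole table.
val plan = stmt.executeQuery("EXPLAIN SELECT * FROM page_data WHERE category_id = 42")
while (plan.next()) println(plan.getString("key"))

// Add an index on the column your hot queries filter by.
stmt.execute("CREATE INDEX idx_page_data_category ON page_data (category_id)")
conn.close()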
If your tables are not updated very often, you should consider using the MySQL query cache, as this will reduce your I/O and increase query speed. (BUT wait! If your tables are being updated all the time, this will kill your server performance big time.)
Check whether your query cache is set to "ON". Based on my experience this is always a bad idea unless the data never changes in any of your tables. When it is set to "ON", MySQL will cache every query; then, as soon as the data in a table changes, MySQL has to clear the cached queries, and it works harder clearing up the cache, which gives you bad performance. I like to keep it set to "DEMAND" (on demand) instead.
From there you can control which queries should be cached and which should not by using SQL_CACHE and SQL_NO_CACHE.
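A small sketch of that setup (connection details and the example queries are placeholders; note the query cache only exists up to MySQL 5.7, it was removed in 8.0):

import java.sql.DriverManager

// Placeholder connection.
val conn = DriverManager.getConnection("jdbc:mysql://localhost:3306/mydb", "dbuser", "dbpass")
val stmt = conn.createStatement()

// With DEMAND, only queries that explicitly ask for it get cached.
stmt.execute("SET GLOBAL query_cache_type = DEMAND")

// Cached: a mostly static lookup table.
stmt.executeQuery("SELECT SQL_CACHE id, name FROM categories")

// Not cached: data that changes constantly.
stmt.executeQuery("SELECT SQL_NO_CACHE * FROM live_stats WHERE id = 1")
conn.close()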
Another thing you want to review is your server configuration and specs.
How much physical RAM does your server have?
What type of hard drives are you using? If they are not SSDs, at what speed do they rotate? Perhaps 15k RPM?
What OS are you running MySQL on?
How is the RAID set up on your hard drives? RAID 10 or RAID 50 will help you out a lot here.
Your processor speed will make a big difference.
If you are not using MySQL 5.6.20+, you should consider upgrading, as MySQL has been improved to help you even more.
Is your innodb_buffer_pool_size set to about 75% of your total physical RAM? Are you using InnoDB tables?
You can also use MySQL replication to increase the read capacity for the data. With multiple servers holding the same data, you can point half of your traffic at server A and the other half at server B, so the same work is handled by multiple servers.
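As a rough sketch of that idea (the hostnames are made up), the application simply keeps two connections and sends writes to the primary while spreading reads across replicas:

import java.sql.{Connection, DriverManager}

// Hypothetical hosts: serverA is the primary, serverB a read replica kept in sync by replication.
val primary: Connection = DriverManager.getConnection("jdbc:mysql://serverA:3306/mydb", "dbuser", "dbpass")
val replica: Connection = DriverManager.getConnection("jdbc:mysql://serverB:3306/mydb", "dbuser", "dbpass")

// Writes always go to the primary...
primary.createStatement().execute("INSERT INTO page_hits (page) VALUES ('home')")

// ...while read traffic can be served by the replica.
val rs = replica.createStatement().executeQuery("SELECT COUNT(*) FROM page_hits")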
Here is one argument for you to think about: Facebook uses MySQL and handles millions of hits per second, yet they are up 100% of the time. True, they have a trillion-dollar budget and their network is huge, but the idea here is to trust MySQL to get the job done.
I have a script in a Controller that I launch from the Ruby on Rails console (IRB).
This script constantly Creates-Reads-Updates (no deletions) a MySQL database, taking data from the Interwebs.
The problem is that it takes very long until all the required data is put into the database. So I would like to know if it is a good idea to simply open several Rails consoles and launch that script several times in parallel.
-> Several Ruby instances would be working on one database.
Is that a problem? Could this create any write conflicts (Create/Update) in the database? If so, is there anything I would have to do in order to avoid such conflicts?
If it's not a problem: How many Ruby instances could I "unleash" onto the database, in parallel?
You can definitely run multiple consoles simultaneously against a single database. The limit is the number of open connections the database allows. In MySQL 5.1 the default was 100, and in 5.5 it's 151. You're unlikely to run out of connections before something else becomes the bottleneck.
It might just work to have multiple processes running simultaneously, but it might not; the complete analysis of this is fairly complicated. There are a couple of things you can do to ensure it works properly with multiple simultaneous clients. First, wrap each change in a database transaction, which will take care of most of what you need:
ActiveRecord::Base.transaction do
  # all your code to create / modify a single item goes here;
  # concurrent consoles then never see a partially applied change
end
Make sure your tables are using the InnoDB engine instead of MyISAM, which doesn't support transactions.
Also, as mu too short points out, put all the validation constraints you can directly into the database. So if you have uniqueness constraints or foreign key relations, add them to your schema by hand, since Rails doesn't do it by default. Complex validations that compare different model objects (aside from FK relations, as in belongs_to) could require database trigger validations -- hopefully you don't need that. But if you get all your validations into the database natively, everything should work.