I'm working on an hibernate Spring Mysql app, sometimes when i make a gethibernateTemplate()get(class,id) i can see a bunch of HQL in the logs and the application hangs, have to kill tomcat. This method reads trhough a 3,000 lines file, and there should be 18 files of these, i've been thinking i probably been looking at this wrong. I need you to help me check this at database level , but i don´t know hot to approach. Maybe my database can´t take so many hits so fast.
I´v looked in phpMyAdmin in the information bout executions time section, i see a red values in:
Innodb_buffer_pool_reads 165
Handler_read_rnd 40
Handler_read_rnd_next 713 k
Created_tmp_disk_tables 8
Opened_tables 30
Can i set the application some how to threat more gently the database ?
How can i check if this is the issue ?
Update
I put a
Thread.sleep(2000);
at the end of each cycle and it made the same numbers of calls (18), so i guess this wont be te reason ? can i discard this approach ?
This is a different view of this question
Hibernate hangs or throws lazy initialization no session or session was closed
trying some different
Update 2
Think it might be the buffer reader reading the file?? file is 44KB, tried this method:
http://code.hammerpig.com/how-to-read-really-large-files-in-java.html
class but did not work.
Update 1 -- do never use a Sleep or something slow within an transaction. A transaction has to be closed as fast as possible, because it can block other database operations (what exactly will be blocked depends on the isolation level)
I do not really understand how the database is related to the files in your usecase. But if the stuff works for the first file and become slow later on, then the problem can be the Hibernate Session (to many objects), in this case start an new Transaction/Hibernate Session for each file.
I rewrote the program so i load the information directly into the database using mysql query LOAD DATA INFILE. It works very fast. Then i updated rows changing some fields i need also with sql queries. I think there is simply too much information at the same time to manage trhough memory and abstractions.
Related
I'm running MariaDB 10.2.31 on Ubuntu 18.4.4 LTS.
On a regular basis I encounter the following conundrum - especially when starting out in the morning, that is when my DEV environment has been idle for the night - but also during the day from time to time.
I have a table (this applies to other tables as well) with approx. 15.000 rows and (amongst others) an index on a VARCHAR column containing on average 5 to 10 characters.
Notably, most columns including this one are GENERATED ALWAYS AS (JSON_EXTRACT(....)) STORED since 99% of my data comes from a REST API as JSON-encoded strings (and conveniently I simply store those in one column and extract everything else).
When running a query on that column WHERE colname LIKE 'text%' I find query-result durations of i.e. 0.006 seconds. Nice. When I have my query EXPLAINed, I can see that the index is being used.
However, as I have mentioned, when I start out in the morning, this takes way longer (14 seconds this morning). I know about the query cache and I tried this with query cache turned off (both via SET GLOBAL query_cache_type=OFF and RESET QUERY CACHE). In this case I get consistent times of approx. 0.3 seconds - as expected.
So, what would you recommend I should look into? Is my DB sleeping? Is there such a thing?
There are two things that could be going on:
1) Cold caches (overnight backup, mysqld restart, or large processing job results in this particular index and table data being evicted from memory).
2) Statistics on the table go stale and the query planner gets confused until you run some queries against the table and the statistics get refreshed. You can force an update using ANALYZE TABLE table_name.
3) Query planner heisenbug. Very common in MySQL 5.7 and later, never seen it before on MariaDB so this is rather unlikely.
You can get to the bottom of this by enablign the following in the config:
log_output='FILE'
log_slow_queries=1
log_slow_verbosity='query_plan,explain'
long_query_time=1
Then review what is in the slow log just after you see a slow occurrence. If the logged explain plan looks the same for both slow and fast cases, you have a cold caches issue. If they are different, you have a table stats issue and you need to cron ANALYZE TABLE at the end of the over night task that reads/writes a lot to that table. If that doesn't help, as a last resort, hard code an index hint into your query with FORCE INDEX (index_name).
Enable your slow query log with log_slow_verbosity=query_plan,explain and the long_query_time sufficient to catch the results. See if occasionally its using a different (or no) index.
Before you start your next day, look at SHOW GLOBAL STATUS LIKE "innodb_buffer_pool%" and after your query look at the values again. See how many buffer pool reads vs read requests are in this status output to see if all are coming off disk.
As #Solarflare mentioned, backups and nightly activity might be purging the innodb buffer pool of cached data and reverting bad to disk to make it slow again. As part of your nightly activites you could set innodb_buffer_pool_dump_now=1 to save the pages being hot before scripted activity and innodb_buffer_pool_load_now=1 to restore it.
Shout-out and Thank you to everyone giving valuable insight!
From all the tips you guys gave I think I am starting to understand the problem better and beginning to narrow it down:
First thing I found was my default innodb_buffer_pool_size of 134 MB. With the sort and amount of data I'm processing this is ridiculously low - so I was able to increase it.
Very helpful post: https://dba.stackexchange.com/a/27341
And from the docs: https://dev.mysql.com/doc/refman/8.0/en/innodb-buffer-pool-resize.html
Now that I have increased it to close to 2GB and am able to monitor its usage and RAM usage in general (cli: cat /proc/meminfo) I realize that my 4GB RAM is in fact on the low side of things. I am nowhere near seeing any unused overhead (buffer usage still at 99% and free RAM around 100MB).
I will start to optimize RAM usage of my daemon next and see where this leads - but this will not free enough RAM altogether.
#danblack mentioned innodb_buffer_pool_dump_now and innodb_buffer_pool_load_now. This is an interesting approach to maybe use whenever the daemon accesses the DB as I would love to separate my daemon's buffer usage from the front end's (apparently this is not possible!). I will look into this further but as my daemon is running all the time (not only at night) this might not be feasible.
#Gordan Bobic mentioned "refreshing" DBtables by using ANALYZE TABLE tableName. I found this to be quite fast and incorporated it into the daemon after each time it does an extensive read/write. This increases daemon run times by a few seconds but this is no issue at all. And I figure I can't go wrong with it :)
So, in the end I believe my issue to be a combination of things: Too small buffer size, too small RAM, too many read/write operations for that environment (evicting buffered indexes etc.).
Also I will have to learn more about memory allocation etc and optimize this better (large-pages=1 etc).
I'm trying to import a large (~30M row) MySQL database to ElasticSearch. Cool cool, there's a logstash tool that looks like it's built for this sort of thing; its JDBC plugin will let me connect right to the database and slurp up the rows real fast.
However! When I try it, it bombs with java.lang.OutOfMemoryError. Okay. It's probably trying to batch up too many rows or something. So I add jdbc_fetch_size => 1000 to my configuration. No dice, still out of memory. Okay, maybe that option doesn't work, or doesn't do what I think?
So I try adding jdbc_paging_enabled => true and jdbc_page_size => 10000 to my config. Success! It starts adding rows in batches of 10k to my index.
But it slows down. At first I'm running 100k rows/minute; by the time I'm at 2M rows, however, I'm at maybe a tenth of that. And no surprise; I'm pretty sure this is using LIMIT and OFFSET, and using huge OFFSETs in queries is real slow, so I'm dealing with an O(n^2) kind of thing here.
I'd really like to just run the whole big query and let the cursor iterate over the resultset, but it looks like that isn't working for some reason. If I had more control over the query, I could change the LIMIT/OFFSET thing out for an WHERE id BETWEEN val1 AND val2 kind of thing, but I can't see where I could get in to do that.
Any suggestions on how I can not crash, but still run at a reasonable speed?
Okay! After searching the issues for the logstash-input-jdbc github page for "Memory" I found this revelation:
It seems that an additional parameter ?useCursorFetch=true needs to be added to the connection string of mysql 5.x.
It turns out that the MySQL JDBC client does not use a cursor for fetching rows by default because of some reason, and the logstash client doesn't warn you that it's not able to use a cursor to iterate over the resultset even though you've set a jdbc_fetch_size because of some other reason.
The obvious way to know about this, of course, would have been to have carefully read the MySQL Connector/J documentation which does mention that cursors are off by default, though not why.
Anyhow, I added useCursorFetch=true to the connection string, kicked jdbc_query_paging to the curb, and imported 26M rows into my index in 2.5 hours, on an aging Macbook Pro with 8G memory.
Thanks to github user axhiao for the helpful comment!
I'm working on a migration from MySQL to Postgres on a large Rails app, most operations are performing at a normal rate. However, we have a particular operation that will generate job records every 30 minutes or so. There are usually about 200 records generated and inserted after which we have separate workers that pick up the jobs and work on them from another server.
Under MySQL it takes about 15 seconds to generate the records, and then another 3 minutes for the worker to perform and write back the results, one at a time (so 200 more updates to the original job records).
Under Postgres it takes around 30 seconds, and then another 7 minutes for the worker to perform and write back the results.
The table being written to has roughly 2 million rows, and 1 sequence column under ID.
I have tried tweaking checkpoint timeouts and sizes with no luck.
The table is heavily indexed and really shouldn't be any different than it was before.
I can't post code samples as its a huge codebase and without posting pages and pages of code it wouldn't make sense.
My question is, can anyone think of why this would possibly be happening? There is nothing in the Postgres log and the process of creating these objects has not changed really. Is there some sort of blocking synchronous write behavior I'm not aware of with Postgres?
I've added all sorts of logging in my code to spot errors or transaction failures but I'm coming up with nothing, it just takes twice as long to run, which doesn't seem correct to me.
The Postgres instance is hosted on AWS RDS on a M3.Medium instance type.
We also use New Relic, and it's showing nothing of interest here, which is surprising
Why does your job queue contain 2 million rows? Are they all live or are have not moved them to an archive table to keep your reporting more simple?
Have you used EXPLAIN on your SQL from a psql prompt or your preferred SQL IDE/tool?
Postgres is a completely different RDBMS then MySQL. It allocates space differently and manipulates space differently so may need to be indexed differently.
Additionally there's a tool called pgtune that will suggest configuration changes.
edit: 2014-08-13
Also, rails comes with a profiler that might add some insight. Here's a StackOverflow thread about rails profiling.
You also want to watch your DB server at the disk IO level. Does your job fulfillment to a large number of updates? Postgres created new rows when you update a existing rows, and marks the old rows as available, instead of just overwriting the existing row. So you may be seeing a lot more IO as a result of your RDBMS switch.
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 3 years ago.
Improve this question
We deploy an (AJAX - based) Instant messenger which is serviced by a Comet server. We have a requirement to store the sent messages in a DB for long-term archival purposes in order to meet legal retention requirements.
Which DB engine provides the best performance in this write-once, read never (with rare exceptions) requirement?
We need at least 5000 Insert/Sec. I am assuming neither MySQL nor PostgreSQL
can meet these requirements.
Any proposals for a higher performance solution? HamsterDB, SQLite, MongoDB ...?
Please ignore the above Benchmark we had a bug inside.
We have Insert 1M records with following columns: id (int), status (int), message (140 char, random).
All tests was done with C++ Driver on a Desktop PC i5 with 500 GB Sata Disk.
Benchmark with MongoDB:
1M Records Insert without Index
time: 23s, insert/s: 43478
1M Records Insert with Index on Id
time: 50s, insert/s: 20000
next we add 1M records to the same table with Index and 1M records
time: 78s, insert/s: 12820
that all result in near of 4gb files on fs.
Benchmark with MySQL:
1M Records Insert without Index
time: 49s, insert/s: 20408
1M Records Insert with Index
time: 56s, insert/s: 17857
next we add 1M records to the same table with Index and 1M records
time: 56s, insert/s: 17857
exactly same performance, no loss on mysql on growth
We see Mongo has eat around 384 MB Ram during this test and load 3 cores of the cpu, MySQL was happy with 14 MB and load only 1 core.
Edorian was on the right way with his proposal, I will do some more Benchmark and I'm sure we can reach on a 2x Quad Core Server 50K Inserts/sec.
I think MySQL will be the right way to go.
If you are never going to query the data, then i wouldn't store it to a database at all, you will never beat the performance of just writing them to a flat file.
What you might want to consider is the scaling issues, what happens when it's to slow to write the data to a flat file, will you invest in faster disk's, or something else.
Another thing to consider is how to scale the service so that you can add more servers without having to coordinate the logs of each server and consolidate them manually.
edit: You wrote that you want to have it in a database, and then i would also consider security issues with havening the data on line, what happens when your service gets compromised, do you want your attackers to be able to alter the history of what have been said?
It might be smarter to store it temporary to a file, and then dump it to an off-site place that's not accessible if your Internet fronts gets hacked.
If you don't need to do queries, then database is not what you need. Use a log file.
it's only stored for legal reasons.
And what about the detailed requirements? You mention the NoSQL solutions, but these can't promise the data is realy stored on disk. In PostgreSQL everything is transaction safe, so you're 100% sure the data is on disk and is available. (just don't turn of fsync)
Speed has a lot to do with your hardware, your configuration and your application. PostgreSQL can insert thousands of record per second on good hardware and using a correct configuration, it can be painfully slow using the same hardware but using a plain stupid configuration and/or the wrong approach in your application. A single INSERT is slow, many INSERT's in a single transaction are much faster, prepared statements even faster and COPY does magic when you need speed. It's up to you.
I don't know why you would rule out MySQL. It could handle high inserts per second. If you really want high inserts, use the BLACK HOLE table type with replication. It's essentially writing to a log file that eventually gets replicated to a regular database table. You could even query the slave without affecting insert speeds.
Firebird can easily handle 5000 Insert/sec if table doesn't have indices.
Depending in your system setup MySql can easily handle over 50.000 inserts per sec.
For tests on a current system i am working on we got to over 200k inserts per sec. with 100 concurrent connections on 10 tables (just some values).
Not saying that this is the best choice since other systems like couch could make replication/backups/scaling easier but dismissing mysql solely on the fact that it can't handle so minor amounts of data it a little to harsh.
I guess there are better solutions (read: cheaper, easier to administer) solutions out there.
Use Event Store (https://eventstore.org), you can read (https://eventstore.org/docs/getting-started/which-api-sdk/index.html) that when using TCP client you can achieve 15000-20000 writes per second. If you will ever need to do anything with data, you can use projections or do the transformations based on streams to populate any other datastore you wish.
You can create even cluster.
If money plays no role, you can use TimesTen.
http://www.oracle.com/timesten/index.html
A complete in memory database, with amazing speed.
I would use the log file for this, but if you must use a database, I highly recommend Firebird. I just tested the speed, it inserts about 10k records per second on quite average hardware (3 years old desktop computer). The table has one compound index, so I guess it would work even faster without it:
milanb#kiklop:~$ fbexport -i -d test -f test.fbx -v table1 -p **
Connecting to: 'LOCALHOST'...Connected.
Creating and starting transaction...Done.
Create statement...Done.
Doing verbatim import of table: TABLE1
Importing data...
SQL: INSERT INTO TABLE1 (AKCIJA,DATUM,KORISNIK,PK,TABELA) VALUES (?,?,?,?,?)
Prepare statement...Done.
Checkpoint at: 1000 lines.
Checkpoint at: 2000 lines.
Checkpoint at: 3000 lines.
...etc.
Checkpoint at: 20000 lines.
Checkpoint at: 21000 lines.
Checkpoint at: 22000 lines.
Start : Thu Aug 19 10:43:12 2010
End : Thu Aug 19 10:43:14 2010
Elapsed : 2 seconds.
22264 rows imported from test.fbx.
Firebird is open source, and completely free even for commercial projects.
I believe the answer will as well depend on hard disk type (SSD or not) and also the size of the data you insert. I was inserting a single field data into MongoDB on a dual core Ubuntu machine and was hitting over 100 records per second. I introduced some quite large data to a field and it dropped down to about 9ps and the CPU running at about 175%! The box doesn't have SSD and so I wonder if I'd have gotten better with that.
I also ran MySQL and it was taking 50 seconds just to insert 50 records on a table with 20m records (with about 4 decent indexes too) so as well with MySQL it will depend on how many indexes you have in place.
Having a major hair-pulling issue with extremely slow inserts from Delphi 2010 to a remote MySQL 5.09 server.
So far, I have tried:
ADO using MySQL ODBC Driver
Zeoslib v7 Alpha
MyDAC
I have used batching and direct insert with ADO (using table access), and with Zeos I have used SQL insertion with a Query, then used Table direct mode and also cached updates Table mode using applyupdates and commit. With MyDAC I used table access mode, then direct SQL insert and then batched SQL insert
All technologies I have tried, I set compression on and off with no discernable difference.
So far I have seen a pretty much the same across the board 7.5 records per second!!!
Now, I would from this point assume that the remote server is just slow, but the MySQL Workbench is amazingly fast, and the Migration toolkit managed the initial migration very quickly (to be honest, I don't recall how quickly - which kind of means that it was quick)
Edit 1
It is quicker for me to write the sql to a file, upload the file to the server via ftp and then import it direct on the remote server - I wonder if they perhaps are throttling incoming MySQL traffic, but that doesn't explain why the MySQL Workbench was so quick!
Edit 2
At the most basic level, the code has been:
while not qMSSQL.EOF do
begin
qMySQL.SQL.Clear;
qMySQL.SQL.Add('INSERT INTO tablename (fieldname1) VALUES (:fieldname1)');
qMySQL.ParamByName('fieldname1').asString:=qMSSQL.FieldByName('fieldname1').asString;
qMySQL.ExecSQL;
qMSSQL.Next;
end;
I then tried
qMySQL.CachedUpdates:=true;
i:=0;
while not qMSSQL.EOF do
begin
qMySQL.SQL.Clear;
qMySQL.SQL.Add('INSERT INTO tablename (fieldname1) VALUES (:fieldname1)');
qMySQL.ParamByName('fieldname1').asString:=qMSSQL.FieldByName('fieldname1').asString;
qMySQL.ExecSQL;
inc(i);
if i>100 then
begin
qMySQL.ApplyUpdates;
i:=0;
end;
qMSSQL.Next;
end;
qMySQL.ApplyUpdates;
Now, in this code with CachedUpdates:=False (which obviously never actually wrote back to the database) the speed was blisteringly fast!!
To be perfectly honest, I think it's the connection - I feel it's the connection... Just waiting for them to get back to me!
Thanks for all your help!
You can try AnyDAC and it Array DML feature. It may speedup a standard SQL INSERT for few times.
Sorry that this reply comes long after you asked the question.
I had a similar problem. BDS2006 to MySQL via ODBC across the network - took 25 minutes to run - around 25 inserts per second. I was using a TDatabase connection and attached the TTable Tquery to it. Prepared the SQL statements.
The major improvement was when I started starting transactions within the loop. A simple example, Memebrships have Member Period. Start a transaction before the insert of the Membership and Members, Commit after. The number of memberships was 01585 and before transactions it took 279.90 seconds to process all the Membership records but after it took 6.71 seconds.
Almost too good to believe and am still working through fixing the code for the other slow bits.
Maybe Mark you have solved your problem but it may help someone else.
Are you using query parameters? The fastest way to insert should be using plain queries and parameters (i.e. INSERT INTO table (field) VALUES (:field) ), preparing the query and then assigning parameters and executing as many times as required within a single transaction - committing at the end (don't use any flavour of autocommit)
That in most databases avoids hard parses each time the query is executed, which requires time. Parameters allow the query to be parsed only once, and then re-executed many times as needed.
Use the server facilites to check what's going on - many offer a way to inspect what running statements are doing.
I'm not sure about ZeosLib, but using ADO with ODBC driver, you will not get the fastest way to insert the records, here few step that may make your insertion faster:
Use Mydac for direct access, they work without the slow ODBC > ADO > OLEDB > MySqlLib to connect to Mysql.
Open the connection at first before the insertion.
if you have large insertion such as 1000 or more, try use transaction and commit after 100 record or more depend on number of records.
Point 3 may makes your insertion faster even with ZeosLib or ADO.
You've got two separate things going on here. First, your Delphi program is creating Insert statements and sending them to the DB server, and then the server is handling them. You need to examine both ends to find the bottleneck. I'm not to familiar with MySql tools, but I bet you could find a SQL profiler for it easily enough. Use it to profile your inserts from the Delphi app, and compare it to running inserts from the Workbench tool and see if there's a significant difference.
If not, then the slowdown is in your app. Try hooking it up to Sampling Profiler or some other profiling tool that understands Delphi, and it'l show you where you're spending lots of time on. Once you know that, then you can work on attacking the problem, or maybe come back here to ask a more specific question. But until you know where the problem is coming from, any answers you get here are just gonna be educated guesses at best.