Cassandra giving timeout exception after some inserts

I am using Cassandra 1.0.6. I have around 1 million JSON objects of 5 KB each to insert into Cassandra. As the inserts go on, Cassandra's memory consumption also goes up until it stabilizes at a certain point. After some inserts (around 2-3 lakhs, i.e. 200,000-300,000), the Ruby client gives me a "`recv_batch_mutate': CassandraThrift::TimedOutException" exception.
I have also tried inserting 1 KB JSON objects more than a million times, and that does not throw any exception. In that experiment I also plotted the time taken per batch of 50,000 inserts against the batch number: there is a sharp rise in insert time after some iterations, and then it suddenly drops again. This could be due to garbage collection by the JVM. But the same pattern does not appear for the 5 KB objects.
What could the problem be? Some of the configuration options I am using:
System:
8 GB RAM, 4-core CPU
Cassandra configuration:
concurrent_writes: 64
memtable_flush_writers: 4
memtable_flush_queue_size: 8
rpc_server_type: sync
thrift_framed_transport_size_in_mb: 30
in_memory_compaction_limit_in_mb: 64
multithreaded_compaction: true
Do I need to change any configuration? Is this related to JVM heap space or to garbage collection?

You can increase the RPC timeout to a larger value in the Cassandra config file; look for rpc_timeout_in_ms. But you should really look into your Ruby client on the connection part.
# Time to wait for a reply from other nodes before failing the command
rpc_timeout_in_ms: 10000
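On the client side, a common mitigation (regardless of language) is to keep each mutation small and retry with a backoff when a timeout surfaces, rather than pushing one large batch_mutate. The question uses a Ruby client, but as a rough sketch of the idea, here is the pattern with pycassa, a Python Thrift client from the Cassandra 1.0 era; the keyspace, column family, and host below are placeholders, not values from the question.
import time
from pycassa.pool import ConnectionPool
from pycassa.columnfamily import ColumnFamily

pool = ConnectionPool('MyKeyspace', ['127.0.0.1:9160'])  # placeholder keyspace and host
cf = ColumnFamily(pool, 'JsonObjects')                   # placeholder column family

def insert_with_retry(key, columns, retries=3, delay=2.0):
    # Insert one row, backing off when the node is too busy to answer in time.
    for attempt in range(1, retries + 1):
        try:
            cf.insert(key, columns)
            return
        except Exception:                # e.g. a Thrift TimedOutException surfacing from the pool
            time.sleep(delay * attempt)  # simple linear backoff before retrying
    raise RuntimeError('insert kept timing out for key %r' % key)
Keeping individual mutations small and backing off instead of hammering a node that is busy flushing memtables or garbage collecting often avoids the TimedOutException without touching rpc_timeout_in_ms at all.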

Related

Neo4j batch insertion with .CSV files taking a huge amount of time to sort & index

I'm trying to create a database with data collected from Google n-grams. It's actually a lot of data, but after the creation of the CSV files the insertion was pretty fast. The problem is that, immediately after the insertion, the neo4j-import tool indexes the data, and this step is taking too much time. It's been more than an hour and it looks like it has reached about 10% progress.
Nodes
[*>:9.85 MB/s---------------|PROPERTIES(2)====|NODE:198.36 MB--|LABE|v:22.63 MB/s-------------] 25M
Done in 4m 54s 828ms
Prepare node index
[*SORT:295.94 MB-------------------------------------------------------------------------------] 26M
This is the console info at the moment. Does anyone have a suggestion for speeding up this process?
Thank you. (:
Indexing takes a long time depending on the number of nodes. I tried indexing with 10 million nodes and it took around 35 minutes, but you can still try these settings:
Increase your page cache size, which is set in the '/var/lib/neo4j/conf/neo4j.properties' file (on my Ubuntu system). Edit the following line:
dbms.pagecache.memory=4g
allocating the size according to your RAM; here, 4g means 4 GB of space. You can also try changing the Java memory size, which is set in neo4j-wrapper.conf:
wrapper.java.initmemory=1024
wrapper.java.maxmemory=1024
You can also read the Neo4j documentation on this: http://neo4j.com/docs/stable/configuration-io-examples.html

IOPS or Throughput? - Determining Write Bottleneck in Amazon RDS Instance

We have nightly load jobs that write several hundred thousand records to a MySQL reporting database running in Amazon RDS.
The load jobs are taking several hours to complete, but I am having a hard time figuring out where the bottleneck is.
The instance is currently running with General Purpose (SSD) storage. Looking at the CloudWatch metrics, it appears I am averaging less than 50 IOPS over the last week. However, Network Receive Throughput is less than 0.2 MB/sec.
Is there any way to tell from this data whether I am being bottlenecked by network latency (we are currently loading the data from a remote server...this will change eventually) or by write IOPS?
If IOPS is the bottleneck, I can easily upgrade to Provisioned IOPS. But if network latency is the issue, I will need to redesign our load jobs to load raw data from EC2 instances instead of our remote servers, which will take some time to implement.
Any advice is appreciated.
UPDATE:
More info about my instance: I am using an m3.xlarge instance provisioned with 500 GB of storage. The load jobs are done with Pentaho's ETL tool. They pull from multiple (remote) source databases and insert into the RDS instance using multiple threads.
You aren't using up much CPU. Your memory is very low. An instance with more memory should be a good win.
You're only doing 50-150 IOPS. That's low; you should get 3000 in a burst on standard SSD storage. However, if your database is small, that is probably hurting you (you get 3 IOPS per GB, so if you are on a 50 GB or smaller database, consider paying for provisioned IOPS).
You might also try Aurora; it speaks MySQL and supposedly has great performance.
If you can spread out your writes, the spikes will be smaller.
A very quick test is to buy provisioned IOPS, but be careful as you may get fewer than you do currently during a burst.
Another quick means to determine your bottleneck is to profile your job execution application with a profiler that understands your database driver. If you're using Java, JProfiler will show the characteristics of your job and its use of the database.
A third is to configure your database driver to print statistics about the database workload. This might inform you that you are issuing far more queries than you would expect.
Your most likely culprit accessing the database remotely is actually round-trip latency. The impact is easy to overlook or underestimate.
If the remote database has, for example, a 75 millisecond round-trip time, you can't possibly execute more than 1000 (milliseconds/sec) / 75 (milliseconds/round trip) = 13.3 queries per second if you're using a single connection. There's no getting around the laws of physics.
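One rough way to check whether round-trip latency is the limiting factor is to time a trivial query over the same path the load jobs use. A minimal sketch, assuming mysql-connector-python; the endpoint and credentials are placeholders:
import time
import mysql.connector

conn = mysql.connector.connect(host='my-rds-endpoint', user='loader',
                               password='secret', database='reporting')
cur = conn.cursor()
samples = []
for _ in range(50):
    start = time.perf_counter()
    cur.execute('SELECT 1')   # trivial query, so the elapsed time is mostly network round trip
    cur.fetchall()
    samples.append(time.perf_counter() - start)
rtt_ms = 1000 * sum(samples) / len(samples)
print('average round trip: %.1f ms -> at most ~%.0f queries/sec on one connection'
      % (rtt_ms, 1000 / rtt_ms))
If the measured round trip is in the tens of milliseconds, that single-connection ceiling, rather than IOPS, is likely what is stretching the load jobs out to hours.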
The spikes suggest inefficiency in the loading process, where it gathers for a while, then loads for a while, then gathers for a while, then loads for a while.
Separate but related: if you don't have the MySQL client/server compression protocol enabled on the client side... find out how to enable it. (The server always supports compression, but the client has to request it during the initial connection handshake.) This won't fix the core problem, but it should improve the situation somewhat, since less data to physically transfer means less time wasted in transit.
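How you request compression depends on the client: the mysql command-line client takes a --compress flag, and with mysql-connector-python, for example, it is a connect-time option. A minimal sketch (connection details are placeholders):
import mysql.connector

conn = mysql.connector.connect(
    host='my-rds-endpoint',
    user='loader',
    password='secret',
    database='reporting',
    compress=True,  # request the client/server compression protocol during the handshake
)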
I'm not an RDS expert and I don't know if my own particular case can shed some light for you. Anyway, I hope this gives you some kind of insight.
I have a db.t1.micro with 200 GB provisioned (that gives me 600 IOPS baseline performance), on General Purpose SSD storage.
The heaviest workload is when I aggregate thousands of records from a pool of around 2.5 million rows drawn from a 10-million-row table and another table of 8 million rows. I do this every day. This is what I average (it is steady performance, unlike yours, where I see a pattern of spikes):
Write/ReadIOPS: +600 IOPS
NetworkTrafficReceived/Transmit throughput: < 3,000 Bytes/sec (my queries are relatively short)
Database connections: 15 (workers aggregating on parallel)
Queue depth: 7.5 counts
Read/Write Throughput: 10MB per second
The whole aggregation task takes around 3 hours.
Also check the "10 tips to improve the performance of your app in AWS" SlideShare from AWS Summit 2014.
I don't know what else to say since I'm not an expert! Good luck!
In my case it was the number of records. I was writing only 30 records per minute and had write IOPS of roughly the same, 20 to 30. But this was eating at the CPU, which reduced the CPU credits quite steeply. So I moved all the data in that table to another "historic" table and cleared all data from the original table.
CPU dropped back down to normal levels, but write IOPS stayed about the same, which was fine. The problem: indexes. I think that because so many records needed to be indexed on each insert, the indexing took far more CPU with that number of rows, even though the only index I had was the primary key.
Moral of my story: the problem is not always where you think it lies. Although I saw increased write IOPS, that was not the root cause; rather, it was the CPU being used for index maintenance on insert, which caused the CPU credits to fall.
Not even X-Ray on the Lambda could catch an increased query time. That is when I started to look at the DB directly.
Your queue depth graph shows > 2, which clearly indicates that the IOPS are under-provisioned. (If queue depth is < 2, then IOPS are over-provisioned.)
I think you have used the default AUTOCOMMIT = 1 (autocommit mode). It performs a log flush to disk for every insert, which exhausts the IOPS.
So, for performance tuning, it is better to set AUTOCOMMIT = 0 before bulk inserts of data in MySQL, with the insert batches structured like this:
SET AUTOCOMMIT = 0;
START TRANSACTION;
-- first 10000 recs
INSERT INTO SomeTable (column1, column2) VALUES (vala1,valb1),(vala2,valb2) ... (vala10000,valb10000);
COMMIT;
-- next 10000 recs
START TRANSACTION;
INSERT INTO SomeTable (column1, column2) VALUES (vala10001,valb10001),(vala10002,valb10002) ... (vala20000,valb20000);
COMMIT;
-- next 10000 recs
...
SET AUTOCOMMIT = 1;
Using the above approach on a t2.micro, I inserted 300,000 records in 15 minutes using PHP.
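For reference, here is a rough script-level version of the same chunked-commit idea (the answer above used PHP; this sketch assumes mysql-connector-python, and load_rows(), the table name, and the connection details are placeholders):
import mysql.connector

conn = mysql.connector.connect(host='my-rds-endpoint', user='loader',
                               password='secret', database='reporting')
conn.autocommit = False             # one log flush per COMMIT instead of per INSERT
cur = conn.cursor()

BATCH = 10000
rows = load_rows()                  # hypothetical helper returning (column1, column2) tuples
for i in range(0, len(rows), BATCH):
    cur.executemany('INSERT INTO SomeTable (column1, column2) VALUES (%s, %s)',
                    rows[i:i + BATCH])
    conn.commit()                   # flush once per 10,000-row chunk
conn.autocommit = True
conn.close()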

OLEDB Source Task Failing

I have a Data Flow task that is supposed to extract around 18 million records and, after performing some transformations on them, insert them into another OLEDB Destination.
The problem can be seen in the screenshot below.
The errors that I receive are like:
Information: The buffer manager has allocated 65536 bytes, even though
the memory pressure has been detected and repeated attempts to swap
buffers have failed.
I tried changing DefaultBufferMaxRows from its original value of 10000 to 100000 and even 150000, but it didn't work out; increasing the number led to even fewer records coming through the source (3 million and 1 million respectively, as opposed to 8 million when the value was 10000).
I would appreciate it if someone could help me out with this.
For all those stumbling upon this question later on,
I solved it by increasing DefaultBufferSize in the Data Flow task's properties from 10485760 to 104857600.
The package took around 10 hours to complete but it completed without any glitches, although there were a few warnings regarding full buffers.

Abnormally high MySQL writes and larger than normal Binary Log files. How can I determine what caused this?

We have a MySQL master database which replicates to a MySQL slave. We were experiencing issues where MySQL was showing a high number of writes (but not an increased number of queries being run) for a short period of time (a few hours). We are trying to investigate the cause.
Normally our binary logs are 1 GB in file size, but during the period when we were experiencing these issues, the log files jumped to 8.5 GB.
When I run mysqlbinlog --short-form BINARYLOG.0000 on one of the 8.5 GB binary logs, it only returns 196 KB of queries and data. When I run mysqlbinlog --short-form on a normal binary log (1 GB), it returns around 8,500 KB worth of queries and database activity. That doesn't make sense: the larger file has 7 GB more data, yet it returns less output than the 1 GB binary log file.
I see lots of these statements with very sequential timestamps, but I'm not sure whether they are related to the problem, because they appear both in the normal period and in the period when we experienced these issues.
SET TIMESTAMP=1391452372/*!*/;COMMIT/*!*/;
SET TIMESTAMP=1391452372/*!*/;BEGIN/*!*/;COMMIT/*!*/;
SET TIMESTAMP=1391452372/*!*/;BEGIN/*!*/;COMMIT/*!*/;
SET TIMESTAMP=1391452372/*!*/;BEGIN/*!*/;COMMIT/*!*/;
How can I determine what caused those binary logs to balloon in size, which also caused writes so high that they took the server offline at points, almost like a DDoS attack would?
How could mysqlbinlog return so much less data, even though the binary log file itself had 7 GB more? What can I do to identify the difference between a normal period, where the binary logs are 1 GB, and the period where we had issues with the 8.5 GB binary log? Thank you for any help you can provide.
Bill
I would guess that your log contains some form of LOAD DATA [LOCAL] INFILE commands and the data files associated with them. These commands do not generate much SQL output because their data is written to a temporary file by mysqlbinlog during processing. Can you check whether the output contains any such LOAD DATA commands?
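One quick, if crude, way to check is to stream the mysqlbinlog output and count LOAD DATA statements; the filename below is a placeholder for one of the 8.5 GB logs:
import subprocess

proc = subprocess.Popen(['mysqlbinlog', 'BINARYLOG.0000'],
                        stdout=subprocess.PIPE, text=True, errors='replace')
load_data_count = sum(1 for line in proc.stdout if 'LOAD DATA' in line.upper())
proc.wait()
print('LOAD DATA statements found:', load_data_count)
If the count is non-zero on the bloated log but not on the normal ones, the extra 7 GB is likely the inlined file data rather than ordinary statements.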

Matlab loading large dataset returns 0

I am trying to load a simple but fairly large table from a MySQL database into Matlab (the table is about 16,000,000 x 18).
The MySQL database size is 2.6 GB, and my Windows machine has 32 GB of memory, so in principle, memory should not be a problem.
I tried to load the data via a simple fetch:
curs = exec(conn,['SELECT * FROM mydb.large_table']);
curs = fetch(curs);
data = curs.Data;
I also tried to use the select function, but in both instances data is simply returned as 0.
As there are no error messages, and as it does not seem that I am even close to any Matlab or memory size restrictions, I am at a loss to understand what is going wrong.
Any help would be greatly appreciated.
[Added:]
Did some further checks:
Tried to pre-allocate memory for the table, pulling in the first row of the table and then replicating it 20 million times. No problem there.
I was able to pull in 2.5 million rows; 6 million failed again.
Watching the memory consumption (Windows Task Manager), I noticed that once the exec statement had pulled all the data from the database onto the local machine, the fetch command started to eat up memory. For 6 million rows, the memory in use first increased to the full available 32 GB, then dropped to 2 GB (where it stayed), but the committed memory went up to a staggering 125 GB!
I have absolutely no idea what is going on here.