OLE DB Source Task Failing - MySQL

I have a Data Flow Task that is supposed to extract around 18 million records and, after performing some transformations on them, insert them into an OLE DB destination.
The errors that I receive are like:
Information: The buffer manager has allocated 65536 bytes, even though
the memory pressure has been detected and repeated attempts to swap
buffers have failed.
I tried changing DefaultBufferMaxRows from its original value of 10,000 to 100,000 and even 150,000, but it didn't work out; increasing the value actually led to even fewer records coming through the source (3 million and 1 million respectively, as opposed to 8 million when the value was at 10,000).
I would appreciate it if someone could help me out with this.

For all those stumbling upon this question later on:
I solved it by increasing DefaultBufferSize in the Data Flow Task's properties from 10485760 (10 MB) to 104857600 (100 MB).
The package took around 10 hours to complete, but it finished without any glitches, although there were a few warnings about full buffers.
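If you would rather not edit the package itself, the same property can be overridden at execution time with dtexec's /SET option. A minimal sketch, assuming a package file and a Data Flow Task named as below (both names are hypothetical):

dtexec /F "MyPackage.dtsx" /SET "\Package\Data Flow Task.Properties[DefaultBufferSize]";104857600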


How to use Flush Command in MySQL Effectively?

We are facing a problem: our MySQL 8.0 DB instance (production environment) is continuously showing an alert that the number of open tables is equal to the table_open_cache value. The number of open tables increased by more than 43,200 over a 24-hour observation period, which brings the total count of open tables to 2,845,063.
Please help me figure out how to reduce this. If I go for a FLUSH TABLES command with READ ONLY or WITH READ LOCK, will it cause any data loss or performance issues? I have to implement this on my production database. Is it good practice to run FLUSH TABLES manually once a day?
I am attaching an image for reference:
[screenshot of the table-cache counters: 43.2K vs 2.8M]
Misses/Hits is about 2% -- reasonable.
Apparently that screenshot should be talking about "opened" tables, not "open" tables. Only 4K are currently "open", limited by table_open_cache.
The image shows 43.2K vs 2.8M -- it is unclear what each means. 43.2K/24h is exactly 1 per 2 seconds. This is suspect.
2.8M openings of tables in 24 hours is high, but not necessarily "bad". (It's about the 95th percentile.)
Suggest increasing table_open_cache to 8000. What activity is going on? Perhaps you are opening a connection, performing a single operation (which involves opening one or more tables), then disconnecting? Can you cut back on the rapidity of creating connections?
Please provide SHOW GLOBAL STATUS LIKE 'Connections'; 50 per second is "high".
I await seeing Opened_tables and Uptime fetched at the 'same' time.
No, I don't think FLUSH is the answer.
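Concretely, the checks and the change suggested above look like this in SQL (8000 is the value suggested in this answer; size it to your RAM):

SHOW GLOBAL STATUS LIKE 'Opened_tables';
SHOW GLOBAL STATUS LIKE 'Uptime';
-- openings per second = Opened_tables / Uptime
SET GLOBAL table_open_cache = 8000;  -- dynamic; use SET PERSIST (8.0) or my.cnf to survive restarts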

Loading 5 million records from server A to Server B using SSIS

I'm new to SSIS and need your suggestions. I have created an SSIS package which retrieves around 5 million records from source server A and saves the data into destination server B. This process takes nearly 3 hours to complete. Is there any other way to reduce that time? I have tried increasing the buffer size, but it is still the same.
Thanks in advance.
There are many factors influencing the speed of execution, both hardware and software. Based on the structure of the database, a solution can be determined.
In a test project, I have transferred 40 million records in 30 minutes on a system with 4 GB of RAM.
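If the destination is an OLE DB Destination, the fast-load settings are usually the first thing to check. A sketch of the relevant destination properties; the values are illustrative starting points, not tuned figures:

AccessMode: OpenRowset Using FastLoad ("Table or view - fast load")
FastLoadOptions: TABLOCK,CHECK_CONSTRAINTS
FastLoadMaxInsertCommitSize: 100000 (commit in chunks rather than one huge transaction)

These interact with the data flow's DefaultBufferMaxRows/DefaultBufferSize discussed in the first question above, so tune them together.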

Neo4j batch insertion with .CSV files taking huge amount of time to sort&index

I'm trying to create a database with data collected from Google n-grams. It's actually a lot of data, but after the creation of the CSV files the insertion was pretty fast. The problem is that, immediately after the insertion, the neo4j-import tool indexes the data, and this step is taking too much time. It's been more than an hour and it looks like it has reached about 10% progress.
Nodes
[*>:9.85 MB/s---------------|PROPERTIES(2)====|NODE:198.36 MB--|LABE|v:22.63 MB/s-------------] 25M
Done in 4m 54s 828ms
Prepare node index
[*SORT:295.94 MB-------------------------------------------------------------------------------] 26M
This is the console info at the moment. Does anyone have a suggestion about what to do to speed up this process?
Thank you. (:
Indexing takes a long time depending on the number of nodes. I tried indexing with 10 million nodes and it took around 35 minutes, but you can still try these settings:
Increase your page cache size, which is stored in the '/var/lib/neo4j/conf/neo4j.properties' file (on my Ubuntu system). Edit the following line
dbms.pagecache.memory=4g
according to your RAM; here, 4g means 4 GB of space. You can also try changing the Java memory size, which is stored in neo4j-wrapper.conf:
wrapper.java.initmemory=1024
wrapper.java.maxmemory=1024
You can also read the Neo4j documentation on this: http://neo4j.com/docs/stable/configuration-io-examples.html
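For reference, a minimal neo4j-import invocation of the kind the question describes might look like this; the store path and CSV file names below are hypothetical:

neo4j-import --into /var/lib/neo4j/data/graph.db \
             --nodes ngram_nodes.csv \
             --relationships ngram_rels.csv

Note that the wrapper settings above apply to the Neo4j server process; the import tool takes its heap from the JVM it is launched with.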

Big differences in MySQL execution time: minimum 2 secs - maximum 120 secs

The Situation:
I use a (PHP) cronjob to keep my database up to date. The affected table contains about 40,000 records. Basically, the cronjob deletes all entries and inserts them again afterwards (with different values, of course). I have to do it this way because they really ALL change, because they are all interrelated.
The Problem:
Actually, everything works fine. The cronjob does its job within 1.5 to 2 seconds (again, for about 40k inserts - I think this is adequate). MOSTLY. But sometimes the query takes up to 60, 90 or even 120 seconds!
I indexed my database, and I think the query works well, given that it only needs 2 seconds most of the time. I close the connection via mysql_close();
Do you have any ideas? If you need more information please tell me.
Thanks in advance.
Edit: Well, it seems like there was no problem with the inserts; it was a complex SELECT query that caused the trouble. Still, thanks to everyone who answered!
From what I read, I conclude that your cronjob is using bulk-insert statements. If you know when the cronjob runs, I suggest you start a Database Engine Tuning Advisor session and see what other processes are running while the cronjob does its thing. A bulk insert has some restrictions on the number of fields and the number of rows inserted at once. You can read the "Performance Considerations" section of this MSDN article: http://technet.microsoft.com/en-us/library/ms188365.aspx
Performance Considerations
If the number of pages to be flushed in a
single batch exceeds an internal threshold, a full scan of the buffer
pool might occur to identify which pages to flush when the batch
commits. This full scan can hurt bulk-import performance. A likely
case of exceeding the internal threshold occurs when a large buffer
pool is combined with a slow I/O subsystem. To avoid buffer
overflows on large machines, either do not use the TABLOCK hint (which
will remove the bulk optimizations) or use a smaller batch size
(which preserves the bulk optimizations). Because computers vary, we
recommend that you test various batch sizes with your data load to
find out what works best for you.
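Applied to the rebuild in the question, the batching advice above might look like this in MySQL (the table and column names are hypothetical):

-- rebuild the table in one transaction, batching a few hundred to a few
-- thousand rows per INSERT instead of one statement per row or one
-- enormous statement
START TRANSACTION;
DELETE FROM interrelated_data;
INSERT INTO interrelated_data (id, value) VALUES
    (1, 'a'),
    (2, 'b'),
    (3, 'c');  -- ...repeat in batches until all ~40k rows are reinserted
COMMIT;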

Cassandra giving time out exception after some inserts

I am using Cassandra version 1.0.6... I have around ~1 million JSON objects of 5 KB each to be inserted into Cassandra. As the inserts go on, the memory consumption of Cassandra also goes up until it stabilizes at a certain point. After some inserts (around 2-3 lakhs, i.e. 200,000-300,000), the Ruby client gives me a "`recv_batch_mutate': CassandraThrift::TimedOutException" exception.
I have also tried inserting 1 KB JSON objects more than a million times, and that doesn't raise any exception. In that experiment I also plotted a graph of the time taken per 50,000 inserts against the number of 50,000-insert batches, and found a sharp rise in insert time after some iterations, followed by a sudden drop. This could be due to garbage collection by the JVM. But the same doesn't happen while inserting the 5 KB objects.
What may be the problem? Some of the configuration options I am using:
System:
8 GB RAM with 4 cores.
Cassandra configuration:
concurrent_writes: 64
memtable_flush_writers: 4
memtable_flush_queue_size: 8
rpc_server_type: sync
thrift_framed_transport_size_in_mb: 30
in_memory_compaction_limit_in_mb: 64
multithreaded_compaction: true
Do I need to make any changes to the configuration? Is it related to JVM heap space or to garbage collection?
You can increase the RPC timeout to a larger value in the Cassandra config file; look for rpc_timeout_in_ms. But you should really look into the connection handling of your Ruby client.
# Time to wait for a reply from other nodes before failing the command
rpc_timeout_in_ms: 10000
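On the client side, a minimal sketch of raising the timeout and adding retries: this assumes the twitter "cassandra" gem, which forwards these options to thrift_client; the option names come from thrift_client and should be checked against your client version.

require 'cassandra/1.0'
# hypothetical keyspace/host; :timeout and :retries are thrift_client options
client = Cassandra.new('MyKeyspace', '127.0.0.1:9160',
                       :timeout => 30,   # seconds to wait per request
                       :retries => 3)    # retry timed-out requests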