I am having trouble understanding the behavior of MySQL Cluster.
I have one table:
38 fields total
22 of the fields are indexed (field type: int)
The other fields hold double and bigint values
The table does not have a primary key defined
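For reference, a minimal sketch of that layout as an NDB table (the column names here are placeholders, and only a few of the 38 columns are shown):

-- Sketch only, not the real schema; column names are invented for illustration.
CREATE TABLE test_table (
    idx_col01 INT,            -- 22 such INT columns, each with its own index
    idx_col02 INT,
    val_col01 DOUBLE,
    val_col02 BIGINT,
    -- ... remaining columns omitted ...
    INDEX (idx_col01),
    INDEX (idx_col02)
) ENGINE=NDBCLUSTER;          -- no explicit PRIMARY KEY defined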
My Environment (10 nodes):
data nodes: 8 (AWS EC2 instances, m4.xlarge 16GB RAM, 750GB HDD)
management nodes: 2 (AWS EC2 instances, m4.2xlarge 32GB RAM)
sql nodes: 2 (on the same VMs as the management nodes)
My MySQL Cluster settings (config.ini) are:
[NDBD DEFAULT]
NoOfReplicas=2
ServerPort=2200
Datadir=/storage/data/mysqlcluster/
FileSystemPathDD=/storage/data/mysqlcluster/
BackupDataDir=/storage/data/mysqlcluster//backup/
#FileSystemPathUndoFiles=/storage/data/mysqlcluster/
#FileSystemPathDataFiles=/storage/data/mysqlcluster/
DataMemory=9970M
IndexMemory=1247M
LockPagesInMainMemory=1
MaxNoOfConcurrentOperations=100000
MaxNoOfConcurrentTransactions=16384
StringMemory=25
MaxNoOfTables=4096
MaxNoOfOrderedIndexes=2048
MaxNoOfUniqueHashIndexes=512
MaxNoOfAttributes=24576
MaxNoOfTriggers=14336
### Params for REDO LOG
FragmentLogFileSize=256M
InitFragmentLogFiles=SPARSE
NoOfFragmentLogFiles=39
RedoBuffer=64M
TransactionBufferMemory=8M
TimeBetweenGlobalCheckpoints=1000
TimeBetweenEpochs=100
TimeBetweenEpochsTimeout=0
### Params for LCP
MinDiskWriteSpeed=10M
MaxDiskWriteSpeed=20M
MaxDiskWriteSpeedOtherNodeRestart=50M
MaxDiskWriteSpeedOwnRestart=200M
TimeBetweenLocalCheckpoints=20
### Heartbeating
HeartbeatIntervalDbDb=15000
HeartbeatIntervalDbApi=15000
### Params for setting logging
MemReportFrequency=30
BackupReportFrequency=10
LogLevelStartup=15
LogLevelShutdown=15
LogLevelCheckpoint=8
LogLevelNodeRestart=15
### Params for BACKUP
BackupMaxWriteSize=1M
BackupDataBufferSize=24M
BackupLogBufferSize=16M
BackupMemory=40M
### Params for ODIRECT
# Reports indicate that ODirect=1 can cause I/O errors (OS error code 5) on some systems, so test before enabling it.
#ODirect=1
### Watchdog
TimeBetweenWatchdogCheckInitial=60000
### TransactionInactiveTimeout - should be enabled in Production
TransactionInactiveTimeout=60000
### New 7.1.10 redo logging parameters
RedoOverCommitCounter=3
RedoOverCommitLimit=20
### REALTIME EXTENSIONS
#RealTimeScheduler=1
### REALTIME EXTENSIONS FOR 6.3 ONLY
#SchedulerExecutionTimer=80
#SchedulerSpinTimer=40
### DISK DATA
SharedGlobalMemory=20M
DiskPageBufferMemory=64M
BatchSizePerLocalScan=512
After importing 75M records into my table I get the error "The table 'test_table' is full" and cannot import any more data into the table.
I don't understand why this is so.
Looking at information_schema I can see that avg_record_size is 244, and the full table size is ~19 GB.
If I look at the DataMemory used on each data node I see ~94%.
IndexMemory used is ~22%.
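For reference, one way to read these per-node numbers (a sketch, assuming MySQL Cluster 7.1+ where the ndbinfo database is available) is either from the management client:

ndb_mgm -e "ALL REPORT MEMORYUSAGE"

or from one of the SQL nodes:

SELECT node_id, memory_type, used, total
FROM ndbinfo.memoryusage;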
But I have 8 data nodes with a total DataMemory of 8 × 9970 MB ≈ 80 GB.
My table is only 19 GB, so even with replicas the memory used should be 19 × 2 = 38 GB.
Could somebody explain what the situation is, and how I can configure the cluster to import the maximum possible number of records?
The full table in production will have 33 billion records.
For tests on the given cluster I need to test with 100M and 1B row data sets.
Thanks.
I'm using the MariaDB C connector with prepare, bind and execute. It usually works, but one case ends up in "corrupted unsorted chunks" and a core dump when freeing the bind buffer. I suspect the whole malloc organisation is messed up after calling mysql_stmt_execute(). My test program MysqlDynamic.c shows:
The problem is connected only to the x509cert variable bound by bnd[9].
Freeing memory fails only if bnd[9].is_null = 0; if it is null, execute ends normally.
Freeing memory (using FreeStmt()) after bind and before execute ends normally.
Printing bnd[9].buffer before execute shows the (void*) points to the correct string buffer.
The behavior is the same whether bnd[9].buffer_length is set to STMT_INDICATOR_NTS or to strlen().
Other similar bindings (picture, bnd[10]) do not lead to corrupted memory and a core dump.
I defined a C structure with test data in my test program MysqlDynamic.c, which is bound via the MYSQL_BIND structure.
The bindings for x509cert (string buffer), see bindInsTest():
bnd[9].buffer_type = MYSQL_TYPE_STRING;
bnd[9].buffer_length = STMT_INDICATOR_NTS;
bnd[9].is_null = &para->x509certI;
bnd[9].buffer = (void*) para->x509cert;
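For comparison, a conventional NUL-terminated string binding with an explicit length looks roughly like this (just a sketch, not code taken from MysqlDynamic.c; cert_len is a local variable introduced here for illustration):

unsigned long cert_len = strlen(para->x509cert);   /* illustration only */

bnd[9].buffer_type   = MYSQL_TYPE_STRING;
bnd[9].buffer        = (void *) para->x509cert;
bnd[9].buffer_length = cert_len;       /* size of the data in the buffer */
bnd[9].length        = &cert_len;      /* pointer to the actual length in bytes */
bnd[9].is_null       = &para->x509certI;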
Please get the details from the source file MysqlDynamic.c. Please adapt the defines in the source to your environment, verify the content, and run it. You will find compile info in the source code. MysqlDynamic -c will create the table, MysqlDynamic -i will insert 3 records each run, and MysqlDynamic -d will drop the table again.
MysqlDynamic -vc shows:
session set autocommit to <0>
connection id: 175
mariadb server ver:<100408>, client ver:<100408>
connected on localhost to db test by testA
>> if program get stuck - table is locked
table t_test created
mysql connection closed
pgm ended normaly
MysqlDynamic -i shows:
ins2: BufPara <92> name<master> stamp<> epoch<1651313806000>
cert is cert<(nil)> buf<(nil)> null<1>
picure is pic<0x5596a0f0c220> buf<0x5596a0f0c220> null<0> length<172>
ins1: BufPara <91> name<> stamp<2020-04-30> epoch<1650707701123>
cert is cert<0x5596a0f181d0> buf<0x5596a0f181d0> null<0>
picure is pic<(nil)> buf<(nil)> null<1> length<0>
ins0: BufPara <90> name<gugus> stamp<1988-10-12T18:43:36> epoch<922337203685477580>
cert is cert<(nil)> buf<(nil)> null<1>
picure is pic<(nil)> buf<(nil)> null<1> length<0>
free(): corrupted unsorted chunks
Aborted (core dumped)
Checking the t_test table content shows that all records are inserted as expected.
You can disable loading of x509cert and/or picture by commenting out the defines on lines 57/58; the program then ends normally. You can also comment out line 208; the buffers are then indicated as NULL.
Questions:
Is there a generic coding mistake in the program causing this behavior?
Can you run the program in your environment without a core dump? I'm currently using version 10.04.08.
Any improvement to the code will be welcome.
I am reading binlogs with Debezium, but when I start a new reading thread it reads all the create statements for the table from the beginning, and I don't need them (op=c). I need to handle only the create/update/delete events that happen after I run the code for the first time, and then work with the correct offset (stored in the file "tmp/offsets.dat"). How can I set up the initial configuration this way? The flow needs to be the following:
start reading (first time) -> take the current (latest) position from the binlog, save it, work from there, and handle only the newest events
start reading (not the first run) -> take the latest position from the file and read data as usual
Here is my current configuration:
config = Configuration.empty().withSystemProperties(Function.identity()).edit()
.with(MySqlConnectorConfig.SERVER_NAME, SERVER_NAME)
.with(MySqlConnectorConfig.SKIPPED_OPERATIONS, "r")
.with(MySqlConnectorConfig.HOSTNAME, HOSTNAME)
.with(MySqlConnectorConfig.PORT, PORT)
.with(MySqlConnectorConfig.USER, USER)
.with(MySqlConnectorConfig.PASSWORD, PASSWORD)
.with(MySqlConnectorConfig.TABLE_WHITELIST, TABLE_WHITELIST)
.with(MySqlConnectorConfig.SERVER_ID, 100)
//
.with(EmbeddedEngine.OFFSET_STORAGE, "org.apache.kafka.connect.storage.FileOffsetBackingStore")
.with(EmbeddedEngine.OFFSET_STORAGE_FILE_FILENAME, "tmp/offsets.dat")
.with(EmbeddedEngine.CONNECTOR_CLASS, "io.debezium.connector.mysql.MySqlConnector")
.with(EmbeddedEngine.ENGINE_NAME, SERVER_NAME)
//
.with(MySqlConnectorConfig.DATABASE_HISTORY, "io.debezium.relational.history.FileDatabaseHistory")
.with("database.history.file.filename", "tmp/dbhistory.dat")
// Send JSON without schema
.with("schemas.enable", false)
.build();
and the my.cnf values for binlogs:
[mysqld]
log-bin=mysql-bin.log
server_id=100
binlog_row_image=full
binlog-format=row
expire_logs_days =10
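One direction that might give this flow (a sketch, assuming the connector version in use supports the snapshot.mode option and its "schema_only" value, which captures only the table structure and then streams from the current binlog position) would be to add to the builder chain above:

// Hypothetical: skip the initial data snapshot so existing rows are not emitted;
// only events produced after the first start are handled, and offsets are still
// stored in tmp/offsets.dat as configured above.
.with(MySqlConnectorConfig.SNAPSHOT_MODE, "schema_only")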
Below is my test plan to read data from multiple CSV files. I want to test a scenario like:
1. 10 users perform an operation on 100 documents. Ideally each user should get 10 documents and perform the operation on them.
TestPlan
Thread Group
While controller
LoginUserDataConfig
LoginRequestRecordingController
HTTPLoginRequest
DocumentOperationRecordingController
DocIDList
HttpSaveRequest
But with the above plan it takes only 10 documents and stops the process. I ran the script after changing the CSV Data Set Config Sharing Mode setting to All Threads / Current Thread, but I am not getting the desired output.
Can anyone correct my test plan?
Thread Settings:
Number of Thread: 10
Ramp-Up Period: 2
loop count: 1
LoginUserDataConfig Settings:
Allowed Quoted Data: False
Recycle on EOF? False
Stop Thread on EOF: True
Sharing mode: Current Thread Group
DocIDList Settings:
Allowed Quoted Data: False
Recycle on EOF? False
Stop Thread on EOF: True
Sharing mode: Current Thread Group
You should set the loop count to Forever, and it will continue until the end of the CSV file (100 IDs) is reached.
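In other words (a sketch of the settings implied by this answer; everything else in the plan stays as posted):

Thread Group:
  Number of Threads: 10
  Loop Count: Forever
DocIDList (CSV Data Set Config):
  Recycle on EOF: False
  Stop Thread on EOF: True

With Stop Thread on EOF left at True, each thread keeps iterating until the shared CSV file runs out of lines, so the 100 document IDs are spread across the 10 threads (roughly 10 each).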
I have a functional LMDB that, for test purposes, currently contains only 21 key / value records. I've successfully tested inserting and reading records, and I'm comfortable with the database working as intended.
However, when I use the mdb_stat and mdb_dump utilities, I see the following output, respectively:
Status of Main DB
Tree depth: 1
Branch pages: 0
Leaf pages: 1
Overflow pages: 0
Entries: 1
VERSION=3
format=bytevalue
type=btree
mapsize=1073741824
maxreaders=126
db_pagesize=4096
HEADER=END
4d65737361676573
000000000000010000000000000000000100000000000000d81e0000000000001500000000000000ba1d000000000000
DATA=END
In particular, why would mdb_stat indicate only one entry when I have 21? Moreover, each entry comprises 1024 x 300 values of five bytes per value. mdb_dump obviously doesn't show anywhere near the 1,536,000 bytes I'd expect to see, yet the values I mdb_put() and mdb_get() on the fly are correct. Anyone know what's going on?
The relationship between an operating system's directory and an LMDB environment's data.mdb and lock.mdb files is one-to-one.
If the LMDB environment (in the OS directory) has more than one database, then the environment also contains a separate LMDB database containing all of its named databases.
The mdb_stat and mdb_dump utilities appear to contain minimal logic: when they are fed a given directory via the command line, they produce results only for the main database that stores the database names, not for the database(s) storing the actual data of interest.
4d65737361676573 is the ASCII for "Messages", which is the name of the table ("sub-db" in LMDB terminology) storing the actual data in your case.
The mdb_dump command only dumps the main db by default. You can use the -s option to dump that sub-db, i.e.
mdb_dump -s Messages
or you can use the -a option to dump all the sub-dbs.
Since you are using a sub-database, the number of entries in the main database corresponds to the number of sub-databases you've created (i.e. just 1).
Try using mdb_stat -a. This will show you a break-down of all the sub-databases (as well as the main DB). In this breakdown it will list the number of entries for each sub-database. Here you should see your 21 entries.
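For example (the environment path ./testdb here is just a placeholder):

mdb_stat -a ./testdb            # statistics for the main DB and every sub-database
mdb_dump -s Messages ./testdb   # dump only the "Messages" sub-database
mdb_dump -a ./testdb            # dump all sub-databases as well as the main DB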
I am using comma-separated value files to create nodes and edges in a Neo4j database. The commands which create nodes run with no issue. The attempt to create edges fails with this error:
Exception in thread "GC-Monitor" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "GC-Monitor"
Further, in the output from the commands there was this:
neo4j-sh (?)$ using periodic commit 400 load csv with headers from 'file://localhost/tmp/vm2set3.csv' as line match (u:VM {id: line.vm_id}),(s:VNIC {id: line.set3_id}) create (u)-[:VNIC]->(s);
SystemException: Kernel has encountered some problem, please perform neccesary action (tx recovery/restart)
neo4j-sh (?)$
SystemException: Kernel has encountered some problem, please perform neccesary action (tx recovery/restart)
neo4j-sh (?)$ using periodic commit 400 load csv with headers from 'file://localhost/tmp/unix2switch.csv' as line match (u:UNIX {id: line.intf_id}),(s:switch {id: line.set2a_id}) create (u)-[:cable]->(s);
SystemException: Kernel has encountered some problem, please perform neccesary action (tx recovery/restart)
neo4j-sh (?)$
My shell script is:
cat /home/ES2Neo/2.1/neo4j_commands.cql | /export/neo4j-community-2.1.4/bin/neo4j-shell -path /export/neo4j-community-2.1.4/data/graph.db > /tmp/na.out
The commands are like this:
load csv WITH HEADERS from 'file://localhost/tmp/intf.csv' AS line CREATE (:UNIX {id: line.id, MAC: line.MAC ,BIA: line.BIA ,host: line.host,name: line.name});
for nodes, and
using periodic commit 400 load csv with headers from 'file://localhost/tmp/unix2switch.csv' as line match (u:UNIX {id: line.intf_id}),(s:switch {id: line.set2a_id}) create (u)-[:cable]->(s);
for edges.
The CSV input files look like this:
"intf_id","set2a_id"
"100321","6724919"
"125850","6717849"
"158249","6081895"
"51329","5565380"
"57248","6680663"
"235196","6094139"
"229242","4800249"
"225630","6661742"
"183281","4760022"
Is there something I am doing wrong? Is there something in the Neo4j configuration I need to check? Thanks.
The problem is that you're running out of memory for loading the data into the database.
Take a look at this blog post which goes into a number of details about how to load CSV data in successfully.
In particular, here's the key bit from the blog post you should pay attention to.
The more memory you have the faster it will import your data.
So make sure to edit conf/neo4j-wrapper.conf and set:
# Uncomment these lines to set specific initial and maximum
# heap size in MB.
wrapper.java.initmemory=4096
wrapper.java.maxmemory=4096
In conf/neo4j.properties set:
# Default values for the low-level graph engine
neostore.nodestore.db.mapped_memory=50M
neostore.relationshipstore.db.mapped_memory=500M
neostore.propertystore.db.mapped_memory=100M
neostore.propertystore.db.strings.mapped_memory=100M
neostore.propertystore.db.arrays.mapped_memory=0M
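As a side note (my own suggestion, not something quoted from the blog post): the MATCH in the relationship queries looks up nodes by the id property for every CSV line, so creating indexes on those label/property combinations before loading the edges makes the lookups much faster, e.g.:

CREATE INDEX ON :UNIX(id);
CREATE INDEX ON :switch(id);
CREATE INDEX ON :VM(id);
CREATE INDEX ON :VNIC(id);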