File Autogrowth settings switched - sql-server-2008

What could be the impact of changing the default Autogrowth values for the files of a database?
I actually have a database where the Autogrowth values are switched between the data and log files.
The database properties show these values:
DB_Data (Rows Data), filegroup PRIMARY, initial size 71027 MB, Autogrowth "By 10 percent, unrestricted growth"
DB_Log (Log), filegroup Not Applicable, initial size 5011 MB, Autogrowth "By 1MB, restricted growth to 2097152 MB"

For the data file it depends on whether or not you have instant file initialisation enabled for the SQL Server service account. If you don't, you should definitely consider using a fixed growth increment, because the time a growth operation takes is proportional to the amount of space being added, and with percentage-based growth each increment gets larger as the file grows. If you grow the file in too small an increment, though, you can end up with file system fragmentation.
For the log file you should definitely consider a much larger increment than 1MB, as you will otherwise end up with VLF fragmentation. Log file growth cannot take advantage of instant file initialisation, so it should always use a fixed increment (say between 1GB and 4GB, unless you know for a fact that the log will always remain small).
Of course, in an ideal world it wouldn't actually matter what you set these to, as you should be pre-sizing the files in advance at low-traffic times rather than leaving growth to chance.
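If you do switch both files to fixed increments, the change can be scripted with ALTER DATABASE. A minimal sketch, assuming the logical file names from the question and a database called MyDatabase (both the database name and the sizes are placeholders to adjust for your environment):
ALTER DATABASE MyDatabase MODIFY FILE (NAME = DB_Data, FILEGROWTH = 512MB); -- fixed increment instead of 10 percent
ALTER DATABASE MyDatabase MODIFY FILE (NAME = DB_Log, FILEGROWTH = 1024MB); -- fixed increment instead of 1MB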

Related

How can I reduce table size in MySQL?

I have a database named "bongoTv" with lots of tables, but I found one table whose size is about 20GB even though it holds a relatively small amount of data.
After removing a few rows, the storage did not shrink. Then I ran the command
OPTIMIZE TABLE notifiation to rebuild the table. But it increased the size to 25GB.
As per my understanding of other DBMSs, this should have reduced the size, so why did it grow? I think it cached the previous information somewhere.
After searching the web I found that innodb_file_per_table=ON needs to be configured, but it is already enabled in my configuration, and it did not work.
I need an opinion from an expert who works with MySQL day to day.
In that case, what do I need to do on my end? What is the solution to this issue?
@Louis & @P.Salmon, can you help me with this?
Thanks in advance to whoever helps me with this.
In general, InnoDB tablespace files never shrink. If you delete data, it makes some space "unused" and over time InnoDB will try to reuse unused space before expanding the tablespace file further.
But there is also tablespace fragmentation. As you delete rows and leave small gaps of unused space, those small gaps may not be usable for new data. So over time, the gaps grow in number, and the tablespace uses more space than it should, if you were to store the same data as compactly as possible.
The free space that comprises full extents (contiguous 1MB areas) is shown as Data_free when you run SHOW TABLE STATUS, but smaller gaps of unused space are not shown. MySQL has no way of reporting those "crumbs" of unused space.
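You can check that figure yourself; a quick sketch, using the table name exactly as it appears in the question:
SHOW TABLE STATUS LIKE 'notifiation';
-- or, equivalently, via information_schema:
SELECT table_name, data_length, index_length, data_free
FROM information_schema.tables
WHERE table_schema = 'bongoTv' AND table_name = 'notifiation';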
When you use OPTIMIZE TABLE on an InnoDB table, it still cannot shrink the tablespace in place; it only copies the data to a new tablespace, trying to defragment the data and leave out the gaps where possible. So if there are a lot of large and small gaps in your old tablespace, the new tablespace should have a smaller total size.
However, while filling pages of the new tablespace, InnoDB deliberately leaves 1/16 of each page unused, to allow for future updates that might need just a little bit more room. So in theory, you might see OPTIMIZE TABLE cause the file to grow larger if the original was very compact and the new file was created with more "elbow room."
But that still does not account for the 20GB to 25GB change you saw. That might be because sizes are cached. That is, the old file was in fact 25GB, but the table status was not reporting it. MySQL 8.0 especially has some caching behavior on some table statistics: https://bugs.mysql.com/bug.php?id=86170
So how to reduce the table size in MySQL?
Deleting rows is the most effective way. If you don't need data to be in the database anymore, delete it. If you might need data for archival purposes but don't need to query it every day, then copy it out to some long-term archiving format, or another database instance on a large-capacity server, and then delete the data from your primary database.
Changing data types to be smaller. For example, why use a BIGINT (64-bits) when a SMALLINT (16-bits) is sufficient for the values you store? It may seem like a small change, but it adds up. Values are stored in the row, but also stored again in any indexes that include that column.
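A sketch of such a change; the column name here is made up purely for illustration, and note that this ALTER TABLE rebuilds the table:
ALTER TABLE notifiation MODIFY COLUMN retry_count SMALLINT UNSIGNED NOT NULL; -- 2 bytes per value instead of 8 for BIGINT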
Using compression. The best results are with columns that store readable text. The amount of compression depends on the nature of the data, but don't count on it too much: at best you can expect about a 2:1 compression ratio, and often not even that.
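A minimal sketch of enabling InnoDB table compression, assuming innodb_file_per_table is on (as it already is in your configuration):
ALTER TABLE notifiation ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8; -- compresses 16KB pages into 8KB blocks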
Ultimately, databases tend to grow larger, and often the rate of growth even accelerates. If you accumulate a lot of data and never delete or archive it, you must have a strategy to support the growth. You may simply have to get larger and larger storage volumes.

Free up space in MySQL 5.6.20 - InnoDB

First off, I'm not a DB guy. Here is the problem: the data drive for the database is 96% full. In my.cnf there is a line with the following (only showing part due to space):
innodb_data_file_path=nmsdata1:4000M;nmsdata2:4000M;
going up to
nmsdata18:4000M:autoextend
So in the folder where the files are stored, files 1-17 are 4GB in size; file 18 is 136GB as of today.
I inherited the system and it has no vendor support or much documentation.
I can see there are a few tables that are really large
Table_name NumRows   Data_length (bytes)
---------- --------- -------------------
pmdata 100964536 14199980032
fault 310864227 63437946880
event 385910821 107896160256
I know there are a ton of writes happening, and there should be a cron job that tells it to keep only the last 3 months of data, but I am concerned the DB is fragmented and not releasing space back for reuse.
So my task is to free up space in the DB so the drive does not fill up.
This is a weakness of InnoDB: tablespaces never shrink. They grow, and even if you "defragment" the tables, the data just gets written internally to another part of the tablespace, leaving more of the tablespace "free" for use by other data, but the size of the file on disk does not shrink.
Even if you DROP TABLE, that doesn't free space to the drive.
This has been a sore point for InnoDB for a long time: https://bugs.mysql.com/bug.php?id=1341 (reported circa 2003).
The workaround is to use innodb_file_per_table=1 in your configuration, so each table has its own tablespace. Then when you use OPTIMIZE TABLE <tablename> it defragments by copying data to a new tablespace, in a more efficient, compact internal layout, and then drops the fragmented one.
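In MySQL 5.6 that setting is a dynamic global variable, so a sketch of the workaround looks like this (using one of the table names from your list):
SET GLOBAL innodb_file_per_table = 1; -- also add it to my.cnf so it survives restarts
OPTIMIZE TABLE pmdata; -- rebuilds the table into its own .ibd tablespace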
But there's a big problem with this in your case. Even if you were to optimize tables after setting innodb_file_per_table=1, their data would be copied into new tablespaces, but that still wouldn't shrink or drop the old multi-table tablespaces like your nmsdata1 through 18. They would still be huge, but "empty."
What I'm saying is that you're screwed. There is no way to shrink these tablespaces, and since you're full up on disk space, there's no way to refactor them either.
Here's what I would do: Build a new MySQL Server. Make sure innodb_file_per_table=1 is configured. Also configure the default for the data file path: innodb_data_file_path=ibdata1:12M:autoextend. That will make the central tablespace small from the start. We'll avoid expanding it with data.
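In my.cnf on the new server, those two settings would look something like this (a sketch, not a complete configuration):
[mysqld]
innodb_file_per_table=1
innodb_data_file_path=ibdata1:12M:autoextend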
Then export a dump of your current database server, all of it, and import that into your new MySQL server. The import will obey the file-per-table setting, creating and filling a new tablespace for each table.
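A minimal sketch of the dump and import, with placeholder host names and credentials omitted:
mysqldump --host=old-server --all-databases --single-transaction --routines --events > full_dump.sql
mysql --host=new-server < full_dump.sql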
This is also an opportunity to build the new server with larger storage, given what you know about the data growth.
It will take a long time to import so much data. How long depends on your server performance specifications, but it will take many hours at least. Perhaps days. This is a problem if your original database is still taking traffic while you're importing.
The solution to that is to use replication, so your new server can "catch up" from the point where you created the dump to the current state of the database. This procedure is documented, but it may be quite a learning curve for someone who is not a database pro, as you said: https://dev.mysql.com/doc/refman/8.0/en/replication-howto.html
You should probably get a consultant who knows how to do this work.

Improve performance of mysql LOAD DATA / mysqlimport?

I'm bulk loading a 15GB CSV (30 million rows) into a MySQL 8 database.
Problem: the task takes about 20 minutes, with an approximate throughput of 15-20 MB/s, while the hard drive is capable of transferring files at 150 MB/s.
I have a 20GB RAM disk which holds my CSV. I import as follows:
mysqlimport --user="root" --password="pass" --local --use-threads=8 mytable /tmp/mydata.csv
This uses LOAD DATA under the hood.
My target table does not have any indexes, but approx 100 columns (I cannot change this).
What is strange: I tried tweaking several config parameters in /etc/mysql/my.cnf as follows, but they did not give any significant improvement:
log_bin=OFF
skip-log-bin
innodb_buffer_pool_size=20G
tmp_table_size=20G
max_heap_table_size=20G
innodb_log_buffer_size=4M
innodb_flush_log_at_trx_commit=2
innodb_doublewrite=0
innodb_autoinc_lock_mode=2
Question: does LOAD DATA / mysqlimport respect those config changes, or does it bypass them? Or did I even use the correct configuration file?
At least querying the variables shows they were correctly loaded by the MySQL server. For example, SHOW VARIABLES LIKE 'innodb_doublewrite' shows OFF.
Anyway, how could I improve the import speed further? Or is my database the bottleneck, and there is no way to overcome the 15-20 MB/s threshold?
Update:
Interestingly, if I import my CSV from the hard drive instead of the RAM disk, performance is almost the same (the RAM disk is just a little bit better, but never over 25 MB/s). I also tested the same number of rows but with only a few (5) columns, and there I'm getting about 80 MB/s. So clearly the number of columns is the bottleneck? But why do more columns slow down this process?
The MySQL/MariaDB engine has little parallelization when doing bulk inserts: it can only use one CPU core per LOAD DATA statement. If you monitor CPU utilization during the load, you will probably see one core fully utilized, and it can produce only so much output data, leaving disk throughput underutilized.
The most recent version of MySQL has new parallel load feature: https://dev.mysql.com/doc/mysql-shell/8.0/en/mysql-shell-utilities-parallel-table.html . It looks promising but probably hasn't received much feedback yet. I'm not sure it would help in your case.
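For reference, that utility runs from MySQL Shell rather than the mysql client. A hedged sketch, reusing the file and table names from your question (the schema name is a placeholder; check the linked docs for the exact options in your shell version):
// inside mysqlsh (JavaScript mode), connected to the server, with local_infile enabled:
util.importTable("/tmp/mydata.csv", {
    schema: "mydb",      // placeholder for your schema name
    table: "mytable",
    dialect: "csv-unix", // parse the file as unix-style CSV
    threads: 8           // load chunks in parallel
});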
I saw various checklists on the internet that recommended higher values for the following config params: log_buffer_size, log_file_size, write_io_threads, bulk_insert_buffer_size. But the benefits were not very pronounced when I performed comparison tests (maybe 10-20% faster than just having innodb_buffer_pool_size large enough).
This could be normal. Let's walk through what is being done:
The csv file is being read from a RAM disk, so no IOPs are being used.
Are you using InnoDB? If so, the data is going into the buffer_pool. As blocks are being built there, they are being marked 'dirty' for eventual flushing to disk.
Since the buffer_pool is large, but probably not as large as the table will become, some of the blocks will need to be flushed before it finishes reading all the data.
After all the data is read, and the table is finished, the dirty blocks will gradually be flushed to disk.
If you had non-unique indexes, they would similarly be written to disk in a delayed manner (cf. 'change buffering'); the change buffer by default occupies 25% of the buffer_pool.
How large is the resulting table? It may be significantly larger, or even smaller, than the 15GB of the csv file.
How much time did it take to bring the csv file into the ram disk? I proffer that that was wasted time and it should have been read from disk while doing the LOAD DATA; that I/O can be overlapped.
Please run SHOW GLOBAL VARIABLES LIKE 'innodb%'; there are several other settings that may be relevant.
More
These are terrible:
tmp_table_size=20G
max_heap_table_size=20G
If you have a complex query, 20GB could be allocated in RAM, possibly multiple times! Keep those to under 1% of RAM.
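For example, on a machine with 64GB of RAM (a size assumed purely for illustration), that guideline works out to something like:
tmp_table_size=512M
max_heap_table_size=512M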
If copying the csv from hard disk to ram disk runs slowly, I would suspect the validity of 150 MB/s.
If you are loading the table once every 6 hours, and it takes 1/3 of an hour to perform, I don't see the urgency of making it faster. OTOH, there may be something worth looking into. If that 20 minutes is downtime due to the table being locked, that can be easily eliminated:
CREATE TABLE t LIKE real_table;
LOAD DATA INFILE ... INTO TABLE t; -- not blocking anyone
RENAME TABLE real_table TO old, t TO real_table; -- atomic; fast
DROP TABLE old;

Does 'Rows per batch' in SSIS OLE DB destination help reduce locking?

There is an option called 'Rows per batch' in the OLE DB Destination which, when specified, pulls a certain number of rows from the source per batch; otherwise, it pulls all of the source rows in one batch.
Question: If my source and/or target server is a highly OLTP database, will setting a low number for this parameter (e.g. 10k or 50k) help reduce the chance of lock escalation, so that the loading process has minimal impact on either database?
"Rows per batch" is actually more for tuning your data flow. By calculating the maximum width of a row, in bytes, and then dividing the default buffer size (the default is 10MB), you will get the number of rows you can insert in one "batch" without spilling the data out to tempdb or your buffer disk (depending if you set a specific location for you temp buffer). While keeping your data flow completely in memory and not needing to spill to disk, you'll keep your data transfer moving as quickly as possible.
The "Table Lock" option in the OLE DB Destination is what tells the server to lock the table or not.
In general the answer is: yes.
It also depends on the processing speed of the rows and the overhead per batch.
If your transaction with all the rows in a batch takes too long, consider splitting it up. But splitting it into batches that are too small can also cause performance problems.
The best way is to test and find the sweet spot.

Neo4j batch insertion with .CSV files taking a huge amount of time to sort & index

I'm trying to create a database with data collected from Google n-grams. It's actually a lot of data, but after the creation of the CSV files the insertion was pretty fast. The problem is that, immediately after the insertion, the neo4j-import tool indexes the data, and this step is taking too much time. It's been more than an hour and it looks like it has achieved 10% progress.
Nodes
[*>:9.85 MB/s---------------|PROPERTIES(2)====|NODE:198.36 MB--|LABE|v:22.63 MB/s-------------] 25M
Done in 4m 54s 828ms
Prepare node index
[*SORT:295.94 MB-------------------------------------------------------------------------------] 26M
This is the console info at the moment. Does anyone have a suggestion on how to speed up this process?
Thank you. (:
Indexing takes a long time depending on the number of nodes. I tried indexing with 10 million nodes and it took around 35 minutes, but you can still try these settings:
Increase your page cache size, which is configured in the '/var/lib/neo4j/conf/neo4j.properties' file (on my Ubuntu system). Edit the following line:
dbms.pagecache.memory=4g
according to your RAM; here, 4g means 4GB of space. You can also try changing the Java memory size, which is configured in neo4j-wrapper.conf:
wrapper.java.initmemory=1024
wrapper.java.maxmemory=1024
You can also read the Neo4j documentation on this: http://neo4j.com/docs/stable/configuration-io-examples.html