How to avoid Access database bloat? - ms-access

I'm programming an Access database, but I realized that its size increases dramatically as it is used, growing to hundreds of MB. After compacting it, the size came back down to 5 MB. What normally causes this increase in size, and how can I avoid it?

You can also turn off row locking. I have a process that works against a file of about 5 MB.
When a simple update runs, the file bloats to 125 MB. With row locking turned off, the file does not grow at all during the update.
So you want to disable row locking – this will MASSIVELY reduce bloating. The option you want to un-check is this one:
File->options->client settings, and then uncheck
[x] Open database by using record-level locking
Access does not have true row locking; it has what is called database page locking. So, in a roundabout way, if you turn on row locking, Access just expands every record to the size of one database page – the result is massive bloating. Try disabling the above option. (You have to exit and restart Access for this setting change to take effect.)

If you're really going from 5 MB to hundreds of MB that can be compacted back down to 5 MB, then, as others have mentioned, you're INSERTING and DELETING a lot of records. This is usually because temporary tables are being created and dropped.
Most of the time temporary tables aren't technically required and can be eliminated by either querying a query or using dynamic SQL. If you can't do this, it's probably worthwhile to create a separate temporary database that you link to.
It's important that each user gets their own copy of the temp database and that it is destroyed at either the beginning or the end of their session.
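For example, the temp table can be created directly inside that side database from Access SQL with the IN clause (the path, table, and field names below are only placeholders for illustration):
SELECT Orders.* INTO WorkTable IN 'C:\Temp\Scratch.accdb'
FROM Orders
WHERE Orders.OrderDate >= #01/01/2024#;
All of the insert/delete churn then lands in Scratch.accdb, which can simply be deleted and recreated at the start of each session instead of compacting the front-end.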

Lots of adding and deleting of records is one cause of database bloat. If this is your development db, then database bloat is unavoidable as you repeatedly compile and save your VBA project; the bloat may be far less pronounced in end-user databases.

Doing any work in an Access database will cause the size of the file to increase. I have several databases that bloat to almost 2GB in size when a morning process is running. This process inserts, updates and deletes data.
One thing that is important when working with MS Access is to run compact and repair regularly. This will shrink the size of the database.
I wouldn't worry about the DB growing to a couple of hundred MB; that is still small for Access.

Related

Free up space in MySQL 5.6.20 - InnoDB

First off, I'm not a DB guy. Here is the problem: the data drive for the database is 96% full. In my.cnf there is a line with the following (only showing part due to space):
innodb_data_file_path=nmsdata1:4000M;nmsdata2:4000M;
going up to
nmsdata18:4000M:autoextend
So, in the folder where the files are stored, files 1-17 are 4 GB in size; file 18 is 136 GB as of today.
I inherited the system, and it has no vendor support and not much documentation.
I can see there are a few tables that are really large:
Table_name NumRows Data Length
---------- ------- -----------
pmdata 100964536 14199980032
fault 310864227 63437946880
event 385910821 107896160256
I know there are a ton of writes happening, and there should be a cron job that tells it to keep only the last 3 months of data, but I am concerned the DB is fragmented and not releasing space back for reuse.
So my task is to free up space in the DB so the drive does not fill up.
This is a weakness of InnoDB: tablespaces never shrink. They grow, and even if you "defragment" the tables, the rows just get written internally to another part of the tablespace, leaving more of the tablespace "free" for use by other data, but the size of the file on disk does not shrink.
Even if you DROP TABLE, that doesn't free space to the drive.
This has been a sore point for InnoDB for a long time: https://bugs.mysql.com/bug.php?id=1341 (reported circa 2003).
The workaround is to set innodb_file_per_table=1 in your configuration, so each table gets its own tablespace. Then, when you run OPTIMIZE TABLE <tablename>, it defragments by copying the data into a new tablespace with a more efficient, compact internal layout, and then drops the fragmented one.
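For illustration only (as explained next, it will not help with the existing shared files), that workaround looks roughly like this on MySQL 5.6, using the table names from the question:
SET GLOBAL innodb_file_per_table = 1;  -- or set it in my.cnf and restart
OPTIMIZE TABLE pmdata;  -- InnoDB rebuilds the table into its own .ibd file
OPTIMIZE TABLE fault;
OPTIMIZE TABLE event;
Each rebuild needs enough free disk space to hold a full copy of the table while it runs.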
But there's a big problem with this in your case. Even if you were to optimize tables after setting innodb_file_per_table=1, their data would be copied into new tablespaces, but that still wouldn't shrink or drop the old multi-table tablespaces like your nmsdata1 through 18. They would still be huge, but "empty."
What I'm saying is that you're screwed. There is no way to shrink these tablespaces, and since you're full up on disk space, there's no way to refactor them either.
Here's what I would do: Build a new MySQL Server. Make sure innodb_file_per_table=1 is configured. Also configure the default for the data file path: innodb_data_file_path=ibdata1:12M:autoextend. That will make the central tablespace small from the start. We'll avoid expanding it with data.
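In my.cnf terms, the relevant part of the new server's configuration would look roughly like this (same values as above):
[mysqld]
innodb_file_per_table = 1
innodb_data_file_path = ibdata1:12M:autoextend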
Then export a dump of your current database server, all of it, and import that into your new MySQL server. The import will obey the file-per-table setting, creating and filling a new tablespace for each table.
This is also an opportunity to build the new server with larger storage, given what you know about the data growth.
It will take a long time to import so much data. How long depends on your server performance specifications, but it will take many hours at least. Perhaps days. This is a problem if your original database is still taking traffic while you're importing.
The solution to that is to use replication, so your new server can "catch up" from the point where you created the dump to the current state of the database. This procedure is documented, but it may be quite a learning curve for someone who is not a database pro, as you said: https://dev.mysql.com/doc/refman/8.0/en/replication-howto.html
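The gist of the catch-up step, once the dump is imported, is to point the new server at the old one and start replicating (the host, credentials, and binlog coordinates here are placeholders; the real coordinates come from the dump or from SHOW MASTER STATUS on the old server):
CHANGE MASTER TO
    MASTER_HOST = 'old-server',
    MASTER_USER = 'repl',
    MASTER_PASSWORD = 'secret',
    MASTER_LOG_FILE = 'mysql-bin.000123',
    MASTER_LOG_POS = 4;
START SLAVE;
Once SHOW SLAVE STATUS reports it has caught up, you can cut the application over to the new server.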
You should probably get a consultant who knows how to do this work.

Reducing file thrashing in MySQL for Windows

I am fairly new to MySQL. I have a database consisting of a few hundred table files. When I run a report I notice (through ProcMon) that MySQL is opening and closing the tables hundreds of thousands of times! That greatly affects performance. Is there some setting to direct MySQL to keep table files open until MySQL is shut down? Or at least to reduce the file thrashing?
Thanks.
Plan A: Don't worry about it.
Plan B: Increase table_open_cache to a few thousand. (See SHOW VARIABLES LIKE 'table_open_cache';) If that value won't stick, check the operating system to see if it is constraining things (ulimit on open files). A minimal check-and-bump sequence is sketched after Plan C.
Plan C: It is rare to see an application that needs over a hundred tables. Ponder what the application is doing. (WP, for example, uses 12 tables per user; this does not scale well.)
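A minimal sketch of Plan B (4000 is just an example value; put the final number in my.cnf so it survives a restart):
SHOW VARIABLES LIKE 'table_open_cache';
SHOW GLOBAL STATUS LIKE 'Opened_tables';  -- climbing rapidly suggests the cache is too small
SET GLOBAL table_open_cache = 4000;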

How can I put regularly accessed data into a "quick access" area in a database

Very soon I will be building a database structure that will contain 2 million rows. Generally there are no more than 200 rows queried per minute, and of those 200 it's the same 10-20 rows that are being queried repeatedly.
Given the size of the table, I'd like to "store" the queried rows somewhere so that any other end users querying those rows can get the row data "quicker". I then want a row to be served from this area for a while and put back into the main table once it's no longer in use. I believe this will make access quicker and more efficient.
Using the below schema, I'll provide an example. In this case row 1 has been accessed from the application layer. The application layer queries the "accessed" table to see if the row is there. If it is, it uses this and updates the "accessed" table with any changed data. If it isn't, it is queried from the main large table and dropped into the "accessed" table until the cron runs (say 10 minutes later) when all "accessed" data is copied into the main table and deleted from the accessed table.
http://sqlfiddle.com/#!2/d76f6/2
I'm trying to work out the following:
1) Will this show an increase in efficiency (I would imagine each query against "accessed" instead of the main table will be significantly faster)?
2) What technology should be used for the "accessed" data storage? The main table will likely be stored in MariaDB/MySQL; however, I'm happy to run the "accessed" store as flat files, SQLite, a different instance, or keep it within the same instance. I'm open to suggestions that will make this more efficient, and in theory there's no reason the application layer couldn't act as an intermediary between any technologies.
Premature optimization, and an overcomplex design to start with. What you want to implement is a most-frequently-accessed cache. However, the duty of a DBMS is precisely to do these kinds of optimizations for you: there are already caches at the disk level, the file system level, and the database level. What you are saying is that, even before having the system in place, you already know it is not going to perform as expected.
Maybe you know more than you state in your question, but on the face of it, optimization should come afterwards, guided by suitable profiling.
There are a lot of ways to cache data.
On MySQL you can use MEMORY tables, which are much faster than InnoDB or MyISAM tables (a minimal sketch follows this list).
You can use memory-based key-value stores such as Redis or memcached.
At the application layer you can cache your data to the filesystem.
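A minimal sketch of the MEMORY-table option (the table and column names are made up; note that MEMORY tables cannot hold TEXT/BLOB columns and their contents are lost on server restart):
CREATE TABLE accessed_cache (
    row_id     INT PRIMARY KEY,
    payload    VARCHAR(255),
    touched_at DATETIME
) ENGINE = MEMORY;
INSERT INTO accessed_cache (row_id, payload, touched_at)
SELECT id, payload, NOW() FROM main_table WHERE id = 12345;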

Updating large quantities of data in a production database

I have a large quantity of data in a production database that I want to update with batches of data while the table remains available to end users. The updates could be insertions of new rows or updates to existing rows. The specific table is approximately 50M rows, and the updates will be between 100k and 1M rows per "batch". What I would like to do is insert/replace with a low priority; in other words, I want the database to slowly work through the batch import without impacting the performance of other queries hitting the same disk spindles. To complicate this, the target table is heavily indexed: 8 b-tree indexes across multiple columns to facilitate various lookups, which adds quite a bit of overhead to the import.
I've thought about batching the inserts down into 1-2k record blocks and having the external script that loads the data pause for a couple of seconds between each insert, but that's really kind of hokey IMHO. Plus, during a 1M record batch, I really don't want to add 500-1000 two-second pauses, tacking 20-40 minutes onto the load time, if it's not needed. Does anyone have ideas on a better way to do this?
I've dealt with a similar scenario using InnoDB and hundreds of millions of rows. Batching with a throttling mechanism is the way to go if you want to minimize risk to end users. I'd experiment with different pause times and see what works for you. With small batches you have the benefit that you can adjust accordingly. You might find that you don't need any pause if you run this all sequentially. If your end users are using more connections then they'll naturally get more resources.
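For the batching itself, the kind of statement each chunk boils down to looks something like this (big_table, staging_batch, and the column names are placeholders for wherever your external script stages the incoming rows):
INSERT INTO big_table (id, col_a, col_b)
SELECT id, col_a, col_b
FROM staging_batch
WHERE id BETWEEN 1 AND 2000
ON DUPLICATE KEY UPDATE col_a = VALUES(col_a), col_b = VALUES(col_b);
Then advance the id range for the next chunk, pausing or checking server load in between as needed.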
If you're using MyISAM, there's a LOW_PRIORITY option for UPDATE. If you're using InnoDB with replication, be sure to check that it's not getting too far behind because of the extra load. Apparently it runs in a single thread, and that turned out to be the bottleneck for us. Consequently, we programmed our throttling mechanism to simply check how far behind replication was and pause as needed.
An INSERT DELAYED might be what you need. From the linked documentation:
Each time that delayed_insert_limit rows are written, the handler checks whether any SELECT statements are still pending. If so, it permits these to execute before continuing.
Check this link: http://dev.mysql.com/doc/refman/5.0/en/server-status-variables.html What I would do is write a script that executes your batch updates only when MySQL shows Threads_running or Connections under a certain number. Hopefully you have some sort of test server where you can determine a good threshold for either of those status variables. There are plenty of other server status variables to look at there as well. Maybe control the executions with the Innodb_data_pending_writes number? Let us know what works for you; it's an interesting question!
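The checks such a script would poll are one-liners; the thresholds themselves are something you would have to establish on that test server:
SHOW GLOBAL STATUS LIKE 'Threads_running';
SHOW GLOBAL STATUS LIKE 'Innodb_data_pending_writes';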

Is Postgres better than MySQL when one needs to add a column to a table with millions of rows?

We're having problems with MySQL. When I search around, I see many people having the same problem.
I have joined a product whose database has some tables with as many as 150 million rows. One example of our problem: one of these tables has over 30 columns, and about half of them are no longer used. When we try to remove or rename columns, MySQL wants to copy the entire table and then rename it. With this amount of data, that would take many hours, and the site would be offline pretty much the whole time. This is just the first of several large migrations to improve the schema. They aren't intended as a regular thing, just a lot of cleanup I inherited.
I tried searching to see whether people have the same problem with Postgres, and I find almost nothing talking about this issue. Is that because Postgres is a lot better at it, or just because fewer people are using Postgres?
In PostgreSQL, adding a new column without default value to a table is instantaneous, because the new column is only registered in the system catalog, not actually added on disk.
When the only tool you know is a hammer, all your problems look like nails. For this problem, PostgreSQL is much, much better at handling these types of changes. And the fact is, it doesn't matter how well you designed your app, you WILL have to change the schema on a live database someday. While MySQL's various engines really are amazing for certain corner cases, none of them help here. PostgreSQL's very close integration between the various layers means you get things like transactional DDL (which lets you roll back anything that isn't an ALTER/CREATE DATABASE/TABLESPACE), very fast ALTER TABLE, non-blocking index creation, and so on. The trade-off is that PostgreSQL sticks to the things it does well (traditional transactional load handling is a strong point) and is not so great at the things MySQL fills gaps on, like live networked clustered storage with the NDB engine.
In this case none of the different engines in MySQL let you solve this problem easily. The very versatility of multiple storage engines means that the lexer/parser/top layer of the DB cannot be as tightly integrated with the storage engines, and therefore a lot of the cool things pgsql can do here, MySQL can't.
I've got a 118 GB table in my stats DB. It has 1.1 billion rows in it. It really should be partitioned, but it's not read a whole lot, and when it is, we can wait on it. At 300 MB/sec (the speed the array it's on can read), it takes roughly 118 × 3.3 seconds to scan, or around six and a half minutes. This machine has 32 GB of RAM, so it cannot hold the table in memory.
When I ran the simple statement on this table:
alter table mytable add test text;
it hung waiting on a vacuum. I killed the vacuum (SELECT pg_cancel_backend(12345); with the vacuum's PID in there) and the ALTER finished immediately. A vacuum on this table takes a long time to run, by the way. Normally that's not a big deal, but when making changes to table structure, you have to wait on vacuums or kill them.
Dropping a column is just as simple and fast.
Now we come to the problem with PostgreSQL, and that is its in-heap MVCC storage. If you add that column and then run UPDATE mytable SET test = 'abc', it rewrites every row and exactly doubles the size of the table. (Unless HOT can update the rows in place, but that requires something like a 50% fill factor, which means the table is double-sized to begin with.) The only ways to get the space back are to wait and let vacuum reclaim it over time, reusing it one update at a time, or to run CLUSTER or VACUUM FULL to shrink the table back down.
You can get around this by updating parts of the table at a time (UPDATE ... WHERE pkid BETWEEN 1 AND 10000000; and so on) and running VACUUM between each run to reclaim the space.
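Spelled out with those hypothetical pkid ranges, the loop looks like this (plain VACUUM, not VACUUM FULL, so the reclaimed space becomes reusable without taking an exclusive lock on the table):
UPDATE mytable SET test = 'abc' WHERE pkid BETWEEN 1 AND 10000000;
VACUUM mytable;
UPDATE mytable SET test = 'abc' WHERE pkid BETWEEN 10000001 AND 20000000;
VACUUM mytable;
-- ...and so on for the remaining ranges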
So, both systems have warts and bumps to deal with.
Maybe because this should not be a regular occurrence.
Perhaps, reading between the lines, you need to be adding rows to another table rather than columns to a large existing table?