Why should I care about compacting an MS Access .mdb file? - ms-access

We distribute an application that uses an MS Access .mdb file. Somebody has noticed that after opening the file in MS Access the file size shrinks a lot. That suggests that the file is a good candidate for compacting, but we don't supply the means for our users to do that.
So, my question is, does it matter? Do we care? What bad things can happen if our users never compact the database?

In addition to making your database smaller, it'll recompute the indexes on your tables and defragment your tables which can make access faster. It'll also find any inconsistencies that should never happen in your database, but might, due to bugs or crashes in Access.
It's not totally without risk though -- a bug in Access 2007 would occasionally delete your database during the process.
So it's generally a good thing to do, but pair it with a good backup routine. With the backup in place, you can also recover from any 'unrecoverable' compact and repair problems with a minimum of data loss.
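For what it's worth, here is a minimal VBA/DAO sketch of "back up, then compact" for a file the application is not holding open (the paths are made up for the example):
Dim src As String, dst As String
src = "C:\Data\app.mdb"                        ' illustrative paths
dst = "C:\Data\app_compacted.mdb"
FileCopy src, src & ".bak"                     ' keep a backup before touching the file
DBEngine.CompactDatabase src, dst              ' DAO writes a compacted copy to dst
Kill src                                       ' swap the compacted copy into place
Name dst As src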

Make sure you compact and repair the database regularly, especially if the database application experiences frequent record updates, deletions and insertions. Not only will this keep the size of the database file down to the minimum - which will help speed up database operations and network communications - it performs database housekeeping, too, which is of even greater benefit to the stability of your data. But before you compact the database, make sure that you make a backup of the file, just in case something goes wrong with the compaction.
Jet compacts a database to reorganize the content within the file so that each 4 KB "page" (2KB for Access 95/97) of space allotted for data, tables, or indexes is located in a contiguous area. Jet recovers the space from records marked as deleted and rewrites the records in each table in primary key order, like a clustered index. This will make your db's read/write ops faster.
Jet also updates the table statistics during compaction. This includes identifying the number of records in each table, which will allow Jet to use the most optimal method to scan for records, either by using the indexes or by using a full table scan when there are few records. After compaction, run each stored query so that Jet re-optimizes it using these updated table statistics, which can improve query performance.
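For illustration, a minimal DAO sketch of that last step (it assumes a VBA host, skips Access's hidden "~"-prefixed queries, and only touches plain SELECT queries; parameter queries would need their parameters supplied first):
Dim db As DAO.Database, qdf As DAO.QueryDef, rs As DAO.Recordset
Set db = CurrentDb
For Each qdf In db.QueryDefs
    If Left$(qdf.Name, 1) <> "~" And qdf.Type = dbQSelect Then
        Set rs = qdf.OpenRecordset(dbOpenSnapshot)   ' running the query forces Jet to recompile it
        rs.Close
    End If
Next qdf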
Access 2000, 2002, 2003 and 2007 combine the compaction with a repair operation if it's needed. The repair process:
1 - Cleans up incomplete transactions
2 - Compares data in system tables with data in actual tables, queries and indexes and repairs the mistakes
3 - Repairs very simple data structure mistakes, such as lost pointers to multi-page records (which isn't always successful and is why "repair" doesn't always work to save a corrupted Access database)
4 - Replaces missing information about a VBA project's structure
5 - Replaces missing information needed to open a form, report and module
6 - Repairs simple object structure mistakes in forms, reports, and modules
The bad things that can happen if the users never compact/repair the db are that it will become slow due to bloat, and it may become unstable - meaning corrupted.

Compacting an Access database (also known as an MS JET database) is a bit like defragmenting a hard drive. Access (or, more accurately, the MS JET database engine) isn't very good at re-using space - when a record is updated, inserted, or deleted, the space is not always reclaimed; instead, new space is added to the end of the database file and used.
A general rule of thumb is that if your [Access] database will be written to (updated, changed, or added to), you should allow for compacting - otherwise it will grow in size (much more than just the data you've added, too).
So, to answer your question(s):
Yes, it does matter (unless your database is read-only).
You should care (unless you don't care about your users' disk space).
If you don't compact an Access database, over time it will grow much, much, much larger than the data inside it would suggest, slowing down performance and increasing the possibilities of errors and corruption. (As a file-based database, Access database files are notorious for corruption, especially when accessed over a network.)
This article on How to Compact Microsoft Access Database Through ADO will give you a good starting point if you decide to add this functionality to your app.
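If you take the ADO route, the usual mechanism is the JRO (Jet and Replication Objects) library; a rough sketch, with made-up file paths, looks like this (the source file must not be open at the time):
Dim je As Object
Set je = CreateObject("JRO.JetEngine")
je.CompactDatabase _
    "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\Data\app.mdb", _
    "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\Data\app_compact.mdb"
' The compacted copy is written to the second path; rename it over the original afterwards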

I would offer the users a method for compacting the database. I've seen databases grow to 600+ megabytes when compacting would reduce them to 60-80.

To echo Nate:
In older versions, I've had it corrupt databases - so a good backup regime is essential. I wouldn't code anything into your app to do that automatically. However, if a customer finds that their database is running really slow, your tech support people could talk them through it if need be (with appropriate backups of course).
If their database is getting to be so large that the compaction starts to become a necessity though, maybe it's time to move to MS-SQL.

I've found that Access database files almost always get corrupted over time. Compacting and repairing them helps hold that off for a while.

Well, it really matters! .mdb files keep increasing in size each time you manipulate their data, until they reach an unmanageable size. But you don't have to supply a compacting method through your interface. You can add the following code in your mdb file to have it compacted each time the file is closed:
Application.SetOption "Auto Compact", 1

I would also highly recommend looking into VistaDB (http://www.vistadb.net/) or SQL Compact (http://www.microsoft.com/sql/editions/compact/) for your application. These might not be the right fit for your app... but are definitely worth a look.

If you don't offer your users a way to compact the database and the raw size isn't an issue to begin with, then don't bother.

Related

Internal Data Redundancy happens in Microsoft Access

We are using MS Access 2010 and we are seeing an unnecessary 50% increase in the size of the data file every day. We run the compact and repair process on a daily basis, every night.
But almost every day, in the middle of the day, this huge increase happens and performance is badly affected, so we have to run the process again manually; after that the huge size difference disappears. I suspect the problem is caused by the internal behaviour of the Access engine while updating data.
Can anyone explain how much space is wasted internally by the database engine when a record is updated?
For instance, suppose we have a record of 100 bytes and an update shrinks it to 80 bytes: how much space is wasted - 20 bytes, or much more than that?
Conversely, when an update makes a record larger, is any wasted space created in the data file?
Any idea or suggestion on how to boost the performance would be appreciated.
You can run C&R via VBA
Public Sub CompactDB()
    ' Simulates clicking Tools > Database utilities > Compact and repair database...
    ' on the legacy menu bar, which Access still exposes through CommandBars
    CommandBars("Menu Bar").Controls("Tools").Controls("Database utilities").Controls("Compact and repair database...").accDoDefaultAction
End Sub
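Depending on the Access version, a simpler route may be available through the built-in command; this is only a sketch, and it assumes acCmdCompactDatabase is supported in your version and that the database can be taken over exclusively:
Public Sub CompactCurrentDB()
    ' Compacts the currently open database without walking the legacy menu bars
    DoCmd.RunCommand acCmdCompactDatabase
End Sub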
Reasons your database can bloat (compacting only solves some of this -- decompiling / recompiling is necessary for the rest, if you code / use macros).
MS Access is file-based, not server transaction based, so you're always writing and rewriting to the hard drive for a variable space. To get around this, switch to MS Access ADP files using either MSDE, which you can install from the MS Office Professional CD by browsing to it on the CD (it is not part of the installation wizard), or hook the database up to a server such as SQL Server. You'll have to build a new MS Access document of type ADP (as opposed to MDB). Doing so puts you in a different developmental regime than you're used to, however, so read about this before doing it.
Compiling. Using macros plus the "compile in background" option is no different than compiling your MS Access project by having coded in Access Basic, Visual Basic for Access, or Visual Basic using the VB Editor that comes with MS Access.
Whatever changes you made last time remain as compiled pseudocode, so you are pancaking one change on top of another, even though you are only working with the latest version of your code.
Queries, especially large queries, take up space when they're run which is never reclaimed until you compact. You can make your queries more efficient, but you'll never get away from this completely.
Locktypes, cursortypes, and cursorlocations on ADODB, depending on how you set them up, can take up a lot of space if you choose combinations that are really data intensive. These can be marshalled (configured) in such a way as to return only what's necessary. There is a knowledge base article in the MSDN library at microsoft.com detailing how ADODB causes a lot of bloat, and it recommends using DAO, but this is a cop-out; use ADODB well and you'll get around this, and DAO does not eliminate bloat either.
DAO functions.
Object creation -- tables, forms, controls, reports -- all take up space. If you create a form and delete it later, the space the form occupied is not reclaimed until you compact.
Cute pictures. These always take up space, and MS Access does not store them efficiently. A 20K JPEG can wind up like an 800K or 1MB bitmap format once stored in Access, and there's nothing you can do about that in MS Access 97. You can put the image on a form and use subform references of the image where ever you want it, but you still don't get around the inefficient storage format.
OLE Objects. If you have an OLE field and decide to insert, say, a spreadsheet in that field, you take the entire Excel workbook with it, not just that sheet. Be careful how you use OLE objects.
Table properties with the subtable set to [auto]. Set this property, for all tables, to [none] (a sketch of doing this in bulk follows below). Depending on how many tables you have, performance can also perceptibly improve.
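A minimal DAO sketch of turning subdatasheets off for every user table (it assumes a VBA host; the property has to be created the first time it is set on a table):
Dim db As DAO.Database, tdf As DAO.TableDef
Set db = CurrentDb
For Each tdf In db.TableDefs
    If Left$(tdf.Name, 4) <> "MSys" Then                  ' skip system tables
        On Error Resume Next
        tdf.Properties("SubdatasheetName") = "[None]"
        If Err.Number <> 0 Then                           ' property not created yet
            Err.Clear
            tdf.Properties.Append tdf.CreateProperty("SubdatasheetName", dbText, "[None]")
        End If
        On Error GoTo 0
    End If
Next tdf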
You can also get the Jet Compact utility from Microsoft.com for databases that are corrupted.
Source

Why is my Access 2007 database growing so much?

I work on a Win32 application that has developed a very strange problem: the database quietly grows until it finally reaches the 2 GB file size limit. We use ADO to connect to an Access 2007 database. The application has worked nicely for years with no such difficulty being observed. As you may imagine, when it reaches the 2 GB limit the database becomes corrupt. I have quite a few customer databases now that were sent to us for repair - all around 2 GB in size. Once compacted, they come back to < 10 MB.
We see some database growth over time, but never growth on that sort of scale.
I made a small database "checker" that adds up the contents of all fields in all records to give some idea of how much actual data is present. Having checked this new tool on databases that have recently been compacted, I think the tool is working correctly. All the bloated databases have no more than 10 MB of data each.
We don't compact the database at app start. It has seemed to me that, because we don't delete large amounts of data, compacting the database isn't something we "should" need to do. There are customers with larger databases, but they are on earlier versions.
Can you suggest how a database that should be < 10 MB could grow to 2 GB?
A few remarks about what our app does:
Any restructuring is done using DAO, when ADO does not have the database open.
We do use transactions in a few places.
For convenience, certain records are deleted and recreated rather than found and edited. Typically this operation involves 5-30 records, each about 8 KB, and it only occurs when the user presses "Save".
There are other record types that are about 70 KB/record, but we're not using delete/recreate with them.
We use a BLOB ("OLEObject") field to store binary data.
Thank you for any insights you can offer.
MS Access files bloat very easily. They store a lot of history of transactions, as well as retaining size during record deletion.
When I write an application with an Access database I factor regular compaction into the design as it is the only way to keep the database in line.
Compacting on close can present issues (depending on the environment) such as users forcing the compact to abort because they want their computer to finish shutting down at the end of the day. Equally, compact on open can cause frustrating delays where the user would like to get into the program but cannot.
I normally try and organise for the compact to be done as a scheduled task on an always on PC such as a server. Please follow the link for more information: http://support.microsoft.com/kb/158937
Thank you all for your help. I found where it happened:
var
  tbl: ADOX_TLB.Table;
  cat: ADOX_TLB.Catalog;
  prop: ADOX_TLB.Property_;
begin
  cat := ADOX_TLB.CoCatalog.Create;
  cat.Set_ActiveConnection(con.ConnectionObject);
  // database growth here
  tbl := cat.Tables.Item[sTableName];
  prop := tbl.Properties['ValidationText'];
  Result := prop.Value;
  prop := nil;
  tbl := nil;
  cat := nil;
end;
Each time this function was called, the database grew by about 32 KB.
I changed the code to call this function less often, and to use DAO instead of ADO.
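(For reference, this is roughly what the DAO read looks like in VBA - the path and table name are made up; the actual application does the same thing from Delphi:)
Dim db As DAO.Database
Set db = DBEngine.OpenDatabase("C:\Data\app.mdb")    ' illustrative path
Debug.Print db.TableDefs("Orders").ValidationText    ' same property, read via DAO
db.Close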
So doing a little research, I came across a discussion about how MS Access files will grow until compacted, even when data is deleted. From this I infer that they are storing the transaction history within the file. This means that they will continue to grow with each access.
The solution is compaction. You apparently need to compact the database regularly. You may want to do this on application close instead of launch if it takes too long.
Also note that this means multi-operation changes (such as the delete then reinsert modified value mentioned above) will likely cause the file to expand more quickly.

What is the best way to prevent Access database bloat

Intro:
I am creating an Access database system that will be rolled out with multi-user functionality.
But as I am creating this database in Access 2000 (old school, I know) there are quite a lot of bugs and random mysterious problems that occur when my database gets past 40-60 MB.
My question:
Has anyone got a good solution to how I can shrink this down or to prevent the bloat?
Details:
I am using many local tables combined with SQL Tables and my front-end links to a back-end SQL Server.
I have already tried compact and repair, but it only ever shrinks it to about 15 MB, and after the user has used the database a few times the bloat quickly grows it back to over 50-60 MB!
Let me know if more detail is needed but that is the rough outline of my problem.
Many Thanks!
Here's some ideas for you to follow.
You said you also have a lot of local tables. Split the local tables off into yet another Access database. So you'll have 2 back-ends (1 SQL Server & 1 Access), and the front end.
Create a batch file that opens your local-tables back-end database with the /compact option. It will look something like this:
"C:\Prog...\Microsoft...\Officexx\msaccess.exe" "C:\ProjectX_backend.mdb" /compact
Then run this batch file on a daily basis using scheduled tasks. Your frontend should never need compacting unless you edit it in any way.
If you are stuck with 2000, which has quite a bad reputation, then you have to dig down into your application and find out what creates the bloat. The most common reason is bulk inserts followed by deletes. Other reasons are the use of OLE Object fields, and programmatic changes to forms and other objects. You really have to go through your application and find the specific cause.
An mdb file that is only connected to a back-end server and does not make changes to local objects should not grow.
As for your random issues, besides some lack of stability in the 2000 version, you should look into bad RAM in the computers, bad hard drives, and broken network controllers if your mdb file is shared on the network.

Can splitting .MDB files into segments help with stability?

Is this a realistic solution to the problems associated with larger .mdb files:
split the large .mdb file into smaller .mdb files
have one 'central' .mdb containing links to the tables in the smaller .mdb files
How easy would it be to make this change to an .mdb backed VB application?
Could the changes to the database be done so that there are no changes required to the front-end application?
Edit Start
The short answer is "No, it won't solve the problems of a large database."
You might be able to overcome the DB size limitation (~2GB) by using this trick, but I've never tested it.
Typically, with large MS Access databases, you run into problems with speed and data corruption.
Speed
Is it going to help with speed? You still have the same amount of data to query and search through, and the same algorithm. So all you are doing is adding the overhead of having to open up multiple files per query. So I would expect it to be slower.
You might be able to speed it up by reducing the time it takes to get the information off the disk. You can do this in a few ways:
faster drives
put the MDB on a RAID (anecdotally, RAID 1+0 may be faster)
split the MDB up (as you suggest) into multiple MDBs, and put them on separate drives (maybe even separate controllers).
(how well this would work in practice vs. theory, I can't tell you - if I was doing that much work, I'd still choose to switch DB engines)
Data Corruption
MS Access has a well-deserved reputation for data corruption. To be fair, I haven't had it happen to me for some time. This may be because I've learned not to use it for anything big; or it may be because MS has put a lot of work into trying to solve these problems; or, more likely, a combination of both.
The prime culprits in data corruption are:
Hardware: e.g., cosmic rays, electrical interference, iffy drives, iffy memory and iffy CPUs - I suspect MS Access does not have as good error handling/correcting as other Databases do.
Networks: lots of collisions on a saturated network can confuse MS Access and convince it to scramble important records; as can sub-optimally implemented network protocols. TCP/IP is good, but it's not invincible.
Software: As I said, MS has done a lot of work on MS Access over the years; if you are not up to date on your patches (MS Office and OS), get up to date. Problems typically happen when you hit extremes like the 2 GB limit (some bugs are hard to test and won't manifest themselves except at the edge cases, which makes them less likely to have been seen or corrected, unless reported by a motivated user to MS).
All this is exacerbated with larger databases, because larger databases typically have more users and more workstations accessing it. Altogether the larger database and number of users multiply to provide more opportunity for corruption to happen.
Edit End
Your best bet would be to switch to something like MS SQL Server. You could start by migrating your data over, and then linking one MDB to it. You get the stability of SQL Server and most (if not all) of your code should still work.
Once you've done that, you can then start migrating your VB app(s) over to use SQL Server instead.
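A minimal DAO sketch of what that linking step looks like (server, database, and table names are purely illustrative):
Dim db As DAO.Database, tdf As DAO.TableDef
Set db = CurrentDb
Set tdf = db.CreateTableDef("Customers")              ' local name the app keeps using
tdf.Connect = "ODBC;DRIVER={SQL Server};SERVER=MYSERVER;DATABASE=MyDb;Trusted_Connection=Yes"
tdf.SourceTableName = "dbo.Customers"
db.TableDefs.Append tdf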
If you have more data than fits in a single MDB then you should get a different database engine.
One main issue that you should consider is that you can't enforce referential integrity between tables stored in different MDBs. That should be a show-stopper for any actual database.
If it's not, then you probably don't have a proper schema designed in the first place.
For reasons more adequately explained by CodeSlave the answer is No and you should switch to a proper relational database.
I'd like to add that this does not have to be SQL Server. Quite possibly the reason why you are reluctant to do this is one of cost, SQL Server being quite expensive to obtain and deploy if you are not in an educational or charitable organisation (when it's remarkably cheap and then usually a complete no-brainer).
I've recently had extremely good results moving an Access system from MDB to MySQL. At least 95% of the code functioned without modification, and of the remaining 5% most was straightforward, with only a few limited areas where significant effort was required. If you have sloppy code (not closing connections or releasing objects) then you'll need to fix that, but generally I was pleasantly surprised at how painless this approach was. Certainly, if the reason you are reluctant to move to a database back end is cost, I would highly recommend that you not attempt to manipulate .mdb files and instead go for the more robust database solution.
Hmm, well, if the data is going through this central DB then there is still going to be a bottleneck in there. The only reason I can think of why you would do this is to get around the size limit of an Access .mdb file.
Having said that, if the business functions can be split off into separate applications then that might be a good option, with a central DB containing all the linked tables for reporting purposes. I have used this before to good effect.
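For what it's worth, a sketch of how the central .mdb can link a table from one of the smaller files (paths and names are made up):
' Links the Customers table from a back-end segment into the central front-end
DoCmd.TransferDatabase acLink, "Microsoft Access", "C:\Data\segment1.mdb", acTable, "Customers", "Customers"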

What do Repair and Compact operations do to an .MDB? Will they stop an application crashing?

What do Repair and Compact operations do to an .MDB?
If these operations do not stop a 1GB+ .MDB backed VB application crashing, what other options are there?
Why would a large sized .MDB file cause an application to crash?
"What do compact and repair operations do to an MDB?"
First off, don't worry about repair. The fact that there are still commands that purport to do a standalone repair is a legacy of the old days. The behavior of that command changed greatly starting with Jet 3.51 and has remained the same ever since: a repair will never be performed unless Jet/ACE determines that it is necessary. When you do a compact, it will test whether a repair is needed and perform it before the compact.
So, what does it do?
A compact/repair rewrites the data file, eliminating any unused data pages, writing tables and indexes in contiguous data pages, and flagging all saved QueryDefs for re-compilation the next time they are run. It also updates certain metadata for the tables, and other metadata and internal structures in the header of the file.
All databases have some form of "compact" operation because they are optimized for performance. Disk space is cheap, so instead of writing things in a way that uses storage efficiently, they write to the first available space. Thus, in Jet/ACE, if you update a record, the record is written to the original data page only if the new data fits within the original data page. If not, the original data page is marked unused and the record is rewritten to an entirely new data page. Thus, the file can become internally fragmented, with used and unused data pages mixed in throughout the file.
A compact organizes everything neatly and gets rid of all the slack space. It also rewrites data tables in primary key order (Jet/ACE clusters on the PK, but that's the only index you can cluster on). Indexes are also rewritten at that point, since over time those become fragmented with use, also.
Compact is an operation that should be part of regular maintenance of any Jet/ACE file, but you shouldn't have to do it often. If you're experiencing regular significant bloat, then it suggests that you may be mis-using your back-end database by storing/deleting temporary data. If your app adds records and deletes them as part of its regular operations, then you have a design problem that's going to make your data file bloat regularly.
To fix that, move the temp tables to a different standalone MDB/ACCDB so that the churn won't cause your main data file to bloat.
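A rough DAO sketch of that idea (the file name and table layout are purely illustrative):
Dim tmp As DAO.Database
' Create a throwaway side database for churn-heavy temp data
Set tmp = DBEngine.CreateDatabase(Environ$("TEMP") & "\scratch.mdb", dbLangGeneral)
tmp.Execute "CREATE TABLE TempResults (ID LONG, Payload TEXT(255))"
' ... insert and delete freely here; the main data file never sees the churn ...
tmp.Close
Kill Environ$("TEMP") & "\scratch.mdb"                ' discard the scratch file instead of compacting it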
On another note not applicable in this context, front ends bloat in different ways because of the nature of what's stored in them. Since this question is about an MDB/ACCDB used from VB, I'll not go into details, but suffice it to say that compacting a front end is something that's necessary during development, but only very seldom in production use. The only reason to compact a production front end is to update metadata and recompile queries stored in it.
It's always been the case that MDB files become slow and prone to corruption once they get over 1 GB, but I've never known why - it's always just been a fact of life. I did some quick searching and can't find any official, or even well-informed insider, explanation of why this size is correlated with MDB problems, but my experience has always been that MDB files become incredibly unreliable as you approach and exceed 1 GB.
Here's the MS KB article about Repair and Compact, detailing what happens during that operation:
http://support.microsoft.com/kb/209769/EN-US/
The app probably crashes as the result of improper/unexpected data returned from a database query to an MDB that large - what error in particular do you get when your application crashes? Perhaps there's a way to catch the error and deal with it instead of just crashing the application.
If it is crashing a lot then you might want to try a decompile on the DB and/or making a new database and copying all the objects over to the new container.
Try the decompile first. To do that, just add the /decompile flag to the startup options of your DB, for example:
"C:\Program Files\access\msaccess.exe" "C:\mydb.mdb" /decompile
Then compact, compile, and then compact again.
EDIT:
You can't do it without Access being installed, but if the file is just storing data then a decompile will not do you any good. You can however look at JetComp to help with your compacting needs:
support.microsoft.com/kb/273956