How to compress an MS Access database - ms-access

I have an .mdb file which is 70MB.
After deleting all records contained in the file, the size remains 70MB.
How do I make my .mdb file smaller?

Every database engine that has ever existed needs regular maintenance operations run on it to optimize data storage and recover slack space. Back in xBase days, for instance, you ran a PACK command to remove deleted rows. On SQL Server, you run scripts to shrink the actual data files for the same reasons.
Why does every database engine do this?
Because it would be a huge performance hit if every write to the database had to rewrite the whole file in optimized order. Consider a database that stores each data table in a separate file. If a table has 10000 records and you delete the 5000th record, getting rid of the slack space would mean rewriting the whole second half of the data file. Instead, every database marks the space as unused, to be discarded the next time the optimization operations are run on the table.
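A toy model makes the trade-off concrete. The sketch below is purely illustrative and is not how Jet/ACE lays out data: a table is a list of slots, a delete only marks its slot unused, and `compact()` reclaims every marked slot in one pass. `ToyTable` and its methods are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Illustrative only: a table stored as a list of slots; None marks a deleted row."""
    rows: list = field(default_factory=list)

    def insert(self, key, value):
        self.rows.append((key, value))

    def _position(self, key):
        for i, slot in enumerate(self.rows):
            if slot is not None and slot[0] == key:
                return i
        raise KeyError(key)

    def cost_of_physical_delete(self, key):
        """How many slots after `key` would have to be rewritten to close the gap."""
        return len(self.rows) - self._position(key) - 1

    def delete(self, key):
        """Soft delete: mark the slot unused; no other row moves."""
        self.rows[self._position(key)] = None

    def slack(self):
        return sum(1 for slot in self.rows if slot is None)

    def compact(self):
        """One maintenance pass that drops every unused slot at once."""
        self.rows = [slot for slot in self.rows if slot is not None]


if __name__ == "__main__":
    table = ToyTable()
    for key in range(1, 10001):
        table.insert(key, f"record {key}")
    print(table.cost_of_physical_delete(5000))  # 5000 rows would have to move
    table.delete(5000)
    print(len(table.rows), table.slack())       # 10000 1: the slot is still there
    table.compact()
    print(len(table.rows), table.slack())       # 9999 0
```

The soft delete touches one slot no matter how large the table is; the cost of tidying up is paid once, during the maintenance pass.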
Jet/ACE is no different in this regard from any other database engine, and any application using a Jet/ACE database as a data store should have regular maintenance operations scheduled, including a backup followed by a compact.
There are some issues with this in Jet/ACE that aren't present in server database engines. Specifically, you can't compact unless all users have closed their connections to the data file. In a server database, users connect to the database engine's server-side process, and that server-side daemon is the only "user" of the actual data files in which the data is stored. Thus, the server daemon can decide when to perform the optimization and maintenance routines, since it is entirely in control of when the data files are in use.
One common problem with Access applications is that users will leave their application open on their computers and leave the office for the day, which means that when you run your compact operation, say at 2:00am, the file is still open and you can't run it (because compact replaces the original file). Most programmers of Access applications who encounter this problem will either tolerate the occasional failure of this kind of overnight maintenance (volume shadow copy still allows a backup of the file, though there's no guarantee that backup copy will be in a 100% internally consistent state), or they will engineer their Access applications to terminate at a time appropriate to allow overnight maintenance operations. I've done both, myself.
In non-Access applications, the same problem exists, but has to be tackled differently. For web applications, it's something of a problem, but in general, I'd say that any web app that churns the data enough that a compact would be needed is one for which a Jet/ACE data store is wholly inappropriate.
Now, on the subject of COMPACT ON CLOSE:
It should never be used by anyone.
Ever.
It's useless and downright dangerous when it actually kicks in.
It's useless because there's no properly-architected production environment in which users would ever be opening the back end -- if it's an Access app, it should be split, with users only ever opening the front end, and if it's a web app, users won't be interacting directly with the data file. So in both scenarios, nobody is ever going to trigger the COMPACT ON CLOSE, so you've wasted your time turning it on.
Secondly, even if somebody does occasionally trigger it, it's only going to work if that user is the only one with the database open. As I said above, it can't be compacted if there are other users with it open, so this isn't going to work, either -- COMPACT ON CLOSE can only run when the user triggering it has exclusive access.
But worst of all, COMPACT ON CLOSE is dangerous, and if it does run it can lead to actual data loss. This is because a Jet/ACE database can be in certain states wherein internal structures are out of whack but the data is all still accessible. When the compact/repair operation is run in that state, data can potentially be lost. This is an extremely rare condition, but it is a real possibility.
The point is that COMPACT ON CLOSE is not conditional, and there is no prompt that asks you if you want to run it. You don't get a chance to do a backup before it runs, so if you have it turned on and it kicks in when your database is in that very rare state, you could lose data that you'd otherwise be able to recover if you did not run the compact operation.
So, in short, nobody with any understanding of Jet/ACE and compacting ever turns on COMPACT ON CLOSE.
For a single user, you can just compact as needed.
For a shared application, some kind of scheduled maintenance script is the best thing, usually running overnight on the file server. That script would make a backup of the file, then run the compact. It's quite a simple script to write in VBScript, and easily scheduled.
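The VBScript itself isn't shown here. As a rough sketch of the same pre-flight steps in Python (the language of the scripts further down this page), assuming the lock files Jet and ACE create while a database is open (`.ldb` next to an `.mdb`, `.laccdb` next to an `.accdb`) and a hypothetical `backup_dir`: refuse to run if a lock file is present, then copy the file to a timestamped backup. A stale lock file left by a crashed session will also stop it, which errs on the safe side.

```python
import shutil
from datetime import datetime
from pathlib import Path

# Jet creates a .ldb lock file next to an .mdb; ACE creates .laccdb next to an .accdb.
LOCK_SUFFIXES = {".mdb": ".ldb", ".accdb": ".laccdb"}


def lock_file_for(db_path: Path) -> Path:
    return db_path.with_suffix(LOCK_SUFFIXES[db_path.suffix.lower()])


def backup_before_compact(db_path, backup_dir, now=None):
    """Copy db_path into backup_dir under a timestamped name and return the copy's path.

    Refuses to run while a lock file exists, since that usually means someone
    still has the database open (or a crashed session left the lock behind).
    """
    db_path = Path(db_path)
    if lock_file_for(db_path).exists():
        raise RuntimeError(f"{db_path.name} appears to be in use; skipping maintenance")
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    backup = backup_dir / f"{db_path.stem}-{stamp}{db_path.suffix}"
    shutil.copy2(db_path, backup)  # copy2 also preserves the file's timestamps
    return backup
```

Only after this returns successfully would the script go on to the compact itself.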
Last of all, if your application frequently deletes large numbers of records, in most cases that's an indication of a design error. Records that are added and deleted in regular production use are TEMPORARY DATA and don't belong in your main data file, both logically speaking and pragmatically speaking.
All of my production apps have a temp database as part of the architecture, and all temp tables are stored there. I never bother to compact the temp databases. If for some reason performance bogged down because of bloat within the temp database, I'd just copy a pristine empty copy of the temp database over top of the old one, since none of the data in there is anything other than temporary. This reduces churn and bloat in front end or back end and greatly reduces the frequency of necessary compacts on the back end data file.
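Resetting the temp database that way is a single file copy. A minimal sketch, assuming a hypothetical pristine, empty template file kept alongside the temp database, and that no front end has the temp file open at the time:

```python
import shutil
from pathlib import Path


def reset_temp_database(template_path, temp_db_path):
    """Overwrite the working temp database with a copy of the empty template.

    Only safe because everything in the temp database is disposable.
    """
    template_path, temp_db_path = Path(template_path), Path(temp_db_path)
    if template_path.resolve() == temp_db_path.resolve():
        raise ValueError("template and temp database must be different files")
    shutil.copy2(template_path, temp_db_path)  # replaces the bloated file wholesale
    return temp_db_path
```

Keeping the template as a separate file means it never gets used, so it never bloats.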
On the question of how to compact, there are a number of options:
in the Access UI you can compact the currently open database (TOOLS | DATABASE UTILITIES). However, that doesn't allow you to make a backup as part of the process, and it's always a good idea to backup before compacting, just in case something goes wrong.
in the Access UI you can compact a database that is not open. This option compacts from an existing file to a new one, so when you're done you have to rename the original file (to keep it as a backup) and then give the newly compacted file the original name. The FILE OPEN dialog that asks you which file to compact from does allow you to rename the file at that point, so you can do it as part of the manual process.
in code, you can use the DAO DBEngine.CompactDatabase method to do the job. This is usable from within Access VBA, or from a VBScript, or from any environment where you can use COM. You are responsible in your code for doing the backup and renaming files and so forth.
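A minimal sketch of that bookkeeping, assuming a `compact(src, dst)` callable is passed in. On Windows with pywin32 that could be `win32com.client.Dispatch("DAO.DBEngine.120").CompactDatabase` (the ProgID depends on the installed engine); `compact_in_place` and the `.compacted`/`.old` file names are hypothetical. CompactDatabase writes to a new file and raises an error if the destination already exists, so the sketch stops rather than deleting leftovers from an earlier, possibly failed, run.

```python
import os
from pathlib import Path


def compact_in_place(db_path, compact):
    """Compact db_path through an injected compactor, then swap the files.

    `compact(src, dst)` must write a compacted copy of src to a new file dst,
    e.g. DAO's DBEngine.CompactDatabase. Returns the path of the old file.
    """
    db_path = Path(db_path)
    compacted = db_path.with_name(db_path.stem + ".compacted" + db_path.suffix)
    old = db_path.with_name(db_path.name + ".old")
    for leftover in (compacted, old):
        if leftover.exists():
            # Never delete these automatically: they may be the only good copy.
            raise FileExistsError(f"{leftover} is left over from an earlier run")
    compact(str(db_path), str(compacted))
    os.replace(db_path, old)        # keep the original until the swap has succeeded
    os.replace(compacted, db_path)  # the compacted copy takes over the original name
    return old
```

This is in addition to, not instead of, a real backup taken before compacting.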
another option in code is JRO (Jet & Replication Objects), but it offers nothing in regard to compact operations that DAO doesn't already have. JRO was created as a separate library to handle Jet-specific features that were not supported in ADO itself, so if you're using ADO as your interface, the MS-recommended library for compacting would be JRO. From within Access, JRO is inappropriate for compacting, as you'd already have the CompactDatabase method available, even if you don't have a DAO reference (the DBEngine is always available in Access whether or not you have a DAO reference). In other words, DBEngine.CompactDatabase can be used within Access without either a DAO or ADO reference, whereas the JRO CompactDatabase method is only available with a JRO reference (or using late binding). From outside of Access, JRO may be the appropriate library.
Let me stress how important backups are. You won't need it 999 times out of 1000 (or even less often), but when you need it, you'll need it bad! So never compact without making a backup beforehand.
Finally, after any compact, it's a good idea to check the compacted file to see if there's a system table called MSysCompactErrors. This table will list any problems encountered during the compact, if there were any.
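If you'd rather check for that table from a script than by opening the file, one possible approach, assuming a pyodbc connection through the Access ODBC driver, is to ask the driver's catalog for a table with that name. `has_compact_errors` is a hypothetical helper, and it is an assumption that every driver version reports MSys* system tables through `Cursor.tables()`, so treat this as a sketch.

```python
def has_compact_errors(cursor):
    """Return True if the catalog lists a table named MSysCompactErrors.

    `cursor` is assumed to be a pyodbc cursor opened on the compacted file.
    """
    # Example connection (hypothetical path):
    #   pyodbc.connect(r"DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=C:\data\db.accdb")
    # Cursor.tables() runs the ODBC SQLTables catalog call and returns the cursor itself.
    matches = cursor.tables(table="MSysCompactErrors").fetchall()
    return len(matches) > 0
```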
That's all I can think of regarding compact for now.

Open the mdb and do a 'Compact and Repair'. This will reduce the size of the mdb.
You can also set the 'Compact on Close' option to on (off by default).
Here is a link to some additional information:
http://www.trcb.com/computers-and-technology/data-recovery/ways-to-compact-and-repair-an-access-database-27384.htm

The Microsoft Access database engine provides a CompactDatabase method that makes a compact copy of the database file. The database file must be closed before calling CompactDatabase.
Documentation:
Pages on microsoft.com about "Compact and Repair Database"
DBEngine.CompactDatabase Method (DAO)
Here's a Python script that uses DAO to copy and compact MDB files:
```python
import os.path
import sys

import win32com.client

# DAO ProgID to use, by Access version:
#   Access 97:        DAO.DBEngine.35
#   Access 2000/2003: DAO.DBEngine.36
#   Access 2007:      DAO.DBEngine.120
DAO_PROGID = 'DAO.DBEngine.36'

if len(sys.argv) != 3:
    print("Uses Microsoft DAO to copy the database file and compact it.")
    print("Usage: %s DB_FILE FILE_TO_WRITE" % os.path.basename(sys.argv[0]))
    sys.exit(2)

(src_db_path, dest_db_path) = sys.argv[1:]
daoEngine = win32com.client.Dispatch(DAO_PROGID)
print('Using database "%s", compacting to "%s"' % (src_db_path, dest_db_path))
# CompactDatabase writes a new, compacted file; the source file is left in place.
daoEngine.CompactDatabase(src_db_path, dest_db_path)
print("Done")
```

With Python you can compact with the pypyodbc library (either .mdb or .accdb):
```python
import pypyodbc

pypyodbc.win_compact_mdb('C:\\data\\database.accdb', 'C:\\data\\compacted.accdb')
```
Then you can copy compacted.accdb back to database.accdb with shutil:
```python
import shutil

shutil.copy2('C:\\data\\compacted.accdb', 'C:\\data\\database.accdb')
```
Note: As far as I know, to use an Access database through ODBC, Python and its libraries must have the same bitness (32-bit or 64-bit) as the installed Access database engine. Also, these steps probably only work on Windows.
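To see which side of that bitness requirement you're on, a quick check like this can help. `struct.calcsize("P")` gives the pointer size of the running interpreter; on Windows, `pyodbc.drivers()` (pyodbc rather than pypyodbc, assumed to be installed) lists only the ODBC drivers visible to that bitness. `access_driver_visible` is a hypothetical helper.

```python
import struct

# Driver names registered by the ACE and the older Jet ODBC drivers.
ACCESS_DRIVER_NAMES = (
    "Microsoft Access Driver (*.mdb, *.accdb)",
    "Microsoft Access Driver (*.mdb)",
)


def python_bitness():
    """Return 32 or 64, the pointer width of the running Python interpreter."""
    return struct.calcsize("P") * 8


def access_driver_visible(installed_drivers):
    """True if any Access ODBC driver appears in the given list of driver names."""
    return any(name in installed_drivers for name in ACCESS_DRIVER_NAMES)


if __name__ == "__main__":
    print(f"{python_bitness()}-bit Python")
    try:
        import pyodbc
        print("Access driver visible:", access_driver_visible(pyodbc.drivers()))
    except ImportError:
        print("pyodbc not installed; cannot list ODBC drivers")
```

If the driver isn't visible, the usual fix is to install the Access database engine matching your Python's bitness, or the Python matching the engine.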

Related

Internal Data Redundancy happens in Microsoft Access

We are using Access 2010, and every day our data file grows by an unnecessary 50%. We run compact and repair every night.
But almost every day, in the middle of the day, this huge increase happens and performance is badly affected, so we have to run the process again manually; after that, the size difference disappears. I suspect the problem is caused by the internal behaviour of the Access engine while updating data.
Can anyone explain how much space the database engine wastes internally when a record is updated?
For instance, suppose we have a 100-byte record, and an update shrinks it to 80 bytes. How much space is wasted: 20 bytes, or much more than that?
Conversely, when an update makes a record larger, does the update create any wasted space in the data file?
Any ideas or suggestions on how to boost performance would be appreciated.
You can run C&R via VBA
```vba
Public Sub CompactDB()
    CommandBars("Menu Bar").Controls("Tools").Controls("Database utilities").Controls("Compact and repair database...").accDoDefaultAction
End Sub
```
Reasons your database can bloat (compacting only solves some of this -- decompiling / recompiling is necessary for the rest, if you code / use macros).
MS Access is file-based, not server transaction based, so you're always writing and rewriting to the hard drive for a variable amount of space. To get around this, switch to MS Access ADP files using either MSDE, which you can install from the MS Office Professional CD by browsing to it on the CD (it is not part of the installation wizard), or by hooking the database up to a server such as SQL Server. You'll have to build a new MS Access document of type ADP (as opposed to MDB). Doing so puts you in a different development regime than you're used to, however, so read about it before doing it.
Compiling. Using macros plus the "compile in background" option is no different from compiling your MS Access project after having coded in Access Basic, Visual Basic for Access, or Visual Basic using the VB Editor that comes with MS Access.
Whatever changes you made last time remain as compiled pseudocode, so you are pancaking one change on top of another, even though you are only working with the latest version of your code.
Queries, especially large queries, take up space when they're run which is never reclaimed until you compact. You can make your queries more efficient, but you'll never get away from this completely.
Locktypes, cursortypes, and cursorlocations on ADODB, depending on how you set them up, can take up a lot of space if you choose combinations that are really data intensive. These can be marshalled (configured) so that they return only what's necessary. There is a knowledge base article in the MSDN library at microsoft.com detailing how ADODB causes a lot of bloat and recommending DAO instead, but this is a cop-out; if you use ADODB well you'll get around this, and DAO does not eliminate bloat either.
DAO functions.
Object creation -- tables, forms, controls, reports -- all take up space. If you create a form and delete it later, the space that the form occupied is not reclaimed until you compact.
Cute pictures. These always take up space, and MS Access does not store them efficiently. A 20K JPEG can end up as an 800K or 1MB bitmap once stored in Access, and there's nothing you can do about that in MS Access 97. You can put the image on a form and use subform references to the image wherever you want it, but you still don't get around the inefficient storage format.
OLE Objects. If you have an OLE field and decide to insert, say, a spreadsheet into that field, you take the entire Excel workbook with it, not just that sheet. Be careful how you use OLE objects.
Tables with the Subdatasheet Name property set to [Auto]. Set this property to [None] for all tables. Depending on how many tables you have, performance can also perceptibly improve.
You can also get the Jet Compact utility from Microsoft.com for databases that are corrupted.
Source

What is the best way to prevent Access database bloat

Intro:
I am creating an Access database system that will be rolled out with multi-user functionality.
But as I am creating this database in Access 2000 (old school, I know), there are quite a lot of bugs and random mysterious problems that occur when my database gets past 40-60MB.
My question:
Has anyone got a good solution to how I can shrink this down or to prevent the bloat?
Details:
I am using many local tables combined with SQL Tables and my front-end links to a back-end SQL Server.
I have already tried compact and repair, but it only ever shrinks it to about 15MB, and after the user has used the database a few times the bloat quickly expands it to over 50-60MB!
Let me know if more detail is needed but that is the rough outline of my problem.
Many Thanks!
Here's some ideas for you to follow.
You said you also have a lot of local tables. Split the local tables off into yet another Access database. So you'll have 2 back-ends (1 SQL Server & 1 Access), and the front end.
Create a batch file that opens your local tables backend database with the /compact option. So, it will look something like this:
"C:\Prog...\Microsoft...\Officexx\MSACCESS.EXE" "C:\ProjectX_backend.mdb" /compact
Then run this batch file on a daily basis using scheduled tasks. Your frontend should never need compacting unless you edit it in any way.
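If you'd rather schedule a script than a batch file, the same command line can be launched from Python. This is a sketch assuming Access is installed on the machine that runs it; `msaccess_exe` stands for the full path to MSACCESS.EXE (which varies by Office version, hence a parameter), and the helper names are hypothetical. Passing the arguments as a list sidesteps the quoting mistakes that are easy to make in a batch file. Whether MSACCESS.EXE reports a failed compact through its exit code isn't something to rely on, so check the result afterwards as well.

```python
import subprocess


def access_compact_command(msaccess_exe, db_path):
    """Build the argument list for: MSACCESS.EXE <db_path> /compact."""
    return [str(msaccess_exe), str(db_path), "/compact"]


def run_compact(msaccess_exe, db_path, timeout_seconds=3600):
    """Run the compact and wait for Access to exit.

    check=True raises CalledProcessError on a non-zero exit code;
    timeout raises TimeoutExpired if Access hangs (e.g. on a dialog).
    """
    subprocess.run(access_compact_command(msaccess_exe, db_path),
                   check=True, timeout=timeout_seconds)
```

Task Scheduler can then run the script nightly, just as it would the batch file.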
If you are stuck with 2000, which has quite a bad reputation, then you have to dig down into your application and find out what creates the bloat. The most common reason is bulk inserts followed by deletes. Other reasons are the use of OLE Object fields and programmatic changes to forms and other objects. You really have to go through your application and find the specific cause.
An mdb file that is only connected to a back-end server and does not make changes to local objects should not grow.
As for your random issues, besides some lack of stability in the 2000 version, you should look into bad RAM in the computers, bad hard drives, and broken network controllers if your mdb file is shared on the network.

Looking to make MySQL db records readable as CSV-like lines in text file via standard file io interface Linux device driver (working with legacy code)

I want to be able to read from a real live proper MySQL database using standard file access routines. I don't mean reading the MySQL database's own underlying private files. What I mean is implementing a file-based linux device driver that "presents" a MySQL database as a file. In other words, the text file is a "View" of the MySQL database. The MySQL records are presented in our homegrown custom variation of the CSV format that the legacy code was originally written to understand.
Background
I have some legacy code that reads from a text file that contains a very large table of data, each line being a separate record. New records (lines) need to be added but there is contention for the file among the team, there is also an overhead in deployment of the legacy code and this file to many systems when releasing the software to them. The text file itself also needs to be version controlled.
Rather than modify the legacy code to call a MYSQL database version of these records directly, I thought it would be better to leave it untouched. This would avoid risks in modifying the code and ease deployment and moreover, modifying the code would cause much overhead in de-risking, design discussions, more testing etc.
So what I'm looking to do is write a file-based device driver such that this makes the MySQL database appear as a file to the legacy code, with the data within the format that the legacy code expects. That way the legacy code is not changed and can work oblivious that the file is really an underlying database. Contention is removed because the individual records in the database can now be updated/added to separately (via MySQL, or even better a separate web admin interface that guides and validates data entry from the user for individual records) and deployment effort is much reduced without having to up-issue the whole file on all the systems that use it.
The device driver would contain routines to internally translate standard file read operations into MySQL queries to the MySQL database and contain routines to return the MySQL results and translate these into the text format for returning back to the file read operation.
This is for a Linux/Unix platform.
Has this been done and what are your thoughts?
This kind of thing has been done before - an obvious example being the dynamic view filing system in ClearCase which provided (maybe still does?) a virtualised view onto a version control repository. Behind the scenes it implemented an object cache and used RPC to fetch objects from other hosts if necessary, and made extensive use of both local and remote databases.
It's fairly clear that you are going to implement the bulk of your filing system in user-space, but you will need a (small) kernel resident portion. Unless there's a really good reason to do otherwise, FUSE is what you're looking for - it will provide the kernel-resident part for you. All you'll need to write is glue to turn file operations into SQL requests.
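That glue mostly comes down to rendering query results as the text the legacy code expects and answering `read(size, offset)` requests from that text. A rough sketch of that core, with assumptions: `sqlite3` stands in for a MySQL client so the example runs anywhere, the standard `csv` module stands in for the homegrown CSV variant, and the function names are hypothetical. In a FUSE binding such as fusepy, the filesystem's `read(path, size, offset, fh)` callback would return `read_view(...)` over a snapshot rendered at `open()` time, so one reader sees a consistent file even if rows change mid-read.

```python
import csv
import io
import sqlite3  # stand-in for a MySQL driver in this sketch


def render_rows(conn, query):
    """Render a query result as CSV-like text, one record per line, as bytes."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for row in conn.execute(query):
        writer.writerow(row)
    return buf.getvalue().encode("utf-8")


def read_view(data, size, offset):
    """Serve a read request: at most `size` bytes starting at `offset`.

    Slicing past the end yields b"", which a FUSE read treats as end of file.
    """
    return data[offset:offset + size]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO records VALUES (?, ?)", [(1, "a"), (2, "b,c")])
    snapshot = render_rows(conn, "SELECT id, name FROM records ORDER BY id")
    print(read_view(snapshot, 4, 0))    # first line
    print(read_view(snapshot, 100, 4))  # the rest of the file
```

Rendering a whole snapshot is the simplest design; for a very large table you would instead map offsets to row ranges and query only what each read needs.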

What do Repair and Compact operations do to an .MDB? Will they stop an application crashing?

What do Repair and Compact operations do to an .MDB?
If these operations do not stop a 1GB+ .MDB backed VB application crashing, what other options are there?
Why would a large sized .MDB file cause an application to crash?
"What do compact and repair operations do to an MDB?"
First off, don't worry about repair. The fact that there are still commands that purport to do a standalone repair is a legacy of the old days. The behavior of that command changed greatly starting with Jet 3.51 and has stayed that way since. That is, a repair will never be performed unless Jet/ACE determines that it is necessary. When you do a compact, it tests whether a repair is needed and performs it before the compact.
So, what does it do?
A compact/repair rewrites the data file, eliminating any unused data pages, writing tables and indexes in contiguous data pages and flagging all saved QueryDefs for recompilation the next time they are run. It also updates certain metadata for the tables, and other metadata and internal structures in the header of the file.
All databases have some form of "compact" operation because they are optimized for performance. Disk space is cheap, so instead of writing data in a way that uses storage efficiently, they write it to the first available space. Thus, in Jet/ACE, if you update a record, the record is written back to its original data page only if the new data fits within that page. If not, the original data page is marked unused and the record is rewritten to an entirely new data page. As a result, the file can become internally fragmented, with used and unused data pages mixed throughout.
A compact organizes everything neatly and gets rid of all the slack space. It also rewrites data tables in primary key order (Jet/ACE clusters on the PK, but that's the only index you can cluster on). Indexes are also rewritten at that point, since over time those become fragmented with use, also.
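To put the last two paragraphs together, here is a deliberately simplified page model (not Jet/ACE's real on-disk format): an update stays in its page when the new size fits, otherwise the old copy becomes slack and the record goes to the end of the file; `compact()` rewrites everything into fresh pages in primary-key order. `PAGE_SIZE` is the 4 KB Jet 4.0/ACE page size; all class and method names are hypothetical. In this model, shrinking a 100-byte record to 80 bytes leaves 20 bytes of slack inside its page, while growing a record past its page's free space strands the old bytes until the next compact.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # bytes per page in Jet 4.0 / ACE; Jet 3.x (Access 95/97) used 2048


@dataclass
class Page:
    records: dict = field(default_factory=dict)  # primary key -> record size in bytes

    def used(self):
        return sum(self.records.values())


class ToyFile:
    """Simplified page allocator; not Jet/ACE's real on-disk format."""

    def __init__(self, page_size=PAGE_SIZE):
        self.page_size = page_size
        self.pages = []
        self.where = {}  # primary key -> index of the page holding the record

    def file_size(self):
        return len(self.pages) * self.page_size

    def write(self, pk, size):
        """Insert or update record `pk` with a new size in bytes."""
        if pk in self.where:
            page = self.pages[self.where[pk]]
            if page.used() - page.records[pk] + size <= self.page_size:
                page.records[pk] = size  # fits: rewritten in place
                return
            del page.records[pk]  # too big: the old copy becomes slack
        last = len(self.pages) - 1
        if last < 0 or self.pages[last].used() + size > self.page_size:
            self.pages.append(Page())  # new space is added at the end of the file
            last += 1
        self.pages[last].records[pk] = size
        self.where[pk] = last

    def delete(self, pk):
        # The page stays allocated; its space is only reclaimed by compact().
        del self.pages[self.where.pop(pk)].records[pk]

    def compact(self):
        """Rewrite all live records into fresh pages, in primary-key order."""
        live = sorted((pk, size) for page in self.pages for pk, size in page.records.items())
        self.pages, self.where = [], {}
        for pk, size in live:
            self.write(pk, size)
```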
Compact is an operation that should be part of regular maintenance of any Jet/ACE file, but you shouldn't have to do it often. If you're experiencing regular significant bloat, then it suggests that you may be mis-using your back-end database by storing/deleting temporary data. If your app adds records and deletes them as part of its regular operations, then you have a design problem that's going to make your data file bloat regularly.
To fix that error, move the temp tables to a different standalone MDB/ACCDB so that the churn won't cause your main data file to bloat.
On another note not applicable in this context, front ends bloat in different ways because of the nature of what's stored in them. Since this question is about an MDB/ACCDB used from VB, I won't go into details, but suffice it to say that compacting a front end is something that's necessary during development, but only very seldom in production use. The only reason to compact a production front end is to update metadata and recompile queries stored in it.
It's always been the case that MDB files become slow and prone to corruption as they grow past 1GB, but I've never known why; it's always been just a fact of life. I did some quick searching but couldn't find any official, or even well-informed insider, explanation of why this size is correlated with MDB problems. Still, my experience has always been that MDB files become incredibly unreliable as you approach and exceed 1GB.
Here's the MS KB article about Repair and Compact, detailing what happens during that operation:
http://support.microsoft.com/kb/209769/EN-US/
The app probably crashes as the result of improper/unexpected data returned from a database query to an MDB that large - what error in particular do you get when your application crashes? Perhaps there's a way to catch the error and deal with it instead of just crashing the application.
If it is crashing a lot then you might want to try a decompile on the DB and/or making a new database and copying all the objects over to the new container.
Try the decompile first, to do that just add the /decompile flag to the startup options of your DB for example
"C:\Program Files\...\MSACCESS.EXE" "C:\mydb.mdb" /decompile
Then compact, compile and then compact again
EDIT:
You can't do it without Access installed, but if the file is just storing data, a decompile won't do you any good. You can, however, look at JetComp to help with your compacting needs.
support.microsoft.com/kb/273956

Why should I care about compacting an MS Access .mdb file?

We distribute an application that uses an MS Access .mdb file. Somebody has noticed that after opening the file in MS Access the file size shrinks a lot. That suggests that the file is a good candidate for compacting, but we don't supply the means for our users to do that.
So, my question is, does it matter? Do we care? What bad things can happen if our users never compact the database?
In addition to making your database smaller, it'll recompute the indexes on your tables and defragment your tables which can make access faster. It'll also find any inconsistencies that should never happen in your database, but might, due to bugs or crashes in Access.
It's not totally without risk though -- a bug in Access 2007 would occasionally delete your database during the process.
So it's generally a good thing to do, but pair it with a good backup routine. With the backup in place, you can also recover from any 'unrecoverable' compact and repair problems with a minimum of data loss.
Make sure you compact and repair the database regularly, especially if the database application experiences frequent record updates, deletions and insertions. Not only will this keep the size of the database file down to the minimum - which will help speed up database operations and network communications - it performs database housekeeping, too, which is of even greater benefit to the stability of your data. But before you compact the database, make sure that you make a backup of the file, just in case something goes wrong with the compaction.
Jet compacts a database to reorganize the content within the file so that each 4 KB "page" (2KB for Access 95/97) of space allotted for data, tables, or indexes is located in a contiguous area. Jet recovers the space from records marked as deleted and rewrites the records in each table in primary key order, like a clustered index. This will make your db's read/write ops faster.
Jet also updates the table statistics during compaction. This includes identifying the number of records in each table, which allows Jet to choose the optimal method to scan for records, either by using the indexes or by using a full table scan when there are few records. After compaction, run each stored query so that Jet re-optimizes it using these updated table statistics, which can improve query performance.
Access 2000, 2002, 2003 and 2007 combine the compaction with a repair operation if it's needed. The repair process:
1 - Cleans up incomplete transactions
2 - Compares data in system tables with data in actual tables, queries and indexes and repairs the mistakes
3 - Repairs very simple data structure mistakes, such as lost pointers to multi-page records (which isn't always successful and is why "repair" doesn't always work to save a corrupted Access database)
4 - Replaces missing information about a VBA project's structure
5 - Replaces missing information needed to open a form, report and module
6 - Repairs simple object structure mistakes in forms, reports, and modules
The bad things that can happen if users never compact/repair the db are that it will become slow due to bloat, and it may become unstable, meaning corrupted.
Compacting an Access database (also known as a MS JET database) is a bit like defragmenting a hard drive. Access (or, more accurately, the MS JET database engine) isn't very good with re-using space - so when a record is updated, inserted, or deleted, the space is not always reclaimed - instead, new space is added to the end of the database file and used instead.
A general rule of thumb is that if your [Access] database will be written to (updated, changed, or added to), you should allow for compacting - otherwise it will grow in size (much more than just the data you've added, too).
So, to answer your question(s):
Yes, it does matter (unless your database is read-only).
You should care (unless you don't care about your users' disk space).
If you don't compact an Access database, over time it will grow much, much, much larger than the data inside it would suggest, slowing down performance and increasing the possibilities of errors and corruption. (As a file-based database, Access database files are notorious for corruption, especially when accessed over a network.)
This article on How to Compact Microsoft Access Database Through ADO will give you a good starting point if you decide to add this functionality to your app.
I would offer the users a method for compacting the database. I've seen databases grow to 600+ megabytes when compacting will reduce to 60-80.
To echo Nate:
In older versions, I've had it corrupt databases - so a good backup regime is essential. I wouldn't code anything into your app to do that automatically. However, if a customer finds that their database is running really slow, your tech support people could talk them through it if need be (with appropriate backups of course).
If their database is getting so large that compaction starts to become a necessity, though, maybe it's time to move to MS-SQL.
I've found that Access database files almost always get corrupted over time. Compacting and repairing them helps hold that off for a while.
Well, it really matters! MDB files keep increasing in size each time you manipulate their data, until they reach an unbearable size. But you don't have to supply a compacting method through your interface. You can add the following code to your MDB file to have it compacted each time the file is closed:
Application.SetOption "Auto Compact", True
I would also highly recommend looking in to VistaDB (http://www.vistadb.net/) or SQL Compact(http://www.microsoft.com/sql/editions/compact/) for your application. These might not be the right fit for your app... but are def worth a look.
If you don't offer your users a way to compact the database and the raw size isn't an issue to begin with, then don't bother.