I work on a Win32 application that has developed a very strange problem: the database quietly grows until it reaches the 2 GB file size limit. We use ADO to connect to an Access 2007 database. The application has worked nicely for years with no such difficulty. As you may imagine, when it reaches the 2 GB limit, the database becomes corrupt. I have quite a few customer databases now that were sent to us for repair, all around 2 GB in size. Once compacted, they come back at under 10 MB.
We see some database growth over time, but never growth on that sort of scale.
I made a small database "checker" that adds up the contents of all fields in all records to give some idea how much actual data is present. Having checked this new tool on databases that have recently been compacted, I think the tool is working correctly. None of the bloated databases contains more than 10 MB of data.
We don't compact the database at app start. It has seemed to me that because we don't delete large amounts of data, compacting the database isn't something we "should" need to do. Some customers do have larger databases, but they are on earlier versions.
Can you suggest how a database that should be under 10 MB could grow to 2 GB?
A few remarks about what our app does:
Any restructuring is done using DAO, when ADO does not have the database open.
We do use transactions in a few places.
For convenience, certain records are deleted and recreated instead of found and edited. Typically this operation involves 5-30 records, each about 8 KB, and it only occurs when the user presses "Save".
There are other record types that are about 70 KB per record, but we're not using delete/recreate with them.
We use a BLOB ("OLEObject") field to store binary data.
Thank you for any insights you can offer.
MS Access files bloat very easily. They store a lot of transaction history, and they retain their size when records are deleted.
When I write an application with an Access database I factor regular compaction into the design as it is the only way to keep the database in line.
Compacting on close can present issues (depending on the environment) such as users forcing the compact to abort because they want their computer to finish shutting down at the end of the day. Equally, compact on open can cause frustrating delays where the user would like to get into the program but cannot.
I normally try to arrange for the compact to be done as a scheduled task on an always-on PC, such as a server. Please follow the link for more information: http://support.microsoft.com/kb/158937
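For what it's worth, here is a minimal sketch of the kind of routine the scheduled task can run (paths are hypothetical; DAO's DBEngine.CompactDatabase writes a compacted copy to a new file, and the database must not be in use while it runs):

Public Sub ScheduledCompact()
    Const SRC As String = "\\server\data\app.mdb"      ' hypothetical path
    Const TMP As String = "\\server\data\app_tmp.mdb"
    ' CompactDatabase writes a new file rather than compacting in place
    DBEngine.CompactDatabase SRC, TMP
    ' Keep the bloated original as a backup, then swap in the compacted copy
    FileCopy SRC, SRC & ".bak"
    Kill SRC
    Name TMP As SRC
End Sub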
Thank you all for your help. I found where it happened:
var
  tbl: ADOX_TLB.Table;
  cat: ADOX_TLB.Catalog;
  prop: ADOX_TLB.Property_;
begin
  cat := ADOX_TLB.CoCatalog.Create;
  cat.Set_ActiveConnection(con.ConnectionObject);
  // database growth happens here: each pass through the ADOX catalog
  // grew the file by about 32 KB
  tbl := cat.Tables.Item[sTableName];
  prop := tbl.Properties['ValidationText'];
  Result := prop.Value;
  prop := nil;
  tbl := nil;
  cat := nil;
end;
Each time this function was called, the database grew by about 32 KB.
I changed the code to call this function less often, and to do the work with DAO instead of ADO.
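For reference, the DAO equivalent is essentially a one-liner. A minimal sketch in Access VBA (our app is Delphi, and the function name here is hypothetical):

Public Function GetValidationText(ByVal sTableName As String) As String
    ' DAO exposes ValidationText directly on the TableDef,
    ' with no ADOX catalog round-trip
    GetValidationText = CurrentDb.TableDefs(sTableName).ValidationText
End Function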
Doing a little research, I came across a discussion of how MS Access files will grow until compacted, even when data is deleted. From this I infer that they store transaction history within the file, which means the file will continue to grow with each access.
The solution is compaction. You apparently need to compact the database regularly; you may want to do this on application close instead of launch if it takes too long.
Also note that this means multi-operation changes (such as the delete-then-recreate pattern mentioned above) will likely cause the file to expand more quickly.
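As a sketch of the alternative, a find-and-edit lets the record stay on its existing data page whenever the new data still fits (a minimal DAO sketch; table, field, and key names are hypothetical):

Dim rs As DAO.Recordset
Set rs = CurrentDb.OpenRecordset( _
    "SELECT * FROM Settings WHERE UserID = 1", dbOpenDynaset)
If rs.EOF Then
    rs.AddNew      ' create the record only if it is missing
Else
    rs.Edit        ' otherwise update it in place
End If
rs!Payload = "new value"
rs.Update
rs.Close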
Related
We are using MS Access 2010 and we are seeing an unnecessary 50% increase in the size of the data file every day. We run the compact and repair process every night. But almost every day, in the middle of the day, this huge increase happens and performance is badly affected; we have to run the process again manually, after which the huge size difference disappears. I suspect the problem is the internal behaviour of the Access engine while updating data.
Can anyone please explain how much space is wasted internally by the database engine when a record is updated?
For instance, suppose we have a record of 100 bytes, and an update shrinks it to 80 bytes. How much will the wasted space be? Is it 20 bytes, or much more than that?
Conversely, when an update makes a record larger, does the update process create any wasted space in the data file?
Any ideas or suggestions on how to boost the performance would be appreciated.
You can run C&R via VBA:
Public Sub CompactDB()
    ' Drives the built-in menu command, so Access must be running
    ' interactively; the menu path varies by version and language
    CommandBars("Menu Bar").Controls("Tools").Controls("Database utilities").Controls("Compact and repair database...").accDoDefaultAction
End Sub
Reasons your database can bloat (compacting only solves some of this -- decompiling and recompiling is necessary for the rest, if you write code or use macros):
MS Access is file-based, not server-transaction based, so you're always writing and rewriting to the hard drive for a variable space. To get around this, switch to MS Access ADP files using either MSDE, which you can install from the MS Office Professional CD by browsing to it on the CD (it is not part of the installation wizard), or hook the database up to a server such as SQL Server. You'll have to build a new MS Access document of type ADP (as opposed to MDB). Doing so puts you in a different development regime than you're used to, however, so read about this before doing it.
Compiling. Using macros plus the "compile in background" option is no different from compiling your MS Access project by having coded in Access Basic, Visual Basic for Access, or Visual Basic using the VB Editor that comes with MS Access. Whatever changes you made last time remain as compiled pseudocode, so you are pancaking one change on top of another, even though you are only working with the latest version of your code.
Queries, especially large queries, take up space when they're run which is never reclaimed until you compact. You can make your queries more efficient, but you'll never get away from this completely.
Locktypes, cursortypes, and cursorlocations on ADODB, depending on how you set them up, can take up a lot of space if you choose combinations that are really data-intensive. These can be marshalled (configured) in such a way as to return only what's necessary. There is a Knowledge Base article in the MSDN library at microsoft.com detailing how ADODB causes a lot of bloat and recommending DAO instead, but this is a cop-out: use ADODB well and you'll get around this, and DAO does not eliminate bloat either. (A sketch of a lean ADODB setup follows this list.)
DAO functions.
Object creation -- tables, forms, controls, reports -- all take up space. If you create a form and delete it later, the space that the form occupied is not reclaimed until you compact.
Cute pictures. These always take up space, and MS Access does not store them efficiently. A 20K JPEG can wind up as an 800K or 1MB bitmap once stored in Access, and there's nothing you can do about that in MS Access 97. You can put the image on a form and use subform references to the image wherever you want it, but you still don't get around the inefficient storage format.
OLE Objects. If you have an OLE field and decide to insert, say, a spreadsheet in that field, you take the entire Excel workbook with it, not just that sheet. Be careful how you use OLE objects.
Table properties with the subtable set to [auto]. Set this property, for all tables, to [none]. Depending on how many tables you have, performance can also perceptibly improve.
You can also get the Jet Compact utility from Microsoft.com for databases that are corrupted.
Source
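On the ADODB point above, here is a sketch of a lean setup (late bound, with a hypothetical connection string); a server-side, forward-only, read-only cursor asks the provider for only what's necessary:

Dim cn As Object, rs As Object
Set cn = CreateObject("ADODB.Connection")
cn.Open "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\data\app.mdb"
Set rs = CreateObject("ADODB.Recordset")
' 0 = adOpenForwardOnly, 1 = adLockReadOnly
rs.Open "SELECT ID, Name FROM Customers", cn, 0, 1
Do Until rs.EOF
    Debug.Print rs!ID, rs!Name
    rs.MoveNext
Loop
rs.Close
cn.Close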
Is this a realistic solution to the problems associated with larger .mdb files:
split the large .mdb file into smaller .mdb files
have one 'central' .mdb containing links to the tables in the smaller .mdb files
How easy would it be to make this change to an .mdb-backed VB application?
Could the changes to the database be done so that there are no changes required to the front-end application?
Edit Start
The short answer is "No, it won't solve the problems of a large database."
You might be able to overcome the DB size limitation (~2GB) by using this trick, but I've never tested it.
Typically, with large MS Access databases, you run into problems with speed and data corruption.
Speed
Is it going to help with speed? You still have the same amount of data to query and search through, and the same algorithm, so all you are doing is adding the overhead of having to open multiple files per query. I would expect it to be slower.
You might be able to speed it up by reducing the time it takes to get the information off the disk. You can do this in a few ways:
faster drives
put the MDB on a RAID array (anecdotally, RAID 10 may be faster)
split the MDB up (as you suggest) into multiple MDBs, and put them on separate drives (maybe even separate controllers).
(how well this would work in practice vs. theory, I can't tell you - if I was doing that much work, I'd still choose to switch DB engines)
Data Corruption
MS Access has a well-deserved reputation for data corruption. To be fair, I haven't had it happen to me for some time. This may be because I've learned not to use it for anything big, or because MS has put a lot of work into solving these problems; more likely, a combination of both.
The prime culprits in data corruption are:
Hardware: e.g., cosmic rays, electrical interference, iffy drives, iffy memory and iffy CPUs - I suspect MS Access does not have error handling/correction as good as other databases do.
Networks: lots of collisions on a saturated network can confuse MS Access and convince it to scramble important records; as can sub-optimally implemented network protocols. TCP/IP is good, but it's not invincible.
Software: As I said, MS has done a lot of work on MS Access over the years; if you are not up to date on your patches (MS Office and OS), get up to date. Problems typically happen when you hit extremes like the 2 GB limit (some bugs are hard to test and won't manifest themselves except at the edge cases, which makes them less likely to have been seen or corrected, unless reported to MS by a motivated user).
All this is exacerbated with larger databases, because larger databases typically have more users and more workstations accessing them. Together, the larger database and the larger number of users multiply the opportunities for corruption.
Edit End
Your best bet would be to switch to something like MS SQL Server. You could start by migrating your data over and then linking your MDB to it. You get the stability of SQL Server, and most (if not all) of your code should still work.
Once you've done that, you can start migrating your VB app(s) over to use SQL Server instead.
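A sketch of that linking step (DSN, database, and table names are hypothetical); once the SQL Server tables are linked under their old names, most existing queries and code keep working:

' Run inside the MDB after migrating the data; one line per table
DoCmd.TransferDatabase acLink, "ODBC Database", _
    "ODBC;DSN=AppServer;Trusted_Connection=Yes;DATABASE=AppDb", _
    acTable, "dbo.Customers", "Customers"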
If you have more data than fits in a single MDB then you should get a different database engine.
One main issue that you should consider is that you can't enforce referential integrity between tables stored in different MDBs. That should be a show-stopper for any actual database.
If it's not, then you probably don't have a proper schema designed in the first place.
For reasons more adequately explained by CodeSlave, the answer is no, and you should switch to a proper relational database.
I'd like to add that this does not have to be SQL Server. Quite possibly the reason you are reluctant to do this is cost, SQL Server being quite expensive to obtain and deploy unless you are in an educational or charitable organisation (in which case it's remarkably cheap and then usually a complete no-brainer).
I've recently had extremely good results moving an Access system from MDB to MySQL. At least 95% of the code functioned without modification, and of the remaining 5% most was straightforward, with only a few limited areas where significant effort was required. If you have sloppy code (not closing connections or releasing objects) then you'll need to fix that, but generally I was pleasantly surprised at how painless this approach was. Certainly, if cost is the reason you are reluctant to move to a database backend, I would highly recommend that you stop trying to manipulate .mdb files and go instead for the more robust database solution.
Hmm, well, if the data is going through this central DB then there is still going to be a bottleneck there. The only reason I can think of why you would do this is to get around the size limit of an Access MDB file.
Having said that, if the business functions can be split off into separate applications, then that might be a good option, with a central DB containing all the linked tables for reporting purposes. I have used this before to good effect.
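For reference, the linking itself is mechanical. A minimal DAO sketch (paths and table names hypothetical) of attaching a table from a smaller MDB into the central MDB:

Dim db As DAO.Database, td As DAO.TableDef
Set db = CurrentDb
Set td = db.CreateTableDef("Orders")           ' name the front end sees
td.Connect = ";DATABASE=C:\data\part1.mdb"     ' the smaller back-end file
td.SourceTableName = "Orders"
db.TableDefs.Append td

Because the link keeps the original table name, the front-end application doesn't need to change.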
What do Repair and Compact operations do to an .MDB?
If these operations do not stop a 1GB+ .MDB backed VB application crashing, what other options are there?
Why would a large sized .MDB file cause an application to crash?
"What do compact and repair operations do to an MDB?"
First off, don't worry about repair. The fact that there are still commands that purport to do a standalone repair is a legacy of the old days. The behavior of that command was changed greatly starting with Jet 3.51 and has remained the same since: a repair will never be performed unless Jet/ACE determines that it is necessary. When you do a compact, it will test whether a repair is needed and perform it before the compact.
So, what does it do?
A compact/repair rewrites the data file, eliminating any unused data pages, writing tables and indexes in contiguous data pages, and flagging all saved QueryDefs for re-compilation the next time they are run. It also updates certain metadata for the tables, and other metadata and internal structures in the header of the file.
All databases have some form of "compact" operation because they are optimized for performance. Disk space is cheap, so instead of writing data so as to use storage efficiently, they write to the first available space. Thus, in Jet/ACE, if you update a record, the record is written back to the original data page only if the new data fits within that page. If not, the original data page is marked unused and the record is rewritten to an entirely new data page. Over time the file becomes internally fragmented, with used and unused data pages mixed throughout.
A compact organizes everything neatly and gets rid of all the slack space. It also rewrites data tables in primary key order (Jet/ACE clusters on the PK, but that's the only index you can cluster on). Indexes are also rewritten at that point, since over time those become fragmented with use, also.
Compact is an operation that should be part of regular maintenance of any Jet/ACE file, but you shouldn't have to do it often. If you're experiencing regular, significant bloat, it suggests that you may be misusing your back-end database by storing and deleting temporary data. If your app adds records and deletes them as part of its regular operations, then you have a design problem that's going to make your data file bloat regularly.
To fix that, move the temp tables to a separate standalone MDB/ACCDB so that the churn won't cause your main data file to bloat.
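A minimal sketch of that arrangement (path and table name hypothetical): recreate the scratch file at startup and point all the churn at it, so deleting the file takes the place of compacting it:

Dim ws As DAO.Workspace, tmp As DAO.Database
Set ws = DBEngine.Workspaces(0)
On Error Resume Next
Kill "C:\data\scratch.mdb"     ' throw away yesterday's bloat
On Error GoTo 0
Set tmp = ws.CreateDatabase("C:\data\scratch.mdb", dbLangGeneral)
tmp.Execute "CREATE TABLE TempResults (ID LONG, Payload TEXT(255))"
' ... insert and delete temp rows in tmp, not in the main data file ...
tmp.Close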
On another note, not applicable in this context: front ends bloat in different ways because of the nature of what's stored in them. Since this question is about an MDB/ACCDB used from VB, I won't go into details; suffice it to say that compacting a front end is something that's necessary during development, but only very seldom in production use. The only reason to compact a production front end is to update metadata and recompile the queries stored in it.
It's always been the case that MDB files become slow and prone to corruption as they grow beyond 1 GB, but I've never known why; it's always been just a fact of life. I did some quick searching and couldn't find any official, or even well-informed insider, explanation of why this size is correlated with MDB problems, but my experience has always been that MDB files become incredibly unreliable as you approach and exceed 1 GB.
Here's the MS KB article about Repair and Compact, detailing what happens during that operation:
http://support.microsoft.com/kb/209769/EN-US/
The app probably crashes as the result of improper/unexpected data returned from a database query to an MDB that large. What error, in particular, do you get when your application crashes? Perhaps there's a way to catch the error and deal with it instead of just crashing the application.
If it is crashing a lot, you might want to try a decompile on the DB and/or making a new database and copying all the objects over to the new container.
Try the decompile first. To do that, just add the /decompile flag to the startup options of your DB, for example:
"C:\Program Files\Microsoft Office\OFFICE11\MSACCESS.EXE" "C:\mydb.mdb" /decompile
(the path to MSACCESS.EXE varies with your Access version)
Then compact, compile, and then compact again.
EDIT:
You can't do it without Access being installed, and if the file is just storing data then a decompile will not do you any good. You can, however, look at JetComp to help with your compacting needs:
support.microsoft.com/kb/273956
There is a prevailing opinion that regards Access as an unreliable backend database for concurrent use, especially for more than 20 concurrent users, due to the tendency of the database being corrupted.
There is a minority opinion that says an Access database backend is perfectly stable and performant, provided that:
Your network has no problems, and
You write your program correctly.
My question is very specific: what does "Write your program correctly" mean? What are the requirements that you have to follow in order to prevent the database from being corrupted?
Edit: To be clear: The database is already split. Assume less than 25 users. I'm not interested in performance considerations, only database stability.
If you're looking for a great example of programming practices you need to avoid, number one on the list is generally NOT running a split database. Number two is not placing the front end on each computer.
For example, the above poster had all kinds of problems, but you can just about bet that their failing was either that they didn't have the database split, or that they weren't placing the software (the front end) on each computer.
As for the person having to resort to some weird locking mechanism: that's kind of strange and not required. Access (actually the JET data engine, now called ACE) has had a row-locking feature built in since Office 2000 came out.
I've been deploying applications written in Access commercially for about 12 years now. In all those years I have had one corruption occur, from ONE customer.
Keep in mind that before Microsoft started pushing and selling SQL Server, they rated the JET database engine for about 50 users. While my clients don't have problems, in 9 out of 10 cases when someone has a problem you find that they failed to split the database, or they're not installing the front-end part on each computer.
As for coding techniques or tips: any design in which a reduced number of records is loaded into a form is a great start. In other words, you never want to simply throw up a form attached to a large table without restricting the records to be loaded into it. This is probably the number one tip I can give here.
For example, it makes no sense to load up an instant teller machine with everybody's account number and THEN ask the user what account number to work on. In fact, I asked an 80-year-old grandmother if this idea made any sense, and even she could figure that out. It makes far more sense to ask the user what account to work on, and then simply load the one customer.
The same concept applies to a split database on a network. If you ask the user for the customer account number, and THEN open the form on that one record with a where clause, then even with 100,000 records in the back end the form load time will be near instant, because only ONE RECORD will be dragged from the customers table down the network wire. (A sketch follows.)
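In Access terms that pattern is essentially one line; a sketch with hypothetical form and field names:

Dim sAccount As String
sAccount = InputBox("Account number?")
' Only the matching record is dragged across the network
DoCmd.OpenForm "frmCustomer", , , "AccountNumber = '" & sAccount & "'"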
Also keep in mind that there are a good number of commercial applications in the marketplace, such as Simply Accounting, that use a JET back end (you can actually open Simply Accounting files with MS Access; they renamed the extensions to hide this fact, but it is an Access MDB file).
Some of my clients have 3-5 users with headsets on, running my reservation software all day long. Many have booked more than 40,000 customers, and in a 10-year period NONE of them has had a problem. (The one corruption example above was actually on a single-user system, believe it or not.)
So, I have never had one service call due to the reliability of my Access products. On the other hand, this application only has 160 forms and about 30,000 lines of code. It has about 65 highly related and normalized tables (relations enforced, and also cascade deletes).
So there's no particular programming approach needed here for multi-user applications, the exception being good designs that reduce bandwidth requirements.
At the end of the day, it turns out that good applications are ones that do not load unnecessary records into a form. It also turns out that when you design your applications this way, changing your back end to SQL Server takes very little work to make your Access front end run great against it.
At last count, I believe the estimate is close to 100 million Access users around the world. Access is by far the most popular desktop database engine out there, and for the most part users have trouble-free operation.
The only people who have operational problems on networks are those who have not split the database and have not placed the front end on each computer.
The only compelling answers so far seem to be to reduce network traffic, and make sure your hardware cannot fail.
I find these answers unsatisfactory for a number of reasons.
The network traffic position is contradictory. If the database can only handle a certain amount of network traffic, then people need sensible guidelines to gauge this, so they can intelligently choose a database that is appropriate.
Blaming Access database crashes on hardware failures is not a defensible position. Users will (rightly) claim that their other software doesn't suffer from these kinds of problems.
Access database corruption is not an imaginary problem. The people who regularly suggest that 5 to 20 users is the upper practical limit for Access applications are speaking from experience.
Also see the Corrupt Microsoft Access MDBs FAQ, which I've compiled over the years based on newsgroup postings and which predates Allen's page. That said, my clients have had very few corruptions over the years and have never lost data nor had to restore from backup.
I'm not sure what "write your program correctly" means in this context. I've read a few postings indicating this, but it's more about the implementation aspects. As Albert has pointed out, you have to split the database and give each user their own copy of the FE MDB/MDE. You can't access a back-end MDB over a wireless network card, as they are too unstable. The same goes for a WAN, unless the WAN is very fast/wide and very stable. Otherwise we suggest upsizing to SQL Server or using Terminal Services/Citrix.
I have several clients running 20 to 25 users all day long in the system. One MDB has 120 tables, while another has 160. A few tables have 600,000 to 800,000 records. One client had 4 or 5 corruptions in five to seven years. We figured out the cause of all but two of those, and they were hardware-related in one way or another. At least one of these apps should've been upsized to SQL Server; however, that was cancelled on me by a Dilbert's PHB (Pointy-Haired Boss).
With very good code (wrapped in transactions with rollbacks), we had a call center with over 100 very active users at a time, back in the Access 97 days.
Another system had a VB5 front end with Access/Jet on portables that used RAS (yes, the old dial-up days) to reach a SQL Server 6 database - 250 concurrent users.
People using the wizard to link a form directly to a table where the form is used to make edits ... might be a problem.
Uncompleted transactions, e.g. a recordset that does not get closed properly, and a break in the network connection for any reason while a database is open (I have seen the power-saving features of a NIC cause corruption) are my number one causes.
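A minimal sketch of the discipline that avoids the unclosed-recordset case (table name hypothetical): close and release in a cleanup block that runs even on error:

Dim rs As DAO.Recordset
On Error GoTo Cleanup
Set rs = CurrentDb.OpenRecordset("Customers", dbOpenDynaset)
' ... work with rs ...
Cleanup:
If Not rs Is Nothing Then rs.Close
Set rs = Nothing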
I don't believe the number of users is a limitation with the MS Access Jet engine.
My understanding is that the Jet engine queues up concurrent maintenance transactions and applies them one at a time (like a printer queue does with print jobs). With ODBC connectivity and an intelligent application program that manages recordset sizes, locks records that are open for edit, and only maintains DB connections long enough to retrieve or save a record, little strain is put on the Jet engine.
I look at an MDB file as a set of tables. You can conceivably have hundreds of these in one database, or more. The SQL queries against these tables are random-access, and the naming convention of the MDB files lets the SQL query built in the application program pick which table (MDB file) to access. MS Access databases can be tens, hundreds, or thousands of gigabytes this way and run smoothly. Proper indexing, and normalizing the data to prevent storing redundant data, also help. I've never run into a database crash or concurrency issue with MS Access, ODBC, and a Win32 Perl GUI driving the application. I use no MS Access objects other than tables, indexes, and perhaps views/queries. And yes, I store the database on a dedicated PC and install my application software on each workstation PC.
We distribute an application that uses an MS Access .mdb file. Somebody has noticed that after opening the file in MS Access the file size shrinks a lot. That suggests that the file is a good candidate for compacting, but we don't supply the means for our users to do that.
So, my question is, does it matter? Do we care? What bad things can happen if our users never compact the database?
In addition to making your database smaller, compacting will recompute the indexes on your tables and defragment the tables, which can make access faster. It will also find any inconsistencies that should never happen in your database, but might, due to bugs or crashes in Access.
It's not totally without risk though -- a bug in Access 2007 would occasionally delete your database during the process.
So it's generally a good thing to do, but pair it with a good backup routine. With the backup in place, you can also recover from any 'unrecoverable' compact and repair problems with a minimum of data loss.
Make sure you compact and repair the database regularly, especially if the database application experiences frequent record updates, deletions and insertions. Not only will this keep the size of the database file down to the minimum - which will help speed up database operations and network communications - it performs database housekeeping, too, which is of even greater benefit to the stability of your data. But before you compact the database, make sure that you make a backup of the file, just in case something goes wrong with the compaction.
Jet compacts a database to reorganize the content within the file so that each 4 KB "page" (2KB for Access 95/97) of space allotted for data, tables, or indexes is located in a contiguous area. Jet recovers the space from records marked as deleted and rewrites the records in each table in primary key order, like a clustered index. This will make your db's read/write ops faster.
Jet also updates the table statistics during compaction. This includes identifying the number of records in each table, which will allow Jet to use the most optimal method to scan for records, either by using the indexes or by using a full table scan when there are few records. After compaction, run each stored query so that Jet re-optimizes it using these updated table statistics, which can improve query performance.
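A sketch of that post-compact step, run inside the database (the "~" prefix skips hidden system queries; parameterized queries would need their parameters supplied first):

Dim qd As DAO.QueryDef, rs As DAO.Recordset
For Each qd In CurrentDb.QueryDefs
    If Left$(qd.Name, 1) <> "~" And qd.Type = dbQSelect Then
        ' Opening the query forces a recompile against the fresh statistics
        Set rs = qd.OpenRecordset(dbOpenSnapshot)
        rs.Close
    End If
Next qd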
Access 2000, 2002, 2003 and 2007 combine the compaction with a repair operation if it's needed. The repair process:
1 - Cleans up incomplete transactions
2 - Compares data in system tables with data in actual tables, queries and indexes and repairs the mistakes
3 - Repairs very simple data structure mistakes, such as lost pointers to multi-page records (which isn't always successful and is why "repair" doesn't always work to save a corrupted Access database)
4 - Replaces missing information about a VBA project's structure
5 - Replaces missing information needed to open a form, report and module
6 - Repairs simple object structure mistakes in forms, reports, and modules
The bad things that can happen if the users never compact/repair the db are that it will become slow due to bloat, and that it may become unstable - meaning corrupted.
Compacting an Access database (also known as an MS JET database) is a bit like defragmenting a hard drive. Access (or, more accurately, the MS JET database engine) isn't very good at re-using space, so when a record is updated, inserted, or deleted, the space is not always reclaimed; instead, new space is added to the end of the database file and used.
A general rule of thumb is that if your [Access] database will be written to (updated, changed, or added to), you should allow for compacting - otherwise it will grow in size (much more than just the data you've added, too).
So, to answer your question(s):
Yes, it does matter (unless your database is read-only).
You should care (unless you don't care about your users' disk space).
If you don't compact an Access database, over time it will grow much, much, much larger than the data inside it would suggest, slowing down performance and increasing the possibilities of errors and corruption. (As a file-based database, Access database files are notorious for corruption, especially when accessed over a network.)
This article on How to Compact Microsoft Access Database Through ADO will give you a good starting point if you decide to add this functionality to your app.
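The approach in that article boils down to JRO's JetEngine.CompactDatabase; a late-bound sketch with hypothetical paths (the source file must not be open while this runs):

Dim je As Object
Set je = CreateObject("JRO.JetEngine")
je.CompactDatabase _
    "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\data\app.mdb", _
    "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\data\app_small.mdb"
' Swap app_small.mdb into place afterwards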
I would offer the users a method for compacting the database. I've seen databases grow to 600+ megabytes when compacting would reduce them to 60-80.
To echo Nate:
In older versions, I've had it corrupt databases, so a good backup regime is essential. I wouldn't code anything into your app to do it automatically. However, if a customer finds that their database is running really slowly, your tech support people could talk them through it if need be (with appropriate backups, of course).
If their database is getting so large that compaction starts to become a necessity, though, maybe it's time to move to MS SQL.
I've found that Access database files almost always get corrupted over time. Compacting and repairing them helps hold that off for a while.
Well, it really matters! MDB files keep increasing in size each time you manipulate their data, until they reach an unbearable size. But you don't have to supply a compacting method through your interface. You can add the following code to your MDB file to have it compacted each time the file is closed:
Application.SetOption "Auto Compact", True
I would also highly recommend looking into VistaDB (http://www.vistadb.net/) or SQL Server Compact (http://www.microsoft.com/sql/editions/compact/) for your application. These might not be the right fit for your app, but they are definitely worth a look.
If you don't offer your users a way to compact, and the raw size isn't an issue to begin with, then don't bother.