I have a problem with MS Access. My current MS Access back-end file was 320 MB, but after I compacted the database it is only 222 MB, so it lost 98 MB. My question is: what is the problem? After it lost those 98 MB, why does it seem slower than before when users use it? Were any records in that file lost? Thank you in advance.
This is normal behaviour, and you did not lose any data. A compact and repair (C + R) is normal maintenance that you should do on your database. How often you need this kind of maintenance depends very much on how many users you have, how much data "churning" there is, etc.
So some databases can go for weeks, or perhaps even longer, without needing a C + R on the back end. Others need it much sooner.
So why does the file grow like this?
There are several reasons, but one is simply that, to allow multiple users, Access cannot reclaim the disk space when you delete a record, because you may have other users working with the data. You cannot "move" all of the other data down to fill the "hole", because that would change the position of existing data.
So if I am editing record 400 and another user deletes record 200, a "hole" exists at 200. However, to reclaim that space I would have to "move down" every single record after it to fill the hole. If the database has 100,000 records and I delete record 50, I now have to move a massive 99,950 records back down to fill that one hole! That is far too slow.
So instead of that huge (and slow) process of shuffling 99,950 records (a lot of data), Access simply leaves the "hole" in that spot.
The other reason is multi-user. With, say, 5 users working on the system, you can't start moving data around while users are working; the place or spot of an existing record would then be moving all the time.
So moving records around is NOT practical if you are to allow multiple users.
The other thing that causes file growth is editing a record (again, say the record at position 50 out of 100,000). What happens if you type in extra information and the record is now TOO LARGE for that spot at position 50?
So now your record is too large. We have the opposite problem of a delete: we would need to expand the "hole" or spot at 50. And to do that, we might have to move 100,000 or more records to make the hole bigger for that one record.
The "hole" or "spot" for the record is simply not large enough anymore.
So what Access does is mark the old record (the old spot) as deleted, and then put the too-large record we just edited at the end of the file (thus the file expands at the end). So the file grows even from simple editing, not only from deletes.
So deleting a record does not really "remove" the hole, because doing so would be too slow from a performance point of view.
And as noted, if we moved records around (which is far too slow), other users working on the data would find that the record they are currently working on is no longer in the same place.
So we can't start "moving around" that data during editing.
So Access is not able to reclaim space during operation. It is too slow, causes far too much disk I/O for a simple delete, and, as noted, would not work in a multi-user setting where the positions of records would always be changing due to someone's delete.
To reclaim all those "holes" and "spots", you do a C + R. This is scheduled maintenance that you do when no one is working with the data (say late at night, or after all the workers go home). That also explains why only ONE user can be connected in order to do a C + R.
So you are not losing any data. The C + R simply reclaims all those "holes" and "spots" of unused space, but the process is time consuming.
It is too slow to reclaim those spots "during" operation of your application. Such reclaiming of wasted, unused space therefore only occurs during a C + R, not during the high-speed, interactive operations your users perform.
I should point out that "most" database systems have this issue, and while "some" attempt to reclaim the unused space, it is simply better to have a separate process and a separate action that reclaims that space during system maintenance, not during use of the application.
What you are seeing is normal.
And after a C + R you should see improved performance. Often not much, but if the file is really large and full of gaps and holes, a C + R will reduce the file size a lot, and that can help performance. Access also rebuilds the indexes and orders the data by primary key (PK), which can also improve performance, since you more often get to read data in PK order.
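If you want to script that nightly C + R instead of doing it by hand, one possible sketch (assuming Python with pywin32 installed, nobody has the back-end open, and the file paths are just examples) is to drive Access's own CompactRepair method through COM:

```python
# Sketch only: compact an Access back-end from a scheduled task.
# Assumes pywin32 is installed and no user has the file open.
import os
import win32com.client

SRC = r"C:\Data\backend.accdb"          # example path, adjust to your back-end
DST = r"C:\Data\backend_compact.accdb"  # temporary output of the compact

access = win32com.client.Dispatch("Access.Application")
try:
    if os.path.exists(DST):
        os.remove(DST)                  # CompactRepair wants a fresh destination file
    ok = access.CompactRepair(SRC, DST, True)  # True = log any errors to a table
    if ok:
        os.replace(DST, SRC)            # swap the compacted copy back in
finally:
    access.Quit()
```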
Our product has been growing steadily over the last few years, and we are now at a turning point as far as data size for some of our tables is concerned. We expect the growth of those tables to double or triple in the next few months, and even more in the next few years. We are in the range of 1.4M rows now, so over 3M by the end of the summer, and (since we expect growth to be exponential) we assume around 10M at the end of the year (M being million, not mega/1000).
The table we are talking about is sort of a logging table. The application receives data files (CSV/XLS) on a daily basis and the data is transferred into that table. It is then used in the application for a specific amount of time, a couple of weeks or months, after which it becomes rather redundant. That is, if all goes well. If there is some problem down the road, the data in those rows can be useful to inspect for troubleshooting.
What we would like to do is periodically clean up the table, removing any number of rows based on certain requirements, but instead of actually deleting the rows, move them 'somewhere else'.
We currently use MySQL, and the 'somewhere else' could be MySQL as well, but it can be anything. For other projects we have a master/slave setup where the whole database is involved, but that's not what we want or need here. It's just some tables, where the master table would need to become shorter and the slave only bigger; not a one-to-one sync.
The main requirement for the secondary store is that the data should be easy to inspect/query when needed, either with SQL or another DSL, or just visual tooling. So we are not interested in backing up the data to one or more CSV files or another plain-text format, since that is not as easy to inspect. The logs would then be somewhere on S3, so we would need to download them and grep/sed/awk through them... We'd much rather have something database-like that we can consult.
I hope the problem is clear?
For the record: while the solution can be anything, we prefer the simplest solution possible. It's not that we don't want Apache Kafka (for example), but then we'd have to learn it, install it, and maintain it. Every new piece of technology adds to our stack; the lighter it stays, the better we like it ;).
Thanks!
PS: we are not just being lazy here; we have done some research, but we thought it would be a good idea to get some more insight into the problem.
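For what it's worth, the simplest shape of that "move the rows somewhere else" job is a batched INSERT ... SELECT followed by a DELETE of the same rows. A rough sketch, assuming mysql-connector-python and made-up table/column names (events and events_archive with identical structure and a created_at column):

```python
# Sketch only: move old rows from a "hot" table to an archive table in batches.
# Table and column names (events, events_archive, created_at, id) are illustrative.
import mysql.connector

conn = mysql.connector.connect(host="localhost", user="app", password="...",
                               database="mydb")
cur = conn.cursor()

BATCH = 10_000
while True:
    cur.execute(
        """INSERT INTO events_archive
           SELECT * FROM events
           WHERE created_at < NOW() - INTERVAL 3 MONTH
           ORDER BY id LIMIT %s""", (BATCH,))
    cur.execute(
        """DELETE FROM events
           WHERE created_at < NOW() - INTERVAL 3 MONTH
           ORDER BY id LIMIT %s""", (BATCH,))
    conn.commit()                 # commit each batch to keep transactions small
    if cur.rowcount < BATCH:      # fewer rows deleted than the batch size -> done
        break

cur.close()
conn.close()
```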
I have a table, "tblData_Dir_Statistics_Detail", with 15 fields that holds about 150,000 records. I can load the records into this table, but I am having trouble updating other tables from it (I want to use several different queries to update a couple of different tables). There is only one index on the table, and the only unusual thing about it is that there are 3 text fields that I run out to 255 characters, because some of the paths/data are that long or even exceed 255. I have tried trimming these to 150 characters, but that has no impact on the problems I am having with this table.
Additionally, I manually recreated the table because it acts like it is corrupted. Even that had no impact on the problems.
The original problem was that my code would stop with a "System resource exceeded." error.
Here is the list of things I am experiencing and can't figure out why:
When I use the table in an update query, I always see (in Task Manager) the physical memory usage for Access jump from about 35,000 K to 85,000 K instantly when the code hits the query, and then, within a second or two, I get the resources exceeded error.
Sometimes, but not always, when I compact and repair, tblData_Dir_Statistics_Detail is deleted by the process and is subsequently listed in the MSysCompactError table as an error. The "ErrorCode" in the table is -1011 and the "ErrorDescription" is "System resource exceeded."
Sometimes, but not always, when I compact and repair, if I lose tblData_Dir_Statistics_Detail, I also lose the next table below it in the database window (this also shows in the SYS table).
Sometimes, but not always, when I compact and repair, if I lose tblData_Dir_Statistics_Detail, I lose the next TWO tables below it in the database window (this also shows in the SYS table).
I have used table structures like this with much larger tables for years without problems. Additionally, I have a parallel table, "tblData_Dir_Statistics", which has virtually the same structure and holds the same data at a summarized level, and I have no trouble with that or any other table.
Summary:
My suspicion is that some kind of character is being imported into one of the fields and is corrupting the entire table.
If this is true, how could I find the corruption?
If it is not this, what else could it be?
Many Thanks!
A few considerations:
Access files have a size limit of 2 GB. If your file ever becomes bigger than 2 GB (even by 1 byte), the whole file is corrupted.
Access creates temporary objects when sorting data and/or executing queries (and those temporary objects are created and stored in the file). Depending on the complexity of your queries, those temporary objects might be pushing the file size up (see previous paragraph).
If you are using text fields with lengths bigger than 255 characters, consider using Memo fields (these fields cannot be indexed as far as I remember, so be careful when using them)
Consider adding more indexes to your tables to ease and speed up the queries.
Consider splitting the database: put all the data in one (or more) file(s), link the tables in it (them) into another Access file, and execute the queries from that last one.
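On the original suspicion about a bad imported character: one rough way to hunt for it is to scan the long text fields for control characters. A sketch only, assuming Python with pyodbc and the Access ODBC driver; the ID and path field names are guesses:

```python
# Sketch only: look for non-printable characters in the long text fields.
# Field names (ID, Path1, Path2, Path3) and the file path are guesses; adjust to yours.
import pyodbc

conn = pyodbc.connect(
    r"Driver={Microsoft Access Driver (*.mdb, *.accdb)};"
    r"Dbq=C:\Data\backend.accdb;")
cur = conn.cursor()

cur.execute("SELECT ID, Path1, Path2, Path3 FROM tblData_Dir_Statistics_Detail")
for row in cur.fetchall():
    for name, value in zip(("Path1", "Path2", "Path3"), row[1:]):
        if value and any(ord(ch) < 32 for ch in value):
            print(f"record {row[0]}: control character in {name!r}: {value!r}")

conn.close()
```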
I've just switched from MySQL to MongoDB, and it's pretty awesome, but I'm struggling with the database's data size.
I have about 700 documents per day, and each has about 900 comments embedded inside it.
The average object size is about 53 KB (and this is with only a couple of hours of data), so with easy maths it should be 53 KB * 700 = about 37 MB. But the total size (storageSize) is about 250 MB (after only 2 hours!).
So I'll be creating more than 1 GB of data every day; in MySQL it was about 100 MB/day (or even less).
Is this normal? How can I deal with it? Thanks!
The reason you are seeing this is fragmentation of record objects.
Each document within MongoDB is held within an internal record object; think of it as a C++ struct that represents a document.
Record objects are single contiguous pieces of hard disk space, so as to limit the number of hard disk look-ups and keep them sequential. This has a nasty downside, though: if you are constantly growing your documents, they must constantly be moved into larger and larger record objects, sending the old record objects to the $freelists (an internal list of free spaces) to be reused by another object of that size that comes along.
This creates fragmentation, and I believe that is what you are seeing with your own data.
One way to solve this is normally to use usePowerOf2Sizes (http://docs.mongodb.org/manual/reference/command/collMod/); unfortunately, given how your documents grow, I do not think this will work here.
Another way to solve it would be to manually set the padding so that the document always fits and never moves; however, you cannot do that yet: https://jira.mongodb.org/browse/SERVER-1810
The best way, currently, to solve this problem is to change your schema and factor the comments out into their own collection.
This does mean two queries, but they should be two indexed, super-fast queries, maybe a couple of microseconds slower than loading that one big document from disk.
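A minimal sketch of that split, assuming pymongo and made-up database/collection/field names:

```python
# Sketch only: posts and comments as separate collections instead of embedding.
# Database, collection and field names are illustrative.
from pymongo import MongoClient, ASCENDING

db = MongoClient()["blog"]
db.comments.create_index([("post_id", ASCENDING)])   # makes the second query an index lookup

post_id = db.posts.insert_one({"title": "hello", "body": "..."}).inserted_id
db.comments.insert_one({"post_id": post_id, "text": "first!"})

# The "two queries" mentioned above:
post = db.posts.find_one({"_id": post_id})
comments = list(db.comments.find({"post_id": post_id}))
```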
In case you are planning to change the schema, see http://docs.mongodb.org/ecosystem/use-cases/pre-aggregated-reports to avoid the data-growth and fragmentation issues.
One thing I haven't seen in any of the current answers is document padding on the initial insert. You can avoid the data growth (to some extent) by "padding" the documents with some extra space at the beginning, to accommodate the comments that will be added in the future.
http://docs.mongodb.org/manual/faq/developers/#faq-developers-manual-padding
Using the data you already have on hand about your average document size, add a little bit to that and include that padding on your initial insert. It should improve your update performance as well as avoid the Swiss-cheese effect the commenters above are talking about.
For reference, this is why you are seeing so much extra space:
http://docs.mongodb.org/manual/core/record-padding/
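A rough sketch of that initial-insert padding trick, assuming pymongo; the field names and the padding size are made up:

```python
# Sketch only: pre-allocate room for future comments by inserting a throwaway
# filler field, then removing it so the record keeps its larger allocation.
from pymongo import MongoClient

posts = MongoClient()["blog"]["posts"]

PAD_BYTES = 60 * 1024            # a bit above the expected final document size
doc = {"title": "hello", "comments": [], "_pad": "x" * PAD_BYTES}
_id = posts.insert_one(doc).inserted_id
posts.update_one({"_id": _id}, {"$unset": {"_pad": ""}})   # record keeps its reserved size

# Later comment pushes now grow into space the record already owns.
posts.update_one({"_id": _id}, {"$push": {"comments": {"user": "bob", "text": "hi"}}})
```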
So, at my workplace, they have a huge Access file (used with MS Access 2003 and 2007). The file is about 1.2 GB, so it takes a while to open. We cannot delete any of the records, and we have about 100+ tables (each month we create 4 more tables, don't ask!). How do I improve this, i.e. downsize the file?
You can do two things:
use linked tables
"compact" the database(s) every once in a while
Linked tables will not, in and of themselves, limit the overall size of the database, but they let you "package" it into smaller, more manageable files. To look into this:
'File' menu + 'Get External data' + 'Linked tables'
Linked tables also have other advantages, such as letting you keep multiple versions of a data subset and select a particular set by way of the Linked Table Manager.
Compacting databases reclaims space otherwise lost as various CRUD operations (insert, delete, update...) fragment the storage. It also regroups tables and indexes, making searches more efficient. This is done with:
'Tools' menu + 'Database Utilities' + 'Compact and Repair Database...'
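If you end up splitting the data into separate files, the linking step can also be scripted rather than done through the menus. A sketch only, assuming Python with pywin32 and made-up file/table names:

```python
# Sketch only: link a table from a separate data file into the front-end.
# File and table names are examples.
import win32com.client

AC_LINK, AC_TABLE = 2, 0                      # AcDataTransferType / AcObjectType constants

access = win32com.client.Dispatch("Access.Application")
try:
    access.OpenCurrentDatabase(r"C:\Data\frontend.mdb")
    access.DoCmd.TransferDatabase(
        AC_LINK, "Microsoft Access", r"C:\Data\archive_2009.mdb",
        AC_TABLE, "tblSales_2009", "tblSales_2009")
finally:
    access.Quit()
```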
You're really pushing up against the limits of MS Access there — are you aware that the file can't grow any larger than 2GB?
I presume you've already examined the data for possible space savings through additional normalization? You could "archive" some of the tables for previous months into separate MDB files and then link them (permanently or as needed) to your "current" database (in which case you'd actually be benefiting from what was probably an otherwise bad decision to start new tables each month).
But, with that amount of data, it's probably time to start planning for moving to a more capacious platform.
You should really think about your database architecture. If there aren't any links between the tables, you could try moving some of them to another database, one DB per year :), as a short-term solution.
A couple of "grasping at straws" ideas:
Look at the data types for each column; you might be able to store some numbers as bytes, saving a small amount per record.
Look at the indexes and get rid of the ones you don't use. On big tables, unnecessary indexes can add a large amount of overhead.
I would +2^64 the suggestions about the database design being a bit odd, but that is nothing that hasn't already been said, so I won't labour the point.
Well... listen to #Larry, and keep in mind that, in the long term, you'll have to find another database to hold your data!
But in the short term, I am quite disturbed by this "4 new tables per month" thing. 4 tables per month is almost 50 per year... That surely sounds strange to every "database manager" here. So please tell us: how many rows are there, how are they built, what are they for, and why do you have to build new tables every month?
Depending on what you are doing with your data, you could also think about archiving some tables as XML files (or even XLS?). This can make sense for "historic" data that does not have to be accessed through relations, views, etc. One good example would be the phone-call list collected from a PABX. Data can be saved to and loaded from XML/XLS files through ADODB recordsets or the TransferDatabase method.
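As a rough illustration of the ADODB route (a sketch only; the provider string and the file/table names are examples):

```python
# Sketch only: persist one month's table to an XML file through an ADODB recordset.
import win32com.client

AD_USE_CLIENT = 3     # ADODB CursorLocationEnum.adUseClient
AD_PERSIST_XML = 1    # ADODB PersistFormatEnum.adPersistXML

conn = win32com.client.Dispatch("ADODB.Connection")
conn.Open(r"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\Data\backend.mdb;")

rs = win32com.client.Dispatch("ADODB.Recordset")
rs.CursorLocation = AD_USE_CLIENT
rs.Open("SELECT * FROM tblCalls_2009_01", conn)
rs.Save(r"C:\Archive\tblCalls_2009_01.xml", AD_PERSIST_XML)

rs.Close()
conn.Close()
```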
Adding more tables every month is already a questionable practice, and looks suspicious with regard to data normalisation.
If you do that, I suspect that your database structure is also sub-optimal with regard to field sizes, data types and indexes. I would really start by double-checking those.
If you really have a justification for monthly tables (which, again, I cannot imagine), why not have one back-end per month?
You could also have one main back-end with, let's say, 3 months of data online, and then an archive DB to which you transfer your older records.
I use that approach for transactions, with the main table holding about 650,000 records, and Access is very responsive.
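The monthly transfer to that archive back-end can be a single append plus delete, using Jet's IN 'external database' syntax. A sketch only, with pyodbc and made-up table, column and file names:

```python
# Sketch only: push transactions older than 3 months into a separate archive .mdb.
# Table, column and file names are examples.
import pyodbc

conn = pyodbc.connect(
    r"Driver={Microsoft Access Driver (*.mdb, *.accdb)};Dbq=C:\Data\main.mdb;")
cur = conn.cursor()

cur.execute("""INSERT INTO tblTransactions IN 'C:\\Data\\archive.mdb'
               SELECT * FROM tblTransactions
               WHERE TransDate < DateAdd('m', -3, Date())""")
cur.execute("""DELETE FROM tblTransactions
               WHERE TransDate < DateAdd('m', -3, Date())""")
conn.commit()
conn.close()
```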
I have read several times that after you delete a row in an InnoDB table in MySQL, its space is not reused, so if you make a lot of INSERTs into a table and then periodically DELETE some rows, the table will use more and more space on disk, as if the rows had not been deleted at all.
Recently, though, I've been told that the space occupied by deleted rows is reused, but only after some transactions are complete, and even then not fully. I am now confused.
Can someone please make sense of this for me? I need to do a lot of INSERTs into an InnoDB table, and then every X minutes I need to DELETE records that are more than Y minutes old. Do I have a problem of an ever-growing InnoDB table here, or is it paranoia?
It is paranoia :)
Databases don't grow in size unnecessarily, but for performance reasons space is not freed either.
What you've most probably heard is that if you delete records, that space is not given back to the operating system. Instead, it's kept as empty space for the database to reuse afterwards.
This is because:
The DB needs some disk space to save its data; if it doesn't have any space, it reserves some empty space up front.
When you insert a new row, a piece of that space is used.
When you run out of free space, a new block is reserved, and so on.
Now, when you delete some rows, in order to avoid reserving more and more blocks, that space is kept free but never given back to the operating system, so it can be used again later without any need to reserve new blocks.
As you can see, space is re-used, but never given back. That's the key point to your question.
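If you want to watch this happening, the reusable space shows up as DATA_FREE in information_schema. A quick sketch, assuming mysql-connector-python and a made-up table name:

```python
# Sketch only: report how much already-reserved-but-free space a table holds.
import mysql.connector

conn = mysql.connector.connect(host="localhost", user="app", password="...",
                               database="mydb")
cur = conn.cursor()
cur.execute("""SELECT data_length, index_length, data_free
               FROM information_schema.tables
               WHERE table_schema = %s AND table_name = %s""", ("mydb", "events"))
data_length, index_length, data_free = cur.fetchone()
print(f"data: {data_length} B, indexes: {index_length} B, "
      f"free for reuse: {data_free} B")
conn.close()
```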
In InnoDB, there is no practical way of freeing up the space.
Use a per-table data file (innodb_file_per_table): that will enable you to copy the data to a new table, drop the old table, and thus recover the space (a rough sketch of that copy-and-swap follows below).
Use mysqldump and a whole lot of recipes to clean up the whole server. Check the following: http://dev.mysql.com/doc/refman/5.0/en/adding-and-removing.html
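A sketch of that copy-and-swap (made-up table names; it needs enough disk for a full copy, and innodb_file_per_table so the reclaimed space actually goes back to the operating system):

```python
# Sketch only: reclaim space by copying the rows you want to keep into a new table
# and swapping it in place of the old one.
import mysql.connector

conn = mysql.connector.connect(host="localhost", user="app", password="...",
                               database="mydb")
cur = conn.cursor()
cur.execute("CREATE TABLE events_new LIKE events")
cur.execute("""INSERT INTO events_new
               SELECT * FROM events
               WHERE created_at >= NOW() - INTERVAL 1 DAY""")
cur.execute("RENAME TABLE events TO events_old, events_new TO events")
cur.execute("DROP TABLE events_old")
conn.commit()
conn.close()
```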
All of these methods become impractical when you are using huge tables (in my case more than 250 GB) and you must keep deleting records to maintain performance.
You will have to think seriously about whether you have enough space on your hard disk to perform one of the above operations (in my case, I do not think 1 TB is enough for all of these actions).
With InnoDB tables (and MySQL itself), the options are fairly limited if you have a serious database size.