I have been told that "we have a feeling that the database is too slow" and that I should sort it out. Not being a database guru, I am looking for ideas.
My idea is to measure existing performance and then to attempt optimizations, measuring the effect of each.
Can anyone suggest a good (preferably FOSS) tool for measuring access to an MS Access database?
And a good book or site (or just make some suggestions) for a general approach to optimizing an MS Access database?
Thanks in advance.
Update: it seems that database writes, which are all INSERTs and never UPDATEs, are the problem. There are very few queries (it seems data is being stored "just in case").
Also look at the built-in performance analyzer wizard. From the Access 2003 main menu, Tools -> Analyze -> Performance. If you have Access 2007, look on the Database Tools ribbon.
However, I think your first order of business should be to get a handle on "we have a feeling that the database is too slow". Find out which aspects of the application are too slow and focus your optimization efforts on those. Don't waste time optimizing parts which don't need it.
You have the right idea: get a baseline measurement, tweak, and then measure again. I made a quick routine that runs a query X times (normally 20 or so) and averages the response times. That way I can see whether I am making an improvement.
Before you start, make sure that you have compacted the Access database, as this often helps to speed things up. After that I would look at indexes and at what queries are being run. A common mistake is to bind a form to a whole table; it works fine while the app is small, but once you have lots of records it can drag.
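For instance, a minimal Jet/Access SQL sketch (table and column names are hypothetical): index the column the forms and queries filter on, and bind the form to a restricted query rather than to the whole table.

    -- Hypothetical names; adapt to your schema.
    CREATE INDEX idx_Orders_CustomerID ON Orders (CustomerID);

    -- Use a query like this (or a parameterised version of it) as the form's
    -- RecordSource instead of binding the form directly to the Orders table.
    SELECT OrderID, OrderDate, Amount
    FROM   Orders
    WHERE  CustomerID = 1234;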
You may wish to decompile if you have not done so already. If there are lots of subforms on tabs, there is no need to load the recordsource until the form is needed. With a little effort, you can usually reduce the subforms to one or two and just load forms into the subform control, as required. Oddly enough, a long path to the back-end is detrimental to performance, so if you have one, consider a mapped drive.
As for tools, this may help:
Microsoft's Unsupported Jet Optimization Tools and more
You may also be interested in:
Tony Toews on performance issues
Query optimization
Google Jet SHOWPLAN. It will tell you a lot about how Jet/ACE is optimizing your SQL statement.
I have just started using database management systems and have no experience with any of the DBMS programs. For those experts who have experience with all three platforms, your opinion on this is very much appreciated. Please guide me and let me know the pros and cons.
I have been keeping my database in Excel and would like to continue on another platform, as my file is growing and handling the calculations, analysis, and graphs is getting slower and slower.
You may not need to make the jump from Excel to the Mysql/SQL Server/Oracle level.
One prime consideration is whether you will have multiple users/sessions using the data at one time. If not, the whole concurrency/locking layer of these databases is unnecessary overhead and you should look into something like SQLite.
Another is backup/recovery. Are you happy to have 'snapshots' of your data (e.g. hourly/daily/weekly) and revert to them if you have a problem, or do you need to be able to restore every last change even if you've been physically beating the flames out on your server?
You can use any DBMS you want.
SQL Server and Oracle are proprietary DBMSs, so you need to purchase them.
MySQL is open source, so you don't need to purchase anything.
For comparison between these, see:
https://blogs.oracle.com/GeorgeTrujillo/entry/oracle_versus_mysql
http://www.vesigo.com/SQLServer/DatabaseServerComparison.aspx
Is this a realistic solution to the problems associated with larger .mdb files:
split the large .mdb file into smaller .mdb files
have one 'central' .mdb containing links to the tables in the smaller .mdb files
How easy would it be to make this change to an .mdb backed VB application?
Could the changes to the database be done so that there are no changes required to the front-end application?
Edit Start
The short answer is "No, it won't solve the problems of a large database."
You might be able to overcome the DB size limitation (~2GB) by using this trick, but I've never tested it.
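For reference, the "trick" amounts to linked tables or Jet's IN clause; a minimal Jet SQL sketch, with hypothetical file paths and table names:

    -- If tblCustomers is attached as a linked table in the central .mdb,
    -- the front-end just queries it by name:
    SELECT * FROM tblCustomers;

    -- A query can also reach into another .mdb directly via the IN clause
    -- (path and table name are hypothetical):
    SELECT * FROM tblCustomers IN 'C:\Data\Customers_be.mdb';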
Typically, with large MS Access databases, you run into problems with speed and data corruption.
Speed
Is it going to help with speed? You still have the same amount of data to query and search through, and the same algorithm. So all you are doing is adding the overhead of having to open up multiple files per query. So I would expect it to be slower.
You might be able to speed it up by reducing the time that it takes to get the information off the disk. You can do this in a few ways:
faster drives
put the MDB on a RAID (anecdotally RAID-1,0 may be faster)
split the MDB up (as you suggest) into multiple MDBs, and put them on separate drives (maybe even separate controllers).
(how well this would work in practice vs. theory, I can't tell you - if I was doing that much work, I'd still choose to switch DB engines)
Data Corruption
MS Access has a well deserved reputation for data corruption. To be fair, I haven't had it happen to me for some time. This may be because I've learned not to use it for anything big; or it may be because MS has put a lot of work into trying to solve these problems; or, more likely, a combination of both.
The prime culprits in data corruption are:
Hardware: e.g., cosmic rays, electrical interference, iffy drives, iffy memory and iffy CPUs - I suspect MS Access does not have as good error handling/correcting as other Databases do.
Networks: lots of collisions on a saturated network can confuse MS Access and convince it to scramble important records; as can sub-optimally implemented network protocols. TCP/IP is good, but it's not invincible.
Software: As I said, MS has done a lot of work on MS Access over the years; if you are not up to date on your patches (MS Office and OS), get up to date. Problems typically happen when you hit extremes like the 2GB limit (some bugs are hard to test and won't manifest themselves except at the edge cases, which makes them less likely to have been seen or corrected, unless reported to MS by a motivated user).
All this is exacerbated with larger databases, because larger databases typically have more users and more workstations accessing them. Together, the size of the database and the number of users multiply the opportunities for corruption.
Edit End
Your best bet would be to switch to something like MS SQL Server. You could start by migrating your data over and then linking one MDB to it. You get the stability of SQL Server, and most (if not all) of your code should still work.
Once you've done that, you can then start migrating your VB app(s) over to use SQL Server instead.
If you have more data than fits in a single MDB then you should get a different database engine.
One main issue that you should consider is that you can't enforce referential integrity between tables stored in different MDBs. That should be a show-stopper for any actual database.
If it's not, then you probably don't have a proper schema designed in the first place.
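To make the referential-integrity point concrete, here is a Jet SQL DDL sketch (names are hypothetical); a declaration like this is only possible when both tables live in the same .mdb, so splitting them across files loses the enforcement:

    -- Only works when Orders and Customers are in the same .mdb file.
    ALTER TABLE Orders
        ADD CONSTRAINT fk_Orders_Customers
        FOREIGN KEY (CustomerID) REFERENCES Customers (CustomerID);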
For reasons more adequately explained by CodeSlave the answer is No and you should switch to a proper relational database.
I'd like to add that this does not have to be SQL Server. Quite possibly the reason why you are reluctant to do this is one of cost, SQL Server being quite expensive to obtain and deploy if you are not in an educational or charitable organisation (when it's remarkably cheap and then usually a complete no-brainer).
I've recently had extremely good results moving an Access system from MDB to MySQL. At least 95% of the code functioned without modification, and of the remaining 5% most was straightforward with only a few limited areas where significant effort was required. If you have sloppy code (not closing connections or releasing objects) then you'll need to fix these, but generally I was remarkably surprised how painless this approach was. Certainly I would highly recommend that if the reason you are reluctant to move to a database backend is one of cost then you should not attempt to manipulate .mdb files and go instead for the more robust database solution.
Hmm, well, if the data is going through this central DB then there is still going to be a bottleneck there. The only reason I can think of why you would do this is to get around the size limit of an Access mdb file.
Having said that, if the business functions can be split off into separate applications, then that might be a good option, with a central DB containing all the linked tables for reporting purposes. I have used this before to good effect.
It is looking increasingly like I'll have to go live with a website before I have had time to tweak all the queries, tables, etc. (we are already 6 months behind schedule, so although this is not the ideal scenario, that's how things are).
It's now a case of having to bite the bullet; it's just a matter of working out how big that bullet will be when we come to biting it. Once the database goes live we obviously can't change the data on a whim, because it's live data. I am fairly confident in most of the DB schema: for example, the tables are mostly in 3rd and 4th normal form, and constraints are used to ensure data integrity. I have also put indexes on some columns that (I think) will be used a lot in queries, though this was done quite hurriedly and not tested. This is the bit I am worried about.
To clarify, I am not talking about wholesale structure change. The tables themselves are unlikely to change (if ever), however it is almost guaranteed that I will have to tune the tables at some stage (either personally or by hiring someone).
I want to know how much of a task this is. Specifically, assuming a database of a few gigabytes (roughly 300 tables so far) and that 50% of the tables need tuning in the next few months:
How long will it take to perform the tuning (I know this is a "how long is a piece of string" type question) - but what are the main determinants of the effort required, so I can work out how long it is likely to take?
Is it possible to lock sections of the database (or specific tables) whilst the indexes are being reworked, or does the entire database need to go offline? (I am using MySQL 5.x as the DB.)
Is what I describe (going live before ALL tables are perfectly tuned) outrageously risky/inadvisable? (Does it justify the months of sleepless nights this has caused me so far?)
In general it is much harder to fix a poor database design that is causing performance issues after going live, because you have to deal with the existing records. Even worse, the poor design may not become apparent until months after going live, when there are many records instead of a few. This is why databases should be designed with performance in mind (no, this is not premature optimization; there are known techniques which generally perform better than others, and they should be considered in the design), and why databases should be tested against a set of records that is close to, or larger than, the number you expect to have after a couple of years.
As to how long it will take to completely fix a badly designed database: months or years. Often the worst part is something that is central to the design (say, an EAV table) and which will require almost every query/sp/view/UDF to be adjusted to move to a better structure. You then have to ensure all records are moved to the new, better structure. The sooner you can fix a mistake like this the better: far better to move a couple of thousand records to a new structure than 100,000,000.
If your structure is OK but your queries are bad, you are better off, as you can take the top ten worst performers (chosen not just on total time to run, but on time multiplied by the number of times run), fix them, rinse and repeat.
If you are in the midst of fixing a poor database, this book might come in handy:
http://www.amazon.com/Refactoring-Databases-Evolutionary-Database-Design/dp/0321293533
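Since the question mentions MySQL 5.x, one way to build that "top ten worst performers" list is the slow query log; a hedged sketch, assuming MySQL 5.1 or later (the option names differ in 5.0):

    -- Assumes MySQL 5.1+; in 5.0 the log is enabled at startup instead.
    SET GLOBAL slow_query_log = 'ON';
    SET GLOBAL long_query_time = 1;   -- log any statement slower than 1 second

You can then summarise the log (for example with the mysqldumpslow utility that ships with MySQL) and rank statements by time multiplied by how often they run.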
I would try at least to quantify the limits of the database before going live, so that at least you would know when the activity generated from your application is getting near to that threshold.
You may want to simulate (automatically as much as possible) the typical usage of the database from your application, and check how many concurrent users/sessions/transactions, etc it can handle before it breaks. This, at least, should allow you to solve the "sleepless nights" issue.
As for the original "How easy is it...?" question, the answer obviously depends on many factors. However, the above analysis would undoubtedly help, as at the very least you will be in a position to say whether your database requires tweaking or not.
To answer the title question, I'd say it's fairly easy to tune your DB after deploying into Production.
It's a great idea to be improving performance after deploying to any environment. Being Production adds a bit of pressure, along with the schedule. I'd suggest deploying to Prod, and let it perform as it will. Then start measuring:
how long it takes to run Report X at different times (peak vs. after-hours, if there is such a concept in your app).
what's the user's experience when using the app for those critical use-cases?
Then take a backup of your Prod environment and create yourself a pre-Prod environment. There you'll be able to run your upgrade scripts and measure the 'how long' type questions you have: index creation, upgrade down-times, etc. When tuning queries you'll have a great idea of how they perform with production data and volumes. Granted, you won't have the benefit of those users performing those inserts at the same time.
Keep that backup for multiple iterations, failed upgrades, new/unprepared-for issues, etc.
Keep making backups after each deployment, so that you can test the next round of improvements to your DB.
It depends on what you're tuning. Let's say you're adding an index to a couple tables, or changing a table type from MyISAM to InnoDB or something, then with a large enough table, those things could be done in 5 to 10 minutes depending on your hardware. It won't take hours. That said, it's still best to do any live-db tuning in the middle of the night.
You can grab a read lock by calling FLUSH TABLES WITH READ LOCK, but it's probably better to put up a "we're doing maintenance" message in your app for the 15-30 minutes you're doing it, just to be safe.
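For reference, a MySQL 5.x sketch of the kind of statements involved (table, column, and index names are hypothetical); note that the global read lock is a separate step from the DDL itself:

    -- Typical maintenance-window changes (hypothetical names):
    ALTER TABLE orders ENGINE = InnoDB;                          -- switch storage engine
    CREATE INDEX idx_orders_customer_id ON orders (customer_id); -- add an index

    -- If you just need writes held off briefly (e.g. for a consistent snapshot)
    -- while your app shows its maintenance page:
    FLUSH TABLES WITH READ LOCK;
    -- ... take the snapshot/backup here ...
    UNLOCK TABLES;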
The risk is inherent to the situation and what happens if there are serious problems. I usually take a more cowboy approach and take stuff live, especially if they aren't under high load so I can easily find pain points and fix them. If this is a mission critical system, then no, load test or whatever you can first to be sure you're as ready as you can be. Also, keep in mind that you cannot foresee all the issues you'll have. If your indexes are good, then you're probably okay to take it live and see what needs to be worked on.
Creating objects like tables and indexes is fairly essential, even if the code has to be authorized or created by the DBA. What other areas normally handled by DBAs should the accomplished developer be aware of?
A developer is responsible for doing everything that makes his code a) correct and b) fast.
This of course includes creating indexes and tables.
Making a DBA responsible for indexes is a bad idea. What if the code runs slowly? Who is to be blamed: a developer with bad code or a DBA with a bad index?
A DBA should handle database support operations like making backups and building the infrastructure, and should report any lack of resources.
He or she should not be the sole person making the decisions that affect the performance of the whole database system.
Relational databases, as for now, are not yet in that state that would allow splitting of responsibility so that developers could make the queries right and the DBA could make them fast. That's a myth.
If there is a lack of resources (say, an index makes some query fast at the expense of some DML operation being slow), this should be reported by the DBA, not fixed.
Now it is decision-making time. What do we need more: a fast query or a fast insert?
This decision should be made by the program manager (and not the DBA or developer).
And when the decision is made, the developer should be given the new task: "make the SELECT query as fast as possible, taking into account that you don't have this index", or "make the INSERT query as fast as possible, taking into account that you will have this index".
A developer should know everything about how a database works, when it works normally.
A DBA should know everything about how to make a database work normally.
The latter includes ability to make a backup, ability to restore from a backup and ability to detect and report a resource contention.
The ins and outs of database storage and optimization are huge. Knowing how to index and partition tables well is invaluable knowledge.
Also, how to read a query execution plan. SQL is such a cool language in that it will tell you exactly how it's going to run your code, so long as you ask nicely. This is absolutely essential in optimizing your code and finding bottlenecks.
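For example (table and column names are hypothetical): in MySQL you prepend EXPLAIN, and in SQL Server you can ask for a text plan with SHOWPLAN before running the statement.

    -- MySQL: show the chosen plan for a query.
    EXPLAIN SELECT order_id, total
    FROM    orders
    WHERE   customer_id = 42;

    -- SQL Server: return the estimated plan as rows instead of executing.
    -- (GO is the batch separator in SSMS/sqlcmd; SET SHOWPLAN_ALL must be
    -- the only statement in its batch.)
    SET SHOWPLAN_ALL ON;
    GO
    SELECT order_id, total FROM orders WHERE customer_id = 42;
    GO
    SET SHOWPLAN_ALL OFF;
    GO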
Database maintenance (backups, shrinking files, etc) is always important to keep your server running smoothly. It's something that's often overlooked, too.
Developers should know all about triggers and stored procedures--getting the database to work for you. These things can help automate so many tasks, and often developers overlook them and try to handle it all app side, when they should really be handled by something that thinks in sets.
Which brings me to the most important point: database developers need to think in sets. Too often I hear, "For each row, I want to..." and this generally sets off an alarm in my head. You should be thinking about how the sets interact and the actions you want to take on entire columns.
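A tiny sketch of the difference, with hypothetical table and column names:

    -- Set-based: one statement over the whole set; the engine does the looping.
    UPDATE orders
    SET    status = 'archived'
    WHERE  order_date < '2008-01-01';

    -- The "for each row..." version (a cursor, or an application loop issuing
    -- one UPDATE per row) does the same work with a round trip and a separate
    -- statement per row, which is usually far slower.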
Optimization. Your code should always use as few resources as you can manage.
I would recommend developing an understanding of the security architecture for the relevant DBMS.
Doing so could facilitate your development of secure code.
With SQL Server specifically in mind for example:
Understand why your “managed code” (such as .NET CLR) should not be granted elevated privileges. What would be the implications of doing so?
What is Cross-Database ownership chaining? How does it work?
Understand execution context.
How does native SQL Server encryption work?
How can you sign a stored procedure? Why would you even want to do this? (A sketch follows after this list.)
Etc.
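On the stored-procedure signing point, here is a hedged T-SQL sketch of module signing; the certificate, user, table, procedure names, and password are entirely hypothetical placeholders:

    -- All names and the password below are hypothetical.
    CREATE CERTIFICATE SigningCert
        ENCRYPTION BY PASSWORD = 'Str0ng!Passphrase'
        WITH SUBJECT = 'Signs procedures that need extra rights';

    -- Map a user to the certificate and grant the extra permission to that
    -- user, not to the callers themselves.
    CREATE USER SigningCertUser FROM CERTIFICATE SigningCert;
    GRANT SELECT ON dbo.SensitiveTable TO SigningCertUser;

    -- Sign the procedure: callers pick up the certificate user's rights only
    -- while running this exact, unmodified procedure.
    ADD SIGNATURE TO dbo.usp_ReadSensitiveData
        BY CERTIFICATE SigningCert
        WITH PASSWORD = 'Str0ng!Passphrase';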
As a general rule, the more you understand about the engine you are working with, the more performance you can squeeze from it.
One thing that currently springs to mind is how to navigate and understand the information that the database "system" tables/views give you, e.g. in SQL Server, the views under the master database. These views hold information such as current logins, lists of tables and partitions, etc., which is all useful stuff when trying to track down things such as hung logins or whether users are currently connected.
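For example, in SQL Server 2005 and later:

    -- Who is currently connected (user sessions only).
    SELECT session_id, login_name, status, last_request_end_time
    FROM   sys.dm_exec_sessions
    WHERE  is_user_process = 1;

    -- Tables in the current database.
    SELECT name, create_date
    FROM   sys.tables
    ORDER BY name;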
Relationships between your tables. You should always have a recent printout and a soft copy of your database design. You need to know the primary keys, foreign keys, and required and auto-filled columns; without that, I don't think you can write efficient queries or make sure your database is carrying only what it needs.
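If the printout has gone stale, you can also pull the keys straight from the metadata; a sketch using the standard INFORMATION_SCHEMA views (available in SQL Server and MySQL, among others):

    -- List primary and foreign key columns per table.
    SELECT tc.TABLE_NAME,
           tc.CONSTRAINT_TYPE,
           tc.CONSTRAINT_NAME,
           kcu.COLUMN_NAME
    FROM   INFORMATION_SCHEMA.TABLE_CONSTRAINTS tc
    JOIN   INFORMATION_SCHEMA.KEY_COLUMN_USAGE kcu
           ON  kcu.CONSTRAINT_NAME = tc.CONSTRAINT_NAME
           AND kcu.TABLE_NAME = tc.TABLE_NAME
    WHERE  tc.CONSTRAINT_TYPE IN ('PRIMARY KEY', 'FOREIGN KEY')
    ORDER BY tc.TABLE_NAME, tc.CONSTRAINT_TYPE, kcu.ORDINAL_POSITION;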
I think everyone else covered it.
Having a good understanding of the architecture of your database system will definitely be helpful. Can you draw a diagram by heart to show components of your DBMS and their interactions?
Why don't databases automatically index tables based on query frequency? Do any tools exist to analyze a database and the queries it is receiving, and automatically create, or at least suggest which indexes to create?
I'm specifically interested in MySQL, but I'd be curious for other databases as well.
That is the best question I have seen on stackoverflow. Unfortunately I don't have an answer. Google's BigTable does automatically index the right columns, but BigTable doesn't allow arbitrary joins, so the problem space is much smaller.
The only answer I can give is this:
One day someone asked, "Why can't the computer just analyze my code and compile and statically type the pieces of code that run most often?"
People are solving this problem today (e.g. Tamarin in FF3.1), and I think "auto-indexing" relational databases is the same class of problem, but it isn't as much a priority. A decade from now, manually adding indexes to a database will be considered a waste of time. For now, we are stuck with monitoring slow queries and running optimizers.
There are database optimizers that can be enabled or attached to databases to suggest (and in some cases perform) indexes that might help things out.
However, it's not actually a trivial problem, and when these aids first came out users sometimes found it actually slowed their databases down due to inferior optimizations.
Lastly, there's a LOT of money in the industry for database architects, and they prefer the status quo.
Still, databases are becoming more intelligent. If you use SQL Server Profiler with Microsoft SQL Server you'll find ways to speed your server up. Other databases have similar profilers, and there are third-party utilities to do this work.
But if you're the one writing the queries, hopefully you know enough about what you're doing to index the right fields. If not then having the right indexes is likely the least of your problems...
-Adam
MS SQL 2005 also maintains an internal reference of suggested indexes to create based on usage data. It's not as complete or accurate as the Tuning Advisor, but it is automatic. Research dm_db_missing_index_groups for more information.
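A sketch of the kind of query involved, using those SQL 2005 missing-index DMVs:

    -- Suggested (missing) indexes, roughly ranked by expected benefit.
    SELECT  d.statement            AS table_name,
            d.equality_columns,
            d.inequality_columns,
            d.included_columns,
            s.user_seeks,
            s.avg_user_impact
    FROM    sys.dm_db_missing_index_groups g
    JOIN    sys.dm_db_missing_index_group_stats s
            ON s.group_handle = g.index_group_handle
    JOIN    sys.dm_db_missing_index_details d
            ON d.index_handle = g.index_handle
    ORDER BY s.user_seeks * s.avg_user_impact DESC;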
There is, I think, an MS SQL blog post with a script for suggesting indexes in SQL 2005, but I can't find the exact script right now! It sounds like just the thing, judging from the description, as I recall. Here's a link to some more info: http://blogs.msdn.com/bartd/archive/2007/07/19/are-you-using-sql-s-missing-index-dmvs.aspx
PS: just for SQL Server 2005+
There are tools out there for this.
For MS SQL, use the SQL Profiler (to record activity against the database), and the Database Engine Tuning Advisor (SQL 2005) or the Index Tuning Wizard (SQL 2000) to analyze the activities and recommend indexes or other improvements.
Yes, some engines DO support automatic indexing. One such example for MySQL is Infobright: their engine does not support "conventional" indexes and instead implicitly indexes everything; it is a column-based storage engine.
The behaviour of such engines tends to be very different from what developers expect (and yes, you need to be a DEVELOPER to even be thinking about using Infobright; it is not a plug-in replacement for a standard engine).
I agree with what Adam Davis says in his comment. I'll add that if such a mechanism existed to create indexes automatically, the most common reaction to this feature would be, "That's nice... How do I turn it off?"
Part of the reason may be that indexes don't just give a small speedup. If you don't have a suitable index on a large table, queries can run so slowly that the application is entirely unusable, and if it is interacting with other software it may simply not work. So you really need the indexes to be right before you start trying to use the application.
Also, rather than building an index in the background, and slowing things down further while it's being built, it is better to have the index defined before you start adding significant amounts of data.
I'm sure we'll get more tools that take sample queries and work out what indexes are necessary; also probably we will eventually get databases that do as you suggest and monitor performance and add indexes they think are necessary, but I don't think they will be a replacement for starting off with the right indexes.
It seems that MySQL doesn't have a user-friendly profiler. Maybe you want to try something like this, a PHP class based on the MySQL profiler.
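If you are on MySQL 5.0.37 or later, the built-in session profiler may be enough before reaching for a PHP wrapper; a quick sketch (the query itself is a hypothetical example):

    SET profiling = 1;                 -- enable profiling for this session

    SELECT COUNT(*) FROM orders WHERE customer_id = 42;   -- statement to measure

    SHOW PROFILES;                     -- recent statements with their durations
    SHOW PROFILE FOR QUERY 1;          -- per-stage breakdown for statement #1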
Amazon's SimpleDB has automatic indexing on all columns based on your usage:
http://aws.amazon.com/simpledb/
It has other limitations though:
It's a key-value store, not an RDB. Obviously that means slow joins (and no built-in join support).
It has a 10gb limit on table size. There are libraries that will handle partitioning big data for you although this locks you into that library's way of doing things, which can have its own problems.
It stores all values as strings, even numbers, which makes sorting a column containing 1, 9, and 10 come out as 1, 10, 9 unless you use a library that works around this by zero-padding. This also affects negative numbers.
The 10gb limit is bigger than many might assume, so you could proceed with this for a simple site that you plan on rewriting if it ever hits big.
It's unfortunate this kind of automatic indexing didn't make it into DynamoDB, which appears to have replaced it; they don't even mention SimpleDB in their product list anymore, so you have to find it through old links.
Google App Engine does that (see the index.yaml file).