MS Access: 100% CPU usage

We have an MS Access 2003 ADP application with SQL Server. Sometimes, without any apparent reason, this application starts consuming 100% of CPU time (50% on a dual-core CPU system). This is what Windows Task Manager and other process monitoring/analysis tools are showing, anyway. Usually, the only way to stop such CPU thrashing is to restart the application.
We still don't know how to trigger this problem at will. But I have a feeling that it usually happens when some of the forms get closed by a user.
NB: Recently we noticed that one of the forms consistently makes CPU usage rise to 100% whenever it gets minimized. Most of the time CPU usage goes back to normal when that form is restored. Perhaps it's a different problem, but we'd like to uncover this mystery, too. :)
Googling for a solution to this problem didn't yield very good results. The most frequent theory is that MS Access gets into some sort of waiting-for-events loop which is practically harmless, performance-wise, because the thread running that loop has very low priority. This doesn't seem to help us because in our case (a) it certainly does hurt the system's performance and (b) it's still unclear what exactly makes Access get into such a "bad state" and how to avoid that.

I've run into this CPU usage problem in the past, but I don't remember whether we ever discovered a solution or it just went away at some point.
In your post, you didn't mention reviewing the VBA. I'd recommend looking for a loop that, under certain conditions, becomes an endless loop (a sketch of the kind of pattern to look for is below).
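The classic culprit in a front end like this is a wait loop that spins without yielding. The VBA you are hunting for would be a Do ... Loop of the same shape as the first function below; this is only a minimal Python sketch of the pattern, not the actual Access code.

```python
import time

done = False  # stand-in for whatever condition the form's code waits on

# Pattern to look for: the exit condition can never become true for some
# input, and the body never yields, so one core sits at 100%.
def busy_wait():
    while not done:      # nothing inside the loop ever sets `done`
        pass             # no Sleep/DoEvents equivalent -> the thread spins

# The same wait with a short sleep (or, better, an event/callback)
# stays near 0% CPU even if it never finishes.
def polite_wait():
    while not done:
        time.sleep(0.05)
```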

I wonder if it is a hangover from a problem that Access used to have in the "old" days:
http://support.microsoft.com/kb/160819
Whilst the article does say it is fixed in versions 2000 and later, it still might be something.

Related

Why is there a 10 second delay in this http conversation?

I've got a weird situation. The first time I hit an embedded web server (uClinux/Boa) at 10.1.10.29, I get a 10-second delay in the browser window before things start happening. "First time" means I haven't hit the machine in a few days. Browser type/OS doesn't matter (the source is 10.1.10.20).
I've got a Wireshark capture of it happening, and here is the detail of frame 296:
Note that packet 374 doesn't pop out until around 10 seconds after 296. The packets between those two aren't from the machine in question. It just sits there for 10 seconds and then decides to retransmit. How is this supposed to work?
The main reason is most certainly that the code was swapped out of memory.
MS Windows is really bad in that regard. If a program does not get used for "too long", it gets swapped out of memory. Period. When you come back to it, it has to be re-read from the hard drive.
The one good thing (the main reason) Windows does that for is to defragment kernel memory. For that purpose, it is good.
You have similar problems under Linux, but only if your server needs the memory. In other words, if you have tons of processes all fighting for as much memory as possible, then the least-used software is likely to be swapped out. Otherwise it stays in place.
If you were to use the Cassandra database system, you would notice this on any computer that runs anything other than Cassandra. If you run just Cassandra, it remains fast all the time. If you run other software that uses a lot of memory, Cassandra is slow on first access. This is particularly noticeable.
I want to add the answer that solved our version of this problem: a 10-second delay on first access, then normal operation, and then, after 5 minutes of inactivity, another 10-second delay.
First of all, we wiresharked everything and tried to find some kind of error in the code, or in the way the computer or server handled the network traffic. We found nothing out of the ordinary.
After much searching we found it was a DNS "problem". In the DNS server that the client computer used, there were two entries for the server's domain name. One was correct, and one (the first one in the list) was wrong.
So removing the wrong DNS record solved the problem.
This means the problem was that the computer tried the first address it got, waited 10 seconds for a reply, didn't get one, and moved on to the second address in line. This creates no error messages, because this is how DNS is supposed to work. And that is why all our Wireshark logs showed nothing but a 10-second wait with no error and no reason, after which everything jumps into life, works for as long as the DNS record is valid (5 minutes in our case), and then the procedure has to be repeated.
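A minimal sketch of that fallback behaviour, assuming the standard `socket` module and a hypothetical hostname with one stale and one good A record: the client walks the resolver's address list in order, burning its connect timeout on the bad entry before the good one answers.

```python
import socket

HOST = "app.example.com"   # hypothetical name with a stale first A record
PORT = 80

# getaddrinfo returns every address the resolver knows about, in order.
for family, socktype, proto, _, sockaddr in socket.getaddrinfo(
        HOST, PORT, socket.AF_INET, socket.SOCK_STREAM):
    s = socket.socket(family, socktype, proto)
    s.settimeout(10)                       # roughly the browser's per-address wait
    try:
        s.connect(sockaddr)
        print("connected via", sockaddr)   # the good record
        break
    except socket.error:
        print("no answer from", sockaddr)  # the stale record eats ~10 s first
    finally:
        s.close()
```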
Hope this helps someone who has a similar problem.

How to manage or convert extremely large program

I have a business-critical Microsoft Access database that was originally created in the '90s and has been enlarged and upgraded, at this point, to Access 2007. We have essentially been using this database as the front end for a custom-written ERP system. We moved most of the data over to a SQL Server long ago, but we are still using MS Access as the front end. As the project has grown (we have a full-time developer), we have started having stability problems and extremely frequent crashes of unknown cause.
As an example: one time out of ten or so, a certain form will crash if I change the data in one specific field. There is no code firing at the time, and the data is in a local temporary table that typically has only 5 rows. If I change the data in the table nothing goes wrong, but if I change it on the form, Access will hard-crash and dump me to the desktop. There are other examples of unexplained crashes I could provide.
I am looking for advice on where to go at this point -- the Access front end holds essentially all of the business logic for running our company, so I can't just abandon it. Ideally we would re-write the entire front end in some other language. The problem is that as a small company we don't have the resources to re-write the entire system in anything resembling a good time frame, and we don't have the cash flow to pay someone else to do it. My ideal solution would be a conversion of some sort from the Access front end to another endpoint -- whether web or local Windows -- but my searches here and on Google make that seem like a non-starter.
So essentially every avenue I look at seems to be a dead-end:
We can't find the source of the crashes to stabilize our current system,
We can't stop production in our current system for as long as it would take to re-write it,
We can't afford to pay someone else to write a new system,
Automated conversion tools seem like a waste of money.
Are there other options, or which of the options I have thought of seems best?
We have an enterprise-level program with an Access front end and a SQL Server back end. I wonder if it might not help to split the program up into different pieces for diagnostics. For instance, if you have Order Entry and Inventory Management, you could have one front end for each function. (Yes, I can hear the howling in the background, but if it were only for the purpose of diagnosis, maybe it would help...)
You can also export the Access database objects to text files and then import them into a fresh new database, which gets rid of weird errors on some occasions (a rough sketch of scripting this is below).
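A minimal sketch of that export step, assuming pywin32 and a hypothetical front-end path; SaveAsText/LoadFromText are long-standing but undocumented Access methods, so treat this as an outline rather than a supported recipe.

```python
import os
import win32com.client

ACC_FORM = 2  # AcObjectType value for forms (0=table, 1=query, 3=report, 4=macro, 5=module)

app = win32com.client.Dispatch("Access.Application")
app.OpenCurrentDatabase(r"C:\erp\frontend.accdb")   # hypothetical path

out_dir = r"C:\erp\export"
os.makedirs(out_dir, exist_ok=True)

# Dump every form to a text file; repeat for reports, modules, etc.
for frm in app.CurrentProject.AllForms:
    app.SaveAsText(ACC_FORM, frm.Name, os.path.join(out_dir, frm.Name + ".txt"))
    # In a fresh, empty database you would then call:
    # app.LoadFromText(ACC_FORM, frm.Name, os.path.join(out_dir, frm.Name + ".txt"))

app.Quit()
```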
Well, I guess I will rephrase the basic issue here. If you have two Computing Science graduates on staff, then they should have anticipated long ago that they were reaching the limits of Access. I fail to see how this is any different from an overworked oven in a restaurant that now has too many customers, or a delivery truck that does not have the capacity to deliver goods to customers.
Since funds don't exist for a re-write, your staff failed to put aside funds on a monthly basis to deal with this situation, and the choices left to you as a result of this delayed action now place you in a difficult position.
The computing science people you have on staff should have seen this wall and limit coming long ago. In my experience, most CS people consider Access rather limited, so even MORE alarm bells should have been ringing here, which means even less excuse exists for you to be placed in this unfortunate predicament.
So, assuming the computing science staff you have are maintaining this application well (and I graciously accept this is the case), then a logical conclusion is that this application has reached or exceeded the limits of Access. As noted, such limits should have been anticipated long ago.
As you point out, funds do not now exist for a re-write, so few choices exist without such funds.
However, your case may not be so bad, since we NOW know you have experienced developers on staff. Given that, my suggestion would be to consider breaking out modules, or small manageable parts and features, one at a time from the application and having your well-trained and experienced developers build either a web interface or perhaps something like .NET if you wish to stay 100% desktop. This "window" of opportunity is a great chance to consider a change in architecture.
Since the data lives in SQL Server and NOT Access, both the existing Access front end and the new parts can BOTH operate on the same data quite easily (a minimal sketch of this follows). As you do this, you break out and remove the existing parts from the Access application, which suggests you would eventually return to acceptable stability in the Access application. At that point, you could continue, or stop to save funds.
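A minimal sketch of that coexistence, assuming pyodbc and hypothetical server, database, and table names: a module rebuilt outside Access simply talks to the same SQL Server tables the existing front end already uses.

```python
import pyodbc

# Hypothetical names -- point this at the same server/database the
# linked tables in the Access front end point at.
conn = pyodbc.connect(
    "DRIVER={SQL Server};SERVER=ERPSRV01;DATABASE=ErpData;Trusted_Connection=yes;"
)

cursor = conn.cursor()
for row in cursor.execute("SELECT TOP 10 OrderID, CustomerID FROM dbo.Orders"):
    print(row.OrderID, row.CustomerID)

conn.close()
```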
As noted, without funds to re-write, the only choice is to find some means of freeing up SOME limited resources on a monthly basis to solve this problem.
At the end of the day, the solution to this problem is more resources, and without such resources few technical choices and options exist here. Based on the information given so far, YOU have made it clear you don't have those resources. However, the solution to this problem requires resource allocation and planning.
In other words, a technical fix to this problem without resources allocated is not likely an available option for you.
I apologize sincerely for not being able to give you a technology solution here, but this looks to be a problem that will require resources to be allocated to it; no simple shortcut, trick, or magic silver bullet exists here.

How can I investigate these mystery Django crashes?

A Django site (hosted on Webfaction) that serves around 950k pageviews a month is experiencing crashes that I haven't been able to figure out how to debug. At unpredictable intervals (averaging about once per day, but not at the same time each day), all requests to the site start to hang/timeout, making the site totally inaccessible until we restart Apache. These requests appear in the frontend access logs as 499s, but do not appear in our application's logs at all.
In poring over the server logs (including those generated by django-timelog) I can't seem to find any pattern in which pages are hit right before the site goes down. For the most recent crash, all the pages that were hit right before the site went down seem to be standard render-to-response operations using templates that look pretty straightforward and work well the rest of the time. The requests right before the crash do not seem to take longer according to timelog, and I haven't been able to replicate the crashes intentionally via load testing.
Webfaction says that it isn't a case of overrunning our allowed memory usage, or else they would notify us. One thing to note is that the database is not being restarted (just the app/Apache) when we bring the site back up.
How would you go about investigating this type of recurring issue? It seems like there must be a line of code somewhere that's hanging - do you have any suggestions about a process for finding it?
I once had some issues like this, and it basically boiled down to my misunderstanding of thread-safety within Django middleware. Basically, the Django middleware is, I believe, a singleton that is shared among all threads, and those threads were thrashing over the values set on a custom middleware class I had (a sketch of the pattern is below). My solution was to rewrite my middleware so it did not use instance or class attributes that changed, and to switch the critical parts of my application to not use threads at all with my uWSGI server, as these seemed to be an overall performance downside for my app. Threaded uWSGI setups seem to work best when you have views that may complete at different intervals (some long-running views and some fast ones).
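A minimal sketch of the kind of thread-safety bug described above, using old-style Django middleware and hypothetical class names; the fix is simply to stop keeping mutable per-request state on the shared instance.

```python
import time

# BAD: one middleware instance serves every worker thread, so storing
# per-request state on `self` lets concurrent requests clobber each other.
class TimingMiddlewareUnsafe(object):
    def process_request(self, request):
        self.start = time.time()                      # shared across threads!

    def process_response(self, request, response):
        response["X-Elapsed"] = str(time.time() - self.start)
        return response

# SAFER: hang the per-request state off the request object instead.
class TimingMiddlewareSafe(object):
    def process_request(self, request):
        request._timing_start = time.time()           # private to this request

    def process_response(self, request, response):
        start = getattr(request, "_timing_start", time.time())
        response["X-Elapsed"] = str(time.time() - start)
        return response
```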
Since you can't really describe what the failure conditions are until you can replicate the crash, you may need to force the situation with ab (Apache Benchmark). If you don't want to do this against your production site, you might replicate the site in a subdomain. Warning: ab can beat the ever-loving crap out of a server, so RTM. You might also want to give the WF admins a heads-up about what you are going to do.
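If ab isn't available, a rough stand-in is easy to script; this sketch assumes the `requests` package and a hypothetical staging URL, and just hammers the page from 50 threads the way `ab -n 1000 -c 50` would.

```python
import concurrent.futures
import requests

URL = "http://staging.example.com/"   # hypothetical clone of the site

def hit(_):
    try:
        return requests.get(URL, timeout=30).status_code
    except requests.RequestException as exc:
        return type(exc).__name__

# 1000 requests, 50 at a time -- comparable to `ab -n 1000 -c 50 URL`.
with concurrent.futures.ThreadPoolExecutor(max_workers=50) as pool:
    results = list(pool.map(hit, range(1000)))

# Tally status codes / exception names to see when things start timing out.
print({outcome: results.count(outcome) for outcome in set(results)})
```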
Update for comment:
I was suggesting using the exact same machine so that the subdomain name was the only difference. Given that you used a different machine there are a large number of subtle (and not so subtle) environmental things that could tweak you away from getting the error to manifest. If the new machine is OK, and if you are willing to walk away from the problem without actually solving it, you might simply make it your production machine and be happy. Personally I tend to obsess about stuff like this, but then again I'm also retired and have plenty of time to play with my toes. :-)

php cli script hangs with no messages

I've written a PHP script that runs via SSH and nohup, meant to process records from a database and do stuff with them (eg. process some images, update some rows).
It works fine with small loads, up to maybe 10k records. I have some larger datasets that process around 40k records (not a lot, I realize, but it adds up to a lot of work when each record requires the download and processing of up to 50 images).
The larger datasets can take days to process. Sometimes I'll see memory errors in my debug logs, which are clear enough -- but sometimes the script just appears to "die" or go zombie on me. My tail of the debug log just stops with no error messages, the tail of the nohup log ends with no error, and the process still shows in a ps listing, looking like this --
26075 pts/0 S 745:01 /usr/bin/php ./import.php
but no work is getting done.
Can anyone give me some ideas on why a process would just quit? The obvious things (like a PHP script timeout and memory issues) are not a factor, as far as I can tell.
Thanks for any tips
PS -- this is hosted on a GoDaddy VDS (not my choice). I am sort of suspecting that GoDaddy has some kind of limits that might kick in on me despite whatever overrides I put in the code (such as set_time_limit(0);).
Very likely the OOM killer. If you really, really, really want to stay out of its reach, have your process (as root) write -17 to /proc/self/oom_adj. Caution: the kernel usually knows better. Evading the OOM killer can actually cripple the same RDBMS that you are trying to query. What a vicious cycle that would be :)
You probably (instead) want to stagger queries based on what you read from /proc/loadavg and /proc/meminfo. If load or swap usage starts climbing steeply, you need to back off, especially as a background process :)
Additionally, monitor IOWAIT while you run. This can be averaged from /proc/stat when compared with the time the system booted. Note it when you start and as you progress.
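The question is about a PHP script, but the back-off idea is language-agnostic; here is a minimal Python sketch of it, reading the standard Linux /proc files named above (the thresholds are arbitrary placeholders).

```python
import time

def load_1min():
    with open("/proc/loadavg") as f:
        return float(f.read().split()[0])

def mem_free_kb():
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, value = line.split(":", 1)
            info[key] = int(value.split()[0])          # values are in kB
    # MemAvailable needs a newer kernel; fall back to MemFree + Cached.
    return info.get("MemAvailable", info.get("MemFree", 0) + info.get("Cached", 0))

def wait_until_quiet(max_load=2.0, min_free_kb=200000):
    """Block between batches while the box looks busy."""
    while load_1min() > max_load or mem_free_kb() < min_free_kb:
        time.sleep(30)

# e.g. call wait_until_quiet() before each batch of image downloads
```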
Unfortunately, the serial killer known as the OOM killer does not maintain a body count that is accessible beyond parsing kernel messages.
Or, your job keeps hitting the ulimit on its allocated heap. Either way, your job needs to back off when appropriate, or prevent its own demise (as noted above) prior to doing any work.
As a side note, you probably should not be doing what you are doing on shared hosting. If it's that big, it's time to get a VPS (at least), where you have some control over which process gets to do what.

Is there any way to monitor the number of CAS stackwalks that are occurring?

I'm working with a time-sensitive desktop application that uses P/Invoke extensively, and I want to make sure that the code is not wasting a lot of time on CAS stackwalks.
I have used the SuppressUnmanagedCodeSecurity attribute where I think it is necessary, but I might have missed a few places. Does anyone know if there is a way to monitor the number of CAS stackwalks that are occurring, and better yet pinpoint the source of the security demands?
You can use the Process Explorer tool (from Sysinternals) to monitor your process.
Bring up Process Explorer, select your process and right click to show "Properties". Then, on the .NET tab, select the .NET CLR Security object to monitor. Process Explorer will show counters for
Total Runtime Checks
Link Time Checks
% Time in RT Checks
Stack Walk Depth
These are standard security performance counters, described here:
http://msdn.microsoft.com/en-us/library/adcbwb64.aspx
You could also use Perfmon or write your own code to monitor these counters.
As far as I can tell, the only one that is really useful is item 1 (Total Runtime Checks). You could keep an eye on that while you are debugging to see if it is increasing substantially. If so, you need to examine what is causing the security demands; a quick way to watch the counter from outside the debugger is sketched below.
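A minimal sketch of polling that counter with typeperf (a standard Windows command-line tool), wrapped in Python only for convenience; "MyTimeSensitiveApp" is a hypothetical instance name -- use whatever name Perfmon shows for your process.

```python
import subprocess

# Sample "Total Runtime Checks" once a second, ten times.
counter = r"\.NET CLR Security(MyTimeSensitiveApp)\Total Runtime Checks"
subprocess.run(["typeperf", counter, "-si", "1", "-sc", "10"])
```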
I don't know of any other tools that will tell you when a stackwalk is being triggered.