After searching the internet via Google, we could not find any proper answer to our question. This is our situation...
We are seeing warnings in the Windows Event Log that contain the following text:
Clearing expired DatabaseQueryPath 5845128627522525605, database=Security, timeLimit=598, expired=2019-01-18T12:41:00, started=2019-01-18T12:31:02
Can anyone explain what this warning means?
It means roughly this: some d-node work was in progress, reached its time limit, and had to be cleared by the server. You might be able to correlate it with a timeout error that occurred on an e-node at about the expired time.
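To make that correlation easier, the fields of the warning can be pulled apart programmatically. The following is a minimal Python sketch, not a MarkLogic tool: the function name `parse_expiry_warning` is made up for illustration, and reading `timeLimit` as a number of seconds is an assumption that happens to agree with the two timestamps in the sample line.

```python
import re
from datetime import datetime

# The warning text exactly as quoted in the question.
LINE = (
    "Clearing expired DatabaseQueryPath 5845128627522525605, database=Security, "
    "timeLimit=598, expired=2019-01-18T12:41:00, started=2019-01-18T12:31:02"
)


def parse_expiry_warning(line: str) -> dict:
    """Extract the key=value fields from the warning (hypothetical helper)."""
    fields = dict(re.findall(r"(\w+)=([^,\s]+)", line))
    started = datetime.fromisoformat(fields["started"])
    expired = datetime.fromisoformat(fields["expired"])
    return {
        "database": fields["database"],
        "time_limit": int(fields["timeLimit"]),
        "started": started,
        "expired": expired,
        # Assumption: timeLimit is expressed in seconds.
        "elapsed_seconds": int((expired - started).total_seconds()),
    }


info = parse_expiry_warning(LINE)
print(info["elapsed_seconds"], info["time_limit"])  # prints "598 598"
```

The `expired` timestamp is the value to search for in the e-node logs when looking for a matching timeout error.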
Cannot find a clean way to set Stackdriver alert notifications on errors in cloud functions
I am using a cloud function to process data to cloud data store. There are 2 types of errors that I want to be alerted on:
Technical exceptions which might cause function to 'crash'
Custom errors that we are logging from the cloud function
I have done the following:
1. Created a log metric searching for specific errors (although this will not work for a 'crash', as the error message can be different each time)
2. Created an alert for this metric in Stackdriver Monitoring with the parameters listed in the section below
This is done as per the answer to the question,
how to create alert per error in stackdriver
For the first trigger of the condition I receive an email. However, on subsequent triggers, let's say on the next day, I don't. Also, the incident stays in the 'opened' state.
Resource type: cloud function
Metric: from point 2 above
Aggregation: Aligner: count, Reducer: None, Alignment period: 1m
Configuration: Condition triggers if: Any time series violates, Condition: is above, Threshold: 0.001, For: 1 min
So I have 3 questions,
Is this the right way to satisfy my requirement of creating alerts?
How can I still receive alert notifications for subsequent errors?
How can I set the incident to 'resolved', either automatically or manually?
I was having a similar problem and managed to at least get a mail every time. The "trick" seems to be to use sum instead of count, in combination with "most recent value" for the "For" setting - see the screenshot below.
This causes Stackdriver to send a mail every time a matching log entry is found and to close the incident a minute later.
Normally, alerts resolve themselves once the alerting policy stops firing. The problem you're having with your alerts not resolving is because your metric only writes non-zero points - if there are no errors, it doesn't write zero. That means that the policy never gets an unambiguous signal that everything is fine, so the alerts just sit there (they'll automatically close after 7 days, but I imagine that's not all that useful for you).
This is a common problem and it's a tricky one to solve. One possibility is to write your policy as a ratio of errors to something non-zero, like request count. As long as the request count is non-zero, the ratio will compute zero if there are no errors, and so an alert on the ratio will automatically resolve. You need to be a bit careful about rounding errors, though - if your request count is high enough, you might potentially miss a single error because the ratio could round to zero.
Aaron Sher, Stackdriver engineer
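The ratio idea and its rounding caveat can be illustrated with a few lines of Python. This is only a sketch of the arithmetic, not of how Stackdriver evaluates a policy: the function names and the three-decimal rounding are assumptions made for the example, and the 0.001 threshold is borrowed from the question.

```python
from typing import Optional


def error_ratio(errors: int, requests: int, decimals: int = 3) -> Optional[float]:
    """Return errors/requests rounded to `decimals` places (assumed precision).

    Returns None when there are no requests, i.e. no data point at all.
    """
    if requests == 0:
        return None
    return round(errors / requests, decimals)


def policy_fires(ratio: Optional[float], threshold: float = 0.001) -> bool:
    """The condition 'is above threshold'; a missing point never fires."""
    return ratio is not None and ratio > threshold


print(error_ratio(0, 500))        # a real zero, so an open incident can resolve
print(error_ratio(1, 500))        # one error in 500 requests is visible
print(error_ratio(1, 1_000_000))  # one error in a million rounds away to zero
```

With no errors the ratio is an explicit 0.0 rather than a missing point, which is what lets the incident close. With a very large request count, a single error disappears in the rounding, which is the risk mentioned above.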
We got around this issue by having the insertId as a label of the log-based metric we created for every log record we get from the pods running our services.
In the alerting policy, this label helped in two things:
We grouped by it (named as record_id) which served in making each incident unique, so it got reported without waiting for other incidents to get resolved and at the same time it got resolved instantly.
We used it in the documentation of the notification to include a direct link to the issue (log record) itself which was a nice and essential feature to have. https://console.cloud.google.com/logs/viewer?project=MY_PROJECT&advancedFilter=insertId%3D%22${metric.label.record_id}%22
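For reference, the encoded part of that link is just the filter `insertId="<value>"` after URL encoding. The Python sketch below rebuilds the same URL for a concrete record; the function name `log_record_url` and the sample insert ID are made up for illustration, and inside the alerting policy itself the `${metric.label.record_id}` placeholder is what supplies the value.

```python
from urllib.parse import quote


def log_record_url(project: str, insert_id: str) -> str:
    """Build a logs viewer link filtered to a single log record (hypothetical helper)."""
    advanced_filter = f'insertId="{insert_id}"'
    return (
        "https://console.cloud.google.com/logs/viewer"
        f"?project={quote(project, safe='')}"
        f"&advancedFilter={quote(advanced_filter, safe='')}"
    )


# "abc123" is a placeholder, not a real insertId.
print(log_record_url("MY_PROJECT", "abc123"))
```

The `=` becomes `%3D` and each double quote becomes `%22`, which matches the template shown above.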
As Aaron Sher mentioned in his answer, it is a tricky problem. We might have done something that is not recommended or not efficient, but it works fine, and of course we are open to recommendations for improvement.
My PS4 is currently set to update automatically, but for about a week now it has required me to boot into safe mode in order to do so, giving me error CE-30002-5 (needing safe mode to update). It is going to get really tedious to turn on the system, find out it needs a system update, shut it down, boot into safe mode, update, and then let it boot back up in standard mode. Is there a way to fix this so that it doesn't need safe mode to update?
From a quick google search:
https://www.playstation.com/en-ie/get-help/help-library/error-codes/ce-30002-5/
In my opinion, you are focusing on the wrong aspect of this. You seem to be focusing on the fact that this task will become tedious, and not on the issue you should be focusing on, which is that your system is throwing an error (i.e. something is not right or is broken).
You've got an error code, and google is your friend. You'll get better help there than someone else doing your research for you.
I receive a Chrome (43.0.2357.124) "Aw, snap!" error that renders "Inspected target disconnected. Once it reloads we will attach to it automatically." in the developer console.
Without being too specific to my project, and to keep the question more generally applicable: it appears to occur during a Promise that features a ~5 second delay.
This function (can be seen in context on the repo https://github.com/mitTransportAnalyst/CoAXs/blob/master/public/scripts/main/services/analystService.js#L35-L44) performs fine on Firefox 38.0.5. It is receiving a large GeoJSON array - perhaps that could somehow be related to the issue, though I do not know for sure.
At this point, any advice on next steps for how to debug this would be appreciated, even googling this specific issue doesn't come up with any results (5 irrelevant results as of Wed 6:00, June 17: https://www.google.com/search?sclient=psy-ab&biw=1280&bih=678&q=%22inspected%20target%20disconnected%22%20chrome&oq=%22inspected%20target%20disconnected%22%20chrome&gs_l=serp.3...805885.806603.1.806844.2.2.0.0.0.0.72.122.2.2.0....0...1c.1.64.serp..2.0.0.O7y1WqVbj0c&pbx=1&psj=1&ion=1&cad=cbv&sei=LvKBVfarHcyw-AHVioHYBg&rct=j#q=%22Inspected+target+disconnected%22+chrome).
Added this as a comment but interested to see if anyone knows why this happened:
The issue ended up being related to the delayed receipt of > 3 MB files (assembled piecemeal). There is some (limited) documentation of this error here: code.google.com/p/v8/issues/detail?id=3968 (the results of which are, unfortunately, inconclusive). I ended up working with the data provider to reduce the file size substantially, which resolved the issue. Curiously, calling console.log at the point where the data was concatenated seemed to avoid the issue; without it, the tab would suddenly exceed ~1.3 GB and crash. If anyone can suggest why this was occurring, I would be interested to hear it.
You can see the point where I was calling console.log here: https://github.com/mitTransportAnalyst/CoAXs/blob/master/public/scripts/analyst.js#L10343
Turn off your extensions. I had a Knockoutjs context debugger plugin and it caused the very same behaviour with the same version of Chrome.
I just had the same problem. When I checked, the code contained an infinite loop that added the same object again and again, which used a lot of memory. Once the RAM was full, the page became unresponsive. When I tried the same page in Mozilla Firefox, my antivirus showed a RAM-full alert. Chrome can handle this situation, but Firefox cannot; it keeps looping for as long as it can. So don't blame Chrome, it is handling the exception. Check your code. If it is not your page, leave it.
Finally, check the loops.
I have a "standard persistent disk" of size 10GB on Google Cloud using Ubuntu 12.04. Whenever I try to remove it, I encounter the following error:
The resource 'projects/XXX/zones/us-central1-f/disks/tahir-run-master-340fbaced6a5-d2' is not ready
Does anybody know what's going on? How can I get rid of this disk?
This happened to me recently as well. I deleted an instance but the disk didn't get deleted (despite the auto-delete option being active). Any attempt to manually delete the disk resource via the dev console resulted in the mentioned error.
Additionally, the progress of the associated "Delete disk 'disk-name'" operation was stuck on 0%. (You can review the list of operations for your project by selecting compute -> compute engine -> operations from the navigation console).
I figured the disk-resource was "not ready" because it was locked by the stuck-operation, so I tried deleting the operation itself via the Google Compute Engine API (the dev console doesn't currently let you invoke the delete method on operation-resources). It goes without saying, trying to delete the operation proved to be impossible as well.
At the end of the day, I just waited for the problem to fix itself. The following morning I tried deleting the disk again, and this time the operation was successful, so it looks like the lock had been lifted in the meantime.
As for the cause of the problem, I'm still left clueless. It looks like the delete-operation was stuck for whatever reason (probably related to some issue or race-condition going on with the data-center's hardware/software infrastructure).
This probably isn't considered a valid answer by SO's standards, but I felt like sharing my experience anyway, as I had a really hard time finding any info about this kind of Google Compute Engine problem.
If you happen to ever hit the same or similar issue, you can try waiting it out, as any stuck operation will (most likely) eventually be canceled after it has been in PENDING state for too long, releasing any locked resources in the process.
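If you would rather script the waiting than retry by hand, the idea can be sketched in Python as below. Everything here is hypothetical: `ResourceNotReady` stands in for the "is not ready" error, `delete_disk` is whatever callable performs the real delete request in your setup, and the attempt count and delay are arbitrary defaults. No actual Google Compute Engine API call is shown.

```python
import time
from typing import Callable


class ResourceNotReady(Exception):
    """Hypothetical error standing in for the 'is not ready' response."""


def delete_when_ready(
    delete_disk: Callable[[], None],
    attempts: int = 5,
    delay_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Retry the delete until it succeeds; return the attempt that worked."""
    for attempt in range(1, attempts + 1):
        try:
            delete_disk()
            return attempt
        except ResourceNotReady:
            if attempt == attempts:
                raise  # still locked after the last attempt, give up
            sleep(delay_seconds)
    raise ValueError("attempts must be at least 1")


# Demo with a fake delete that is "not ready" for the first two calls.
calls = {"n": 0}


def fake_delete() -> None:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ResourceNotReady("disk is not ready")


# The sleep is replaced with a no-op so the demo finishes immediately.
print(delete_when_ready(fake_delete, sleep=lambda seconds: None))  # prints 3
```

Passing `sleep` as a parameter keeps the sketch testable; in real use the default `time.sleep` applies and the delay should be long, since in my case the lock took hours to clear.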
Alternatively, if you need to solve the issue ASAP (which is often the case if the issue is affecting a resource that is critical to your production environment), you can try:
Contacting Google Support directly (only available to paid support customers)
Posting in the Google Compute Engine discussion group
Sending an email to gc-team(at)google.com to report a production issue
I believe your issue is the same as the one that was solved a few days ago.
If your issue didn't happen after performing those steps, you can follow Andrea's suggestion or create a new issue.
Regards,
Adrián.
I want to get an idea of how I should handle end-user visible error messages in my web application.
How much information do you give in error messages?
Do you redirect all errors, regardless of type, to a common error page, or do you have a small set of pages (404, 403, all others)?
Do you give error codes that the user could reference/give to you that only you understand?
Do you give any technical details?
As I stated, my users are non-technical regular Joe folks.
Display a nice error to the user; log a detailed error for yourself.
I try to do the following:
1. Make sure you never run the risk of passwords or connection strings appearing in error messages.
2. Make sure the errors get logged to a persistent medium. I prefer a database so that I can query by time range and other parameters. I don't log 404s.
3. If the application is an internal app that does not need to be pretty, it may be OK to have the error info on the page. Even if you are logging this stuff, it is nice to be able to have your users email you a screenshot or copy/paste.
4. If 3 seems distasteful, have some error info written as HTML comments. Then you can at least see the info by viewing source.
In general I try to give users as much information as they need to solve their problems themselves. For example, in the case of a 404, you might want to tell them to double-check that the URL they are looking for is correct.
They obviously won't need stack traces and the like, but it makes sense for you to log that level of detail somewhere for diagnostics and debugging.
For fatal errors, keep them short, so users can repeat them over the phone or by e-mail: can't connect to database, etc.
For non-fatal errors, describe the condition fully: Error, cannot save the invoice without an invoice date.
I also always log everything, the parameters to the function and any internal values that may be of use.
I try to show users enough information that they know it's an issue they need to tell someone about, but try to avoid showing them so much it scares them!
If possible, the error message should tell them what just failed, e.g. did their save just fail, or did it save fine but the refresh of the screen afterwards had an issue? Extra error information (e.g. stack traces) should be logged somewhere you can get at it without the user having to send it to you.
When it comes to displaying errors to the end user, I find it good practice to display an error code (so the administrators and I know what error it is) and a typical "Oops, something went wrong, please contact an administrator".
It can be good to give a bit more information for common errors that could be caused by the user's actions. But usually too much information can scare or confuse the user.
None, just give a reference number so the user can pass it on to you, and you can check the details in the application logs (obviously you need to keep a copy of the error logs).
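The reference-number approach described in several of the answers above can be sketched in Python with the standard `logging` and `uuid` modules. This is one possible shape under assumptions, not a recommended implementation: the logger name, the eight-character reference format and the wording of the user message are all made up for the example.

```python
import logging
import uuid

logger = logging.getLogger("webapp.errors")  # hypothetical logger name


def report_error(exc: Exception, context: dict) -> str:
    """Log full details for the developers; return a short message for the user."""
    reference = uuid.uuid4().hex[:8].upper()
    # The stack trace and context go to the log only, keyed by the reference.
    logger.error("ref=%s context=%r", reference, context, exc_info=exc)
    return (
        "Something went wrong. Please contact an administrator "
        f"and quote reference {reference}."
    )


try:
    raise RuntimeError("database host unreachable")
except RuntimeError as exc:
    user_message = report_error(exc, {"action": "save_invoice"})

print(user_message)
```

The user sees only the friendly sentence and the reference; the exception text, stack trace and parameters end up in the log, where the same reference can be searched for.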
Your web application's error messages should always (at least) answer these 3 questions (in that order):
What happened?
Why did it happen?
What can be done about it?
I have used this for many years; it is originally from Apple's "Human Interface Guidelines: The Apple Desktop Interface" (a newer version exists). Microsoft has similar guidelines.
This structured approach also makes the messages faster to write, as one can just answer the questions.
The error messages should also be specific. Any information that the web application knows about and that the user may need to resolve the problem should be in the error message. The (infamous) error message "An error happend." is simply not acceptable.
Optional: more technical information that the user may not understand can be placed at the end. But it should be marked as such.
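One way to keep every message in that three-question shape is to make the structure explicit in code. The Python sketch below is only an illustration: the class and field names are invented, and the sample text reuses the invoice example from an earlier answer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ErrorMessage:
    """An error message built from the three questions, in order."""

    what_happened: str
    why: str
    what_to_do: str
    technical_details: Optional[str] = None  # optional, shown last and labelled

    def render(self) -> str:
        text = " ".join([self.what_happened, self.why, self.what_to_do])
        if self.technical_details:
            text += f"\n\nTechnical details: {self.technical_details}"
        return text


message = ErrorMessage(
    what_happened="The invoice could not be saved.",
    why="It has no invoice date.",
    what_to_do="Enter an invoice date and save again.",
)
print(message.render())
```

Because all three fields are required, a vague message with no cause or remedy cannot be constructed without someone noticing the gap.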