Open a File in D - exception

If I want to safely try to open a file in D, is the preferred way to either
1. try to open it, and catch the exception (and optionally figure out why) if it fails, or
2. check if it exists and is readable, and only then open it?
I'm guessing the second alternative results in more I/O and is more complex, right?

If the file is expected to be there according to normal program operation and the given user input, then use 1 - just try to open the file and rely on exception handling to handle the exceptional situation that the file is not there.
For example:
import std.file : exists;
import std.path : expandTilde;
import std.stdio : File;
/// If the user has a local configuration file in his home directory, open that.
/// Otherwise, open the global configuration file that is a part of the program,
/// and should be installed on all systems where the program is running.
File configFile;
if ("~/.transmogrifier.conf".expandTilde.exists)
    configFile.open("~/.transmogrifier.conf".expandTilde);
else
    configFile.open("/etc/transmogrifier.conf");
Note that using 2 might lead to security issues in your program. For example, if the file is present at the moment when your program checks if the file exists, but is gone when it tries to open it, your program may behave in an unexpected way. If you use 2, make sure that your program still behaves in a desirable way if opening the file fails even though your program just checked that the file exists and is readable.
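One way to keep the fallback behaviour of the configuration example without a gap between the check and the open is to attempt each open in turn and treat a failure as the signal to move on. The following is only a minimal sketch in Python (the thread itself is about D); the function name `open_first_available` is hypothetical, and falling back on any `OSError` rather than only on a missing file is an assumption about the desired policy.

```python
import os


def open_first_available(candidates):
    """Open and return the first path in `candidates` that can be read.

    No existence check is made beforehand: a failed open is what tells us
    to try the next candidate, so nothing can change between a check and
    the open.
    """
    last_error = None
    for path in candidates:
        try:
            return open(os.path.expanduser(path), "r")
        except OSError as exc:  # missing file, permission denied, ...
            last_error = exc
    if last_error is None:
        raise FileNotFoundError("no candidate paths given")
    # Every candidate failed: report the last failure to the caller.
    raise last_error
```

A call such as `open_first_available(["~/.transmogrifier.conf", "/etc/transmogrifier.conf"])` would mirror the example above; the caller still has to handle the exception raised when neither file can be opened.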

Generally, it's better to check whether the file exists first, because it's often very likely that the file doesn't exist, and simply letting it fail when you try and open it is a case of using exceptions for flow control. It's also inefficient in the case where the file doesn't exist, because exceptions are quite expensive in D (though the cost of the I/O may still outweigh the cost of the exception given how expensive I/O is).
It's generally considered bad practice to use exceptions in cases where the exception is likely to be thrown. In those cases, it's far better to return whether the operation succeeded or to check whether the operation is likely to succeed prior to attempting the operation. In the case of opening files, you'd likely do the latter. So, the cleanest way to do what you're trying to do would be to do something like
if(filename.exists)
{
    auto file = File(filename);
    ...
}
or if you want to read the whole file in as a string in one go, you'd do
if(filename.exists)
{
    auto fileContents = readText(filename);
    ...
}
exists and readText are in std.file, and File is in std.stdio.
If you're dealing with a case where it's highly likely that the file will exist and that therefore it's very unlikely that an exception will be thrown, then skipping the check and just trying to open the file is fine. But what you want to avoid is relying on the exception being thrown when it's not unlikely that the operation will fail. You want exceptions to be thrown rarely, so you check that operations will succeed before attempting them if it's likely that they will fail and throw an exception. Otherwise, you end up using exceptions for flow control and harm the efficiency (and maintainability) of your program.
And it's often the case that a file won't be there when you try and open it, so it's usually the case that you should check that a file exists before trying to open it (but it does ultimately depend on your particular use case).

I'd say you need to be prepared for an exception to be thrown anyway, otherwise you have a race condition (another process may delete the file between the test and the open etc). So it's best just to go ahead and open, then deal with the contingency.

Related

Which exceptions should I handle?

I am designing a WinForms application.
At the moment, all my exceptions are being logged at the UI level.
However, I do nothing with any of them other than logging. Is this indicative of a bad design?
Furthermore, one method (.NET's method to execute a command on a Windows service) can throw exceptions of type Win32Exception and InvalidOperationException.
With an exception like FileNotFound, I could prompt the user to provide another file (although .NET has methods built-in to check for the file's existence), but with exceptions like the above, they are down to low-level problems with the machine, so these can only be logged really.
Is this the right way to go with deciding which exceptions to catch? Also, should I catch or throw ArgumentNullException? It indicates a problem with the code, right?
(I'll use Eric Lippert's taxonomy of exceptions throughout the answer.)
If there is nothing you can do about it, then just log and bail out of the current operation, screen or the entire application, depending on the seriousness of the error. Just don't try to proceed in the face of fatal exceptions. In some extreme cases (like an AccessViolationException), just logging or even letting your finally blocks run may not be a good idea because you don't know what will happen if you run code in a corrupt process.
FileNotFoundException and other exogenous exceptions you should handle anyways. Even though you can check if a file exists beforehand, nothing prevents it from becoming inaccessible in between the check and its use. These exceptions depend on external conditions that you have no control over, so you should be prepared to handle them.
You should never catch ArgumentNullException or any other boneheaded exceptions. If it hurts when you do that, don't do it. If you pass a null argument when you shouldn't, then don't pass it. Fix the code so that it deals with the null reference beforehand.

How to write safe code: condition checking vs. exception handling?

Conditional Checking:
if denominator == 0:
    // do something like informing the user, or skipping this iteration.
else:
    result = numerator/denominator
if FileExists('path/to/file'):
    // open file read & write.
else:
    // do something like informing the user, or skipping this iteration.
Exception Handling:
try:
    result = numerator/denominator
catch (DividedByZeroException):
    // take action
try:
    // open file read & write.
catch (FileNotExistsException):
    // take action
I'm frequently encountering situations like this. Which one to go for? Why?
As ever it depends.
In my opinion exceptions should be exceptional.
If you are routinely expecting that something might not work then you should do conditional checks. Conditional check code gets executed all the time regardless of whether there is a problem, so the checks shouldn't take a lot of time.
You should leave exception handling for rare or unlikely circumstances. So how likely is it going to be that the file won't exist?
I had a case where I wanted to write a file to a network drive. The code to check that the UNC share exists can take up to 30 seconds to time out, so you want to be using exceptions here!
I think the second snippet, with exception handling, is better because you can also catch other exceptions generated by unpredicted errors.
And in many cases an instruction doesn't throw an exception when something goes wrong. Then you must use a condition to detect the problem; you can do that without any exception catching, or you can do the check inside a try block and throw an instance of a suitable exception class.
In the first example, it's entirely possible that the file could be deleted between the check and the open, so you could get a FileNotExistsException anyway. In the Python community this is known as the LBYL (look before you leap) vs EAFP (easier to ask forgiveness than permission) debate, and Pythonic consensus is that EAFP is better in general.
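For readers unfamiliar with the two Python terms, here is a minimal side-by-side sketch. The function names `read_lbyl` and `read_eafp` are made up for illustration, and returning `None` for a missing file is just one possible policy.

```python
import os


def read_lbyl(path):
    """Look before you leap: check for the file, then open it."""
    if os.path.exists(path):
        # The file can still disappear between the check and the open,
        # so this open can raise FileNotFoundError despite the check.
        with open(path) as f:
            return f.read()
    return None


def read_eafp(path):
    """Easier to ask forgiveness than permission: just try the open."""
    try:
        with open(path) as f:
            return f.read()
    except FileNotFoundError:
        return None
```

Both functions behave the same when nothing interferes; only the second one keeps its "returns None if missing" promise when the file is deleted at the wrong moment.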

Rethrowing exception question

I read several posts on exception handling/rethrowing exceptions on here (by looking at the highest voted threads), but I am slightly confused:
-Why would you not want the immediate catch block to handle an exception but rather something above it?
-Also, I read quite frequently that you should only handle exceptions which you can "handle". Does that mean actually doing something about it, such as retrying the operation?
You might want to catch an exception (e.g. file not found) and do some processing - e.g. if you open two files and the second file is missing, you will want to close the first file again before you continue, so that it isn't left open.
You might then want to tell the caller that an error occurred, so you re-throw the same exception or throw a new exception, describing the problem.
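The two-file case can be sketched roughly as follows in Python; `open_pair` is a hypothetical helper, not something from the question. It does the local cleanup it is responsible for and then re-raises the same exception so the caller can decide what the failure means.

```python
def open_pair(first_path, second_path):
    """Open two files for reading, or neither of them."""
    first = open(first_path)
    try:
        second = open(second_path)
    except OSError:
        # Local cleanup: do not leave the first file open.
        first.close()
        # Re-raise the original exception unchanged for the caller.
        raise
    return first, second
```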
In some cases, if you get an exception, your code has no way of knowing if it is an error or not (e.g. if you are asked to load an XML file, but you get a File Not Found exception, is that an error, or should you return a blank XML result?). In these cases you either want to re-throw the exception, or not handle it at all, and let the calling code decide how to deal with the problem.
Your second point is the answer to the first. Sometimes the lower-level functionality does not know enough about the context of the application to know what the right action should be. For example, if opening a file for reading fails because there is no file of that name, then the application might want to ask for a different file, or abort the whole operation, or whatever. At some level, some part of the application will take the responsibility to do the right thing, unless of course just having the program crash is an acceptable action to take.
To answer your second question: you need to handle the exception in the immediate block only if you can do something about it, for example close a database connection, close streams, retry (possibly with different parameters), or log the exception (if there is no generic exception handler at a higher level). Probably only the immediate block of code knows such details and can handle them. Calling blocks still need to know that the error occurred, and they might know better what to do with the exception.
For example, suppose the immediate block works with a file. A caller might try to open a file from different locations (in the process of "probing") and ignore several errors as long as at least one attempt succeeds. Another part of the code might consider the very first failed attempt an error. The caller block might choose to notify the user that an error has occurred, and probably give her/him some helpful info on how to fix the problem. It is also nice to provide a means to notify support of the problem: some kind of dialog allowing the user to ask for help, describe the problem and send a message. In this message you might attach logs and some info about the environment, like the OS, versions of frameworks and programs, browser capabilities, or whatever you need to diagnose the problem (if the user permits you to do so).
An exception is "handled" if the method which caught it can satisfy its contract. For example, the contract for a routine OpenRecentDocument which is called when the user selects an item from the "recent files" menu might specify that it must either (1) successfully open a document window, or (2) try unsuccessfully to open a document window, roll back any side-effects resulting from the attempt, and notify the user of what happened. If OpenRecentDocument catches an exception while trying to open the file, but it is able to roll back any side effects from the attempt and notify the user, the routine will have satisfied its contract and should thus return without rethrowing the exception.
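A contract of that shape might look like the following Python sketch. Everything here is assumed for illustration: `open_windows` stands in for whatever state the attempt modifies, and `notify` is any callable that shows a message to the user.

```python
def open_recent_document(path, open_windows, notify):
    """Return the document text, or None after rolling back and notifying.

    Either outcome satisfies the contract, so the exception is not
    rethrown in the failure case.
    """
    open_windows.append(path)  # side effect made before loading
    try:
        with open(path) as f:
            content = f.read()
    except OSError as exc:
        open_windows.remove(path)  # roll back the side effect
        notify(f"Could not open {path}: {exc}")
        return None
    return content
```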
One unfortunate "gotcha" in all this is that there isn't any standard means by which routines which throw an exception can indicate whether their attempted operation has resulted in side-effects which could not be rolled back. There is no inherent way, for example, of distinguishing an InvalidOperationException which occurs unexpectedly while updating a shared data structure (which would imply that other open documents may have been corrupted), from an InvalidOperationException which occurs while updating the data associated with the document being loaded, even if one has anticipated the latter possibility and provided for it. The best one can do is either try to catch any InvalidOperationException which might occur in the latter situation near the spot that it occurs, encapsulate that exception in some other exception type, and throw that, or else have data structures maintain an "object corrupted" flag and ensure that if a data structure is found to be corrupt, all future operations on it will fail as cleanly as possible. Neither approach is at all elegant. The more common approach, which could probably be described as "hope for the best", usually works.

Best way to handle a typical precondition exception?

Which of the following ways of handling this precondition is more desirable and what are the greater implications?
1:
If Not Exists(File) Then
    ThrowException
    Exit
End If
File.Open
...work on file...
2:
If Exists(File) Then
    File.Open
    ....work on file...
Else
    ThrowException
    Exit
End If
Note: The File existence check is just an example of a precondition to HANDLE. Clearly, there is a good case for letting File existence checks throw their own exceptions upwards.
I prefer the first variant, because it better documents that there are preconditions.
Separating the pre-condition check from work is only valid if nothing can change between the two. In this case an external event could delete the file before you open it. Hence the check for file existence has little value, the open call has to check this anyway, let it produce the exception.
It's a style thing. Both work well however I prefer option 1. I like to exit my method as soon as I can and have all the checks up front.
Readability of first approach is higher than the second one.
Second option can nest quite fast if you have several preconditions to check; moreover, it suggests that the if/else is somehow in the normal flow, while it is really only for exceptional situations.
As well, expressiveness of first approach is therefore higher than the second one.
As we are talking about preconditions, they should be checked in the beginning of the procedure, just to ensure the contract is being respected; for this reason, the entire check should be somehow separated from the remaining part of the procedure.
For these two reasons, I would definitely go for the first option.
Note: I am talking here about preconditions: I expect that the contract of your function explicitly defines the file as existing, and therefore not having it would be a sign of programming error.
Otherwise, if we are simply talking about exception handling, I would simply leave it to the File.Open, handling that exception only if there is some idea on how to proceed with that.
Every exception must be produced at the appropriate level. In this case, your exception is an open() issue, which is handled by the open() call. Therefore, you should not add exception code to your routine, because you would duplicate it. This holds unless:
- you want to abstract your I/O backend (say your high-level routine uses file open today, but might use MySQL in the future). In this case, it would be better for client code to know that one standard exception will be produced if I/O issues arise;
- the presence of a low-level exception implies a higher-level exception with high-level semantics (for example, not being able to open a password file means that no auth is possible, and therefore you should raise something like UnableToAuthenticateException).
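The second case can be sketched in Python with exception chaining. The exception name is taken from the example above; the function `load_password_entries` and the one-entry-per-line file format are assumptions made only for this illustration.

```python
class UnableToAuthenticateException(Exception):
    """High-level failure: authentication cannot be performed at all."""


def load_password_entries(path):
    """Read the password file, translating I/O failures to the high level."""
    try:
        with open(path) as f:
            return [line.rstrip("\n") for line in f]
    except OSError as exc:
        # `from exc` keeps the low-level cause attached for diagnostics.
        raise UnableToAuthenticateException(
            f"cannot read password file {path!r}"
        ) from exc
```

Callers then only need to know about the high-level exception, while the original I/O error stays available through `__cause__`.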
As for coding style of your two cases, I would definitely go for the first. I hate long blocks of code, in particular under ifs. They also tend to nest and if you choose the second strategy, you will end up indenting too much.
A true precondition is something whose violation is a bug in the caller: you design a function to work under certain conditions, and if they do not hold, the caller should never have called the function with that data.
Your case of not finding a file could be like this, if the file is required and its existence is checked before in another part of the code; however, this is not quite so, as djna says: file deletion or network failure could cause an error to happen right when you open the file.
The most common treatment is then to try to open the file, and throw an exception on failure. Then, assuming that an exception hasn't been thrown, continue with normal work.

When is it okay to check if a file exists?

File systems are volatile. This means that you can't trust the result of one operation to still be valid for the next one, even if it's the next line of code. You can't just say if (some file exists and I have permissions for it) open the file, and you can't say if (some file does not exist) create the file. There is always the possibility that the result of your if condition will change in between the two parts of your code. The operations are distinct: not atomic.
To make matters worse, the nature of the problem means that if you're tempted to make this check, odds are you're already worried or aware that something you don't control is likely to happen to the file. The nature of development environments make this event less likely to happen during your testing and very difficult to reproduce. So not only do you have a bug, but the bug won't show up while testing.
Therefore under normal circumstances the best course of action is to not even try to check if a file or directory exists. Instead, put your development time into handling exceptions from the file system. You have to handle these exceptions anyway, so this is a much better use of your resources. Even though exceptions are slow, checking the existence of a file requires an extra trip to disk, and disk access is much slower. I even have a well-voted answer to this effect in another question.
But I'm having some doubts. In .Net, for example, if that's really always true, the .Exists() methods wouldn't be in the API in the first place. Also consider scenarios where you expect your program to need to create the file. The first example that comes to mind is for a desktop application. This application installs a default user-config file to its home directory, and the first time each user starts the application it copies this file to that user's application data folder. It expects the file not to exist on that first startup.
So when is it acceptable to check in advance for the existence (or other attributes, like size and permissions) of a file? Is expecting failure rather than success on the first attempt a good enough rule of thumb?
The File.Exists method exists primarily for testing for the existence of a file when you do not intend to open the file. For example testing for the existence of a locking file whose very existence tells you something but whose contents are immaterial.
If you are going to open the file then you will need to handle any exception regardless of the results of any prior calls to File.Exists. So, in general, there is no real value in calling it in these circumstances. Just use the appropriate FileMode enumeration value in your open method and handle any exceptions, as simple as that.
EDIT: Even though this is couched in terms of the .Net API, it is based on the underlying system API. Both Windows and Unix have system calls (e.g. CreateFile on Windows) that use the equivalent of the FileMode enumeration. In fact in .Net (or Mono) the FileMode value is just passed through to the underlying system call.
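To illustrate the same idea outside .Net, here is a small Python sketch that passes "create, but fail if it already exists" flags straight to the operating system's open call, so that the existence test and the creation are one operation. The helper name `create_exclusive` and the `0o600` permission bits are assumptions for the example.

```python
import os


def create_exclusive(path, data):
    """Create `path` and write `data` (bytes); fail if it already exists.

    O_CREAT | O_EXCL asks the operating system to check and create in a
    single step; FileExistsError is raised if the file is already there.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
```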
As a general policy, methods like File.Exists, or properties like WeakReference.Alive or SomeConcurrentQueue.Count are not useful as a means of ensuring that a "good" state exists, but can be useful as a means of determining that a "bad" state exists without doing any unnecessary (and possibly counterproductive) work. Such situations may arise in many scenarios involving locks (and files, since they often include locks). Because all routines that need to lock on a set of resources should, whenever practical, always acquire locks on those resources in a consistent order, it may be necessary to acquire a lock on one resource which is expected to exist before acquiring a resource which may or may not exist. In such a scenario, while it's impossible to avoid the possibility that one might lock the first resource, fail to acquire the second, and then release the first lock without having done any useful work with it, checking for the existence of the second resource before acquiring the lock on the first would minimize unnecessary and useless effort.
It depends on your requirements, but one way is to try to obtain an exclusive open file handle, with some sort of retry mechanism. Once you have that handle, it's going to be hard (or impossible) for another process to delete (or move) that file.
I've used code in .NET similar to the following to obtain an exclusive file handle, where I expect some other process to be possibly writing the file:
FileInfo fi = new FileInfo(fullFilePath);
FileStream fs = null;
int attempts = maxAttempts;
do
{
    try
    {
        // Asking to open for reading with exclusive access...
        fs = fi.Open(FileMode.Open, FileAccess.Read, FileShare.None);
    }
    // Ignore any errors...
    catch {}
    if (fs != null)
    {
        break;
    }
    else
    {
        Thread.Sleep(100);
    }
}
while (--attempts > 0);
One example: You may be able to check for existence of files which you are unable to open (due to, for example, permissions).
Another, possibly better example: You want to check for the existence of a Unix device file. But definitely do not open it; opening it has side effects (e.g., opening and closing /dev/st0 will rewind the tape).
In *nix environment a well established method for checking if another copy of the program is already running is to create a lock file. So the check for file existence is used to verify this.
I'd only check it if I expect it to be missing (e.g. the application settings) and only if I have to read the file.
If I have to write to the file, it's either a logfile (so I can just append to it or create a new one) or I replace the contents of it, so I might as well recreate it anyway.
If I expect that the file exists, it would be right that an Exception is thrown. Exception handling should then inform the user or perform recovery. My opinion is that this results in cleaner code.
File protection (i.e. not overwriting (possibly important) files) is different, in that case I'd always check whether a file exists, if the framework doesn't do that for me (think SaveFileDialog)
I think the check makes sense when you want to be sure the file was there in the first place. As you said settings files...if there is a file I will try and merge the existing settings instead of blowing them away.
Other cases would be when a user tells me to do something with a file. Yes, I know the openFileDialog will check if a file exists (but this is optional). I vaguely remember that back in VB6 this was not the case, so verifying that the file they had just told me to use existed was common.
I'd rather not program by exception.
Edit
I didn't miss the point. You might try and access the file, an exception is thrown and then when you go to create the file, the file was already placed there. Which now causes your exception handling code to go on the fritz. So I guess we could then have an exception handler in our exception handler to catch that the file changed yet again...
I'd rather try and prevent exceptions, not use them to control logic.
Edit
Additionally, another time to check for attributes such as size is when you're waiting for a file operation to finish. Yes, you never know for sure, but with a good algorithm, and depending on the system writing the file, you might be able to handle a good deal of cases. (I had a system running for five years which watched for small files coming over FTP; it used the same API as the file system watcher, and then started polling, waiting for the file to stop changing, before raising an event that the file was ready to be consumed.)
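The polling step described here could look something like the Python sketch below. It is a heuristic only, and every name and default value in it (`wait_until_stable`, the interval, the number of checks, the timeout) is an assumption rather than the original system's design.

```python
import os
import time


def wait_until_stable(path, interval=1.0, checks=3, timeout=60.0):
    """Return True once the size of `path` stays unchanged for `checks`
    consecutive polls, or False if `timeout` seconds pass first.

    An unchanged size does not prove the writer has finished.
    """
    deadline = time.monotonic() + timeout
    last_size = None
    stable = 0
    while time.monotonic() < deadline:
        try:
            size = os.path.getsize(path)
        except OSError:
            size = None  # not there (yet), or temporarily inaccessible
        if size is not None and size == last_size:
            stable += 1
            if stable >= checks:
                return True
        else:
            stable = 0
        last_size = size
        time.sleep(interval)
    return False
```

Even when this returns True, the code that then opens the file still needs to handle failures, for the reasons discussed throughout this thread.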
This may be too simplistic, but I would think the primary reason for checking for the existence of a file (hence the existence of .Exists()) would be to prevent unintended overwrites of existing files, not to avoid exceptions caused by attempting to access non-existent nor non-accessible files.
EDIT 2
This was, in fact, too simplistic and I recommend you see Stephen Martin's response.
If you're that concerned about somebody else removing the file, perhaps you should implement some sort of locking system. For instance, I used to work on the code for C-News, a Usenet news server. Since a lot of the things it did could happen asynchronously, it would "lock" a file or a directory by making a temp file, and then hard linking it to a file named "LOCK". If the link failed, it would mean that some other version of the program was writing to that directory, otherwise it was yours and you could do what you like.
The nifty thing about this is that most of the program was written in shell and awk, and this was a very portable locking mechanism. Also, the lock file would contain the PID of the owner, so you could look at the existing lock file to see if the owner was still running.
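The scheme described above (temp file, then a hard link named "LOCK") can be approximated in Python as follows. This is a sketch of the general technique, not the C-News code; the helper names are invented, and it assumes a file system that supports hard links.

```python
import os
import tempfile


def acquire_lock(directory):
    """Try to take the lock on `directory`; return True on success."""
    fd, temp_path = tempfile.mkstemp(dir=directory, prefix="lock.")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(str(os.getpid()))  # record the owner's PID
        try:
            # Creating the link fails if "LOCK" already exists, so the
            # test and the acquisition are a single operation.
            os.link(temp_path, os.path.join(directory, "LOCK"))
        except FileExistsError:
            return False
        return True
    finally:
        os.unlink(temp_path)  # the temp name is no longer needed


def lock_owner_pid(directory):
    """Read the PID stored in the existing lock file."""
    with open(os.path.join(directory, "LOCK")) as f:
        return int(f.read())


def release_lock(directory):
    os.unlink(os.path.join(directory, "LOCK"))
```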
We have a diagnostic tool that has to gather a set of files, installer log included. Depending on different conditions the installer log can be in one of two folders. Even worse, there can be different versions of the log in both of these folders. How does the tool find the right one?
It's quite simple if you check for existence. If only one is present, grab that file. If two exist, find which has the latest modification time and grab that file. That's just normal way of doing things.
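A variant of this selection that tolerates a file disappearing mid-scan might look like the Python sketch below. The function name `newest_existing` is hypothetical, and it asks for each file's modification time directly, treating a failure as "not present", instead of making a separate existence check.

```python
import os


def newest_existing(candidates):
    """Return the candidate path with the latest modification time,
    or None if none of the candidates can be examined."""
    best_path = None
    best_mtime = None
    for path in candidates:
        try:
            mtime = os.stat(path).st_mtime
        except OSError:
            continue  # missing or inaccessible: skip this candidate
        if best_mtime is None or mtime > best_mtime:
            best_path, best_mtime = path, mtime
    return best_path
```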
While this is a language-agnostic post, it seems you are talking about .NET. Most systems (.NET and others) have more detailed APIs in order to figure out if the file exists when opening the file.
What you should do is make a call to access the file, as it will typically indicate through some sort of error that the file doesn't exist (if it truly doesn't). In .NET, you would have to go through the P/Invoke layer and use the CreateFile API function. If that function returns an error of ERROR_FILE_NOT_FOUND, then you know that the file does not exist. If it returns successfully, then you have a handle that you can use.
The point here is that it is a somewhat atomic operation, which ultimately is what you are looking for.
Then, with the handle, you can pass it to a FileStream constructor and perform your work on the file.
There are a number of applications you may well be writing for which a simple File.Exists is more than adequate for the job. If it's a config file that only your application will use, then you do not need to go overboard with your exception handling.
Whilst the "flaws" you have pointed out in using this method are all valid, it doesn't mean they are not acceptable flaws for some situations.
A variety of apps include built-in web servers. It's common for them to generate self-signed SSL certificates the first time they start up. A straightforward way to implement this would be to check whether the cert exists on startup, and create it if not.
In theory, it could exist for the check, and not exist later. In that case, we'd get an error when we try to listen, but that can be handled quite easily and is not a big deal.
It's also possible that it doesn't exist for the check, and exists later. In that case, it either gets overwritten with a new cert, or writing the new cert fails, depending on your policy. The first is a little annoying, in terms of the cert change causing some alarm, but also not really critical, especially if you do a bit of logging to indicate what is going on.
And, in practice, both cases are extraordinarily unlikely to ever come up.
Like you pointed out, it's always important what the program should do if the file is missing. In all my applications the user can always delete the config file and the application will create a new one with default values. No problem. I also ship my applications without config files.
But users tend to delete files, even files they should not delete, like serial keys and template files. I always check for these files because without them the application is unable to run at all. I cannot create a new serial key from defaults.
What should happen when the file is missing? You can do a file find or use an exception handler, but the real question is: what will happen when the file is missing? Or, how important is the file for the application? I check all the time before I try to access any support files for the app. Additionally, I do error handling in case the file is corrupt and cannot be loaded.
I think anytime that you know that the file may or may not exist and you want to perform some alternate action based on the existence of the file, you should do the check because in this case it's not an exceptional condition for the file to not exist. This won't absolve you from having to handle exceptions -- from someone else either removing or creating the file between the check and your open -- but it makes the intent of the program clear and doesn't rely on exception handling to perform flow-control logic.
EDIT: An example might be log rotation on start up.
try
{
    if (File.Exists("app.log"))
    {
        RotateLogs();
    }
    log = File.Open("app.log", FileMode.CreateNew);
}
catch (IOException)
{
    // ...another writer, perhaps?
}
catch (UnauthorizedAccessException)
{
    // ...maybe I should have used runas?
}
To answer my own question (in part), I want to expand on the example I used: a default config file.
Rather than check if it exists at app startup and try to copy the file if the check fails, the thing to do is always try to copy the file. You just do it in such a way that the copy will fail if the file exists rather than replace an existing file. This way all you need to do is catch and ignore any exception thrown if the copy fails because of an existing file.
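In Python terms, the "always try to copy, but never replace" idea might be sketched like this. The function name `install_default_config` is invented for the example; the key point is the exclusive-creation mode `"xb"`, which makes the open fail if the destination already exists.

```python
import shutil


def install_default_config(default_path, user_path):
    """Copy the default config to `user_path` unless it already exists.

    Returns True if the file was created, False if it was already there.
    Any other failure (for example a missing default file) propagates.
    """
    try:
        with open(default_path, "rb") as src, open(user_path, "xb") as dst:
            shutil.copyfileobj(src, dst)
        return True
    except FileExistsError:
        # An existing user config is the normal case after the first
        # start, so this is deliberately ignored.
        return False
```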
Your problem could easily be solved with basic computer science... read up on Semaphores.
(I did not mean to sound like a jerk, I was just pointing you to a simple answer for a common problem).
I think the reason for "Exists" is to determine when files are missing without the need for creating all the OS housekeeping data required to access the file or having exceptions being thrown. So it's a file handling optimisation more than anything else.
For a single file, the saving the "Exists" gives is generally insignificant. If you were checking if a file exists many, many times (for example, searching for #include files) then the saving could be significant.
In .Net, the specification for File.Exists doesn't list any exceptions that the method might throw, unlike for example File.Open which lists nine exceptions, so there's certainly less checking going on in the former.
Even if "Exists" returns true, you still need to handle exceptions when opening the file, as the .Net reference suggests.