Is it ok to use open inline with json.dump [duplicate] - json

In Python, if you either open a file without calling close(), or close the file but not using try-finally or the "with" statement, is this a problem? Or does it suffice as a coding practice to rely on the Python garbage-collection to close all files? For example, if one does this:
for line in open("filename"):
    # ... do stuff ...
... is this a problem because the file can never be closed and an exception could occur that prevents it from being closed? Or will it definitely be closed at the conclusion of the for statement because the file goes out of scope?

In your example the file isn't guaranteed to be closed before the interpreter exits. In current versions of CPython the file will be closed at the end of the for loop, because CPython uses reference counting as its primary garbage collection mechanism, but that's an implementation detail, not a feature of the language. Other implementations of Python aren't guaranteed to work this way. For example, IronPython, PyPy, and Jython don't use reference counting and therefore won't close the file at the end of the loop.
It's bad practice to rely on CPython's garbage collection implementation because it makes your code less portable. You might not have resource leaks if you use CPython, but if you ever switch to a Python implementation which doesn't use reference counting you'll need to go through all your code and make sure all your files are closed properly.
For your example use:
with open("filename") as f:
for line in f:
# ... do stuff ...

Some Pythons will close files automatically when they are no longer referenced, while others will not and it's up to the O/S to close files when the Python interpreter exits.
Even for the Pythons that will close files for you, the timing is not guaranteed: it could be immediately, or it could be seconds/minutes/hours/days later.
So, while you may not experience problems with the Python you are using, it is definitely not good practice to leave your files open. In fact, CPython 3 now emits a ResourceWarning when it has to close a file for you because you didn't do it.
Moral: Clean up after yourself. :)
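As a minimal sketch of that warning (the file name is illustrative): the ResourceWarning is ignored by default, so you have to enable it, e.g. with a warnings filter or python -X dev.

import warnings
warnings.simplefilter("always", ResourceWarning)  # normally ignored by default

def leak():
    for line in open("filename"):
        pass  # the file object is never closed explicitly

leak()
# When the file object is garbage-collected, CPython 3 prints something like:
#   ResourceWarning: unclosed file <_io.TextIOWrapper name='filename' mode='r' ...>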

Although it is quite safe to use such a construct in this particular case, there are some caveats for generalising such practice:
your program can potentially run out of file descriptors; although that's unlikely, imagine hunting a bug like that
you may not be able to delete said file on some systems, e.g. win32
if you run anything other than CPython, you don't know when the file is closed for you
if you open the file in write or read-write mode, you don't know when the data is flushed
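If with is not an option for some reason, an explicit try/finally gives the same guarantee; a minimal sketch:

f = open("filename")
try:
    for line in f:
        pass  # ... do stuff ...
finally:
    f.close()  # runs even if the loop raises: the descriptor is released
               # and any buffered writes are flushed deterministically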

The file does get garbage collected, and hence closed. The GC determines when it gets closed, not you. Obviously, this is not a recommended practice, because you might hit the open file handle limit if you do not close files as soon as you finish using them. What if, within that for loop of yours, you open more files and leave them lingering?
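For illustration, this is roughly what hitting that limit looks like (assuming "filename" exists; don't run this anywhere it matters):

handles = []
try:
    while True:
        handles.append(open("filename"))  # opened but never closed
except OSError as e:
    # Typically "[Errno 24] Too many open files" once the per-process limit is hit
    print(f"ran out of descriptors after {len(handles)} open files: {e}")
finally:
    for h in handles:
        h.close()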

It is very important to close your file descriptor when you are going to use the file's contents later in the same Python script. I realised this myself today, after a long and hectic debugging session. The reason is that the content is only written/saved to the file once you close the file descriptor; only then do the changes actually take effect on the file.
So suppose you write content to a new file and then, without closing the file descriptor, use that file (not the descriptor) in another shell command which reads its contents. In this situation the shell command will not see the contents you expect, and if you try to debug it, the bug is not easy to find. You can also read more in my blog entry: http://magnificentzps.blogspot.in/2014/04/importance-of-closing-file-descriptor.html
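A sketch of that situation (file name and shell command are illustrative): the data sits in Python's write buffer until the file is flushed or closed, so another process reading the file sees it empty or truncated.

import subprocess

f = open("output.txt", "w")
f.write("hello from python\n")

# The shell command may see an empty file here, because the line is still
# sitting in Python's write buffer:
subprocess.run(["cat", "output.txt"])

f.close()  # flushes the buffer and releases the descriptor
subprocess.run(["cat", "output.txt"])  # now prints the line as expected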

During the I/O process, data is buffered: this means that it is held in a temporary location before being written to the file.
Python doesn't flush the buffer (that is, actually write the data to the file) until it's sure you're done writing, and one way to signal that is to close the file.
If you write to a file without closing it (or at least flushing it), the data may not make it to the target file when you expect it to.
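A small sketch of that buffering behaviour (file name illustrative); flush() forces the buffered data out without giving up the file handle:

import os

f = open("buffered.txt", "w")
f.write("short line\n")
print(os.path.getsize("buffered.txt"))  # usually 0: the text is still in Python's buffer

f.flush()  # hand the buffered data to the OS; the file stays open
print(os.path.getsize("buffered.txt"))  # now reports the written bytes

f.close()  # closing also flushes, and releases the descriptor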

Python uses the close() method to close an opened file. Once the file is closed, you cannot read from or write to it again.
If you try to use the same file object again, it will raise a ValueError, since the file is already closed.
Python also closes a file automatically if its file object is reassigned to another file. Closing files explicitly is standard practice, as it reduces the risk of the file being modified unintentionally.
Another way to handle this is the with statement.
If you open a file using a with statement, a variable is bound to the file object and is meant to be used only within the indented block. The with statement itself calls the close() method after the indented code has executed.
Syntax:
with open('file_name.text') as file:
    # some code here
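For illustration, touching the file object after the with block exits raises the ValueError mentioned above:

with open('file_name.text') as file:
    data = file.read()

file.read()  # ValueError: I/O operation on closed file.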

Related

Need help downloading and reading a zipped CSV file in memory with Clojure

I have an external site from which I want to download a zipped CSV file. Currently, I'm downloading it unzipped, saving it to disk, then unzipping it, saving the unzipped file to disk, then reading the unzipped file with the CSV reader. Lots of useless steps in the process can be trimmed out, and I went on my way to do so.
This amazing answer helped me to get myself going. I tried to use the first option linked there (GZIPInputStream), but I get a "Not GZIP format" error, so I suppose I have to go to the second option.
This is my current code, and it does what I want it to do:
;; ZipInputStream here is java.util.zip.ZipInputStream
(defn download-zipped-stream! []
  (:body (clj-http.client/get "www.example.com" {:as :stream})))

(with-open [stream (ZipInputStream. (download-zipped-stream!))]
  (.getNextEntry stream)
  (doall (clojure.data.csv/read-csv (clojure.java.io/reader stream) :separator \;)))
I literally got to this by trial and error. There are mainly three things I'd like to change / understand about this code.
Ideally, I would want to break my code in two parts: one to download and unzip the content, returning a stream - the reason being that I want to decide later whether I want to read it as a csv directly, or write to disk (I don't want to lose this option, because, during development, it is much easier to read a pre-downloaded csv file than downloading the big content every single time). Turns out that, if I try to access the stream outside of the with-open call, I get a "stream closed" error (which, from what I understand, makes total sense).
On the above code, I have to call this .getNextEntry, or I get an empty list. As someone who is striving to write functional code, this bothers me, because, from what I can understand, I'm dealing with states here - my stream object looks mutable, which is something I really don't want. Isn't there a way to work around this step and straight-up not have it there?
I tried to call the read-csv method directly on the stream object, but read-csv doesn't really know how to handle ZipInputStreams, apparently. Seeing this, I simply and hopefully threw an io/reader call in between, and it worked. I don't know if this is the best approach, though. Is it correct?
I'm quite new to Clojure, and I'm completely clueless about Java in general, so, as you can see, my knowledge about those stream objects is pretty limited. I tried to read up on them in Java, but I quit because I was not sure how much of it would be useful for someone learning Clojure, so any pointers are also appreciated.
I think you are on the right approach. Suggestions to consider:
Consider using wget to manually download the *.csv.gz file to your local disk. Then, just open that local file instead of using clj-http.client/get.
I haven't played much with ZipInputStream, but if using .getNextEntry() seems to be required, just go with it.
The examples for read-csv show using a Reader to give access to the input file, so this is the expected behavior.
This template project shows how I like to organize a Clojure project & source code. Be sure to peruse the list of documentation provided.
Don't forget to utilize cljdoc.org for looking up Clojure library API docs. For example, see the API docs for data.csv.
Update
You may also want to review this answer.
Use https://github.com/techascent/tech.ml.dataset, optionally with https://scicloj.github.io/tablecloth/index.html (a dplyr-like API for TMD).
It also has the advantage of being extremely fast and able to handle datasets that can't fit in memory, and it talks SQL, Arrow, et al. Join the conversation about it here:
https://clojurians.zulipchat.com/#narrow/stream/151924-data-science/topic/tech.2Eml.2Edataset

Injecting into an EXE

I really want to inject my C++ program into another (compiled) program. The way I want to do this is by changing the first bytes (where the program starts) to jump to the binary of my program (pasted into a code cave, for example), and when it has finished running, to jump back to where execution would have gone before the injected program ran.
Is this even possible? And if it is, is it a good/smart idea to do so?
Are there other methods of doing so?
For example:
I wrote a program that writes the current time to a file and then terminates, so if I inject it into Internet Explorer and launch it, it will first write the current time to a file and then start Internet Explorer.
In order to do this, you should start by reading the documentation for PE files, which you can download from Microsoft.
Doing this takes a lot of research and experimenting, which is beyond the scope of Stack Overflow. You should also be aware that it depends heavily on the executable you try to patch: it may work with your version, but most likely not with another version. There are also techniques to protect against this kind of attack, which may be built into the executable as well as into the OS.
Is it possible?
Yes. Of course, but it's not trivial.
Is it smart?
Depends on what you do with it. Sometimes it may be the only way.

Open a File in D

If I want to safely try to open a file in D, is the preferred way to either
1. try to open it, and catch the exception (and optionally figure out why) if it fails, or
2. check that it exists and is readable, and only then open it?
I'm guessing the second alternative results in more I/O and is more complex, right?
If the file is expected to be there according to normal program operation and the given user input, then use 1 - just try to open the file and rely on exception handling to handle the exceptional situation that the file is not there.
For example:
/// If the user has a local configuration file in his home directory, open that.
/// Otherwise, open the global configuration file that is a part of the program,
/// and should be installed on all systems where the program is running.
File configFile;
if ("~/.transmogrifier.conf".expandTilde.exists)
    configFile.open("~/.transmogrifier.conf".expandTilde);
else
    configFile.open("/etc/transmogrifier.conf");
Note that using 2 might lead to security issues in your program. For example, if the file is present at the moment when your program checks if the file exists, but is gone when it tries to open it, your program may behave in an unexpected way. If you use 2, make sure that your program still behaves in a desirable way if opening the file fails even though your program just checked that the file exists and is readable.
Generally, it's better to check whether the file exists first, because it's often very likely that the file doesn't exist, and simply letting it fail when you try and open it is a case of using exceptions for flow control. It's also inefficient in the case where the file doesn't exist, because exceptions are quite expensive in D (though the cost of the I/O may still outweigh the cost of the exception given how expensive I/O is).
It's generally considered bad practice to use exceptions in cases where the exception is likely to be thrown. In those cases, it's far better to return whether the operation succeeded or to check whether the operation is likely to succeed prior to attempting the operation. In the case of opening files, you'd likely do the latter. So, the cleanest way to do what you're trying to do would be to do something like
if (filename.exists)
{
    auto file = File(filename);
    ...
}
or if you want to read the whole file in as a string in one go, you'd do
if (filename.exists)
{
    auto fileContents = readText(filename);
    ...
}
exists and readText are in std.file, and File is in std.stdio.
If you're dealing with a case where it's highly likely that the file will exist and that therefore it's very unlikely that an exception will be thrown, then skipping the check and just trying to open the file is fine. But what you want to avoid is relying on the exception being thrown when it's not unlikely that the operation will fail. You want exceptions to be thrown rarely, so you check that operations will succeed before attempting them if it's likely that they will fail and throw an exception. Otherwise, you end up using exceptions for flow control and harm the efficiency (and maintainability) of your program.
And it's often the case that a file won't be there when you try and open it, so it's usually the case that you should check that a file exists before trying to open it (but it does ultimately depend on your particular use case).
I'd say you need to be prepared for an exception to be thrown anyway, otherwise you have a race condition (another process may delete the file between the test and the open etc). So it's best just to go ahead and open, then deal with the contingency.

.tbc to .tcl file

This is a strange question, and I searched but couldn't find any satisfactory answer.
I have a compiled Tcl file, i.e. a .tbc file. Is there a way to convert this .tbc file back to a .tcl file?
I read here that someone mentioned ::tcl_traceCompile and said it could be used to disassemble the .tbc file. But being a novice Tcl user, I am not sure whether this is possible or, more to the point, how exactly to use it.
I know that the Tcl compiler doesn't compile all statements, so those statements can easily be seen in the .tbc file, but can we get the whole Tcl source back from a .tbc file?
Any comment would be great.
No, or at least not without a lot of work; you're doing something that quite a bit of effort was put in to prevent (the TBC format is intended for protecting commercial code from prying eyes).
The TBC file format is an encoding of Tcl's bytecode, which is not normally saved at all; the TBC stands for Tcl ByteCode. The TBC format data is only produced by one tool, the commercial “Tcl Compiler” (originally written by either Sun or Scriptics; the tool dates from about the time of the transition), which really is a leveraging of the built-in compiler that every Tcl system has together with some serialization code. It also strips as much of the original source code away as possible. The encoding used is unpleasant; you want to avoid writing your own loader of it if you can, and instead use the tbcload extension to do the work.
You'll then need to use it with a custom build of Tcl that disables a few defensive checks so that you can disassemble the loaded code with the tcl::unsupported::disassemble command (which normally refuses to take apart anything coming from tbcload); that command exists from Tcl 8.5 onwards. After that, you'll have to piece together what the code is doing from the bytecodes; I'm not aware of any tools for doing that at all, but the bytecodes are mostly fairly high level so it's not too difficult for small pieces of code.
There's no manual page for disassemble; it's formally unsupported after all! However, that wiki page I linked to should cover most of the things you need to get started.
I can say partially "yes", and conditionally too. The condition is that the original Tcl code is written in a namespace and the procs are defined within the namespace's curly braces. Then you can source the .tbc file in tkcon/wish and inspect the code using info procs and the namespace command. Of course, you need to know the namespace name; however, that can also be found.

When is it okay to check if a file exists?

File systems are volatile. This means that you can't trust the result of one operation to still be valid for the next one, even if it's the next line of code. You can't just say if (some file exists and I have permissions for it) open the file, and you can't say if (some file does not exist) create the file. There is always the possibility that the result of your if condition will change in between the two parts of your code. The operations are distinct: not atomic.
To make matters worse, the nature of the problem means that if you're tempted to make this check, odds are you're already worried or aware that something you don't control is likely to happen to the file. The nature of development environments make this event less likely to happen during your testing and very difficult to reproduce. So not only do you have a bug, but the bug won't show up while testing.
Therefore under normal circumstances the best course of action is to not even try to check if a file or directory exists. Instead, put your development time into handling exceptions from the file system. You have to handle these exceptions anyway, so this is a much better use of your resources. Even though exceptions are slow, checking the existence of a file requires an extra trip to disk, and disk access is much slower. I even have a well-voted answer to this effect in another question.
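In Python terms, for instance, that advice looks roughly like this (file name and fallback are made up):

try:
    with open("settings.ini") as f:   # just attempt the operation...
        config = f.read()
except FileNotFoundError:             # ...and handle the failure if it happens
    config = ""                       # fall back to defaults, log, prompt, etc.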
But I'm having some doubts. In .Net, for example, if that's really always true, the .Exists() methods wouldn't be in the API in the first place. Also consider scenarios where you expect your program to need to create the file. The first example that comes to mind is a desktop application. This application installs a default user-config file to its home directory, and the first time each user starts the application it copies this file to that user's application data folder. It expects the file not to exist on that first startup.
So when is it acceptable to check in advance for the existence (or other attributes, like size and permissions) of a file? Is expecting failure rather than success on the first attempt a good enough rule of thumb?
The File.Exists method exists primarily for testing for the existence of a file when you do not intend to open the file. For example testing for the existence of a locking file whose very existence tells you something but whose contents are immaterial.
If you are going to open the file then you will need to handle any exception regardless of the results of any prior calls to File.Exists. So, in general, there is no real value in calling it in these circumstances. Just use the appropriate FileMode enumeration value in your open method and handle any exceptions, as simple as that.
EDIT: Even though this is couched in terms of the .Net API, it is based on the underlying system API. Both Windows and Unix have system calls (CreateFile on Windows, open on Unix) that use the equivalent of the FileMode enumeration. In fact, in .Net (or Mono) the FileMode value is just passed through to the underlying system call.
As a general policy, methods like File.Exists, or properties like WeakReference.Alive or SomeConcurrentQueue.Count are not useful as a means of ensuring that a "good" state exists, but can be useful as a means of determining that a "bad" state exists without doing any unnecessary (and possibly counterproductive) work. Such situations may arise in many scenarios involving locks (and files, since they often include locks). Because all routines that need to lock on a set of resources should, whenever practical, always acquire locks on those resources in a consistent order, it may be necessary to acquire a lock on one resource which is expected to exist before acquiring a resource which may or may not exist. In such a scenario, while it's impossible to avoid the possibility that one might lock the first resource, fail to acquire the second, and then release the first lock without having done any useful work with it, checking for the existence of the second resource before acquiring the lock on the first would minimize unnecessary and useless effort.
It depends on your requirements, but one way is to try to obtain an exclusive open file handle, with some sort of retry mechanism. Once you have that handle, it's going to be hard (or impossible) for another process to delete (or move) that file.
I've used code in .NET similar to the following to obtain an exclusive file handle, where I expect some other process to possibly be writing the file:
FileInfo fi = new FileInfo(fullFilePath);
FileStream fs = null;   // declared here so it can be tested after the try/catch
int attempts = maxAttempts;
do
{
    try
    {
        // Asking to open for reading with exclusive access...
        fs = fi.Open(FileMode.Open, FileAccess.Read, FileShare.None);
    }
    // Ignore any errors...
    catch {}

    if (fs != null)
    {
        break;
    }
    else
    {
        Thread.Sleep(100);
    }
}
while (--attempts > 0);
One example: You may be able to check for existence of files which you are unable to open (due to, for example, permissions).
Another, possibly better example: You want to check for the existence of a Unix device file. But definitely do not open it; opening it has side effects (e.g., open/close /dev/st0 will rewind the tape)
In a *nix environment, a well-established method for checking whether another copy of the program is already running is to create a lock file, so a check for file existence is used to verify this.
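Sketched in Python (the path is illustrative), the creation itself can be made atomic with O_CREAT | O_EXCL, so the existence check and the creation are a single operation:

import os

def acquire_lock(path="/tmp/myapp.lock"):
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so there is no
        # window between "check" and "create".
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False                         # another copy already holds the lock
    os.write(fd, str(os.getpid()).encode())  # record the owner's PID
    os.close(fd)
    return True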
I'd only check it if I expect it to be missing (e.g. the application settings) and only if I have to read the file.
If I have to write to the file, it's either a logfile (so I can just append to it or create a new one) or I replace the contents of it, so I might as well recreate it anyway.
If I expect that the file exists, it would be right that an Exception is thrown. Exception handling should then inform the user or perform recovery. My opinion is that this results in cleaner code.
File protection (i.e. not overwriting (possibly important) files) is different, in that case I'd always check whether a file exists, if the framework doesn't do that for me (think SaveFileDialog)
I think the check makes sense when you want to be sure the file was there in the first place. As you said, settings files: if the file is there, I will try to merge the existing settings instead of blowing them away.
Other cases would be when a user tells me to do something with a file. Yes, I know the OpenFileDialog will check whether a file exists (but this is optional). I vaguely remember that back in VB6 this was not the case, so verifying that the file the user had just told me to use actually existed was common.
I'd rather not program by exception.
Edit
I didn't miss the point. You might try to access the file, an exception is thrown, and then when you go to create the file, it has already been placed there, which now causes your exception-handling code to go on the fritz. So I guess we could then have an exception handler inside our exception handler to catch the file changing yet again...
I'd rather try and prevent exceptions, not use them to control logic.
Edit
Additionally, another time to check for attributes such as size is when you're waiting for a file operation to finish. Yes, you never know for sure, but with a good algorithm, and depending on the system writing the file, you can handle a good deal of cases. (I had a system running for five years which watched for small files coming over FTP; it used the same API as the file system watcher, then started polling, waiting for the file to stop changing, before raising an event that the file was ready to be consumed.)
This may be too simplistic, but I would think the primary reason for checking for the existence of a file (hence the existence of .Exists()) would be to prevent unintended overwrites of existing files, not to avoid exceptions caused by attempting to access non-existent or inaccessible files.
EDIT 2
This was, in fact, too simplistic and I recommend you see Stephen Martin's response.
If you're that concerned about somebody else removing the file, perhaps you should implement some sort of locking system. For instance, I used to work on the code for C-News, a Usenet news server. Since a lot of the things it did could happen asynchronously, it would "lock" a file or a directory by making a temp file, and then hard linking it to a file named "LOCK". If the link failed, it would mean that some other version of the program was writing to that directory, otherwise it was yours and you could do what you like.
The nifty thing about this is that most of the program was written in shell and awk, and this was a very portable locking mechanism. Also, the lock file would contain the PID of the owner, so you could look at the existing lock file to see if the owner was still running.
We have a diagnostic tool that has to gather a set of files, installer log included. Depending on different conditions the installer log can be in one of two folders. Even worse, there can be different versions of the log in both of these folders. How does the tool find the right one?
It's quite simple if you check for existence. If only one is present, grab that file. If two exist, find which has the latest modification time and grab that file. That's just the normal way of doing things.
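A sketch of that lookup (in Python, with made-up folder names):

import os

candidates = [p for p in (r"C:\ProgramData\App\install.log",
                          r"C:\Users\Public\App\install.log")
              if os.path.exists(p)]
if candidates:
    # If both exist, the latest modification time wins.
    newest = max(candidates, key=os.path.getmtime)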
While this is a language-agnostic post, it seems you are talking about .NET. Most systems (.NET and others) have more detailed APIs in order to figure out if the file exists when opening the file.
What you should do is make a call to access the file, as it will typically indicate through some sort of error that the file doesn't exist (if it truly doesn't). In .NET, you would have to go through the P/Invoke layer and use the CreateFile API function. If that function returns an error of ERROR_FILE_NOT_FOUND, then you know that the file does not exist. If it returns successfully, then you have a handle that you can use.
The point here is that it is a somewhat atomic operation, which ultimately is what you are looking for.
Then, with the handle, you can pass it to a FileStream constructor and perform your work on the file.
There are a number of possible applications you may well be writing for which a simple File.Exists is more than adequate for the job. If it's a config file that only your application will use, then you do not need to go overboard with your exception handling.
Whilst the "flaws" you have pointed out in using this method are all valid, it doesn't mean they are not acceptable flaws for some situations.
A variety of apps include built-in web servers. It's common for them to generate self-signed SSL certificates the first time they start up. A straightforward way to implement this would be to check whether the cert exists on startup, and create it if not.
In theory, it could exist for the check, and not exist later. In that case, we'd get an error when we try to listen, but that can be handled quite easily and is not a big deal.
It's also possible that it doesn't exist for the check, and exists later. In that case, it either gets overwritten with a new cert, or writing the new cert fails, depending on your policy. The first is a little annoying, in terms of the cert change causing some alarm, but also not really critical, especially if you do a bit of logging to indicate what is going on.
And, in practice, both cases are extraordinarily unlikely to ever come up.
As you pointed out, it's always important to decide what the program should do if the file is missing. In all my applications the user can always delete the config file, and the application will create a new one with default values. No problem. I also ship my applications without config files.
But users tend to delete files, even files they should not delete, like serial keys and template files. I always check for these files, because without them the application is unable to run at all; I cannot create a new serial key from defaults.
What should happen when the file is missing? You can check for the file or rely on an exception handler, but the real question is: what will happen when the file is missing, and how important is the file for the application? I check all the time before I try to access any support files for the app. Additionally, I do error handling if the file is corrupt and cannot be loaded.
I think anytime that you know that the file may or may not exist and you want to perform some alternate action based on the existence of the file, you should do the check because in this case it's not an exceptional condition for the file to not exist. This won't absolve you from having to handle exceptions -- from someone else either removing or creating the file between the check and your open -- but it makes the intent of the program clear and doesn't rely on exception handling to perform flow-control logic.
EDIT: An example might be log rotation on start up.
try
{
    if (File.Exists("app.log"))
    {
        RotateLogs();
    }
    log = File.Open("app.log", FileMode.CreateNew);
}
catch (IOException)
{
    // ...another writer, perhaps?
}
catch (UnauthorizedAccessException)
{
    // ...maybe I should have used runas?
}
To answer my own question (in part), I want to expand on the example I used: a default config file.
Rather than check if it exists at app startup and try to copy the file if the check fails, the thing to do is always try to copy the file. You just do it in such a way that the copy will fail if the file exists rather than replace an existing file. This way all you need to do is catch and ignore any exception thrown if the copy fails because of an existing file.
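A sketch of that approach in Python (paths are illustrative): open the target with "x" so creation fails if the file already exists, then simply ignore that failure.

import shutil

def install_default_config(template="defaults.conf", target="user.conf"):
    try:
        # "x" creates the file and raises if it already exists, so an
        # existing user config is never overwritten.
        with open(template, "rb") as src, open(target, "xb") as dst:
            shutil.copyfileobj(src, dst)
    except FileExistsError:
        pass  # the user already has a config; nothing to copy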
Your problem could easily be solved with basic computer science... read up on Semaphores.
(I did not mean to sound like a jerk, I was just pointing you to a simple answer for a common problem).
I think the reason for "Exists" is to determine when files are missing without the need for creating all the OS housekeeping data required to access the file or having exceptions being thrown. So it's a file handling optimisation more than anything else.
For a single file, the saving the "Exists" gives is generally insignificant. If you were checking if a file exists many, many times (for example, searching for #include files) then the saving could be significant.
In .Net, the specification for File.Exists doesn't list any exceptions that the method might throw, unlike for example File.Open which lists nine exceptions, so there's certainly less checking going on in the former.
Even if "Exists" returns true, you still need to handle exceptions when opening the file, as the .Net reference suggests.