On a Linux ext3 filesystem, what happens if mv is called on a file while another process is reading that file through an open file descriptor? It is actually an exam question, and all I can come up with is something like:
the CPU traps to the OS for interrupt handling
etc., etc.
I would appreciate it if the OS folks out there could help me out, please :D
The Linux rename man page explains most of the details of this:
If one or more processes have the file open when the last link is removed,
the link shall be removed before rename() returns, but the removal of the
file contents shall be postponed until all references to the file are closed.
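To see this concretely, here is a minimal sketch (the file names are placeholders): a process that holds an open descriptor keeps reading the same inode even after the directory entry is renamed.

import os

# Create a small test file.
with open("original.txt", "w") as f:
    f.write("line 1\nline 2\n")

# Open it for reading, then rename it while the descriptor is still open.
reader = open("original.txt")
print(reader.readline(), end="")          # line 1

os.rename("original.txt", "renamed.txt")  # what mv does within the same filesystem

# The open descriptor still refers to the same inode, so reading continues
# unaffected; only the directory entry changed.
print(reader.readline(), end="")          # line 2
reader.close()
os.remove("renamed.txt")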
In Python, if you either open a file without calling close(), or close the file but without using try/finally or the "with" statement, is this a problem? Or does it suffice as a coding practice to rely on Python's garbage collection to close all files? For example, if one does this:
for line in open("filename"):
    # ... do stuff ...
... is this a problem because the file is never explicitly closed, and an exception could occur that prevents it from being closed? Or will it definitely be closed at the conclusion of the for statement because the file goes out of scope?
In your example the file isn't guaranteed to be closed before the interpreter exits. In current versions of CPython the file will be closed at the end of the for loop, because CPython uses reference counting as its primary garbage-collection mechanism, but that's an implementation detail, not a feature of the language. Other implementations of Python aren't guaranteed to work this way. For example, IronPython, PyPy, and Jython don't use reference counting and therefore won't close the file at the end of the loop.
It's bad practice to rely on CPython's garbage collection implementation because it makes your code less portable. You might not have resource leaks if you use CPython, but if you ever switch to a Python implementation which doesn't use reference counting you'll need to go through all your code and make sure all your files are closed properly.
For your example use:
with open("filename") as f:
for line in f:
# ... do stuff ...
Some Pythons will close files automatically when they are no longer referenced, while others will not and it's up to the O/S to close files when the Python interpreter exits.
Even for the Pythons that will close files for you, the timing is not guaranteed: it could be immediately, or it could be seconds/minutes/hours/days later.
So, while you may not experience problems with the Python you are using, it is definitely not good practice to leave your files open. In fact, in CPython 3 you will now get a ResourceWarning telling you that a file was left open and had to be closed for you.
Moral: Clean up after yourself. :)
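For example, a minimal sketch of that warning in CPython 3 (the filename is a placeholder; ResourceWarning is hidden by the default warning filters in normal runs, so it is enabled explicitly here):

import gc
import warnings

warnings.simplefilter("always", ResourceWarning)

def leak_a_file():
    open("filename", "w")     # opened but never closed

leak_a_file()
gc.collect()   # the warning is emitted when the abandoned file object is finalized
               # (at function return in CPython; gc.collect() just makes it explicit)
# ResourceWarning: unclosed file <_io.TextIOWrapper name='filename' mode='w' ...>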
Although it is quite safe to use such a construct in this particular case, there are some caveats for generalising the practice:
you can potentially run out of file descriptors, although that is unlikely; still, imagine hunting a bug like that
you may not be able to delete said file on some systems, e.g. win32
if you run anything other than CPython, you don't know when the file is closed for you
if you open the file in write or read-write mode, you don't know when the data is flushed
The file does get garbage collected, and hence closed. The GC determines when it gets closed, not you. Obviously this is not a recommended practice, because you might hit the open-file-handle limit if you do not close files as soon as you finish using them. What if, within that for loop of yours, you open more files and leave them lingering?
It is very important to close your file descriptor when you are going to use the file's content later in the same Python script. I realized this today after a long, hectic debugging session. The reason is that the content is only written/saved to the file once you close the file descriptor, and only then do the changes take effect in the file!
So suppose you write content to a new file and then, without closing the fd, use that file (not the fd) in another shell command which reads its content. In this situation the shell command will not see the contents you expect, and if you try to debug it the bug is not easy to find. You can also read more in my blog entry http://magnificentzps.blogspot.in/2014/04/importance-of-closing-file-descriptor.html
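A minimal sketch of that situation (the filename and the shell command are placeholders): write to the file and flush or close it before handing the path to the external command, otherwise the command may see an empty or partial file.

import subprocess

f = open("data.txt", "w")
f.write("hello from python\n")

# Without this, the text may still sit in Python's userspace buffer, and the
# external command below would read an empty (or partial) file.
f.flush()   # or f.close(), which also flushes

print(subprocess.run(["cat", "data.txt"], capture_output=True, text=True).stdout)

f.close()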
During the I/O process, data is buffered: this means that it is held in a temporary location before being written to the file.
Python doesn't flush the buffer—that is, write data to the file—until it's sure you're done writing. One way to do this is to close the file.
If you write to a file without closing it, the data may not make it to the target file until the buffer is flushed.
Python uses the close() method to close an opened file. Once the file is closed, you cannot read from or write to that file again.
If you try to access the file after it has been closed, it will raise a ValueError, since the file is already closed.
Python (CPython, specifically) also closes the file automatically when the variable referencing it is reassigned to another file. Closing the file explicitly is nevertheless standard practice, as it reduces the risk of the file being modified unintentionally.
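For example (a small sketch; the filename is a placeholder):

f = open("file_name.text", "w")
f.write("some data\n")
f.close()

f.write("more data\n")   # ValueError: I/O operation on closed file.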
Another way to solve this issue is the with statement.
If you open a file using a with statement, a variable is bound to the file object for use inside the indented block, and the with statement itself calls the close() method once the indented code has finished executing.
Syntax:
with open('file_name.text') as file:
    # some code here
I have been using PhpStorm for a few months now and I have just noticed something really weird about language injections in version 9.0.
Sometimes I have to declare that some strings in my PHP are JavaScript instructions. When I do so and save my file (with auto-upload on), it looks like PhpStorm is doing a lot of remote checks, file moves and transfers; I don't really understand why, and I'm afraid that it may overwrite files that I didn't modify. I'm working directly on a production server with other people; I know it's dangerous, but we have no choice for the moment.
In the file transfer logs, I have something like this:
[18/09/2015 10:47] Automatic upload completed in less than a minute: 2 items deleted, 50 items moved, 4 files transferred (4 Kb/s)
Can someone help me understand what is going on?
I have found a way to do what I want, but I didn't find the reason for these uploads that PhpStorm performs without asking anything...
The problem is that, until now, I hadn't found a way to save files one by one. It looks like PhpStorm only has a "Save all" option that uploads every file changed since the last save (if you ask for auto-upload). And in the case of a language injection, PhpStorm seems to change something in the opened files that forces it to re-upload them all.
So I disabled auto-upload and bound a shortcut to "Upload to default server". This option uploads only your current file, but it saves it first. So it's a kind of auto-upload, but a little less aggressive, and it gives me the possibility to just save my files (with "Save all") or to save only the current one and upload it instantly.
This is the way I used to work before using PhpStorm; I find it more convenient and less violent than the automatic upload process that PhpStorm uses.
If someone finds something better, I'm open to any advice.
Following my previous question (Maxmind world cities database issue (MySql)), for which I did not receive any solution, I just closed that question with a couple of comments (anyway, thanks for the comments).
I am reposting my question in another way: how can somebody import a database contained in a txt file that appears to be in binary form, compressed in a tar.gz file (maybe twice), into MySQL for Windows?
Here is the file : http://www.maxmind.com/app/worldcities
Thanks in advance,
This is a problem which seems to be affecting a number of people, me included. The problem is currently being discussed in the MaxMind forums. You may find it helpful to look there; hopefully it can be resolved soon.
[EDIT] It's been solved! The file WAS compressed twice, as you said. See the link for details.
I found the solution with a_horse's help: as he said, the file is zipped twice (tar.gz), but in the wrong way.
So here is the process: gunzip the tar.gz file. You will get a worldcitiespop.txt. Rename this file to .tar.gz and gunzip it again (force it if required). You will obtain a worldcitiespop.tar file. Rename this file to .txt and there it is!
When you have malformed files of this sort, the first advisable thing is to use a program like file. file looks at the first few bytes of a file for magic numbers which identify the format of the file, ignoring the potentially-misleading extension. Using this tool, you could have determined the filetype, changed the extension to the appropriate one, and continued extracting until you had the plaintext you were after.
I hope you'll pardon the broad answer, especially after you've already found a solution to your specific problem, but for the purposes of future visitors to the site, it is more likely they have the general problem of "unable to open a file which has the wrong extension" than your specific issue.
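If you want to script that check instead of (or alongside) the file tool, here is a hedged Python sketch of the same idea; the path is a placeholder and only the gzip and tar magic numbers are checked.

def sniff(path):
    """Guess a file's real format from its magic bytes, ignoring the extension."""
    with open(path, "rb") as f:
        head = f.read(512 + 8)               # enough to reach the tar magic at offset 257

    if head[:2] == b"\x1f\x8b":
        return "gzip"                        # gzip magic: 1F 8B
    if head[257:262] == b"ustar":
        return "tar"                         # POSIX tar magic at offset 257
    return "unknown (possibly plain text)"

# Keep renaming and unwrapping until the payload is no longer gzip or tar,
# which is essentially the manual process described above.
print(sniff("worldcitiespop.txt"))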
I have a folder with ~10 000 subfolders.
Can any Linux API or tool watch for any change in any folder below, e.g., /SharedRoot, or do I have to set up an inotify watch for each folder? (I.e., I lose out if I want to do this for 10k+ folders.) I guess the answer is yes, since I've already seen examples of this inefficient method, for instance http://twistedmatrix.com/trac/browser/trunk/twisted/internet/inotify.py?rev=28866#L345
My problem:
I need to keep folders time-sorted with most recently active "project" up top.
When a file changes, each folder above that file should update its last-modified timestamp to match the file. Delays are OK. When a file (typically MS Excel) is opened and closed again, its file date can jump up and then back down. For this reason I need to wait until after a file is closed, then queue that file's folder for checking, and only a while later go and look for the newest file in the folder, since the file date of the triggering file could already have been set back to its original timestamp by Excel or similar programs. Also, in case several files in the same folder are used or created, it makes sense to buffer the timestamping of that folder's parents so that a bunch of updates collapse into one delayed update.
I'm looking for a Linux solution. I have some code that can be run on a Windows server; most of the queuing functionality is here: http://github.com/sesam/FolderdateFollowsFiles/blob/master/FolderdateFollowsFiles/Follower.vb
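For the "bump the parents to the newest file" step, here is a rough Python sketch of what I have in mind (the root and folder paths are placeholders, and the delayed queueing described above is left out):

import os

SHARED_ROOT = "/SharedRoot"   # placeholder for the watched tree's root

def newest_mtime_in(folder):
    """Return the most recent modification time among the files in a folder."""
    times = [e.stat().st_mtime for e in os.scandir(folder) if e.is_file()]
    return max(times, default=None)

def bump_parents(folder):
    """Propagate the newest file time in `folder` up to every ancestor folder."""
    newest = newest_mtime_in(folder)
    if newest is None:
        return
    current = folder
    while True:
        os.utime(current, (newest, newest))   # set atime and mtime
        parent = os.path.dirname(current)
        if current == SHARED_ROOT or parent == current:
            break
        current = parent

# Called a while after a file in the folder was closed (see the queueing logic above):
# bump_parents("/SharedRoot/some/project")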
Available APIs
The relative of inotify on windows, ReadDirectoryChangesW, can watch a folder and its whole subtree; see bWatchSubtree on http://msdn.microsoft.com/en-us/library/aa365465(VS.85).aspx
Samba?
Patching the Samba source is a possibility, but perhaps there are already hooks available? There are other possibilities too, like doing it client-side (on the various Windows versions) and spying on file activity in order to update folders recursively.
Yes, you need to use inotify; however, you need not consume watches on every node immediately.
The process (similar to how beagle does it) is rather simple:
Establish a watch on the root node.
Do a breadth-first (not depth-first) search starting at the root node.
Establish watches on directories in the order of the search.
Watch for directory-create events and continue adding watches as new directories appear. Re-sort your list as this happens.
The breadth-first search is important; otherwise you might miss some events due to a race between when you start and what clients of the root node are doing.
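A hedged sketch of that approach in Python, assuming the third-party pyinotify package (the root path is a placeholder): the breadth-first queue establishes the initial watches, and the IN_CREATE handler adds watches for directories created later.

import collections
import os
import pyinotify   # third-party package, assumed here for the sketch

MASK = pyinotify.IN_CREATE | pyinotify.IN_CLOSE_WRITE
wm = pyinotify.WatchManager()

class Handler(pyinotify.ProcessEvent):
    def process_IN_CREATE(self, event):
        # A new directory appeared somewhere below the root: watch it too.
        if event.dir:
            wm.add_watch(event.pathname, MASK)

    def process_IN_CLOSE_WRITE(self, event):
        # A file was closed after writing: queue its folder for re-timestamping.
        print("queue folder:", os.path.dirname(event.pathname))

root = "/SharedRoot"          # placeholder
wm.add_watch(root, MASK)      # 1. watch the root node first

# 2./3. breadth-first walk, adding watches in the order of the search
queue = collections.deque([root])
while queue:
    folder = queue.popleft()
    for entry in os.scandir(folder):
        if entry.is_dir(follow_symlinks=False):
            wm.add_watch(entry.path, MASK)
            queue.append(entry.path)

# 4. keep processing events; new directories get watches as they are created
pyinotify.Notifier(wm, Handler()).loop()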
See this question, which also mentions this RFQ. I had the same exact problem that you are facing.
In essence, one thread continues to watch for directory create events, adding new watches on new directories almost at the same time that they are created. Something else sorts the list either on demand, or after the inotify thread releases its lock.
I've attempted lock-free versions of the above, but with .. questionable .. success :)
I saw you are running these trees under a Samba share. Maybe you can use the ClamAV virus scanning VFS module for inspiration to see how they trigger the 'scan on close'.
Samba Howto : Stackable VFS Modules
It should be pretty straightforward to check the time of the closed file and modify the directory path leading to it without any of the performance/memory overhead associated with inotify et al.
Just a thought.
It's a simple problem. Sometimes Windows will just halt everything and throw a BSOD. Game over, please reboot to play another game. Or whatever. Annoying, but not extremely serious...
What I want is simple. I want to catch the BSOD when it occurs. Why? Just for some additional crash logging. It's okay that the system goes blue but when it happens, I just want to log some additional information or perform one additional action.
Is this even possible? If so, how? And what would be the limitations?
Btw, I don't want to do anything when the system recovers; I want to catch it while it happens. This is to allow me one final action. (For example, flushing a file before the system goes down.)
BSOD happens due to an error in the Windows kernel or more commonly in a faulty device driver (that runs in kernel mode). There is very little you can do about it. If it is a driver problem, you can hope the vendor will fix it.
You can configure Windows to create a memory dump upon a BSOD, which will help you troubleshoot the problem. You can get a pretty good idea of the faulting driver by loading the dump into WinDbg and using the !analyze command.
Knowing which driver is causing the problem will let you look for a new driver, but if that doesn't fix the problem, there is little you can do about it (unless you're very good with a hex editor).
UPDATE: If you want to debug this while it is happening, you need to debug the kernel. A good place to pick up more info is the book Windows Internals by Mark Russinovich. Also, I believe there's a bit of info in the help file for WinDbg and there must be something in the device driver kit as well (but that is beyond my knowledge).
The data is stored in what are called "minidumps".
You can then use debugging tools to explore those dumps. The process is documented here: http://forums.majorgeeks.com/showthread.php?t=35246
You have two ways to figure out what happened:
The first is to upload the .dmp file located under C:\Minidump\*.dmp to the Microsoft service, as they describe here: http://answers.microsoft.com/en-us/windows/wiki/windows_10-update/blue-screen-of-death-bsod/1939df35-283f-4830-a4dd-e95ee5d8669d
or use their debugger, WinDbg, to read the .dmp file.
NB: You will find several files; you can tell them apart by their names, which contain the event date.
The second way is to note the error code from the blue screen and search for it on Google and the Microsoft website.
The first method is more accurate and efficient.
Windows can be configured to create a crash dump on blue screens.
Here's more information:
How to read the small memory dump files that Windows creates for debugging (support.microsoft.com)
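For reference, here is a hedged Python sketch that reads those crash-dump settings on the local machine; the value names follow the commonly documented CrashControl registry entries, so treat them as assumptions and verify them against your Windows version.

import winreg

# On the machines I have seen, crash-dump behaviour lives under this key;
# CrashDumpEnabled is typically 0 = none, 1 = complete, 2 = kernel, 3 = small (minidump).
KEY = r"SYSTEM\CurrentControlSet\Control\CrashControl"

with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY) as key:
    for name in ("CrashDumpEnabled", "DumpFile", "MinidumpDir"):
        try:
            value, _type = winreg.QueryValueEx(key, name)
            print(f"{name} = {value}")
        except FileNotFoundError:
            print(f"{name} is not set")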