Can any Linux API or tool watch for any change in any folder below e.g. /SharedRoot, or do I have to set up e.g. inotify for each folder? - samba

I have a folder with ~10 000 subfolders.
Can any Linux API or tool watch for any change in any folder below e.g. /SharedRoot, or do I have to set up inotify for each folder? (i.e. I lose if I have to do this for 10k+ folders.) I guess the answer is yes, since I've already seen examples of this inefficient method, for instance http://twistedmatrix.com/trac/browser/trunk/twisted/internet/inotify.py?rev=28866#L345
My problem:
I need to keep folders time-sorted with most recently active "project" up top.
When a file changes, each folder above that file should update its last-modified timestamp to match the file. Delays are OK. When a file (typically MS Excel) is opened and closed again, its file date can jump up and then back down again. For this reason I need to wait until after a file is closed, then queue the folder of that file for checking, and only a while later go and look for the newest file in that folder, since the file date of the triggering file could already be back-dated to its original timestamp by Excel or similar programs. Also, in case several files from the same folder are used or created, it makes sense to buffer the timestamping of that folder's parents, to at least get a bunch of updates collapsed into one delayed update.
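To illustrate the buffering, a rough Python sketch of what I have in mind (the 30-second delay and the function names are just placeholders):

    import os
    import time

    DELAY = 30    # seconds to wait after a close; lets Excel restore its file dates
    pending = {}  # folder -> due time; repeated events just push the time back

    def queue_folder(folder):
        # Called whenever a file in `folder` is closed.
        pending[folder] = time.time() + DELAY

    def process_due():
        # Called periodically; updates folders whose delay has expired.
        now = time.time()
        for folder in [f for f, due in pending.items() if due <= now]:
            del pending[folder]
            paths = [os.path.join(folder, n) for n in os.listdir(folder)]
            mtimes = [os.stat(p).st_mtime for p in paths if os.path.isfile(p)]
            if mtimes:
                newest = max(mtimes)
                os.utime(folder, (newest, newest))  # parents can be walked the same way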
I'm looking for a Linux solution. I have some code that can be run on a Windows server; most of the queuing functionality is here: http://github.com/sesam/FolderdateFollowsFiles/blob/master/FolderdateFollowsFiles/Follower.vb
Available APIs
The Windows relative of inotify, ReadDirectoryChangesW, can watch a folder and its whole subtree; see the bWatchSubtree parameter on http://msdn.microsoft.com/en-us/library/aa365465(VS.85).aspx
Samba?
Patching the Samba source is a possibility, but perhaps there are already hooks available? Are there other possibilities, like watching on the client side (various Windows versions) and spying on file activity in order to update folders recursively?

Yes, you need to use inotify; however, you need not consume watches on every node immediately.
The process (similar to how beagle does it) is rather simple:
Establish a watch on the root node.
Do a breadth-first (not depth-first) search starting at the root node.
Establish watches on directories, in the order of the search.
Watch for directory create events, and continue adding watches as new directories appear. Re-sort your list as this happens.
The breadth-first search is important; otherwise you might miss events due to a race between when you start and what clients of the root node are doing.
See this question, which also mentions this RFQ. I had the same exact problem that you are facing.
In essence, one thread continues to watch for directory create events, adding new watches on new directories almost at the same time that they are created. Something else sorts the list either on demand, or after the inotify thread releases its lock.
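A rough sketch of the above using the pyinotify Python bindings (my choice here, not a requirement; handle_close is a stand-in for whatever does the delayed folder-date update):

    import collections
    import os
    import pyinotify

    wm = pyinotify.WatchManager()
    mask = pyinotify.IN_CREATE | pyinotify.IN_CLOSE_WRITE | pyinotify.IN_MOVED_TO

    def handle_close(folder):
        print('activity in:', folder)    # stand-in for the real queueing logic

    class Handler(pyinotify.ProcessEvent):
        def process_IN_CREATE(self, event):
            if event.dir:
                wm.add_watch(event.pathname, mask)  # watch new directories as they appear

        def process_IN_CLOSE_WRITE(self, event):
            handle_close(os.path.dirname(event.pathname))

    # Breadth-first: watch the root first, then each level below it.
    queue = collections.deque(['/SharedRoot'])
    while queue:
        d = queue.popleft()
        wm.add_watch(d, mask)
        queue.extend(e.path for e in os.scandir(d) if e.is_dir())

    pyinotify.Notifier(wm, Handler()).loop()

For what it's worth, pyinotify's add_watch also accepts rec=True and auto_add=True to do the initial walk and the auto-adding for you, if you don't need control over the traversal order.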
I've attempted lock-free versions of the above, but with .. questionable .. success :)

I saw you are running these trees under a Samba share. Maybe you can use the ClamAV virus scanning VFS module for inspiration to see how they trigger the 'scan on close'.
Samba Howto : Stackable VFS Modules
It should be pretty straightforward to check the time of the closed file and update the directories on the path leading to it, without any of the performance/memory overhead associated with inotify et al.
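Whatever layer ends up detecting the close, the directory update itself is cheap. A minimal Python sketch of walking the path back up to the top of the share (assuming /SharedRoot is the root; untested):

    import os

    def touch_parents(file_path, root='/SharedRoot'):
        # Copy a closed file's mtime onto every folder between it and the root.
        mtime = os.stat(file_path).st_mtime
        folder = os.path.dirname(os.path.abspath(file_path))
        while folder.startswith(root):
            os.utime(folder, (mtime, mtime))
            if folder == root:
                break
            folder = os.path.dirname(folder)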
Just a thought.

Related

PhpStorm language injections and deployment configuration

I have been using PhpStorm for a few months now, and I have just noticed something really weird about language injections in version 9.0.
Sometimes I have to declare that some strings in my PHP are JavaScript instructions. When I do so and save my file (with auto-upload on), it looks like PhpStorm is doing a lot of remote checks, file moves and transfers; I don't really understand why, and I'm afraid it may overwrite files that I didn't modify. I'm working directly on a production server with other people; I know it's dangerous, but we have no choice for the moment.
In the file transfer logs, I have something like this:
[18/09/2015 10:47] Automatic upload completed in less than a minute: 2 items deleted, 50 items moved, 4 files transferred (4 Kb/s)
Can someone help me understand what is going on?
I have found a way to do what I want, but I didn't find the reason for these uploads that PhpStorm does without asking anything...
The problem is that, until now, I hadn't found a way to save files one by one. It looks like PhpStorm only has a "Save all" option that uploads every file changed since the last save (if you ask for auto-upload). And in the case of a language injection, PhpStorm seems to change something in the open files that forces it to re-upload them all.
So I disabled auto-upload and bound a shortcut to "Upload to default server". This option uploads only your current file, but it saves it first. So it's a kind of auto-upload, but a little less aggressive, and it gives me the possibility to just save my files (with "Save all") or to save only the current one and upload it instantly.
This is the way I used to work before using PhpStorm; I find it more convenient and less violent than the automatic upload process that PhpStorm uses.
If someone finds something better, I'm open to any advice.

How to organize code so that we can move and update it without having to edit the location of the configuration file?

The issue that I consider is how to write code that can easily know the location of a required config file and yet is portable, without any edits, from one environment to another. We don't want to edit the location of the configuration file to adapt the code to each new environment, say each time we move the code from a development environment to production. The method should not rely on resources that are not universally available, such as access to user-defined environment variables or access to a specific directory. For example, it may seem that using the DOCUMENT_ROOT as a base location for the config file is the way to go, but that is not universal. First, in a command-line environment the DOCUMENT_ROOT makes no sense. Second, a programmer might be given access to only a sub-folder of the DOCUMENT_ROOT. Another requirement is that the configuration file could depend on values known at run time, say the user who calls the application, as in this question: How to load a config file based on user selection from "unknown" location.
The question is not what is the best location of the configuration file in specific environments, such as Location to put user configuration files in windows . The programmers would still have to figure out the best location so that end users could easily find the configuration file. The question is how this location, whatever it is, even if it depends on values known at run time, can be passed to the code in a portable manner.
One approach is to design every script file with the assumption that it will be included in another file, and so on, until we get to a wrapper script whose only job is to define the directory of the config file for the benefit of the included file and the other files included therein. Once this directory path is known, other configuration values can be obtained from a named configuration file within it. This works because the wrapper scripts are not updated when we update the code from a repository or testing environment. This approach seems universally applicable: no special support of any kind, such as access to user-defined environment variables or to some specific directory on the server, is needed. As long as you have access to the code, which is the strict minimum to expect, it works. Also, scripts are often naturally designed to be included in another file, so it is natural.
The approach only requires that we agree on a convention for the name of the constant, say CONFIG_DIRECTORY. If every programmer agreed to search at the location specified by this constant for the config file, then any user of the code could put the config file anywhere and just define this constant accordingly.
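A minimal sketch of the idea in Python (file names and paths here are hypothetical; in PHP the wrapper would use define() and include instead):

    # wrapper.py -- the deployed entry point; the only file edited per environment
    CONFIG_DIRECTORY = '/srv/myapp/config'   # the single environment-specific line

    import app                               # the shared, repository-managed code
    app.main(CONFIG_DIRECTORY)

    # app.py -- portable; only relies on the agreed-upon constant being passed in
    import os

    def main(config_directory):
        with open(os.path.join(config_directory, 'app.conf')) as f:
            settings = f.read()              # parse however you like

Updating the code never touches wrapper.py, so moving between environments needs no edits to the shared code.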
In Linux, there is the folder /etc for config files. So the notion of a universally agreed standard in a very large context is already there. This is the same idea as the one proposed here, except that it is the same constant for all machines, and someone might not have access to that level of the server. Moreover, we lose the possibility of having different configuration directories for different wrapper scripts. Allowing the universal standard to be a constant name, say 'CONFIG_DIRECTORY', instead of the fixed constant '/etc', seems just an extra flexibility with no additional inconvenience. It does require that we define this constant in some wrapper script, but we could fall back to the old approach if it is not defined. The outcome, if the approach were strictly applied, would be that all the scripts required in the server document root would only be simple wrappers that define a configuration directory. That seems cool. Often people say that it is safer to have important code outside the document root.

What is an efficient way to do logging in an existing system

I have the following in my system:
4 File folders
5 Applications that do some processing on files in the folders and then move files to the next folder (processing: read files, update db..)
The process is defined by Stages: 1,2,3,4,5.
As the files are moved along, the Stage field within them is updated to the next Stage.
Sometimes there are exceptions in the system; not necessarily exceptions in code, but exceptions in the process.
For instance, there is an error in transmitting the file to the next folder. In this case the stage is not updated and a record is written to the DB for this file.
Given what I want to do, what is the best approach?
I want to plug in a utility of some sort, or add code to the applications, that will capture any exceptions in the process. For example, if a file was not moved, I want to know at what stage and why. This will help in figuring out the breakdown in the process.
I need something that will provide the overall health of the process.
Not sure how to go about doing this from an architectural point of view.
The scheduler? Well that might knock the idea out anyway.
Exit codes are still alive and well from the DOS days.
The exit code is a property of the Application class (0, the default, means success).
So from your app you'd detect an error and set the exit code to some meaningful number like 1703 (boo hoo).
Application.Current.Shutdown(1703); // the .NET 4 way
However, seeing as presumably the scheduler is just running the app, you'd have to script it all up. You might as well just write a common logging DLL and add it to each app rather than mess about with that, especially if you want the same behaviour when it's run outside the scheduler.
Another option would be delegating, i.e. you write an app that runs the real app (passed in as a command-line parameter) and logs the result (via the exit code, for instance), and then change the scheduler items to call that with the requisite parameter.
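For illustration, a minimal delegating wrapper in Python (the log path is hypothetical, and a real one would also capture stderr; the same idea works in any language):

    import logging
    import subprocess
    import sys

    logging.basicConfig(filename=r'C:\logs\pipeline.log',  # hypothetical location
                        format='%(asctime)s %(message)s', level=logging.INFO)

    # usage: python run_logged.py <stage-app> [args...]
    cmd = sys.argv[1:]
    result = subprocess.run(cmd)
    if result.returncode != 0:
        logging.error('%s failed with exit code %d', cmd[0], result.returncode)
    else:
        logging.info('%s completed successfully', cmd[0])
    sys.exit(result.returncode)   # pass the exit code through to the scheduler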

Autoupdate a la Google Chrome workflow

At the company where I work, I was asked to write an autoupdate function a la Chrome. I.e., it should check periodically whether a new version is available, download the new version, and apply it silently the next time the application starts.
I already have something up and running, but it is more like a dirty hack than something I feel happy about. So I would like to know how to design and implement such a solution. My horrible hack works like this:
Have a mechanism to check whether a new version exists (a database query or a web service)
Download a full zip with the whole new version.
Check the file signature. If everything went all right, set a registry value: must-update = true.
When the application restarts, if the must-update value is true, launch an update program and exit.
The update program deletes the contents of the application folder, unzips the update and replaces the old contents, launches the application, and exits.
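For illustration, a rough Python sketch of the check/download/verify steps (the endpoint, the JSON shape, and the use of a plain checksum where a real signature check belongs are all assumptions):

    import hashlib
    import json
    import urllib.request

    UPDATE_URL = 'https://example.com/myapp/latest.json'   # hypothetical endpoint

    def fetch_update(current_version):
        # Return the new version's payload, or None if we are up to date.
        with urllib.request.urlopen(UPDATE_URL) as r:
            info = json.loads(r.read().decode('utf-8'))
        if info['version'] <= current_version:             # naive string compare
            return None
        with urllib.request.urlopen(info['url']) as r:
            payload = r.read()
        if hashlib.sha256(payload).hexdigest() != info['sha256']:
            raise ValueError('update failed integrity check')
        return payload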
Now, I would like to change it, so it works cleaner. I am planning to send the update as a bsdiff file. It gets downloaded. But the question is, what happens next?
When do I apply the update?
Who is in charge of applying the patch? Is it the program itself, or is it a third program, as I did it, which is in charge of applying the patch and relaunching the application?
If you're going down the C++ route, you can go to Chromium, download the Chrome source code, and dig around to see how the update is done; this might give you a better idea of how to approach it. Here's an article that might help.
If you're familiar with .NET, the recently released NuGet also has an auto-update feature that might be useful to look at; you can get the source code from here. David Ebbo has a blog post about how it's done here.
I'm not up to date on Delphi but you might be able to use either of the above options.
The workflow you proposed is more or less how it should work, but there's no need to re-invent the wheel; there are plenty of libraries out there that will do this for you. Using a 3rd-party library has the benefit of keeping your code cleaner while making sure the dirty process of auto-update is contained and working flawlessly.
Trust me, I know. I'm the author of NAppUpdate, an app update framework for .NET (which you might want to try out or learn from).
So, after giving it a lot of thought, this is what I came up with (by "active directory" I will refer to the directory where the main program lies, the "active program" is the main program, and the "update program" is the one that replaces the active program and its resource files):
The active program checks whether there is a new version every certain amount of time. If so, it downloads it.
Prepare the new version in a separate folder (this can be done by copying the contents of the program's directory to a subdirectory and applying a binary patch, or simply by unzipping the new version).
Set a flag that indicates that a new version is ready.
When the program is exiting (and one has to handle different interrupts here):
The active program checks the new-version-ready flag. If it is set, it launches the update program and exits.
The update program checks whether it can write in the active directory. If so, it replaces the contents with the prepared version.
The update program has to recheck links and update them accordingly.
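To make that concrete, a rough Python sketch of the update program (it assumes the updater itself lives outside the active directory, since otherwise it could not replace the folder it is running from on Windows; names are made up):

    import os
    import shutil
    import subprocess
    import sys

    def apply_update(active_dir, staged_dir, exe_name):
        # Run from the update program, after the active program has exited.
        if not os.access(active_dir, os.W_OK):
            sys.exit('no write access to %s' % active_dir)
        backup = active_dir + '.old'
        os.rename(active_dir, backup)        # keep the old version until we succeed
        os.rename(staged_dir, active_dir)    # the prepared folder becomes active
        shutil.rmtree(backup)
        subprocess.Popen([os.path.join(active_dir, exe_name)])  # relaunch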
So guys, if you have a better workflow, please tell me.
You could literally use the Google Chrome update workflow by using the Google Chrome updater:
http://code.google.com/p/omaha/
They open-sourced it in February 2009.

How can multiple developers use the same vcproj files?

I'm working on a project with two other developers that's built on FireBreath. So far, I've been able to get things working perfectly on my machine, but we need to coordinate our development via Mercurial. So I pushed my files to the repository and thought all was well.
Unfortunately, that doesn't work.
The various .vcproj files that make up the solution all contain hard-coded references to my local file system. This works fine for me, because I'm not moving the project around. But when you try to build the solution on another machine with a different file structure (different drive letter, different folder location, etc.) everything breaks.
I used FireBreath's standard project generation script (Python) and then the Visual Studio CMake script (prep2008.cmd) to generate the solution files. What can I do to tweak things so that other developers can use the same code base?
If your developers are not using the same build/make/project files, this could quickly become a maintenance nightmare. So you should definitely all use the same .vcproj files. (An exception to this would be if the project files were generated from some other files. In that case, treat those other files in the way described above.)
There are two ways to deal with the problem of differing setups on different machines. One is to make all paths relative to the project's path. The other is to use environment variables to refer to files/tools/libraries/whatever. IME it's best to use relative paths for everything that can be checked out with the project, and environment variables for the rest. Add a script that checks for the existence of all the necessary environment variables, pointing out the meaning of any missing ones, and run it as a build prerequisite, so whoever tries to get a new build machine up and running gets hints about what to do.
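Such a prerequisite check can be tiny. A sketch in Python, with made-up variable names:

    import os
    import sys

    # Hypothetical variables this particular build needs.
    REQUIRED = {
        'FIREBREATH_HOME': 'root of the FireBreath checkout',
        'BOOST_ROOT':      'where the Boost headers and libraries live',
    }

    missing = [name for name in REQUIRED if name not in os.environ]
    for name in missing:
        sys.stderr.write('Missing %s: %s\n' % (name, REQUIRED[name]))
    sys.exit(1 if missing else 0)   # non-zero exit fails the build early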
To make sure that everyone caught the updated comments from sbi's answer, let me give you the "definitive" answer from the FireBreath devs.
Your build directory is disposable; you should never share .vcproj files. Instead, you should regenerate your build/ directory any time you change the project and on each new computer, just like any project that uses CMake.
For more information, see http://colonelpanic.net/2010/11/firebreath-tips-working-with-source-control/
For reference, I am the primary author of FireBreath and I wrote the article.
I'm not familiar with FireBreath, but you need to make the references relative, and then recreate that relative structure on every machine. That is, if your project sits in "c:\myprojects\thisproject" and has an additional include directory "c:\mydir\mylib\include", then the latter path needs to be replaced with "..\..\mydir\mylib\include".
EDIT: I rewrote my answer to make it clearer. If I understood you correctly, your problem is that FireBreath generates those .vcproj files with absolute paths in them, and you want to use these .vcproj files on a different developer's machine.
I see 3 options:
Live with it. That means making sure every team member has the same file structure / view of the file system, with tools installed in the same place.
Ask the authors of FireBreath to change their .vcproj generator to allow relative paths, the use of environment variables, etc.
If 1 or 2 does not work, write a program or script that changes the absolute paths to relative ones in those .vcproj files (a sketch follows below). Run this script whenever you have to regenerate your FireBreath project.
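A sketch of option 3 in Python (naive textual replacement; a robust version would parse the project XML, and the root path is just an example):

    import os
    import sys

    def relativize(vcproj_path, local_root):
        # Replace one developer's absolute checkout path with a relative one.
        with open(vcproj_path) as f:
            text = f.read()
        rel = os.path.relpath(local_root,
                              os.path.dirname(os.path.abspath(vcproj_path)))
        with open(vcproj_path, 'w') as f:
            f.write(text.replace(local_root, rel))

    # usage: python relativize.py C:\myprojects\thisproject build\myproj.vcproj ...
    if __name__ == '__main__':
        for path in sys.argv[2:]:
            relativize(path, sys.argv[1])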
What you should not do, per the FireBreath FAQ: don't change the .vcproj files manually; those changes will be lost the next time the project is regenerated.
EDIT: it seems that "option 4" turned out to be the best solution: generating those .vcproj files for each developer individually. Hope my suggestions were helpful anyway.