Synchronization of many binary files - Mercurial

I have about 100,000 files on the office server (images, PDFs, etc.).
Each day the file count grows by about 100-500 items, and about 20-50 old files change.
What is the best way to synchronize the web server with these files?
Can a system like Mercurial or Git help?
(On the office server I'd commit changes, and the web server would periodically pull updates.)
The second problem is that on the web server I have user-generated content (other binary files).
Each day users upload about 1000-2000 new files. Old files don't change.
I also need to back up these files to a local machine.
Can a system like Mercurial or Git help in this situation?
(On the web server I'd commit these files via cron, and on the local machine I'd pull updates.)
Thanks
Update:
The office server is Windows Server 2008 R2.
The web server is Debian 5 (Lenny).

The simplest and most reliable mechanism (in my experience) is rsync.
On Windows, however, rsync over ssh is badly broken due to issues with how Cygwin interacts with named pipes. Rsync over its own protocol works (as long as you don't care about encryption), but I've had lots of problems getting rsync to stay up as a Windows service for more than a few days at a time. DeltaCopy is a Windows app that uses the rsync tools behind the scenes; it seems to work very well, though I haven't tried the ssh option.
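For reference, the office-to-web sync can be a single rsync invocation run periodically on the Debian web server, assuming the Windows box exposes the files via an rsync daemon (e.g. through DeltaCopy). This is only a sketch; the host name, module name and paths are placeholders:

    # Pull the office server's document share onto the web server.
    # --archive keeps timestamps/permissions, --delete mirrors removals, --compress helps over slow links.
    rsync --archive --compress --delete rsync://office-server/documents/ /var/www/files/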

A DVCS is not a good solution in this case: it will keep all the history, which you don't always need, and it will make every clone a massive operation.
An artifact repository like Nexus is much better suited if you need some kind of versioning with an integrity check for your binaries.
Otherwise (no versioning), a simple rsync as Marcelo proposes is enough.
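For the second problem (the user uploads need no versioning), a nightly rsync pull driven by cron is usually enough. A rough sketch, assuming the backup target is a Unix-like machine; the host name, schedule and paths are examples only:

    # /etc/cron.d/backup-uploads on the backup machine: pull new uploads every night at 02:00.
    # Old files never change, so rsync transfers only the new ones.
    0 2 * * * backup  rsync --archive --compress web-server:/var/www/uploads/ /backups/uploads/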

Related

Using versioning on a VM with several users

We are looking for a way to use GitHub on an internal system that we are developing at work. We have developed it in PHP and MySQL, with a fair bit of jQuery/Ajax, on a Windows Server VM running IIS. Other staff can access the frontend over the network using the IP address.
There are currently three people working on it, and at the moment we directly edit the files on the VM, as we need them to keep communicating with the database so we can check that our changes have worked. There is no option to install anything like WAMP on our individual machines, and the usual group policy restrictions apply, so the only access we have to a database is via the VM. We have been working with copies of files/folders and the database, but there is always the risk that merging these later would be a massive task.
I do use GitHub at home (mainly the desktop app, but I can just about get by with the command line as long as I have a list of the commands in front of me) to sync between my PC and laptop via GitHub.com, and I believe that the issues we get with several people needing to update the same file would be eradicated by using it here at work.
However, there are some queries we need to ensure we have straight in our heads before putting forward a request.
Is what we are asking for viable? Can several branches on the same server be worked on at the same time, or would this only work on an individual machine?
Given that our network is fairly restricted, is there any way that we can work on the files on our machines and connect to a VM-hosted database? I believe that an IDE will allow us to run PHP files on a standard machine (although a request for Eclipse is now around 6 weeks old and there is still no confirmation that we will get it any time soon), but will this also allow the database connection?
The stuff we do is not overly sensitive but the company would certainly not want what we do out there in a public repository (and also would not be likely to pay for a premium GitHub account) so we would need to branch/pull/merge directly from our machines to the VM.
Does anyone have any advice/suggestions/solutions for this? Although GitHub would be the preferred option as I already use it, we are open to any suggestion that will allow three people, on different machines, to work simultaneously on a central system while ensuring that we do not overwrite or affect each other's work.
Setting up a Git repo on Windows is not trivial and may require a fair bit of work. You could try SVN instead; it is fairly straightforward to install on Windows and has a gentler learning curve than Git. I am not saying SVN is better or worse than Git, just that it is better suited to your needs. We have a similar setup and we use TortoiseSVN https://subversion.apache.org/ as a client. SVN also has branches and the like.
SVN for server side repository https://subversion.apache.org/
If you would still prefer Git on windows, check this out - https://www.linkedin.com/pulse/step-guide-setup-secure-git-remote-repository-windows-nivedan-bamal
1) It is possible to work on many branches and then merge them into a single branch. That is the preferred Git development workflow. You can do the same with SVN.
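If you do end up with Git, the usual pattern is a bare repository on the VM that everyone clones and pushes to over the network share. A rough sketch (the share path, host name and branch name are hypothetical):

    # On the VM: create a bare repository to act as the shared central copy.
    git init --bare C:/repos/internal-system.git

    # On each developer machine: clone it over the network share and work on your own branch.
    git clone //VM-HOSTNAME/repos/internal-system.git
    cd internal-system
    git checkout -b my-feature
    # ... edit, test against the VM-hosted database ...
    git commit -am "Describe the change"
    git push origin my-feature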

PHPStorm cache on downloaded files?

So I've used PHPStorm before, and (along with some other coworkers) have been asked to evaluate it for my current company, as I already had my own private license. However, I'm hitting a bit of a snag that I really don't think should be a show stopper.
Anyway, the way my company has its development environments set up now is a bit odd. We check everything into Subversion, into different directories than where it will end up on the client's system, because we build Debian packages from them. This makes working with the files directly from Subversion difficult, as PHPStorm has no idea where related files are located.
Because of this, the files on our development virtual machines are not directly under Subversion. Instead, we patch our virtual machines by installing the updated packages when needed.
This makes life difficult for an IDE, which wants to keep a local copy of the files on your system. The best way I can figure out to do this is to run a synchronize between the remote server and the local machine (going by timestamp and size should be fine, and it completes in less than a minute). It would be fine to tell developers, "after you patch, make sure you sync with PHPStorm".
However, the problem I'm having is that if I modify a file on the remote system and sync (and it says it downloaded), it takes several minutes after opening the file for the remote changes to show up in PHPStorm.
I have no idea why this would be, and it could potentially lead to really bad results if someone makes a few quick changes, saves, and overwrites the needed files.
I'm currently running PHPStorm on Ubuntu 14.04 64-bit.
Any help would be appreciated.

Is a system-wide Mercurial installation enough in a shared environment?

I am learning how to install Mercurial for our team, but I am not experienced enough to make some of these decisions.
For our team, we have a server machine used as a repository. Every team member also has her/his own machine running Red Hat Linux. However, we do not do anything on our local terminals; we do everything on the server. Every member has a user directory on the server, such as /home/Cassie, /home/john, ..., and we save all our code and work there. When we turn on the local terminals, the GNOME desktop shows our personal files on the server, not on the local machine. Whenever anyone clicks the terminal application on the desktop, it connects to her own home directory on the server, so we do not need to use an SSH command to connect. It is like a school multi-user system: everyone has a user account and logs into her own account to do her own work. I hope I can set up a shared repository on that server so that everyone can push, pull, etc. there.
1) Since we use a shared environment, does that mean I need to install Mercurial only on the server, and that is enough for everyone to run "commit", "push", "pull", etc.?
2) By installing Mercurial only system-wide, do we lose the ability to do local commits? If I would like everyone to still have the "local commit" ability, how should I do it?
3) I have searched online. Some people mentioned that on a shared network server it is impossible to have locks when two users try to access the same file at the same time. Does that apply to my situation?
In sum, we do all our work on the server. I hope to find a plan that gives us Mercurial control over a repository shared by everyone, while everyone still has the local commit ability and the repository still has some lock protection if two users try to access a file at the same time. If this scenario is feasible, can I just install Mercurial on the server, or do I need to install it on both the server and the users' machines? If the scenario is impossible, could someone please suggest a plan for version control on our system?
1) Since we use a shared environment, does it mean that I just need to install Mercurial on the server and it is enough for everyone to do "commit", "push", "pull", etc. commands?
If your users are logging into a shell on the server in order to do their work, then yes it is sufficient to have Mercurial installed only on the server.
2) By installing only system-wide Mercurial, does it eliminate the ability to do local commits? If I would like to let everyone still have the "local commit" ability, how should I do it?
Your users will presumably clone from a shared "root" repository into their own home directories in order to work on the code. They will each have a "local" copy of the repo in their home directory and will push into the shared root repository.
3) I have searched online. Some people mentioned that for a shared network server, it is impossible to have locks for any two users if they are trying to access the same file at the same time. Does that apply to my situation?
As long as your users are working within their own local copies of the repo, they will not interfere with one another. The only time a conflict may arise is when committing back to the shared root repository -- in which case the user will need to merge their changes and resolve any conflicts.
I would recommend reading carefully through Joel Spolsky's excellent Hg Init tutorial for a better understanding of how Mercurial handles "central" and "local" copies.
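To make that concrete, here is a minimal sketch of the setup described above; the repository paths are only examples:

    # One-time setup on the server: create the shared "root" repository.
    hg init /srv/hg/project

    # Each user clones it into their own home directory and works there.
    hg clone /srv/hg/project ~/project
    cd ~/project
    # ... edit files ...
    hg commit -m "Local commit in my own clone"   # stays local until pushed
    hg pull -u                                    # pick up other people's changes; merge if heads diverge
    hg push                                       # publish back to the shared root repository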

Where do you keep the configuration files for your stack?

For the website(s) I develop, we have a number of different technologies that make up our stack, each with its own set of configurations.
This is a Rails stack, so we're running things including:
Nginx w/ Passenger
Varnish
Redis
Memcached
MySQL
MongoDB
We're continually tweaking our configs and changing them to support our continually changing system, and if we were to 'lose' the configurations (e.g. due to a server crash or otherwise) it would be a huge pain to rebuild them from memory.
Given that version control would be extremely useful, I can quite easily add these files to a Git repo or similar and store them in the cloud somewhere, but what about application-specific configuration (for example, URL rewrite config for a website on a shared server)? Should these be in the same repo as well?
Put website-specific stuff in the Git repo of that website, and system-wide stuff in a "systems" Git repo.
If you are not currently using Source Control (of any kind) in your development environment, stop whatever you are doing and sort that out right now. That is the most important aspect of your setup.
At a very minimum you should keep EVERYTHING that is a text file and relates to your app (yes, all config files and URL rewrites).
Others suggest you can include binary files as well, but at the very minimum all source code, all configuration, etc. should be in source control.
By the end of the day :)
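As a sketch of the "systems" repo idea (all paths and the remote URL are just examples), the system-wide configs can be snapshotted into their own repository and pushed to a private remote:

    # Create a "systems" config repository and snapshot the service configs into it.
    mkdir -p /srv/systems-config/nginx /srv/systems-config/varnish
    cd /srv/systems-config
    git init
    cp /etc/nginx/nginx.conf nginx/
    cp /etc/varnish/default.vcl varnish/
    git add .
    git commit -m "Snapshot of current service configuration"
    # Push to a private remote so a server crash doesn't take the only copy with it.
    git remote add origin git@example.com:ops/systems-config.git
    git push -u origin master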

Mercurial local repository backup

I'm a big fan of backing things up. I keep my important school essays and such in a folder of my Dropbox. I make sure that all of my photos are duplicated to an external drive. I have a home server where I keep important files mirrored across two drives inside the server (like a software RAID 1).
So for my code, I have always used Subversion to back it up. I keep the trunk folder with a stable copy of my application, but then I create a branch named with my username, and inside there is my working copy. I make very few changes between commits to that branch, with the understanding that the code in there is my backup.
Now I'm looking into Mercurial, and I must admit I haven't truly used it yet so I may have this all wrong. But it seems to me that you have a server-side repository, and then you clone it to a working directory in the form of a local repository. Then as you work on something, you make commits to that local repository, and when things are in a state to be shared with others, you hg push to the parent repository on the server.
Between pushes of stable, tested, bug-free code, where is the backup?
After doing some thinking, I've come to the conclusion that it is not meant for backup purposes and it assumes you've handled that on your own. I guess I need to keep my Mercurial local repositories in my dropbox or some other backed-up location, since my in-progress code is not pushed to the server.
Is this pretty much it, or have I missed something? If you use Mercurial, how do you backup your local repositories? If you had turned on your computer this morning and your hard drive went up in flames (or, more likely, the read head went bad, or the OS corrupted itself, ...), what would be lost? If you spent the past week developing a module, writing test cases for it, documenting and commenting it, and then a virus wipes your local repository away, isn't that the only copy?
So then on the flip side, do you create a remote repository for every local repository and push to it all the time?
How do you find a balance? How do you ensure your code is backed up? Where is the line between using Mercurial as backup, and using a local filesystem backup utility to keep your local repositories safe?
It's ok thinking of Subversion as a 'backup', but it's only really doing that by virtue of being on a separate machine, which isn't really intrinsic to Subversion. If your Subversion server was the same machine as your development machine - not uncommon in the Linux world - you're not really backed up in the sense of having protection from hardware failure, theft, fire, etc. And in fact, there is some data in that case that is not backed up at all - your current code may exist in two places but everything else in the repository (eg. the revision history) only exists in one place, on the remote server.
It's exactly the same for Mercurial except that you've taken away the need for a separate server and thus made it so that you have to explicitly think about backing up rather than it being a side-effect of needing to have a server somewhere. You can definitely set up another Mercurial repository somewhere and push your changes to that periodically and consider that your backup. Alternatively, simply backup your local repository in the same way that you'd back up any other important directory. With you having a full copy of the repository locally, including all revision history and other meta data, this is arguably even more convenient and safe than the way you currently do it with Subversion.
The "hidden" .hg directory stores all of the local commits. You can back up this directory using a standard backup program.
The changes get to the remote repository only when you push. Commits stay local, but you get them if you clone your repository. So yes, if you want your work to reach the server repository, you have to push to it "all the time".
On the other hand, nothing stops you from having several machines and pushing content from one to another. Every Mercurial repository can turn itself into a server in a matter of seconds by typing "hg serve".
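For example (the paths, port and host name are placeholders), either of these covers backing up a local repository:

    # Option 1: keep a full backup clone on another disk or machine and push to it regularly.
    hg clone ~/project /mnt/backup/project          # one-time setup of the backup clone
    hg push -R ~/project /mnt/backup/project        # repeat as often as you like (e.g. from cron)

    # Option 2: serve the repository over HTTP with the built-in server...
    hg serve -R ~/project -p 8000
    # ...and clone/pull it from another machine:
    hg clone http://dev-box:8000/ project-backup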
I'm not sure this really answers your question, but I too am a big fan of backups and manage things this way with many clones of my repository (I also make heavy use of mq to work in patch mode, but that's another story).
PS: as a side note, I'm considering using Mercurial as a tool for filesystem backup. The only thing that bothers me is that for this purpose I would prefer to disable the diff feature and treat all files as binary, but that should be easy.