Semantic Versioning - version update on configuration updates - configuration

In our application I can set the configuration (DB host, API keys, and so on) at deploy time, so the configuration is independent of my code, and hence of my application version.
Most of the configuration can be set that way, but there is a single third-party configuration file that we rarely update and that currently cannot be set that way.
If I need to update the values for this file, what do I do with my version number?
The semver documentation doesn't seem to cover this particular case as the configuration doesn't really do any of the following:
It doesn't introduce backwards compatible fixes
It doesn't introduce new backwards compatible functionality
It doesn't introduce backwards incompatible functionality
All this configuration update does is increase the number of workers we currently run.
Thank you

Couchbase Sync Gateway - Server and Client API vs bucket shadowing

I am working on a project that uses Couchbase Server and Sync Gateway to synchronize the contents of a bucket with iOS and Android clients running Couchbase Lite. I also need read and write access to the Couchbase Server from a Node.js server application. From the research I've done, using shadowing is frowned upon (https://github.com/couchbase/sync_gateway/wiki/Bucket-Shadowing), which led me to look into the Sync Gateway API as a means to update the bucket from the Node.js application. Updating existing documents through the Sync Gateway API appears to require the most recent revision ID of the document to be passed in, requiring a separate read before the modification (http://mobile-couchbase.narkive.com/HT2kvBP0/cblite-sync-gateway-couchbase-server), which seems potentially inefficient. What is the best way to solve this problem?
Updating a document (which is really creating a new revision) requires the revision ID. Otherwise Couchbase can't associate the update with a parent. This breaks the whole approach to conflict resolution. (Couchbase uses a method known as multiversion concurrency control.)
The expectation is that you're updating the existing contents of a document. This implies you've read the document already, including the revision ID.
If for some reason you don't need the old contents to update the document, you still need the revision ID. If you work around it (for example, by purging a document through Sync Gateway and then pushing your new version) you can end up with two versions of the document in the system with no connection, which will cause a special kind of conflict.
So the short answer is no, there's no way to avoid this (without causing yourself other headaches).
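To make the read-then-update requirement concrete, here is a minimal sketch of revision-checked writes. It is a toy in-memory store, not the Sync Gateway or Couchbase API: the names RevisionedStore and ConflictError and the revision format are assumptions made purely for illustration.

```python
import uuid


class ConflictError(Exception):
    """Raised when an update does not name the current revision."""


class RevisionedStore:
    """Toy in-memory document store (hypothetical, for illustration only)."""

    def __init__(self):
        self._docs = {}  # doc_id -> (rev, body)

    def get(self, doc_id):
        rev, body = self._docs[doc_id]
        return rev, dict(body)

    def put(self, doc_id, body, rev=None):
        current = self._docs.get(doc_id)
        current_rev = current[0] if current else None
        # An update must name the revision it is based on (its parent).
        if rev != current_rev:
            raise ConflictError(f"expected rev {current_rev!r}, got {rev!r}")
        generation = int(current_rev.split("-")[0]) + 1 if current_rev else 1
        new_rev = f"{generation}-{uuid.uuid4().hex[:8]}"
        self._docs[doc_id] = (new_rev, dict(body))
        return new_rev


store = RevisionedStore()
first_rev = store.put("user::1", {"name": "Ada"})

# Read first to learn the current revision, then update against it.
rev, body = store.get("user::1")
body["name"] = "Ada Lovelace"
second_rev = store.put("user::1", body, rev=rev)
```

A write that passes a stale or missing revision raises ConflictError here, which is the toy equivalent of the update having no parent to attach to.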
I am not sure why your question was downvoted, as it seems like a reasonable question. You are correct: the Couchbase bucket that is used by Sync Gateway is best thought of as "opaque", and you should not be poking around in there and changing things. There are a number of implementations of Couchbase Lite, such as those for Java, .NET, and Mac OS X. Have you considered making a web service that, on one side, is serving your application, and on the other side is itself a Couchbase Lite client? You should be able to separate your data as necessary using channels.

Git environment setup. Advice needed

Background info:
We are currently 3 web programmers (good, real-life friends, no distrust issues).
Each programmer SSHes into the single Linux server, where the code resides, under their own username with sudo powers.
We all work on different files at any one time. We sometimes ask "Are you in the file __?". We use Vim, so we know whether a file is open or not.
Our development code (no production yet) resides in /var/www/
Our remote repo is hosted on bitbucket.
I am *very* new to Git. I used Subversion before, but I was basically spoon-fed instructions and was told exactly what to type to sync up code and commit.
I read about half of Scott Chacon's Pro Git and that's the extent of my Git knowledge.
In case it matters, we run Ubuntu 11.04, Apache 2.2.17, and Git 1.7.4.1.
So Jan Hudec gave me some advice in the previous question. He told me that it is good practice to do the following:
Each developer has their own repo on their local computer.
Let the /var/www/ be the repo on the server. Set the .git folder to permission 770.
That would mean that each developer's computer needs to have its own LAMP stack (or at least Apache, PHP, MySQL, and Python installed).
The code is mostly JavaScript and PHP files, so it's not a big deal to clone it over. However, how do we manage the database locally?
In this case, we only have two tables and it'll be simple to recreate the entire database locally (at least for testing). But in the future, when the database gets too big, should we just log on remotely to the MySQL database on the server, or should we have "sample" data for development and testing purposes?
What you're doing is transitioning from "everybody works together in one environment" to "everybody has their own development environment". The major benefit is everybody won't be stepping on each other's feet.
Other benefits include a heterogeneous development environment. If everyone is developing on the same machine, the software will become dependent on that one setup, because developers are lazy. If everyone develops in different environments, even just with slightly different versions of the same stuff, they'll be forced to write more robust code to deal with that.
The main drawback, as you've noticed, is that setting up the environment is harder. In particular, making sure the database works.
First, each developer should have their own database. This doesn't mean they all have to have their own database server (though it's good for heterogeneity), but they should have their own database instance which they control.
Second, you should have a schema and not just whatever's in the database. It should be in a version controlled file.
Third, setting up a fresh database should be automatic. This lets developers set up a clean database with no hassle.
Fourth, you'll need to get interesting test data into that database. Here's where things get interesting...
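As a rough illustration of the second and third points (a version-controlled schema and automatic setup of a fresh database), here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table names and the create_fresh_database helper are hypothetical.

```python
import sqlite3

# Hypothetical schema; in practice this text would live in a
# version-controlled file such as schema.sql.
SCHEMA = """
CREATE TABLE users (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE orders (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    total   REAL NOT NULL
);
"""


def create_fresh_database(path=":memory:"):
    """Build an empty database from the schema, with no manual steps."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


conn = create_fresh_database()
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
```

Because the database is rebuilt from the schema text alone, any developer can throw their copy away and recreate it at will.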
You have several routes to do that.
First is to make a dump of an existing database which contains realistic data, sanitized of course. This is easy, and provides realistic data, but it is very brittle. Developers will have to hunt around to find interesting data to do their testing. That data may change in the next dump, breaking their tests. Or it just might not exist at all.
Second is to write "test fixtures". Basically each test populates the database with the test data it needs. This has the benefit of allowing the developer to get precisely the data they want, and know precisely the state the database is in. The drawbacks are that it can be very time consuming, and often the data is too clean. The data will not contain all the gritty real data that can cause real bugs.
Third is to not access the database at all and instead "mock" all the database calls. You trick all the methods which normally query a database into instead returning testing data. This is much like writing test fixtures, and has most of the same drawbacks and benefits, but it's FAR more invasive. It will be difficult to do unless your system has been designed to do it. It also never actually tests if your database calls work.
Finally, you can build up a set of libraries which generate semi-random data for you. I call this "The Sims Technique" after the video game where you create fake families, torture them and then throw them away. For example, let's say you have a User object which needs a name, an age, a Payment object and a Session object. To test a User you might want users with different names, ages, ability to pay and login status. To control all that you need to generate test data for names, ages, Payments and Sessions. So you write a function to generate names and one to generate ages. These can be as simple as picking randomly from a list. Then you write one to make you a Payment object and one to make a Session object. By default, all the attributes will be random, but valid... unless you specify otherwise. For example...
# Generate a random login session, but guarantee that it's logged in.
session = Session.sim( logged_in = true )
Then you can use this to put together an interesting User.
# A user who is logged in but has an invalid Visa card
# Their name and age will be random but valid
user = User.sim(
    session = Session.sim( logged_in = true ),
    payment = Payment.sim( invalid = true, type = "Visa" ),
);
This has all the advantages of test fixtures, but since some of the data is unpredictable it has some of the advantages of real data. Adding "interesting" data to your default sim and rand functions will have wide ranging repercussions. For example, adding a Unicode name to random_name will likely discover all sorts of interesting bugs! It unfortunately is expensive and time consuming to build up.
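One possible way to back the pseudocode above with working code is sketched here in Python. The class layouts, the random_name and random_age helpers, and the sample value lists are all assumptions for illustration, not part of any existing library.

```python
import random
from dataclasses import dataclass

# Sample pools; the non-ASCII names are included on purpose.
NAMES = ["Ada", "Grace", "Zoë", "李雷"]
CARD_TYPES = ["Visa", "MasterCard"]


def random_name():
    return random.choice(NAMES)


def random_age():
    return random.randint(18, 90)


@dataclass
class Session:
    logged_in: bool

    @classmethod
    def sim(cls, **overrides):
        # Random but valid defaults, unless the caller specifies otherwise.
        attrs = {"logged_in": random.choice([True, False])}
        attrs.update(overrides)
        return cls(**attrs)


@dataclass
class Payment:
    type: str
    invalid: bool

    @classmethod
    def sim(cls, **overrides):
        attrs = {"type": random.choice(CARD_TYPES), "invalid": False}
        attrs.update(overrides)
        return cls(**attrs)


@dataclass
class User:
    name: str
    age: int
    session: Session
    payment: Payment

    @classmethod
    def sim(cls, **overrides):
        attrs = {
            "name": random_name(),
            "age": random_age(),
            "session": Session.sim(),
            "payment": Payment.sim(),
        }
        attrs.update(overrides)
        return cls(**attrs)


# A logged-in user with an invalid Visa card; name and age stay random.
user = User.sim(
    session=Session.sim(logged_in=True),
    payment=Payment.sim(invalid=True, type="Visa"),
)
```

Each sim method only pins down what the test cares about, so adding a new "interesting" value to NAMES affects every test that does not fix the name explicitly.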
There you have it. Unfortunately there's no easy answer to the database problem, but I implore you to not simply copy the production database as it's a losing proposition in the long run. You'll likely do a hybrid of all the choices: copying, fixtures, mocking, semi-random data.
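For the mocking route mentioned above, a minimal sketch using Python's standard unittest.mock module might look like this. UserRepository, find_name and greeting are hypothetical names standing in for whatever code normally queries the database.

```python
from unittest import mock


class UserRepository:
    """Hypothetical data-access class that would normally query the database."""

    def find_name(self, user_id):
        raise RuntimeError("no database available in this sketch")


def greeting(repo, user_id):
    """Code under test: builds a message from data the repository returns."""
    return f"Hello, {repo.find_name(user_id)}!"


repo = UserRepository()
# Replace the database-backed method with canned test data.
with mock.patch.object(repo, "find_name", return_value="Ada") as fake:
    message = greeting(repo, 42)
    fake.assert_called_once_with(42)
```

Note that the real query never runs, which is exactly the drawback described above: the test says nothing about whether the database call itself works.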
A few options, in order of increasing complexity:
You all connect to the live master DB, read/write permissions. This is risky, but I guess you're already doing it. Make sure you have backups!
Use test fixtures to populate a local test DB and just use it. Not sure what tools there are for this in the PHP world.
Copy (mysqldump) the master database and import it into your local machines' MySQL instances, then set up your dev environments to connect to your local MySQL. Repeat the dump/import as necessary.
Set up one-way replication from the master to your local instances.
Optionally, set up a read-only user on the main DB, and configure your app to let you switch to a read-only connection to the real master DB in case you can't wait for that next copy of the master data.
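The test-fixture option in the list above can be sketched as follows. This is Python with the built-in sqlite3 and unittest modules rather than a PHP tool, and the count_adults function and users table are made up for the example.

```python
import sqlite3
import unittest


def count_adults(conn):
    """Code under test: count users aged 18 or over."""
    return conn.execute(
        "SELECT COUNT(*) FROM users WHERE age >= 18").fetchone()[0]


class CountAdultsTest(unittest.TestCase):
    def setUp(self):
        # Fixture: each test starts from a known, freshly built database.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")
        self.conn.executemany(
            "INSERT INTO users (name, age) VALUES (?, ?)",
            [("Ada", 36), ("Tim", 12), ("Grace", 18)],
        )

    def tearDown(self):
        self.conn.close()

    def test_counts_only_adults(self):
        self.assertEqual(count_adults(self.conn), 2)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(CountAdultsTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The test owns its data, so it does not break when someone refreshes a shared dump.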
Having your own repo does not mean having your own staging server (that setup is hard to maintain and scales extremely badly to 10, 20 or 100 developers).
It's always better to have, as early as possible, a (semi-)automated build system that converts the source data stored in the repository into a live system (less manual work means fewer chances to make non-code errors) and (maybe) some form of Continuous Integration (test often, find bugs fast). For the database part of the build system, you only have to prepare the initial data (table structures, data dumps) as versioned text files, which are:
easy to merge
processed and converted into the final usable objects by code, not by hand (no human errors, no interference from manual operations)

Database update outside application

Am I correct in assuming that if a different process updates the DB then my NHibernate-powered application will be out of sync? I'm almost using non-lazy update.
My target DB is MySQL 5.0, if it makes any difference.
There isn't a simple way to answer that without more context.
What type of application are you thinking about (web, desktop, other)?
What do you think would be out of sync exactly?
If you have a desktop application with an open window with an open session that has data loaded, and you change the same entities somewhere else, of course the loaded entities will be out of sync with the DB, but you can use Refresh to update those entities.
If you use NH second-level caching and you modify the cached entities somewhere else, the cache contents will be out of sync, but you can still use Refresh or cache-controlling methods to update directly from the DB.
In all cases, NH provides support for optimistic concurrency by using Version properties; those prevent modifications to out-of-sync entities.
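The idea behind version-based optimistic concurrency can be sketched as below. This is not NHibernate code: it is a Python illustration using the built-in sqlite3 module, with a hypothetical items table and update_name helper, showing how a version column lets a stale write be detected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, version INTEGER)")
conn.execute("INSERT INTO items VALUES (1, 'widget', 1)")


def update_name(conn, item_id, new_name, expected_version):
    """Update only if the row still has the version we loaded."""
    cur = conn.execute(
        "UPDATE items SET name = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_name, item_id, expected_version),
    )
    return cur.rowcount == 1


# Our application loaded the row at version 1...
loaded_version = 1
# ...then another process changed it behind our back.
other_ok = update_name(conn, 1, "gadget", expected_version=1)
# Our stale write matches no row, so it is rejected instead of
# silently overwriting the other process's change.
ours_ok = update_name(conn, 1, "gizmo", expected_version=loaded_version)
```

When the update affects zero rows, the application knows its copy is out of date and can reload before retrying.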
Yes, the objects in your current session will be out of sync, the same way a DataSet/DataTable would be out of sync if you fetch it and another process updates the same data.

How to Manage a dataset together with an application?

The application's code and configuration files are maintained in a code repository. But sometimes, as a part of the project, I also have some data (which in some cases can be >100MB, >1GB or so), which is stored in a database. Git does a nice job in handling the code and its changes, but how can the development team easily share the data?
It doesn't really fit in the code version control system, as it is mostly large binary files, and would make pulling updates a nightmare. But it does have to be synchronised with the repository, because some code revisions change the schema (i.e. migrations).
How do you handle such situations?
We have the data and schema stored in XML and use Liquibase to handle the updates to both the schema and the data. The advantage here is that you can diff the files to see what's going on, it plays nicely with any VCS and you can automate it.
Due to the size of your database this would mean a sizable "version 0" file. But, using the migration strategy, after that the updates should be manageable as they would only be deltas. You might be able to convert your existing migrations one-to-one to Liquibase as well, which might be nicer than a big-bang approach.
You can also leverage #belisarius' strategy if your deltas are very large so each developer doesn't have to apply the delta individually.
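The baseline-plus-deltas idea can be illustrated with a small migration runner. This is a sketch of the general technique only, not how Liquibase works internally; it uses Python's sqlite3 module, and the MIGRATIONS list and schema_version table are assumptions for the example.

```python
import sqlite3

# Hypothetical ordered deltas; in practice each would be a versioned file.
MIGRATIONS = [
    (1, "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)"),
    (2, "ALTER TABLE products ADD COLUMN price REAL"),
    (3, "INSERT INTO products (name, price) VALUES ('sample', 9.99)"),
]


def migrate(conn):
    """Apply every delta newer than the recorded schema version."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER)")
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    current = row[0] or 0
    applied = []
    for version, statement in MIGRATIONS:
        if version > current:
            conn.execute(statement)
            conn.execute("INSERT INTO schema_version VALUES (?)", (version,))
            applied.append(version)
    conn.commit()
    return applied


conn = sqlite3.connect(":memory:")
first_run = migrate(conn)   # brings an empty database fully up to date
second_run = migrate(conn)  # nothing left to apply
```

Because the database records which deltas it has seen, pulling new code and re-running the script only applies what is missing.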
It seems to me that your database has a lot of parallels with a binary library dependency: it's large (well, much larger than a reasonable code library!), binary, and has its own versions which must correspond to various versions of your codebase.
With this in mind, why not integrate a dependency manager (e.g. Apache Ivy) with your build process and let it manage your database? This seems like just the sort of task that a dependency manager was built for.
Regarding the sheer size of the data/download, I don't think there's any magic bullet (short of some serious document pre-loading infrastructure) unless you can serialize the data into a delta-able format (the XML/JSON/SQL you mentioned).
A second approach (maybe not so compatible with dependency management): If the specifics of your code allow it, you could keep a second file that is a manual diff that can take a base (version 0) database and bring it up to version X. Every developer will need to keep a clean version 0. A pull (of a version with a changed DB) will consist of: pull diff file, copy version 0 to working database, apply diff file. Note that applying the diff file might take a while for a sizable DB, so you may not be saving as much time over the straight download as it first seems.
We usually use the database sync or replication schema.
Each developer has 2 copies of the database, one for working and the other just for keeping the sync version.
When the code is synchronized, the script syncs the database too (the central DB against the developer's "dead" copy). After that each developer updates his or her own working copy. Sometimes a developer needs to keep some of his or her data, so these second updates are not always driven by the standard script.
It is only as robust as the replication scheme, which sometimes (depending on the DB) is not good news.
DataGrove is a new product that gives you version control for databases. We allow you to store the entire database (schema and data), tag, restore and share the database at any point in time.
This sounds like what you are looking for.
We're currently working on features to allow git-like (push-pull) behaviors so developers can share their repositories across machines, so I can load the latest version of your database when I need it.

Tools and Methods [closed]

What tools and procedures would you recommend or use yourself to help streamline the following scenario? (I know it's a long one, but any help is appreciated.)
I work in a team that has an ecommerce app that we develop at our company. It's a reasonably standard LAMP application that we have been developing on and off for about 3 years. We develop the application on a testing domain; here we add new features, fix bugs etc. Our bug tracking and feature development is all managed within a hosted Subversion solution (unfuddle.com). As bugs are reported, we make these fixes on the testing domain and then commit changes to svn when we are happy the bug has been fixed. We follow this same procedure with the addition of new features.
It is worth pointing out the general architecture of our system and application across our servers. Each time a new feature is developed we roll this update out to all sites using our application (always a server we control). Each site using our system essentially uses exactly the same files for 95% of the codebase. We have a couple of folders within each site which contain files bespoke to that site: CSS files, images etc. Other than that, the differences between each site are defined by various configuration settings within each site's database.
This gets on to the actual deployment itself. As and when we are ready to roll out an update of some kind, we run a command on the server that the testing site is on. This performs a copy command (cp -fru /testsite/ /othersite/) and goes through each vhost, force-updating the files based on modified date. Each additional server that we host on has a vhost that we rsync the production codebase to, and we then repeat the copy procedure on all sites on that server. During this process we move out the files we don't want to be overwritten, moving them back when the copy has completed. Our rollout script performs a number of other functions, such as applying SQL commands to alter each database, adding fields / new tables etc.
We have become increasingly concerned that our process is not stable enough, not fault-tolerant and is also a bit of a brute-force method. We're also aware we are not making best use of Subversion, as we are in a position where working on a new feature would prevent us from rolling out an important bug fix, because we are not making use of branches or tags. It also seems wrong that we have so much replication of files across our servers. We're also not able to easily perform a rollback on what we have just rolled out. We do perform a diff before each rollout, so we can get a list of files that will be changed and know what has been changed afterwards, but the process to roll back would still be problematic. In terms of the database, I've started looking into dbdeploy as a potential solution. What we really want, though, is some general guidance about how we can improve our file management and deployment. Ideally we want the file management to be more closely linked to our repository, so a rollout / rollback would be more connected to svn. Something like using the export command to make sure the site files are the same as the repo files. It would also be good if the solution stopped the file replication around our servers.
Ignoring our current methods it would be really good to hear how other people approach the same problem.
To summarise:
What is the best way for making files across multiple servers stay in sync with svn?
How should we prevent file replication? symlinks / something else?
How should we structure our repo so we can dev new features and fix old ones?
How should we trigger rollouts/rollbacks?
Thanks in advance.
For rollback and testing out new features, the standard subversion concepts of branches and tags should be sufficient:
always create a tag before rollout, and roll out that tag. Rollback would then mean to return to the previous tag.
develop new features in branches and merge to the trunk when completed; alternatively, develop new features in trunk, and have a maintenance branch that receives only bug fixes.
keep the per-site files in separate directories in subversion, and use a configuration file on each site, or a symbolic link, to have sites refer to their specific files.
To reduce file duplication, I recommend using NFS (in particular when all sites are virtual machines on the same host: make the host the NFS server, and the sites NFS clients; alternatively, make a dedicated VM the NFS server). To deploy an update, only install the new files on the NFS server; the clients will pick up changes automatically.
If you need a multi-step update (e.g. first update the databases in each client, then update the code), you should still use NFS, but add symlinks to that. Check out the new code into a separate directory on the NFS server, then go to all VMs, update the databases, and change the symlink in the VM to point to the new code. When done, remove the old code on the NFS server.
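The symlink switch described above can be sketched as follows. This is a minimal Python illustration on a temporary directory, assuming a POSIX file system; the releases/current layout and the activate_release helper are hypothetical, and the NFS and database steps are left out.

```python
import os
import tempfile


def activate_release(releases_dir, release_name, current_link):
    """Point current_link at a release directory via an atomic rename."""
    target = os.path.join(releases_dir, release_name)
    tmp_link = current_link + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(target, tmp_link)
    # Renaming over the old link swaps it in one step on POSIX systems.
    os.replace(tmp_link, current_link)


# Demo layout in a throwaway directory: releases/v1, releases/v2, current.
root = tempfile.mkdtemp()
releases = os.path.join(root, "releases")
for name in ("v1", "v2"):
    os.makedirs(os.path.join(releases, name))
current = os.path.join(root, "current")

activate_release(releases, "v1", current)
activate_release(releases, "v2", current)  # roll out the new code
active_after_rollout = os.path.basename(os.path.realpath(current))
activate_release(releases, "v1", current)  # roll back to the old code
active_after_rollback = os.path.basename(os.path.realpath(current))
```

Keeping the old release directory around until the new one is confirmed working is what makes the rollback a single link change.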
You may want to look at this article, which covers deployment of PHP apps.
http://blog.digitalstruct.com/2009/10/07/deployments-php-applications/
Specifically it mentions a few tools which might help:
Phing
Ant
Liquibase
DbDeploy
I have also heard a few people mention using Capistrano, so you might want to look at that too.
EDIT:
From looking at this poll http://twtpoll.com/3zwfox it seems that SVN export is a common method in the community for deploying PHP apps. This poll seems to have been used in this SlideShare presentation http://www.slideshare.net/ccornutt/taming-the-deployment-beast