How do I create a safe local development environment? - mysql

I'm currently doing web development with another developer on a centralized development server. In the past this has worked alright, as we have two separate projects we are working on and rarely conflict. Now, however, we are adding a third (possible) developer into the mix. This is clearly going to create problems with other developers changes affecting my work and vice versa. To solve this problem, I'm thinking the best solution would be to create a virtual machine to distribute between the developers for local use. The problem I have is when it comes to the database.
Given that we all develop on laptops, simply keeping a local copy of the live data is plain stupid.
I've considered sanitizing the data, but I can't really figure out how to replace the real data, with data that would be representative of what people actually enter with out repeating the same information over and over again, e.g. everyone's address becomes 123 Testing Lane, Test Town, WA, 99999 or something. Is this really something to be concerned about? Are there tools to help with this sort of thing? I'm using MySQL. Ideally, if I sanitized the db it should be done from a script that I can run regularly. If I do this I'd also need a way to reduce the size of the db itself. (I figure I could select all the records created after x and whack them and all the records in corresponding tables out so that isn't really a big deal.)
The second solution I've thought of is to encrypt the hard drive of the vm, but I'm unsure of how practical this is in terms of speed and also in the event of a lost/stolen laptop. If I do this, should the vm hard drive file itself be encrypted or should it be encrypted in the vm? (I'm assuming the latter as it would be portable and doesn't require the devs to have any sort of encryption capability on their OS of choice.)
The third is to create a copy of the database for each developer on our development server that they are then responsible to keep the schema in sync with the canonical db by means of migration scripts or what have you. This solution seems to be the simplest but doesn't really scale as more developers are added.
How do you deal with this problem?

Use fake data -- invest in a data generator if you must, but please don't use real data in a development environment, especially if it's possible that access to it may be compromised. I'm more familiar with tools for MS SQL, but googling for "MySQL data generator" brought up EMS SqlManager and Datanamic.

As tvanfosson mentioned, use fake data instead of live. Doing so will not only keep the live data safe but also allow you to test different scenarios, such as international names and such.
As for how to distribute your DB, your schema and creation scripts really should be in source control, so each developer can create a local copy of the database as they see fit.

You could set up a fixtures (seed data) system. You provide the data once and it gets put into the db as many times as you need. That could be held in source control so that the fixtures are used/updated by all users.
I think that auto-generators are usually a bad idea. It is hard for them to generate information that could be real. Fixtures would allow you to make this information and know that it is what you are looking for. You could also push the bounds of your validators by using fixtures.
It may take a bit of time to set up the first time around, but I think you will get a much higher quality of data that is put in for testing.
Regards,
Justin

Related

How to protect mysql database from anyone

I have launched my project to a hosting company. But I am worried about how to protect my mysql database from the hosting company.
My question is how can I protect my database from the hosting company so they can't access my database / data.
Here's a relevant rule of IT security:
"If a bad guy has unrestricted physical access to your computer, it's not your computer anymore."
http://technet.microsoft.com/en-us/library/cc722487.aspx
If you don't trust your hosting company, it's time to get a new one. There's little you can do to prevent someone with physical access to a server from getting at what's on it.
I think that you will just have to trust them. There is no way to fully protect the database, because a hosting company has access to almost all levels of your application. They can event inject a code that would fetch all data in some layer of your application.
The hosting company is only one of the threats. You should think about XSS's, CSRF's, data sniffing at network level and so on...
As all have answered there really is no way to protect your data from the hosting company. They own the server therefore giving them access to all databases on it.
Depending on your data you could encrypt all of it but that's kind of overkill and not a practical solution unless your data is sensitive. In that case I would recommend getting a server of your own and building it to support your needs.
You could check out rackspace and setup one of their servers but again if it's not physically in your possession they could potentially get on it and see what's there. I think it's less likely as you would be setting up your own VM or server through them.
I guess there is nothing we can do to prevent it from the hosting company. therefore configuring your own server may be the only option.
I whole heartedly endorse the general rule that Jay puts forward.
However, in certain environments it might be a good idea to take extra steps to ensure that your data is somewhat more protected given the rule that it really is someoneelses computer.
Try to encrypt data that does not need to be acted up with public keys, and keep private keys off of the server. This is trivial to overcome if someone else can change the code and ensure that unencrypted copies are kept in parallel.
So try to use stuff like Tripwire to ensure that your code has not been changed. Again tripwire can be reconfigured with physical access, so this is not fool proof, but it can work
Good luck, you are interested in a tackling an intractable problem, which are, of course, the most fun.
-FT

Collaborating on websites with relational databases and a CMS

What processes do you put in place when collaborating in a small team on websites with databases?
We have no problems working on site files as they are under revision control, so any number of our developers can work from any location on this aspect of a website.
But, when database changes need to be made (either directly as part of the development or implicitly by making content changes in a CMS), obviously it is difficult for the different developers to then merge these database changes.
Our approaches thus far have been limited to the following:
Putting a content freeze on the production website and having all developers work on the same copy of the production database
Delegating tasks that will involve database changes to one developer and then asking other developers to import a copy of that database once changes have been made; in the meantime other developers work only on site files under revision control
Allowing developers to make changes to their own copy of the database for the sake of their own development, but then manually making these changes on all other copies of the database (e.g. providing other developers with an SQL import script pertaining to the database changes they have made)
I'd be interested to know if you have any better suggestions.
We work mainly with MySQL databases and at present do not keep track of revisions to these databases. The problems discussed above pertain mainly to Drupal and Wordpress sites where a good deal of the 'development' is carried out in conjunction with changes made to the database in the CMS.
You put all your database changes in SQL scripts. Put some kind of sequence number into the filename of each script so you know the order they must be run in. Then check in those scripts into your source control system. Now you have reproducible steps that you can apply to test and production databases.
While you could put all your DDL into the VC, this can get very messy very quickly if you try to manage lots and lots of ALTER statements.
Forcing all developers to use the same source database is not a very efficient approach either.
The solution I used was to maintain a file for each database entity specifying how to create the entity (primarily so the changes could be viewed using a diff utility), then manually creating ALTER statements by comparing the release version with the current version - yes, it is rather labour intensive but the only way I've found to solve the problem.
I had a plan to automate the generation of the ALTER statements - it should be relatively straightforward - indeed a quick google found this article and this one. Never got round to implementing one myself since the effort of doing so was much greater than the frequency of schema changes on the projects I was working on.
Where i work, every developer (actually, every development virtual machine) has its own database (or rather, its own schema on a shared Oracle instance). Our working process is based around complete rebuilds. We don't have any ability to modify an existing database - we only ever have the nuclear option of blowing away the whole schema and building from scratch.
We have a little 'drop everything' script, which uses queries on system tables to identify every object in the schema, constructs a pile of SQL to drop them, and runs it. Then we have a stack of DDL files full of CREATE TABLE statements, then we have a stack of XML files containing the initial data for the system, which are loaded by a loading tool. All of this is checked into source control. When a developer does an update from source control, if they see incoming database changes (DDL or data), they run the master build script, which runs them in order to create a fresh database from scratch.
The good thing is that this makes life simple. We never need to worry about diffs, deltas, ALTER TABLE, reversibility, etc, just straightforward DDL and data. We never have to worry about preserving the state of the database, or keeping it clean - you can get back to a clean state at the push of a button. Another important feature of this is that it makes it trivial to set up a new platform - and that means that when we add more development machines, or need to build an acceptance system or whatever, it's easy. I've seen projects fail because they couldn't build new instances from their muddled databases.
The main bad thing is that it takes some time - in our case, due to the particular depressing details of our system, a painfully long time, but i think a team that was really on top of its tools could do a complete rebuild like this in 10 minutes. Half an hour if you have a lot of data. Short enough to be able to do a few times during a working day without killing yourself.
The problem is what you do about data. There are two sides to this: data generated during development, and live data.
Data generated during development is actually pretty easy. People who don't work our way are presumably in the habit of creating that data directly in the database, and so see a problem in that it will be lost when rebuilding. The solution is simple: you don't create the data in the database, you create it in the loader scripts (XML in our case, but you could use SQL DML, or CSV with your database's import tool, or whatever). Think of the loader scripts as being source code, and the database as object code: the scripts are the definitive form, and are what you edit by hand; the database is what's made from them.
Live data is tougher. My company hasn't developed a single process which works in all cases - i don't know if we just haven't found the magic bullet yet, or if there isn't one. One of our projects is taking the approach that live is different to development, and that there are no complete rebuilds; rather, they have developed a set of practices for identifying the deltas when making a new release and applying them manually. They release every few weeks, so it's only a couple of days' work for a couple of people that often. Not a lot.
The project i'm on hasn't gone live yet, but it is replacing an existing live system, so we have a similar problem. Our approach is based on migration: rather than trying to use the existing database, we are migrating all the data from it into our system. We have written a rather sprawling tool to do this, which runs queries against the existing database (a copy of it, not the live version!), then writes the data out as loader scripts. These then feed into the build process just like any others. The migration is scripted, and runs every night as part of our daily build. In this case, the effort needed to write this tool was necessary anyway, because our database is very different in structure to the old one; the ability to do repeatable migrations at the push of a button came for free.
When we go live, one of our options will be to adapt this process to migrate from old versions of our database to new ones. We'll have to write completely new queries, but they should be very easy, because the source database is our own, and the mapping from it to the loader scripts is, as you would imagine, straightforward, even as the new version of the system drifts away from the live version. This would let us keep working in the complete rebuild paradigm - we still wouldn't have to worry about ALTER TABLE or keeping our databases clean, even when we're doing maintenance. I have no idea what the operations team will think of this idea, though!
You can use the replication module of the database engine, if it has one.
One server will be the master, changes are to be made on it.
Developers copies will be slaves.
Any changes on the master will be duplicated on the slaves.
It's a one way replication.
Can be a bit tricky to put into place as any changes on the slaves will be erased.
Also it means that the developers should have two copy of the database.
One will be the slave and another the "development" database.
There are also tools for cross database replications.
So any copies can be the master.
Both solutions can lead to disasters (replication errors).
The only solution is see fit is to have only one database for all developers and save it several times a day on a rotating history.
Won't save you from conflicts but you will be able to restore the previous version if it happens (and it always do...).
Where I work we are using Dotnetnuke and this poses the same problems. i.e. once released the production site has data going into the database as well as files being added to the file system by some modules and in the DNN file system.
We are versioning the site file system with svn which for the most part works ok. However, the database is a different matter. The best method we have come across so far is to use RedGate tools to synchronise the staging database with the production database. RedGate tools are very good and well worth the money.
Basically we all develop locally with a local copy of the database and site. If the changes are major we branch. Then we commit locally and do a RedGate merge to put our DB changes on the the shared dev server.
We use a shared dev server so others can do the testing. Once complete we then update the site on staging with svn and then merge the database changes from the development server to the staging server.
Then to go live we do the same from staging to prod.
This method works but is prone to error and is very time consuming when small changes need to be made. The prod DB is always backed up so we can roll back easily if a delivery goes wrong.
One major headache we have is that Dotnetnuke uses identity cols in many tables and if you have data going into tables on development and production such as tabs and permissions and module instances you have a nightmare syncing them. Ideally you want to find or build a cms that uses GUI's or something else in the database so you can easily sync tables that are in use concurrently.
We'd love to find a better method! As we have a lot of trouble with branching and merging when projects are concurrent.
Gus

MS Access antiquated? Anything new in 2011?

Our company has a database of 17,000 entries. We have used MS Access for over 10 years for our various mailings. Is there something new and better out there? I'm not a techie, so keep in mind when answering. Our problems with Access are:
-no record of what was deleted,
-will not turn up a name in a search if cap's or punctuation
is not entered exactly,
-is complicated for us to understand the de-duping process.
- We'd like a more nimble program that we can access from more than one dedicated computer.
The only applications I know of that are comparable to Access are FileMaker Pro, and the database component of the Open Office suite. FM Pro is a full-fledged product and gets good marks for ease of use from non-technical users, while Base is much less robust and is not nearly as easy for creating an application.
All of the answers recommending different databases really completely miss the point here -- the original question is about a data store and application builder, not just the data store.
To the specific problems:
PROBLEM 1: no record of what was deleted
This is a design error, not a flaw in Access. There is no database that really keeps a record of what's deleted unless someone programs logging of deleted data.
But backing up a bit, if you are asking this question it suggest that you've got people deleting things that shouldn't be deleted. There are two solutions:
regular backups. That would mean you could restore data from the last backup and likely recover most of the lost data. You need regular backups with any database, so this is not really something that is specific to Access.
design your database so records are never deleted, just marked deleted and then hidden in data entry forms and reports, etc. This is much more complicated, but is very often the preferred solution, as it preserves all the data.
Problem #2: will not turn up a name in a search if cap's or punctuation is not entered exactly
There are two parts to this, one of which is understandable, and the other of which makes no sense.
punctuation -- databases are stupid. They can't tell that Mr. and Mister are the same thing, for instance. The solution to this is that for all data that needs to be entered in a regularized fashion, you use all possible methods to insure that the user can only enter valid choices. The most common control for this is a dropdown list (i.e., "combo box"), which limits the choices the user has to the ones offered in the list. It insures that all the data in the field conforms to a finite set of choices. There are other ways of maintaining data regularity and one of those involves normalization. That process avoids the issue of repeatedly storing, say, a company name in multiple records -- instead you'd store the companies in a different table and just link your records to a single company record (usually done with a combo box, too). There are other controls that can be used to help insure regularity of data entry, but that's the easiest.
capitalization -- this one makes no sense to me, as Access/Jet/ACE is completely case-insensitive. You'll have to explain more if you're looking for a solution to whatever problem you're encountering, as I can't conceive of a situation where you'd actually not find data because of differences in capitalization.
Problem #3: is complicated for us to understand the de-duping process
De-duping is a complicated process, because it's almost impossible for the computer to figure out which record among the candidates is the best one to keep. So, you want to make sure your database is designed so that it is impossible to accidentally introduce duplicate records. Indexing can help with this in certain kinds of situations, but when mailing lists are involved, you're dealing with people data which is almost impossible to model in a way where you have a unique natural key that will allow you to eliminate duplicates (this, too, is a very complicated topic).
So, you basically have to have a data entry process that checks the new record against the existing data and informs the user if there's a duplicate (or near match). I do this all the time in my apps where the users enter people -- I use an unbound form where they type in the information that is the bare minimum to create a new record (usually some combination of lastname, firstname, company and email), and then I present a list of possible matches. I do strict and loose matching and rank by closeness of the match, with the closer matches at the top of the list.
Then the user has to decide if there's a match, and is offered the opportunity to create the duplicate anyway (it's possible to have two people with the same name at the same company, of course), or instead to abandon adding the new record and instead go to one the existing records that was presented as a possible duplicate.
This leaves it up to the user to read what's onscreen and make the decision about what is and isn't a duplicate. But it maximizes the possibility of the user knowing about the dupes and never accidentally creating a duplicate record.
Problem #4: We'd like a more nimble program that we can access from more than one dedicated computer.
This one confuses me. Access is multi-user out of the box (and has been from the very beginning, nearly 20 years ago). There is no limitation whatsoever to a single computer. There are things you have to do to make it work, such as splitting your database into to parts, one part with just the data tables, and the other part with your forms and reports and such (and links to the data tables in the other file). Then you keep the back end data file on one of the computers that acts as a server, and give a copy of the front end (the reports, forms, etc.) to each user. This works very well, actually, and can easily support a couple of dozen users (or more, depending on what they are doing and how well your database is designed).
Basically, after all of this, I would tend to second #mwolfe02's answer, and agree with him that what you need is not a new database, but a database consultant who can design for you an application that will help you manage your mailing lists (and other needs) without you needing to get too deep into the weeds learning Access (or FileMaker or whatever). While it might seem more expensive up front, the end result should be a big productivity boost for all your users, as well as an application that will produce better output (because the data is cleaner and maintained better because of the improved data entry systems).
So, basically, you either need to spend money upfront on somebody with technical expertise who would design something that allows you to do better work (and more efficiently), or you need to invest time in upping your own technical skills. None of the alternatives to Access are going to resolve any of the issues you've raised without significant investment in interface design to further the goals you have (cleaner data, easier to find information, etc.).
At the risk of sounding snide, what you are really looking for is a consultant.
In the hands of a capable programmer, all of your issues with Access are easily handled. The problems you are having are not the result of using the wrong tool, but using that tool less than optimally.
Actually, if you are not a techie then Access is already the best tool for you. You will not find a more non-techie friendly way to build a data application from bottom to top.
That said, I'd say you have three options at this point:
Hire a competent database consultant to improve your application
Find commercial off-the-shelf (COTS) software that does what you need (I'm sure there are plenty of products to handle mailings; you'll need to research)
Learn about database normalization and building proper MS Access applications
If you can find a good program that does what you want then #2 above will maximize your Return on Investment (ROI). One caveat is that you'll need to convert all of your existing data, which may not be easy or even possible. Make sure you investigate that before you buy anything.
While it may be the most expensive option up-front, hiring a competent database consultant is probably your best option if you need a truly custom solution.
SQL Server sounds like a viable alternative to your scenario. If cost is a concern, you can always use SQL Server Express, which is free. Full blown SQL Server provides a lot more functionality that might not be needed right away. Express is a lot simpler as the number of features provided with it are much smaller. With either version though you will have centralized store for your data and the ability to allow all transactions to be recorded in the transaction log. Also, both have the ability to import data from an Access Database.
The newest version of SQL Server is 2008 R2
You probably want to take a look at modern databases. If you're into Microsoft-based products, start with SQL Server Express
EDIT: However, since I understand that you're not a programmer yourself, you'd probably be better off having someone experienced look into your technical problem more deeply, like the other answer suggests.
It sounds like you may want to consider a front-end for your existing Access data store. Microsoft has yet to replace Access per se, but they do have a new tool that is a lot lower on the programming totem pole than some other options. Check out Visual Studio Lightswitch - http://www.microsoft.com/visualstudio/en-us/lightswitch.
It's fairly new (still in beta) but showing potential. With it, just as with any Visual Studio project, you can connect to an MS Access datasource and design a front-end to interact with it. The plus-side here is programming requirements are much lower than with straight-up Visual Studio (read: Wizards).
Given that replacing your Access DB will require some font-end programming, you may look into VistaDB. It should allow your front end to be created in .NET with an XCopy database on the backend without requiring a server. One plus is that it retains SQL Server syntax, so if you do decide to move to SQL Server you'll be one step ahead.
(Since you're not a techie and may not understand my previous statement, you might pass my answer on to the consultant/programmer/database guy who is going to do the work for you.)
http://www.vistadb.net/

Synchronizing MS Access database file

I am developing a database with about 10 tables in it. Basically it will be used in 2 or 3 distant geographical locations (let's call them A,B and C). The desired work flow will be as follows:
A,B and C should always have the same database. So when A does any changes he should be able to send those changes over to B and C. Emailing the entire mdb file doesnt make sense since its 15+mb in size. So I would like to send the new additional records and changes only to B and C. The changes B and C make should also be reflected to the other repective parties. How can I do this?
I have a few ideas in mind but cont know how to implement it.
solution 'A' - export the data tables only into a xls file and email that. But the importing of the tables into the mdb file could be a bit complex right? and the xls is file will also become bigger and bigger with time.
solution 'B' - try extract just the changes and email only the new parts? (but how to extract just those)
Solution 'C' - find some way of syncing all users onto the same database(storage) location. I was thinking of a front/back end splitting solution by storing the tables in a shared drive in the parent company's server (which is also overseas). But the network connection between locations is very slow, and I dont know how much bandwidth is needed for this.
Any recomendations would be most welcome!
In regard to sources for information on replication, start with my Jet Replication Wiki.
But I would never recommend Jet replication for your scenario. The only environment where I currently recommend it (and I've been doing replicated apps since 1997 and still have several in production use) is for supporting laptop users who have to work with live data in the field disconnected from any network, and return to the home office and synch direct with the mother ship.
The easiest solutions with an Access application would be hosting the app on Windows Terminal Server/Citrix and the users would run it over a Remote Desktop Connection, or using Sharepoint. The Terminal Server/Citrix solution has no accomodation for disconnected users, but Sharepoint can accomodate offline usage and synch changes when connected. Access 2010 and Sharepoint 2010 provide a host of new features, including better schema design, the equivalent of triggers and greatly improved peformance for large Sharepoint lists, so it's a no-brainer to me that if you choose Sharepoint you'd want to use A2010 and Sharepoint 2010.
While it's possible to do what you want with Jet Replication, it requires a lot of setup on the server and client ends, and is relatively fragile (not in terms of data integrity if you're using indirect replication (as you should), but in terms of network reliability) -- there are too many moving parts and too many failure points.
Windows Terminal Server/Citrix is by far the simplest, with the fewest moving parts and completely centralized administration, and works very well for a relatively small investment.
Sharepoint is more complicated than WTS/Citrix, but is less complex and more centralized than a Jet Replication solution.
If it were me, I'd probably go with WTS/Citrix if there was no need for disconnected usage, but I'd be salivating over trying out A2010/Sharepoint 2010. If there was a need for disconnected usage, then I'd definitely go the Sharepoint route.
You want to use "Jet Replication". See
MSDN Search for jro at http://social.msdn.microsoft.com/Search/en-US?query=jro&ac=8
MSDN Search for access replication at http://social.msdn.microsoft.com/Search/en-US?query=access%20replication&ac=3
It's been some time since I did it, but the indirect method of replication worked well for me in a similar situation.
It takes something to set up. The documentation used to be appalling for it, but I found articles written by Michael Kaplan (aka Michka) that walked me through how to do it.
If your final environment is going to be fairly stable, then use Access the whole way. If not, then I'd urge you to take HansUp's advice and go with SQL Server or SharePoint.
Do note: if you're working in Access 2007 or later, replication is not directly supported, and you'll have to roll-your-own bits and pieces. If you're using an earlier installation, you'll be fine, but allow time for some head-scratching.

How to maintain application configuration data in the database across multiple environments?

The company I work for has attempted to maintain configuration data for our application across multiple environments, but syncing that data has always been problematic and we've never come up with a good solution.
To help clarify, we (developers or business) might change some configuration using our admin interface on the Staging environment, test it, and then want to copy those changes to our Production environment without having to redo all the changes in the Production environment. We've also typically wanted to sync these changes between all of our environments (dev, staging, & production), again without having to make the changes individually on each environment.
Preferably we don't want to use any low level tools, as asking the business to use something like RedGate's SQL Data Compare and copying individual rows wouldn't work. It would need to be something intuitive enough so the not-so-technical could use it and not overwhelm them.
How do we maintain this configuration data across the different environments while still providing the business with the ability to test their changes before applying it to the live environment?
What level of technical know-how will the users have? As product manager at Red Gate I can give you our perspective. Although we're not considering support for data in our v1 release of SQL Source Control (currently under development), it will inevitably follow. However, this would still require those who wish to edit static data to do so in SSMS, although they could of course use edit the values using SSMS's graphical designers. Or is this still less intuitive than you'd like? They would be changing the data on a dev or staging database and would be expected to verify that the changes are correct and function as expected. These would then be committed to source control via our tool.
To deploy it would be a question of launching SQL Data Compare, although we plan to provide simple shortcuts from SSMS, rather than requiring users to negotiate their way around a completely separate tool. We haven't nailed down designs for this functionality so I'd encourage you to participate in our Early Access Program and state your case. More details of the Program can be found here:
http://www.red-gate.com/Products/SQL_Source_Control/index.htm