I've been doing some experiments on my data storage, and for that reason I've created a handful of fake ACLs. Now I want to delete them, so I queried the data storage using the following:
select * from dm_Acl enable (row_based)
But then I realized that there is no attribute such as date created or date modified, or anything else related to dates whatsoever. Then (with some doubt) I thought that ACLs might be treated as DM_SYSOBJECT, but when I queried for a specific ACL name that I had in mind there was no result. Is there any approach that would let me meet my objective?
I think you should not delete ACLs based on their creation date (moreover, this is not possible), as there might be objects that reference an ACL.
So I think what you really need is to delete orphaned ACL objects (those that are not referenced by any objects).
There is a dm_DMClean Documentum Job which does exactly this.
However, I'm currently not sure whether it deletes orphaned custom dm_acl objects or only automatically created ones whose names start with dm_45.. (I haven't worked with DCTM for a long time), but it is easy to check: make sure you have an orphaned ACL, run the job, and check whether your ACL was deleted.
Sergi's answer is pretty good, but I once had an issue with ACLs that were deliberately deleted in a production environment. The whole issue was fixed by simply creating new ACLs. It seems that there is no additional link between an object's ACL property and the ACL object itself, so if a problem occurs it should be easy to fix.
Since you say this is your development environment, you can go ahead and delete the ACLs you don't want to have in it. In this situation it's wise to run the ConsistencyChecker job from time to time.
Check for orphaned ACLs. If there are no orphaned objects, try querying the objects you created during your development period and joining their ACL properties against the dm_acl table.
I am coming from an object-relational database background. I understand Couchbase is schema-less, but data migration will still happen as the application develops.
In SQL we have management tool to alter table, or I can write migration script with SQL to do migration from version 1 table to version 2 table.
But in document, say we have json Document UserProfile:
UserProfile
{
"Owner": "Rich guy!",
"Car": "Cool car"
}
We might want to add a last-visit field there and allow a user to have multiple cars, so the updated document will become as follows:
UserProfile
{
"Owner": "Rich guy!",
"Car": ["Cool car", "Another car"],
"LastVisit": "2015-09-29"
}
But for easier maintenance, I want all other UserProfile documents to follow the same format, having "Car" field as an array.
From my experience in SQL, I could write a migration script that supports migrating between different versions of a table: from the version 1 table to the version 2...N table.
So how should I write such migration code? Will I really have to write an app (executable) using the Couchbase SDK to migrate all the documents each time?
What will be the good way for doing migration like this?
Essentially, your problem breaks down into two parts:
Finding all the documents that need to be updated.
Retrieving and updating said documents.
You can do this in one of two ways: using a view that gives you the document ids, or using a DCP stream to get all the documents from the bucket. The view only gives you the ids of the documents, so you basically iterate over all the ids, and then retrieve, update and store each one using regular key-value methods. The DCP protocol, on the other hand, gives you the actual documents.
The advantage of using a view is that it's very simple to implement, works with any language SDK, and it lets you write your own logic around the process to make it more robust and safe. The disadvantage is having to build a view just for this, and also that if the data keeps changing, you must retrieve the ENTIRE view result at once, because if you try to page over the view with offsets, the ordering of results can change, thus giving you an inconsistent snapshot of the data.
The advantage of using DCP to stream all documents is that you're guaranteed to get a consistent snapshot of your data even if it's constantly changing, and also that you get the whole document directly as part of the stream, so you don't need to retrieve it separately - just update and store back to the database. The disadvantage is that it's currently only implemented in the Java SDK and is considered an experimental feature. See this blog for a simple implementation.
The third - and most convenient for an SQL user - way to do this is through the N1QL query language that's introduced in Couchbase 4. It has the same data manipulation commands as you would expect in SQL, so you could basically issue a command along the lines of UPDATE myBucket SET prop = {'field': 'value'} WHERE condition = 'something'. The advantage of this is pretty clear: it both finds and updates the documents all at once, without writing a single line of program code. The disadvantage is that the DML commands are considered "beta" in the 4.0 release of Couchbase, and that if the data set is too large, then it might not actually work due to timing out at some point. And of course, the fact that you need Couchbase 4.0 in the first place.
I don't know of any official tool currently to help with data model migrations, but there are some helpful code snippets depending on the SDK you use (see e.g. bulk updates in java).
For now you will have to write your own script. The basic process is as follows:
Make sure all your documents have a model_version attribute that you increment after each migration.
Before a migration update your application code so it can handle both the old and the new model_version, and so that new documents are written in the new model.
Write a script that iterates through all the old-model documents in your bucket (you need a view that emits the document key), makes the update you want, increments model_version and saves the document back.
In a high concurrency environment it's important to have good error handling and monitoring, you could have for example a view that counts how many documents are in each model_version.
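The steps above can be sketched roughly as follows. This is only an illustration: the `bucket` dict is a hypothetical in-memory stand-in for a real bucket, the names `migrate_v1_to_v2`, `run_migration` and `count_by_version` are made up for the example, and treating documents without a model_version as version 1 is an assumption. A real script would fetch keys from a view and use the SDK's key-value calls instead.

```python
# Hypothetical in-memory stand-in for a bucket: document key -> document body.
bucket = {
    "user::1": {"Owner": "Rich guy!", "Car": "Cool car"},
    "user::2": {"Owner": "Other guy", "Car": ["Old car"], "model_version": 2},
}

TARGET_VERSION = 2


def migrate_v1_to_v2(doc):
    """Return a copy of a version-1 document converted to version 2."""
    new_doc = dict(doc)
    car = new_doc.get("Car")
    # Wrap a single car in a list; leave existing lists alone.
    if car is None:
        new_doc["Car"] = []
    elif not isinstance(car, list):
        new_doc["Car"] = [car]
    new_doc["model_version"] = 2
    return new_doc


def run_migration(store):
    """Migrate every document below TARGET_VERSION; return how many changed."""
    migrated = 0
    for key in list(store):
        doc = store[key]
        # Assumption: documents written before versioning count as version 1.
        if doc.get("model_version", 1) < TARGET_VERSION:
            store[key] = migrate_v1_to_v2(doc)
            migrated += 1
    return migrated


def count_by_version(store):
    """Monitoring helper: number of documents per model_version."""
    counts = {}
    for doc in store.values():
        version = doc.get("model_version", 1)
        counts[version] = counts.get(version, 0) + 1
    return counts
```

Running `run_migration(bucket)` a second time changes nothing, because already-migrated documents are skipped by the version check.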
You can use Couchmove, which is a java migration tool working like Flyway DB.
You can execute N1QL queries with this tool to migrate your documents and keep track of your changes.
If I understood correctly, the crux here is getting and then updating every CB doc. This can be done with a view, provided that you understand that views are only 'eventually consistent' (unlike read/write actions, which are strongly consistent).
If (at migration time) no new documents are added to your bucket, then your view would be up to date and should return the entire set of documents to be migrated. Easy.
On the other hand, if new documents continue to be written into your bucket, and these documents need to be migrated, then you will have to run your migration code continually to catch all these new docs (since the view won't return them until it is updated, a few seconds later).
In this 2nd scenario, while migration is happening, your bucket will contain a heterogeneous collection of docs: some that have been migrated already, some that are about to be migrated and some that your view has not 'seen' yet (because they were recently added) and would only be migrated once you re-run the migration code.
To make the migration process efficient, you'll need to find a way to differentiate between already-migrated items and yet-to-be-migrated items. You can add a field to each doc with its 'version number' and update it during the migration. Your view should be defined to only select documents with older 'version number' and ignore already-migrated items.
I suggest you read more about couchbase views - here and on their site.
Regarding your migration: There are two aspects here: (1) getting the list of document ids that need to be updated and (2) the actual update.
The actual update is simple: you retrieve the doc and save it again in the new format. There's no explicit schema. Where you once added a column in SQL and populated it, you now just add a field to the JSON doc (in all the docs). All migrated docs should have this field. Side note: things get a little more complicated if the document can be updated by another process while you're migrating. This requires special handling (read about CAS if that's the case).
Getting all the relevant doc keys requires that you define a view and query it. That's beyond the scope of this answer (and is very well documented). Once you have all the keys, you simply iterate over them one by one and update each document.
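To illustrate the CAS side note, here is a minimal sketch of a compare-and-swap retry loop. `FakeBucket` and `CasMismatch` are hypothetical stand-ins written for this example, not the Couchbase SDK; they only mimic the idea that a write is rejected if the document changed since it was read.

```python
class CasMismatch(Exception):
    """Raised when a document changed between read and write."""


class FakeBucket:
    """Hypothetical in-memory store that mimics compare-and-swap semantics."""

    def __init__(self):
        self._data = {}  # key -> (cas, document)

    def insert(self, key, doc):
        self._data[key] = (1, dict(doc))

    def get(self, key):
        cas, doc = self._data[key]
        return cas, dict(doc)  # hand out a copy so callers cannot mutate the store

    def replace(self, key, doc, cas):
        current_cas, _ = self._data[key]
        if current_cas != cas:
            raise CasMismatch(key)
        self._data[key] = (current_cas + 1, dict(doc))


def update_with_retry(bucket, key, transform, max_attempts=5):
    """Read, transform and write back; re-read and retry on a CAS mismatch."""
    for _ in range(max_attempts):
        cas, doc = bucket.get(key)
        try:
            bucket.replace(key, transform(doc), cas)
            return True
        except CasMismatch:
            continue  # someone else wrote first: start again from a fresh read
    return False
```

The important design point is that the transform is re-applied to a freshly read document on every retry, so a concurrent change made by another process is never overwritten.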
With N1QL, Couchbase provides the same schema migration capabilities as you have in RDBMS or object-relational database. For the example in your question, you can place the following query in a migration script:
UPDATE UserProfile
SET Car = TO_ARRAY(Car),
LastVisit = NOW_STR();
This will migrate all the documents in your bucket to your new schema. Note that update statements in Couchbase provide document-level atomicity, not statement-level atomicity. But since this update is idempotent (repeatable), you can run it multiple times if you run into errors. Note: similar to the last paragraph of David's answer above.
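As a rough illustration of why document-level atomicity plus idempotence makes re-running safe, the sketch below emulates the Car part of the update in plain Python. `to_array` is only an approximate analogue of TO_ARRAY written for this example, and `update_all` with its `fail_after` parameter is a hypothetical helper that simulates a statement failing partway through.

```python
def to_array(value):
    """Approximate Python analogue of TO_ARRAY: wrap non-list values in a list."""
    if value is None or isinstance(value, list):
        return value
    return [value]


def update_all(docs, fail_after=None):
    """Update documents one by one, optionally failing partway through.

    Each document is changed independently, so a failure leaves earlier
    documents updated and later ones untouched.
    """
    for done, doc in enumerate(docs):
        if fail_after is not None and done == fail_after:
            raise RuntimeError("simulated failure mid-statement")
        if "Car" in doc:
            doc["Car"] = to_array(doc["Car"])
```

If the first run fails after some documents, simply calling `update_all` again finishes the rest and leaves the already-converted documents unchanged.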
PROBLEM
I am developing an app where the data model will be very similar to JSFiddle's. A user will create a new entry that will be assigned a GUID in the database. My question is how to handle when other users want to modify/fork/version the original entry. JSFiddle handles this by versioning the entry (so the URL becomes something like jsfiddle.net/GUID/1).
What is the benefit to JSFiddle's method over assigning a new GUID to the modified version and just recording a relationship to the original entry in the database?
It seems like no matter what I will have to create a new entry in the database that will essentially be a modified copy of the original.
Also, there will be both registered and anonymous users just like JSFiddle. The registered users should be able to log in and see all of their own entries and possibly the versions/forks that exist off of their own entries (though this isn't currently a requirement).
Am I missing something? Is there a right and wrong way to do this?
TECH
Using parse.com's RESTful API for data CRUD; node on the server.
What is the benefit to JSFiddle's method over assigning a new GUID to the modified version and just recording a relationship to the original entry in the database?
I would imagine none, both would require the same copy operation and the same double query (in MongoDB) to get the parent.
The only difference is what field you go by.
Am I missing something?
Not that I can see.
Is there a right and wrong way to do this?
It seems as though you have this pretty well covered frankly.
MVCC does seem the right way to do this in some respects; however, you don't have to go the whole hog. If you did, there might be cause for you to change to a database that has it built in, like CouchDB or something, because an implementation on MongoDB would sit on top of its existing lock mechanisms. It's like adding a lock on a lock.
For a quite complex unit testing environment, we want to dynamically change the tables contained in the metadata. Removing tables from it is supported using .remove(table) or even .clear(). But how can such a table be re-added later?
There is a _add_table(name, schema) method in MetaData, but this doesn't seem to be the official way. Also, Table._set_parent(metadata) seems more appropriate if one has to go the "use internal methods" route.
There is also Table.tometadata(metadata), which creates a new table instance that is attached to the new metadata. So I could create a completely new metadata and attach all "now needed" tables. But that would mean all the remaining code would need to know about the new table instances, connected to the new metadata. I don't want to go this route.
UPDATE: We're now considering fork/multiprocessing to load the tables only in a subprocess (isolated environment) so that only that subprocess is "tainted" and the next tests won't be hurt. I am noting this here for completeness; it's not strictly related to the main question, but it might help others who find this question.
Mutation of a MetaData object in a non-additive way is barely supported, and overall you shouldn't build use cases on top of it. Using new MetaData objects that contain the schema you're looking for in a particular scenario will work best.
How can I track changes to a development database and apply those changes to a production database (SQL Server 2008)?
I keep a local copy of a database on my development server, and as I'm adding new features, I may add new tables or change field and table names in the database. What's a good way to track such changes and then apply them to the main database?
Is there some way to do a "diff"-like operation between two databases and merge definition changes?
I considered merge replication, but I'm not sure how well that handles schema changes. For example, here: http://technet.microsoft.com/en-us/library/ms151870.aspx it mentions that I basically cannot use SSMS to make definition changes, because it drops and recreates tables, which is not allowed for published objects.
A smart piece of software could compare column counts, types, positions, and apply other fuzzy matching/logical deduction methods to figure out that a table was renamed or a new table was added or a column name changed, after which it could present the differences to the user for confirmation and automatic application.
Does anything like what I've described above exist, or am I stuck remembering to save DDL statements in SSMS and running them manually in the production database?
Maybe you need a migration tool like (for example) FluentMigrator, which helps you track database changes in source code.
Here is a tutorial from the original author of Fluent Migrator, explaining what Fluent Migrator is, why you might need it and how it works.
Another alternative would be what you already mentioned:
A smart piece of software could compare column counts, types,
positions, and apply other fuzzy matching/logical deduction methods to
figure out that a table was renamed or a new table was added or a
column name changed, after which it could present the differences to
the user for confirmation and automatic application.
I never tried it myself, but I've seen lots of recommendations for Redgate SQL Compare (which apparently does exactly what you asked for) here at Stack Overflow.
At my company we have several developers all working on projects internally, each with their own virtualbox setup. We use SVN to handle the source, but occasionally run into issues where a database (MySQL) schema change is necessary, and this has to be propagated to all of the other developers. At the moment we have a manually-written log file which lists what you changed, and the SQL needed to perform the change.
I'm hoping there might be a better solution -- ideally one linked to SVN, e.g. if you update to revision 893 the system knows this requires database revision 183 and updates your local schema automagically. We're not concerned with the data being synched, just the schema.
Of course one solution would be to have all developers running off a single, central database; this however has the disadvantage that a schema change could break everyone else's build until they do an svn up.
One option is a data dictionary in YAML/JSON. There is a nice article here
I'd consider looking at something like MyBatis Schema Migration tools. It isn't exactly what you describe, but I think it solves your problem in an elegant way and can be used without pulling in core MyBatis.
In terms of rolling your own, what I've always done is keep a base schema file that creates the schema from scratch, plus a delta file that appends all schema changes as deltas, separated by version numbers (you can try to use SVN revision numbers, but I always find it easier to increment manually). Then have a schema_version table that holds the current version for the live database. The canonical schema file carries the same version information, and a script runs all changes from the delta file that are newer than the existing DB version.
So you'd have a schema like:
-- Version: 1
CREATE TABLE user (
id bigint,
name varchar(20))
You have the tool manage the schema version table and see something like:
> SELECT * FROM schema_version;
1,2011-05-05
Then you have a few people add to the schema and have a delta file that would look like:
-- Version: 2
ALTER TABLE user ADD email varchar(20);
-- Version: 3
ALTER TABLE user ADD phone varchar(20);
And a corresponding new schema checked in with:
-- Version: 3
CREATE TABLE user (
id bigint,
name varchar(20),
email varchar(20),
phone varchar(20))
When you run the delta script against a database with the initial schema (Version 1), it will read the value from the schema_version table and apply all deltas greater than that to your schema. This gets trickier when you start dealing with branches, but serves as a simple starting point.
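A minimal sketch of such a delta runner is shown below. It uses Python's built-in sqlite3 module purely so the example is self-contained (the question is about MySQL), and the names `DELTAS`, `apply_deltas`, `make_v1_database` and the `applied_on` column are assumptions made for the illustration.

```python
import sqlite3

# Deltas keyed by the version they produce (mirrors the delta file above).
DELTAS = {
    2: "ALTER TABLE user ADD COLUMN email varchar(20)",
    3: "ALTER TABLE user ADD COLUMN phone varchar(20)",
}


def current_version(conn):
    """Read the highest version recorded in schema_version."""
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    return row[0]


def apply_deltas(conn):
    """Apply every delta newer than the recorded version, in order."""
    start = current_version(conn)
    applied = []
    for version in sorted(DELTAS):
        if version > start:
            conn.execute(DELTAS[version])
            conn.execute(
                "INSERT INTO schema_version (version, applied_on) "
                "VALUES (?, date('now'))",
                (version,),
            )
            applied.append(version)
    conn.commit()
    return applied


def make_v1_database():
    """Build an in-memory database holding the Version 1 schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user (id bigint, name varchar(20))")
    conn.execute("CREATE TABLE schema_version (version integer, applied_on text)")
    conn.execute("INSERT INTO schema_version VALUES (1, '2011-05-05')")
    conn.commit()
    return conn
```

Calling `apply_deltas(make_v1_database())` applies versions 2 and 3; calling it again on the same connection applies nothing, because the recorded version is already 3.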
There are a couple approaches I've used before or currently use:
Sequential Version Number
Most that use this approach have a separate program that grabs a version number from the database, and then executes any statements associated with database versions higher than that number, finally updating the version number in the database.
So if the version is 37 and there are statements associated with version 1 through 38 in the upgrading application, it will skip 1 through 37 and execute statements to bring the database to version 38.
I've seen implementations that also allow for downgrade statements for each version to undo what the upgrade did, and this allows for taking a database from version 38 back down to version 37.
In my situation we had this database upgrading in the application itself and did not have downgrades. Therefore, changes were source-controlled because they were part of the application.
Directed Acyclic Graph
In a more recent project I came up with a different approach. I use classes that are nodes of a directed acyclic graph to encapsulate the statements to do specific upgrades to the database for each specific feature/bugfix/etc. Each node has an attribute to declare its unique name and the names of any nodes on which it was dependent. These attributes are also used to search the assembly for all upgrade nodes.
A default root node is given as the dependency node for any nodes without dependencies, and this node contains the statements to create the migrationregister table that lists the names of nodes that have already been applied. After sorting all the nodes into a sequential list, they are executed in turn, skipping the ones that are already applied.
This is all contained in a separate application from the main application, and they are source-controlled in the same repository so that when a developer finishes work on a feature and the database changes associated with it, they are committed together in the same changeset. If you pull the changes for the feature, you also pull the database changes. Also, the main application simply needs a list of the expected node names. Any extra or missing, and it knows the database does not match.
I chose this approach because the project often has parallel development by multiple developers, with each developer sometimes having more than one thing in development (branchy development, sometimes very branchy). Juggling database version numbers was quite the pain. If everybody started with version 37 and "Alice" starts on something and uses version 38 so it will change her database, and "Bob" also starts on work that has to change the database and also uses version 38, someone will need to change eventually. So let's say Bob finishes and pushes to the server. Now Alice, when she pulls Bob's changeset, has to change the version for her statements to 39 and set her database version back to 37 so that Bob's changes will get executed, but then hers execute again.
But when all that happens when Alice pulls Bob's changeset is that there's simply a new migration node and another line in the list of node names to check against, things just work.
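The idea can be sketched in a few lines, though this is not the author's implementation (which uses classes and attributes discovered in an assembly). The node names below are hypothetical, and the sketch relies on `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical migration nodes: name -> names of the nodes it depends on.
MIGRATIONS = {
    "root": [],
    "add_email": ["root"],
    "add_phone": ["root"],
    "index_email": ["add_email"],
}


def pending_migrations(migrations, already_applied):
    """Return unapplied nodes in an order that respects dependencies."""
    order = TopologicalSorter(migrations).static_order()
    return [name for name in order if name not in already_applied]


def schema_matches(expected_names, applied_names):
    """The main application only checks that the two sets of names agree."""
    return set(expected_names) == set(applied_names)
```

When Alice pulls Bob's changeset, the only difference is one more entry in the graph. Nothing has to be renumbered, and her own already-applied nodes are skipped.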
We use Mercurial (distributed) rather than SVN (client-server), so that's part of why this approach works so well for us.
An easy solution would be to keep a complete schema in SVN (or whatever library). That is, every time you change the schema, run MySQL "desc" to dump out descriptions of all the tables, overwrite the last such schema dump with this, and then commit. Then if you run a version diff, it should tell you what changed. You would, of course, need to keep all the tables in alphabetical order (or some predictable order).
For a different approach: years ago I worked on a project for a desktop application where we were periodically sending out new versions that might have schema changes, and we wanted to handle these with no user intervention. So the program had a description of what schema it expected. At startup it made some metadata calls to check the schema of the database that it actually had and compared this to what it expected. It then automatically updated the schema to match what it expected. Usually when we added a new column we could simply let it start out null or blank, so this required pretty much zero coding effort once we got the first version to work. When there was some actual manipulation required to populate new fields, we'd have to write custom code, but that was relatively rare.
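A minimal sketch of that start-up check might look like the following. It uses sqlite3 only to keep the example self-contained; the `EXPECTED` description and the `sync_schema` helper are hypothetical, and the sketch handles only missing tables and missing columns, which start out null as described above.

```python
import sqlite3

# Hypothetical description of the schema the program expects:
# table name -> {column name: column type}.
EXPECTED = {
    "user": {"id": "integer", "name": "text", "email": "text"},
}


def sync_schema(conn, expected):
    """Create missing tables and add missing columns; return what was added.

    Table and column names come from trusted program code, so they are
    interpolated directly into the statements.
    """
    added = []
    for table, columns in expected.items():
        # PRAGMA table_info returns no rows when the table does not exist.
        existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
        if not existing:
            cols = ", ".join(f"{name} {ctype}" for name, ctype in columns.items())
            conn.execute(f"CREATE TABLE {table} ({cols})")
            added.append(table)
            continue
        for name, ctype in columns.items():
            if name not in existing:
                conn.execute(f"ALTER TABLE {table} ADD COLUMN {name} {ctype}")
                added.append(f"{table}.{name}")
    conn.commit()
    return added
```

Renamed columns or fields that need populating are outside what this sketch can detect; those are the cases that needed custom code.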