I just recently tried Talend Cloud Data Integration (Talend Studio 7.2)
and I'm figuring out how to dynamically migrate a whole MSSQL Server database to MySQL (into a blank DB).
So far, I've heard that it's possible to do that with Talend's Dynamic Schema.
However, I only found a tutorial for a single-table (data) migration, and some posts that didn't specifically explain the settings and components to use.
Here's my job design
I'm using a global variable to dynamically define the table; however, I'm not sure how I can define the schema dynamically.
I have about 100 tables in each (MSSQL) database, and they all have different numbers and types of columns, which makes the tutorial written here inapplicable.
Can you let me know how I can define my schema dynamically?
I am hoping to find a solution where one job could perform a whole database migration, e.g. iterating over the tables in the source DB and then generating the output in the destination DB.
Please let me know whether I should proceed with Dynamic Schema or a Job Template, plus the job design/components/settings needed.
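To illustrate what I mean by iterating tables, here is a rough plain-JDBC sketch of the behaviour I'm after (just an illustration, not Talend code; the connection details and schema name are placeholders):

import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class ListTables {
    public static void main(String[] args) throws Exception {
        try (Connection src = DriverManager.getConnection(
                "jdbc:sqlserver://localhost;databaseName=SourceDb", "user", "password")) {
            DatabaseMetaData meta = src.getMetaData();
            // Iterate every user table of the source database...
            try (ResultSet tables = meta.getTables(null, "dbo", "%", new String[]{"TABLE"})) {
                while (tables.next()) {
                    String table = tables.getString("TABLE_NAME");
                    // ...and for each table, read its columns/rows and write them to the MySQL target.
                    System.out.println("would migrate table: " + table);
                }
            }
        }
    }
}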
Thank you
I am using MongoDB for the first time. Is there a good way to transfer bulk data from MySQL into MongoDB? I tried searching in different ways but did not find one.
You would have to physically map all your MySQL tables to documents in MongoDB, either by hand or with an already developed tool.
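Doing the mapping by hand boils down to reading rows over JDBC and writing documents with the MongoDB Java driver. A minimal sketch of that idea (database, table, column and collection names are just placeholders):

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class MySqlToMongo {
    public static void main(String[] args) throws Exception {
        try (Connection mysql = DriverManager.getConnection(
                     "jdbc:mysql://localhost/mydb", "user", "password");
             MongoClient mongo = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> users = mongo.getDatabase("mydb").getCollection("users");
            try (Statement st = mysql.createStatement();
                 ResultSet rs = st.executeQuery("SELECT id, name, email FROM users")) {
                while (rs.next()) {
                    // One row becomes one document; embedding related tables would go here.
                    users.insertOne(new Document("mysql_id", rs.getInt("id"))
                            .append("name", rs.getString("name"))
                            .append("email", rs.getString("email")));
                }
            }
        }
    }
}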
For the tool route, you can try Mongify (http://mongify.com/).
It's a super simple way to transform your data from MySQL to MongoDB. It has a ton of support for changing your existing schema into one that works better with MongoDB.
Mongify will read your MySQL database and build a translation file for you; all you have to do is map how you want your data transformed.
It supports:
Updating internal IDs (to BSON ObjectID)
Updating referencing IDs
Typecasting values
Embedding Tables into other documents
Before filters (to change data manually before import)
and much much more...
There is also a short 5 min video on the homepage that shows you how easy it is.
Try it out and tell me what you think.
Please up vote if you find this answer helpful.
I am coming from an object-relational database background. I understand Couchbase is schema-less, but data migration will still happen as the application develops.
In SQL we have management tools to alter tables, or I can write a migration script in SQL to migrate from a version 1 table to a version 2 table.
But with documents, say we have a JSON document UserProfile:
UserProfile
{
"Owner": "Rich guy!",
"Car": "Cool car"
}
We might want to add a last-visit field and allow a user to have multiple cars, so the new, updated document would look as follows:
UserProfile
{
"Owner": "Rich guy!",
"Car": ["Cool car", "Another car"],
"LastVisit": "2015-09-29"
}
But for easier maintenance, I want all other UserProfile documents to follow the same format, with the "Car" field as an array.
From my experience in SQL, I could write a migration script which supports migrating between different table versions: from a version 1 table to a version 2...N table.
So how should I write such migration code? Will I really just have to write an app (executable) using the Couchbase SDK to migrate all the documents each time?
What would be a good way of doing a migration like this?
Essentially, your problem breaks down into two parts:
Finding all the documents that need to be updated.
Retrieving and updating said documents.
You can do this in one of two ways: using a view that gives you the document ids, or using a DCP stream to get all the documents from the bucket. The view only gives you the ids of the documents, so you basically iterate over all the ids, and then retrieve, update and store each one using regular key-value methods. The DCP protocol, on the other hand, gives you the actual documents.
The advantage of using a view is that it's very simple to implement, works with any language SDK, and it lets you write your own logic around the process to make it more robust and safe. The disadvantage is having to build a view just for this, and also that if the data keeps changing, you must retrieve the ENTIRE view result at once, because if you try to page over the view with offsets, the ordering of results can change, thus giving you an inconsistent snapshot of the data.
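To make the view approach concrete, here is a minimal sketch with the Java SDK 2.x. It assumes a design document "migrations" with a view "all_user_profiles" that emits the ids of the UserProfile documents, and it applies the Car/LastVisit change from the question:

import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.Cluster;
import com.couchbase.client.java.CouchbaseCluster;
import com.couchbase.client.java.document.JsonDocument;
import com.couchbase.client.java.document.json.JsonArray;
import com.couchbase.client.java.document.json.JsonObject;
import com.couchbase.client.java.view.ViewQuery;
import com.couchbase.client.java.view.ViewResult;
import com.couchbase.client.java.view.ViewRow;

public class ViewMigration {
    public static void main(String[] args) {
        Cluster cluster = CouchbaseCluster.create("127.0.0.1");
        Bucket bucket = cluster.openBucket("myBucket");

        // Retrieve the ENTIRE view result at once (no paging), as discussed above.
        ViewResult result = bucket.query(ViewQuery.from("migrations", "all_user_profiles"));
        for (ViewRow row : result.allRows()) {
            JsonDocument doc = bucket.get(row.id());
            if (doc == null) continue; // deleted since the view was indexed
            JsonObject content = doc.content();
            Object car = content.get("Car");
            if (car != null && !(car instanceof JsonArray)) {
                content.put("Car", JsonArray.from(car)); // wrap the scalar value into an array
            }
            if (content.get("LastVisit") == null) {
                content.put("LastVisit", "2015-09-29");
            }
            bucket.upsert(JsonDocument.create(doc.id(), content));
        }

        cluster.disconnect();
    }
}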
The advantage of using DCP to stream all documents is that you're guaranteed to get a consistent snapshot of your data even if it's constantly changing, and also that you get the whole document directly as part of the stream, so you don't need to retrieve it separately - just update and store back to the database. The disadvantage is that it's currently only implemented in the Java SDK and is considered an experimental feature. See this blog for a simple implementation.
The third - and most convenient for an SQL user - way to do this is through the N1QL query language that's introduced in Couchbase 4. It has the same data manipulation commands as you would expect in SQL, so you could basically issue a command along the lines of UPDATE myBucket SET prop = {'field': 'value'} WHERE condition = 'something'. The advantage of this is pretty clear: it both finds and updates the documents all at once, without writing a single line of program code. The disadvantage is that the DML commands are considered "beta" in the 4.0 release of Couchbase, and that if the data set is too large, then it might not actually work due to timing out at some point. And of course, the fact that you need Couchbase 4.0 in the first place.
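If you are on 4.0, issuing such a statement from the Java SDK is little more than a one-liner. A sketch, where the bucket name and the type filter are just placeholders:

import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.CouchbaseCluster;
import com.couchbase.client.java.query.N1qlQuery;
import com.couchbase.client.java.query.N1qlQueryResult;

public class N1qlMigration {
    public static void main(String[] args) {
        Bucket bucket = CouchbaseCluster.create("127.0.0.1").openBucket("myBucket");
        // Finds and updates the documents in a single statement, no application-side iteration.
        N1qlQueryResult result = bucket.query(N1qlQuery.simple(
                "UPDATE myBucket SET Car = TO_ARRAY(Car) WHERE type = 'UserProfile'"));
        System.out.println("success: " + result.finalSuccess());
    }
}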
I don't know of any official tool currently to help with data model migrations, but there are some helpful code snippets depending on the SDK you use (see e.g. bulk updates in java).
For now you will have to write your own script. The basic process is as follows:
Make sure all your documents have a model_version attribute that you increment after each migration.
Before a migration, update your application code so it can handle both the old and the new model_version, and so that new documents are written in the new model.
Write a script that iterates through all the old-model documents in your bucket (you need a view that emits the document key), makes the update you want, increments model_version and saves the document back.
In a high-concurrency environment it's important to have good error handling and monitoring; you could, for example, have a view that counts how many documents are in each model_version.
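A sketch of that counting view with the Java SDK 2.x (the design document and view names are made up; the map function emits model_version and the built-in _count reduce tallies documents per version):

import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.CouchbaseCluster;
import com.couchbase.client.java.view.DefaultView;
import com.couchbase.client.java.view.DesignDocument;
import com.couchbase.client.java.view.View;
import com.couchbase.client.java.view.ViewQuery;
import com.couchbase.client.java.view.ViewResult;
import com.couchbase.client.java.view.ViewRow;

import java.util.Arrays;
import java.util.List;

public class ModelVersionCount {
    public static void main(String[] args) {
        Bucket bucket = CouchbaseCluster.create("127.0.0.1").openBucket("myBucket");

        String map = "function (doc, meta) { if (doc.model_version) { emit(doc.model_version, null); } }";
        List<View> views = Arrays.asList(DefaultView.create("by_model_version", map, "_count"));
        bucket.bucketManager().upsertDesignDocument(DesignDocument.create("migration", views));

        // group(true) gives one row per distinct model_version with its document count.
        ViewResult result = bucket.query(ViewQuery.from("migration", "by_model_version").group(true));
        for (ViewRow row : result.allRows()) {
            System.out.println("model_version " + row.key() + ": " + row.value() + " documents");
        }
    }
}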
You can use Couchmove, which is a Java migration tool working like Flyway DB.
You can execute N1QL queries with this tool to migrate your documents and keep track of your changes.
If I understood correctly, the crux here is getting, and then updating, every CB doc. This can be done with a view, provided that you understand that views are only 'eventually consistent' (unlike read/write actions, which are strongly consistent).
If (at migration time) no new documents are added to your bucket, then your view will be up to date and should return the entire set of documents to be migrated. Easy.
On the other hand, if new documents continue to be written into your bucket and these documents also need to be migrated, then you will have to run your migration code continually to catch all these new docs (since the view won't return them until it is updated, a few seconds later).
In this second scenario, while migration is happening, your bucket will contain a heterogeneous collection of docs: some that have already been migrated, some that are about to be migrated, and some that your view has not 'seen' yet (because they were recently added) and that will only be migrated once you re-run the migration code.
To make the migration process efficient, you'll need to find a way to differentiate between already-migrated items and yet-to-be-migrated items. You can add a field to each doc with its 'version number' and update it during the migration. Your view should be defined to only select documents with older 'version number' and ignore already-migrated items.
I suggest you read more about Couchbase views - here and on their site.
Regarding your migration: There are two aspects here: (1) getting the list of document ids that need to be updated and (2) the actual update.
The actual update is simple: you retrieve the doc and save it again with the new format. There's no explicit schema. Where you once added a column in SQL and populated it, you now just add a field to the JSON doc (in all the docs). All migrated docs should have this field. Side note: things get a little more complicated if (while you're migrating) the document can be updated by another process. This requires special handling (read about CAS if that's the case).
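For that CAS case, a rough sketch with the Java SDK 2.x: replace() sends the CAS that came back with the fetched document, so a concurrent write makes it fail and you simply re-read and retry (the document key and the Car change are just examples):

import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.CouchbaseCluster;
import com.couchbase.client.java.document.JsonDocument;
import com.couchbase.client.java.document.json.JsonArray;
import com.couchbase.client.java.error.CASMismatchException;

public class CasSafeUpdate {
    static void migrateOne(Bucket bucket, String key) {
        while (true) {
            JsonDocument doc = bucket.get(key);
            if (doc == null) return; // deleted in the meantime
            Object car = doc.content().get("Car");
            if (car != null && !(car instanceof JsonArray)) {
                doc.content().put("Car", JsonArray.from(car));
            }
            try {
                bucket.replace(doc); // fails with CASMismatchException if someone else wrote in between
                return;
            } catch (CASMismatchException e) {
                // concurrent update detected: loop, re-read and try again
            }
        }
    }

    public static void main(String[] args) {
        Bucket bucket = CouchbaseCluster.create("127.0.0.1").openBucket("myBucket");
        migrateOne(bucket, "UserProfile::123"); // hypothetical key
    }
}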
Getting all the relevant doc keys requires that you define a view and query it. That's beyond the scope of this answer (and is very well documented). Once you have all the keys, you simply iterate over them one by one and update the docs.
With N1QL, Couchbase provides the same schema-migration capabilities as you have in an RDBMS or object-relational database. For the example in your question, you can place the following query in a migration script:
UPDATE UserProfile
SET Car = TO_ARRAY(Car),
    LastVisit = NOW_STR();
This will migrate all the documents in your bucket to your new schema. Note that UPDATE statements in Couchbase provide document-level atomicity, not statement-level atomicity. But since this update is idempotent (repeatable), you can run it multiple times if you run into errors. Note: this is similar to the last paragraph of David's answer above.
I want to populate a MySQL database with basic 'holiday resort' content, e.g. the name of the resort, a description, and the country. What methods can I use to populate it?
First, find the Web sites of one or more holiday companies who offer the destinations you're interested in. You're going to scrape these.
You don't say what language you're using for the implementation, but here is how you might do it in Perl:
Write a scraper using LWP::UserAgent and HTML::TreeBuilder to explore the site and extract the destination information.
Use DBI with the DBD::MySQL driver to insert the data into your database.
Where is the content? You can use the LOAD DATA INFILE syntax, or import from a CSV file, or write a script in Java or C++ or C# to parse the file(s) holding the data and populate the database via INSERT statements. You could hire an intern and make him/her type it all up. If you don't have it in a file, you could write a web spider to go and get crap from Google and stuff it into the database.
But I can't help you until you tell me where the data is.
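If the data does end up in a CSV file, the "parse and INSERT" route in Java looks roughly like the sketch below (the table, its columns and the file layout are assumptions, and the naive comma split should be replaced with a real CSV parser for messy data):

import java.io.BufferedReader;
import java.io.FileReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class LoadResorts {
    public static void main(String[] args) throws Exception {
        // Assumed table: resorts(name VARCHAR, country VARCHAR, description TEXT)
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:mysql://localhost/holidays", "user", "password");
             PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO resorts (name, country, description) VALUES (?, ?, ?)");
             BufferedReader in = new BufferedReader(new FileReader("resorts.csv"))) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] f = line.split(",", 3); // naive: assumes no commas inside fields
                ps.setString(1, f[0]);
                ps.setString(2, f[1]);
                ps.setString(3, f[2]);
                ps.addBatch();
            }
            ps.executeBatch(); // one round trip for the whole batch
        }
    }
}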
Ok, I know I'm late but I created an application to (create and/or) auto-populate tables. Below is a short demonstration but check it out here if you want.
I am building my first database driven website with Drupal and I have a few questions.
I am currently populating a Google Docs spreadsheet with all of the data I want to eventually be able to query from the website (after it's imported). Is this the best way to start?
If this is not the best way to start what would you recommend?
My plan is to populate the spreadsheet, then import it as a CSV into the MySQL DB via a CCK node.
I've seen two ways to do this.
http://drupal.org/node/133705 (importing data into CCK nodes)
http://drupal.org/node/237574 (Inserting data using spreadsheet/csv instead of SQL insert statements)
Basically, my question is: what is the best way to gather, and then import, data into Drupal?
Thanks in advance for any help, suggestions.
There's a comparison of the available modules at http://groups.drupal.org/node/21338
In the past when I've done this I've simply written code to do it on cron runs (see http://drupal.org/project/phorum for an example framework that you could strip down and build back up to do what you need).
If I were to do this now I would probably use the http://drupal.org/project/migrate module where the philosophy is "get it into MySQL, View the data, Import via GUI."
There is a very good module for this: Node import. It allows you to take your Google Docs spreadsheet and import it as a .csv file.
It's really easy to use: the module lets you map your .csv columns to the node fields you want them to go to, so you don't have to worry about putting your columns in a particular order. Also, if there is an error on some records, it will spit out a .csv with the failed records and what caused the errors, but will still import all the good records.
I have imported up to 3000 nodes with this method.