Hadoop configuration using both MongoDB and MySQL

Can anyone give sample code for going from MongoDB to an RDBMS? I have already tried fetching data from MongoDB and storing the output back in MongoDB, and for that I know how to do the Hadoop configuration in a Java job.
I want to know three things:
Which Hadoop version supports both MongoDB and an RDBMS?
Is it possible to use multiple collections as input? If so, how can we do that?
I tried a MongoDB query in Hadoop and it works fine, but when I define a sort or limit it does not work properly; it does not even fetch data from MongoDB.

1. Which Hadoop version supports both MongoDB and an RDBMS?
I believe that all versions of Hadoop supporting MongoDB also support RDBMS (the RDBMS implementations predate MongoDB).
For supported versions of Hadoop to use with MongoDB, see: Building the Adapter. Check the version information, as some Hadoop versions do not support the Streaming Connector (relevant if you want to write your jobs in non-JVM languages such as Python).
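As a rough sketch of the "read from MongoDB, write to an RDBMS" wiring the original question asks for, the skeleton below combines the connector's MongoInputFormat with Hadoop's standard DBOutputFormat in one job. Treat it as a sketch only: the URIs, table, column names, and the mapper/reducer classes are placeholders, not a tested example.

// Hedged sketch: read from MongoDB, write to MySQL in a single MapReduce job.
// Connection strings, table/column names, and the mapper/reducer are placeholders.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DBOutputFormat;
import com.mongodb.hadoop.MongoInputFormat;

public class MongoToMySqlJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Input side: the MongoDB collection to read (mongo-hadoop connector property)
        conf.set("mongo.input.uri", "mongodb://localhost:27017/mydb.mycollection");

        // Output side: JDBC connection details used by DBOutputFormat
        DBConfiguration.configureDB(conf,
                "com.mysql.jdbc.Driver",
                "jdbc:mysql://localhost:3306/mydb",
                "user", "password");

        Job job = new Job(conf, "mongo-to-mysql");  // Job.getInstance(conf, ...) on newer Hadoop
        job.setJarByClass(MongoToMySqlJob.class);

        job.setInputFormatClass(MongoInputFormat.class);
        job.setOutputFormatClass(DBOutputFormat.class);

        // Target table and column list (placeholders); the reduce output key class
        // must implement DBWritable so DBOutputFormat can build the INSERT statements.
        DBOutputFormat.setOutput(job, "mytable", "id", "value");

        // Plug in your own classes here, e.g.:
        // job.setMapperClass(MyMongoMapper.class);   // Mapper<Object, BSONObject, ...>
        // job.setReducerClass(MyJdbcReducer.class);  // emits (your DBWritable, NullWritable)

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}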
2. Is it possible to use multiple collections as input? If so, how can we do that?
MongoDB Hadoop Connector v1.0.0 does not support multiple collections as input, but there are a few folks in the community working on this (see: Feature/multiple inputs).
3. I tried a MongoDB query in Hadoop and it works fine, but when I define a sort or limit it does not work properly; it does not even fetch data from MongoDB.
Can you provide an example of how/where you provided these options? Are you referring to the mongo.input.sort and mongo.input.limit properties?
You may want to try enabling the Database Profiler in MongoDB to confirm the queries are being sent:
db.setProfilingLevel(2)
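If it helps, here is a minimal sketch of how those properties are typically passed in the Java job configuration; the query, sort field, and limit value are assumptions, and the lines would sit alongside mongo.input.uri in the job setup shown earlier. Note also that these options interact with the connector's input splitting, so the behaviour can differ from running the identical query in the mongo shell.

// Hedged example: query, sort and limit are passed to the connector as string properties.
Configuration conf = new Configuration();
conf.set("mongo.input.uri",   "mongodb://localhost:27017/mydb.mycollection");
conf.set("mongo.input.query", "{\"status\": \"active\"}");  // filter documents
conf.set("mongo.input.sort",  "{\"created_at\": -1}");      // sort descending by created_at (assumed field)
conf.set("mongo.input.limit", "100");                       // cap the number of input documents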

Related

Nifi for database migration

Why would NiFi be a good fit for database migration if all it does is send the same data over and over again? (When I tried to extract data from a database and put it into a JSON file, I saw multiple entries of the same tuple.) Wouldn't that be a waste of computing resources?
If I just want to migrate the database once and then occasionally update only the changed columns, is NiFi still a good tool to use?
It all depends on which database you want to migrate, and from/to which environments. Is it a large enterprise Oracle DB you want to migrate into Hadoop? Look into Sqoop (https://sqoop.apache.org/). I would recommend Sqoop for doing one-time imports of large databases into Hadoop.
You can use NiFi to do an import as well, using processors such as ExecuteSQL, QueryDatabaseTable, GenerateTableFetch... They all work with JDBC connectors, so if your database supports JDBC, you could opt for this as well.
If you want to get incremental changes, you could use the QueryDatabaseTable processor and its Maximum-Value Column property. Matt Burgess has an article explaining how to put this in place at https://community.hortonworks.com/articles/51902/incremental-fetch-in-nifi-with-querydatabasetable.html.
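To make the incremental idea concrete, here is a rough, hand-rolled Java sketch of what the Maximum-Value Column strategy boils down to. This is not NiFi code, and the table and column names are invented; NiFi persists the watermark for you in the processor's state, so you would not normally manage it yourself.

// Hedged sketch of the idea behind QueryDatabaseTable's Maximum-Value Column:
// remember the highest value seen for a column and only fetch newer rows on the next run.
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class IncrementalFetch {
    private long lastMaxId = 0;  // the "watermark"; NiFi keeps this in processor state

    public void fetchNewRows(Connection conn) throws SQLException {
        String sql = "SELECT id, payload FROM source_table WHERE id > ? ORDER BY id";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, lastMaxId);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    long id = rs.getLong("id");
                    // ... hand the row to the next step (write a flow file, a JSON record, etc.) ...
                    lastMaxId = Math.max(lastMaxId, id);  // advance the watermark
                }
            }
        }
    }
}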

Using Erwin to create DDL for Amazon Redshift

Has anyone been successful in creating DDL using Erwin for Amazon Redshift? If not, does anyone know of a way to convert, say a MySQL DDL from Erwin, to an Amazon Redshift compliant DDL?
I understand that Redshift is based on PostgreSQL 8.0.2. However, there are numerous PostgreSQL features that are not compatible with Redshift. So, if I use a tool to convert MySQL DDL to PostgreSQL DDL and try to execute it against Redshift, I always run into issues.
I would appreciate any help.
One approach that works (with limited features) is to forward engineer the erwin model into an ODBC 3.x compliant schema, i.e.
Select Target Database (Actions ---> Target Database) as ODBC/Generic with version as 3.0.
The reason this works is that ODBC/Generic SQL can be executed on Redshift without any changes.
NOTE: Features like Identity and Encode may need manipulation of the FET template or more. However, just setting the target database to ODBC may suffice in general.
Update: this link suggests that newer versions of the erwin Data Modeler may have further support for Redshift.

DB2's JSON capabilities

I'm writing an application in MeteorJS, which requires use of MongoDB. However, I'd really like to use an SQL database, as my data is highly relational, and I could make use of features like views.
I see that IBM has a Mongo wireline driver which natively emulates Mongo, i.e. you can create a frontend that thinks it's communicating with a Mongo database while, in reality, it's backed by an SQL database. This, to me, seems ideal, at least until Meteor supports a native relational backend.
Both DB2 and Informix have Mongo drivers, and my question is this: have any of you used the JSON and Mongo driver capabilities of either of these DBs and are there limitations or factors to consider? This is a greenfield project so there's no legacy database that needs to be supported.
I'd prefer to use DB2, as Informix appears to be a legacy product and I'm hesitant to start a brand new project with technology I'll have trouble finding trained staff for. Ironically, however, it seems that Informix has deeper support for JSON, including full two-way conversion of JSON to relational tables and back, indexing, etc. (even sharding and replication)
My reading of DB2 is that currently it only supports JSON as an additional JSON/BSON field into which all JSON data will go, but without automatic two-way access to the other relational columns. Is this correct? Anyone using DB2's JSON features?
I suspect that in future versions IBM will put better JSON support into DB2 (much as XML support was gradually integrated), but I need something now. So my options for now, as I see them, are:
Use Informix with its better JSON support.
Use DB2 with less JSON support (unless I'm mistaken), and wait for new versions.
Use MongoDB for now and wait for Meteor to support a relational DB.
Any other options?

Using Cassandra 3.x abilities in lower versions

We did some R&D for our new project, which uses Cassandra as its database. The research shows that we cannot use Cassandra 3.x for importing/exporting data using SSIS, so we have to use lower versions. (What's your opinion?)
On the other hand, in some cases we need materialized views, SASI secondary indexes, and other functionality and capabilities of the newer versions.
Is there any alternative approach that would help us use both versions together and share data between them? Is that a good solution, or should we sacrifice the benefits of the new versions in order to transfer the data?
we cannot use Cassandra 3.x for importing/exporting data using SSIS
Why do you need SSIS for data import/export? Have you considered using Apache Spark for this purpose? With Spark, you can migrate to Cassandra 3.x.
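As a minimal sketch of that approach, assuming the DataStax spark-cassandra-connector is on the classpath and the target Cassandra table already exists (the hosts, keyspace, table, and credentials below are placeholders):

// Hedged sketch: copy a JDBC table into Cassandra 3.x with Apache Spark,
// as an alternative to SSIS. Adjust the JDBC URL/driver to your source database.
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class JdbcToCassandra {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("jdbc-to-cassandra")
                .config("spark.cassandra.connection.host", "cassandra-host")
                .getOrCreate();

        // Read the source table over JDBC (placeholder SQL Server URL shown)
        Dataset<Row> source = spark.read()
                .format("jdbc")
                .option("url", "jdbc:sqlserver://sql-host;databaseName=mydb")
                .option("dbtable", "dbo.my_table")
                .option("user", "user")
                .option("password", "password")
                .load();

        // Write into an existing Cassandra table whose columns match the source schema
        source.write()
                .format("org.apache.spark.sql.cassandra")
                .option("keyspace", "my_keyspace")
                .option("table", "my_table")
                .mode(SaveMode.Append)
                .save();

        spark.stop();
    }
}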

General Questions about MySQL and MySQLite

I am going to be writing to a MySQLite database file using Perl's DBD::SQLite module, and I am wondering if it is possible for this file to be read by any distribution of MySQL. Is there a better way to create a simple MySQL database (using Perl)?
If it means anything, I'm only going to be using the database to store key-value pairs based on unique ID numbers for the keys. I tried BerkeleyDB but there is little support for it on Perl and I could not get it to work correctly in the past on certain versions of Windows.
Edit: I am aware that BerkeleyDB is a better way to do this, but when I was writing scripts for it, most of the methods were still marked TODO, and I've had mysterious bugs on Windows Server 2003 using the same airtight code that ran for 2 weeks straight on my Win7 machine at home.
MySQL and SQLite are completely separate database systems. There is no such thing as MySQLite. To the best of my knowledge, MySQL cannot read SQLite databases.
If all you really want is a key-value store, perhaps look at Redis: http://code.google.com/p/redis/
I use Perl's DBI module which I can use to read databases using either MySQL or SQLite. All you need is the correct driver. In fact, if you write your program correctly, the backend database (either SQLite or MySql) is irrelevant. Your program will work with either one.
However, you can't use a SQLite database and then treat it as a MySQL database. They're two different creatures. Your program can be database agnostic, but once you choose a database, you can't switch back and forth. It'd be like opening an Oracle database as a MySQL database.
See This posting on Perl Monks for more info.
BerkeleyDB is well supported by Perl. You have a choice between the older DB_File and the more fully featured BerkeleyDB module.
But there are tons of choices. If you don't want to have to run a separate server process, use DBI and DBD::SQLite or BerkeleyDB or any of the AnyDBM_File modules. For a simple server-based key-value store, there's redis or the older memcached.