Hadoop: JUnit testing writing to and reading from HDFS

I have written some classes that write to and read from HDFS. Depending on the conditions under which they are instantiated, they create a specific path and file and write to it, or they open a previously created path and file and read from it. I have tested them by running a few Hadoop jobs, and they appear to function correctly.
However, I would like to be able to test this in the JUnit framework, and I have not found a good solution for testing reads and writes to HDFS in JUnit. I would appreciate any helpful advice on the matter. Thanks.

I haven't tried this myself yet, but I believe what you are looking for is org.apache.hadoop.hdfs.MiniDFSCluster (a minimal sketch follows after the links below).
It is in hadoop-test-.jar, NOT hadoop-core-.jar. I gather the Hadoop project uses it internally for its own tests.
Here it is:
http://svn.apache.org/viewvc/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/MiniDFSCluster.java?revision=1127823&view=markup&pathrev=1130381
I think there are plenty of uses of it in that same directory, but here is one:
http://svn.apache.org/viewvc/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/TestWriteRead.java?revision=1130381&view=markup&pathrev=1130381
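For reference, here is a minimal sketch of what such a test might look like, assuming the classic MiniDFSCluster constructor (the arguments differ between Hadoop versions) and made-up class and path names, so treat it as a starting point rather than a drop-in test:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.MiniDFSCluster;
    import org.junit.After;
    import org.junit.Assert;
    import org.junit.Before;
    import org.junit.Test;

    public class HdfsReadWriteTest {

        private MiniDFSCluster cluster;
        private FileSystem fs;

        @Before
        public void setUp() throws Exception {
            Configuration conf = new Configuration();
            // Start a single-DataNode, in-process HDFS cluster
            // (arguments: conf, numDataNodes, format, racks).
            cluster = new MiniDFSCluster(conf, 1, true, null);
            fs = cluster.getFileSystem();
        }

        @After
        public void tearDown() throws Exception {
            if (fs != null) fs.close();
            if (cluster != null) cluster.shutdown();
        }

        @Test
        public void writesThenReadsBack() throws Exception {
            Path file = new Path("/test/part-00000"); // hypothetical path
            byte[] payload = "hello hdfs".getBytes("UTF-8");

            FSDataOutputStream out = fs.create(file);
            out.write(payload);
            out.close();

            byte[] readBack = new byte[payload.length];
            FSDataInputStream in = fs.open(file);
            in.readFully(readBack);
            in.close();

            Assert.assertArrayEquals(payload, readBack);
        }
    }

The key point is that cluster.getFileSystem() hands back a FileSystem backed by a real in-process NameNode and DataNode, so the classes under test can be pointed at it without any mocking.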

Related

How to use two JSON configuration files for different Firebase projects?

We have two Firebase projects, one for development and another for production. We use Cloud Functions. In one Cloud Function, we need to use service-account-credentials.json. The problem is: how can I make this function take its data from service-account-credentials-dev.json when it runs against the development project, and from service-account-credentials-prod.json when it runs against production?
I know about environment configuration, but as I understand it, this feature does not allow you to load a JSON file for a particular project.
I found the answer to my question here. Doug Stevenson wrote: "There is not. You could write your own program to read the json and convert that into a series of firebase CLI commands that get the job done".

How to simulate a sequence during a DAO layer test

I have a Spring + Hibernate project, and I want to write unit test cases for the DAO layer.
Currently I am using HSQLDB's in-memory DB to test it (I referred to this).
In the project, IDs are provided by sequences. As I am using an in-memory DB, the sequences are not present during tests, so it was failing. As a workaround, I have created a different set of hbm files without sequences (and put them in the test resources folder). Is there a better way to handle this? Keeping duplicate hbm files does not look good to me. Any suggestion would be appreciated.
If you need a sequence in your test database, just create it (a minimal sketch follows after the related links below).
Also, make sure you have the correct database dialect configured with Hibernate.
Consult the following related questions for details:
Syntax issue with HSQL sequence: `NEXTVAL` instead of `NEXT VALUE`
SequenceGenerator problem with unit testing in Hsqldb/H2
"correct" way to select next sequence value in HSQLDB 2.0.0-rc8
Holding complete copies of the HBM files doesn't sound like a good idea (one of the principles I strongly believe in is the "DRY" principle).
The solution I'd suggest (unless there's a better option on the Hibernate side) is to edit the HBM file in the @Before method, in order to change just the differing bits.
I'm more of a .NET guy, and I know that in .NET there's a library called Fluent NHibernate that lets you generate (and, I assume, also edit existing) hbm files at runtime. I'm not sure if there's something similar in Java, but you can always fall back to manipulating the hbm as a plain XML file; a sketch of that approach follows below.
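As a rough sketch of the XML-manipulation fallback, the following rewrites the <generator> elements in a copy of an hbm.xml file before the tests run. The file names and the "increment" replacement strategy are illustrative assumptions, not something from the question:

    import java.io.File;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.NodeList;

    public class HbmRewriter {

        public static void replaceGenerators(File in, File out) throws Exception {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            // Skip fetching the Hibernate DTD over the network while parsing.
            dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            Document doc = dbf.newDocumentBuilder().parse(in);

            NodeList generators = doc.getElementsByTagName("generator");
            for (int i = 0; i < generators.getLength(); i++) {
                Element g = (Element) generators.item(i);
                g.setAttribute("class", "increment"); // replace the sequence strategy
                while (g.hasChildNodes()) {           // drop <param name="sequence"> children
                    g.removeChild(g.getFirstChild());
                }
            }

            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
        }
    }

This keeps a single authoritative set of mappings in the main source tree, with the test-only variant generated on the fly.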

Migrating subsets of production data back to dev

In our Rails app we sometimes have DB entries created by users that we'd like to make part of our dev environment, without exporting the whole table. So we'd like to be able to produce a special 'dev and testing' dump.
Any recommended best practices? mysqldump seems pretty cumbersome, and we'd like to pull in Rails associations as well, so maybe a rake task would make more sense.
Ideas?
You could use an ETL tool like Pentaho Kettle. Once you have the initial transformation set up the way you want, you could easily run it with different parameters in the future. This way you could also keep all your associations. I wrote a little blurb about Pentaho for another question here.
If you provide a rough schema I could probably help you get started on what your transformation would look like.
I had a similar need and ended up creating a plugin for it. It was developed for Rails 2.x and worked fine for me, but I haven't had much use for it lately.
The documentation is lacking, but it's pretty simple. You basically install the plugin and then have a to_sql method available on all your models. The options are explained in the README.
You can try it out and let me know if you have any issues, I'll try to help.
I'd go after it using a Rails runner script. That will allow your code to access the same things your Rails app would, including the database initializations. ActiveRecord will be able to take advantage of the model relationships you've defined.
Create some "transfer" tables in your production database and copy the desired data into them using the "runner" script. From there you could serialize the data, or use a dump tool, since you'll be dealing with a reduced number of records. Reverse the process in the development environment to move the data into its database.
I had a need to populate the database in one of my apps from remote web logs, and wrote a runner script that fired off periodically via cron, FTPed the data from my site, and inserted it.

Where can I find good examples or tutorials for sqlalchemy-migrate?

In this thread someone pointed me to sqlalchemy-migrate to help with a fast-changing web application using SQLAlchemy.
However, a do-it-yourself method was also recommended, consisting of manually writing CSV columns for the new database schema and finally importing them.
The problem is that I can't find real-world examples of sqlalchemy-migrate. The resources I have found at best describe adding a single column or renaming a column. The official documentation essentially describes the API, and it's hard to see how to use migrate effectively. From the docs I cannot even tell whether migrate could help with changing the database engine, from SQLite to MySQL for example, while the DIY solution would do the job.
I really want to see code that performs some non-trivial transformations of a database schema, proving that migrate is really a useful tool.
Where can I find good examples/tutorials for sqlalchemy-migrate?
Thanks!
Don't forget about Google Code Search when looking for real-world examples of code. For instance, the following search:
http://www.google.com/codesearch?hl=en&lr=&q=%22from+migrate+import%22+lang:python&sbtn=Search
will pull up a number of real migration scripts. It basically looks for Python files with "from migrate import" in the file.
Work through some of these and see if you can follow what they're doing.

How to collaborate on a MySQL schema?

I'm working with another dev and together we're building out a MySQL database. We each have our own local instance of MySQL 5.1 on our dev machines. We haven't yet found a way to make a local schema change (e.g., add a field and some values for that field) and then export some kind of script or diff file that the other can import. I've looked into Toad's and Navicat's synchronization features, but they seem oriented toward synchronizing two instances, not an instance and an intermediate file. We thought MySQL Workbench would be great for this, but its synchronization feature just seems plain broken. Any other ideas? How do you collaborate with others on a schema?
First of all, put your final SQL schema into version control, so you'll always have a version of it with all changes. It can be a plain SQL file. Every developer on the team can use it as a starting point to create his copy of the database. All changes must be applied to it. This will help you find conflicts faster.
I also used such a file to create a test database for running unit tests after each commit, so we were always sure the production code was working; a sketch of that idea follows below.
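As an illustration, here is a hypothetical helper that replays the version-controlled schema file into a scratch database before a test run. The file path, JDBC URL handling, and the naive semicolon splitting (which won't cope with stored procedures or semicolons inside string literals) are all assumptions of the sketch:

    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class TestDatabaseBuilder {

        public static void loadSchema(String jdbcUrl, String user, String password) throws Exception {
            // Read the schema file that lives in version control.
            String schema = new String(Files.readAllBytes(Paths.get("db/schema.sql")), "UTF-8");
            try (Connection conn = DriverManager.getConnection(jdbcUrl, user, password);
                 Statement stmt = conn.createStatement()) {
                // Execute each statement in turn (naive split on semicolons).
                for (String sql : schema.split(";")) {
                    if (!sql.trim().isEmpty()) {
                        stmt.execute(sql);
                    }
                }
            }
        }
    }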
Then you can use any migration tool to move changes between developers. Here is a similar question about this:
Mechanisms for tracking DB schema changes
If you're using PHP then look at Doctrine migrations.