MapReduce with multiple mappers and reducers - hadoop2

I am trying to implement a job flow with multiple mappers and reducers. Here is my main method:
public static void main(String[] args) {
    // Create a new configuration
    Configuration configuration = new Configuration();
    Path out = new Path(args[1]);
    try {
        // This is the first job, to get the number of providers per state
        Job numberOfProvidersPerStateJob = Job.getInstance(configuration, "Total number of Providers per state");
        // Set the jar file, mapper and reducer classes
        numberOfProvidersPerStateJob.setJarByClass(ProviderCount.class);
        numberOfProvidersPerStateJob.setMapperClass(MapForProviderCount.class);
        numberOfProvidersPerStateJob.setReducerClass(ReduceForProviderCount.class);
        numberOfProvidersPerStateJob.setOutputKeyClass(Text.class);
        numberOfProvidersPerStateJob.setOutputValueClass(IntWritable.class);
        // Provide the input and output paths; these are needed when running the jar in Hadoop
        FileInputFormat.addInputPath(numberOfProvidersPerStateJob, new Path(args[0]));
        FileOutputFormat.setOutputPath(numberOfProvidersPerStateJob, new Path(out, "out1"));
        if (!numberOfProvidersPerStateJob.waitForCompletion(true)) {
            System.exit(1);
        }

        // Job 2: get the state with the maximum number of providers
        Job maxJobProviderState = Job.getInstance(configuration, "State With Max Job providers");
        // Set the jar file, mapper and reducer classes
        maxJobProviderState.setJarByClass(ProviderCount.class);
        maxJobProviderState.setMapperClass(MapForMaxProvider.class);
        maxJobProviderState.setReducerClass(ReducerForMaxProvider.class);
        maxJobProviderState.setOutputKeyClass(IntWritable.class);
        maxJobProviderState.setOutputValueClass(Text.class);
        // Job 2 reads the output of job 1
        FileInputFormat.addInputPath(maxJobProviderState, new Path(out, "out1"));
        FileOutputFormat.setOutputPath(maxJobProviderState, new Path(out, "out2"));
        // Exit when the results are ready
        System.exit(maxJobProviderState.waitForCompletion(true) ? 0 : 1);
    } catch (IOException | InterruptedException | ClassNotFoundException e) {
        e.printStackTrace();
        System.exit(1);
    }
}
The problem is that whenever I run it, the final output comes from the second mapper class and not from the second reducer class. It is as if my second reducer class is being ignored.

You can chain mappers using ChainMapper (org.apache.hadoop.mapreduce.lib.chain.ChainMapper) and chain the reducer with further mappers using ChainReducer (org.apache.hadoop.mapreduce.lib.chain.ChainReducer); that will resolve your issue.
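As a minimal sketch of that wiring, reusing the class names from the question (the key/value types shown are assumptions): ChainMapper and ChainReducer compose stages inside a single job following the pattern [MAP+ / REDUCE MAP*], so there is still only one Reducer per job, and a second "max provider" step would have to be expressed as a Mapper stage.

Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "Providers per state (chained)");
job.setJarByClass(ProviderCount.class);

// First map stage reads the raw input (assumed LongWritable/Text from TextInputFormat).
ChainMapper.addMapper(job, MapForProviderCount.class,
        LongWritable.class, Text.class, Text.class, IntWritable.class,
        new Configuration(false));

// The single reduce stage of the job.
ChainReducer.setReducer(job, ReduceForProviderCount.class,
        Text.class, IntWritable.class, Text.class, IntWritable.class,
        new Configuration(false));

// Further map stages may follow the reducer, e.g. the "max provider" logic reworked
// as a Mapper that consumes the reducer's Text/IntWritable output directly.
ChainReducer.addMapper(job, MapForMaxProvider.class,
        Text.class, IntWritable.class, IntWritable.class, Text.class,
        new Configuration(false));

FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);

Note that this runs as one job, so the intermediate out1 directory from the two-job design is no longer produced.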

Related

How can I use an Excel file as test data correctly?

How can I best use an Excel file as input for an xUnit test? Note that I do not want to use the data inside the Excel file, but the file itself.
Let's say I have a UnitTests project where I want to place some Excel files that I need to use in my tests:
[Fact]
public void Constructor_ShouldReadExcelFile()
{
    var mapping = new ExcelMapping("excelfiles/test1.xlsx");
    Assert.True(mapping.Valid);
}
But when running that, the current working directory is set to the bin\Debug\net7.0 directory, so I need to use a relative path:
[Fact]
public void Constructor_ShouldReadExcelFile()
{
    var mapping = new ExcelMapping("../../../excelfiles/test1.xlsx");
    Assert.True(mapping.Valid);
}
This will work, but is this the "right" way?
Your solution looks fine to me.
I often need to retrieve test data files for unit tests and generally proceed as follows. The test data are also under version control but in a different folder than the unit tests. In my unit test class, I define a relative path for the test data and make a member for the absolute path:
const string testDataRelativePath = @"..\..\..\..\excelfiles\";
string testDataFolderAbsolutePath;
The relative path is relative to the project folder where the unit test dll is output.
In the constructor of the test class I define a value for the absolute path.
using System.IO;
using System.Reflection;

public class MyTestClass
{
    public MyTestClass()
    {
        string projectDir = getProjectDir();
        testDataFolderAbsolutePath = Path.GetFullPath(Path.Combine(projectDir, testDataRelativePath));
    }

    internal static string getProjectDir()
    {
        Assembly assembly = Assembly.GetExecutingAssembly();
        return directoryPathNameFromAssemblyCodeBase(assembly);
    }

    internal static string directoryPathNameFromAssemblyCodeBase(Assembly assembly)
    {
        Uri codeBaseUrl = new Uri(assembly.CodeBase);
        string codeBasePath = Uri.UnescapeDataString(codeBaseUrl.AbsolutePath);
        return Path.GetDirectoryName(codeBasePath);
    }

    // ... Tests ...
}
In the test itself, I then do something like this:
string excelFilePath = Path.Combine(testDataFolderAbsolutePath, "test1.xlsx");
I find that this works more reliably across the variety of systems on which the tests run.

Autodesk Java API response mapping

We are using the forge-api-java-client. There is an issue in the Model Derivatives getManifest call.
The response fails to map when a single Message string is returned instead of the expected string array.
I have switched to using a local build of the jar, with a change in Message.java that adds an overloaded setMessage method:
// Wraps a single message string in a list and delegates to the existing list-based setter.
public void setMessage(String message) {
    List<String> messages = new ArrayList<>();
    messages.add(message);
    setMessage(messages);
}
Could this change be merged into the project?
We'll check it, but as of today that package is in maintenance mode only. You are welcome to submit a PR.

JSON to CSV conversion on HDFS

I am trying to convert a JSON file into CSV.
I have Java code which does this perfectly on the Unix file system and the local file system.
I have written the main class below to perform this conversion on HDFS.
public class ClassMain {
    public static void main(String[] args) throws IOException {
        String uri = args[1];
        String uri1 = args[2];
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        FSDataInputStream in = null;
        FSDataOutputStream out = fs.create(new Path(uri1));
        try {
            in = fs.open(new Path(uri));
            JsonToCSV toCSV = new JsonToCSV(uri);
            toCSV.json2Sheet().write2csv(uri1);
            IOUtils.copyBytes(in, out, 4096, false);
        }
        finally {
            IOUtils.closeStream(in);
            IOUtils.closeStream(out);
        }
    }
}
json2Sheet and write2csv are the methods which perform the conversion and the write operation.
I am running this jar using the command below:
hadoop jar json-csv-hdfs.jar com.nishant.ClassMain /nishant/large.json /nishant/output
The problem is that it does not write anything at /nishant/output; it creates a 0-sized /nishant/output file.
Maybe the usage of copyBytes is not a good idea here.
How can I achieve this on HDFS when it works fine on the Unix FS and the local FS?
Note that here I am trying to convert a JSON file to CSV, not to map JSON objects to their values.
FileSystem needs only one configuration key to successfully connect to HDFS.
conf.set(key, "hdfs://host:port"); // where key="fs.default.name"|"fs.defaultFS"
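A minimal sketch of that, assuming a NameNode at namenode-host:8020 (adjust to your cluster):

Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://namenode-host:8020"); // older releases use "fs.default.name"
FileSystem fs = FileSystem.get(conf); // paths such as /nishant/output now resolve against HDFS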

How to read a CSV file from HDFS?

I have my data in a CSV file. I want to read the CSV file which is in HDFS.
Can anyone help me with the code?
I'm new to Hadoop. Thanks in advance.
The classes required for this are FileSystem, FSDataInputStream and Path. The client should look something like this:
public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    conf.addResource(new Path("/hadoop/projects/hadoop-1.0.4/conf/core-site.xml"));
    conf.addResource(new Path("/hadoop/projects/hadoop-1.0.4/conf/hdfs-site.xml"));

    FileSystem fs = FileSystem.get(conf);
    FSDataInputStream inputStream = fs.open(new Path("/path/to/input/file"));
    System.out.println(inputStream.readChar());
}
FSDataInputStream has several read methods. Choose the one which suits your needs.
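If the goal is to read the CSV line by line, a hedged sketch like this (reusing the fs object from the snippet above; the path, the UTF-8 charset and the java.io/java.nio imports are assumptions) is usually more convenient than readChar():

try (FSDataInputStream in = fs.open(new Path("/path/to/input/file"));
     BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
    String line;
    while ((line = reader.readLine()) != null) {
        String[] fields = line.split(","); // naive split; quoted fields need a real CSV parser
        // process the fields here
    }
}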
If it is MR, it's even easier:
public static class YourMapper extends
        Mapper<LongWritable, Text, Your_Wish, Your_Wish> {

    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // The framework does the reading for you...
        String line = value.toString(); // line contains one line of your csv file
        // do your processing here
        // ....................
        // ....................
        context.write(Your_Wish, Your_Wish);
    }
}
If you want to use MapReduce, you can use TextInputFormat to read line by line and parse each line in the mapper's map function; see the driver sketch below.
The other option is to develop (or find an existing) CSV input format for reading the data from the file.
There is an old tutorial here http://hadoop.apache.org/docs/r0.18.3/mapred_tutorial.html but the logic is the same in newer versions.
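Here is a hedged driver sketch for that MapReduce route; the class names CsvDriver and YourMapper, and the paths, are placeholders:

Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "csv parsing");
job.setJarByClass(CsvDriver.class);
job.setMapperClass(YourMapper.class);
job.setInputFormatClass(TextInputFormat.class); // one map() call per line of the CSV
FileInputFormat.addInputPath(job, new Path("/path/to/input"));
FileOutputFormat.setOutputPath(job, new Path("/path/to/output"));
System.exit(job.waitForCompletion(true) ? 0 : 1);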
If you are using a single process for reading data from the file, it is the same as reading a file from any other file system. There is a nice example here https://sites.google.com/site/hadoopandhive/home/hadoop-how-to-read-a-file-from-hdfs
HTH

OSGi Declarative Services and Config Admin

I'm writing a bundle that uses Declarative Services. For configuration I'm using properties in the DS declaration. Those properties can normally be changed through Config Admin, but they are not persisted: after a container restart, the component has its default values again.
I'm using Config Admin like this:
Configuration c = configurationAdmin.getConfiguration(UserAgent.SERVICE_PID, null);
System.out.println(c.getProperties()); // every time is null!
Dictionary props = new Hashtable();
props.put(UserAgent.PROPERTY_PORT, 5555);
c.update(props);
and in the component I have:
// ...
@Modified
public void updated(ComponentContext context) {
    config = context.getProperties();
    init();
}

@Activate
protected void activate(ComponentContext context) {
    config = context.getProperties();
    init();
}
// ...
I'm using Felix; the properties file is stored in the cache:
service.bundleLocation="file:bundles/cz.b2m.osgi.phonus.core_1.0.0.SNAPSHOT.jar"
service.pid="cz.b2m.osgi.phonus.sip"
port=I"5555"
But after a restart it isn't loaded. What am I doing wrong? Thanks for any tips.
The problem was in Pax Runner, which on every restart (clean) erased the data folder of the Config Admin bundle.
To make sure that Pax Runner does not clear the data, you can use the --usePersistedState=true flag.