I am having trouble writing my MapReduce functions.
I want to solve the following problem:
I have a JSON file with about one million JSON objects like this:
{"_id":3951,"title":"Two Family House (2000)","genres":["Drama"],"ratings":[{"userId":173,"rating":5},{"userId":195,"rating":5},{"userId":411,"rating":4},{"userId":593,"rating":2},{"userId":629,"rating":3},{"userId":830,"rating":3},{"userId":838,"rating":5},{"userId":850,"rating":4},{"userId":856,"rating":4},{"userId":862,"rating":5},{"userId":889,"rating":1},{"userId":928,"rating":5},{"userId":986,"rating":4},{"userId":1001,"rating":5},{"userId":1069,"rating":3},{"userId":1168,"rating":3},{"userId":1173,"rating":2},{"userId":1242,"rating":3},{"userId":1266,"rating":5},{"userId":1331,"rating":5},{"userId":1417,"rating":5},{"userId":1470,"rating":4},{"userId":1474,"rating":5},{"userId":1615,"rating":3},{"userId":1625,"rating":4},{"userId":1733,"rating":4},{"userId":1799,"rating":4},{"userId":1865,"rating":5},{"userId":1877,"rating":5},{"userId":1897,"rating":5},{"userId":1946,"rating":4},{"userId":2031,"rating":4},{"userId":2129,"rating":2},{"userId":2353,"rating":4},{"userId":2986,"rating":4},{"userId":3940,"rating":4},{"userId":3985,"rating":3},{"userId":4025,"rating":5},{"userId":4727,"rating":3},{"userId":5333,"rating":3}]}
and more...
Each JSON object is a movie that contains an array of ratings. I want to count all ratings in the JSON file.
I created a Maven project in IntelliJ with the dependencies for Hadoop and a JSON parser. My MapReduce class is this:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.json.simple.JSONArray;
import org.json.simple.JSONObject;
import org.json.simple.parser.JSONParser;
import org.json.simple.parser.ParseException;
import java.io.IOException;
import java.util.Iterator;
public class RatingCounter {
public static class RatingMapper extends Mapper<JSONObject, Text, Text, Text>{
private Text id = new Text();
private Text ratingAnzahl = new Text();
public void map(LongWritable key, Text value, OutputCollector<Text, Text> output, Reporter reporter) throws IOException{
JSONParser parser = new JSONParser();
try {
Object obj = parser.parse(value.toString());
JSONObject jsonObject = (JSONObject) obj;
String movieId = (String) jsonObject.get("_id");
int count = 0;
// loop array
JSONArray ratings = (JSONArray) jsonObject.get("ratings");
Iterator<String> iterator = ratings.iterator();
while (iterator.hasNext()) {
count++;
}
} catch (ParseException e) {
e.printStackTrace();
}
}
}
public static class RatingReducer extends Reducer<Text, Text, Text, Text> {
public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
Text resultValue = new Text();
int allRatings = 0;
while (values.hasNext()){
allRatings += Integer.parseInt(values.toString());
}
resultValue.set(""+allRatings);
context.write(key, resultValue);
}
}
public static void main (String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = new Job(conf, "ratings count");
job.setJarByClass(RatingCounter.class);
job.setMapperClass(RatingMapper.class);
job.setReducerClass(RatingReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
I have no idea how to write the functions in the Mapper and Reducer. Can someone help me, please?
I've made a few changes to your mapper and reducer.
First, your mapper never writes its output anywhere, and the type parameters you pass when extending the Mapper class are also wrong (arguably). With the default text input format, the input key of a mapper is the LongWritable (or Object) offset of the line, not a JSONObject. You can see the changes below:
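// requires: import org.apache.hadoop.io.IntWritable;
public static class RatingMapper extends Mapper<LongWritable, Text, Text, IntWritable>{
@Override
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
JSONParser parser = new JSONParser();
try {
JSONObject jsonObject = (JSONObject) parser.parse(value.toString());
// "_id" is a JSON number in the sample data, so convert it instead of casting to String
String movieId = String.valueOf(jsonObject.get("_id"));
JSONArray ratings = (JSONArray) jsonObject.get("ratings");
// emit (movieId, number of ratings for this movie)
context.write(new Text(movieId), new IntWritable(ratings.size()));
} catch (ParseException e) {
// map() cannot declare ParseException, so wrap it in an IOException
throw new IOException("Invalid JSON record", e);
}
}
}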
Notice that the output of map is written using context.write.
Now for your Reducer: some things change because of the changes I made in the mapper. Also, since the number of ratings is always an integer, you don't need to convert it to Text, call parseInt, and then convert it to Text again. Because the output value type is now IntWritable, main must call job.setOutputValueClass(IntWritable.class) instead of passing Text.class.
public static class RatingReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
@Override
public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
int allRatings = 0;
// Iterable has no hasNext(), so iterate with a for-each loop
for (IntWritable value : values) {
allRatings += value.get();
}
context.write(key, new IntWritable(allRatings));
}
}
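The division of work between the two steps can be tried out without a cluster. The following is only a minimal plain-Java sketch of the same idea, not Hadoop code: the class name RatingCountSketch and both method names are made up for illustration, a record is modeled as an id plus a list of ratings, and Java 9 or later is assumed for List.of and Map.entry.

```java
import java.util.List;
import java.util.Map;

public class RatingCountSketch {

    // "Map" step: one record in, one (movieId, ratingCount) pair out.
    // The JSON record is modeled here as an id plus its list of ratings.
    public static Map.Entry<String, Integer> mapRecord(long movieId, List<Integer> ratings) {
        return Map.entry(String.valueOf(movieId), ratings.size());
    }

    // "Reduce" step: sum every count that arrived for one key.
    public static int reduceCounts(Iterable<Integer> counts) {
        int total = 0;
        for (int count : counts) {
            total += count;
        }
        return total;
    }

    public static void main(String[] args) {
        Map.Entry<String, Integer> pair = mapRecord(3951L, List.of(5, 5, 4, 2));
        // Only one pair exists for this key, so the reduce step sums a single value.
        int total = reduceCounts(List.of(pair.getValue()));
        System.out.println(pair.getKey() + "\t" + total);
    }
}
```

If one grand total over the whole file is wanted rather than one count per movie, the same summing logic applies when every record is emitted under a single constant key.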
I am developing a generic editor for a JSON array using JavaFX.
The table should be laid out so that the columns are the keys and the rows hold the corresponding values. The number of keys can differ from one JSONObject to another.
JSON of the form:
"[{\"key1\": 1, \"key2\": 2}, {\"key1\": 3, \"key2\": 4}]"
It needs to look like this:
key1    key2
1       2
3       4
Do you have any suggestions?
This can be broken down into two parts.
Use GSON to parse a JSON Array to an Array of POJOs.
Display a List of Objects in a TableView.
Key Code
//Add data to the TableView!
String jsonString = "[{\"keyOne\":\"1\", \"keyTwo\":\"2\"}, {\"keyOne\":\"3\", \"keyTwo\":\"4\"}]";
Gson gson = new Gson();
Data[] dataList = gson.fromJson(jsonString, Data[].class);
ObservableList<Data> observableList = FXCollections.observableArrayList(dataList);
tableView.setItems(observableList);
Main
import com.google.gson.Gson;
import javafx.application.Application;
import javafx.beans.property.SimpleStringProperty;
import javafx.collections.FXCollections;
import javafx.collections.ObservableList;
import javafx.scene.Scene;
import javafx.scene.control.TableColumn;
import javafx.scene.control.TableView;
import javafx.stage.Stage;
import javafx.scene.layout.StackPane;
public class App extends Application {
public static void main(String[] args) {
launch(args);
}
@Override
public void start(Stage stage){
TableView<Data> tableView = new TableView<>();
TableColumn<Data, String> column1 = new TableColumn<>("Key One");
column1.setCellValueFactory((cdf) -> new SimpleStringProperty(cdf.getValue().getKeyOne()));
TableColumn<Data, String> column2 = new TableColumn<>("Key Two");
column2.setCellValueFactory((cdf) -> new SimpleStringProperty(cdf.getValue().getKeyTwo()));
tableView.getColumns().add(column1);
tableView.getColumns().add(column2);
//Add data to the TableView!
String jsonString = "[{\"keyOne\":\"1\", \"keyTwo\":\"2\"}, {\"keyOne\":\"3\", \"keyTwo\":\"4\"}]";
Gson gson = new Gson();
Data[] dataList = gson.fromJson(jsonString, Data[].class);
ObservableList<Data> observableList = FXCollections.observableArrayList(dataList);
tableView.setItems(observableList);
Scene scene = new Scene(new StackPane(tableView));
stage.setTitle("JavaFX 13");
stage.setScene(scene);
stage.show();
}
}
Data Class
/**
*
* @author sedj601
*/
public class Data {
private String keyOne;
private String keyTwo;
public Data(String keyOne, String keyTwo) {
this.keyOne = keyOne;
this.keyTwo = keyTwo;
}
public String getKeyOne() {
return keyOne;
}
public void setKeyOne(String keyOne) {
this.keyOne = keyOne;
}
public String getKeyTwo() {
return keyTwo;
}
public void setKeyTwo(String keyTwo) {
this.keyTwo = keyTwo;
}
@Override
public String toString() {
StringBuilder sb = new StringBuilder();
sb.append("Data{keyOne=").append(keyOne);
sb.append(", keyTwo=").append(keyTwo);
sb.append('}');
return sb.toString();
}
}
module-info.java
module com.mycompany.javafx_test_2 {
requires javafx.controls;
exports com.mycompany.javafx_test_2;
opens com.mycompany.javafx_test_2 to com.google.gson;
requires com.google.gson;
}
Using GSON version 2.8.9.
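The question also mentions that the number of keys can differ between objects, which a fixed POJO does not cover. One possible way to derive the column set is sketched below in plain Java. It assumes the rows have already been parsed into Map<String, Object> instances by whatever JSON library is in use; the class name ColumnKeysSketch and its two methods are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ColumnKeysSketch {

    // Collect every key that appears in any row, in first-seen order.
    public static Set<String> columnKeys(List<Map<String, Object>> rows) {
        Set<String> keys = new LinkedHashSet<>();
        for (Map<String, Object> row : rows) {
            keys.addAll(row.keySet());
        }
        return keys;
    }

    // Cell text for one row and one column; a row that lacks the key yields an empty cell.
    public static String cellText(Map<String, Object> row, String key) {
        Object value = row.get(key);
        return value == null ? "" : value.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> first = new LinkedHashMap<>();
        first.put("key1", 1);
        first.put("key2", 2);
        Map<String, Object> second = new LinkedHashMap<>();
        second.put("key1", 3);
        second.put("key3", 4);

        List<Map<String, Object>> rows = List.of(first, second);
        for (String key : columnKeys(rows)) {
            System.out.println(key + ": " + cellText(first, key) + " | " + cellText(second, key));
        }
    }
}
```

In a JavaFX table, each collected key could then become one TableColumn whose cell value factory looks the key up in the row map, in the same way the fixed columns above call getKeyOne() and getKeyTwo().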
I can read data from a CSV file with Spark, but I don't know how to group by a specific field. I want to group by 'Name'. This is my code:
public class readspark {
public static void main(String[] args) {
final ObjectMapper om = new ObjectMapper();
System.setProperty("hadoop.home.dir", "D:\\Task\\winutils-master\\hadoop-3.0.0");
SparkConf conf = new SparkConf()
.setMaster("local[3]")
.setAppName("Read Spark CSV")
.set("spark.driver.host", "localhost");
JavaSparkContext jsc = new JavaSparkContext(conf);
JavaRDD<String> lines = jsc.textFile("D:\\Task\\data.csv");
JavaRDD<DataModel> rdd = lines.map(new Function<String, DataModel>() {
@Override
public DataModel call(String s) throws Exception {
String[] dataArray = s.split(",");
DataModel dataModel = new DataModel();
dataModel.Name(dataArray[0]);
dataModel.ID(dataArray[1]);
dataModel.Addres(dataArray[2]);
dataModel.Salary(dataArray[3]);
return dataModel;
}
});
rdd.foreach(new VoidFunction<DataModel>() {
@Override
public void call(DataModel stringObjectMap) throws Exception {
System.out.println(om.writeValueAsString(stringObjectMap));
}
}
);
}
}
Spark provides the group by functionality directly:
JavaPairRDD<String, Iterable<DataModel>> groupedRdd = rdd.groupBy(dataModel -> dataModel.getName());
This returns a pair RDD where the key is the Name (as determined by the lambda passed to groupBy) and the value is the collection of data models with that name.
To change the grouping logic, all you need to do is pass a different lambda.
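The shape of the result, one key mapped to all records that share it, can be illustrated without Spark. The sketch below uses the standard Java streams collector Collectors.groupingBy as a plain-collections analogue; it is not Spark code. The Person class is a hypothetical, cut-down stand-in for the question's DataModel, and Java 9 or later is assumed for List.of.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupByNameSketch {

    // Hypothetical stand-in for the question's DataModel, reduced to two fields.
    public static final class Person {
        private final String name;
        private final String id;

        public Person(String name, String id) {
            this.name = name;
            this.id = id;
        }

        public String getName() { return name; }
        public String getId() { return id; }
    }

    // Group by whatever key the function returns; here, the name.
    public static Map<String, List<Person>> groupByName(List<Person> people) {
        return people.stream().collect(Collectors.groupingBy(Person::getName));
    }

    public static void main(String[] args) {
        List<Person> people = List.of(
                new Person("Ann", "1"), new Person("Bob", "2"), new Person("Ann", "3"));
        groupByName(people).forEach((name, group) ->
                System.out.println(name + " -> " + group.size() + " record(s)"));
    }
}
```

Swapping Person::getName for another getter changes the grouping key, which mirrors passing a different lambda to groupBy.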
I have a JSON File with key value pairs and I want to put the key value pairs into headers.
So when I have a file with content like this:
[{"msgId": "8600C5A3-C666-4E63-BFDB-52BCF557F938", "jiraId": "ERR002"}]
I want to create headers with the name msgId and with value "8600C5A3-C666-4E63-BFDB-52BCF557F938", etc.
Or as an alternative: Is there a way to store the headers of an exchange to a file to which later on the headers can be restored in another exchange?
Thank you.
EDIT: My fork of the example.
public void jsonToHeaders(String body, @Headers Map<String, String> headers) throws ParseException {
LOG.info("Starting JSON conversion...");
LOG.debug("Body input, content: {} ", body);
JSONParser parser = new JSONParser();
JSONObject jsonObject = (JSONObject) parser.parse(body);
if (jsonObject != null)
{
String stringValue = null;
String stringKey = null ;
final String NA_STRING = "*** N/A ***";
for (Object key : jsonObject.keySet()) {
stringKey = ((key == null) ? NA_STRING : (String)key);
stringValue = ((jsonObject.get(stringKey) == null) ? NA_STRING : jsonObject.get(stringKey).toString());
headers.put(stringKey, stringValue);
LOG.debug("Processing key {} with value {}", stringKey, stringValue);
}
LOG.info("Done processed JSON: {}", headers.toString());
}
}
You can use a bean for this case.
JSONToHeadersBean
package org.mybean;
import org.apache.camel.Headers;
import org.json.simple.JSONObject;
import org.json.simple.parser.JSONParser;
import org.json.simple.parser.ParseException;
import java.io.IOException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
public class JSONToHeadersBean {
public void jsonToHeaders(String body, @Headers Map<String, String> headers) throws ParseException {
JSONParser parser = new JSONParser();
JSONObject object = (JSONObject) parser.parse(body);
object.keySet().forEach(key -> headers.put(key.toString(), object.get(key).toString()));
}
//for test
public static void main(String[] args) throws ParseException {
String body = "{\"msgId\": \"8600C5A3-C666-4E63-BFDB-52BCF557F938\", \"jiraId\": \"ERR002\"}";
JSONParser parser = new JSONParser();
JSONObject object = (JSONObject) parser.parse(body);
final Map<String, String> headers = new HashMap<String, String>();
object.keySet().forEach(key -> headers.put(key.toString(), object.get(key).toString()));
System.out.println(headers);
}
}
Create the bean:
<bean class="org.mybean.JSONToHeadersBean" id="JSONToHeadersBean" name="JSONToHeadersBean"/>
Then you can use it in a route:
<bean method="jsonToHeaders" ref="JSONToHeadersBean"/>
As an alternative, you can parse the JSON into a Map and put it into a header:
.unmarshal().json(JsonLibrary.Jackson, java.util.Map.class)
.setHeader("params", simple("${body}"))
(requires camel-jackson dependency)
To access the stored values:
.log(LoggingLevel.INFO, "MsgId: ${header.params[msgId]}")
If I do this:
public static volatile ArrayList<Process> processes = new ArrayList<Process>(){
{
add(new Process("News Workflow", "This is the workflow for the news segment", "image"));
}
};
and then this:
String jsonResponse = gson.toJson(processes);
jsonResponse is null.
But if I do this:
public static volatile ArrayList<Process> processes = new ArrayList<Process>();
processes.add(new Process("nam", "description", "image"));
String jsonResponse = gson.toJson(processes);
Json response is:
[{"name":"nam","description":"description","image":"image"}]
Why is that?
I don't know what the problem with Gson is, but are you aware that you are creating a subclass of ArrayList here?
new ArrayList<Process>(){
{
add(new Process("News Workflow", "This is the workflow for the news segment", "image"));
}
};
You can check that with
System.out.println( processes.getClass().getName() );
It won't print java.util.ArrayList.
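The check can be run without Gson. The sketch below is only an illustration using String elements instead of the question's Process class; the class name DoubleBraceSketch and its method names are made up.

```java
import java.util.ArrayList;
import java.util.List;

public class DoubleBraceSketch {

    // Double-brace initialization: an anonymous subclass of ArrayList
    // whose instance initializer block calls add().
    public static List<String> doubleBrace() {
        return new ArrayList<String>() {
            {
                add("News Workflow");
            }
        };
    }

    // Plain construction followed by add(): the runtime class is ArrayList itself.
    public static List<String> plain() {
        List<String> list = new ArrayList<>();
        list.add("News Workflow");
        return list;
    }

    public static void main(String[] args) {
        System.out.println(doubleBrace().getClass().getName());
        System.out.println(doubleBrace().getClass().isAnonymousClass());
        System.out.println(plain().getClass().getName());
    }
}
```

Both lists hold the same elements and compare equal with equals(); only their runtime classes differ.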
I think you wanted to use a static initializer, like this:
public static volatile ArrayList<Process> processes = new ArrayList<Process>();
static {
processes.add( new Process( "News Workflow", "This is the workflow for the news segment", "image" ) );
};
It seems that the problem lies with anonymous classes; the same problem shows up here:
import com.google.gson.Gson;
import com.google.gson.GsonBuilder;
public class GSonAnonymTest {
interface Holder {
String get();
}
static Holder h = new Holder() {
String s = "value";
@Override
public String get() {
return s;
}
};
public static void main( final String[] args ) {
final GsonBuilder gb = new GsonBuilder();
final Gson gson = gb.create();
System.out.println( "h:" + gson.toJson( h ) );
System.out.println( h.get() );
}
}
UPD: look at Gson User Guide - Finer Points with Objects, last point "...anonymous classes, and local classes are ignored and not included in serialization or deserialization..."
First, I have a very simple Java bean which can be easily serialized to JSON:
class Node {
private String text;
// getter and setter
}
Node node = new Node();
node.setText("Hello");
String json = new Gson().toJson(node);
// json is { text: "Hello" }
Then, to let such beans carry some dynamic values, I created a "WithData" base class:
class WithData {
private Map<String, Object> map = new HashMap<String, Object>();
public void setData(String key, Object value) { map.put(key, value); }
public Object getData(String key) { return map.get(key); }
}
class Node extends WithData {
private String text;
// getter and setter
}
Now I can set more data to a node:
Node node = new Node();
node.setText("Hello");
node.setData("to", "The world");
But Gson ignores the "to" entry; the result is still { text: "Hello" }. I expect it to be: { text: "Hello", to: "The world" }
Is there any way to write a serializer for type WithData so that every class extending it serializes not only its own properties to JSON, but also the data in the map?
I tried to implement a custom serializer, but failed, because I don't know how to make Gson serialize the properties first and then the data in the map.
What I do now is create a custom serializer:
public static class NodeSerializer implements JsonSerializer<Node> {
public JsonElement serialize(Node src,
Type typeOfSrc, JsonSerializationContext context) {
JsonObject obj = new JsonObject();
obj.addProperty("id", src.id);
obj.addProperty("text", src.text);
obj.addProperty("leaf", src.leaf);
obj.addProperty("level", src.level);
obj.addProperty("parentId", src.parentId);
obj.addProperty("order", src.order);
Set<String> keys = src.getDataKeys();
if (keys != null) {
for (String key : keys) {
obj.add(key, context.serialize(src.getData(key)));
}
}
return obj;
};
}
Then use GsonBuilder to convert it:
Gson gson = new GsonBuilder().
registerTypeAdapter(Node.class, new NodeSerializer()).create();
Tree tree = new Tree();
tree.addNode(node1);
tree.addNode(node2);
gson.toJson(tree);
Then the nodes in the tree are converted as I expected. The only tedious part is that I need to create a special Gson each time.
Actually, you should expect Node:WithData to serialize as
{
"text": "Hello",
"map": {
"to": "the world"
}
}
(that's with "pretty print" turned on)
I was able to get that serialization when I tried your example. Here is my exact code
import com.google.gson.Gson;
import com.google.gson.GsonBuilder;
import java.net.MalformedURLException;
import java.util.HashMap;
import java.util.Map;
public class Class1 {
public static void main(String[] args) throws MalformedURLException {
GsonBuilder gb = new GsonBuilder();
Gson g = gb.setPrettyPrinting().create();
Node n = new Node();
n.setText("Hello");
n.setData("to", "the world");
System.out.println(g.toJson(n));
}
private static class WithData {
private Map<String, Object> map = new HashMap<String, Object>();
public void setData(String key, Object value) { map.put(key, value); }
public Object getData(String key) { return map.get(key); }
}
private static class Node extends WithData {
private String text;
public Node() { }
public String getText() {return text;}
public void setText(String text) {this.text = text;}
}
}
I was using the JDK (javac) to compile - that is important because other compilers (those included with some IDEs) may remove the information on which Gson relies as part of their optimization or obfuscation process.
Here are the compilation and execution commands I used:
"C:\Program Files\Java\jdk1.6.0_24\bin\javac.exe" -classpath gson-2.0.jar Class1.java
"C:\Program Files\Java\jdk1.6.0_24\bin\java.exe" -classpath .;gson-2.0.jar Class1
For the purposes of this test, I put the Gson jar file in the same folder as the test class file.
Note that I'm using Gson 2.0; 1.x may behave differently.
Your JDK may be installed in a different location than mine, so if you use those commands, be sure to adjust the path to your JDK as appropriate.