How to uncompress gzipped JSON with Apache Spark (Java)

I have a sequence file in which each value is a JSON file compressed with gzip. My problem: how do I read these gzipped JSON files with Apache Spark?
Here is my code:
JavaSparkContext jsc = new JavaSparkContext("local", "sequencefile");
JavaPairRDD<String, byte[]> file = jsc.sequenceFile("file:\\E:\\part-00004", String.class, byte[].class);
JavaRDD<String> map = file.map(new Function<Tuple2<String, byte[]>, String>() {
    public String call(Tuple2<String, byte[]> stringTuple2) throws Exception {
        byte[] uncompressed = uncompress(stringTuple2._2);
        // byte[].toString() only prints the array reference; decode the bytes instead
        return new String(uncompressed, "UTF-8");
    }
});
But this code does not work.
Have a nice day
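The uncompress helper is not shown in the question; a minimal sketch of such a method, assuming each value is a plain gzip stream, might look like this:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;

// Hypothetical helper matching the uncompress(...) call above:
// inflates a gzip-compressed byte array back into the raw JSON bytes.
public static byte[] uncompress(byte[] compressed) throws IOException {
    try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed));
         ByteArrayOutputStream out = new ByteArrayOutputStream()) {
        byte[] buffer = new byte[4096];
        int read;
        while ((read = gzip.read(buffer)) != -1) {
            out.write(buffer, 0, read);
        }
        return out.toByteArray();
    }
}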

While creating the Spark context, use the constructor that also takes a Spark configuration as its third parameter.
Set the configuration value for the compression codec key, "io.compression.codecs", as below:
"org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec"
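A sketch of what that could look like (assumption: the "spark.hadoop." prefix is used to forward the property to the underlying Hadoop configuration; alternatively, jsc.hadoopConfiguration().set(...) can be called after the context is created):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// Forward the Hadoop codec list through the Spark configuration
SparkConf sparkConf = new SparkConf()
        .set("spark.hadoop.io.compression.codecs",
             "org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec");
JavaSparkContext jsc = new JavaSparkContext("local", "sequencefile", sparkConf);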

Related

JSON to CSV conversion on HDFS

I am trying to convert a JSON file into CSV.
I have Java code which does this perfectly on a Unix file system and on the local file system.
I have written the main class below to perform this conversion on HDFS.
public class ClassMain {
    public static void main(String[] args) throws IOException {
        String uri = args[1];
        String uri1 = args[2];
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        FSDataInputStream in = null;
        FSDataOutputStream out = fs.create(new Path(uri1));
        try {
            in = fs.open(new Path(uri));
            JsonToCSV toCSV = new JsonToCSV(uri);
            toCSV.json2Sheet().write2csv(uri1);
            IOUtils.copyBytes(in, out, 4096, false);
        }
        finally {
            IOUtils.closeStream(in);
            IOUtils.closeStream(out);
        }
    }
}
json2Sheet and write2csv are the methods which perform the conversion and the write operation.
I am running this jar using the command below:
hadoop jar json-csv-hdfs.jar com.nishant.ClassMain /nishant/large.json /nishant/output
The problem is, it does not write anything to /nishant/output; it creates a zero-sized /nishant/output file.
Maybe the usage of copyBytes is not a good idea here.
How can I achieve this on HDFS when it works fine on the Unix FS and the local FS?
Here I am trying to convert a JSON file to CSV, not trying to map JSON objects to their values.
FileSystem needs only one configuration key to successfully connect to HDFS.
conf.set(key, "hdfs://host:port"); // where key="fs.default.name"|"fs.defaultFS"
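For example (the host and port below are placeholders; "fs.defaultFS" is the current key name and "fs.default.name" the deprecated alias):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

Configuration conf = new Configuration();
// Point the client at the NameNode before calling FileSystem.get
conf.set("fs.defaultFS", "hdfs://namenode-host:8020");
FileSystem fs = FileSystem.get(conf);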

Convert CSVWriter to InputStream

I have the following code where I want to write a list of objects to a CSV file, for which I have defined the attributes and items. I want to convert the writer into an input stream so I can read the values back and perform some computations. I also want to store the resulting file in a datastore like Amazon S3.
How do I convert the writer into an input stream? I see no API for this. Can I read the file somehow, like CSVReader reader = new CSVReader(csvWriter)?
public <T> CSVWriter convertModelToObject(List<T> attributes, final Class<T> classType) throws IOException {
    CSVWriter writer = new CSVWriter(new FileWriter("yourfile.csv"), com.opencsv.CSVParser.DEFAULT_SEPARATOR,
            com.opencsv.CSVParser.DEFAULT_QUOTE_CHARACTER);
    BeanToCsv<T> bean = new BeanToCsv<>();
    HeaderColumnNameMappingStrategy<T> mappingStrategy = new HeaderColumnNameMappingStrategy<>();
    mappingStrategy.setType(classType);
    bean.write(mappingStrategy, writer, attributes);
    return writer;
}
Consider replacing the FileWriter you are using with a PipedWriter, created with a PipedReader that you then use when creating the CSVReader. You can find an example of PipedReader/PipedWriter here.
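A rough sketch of that idea (assuming the opencsv CSVWriter/CSVReader classes; note that piped streams block once their internal buffer fills, so the producer and consumer should run on separate threads):

import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;
import com.opencsv.CSVReader;
import com.opencsv.CSVWriter;

public static void pipeExample() throws Exception {
    PipedReader pipeIn = new PipedReader();
    PipedWriter pipeOut = new PipedWriter(pipeIn);

    // Producer thread: writes CSV rows into the pipe and closes the writer when done.
    Thread producer = new Thread(() -> {
        try (CSVWriter csvWriter = new CSVWriter(pipeOut)) {
            csvWriter.writeNext(new String[] {"id", "name"});
            csvWriter.writeNext(new String[] {"1", "example"});
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    });
    producer.start();

    // Consumer: reads the rows back from the other end of the pipe.
    try (CSVReader csvReader = new CSVReader(pipeIn)) {
        String[] row;
        while ((row = csvReader.readNext()) != null) {
            System.out.println(String.join(",", row));
        }
    }
    producer.join();
}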
Yes, you can.
The solution is to use an InputStreamReader to read the file, pass that to a BufferedReader, and read line by line (or however you want).
You can refer to this for more methods: https://www.geeksforgeeks.org/different-ways-reading-text-file-java/
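For example (reading back the "yourfile.csv" written above):

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(new FileInputStream("yourfile.csv"), StandardCharsets.UTF_8))) {
    String line;
    while ((line = reader.readLine()) != null) {
        // process each CSV line here
    }
} catch (IOException e) {
    e.printStackTrace();
}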

RESTful Cucumber acceptance test to get a local JSON file as output

I want to mock a RestTemplate response. We have a service /someservice/getJson that returns JSON. To mock this service we kept a JSON file in the code base and tried to load it into the response entity as follows.
Working code:
String baseURL = "http://localhost:1010";
String uri = "/someservice/getJson";
ResponseEntity<T> entity = restTemplate.exchange(baseURL + uri, GET, new HttpEntity<>(headers), type);
I have a JSON file in the code base (say codebase/../resource/myfile.json).
I would like the response entity to contain the local JSON I am mocking.
I tried using the exchange method, but it does not seem to work for me.
What I tried with my JSON file:
String localJson = "/resource/myfile.json";
ResponseEntity<T> entity = restTemplate.exchange(localJson, GET, new HttpEntity<>(headers), type);
I think there are other methods to get this done besides exchange, but I am not aware of them.
Is there any other way, or is there a mistake in what I tried?
To read a file with a JSON object and convert it into a POJO, you can use ObjectMapper's
readValue(File src, Class<T> valueType) method (see the readValue docs) from the Jackson framework:
import java.io.File;
import com.fasterxml.jackson.databind.ObjectMapper;
ObjectMapper mapper = new ObjectMapper();
YourJsonType response = mapper.readValue(new File("file with JSON object"), YourJsonType.class);
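If you then need the result in the same shape the exchange call returns, you can wrap the parsed object in a ResponseEntity yourself (a sketch; YourJsonType stands in for your own type):

import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;

// Build the mocked response entity around the object parsed from the local file
ResponseEntity<YourJsonType> entity = new ResponseEntity<>(response, HttpStatus.OK);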

Saving a Newtonsoft JSON response to IsolatedStorage?

How can I save a list of objects, returned from a JSON call, to IsolatedStorage?
I have already parsed the result into a list of objects, but I cannot seem to save it. I got a few XML serialization issues, then I tried following this:
How to save a list of objects in isolated storage in wp7
but it led to an error:
{System.InvalidOperationException: There was an error generating the XML document. ---> System.InvalidOperationException: You must implement a default accessor on Newtonsoft.Json.Linq.JObject because it inherits from ICollection.
I opted to do the following:
private static void SerialiseAsJson(string textToSave, string fileName)
{
    using (System.IO.StreamWriter file = new System.IO.StreamWriter(@"..\..\NinjectModules\Fakes\FakeData\" + fileName))
    {
        file.WriteLine(textToSave);
    }
}
Usage:
SerialiseAsJson(JsonConvert.SerializeObject([object to serialize]),"filename.JSon");

How to read a CSV file from HDFS?

I have my data in a CSV file and I want to read that CSV file from HDFS.
Can anyone help me with the code?
I'm new to Hadoop. Thanks in advance.
The classes required for this are FileSystem, FSDataInputStream and Path. The client should look something like this:
public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    conf.addResource(new Path("/hadoop/projects/hadoop-1.0.4/conf/core-site.xml"));
    conf.addResource(new Path("/hadoop/projects/hadoop-1.0.4/conf/hdfs-site.xml"));
    FileSystem fs = FileSystem.get(conf);
    FSDataInputStream inputStream = fs.open(new Path("/path/to/input/file"));
    System.out.println(inputStream.readChar());
}
FSDataInputStream has several read methods. Choose the one which suits your needs.
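For a CSV file you would typically wrap the stream in a BufferedReader and read it line by line, for example:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

try (BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream, StandardCharsets.UTF_8))) {
    String line;
    while ((line = reader.readLine()) != null) {
        // naive split; use a proper CSV parser if your fields can contain quoted commas
        String[] fields = line.split(",");
        // process fields here
    }
}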
If it is MapReduce, it's even easier:
public static class YourMapper extends
        Mapper<LongWritable, Text, Your_Wish, Your_Wish> {

    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Framework does the reading for you...
        String line = value.toString(); // line contains one line of your csv file.
        // do your processing here
        // ...
        // ...
        context.write(Your_Wish, Your_Wish); // Your_Wish is a placeholder for your output key/value
    }
}
If you want to use MapReduce you can use TextInputFormat to read line by line and parse each line in the mapper's map function.
The other option is to develop (or find an already developed) CSV input format for reading data from the file.
There is an old tutorial here http://hadoop.apache.org/docs/r0.18.3/mapred_tutorial.html but the logic is the same in newer versions.
If you are using a single process for reading data from the file, it is the same as reading a file from any other file system. There is a nice example here https://sites.google.com/site/hadoopandhive/home/hadoop-how-to-read-a-file-from-hdfs
HTH