Error :JsonStorage in Pig Local mode - json

I am running my Pigscript in Local mode in eclipse.
when I try to store the output in JsonStorage.
Exception in thread "main" java.lang.RuntimeException: Cannot instantiate:org.apache.pig.builtin.JsonStorage
at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:473)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.NonEvalFuncSpec(QueryParser.java:4976)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.StoreClause(QueryParser.java:3473)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1351)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:893)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:706)
at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1017)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:967)
at org.apache.pig.PigServer.registerQuery(PigServer.java:383)
at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:716)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:324)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
at org.apache.pig.PigServer.registerScript(PigServer.java:407)
at com.paypal.debugpig.DebugPig.main(DebugPig.java:13)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve org.apache.pig.builtin.JsonStorage using imports: [, org.apache.pig.builtin., org.apache.pig.impl.builtin.]
at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:458)
at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:470)
... 14 more
PigScript :
REGISTER C:/path/to/jar/pig.jar;
REGISTER C:/path/to/jar/UpperUDf/UpperUDf_fat.jar;
A = LOAD 'C:/path/to/data/file/student.txt' using PigStorage('\t') AS (name: chararray, age: int, gpa: float);
B = FOREACH A GENERATE myudfs.UPPER(name) ,age, gpa ;
Store B into 'output_student_Json' using org.apache.pig.builtin.JsonStorage();
when I dump or store the ouput in text file its working and but issues occurs when I try to store in JSON format.
Any pointers appreciated
Thank you

I have verified it, and it is working for me if i am using the below line of code for storing output into json file format.
store B into 'json_output' using JsonStorage();

Related

LuaLaTex using fontspec package and luacode reading JSON file

I'm using Latex since years but I'm new to embedded luacode (with Lualatex). Below you can see a simplified example:
\begin{filecontents*}{data.json}
[
{"firstName":"Max", "lastName":"Möller"},
{"firstName":"Anna", "lastName":"Smith"}
];
\end{filecontents*}
\documentclass[11pt]{article}
\usepackage{fontspec}
%\setmainfont{Carlito}
\usepackage{tikz}
\usepackage{luacode}
\begin{document}
\begin{luacode}
require("lualibs.lua")
local file = io.open('data.json','rb')
local jsonstring = file:read('*a')
file.close()
local jsondata = utilities.json.tolua(jsonstring)
tex.print('\\begin{tabular}{cc}')
for key, value in pairs(jsondata) do
tex.print(value["firstName"] .. ' & ' .. value["lastName"] .. '\\\\')
end
tex.print('\\hline\\end{tabular}')
\end{luacode}
\end{document}
When executing Lualatex following error occurs:
LuaTeX error [\directlua]:6: attempt to index field 'json' (a nil value) [\directlua]:6: in main chunk. \end{luacode}
When commenting the line \usepackage{fontspec} the output will be produced. Alternatively, the error can be avoided by commenting utilities.json.tolua(jsonstring) and all following lua-code lines.
So the question is: How can I use both "fontspec" package and json-data without generating an error message? Apart from this I have another question: How to enable german umlauts in output of luacode (see first "lastName" in example: Möller)?
Ah, I'm using TeX Live 2015/Debian on Ubuntu 16.04.
Thank you,
Jerome

Unable to Read JSON file using Elephant Bird

Trying to load the json file which is having null values in it by using elephant-bird JsonLoader.
sample.json
{"created_at": "Mon Aug 22 10:48:23 +0000 2016","id": 767674772662607873,"id_str": "767674772662607873","text": "KPIT Image Result for https:\/\/t.co\/Nas2ZnF1zZ... https:\/\/t.co\/9TnelwtIvm","source": "\u003ca href=\"http:\/\/twitter.com\" rel=\"nofollow\"\u003eTwitter Web Client\u003c\/a\u003e","truncated": false,"in_reply_to_status_id": 123,"in_reply_to_status_id_str": null,"in_reply_to_user_id": null,"in_reply_to_user_id_str": null,"in_reply_to_screen_name": null,"geo": null,"coordinates": null,"place": null,"contributors": null,"is_quote_status": false,"retweet_count": 0,"favorite_count": 0,"entities": {"hashtags": [],"urls": [{"url": "https:\/\/t.co\/Nas2ZnF1zZ","expanded_url": "http:\/\/miltonious.com\/","display_url": "miltonious.com","indices": [24, 47]}],"user_mentions": [],"symbols": []},"favorited": false,"retweeted": false,"possibly_sensitive": false,"filter_level": "low","lang": "en","timestamp_ms": "1471862903167"}
script:
REGISTER piggybank.jar
REGISTER json-simple-1.1.1.jar
REGISTER elephant-bird-pig-4.3.jar
REGISTER elephant-bird-core-4.1.jar
REGISTER elephant-bird-hadoop-compat-4.3.jar
json = LOAD 'sample.json' USING JsonLoader('created_at:chararray, id:chararray, id_str:chararray, text:chararray, source:chararray, in_reply_to_status_id:chararray, in_reply_to_status_id_str:chararray, in_reply_to_user_id:chararray, in_reply_to_user_id_str:chararray, in_reply_to_screen_name:chararray, geo:chararray, coordinates:chararray, place:chararray, contributors:chararray, is_quote_status:bytearray, retweet_count:long, favorite_count:chararray, entities:map[], favorited:bytearray, retweeted:bytearray, possibly_sensitive:bytearray, lang:chararray');
describe json;
dump json;
When I dump json,I am getting the following output and the worning
(Mon Aug 22 10:48:23 +0000 2016,767674772662607873,767674772662607873,google Image Result for Twitter Web Client,false,1234,12345,3214,43215,,,,,,,,,,,,,,)
WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger - org.apache.pig.builtin.JsonLoader(UDF_WARNING_1): Bad record, returning null for {complete json}
By warning i guess it is getting NULL values.
So how can we load a Json which is having null values in it.
And I have tried in another way i.e
json = LOAD 'sample.json' USING com.twitter.elephantbird.pig.load.JsonLoader('created_at:chararray, id:chararray, id_str:chararray, text:chararray, source:chararray, in_reply_to_status_id:chararray, in_reply_to_status_id_str:chararray, in_reply_to_user_id:chararray, in_reply_to_user_id_str:chararray, in_reply_to_screen_name:chararray, geo:chararray, coordinates:chararray, place:chararray, contributors:chararray, is_quote_status:bytearray, retweet_count:long, favorite_count:chararray, entities:map[], favorited:bytearray, retweeted:bytearray, possibly_sensitive:bytearray, lang:chararray');
describe json;
Output
Schema for json unknown.
Please suggest me.
Thanks.
You can try something like this,
MY_JSON = LOAD 'sample.json' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad');
dump MY_JSON;

Error while querying Hive table ( Twitter data uploaded using Flume )

I am trying to analyse Twitter Data using Cloudera. Currently, I am able to stream Twitter Data into HDFS via Flume but I am experiencing issues when trying to query data using SQL from Hive table getting following exception:
java.io.IOException: org.apache.avro.AvroRuntimeException: java.io.IOException: Block size invalid or too large for this implementation: -40
Does this mean that the data was loaded into Hive but cannot be queried or was it not loaded into Hive at all?
My flume.conf file is
TwitterAgent.sources = Twitter
TwitterAgent.channels = FileChannel
TwitterAgent.sinks = HDFS
#TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.channels = FileChannel
TwitterAgent.sources.Twitter.consumerKey = nmmRpbWjQPAViWlJLjkJuq7mO
TwitterAgent.sources.Twitter.consumerSecret = *****
TwitterAgent.sources.Twitter.accessToken = *****
TwitterAgent.sources.Twitter.accessTokenSecret = *****
TwitterAgent.sources.Twitter.maxBatchSize = 50000
TwitterAgent.sources.Twitter.maxBatchDurationMillis = 100
#TwitterAgent.sources.Twitter.keywords = Canada, TTC,ttc, Toronto, Free, and, Apache,city, City, Hadoop, Mapreduce, hadooptutorial, Hive, Hbase, MySql
TwitterAgent.sinks.HDFS.channel = FileChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://quickstart.cloudera:8020/user/hive/warehouse/tweets/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 100
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 100
TwitterAgent.channels.FileChannel.type = file
TwitterAgent.channels.FileChannel.checkpointDir = /var/log/flume-ng/checkpoint/
TwitterAgent.channels.FileChannel.dataDirs = /var/log/flume-ng/data/
I have added added JAR file "hive-serdes-1.0-SNAPSHOT.jar"
ADD JAR /usr/lib/hive/lib/hive-serdes-1.0-SNAPSHOT.jar
my .avsc location is '/home/cloudera/twitterDataAvroSchema.avsc' and having bellow code-
{"type":"record",
"name":"Doc",
"doc":"adoc",
"fields":[{"name":"id","type":"string"},
{"name":"user_friends_count","type":["int","null"]},
{"name":"user_location","type":["string","null"]},
{"name":"user_description","type":["string","null"]},
{"name":"user_statuses_count","type":["int","null"]},
{"name":"user_followers_count","type":["int","null"]},
{"name":"user_name","type":["string","null"]},
{"name":"user_screen_name","type":["string","null"]},
{"name":"created_at","type":["string","null"]},
{"name":"text","type":["string","null"]},
{"name":"retweet_count","type":["long","null"]},
{"name":"retweeted","type":["boolean","null"]},
{"name":"in_reply_to_user_id","type":["long","null"]},
{"name":"source","type":["string","null"]},
{"name":"in_reply_to_status_id","type":["long","null"]},
{"name":"media_url_https","type":["string","null"]},
{"name":"expanded_url","type":["string","null"]}
]
}
Used commend bellow to create hive table
CREATE TABLE my_tweets
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES ('avro.schema.url'='file:///home/cloudera/twitterDataAvroSchema.avsc') ;
Used folloing command to upload data to Hive table
LOAD DATA INPATH '/user/hive/warehouse/tweets/FlumeData.*' OVERWRITE INTO TABLE my_tweets;
== output ===
Loading data to table robin.my_tweets
Table robin.my_tweets stats: [numFiles=1, numRows=0, totalSize=421380, rawDataSize=0]
OK
Time taken: 1.928 seconds
Got Error while Trying SQL from
ERROR
hive> select user_location from robin.my_tweets;
OK
Failed with exception java.io.IOException:org.apache.avro.AvroRuntimeException: java.io.IOException: Block size invalid or too large for this implementation: -40
Time taken: 1.247 seconds
I am using Cloureda
version=2.6.0-cdh5.5.0
Any assistance on this issue is appreciated.
Thanks
Robin

Error loading large JSON files using Scala Play Framework 2

I'm trying to use Apache Bench to load test a group of large (4MB each) JSON requests. When running with a large file and many concurrent requests I get the following error:
Exception caught in RequestBodyHandler java.nio.channels.ClosedChannelException: null
Here is my ab command:
ab -p large.json -n 1000 -c 10 http://127.0.0.1:9000/json-tests
If I run this with no concurrency and only 10 requests it works fine. Increasing the number of requests or the concurrency causes this error to occur over and over.
My controller currently has no logic in it:
def addJsonTest = Action {
Ok("OK")
}
Here is the full error:
[error] play - Exception caught in RequestBodyHandler
java.nio.channels.ClosedChannelException: null
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.setInterestOps(AbstractNioWorker.java:506) [netty-3.9.3.Final.jar:na]
at org.jboss.netty.channel.socket.nio.AbstractNioWorker$1.run(AbstractNioWorker.java:455) [netty-3.9.3.Final.jar:na]
at org.jboss.netty.channel.socket.ChannelRunnableWrapper.run(ChannelRunnableWrapper.java:40) [netty-3.9.3.Final.jar:na]
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:372) [netty-3.9.3.Final.jar:na]
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:296) [netty-3.9.3.Final.jar:na]
This is just using Play in development mode, is there any setup or configuration for having Play handle multiple large requests?
Thanks!
You need to do it reactively, with Iterators
val iteratee = Iteratee.foldM[Array[Byte], Either[Result, String]](Right("start")) { case (str, bytes) =>
Future.successful(Left(Ok))
}
val parser = BodyParser(rh => iteratee)
def eatDust = Action(parser) { req =>
Ok
}
See these links.
https://www.playframework.com/documentation/2.2.x/Iteratees
Play 2.x : Reactive file upload with Iteratees

Error in fromJSON(paste(raw.data, collapse = "")) : unclosed string

I am using the R package rjson to download weather data from Wunderground.com. Often I leave the program to run and there are no problems, with the data being collected fine. However, often the program stops running and I get the following error message:
Error in fromJSON(paste(raw.data, collapse = "")) : unclosed string
In addition: Warning message:
In readLines(conn, n = -1L, ok = TRUE) :
incomplete final line found on 'http://api.wunderground.com/api/[my_API_code]/history_20121214pws:1/q/pws:IBIRMING7.json'
Does anyone know what this means, and how I can avoid it since it stops my program from collecting data as I would like?
Many thanks,
Ben
I can recreate your error message using the rjson package.
Here's an example that works.
rjson::fromJSON('{"x":"a string"}')
# $x
# [1] "a string"
If we omit a double quote from the value of x, then we get the error message.
rjson::fromJSON('{"x":"a string}')
# Error in rjson::fromJSON("{\"x\":\"a string}") : unclosed string
The RJSONIO package behaves slightly differently. Rather than throwing an error, it silently returns a NULL value.
RJSONIO::fromJSON('{"x":"a string}')
# $x
# NULL