JsonLoader throws error in pig - json

I am unable to decode this simple json , i dont know what i am doing wrong.
please help me in this pig script.
I have to decode the below data in json format.
3.json
{
"id": 6668,
"source_name": "National Stock Exchange of India",
"source_code": "NSE"
}
and my pig script is
a = LOAD '3.json' USING org.apache.pig.builtin.JsonLoader ('id:int, source_name:chararray, source_code:chararray');
dump a;
the error i get is given below:
2015-07-23 13:40:08,715 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.LocalJobRunner - Starting task: attempt_local1664361500_0001_m_000000_0
2015-07-23 13:40:08,775 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.Task - Using ResourceCalculatorProcessTree : [ ]
2015-07-23 13:40:08,780 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - Processing split: Number of splits :1
Total Length = 88
Input split[0]:
Length = 88
Locations:
-----------------------
2015-07-23 13:40:08,793 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader - Current split being processed file:/home/hariprasad.sudo/3.json:0+88
2015-07-23 13:40:08,844 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: a[1,4] C: R:
2015-07-23 13:40:08,861 [Thread-5] INFO org.apache.hadoop.mapred.LocalJobRunner - map task executor complete.
2015-07-23 13:40:08,867 [Thread-5] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local1664361500_0001
java.lang.Exception: org.codehaus.jackson.JsonParseException: Unexpected end-of-input: expected close marker for OBJECT (from [Source: java.io.ByteArrayInputStream#61a79110; line: 1, column: 0])
at [Source: java.io.ByteArrayInputStream#61a79110; line: 1, column: 3]
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: org.codehaus.jackson.JsonParseException: Unexpected end-of-input: expected close marker for OBJECT (from [Source: java.io.ByteArrayInputStream#61a79110; line: 1, column: 0])
at [Source: java.io.ByteArrayInputStream#61a79110; line: 1, column: 3]
at org.codehaus.jackson.JsonParser._constructError(JsonParser.java:1291)
at org.codehaus.jackson.impl.JsonParserMinimalBase._reportError(JsonParserMinimalBase.java:385)
at org.codehaus.jackson.impl.JsonParserMinimalBase._reportInvalidEOF(JsonParserMinimalBase.java:318)
at org.codehaus.jackson.impl.JsonParserBase._handleEOF(JsonParserBase.java:354)
at org.codehaus.jackson.impl.Utf8StreamParser._skipWSOrEnd(Utf8StreamParser.java:1841)
at org.codehaus.jackson.impl.Utf8StreamParser.nextToken(Utf8StreamParser.java:275)
at org.apache.pig.builtin.JsonLoader.readField(JsonLoader.java:180)
at org.apache.pig.builtin.JsonLoader.getNext(JsonLoader.java:164)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:533)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2015-07-23 13:40:09,179 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2015-07-23 13:40:09,179 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local1664361500_0001 has failed! Stop running all dependent jobs
2015-07-23 13:40:09,179 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2015-07-23 13:40:09,180 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2015-07-23 13:40:09,180 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Detected Local mode. Stats reported below may be incomplete
2015-07-23 13:40:09,181 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.3.0-cdh5.1.3 0.12.0-cdh5.1.3 hariprasad.sudo 2015-07-23 13:40:07 2015-07-23 13:40:09 UNKNOWN
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_local1664361500_0001 a MAP_ONLY Message: Job failed! file:/tmp/temp-65649055/tmp1240506051,
Input(s):
Failed to read data from "file:///home/hariprasad.sudo/3.json"
Output(s):
Failed to produce result in "file:/tmp/temp-65649055/tmp1240506051"
Job DAG:
job_local1664361500_0001
2015-07-23 13:40:09,181 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2015-07-23 13:40:09,186 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias a
Details at logfile: /home/hariprasad.sudo/pig_1437673203961.log
grunt> 2015-07-23 13:40:14,754 [communication thread] INFO org.apache.hadoop.mapred.LocalJobRunner - map > map
Please help me in understanding what is wrong.
Thanks,
Hari

Have the compact version of json in 3.json. We can use http://www.jsoneditoronline.org for the same.
3.json
{"id":6668,"source_name":"National Stock Exchange of India","source_code":"NSE"}
with this we are able to dump the data :
(6668,National Stock Exchange of India,NSE)
Ref : Error from Json Loader in Pig where similar issue is discussed.
Extract from the above ref. link :
Pig doesn't usually like "human readable" json. Get rid of the spaces and/or indentations, and you're good.

Related

Thingsboard: JSON parse error when reading MQTT timestamp

I get this MQTT payload from a device: (and I can't change how the device sends it)
{t:2021-11-08T16:17:15Z,10:99,14:24,55:20.85,56:64.38,53:36.00}
This is the timestamp: t:2021-11-08T16:17:15Z
I know that TB expects the UNIX style timestamp, and I expected to change this timestamp into the UNIX style in a transform node in the Rule Chain.
But, it never gets there because I believe that TB parses it before getting to the Rule Chain, and throughs an exception and rejects the readings.
How can I prevent TB to do that early parsing so I can get to the value and change it to the expected ts format? Is there a configuration or any other way besides forking the project and rewriting the parser?
Thank you!
Here is the error I get from the logs:
2021-11-08 23:39:25,124 [nioEventLoopGroup-4-5] INFO o.t.s.t.mqtt.MqttTransportHandler - [f5a4a965-8177-4daf-9a56-9197a55fab7d] Processing connect msg for client: xxxx!
2021-11-08 23:39:25,125 [nioEventLoopGroup-4-5] INFO o.t.s.t.mqtt.MqttTransportHandler - [f5a4a965-8177-4daf-9a56-9197a55fab7d] Processing connect msg for client with user name: null!
2021-11-08 23:39:25,167 [DefaultTransportService-18-34] INFO o.t.s.t.mqtt.MqttTransportHandler - [f5a4a965-8177-4daf-9a56-9197a55fab7d] Client connected!
2021-11-08 23:39:25,183 [nioEventLoopGroup-4-5] WARN o.t.s.t.mqtt.MqttTransportHandler - [f5a4a965-8177-4daf-9a56-9197a55fab7d] Failed to process publish msg [device/lre/readings][1]
org.thingsboard.server.common.transport.adaptor.AdaptorException: com.google.gson.JsonSyntaxException: com.google.gson.stream.MalformedJsonException: Unterminated object at line 1 column 18 path $.t
at org.thingsboard.server.transport.mqtt.adaptors.JsonMqttAdaptor.convertToPostTelemetry(JsonMqttAdaptor.java:67)
at org.thingsboard.server.transport.mqtt.MqttTransportHandler.processDevicePublish(MqttTransportHandler.java:343)
at org.thingsboard.server.transport.mqtt.MqttTransportHandler.processPublish(MqttTransportHandler.java:298)
at org.thingsboard.server.transport.mqtt.MqttTransportHandler.processRegularSessionMsg(MqttTransportHandler.java:255)
at org.thingsboard.server.transport.mqtt.MqttTransportHandler.lambda$processMsgQueue$0(MqttTransportHandler.java:249)
at org.thingsboard.server.transport.mqtt.session.DeviceSessionCtx.tryProcessQueuedMsgs(DeviceSessionCtx.java:181)
at org.thingsboard.server.transport.mqtt.MqttTransportHandler.processMsgQueue(MqttTransportHandler.java:249)
at org.thingsboard.server.transport.mqtt.MqttTransportHandler.enqueueRegularSessionMsg(MqttTransportHandler.java:241)
at org.thingsboard.server.transport.mqtt.MqttTransportHandler.processMqttMsg(MqttTransportHandler.java:183)
at org.thingsboard.server.transport.mqtt.MqttTransportHandler.channelRead(MqttTransportHandler.java:156)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:324)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:296)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:719)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:655)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:581)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: com.google.gson.JsonSyntaxException: com.google.gson.stream.MalformedJsonException: Unterminated object at line 1 column 18 path $.t
at com.google.gson.internal.Streams.parse(Streams.java:60)
at com.google.gson.JsonParser.parse(JsonParser.java:84)
at com.google.gson.JsonParser.parse(JsonParser.java:59)
at com.google.gson.JsonParser.parse(JsonParser.java:45)
at org.thingsboard.server.transport.mqtt.adaptors.JsonMqttAdaptor.convertToPostTelemetry(JsonMqttAdaptor.java:65)
... 30 common frames omitted
Caused by: com.google.gson.stream.MalformedJsonException: Unterminated object at line 1 column 18 path $.t
at com.google.gson.stream.JsonReader.syntaxError(JsonReader.java:1567)
at com.google.gson.stream.JsonReader.doPeek(JsonReader.java:495)
at com.google.gson.stream.JsonReader.hasNext(JsonReader.java:418)
at com.google.gson.internal.bind.TypeAdapters$29.read(TypeAdapters.java:742)
at com.google.gson.internal.bind.TypeAdapters$29.read(TypeAdapters.java:718)
at com.google.gson.internal.Streams.parse(Streams.java:48)
... 34 common frames omitted
2021-11-08 23:39:25,183 [nioEventLoopGroup-4-5] INFO o.t.s.t.mqtt.MqttTransportHandler - [f5a4a965-8177-4daf-9a56-9197a55fab7d] Closing current session due to invalid publish msg [device/lre/readings][1]
2021-11-08 23:39:25,184 [nioEventLoopGroup-4-5] INFO o.t.s.t.mqtt.MqttTransportHandler - [f5a4a965-8177-4daf-9a56-9197a55fab7d] Client disconnected!
2021-11-08 23:39:28,533 [queue-scheduler-11-thread-1] INFO o.t.s.q.u.DefaultTbApiUsageClient - Reporting API usage statistics for 3 tenants and customers
2021-11-08 23:39:29,253 [sql-log-1-thread-1] INFO o.t.s.dao.sql.TbSqlBlockingQueue - Queue-0 [TS] queueSize [0] totalAdded [6] totalSaved [6] totalFailed [0]
2021-11-08 23:39:29,254 [sql-log-1-thread-1] INFO o.t.s.dao.sql.TbSqlBlockingQueue - Queue-2 [TS] queueSize [0] totalAdded [90] totalSaved [90] totalFailed [0]
2021-11-08 23:39:29,351 [sql-log-1-thread-1] INFO o.t.s.dao.sql.TbSqlBlockingQueue - Queue-0 [TS Latest] queueSize [0] totalAdded [6] totalSaved [6] totalFailed [0]
2021-11-08 23:39:29,351 [sql-log-1-thread-1] INFO o.t.s.dao.sql.TbSqlBlockingQueue - Queue-2 [TS Latest] queueSize [0] totalAdded [90] totalSaved [90] totalFailed [0]
2021-11-08 23:39:29,530 [TB-Scheduling-6] INFO o.t.server.actors.ActorSystemContext - Rule Engine JS Invoke Stats: requests [6] responses [3] failures [0]
2021-11-08 23:39:29,612 [TB-Scheduling-5] INFO o.t.s.s.s.DefaultTbEntityDataSubscriptionService - Stats: regularQueryInvocationCnt = [1], regularQueryInvocationTime = [2], dynamicQueryCnt = [3] dynamicQueryInvocationCnt = [1], dynamicQueryInvocationTime = [2], alarmQueryInvocationCnt = [0], alarmQueryInvocationTime = [0]

Error while trying to run connector class 'io.debezium.connector.mysql.MySqlConnector'

We have a Java 11 Spring boot app integrated with Kafka/Debezium.
On a Mysql (5.7) database database change is made (user dropped/ tables recreated with data) Debezium's binlogreader starts to read the bin logs, but then logs the following Exception
2021-07-22 10:05:08.584 INFO 24570 --- [debyte.com:3306] i.d.r.history.DatabaseHistoryMetrics : Already applied 10969 database changes
2021-07-22 10:05:10.134 ERROR 24570 --- [debyte.com:3306] i.debezium.connector.mysql.BinlogReader : Error while deserializing binlog event at offset {ts_sec=1626941231, file=mysql-bin.000532, pos=75496603, server_id=2001, event=876}.
Use the mysqlbinlog tool to view the problematic event: mysqlbinlog --start-position=82610388 --stop-position=82618497 --verbose mysql-bin.000532
2021-07-22 10:05:10.135 ERROR 24570 --- [debyte.com:3306] i.debezium.connector.mysql.BinlogReader : Error during binlog processing. Last offset stored = {ts_sec=1626941227, file=mysql-bin.000532, pos=71066122, row=272, server_id=2001, event=518}, binlog reader near position = mysql-bin.000532/82610388
2021-07-22 10:05:10.150 ERROR 24570 --- [debyte.com:3306] i.debezium.connector.mysql.BinlogReader : Failed due to error: Error processing binlog event
org.apache.kafka.connect.errors.ConnectException: com.github.shyiko.mysql.binlog.event.deserialization.EventDataDeserializationException: Failed to deserialize data of EventHeaderV4{timestamp=1626941231000, eventType=EXT_WRITE_ROWS, serverId=2001, headerLength=19, dataLength=8090, nextPosition=82618497, flags=0}
at io.debezium.connector.mysql.AbstractReader.wrap(AbstractReader.java:230) ~[debezium-connector-mysql-1.0.0.Final.jar:1.0.0.Final]
at io.debezium.connector.mysql.AbstractReader.failed(AbstractReader.java:207) ~[debezium-connector-mysql-1.0.0.Final.jar:1.0.0.Final]
at io.debezium.connector.mysql.BinlogReader.handleEvent(BinlogReader.java:536) ~[debezium-connector-mysql-1.0.0.Final.jar:1.0.0.Final]
at com.github.shyiko.mysql.binlog.BinaryLogClient.notifyEventListeners(BinaryLogClient.java:1158) ~[mysql-binlog-connector-java-0.21.0.jar:0.21.0]
at com.github.shyiko.mysql.binlog.BinaryLogClient.listenForEventPackets(BinaryLogClient.java:1005) ~[mysql-binlog-connector-java-0.21.0.jar:0.21.0]
at com.github.shyiko.mysql.binlog.BinaryLogClient.connectWithTimeout(BinaryLogClient.java:517) ~[mysql-binlog-connector-java-0.21.0.jar:0.21.0]
at com.github.shyiko.mysql.binlog.BinaryLogClient.access$1100(BinaryLogClient.java:90) ~[mysql-binlog-connector-java-0.21.0.jar:0.21.0]
at com.github.shyiko.mysql.binlog.BinaryLogClient$7.run(BinaryLogClient.java:881) ~[mysql-binlog-connector-java-0.21.0.jar:0.21.0]
at java.base/java.lang.Thread.run(Thread.java:829) ~[na:na]
Caused by: java.lang.RuntimeException: com.github.shyiko.mysql.binlog.event.deserialization.EventDataDeserializationException: Failed to deserialize data of EventHeaderV4{timestamp=1626941231000, eventType=EXT_WRITE_ROWS, serverId=2001, headerLength=19, dataLength=8090, nextPosition=82618497, flags=0}
at io.debezium.connector.mysql.BinlogReader.handleServerIncident(BinlogReader.java:604) ~[debezium-connector-mysql-1.0.0.Final.jar:1.0.0.Final]
at io.debezium.connector.mysql.BinlogReader.handleEvent(BinlogReader.java:519) ~[debezium-connector-mysql-1.0.0.Final.jar:1.0.0.Final]
... 6 common frames omitted
Caused by: com.github.shyiko.mysql.binlog.event.deserialization.EventDataDeserializationException: Failed to deserialize data of EventHeaderV4{timestamp=1626941231000, eventType=EXT_WRITE_ROWS, serverId=2001, headerLength=19, dataLength=8090, nextPosition=82618497, flags=0}
at com.github.shyiko.mysql.binlog.event.deserialization.EventDeserializer.deserializeEventData(EventDeserializer.java:300) ~[mysql-binlog-connector-java-0.21.0.jar:0.21.0]
at com.github.shyiko.mysql.binlog.event.deserialization.EventDeserializer.nextEvent(EventDeserializer.java:223) ~[mysql-binlog-connector-java-0.21.0.jar:0.21.0]
at io.debezium.connector.mysql.BinlogReader$1.nextEvent(BinlogReader.java:239) ~[debezium-connector-mysql-1.0.0.Final.jar:1.0.0.Final]
at com.github.shyiko.mysql.binlog.BinaryLogClient.listenForEventPackets(BinaryLogClient.java:984) ~[mysql-binlog-connector-java-0.21.0.jar:0.21.0]
... 4 common frames omitted
Caused by: java.io.EOFException: null
at com.github.shyiko.mysql.binlog.io.ByteArrayInputStream.read(ByteArrayInputStream.java:190) ~[mysql-binlog-connector-java-0.21.0.jar:0.21.0]
at java.base/java.io.InputStream.read(InputStream.java:271) ~[na:na]
at java.base/java.io.InputStream.skip(InputStream.java:531) ~[na:na]
at com.github.shyiko.mysql.binlog.io.ByteArrayInputStream.skipToTheEndOfTheBlock(ByteArrayInputStream.java:216) ~[mysql-binlog-connector-java-0.21.0.jar:0.21.0]
at com.github.shyiko.mysql.binlog.event.deserialization.EventDeserializer.deserializeEventData(EventDeserializer.java:296) ~[mysql-binlog-connector-java-0.21.0.jar:0.21.0]
... 7 common frames omitted
2021-07-22 10:05:10.150 INFO 24570 --- [debyte.com:3306] i.debezium.connector.mysql.BinlogReader : Error processing binlog event, and propagating to Kafka Connect so it stops this connector. Future binlog events read before connector is shutdown will be ignored.
2021-07-22 10:05:10.152 INFO 24570 --- [debyte.com:3306] i.debezium.connector.mysql.BinlogReader : Stopped reading binlog after 167936 events, last recorded offset: {ts_sec=1626941227, file=mysql-bin.000532, pos=71066122, row=272, server_id=2001, event=518}
2021-07-22 10:05:37.521 INFO 24570 --- [pool-1-thread-1] i.d.connector.mysql.MySqlConnectorTask : Stopping MySQL connector task
This also is logged on application startup.
The suggested mysqlbinlog command:
mysqlbinlog --start-position=82610388 --stop-position=82618497 --verbose mysql-bin.000532
Returns
/*!50530 SET ##SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!50003 SET #OLD_COMPLETION_TYPE=##COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#210722 8:06:26 server id 2001 end_log_pos 123 CRC32 0xc7bff6dc Start: binlog v 4, server v 5.7.19-log created 210722 8:06:26
BINLOG '
Aif5YA/RBwAAdwAAAHsAAAAAAAQANS43LjE5LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAEzgNAAgAEgAEBAQEEgAAXwAEGggAAAAICAgCAAAACgoKKioAEjQA
Adz2v8c=
'/*!*/;
# at 82610388
#210722 8:07:11 server id 2001 end_log_pos 82618497 CRC32 0xd2561ccc Write_rows: table id 43469
WARNING: The range of printed events ends with a row event or a table map event that does not have the STMT_END_F flag set. This might be because the last statement was not fully written to the log, or because you are using a --stop-position or --stop-datetime that refers to an event in the middle of a statement. The event(s) from the partial statement have not been written to output.
SET ##SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/;
DELIMITER ;
# End of log file
/*!50003 SET COMPLETION_TYPE=#OLD_COMPLETION_TYPE*/;
/*!50530 SET ##SESSION.PSEUDO_SLAVE_MODE=0*/;
I have no idea why this is happening or how to code around it. The error is not thrown up to the app.
Has anyone else had a similar issue?
I found a solution on debezium's doc:
What is causing intermittent EventDataDeserializationExceptions with the MySQL connector?
When you run into intermittent deserialization exceptions around 1 minute after starting connector, with a root cause of type EOFException or java.net.SocketException: Connection reset:
Caused by: com.github.shyiko.mysql.binlog.event.deserialization.EventDataDeserializati>onException: Failed to deserialize data of EventHeaderV4{timestamp=1542193955000, eventType=GTID, serverId=91111, headerLength=19, dataLength=46, nextPosition=1058898202, flags=0}
Caused by: java.lang.RuntimeException: com.github.shyiko.mysql.binlog.event.deserialization.EventDataDeserializationException: Failed to deserialize data of EventHeaderV4{timestamp=1542193955000, eventType=GTID, serverId=91111, headerLength=19, dataLength=46, nextPosition=1058898202, flags=0}
Caused by: java.io.EOFException
or
Caused by: java.net.SocketException: Connection reset
Then updating these MySQL server global properties like this will fix it:
set global slave_net_timeout = 120; (default was 30sec)
set global thread_pool_idle_timeout = 120;

ERROR 1066: Unable to open iterator for alias- PIG SCRIPT

I have been facing this issue from long time. I tried to solve this but i couldn't. I need some experts advice to solve this.
I am trying to load a sample tweets json file.
sample.json;-
{"filter_level":"low","retweeted":false,"in_reply_to_screen_name":"FilmFan","truncated":false,"lang":"en","in_reply_to_status_id_str":null,"id":689085590822891521,"in_reply_to_user_id_str":"6048122","timestamp_ms":"1453125782100","in_reply_to_status_id":null,"created_at":"Mon Jan 18 14:03:02 +0000 2016","favorite_count":0,"place":null,"coordinates":null,"text":"#filmfan hey its time for you guys follow #acadgild To #AchieveMore and participate in contest Win Rs.500 worth vouchers","contributors":null,"geo":null,"entities":{"symbols":[],"urls":[],"hashtags":[{"text":"AchieveMore","indices":[56,68]}],"user_mentions":[{"id":6048122,"name":"Tanya","indices":[0,8],"screen_name":"FilmFan","id_str":"6048122"},{"id":2649945906,"name":"ACADGILD","indices":[42,51],"screen_name":"acadgild","id_str":"2649945906"}]},"is_quote_status":false,"source":"<a href=\"https://about.twitter.com/products/tweetdeck\" rel=\"nofollow\">TweetDeck<\/a>","favorited":false,"in_reply_to_user_id":6048122,"retweet_count":0,"id_str":"689085590822891521","user":{"location":"India ","default_profile":false,"profile_background_tile":false,"statuses_count":86548,"lang":"en","profile_link_color":"94D487","profile_banner_url":"https://pbs.twimg.com/profile_banners/197865769/1436198000","id":197865769,"following":null,"protected":false,"favourites_count":1002,"profile_text_color":"000000","verified":false,"description":"Proud Indian, Digital Marketing Consultant,Traveler, Foodie, Adventurer, Data Architect, Movie Lover, Namo Fan","contributors_enabled":false,"profile_sidebar_border_color":"000000","name":"Bahubali","profile_background_color":"000000","created_at":"Sat Oct 02 17:41:02 +0000 2010","default_profile_image":false,"followers_count":4467,"profile_image_url_https":"https://pbs.twimg.com/profile_images/664486535040000000/GOjDUiuK_normal.jpg","geo_enabled":true,"profile_background_image_url":"http://abs.twimg.com/images/themes/theme1/bg.png","profile_background_image_url_https":"https://abs.twimg.com/images/themes/theme1/bg.png","follow_request_sent":null,"url":null,"utc_offset":19800,"time_zone":"Chennai","notifications":null,"profile_use_background_image":false,"friends_count":810,"profile_sidebar_fill_color":"000000","screen_name":"Ashok_Uppuluri","id_str":"197865769","profile_image_url":"http://pbs.twimg.com/profile_images/664486535040000000/GOjDUiuK_normal.jpg","listed_count":50,"is_translator":false}}
I have tried to load this json file using ELEPHANT BIRD
script:-
REGISTER json-simple-1.1.1.jar
REGISTER elephant-bird-2.2.3.jar
REGISTER guava-11.0.2.jar
REGISTER avro-1.7.7.jar
REGISTER piggybank-0.12.0.jar
twitter = LOAD 'sample.json' USING com.twitter.elephantbird.pig.load.JsonLoader();
B = foreach twitter generate (chararray)$0#'created_at' as created_at,(chararray)$0#'id' as id,(chararray)$0#'id_str' as id_str,(chararray)$0#'text' as text,(chararray)$0#'source' as source,com.twitter.elephantbird.pig.piggybank.JsonStringToMap($0#'entities') as entities,(boolean)$0#'favorited' as favorited;
describe B;
OUTPUT:-
B: {created_at: chararray,id: chararray,id_str: chararray,text: chararray,source: chararray,entitis: map[chararray],favorited: boolean}
But when I tried to DUMP B the follwoing error has occured
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias B
I am providing the complete logs here.
2016-09-11 14:07:57,184 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size before optimization: 1 2016-09-11 14:07:57,184 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size after optimization: 1 2016-09-11 14:07:57,194 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM
Metrics with processName=JobTracker, sessionId= - already initialized
2016-09-11 14:07:57,194 [main] INFO
org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script
settings are added to the job 2016-09-11 14:07:57,194 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3 2016-09-11 14:07:57,199 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- Setting up single store job 2016-09-11 14:07:57,199 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is
false, will not generate code. 2016-09-11 14:07:57,199 [main] INFO
org.apache.pig.data.SchemaTupleFrontend - Starting process to move
generated code to distributed cacche 2016-09-11 14:07:57,199 [main]
INFO org.apache.pig.data.SchemaTupleFrontend - Distributed cache not
supported or needed in local mode. Setting key
[pig.schematuple.local.dir] with code temp directory:
/tmp/1473583077199-0 2016-09-11 14:07:57,206 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 1 map-reduce job(s) waiting for submission. 2016-09-11 14:07:57,207 [JobControl] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot
initialize JVM Metrics with processName=JobTracker, sessionId= -
already initialized 2016-09-11 14:07:57,208 [JobControl] WARN
org.apache.hadoop.mapreduce.JobResourceUploader - No job jar file set.
User classes may not be found. See Job or Job#setJar(String).
2016-09-11 14:07:57,211 [JobControl] INFO
org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input
paths to process : 1 2016-09-11 14:07:57,211 [JobControl] INFO
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total
input paths (combined) to process : 1 2016-09-11 14:07:57,212
[JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of
splits:1 2016-09-11 14:07:57,216 [JobControl] INFO
org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job:
job_local360376249_0009 2016-09-11 14:07:57,267 [JobControl] INFO
org.apache.hadoop.mapreduce.Job - The url to track the job:
http://localhost:8080/ 2016-09-11 14:07:57,267 [Thread-214] INFO
org.apache.hadoop.mapred.LocalJobRunner - OutputCommitter set in
config null 2016-09-11 14:07:57,270 [Thread-214] INFO
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - File
Output Committer Algorithm version is 1 2016-09-11 14:07:57,270
[Thread-214] INFO
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter -
FileOutputCommitter skip cleanup _temporary folders under output
directory:false, ignore cleanup failures: false 2016-09-11
14:07:57,270 [Thread-214] INFO org.apache.hadoop.mapred.LocalJobRunner
- OutputCommitter is org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter
2016-09-11 14:07:57,271 [Thread-214] INFO
org.apache.hadoop.mapred.LocalJobRunner - Waiting for map tasks
2016-09-11 14:07:57,272 [LocalJobRunner Map Task Executor #0] INFO
org.apache.hadoop.mapred.LocalJobRunner - Starting task:
attempt_local360376249_0009_m_000000_0 2016-09-11 14:07:57,277
[LocalJobRunner Map Task Executor #0] INFO
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - File
Output Committer Algorithm version is 1 2016-09-11 14:07:57,277
[LocalJobRunner Map Task Executor #0] INFO
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter -
FileOutputCommitter skip cleanup _temporary folders under output
directory:false, ignore cleanup failures: false 2016-09-11
14:07:57,277 [LocalJobRunner Map Task Executor #0] INFO
org.apache.hadoop.mapred.Task - Using ResourceCalculatorProcessTree :
[ ] 2016-09-11 14:07:57,278 [LocalJobRunner Map Task Executor #0] INFO
org.apache.hadoop.mapred.MapTask - Processing split: Number of splits
:1 Total Length = 2416 Input split[0]: Length = 2416 ClassName:
org.apache.hadoop.mapreduce.lib.input.FileSplit Locations:
----------------------- 2016-09-11 14:07:57,282 [LocalJobRunner Map Task Executor #0] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader
- Current split being processed file:/root/PIG/PIG/sample.json:0+2416 2016-09-11 14:07:57,282 [LocalJobRunner Map Task Executor #0] INFO
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - File
Output Committer Algorithm version is 1 2016-09-11 14:07:57,282
[LocalJobRunner Map Task Executor #0] INFO
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter -
FileOutputCommitter skip cleanup _temporary folders under output
directory:false, ignore cleanup failures: false 2016-09-11
14:07:57,288 [LocalJobRunner Map Task Executor #0] INFO
org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not
set... will not generate code. 2016-09-11 14:07:57,290 [LocalJobRunner
Map Task Executor #0] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map
- Aliases being processed per job phase (AliasName[line,offset]): M: twitter[20,10],B[21,4] C: R: 2016-09-11 14:07:57,291 [Thread-214] INFO
org.apache.hadoop.mapred.LocalJobRunner - map task executor complete.
2016-09-11 14:07:57,296 [Thread-214] WARN
org.apache.hadoop.mapred.LocalJobRunner - job_local360376249_0009
java.lang.Exception: java.lang.IncompatibleClassChangeError: Found
interface org.apache.hadoop.mapreduce.Counter, but class was expected
at
org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.IncompatibleClassChangeError: Found interface
org.apache.hadoop.mapreduce.Counter, but class was expected at
com.twitter.elephantbird.pig.util.PigCounterHelper.incrCounter(PigCounterHelper.java:55)
at
com.twitter.elephantbird.pig.load.LzoBaseLoadFunc.incrCounter(LzoBaseLoadFunc.java:70)
at
com.twitter.elephantbird.pig.load.JsonLoader.getNext(JsonLoader.java:130)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:204)
at
org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
at
org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at
org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) at
org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787) at
org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at
org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266) at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745) 2016-09-11 14:07:57,467
[main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- HadoopJobId: job_local360376249_0009 2016-09-11 14:07:57,467 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Processing aliases B,twitter 2016-09-11 14:07:57,467 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- detailed locations: M: twitter[20,10],B[21,4] C: R: 2016-09-11 14:07:57,468 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 0% complete 2016-09-11 14:07:57,468 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure. 2016-09-11 14:07:57,468 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- job job_local360376249_0009 has failed! Stop running all dependent jobs 2016-09-11 14:07:57,468 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 100% complete 2016-09-11 14:07:57,469 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM
Metrics with processName=JobTracker, sessionId= - already initialized
2016-09-11 14:07:57,469 [main] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM
Metrics with processName=JobTracker, sessionId= - already initialized
2016-09-11 14:07:57,469 [main] ERROR
org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil - 1 map reduce
job(s) failed! 2016-09-11 14:07:57,470 [main] INFO
org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script
Statistics: HadoopVersionPigVersionUserIdStartedAtFinishedAtFeatures
2.7.1.2.3.4.7-40.15.0.2.3.4.7-4root2016-09-11 14:07:572016-09-11 14:07:57UNKNOWN Failed! Failed Jobs: JobIdAliasFeatureMessageOutputs
job_local360376249_0009B,twitterMAP_ONLYMessage: Job
failed!file:/tmp/temp252944192/tmp-470484503, Input(s): Failed to read
data from "file:///root/PIG/PIG/sample.json" Output(s): Failed to
produce result in "file:/tmp/temp252944192/tmp-470484503" Counters:
Total records written : 0 Total bytes written : 0 Spillable Memory
Manager spill count : 0 Total bags proactively spilled: 0 Total
records proactively spilled: 0 Job DAG: job_local360376249_0009
And please give a clarification on how to use jar files,
And what are the versions to use.I am so confused which version to use.
Someone says use Elephant Bird, and Someone says use AVRO. But I have with both non of them are working.
Please help.
Mohan.V
I got it on my own.
It is of jar versions issue.
script:-
REGISTER elephant-bird-core-4.1.jar
REGISTER elephant-bird-pig-4.1.jar
REGISTER elephant-bird-hadoop-compat-4.1.jar
And it worked fine.

Apache Pig error while dumping Json data

I have a JSON file and want to load using Apache Pig.
I am using the built-in JSONLOADER to load json data, Below is the sample json data.
cat jsondata1.json
{ "response": { "id": 10123, "thread": "Sloths", "comments": ["Sloths are adorable So chill"] }, "response_time": 0.425 }
{ "response": { "id": 13828, "thread": "Bigfoot", "comments": ["hello world"] } , "response_time": 0.517 }
Here I loading json data using builtin Json loader. While loading there is no error, but while dumping the data it gives the following error.
grunt> a = load '/home/cloudera/jsondata1.json' using JsonLoader('response:tuple (id:int, thread:chararray, comments:bag {tuple(comment:chararray)}), response_time:double');
grunt> dump a;
2016-04-17 01:11:13,286 [pool-4-thread-1] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader - Current split being processed file:/home/cloudera/jsondata1.json:0+229
2016-04-17 01:11:13,287 [pool-4-thread-1] WARN org.apache.hadoop.conf.Configuration - dfs.https.address is deprecated. Instead, use dfs.namenode.https-address
2016-04-17 01:11:13,311 [pool-4-thread-1] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2016-04-17 01:11:13,321 [pool-4-thread-1] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: a[5,4] C: R:
2016-04-17 01:11:13,349 [Thread-16] INFO org.apache.hadoop.mapred.LocalJobRunner - Map task executor complete.
2016-04-17 01:11:13,351 [Thread-16] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local801054416_0004
java.lang.Exception: org.codehaus.jackson.JsonParseException: Current token (FIELD_NAME) not numeric, can not use numeric value accessors
at [Source: java.io.ByteArrayInputStream#2484de3c; line: 1, column: 120]
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:406)
Caused by: org.codehaus.jackson.JsonParseException: Current token (FIELD_NAME) not numeric, can not use numeric value accessors
at [Source: java.io.ByteArrayInputStream#2484de3c; line: 1, column: 120]
at org.codehaus.jackson.JsonParser._constructError(JsonParser.java:1291)
at org.codehaus.jackson.impl.JsonParserMinimalBase._reportError(JsonParserMinimalBase.java:385)
at org.codehaus.jackson.impl.JsonNumericParserBase._parseNumericValue(JsonNumericParserBase.java:399)
at org.codehaus.jackson.impl.JsonNumericParserBase.getDoubleValue(JsonNumericParserBase.java:311)
at org.apache.pig.builtin.JsonLoader.readField(JsonLoader.java:203)
at org.apache.pig.builtin.JsonLoader.getNext(JsonLoader.java:157)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:483)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:76)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:85)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:139)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:268)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
2016-04-17 01:11:13,548 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local801054416_0004
2016-04-17 01:11:13,548 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases a
2016-04-17 01:11:13,548 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: a[5,4] C: R:
2016-04-17 01:11:18,059 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2016-04-17 01:11:18,059 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local801054416_0004 has failed! Stop running all dependent jobs
2016-04-17 01:11:18,059 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2016-04-17 01:11:18,059 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2016-04-17 01:11:18,060 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Detected Local mode. Stats reported below may be incomplete
2016-04-17 01:11:18,060 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.0.0-cdh4.7.0 0.11.0-cdh4.7.0 cloudera 2016-04-17 01:11:12 2016-04-17 01:11:18 UNKNOWN
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_local801054416_0004 a MAP_ONLY Message: Job failed! file:/tmp/temp-1766116741/tmp1151698221,
Input(s):
Failed to read data from "/home/cloudera/jsondata1.json"
Output(s):
Failed to produce result in "file:/tmp/temp-1766116741/tmp1151698221"
Job DAG:
job_local801054416_0004
2016-04-17 01:11:18,060 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2016-04-17 01:11:18,061 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias a
Details at logfile: /home/cloudera/pig_1460877001124.log
I could not able to find the issue. Can I know how to define the correct schema for the above json data?.
Try this:
comments:{(chararray)}
because this version:
comments:bag {tuple(comment:chararray)}
fits this JSON schema:
"comments": [{comment:"hello world"}]
and you have simple string values, not another nested documents:
"comments": ["hello world"]

Error processing complex json object of twitter with pig JsonLoader() of elephant-bird Jars

I wanted to process twitter json object with pig using elephant-bird jars for which i wrote the pig script as below.
REGISTER '/usr/lib/pig/lib/elephant-bird-hadoop-compat-4.1.jar';
REGISTER '/usr/lib/pig/lib/elephant-bird-pig-4.1.jar';
A = LOAD '/user/flume/tweets/data.json' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS myMap;
B = FOREACH A GENERATE myMap#'id' AS ID,myMap#'created_at' AS createdAT;
DUMP B;
which gave me error as below
2015-08-25 11:06:34,295 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1439883208520_0177
2015-08-25 11:06:34,295 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases A,B
2015-08-25 11:06:34,295 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: A[3,4],B[4,4] C: R:
2015-08-25 11:06:34,303 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2015-08-25 11:06:34,303 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1439883208520_0177]
2015-08-25 11:07:06,449 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2015-08-25 11:07:06,449 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1439883208520_0177]
2015-08-25 11:07:09,458 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2015-08-25 11:07:09,458 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_1439883208520_0177 has failed! Stop running all dependent jobs
2015-08-25 11:07:09,459 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2015-08-25 11:07:09,667 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://trinityhadoopmaster.com:8188/ws/v1/timeline/
2015-08-25 11:07:09,668 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at trinityhadoopmaster.com/192.168.1.135:8032
2015-08-25 11:07:09,678 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=FAILED. Redirecting to job history server
2015-08-25 11:07:09,779 [main] ERROR org.apache.pig.tools.pigstats.PigStats - ERROR 0: java.lang.ClassNotFoundException: org.json.simple.parser.ParseException
2015-08-25 11:07:09,779 [main] ERROR org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil - 1 map reduce job(s) failed!
2015-08-25 11:07:09,780 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.6.0 0.14.0 hdfs 2015-08-25 11:06:33 2015-08-25 11:07:09 UNKNOWN
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_1439883208520_0177 A,B MAP_ONLY Message: Job failed! hdfs://trinityhadoopmaster.com:9000/tmp/temp1554332510/tmp835744559,
Input(s):
Failed to read data from "hdfs://trinityhadoopmaster.com:9000/user/flume/tweets/data.json"
Output(s):
Failed to produce result in "hdfs://trinityhadoopmaster.com:9000/tmp/temp1554332510/tmp835744559"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_1439883208520_0177
2015-08-25 11:07:09,780 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2015-08-25 11:07:09,787 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias B. Backend error : java.lang.ClassNotFoundException: org.json.simple.parser.ParseException
Details at logfile: /tmp/pig-err.log
grunt>
which i have no clue on how to approach, can any one help me on this.
REGISTER '/tmp/elephant-bird-core-4.1.jar';
REGISTER '/tmp/elephant-bird-pig-4.1.jar';
REGISTER '/tmp/elephant-bird-hadoop-compat-4.1.jar';
REGISTER '/tmp/google-collections-1.0.jar';
REGISTER '/tmp/json-simple-1.1.jar';
It works.