Although I activated "prepare.chWeightings=no" in my property file it seems like the starting log still wants to prepare CHWeightings!
My config-example.properties file:
##### Vehicles #####
# Possible options: car,foot,bike,bike2,mtb,racingbike,motorcycle (comma separated)
# bike2 takes elevation data into account (like up-hill is slower than down-hill) and requires enabling graph.elevation.provider below
graph.flagEncoders=car
# Enable turn restrictions for car or motorcycle.
# Currently you need to additionally set prepare.chWeightings=no before using this (see below and #270)
# graph.flagEncoders=car|turnCosts=true
##### Elevation #####
# To populate your graph with elevation data use SRTM, default is noop (no elevation)
# graph.elevation.provider=srtm
# default location for cache is /tmp/srtm
# graph.elevation.cachedir=./srtmprovider/
# If you have a slow disk or plenty of RAM change the default MMAP to:
# graph.elevation.dataaccess=RAM_STORE
#### Speed-up mode vs. flexibility mode ####
# By default the speed-up mode with the 'fastest' weighting is used. Internally a graph preparation via
# contraction hierarchies (CH) is done to speed routing up. This requires more RAM/disc space for holding the
# graph but less for every request. You can also setup multiple weightings, by providing a coma separated list.
# prepare.chWeightings=fastest
# Disable the speed-up mode. Should be use only with routing.maxVisitedNodes
prepare.chWeightings=no
# To make preparation faster for multiple flagEncoders you can increase the default threads if you have enough RAM.
# Change this setting only if you know what you are doing and if the default worked for you and really make sure you have enough RAM!
# prepare.threads=1
##### Routing #####
# You can define the maximum visited nodes when routing. This may result in not found connections if there is no
# connection between two points wihtin the given visited nodes. The default is Integer.MAX_VALUE. Useful for flexibility mode
routing.maxVisitedNodes = 1000000
# If enabled, allows a user to run flexibility requests even if speed-up mode is enabled. Every request then has to include a hint routing.flexibleMode.force=true.
# Attention, non-CH route calculations take way more time and resources, compared to CH routing.
# A possible attacker might exploit this to slow down your service. Only enable it if you need it and with routing.maxVisitedNodes
routing.flexibleMode.allowed=true
##### Web #####
# if you want to support jsonp response type you need to add it explicitely here. By default it is disabled for stronger security.
web.jsonpAllowed=true
##### Storage #####
#
# configure the memory access, use RAM_STORE for well equipped servers (default and recommended) or MMAP_STORE_SYNC
graph.dataaccess=RAM_STORE
# if you don't need turn instruction, you can reduce storage size by not storing way names:
# osmreader.instructions=false
# will write way names in the preferred language (language code as defined in ISO 639-1 or ISO 639-2):
# osmreader.preferred-language=en
My Console:
java -jar *.jar jetty.resourcebase=webapp config=config-example.properties osmreader.osm=germany-latest.osm.pbf
[main] INFO com.graphhopper.GraphHopper - version 0.7|2016-04-27T10:00:38Z (4,13,3,2,2,1)
[main] INFO com.graphhopper.GraphHopper - graph CH|car|RAM_STORE|2D|NoExt|,,,,, details:edges:0(0MB), nodes:0(0MB), name:(0MB), geo:0(0MB), bounds:1.7976931348623157E308,-1.7976931348623157E308,1.7976931348623157E308,-1.7976931348623157E308, CHGraph|fastest|car, shortcuts:0, nodesCH:(0MB)
[main] INFO com.graphhopper.GraphHopper - start creating graph from germany-latest.osm.pbf
[main] INFO com.graphhopper.GraphHopper - using CH|car|RAM_STORE|2D|NoExt|,,,,, memory:totalMB:964, usedMB:25
[main] INFO com.graphhopper.reader.OSMReader - 5 000 000 (preprocess), osmIdMap:32 042 694 (381MB) totalMB:6281, usedMB:3934
[main] INFO com.graphhopper.reader.OSMReader - 50 000 (preprocess), osmWayMap:0 totalMB:6465, usedMB:3537
[main] INFO com.graphhopper.reader.OSMReader - 100 000 (preprocess), osmWayMap:0 totalMB:6465, usedMB:4153
[main] INFO com.graphhopper.reader.OSMReader - 150 000 (preprocess), osmWayMap:0 totalMB:6465, usedMB:4837
[main] INFO com.graphhopper.reader.OSMReader - 200 000 (preprocess), osmWayMap:0 totalMB:6465, usedMB:5420
[main] INFO com.graphhopper.reader.OSMReader - 250 000 (preprocess), osmWayMap:0 totalMB:6457, usedMB:1120
[main] INFO com.graphhopper.reader.OSMReader - 300 000 (preprocess), osmWayMap:0 totalMB:6457, usedMB:1677
[main] INFO com.graphhopper.reader.OSMReader - 350 000 (preprocess), osmWayMap:0 totalMB:6457, usedMB:2212
[main] INFO com.graphhopper.reader.OSMReader - 400 000 (preprocess), osmWayMap:0 totalMB:6457, usedMB:2896
[main] INFO com.graphhopper.reader.OSMReader - 450 000 (preprocess), osmWayMap:0 totalMB:6457, usedMB:3479
[main] INFO com.graphhopper.reader.OSMReader - 500 000 (preprocess), osmWayMap:0 totalMB:6457, usedMB:4014
[main] INFO com.graphhopper.reader.OSMReader - creating graph. Found nodes (pillar+tower):36 542 424, totalMB:6457, usedMB:4111
[main] INFO com.graphhopper.reader.OSMReader - 100 000 000, locs:24 711 933 (0) totalMB:6415, usedMB:1203
[main] INFO com.graphhopper.reader.OSMReader - 200 000 000, locs:33 458 643 (0) totalMB:6282, usedMB:2842
[main] INFO com.graphhopper.reader.OSMReader - 241 756 168, now parsing ways
[main] WARN com.graphhopper.routing.util.AbstractFlagEncoder - Unrealistic long duration ignored in way with OSMID=409892450 : Duration tag value=13:15 (=795 minutes)
[main] INFO com.graphhopper.reader.OSMReader - 280 449 722, now parsing relations
[main] INFO com.graphhopper.reader.OSMReader - finished way processing. nodes: 8773936, osmIdMap.size:36674781, osmIdMap:468MB, nodeFlagsMap.size:132357, relFlagsMap.size:0, zeroCounter:131245 totalMB:7276, usedMB:6425
[main] INFO com.graphhopper.reader.OSMReader - time(pass1): 104 pass2: 132 total:236
[main] INFO com.graphhopper.GraphHopper - start finding subnetworks, totalMB:7276, usedMB:6427
[main] INFO com.graphhopper.routing.util.PrepareRoutingSubnetworks - 165929 subnetworks found for car, totalMB:7276, usedMB:6743
[main] INFO com.graphhopper.routing.util.PrepareRoutingSubnetworks - optimize to remove subnetworks (165929), unvisited-dead-end-nodes (0), maxEdges/node (13)
[main] INFO com.graphhopper.GraphHopper - edges: 10586032, nodes 8344328, there were 165929 subnetworks. removed them => 429608 less nodes
[main] INFO com.graphhopper.storage.index.LocationIndexTree - location index created in 7.324438s, size:10 241 180, leafs:2 299 038, precision:300, depth:5, checksum:8344328, entries:[64, 64, 64, 16, 4], entriesPerLeaf:4.4545503
[main] INFO com.graphhopper.routing.ch.CHAlgoFactoryDecorator - 1/1 calling prepare.doWork for fastest|car ... (totalMB:7271, usedMB:3588)
[fastest_car] INFO com.graphhopper.routing.ch.PrepareContractionHierarchies - 0, updates:0, nodes: 8 344 328, shortcuts:0, dijkstras:33 871 012, t(dijk):5.4, t(period):0.0, t(lazy):0.0, t(neighbor):0.0, meanDegree:1, algo:127MB, totalMB:7271, usedMB:4275
[fastest_car] INFO com.graphhopper.routing.ch.PrepareContractionHierarchies - 1 668 860, updates:0, nodes: 6 675 468, shortcuts:675, dijkstras:34 551 628, t(dijk):6.55, t(period):0.0, t(lazy):0.0, t(neighbor):1.73, meanDegree:0, algo:127MB, totalMB:7271, usedMB:4301
[fastest_car] INFO com.graphhopper.routing.ch.PrepareContractionHierarchies - 3 337 720, updates:1, nodes: 5 006 608, shortcuts:1 043 389, dijkstras:65 810 344, t(dijk):22.94, t(period):11.56, t(lazy):0.0, t(neighbor):8.92, meanDegree:1, algo:127MB, totalMB:7271, usedMB:5068
[fastest_car] INFO com.graphhopper.routing.ch.PrepareContractionHierarchies - 5 006 580, updates:2, nodes: 3 337 748, shortcuts:2 164 049, dijkstras:94 377 631, t(dijk):99.06, t(period):72.35, t(lazy):0.0, t(neighbor):19.56, meanDegree:1, algo:127MB, totalMB:7271, usedMB:5784
[fastest_car] INFO com.graphhopper.routing.ch.PrepareContractionHierarchies - 6 675 440, updates:3, nodes: 1 668 888, shortcuts:3 836 152, dijkstras:117 298 010, t(dijk):182.91, t(period):123.25, t(lazy):0.0, t(neighbor):39.4, meanDegree:2, algo:127MB, totalMB:7271, usedMB:6553
[fastest_car] INFO com.graphhopper.routing.ch.PrepareContractionHierarchies - 8 344 300, updates:4, nodes: 254 254, shortcuts:5 744 154, dijkstras:146 679 623, t(dijk):333.88, t(period):171.72, t(lazy):35.62, t(neighbor):79.67, meanDegree:3, algo:127MB, totalMB:7457, usedMB:1791
[fastest_car] INFO com.graphhopper.routing.ch.PrepareContractionHierarchies - took:515, new shortcuts: 6 499 068, prepare|fastest, car, dijkstras:167252328, t(dijk):451.61, t(period):188.89, t(lazy):70.18, t(neighbor):134.1, meanDegree:1, initSize:8344328, periodic:20, lazy:10, neighbor:20, totalMB:7457, usedMB:2019
[main] INFO com.graphhopper.GraphHopper - flushing graph CH|car|RAM_STORE|2D|NoExt|4,13,3,2,2, details:edges:10 586 032(324MB), nodes:8 344 328(96MB), name:(35MB), geo:52 329 514(200MB), bounds:5.863066148677457,25.196558055204704,47.27804356689848,60.22003669783555, CHGraph|fastest|car, shortcuts:6 499 068, nodesCH:(64MB), totalMB:7457, usedMB:2036)
[main] INFO com.graphhopper.http.DefaultModule - loaded graph at:germany-latest.osm-gh, source:germany-latest.osm.pbf, flagEncoders:car, class:edges:10 586 032(324MB), nodes:8 344 328(96MB), name:(35MB), geo:52 329 514(200MB), bounds:5.863066148677457,25.196558055204704,47.27804356689848,60.22003669783555, CHGraph|fastest|car, shortcuts:6 499 068, nodesCH:(64MB)
[main] INFO com.graphhopper.http.GHServer - Started server at HTTP : 8888
So is my config file correct for flexible mode?
And is "PrepareContractionHierarchies" mentioned in the logs of graphhopper still correct when I want to start it in flexible mode?
thanks for answers
I have a JSON file and want to load using Apache Pig.
I am using the built-in JSONLOADER to load json data, Below is the sample json data.
cat jsondata1.json
{ "response": { "id": 10123, "thread": "Sloths", "comments": ["Sloths are adorable So chill"] }, "response_time": 0.425 }
{ "response": { "id": 13828, "thread": "Bigfoot", "comments": ["hello world"] } , "response_time": 0.517 }
Here I loading json data using builtin Json loader. While loading there is no error, but while dumping the data it gives the following error.
grunt> a = load '/home/cloudera/jsondata1.json' using JsonLoader('response:tuple (id:int, thread:chararray, comments:bag {tuple(comment:chararray)}), response_time:double');
grunt> dump a;
2016-04-17 01:11:13,286 [pool-4-thread-1] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader - Current split being processed file:/home/cloudera/jsondata1.json:0+229
2016-04-17 01:11:13,287 [pool-4-thread-1] WARN org.apache.hadoop.conf.Configuration - dfs.https.address is deprecated. Instead, use dfs.namenode.https-address
2016-04-17 01:11:13,311 [pool-4-thread-1] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2016-04-17 01:11:13,321 [pool-4-thread-1] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: a[5,4] C: R:
2016-04-17 01:11:13,349 [Thread-16] INFO org.apache.hadoop.mapred.LocalJobRunner - Map task executor complete.
2016-04-17 01:11:13,351 [Thread-16] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local801054416_0004
java.lang.Exception: org.codehaus.jackson.JsonParseException: Current token (FIELD_NAME) not numeric, can not use numeric value accessors
at [Source: java.io.ByteArrayInputStream#2484de3c; line: 1, column: 120]
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:406)
Caused by: org.codehaus.jackson.JsonParseException: Current token (FIELD_NAME) not numeric, can not use numeric value accessors
at [Source: java.io.ByteArrayInputStream#2484de3c; line: 1, column: 120]
at org.codehaus.jackson.JsonParser._constructError(JsonParser.java:1291)
at org.codehaus.jackson.impl.JsonParserMinimalBase._reportError(JsonParserMinimalBase.java:385)
at org.codehaus.jackson.impl.JsonNumericParserBase._parseNumericValue(JsonNumericParserBase.java:399)
at org.codehaus.jackson.impl.JsonNumericParserBase.getDoubleValue(JsonNumericParserBase.java:311)
at org.apache.pig.builtin.JsonLoader.readField(JsonLoader.java:203)
at org.apache.pig.builtin.JsonLoader.getNext(JsonLoader.java:157)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:483)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:76)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:85)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:139)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:268)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
2016-04-17 01:11:13,548 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local801054416_0004
2016-04-17 01:11:13,548 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases a
2016-04-17 01:11:13,548 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: a[5,4] C: R:
2016-04-17 01:11:18,059 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2016-04-17 01:11:18,059 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local801054416_0004 has failed! Stop running all dependent jobs
2016-04-17 01:11:18,059 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2016-04-17 01:11:18,059 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2016-04-17 01:11:18,060 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Detected Local mode. Stats reported below may be incomplete
2016-04-17 01:11:18,060 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.0.0-cdh4.7.0 0.11.0-cdh4.7.0 cloudera 2016-04-17 01:11:12 2016-04-17 01:11:18 UNKNOWN
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_local801054416_0004 a MAP_ONLY Message: Job failed! file:/tmp/temp-1766116741/tmp1151698221,
Input(s):
Failed to read data from "/home/cloudera/jsondata1.json"
Output(s):
Failed to produce result in "file:/tmp/temp-1766116741/tmp1151698221"
Job DAG:
job_local801054416_0004
2016-04-17 01:11:18,060 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2016-04-17 01:11:18,061 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias a
Details at logfile: /home/cloudera/pig_1460877001124.log
I could not able to find the issue. Can I know how to define the correct schema for the above json data?.
Try this:
comments:{(chararray)}
because this version:
comments:bag {tuple(comment:chararray)}
fits this JSON schema:
"comments": [{comment:"hello world"}]
and you have simple string values, not another nested documents:
"comments": ["hello world"]
I wanted to process twitter json object with pig using elephant-bird jars for which i wrote the pig script as below.
REGISTER '/usr/lib/pig/lib/elephant-bird-hadoop-compat-4.1.jar';
REGISTER '/usr/lib/pig/lib/elephant-bird-pig-4.1.jar';
A = LOAD '/user/flume/tweets/data.json' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS myMap;
B = FOREACH A GENERATE myMap#'id' AS ID,myMap#'created_at' AS createdAT;
DUMP B;
which gave me error as below
2015-08-25 11:06:34,295 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1439883208520_0177
2015-08-25 11:06:34,295 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases A,B
2015-08-25 11:06:34,295 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: A[3,4],B[4,4] C: R:
2015-08-25 11:06:34,303 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2015-08-25 11:06:34,303 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1439883208520_0177]
2015-08-25 11:07:06,449 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2015-08-25 11:07:06,449 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1439883208520_0177]
2015-08-25 11:07:09,458 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2015-08-25 11:07:09,458 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_1439883208520_0177 has failed! Stop running all dependent jobs
2015-08-25 11:07:09,459 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2015-08-25 11:07:09,667 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://trinityhadoopmaster.com:8188/ws/v1/timeline/
2015-08-25 11:07:09,668 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at trinityhadoopmaster.com/192.168.1.135:8032
2015-08-25 11:07:09,678 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=FAILED. Redirecting to job history server
2015-08-25 11:07:09,779 [main] ERROR org.apache.pig.tools.pigstats.PigStats - ERROR 0: java.lang.ClassNotFoundException: org.json.simple.parser.ParseException
2015-08-25 11:07:09,779 [main] ERROR org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil - 1 map reduce job(s) failed!
2015-08-25 11:07:09,780 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.6.0 0.14.0 hdfs 2015-08-25 11:06:33 2015-08-25 11:07:09 UNKNOWN
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_1439883208520_0177 A,B MAP_ONLY Message: Job failed! hdfs://trinityhadoopmaster.com:9000/tmp/temp1554332510/tmp835744559,
Input(s):
Failed to read data from "hdfs://trinityhadoopmaster.com:9000/user/flume/tweets/data.json"
Output(s):
Failed to produce result in "hdfs://trinityhadoopmaster.com:9000/tmp/temp1554332510/tmp835744559"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_1439883208520_0177
2015-08-25 11:07:09,780 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2015-08-25 11:07:09,787 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias B. Backend error : java.lang.ClassNotFoundException: org.json.simple.parser.ParseException
Details at logfile: /tmp/pig-err.log
grunt>
which i have no clue on how to approach, can any one help me on this.
REGISTER '/tmp/elephant-bird-core-4.1.jar';
REGISTER '/tmp/elephant-bird-pig-4.1.jar';
REGISTER '/tmp/elephant-bird-hadoop-compat-4.1.jar';
REGISTER '/tmp/google-collections-1.0.jar';
REGISTER '/tmp/json-simple-1.1.jar';
It works.
I am unable to decode this simple json , i dont know what i am doing wrong.
please help me in this pig script.
I have to decode the below data in json format.
3.json
{
"id": 6668,
"source_name": "National Stock Exchange of India",
"source_code": "NSE"
}
and my pig script is
a = LOAD '3.json' USING org.apache.pig.builtin.JsonLoader ('id:int, source_name:chararray, source_code:chararray');
dump a;
the error i get is given below:
2015-07-23 13:40:08,715 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.LocalJobRunner - Starting task: attempt_local1664361500_0001_m_000000_0
2015-07-23 13:40:08,775 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.Task - Using ResourceCalculatorProcessTree : [ ]
2015-07-23 13:40:08,780 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - Processing split: Number of splits :1
Total Length = 88
Input split[0]:
Length = 88
Locations:
-----------------------
2015-07-23 13:40:08,793 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader - Current split being processed file:/home/hariprasad.sudo/3.json:0+88
2015-07-23 13:40:08,844 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: a[1,4] C: R:
2015-07-23 13:40:08,861 [Thread-5] INFO org.apache.hadoop.mapred.LocalJobRunner - map task executor complete.
2015-07-23 13:40:08,867 [Thread-5] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local1664361500_0001
java.lang.Exception: org.codehaus.jackson.JsonParseException: Unexpected end-of-input: expected close marker for OBJECT (from [Source: java.io.ByteArrayInputStream#61a79110; line: 1, column: 0])
at [Source: java.io.ByteArrayInputStream#61a79110; line: 1, column: 3]
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: org.codehaus.jackson.JsonParseException: Unexpected end-of-input: expected close marker for OBJECT (from [Source: java.io.ByteArrayInputStream#61a79110; line: 1, column: 0])
at [Source: java.io.ByteArrayInputStream#61a79110; line: 1, column: 3]
at org.codehaus.jackson.JsonParser._constructError(JsonParser.java:1291)
at org.codehaus.jackson.impl.JsonParserMinimalBase._reportError(JsonParserMinimalBase.java:385)
at org.codehaus.jackson.impl.JsonParserMinimalBase._reportInvalidEOF(JsonParserMinimalBase.java:318)
at org.codehaus.jackson.impl.JsonParserBase._handleEOF(JsonParserBase.java:354)
at org.codehaus.jackson.impl.Utf8StreamParser._skipWSOrEnd(Utf8StreamParser.java:1841)
at org.codehaus.jackson.impl.Utf8StreamParser.nextToken(Utf8StreamParser.java:275)
at org.apache.pig.builtin.JsonLoader.readField(JsonLoader.java:180)
at org.apache.pig.builtin.JsonLoader.getNext(JsonLoader.java:164)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:533)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2015-07-23 13:40:09,179 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2015-07-23 13:40:09,179 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local1664361500_0001 has failed! Stop running all dependent jobs
2015-07-23 13:40:09,179 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2015-07-23 13:40:09,180 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2015-07-23 13:40:09,180 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Detected Local mode. Stats reported below may be incomplete
2015-07-23 13:40:09,181 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.3.0-cdh5.1.3 0.12.0-cdh5.1.3 hariprasad.sudo 2015-07-23 13:40:07 2015-07-23 13:40:09 UNKNOWN
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_local1664361500_0001 a MAP_ONLY Message: Job failed! file:/tmp/temp-65649055/tmp1240506051,
Input(s):
Failed to read data from "file:///home/hariprasad.sudo/3.json"
Output(s):
Failed to produce result in "file:/tmp/temp-65649055/tmp1240506051"
Job DAG:
job_local1664361500_0001
2015-07-23 13:40:09,181 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2015-07-23 13:40:09,186 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias a
Details at logfile: /home/hariprasad.sudo/pig_1437673203961.log
grunt> 2015-07-23 13:40:14,754 [communication thread] INFO org.apache.hadoop.mapred.LocalJobRunner - map > map
Please help me in understanding what is wrong.
Thanks,
Hari
Have the compact version of json in 3.json. We can use http://www.jsoneditoronline.org for the same.
3.json
{"id":6668,"source_name":"National Stock Exchange of India","source_code":"NSE"}
with this we are able to dump the data :
(6668,National Stock Exchange of India,NSE)
Ref : Error from Json Loader in Pig where similar issue is discussed.
Extract from the above ref. link :
Pig doesn't usually like "human readable" json. Get rid of the spaces and/or indentations, and you're good.
I've setup hadoop on a two node cluster. The first node "namenode" runs the following daemons:
hadoop#namenode:~$ jps
2916 SecondaryNameNode
2692 NameNode
3159 NodeManager
5834 Jps
2771 DataNode
3076 ResourceManager
The seconds node "datanode" runs the following daemons:
hadoop#datanode:~$ jps
2559 Jps
2087 DataNode
2198 NodeManager
In the /etc/hosts file I added on BOTH machines:
10.240.40.246 namenode
10.240.172.201 datanode
which are the corresponding ips and I check that I can ssh to any other machine from each machine. Now, I wanted to test my hadoop installation by performing a sample map reduce benchmark job:
hadoop#namenode:~$ hadoop jar /opt/hadoop-2.2.0/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar TestDFSIO -write -nrFiles 20 -fileSize 10
However this job fails:
14/02/17 22:22:53 INFO fs.TestDFSIO: TestDFSIO.1.7
14/02/17 22:22:53 INFO fs.TestDFSIO: nrFiles = 20
14/02/17 22:22:53 INFO fs.TestDFSIO: nrBytes (MB) = 10.0
14/02/17 22:22:53 INFO fs.TestDFSIO: bufferSize = 1000000
14/02/17 22:22:53 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
14/02/17 22:22:54 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/02/17 22:22:55 INFO fs.TestDFSIO: creating control file: 10485760 bytes, 20 files
14/02/17 22:22:56 INFO fs.TestDFSIO: created control files for: 20 files
14/02/17 22:22:56 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/02/17 22:22:56 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/02/17 22:22:57 INFO mapred.FileInputFormat: Total input paths to process : 20
14/02/17 22:22:57 INFO mapreduce.JobSubmitter: number of splits:20
14/02/17 22:22:57 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
14/02/17 22:22:57 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
14/02/17 22:22:57 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
14/02/17 22:22:57 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
14/02/17 22:22:57 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
14/02/17 22:22:57 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
14/02/17 22:22:57 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
14/02/17 22:22:57 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/02/17 22:22:57 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
14/02/17 22:22:57 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
14/02/17 22:22:58 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1392675199090_0001
14/02/17 22:22:59 INFO impl.YarnClientImpl: Submitted application application_1392675199090_0001 to ResourceManager at /0.0.0.0:8032
14/02/17 22:22:59 INFO mapreduce.Job: The url to track the job: http://namenode.c.forward-camera-473.internal:8088/proxy/application_1392675199090_0001/
14/02/17 22:22:59 INFO mapreduce.Job: Running job: job_1392675199090_0001
14/02/17 22:23:10 INFO mapreduce.Job: Job job_1392675199090_0001 running in uber mode : false
14/02/17 22:23:10 INFO mapreduce.Job: map 0% reduce 0%
14/02/17 22:23:42 INFO mapreduce.Job: map 20% reduce 0%
14/02/17 22:23:43 INFO mapreduce.Job: map 30% reduce 0%
14/02/17 22:24:14 INFO mapreduce.Job: map 60% reduce 0%
14/02/17 22:24:41 INFO mapreduce.Job: map 60% reduce 20%
14/02/17 22:24:45 INFO mapreduce.Job: map 85% reduce 20%
14/02/17 22:24:48 INFO mapreduce.Job: map 85% reduce 28%
14/02/17 22:24:59 INFO mapreduce.Job: map 90% reduce 28%
14/02/17 22:25:00 INFO mapreduce.Job: map 90% reduce 30%
14/02/17 22:25:02 INFO mapreduce.Job: map 100% reduce 30%
14/02/17 22:25:03 INFO mapreduce.Job: map 100% reduce 100%
14/02/17 22:25:16 INFO mapreduce.Job: map 0% reduce 0%
14/02/17 22:25:16 INFO mapreduce.Job: Job job_1392675199090_0001 failed with state FAILED due to: Application application_1392675199090_0001 failed 2 times due to AM Container for appattempt_1392675199090_0001_000002 exited with exitCode: 1 due to: Exception from container-launch:
org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
at org.apache.hadoop.util.Shell.run(Shell.java:379)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
.Failing this attempt.. Failing the application.
14/02/17 22:25:16 INFO mapreduce.Job: Counters: 0
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:836)
at org.apache.hadoop.fs.TestDFSIO.runIOTest(TestDFSIO.java:443)
at org.apache.hadoop.fs.TestDFSIO.writeTest(TestDFSIO.java:425)
at org.apache.hadoop.fs.TestDFSIO.run(TestDFSIO.java:755)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.fs.TestDFSIO.main(TestDFSIO.java:650)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
at org.apache.hadoop.test.MapredTestDriver.run(MapredTestDriver.java:115)
at org.apache.hadoop.test.MapredTestDriver.main(MapredTestDriver.java:123)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Having a look into the log file I find on the machine datanode in that:
hadoop#datanode:/opt/hadoop-2.2.0/logs$ cat yarn-hadoop-nodemanager-datanode.log
...
2014-02-17 22:29:33,432 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
On my namenode I did:
hadoop#namenode:/opt/hadoop-2.2.0/logs$ cat yarn-hadoop-*log
2014-02-17 22:13:20,833 INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager: STARTUP_MSG:
...
2014-02-17 22:13:25,240 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: The Auxilurary Service named 'mapreduce_shuffle' in the configuration is for class class org.apache.hadoop.mapred.ShuffleHandler which has a name of 'httpshuffle'. Because these are not the same tools trying to send ServiceData and read Service Meta Data may have issues unless the refer to the name in the config.
...
2014-02-17 22:13:25,505 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: NodeManager configured with 8 G physical memory allocated to containers, which is more than 80% of the total physical memory available (3.6 G). Thrashing might happen.
...
2014-02-17 22:24:48,779 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Event EventType: KILL_CONTAINER sent to absent container container_1392675199090_0001_01_000023
2014-02-17 22:24:48,779 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Event EventType: KILL_CONTAINER sent to absent container container_1392675199090_0001_01_000024
...
2014-02-17 22:25:15,733 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1392675199090_0001_02_000001 is : 1
2014-02-17 22:25:15,734 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1392675199090_0001_02_000001 and exit code: 1
org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
at org.apache.hadoop.util.Shell.run(Shell.java:379)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
...
2014-02-17 22:25:15,736 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Container exited with a non-zero exit code 1
...
2014-02-17 22:25:15,751 WARN org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hadoop OPERATION=Container Finished - Failed TARGET=ContainerImpl RESULT=FAILURE DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE APPID=application_1392675199090_0001 CONTAINERID=container_1392675199090_0001_02_000001
...
2014-02-17 22:13:19,150 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: STARTUP_MSG:
...
2014-02-17 22:25:15,837 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hadoop OPERATION=Application Finished - Failed TARGET=RMAppManager RESULT=FAILURE DESCRIPTION=App failed with state: FAILED PERMISSIONS=Application application_1392675199090_0001 failed 2 times due to AM Container for appattempt_1392675199090_0001_000002 exited with exitCode: 1 due to: Exception from container-launch:
org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
at org.apache.hadoop.util.Shell.run(Shell.java:379)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
.Failing this attempt.. Failing the application. APPID=application_1392675199090_0001
However, I checked on machine namenode that port 8031 is listening. I get:
hadoop#namenode:~$ netstat
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 namenode.c.forwar:36975 metadata.google.in:http TIME_WAIT
tcp 0 0 namenode.c.forwar:36969 metadata.google.in:http TIME_WAIT
tcp 0 0 namenode.c.forwar:40616 namenode.c.forwar:10001 TIME_WAIT
tcp 0 0 namenode.c.forwar:36974 metadata.google.in:http ESTABLISHED
tcp 0 0 namenode.c.forward:8031 namenode.c.forwar:41229 ESTABLISHED
tcp 0 352 namenode.c.forward-:ssh e178064245.adsl.a:64305 ESTABLISHED
tcp 0 0 namenode.c.forwar:41229 namenode.c.forward:8031 ESTABLISHED
tcp 0 0 namenode.c.forwar:40365 namenode.c.forwar:10001 ESTABLISHED
tcp 0 0 namenode.c.forwar:10001 namenode.c.forwar:40365 ESTABLISHED
tcp 0 0 namenode.c.forwar:10001 datanode:48786 ESTABLISHED
Active UNIX domain sockets (w/o servers)
Proto RefCnt Flags Type State I-Node Path
unix 10 [ ] DGRAM 4604 /dev/log
unix 2 [ ] STREAM CONNECTED 10490
unix 2 [ ] STREAM CONNECTED 10488
unix 2 [ ] STREAM CONNECTED 10452
unix 2 [ ] STREAM CONNECTED 8452
unix 2 [ ] STREAM CONNECTED 7800
unix 2 [ ] STREAM CONNECTED 7797
unix 2 [ ] STREAM CONNECTED 6762
unix 2 [ ] STREAM CONNECTED 6702
unix 2 [ ] STREAM CONNECTED 6698
unix 2 [ ] STREAM CONNECTED 6208
unix 2 [ ] DGRAM 5750
unix 2 [ ] DGRAM 5737
unix 2 [ ] DGRAM 5734
unix 3 [ ] STREAM CONNECTED 5643
unix 3 [ ] STREAM CONNECTED 5642
unix 2 [ ] DGRAM 5640
unix 2 [ ] DGRAM 5192
unix 2 [ ] DGRAM 5171
unix 2 [ ] DGRAM 4889
unix 2 [ ] DGRAM 4723
unix 2 [ ] DGRAM 4663
unix 3 [ ] DGRAM 3132
unix 3 [ ] DGRAM 3131
So, what could be the problem here. In my opinion everything is setup fine. Why is my job failing then?
The log on the datanode says
Retrying connect to server: 0.0.0.0/0.0.0.0:8031
So it trys to connect to this port on the local machine which is datanode. However, the service runs on namenode. Therefore one has to add the following config lines to yarn-site.xml
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>namenode:8031</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>namenode:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>namenode:8030</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>namenode:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>namenode:8088</value>
</property>
where namenode is an alias in /etc/hosts for the machine that runs the resource manager daemon.
Also add the same properties in the yarn-site.xml file on the namenode to ensure that these services connect to the same ports.