Unable to read a JSON file using Elephant Bird

I am trying to load a JSON file that contains null values using the elephant-bird JsonLoader.
sample.json:
{"created_at": "Mon Aug 22 10:48:23 +0000 2016","id": 767674772662607873,"id_str": "767674772662607873","text": "KPIT Image Result for https:\/\/t.co\/Nas2ZnF1zZ... https:\/\/t.co\/9TnelwtIvm","source": "\u003ca href=\"http:\/\/twitter.com\" rel=\"nofollow\"\u003eTwitter Web Client\u003c\/a\u003e","truncated": false,"in_reply_to_status_id": 123,"in_reply_to_status_id_str": null,"in_reply_to_user_id": null,"in_reply_to_user_id_str": null,"in_reply_to_screen_name": null,"geo": null,"coordinates": null,"place": null,"contributors": null,"is_quote_status": false,"retweet_count": 0,"favorite_count": 0,"entities": {"hashtags": [],"urls": [{"url": "https:\/\/t.co\/Nas2ZnF1zZ","expanded_url": "http:\/\/miltonious.com\/","display_url": "miltonious.com","indices": [24, 47]}],"user_mentions": [],"symbols": []},"favorited": false,"retweeted": false,"possibly_sensitive": false,"filter_level": "low","lang": "en","timestamp_ms": "1471862903167"}
script:
REGISTER piggybank.jar
REGISTER json-simple-1.1.1.jar
REGISTER elephant-bird-pig-4.3.jar
REGISTER elephant-bird-core-4.1.jar
REGISTER elephant-bird-hadoop-compat-4.3.jar
json = LOAD 'sample.json' USING JsonLoader('created_at:chararray, id:chararray, id_str:chararray, text:chararray, source:chararray, in_reply_to_status_id:chararray, in_reply_to_status_id_str:chararray, in_reply_to_user_id:chararray, in_reply_to_user_id_str:chararray, in_reply_to_screen_name:chararray, geo:chararray, coordinates:chararray, place:chararray, contributors:chararray, is_quote_status:bytearray, retweet_count:long, favorite_count:chararray, entities:map[], favorited:bytearray, retweeted:bytearray, possibly_sensitive:bytearray, lang:chararray');
describe json;
dump json;
When I dump json, I get the following output and warning:
(Mon Aug 22 10:48:23 +0000 2016,767674772662607873,767674772662607873,google Image Result for Twitter Web Client,false,1234,12345,3214,43215,,,,,,,,,,,,,,)
WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger - org.apache.pig.builtin.JsonLoader(UDF_WARNING_1): Bad record, returning null for {complete json}
From the warning, I guess the loader is failing on the null values.
So how can we load a JSON file that contains null values?
I have also tried another way, using the fully qualified elephant-bird loader:
json = LOAD 'sample.json' USING com.twitter.elephantbird.pig.load.JsonLoader('created_at:chararray, id:chararray, id_str:chararray, text:chararray, source:chararray, in_reply_to_status_id:chararray, in_reply_to_status_id_str:chararray, in_reply_to_user_id:chararray, in_reply_to_user_id_str:chararray, in_reply_to_screen_name:chararray, geo:chararray, coordinates:chararray, place:chararray, contributors:chararray, is_quote_status:bytearray, retweet_count:long, favorite_count:chararray, entities:map[], favorited:bytearray, retweeted:bytearray, possibly_sensitive:bytearray, lang:chararray');
describe json;
Output:
Schema for json unknown.
Any suggestions would be appreciated.
Thanks.

You can try something like this:
MY_JSON = LOAD 'sample.json' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad');
dump MY_JSON;

I can't use Get Value From Json with a JsonPath parameter in Robot Framework - always get a KeyError

When I use "Get Value From Json" on a JSON object with a specific JsonPath, I always get a KeyError.
Even the simple example from the web doesn't work for me.
When I try this code:
*** Settings ***
Library    JSONLibrary
*** Test Cases ***
Example
    ${tmp}=    Convert String To JSON    {"foo": {"bar": [1,2,3]}}
    Log To Console    tmp: ${tmp}
    ${values_test}=    Get Value From Json    ${tmp}    $.foo.bar
    Log To Console    values_test: ${values_test}
I always get this kind of error and log:
tmp: {'foo': {'bar': [1, 2, 3]}}
...
Resolving variable '${tmp['$.foo.bar']}' failed: KeyError: '$.foo.bar'
Can somebody help me, please?
This is a really basic example, and comments from the community always say it works like this.

How can I print a JSON object with some other text?

Hello, I am making a bot in Python.
It gets data from an API that returns JSON.
I want to know how I can print a value from a JSON object together with other text.
Example code:
import json
# some JSON text (a string, which is what json.loads expects)
x = '{"location":{"name":"London","region":"City of London, Greater London","country":"United Kingdom","lat":51.52,"lon":-0.11,"tz_id":"Europe/London","localtime_epoch":1608613687,"localtime":"2020-12-22 5:08"}}'
# parsing the JSON text into a dict
y = json.loads(x)
# printing the result
print(y['location']['name'])
The result will be London.
But I want it to print a response like Name: London.
How can I print it like that?
How about using an f-string to format it:
print(f"Name: {y['location']['name']}")
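As a minimal sketch of the f-string approach, the snippet below parses JSON text and builds the labelled line. The `raw` string and the `format_location` helper are hypothetical names introduced only for illustration; a real bot would pass in the text returned by its API.

```python
import json

# Hypothetical raw response text; a real API client would supply this string.
raw = '{"location": {"name": "London", "country": "United Kingdom"}}'


def format_location(raw_json: str) -> str:
    """Parse JSON text and return a labelled line such as 'Name: London'."""
    data = json.loads(raw_json)        # json.loads expects a str, not a dict
    name = data["location"]["name"]    # walk the nested keys
    return f"Name: {name}"             # f-string combines label and value


print(format_location(raw))  # prints "Name: London"
```

Keeping the formatting in a small function makes it easy to reuse the same label for every response the bot handles.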

Pig: Create json file with actual key_name and values

I have a Pig script using the elephant-bird JSON loader.
data_input = LOAD '$DATA_INPUT' USING com.twitter.elephantbird.pig.load.JsonLoader() AS (json:map []);
x = FOREACH data_input GENERATE json#'user__id_str', json#'user__created_at', json#'user__notifications', json#'user__follow_request_sent', json#'user__friends_count', json#'user__name', json#'user__time_zone', json#'user__profile_background_color', json#'user__is_translation_enabled', json#'user__profile_link_color', json#'user__utc_offset', json#'user__profile_sidebar_border_color', json#'user__has_extended_profile', json#'user__profile_background_tile', json#'user__is_translator', json#'user__profile_text_color', json#'user__location', json#'user__profile_banner_url', json#'user__profile_use_background_image', json#'user__default_profile_image', json#'user__description', json#'user__profile_background_image_url_https', json#'user__profile_sidebar_fill_color', json#'user__followers_count', json#'user__profile_image_url', json#'user__geo_enabled', json#'user__entities__description__urls', json#'user__screen_name', json#'user__favourites_count', json#'user__url', json#'user__statuses_count', json#'user__default_profile', json#'user__lang', json#'user__protected', json#'user__listed_count', json#'user__profile_image_url_https', json#'user__contributors_enabled', json#'user__following', json#'user__verified';
STORE x INTO '$DATA_OUTPUT' USING JsonStorage();
The output values are right, but the field names are wrong.
My output has val_n instead of the field names themselves:
{"val_0":"40510796","val_1":"Sat May 16 18:03:53 +0000 2009","val_2":"false"......}
I want something like:
{"user__id_str":"40510796","user__created_at":"Sat May 16 18:03:53 +0000 2009","user__notifications":"false"...........}
How can I get the column names as well?
Have you tried giving each field an alias in the GENERATE statement?
x = FOREACH data_input GENERATE json#'user__id_str' AS user__id_str, json#'user__created_at' AS user__created_at;

Error: JsonStorage in Pig local mode

I am running my Pig script in local mode in Eclipse.
When I try to store the output using JsonStorage, I get this exception:
Exception in thread "main" java.lang.RuntimeException: Cannot instantiate:org.apache.pig.builtin.JsonStorage
at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:473)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.NonEvalFuncSpec(QueryParser.java:4976)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.StoreClause(QueryParser.java:3473)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1351)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:893)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:706)
at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1017)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:967)
at org.apache.pig.PigServer.registerQuery(PigServer.java:383)
at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:716)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:324)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
at org.apache.pig.PigServer.registerScript(PigServer.java:407)
at com.paypal.debugpig.DebugPig.main(DebugPig.java:13)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve org.apache.pig.builtin.JsonStorage using imports: [, org.apache.pig.builtin., org.apache.pig.impl.builtin.]
at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:458)
at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:470)
... 14 more
Pig script:
REGISTER C:/path/to/jar/pig.jar;
REGISTER C:/path/to/jar/UpperUDf/UpperUDf_fat.jar;
A = LOAD 'C:/path/to/data/file/student.txt' using PigStorage('\t') AS (name: chararray, age: int, gpa: float);
B = FOREACH A GENERATE myudfs.UPPER(name) ,age, gpa ;
Store B into 'output_student_Json' using org.apache.pig.builtin.JsonStorage();
When I dump the output or store it in a text file it works, but the issue occurs when I try to store it in JSON format.
Any pointers appreciated.
Thank you.
I have verified it, and it works for me when I use the line below to store the output in JSON format:
store B into 'json_output' using JsonStorage();

I'm trying to use 'ffprobe' with Java or Groovy

As I understand it, ffprobe can provide file-related data in JSON format. I have installed ffprobe on my Ubuntu machine, but I don't know how to access the ffprobe JSON response using Java/Grails.
Expected response format:
{
  "format": {
    "filename": "/Users/karthick/Documents/videos/TestVideos/sample.ts",
    "nb_streams": 2,
    "nb_programs": 1,
    "format_name": "mpegts",
    "format_long_name": "MPEG-TS (MPEG-2 Transport Stream)",
    "start_time": "1.430800",
    "duration": "170.097489",
    "size": "80425836",
    "bit_rate": "3782576",
    "probe_score": 100
  }
}
This is my Groovy code:
def process = "ffprobe -v quiet -print_format json -show_format -show_streams HelloWorld.mpeg ".execute()
println "Found ${process.text}"
render process as JSON
I am able to get the process object, but I am not able to get the JSON response.
Do I need to convert the process object to a JSON object?
Output:
Found java.lang.UNIXProcess@75566697
org.codehaus.groovy.grails.web.converters.exceptions.ConverterException: Error converting Bean with class java.lang.UNIXProcess
Grails has nothing to do with this. Groovy can execute arbitrary shell commands in a very simplistic way:
"mkdir foo".execute()
Or for more advanced features, you might look into using ProcessBuilder. At the end of the day, you need to execute ffprobe and then capture the output stream of JSON to use in your app.
Groovy provides a simple way to execute command line processes. Simply
write the command line as a string and call the execute() method.
The execute() method returns a java.lang.Process instance.
println "ffprobe <options>".execute().text