Load JSON array into Pig - json

I have a json file with the following format
[
{
"id": 2,
"createdBy": 0,
"status": 0,
"utcTime": "Oct 14, 2014 4:49:47 PM",
"placeName": "21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia",
"longitude": 77.5983817,
"latitude": 12.9832418,
"createdDate": "Sep 16, 2014 2:59:03 PM",
"accuracy": 5,
"loginType": 1,
"mobileNo": "0000005567"
},
{
"id": 4,
"createdBy": 0,
"status": 0,
"utcTime": "Oct 14, 2014 4:52:48 PM",
"placeName": "21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia",
"longitude": 77.5983817,
"latitude": 12.9832418,
"createdDate": "Oct 8, 2014 5:24:42 PM",
"accuracy": 5,
"loginType": 1,
"mobileNo": "0000005566"
}
]
when i try to load the data to pig using JsonLoader class I am getting a error like Unexpected end-of-input: expected close marker for OBJECT
a = LOAD '/user/root/jsoneg/exp.json' USING JsonLoader('id:int,createdBy:int,status:int,utcTime:chararray,placeName:chararray,longitude:double,latitude:double,createdDate:chararray,accuracy:double,loginType:double,mobileNo:chararray');
b = foreach a generate $0,$1,$2;
dump b;

I am also faced similar kind of problem sometime back, later i came to know that Pig JSON will not support multiline json format. It will always expect the json input must be in single line.
Instead of native Jsonloader, i suggest you to use elephantbird json loader. It is pretty good for Jsons formats.
You can download the jars from the below link
http://www.java2s.com/Code/Jar/e/elephant.htm
I changed your input format to single line and loaded through elephantbird as below
input.json
{"test":[{"id": 2,"createdBy": 0,"status": 0,"utcTime": "Oct 14, 2014 4:49:47 PM","placeName": "21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia","longitude": 77.5983817,"latitude": 12.9832418,"createdDate": "Sep 16, 2014 2:59:03 PM","accuracy": 5,"loginType": 1,"mobileNo": "0000005567"},{"id": 4,"createdBy": 0,"status": 0,"utcTime": "Oct 14, 2014 4:52:48 PM","placeName": "21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia","longitude": 77.5983817,"latitude": 12.9832418,"createdDate": "Oct 8, 2014 5:24:42 PM","accuracy": 5,"loginType": 1,"mobileNo": "0000005566"}]}
PigScript:
REGISTER '/tmp/elephant-bird-hadoop-compat-4.1.jar';
REGISTER '/tmp/elephant-bird-pig-4.1.jar';
A = LOAD 'input.json ' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad');
B = FOREACH A GENERATE FLATTEN($0#'test');
C = FOREACH B GENERATE FLATTEN($0) AS mymap;
D = FOREACH C GENERATE mymap#'id',mymap#'placeName',mymap#'status';
DUMP D;
Output:
(2,21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia,0)
(4,21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia,0)

Related

Strip Content From a Data Object in Vuejs

I have a JSON data and I would like to strip "description" into two different data objects.
JSON FILE:
{
"#attributes": {
"version": "2.0"
},
"channel": {
"title": "Rust News",
"description": "The latest news posts from Rust",
"lastBuildDate": "Thu, 07 Jul 2022 07:00:00 Z",
"item": [
{
"guid": "https:\/\/rust.facepunch.com\/news\/july-2022-update\/",
"link": "https:\/\/rust.facepunch.com\/news\/july-2022-update\/",
"title": "July Update",
"description": "<img src=\"https:\/\/files.facepunch.com\/paddy\/20220705\/july_2022_header.jpg\"><br\/>Combat balance, faster load times, chat filter and much more!\u00a0\u00a0",
"pubDate": "Thu, 07 Jul 2022 07:00:00 Z"
},
{
"guid": "https:\/\/rust.facepunch.com\/news\/community-update-243\/",
"link": "https:\/\/rust.facepunch.com\/news\/community-update-243\/",
"title": "Community Update 243",
"description": "<img src=\"https:\/\/files.facepunch.com\/errn\/1b1311b1\/FTTfqglWQAMxkzD.jpg\"><br\/>Custom desk mats, Mars monuments, thoughtful poetry, and more!",
"pubDate": "Wed, 15 Jun 2022 12:00:00 Z"
},
...
I would like to fetch two objects from the same "description" string:
First Object:
<img src="https://files.facepunch.com/paddy/20220705/july_2022_header.jpg">
Second Object:
Combat balance, faster load times, chat filter and much more!
Here is my Script file:
data() {
return {
items: []
}
},
mounted() {
this.axios.get("./api/json.php").then(response => {
this.items = response.data.channel.item;
});
}
how would I assign item.image and item.description string separately?
Or perhaps there is a way to do it from the server-side since the JSON is generated with a PHP from an XML file
Any help with this would be much appreciated!

Classroom API returns wrong due date

I am requesting for a list of assignment for each course. For this example I created the due date to be on the 11th but the API returns the due date as the 12th.
courseWork": [
{
"courseId": "116315138435",
"id": "116726071520",
"title": "Test Assignments",
"description": "ojoijoijoijoijoij",
"state": "PUBLISHED",
"alternateLink": "https://classroom.google.com/c/MTE2MzE1MTM4NDM1/a/MTE2NzI2MDcxNTIw/details",
"creationTime": "2020-07-09T17:53:00.220Z",
"updateTime": "2020-07-09T17:53:10.544Z",
"dueDate": {
"year": 2020,
"month": 7,
"day": 12
},
"dueTime": {
"hours": 4,
"minutes": 59
},
"maxPoints": 100,
"workType": "ASSIGNMENT",
"submissionModificationMode": "MODIFIABLE_UNTIL_TURNED_IN",
"creatorUserId": "111094682610866207024"
}
]
I am not sure that I can use a time zone to fix this because it is simply returning the day/month/year without a specified time zone. Can I simply subtract the due dates day by one? or is there a more logical approach?

Dealing with newlines embedded within strings

I'm working with twitter data which fetched in jsonl form. I've converted it to json and am trying to convert it to a csv (to import into a program which accepts either csv or MySQL). However, some people put forced new lines into their tweets or bios. This is causing the csv file to have multiple lines for entries, often breaking up in the middle of a tweet. I've tried a few of the python json to csv codes floating on github.
The latest attempt I tried:
jq -s "." tiny00subset.jsonl > tiny00subset.json
json2csv -i tiny00subset.json -o tiny00subset.csv
Partial example tweet (json format):
{
"created_at": "Mon Aug 13 10:40:34 +0000 2018",
"id": 1028954459110555600,
"id_str": "1028954459110555649",
"full_text": "Oh well, they deal with it quite well. Like they add numbers and facts and such crazy stuff.\nhttps://REPLACED/DuBGmHCnG8\n#climatechange https://REPLACED/d5IBchM3Uk",
"truncated": false,
"display_text_range": [
0,
131
],
"entities": {
"hashtags": [
{
"text": "climatechange",
"indices": [
117,
131
]
}
],
"symbols": [],
"user_mentions": [],
"urls": [
{
"url": "https://REPLACED/DuBGmHCnG8",
"expanded_url": "https://tamino.wordpress.com/2018/08/08/usa-temperature-can-i-sucker-you/",
"display_url": "tamino.wordpress.com/2018/08/08/usa…",
"indices": [
93,
116
]
},
{
"url": "https://REPLACED/d5IBchM3Uk",
"expanded_url": "https://twitter.com/Tony__Heller/status/1028672939753758720",
"display_url": "twitter.com/Tony__Heller/s…",
"indices": [
132,
155
]
}
]
},
}
CSV Output:
"Mon Aug 13 10:40:34 +0000 2018",1028954459110555600,"1028954459110555649","Oh well, they deal with it quite well. Like they add numbers and facts and such crazy stuff.
https://REPLACED/DuBGmHCnG8
#climatechange https://REPLACED/d5IBchM3Uk",false,"[0,131]","{""hashtags"":[{""text"":""climatechange"",""indices"":[117,131]}],""symbols"":[],""user_mentions"":[],""urls"":[{""url"":""https://REPLACED/DuBGmHCnG8"",""expanded_url"":""https://tamino.wordpress.com/2018/08/08/usa-temperature-can-i-sucker-you/"",""display_url"":""tamino.wordpress.com/2018/08/08/usa…"",""indices"":[93,116]},{""url"":""https://REPLACED/d5IBchM3Uk"",""expanded_url"":""https://twitter.com/Tony__Heller/status/1028672939753758720"",""display_url"":""twitter.com/Tony__Heller/s…"",""indices"":[132,155]}]}","TweetDeck",,,,,,"{""id"":59806323,""id_str"":""59806323"",""name"":""Daniel"",""screen_name"":""sleeksorrow"",""location"":""Karlsruhe, Germany"",""description"":""Politik, IT, Blödsinn und deren Schnittmenge. Ebenfalls: Hochmittelalter Darstellung, Falknerei, Greifvogelschutz - profile picture by #herrkausk"",""url"":""https://REPLACED/E8aNHIhCtg"",""entities"":{""url"":{""urls"":[{""url"":""https://REPLACED/E8aNHIhCtg"",""expanded_url"":""http://sleeksorrow.blogspot.com/"",""display_url"":""sleeksorrow.blogspot.com"",""indices"":[0,23]}]},""description"":{""urls"":[]}},""protected"":false,""followers_count"":572,""friends_count"":392,""listed_count"":47,""created_at"":""Fri Jul 24 15:15:25 +0000 2009"",""favourites_count"":13259,""utc_offset"":null,""time_zone"":null,""geo_enabled"":false,""verified"":false,""statuses_count"":48861,""lang"":null,""contributors_enabled"":false,""is_translator"":false,""is_translation_enabled"":false,""profile_background_color"":""1A1B1F"",""profile_background_image_url"":""http://abs.twimg.com/images/themes/theme9/bg.gif"",""profile_background_image_url_https"":""https://abs.twimg.com/images/themes/theme9/bg.gif"",""profile_background_tile"":false,""profile_image_url"":""http://pbs.twimg.com/profile_images/877219681513480192/1rj4xqpK_normal.jpg"",""profile_image_url_https"":""https://pbs.twimg.com/profile_images/877219681513480192/1rj4xqpK_normal.jpg"",""profile_banner_url"":""https://pbs.twimg.com/profile_banners/59806323/1397029131"",""profile_image_extensions_alt_text"":null,""profile_banner_extensions_alt_text"":null,""profile_link_color"":""2FC2EF"",""profile_sidebar_border_color"":""181A1E"",""profile_sidebar_fill_color"":""252429"",""profile_text_color"":""666666"",""profile_use_background_image"":true,""has_extended_profile"":false,""default_profile"":false,""default_profile_image"":false,""can_media_tag"":true,""followed_by"":false,""following"":false,""follow_request_sent"":false,""notifications"":false,""translator_type"":""none""}",,,,,true,1028672939753758700,"1028672939753758720","{""url"":""https://REPLACED/d5IBchM3Uk"",""expanded"":""https://twitter.com/Tony__Heller/status/1028672939753758720"",""display"":""twitter.com/Tony__Heller/s…""}","{""created_at"":""Sun Aug 12 16:01:55 +0000 2018"",""id"":1028672939753758700,""id_str"":""1028672939753758720"",""full_text"":""#DeanFieldingF1 It is very difficult or impossible for climate alarmists to deal with reality. https://REPLACED/wOJTptxIqH"",""truncated"":false,""display_text_range"":[16,94],""entities"":{""hashtags"":[],""symbols"":[],""user_mentions"":[{""screen_name"":""DeanFieldingF1"",""name"":""Dean Fielding"",""id"":797295219825897500,""id_str"":""797295219825897472"",""indices"":[0,15]}],""urls"":[],""media"":[{""id"":1028672868849090600,""id_str"":""1028672868849090560"",""indices"":[95,118],""media_url"":""http://pbs.twimg.com/media/DkaUhinVAAARrIY.jpg"",""media_url_https"":""https://pbs.twimg.com/media/DkaUhinVAAARrIY.jpg"",""url"":""https://REPLACED/wOJTptxIqH"",""display_url"":""pic.twitter.com/wOJTptxIqH"",""expanded_url"":""https://twitter.com/SteveSGoddard/status/1028672939753758720/photo/1"",""type"":""photo"",""sizes"":{""thumb"":{""w"":150,""h"":150,""resize"":""crop""},""medium"":{""w"":1070,""h"":983,""resize"":""fit""},""large"":{""w"":1070,""h"":983,""resize"":""fit""},""small"":{""w"":680,""h"":625,""resize"":""fit""}},""features"":{""orig"":{""faces"":[]},""medium"":{""faces"":[]},""large"":{""faces"":[]},""small"":{""faces"":[]}}}]},""extended_entities"":{""media"":[{""id"":1028672868849090600,""id_str"":""1028672868849090560"",""indices"":[95,118],""media_url"":""http://pbs.twimg.com/media/DkaUhinVAAARrIY.jpg"",""media_url_https"":""https://pbs.twimg.com/media/DkaUhinVAAARrIY.jpg"",""url"":""https://REPLACED/wOJTptxIqH"",""display_url"":""pic.twitter.com/wOJTptxIqH"",""expanded_url"":""https://twitter.com/SteveSGoddard/status/1028672939753758720/photo/1"",""type"":""photo"",""sizes"":{""thumb"":{""w"":150,""h"":150,""resize"":""crop""},""medium"":{""w"":1070,""h"":983,""resize"":""fit""},""large"":{""w"":1070,""h"":983,""resize"":""fit""},""small"":{""w"":680,""h"":625,""resize"":""fit""}},""features"":{""orig"":{""faces"":[]},""medium"":{""faces"":[]},""large"":{""faces"":[]},""small"":{""faces"":[]}},""ext_alt_text"":null},{""id"":1028672883986333700,""id_str"":""1028672883986333697"",""indices"":[95,118],""media_url"":""http://pbs.twimg.com/media/DkaUibAVAAEaQt0.jpg"",""media_url_https"":""https://pbs.twimg.com/media/DkaUibAVAAEaQt0.jpg"",""url"":""https://REPLACED/wOJTptxIqH"",""display_url"":""pic.twitter.com/wOJTptxIqH"",""expanded_url"":""https://twitter.com/SteveSGoddard/status/1028672939753758720/photo/1"",""type"":""photo"",""sizes"":{""thumb"":{""w"":150,""h"":150,""resize"":""crop""},""medium"":{""w"":1070,""h"":983,""resize"":""fit""},""large"":{""w"":1070,""h"":983,""resize"":""fit""},""small"":{""w"":680,""h"":625,""resize"":""fit""}},""features"":{""orig"":{""faces"":[]},""medium"":{""faces"":[]},""large"":{""faces"":[]},""small"":{""faces"":[]}},""ext_alt_text"":null}]},""source"":""Twitter Web Client"",""in_reply_to_status_id"":1028671170802081800,""in_reply_to_status_id_str"":""1028671170802081793"",""in_reply_to_user_id"":797295219825897500,""in_reply_to_user_id_str"":""797295219825897472"",""in_reply_to_screen_name"":""DeanFieldingF1"",""user"":{""id"":435704007,""id_str"":""435704007"",""name"":""Tony Heller"",""screen_name"":""Tony__Heller"",""location"":""Colorado"",""description"":""https://REPLACED/j5CaDNyIqE"",""url"":""https://REPLACED/Pyn117xXna"",""entities"":{""url"":{""urls"":[{""url"":""https://REPLACED/Pyn117xXna"",""expanded_url"":""http://realclimatescience.com"",""display_url"":""realclimatescience.com"",""indices"":[0,23]}]},""description"":{""urls"":[{""url"":""https://REPLACED/j5CaDNyIqE"",""expanded_url"":""https://realclimatescience.com/who-is-tony-heller/"",""display_url"":""realclimatescience.com/who-is-tony-he…"",""indices"":[0,23]}]}},""protected"":false,""followers_count"":44955,""friends_count"":374,""listed_count"":886,""created_at"":""Tue Dec 13 10:44:34 +0000 2011"",""favourites_count"":3740,""utc_offset"":null,""time_zone"":null,""geo_enabled"":true,""verified"":false,""statuses_count"":165165,""lang"":null,""contributors_enabled"":false,""is_translator"":false,""is_translation_enabled"":false,""profile_background_color"":""185370"",""profile_background_image_url"":""http://abs.twimg.com/images/themes/theme1/bg.png"",""profile_background_image_url_https"":""https://abs.twimg.com/images/themes/theme1/bg.png"",""profile_background_tile"":false,""profile_image_url"":""http://pbs.twimg.com/profile_images/1175541923508916225/0qEi4yIj_normal.jpg"",""profile_image_url_https"":""https://pbs.twimg.com/profile_images/1175541923508916225/0qEi4yIj_normal.jpg"",""profile_banner_url"":""https://pbs.twimg.com/profile_banners/435704007/1469798959"",""profile_image_extensions_alt_text"":null,""profile_banner_extensions_alt_text"":null,""profile_link_color"":""0084B4"",""profile_sidebar_border_color"":""FFFFFF"",""profile_sidebar_fill_color"":""DDEEF6"",""profile_text_color"":""333333"",""profile_use_background_image"":true,""has_extended_profile"":false,""default_profile"":false,""default_profile_image"":false,""can_media_tag"":false,""followed_by"":false,""following"":false,""follow_request_sent"":false,""notifications"":false,""translator_type"":""none""},""geo"":null,""coordinates"":null,""place"":null,""contributors"":null,""is_quote_status"":false,""retweet_count"":16,""favorite_count"":27,""favorited"":false,""retweeted"":false,""possibly_sensitive"":false,""lang"":""en""}",0,0,false,false,false,"en"
starting from
{
"created_at": "Mon Aug 13 10:40:34 +0000 2018",
"id": 1028954459110555600,
"id_str": "1028954459110555649",
"full_text": "Oh well, they deal with it quite well. Like they add numbers and facts and such crazy stuff.\nhttps://REPLACED/DuBGmHCnG8\n#climatechange https://REPLACED/d5IBchM3Uk",
"truncated": false,
"display_text_range": [
0,
131
],
"entities": {
"hashtags": [
{
"text": "climatechange",
"indices": [
117,
131
]
}
],
"symbols": [],
"user_mentions": [],
"urls": [
{
"url": "https://REPLACED/DuBGmHCnG8",
"expanded_url": "https://tamino.wordpress.com/2018/08/08/usa-temperature-can-i-sucker-you/",
"display_url": "tamino.wordpress.com/2018/08/08/usa…",
"indices": [
93,
116
]
},
{
"url": "https://REPLACED/d5IBchM3Uk",
"expanded_url": "https://twitter.com/Tony__Heller/status/1028672939753758720",
"display_url": "twitter.com/Tony__Heller/s…",
"indices": [
132,
155
]
}
]
}
}
and running (it's https://github.com/johnkerl/miller)
mlr --j2c unsparsify input.json >input.csv
you have this kind of output https://gist.github.com/aborruso/6e0361923a3c45b9fe55ebf7590953de#file-output-csv
If you open it as raw you have the carriage return. And a spreasheet read it properly.
Then, using properly the import process you need to use, the \n is not a problem.

Greater and less than operator in a URL

Hello everyone i have a db.json file which i run in http://localhost:3004
The format is like this:
[
{
"Symbol": "AAPL",
"prodType": "STK",
"Market": "Symbol Industries Inc",
"Country": "US",
"Quantity": 10,
"NominalValue": 123.45,
"AvgPrice": 131.16,
"PositionCost": 123.87,
"LastPrice": 123.567,
"PositionValue": 145.78,
"TotalValue": 123.347,
"PortfolioPercentage": 12,
"ProfitLoss": 1235.09,
"Results": "12 Dec, 2017",
"Dividend": "29 Apr, 2017"
},
{
"Symbol": "fsdfsd",
"prodType": "DER",
"Market": "Symbol Industries Inc",
"Country": "Greece",
"Quantity": 10,
"NominalValue": 123.45,
"AvgPrice": 131.16,
"PositionCost": 123.87,
"LastPrice": 123.567,
"PositionValue": 145.78,
"TotalValue": 123.347,
"PortfolioPercentage": 12,
"ProfitLoss": 1235.09,
"Results": "12 Dec, 2017",
"Dividend": "29 Apr, 2017"
}
]
i want to make a query to the URL to bring me the records which have total value greater than 100 and lesser than 130 for example but i can't. I use different approaches but no one is working properly. How can i implement such a query in this URL?

How does DROPBOX response (JSON) data look like?

Good day, does anyone know typical JSON response data, when accessing a file? I am more interested in whether or not one can check if a response object is a file or a directory!
Thanx mates
This is a common JSON answer, when you try to get information about file/folder.
You can see more information about request here
if is_dir is true, it's a folder.
{
"size": "225.4KB",
"rev": "35e97029684fe",
"thumb_exists": false,
"bytes": 230783,
"modified": "Tue, 19 Jul 2011 21:55:38 +0000",
"client_mtime": "Mon, 18 Jul 2011 18:04:35 +0000",
"path": "/Getting_Started.pdf",
"is_dir": false,
"icon": "page_white_acrobat",
"root": "dropbox",
"mime_type": "application/pdf",
"revision": 220823
}