I'm wondering how to achieve the following thing:
I have a string version of a BibTeX file (obtained through requests) in the following representation:
'#article{blablabla,\n key={\n\t1234567\n\t},\n title={\n\tblablabla\n\t},\n author={\n\t Name_of_the_authors \n\t},\n journal={\n\t Name_of_the_journal \n\t},\n volume={\n\t\n\t},\n pages={\n\t\n\t},\n year={\n\t 2020 \n\t},\n url={\n\t DOI URL \n\t}\n}'
From this I would like to obtain a dict holding the information that I need from the above string, for example:
dict1 = {'author' : 'Name_of_the_author', 'year' : 2020, 'url':'DOI URL'}
Maybe I could exploit the curly brackets for getting the information?
Many thanks,
James
This is not the most beautiful code to get there, but assuming your sample string in the question is representative of the actual data, one way to handle your problem is through some string manipulation:
data = """#article{blablabla,\n key={\n\t1234567\n\t},\n title={\n\tblablabla\n\t},\n author={\n\t Name_of_the_authors \n\t},\n journal={\n\t Name_of_the_journal \n\t},\n volume={\n\t\n\t},\n pages={\n\t\n\t},\n year={\n\t 2020 \n\t},\n url={\n\t DOI URL \n\t}\n}"""
data2 = data.split(',')[1:]  # drop the '#article{blablabla' header, keep the fields
targets = [2, -2, -1]  # you're not interested in all the info in the string, only 3 items: author, year, url
dict1 = {}
for target in targets:
    # remove the '{\n\t' and '\n\t}' wrappers, then split 'name=value'
    item = data2[target].replace('{\n\t', '').replace('\n\t}', '').strip().split("=")
    dict1[item[0]] = item[1].strip()
dict1
Output:
{'author': 'Name_of_the_authors', 'year': '2020', 'url': 'DOI URL \n}'}
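Note that the 'url' value above still carries the entry's closing '\n}', and splitting on "=" would break on a URL that contains "=". A regular expression that captures whatever sits between each field's curly brackets avoids both problems. The following is a minimal sketch, not a full BibTeX parser: the helper name parse_bibtex_fields is made up for illustration, and it assumes that field values never contain nested braces, as in the sample string.

```python
import re


def parse_bibtex_fields(entry, wanted=None):
    """Return {field: value} for every `name={value}` pair in one entry.

    Hypothetical helper; assumes values contain no nested braces.
    """
    fields = {}
    # name = { value }: the value may span several lines but holds no braces
    for name, value in re.findall(r'(\w+)\s*=\s*\{([^{}]*)\}', entry):
        if wanted is None or name in wanted:
            fields[name] = value.strip()
    return fields


data = '#article{blablabla,\n key={\n\t1234567\n\t},\n title={\n\tblablabla\n\t},\n author={\n\t Name_of_the_authors \n\t},\n journal={\n\t Name_of_the_journal \n\t},\n volume={\n\t\n\t},\n pages={\n\t\n\t},\n year={\n\t 2020 \n\t},\n url={\n\t DOI URL \n\t}\n}'

dict1 = parse_bibtex_fields(data, wanted={'author', 'year', 'url'})
print(dict1)
# {'author': 'Name_of_the_authors', 'year': '2020', 'url': 'DOI URL'}
```

The values stay strings; convert with int(dict1['year']) if you need a number. For real-world BibTeX with nested braces, a dedicated parser library is likely the safer choice.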
I have created a MongoDB collection using Node.js as below:
var mongoose = require('mongoose');
var TTTUsersSchema = new mongoose.Schema({
  username: String,
  password: String,
  active: Boolean,
  created_at: { type: Date, default: Date.now },
  updated_at: { type: Date, default: Date.now },
});
module.exports = mongoose.model('Users', TTTUsersSchema);
When a record is created, the default date is stored as Tue Nov 13 2018 20:53:47 GMT+0000 (GMT Standard Time), so when I fetch this field it is displayed in the HTML exactly like that. I want to display it in DD/MM/YY HH:MM format.
Where do I need to make the change? At the HTML UI level, or at the MongoDB collection level?
Please help.
Time stamps should always be stored in a consistent way, e.g. as ISODate() in MongoDB, and always be handled in the code in a consistent way, e.g. as a Date object in JavaScript.
Do conversions only when you present a time stamp to a user as a string, or when you parse a time string entered by the user.
Unfortunately, the JavaScript Date object offers very limited formatting, so I would suggest using a package like Moment.js, which offers lots of formatting capabilities:
const moment = require('moment');
// date is a Date object you got, e.g. from MongoDB
const time = moment(date);
console.log(time.format("DD/MM/YY HH:mm"));
You need to format your date and time at the view layer.
One option is to format it in your controller if you are using MVC, or in whichever layer produces the view model.
Another option is to format it in the HTML page using JavaScript. You may want to use a library like Moment.js.
Assume the current date is Tue Nov 13 2018 20:53:47 GMT+0000.
Use .toDateString():
var date = new Date();
console.log(date.toDateString());
It will display:
Tue Nov 13 2018
The end goal for this is to be part of a chatbot that returns an airport's weather.
Using import.io, I built an endpoint to query the weather service, which provides this response:
{"extractorData"=>
{"url"=>
"https://www.aviationweather.gov/metar/data?ids=kokb&format=decoded&hours=0&taf=off&layout=on&date=0",
"resourceId"=>"66ca907842aabb6b08b8bc12049ad533",
"data"=>
[{"group"=>
[{"Timestamp"=>[{"text"=>"Data at: 2135 UTC 12 Dec 2016"}],
"Airport"=>[{"text"=>"KOKB (Oceanside Muni, CA, US)"}],
"FullText"=>
[{"text"=>
"KOKB 122052Z AUTO 24008KT 10SM CLR 18/13 A3006 RMK AO2 SLP179 T01780133 58021"}],
"Temperature"=>[{"text"=>"17.8°C ( 64°F)"}],
"Dewpoint"=>[{"text"=>"13.3°C ( 56°F) [RH = 75%]"}],
"Pressure"=>
[{"text"=>
"30.06 inches Hg (1018.0 mb) [Sea level pressure: 1017.9 mb]"}],
"Winds"=>
[{"text"=>"from the WSW (240 degrees) at 9 MPH (8 knots; 4.1 m/s)"}],
"Visibility"=>[{"text"=>"10 or more sm (16+ km)"}],
"Ceiling"=>[{"text"=>"at least 12,000 feet AGL"}],
"Clouds"=>[{"text"=>"sky clear below 12,000 feet AGL"}]}]}]},
"pageData"=>
{"resourceId"=>"66ca907842aabb6b08b8bc12049ad533",
"statusCode"=>200,
"timestamp"=>1481578559306},
"url"=>
"https://www.aviationweather.gov/metar/data?ids=kokb&format=decoded&hours=0&taf=off&layout=on&date=0",
"runtimeConfigId"=>"2ddb288f-9e57-4b58-a690-1cd409f9edd3",
"timestamp"=>1481579246454,
"sequenceNumber"=>-1}
I seem to be running into two issues. How do I:
1. pull each field and write it into its own variable
2. ignore the "text" modifier in the response
If you're getting a response object, you might want to do something like
parsed_json = JSON.parse(response.body)
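Then you can do things like parsed_json['some_field'] (JSON.parse returns string keys; symbol keys such as parsed_json[:some_field] only work if you pass symbolize_names: true).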
The simple answer is:
require 'json'
foo = JSON['{"a":1}']
foo # => {"a"=>1}
JSON is smart enough to look at the parameter and, based on whether it's a string or an Array or Hash, parse it or serialize it. In the above case it parsed it back into a Hash.
From that point it takes normal Ruby to dive into the hash you got back and access particular values:
foo = JSON['{"a":1, "b":[{"c":3}]}']
foo # => {"a"=>1, "b"=>[{"c"=>3}]}
foo['b'][0]['c'] # => 3
How to walk through a hash is covered extensively on the internet and here on Stack Overflow, so search around and see what you can find.
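For the two specific issues in the question, the idea is to index down to the innermost "group" record and then replace each one-element list of {"text" => ...} wrappers with the text itself. The sketch below shows that walk in Python rather than Ruby, purely as an illustration; the helper name flatten_text_fields and the trimmed-down body string are assumptions made for the example. The same indexing path in Ruby would be parsed["extractorData"]["data"][0]["group"][0].

```python
import json


def flatten_text_fields(record):
    """Turn {"Field": [{"text": "value"}]} into {"Field": "value"}.

    Hypothetical helper: keeps the first "text" entry of each field.
    """
    flat = {}
    for name, items in record.items():
        # each field is a list of {"text": ...} wrappers
        if items and isinstance(items[0], dict) and "text" in items[0]:
            flat[name] = items[0]["text"]
    return flat


# A trimmed-down copy of the response from the question, as JSON text.
body = '''{"extractorData": {"data": [{"group": [{
    "Airport": [{"text": "KOKB (Oceanside Muni, CA, US)"}],
    "Visibility": [{"text": "10 or more sm (16+ km)"}],
    "Ceiling": [{"text": "at least 12,000 feet AGL"}]
}]}]}}'''

parsed = json.loads(body)
record = parsed["extractorData"]["data"][0]["group"][0]
weather = flatten_text_fields(record)

print(weather["Airport"])     # KOKB (Oceanside Muni, CA, US)
print(weather["Visibility"])  # 10 or more sm (16+ km)
```

Keeping the values in one flat hash is usually easier to work with than creating a separate variable per field, since the chatbot can then look up whichever field the user asks for.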
I have a complex JSON file that looks like this: http://pastebin.com/4UfadbqS
I would like to load only several values from these JSON objects using Pig Latin. I tried doing that like this:
mydata = LOAD 'data.json'
USING JsonLoader('id:chararray, created_at:chararray,
user: {(language:chararray)}');
STORE mydata
INTO 'output';
But it seems that Pig Latin just takes the first 3 values from the JSON and saves them (it does not treat the column name as a key). Is there a way to achieve this? Or should I list all the values from the JSON in Pig and filter them afterwards?
There are a few problems with the above approach:
1. JsonLoader always expects the full schema of your input, but you gave only three fields.
2. JsonLoader always expects each record on a single line, but your input is multiline.
3. JsonLoader will not support a nested schema, but your input contains nested data.
To solve all of the above problems, you have to use the third-party elephant-bird library.
Download the jar files (elephant-bird-pig-4.1.jar and elephant-bird-hadoop-compat-4.1.jar) from this link
http://www.java2s.com/Code/Jar/e/elephant.htm and try the approach below.
I copied your entire input and formatted it as a single line, as below.
input.json
{"filter_level":"medium","retweeted":false,"in_reply_to_screen_name":null,"possibly_sensitive":false,"truncated":false,"lang":"en","in_reply_to_status_id_str":null,"id":488927960280211456,"in_reply_to_user_id_str":null,"in_reply_to_status_id":null,"created_at":"Tue Jul 15 06:08:04 +0000 2014","favorite_count":0,"place":null,"coordinates":null,"text":"RT #BulleyBufton: #MinaANDMaya PLEASE RT /VOTE BULLEY. Last day to help me win my old rescue #HilbraesDogs £5k https://t.co/Y8g47fLYY1 http\u2026","contributors":null,"retweeted_status":{"filter_level":"low","contributors":null,"text":"#MinaANDMaya PLEASE RT /VOTE BULLEY. Last day to help me win my old rescue #HilbraesDogs £5k https://t.co/Y8g47fLYY1 http://t.co/DDco9wVXtP","geo":null,"retweeted":false,"in_reply_to_screen_name":"MinaANDMaya","possibly_sensitive":false,"truncated":false,"lang":"en","entities":{"trends":[],"symbols":[],"urls":[{"expanded_url":"https://www.animalfriendsquote.co.uk/fb-worldcup/","indices":[93,116],"display_url":"animalfriendsquote.co.uk/fb-worldcup/","url":"https://t.co/Y8g47fLYY1"}],"hashtags":[],"media":[{"sizes":{"thumb":{"w":150,"resize":"crop","h":150},"small":{"w":340,"resize":"fit","h":455},"large":{"w":706,"resize":"fit","h":946},"medium":{"w":600,"resize":"fit","h":803}},"id":488926730481332224,"media_url_https":"https://pbs.twimg.com/media/BskERVuIcAAJZGu.jpg","media_url":"http://pbs.twimg.com/media/BskERVuIcAAJZGu.jpg","expanded_url":"http://twitter.com/BulleyBufton/status/488926827394904064/photo/1","indices":[117,139],"id_str":"488926730481332224","type":"photo","display_url":"pic.twitter.com/DDco9wVXtP","url":"http://t.co/DDco9wVXtP"}],"user_mentions":[{"id":132204038,"name":"Mina*Bad Yoga Kitty*","indices":[0,12],"screen_name":"MinaANDMaya","id_str":"132204038"},{"id":2308374684,"name":"Julianna Kaminski","indices":[75,88],"screen_name":"HilbraesDogs","id_str":"2308374684"}]},"in_reply_to_status_id_str":null,"id":488926827394904064,"source":"<a href=\"http://twitter.com/download/android\" rel=\"nofollow\">Twitter for Android<\/a>","in_reply_to_user_id_str":"132204038","favorited":false,"in_reply_to_status_id":null,"retweet_count":6,"created_at":"Tue Jul 15 06:03:34 +0000 2014","in_reply_to_user_id":132204038,"favorite_count":3,"id_str":"488926827394904064","place":null,"user":{"location":"CHICAGO , USA","default_profile":false,"statuses_count":8868,"profile_background_tile":true,"lang":"en","profile_link_color":"AD54E8","profile_banner_url":"https://pbs.twimg.com/profile_banners/225136520/1403608773","id":225136520,"following":null,"favourites_count":5082,"protected":false,"profile_text_color":"3D1957","verified":false,"description":"I'm Bulley, I'm proof that there is always hope.\r\nI was in rescue kennels in UK for 9yrs. #ada_bscakes took me in.\r\nWe've moved to America to start a new life.","contributors_enabled":false,"profile_sidebar_border_color":"000000","name":"BULLEY","profile_background_color":"0A0A0A","created_at":"Fri Dec 10 19:55:17 +0000 2010","default_profile_image":false,"followers_count":3421,"profile_image_url_https":"https://pbs.twimg.com/profile_images/486614595457789952/gtcLac9w_normal.jpeg","geo_enabled":true,"profile_background_image_url":"http://pbs.twimg.com/profile_background_images/378800000166829702/isbjd7O4.jpeg","profile_background_image_url_https":"https://pbs.twimg.com/profile_background_images/378800000166829702/isbjd7O4.jpeg","follow_request_sent":null,"url":null,"utc_offset":-39600,"time_zone":"International Date Line West","notifications":null,"profile_use_background_image":true,"friends_count":3702,"profile_sidebar_fill_color":"7AC3EE","screen_name":"BulleyBufton","id_str":"225136520","profile_image_url":"http://pbs.twimg.com/profile_images/486614595457789952/gtcLac9w_normal.jpeg","listed_count":29,"is_translator":false},"coordinates":null},"geo":null,"entities":{"trends":[],"symbols":[],"urls":[{"expanded_url":"https://www.animalfriendsquote.co.uk/fb-worldcup/","indices":[111,134],"display_url":"animalfriendsquote.co.uk/fb-worldcup/","url":"https://t.co/Y8g47fLYY1"}],"hashtags":[],"media":[{"sizes":{"thumb":{"w":150,"resize":"crop","h":150},"small":{"w":340,"resize":"fit","h":455},"large":{"w":706,"resize":"fit","h":946},"medium":{"w":600,"resize":"fit","h":803}},"id":488926730481332224,"media_url_https":"https://pbs.twimg.com/media/BskERVuIcAAJZGu.jpg","media_url":"http://pbs.twimg.com/media/BskERVuIcAAJZGu.jpg","expanded_url":"http://twitter.com/BulleyBufton/status/488926827394904064/photo/1","source_status_id_str":"488926827394904064","indices":[139,140],"source_status_id":488926827394904064,"id_str":"488926730481332224","type":"photo","display_url":"pic.twitter.com/DDco9wVXtP","url":"http://t.co/DDco9wVXtP"}],"user_mentions":[{"id":225136520,"name":"BULLEY","indices":[3,16],"screen_name":"BulleyBufton","id_str":"225136520"},{"id":132204038,"name":"Mina*Bad Yoga Kitty*","indices":[18,30],"screen_name":"MinaANDMaya","id_str":"132204038"},{"id":2308374684,"name":"Julianna Kaminski","indices":[93,106],"screen_name":"HilbraesDogs","id_str":"2308374684"}]},"source":"<a href=\"http://twitter.com/download/android\" rel=\"nofollow\">Twitter for Android<\/a>","favorited":false,"in_reply_to_user_id":null,"retweet_count":0,"id_str":"488927960280211456","user":{"location":"","default_profile":false,"statuses_count":1370,"profile_background_tile":true,"lang":"zh-tw","profile_link_color":"038544","profile_banner_url":"https://pbs.twimg.com/profile_banners/2272804116/1404662156","id":2272804116,"following":null,"favourites_count":2000,"protected":false,"profile_text_color":"333333","verified":false,"description":"No More Sorrow","contributors_enabled":false,"profile_sidebar_border_color":"000000","name":"Winnie","profile_background_color":"14DBBA","created_at":"Thu Jan 02 10:13:01 +0000 2014","default_profile_image":false,"followers_count":311,"profile_image_url_https":"https://pbs.twimg.com/profile_images/478106512083017728/4ao_8JjE_normal.jpeg","geo_enabled":false,"profile_background_image_url":"http://pbs.twimg.com/profile_background_images/431815421189029888/YrRNpUfd.jpeg","profile_background_image_url_https":"https://pbs.twimg.com/profile_background_images/431815421189029888/YrRNpUfd.jpeg","follow_request_sent":null,"url":null,"utc_offset":null,"time_zone":null,"notifications":null,"profile_use_background_image":true,"friends_count":455,"profile_sidebar_fill_color":"DDEEF6","screen_name":"winnie341881","id_str":"2272804116","profile_image_url":"http://pbs.twimg.com/profile_images/478106512083017728/4ao_8JjE_normal.jpeg","listed_count":0,"is_translator":false}}
PigScript:
REGISTER '/tmp/elephant-bird-hadoop-compat-4.1.jar';
REGISTER '/tmp/elephant-bird-pig-4.1.jar';
A = LOAD 'input.json' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS myMap;
B = FOREACH A GENERATE myMap#'id' AS ID,myMap#'created_at' AS createdAT,myMap#'user' AS User;
DUMP B;
Output:
(488927960280211456,Tue Jul 15 06:08:04 +0000 2014,[location#,default_profile#false,profile_background_tile#true,statuses_count#1370,lang#zh-tw,profile_link_color#038544,profile_banner_url#https://pbs.twimg.com/profile_banners/2272804116/1404662156,id#2272804116,following#,protected#false,favourites_count#2000,profile_text_color#333333,contributors_enabled#false,description#No More Sorrow,verified#false,name#Winnie,profile_sidebar_border_color#000000,profile_background_color#14DBBA,created_at#Thu Jan 02 10:13:01 +0000 2014,default_profile_image#false,followers_count#311,geo_enabled#false,profile_image_url_https#https://pbs.twimg.com/profile_images/478106512083017728/4ao_8JjE_normal.jpeg,profile_background_image_url#http://pbs.twimg.com/profile_background_images/431815421189029888/YrRNpUfd.jpeg,profile_background_image_url_https#https://pbs.twimg.com/profile_background_images/431815421189029888/YrRNpUfd.jpeg,follow_request_sent#,url#,utc_offset#,time_zone#,notifications#,friends_count#455,profile_use_background_image#true,profile_sidebar_fill_color#DDEEF6,screen_name#winnie341881,id_str#2272804116,profile_image_url#http://pbs.twimg.com/profile_images/478106512083017728/4ao_8JjE_normal.jpeg,is_translator#false,listed_count#0])
In the elephant-bird library all values are stored as key/value pairs (i.e. the MAP datatype), so it is easy to extract the required fields from the loaded data.
In the above Pig script I have extracted the values of 'id', 'created_at' and 'user', as you needed.
Suppose you want to extract some fields from the 'user' data (e.g. 'friends_count' and 'followers_count'). In that case you need to project the 'user' field and then extract the required data from it. Sample code below.
PigScript:
REGISTER '/tmp/elephant-bird-hadoop-compat-4.1.jar';
REGISTER '/tmp/elephant-bird-pig-4.1.jar';
A = LOAD 'input.json' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS myMap;
B = FOREACH A GENERATE myMap#'user' AS User;
C = FOREACH B GENERATE User#'friends_count', User#'followers_count';
DUMP C;
Output:
(455,311)