I'd like to parse the JSON output from an IEX Cloud stock quote query: https://cloud.iexapis.com/stable/stock/aapl/quote?token=YOUR_TOKEN_HERE
I have tried to use Regex101 to solve the issue:
https://regex101.com/r/y8i01T/1/
Here is the regex that I tried: "([^"]+)":"?([^",\s]+)
Here is an example of an IEX Cloud stock quote output for Apple:
{
"symbol":"AAPL",
"companyName":"Apple, Inc.",
"calculationPrice":"close",
"open":204.86,
"openTime":1556285400914,
"close":204.3,
"closeTime":1556308800303,
"high":205,
"low":202.12,
"latestPrice":204.3,
"latestSource":"Close",
"latestTime":"April 26, 2019",
"latestUpdate":1556308800303,
"latestVolume":18604306,
"iexRealtimePrice":204.34,
"iexRealtimeSize":48,
"iexLastUpdated":1556308799763,
"delayedPrice":204.3,
"delayedPriceTime":1556308800303,
"extendedPrice":204.46,
"extendedChange":0.16,
"extendedChangePercent":0.00078,
"extendedPriceTime":1556310657637,
"previousClose":205.28,
"change":-0.98,
"changePercent":-0.00477,
"iexMarketPercent":0.030716437366704246,
"iexVolume":571458,
"avgTotalVolume":27717780,
"iexBidPrice":0,
"iexBidSize":0,
"iexAskPrice":0,
"iexAskSize":0,
"marketCap":963331704000,
"peRatio":16.65,
"week52High":233.47,
"week52Low":142,
"ytdChange":0.29512900000000003
}
I want to save the key-value pairs from the JSON response without the quotes around the key, capturing the value that starts after the colon (:). I need to exclude the quotes around text values and the trailing comma on each line, while still including the last key-value pair, which has no trailing comma.
For example, "peRatio":16.65, should give a key of peRatio and a value of 16.65. Another example: "changePercent":-0.00477, should give a key of changePercent and a value of -0.00477. If the value is text, as in "companyName":"Apple, Inc.",, the key should be companyName and the value should be Apple, Inc.
Also, the last JSON key-value entry, "ytdChange":0.29512900000000003, does not have a trailing comma, and that needs to be accounted for.
You most likely do not need to parse this data with regex. However, if you wish or have to do so, perhaps to practice regular expressions, you can do it by defining a few boundaries in your expression.
This regex might help; it divides your input JSON values into three categories: string values, numeric values, and the last value with no trailing comma:
"([^"]+)":("(.+)"|(.+))(,{1}|\n\})
You can then use the \n} boundary for the last value, the quote boundary for string values, and no boundary for numeric values.
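As a sketch (assuming Python, which neither the question nor the answer specifies), the pattern above can be applied with re.finditer; string values land in group 3 and numeric values in group 4. For comparison, the non-regex route via json.loads gives the same pairs with native types:

```python
import json
import re

# A shortened version of the quote payload from the question.
raw = '''{
"symbol":"AAPL",
"companyName":"Apple, Inc.",
"peRatio":16.65,
"ytdChange":0.29512900000000003
}'''

# The answer's pattern (",{1}" written as the equivalent ","):
# group 1 is the key, group 3 a string value, group 4 a numeric value.
pattern = re.compile(r'"([^"]+)":("(.+)"|(.+))(,|\n\})')

pairs = {m.group(1): m.group(3) if m.group(3) is not None else m.group(4)
         for m in pattern.finditer(raw)}
print(pairs['companyName'])  # Apple, Inc.  (quotes and embedded comma handled)
print(pairs['peRatio'])      # 16.65, though still a string

# The non-regex route: json.loads parses keys and typed values directly.
print(json.loads(raw)['peRatio'])  # 16.65 as a float
```

Note that the regex leaves every value as a string, while the JSON parser gives you numbers as numbers for free.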
I have an outlet_details table with two columns (id, and extended_attributes as a JSON object).
extended_attributes has values like
{
"parent-0-0-id": "DS-606",
"parent-0-1-id": "SD066",
"secondaryOutletCode": "MG_918"
}
I want to get parent-0-0-id's value, but when I run
SELECT extended_attributes->>'$.parent-0-0-id' AS 'parent00id' FROM outlet_details;
I get an
invalid JSON path expression error (3143).
You can enclose the key name in quotes inside the path expression, separating the name from the characters (here the hyphens) that would otherwise break the path syntax.
SELECT extended_attributes->>"$.\"parent-0-0-id\"" AS 'parent00id' FROM outlet_details; should work.
How can I take some JSON data that contains a number and insert commas into the number?
Example: I fetch some JSON data from a URL and can display it; it contains a number, let's say 100000. It doesn't have commas to display it more readably as 100,000.
Language used: Angular 6 (TypeScript)
There are many ways to do this; pick your poison:
Intl.NumberFormat
var formatter = new Intl.NumberFormat();
formatter.format(number);
Regex:
function addThousandsSeparator(n) {
    // coerce to a string first; the lookahead inserts a comma before every group of three digits
    return String(n).replace(/\B(?=(\d{3})+(?!\d))/g, ",");
}
Numeral.js
numeral(number).format('0,0')
Number.prototype.toLocaleString("en-US") should insert commas the way you want it to:
Number("100000").toLocaleString("en-US")
// "100,000"
I've tried a couple of different solutions to fix my problem with some "funny" newlines within my JSON dictionary, and none of them works, so I thought I might make a post. The dictionary is obtained by scraping a website.
I have a JSON dictionary:
my_dict = {
u"Danish title": u"Avanceret",
u"Course type": u"MScTechnol",
u"Type of": u"assessmen",
u"Date": u"\nof exami",
u"Evaluation": u"7 step sca",
u"Learning objectives": u"\nA studen",
u"Participants restrictions": u"Minimum 10",
u"Aid": u"No Aid",
u"Duration of Course": u"13 weeks",
u"name": u"Advanced u",
u"Department": u"31\n",
u"Mandatory Prerequisites": u"31545",
u"General course objectives": u"\nThe cour",
u"Responsible": u"\nMartin C",
u"Location": u"Campus Lyn",
u"Scope and form": u"Lectures, ",
u"Point( ECTS )": u"10",
u"Language": u"English",
u"number": u"31548",
u"Content": u"\nThe cour",
u"Schedule": u"F4 (Tues 1"
}
I have stripped the value content to [:10] to reduce clutter, but some of the values are up to 300 characters long. It might not be portrayed well here, but some of the values have a lot of newline characters in them, and I've tried a lot of different solutions to remove them, such as str.strip and str.replace, but without success because my values are Unicode. (By values I mean key, value in my_dict.items().)
How do I remove all the newlines appearing in my dictionary? (With the values in focus, as some of the newlines are trailing, some are leading, and others are in the middle of the content, e.g. \nI have a\ngood\n idea\n.)
EDIT
I am using Python v. 2.7.11 and the following piece of code doesn't produce what I need. I want all the newlines to be changed to a single whitespace character.
for key, value in test.items():
    value = str(value[:10]).replace("\n", " ")
    print key, value
If you're trying to remove all \n characters, or any junk characters apart from numbers and letters, then use a regex:
import re

for key in my_dict.keys():
    my_dict[key] = my_dict[key].replace('\n', '')
    my_dict[key] = re.sub('[^A-Za-z0-9 ]+', '', my_dict[key])
print my_dict
If you wish to keep anything apart from those, add it to the character class inside the regex.
To remove '\n', try this:
for key, value in my_dict.items():
    my_dict[key] = ''.join(value.split('\n'))
You need to put the updated value back into your dictionary (similar to a "by value vs. by reference" situation ;) )...
To remove the "\n", this one-liner may be more "pythonic":
new_test = {k: v.replace("\n", "") for k, v in test.iteritems()}
To do what you try to do in your loop, try something like:
new_test = {k: str(v[:10]).replace("\n", " ") for k, v in test.iteritems()}
In your code, value takes the new value, but you never write it back...
So, for example, this would work (but more slowly; you would also be changing the values inside the loop, which should not cause problems, but the interpreter might not like it...):
for key, value in test.items():
    value = str(value[:10]).replace("\n", " ")
    # now put it back into the dictionary...
    test[key] = value
    print key, value
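Since the asker's stated goal is to turn every newline into a single space (leading, trailing, or internal), here is a sketch in Python 3, where all strings are Unicode and the str-vs-unicode problem disappears; the dictionary values are shortened stand-ins like those in the question:

```python
import re

my_dict = {
    "Date": "\nof exami",
    "Department": "31\n",
    "Content": "\nThe cour\ngood\n idea\n",
}

# Collapse each newline, together with any spaces hugging it, into one
# space, then trim whatever leading/trailing whitespace is left over.
cleaned = {k: re.sub(r"\s*\n\s*", " ", v).strip() for k, v in my_dict.items()}
print(cleaned)
```

The strip() call handles the leading/trailing cases, while the re.sub handles the newlines in the middle of the content.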
I want to import a lot of information from a CSV file into Elasticsearch.
My issue is that I don't know how to use an equivalent of substring to select part of a CSV column.
In my case I have a date field (YYYYMMDD) and I want to turn it into (YYYY-MM-DD).
I use filter, mutate, gsub like:
filter {
  mutate {
    gsub => ["date", "[0123456789][0123456789][0123456789][0123456789][0123456789][0123456789][0123456789][0123456789]", "[0123456789][0123456789][0123456789][0123456789]-[0123456789][0123456789]-[0123456789][0123456789]"]
  }
}
But my result is wrong.
I can identify my string, but I don't know how to extract part of it.
My target is to have something like:
gsub => ["date", "[0123456789][0123456789][0123456789][0123456789][0123456789][0123456789][0123456789][0123456789]", "%{date}(0..3)-%{date}(4..5)-%{date}(6..7)"]
%{date}(0..3) : select the first 4 characters of the CSV column date
You can use the ruby plugin to do the conversion. As you say, you will have a date field, so we can use it directly in ruby:
filter {
  ruby {
    code => "
      date = Time.strptime(event['date'], '%Y%m%d')
      event['date_new'] = date.strftime('%Y-%m-%d')
    "
  }
}
The date_new field is in the format you want.
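The same strptime/strftime round-trip is easy to sanity-check outside Logstash; here is an equivalent sketch in Python:

```python
from datetime import datetime

# Parse YYYYMMDD, re-emit as YYYY-MM-DD (mirrors the ruby filter above).
def reformat_date(yyyymmdd):
    return datetime.strptime(yyyymmdd, "%Y%m%d").strftime("%Y-%m-%d")

print(reformat_date("20190426"))  # -> 2019-04-26
```

An added benefit of the strptime route over a plain regex is that an impossible date such as 20191340 raises an error instead of being silently reformatted.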
First, you can use a regexp range to match a sequence, so rather than [0123456789], you can do [0-9]. If you know there will be 4 numbers, you can do [0-9]{4}.
Second, you want to "capture" parts of your input string and reorder them in the output. For that, you need capture groups:
([0-9]{4})([0-9]{2})([0-9]{2})
where parens define the groups. Then you can reference those on the right side of your gsub:
\1-\2-\3
\1 is the first capture group, etc.
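To see the capture groups and backreferences in action outside of Logstash, a quick sketch in Python (gsub's \1, \2, \3 backreferences map directly to re.sub's replacement syntax):

```python
import re

# Year, month, day captured as three groups from a YYYYMMDD string.
pattern = r"([0-9]{4})([0-9]{2})([0-9]{2})"

print(re.sub(pattern, r"\1-\2-\3", "20190426"))  # -> 2019-04-26
```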
You might also consider getting these three fields when you do the grok{}, and then putting them together again later (perhaps with add_field).
The text file is comma separated. However, one of the columns, e.g. "Issue" with the value "Other (phone, health club, etc)", also contains commas.
Question: What should the data type of "Issue" be? And how should I format the table (ROW FORMAT DELIMITED ... TERMINATED BY) so that the commas inside the Issue column are handled correctly?
I had set it this way:
create table consumercomplaints (ComplaintID int,
Product string,
Subproduct string,
Issue string,
Subissue string,
State string,
ZIPcode int,
Submittedvia string,
Datereceived string,
Datesenttocompany string,
Company string,
Companyresponse string,
Timelyresponse string,
Consumerdisputed string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
location '/user/hive/warehouse/mydb/consumer_complaints.csv';
Sample data --
Complaint ID,Product,Sub-product,Issue,Sub-issue,State,ZIP code,Submitted via,Date received,Date sent to company,Company,Company response,Timely response?,Consumer disputed?
943291,Debt collection,,Cont'd attempts collect debt not owed,Debt is not mine,MO,63123,Web,07/18/2014,07/18/2014,"Enhanced Recovery Company, LLC",Closed with non-monetary relief,Yes,
943698,Bank account or service,Checking account,Deposits and withdrawals,,CA,93030,Web,07/18/2014,07/18/2014,U.S. Bancorp,In progress,Yes,
943521,Debt collection,,Cont'd attempts collect debt not owed,Debt is not mine,OH,44116,Web,07/18/2014,07/18/2014,"Vital Solutions, Inc.",Closed with explanation,Yes,
943400,Debt collection,"Other (phone, health club, etc.)",Communication tactics,Frequent or repeated calls,MD,21133,Web,07/18/2014,07/18/2014,"The CBE Group, Inc.",Closed with explanation,Yes,
I think you need to format your output data with some control character, like Ctrl-A. I don't think any data type will handle this by itself. Alternatively, you can write a UDF to load the data and take care of the formatting in the UDF logic.
Short of writing a SerDe, you could do two things:
escape the commas in the original data before loading, using some character, e.g. \
and then use the Hive CREATE TABLE command with ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ESCAPED BY '\'
You can use a regex that takes care of commas enclosed within double quotes.
First, apply a regex to the data as shown in the Hortonworks/Apache manuals:
regexp_extract(col_value, '^(?:([^,]*)\,?){1}', 1) player_id
Source: https://web.archive.org/web/20171125014202/https://hortonworks.com/tutorial/how-to-process-data-with-apache-hive/
Ensure that you are able to load and see your data using this expression (barring the enclosed commas).
Then modify the expression to account for enclosed commas. You can do something like this:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "a,\"hi, I am here\",c,d,\"ahoy, mateys\"";
String pattern = "^(?:([^\",]*|\"[^\"]*\"),?){4}";
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(s);
if (m.find()) {
    System.out.println("YES-" + m.groupCount());
    System.out.println("=>" + m.group(1));
}
By changing {4} to {1}, {2}, ..., you can get the respective fields.
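As a side note, outside Hive most standard CSV parsers already treat a quote-enclosed field as a single value. A Python sketch on the sample row from the answer above (not a Hive solution, just a quick way to check the quoting rules):

```python
import csv
import io

row = 'a,"hi, I am here",c,d,"ahoy, mateys"'

# csv.reader honors the double quotes, so the embedded commas survive
# as part of their fields and the surrounding quotes are stripped.
fields = next(csv.reader(io.StringIO(row)))
print(fields)  # -> ['a', 'hi, I am here', 'c', 'd', 'ahoy, mateys']
```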