decode_json and return first key in hash - json

JSON string input: https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&symbol=MSFT&apikey=demo
I am trying to return just the first key (current day) in the hash but have been unable to do so. My code looks like the following
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;
use Data::Dumper;
use JSON;
my $html = get("https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&symbol=AMD&apikey=CMDPTEHVYH7W5VSZ");
my $decoded = decode_json($html);
my ($open) = $decoded->{'Time Series (Daily)'}->[0]->{'1. open'};
I keep getting "Not an ARRAY reference" which I researched and got more confused.
I can access what I want directly with the below code but I want to access just the first result or the current day:
my ($open) = $decoded->{'Time Series (Daily)'}{'2017-12-20'}{'1. open'};
Also if I do something like this:
my ($open) = $decoded->{'Time Series (Daily)'};
print Dumper($open);
The output is as follows:
$VAR1 = {
'2017-09-07' => {
'1. open' => '12.8400',
'5. volume' => '35467788',
'2. high' => '12.9400',
'4. close' => '12.6300',
'3. low' => '12.6000'
},
'2017-11-15' => {
'3. low' => '10.7700',
'4. close' => '11.0700',
'2. high' => '11.1300',
'5. volume' => '33326871',
'1. open' => '11.0100'
},
'2017-11-30' => {
'1. open' => '10.8700',
'2. high' => '11.0300',
'5. volume' => '43101899',
'3. low' => '10.7600',
'4. close' => '10.8900'
},
Thank you in advance for any help you can provide a noob.

Problem 1: { denotes the start of a JSON object, which gets decoded into a hash. Trying to derefence an array is going to fail.
Problem 2: Like Perl hashes, JSON objects are unordered, so talking about the
"first key" makes no sense. Perhaps you want the most recent date?
use List::Util qw( maxstr );
my $time_series_daily = $decoded->{'Time Series (Daily)'};
my $latest_date = maxstr #$time_series_daily;
my $open = $time_series_daily->{$latest_date}{'1. open'};

You are picking among hashref keys, not array (sequential container) elements. Since hashes are inherently unordered you can't index into that list but need to sort keys as needed.
With the exact format you show this works
my $top = (sort { $b cmp $a } keys %{ $decoded->{'Time Series (Daily)'} } )[0];
say $decoded->{'Time Series (Daily)'}{$top}{'1. open'};
It gets the list of keys, inverse-sorts them (alphabetically), and takes the first element of that list.
If your date-time format may vary then you'll need to parse it for sorting.
If you will really ever only want the most-recent one this is inefficient since it sorts the whole list. Then use a more specific tool to extract only the "largest" element, like
use List::Util qw(reduce);
my $top = reduce { $a gt $b ? $a : $b }
keys %{ $decoded->{'Time Series (Daily)'} };
But then in your case this can be done simply by maxstr from the same List::Util module, as shown in ikegami's answer. On the other hand, if the datetime format doesn't lend itself to a direct lexicographical comparison used by strmax then the reduce allows use of custom comparisons.

Related

perl: Finding mean and variance of large numbers without overflow

I am using a subroutine (stats) to calculate statistics for a list of numbers.
These numbers may be big enough to lose precision if stored as normal perl numbers.
I recieve such numbers as JSON formatted strings.
To decode these strings without losing precision,
I use a JSON::PP object with allow_nonref and allow_bignum activated.
I send the list of such decoded numbers to stats subroutine
(see in code shown below).
This routine calculates some statistics.
These statistics are then encoded to JSON and saved to file.
Most of the time the process seems to work correctly, but
for some inputs (see code for examples) the calculated value of mean and variance statistics
are either clearly wrong, or are encoded as JSON strings by the encoder, or both.
I suspect this is due to interaction of Math::BigInt and Math::BigFloat objects created by JSON decode, and List::Util::sum0.
I am trying to figure out what causes this and a way to avoid/fix this,
preferably without resorting to big non core modules.
I am willing to accept imprecise calculation of mean and variance,
but not entirely inaccurate results
or numerical results encoded as string in JSON.
A script (stats.pl) to demonstrate the problem:
use strict;
use warnings;
use Data::Dumper;
$Data::Dumper::Varname = "DUMPED_RAWDATA";
use JSON::PP;
use List::Util;
my $JSON = JSON::PP->new->allow_bignum->utf8->pretty->canonical;
sub stats {
#TODO fix bug about negative variance. AVOID OVERFLOW
#TODO use GMP, XS?
# #_ has decoded numbers (called RAWDATA here)
my $n = scalar #_;
my $sum = List::Util::sum0(#_);
my $mean = $sum / $n;
my $var = List::Util::sum0( map { $_**2 } #_ ) / $n - $mean**2;
my $s = {
n => $n,
sum => $sum,
max => List::Util::max(#_),
min => List::Util::min(#_),
mean => $mean,
variance => $var
};
# DUMP STATE IF SOME ERROR OCCURS
print Dumper( \#_ ),
$JSON->encode( { json_encoded_stats => $s, json_encoded_rawdata => \#_ } )
if ( '"' eq substr( $JSON->encode($var), 0, 1 ) #MEAN ENCODED AS STRING
or '"' eq substr( $JSON->encode($mean), 0, 1 ) #VARIANCE ENCODED AS STRING
or $var < 0 ); #VARIANCE IS NEGATIVE!
$s;
}
my #test = (
[
qw( 919300112739897344 919305709216464896 919305709216464896 985592115567603712 959299136196456448)
],
[qw(479655558 429035600 3281034608 3281034608 2606592908 3490045576)],
[ qw(914426431563644928) x 3142 ]
);
for (#test) {
print "---\n";
stats( map { $JSON->decode($_) } #$_ );
}
Below is the curtailed output of perl stats.pl with problems indicated as <---.
---
$DUMPED_RAWDATA1 = [
'919300112739897344',
'919305709216464896',
'919305709216464896',
'985592115567603712',
'959299136196456448'
];
{
"json_encoded_rawdata" : [
919300112739897344,
919305709216464896,
919305709216464896,
985592115567603712,
959299136196456448
],
"json_encoded_stats" : {
"max" : 985592115567603712,
"mean" : "9.40560556587377e+17", <--- ENCODED AS STRING
"min" : 919300112739897344,
"n" : 5,
"sum" : 4702802782936887296,
"variance" : 7.46903843214008e+32
}
}
---
$DUMPED_RAWDATA1 = [
479655558,
429035600,
3281034608,
3281034608,
2606592908,
3490045576
];
{
"json_encoded_rawdata" : [
479655558,
429035600,
3281034608,
3281034608,
2606592908,
3490045576
],
"json_encoded_stats" : {
"max" : 3490045576,
"mean" : 2261233143,
"min" : 429035600,
"n" : 6,
"sum" : 13567398858,
"variance" : "-1.36775568782523e+18" <--- NEGATIVE VARIANCE, STRING ENCODED
}
}
---
$DUMPED_RAWDATA1 = [
'914426431563644928',
.
.
.
<snip 3140 identical lines>
'914426431563644928'
];
{
"json_encoded_rawdata" : [
914426431563644928,
.
.
.
<snip 3140 identical lines>
914426431563644928
],
"json_encoded_stats" : {
"max" : 914426431563644928,
"mean" : "9.14426431563676e+17", <--- STRING ENCODED
"min" : 914426431563644928,
"n" : 3142,
"sum" : 2.87312784797307e+21,
"variance" : -9.75463826617761e+22 <--- NEGATIVE VARIANCE
}
}
None of your inputs are big enough to require JSON::PP to create Math::BigInt objects on a system with 64-bit ints, so it doesn't.
You could do something like the following at the start of your sub.
#_ = map { Math::BigInt->new($_) } #_; # Or ::BigFloat?
Alternatively,
my $zero_B = Math::BigInt->new(0);
sub stats {
my $n = #_;
my $sum_B = sum($zero_B, #_);
my $mean_B = $sum_B / $n;
my $var_B = sum( map { Math::BigInt->new($_) ** 2 } #_ ) / $n - $mean_B ** 2;
my ($min, $max) = minmax(#_);
return {
n => $n,
sum => $sum_B,
max => $max,
min => $min,
mean => $mean_B,
variance => $var_B,
};
}
All together:
use strict;
use warnings;
use Data::Dumper qw( Dumper );
use JSON::PP qw( );
use List::MoreUtils qw( minmax );
use List::Util qw( sum );
use Math::BigInt qw( );
my $zero_B = Math::BigInt->new(0);
my $JSON = JSON::PP->new->allow_bignum->utf8->pretty->canonical;
sub stats {
my $n = #_;
my $sum_B = sum($zero_B, #_);
my $mean_B = $sum_B / $n;
my $var_B = sum( map { Math::BigInt->new($_) ** 2 } #_ ) / $n - $mean_B ** 2;
my ($min, $max) = minmax(#_);
return {
n => $n,
sum => $sum_B,
max => $max,
min => $min,
mean => $mean_B,
variance => $var_B,
};
}
my #test = (
[qw( 919300112739897344 919305709216464896 919305709216464896 985592115567603712 959299136196456448 )],
[qw( 479655558 429035600 3281034608 3281034608 2606592908 3490045576 )],
[ qw( 914426431563644928 ) x 3142 ]
);
for (#test) {
print "---\n";
my $s = stats( map { $JSON->decode($_) } #$_ );
if (
$JSON->encode($s->{variance}) =~ /"/ # MEAN ENCODED AS STRING
|| $JSON->encode($s->{mean}) =~ /"/ # VARIANCE ENCODED AS STRING
|| $s->{variance} < 0 # VARIANCE IS NEGATIVE!
) {
local $Data::Dumper::Varname = "DUMPED_RAWDATA";
print Dumper($_);
print $JSON->encode({
json_encoded_rawdata => $_,
json_encoded_stats => $s,
});
} else {
print "ok\n";
}
}
Notes:
Both approaches will work even if the objects are already Math::* objects.
I identified the vars are guaranteed to contain a Math:Big* object using _B for clarity.
I moved the testing code to the test harness.
I used minmax because it's more efficient than calling min and max separately.
I imported the subs from the modules to avoid having to use use their full name.
No need to force something in scalar context into scalar context.
#ikegami's answer works correctly
but it is too slow for me as this subroutine
is called a lot of times in my program's inner loop.
I think that is the cost of ensuring that
all numbers are converted to arbitrary precision ones.
I ended up using the following implementation
which avoids converting all numbers to arbitrary
precision type.
sub stats {
my $n = scalar #_;
my $sum = List::Util::sum0(#_);
my $mean = $sum / $n;
my $var = List::Util::sum0( map { ( $_ - $mean )**2 } #_ ) / $n;
$mean += 0;
$var += 0; # TO ENSURE THAT THEY ARE ENCODED AS NUMBERS IN JSON
{
n => $n,
sum => $sum,
max => List::Util::max(#_),
min => List::Util::min(#_),
mean => $mean,
variance => $var,
};
}
I changed the method of calculating variance
to ensure that negative results are avoided
(as suggested by #Robert).
It may sacrifice precision in $sum
(and everything that depends on $sum)
due to floating point addition of large integers.
It completes the job in an acceptable execution time though.
The unintended JSON encoding of numbers as strings
is explained in https://metacpan.org/pod/JSON::PP#simple-scalars.
This problem is solved by using the method
suggested there to force encoding as numbers.
JSON::PP will encode undefined scalars as JSON null values, scalars
that have last been used in a string context before encoding as JSON
strings, and anything else as number value
You can force the type to be a JSON number by numifying it:
my $x = "3"; # some variable containing a string
$x += 0; # numify it, ensuring it will be dumped as a number
$x *= 1; # same thing, the choice is yours. in to force

Inserting several "new" items into the database with DBIC

I'm working in a bioinformatics project that requires me to read genomic data (nothing too fancy, just think of it as strings) from various organisms and insert it into a database. Each read belongs to one organism, and can contain from 5000 to 50000 thousand genes, which I need to process and analyze prior to storage.
The script currently doing this is written in perl and, after all calculations, stores the results in a hash likie this:
$new{$id}{gene_name} = $id;
$new{$id}{gene_database_source} = $gene_database_source
$new{$id}{product} = $product;
$new{$id}{sequence} = $sequence;
$new{$id}{seqlength} = $seqlength;
$new{$id}{digest} = $digest;
$new{$id}{mw} = $mw;
$new{$id}{iep} = $iep;
$new{$id}{tms} = $tms;
After all genes are read and, the insertions are made looping through the hash into an eval{} statement.
eval {
foreach my $id (keys %new) {
my $rs = $schema->resultset('Genes')->create(
{
gene_name => $new{$id}{gene_name},
gene_product => $new{$id}{product},
sequence => $new{$id}{sequence},
gene_protein_length => $new{$id}{seqlength},
digest => $new{$id}{digest},
gene_isoelectric_point => $new{$id}{iep},
gene_molecular_weight => $new{$id}{mw},
gene_tmd_count => $new{$id}{tms},
gene_species => $species,
species_code => $spc,
user_id => $tdruserid,
gene_database_source => $new{$id}{gene_database_source}
}
);
};
While this "works", it has at least two problems I'd like to solve:
The eval statement is intended to "failsafe" the insertions: if one of the insertions fail, the eval dies and no insertion is done. This is clearly not how eval works. I'm pretty sure all insertions made
until failure point will be done and there's no rollback whatsoever.
The script needs to loop twice through very large datasets (one while reading and creating the hashes, and once again when reading
the hashes and performing the insertions). This makes the process' performance rather poor.
Instead of creating the hashes, I'd been thinking of using the new directive of DBIX $schema->new({..stuff..}); and then doing a massive insert transaction. That would solve the double iteration and the eval would either work (or not) with a single transaction, which would do the expected behaviour of < either all insertions or none > ... Is there a way to do this?
You can create your massive transaction by using a TxnScopeGuard in DBIC. In the most basic form, that would be as follows.
eval { # or try from Try::Tiny
my $guard = $schema->txn_scope_guard;
foreach my $id ( keys %new ) {
my $rs = $schema->resultset('Genes')->create(
{
gene_name => $new{$id}{gene_name},
gene_product => $new{$id}{product},
sequence => $new{$id}{sequence},
gene_protein_length => $new{$id}{seqlength},
digest => $new{$id}{digest},
gene_isoelectric_point => $new{$id}{iep},
gene_molecular_weight => $new{$id}{mw},
gene_tmd_count => $new{$id}{tms},
gene_species => $species,
species_code => $spc,
user_id => $tdruserid,
gene_database_source => $new{$id}{gene_database_source}
}
);
}
$guard->commit;
}
You create a scope guard object, and when you're done setting up your transaction, you commit it. If the object goes out of scope, i.e. because something died, it will rollback the transaction automatically.
The eval can catch the die, and your program will not crash. You had that part correct, but you're also right that your code will not undo previous inserts. Note that Try::Tiny's try provides nicer syntax. But it's not needed here.
Transaction in this case means that all queries are collected and run at the same time.
Note that this will still insert one row per INSERT statement only!
If you want to instead create larger INSERT statements, like the following, you need populate, not new.
INSERT INTO foo (bar, baz) VALUES
(1, 1),
(2, 2),
(3, 3),
...
The populate method lets you pass in an array reference with multiple rows at one time. This is supposed to be way faster than inserting one at a time.
$schema->resultset("Artist")->populate([
[ qw( artistid name ) ],
[ 100, 'A Formally Unknown Singer' ],
[ 101, 'A singer that jumped the shark two albums ago' ],
[ 102, 'An actually cool singer' ],
]);
Translated to your loop, that would be as follows. Note that the documentation claims that it's faster if you run it in void context.
eval {
$schema->resultset('Genes')->populate(
[
[
qw(
gene_name gene_product sequence
gene_protein_length digest gene_isoelectric_point
gene_molecular_weight gene_tmd_count gene_species
species_code user_id gene_database_source
)
],
map {
[
$new{$_}{gene_name}, $new{$_}{product},
$new{$_}{sequence}, $new{$_}{seqlength},
$new{$_}{digest}, $new{$_}{iep},
$new{$_}{mw}, $new{$_}{tms},
$species, $spc,
$tdruserid, $new{$_}{gene_database_source},
]
} keys %new
],
);
}
Like this the scope guard is not needed. However, I would advise you to not do more than 1000 rows per statement though. Processing it in chunks might be a good idea for performance reasons. In that case, you'd loop over the keys 1000 at a time. List::MoreUtils has a nice natatime function for that.
use List::MoreUtils 'natatime';
eval {
my $guard = $schema->txn_scope_guard;
my $it = natatime 1_000, keys %new;
while ( my #keys = $it->() ) {
$schema->resultset('Genes')->populate(
[
[
qw(
gene_name gene_product sequence
gene_protein_length digest gene_isoelectric_point
gene_molecular_weight gene_tmd_count gene_species
species_code user_id gene_database_source
)
],
map {
[
$new{$_}{gene_name}, $new{$_}{product},
$new{$_}{sequence}, $new{$_}{seqlength},
$new{$_}{digest}, $new{$_}{iep},
$new{$_}{mw}, $new{$_}{tms},
$species, $spc,
$tdruserid, $new{$_}{gene_database_source},
]
} #keys
],
);
}
$guard->commit;
}
Now it will do 1000 rows per insertion, and run all those queries in one big transaction. If one of them fails, none will be done.
The script needs to loop twice through very large datasets (one while reading and creating the hashes, and once again when reading the hashes and performing the insertions). This makes the process' performance rather poor.
You're not showing how you create the data, besides this assignment.
$new{$id}{gene_name} = $id;
$new{$id}{gene_database_source} = $gene_database_source
$new{$id}{product} = $product;
If that's all there is to it, nothing is stopping you from using the approach I've shown above directly where you're processing the data the first time and building the hash. The following code is incomplete, because you're not telling us where the data is coming from, but you should get the gist.
eval {
my $guard = $schema->txn_scope_guard;
# we use this to collect rows to process
my #rows;
# this is where your data comes in
while ( my $foo = <DATA> ) {
# here you process the data and come up with your variables
my ( $id, $gene_database_source, $product, $sequence, $seqlength,
$digest, $mw, $iep, $tms );
# collect the row so we can insert it later
push(
#rows,
[
$id, $gene_database_source, $product, $sequence, $seqlength,
$digest, $mw, $iep, $tms,
]
);
# only insert if we reached the limit
if ( scalar #rows == 1000 ) {
$schema->resultset('Genes')->populate(
[
[
qw(
gene_name gene_product sequence
gene_protein_length digest gene_isoelectric_point
gene_molecular_weight gene_tmd_count gene_species
species_code user_id gene_database_source
)
],
\#rows,
],
);
# empty the list of values
#rows = ();
}
}
$guard->commit;
}
Essentially we collect up to 1000 rows directly as array references while we process them, and when we've reached the limit, we pass them to the database. We then reset our row array and start over. Again, all of this is wrapped in a transaction, so it will only be committed if all the inserts are fine.
There is more information on transactions in DBIC in the cookbook.
Please note that I have not tested any of this code.

Parse any number of subkeys in a perl hash

I have a perl hash that is obtained from parsing JSON. The JSON could be anything a user defined API could generated. The goal is to obtain a date/time string and determine if that date/time is out of bounds according to a user defined threshold. The only issue I have is that perl seems a bit cumbersome when dealing with hash key/subkey iteration. How can I look through all the keys and determine if a key or subkey exists throughout the hash? I have read many threads throughout stackoverflow, but nothing that exactly meets my needs. I only started perl last week so I may be missing something... Let me know if that's the case.
Below is the "relevant" code/subs. For all code see: https://gitlab.com/Jedimaster0/check_http_freshness
use warnings;
use strict;
use LWP::UserAgent;
use Getopt::Std;
use JSON::Parse 'parse_json';
use JSON::Parse 'assert_valid_json';
use DateTime;
use DateTime::Format::Strptime;
# Verify the content-type of the response is JSON
eval {
assert_valid_json ($response->content);
};
if ( $# ){
print "[ERROR] Response isn't valid JSON. Please verify source data. \n$#";
exit EXIT_UNKNOWN;
} else {
# Convert the JSON data into a perl hashrefs
$jsonDecoded = parse_json($response->content);
if ($verbose){print "[SUCCESS] JSON FOUND -> ", $response->content , "\n";}
if (defined $jsonDecoded->{$opts{K}}){
if ($verbose){print "[SUCCESS] JSON KEY FOUND -> ", $opts{K}, ": ", $jsonDecoded->{$opts{K}}, "\n";}
NAGIOS_STATUS(DATETIME_DIFFERENCE(DATETIME_LOOKUP($opts{F}, $jsonDecoded->{$opts{K}})));
} else {
print "[ERROR] Retreived JSON does not contain any data for the specified key: $opts{K}\n";
exit EXIT_UNKNOWN;
}
}
sub DATETIME_LOOKUP {
my $dateFormat = $_[0];
my $dateFromJSON = $_[1];
my $strp = DateTime::Format::Strptime->new(
pattern => $dateFormat,
time_zone => $opts{z},
on_error => sub { print "[ERROR] INVALID TIME FORMAT: $dateFormat OR TIME ZONE: $opts{z} \n$_[1] \n" ; HELP_MESSAGE(); exit EXIT_UNKNOWN; },
);
my $dt = $strp->parse_datetime($dateFromJSON);
if (defined $dt){
if ($verbose){print "[SUCCESS] Time formatted using -> $dateFormat\n", "[SUCCESS] JSON date converted -> $dt $opts{z}\n";}
return $dt;
} else {
print "[ERROR] DATE VARIABLE IS NOT DEFINED. Pattern or timezone incorrect."; exit EXIT_UNKNOWN
}
}
# Subtract JSON date/time from now and return delta
sub DATETIME_DIFFERENCE {
my $dateInitial = $_[0];
my $deltaDate;
# Convert to UTC for standardization of computations and it's just easier to read when everything matches.
$dateInitial->set_time_zone('UTC');
$deltaDate = $dateNowUTC->delta_ms($dateInitial);
if ($verbose){print "[SUCCESS] (NOW) $dateNowUTC UTC - (JSON DATE) $dateInitial ", $dateInitial->time_zone->short_name_for_datetime($dateInitial), " = ", $deltaDate->in_units($opts{u}), " $opts{u} \n";}
return $deltaDate->in_units($opts{u});
}
Sample Data
{
"localDate":"Wednesday 23rd November 2016 11:03:37 PM",
"utcDate":"Wednesday 23rd November 2016 11:03:37 PM",
"format":"l jS F Y h:i:s A",
"returnType":"json",
"timestamp":1479942217,
"timezone":"UTC",
"daylightSavingTime":false,
"url":"http:\/\/www.convert-unix-time.com?t=1479942217",
"subkey":{
"altTimestamp":1479942217,
"altSubkey":{
"thirdTimestamp":1479942217
}
}
}
[SOLVED]
I have used the answer that #HåkonHægland provided. Here are the below code changes. Using the flatten module, I can use any input string that matches the JSON keys. I still have some work to do, but you can see the issue is resolved. Thanks #HåkonHægland.
use warnings;
use strict;
use Data::Dumper;
use LWP::UserAgent;
use Getopt::Std;
use JSON::Parse 'parse_json';
use JSON::Parse 'assert_valid_json';
use Hash::Flatten qw(:all);
use DateTime;
use DateTime::Format::Strptime;
# Verify the content-type of the response is JSON
eval {
assert_valid_json ($response->content);
};
if ( $# ){
print "[ERROR] Response isn't valid JSON. Please verify source data. \n$#";
exit EXIT_UNKNOWN;
} else {
# Convert the JSON data into a perl hashrefs
my $jsonDecoded = parse_json($response->content);
my $flatHash = flatten($jsonDecoded);
if ($verbose){print "[SUCCESS] JSON FOUND -> ", Dumper($flatHash), "\n";}
if (defined $flatHash->{$opts{K}}){
if ($verbose){print "[SUCCESS] JSON KEY FOUND -> ", $opts{K}, ": ", $flatHash>{$opts{K}}, "\n";}
NAGIOS_STATUS(DATETIME_DIFFERENCE(DATETIME_LOOKUP($opts{F}, $flatHash->{$opts{K}})));
} else {
print "[ERROR] Retreived JSON does not contain any data for the specified key: $opts{K}\n";
exit EXIT_UNKNOWN;
}
}
Example:
./check_http_freshness.pl -U http://bastion.mimir-tech.org/json.html -K result.creation_date -v
[SUCCESS] JSON FOUND -> $VAR1 = {
'timestamp' => '20161122T200649',
'result.data_version' => 'data_20161122T200649_data_news_topics',
'result.source_version' => 'kg_release_20160509_r33',
'result.seed_version' => 'seed_20161016',
'success' => 1,
'result.creation_date' => '20161122T200649',
'result.data_id' => 'data_news_topics',
'result.data_tgz_name' => 'data_news_topics_20161122T200649.tgz',
'result.source_data_version' => 'seed_vtv: data_20161016T102932_seed_vtv',
'result.data_digest' => '6b5bf1c2202d6f3983d62c275f689d51'
};
Odd number of elements in anonymous hash at ./check_http_freshness.pl line 78, <DATA> line 1.
[SUCCESS] JSON KEY FOUND -> result.creation_date:
[SUCCESS] Time formatted using -> %Y%m%dT%H%M%S
[SUCCESS] JSON date converted -> 2016-11-22T20:06:49 UTC
[SUCCESS] (NOW) 2016-11-26T19:02:15 UTC - (JSON DATE) 2016-11-22T20:06:49 UTC = 94 hours
[CRITICAL] Delta hours (94) is >= (24) hours. Data is stale.
You could try use Hash::Flatten. For example:
use Hash::Flatten qw(flatten);
my $json_decoded = parse_json($json_str);
my $flat = flatten( $json_decoded );
say "found" if grep /(?:^|\.)\Q$key\E(?:\.?|$)/, keys %$flat;
You can use Data::Visitor::Callback to traverse the data structure. It lets you define callbacks for different kinds of data types inside your structure. Since we're only looking at a hash it's relatively simple.
The following program has a predefined list of keys to find (those would be user input in your case). I converted your example JSON to a Perl hashref and included it in the code because the conversion is not relevant. The program visits every hashref in this data structure (including the top level) and runs the callback.
Callbacks in Perl are code references. These can be created in two ways. We're doing the anonymous subroutine (sometimes called lambda function in other languages). The callback gets passed two arguments: the visitor object and the current data substructure.
We'll iterate all the keys we want to find and simply check if they exist in that current data structure. If we see one, we count it's existence in the %seen hash. Using a hash to store things we have seen is a common idiom in Perl.
We're using a postfix if here, which is convenient and easy to read. %seen is a hash, so we access the value behind the $key with $seen{$key}, while $data is a hash reference, so we use the dereferencing operator -> to access the value behind $key with $data->{$key}.
The callback needs us to return the $data again so it continues. The last line is just there, it's not important.
I've used Data::Printer to output the %seen hash because it's convenient. You can also use Data::Dumper if you want. In production, you will not need that.
use strict;
use warnings;
use Data::Printer;
use Data::Visitor::Callback;
my $from_json = {
"localDate" => "Wednesday 23rd November 2016 11:03:37 PM",
"utcDate" => "Wednesday 23rd November 2016 11:03:37 PM",
"format" => "l jS F Y h:i:s A",
"returnType" => "json",
"timestamp" => 1479942217,
"timezone" => "UTC",
"daylightSavingTime" =>
0, # this was false, I used 0 because that's a non-true value
"url" => "http:\/\/www.convert-unix-time.com?t=1479942217",
"subkey" => {
"altTimestamp" => 1479942217,
"altSubkey" => {
"thirdTimestamp" => 1479942217
}
}
};
my #keys_to_find = qw(timestamp altTimestamp thirdTimestamp missingTimestamp);
my %seen;
my $visitor = Data::Visitor::Callback->new(
hash => sub {
my ( $visitor, $data ) = #_;
foreach my $key (#keys_to_find) {
$seen{$key}++ if exists $data->{$key};
}
return $data;
},
);
$visitor->visit($from_json);
p %seen;
The program outputs the following. Note this is not a Perl data structure. Data::Printer is not a serializer, it's a tool to make data human readable in a convenient way.
{
altTimestamp 1,
thirdTimestamp 1,
timestamp 1
}
Since you also wanted to constraint the input, here's an example how to do that. The following program is a modification of the one above. It allows to give a set of different constraints for every required key.
I've done that by using a dispatch table. Essentially, that's a hash that contains code references. Kind of like the callbacks we use for the Visitor.
The constraints I've included are doing some things with dates. An easy way to work with dates in Perl is the core module Time::Piece. There are lots of questions around here about various date things where Time::Piece is the answer.
I've only done one constraint per key, but you could easily include several checks in those code refs, or make a list of code refs and put them in an array ref (keys => [ sub(), sub(), sub() ]) and then iterate that later.
In the visitor callback we are now also keeping track of the keys that have %passed the constraints check. We're calling the coderef with $coderef->($arg). If a constraint check returns a true value, it gets noted in the hash.
use strict;
use warnings;
use Data::Printer;
use Data::Visitor::Callback;
use Time::Piece;
use Time::Seconds; # for ONE_DAY
my $from_json = { ... }; # same as above
# prepare one of the constraints
# where I'm from, Christmas eve is considered Christmas
my $christmas = Time::Piece->strptime('24 Dec 2016', '%d %b %Y');
# set up the constraints per required key
my %constraints = (
timestamp => sub {
my ($epoch) = #_;
# not older than one day
return $epoch < time && $epoch > time - ONE_DAY;
},
altTimestamp => sub {
my ($epoch) = #_;
# epoch value should be an even number
return ! $epoch % 2;
},
thirdTimestamp => sub {
my ($epoch) = #_;
# before Christmas 2016
return $epoch < $christmas;
},
);
my %seen;
my %passed;
my $visitor = Data::Visitor::Callback->new(
hash => sub {
my ( $visitor, $data ) = #_;
foreach my $key (%constraints) {
if ( exists $data->{$key} ) {
$seen{$key}++;
$passed{$key}++ if $constraints{$key}->( $data->{$key} );
}
}
return $data;
},
);
$visitor->visit($from_json);
p %passed;
The output this time is:
{
thirdTimestamp 1,
timestamp 1
}
If you want to learn more about the dispatch tables, take a look at chapter two of the book Higher Order Perl by Mark Jason Dominus which is legally available for free here.

Replace variables extracted from database with theirs values

I'd like to translate a perl web site in several languages. I search for and tried many ideas, but I think the best one is to save all translations inside the mySQL database. But I get a problem...
When a sentence extracted from the database contains a variable (scalar), it prints on the web page as a scalar:
You have $number new messages.
Is there a proper way to reassign $number its actual value ?
Thank you for your help !
You can use printf format strings in your database and pass in values to that.
printf allows you to specify the position of the argument so only have know what position with the list of parameters "$number" is.
For example
#!/usr/bin/perl
use strict;
my $Details = {
'Name' => 'Mr Satch',
'Age' => '31',
'LocationEn' => 'England',
'LocationFr' => 'Angleterre',
'NewMessages' => 20,
'OldMessages' => 120,
};
my $English = q(
Hello %1$s, I see you are %2$s years old and from %3$s
How are you today?
You have %5$i new messages and %6$i old messages
Have a nice day
);
my $French = q{
Bonjour %1$s, je vous vois d'%4$s et âgés de %2$s ans.
Comment allez-vous aujourd'hui?
Vous avez %5$i nouveaux messages et %6$i anciens messages.
Bonne journée.
};
printf($English, #$Details{qw/Name Age LocationEn LocationFr NewMessages OldMessages/});
printf($French, #$Details{qw/Name Age LocationEn LocationFr NewMessages OldMessages/});
This would be a nightmare to maintain so an alternative might be to include an argument list in the database:
#!/usr/bin/perl
use strict;
sub fetch_row {
return {
'Format' => 'You have %i new messages and %i old messages' . "\n",
'Arguments' => 'NewMessages OldMessages',
}
}
sub PrintMessage {
my ($info,$row) = #_;
printf($row->{'Format'},#$info{split(/ +/,$row->{'Arguments'})});
}
my $Details = {
'Name' => 'Mr Satch',
'Age' => '31',
'LocationEn' => 'England',
'LocationFr' => 'Angleterre',
'NewMessages' => 20,
'OldMessages' => 120,
};
my $row = fetch_row();
PrintMessage($Details,$row)

Logstash indexing JSON arrays

Logstash is awesome. I can send it JSON like this (multi-lined for readability):
{
"a": "one"
"b": {
"alpha":"awesome"
}
}
And then query for that line in kibana using the search term b.alpha:awesome. Nice.
However I now have a JSON log line like this:
{
"different":[
{
"this": "one",
"that": "uno"
},
{
"this": "two"
}
]
}
And I'd like to be able to find this line with a search like different.this:two (or different.this:one, or different.that:uno)
If I was using Lucene directly I'd iterate through the different array, and generate a new search index for each hash within it, but Logstash currently seems to ingest that line like this:
different: {this: one, that: uno}, {this: two}
Which isn't going to help me searching for log lines using different.this or different.that.
Any got any thoughts as to a codec, filter or code change I can make to enable this?
You can write your own filter (copy & paste, rename the class name, the config_name and rewrite the filter(event) method) or modify the current JSON filter (source on Github)
You can find the JSON filter (Ruby class) source code in the following path logstash-1.x.x\lib\logstash\filters named as json.rb. The JSON filter parse the content as JSON as follows
begin
# TODO(sissel): Note, this will not successfully handle json lists
# like your text is '[ 1,2,3 ]' JSON.parse gives you an array (correctly)
# which won't merge into a hash. If someone needs this, we can fix it
# later.
dest.merge!(JSON.parse(source))
# If no target, we target the root of the event object. This can allow
# you to overwrite #timestamp. If so, let's parse it as a timestamp!
if !#target && event[TIMESTAMP].is_a?(String)
# This is a hack to help folks who are mucking with #timestamp during
# their json filter. You aren't supposed to do anything with
# "#timestamp" outside of the date filter, but nobody listens... ;)
event[TIMESTAMP] = Time.parse(event[TIMESTAMP]).utc
end
filter_matched(event)
rescue => e
event.tag("_jsonparsefailure")
#logger.warn("Trouble parsing json", :source => #source,
:raw => event[#source], :exception => e)
return
end
You can modify the parsing procedure to modify the original JSON
json = JSON.parse(source)
if json.is_a?(Hash)
json.each do |key, value|
if value.is_a?(Array)
value.each_with_index do |object, index|
#modify as you need
object["index"]=index
end
end
end
end
#save modified json
......
dest.merge!(json)
then you can modify your config file to use the/your new/modified JSON filter and place in \logstash-1.x.x\lib\logstash\config
This is mine elastic_with_json.conf with a modified json.rb filter
input{
stdin{
}
}filter{
json{
source => "message"
}
}output{
elasticsearch{
host=>localhost
}stdout{
}
}
if you want to use your new filter you can configure it with the config_name
class LogStash::Filters::Json_index < LogStash::Filters::Base
config_name "json_index"
milestone 2
....
end
and configure it
input{
stdin{
}
}filter{
json_index{
source => "message"
}
}output{
elasticsearch{
host=>localhost
}stdout{
}
}
Hope this helps.
For a quick and dirty hack, I used the Ruby filter and below code , no need to use the out of box 'json' filter anymore
input {
stdin{}
}
filter {
grok {
match => ["message","(?<json_raw>.*)"]
}
ruby {
init => "
def parse_json obj, pname=nil, event
obj = JSON.parse(obj) unless obj.is_a? Hash
obj = obj.to_hash unless obj.is_a? Hash
obj.each {|k,v|
p = pname.nil?? k : pname
if v.is_a? Array
v.each_with_index {|oo,ii|
parse_json_array(oo,ii,p,event)
}
elsif v.is_a? Hash
parse_json(v,p,event)
else
p = pname.nil?? k : [pname,k].join('.')
event[p] = v
end
}
end
def parse_json_array obj, i,pname, event
obj = JSON.parse(obj) unless obj.is_a? Hash
pname_ = pname
if obj.is_a? Hash
obj.each {|k,v|
p=[pname_,i,k].join('.')
if v.is_a? Array
v.each_with_index {|oo,ii|
parse_json_array(oo,ii,p,event)
}
elsif v.is_a? Hash
parse_json(v,p, event)
else
event[p] = v
end
}
else
n = [pname_, i].join('.')
event[n] = obj
end
end
"
code => "parse_json(event['json_raw'].to_s,nil,event) if event['json_raw'].to_s.include? ':'"
}
}
output {
stdout{codec => rubydebug}
}
Test json structure
{"id":123, "members":[{"i":1, "arr":[{"ii":11},{"ii":22}]},{"i":2}], "im_json":{"id":234, "members":[{"i":3},{"i":4}]}}
and this is whats output
{
"message" => "{\"id\":123, \"members\":[{\"i\":1, \"arr\":[{\"ii\":11},{\"ii\":22}]},{\"i\":2}], \"im_json\":{\"id\":234, \"members\":[{\"i\":3},{\"i\":4}]}}",
"#version" => "1",
"#timestamp" => "2014-07-25T00:06:00.814Z",
"host" => "Leis-MacBook-Pro.local",
"json_raw" => "{\"id\":123, \"members\":[{\"i\":1, \"arr\":[{\"ii\":11},{\"ii\":22}]},{\"i\":2}], \"im_json\":{\"id\":234, \"members\":[{\"i\":3},{\"i\":4}]}}",
"id" => 123,
"members.0.i" => 1,
"members.0.arr.0.ii" => 11,
"members.0.arr.1.ii" => 22,
"members.1.i" => 2,
"im_json" => 234,
"im_json.0.i" => 3,
"im_json.1.i" => 4
}
The solution I liked is the ruby filter because that requires us to not write another filter. However, that solution creates fields that are on the "root" of JSON and it's hard to keep track of how the original document looked.
I came up with something similar that's easier to follow and is a recursive solution so it's cleaner.
ruby {
init => "
def arrays_to_hash(h)
h.each do |k,v|
# If v is nil, an array is being iterated and the value is k.
# If v is not nil, a hash is being iterated and the value is v.
value = v || k
if value.is_a?(Array)
# "value" is replaced with "value_hash" later.
value_hash = {}
value.each_with_index do |v, i|
value_hash[i.to_s] = v
end
h[k] = value_hash
end
if value.is_a?(Hash) || value.is_a?(Array)
arrays_to_hash(value)
end
end
end
"
code => "arrays_to_hash(event.to_hash)"
}
It converts arrays to has with each key as the index number. More details:- http://blog.abhijeetr.com/2016/11/logstashelasticsearch-best-way-to.html