Process a file containing a collection of JSON strings - json

(Title edited. The original title: "json content: print out parts of it as is, pipe parts of it to get human readable timestamps, output from the same command")
I have JSON-like content in a file:
{
"newState": "runnable",
"setAt": 1587421159359
}
{
"newState": "running",
"setAt": 1587421282891
}
{
"newState": "debug_hold",
"setAt": 1587422014895
}
{
"newState": "terminating",
"setAt": 1587424788577
}
{
"newState": "failed",
"setAt": 1587424796544
}
I can extract the 'newState' by cat timestamps.json | jq -r '.newState':
runnable
running
debug_hold
terminating
failed
I can extract the epoch timestamps and format them into human-readable form with cat timestamps.json | jq -r '.setAt' | rev | cut -c 4- | rev | perl -pe 's/(\d+)/localtime($1)/e':
Mon Apr 20 18:19:19 2020
Mon Apr 20 18:21:22 2020
Mon Apr 20 18:33:34 2020
Mon Apr 20 19:19:48 2020
Mon Apr 20 19:19:56 2020
How can I combine the two outputs so the result becomes
runnable Mon Apr 20 18:19:19 2020
running Mon Apr 20 18:21:22 2020
debug_hold Mon Apr 20 18:33:34 2020
terminating Mon Apr 20 19:19:48 2020
failed Mon Apr 20 19:19:56 2020
I think I can do some bash for loop and array input but was wondering if jq has something that can pipe a portion of the content (e.g. epoch time in this case) out, process it, then feed the value back into the jq parse output.

You may be looking for something like this:
cat timestamps.json | jq -r '[.newState, .setAt] | join(" ")'
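That prints the state next to the raw epoch value. If you also want the timestamp made readable in the same pass, here is a minimal Python sketch for comparison (assuming the concatenated-object input shown in the question; json.JSONDecoder.raw_decode consumes one document at a time):

```python
import json
from datetime import datetime, timezone

def iter_json_objects(text):
    """Yield successive documents from a string of concatenated JSON objects."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        while pos < len(text) and text[pos].isspace():
            pos += 1                      # skip whitespace between documents
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj

sample = '{ "newState": "runnable", "setAt": 1587421159359 }\n{ "newState": "running", "setAt": 1587421282891 }'
for rec in iter_json_objects(sample):
    # setAt is in milliseconds; UTC is used here so the output is deterministic
    when = datetime.fromtimestamp(rec["setAt"] / 1000, tz=timezone.utc)
    print(rec["newState"], when.strftime("%a %b %d %H:%M:%S %Y"))
```

This prints the state followed by the UTC time (e.g. runnable Mon Apr 20 22:19:19 2020); dropping tz=timezone.utc gives your local zone instead, matching the question's output.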

With input being a collection of (unrelated) valid JSON strings, you can read it in {}-delimited chunks.
Set the input record separator ($/) to }, so that each read with the <> operator returns text up to and including the next }.
use warnings;
use strict;
use feature 'say';
use JSON qw(decode_json);
my $file = shift // die "Usage: $0 file\n"; #/
open my $fh, '<', $file or die "Can't open $file: $!";
local $/ = '}'; # presumably all this code is in some local scope
while (my $record = <$fh>) {
next if not $record =~ /\S/;
my $json = decode_json($record);
say $json->{newState}, ' ', scalar localtime $json->{setAt}/1000;
}
Comments
This relies on the shown format of the input, in particular that it has no nested objects. If there are nested {...} then slurp the whole file and extract JSON strings using Text::Balanced or equivalent (or, of course, use another approach)
I'd actually recommend to use Cpanel::JSON::XS
When global variables like $/ need to be changed, that is best done in the smallest scope needed, and with local. Here it doesn't matter, but I presume this to be part of a larger program
There may be empty strings and, in particular, newlines left over when reading this way thus the check of whether the record contains any non-whitespace
The timestamps in your input are off by a factor of a thousand from seconds-since-epoch, presumably because they carry milliseconds as well. I just divide by 1000 for simplicity
Note that the shown desired timestamps may become a problem if daylight saving time gets involved, and if that is the case you want to extract and include the time zone as well
The simplest (and flexible) way to get timezone from the epoch is by using POSIX::strftime. It takes the list from localtime and returns a string generated according to the given format.
The %z specifier produces the timezone as the UTC offset, while %Z produces the (notorious and unportable) short name. See your system's strftime manpage for details. Example
use POSIX qw(strftime);
say strftime "%z %Z", localtime; #--> -0700 PDT
(thanks to ikegami's answer, which prodded me to add the timezone discussion)

Using the incremental parsing feature of JSON parsers, one can safely parse sequences of JSON documents such as the one you have with very little code. This means there's no point in hacking together a JSON parser using regex matches.
use Cpanel::JSON::XS qw( );
my $decoder = Cpanel::JSON::XS->new();
while (<>) {
$decoder->incr_parse($_);
while ( my $rec = $decoder->incr_parse() ) {
say sprintf "%-11s %s",
$rec->{newState},
format_ts($rec->{setAt});
}
}
Complete program:
#!/usr/bin/perl
use strict;
use warnings;
use feature qw( say );
use utf8;
use open ':std', ':encoding(UTF-8)';
use Cpanel::JSON::XS qw( );
use POSIX qw( strftime );
sub format_ts {
my ($ts) = @_;
my $ms = $ts % 1000;
my $epoch = ( $ts - $ms ) / 1000;
my @lt = localtime($epoch);
return sprintf("%s.%03d %s",
strftime("%a %b %d %H:%M:%S", @lt),
$ms,
strftime("%Y %z", @lt),
);
}
my $decoder = Cpanel::JSON::XS->new();
while (<>) {
$decoder->incr_parse($_);
while ( my $rec = $decoder->incr_parse() ) {
say sprintf "%-11s %s",
$rec->{newState},
format_ts($rec->{setAt});
}
}
Output:
runnable Mon Apr 20 18:19:19.359 2020 -0400
running Mon Apr 20 18:21:22.891 2020 -0400
debug_hold Mon Apr 20 18:33:34.895 2020 -0400
terminating Mon Apr 20 19:19:48.577 2020 -0400
failed Mon Apr 20 19:19:56.544 2020 -0400
Note that I added time zone information because the timestamps would be ambiguous without it (because of overlaps when switching from daylight-saving time to standard time). I also showed how you can keep the milliseconds if you so desire.
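The millisecond handling in format_ts is plain division with remainder; the same split, sketched in Python (UTC is used here so the output is unambiguous):

```python
from datetime import datetime, timezone

ts_ms = 1587421159359            # epoch milliseconds, as in the input
epoch, ms = divmod(ts_ms, 1000)  # -> 1587421159 whole seconds, 359 ms left over
stamp = datetime.fromtimestamp(epoch, tz=timezone.utc)
print(stamp.strftime(f"%a %b %d %H:%M:%S.{ms:03d} %Y %z"))
# -> Mon Apr 20 22:19:19.359 2020 +0000
```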

A small Perl script can process such data with ease.
USAGE: script_name.pl timestamps.json
#!/usr/bin/perl
use strict;
use warnings;
my($state,$time);
while(<>) {
chomp;
$state = $1 if /"newState": "(.*)"/;
$time = $1 if /"setAt": (\d+)/;
printf "%-12s %s\n", $state, "".localtime($time/1000) if /}/;
}
Alternative version (note that a hash does not preserve input order, and records sharing the same newState would overwrite one another)
use strict;
use warnings;
my $data = do { local $/; <> };
my %state = $data =~ /"newState": "(.*?)".*?"setAt": (\d+)/sg;
while(my($s,$t) = each %state) {
printf "%-12s %s\n", $s, "".localtime($t/1000);
}
Input file timestamps.json
{
"newState": "runnable",
"setAt": 1587421159359
}
{
"newState": "running",
"setAt": 1587421282891
}
{
"newState": "debug_hold",
"setAt": 1587422014895
}
{
"newState": "terminating",
"setAt": 1587424788577
}
{
"newState": "failed",
"setAt": 1587424796544
}
Output
runnable Mon Apr 20 15:19:19 2020
running Mon Apr 20 15:21:22 2020
debug_hold Mon Apr 20 15:33:34 2020
terminating Mon Apr 20 16:19:48 2020
failed Mon Apr 20 16:19:56 2020

Related

Convert Multiline String (utmpdump results) into JSON [closed]

All,
This is my first time submitting a stack overflow question, so thanks in advance for taking the time to read/consider my question. I'm currently using the 'utmpdump' utility to dump linux authentication log results each hour from a bash script, which is done using the syntax shown below:
dateLastHour=$(date +"%a %b %d %H:" -d '1 hour ago')
dateNow=$(date +"%a %b %d %H:")
utmpdump /var/log/wtmp* | awk "/$dateLastHour/,/$dateNow/"
What I'm now trying to accomplish, and the subject of this question, is how I can take these results, split them by newline for each authentication log entry, and then convert each authentication event into its own JSON file to be exported to an external syslog collector for additional analysis and long-term storage.
As an example, here's some of the test results I've been using:
[7] [08579] [ts/0] [egecko] [pts/0 ] [10.0.2.6 ] [1.1.1.1 ] [Fri Nov 04 23:40:29 2022 EDT]
[8] [08579] [ ] [ ] [pts/0 ] [ ] [0.0.0.0 ] [Fri Nov 04 23:55:16 2022 EDT]
[2] [00000] [~~ ] [reboot ] [~ ] [3.10.0-1160.80.1.el7.x86_64] [0.0.0.0 ] [Sat Dec 03 12:28:05 2022 EST]
[5] [00811] [tty1] [ ] [tty1 ] [ ] [0.0.0.0 ] [Sat Dec 03 12:28:12 2022 EST]
[6] [00811] [tty1] [LOGIN ] [tty1 ] [ ] [0.0.0.0 ] [Sat Dec 03 12:28:12 2022 EST]
[1] [00051] [~~ ] [runlevel] [~ ] [3.10.0-1160.80.1.el7.x86_64] [0.0.0.0 ] [Sat Dec 03 12:28:58 2022 EST]
[7] [02118] [ts/0] [egecko] [pts/0 ] [1.1.1.1 ] [1.1.1.1 ] [Sat Dec 03 12:51:22 2022 EST]
Any assistance or pointers here is greatly appreciated!
I've been using the following sed commands to trim out unnecessary whitespace, and I know that what I probably should do is use IFS to split the results string into new lines before using brackets as the delimiter:
utmpResults=$(echo "$utmpResults" | sed 's/ */ /g')
IFS="\n" read -a array <<< "$utmpResults"
echo $array
But when I echo $array it only returns the first line...?
With the help of jq (a sed for JSON), it's an easy task:
#!/bin/bash
jq -R -c '
select(length > 0) | # remove empty lines
[match("\\[(.*?)\\]"; "g").captures[].string # find content within square brackets
| sub("^\\s+";"") | sub("\\s+$";"")] # trim content
| { # convert to json object
"type" : .[0],
"pid" : .[1],
"terminal_name_suffix" : .[2],
"user" : .[3],
"tty" : .[4],
"remote_hostname" : .[5],
"remote_host" : .[6],
"datetime" : .[7],
"timestamp" : (.[7] | strptime("%a %b %d %T %Y %Z") | mktime)
}' input.txt
Output
{"type":"7","pid":"08579","terminal_name_suffix":"ts/0","user":"egecko","tty":"pts/0","remote_hostname":"10.0.2.6","remote_host":"1.1.1.1","datetime":"Fri Nov 04 23:40:29 2022 EDT","timestamp":1667605229}
{"type":"8","pid":"08579","terminal_name_suffix":"","user":"","tty":"pts/0","remote_hostname":"","remote_host":"0.0.0.0","datetime":"Fri Nov 04 23:55:16 2022 EDT","timestamp":1667606116}
{"type":"2","pid":"00000","terminal_name_suffix":"~~","user":"reboot","tty":"~","remote_hostname":"3.10.0-1160.80.1.el7.x86_64","remote_host":"0.0.0.0","datetime":"Sat Dec 03 12:28:05 2022 EST","timestamp":1670070485}
{"type":"5","pid":"00811","terminal_name_suffix":"tty1","user":"","tty":"tty1","remote_hostname":"","remote_host":"0.0.0.0","datetime":"Sat Dec 03 12:28:12 2022 EST","timestamp":1670070492}
{"type":"6","pid":"00811","terminal_name_suffix":"tty1","user":"LOGIN","tty":"tty1","remote_hostname":"","remote_host":"0.0.0.0","datetime":"Sat Dec 03 12:28:12 2022 EST","timestamp":1670070492}
{"type":"1","pid":"00051","terminal_name_suffix":"~~","user":"runlevel","tty":"~","remote_hostname":"3.10.0-1160.80.1.el7.x86_64","remote_host":"0.0.0.0","datetime":"Sat Dec 03 12:28:58 2022 EST","timestamp":1670070538}
{"type":"7","pid":"02118","terminal_name_suffix":"ts/0","user":"egecko","tty":"pts/0","remote_hostname":"1.1.1.1","remote_host":"1.1.1.1","datetime":"Sat Dec 03 12:51:22 2022 EST","timestamp":1670071882}
Without the option -c you can create formatted output.
To save each line in a file, you can do it like this in bash.
I have chosen the timestamp as the file name.
INPUT_AS_JSON_LINES=$(
jq -R -c '
select(length > 0) | # remove empty lines
[match("\\[(.*?)\\]"; "g").captures[].string # find content within square brackets
| sub("^\\s+";"") | sub("\\s+$";"")] # trim content
| { # convert to json object
"type" : .[0],
"pid" : .[1],
"terminal_name_suffix" : .[2],
"user" : .[3],
"tty" : .[4],
"remote_hostname" : .[5],
"remote_host" : .[6],
"datetime" : .[7],
"timestamp" : (.[7] | strptime("%a %b %d %T %Y %Z") | mktime)
}' input.txt
)
while read line
do
FILENAME="$(jq '.timestamp' <<< "$line").json"
CONTENT=$(jq <<< "$line") # format json
echo "writing file '$FILENAME'"
echo "$CONTENT" > "$FILENAME"
done <<< "$INPUT_AS_JSON_LINES"
Output
writing file '1667605229.json'
writing file '1667606116.json'
writing file '1670070485.json'
writing file '1670070492.json'
writing file '1670070492.json'
writing file '1670070538.json'
writing file '1670071882.json'
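For comparison, the bracket matching and trimming that the jq filter performs can be sketched in a few lines of Python (field names taken from the jq object above; the timestamp is left as a string):

```python
import re

FIELDS = ["type", "pid", "terminal_name_suffix", "user",
          "tty", "remote_hostname", "remote_host", "datetime"]

def parse_utmpdump_line(line):
    """Pull each [bracketed] field out of one utmpdump record, trimmed."""
    values = [v.strip() for v in re.findall(r"\[(.*?)\]", line)]
    return dict(zip(FIELDS, values))

line = ("[7] [08579] [ts/0] [egecko] [pts/0       ] "
        "[10.0.2.6            ] [1.1.1.1        ] [Fri Nov 04 23:40:29 2022 EDT]")
rec = parse_utmpdump_line(line)
print(rec["user"], rec["datetime"])  # -> egecko Fri Nov 04 23:40:29 2022 EDT
```

From there, json.dumps(rec) gives the same per-record object the jq version emits.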

How do I parse this supposedly JSON format

I have a huge file with data in the below format. (It's the response from an API call I made to one of Twitter's APIs). I want to extract the value of the field "followers_count" from it. Ordinarily, I would do this with jq with the following command : cat | jq -r '.followers_count'
But this contains special characters so jq cannot handle it. Can someone help by telling me how do I convert it in JSON (e.g. using a shell script) or alternatively how to get the followers_count field without conversion? If this format has a specific name, I would be interested to know about it.
Thanks.
SAMPLE LINE IN FILE:
b'[{"id":2361407554,"id_str":"2361407554","name":"hakimo ait","screen_name":"hakimo_ait","location":"","description":"","url":null,"entities":{"description":{"urls":[]}},"protected":false,"followers_count":0,"friends_count":6,"listed_count":0,"created_at":"Sun Feb 23 19:08:04 +0000 2014","favourites_count":0,"utc_offset":null,"time_zone":null,"geo_enabled":false,"verified":false,"statuses_count":1,"lang":"fr","status":{"created_at":"Sun Feb 23 19:09:21 +0000 2014","id":437665498961293312,"id_str":"437665498961293312","text":"c.ronaldo","truncated":false,"entities":{"hashtags":[],"symbols":[],"user_mentions":[],"urls":[]},"source":"\u003ca href=\"https:\/\/mobile.twitter.com\" rel=\"nofollow\"\u003eMobile Web (M2)\u003c\/a\u003e","in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"geo":null,"coordinates":null,"place":null,"contributors":null,"is_quote_status":false,"retweet_count":0,"favorite_count":0,"favorited":false,"retweeted":false,"lang":"es"},"contributors_enabled":false,"is_translator":false,"is_translation_enabled":false,"profile_background_color":"C0DEED","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_tile":false,"profile_image_url":"http:\/\/abs.twimg.com\/sticky\/default_profile_images\/default_profile_normal.png","profile_image_url_https":"https:\/\/abs.twimg.com\/sticky\/default_profile_images\/default_profile_normal.png","profile_link_color":"1DA1F2","profile_sidebar_border_color":"C0DEED","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"has_extended_profile":false,"default_profile":true,"default_profile_image":true,"following":false,"follow_request_sent":false,"notifications":false,"translator_type":"none"}]'
This is not valid JSON. If you want to grab a certain part of this response, you can dump the result to a file and then iterate over it to pull out the text you want.
Otherwise, if the response were valid JSON, it could easily be parsed with jq; you could also dump the record to a file, convert it to JSON, and then parse it.
There are multiple ways to do the raw extraction: grep, awk, sed... take your pick.
Remove the b' from the beginning and the ' from the end, and it becomes valid JSON!
Well, I have removed the b' from the beginning and the ' from the end, and look, it is valid JSON. Now we can easily use jq with it, like this
(I am doing it with my own file):
jq -r '.accounts|keys[]' ../saadaccounts.json | while read key ;
do
DATA="$(jq ".accounts [$key]" ../saadaccounts.json )"
FNAME=$( echo $DATA | jq -r '.first_name' )
LNAME=$( echo $DATA | jq -r '.Last_name' )
done
*** YOUR JSON FILE ***
[
{
"id":2361393867,
"id_str":"2361393867",
"name":"graam a7bab",
"screen_name":"bedoo691",
"location":"",
"description":"\u0627\u0633\u062a\u063a\u0641\u0631\u0627\u0644\u0644\u0647 \u0648\u0627\u062a\u0648\u0628 \u0627\u0644\u064a\u0647\u0647 ..!*",
"url":null,
"entities":{
"description":{
"urls":[
]
}
},
"protected":false,
"followers_count":1,
"friends_count":6,
"listed_count":0,
"created_at":"Sun Feb 23 19:03:21 +0000 2014",
"favourites_count":1,
"utc_offset":null,
"time_zone":null,
"geo_enabled":false,
"verified":false,
"statuses_count":7,
"lang":"ar",
"status":{
"created_at":"Tue Mar 04 16:07:44 +0000 2014",
"id":440881284383256576,
"id_str":"440881284383256576",
"text":"#Naif8989",
"truncated":false,
"entities":{
"hashtags":[
],
"symbols":[
],
"user_mentions":[
{
"screen_name":"Naif8989",
"name":"\u200f naif alharbi",
"id":540343286,
"id_str":"540343286",
"indices":[
0,
9
]
}
],
"urls":[
]
},
"source":"\u003ca href=\"http:\/\/twitter.com\/download\/android\" rel=\"nofollow\"\u003eTwitter for Android\u003c\/a\u003e",
"in_reply_to_status_id":437675858485321728,
"in_reply_to_status_id_str":"437675858485321728",
"in_reply_to_user_id":2361393867,
"in_reply_to_user_id_str":"2361393867",
"in_reply_to_screen_name":"bedoo691",
"geo":null,
"coordinates":null,
"place":null,
"contributors":null,
"is_quote_status":false,
"retweet_count":0,
"favorite_count":0,
"favorited":false,
"retweeted":false,
"lang":"und"
},
"contributors_enabled":false,
"is_translator":false,
"is_translation_enabled":false,
"profile_background_color":"C0DEED",
"profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png",
"profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png",
"profile_background_tile":false,
"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/437664693373911040\/ydODsIeh_normal.jpeg",
"profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/437664693373911040\/ydODsIeh_normal.jpeg",
"profile_link_color":"1DA1F2",
"profile_sidebar_border_color":"C0DEED",
"profile_sidebar_fill_color":"DDEEF6",
"profile_text_color":"333333",
"profile_use_background_image":true,
"has_extended_profile":false,
"default_profile":true,
"default_profile_image":false,
"following":false,
"follow_request_sent":false,
"notifications":false,
"translator_type":"none"
}
]
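Incidentally, the b'...' wrapper is what Python prints for a bytes value, so the file was very likely written from Python. Rather than stripping the characters by hand, you can have Python undo the wrapping; a sketch with a shortened, hypothetical sample line:

```python
import ast
import json

# A shortened stand-in for one line of the file: a Python bytes literal
# (the repr of a bytes response body) wrapping the JSON text.
raw = """b'[{"followers_count":0,"screen_name":"hakimo_ait"}]'"""

payload = ast.literal_eval(raw)    # undoes the b'...' repr, giving bytes
data = json.loads(payload)         # json.loads accepts bytes directly
print(data[0]["followers_count"])  # -> 0
```

ast.literal_eval is safe for this (it evaluates only literals), unlike eval.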

Perl converting time based on text match

I am using Perl to read in variables from a json file and handle them accordingly. The spot I need help with is when I read a time in from the file that could look like the following:
"StartTime":"2015-07-08T03:38:08Z",
"EndTime":"2015-07-10T03:38:08Z"
This is easy to handle, however here is the tricky part:
"StartTime":"now-10",
"EndTime":"now+10"
I have a function which gets these variables from the json file and checks if the string contains the word "now". But after that, I'm not sure what to do. I'm trying to convert "now" to localtime(time), but it's getting ugly fast. Here is my code:
my $_StartTime = getFromJson("StartTime");
my $_EndTime = getFromJson("EndTime");
if($_StartTime =~ /now/) {
(my $sec, my $min, my $hour, my $mday, my $mon, my $year, my $wday, my $yday, my $isdst) = localtime(time);
my $now = sprintf("%04d-%02d-%02dT%02d:%02d:%02dZ", $year+1900, $mon+1, $mday, $hour, $min, $sec);
}
# end time is handled the same way
Am I on the right track? And if so, how can I add the "+/-10" after the "now" in the file? (Note: assume the +/-10 always refers to hours)
There are lots of good modules on the CPAN that could help in this instance. You don't need to use them but it's worth knowing about them nonetheless.
Firstly, JSON might make your life easier when parsing the JSON files as it has easy methods for converting the JSON into native Perl structures.
Secondly, the DateTime family of modules might make it easier to parse and manipulate the dates. Specifically, instead of using sprintf, you could use DateTime::Format::ISO8601 to parse the date:
my $dt = DateTime::Format::ISO8601->parse_datetime( $_StartTime );
DateTime has methods for accessing the day, year, month and so on. These are documented on the main module page.
You could then keep your special case for the now input and do something like:
# work out if it's addition or subtraction and grab the amount
# then use the appropriate DateTime function:
my $dt = DateTime->now()->add( seconds => 10 );
# or
my $dt = DateTime->now()->subtract( seconds => 10 );
Using POSIX::strftime will make your life easier.
use POSIX 'strftime';
my @test_times = qw[now+10 now now-10];
foreach my $start_time (@test_times) {
if (my ($adjust) = $start_time =~ /^now([-+]\d+)?/) {
$adjust //= 0;
$adjust *= 60 * 60; # Convert hours to seconds
my $time = strftime '%Y-%m-%dT%H:%M:%SZ', gmtime(time + $adjust);
say $time;
}
}
Thinking about it further, I think I'd prefer to use Time::Piece. The principle is almost identical.
use Time::Piece;
my @test_times = qw[now+10 now now-10];
foreach my $start_time (@test_times) {
if (my ($adjust) = $start_time =~ /^now([-+]\d+)?/) {
$adjust //= 0;
$adjust *= 60 * 60; # Convert hours to seconds
my $time = gmtime(time + $adjust);
say $time->strftime('%Y-%m-%dT%H:%M:%SZ');
}
}
I would change this to:
my $_StartTime = getFromJson("StartTime");
my $_EndTime = getFromJson("EndTime");
if($_StartTime =~ s/now//) {
my $time = time;
if ($_StartTime =~ /^([-+]?)([0-9]+)/) {
my ($sign, $number) = ($1, $2);
$time += ($sign eq '-' ? -1 : 1) * $number * 3_600;
}
(my $sec, my $min, my $hour, my $mday, my $mon, my $year, my $wday, my $yday, my $isdst) = localtime($time);
$_StartTime = sprintf("%04d-%02d-%02dT%02d:%02d:%02dZ", $year+1900, $mon+1, $mday, $hour, $min, $sec);
}
You give little information about the format of the original data, and what result you want from this. I assume the code you show is to convert the times formatted with now to one that you recognize so that you can go on from there. But it's best to handle both formats in one place to generate the same final result regardless of the input
This program uses an imaginary JSON data structure and processes all elements inside it. The core is the use of the Time::Piece module, which will parse and format times for you and do date/time arithmetic
I have encapsulated the code that processes both sorts of time values in a subroutine convert_time which returns a Time::Piece object. The code just uses the module's own stringify method to make the value readable, but you can generate any form of string you want using the object's methods
use strict;
use warnings 'all';
use feature 'say';
use JSON 'from_json';
use Time::Piece;
use Time::Seconds 'ONE_HOUR';
my $json = <<END;
[
{
"StartTime": "2015-07-08T03:38:08Z",
"EndTime": "2015-07-10T03:38:08Z"
},
{
"StartTime": "now-10",
"EndTime": "now+10"
}
]
END
my $data = from_json($json);
for my $item ( @$data ) {
for my $key ( keys %$item ) {
my $time = $item->{$key};
say "$key $time";
my $ans = convert_time($time);
print $ans, "\n\n";
}
}
sub convert_time {
my ($time) = @_;
if ( $time =~ /now([+-]\d+)/ ) {
return localtime() + $1 * ONE_HOUR;
}
else {
return Time::Piece->strptime($time, '%Y-%m-%dT%H:%M:%SZ');
}
}
output
StartTime 2015-07-08T03:38:08Z
Wed Jul 8 03:38:08 2015
EndTime 2015-07-10T03:38:08Z
Fri Jul 10 03:38:08 2015
StartTime now-10
Wed Jan 6 05:57:04 2016
EndTime now+10
Thu Jan 7 01:57:04 2016

Converting JSON to .csv

I've found some data that someone is downloading into a JSON file (I think! - I'm a newb!). The file contains data on nearly 600 football players.
Here's the file: https://raw.githubusercontent.com/llimllib/fantasypl_stats/f944410c21f90e7c5897cd60ecca5dc72b5ab619/data/players.1426687570.json
Is there a way I can grab some of the data and convert it to .csv? Specifically the 'Fixture History'?
Thanks in advance for any help :)
Here is a solution using jq
If the file filter.jq contains
.[]
| {first_name, second_name, all:.fixture_history.all[]}
| [.first_name, .second_name, .all[]]
| @csv
and data.json contains the sample data then the command
jq -M -r -f filter.jq data.json
will produce the output (only the first few rows are shown here)
"Wojciech","Szczesny","16 Aug 17:30",1,"CRY(H) 2-1",90,0,0,0,1,0,0,0,0,0,1,0,13,7,0,55,2
"Wojciech","Szczesny","23 Aug 17:30",2,"EVE(A) 2-2",90,0,0,0,2,0,0,0,0,0,0,0,5,9,-9306,55,1
"Wojciech","Szczesny","31 Aug 16:00",3,"LEI(A) 1-1",90,0,0,0,1,0,0,0,1,0,2,0,7,15,-20971,55,1
"Wojciech","Szczesny","13 Sep 12:45",4,"MCI(H) 2-2",90,0,0,0,2,0,0,0,0,0,6,0,12,17,-39686,55,3
"Wojciech","Szczesny","20 Sep 15:00",5,"AVL(A) 3-0",90,0,0,1,0,0,0,0,0,0,2,0,14,22,-15931,55,6
"Wojciech","Szczesny","27 Sep 17:30",6,"TOT(H) 1-1",90,0,0,0,1,0,0,0,0,0,4,0,10,13,-5389,55,3
"Wojciech","Szczesny","05 Oct 14:05",7,"CHE(A) 0-2",90,0,0,0,2,0,0,0,0,0,1,0,3,9,-8654,55,1
"Wojciech","Szczesny","18 Oct 15:00",8,"HUL(H) 2-2",90,0,0,0,2,0,0,0,0,0,2,0,7,9,-824,54,1
"Wojciech","Szczesny","25 Oct 15:00",9,"SUN(A) 2-0",90,0,0,1,0,0,0,0,0,0,3,0,16,22,-11582,54,7
JSON is a more detailed data format than CSV - it allows for more complex data structures. Inevitably if you do this, you 'lose detail'.
If you want to fetch it automatically - that's doable, but I've skipped it because 'doing' https URLs is slightly more complicated.
So assuming you've downloaded your file, here's a possible solution in Perl (You've already got one for Python - both are very powerful scripting languages, but can pretty much cover the same ground - so it's as much a matter of taste as to which you use).
#!/usr/bin/perl
use strict;
use warnings;
use JSON;
my $file = 'players.json';
open( my $input, "<", $file ) or die $!;
my $json_data = decode_json(
do { local $/; <$input> }
);
foreach my $player_id ( keys %{$json_data} ) {
foreach my $fixture (
@{ $json_data->{$player_id}->{fixture_history}->{all} } )
{
print join( ",",
$player_id, $json_data->{$player_id}->{web_name},
@{$fixture}, "\n", );
}
}
Hopefully you can see what's going on here - you load the file $input, and decode_json to create a data structure.
This data structure is a nested hash (perl's term for the type of data structure). hashes are key-value pairs.
So we extract the keys from this hash - which is the ID number right at the beginning of each entry.
Then we loop through each of them, extracting the fixture_history array. And for each element in that array, we print the player ID, their web_name, and then the data from fixture_history.
This gives output like:
1,Szczesny,10 Feb 19:45,25,LEI(H) 2-1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,-2413,52,0,
1,Szczesny,21 Feb 15:00,26,CRY(A) 2-1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,-2805,52,0,
1,Szczesny,01 Mar 14:05,27,EVE(H) 2-0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,-1862,52,0,
1,Szczesny,04 Mar 19:45,28,QPR(A) 2-1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,-1248,52,0,
1,Szczesny,14 Mar 15:00,29,WHU(H) 3-0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,-1897,52,0,
Does this make sense?
Python has some good libraries for doing this. If you copy the following code into a file and save it as fix_hist.py or something, then save your JSON file as file.json in the same directory, it will create a csv file with the fixture histories each saved as a row. Just run python fix_hist.py in your command prompt (or terminal for mac):
import csv
import json
json_data = open("file.json")
data = json.load(json_data)
f = csv.writer(open("fix_hists.csv", "w", newline=""))  # "w", newline="" for Python 3; "wb+" was the Python 2 idiom
for i in data:
fh = data[i]["fixture_history"]
array = fh["all"]
for j in array:
f.writerow(j)
json_data.close()
To add additional data to the fixture history, you can add insert statements before writing the rows:
import csv
import json
json_data = open("file.json")
data = json.load(json_data)
f = csv.writer(open("fix_hists.csv", "w", newline=""))  # "w", newline="" for Python 3; "wb+" was the Python 2 idiom
arr = []
for i in data:
fh = data[i]["fixture_history"]
array = fh["all"]
for j in array:
try:
j.insert(0,str(data[i]["first_name"]))
except:
j.insert(0,'error')
try:
j.insert(1,data[i]["web_name"])
except:
j.insert(1,'error')
try:
f.writerow(j)
except:
f.writerow(['error','error'])
json_data.close()
With insert(), just indicate the position in the row you want the data point to occupy as the first argument.
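For instance, two insert() calls put the name fields at the front of a row (the values here are illustrative):

```python
row = ["10 Feb 19:45", 25]   # a fixture-history row (illustrative values)
row.insert(0, "Wojciech")    # first_name takes position 0
row.insert(1, "Szczesny")    # web_name takes position 1
print(row)  # -> ['Wojciech', 'Szczesny', '10 Feb 19:45', 25]
```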

Convert Unix Date "Wed Sep 15 14:21:36 2010" to unix timestamp in perl or MySQL to store in mySQL

I am trying to convert a regular Unix date in human readable format back into unix timestamp without making arrays for Months and Days of the week.
Obviously the "%" needs to be escaped in perl
I tried "%%" and "\%" and RTM, which is how I ended up with \%%
$myDate="Wed Sep 15 14:21:36 2010";
$datePattern="\%%a \%%b \%%e \%%H:\%%i:\%%s \%%Y";
MySQL has a function
UNIX_TIMESTAMP (STR_TO_DATE (\'$myDate\' \, \'$datePattern\'))";
My final statement looks like this:
replace mytable values ('some value', UNIX_TIMESTAMP (STR_TO_DATE (\'$myDate\' \, \'$datePattern\')))";
The command works directly from MySQL but not when I call it from Perl.
Another choice might be to use Time::Piece
#!/usr/bin/env perl
use strict;
use warnings;
use Time::Piece;
my $mydate = "Wed Sep 15 14:21:36 2010";
my $mytime = Time::Piece->strptime($mydate, "%a %b %e %H:%M:%S %Y");
print $mytime->strftime("%Y-%m-%d %H:%M:%S\n");
Try Date::Manip
use strict;
use warnings;
BEGIN {
$Date::Manip::Backend = 'DM5';
}
use Date::Manip;
my $myDate = "Wed Sep 15 14:21:36 2010";
print Date::Manip::UnixDate( $myDate, '%Y-%m-%d %H:%M:%S' );
or look through the DateTime modules
Here is my try at DateTime: one line of output with the MySQL DateTime format module and one without.
use strict;
use warnings;
use DateTime::Format::Strptime;
use DateTime::Format::MySQL;
my $myDate = "Wed Sep 15 14:21:36 2010";
my $Strp = new DateTime::Format::Strptime( pattern => '%a %b %e %H:%M:%S %Y', );
my $dt = $Strp->parse_datetime($myDate);
print $dt->strftime('%Y-%m-%d %H:%M:%S') . "\n";
print DateTime::Format::MySQL->format_datetime($dt) . "\n";
One option would be to use Date::Parse:
use strict;
use warnings;
use Date::Parse;
my $date = "Wed Sep 15 14:21:36 2010";
my $time = str2time($date);
From there, you can issue a query that updates the database records to use $time.
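For comparison, Python's standard library parses the same string with a pattern close to the Perl ones above (%d in place of %e); this sketch interprets the wall-clock time as UTC, since the original string carries no time zone:

```python
import calendar
from datetime import datetime

my_date = "Wed Sep 15 14:21:36 2010"
# The pattern mirrors the Perl strptime patterns used in the answers above
dt = datetime.strptime(my_date, "%a %b %d %H:%M:%S %Y")
epoch = calendar.timegm(dt.timetuple())  # treat the parsed time as UTC
print(epoch)  # -> 1284560496
```

If the string should instead be read in the server's local zone, time.mktime(dt.timetuple()) gives the local-time interpretation.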