Timestamp to Epoch in a CSV file with GAWK

Looking to convert human readable timestamps to epoch/Unix time within a CSV file using GAWK in preparation for loading into a MySQL DB.
Data Example:
{null};2013-11-26;Text & Device;Location;/file/path/to/;Tuesday, November 26 12:17 PM;1;1385845647
Looking to take column 6, Tuesday, November 26 12:17 PM, and convert to epoch time for storage. All times shown will be in EST format. I realize AWK is the tool for this, but can't quite seem to structure the command. Currently have:
cat FILE_IN.CSV | awk 'BEGIN {FS=OFS=";"}{$6=strftime("%s")} {print}'
However this returns:
{null};2013-11-26;Text & Device;Location;/file/path/to/;1385848848;1;1385845647
Presumably, this means I'm calling the current epoch time (1385848848 was the current epoch at time of execution) rather than asking strftime to convert the string; but I can't imagine another way of doing this.
What is the proper syntax for gawk/strftime to convert an existing timestamp to epoch?
Edit: This question seems loosely related to How do I use output from awk in another command?

$ cat file
{null};2013-11-26;Text & Device;Location;/file/path/to/;Tuesday, November 26 12:17 PM;1;1385845647
$ gawk 'BEGIN{FS=OFS=";"} {gsub(/-/," ",$2); $2=mktime($2" 0 0 0")}1' file
{null};1385445600;Text & Device;Location;/file/path/to/;Tuesday, November 26 12:17 PM;1;1385845647
Here's how to convert a date from any format to seconds since the epoch in general, using your current format as an example, with comments showing the conversion process step by step:
$ cat tst.awk
function cvttime(t, a) {
    split(t,a,/[,: ]+/)
    # 2013 Tuesday, November 26 10:17 PM
    # =>
    # a[1] = "2013"
    # a[2] = "Tuesday"
    # a[3] = "November"
    # a[4] = "26"
    # a[5] = "10"
    # a[6] = "17"
    # a[7] = "PM"
    if ( (a[7] == "PM") && (a[5] < 12) ) {
        a[5] += 12
    }
    # => a[5] = "22"
    a[3] = substr(a[3],1,3)
    # => a[3] = "Nov"
    match("JanFebMarAprMayJunJulAugSepOctNovDec",a[3])
    a[3] = (RSTART+2)/3
    # => a[3] = 11
    return( mktime(a[1]" "a[3]" "a[4]" "a[5]" "a[6]" 0") )
}
BEGIN {
    mdt = "Tuesday, November 26 10:17 PM"
    secs = cvttime(2013" "mdt)
    dt = strftime("%Y-%m-%d %H:%M:%S",secs)
    print mdt ORS "\t-> " secs ORS "\t\t-> " dt
}
$ awk -f tst.awk
Tuesday, November 26 10:17 PM
	-> 1385525820
		-> 2013-11-26 22:17:00
I'm sure you can modify that for the current problem.
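For example, here's an untested sketch of how the cvttime() idea might be wired into the original file, taking the year from the ISO date in column 2 (an assumption based on the sample row):

```shell
# untested sketch: apply cvttime() to column 6 of the asker's file,
# borrowing the year from the ISO date in column 2 (assumed to be YYYY-MM-DD)
gawk 'BEGIN { FS = OFS = ";" }
function cvttime(t,   a) {
    split(t, a, /[,: ]+/)
    if (a[7] == "PM" && a[5] < 12)
        a[5] += 12
    match("JanFebMarAprMayJunJulAugSepOctNovDec", substr(a[3], 1, 3))
    a[3] = (RSTART + 2) / 3                 # month name -> month number
    return mktime(a[1] " " a[3] " " a[4] " " a[5] " " a[6] " 0")
}
{ $6 = cvttime(substr($2, 1, 4) " " $6); print }' FILE_IN.CSV
```

On the sample row this replaces "Tuesday, November 26 12:17 PM" in column 6 with the corresponding epoch seconds in the local timezone.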
Also, if you don't have gawk you can write the cvttime() function as (borrowing @sputnik's date command string):
$ cat tst2.awk
function cvttime(t, cmd,secs) {
    cmd = "date -d \"" t "\" '+%s'"
    cmd | getline secs
    close(cmd)
    return secs
}
BEGIN {
    mdt = "Tuesday, November 26 10:17 PM"
    secs = cvttime(mdt)
    dt = strftime("%Y-%m-%d %H:%M:%S",secs)
    print mdt ORS "\t-> " secs ORS "\t\t-> " dt
}
$
$ awk -f tst2.awk
Tuesday, November 26 10:17 PM
	-> 1385525820
		-> 2013-11-26 22:17:00
I left strftime() in there just to show that the seconds value was correct - replace with date as you see fit.
For the non-gawk version, you just need to figure out how to get the year into the input month/date/time string in a way that date understands, if that matters to you; it shouldn't be hard.
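For instance, GNU date is fairly liberal about token order, so simply appending the year to the month/day/time string may be enough (untested sketch):

```shell
# sketch: GNU date accepts the year appended after the month/day/time string
secs=$(date -d "November 26 2013 10:17 PM" '+%s')
date -d "@$secs" '+%Y-%m-%d %H:%M'   # round-trips to 2013-11-26 22:17 local time
```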

You can convert date to epoch with this snippet :
$ date -d 'Tuesday, November 26 12:17 PM' +%s
1385464620
So finally :
awk -F";" '{system("date -d \""$6"\" '+%s'")}' file
Thanks @Keiron for the snippet.

Related

Subtract fixed number of days from date column using awk and add it to new column

Let's assume that we have a file with the values as seen bellow:
% head test.csv
20220601,A,B,1
20220530,A,B,1
And we want to add two new columns, one with the date minus 1 day and one with minus 7 days, resulting the following:
% head new_test.csv
20220601,A,B,20220525,20220531,1
20220530,A,B,20220523,20220529,1
The awk that was used to produce the above is:
% awk 'BEGIN{FS=OFS=","} { a="date -d \"$(date -d \""$1"\") -7 days\" +'%Y%m%d'"; a | getline st ; close(a) ;b="date -d \"$(date -d \""$1"\") -1 days\" +'%Y%m%d'"; b | getline cb ; close(b) ;print $1","$2","$3","st","cb","$4}' test.csv > new_test.csv
But after applying the above in a large file with more than 100K lines it runs for 20 minutes, is there any way to optimize the awk?
One GNU awk approach:
awk '
BEGIN { FS=OFS=","
secs_in_day = 60 * 60 * 24
}
{ dt = mktime( substr($1,1,4) " " substr($1,5,2) " " substr($1,7,2) " 12 0 0" )
dt1 = strftime("%Y%m%d",dt - secs_in_day )
dt7 = strftime("%Y%m%d",dt - (secs_in_day * 7) )
print $1,$2,$3,dt7,dt1,$4
}
' test.csv
This generates:
20220601,A,B,20220525,20220531,1
20220530,A,B,20220523,20220529,1
NOTES:
requires GNU awk for the mktime() and strftime() functions; see GNU awk time functions for more details
other flavors of awk may have similar functions, ymmv
You can try using function calls; it is faster than building and invoking the date command inline as in the original awk.
awk -F, '
function cmd1(date){
    a="date -d \"$(date -d \""date"\") -1days\" +'%Y%m%d'"
    a | getline st
    close(a)    # close the pipe before returning; code after return never runs
    return st
}
function cmd2(date){
    b="date -d \"$(date -d \""date"\") -7days\" +'%Y%m%d'"
    b | getline cm
    close(b)
    return cm
}
{
$5=cmd1($1)
$6=cmd2($1)
print $1","$2","$3","$5","$6","$4
}' OFS=, test > newFileTest
This executed against a file with 20,000 records in seconds, compared to around 5 minutes for the original awk.

Complex CSV parsing with Linux commands

I have a CSV log file that records the properties HA;HB;HC;HD;HE. The following file records 6 entries (separated by the above header).
I would like to extract the 3rd property(HC) of every entry.
HA;HB;HC;HD;HE
a1;b1;14;d;e
HA;HB;HC;HD;HE
a2;b2;28;d;e
HA;HB;HC;HD;HE
a31;b31;44;d;e
a32;b32;07;d;e
HA;HB;HC;HD;HE
a4;b4;0;d;e
HA;HB;HC;HD;HE
a51;b51;32;d;e
a52;b52;0;d;e
a53;b53;5;d;e
HA;HB;HC;HD;HE
a6;b6;10;d;e
Whenever there's n lines of HC recorded per entry, I want to extract the addition of the n entries.
The expected output for the above file:
14
28
51
0
37
10
I know I can write a program for this, but is there an easy way to get this with a combination on awk and/or sed commands?
I haven't tested this; try it and let me know if it works.
awk -F';' '
$3 == "HC" {
    if (NR > 1) {
        print sum
        sum = 0
    }
    next
}
{ sum += $3 }
END { print sum }'
awk solution:
$ awk -F';' '$3=="HC" && p{
    print sum      # print current total
    sum=p=0        # reinitialize sum and p
    next
}
$3!="HC"{
    sum=sum+($3+0) # make sure $3 is converted to integer. sum it up.
    p=1            # set p to 1
}
# print last sum
END{print sum}' input.txt
output:
14
28
51
0
37
10
one-liner:
$ awk -F";" '$3=="HC" && p{print sum;sum=p=0;next} $3!="HC"{sum=sum+($3+0);p=1} END{print sum}' input.txt
awk -F';' '/^H.*/{if(f)print s;s=0;f=$3=="HC"}f{s+=$3}END{if(f)print s}' infile
For given inputs:
$ cat infile
HA;HB;HC;HD;HE
a1;b1;14;d;e
HA;HB;HC;HD;HE
a2;b2;28;d;e
HA;HB;HC;HD;HE
a31;b31;44;d;e
a32;b32;07;d;e
HA;HB;HC;HD;HE
a4;b4;0;d;e
HA;HB;HC;HD;HE
a51;b51;32;d;e
a52;b52;0;d;e
a53;b53;5;d;e
HA;HB;HC;HD;HE
a6;b6;10;d;e
$ awk -F';' '/^H.*/{if(f)print s; s=0; f=$3=="HC"}f{s+=$3}END{if(f)print s}' infile
14
28
51
0
37
10
It takes a little more care; for example:
$ cat infile2
HA;HB;HC;HD;HE
a1;b1;14;d;e
HA;HB;HC;HD;HE
a2;b2;28;d;e
HA;HB;HC;HD;HE
a31;b31;44;d;e
a32;b32;07;d;e
HA;HB;HC;HD;HE
a4;b4;0;d;e
HA;HB;HD;HD;HE <---- say HC is not found here
a51;b51;32;d;e
a52;b52;0;d;e
a53;b53;5;d;e
HA;HB;HC;HD;HE
a6;b6;10;d;e
# find only HC in 3rd column
$ awk -F';' '/^H.*/{if(f)print s; s=0; f=$3=="HC"}f{s+=$3}END{if(f)print s}' infile2
14
28
51
0
10
# Find HD in 3rd column
$ awk -F';' '/^H.*/{if(f)print s; s=0; f=$3=="HD"}f{s+=$3}END{if(f)print s}' infile2
37
eval "true || $(cat data.csv|cut -d ";" -f3 |sed -e s/"HC"/"0; expr 0"/g |tr '\n' '#'|sed -e s/"##"/""/g|sed -e s/"#"/" + "/g)"
Explanation:
Get contents of the file using cat
Take only the third column using cut delimiter of ;
Replace HC lines with 0; expr 0 values to start building eval-worthy bash expressions to eventually yield expr 0 + 14;
Replace \n newlines temporarily with # to circumvent possible BSD sed limitations
Replace double ## with single # to avoid blank lines turning into spaces and causing expr to bomb out.
Replace # with + to add the numbers together.
Execute the command, but with a true || 0; expr ... to avoid a guaranteed syntax error on the first line.
Which creates this:
true || 0; expr 0 + 14 + 0; expr 0 + 28 + 0; expr 0 + 44 + 07 + 0; expr 0 + 0 + 0; expr 0 + 32 + 0 + 5 + 0; expr 0 + 10
The output looks like this:
14
28
51
0
37
10
This was tested on Bash 3.2 and MacOS El Capitan.
Could you please try the following and let me know if it helps you.
awk -F";" '
/^H/ && $3!="HC"{
    flag=""
    next
}
/^H/ && $3=="HC"{
    if(NR>1){
        printf("%d\n",sum)
    }
    sum=0
    flag=1
    next
}
flag{
    sum+=$3
}
END{
    printf("%d\n",sum)
}
' Input_file
Output will be as follows.
14
28
51
0
37
10
$ awk -F';' '$3=="HC"{if (NR>1) print s; s=0; next} {s+=$3} END{print s}' file
14
28
51
0
37
10

How can I format the timestamp column in a CSV file?

I'm trying to format the first column of a CSV, which is a unix timestamp in milliseconds, with a format like this command:
date -d @$( echo "($line_date + 500) / 1000" | bc)
where $line_date is something like 1487693882310
And my file has this information:
1487152859086,,,,,,localhost.localdomain,ServerUpDown,ServerUp,,,,,,, ,,,,
1487613634268,,,,,,localhost.localdomain,ServerUpDown,ServerUp,,,,,,, ,,,,
1487614351573,,,,,,spadmin,logout,,,,,,,, ,,,,
1487614500536,,,,,,System,run,Perform Maintenance,,,,,,, ,,,,
I would like it to be like this:
mié feb 15 11:00:59 CET 2017,,,,,,localhost.localdomain,ServerUpDown,ServerUp,,,,,,, ,,,,
lun feb 20 19:00:34 CET 2017,,,,,,localhost.localdomain,ServerUpDown,ServerUp,,,,,,, ,,,,
lun feb 20 19:12:32 CET 2017,,,,,,spadmin,logout,,,,,,,, ,,,,
lun feb 20 19:15:01 CET 2017,,,,,,System,run,Perform Maintenance,,,,,,, ,,,,
I've tried this but it didn't work:
awk 'BEGIN{FS=OFS=","}{$1=`date -d @$( echo "($date_now + 500) / 1000" | bc)`}1' file.csv
Any help will be much appreciated.
Thank you very much in advance.
Kind regards,
Héctor
One way is to leave the CSV line intact and prepend it with the parsed timestamp as the first column.
Something like:
gawk -F, '{ printf "%s.%03u,",strftime("%Y-%m-%dT%H:%M:%S", $1/1000),$1%1000; print }' file.csv
Outputs:
2017-02-15T10:00:59.086,1487152859086,,,,,,localhost.localdomain,ServerUpDown,ServerUp,,,,,,, ,,,,
2017-02-20T18:00:34.268,1487613634268,,,,,,localhost.localdomain,ServerUpDown,ServerUp,,,,,,, ,,,,
2017-02-20T18:12:31.573,1487614351573,,,,,,spadmin,logout,,,,,,,, ,,,,
2017-02-20T18:15:00.536,1487614500536,,,,,,System,run,Perform Maintenance,,,,,,, ,,,,
Or you can rebuild the first field and then print the whole record like this:
echo 1487152859086,,,,,,localhost.localdomain,ServerUpDown,ServerUp,,,,,,, ,,,, | awk 'BEGIN{OFS=FS=","}{$1=strftime("%a %b %d %H:%M:%S %Z %Y", $1)}1'
You'll get this:
Fri Dec 13 14:45:52 CSTM 1901,,,,,,localhost.localdomain,ServerUpDown,ServerUp,,,,,,, ,,,,
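The Dec 1901 date above is the signature of handing the raw millisecond value straight to strftime (on that system it overflows a 32-bit time value). Dividing by 1000 first, with the +500 rounding from the question, gives the intended result; a sketch (assuming gawk):

```shell
# sketch: convert milliseconds to (rounded) seconds before calling strftime
echo '1487152859086,,,,,,localhost.localdomain,ServerUpDown,ServerUp' |
  gawk 'BEGIN{OFS=FS=","}{$1=strftime("%a %b %d %H:%M:%S %Z %Y", int(($1+500)/1000))}1'
```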

Get time in seconds since epoch in Tcl for a specific time

I am trying to convert a date/time of the following format YYYYMMDDHHmm to seconds since epoch in Tcl.
I've tried to do this with the following, but it isn't working:
clock scan 201403251850 -format %Y%m%d%H%M
For the above, I am trying to convert 6:50PM on March 25th, 2014 to seconds since epoch.
How can I achieve this?
Chris
I tested this on Tcl 8.3.3, so it should work with 8.0; the regex may need some tweaking to suit the pre-8.1 regexp engine.
proc scan_datetime {datetime} {
    # expect a datetime string like "6:50PM on March 25th, 2014"
    regexp {^([0-9]+):([0-9]{2})([AP]M) on ([[:alpha:]]+) ([0-9]{1,2}).., ([0-9]{4})} $datetime -> hr min ampm mon day year
    # 12:xxPM stays 12; 12:xxAM becomes 0
    if {$ampm == "PM" && $hr != 12} then {incr hr 12} elseif {$ampm == "AM" && $hr == 12} then {set hr 0}
    clock scan "$day $mon $year $hr:$min"
}
puts [clock format [scan_datetime "6:50PM on March 25th, 2014"]]
puts [clock format [scan_datetime "12:34AM on February 1st, 2012"]]
Tue Mar 25 18:50:00 EDT 2014
Wed Feb 01 00:34:00 EST 2012
If the above regular expression doesn't work in 8.0, try this:
proc scan_datetime {datetime} {
    set d {[0-9]}
    set a {[A-Za-z]}
    regexp "^($d$d?):($d$d)(\[AP]M) on ($a+) ($d$d?).., ($d$d$d$d)" $datetime -> hr min ampm mon day year
    if {$ampm == "PM" && $hr != 12} then {incr hr 12} elseif {$ampm == "AM" && $hr == 12} then {set hr 0}
    clock scan "$day $mon $year $hr:$min"
}
Specifically for the format YYYYmmddHHMM:
tcl8.3.3 % set t 201403251452
201403251452
tcl8.3.3 % set d {[0-9]}
[0-9]
tcl8.3.3 % regsub "($d$d$d$d)($d$d)($d$d)($d$d)($d$d)" $t {\2/\3/\1 \4:\5} tt
1
tcl8.3.3 % clock scan $tt
1395773520
tcl8.3.3 % clock format [clock scan $tt]
Tue Mar 25 14:52:00 EDT 2014
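As an aside, free-form clock scan -format was added in Tcl 8.5, so on 8.5 or later the command from the question works essentially as written (untested sketch):

```shell
# sketch: Tcl 8.5+ understands clock scan ... -format directly
tclsh <<'EOF'
set secs [clock scan 201403251850 -format %Y%m%d%H%M -gmt 1]
puts [clock format $secs -format {%Y-%m-%d %H:%M} -gmt 1]
EOF
```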

Extract day of week using perl and mysql date format

I am looping through a range of consecutive dates. I need to find out which are weekends so they can be discarded. One method would be to determine the day of week via perl (which my script is written in) or via a query on each pass through the loop. There will never be more than 30 dates, usually 5 or fewer to check. Would it be more efficient to load up DateTime and use perl, or to run a query without needing modules? If perl is the best method I could use a little help extracting the day of week from the YYYY-MM-DD format. I'd be OK with a number 1-7 or a short name Mon-Sun.
2012-05-01
2012-05-02
2012-05-03
2012-05-04
2012-05-05
2012-05-06
Not sure if this is possible, but perhaps a more suitable solution would be to write a query (since I know the start and end and they are consecutive) to count the days between x and y where dayofweek NOT IN (6,7).
See DateTime::Format::Strptime and DateTime.
use 5.010;
use strictures;
use DateTime::Format::Strptime qw();
my $parser = DateTime::Format::Strptime->new(pattern => '%F');
for my $date (qw(
2012-05-01
2012-05-02
2012-05-03
2012-05-04
2012-05-05
2012-05-06
)) {
my $dow = $parser->parse_datetime($date)->day_of_week;
say "$date is a weekend day" if 6 == $dow or 7 == $dow;
}
MySQL has a day of week function you can use directly.
The most obvious solution is to use the Time::Piece module, which has been a core module of Perl since v5.9 and so probably doesn't need installing on your system.
The wday method returns a numeric day of week where 1 == Sunday, so for the weekend you are looking for values of 7 (Saturday) or 1. This can be adjusted so that Saturday is represented by zero instead (and Sunday by 1) by writing
my $dow = $tp->wday % 7;
after which the test for a weekend is simply $dow < 2.
Here is some code to demonstrate.
use strict;
use warnings;
use Time::Piece;
while (<DATA>) {
    chomp;
    my $tp = Time::Piece->strptime($_, '%Y-%m-%d');
    my $dow = $tp->wday % 7;
    print $_;
    print " (weekend)" if $dow < 2;
    print "\n";
}
__DATA__
2012-05-01
2012-05-02
2012-05-03
2012-05-04
2012-05-05
2012-05-06
output
2012-05-01
2012-05-02
2012-05-03
2012-05-04
2012-05-05 (weekend)
2012-05-06 (weekend)
You could use the core Time::Local module and then compute the weekday using localtime. Weekday 0 corresponds to Sunday, and 6 is Saturday.
#! /usr/bin/env perl
use strict;
use warnings;
use Time::Local;
my @dates = qw(
    2012-05-01
    2012-05-02
    2012-05-03
    2012-05-04
    2012-05-05
    2012-05-06
);
my @days = qw/ Sun Mon Tue Wed Thu Fri Sat /;
foreach my $date (@dates) {
    my($yyyy,$mm,$dd) = split /-/, $date;
    my $time_t = timelocal 0, 0, 0, $dd, $mm-1, $yyyy-1900;
    my $wday = (localtime $time_t)[6];
    my $weekend = ($wday == 0 || $wday == 6) ? " *" : "";
    print "$date: $days[$wday] ($wday)$weekend\n";
}
Output:
2012-05-01: Tue (2)
2012-05-02: Wed (3)
2012-05-03: Thu (4)
2012-05-04: Fri (5)
2012-05-05: Sat (6) *
2012-05-06: Sun (0) *
For fun, you could go Swiss Army Chainsaw and scrape the output of the cal utility.
#! /usr/bin/env perl
use strict;
use warnings;
use 5.10.0; # for smart matching
sub weekday_type {
    my($date) = @_;
    die "$0: unexpected date '$date'"
        unless my($yyyy,$mm,$dd) =
            $date =~ /^([0-9]{1,4})-([0-9]{1,2})-([0-9]{1,2})$/;
    my $calendar = `cal -m $mm $yyyy`;
    die "$0: cal -m $mm $yyyy failed" if $?;
    for (split /\n/, $calendar) {
        if (/^ \s* [0-9]{1,2} (?: \s+ [0-9]{1,2})* \s*$/x) {
            my @dates = split;
            my @weekend = splice @dates, @dates > 1 ? -2 : -1;
            return "weekend" if ($dd+0) ~~ @weekend;
        }
    }
    "weekday";
}
Use it as in
my @dates = qw(
    2012-05-01
    2012-05-02
    2012-05-03
    2012-05-04
    2012-05-05
    2012-05-06
);
my @days = qw/ Sun Mon Tue Wed Thu Fri Sat /;
foreach my $date (@dates) {
    my $type = weekday_type $date;
    print "$date: $type\n";
}
Output:
2012-05-01: weekday
2012-05-02: weekday
2012-05-03: weekday
2012-05-04: weekday
2012-05-05: weekend
2012-05-06: weekend
I don’t recommend doing it this way in production.
use Date::Simple qw/date/;
use Date::Range;
my ( $start, $end ) = ( date('2012-05-02'), date('2012-05-16') );
my $range = Date::Range->new( $start, $end );
my @all_dates = $range->dates;
foreach my $d (@all_dates) {
    my $date = Date::Simple->new($d);
    print $date->day_of_week." ".$d."<br />";
}
#--OUTPUT--#
3 2012-05-02
4 2012-05-03
5 2012-05-04
6 2012-05-05
0 2012-05-06
1 2012-05-07
2 2012-05-08
3 2012-05-09
4 2012-05-10
5 2012-05-11
6 2012-05-12
0 2012-05-13
1 2012-05-14
2 2012-05-15
3 2012-05-16