How to convert data from a custom format to CSV? - csv

I have a file whose content is shown below. I have only included a few records here, but there are around 1,000 records in a single file:
Record type : GR
address : 62.5.196
ID : 1926089329
time : Sun Aug 10 09:53:47 2014
Time zone : + 16200 seconds
address [1] : 61.5.196
PN ID : 412 1
---------- Container #1 (start) -------
inID : 101
---------- Container #1 (end) -------
timerecorded: Sun Aug 10 09:51:47 2014
Uplink data volume : 502838
Downlink data volume : 3133869
Change condition : Record closed
--------------------------------------------------------------------
Record type : GR
address : 61.5.196
ID : 1926089327
time : Sun Aug 10 09:53:47 2014
Time zone : + 16200 seconds
address [1] : 61.5.196
PN ID : 412 1
---------- Container #1 (start) -------
intID : 100
---------- Container #1 (end) -------
timerecorded: Sun Aug 10 09:55:47 2014
Uplink data volume : 502838
Downlink data volume : 3133869
Change condition : Record closed
--------------------------------------------------------------------
Record type : GR
address : 63.5.196
ID : 1926089328
time : Sun Aug 10 09:53:47 2014
Time zone : + 16200 seconds
address [1] : 61.5.196
PN ID : 412 1
---------- Container #1 (start) -------
intID : 100
---------- Container #1 (end) -------
timerecorded: Sun Aug 10 09:55:47 2014
Uplink data volume : 502838
Downlink data volume : 3133869
Change condition : Record closed
My goal is to convert this to a CSV or txt file like below:
Record type| address |ID | time | Time zone| address [1] | PN ID
GR |61.5.196 |1926089329 |Sun Aug 10 09:53:47 2014 |+ 16200 seconds |61.5.196 |412 1
Any guidance on the best way to start this would be great. I think the sample I provided gives a clear idea, but in words: I want to read the header of each record once and put its data under the output header.
Thanks for your time and any help or suggestions.

What you're doing is creating an Extract/Transform script (the ET part of an ETL). I don't know which language you're intending to use, but essentially any language can be used. Personally, unless this is a massive file, I'd recommend Python as it's easy to grok and easy to write with the included csv module.
First, you need to understand the format thoroughly.
How are records separated?
How are fields separated?
Are there any fields that are optional?
If so, are the optional fields important, or do they need to be discarded?
Unfortunately, this is all headwork: there's no magical code solution to make this easier. Then, once you have figured out the format, you'll want to start writing code. This is essentially a series of data transformations:
Read the file.
Split it into records.
For each record, transform the fields into an appropriate data structure.
Serialize the data structure into the CSV.
If your file is larger than memory, this can become more complicated; instead of reading and then splitting, for example, you may want to read the file sequentially and create a Record object each time the record delimiter is detected. If your file is even larger, you might want to use a language with better multithreading capabilities to handle the transformation in parallel; but those are more advanced than it sounds like you need to go at the moment.
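For illustration, here is a minimal Python sketch of those four steps using the included csv module. It reads the file sequentially, so it also copes with files larger than memory. The file names, the parse_records helper and the FIELDS list are assumptions based on the sample in the question (note the sample spells the container field both inID and intID, so you may need to normalize that); treat it as a starting point, not a finished solution.
import csv

# output column order; adjust to whatever your real files contain
FIELDS = ["Record type", "address", "ID", "time", "Time zone", "address [1]",
          "PN ID", "inID", "timerecorded", "Uplink data volume",
          "Downlink data volume", "Change condition"]

def parse_records(path):
    """Read the file line by line and yield one dict per record."""
    record = {}
    with open(path) as handle:
        for raw in handle:
            line = raw.strip()
            if line and set(line) == {"-"}:
                # a line made up only of dashes separates records
                if record:
                    yield record
                record = {}
            elif ":" in line and not line.startswith("-"):
                # "label : value" lines; the first colon splits label from value
                key, _, value = line.partition(":")
                record[key.strip()] = value.strip()
        if record:
            # no trailing separator after the last record
            yield record

with open("your_csv_file.csv", "w", newline="") as out:
    writer = csv.DictWriter(out, fieldnames=FIELDS, delimiter="|")
    writer.writeheader()
    for rec in parse_records("your_data_file.txt"):
        writer.writerow({name: rec.get(name, "") for name in FIELDS})
Each record becomes a dict keyed by the labels to the left of the colons, and csv.DictWriter writes an empty cell for any field a particular record lacks.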

This is a simple PHP script that will read a text file containing your data and write a CSV file with the results. If you are on a system with command-line PHP installed, just save it to a file in some directory, copy your data file next to it (renaming it to "your_data_file.txt"), and run "php whatever_you_named_the_script.php" from that directory.
<?php
$text = file_get_contents("your_data_file.txt");
$matches = array();
// each lazy (.+?) captures the value between one label and the next;
// the final group grabs the rest of the "PN ID" line
preg_match_all("/Record type[\s\v]*:[\s\v]*(.+?)address[\s\v]*:[\s\v]*(.+?)ID[\s\v]*:[\s\v]*(.+?)time[\s\v]*:[\s\v]*(.+?)Time zone[\s\v]*:[\s\v]*(.+?)address \[1\][\s\v]*:[\s\v]*(.+?)PN ID[\s\v]*:[\s\v]*([^\n]+)/su", $text, $matches, PREG_SET_ORDER);
$csv_file = fopen("your_csv_file.csv", "w");
if ($csv_file) {
    if (fputcsv($csv_file, array("Record type", "address", "ID", "time", "Time zone", "address [1]", "PN ID"), "|") === FALSE) {
        echo "could not write headers to csv file\n";
    }
    foreach ($matches as $match) {
        $clean_values = array();
        for ($i = 1; $i < 8; $i++) {
            $clean_values[] = trim($match[$i]);
        }
        if (fputcsv($csv_file, $clean_values, "|") === FALSE) {
            echo "could not write data to csv file\n";
        }
    }
    fclose($csv_file);
} else {
    die("could not open csv file\n");
}
This script assumes that your data records are always formatted similar to the examples you have posted and that all values are always present. If the data file may have exceptions to those rules, the script probably has to be adapted accordingly. But it should give you an idea of how this can be done.
Update
Adapted the script to deal with the full format provided in the updated question. The regular expression now matches single data lines (extracting their values) as well as the record separator made up of dashes. The loop has changed a bit and does now fill up a buffer array field by field until a record separator is encountered.
<?php
$text = file_get_contents("your_data_file.txt");
// this will match whole lines
// only if they either start with an alpha-num character
// or are completely made of dashes (record separator)
// it also extracts the values of data lines one by one
$regExp = '/(^\s*[a-zA-Z0-9][^:]*:(.*)$|^-+$)/m';
$matches = array();
preg_match_all($regExp, $text, $matches, PREG_SET_ORDER);
$csv_file = fopen("your_csv_file.csv", "w");
if ($csv_file) {
    // in case the number or order of fields changes, adapt this array as well
    $column_headers = array(
        "Record type",
        "address",
        "ID",
        "time",
        "Time zone",
        "address [1]",
        "PN ID",
        "inID",
        "timerecorded",
        "Uplink data volume",
        "Downlink data volume",
        "Change condition"
    );
    if (fputcsv($csv_file, $column_headers, "|") === FALSE) {
        echo "could not write headers to csv file\n";
    }
    $clean_values = array();
    foreach ($matches as $match) {
        // first entry will contain the whole line
        // remove surrounding whitespace
        $whole_line = trim($match[0]);
        if (strpos($whole_line, '-') !== 0) {
            // this match starts with something else than -
            // so it must be a data field, store the extracted value
            $clean_values[] = trim($match[2]);
        } else {
            // this match is a record separator, write csv line and reset buffer
            if (fputcsv($csv_file, $clean_values, "|") === FALSE) {
                echo "could not write data to csv file\n";
            }
            $clean_values = array();
        }
    }
    if (!empty($clean_values)) {
        // there was no record separator at the end of the file
        // write the last entry that is still in the buffer
        if (fputcsv($csv_file, $clean_values, "|") === FALSE) {
            echo "could not write data to csv file\n";
        }
    }
    fclose($csv_file);
} else {
    die("could not open csv file\n");
}
Doing the data extraction using regular expressions is one possible method mostly useful for simple data formats with a clear structure and no surprises. As syrion pointed out in his answer, things can get much more complicated. In that case you might need to write a more sophisticated script than this one.

Related

Influxdb : CSV import problem, missing values

This is my first post on Stack Overflow and I hope to do it properly.
I'm new to InfluxDB and Telegraf, but as part of a project I want to import metrics into InfluxDB; they come from network equipment and are exported by a network manager as .csv files.
I get:
#IP :
2020-05-YY :
CSV1
CSV2
2020-05-ZZ
...
The structure of the csv file is as follows:
TimeStamp, NetworkElement_Name, Typeofmeasurement, [depends on the measurement, but in the following example it will be the memory of each card.]
Here is an example :
TimeStamp,NetworkElement_Name,Typeofmeasurement,Object,memAbsoluteUsage
2020-05-05T20:00:00+02:00,router1,CPU/Memory Usage,card1,1075
2020-05-05T20:00:00+02:00,router1,CPU/Memory Usage,card2,832
This file exists twice for the same timestamp but with a different "NetworkElement_Name", "Object" and value.
As far as Telegraf is concerned, I created a ".conf" file for each imported CSV, as follows:
[[inputs.file]]
  files = ["/metric/data/data/clean_data/**/**/router_cpuMemUsage.csv"]
  data_format = "csv"
  csv_header_row_count = 1
  csv_skip_rows = 0
  csv_skip_columns = 0
  csv_delimiter = ","
  csv_column_types = ["string","string","string","string","int"]
  csv_measurement_column = "Typeofmeasurement"
  csv_timestamp_column = "TimeStamp"
  csv_timestamp_format = "2006-01-02T15:04:05-07:00"

[[outputs.influxdb]]
  database = "router Metrics"
And the data seems to be imported... but I realize that some values are missing.
I have difficulty understanding and explaining the problem, but I can't get all the values recorded at a specific time.
The query returns:
> SELECT * FROM "CPU/Memory Usage" WHERE "NE Name" =~ /router1/ ORDER BY DESC LIMIT 5
name: CPU/Memory Usage
time NE Name Object ID Object Type Time Stamp memUsage
---- ------- --------- ----------- ---------- ----------
2020-05-07T06:45:00Z router1 card1 CPU/Memory Usage 2020-05-07T08:45:00+02:00 1075
2020-05-07T06:30:00Z router1 card1 CPU/Memory Usage 2020-05-07T08:30:00+02:00 1075
2020-05-07T06:15:00Z router1 card1 CPU/Memory Usage 2020-05-07T08:15:00+02:00 1075
2020-05-07T06:00:00Z router1 card1 CPU/Memory Usage 2020-05-07T08:00:00+02:00 1075
2020-05-07T05:45:00Z router1 card1 CPU/Memory Usage 2020-05-07T07:45:00+02:00 1075
I only have the information for card 1, not for card 2, and if I remove the WHERE clause, for the same timestamp I still don't have all the information: the information for router 2 is missing.
The values present for one router at a given timestamp will not be present for the other.
I have trouble understanding where the problem comes from.
If one of you has an idea :)

Ruby: Handling different JSON response that is not what is expected

I searched online and read through the documentation, but have not been able to find an answer. I am fairly new, and as part of learning Ruby I wanted to make the script below.
The script essentially does a carrier lookup on a list of numbers provided through a CSV file. The CSV file has just one column, with the header "number".
Everything runs fine UNTIL the API gives me an output that is different from the others. In this example, it tells me that one of the numbers in my file is not a valid US number. This then causes my script to stop running.
I am looking to see if there is a way to either ignore it (I read about Begin and End, but was not able to get it to work) or, ideally, either create a separate file with those errors or just put the data into the main file.
Any help would be much appreciated. Thank you.
Ruby Code:
require 'csv'
require 'uri'
require 'net/http'
require 'json'
number = 0
CSV.foreach('data1.csv', headers: true) do |row|
  number = row['number'].to_i
  uri = URI("https://api.message360.com/api/v3/carrier/lookup.json?PhoneNumber=#{number}")
  req = Net::HTTP::Post.new(uri)
  req.basic_auth 'XXX', 'XXX'
  res = Net::HTTP.start(uri.hostname, uri.port, :use_ssl => true) { |http|
    http.request(req)
  }
  json = JSON.parse(res.body)
  new = json["Message360"]["Carrier"].values
  CSV.open("new.csv", "ab") do |csv|
    csv << new
  end
end
File Data:
number
5556667777
9998887777
Good Response example in JSON:
{"Message360"=>{"ResponseStatus"=>1, "Carrier"=>{"ApiVersion"=>"3", "CarrierSid"=>"XXX", "AccountSid"=>"XXX", "PhoneNumber"=>"+19495554444", "Network"=>"Cellco Partnership dba Verizon Wireless - CA", "Wireless"=>"true", "ZipCode"=>"92604", "City"=>"Irvine", "Price"=>0.0003, "Status"=>"success", "DateCreated"=>"2018-05-15 23:05:15"}}}
The response that causes Script to stop:
{
  "Message360": {
    "ResponseStatus": 0,
    "Errors": {
      "Error": [
        {
          "Code": "ER-M360-CAR-111",
          "Message": "Allowed Only Valid E164 North American Numbers.",
          "MoreInfo": []
        }
      ]
    }
  }
}
It would appear you can just check json["Message360"]["ResponseStatus"] first for a 0 or 1 to indicate failure or success.
I'd probably add a rescue to help catch any other errors (malformed JSON, network issue, etc.)
CSV.foreach('data1.csv', headers: true) do |row|
  number = row['number'].to_i
  ...
  json = JSON.parse(res.body)
  if json["Message360"]["ResponseStatus"] == 1
    new = json["Message360"]["Carrier"].values
    CSV.open("new.csv", "ab") do |csv|
      csv << new
    end
  else
    # handle bad response
  end
rescue StandardError => e
  # request failed for some reason, log e and the number?
end

Parse any number of subkeys in a perl hash

I have a Perl hash that is obtained from parsing JSON. The JSON could be anything a user-defined API could generate. The goal is to obtain a date/time string and determine whether that date/time is out of bounds according to a user-defined threshold. The only issue I have is that Perl seems a bit cumbersome when dealing with hash key/subkey iteration. How can I look through all the keys and determine if a key or subkey exists throughout the hash? I have read many threads on Stack Overflow, but nothing that exactly meets my needs. I only started Perl last week, so I may be missing something... Let me know if that's the case.
Below is the "relevant" code/subs. For all code see: https://gitlab.com/Jedimaster0/check_http_freshness
use warnings;
use strict;
use LWP::UserAgent;
use Getopt::Std;
use JSON::Parse 'parse_json';
use JSON::Parse 'assert_valid_json';
use DateTime;
use DateTime::Format::Strptime;

# Verify the content-type of the response is JSON
eval {
    assert_valid_json ($response->content);
};
if ( $@ ){
    print "[ERROR] Response isn't valid JSON. Please verify source data. \n$@";
    exit EXIT_UNKNOWN;
} else {
    # Convert the JSON data into a perl hashref
    $jsonDecoded = parse_json($response->content);
    if ($verbose){print "[SUCCESS] JSON FOUND -> ", $response->content , "\n";}
    if (defined $jsonDecoded->{$opts{K}}){
        if ($verbose){print "[SUCCESS] JSON KEY FOUND -> ", $opts{K}, ": ", $jsonDecoded->{$opts{K}}, "\n";}
        NAGIOS_STATUS(DATETIME_DIFFERENCE(DATETIME_LOOKUP($opts{F}, $jsonDecoded->{$opts{K}})));
    } else {
        print "[ERROR] Retrieved JSON does not contain any data for the specified key: $opts{K}\n";
        exit EXIT_UNKNOWN;
    }
}

sub DATETIME_LOOKUP {
    my $dateFormat = $_[0];
    my $dateFromJSON = $_[1];
    my $strp = DateTime::Format::Strptime->new(
        pattern   => $dateFormat,
        time_zone => $opts{z},
        on_error  => sub { print "[ERROR] INVALID TIME FORMAT: $dateFormat OR TIME ZONE: $opts{z} \n$_[1] \n" ; HELP_MESSAGE(); exit EXIT_UNKNOWN; },
    );
    my $dt = $strp->parse_datetime($dateFromJSON);
    if (defined $dt){
        if ($verbose){print "[SUCCESS] Time formatted using -> $dateFormat\n", "[SUCCESS] JSON date converted -> $dt $opts{z}\n";}
        return $dt;
    } else {
        print "[ERROR] DATE VARIABLE IS NOT DEFINED. Pattern or timezone incorrect."; exit EXIT_UNKNOWN
    }
}

# Subtract JSON date/time from now and return delta
sub DATETIME_DIFFERENCE {
    my $dateInitial = $_[0];
    my $deltaDate;
    # Convert to UTC for standardization of computations and it's just easier to read when everything matches.
    $dateInitial->set_time_zone('UTC');
    $deltaDate = $dateNowUTC->delta_ms($dateInitial);
    if ($verbose){print "[SUCCESS] (NOW) $dateNowUTC UTC - (JSON DATE) $dateInitial ", $dateInitial->time_zone->short_name_for_datetime($dateInitial), " = ", $deltaDate->in_units($opts{u}), " $opts{u} \n";}
    return $deltaDate->in_units($opts{u});
}
Sample Data
{
    "localDate":"Wednesday 23rd November 2016 11:03:37 PM",
    "utcDate":"Wednesday 23rd November 2016 11:03:37 PM",
    "format":"l jS F Y h:i:s A",
    "returnType":"json",
    "timestamp":1479942217,
    "timezone":"UTC",
    "daylightSavingTime":false,
    "url":"http:\/\/www.convert-unix-time.com?t=1479942217",
    "subkey":{
        "altTimestamp":1479942217,
        "altSubkey":{
            "thirdTimestamp":1479942217
        }
    }
}
[SOLVED]
I have used the answer that @HåkonHægland provided. Here are the code changes. Using the flatten module, I can use any input string that matches the JSON keys. I still have some work to do, but you can see the issue is resolved. Thanks @HåkonHægland.
use warnings;
use strict;
use Data::Dumper;
use LWP::UserAgent;
use Getopt::Std;
use JSON::Parse 'parse_json';
use JSON::Parse 'assert_valid_json';
use Hash::Flatten qw(:all);
use DateTime;
use DateTime::Format::Strptime;

# Verify the content-type of the response is JSON
eval {
    assert_valid_json ($response->content);
};
if ( $@ ){
    print "[ERROR] Response isn't valid JSON. Please verify source data. \n$@";
    exit EXIT_UNKNOWN;
} else {
    # Convert the JSON data into a perl hashref
    my $jsonDecoded = parse_json($response->content);
    my $flatHash = flatten($jsonDecoded);
    if ($verbose){print "[SUCCESS] JSON FOUND -> ", Dumper($flatHash), "\n";}
    if (defined $flatHash->{$opts{K}}){
        if ($verbose){print "[SUCCESS] JSON KEY FOUND -> ", $opts{K}, ": ", $flatHash>{$opts{K}}, "\n";}
        NAGIOS_STATUS(DATETIME_DIFFERENCE(DATETIME_LOOKUP($opts{F}, $flatHash->{$opts{K}})));
    } else {
        print "[ERROR] Retrieved JSON does not contain any data for the specified key: $opts{K}\n";
        exit EXIT_UNKNOWN;
    }
}
Example:
./check_http_freshness.pl -U http://bastion.mimir-tech.org/json.html -K result.creation_date -v
[SUCCESS] JSON FOUND -> $VAR1 = {
'timestamp' => '20161122T200649',
'result.data_version' => 'data_20161122T200649_data_news_topics',
'result.source_version' => 'kg_release_20160509_r33',
'result.seed_version' => 'seed_20161016',
'success' => 1,
'result.creation_date' => '20161122T200649',
'result.data_id' => 'data_news_topics',
'result.data_tgz_name' => 'data_news_topics_20161122T200649.tgz',
'result.source_data_version' => 'seed_vtv: data_20161016T102932_seed_vtv',
'result.data_digest' => '6b5bf1c2202d6f3983d62c275f689d51'
};
Odd number of elements in anonymous hash at ./check_http_freshness.pl line 78, <DATA> line 1.
[SUCCESS] JSON KEY FOUND -> result.creation_date:
[SUCCESS] Time formatted using -> %Y%m%dT%H%M%S
[SUCCESS] JSON date converted -> 2016-11-22T20:06:49 UTC
[SUCCESS] (NOW) 2016-11-26T19:02:15 UTC - (JSON DATE) 2016-11-22T20:06:49 UTC = 94 hours
[CRITICAL] Delta hours (94) is >= (24) hours. Data is stale.
You could try use Hash::Flatten. For example:
use Hash::Flatten qw(flatten);
my $json_decoded = parse_json($json_str);
my $flat = flatten( $json_decoded );
say "found" if grep /(?:^|\.)\Q$key\E(?:\.?|$)/, keys %$flat;
You can use Data::Visitor::Callback to traverse the data structure. It lets you define callbacks for different kinds of data types inside your structure. Since we're only looking at a hash it's relatively simple.
The following program has a predefined list of keys to find (those would be user input in your case). I converted your example JSON to a Perl hashref and included it in the code because the conversion is not relevant. The program visits every hashref in this data structure (including the top level) and runs the callback.
Callbacks in Perl are code references. These can be created in two ways. We're doing the anonymous subroutine (sometimes called lambda function in other languages). The callback gets passed two arguments: the visitor object and the current data substructure.
We'll iterate over all the keys we want to find and simply check if they exist in the current data structure. If we see one, we count its existence in the %seen hash. Using a hash to store things we have seen is a common idiom in Perl.
We're using a postfix if here, which is convenient and easy to read. %seen is a hash, so we access the value behind the $key with $seen{$key}, while $data is a hash reference, so we use the dereferencing operator -> to access the value behind $key with $data->{$key}.
The callback needs us to return the $data again so it continues. The last line is just there, it's not important.
I've used Data::Printer to output the %seen hash because it's convenient. You can also use Data::Dumper if you want. In production, you will not need that.
use strict;
use warnings;
use Data::Printer;
use Data::Visitor::Callback;

my $from_json = {
    "localDate"          => "Wednesday 23rd November 2016 11:03:37 PM",
    "utcDate"            => "Wednesday 23rd November 2016 11:03:37 PM",
    "format"             => "l jS F Y h:i:s A",
    "returnType"         => "json",
    "timestamp"          => 1479942217,
    "timezone"           => "UTC",
    "daylightSavingTime" => 0,    # this was false, I used 0 because that's a non-true value
    "url"                => "http:\/\/www.convert-unix-time.com?t=1479942217",
    "subkey"             => {
        "altTimestamp" => 1479942217,
        "altSubkey"    => {
            "thirdTimestamp" => 1479942217
        }
    }
};

my @keys_to_find = qw(timestamp altTimestamp thirdTimestamp missingTimestamp);

my %seen;
my $visitor = Data::Visitor::Callback->new(
    hash => sub {
        my ( $visitor, $data ) = @_;

        foreach my $key (@keys_to_find) {
            $seen{$key}++ if exists $data->{$key};
        }

        return $data;
    },
);
$visitor->visit($from_json);

p %seen;
The program outputs the following. Note this is not a Perl data structure. Data::Printer is not a serializer, it's a tool to make data human readable in a convenient way.
{
    altTimestamp     1,
    thirdTimestamp   1,
    timestamp        1
}
Since you also wanted to constrain the input, here's an example of how to do that. The following program is a modification of the one above. It allows you to give a set of different constraints for every required key.
I've done that by using a dispatch table. Essentially, that's a hash that contains code references. Kind of like the callbacks we use for the Visitor.
The constraints I've included are doing some things with dates. An easy way to work with dates in Perl is the core module Time::Piece. There are lots of questions around here about various date things where Time::Piece is the answer.
I've only done one constraint per key, but you could easily include several checks in those code refs, or make a list of code refs and put them in an array ref (keys => [ sub(), sub(), sub() ]) and then iterate that later.
In the visitor callback we are now also keeping track, in %passed, of the keys that have passed the constraint check. We're calling the coderef with $coderef->($arg). If a constraint check returns a true value, it gets noted in that hash.
use strict;
use warnings;
use Data::Printer;
use Data::Visitor::Callback;
use Time::Piece;
use Time::Seconds;    # for ONE_DAY

my $from_json = { ... };    # same as above

# prepare one of the constraints
# where I'm from, Christmas eve is considered Christmas
my $christmas = Time::Piece->strptime('24 Dec 2016', '%d %b %Y');

# set up the constraints per required key
my %constraints = (
    timestamp => sub {
        my ($epoch) = @_;

        # not older than one day
        return $epoch < time && $epoch > time - ONE_DAY;
    },
    altTimestamp => sub {
        my ($epoch) = @_;

        # epoch value should be an even number
        return !( $epoch % 2 );
    },
    thirdTimestamp => sub {
        my ($epoch) = @_;

        # before Christmas 2016
        return $epoch < $christmas;
    },
);

my %seen;
my %passed;
my $visitor = Data::Visitor::Callback->new(
    hash => sub {
        my ( $visitor, $data ) = @_;

        foreach my $key ( keys %constraints ) {
            if ( exists $data->{$key} ) {
                $seen{$key}++;
                $passed{$key}++ if $constraints{$key}->( $data->{$key} );
            }
        }

        return $data;
    },
);
$visitor->visit($from_json);

p %passed;
The output this time is:
{
    thirdTimestamp   1,
    timestamp        1
}
If you want to learn more about dispatch tables, take a look at chapter two of the book Higher Order Perl by Mark Jason Dominus, which is legally available for free online.

A shorter non-repeating alphanumeric code than UUID in MySQL

Is it possible for MySQL database to generate a 5 or 6 digit code comprised of only numbers and letters when I insert a record? If so how?
Just like goo.gl, bit.ly and jsfiddle do it. For example:
http://bit.ly/3PKQcJ
http://jsfiddle.net/XzKvP
cZ6ahF, 3t5mM, xGNPN, xswUdS...
So UUID_SHORT() will not work because it returns a value like 23043966240817183
Requirements:
Must be unique (non-repeating)
Can be but not required to be based off of primary key integer value
Must scale (grow by one character when all possible combinations have been used)
Must look random. (item 1234 cannot be BCDE while item 1235 be BCDF)
Must be generated on insert.
Would greatly appreciate code examples.
Try this:
SELECT LEFT(UUID(), 6);
I recommend using Redis for this task, actually. It has all the features that make this task suitable for its use. Foremost, it is very good at searching a big list for a value.
We will create two lists, buffered_ids, and used_ids. A cronjob will run every 5 minutes (or whatever interval you like), which will check the length of buffered_ids and keep it above, say, 5000 in length. When you need to use an id, pop it from buffered_ids and add it to used_ids.
Redis has sets, which are unique items in a collection. Think of it as a hash where the keys are unique and all the values are "true".
Your cronjob, in bash:
log(){ local x=$1 n=2 l=-1;if [ "$2" != "" ];then n=$x;x=$2;fi;while((x));do let l+=1 x/=n;done;echo $l; }
scale=`redis-cli SCARD used_ids`
scale=`log 16 $scale`
scale=$[ scale + 6]
while [ `redis-cli SCARD buffered_ids` -lt 5000 ]; do
    uuid=`cat /dev/urandom | tr -cd "[:alnum:]" | head -c ${1:-$scale}`
    if [ `redis-cli SISMEMBER used_ids $uuid` == 1 ]; then
        continue
    fi
    redis-cli SADD buffered_ids $uuid
done
To grab the next uid for use in your application (in pseudocode because you did not specify a language)
$uid = redis('SPOP buffered_ids');
redis('SADD used_ids ' . $uid);
Edit: actually there's a race condition there. To safely pop a value, add it to used_ids first, then remove it from buffered_ids.
$uid = redis('SRANDMEMBER buffered_ids');
redis('SADD used_ids ' . $uid);
redis('SREM buffered_ids ' . $uid);
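For what it's worth, here is the same safe-pop sequence as a small Python sketch using the redis-py client; the set names match the ones above, while next_uid is just a made-up helper name and the connection settings are assumptions.
import redis  # the redis-py client

r = redis.Redis()  # assumes Redis is reachable on localhost:6379

def next_uid():
    """Reserve an id: record it in used_ids before removing it from the buffer,
    so the id is never lost if the process dies between the two calls."""
    uid = r.srandmember("buffered_ids")
    if uid is None:
        raise RuntimeError("buffered_ids is empty; is the refill cronjob running?")
    r.sadd("used_ids", uid)
    r.srem("buffered_ids", uid)
    return uid.decode()

print(next_uid())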

awk set elements in array

I have a large .csv file to process and my elements are arranged randomly like this:
xxxxxx,xx,MLOCAL,MREMOTE,33222,56,22/10/2012,18/10/2012
xxxxxx,xx,MREMOTE,MLOCAL,33222,56,22/10/2012,18/10/2012
xxxxxx,xx,MLOCAL,341993,22/10/2012
xxxxxx,xx,MREMOTE,9356828,08/10/2012
xxxxxx,xx,LOCAL,REMOTE,19316,15253,22/10/2012,22/10/2012
xxxxxx,xx,REMOTE,LOCAL,1865871,383666,22/10/2012,22/10/2012
xxxxxx,xx,REMOTE,1180306134,19/10/2012
where the fields LOCAL, REMOTE, MLOCAL or MREMOTE are laid out like this:
when they appear as a pair: if the 3rd field is MLOCAL and the 4th field is MREMOTE, then the 5th and 7th fields are the value and date of MLOCAL, and the 6th and 8th fields are the value and date of MREMOTE;
when they appear alone (only LOCAL or only REMOTE), the 4th and 5th fields are the value and date of field 3.
Now, I have split these rows using:
nawk 'BEGIN{
while (getline < "'"$filedata"'")
split($0,ft,",");
name=ft[1];
ID=ft[2]
?=ft[3]
?=ft[4]
....................
but because I can't find a pattern for the 3rd and 4th fields, I'm pretty stuck on how to keep assigning variable names to the array elements in order to use them for further processing.
Now, I tried to use a "case" statement, but that isn't working in awk or nawk (only gawk works as expected). I also tried this:
if ( ft[3] == "MLOCAL" && ft[4]!= "MREMOTE" )
{
MLOCAL=ft[3];
MLOCAL_qty=ft[4];
MLOCAL_TIMESTAMP=ft[5];
}
else if ( ft[3] == MLOCAL && ft[4] == MREMOTE )
{
MLOCAL=ft[3];
MREMOTE=ft[4];
MOCAL_qty=ft[5];
MREMOTE_qty=ft[6];
MOCAL_TIMESTAMP=ft[7];
MREMOTE_TIMESTAMP=ft[8];
}
else if ( ft[3] == MREMOTE && ft[4] != MOCAL )
{
MREMOTE=ft[3];
MREMOTE_qty=ft[4];
MREMOTE_TIMESTAMP=ft[5];
..........................................
but it's not working either.
So, if you have any idea how to handle this, I would be grateful for a hint on finding a pattern that covers all the possible situations above.
EDIT
I don't know how to thank you for all this help. What I have to do is more complex than I wrote above; I'll try to describe it as simply as I can, otherwise I'll make you guys pretty confused.
My output should be like following:
NAME,UNIQUE_ID,VOLUME_ALOCATED,MLOCAL_VALUE,MLOCAL_TIMESTMP,MLOCAL_limit,LOCAL_VALUE,LOCAL_TIMESTAMP,LOCAL_limit,MREMOTE_VALUE,MREMOTE_TIMESTAMP,REMOTE_VALUE,REMOTE_TIMESTAMP
(where MLOCAL_limit and LOCAL_limit are a subtract result between VOLUME_ALOCATED and MLOCAL_VALUE or LOCAL_VALUE)
So, in my output file, fields position should be arranged like:
4th field =MLOCAL_VALUE,5th field =MLOCAL_TIMESTMP,7th field=LOCAL_VALUE,
8th field=LOCAL_TIMESTAMP,10th field=MREMOTE_VALUE,11th field=MREMOTE_TIMESTAMP,12th field=REMOTE_VALUE,13th field=REMOTE_TIMESTAMP
Now, an example would be this:
for the following input: name,ID,VOLUME_ALLOCATED,MLOCAL,MREMOTE,33222,56,22/10/2012,18/10/2012
name,ID,VOLUME_ALLOCATED,REMOTE,234455,19/12/2012
I should process this line and the output should be this:
name,ID,VOLUME_ALLOCATED,33222,22/10/2012,MLOCAL_LIMIT, ,,,56,18/10/2012,,
The 7th, 8th, 9th, 12th, and 13th fields are empty because there is no info related to LOCAL_VALUE, LOCAL_TIMESTAMP, LOCAL_limit, REMOTE_VALUE, and REMOTE_TIMESTAMP.
OR
name,ID,VOLUME_ALLOCATED,,,,,,,,,234455,9/12/2012
The 4th, 5th, 6th, 7th, 8th, 9th, 10th, and 11th fields should be empty because there is no info about MLOCAL_VALUE, MLOCAL_TIMESTAMP, MLOCAL_LIMIT, LOCAL_VALUE, LOCAL_TIMESTAMP, LOCAL_LIMIT, MREMOTE_VALUE, and MREMOTE_TIMESTAMP.
VOLUME_ALLOCATED is retrieved from another csv file (called "info.csv") based on the ID field, which is processed earlier in the script like this:
info.csv
VOLUME_ALLOCATED,ID,CLIENT
5242881,64,subscriber
567743,24,visitor
data.csv
NAME,64,MLOCAL,341993,23/10/2012
NAME,24,LOCAL$REMOTE,2347$4324,19/12/2012$18/12/2012
Now, my code is this:
#! /usr/bin/bash
input="info.csv"
filedata="data.csv"
outfile="out"
nawk 'BEGIN{
while (getline < "'"$input"'")
{
split($0,ft,",");
volume=ft[1];
id=ft[2];
client=ft[3];
key=id;
volumeArr[key]=volume;
clientArr[key]=client;
}
close("'"$input"'");
while (getline < "'"$filedata"'")
{
gsub(/\$/,","); # substitute the $ separator with comma
split($0,ft,",");
volume=volumeArr[id]; # Get the volume from the volumeArr, using "id" as key
segment=clientArr[id]; # Get the client mode from the clientArr, using "id" as key
NAME=ft[1];
id=ft[2];
here I'm stuck, I can't find the right way to set the rest of the
fields since I don't know how to handle the 3rd and 4th fields.
? =ft[3];
? =ft[4];
Sorry if I've made you confused, but this is my current situation right now.
Thanks
You didn't provide the expected output from your sample input but here's a start to show how to get the values for the 2 different formats of input line:
$ cat tst.awk
BEGIN{ FS=","; OFS="\t" }
{
    delete value    # or use split("",value) if your awk can't delete arrays
    if ($4 ~ /LOCAL|REMOTE/) {
        value[$3] = $5
        date[$3]  = $7
        value[$4] = $6
        date[$4]  = $8
    }
    else {
        value[$3] = $4
        date[$3]  = $5
    }
    print
    for (type in value) {
        printf "%15s%15s%15s\n", type, value[type], date[type]
    }
}
$ awk -f tst.awk file
xxxxxx,xx,MLOCAL,MREMOTE,33222,56,22/10/2012,18/10/2012
MREMOTE 56 18/10/2012
MLOCAL 33222 22/10/2012
xxxxxx,xx,MREMOTE,MLOCAL,33222,56,22/10/2012,18/10/2012
MREMOTE 33222 22/10/2012
MLOCAL 56 18/10/2012
xxxxxx,xx,MLOCAL,341993,22/10/2012
MLOCAL 341993 22/10/2012
xxxxxx,xx,MREMOTE,9356828,08/10/2012
MREMOTE 9356828 08/10/2012
xxxxxx,xx,LOCAL,REMOTE,19316,15253,22/10/2012,22/10/2012
REMOTE 15253 22/10/2012
LOCAL 19316 22/10/2012
xxxxxx,xx,REMOTE,LOCAL,1865871,383666,22/10/2012,22/10/2012
REMOTE 1865871 22/10/2012
LOCAL 383666 22/10/2012
xxxxxx,xx,REMOTE,1180306134,19/10/2012
REMOTE 1180306134 19/10/2012
and if you post the expected output we could help you more.