I have an Elasticsearch index which I am using to index a set of documents.
These documents are originally in CSV format, and I am looking to parse them using Logstash, since it has powerful regular expression tools such as grok.
My problem is that I have something along the following lines:
field1,field2,field3,number#number#number#number#number#number
In the last column I have key-value pairs of the form key#value, themselves separated by #, and there can be any number of them.
Is there a way for me to use Logstash to parse this and have it store the last column as the following JSON in Elasticsearch (or some other searchable format) so that I am able to search it?
[
{"key" : number, "value" : number},
{"key" : number, "value" : number},
...
]
First, you can use the csv filter to parse out the last column.
Then, you can use the ruby filter to write your own code to do what you need.
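For the first step, a csv filter along these lines could split the row into named columns (a minimal sketch; the column names, including "kv" for the last column, are assumptions rather than names from the question):
filter {
  csv {
    # the three leading columns plus the key#value#... column
    columns => ["field1", "field2", "field3", "kv"]
    separator => ","
  }
}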
input {
  stdin {
  }
}
filter {
  ruby {
    code => '
      # split the last column on "#" and walk it two items at a time: key, then value
      parts = event["message"].split("#")
      ary = []
      parts.each_slice(2) do |key, value|
        ary.push("{key : " + key + ", value : " + value + "}")
      end
      event["lastColumn"] = ary
    '
  }
}
output {
  stdout { debug => true }
}
With this filter, when I input
1#10#2#20
The output is
"message" => "1#10#2#20",
"#version" => "1",
"#timestamp" => "2014-03-25T01:53:56.338Z",
"lastColum" => [
[0] "{key : 1, value : 10}",
[1] "{key : 2, value : 20}"
]
FYI. Hope this can help you.
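If you would rather store the last column as real nested objects (so Elasticsearch can index key and value as separate fields, as in the JSON you sketched) instead of preformatted strings, the same loop can push hashes instead; a minimal variation on the filter above:
ruby {
  code => '
    parts = event["message"].split("#")
    ary = []
    parts.each_slice(2) do |key, value|
      ary.push({"key" => key, "value" => value})
    end
    event["lastColumn"] = ary
  '
}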
I have the following requirement.
The input is:
{ "packageConfiguration": [
{
"packageId": [
"AIM_PACKAGE"
],
"component": [
"Handbook"
],
"fieldName": [
"Upload Handbook Document"
],
"assetUrl": [
"sflydamlocation.handbookfilename.pdf"
]
}
]}
I need to convert the above JSON array into this output format:
{
"pakage": ""packageId":"AIM_PACKAGE", "component":"Handbook", "fieldName":"Upload Handbook Document","assetUrl":"sflydamlocation.handbookfilename.pdf""
}
You can do that by treating all fields as strings; however, note that:
The inner quotes must be escaped, otherwise the output is not valid JSON.
Take into account that the value of "package" is not really valid JSON either, in case you want to parse it. It should be an object (e.g. " { \"package\":... }").
This script expects each of the arrays to have exactly one element. Extra elements are ignored and fewer could cause an error. This is not a very robust design.
Script (not recommended):
%dw 2.0
output application/json
---
package: using (pc = payload.packageConfiguration[0]) (
" \"packageId\": \"$(pc.packageId[0])\", " ++
" \"component\": \"$(pc.component[0])\" " ++
" \"fieldName\": \"$(pc.fieldName[0])\" " ++
" \"assetUrl\": \"$(pc.assetUrl[0])\" "
)
Output:
{
"package": " \"packageId\": \"AIM_PACKAGE\", \"component\": \"Handbook\" \"fieldName\": \"Upload Handbook Document\" \"assetUrl\": \"sflydamlocation.handbookfilename.pdf\" "
}
This is an ugly string concatenation. Instead, I would suggest just writing the desired output as a JSON object.
Script (recommended):
%dw 2.0
output application/dw
var pc = payload.packageConfiguration[0]
---
package:
write({
packageId: pc.packageId[0],
component: pc.component[0],
fieldName: pc.fieldName[0],
assetUrl: pc.assetUrl[0]
}, "application/json") replace /\n/ with ""
Output:
{
"package": "{ \"packageId\": \"AIM_PACKAGE\", \"component\": \"Handbook\", \"fieldName\": \"Upload Handbook Document\", \"assetUrl\": \"sflydamlocation.handbookfilename.pdf\"}"
}
The second script is much cleaner and less error-prone, and it returns an escaped JSON string that you can parse back into an object if needed.
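If a later step does need the embedded JSON back as an object, DataWeave's read function should be able to parse the string again; a minimal sketch, assuming the output above is now the payload:
%dw 2.0
output application/json
---
read(payload.package, "application/json")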
Something like this should work, unless you require something more flexible. I'm assuming you're working with Mule 3 / DataWeave 1.0:
%dw 1.0
%output application/json
%var packageConfig = payload.packageConfiguration[0]
---
{
package: packageConfig mapObject ((value, key) -> {
(key): value[0]
})
}
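Note that this returns "package" as a nested object rather than as an escaped string; the result should look roughly like this:
{
  "package": {
    "packageId": "AIM_PACKAGE",
    "component": "Handbook",
    "fieldName": "Upload Handbook Document",
    "assetUrl": "sflydamlocation.handbookfilename.pdf"
  }
}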
I have the following data, in JSON format, in a column called details of a table named customer in Hive:
{
  "customer" : {
    "given_name" : "Anuvrat",
    "surname" : "Singh"
  },
  "order" : {
    "id" : "123dfe523gd"
  },
  "address" : {
    "city" : "kolkata",
    "pin" : "700091"
  },
  "phone" : {
    "mobile" : "*********"
  }
}
I have to remove the address and phone from the JSON data, so that the data looks like this:
{
  "customer" : {
    "given_name" : "Anuvrat",
    "surname" : "Singh"
  },
  "order" : {
    "id" : "123dfe523gd"
  }
}
How can I do this (i.e. update) for every row present in the table?
I tried the following command: hadoop fs -cat /home/customer/* | jq '.details[] |= del(.address,.phone)' but I didn't get the expected output; instead I got an error saying:
parse error: Invalid numeric literal at line 1, column 93
cat: Unable to write to output stream.
If you're open to a solution not using Hive, I want to remark that this is very easy to do with the jq command-line JSON parser.
Given your input file, you would do:
jq 'del(.address,.phone)' file
If you want to remove address and phone objects for all entries of the table, you can do:
jq '.[] |= del(.address,.phone)' file
This is the query which I ran to get the above result:
INSERT OVERWRITE TABLE customer
SELECT id,
       CASE WHEN id IS NOT NULL
            THEN concat('{"customer":', get_json_object(details, '$.customer'),
                        ',"order":', get_json_object(details, '$.order'), '}')
            ELSE details
       END AS details
FROM customer;
I have a log file which contains entries in the following format:
{ "start_time" : "12-May-2011", "name" : "this is first heading", "message" : "HELLO this is first message" }
{ "start_time" : "13-May-2011", "name" : "this is second heading", "message" : "HELLO this is second message" }
{ "start_time" : "14-May-2011", "name" : "this is third heading", "message" : "HELLO this is third message" }
...
I am new to Logstash. I currently have an app that writes these log entries as JSON strings, one below the other, into that file (say at /root/applog/scheduler.log).
I'm looking for some help on how to parse this JSON from the logs into separate fields and print them to stdout. What should the conf file look like?
Note: the idea is to later use this with Kibana for visualization.
Example config:
input {
  file {
    path => ["/root/applog/scheduler.log"]
    codec => "json"
    start_position => "beginning" # if your file already exists
  }
}
filter { } # add filters here (optional)
output {
  elasticsearch { } # pass the output to ES to prepare visualization with Kibana
  stdout { codec => "rubydebug" } # if you want to see the result in stdout
}
Logstash includes a json codec that will split your JSON into fields for you.
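With the json codec in place, each line of the log should come out of the rubydebug output looking roughly like this (the metadata fields and their values will vary with your Logstash version and host):
{
       "message" => "HELLO this is first message",
          "name" => "this is first heading",
    "start_time" => "12-May-2011",
      "@version" => "1",
    "@timestamp" => "2015-06-01T00:00:00.000Z",
          "host" => "your-host",
          "path" => "/root/applog/scheduler.log"
}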
My CSV file contains fields which are nil, like this:
{ "message" => [
[0] "m_FRA-LIENSs-R2012-1;\r"
],
"#version" => "1",
"#timestamp" => "2015-05-24T13:51:14.735Z",
"host" => "debian",
"SEXTANT_UUID" => "m_FRA-LIENSs-R2012-1",
"SEXTANT_ALTERNATE_TITLE" => nil
}
How can I remove all such messages and fields?
Here is my CSV file:
SEXTANT_UUID|SEXTANT_ALTERNATE_TITLE
a1afd680-543c | ZONE_ENJEU
4b80d9ad-e59d | ZICO
800d640f-1f82 |
I want to delete the last line. I used a ruby filter, but it doesn't work! It removes just the field, not the entire message.
If you configure your Ruby filter like this, it will work:
filter {
  # let ruby check all fields of the event and remove any empty ones
  ruby {
    code => "event.to_hash.delete_if {|field, value| value.blank? }"
  }
}
I used if ([message] =~ "^;") { drop { } } and it works for the CSV file.
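Put together, a pipeline that drops whole events with an empty second column might look roughly like this (a sketch; the separator, the column names, and the emptiness check are assumptions based on the CSV sample above):
filter {
  csv {
    separator => "|"
    columns => ["SEXTANT_UUID", "SEXTANT_ALTERNATE_TITLE"]
  }
  # drop the entire event when the second column is missing or empty
  if ![SEXTANT_ALTERNATE_TITLE] or [SEXTANT_ALTERNATE_TITLE] == "" {
    drop { }
  }
}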
I want to encode the result of a MySQL query into a JSON string using JSON::XS. The JSON string needs to look like this:
{
  "database" : "dbname",
  "retentionPolicy" : "mytest",
  "tags" : {
    "type" : "generate",
    "location" : "total",
    "source" : "ehz"
  },
  "points" : [{
      "precision" : "ms",
      "timestamp" : "ts1",
      "name" : "power",
      "values" : {
        "value" : "val1"
      }
    }, {
      "precision" : "ms",
      "timestamp" : "ts2",
      "name" : "power",
      "values" : {
        "value" : "val2"
      }
    }, {
      "precision" : "ms",
      "timestamp" : "ts3",
      "name" : "power",
      "values" : {
        "value" : "val3"
      }
    }
  ]
}
The points array with each point's values element is giving me immense headaches.
Here is the code block that generates the JSON:
my %json_body = (
    'database'        => $db_name,
    'retentionPolicy' => $retention,
    'tags'            => {
        'source'   => $metric_source,
        'type'     => $metric_type,
        'location' => $metric_location
    }
);
# loop through mysql result
while ( ($timestamp, $value) = $query->fetchrow_array() ) {
    my %json_point1 = (
        'name'      => $series_name,
        'timestamp' => ($timestamp * 1),
        'precision' => "ms"
    );
    %json_point2 = ('value' => $value);
    %json_values = (%json_point1, 'values' => \%json_point2);
    push(@all_values, \%json_values);
}
$query->finish();
# Encode json
my %json_data = (%json_body, "points" => \@all_values);
$influx_json = encode_json(\%json_data);
I think the line push(@all_values, \%json_values) is my problem. If I pass %json_data as a hash reference, only the last value from the while loop is retained. If I use %json_values directly, the encoded JSON is messed up because it loses the structure.
Any hint would be appreciated. And please bear with me: these array and hash references are already making my head explode.
I'm pretty sure your problem is that you're using globally scoped hashes for %json_point2 and %json_values.
You see, the root of this is that you don't simply get a list of hashes - you get a list of hash references.
So the problem here is that when you push a reference to your hash into @all_values, you're pushing the same reference each time, but then you're overwriting the contents of the hash that you're referencing.
Try this:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my %hash_thing;
my @all_values;
for ( 1..3 ) {
    %hash_thing = ( "test" => $_ );
    push ( @all_values, \%hash_thing );
}
print join ( "\n", @all_values );
print Dumper \@all_values;
And you'll see you have the same 'value' 3 times:
HASH(0x74478c)
HASH(0x74478c)
HASH(0x74478c)
And so if you dump it, then of course you don't get the right array, and so your encoded JSON doesn't work either.
$VAR1 = [
{
'test' => 3
},
$VAR1->[0],
$VAR1->[0]
];
The simplest fix is to use my to scope the hashes to the loop. (And turn on use strict; and use warnings if you haven't.)
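For example, moving the declaration inside the loop means a brand new hash is created on each iteration, so the three pushed references point at three different hashes:
for ( 1..3 ) {
    my %hash_thing = ( "test" => $_ );   # a fresh hash every time around the loop
    push ( @all_values, \%hash_thing );
}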
Alternatively, you can use a hash reference like this:
my @all_values;
my $hash_ref;
for ( 1..3 ) {
    $hash_ref = { "test" => $_ };
    push ( @all_values, $hash_ref );
}
print @all_values;
print Dumper \@all_values;
Because { "test" => $_ } constructs a new anonymous hash on each pass through the loop, $hash_ref holds a different reference every time, and push copies that reference value into the array, so each element ends up pointing at its own data.
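Applied to the loop from the question, the fix might look roughly like this (a sketch; the variable names are taken from the question and the surrounding code stays the same):
# loop through mysql result
while ( my ($timestamp, $value) = $query->fetchrow_array() ) {
    # 'my' inside the loop means each iteration builds its own hashes,
    # so every pushed reference points at different data
    my %json_point1 = (
        'name'      => $series_name,
        'timestamp' => ($timestamp * 1),
        'precision' => "ms"
    );
    my %json_point2 = ('value' => $value);
    my %json_values = (%json_point1, 'values' => \%json_point2);
    push(@all_values, \%json_values);
}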