How to parse IP within fields in ELK - csv

I am trying to automate/ease a procedure to review firewall rules within ELK (ElasticSearch, Logstash, Kibana).
I have some data obtained from a CSV, which is structured like this:
Source;Destination;Service;Action;Comment
10.0.0.0/8 172.16.0.0/16 192.168.0.0/24 23.2.20.6;10.0.0.1 10.0.0.2 10.0.0.3;udp:53 tcp:53;accept;No.10: ID: INC0000000001
My objective is to import this data into ELK, parsing each field (for subnet and/or IP address) and, if possible, adding a sequential field (IP_Source1, IP_Destination2, etc.) containing each one.
Is this possible, to your knowledge? How?
Thanks for any hint you may be able to provide

You can create a Logstash configuration with a file input. Then use a csv filter first. The CSV filter should look like this:
filter {
  csv {
    columns => ["source", "destination", "service", "action", "comment"]
    separator => ";"
  }
}
The next filter will need to be a ruby filter:
filter {
  ruby {
    code => "
      arr = event.get('source').split(' ')
      arr.each.with_index(1) do |a, index|
        event.set('ip_source' + index.to_s, a)
      end
    "
  }
}
The final part will be an output to Elasticsearch.
I have not tested this code, but I hope it gives you some good hints.
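For reference, a minimal end-to-end sketch along those lines (untested; the file path, Elasticsearch host, index name, and the second ruby block for the destination field are assumptions beyond the original answer):
input {
  file {
    path => "/path/to/firewall_rules.csv"   # assumed location of the CSV
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  csv {
    columns => ["source", "destination", "service", "action", "comment"]
    separator => ";"
  }
  ruby {
    code => "
      # split the space-separated subnets/IPs into numbered fields
      event.get('source').split(' ').each.with_index(1) do |ip, i|
        event.set('ip_source' + i.to_s, ip)
      end
      event.get('destination').split(' ').each.with_index(1) do |ip, i|
        event.set('ip_destination' + i.to_s, ip)
      end
    "
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]   # assumed Elasticsearch endpoint
    index => "firewall-rules"     # assumed index name
  }
}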

Related

How can I create an EMR cluster resource that uses spot instances without hardcoding the bid_price variable?

I'm using Terraform to create an AWS EMR cluster that uses spot instances as core instances.
I know I can use the bid_price variable within the core_instance_group block on an aws_emr_cluster resource, but I don't want to hardcode prices, as I'd have to change them manually every time the instance type changes.
Using the AWS Web UI, I'm able to choose the "Use on-demand as max price" option. That's exactly what I'm trying to reproduce, but in Terraform.
Right now I am trying to solve my problem using the aws_pricing_product data source. You can see what I have so far below:
data "aws_pricing_product" "m4_large_price" {
service_code = "AmazonEC2"
filters {
field = "instanceType"
value = "m4.large"
}
filters {
field = "operatingSystem"
value = "Linux"
}
filters {
field = "tenancy"
value = "Shared"
}
filters {
field = "usagetype"
value = "BoxUsage:m4.large"
}
filters {
field = "preInstalledSw"
value = "NA"
}
filters {
field = "location"
value = "US East (N. Virginia)"
}
}
data.aws_pricing_product.m4_large_price.result returns a JSON document containing the details of a single product (you can check the response of the example here). The actual on-demand price is buried somewhere inside this JSON, but I don't know how to get at it (I was inspecting the response with http://jsonviewer.stack.hu/).
I know I might be able to solve this by using an external data source and piping the output of an AWS CLI call into something like jq, e.g.:
aws pricing get-products --filters "Type=TERM_MATCH,Field=sku,Value=8VCNEHQMSCQS4P39" --format-version aws_v1 --service-code AmazonEC2 | jq [........]
But I'd like to know if there is any way to accomplish what I'm trying to do with pure Terraform. Thanks in advance!
Unfortunately the aws_pricing_product data source docs don't expand on how it should be used effectively, but the discussion in the pull request that added it adds some insight.
In Terraform 0.12 you should be able to use the jsondecode function to get at what you want; the following is given as an example in the linked pull request:
data "aws_pricing_product" "example" {
service_code = "AmazonRedshift"
filters = [
{
field = "instanceType"
value = "ds1.xlarge"
},
{
field = "location"
value = "US East (N. Virginia)"
},
]
}
# Potential Terraform 0.12 syntax - may change during implementation
# Also, not sure about the exact attribute reference architecture myself :)
output "example" {
values = jsondecode(data.json_query.example.value).terms.OnDemand.*.priceDimensions.*.pricePerUnit.USD
}
If you are stuck on Terraform <0.12 you might struggle to do this natively in Terraform other than the external data source approach you've already suggested.
@cfelipe: put that ${jsondecode(data.aws_pricing_product.m4_large_price.result).terms.OnDemand.*.priceDimensions.*.pricePerUnit.USD} in a locals block.
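Building on that, a hedged Terraform 0.12+ sketch of pulling the USD price out of the pricing JSON and feeding it to the cluster (the local names and the aws_emr_cluster block are illustrative assumptions; terms.OnDemand and priceDimensions are maps keyed by SKU/rate code, hence the values() calls):
locals {
  pricing   = jsondecode(data.aws_pricing_product.m4_large_price.result)
  # take the first (and usually only) entry of each nested map
  on_demand = values(local.pricing.terms.OnDemand)[0]
  price_usd = values(local.on_demand.priceDimensions)[0].pricePerUnit.USD
}

resource "aws_emr_cluster" "example" {
  # ... other required arguments omitted ...

  core_instance_group {
    instance_type  = "m4.large"
    instance_count = 2
    bid_price      = local.price_usd   # use the on-demand price as the spot max price
  }
}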

Is there a way to write U-SQL queries without using EXTRACT?

I have a Get Metadata activity output, which is a JSON list of the blobs in my container. I want to feed these names into my ForEach activity, where some U-SQL query is performed on each blob according to the file name. Is this possible?
You need to include either a SELECT or an EXTRACT. Since you are pulling from files, you are going to want to use EXTRACT.
If I understand your question correctly, you want to run different U-SQL scripts based on the file name.
There are a couple of ways to do this:
1) Use If Condition activities in Data Factory to call different U-SQL scripts based on the file name. Nesting the if statements will allow you to have more than two options. There are several string manipulation functions to help you with this; for example, one path could check @contains(item().name, 'a').
{
  "name": "<Name of the activity>",
  "type": "IfCondition",
  "typeProperties": {
    "expression": {
      "value": "@equals(item().name, '<file name>')",
      "type": "Expression"
    },
    "ifTrueActivities": [
      { "<U-SQL script 1>" }
    ],
    "ifFalseActivities": [
      { "<U-SQL script 2>" }
    ]
  }
}
2) The second option is to use a single U-SQL script and do the split from there. Again, string manipulation functions can help via pattern matching. There is some advantage to this as far as organization goes as you can store the unique scripts in stored procedures and the U-SQL script would simply check the file name passed in and call the relevant stored proc.
//This would be added by Data Factory
DECLARE @fileName string = "/Samples/Data/SearchLog.tsv";

IF @fileName == "/Samples/Data/SearchLog.tsv"
THEN
    @searchlog =
        EXTRACT UserId int,
                Start DateTime,
                Region string,
                Query string,
                Duration int?,
                Urls string,
                ClickedUrls string
        FROM "/Samples/Data/SearchLog.tsv"
        USING Extractors.Tsv();

    OUTPUT @searchlog
    TO @fileName
    USING Outputters.Csv();
ELSE
    @searchlog =
        EXTRACT UserId int,
                Start DateTime,
                Region string,
                Query string,
                Duration int?,
                Urls string,
                ClickedUrls string
        FROM @fileName
        USING Extractors.Tsv();

    OUTPUT @searchlog
    TO "/output/SearchLogResult1.csv"
    USING Outputters.Csv();
END;
Something to think about is that Data Lake Analytics is going to be more efficient if you can combine multiple files into one statement. You can have multiple EXTRACT and OUTPUT statements. I would encourage you to explore whether or not you could use pattern matching in your EXTRACT statements to split the U-SQL processing without needing the foreach loop in data factory.
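For instance, a hedged sketch of that kind of pattern matching with a U-SQL file set (the {fileName} path pattern and the virtual fileName column are illustrative assumptions, not from the original answer):
@searchlog =
    EXTRACT UserId int,
            Start DateTime,
            Region string,
            Query string,
            Duration int?,
            Urls string,
            ClickedUrls string,
            fileName string   // virtual column filled in from the {fileName} pattern below
    FROM "/Samples/Data/{fileName}.tsv"
    USING Extractors.Tsv();

// Branch on the file name inside the script instead of in the Data Factory foreach
@subset =
    SELECT *
    FROM @searchlog
    WHERE fileName.StartsWith("SearchLog");

OUTPUT @subset
TO "/output/CombinedResult.csv"
USING Outputters.Csv();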

Firebase Database Search Query

I am trying to search my database using a string, such as "A". I was just watching this Firebase tutorial Common SQL Queries converted for the Firebase Database - The Firebase Database For SQL Developers #4 and it explains that, in order to search the database for a string (in a certain location), you must use:
firebase.database().ref.child("child_name_here")
.queryOrdered(byChild: "child_name_here")
.queryStarting(atValue: "value_here_uppercase")
.queryEnding(atValue: "value_here_uppercase\\uf8ff")
You must use two \\ in the ending value as an escape character in order to get one \.
When I try this with my Firebase database, it does not work. Here is my database:
{
  "Schools": {
    "randomUID": {
      "location" : "anyTown, anyState",
      "name" : "anyName"
    }
  }
}
Here is my query:
databaseReference.child("Schools")
.queryOrdered(byChild: "name")
.queryStarting(atValue: "A")
.queryEnding(atValue: "A\\uf8ff") ...
When I go to print the snapshot from Firebase, I get nothing back.
If I get rid of the ending .queryEnding(atValue: "A\\uf8ff"), the database returns all of the schools in the Schools node.
How can I search the Firebase database using a String?
queryStarting() and queryEnding() can be used for numbers; for example, you can get objects with someField varying from 3 to 10.
For searching a string, you can match a whole string using queryEqualToValue().
This shows all customers whose name starts with Wick. (It's not Swift, but it may give you an idea.)
// sample
let query = 'Wick'
clientsRef.orderByChild('name')
.startAt(query)
.endAt(query + '\uf8ff')
.once('value', (snapshot) => {
....
})
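For completeness, here is a rough Swift equivalent for the question's schema (untested; note that the upper bound uses the actual Unicode escape \u{f8ff}, not the literal two-character string "\\uf8ff"):
let prefix = "A"
databaseReference.child("Schools")
    .queryOrdered(byChild: "name")
    .queryStarting(atValue: prefix)
    .queryEnding(atValue: prefix + "\u{f8ff}")   // \u{f8ff} is a very high code point, so this caps the prefix range
    .observeSingleEvent(of: .value) { snapshot in
        // each child is one school whose name starts with the prefix
        for case let child as DataSnapshot in snapshot.children {
            print(child.key, child.value ?? "")
        }
    }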

Logstash - Substring from CSV column

I want to import a lot of information from a CSV file into Elasticsearch.
My issue is that I don't know how to use an equivalent of substring to select part of a CSV column.
In my case I have a date field (YYYYMMDD) and I want to have (YYYY-MM-DD).
I use filter, mutate, gsub like:
filter
{
  mutate
  {
    gsub => ["date", "[0123456789][0123456789][0123456789][0123456789][0123456789][0123456789][0123456789][0123456789]", "[0123456789][0123456789][0123456789][0123456789]-[0123456789][0123456789]-[0123456789][0123456789]"]
  }
}
But my result is wrong.
I can identify my string, but I don't know how to extract part of it.
My target is to have something like:
gsub => ["date", "[0123456789][0123456789][0123456789][0123456789][0123456789][0123456789][0123456789][0123456789]", "%{date}(0..3)-%{date}(4..5)-%{date}(6..7)"]
%{date}(0..3) : select from the first to the fourth character of the CSV column date
You can use the ruby plugin to do the conversion. As you say, you will have a date field, so we can use it directly in ruby:
filter {
  ruby {
    code => "
      date = Time.strptime(event.get('date'), '%Y%m%d')
      event.set('date_new', date.strftime('%Y-%m-%d'))
    "
  }
}
The date_new field is in the format you want.
First, you can use a regexp range to match a sequence, so rather than [0123456789], you can do [0-9]. If you know there will be 4 numbers, you can do [0-9]{4}.
Second, you want to "capture" parts of your input string and reorder them in the output. For that, you need capture groups:
([0-9]{4})([0-9]{2})([0-9]{2})
where parens define the groups. Then you can reference those on the right side of your gsub:
\1-\2-\3
\1 is the first capture group, etc.
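Putting that together, a minimal sketch of the gsub (untested, using the date field from the question):
filter {
  mutate {
    # rewrite e.g. 20190131 as 2019-01-31 using the three capture groups
    gsub => ["date", "([0-9]{4})([0-9]{2})([0-9]{2})", "\1-\2-\3"]
  }
}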
You might also consider getting these three fields when you do the grok{}, and then putting them together again later (perhaps with add_field).
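A hedged sketch of that grok/add_field variant (the field names year, month, day, and date_new are illustrative):
filter {
  grok {
    # named captures pull the three parts out of the YYYYMMDD value
    match => { "date" => "(?<year>[0-9]{4})(?<month>[0-9]{2})(?<day>[0-9]{2})" }
  }
  mutate {
    add_field => { "date_new" => "%{year}-%{month}-%{day}" }
  }
}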

Gathering information from a textfile. Corona SDK

Earlier I asked about gathering information from API-Link, and I have managed to get out most of the details by using the answer I got.
Now my problem is with another API that returns more information.
This time the file will contain this information:
{
  "username":"UserName",
  "confirmed_rewards":"0",
  "round_estimate":"0.00000000",
  "total_hashrate":"0.000",
  "payout_history":"0",
  "round_shares":"0",
  "workers":{
    "UserName.1":{
      "alive":"0",
      "hashrate":"0.000"
    },
    "UserName.2":{
      "alive":"0",
      "hashrate":"0.000"
    },
    "UserName.3":{
      "alive":"1",
      "hashrate":"1517.540",
      "last_share_timestamp":1369598007
    },
    "UserName.4":{
      "alive":"0",
      "hashrate":"0.000"
    }
  }
}
And I want to gather each of the workers and print them out. This "workers" object could contain multiple entries, but they always start with "UserName.x", where the username comes from the "username" parameter each time.
The numbers will always vary from 0 and up.
I want to gather the information in the same way, by accessing the document, decoding it, and printing out all the workers, however many there are.
Using the script provided in my last question (see the link at the start), I was thinking it would be something like
local t = json.decode( txt )
print("Workers: ".. t["workers.UserName.1"])
But this was not the way.
Since the username changes all the time, I was also thinking of something like
print("Workers: ".. t["workers" .. "." .. "username" .. "." .. "1"])
From here I have no clue how I should gather the information, given that the names and numbers vary.
Thanks in advance
Here is the perfect solution:
local json = require "json"

-- jsonFile is the file-reading helper from the Corona JSON tutorial linked below
local t = json.decode( jsonFile( "data.json" ) )
local workers = t.workers

for name, user in pairs(workers) do
    print("--------------------")
    print(name)
    for tag, value in pairs(user) do
        print(tag, value)
    end
end
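And if you need one specific worker rather than the whole loop, the lookup would be something like this (a sketch; the ".3" suffix is just an example):
-- build the key from the username field, then index the workers table
local worker = t.workers[t.username .. ".3"]
if worker then
    print("alive:", worker.alive, "hashrate:", worker.hashrate)
end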
Here is some more info:
http://www.coronalabs.com/blog/2011/08/03/tutorial-exploring-json-usage-in-corona/
http://www.coronalabs.com/blog/2011/06/21/understanding-lua-tables-in-corona-sdk/
http://lua-users.org/wiki/TablesTutorial