libpcap/tcpdump: Why does an empty filter capture less than a "tcp" filter?

I was looking into libpcap and filter behaviour and found something strange:
Why does an empty filter capture fewer packets than a "tcp" filter?

Related

Handling arbitrary JSON logs in ELK stack

I am trying to set up a full ELK stack for managing logs from our Kubernetes clusters. Our applications log either plain text or JSON objects. I want to be able to handle searching in the text logs, and also be able to index and search the fields in the JSON.
I have Filebeat running on each Kubernetes node, picking up the Docker logs and enriching them with various Kubernetes fields plus a few fields we use internally. The complete filebeat.yml is:
filebeat.inputs:
- type: container
  paths:
    - /var/log/containers/*.log
  processors:
    - add_kubernetes_metadata:
        host: ${NODE_NAME}
        matchers:
          - logs_path:
              logs_path: "/var/log/containers/"
  fields:
    kubernetes.cluster: <name of the cluster>
    environment: <environment of the cluster>
    datacenter: <datacenter the cluster is running in>
  fields_under_root: true
output.logstash:
  hosts: ["logstash-logstash-headless:5044"]
Filebeat ships the resulting logs to a central Logstash instance I have installed. In Logstash I attempt to parse the message field into a new field called message_parsed. The complete pipeline looks like this:
input {
  beats {
    port => 5044
    type => "beats"
    tags => ["beats"]
  }
}
filter {
  json {
    source => "message"
    target => "message_parsed"
    skip_on_invalid_json => true
  }
}
output {
  elasticsearch {
    hosts => [
      "elasticsearch-logging-ingest-headless:9200"
    ]
  }
}
I then have an Elasticsearch cluster installed which receives the logs. I have separate data, ingest and master nodes. Apart from some CPU and memory configuration, the cluster uses completely default settings.
The trouble I'm having is that I do not control the contents of the JSON messages. They could have any field of any type, and we have many cases where the same field exists but its values are of differing types. One simple example is the field level, which is usually a string carrying the values "debug", "info", "warn" or "error", but we also run some software that outputs this level as a numeric value. Other cases include error fields that are sometimes objects and other times strings, and date fields that are sometimes Unix timestamps and sometimes human-readable dates.
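For illustration, two hypothetical log lines like these end up assigning conflicting types to the same field:
{ "level": "info", "message": "user logged in" }
{ "level": 30, "message": "user logged in" }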
This of course makes Elasticsearch complain with a mapper_parsing_exception. Here's an example of one such error:
[2021-04-07T15:57:31,200][WARN ][logstash.outputs.elasticsearch][main][19f6c57d0cbe928f269b66714ce77f539d021549b68dc20d8d3668bafe0acd21] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"logstash", :routing=>nil, :_type=>"_doc"}, #<LogStash::Event:0x1211193c>], :response=>{"index"=>{"_index"=>"logstash-2021.04.06-000014", "_type"=>"_doc", "_id"=>"L80NrXgBRfSv8axlknaU", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"object mapping for [message_parsed.error] tried to parse field [error] as object, but found a concrete value"}}}}
Is there any way I can make Elasticsearch handle that case?

Filtering with regex vs json

When filtering logs, Logstash may use grok to parse the received log file (let's say it is Nginx logs). Parsing with grok requires you to properly set the field type - e.g., %{HTTPDATE:timestamp}.
However, if Nginx starts logging in JSON format then Logstash does very little processing. It simply creates the index and outputs to Elasticsearch. This leads me to believe that only Elasticsearch benefits from the "way" it receives the index.
Is there any advantage for Elasticsearch in having index data that was processed with regex vs. JSON? E.g., does it impact query time?
For Elasticsearch it doesn't matter how you parse the messages; it has no information about that. You only need to send a JSON document with the fields that you want to store and search on, according to your index mapping.
However, how you parse the message matters for Logstash, since it directly impacts performance.
For example, consider the following message:
2020-04-17 08:10:50,123 [26] INFO ApplicationName - LogMessage From The Application
If you want to be able to search and apply filters on each part of this message, you will need to parse it into fields.
timestamp: 2020-04-17 08:10:50,123
thread: 26
loglevel: INFO
application: ApplicationName
logmessage: LogMessage From The Application
To parse this message you can use different filters. One of them is grok, which uses regex, but if your message always has the same format you can use another filter, like dissect. Both will achieve the same thing, but while grok uses regex to match the fields, dissect is purely positional; this makes a huge difference in CPU use when you have a high number of events per second.
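As a rough sketch (the patterns are illustrative and not tested against your exact format), the two approaches could look like this:

# grok: regex-based field extraction
filter {
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} \[%{NUMBER:thread}\] %{LOGLEVEL:loglevel} %{WORD:application} - %{GREEDYDATA:logmessage}" }
  }
}

# dissect: purely positional, no regex involved
filter {
  dissect {
    mapping => { "message" => "%{timestamp} [%{thread}] %{loglevel} %{application} - %{logmessage}" }
  }
}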
Consider now that you have the same message, but in a JSON format.
{ "timestamp":"2020-04-17 08:10:50,123", "thread":26, "loglevel":"INFO", "application":"ApplicationName","logmessage":"LogMessage From The Application" }
It is easier and faster for Logstash to parse this message; you can do it in your input using the json codec, or you can use the json filter in your filter block.
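For instance, a minimal sketch of the json filter variant could be as simple as:

filter {
  # parse the JSON string in the "message" field into top-level fields
  json {
    source => "message"
  }
}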
If you have control over how your log messages are created, choose something that lets you avoid grok.

Can I return document data after an FTS match?

Suppose I have this data:
{
  "test": "Testing1234",
  "false": "Falsify"
}
And then using curl, I write this query:
{"explain": true, "fields": [ "*" ], "highlight": {}, "query": { "query": "Testing"}}
I get a response from Couchbase. This includes the document id, as well as a locations object that returns details about where my query matched text in the document, including the parent object. All useful information.
However, I do not receive any additional context. For instance, say I have 100 documents with "test": "TestingXXXX", where XXXX is a random string. My search will not provide me with XXXX, nor does it provide any way to read additional fields in the same object (for instance, if I wanted to fetch the "false" property). I will simply get 100 different document IDs to query. So it is technically enough information to obtain everything I need, but it results in me making 100 different requests based on parsed info from the original response.
Is there any way to return context with FTS matches when using the REST API, without simply querying every document that is matched?
You can get the complete objects by issuing the FTS query from within N1QL using the CURL() function, and then joining that up with the objects themselves.
https://developer.couchbase.com/documentation/server/current/n1ql/n1ql-language-reference/curl.html
Your query would have roughly this form:
SELECT *
FROM yourTable
USE KEYS CURL(ftsURL, ftsQuery, ...)
You'll need to wrap the CURL function in some transformation functions to turn the FTS result into an array of ids.
I realize this is quite schematic, since I don't have a full example handy. But work up through these steps:
Issue the FTS query through CURL() in N1QL.
Transform the FTS results into an array of ids.
Embed the request for the array of ids into a SELECT query using USE KEYS.
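Put together, a sketch of such a query could look roughly like the following; the FTS index name, port and credentials are placeholders you would need to adapt:

SELECT t.*
FROM yourTable t
USE KEYS (
  ARRAY hit.id FOR hit IN
    CURL("http://localhost:8094/api/index/yourFtsIndex/query",
         {"request": "POST",
          "header": "Content-Type: application/json",
          "user": "Administrator:password",
          "data": '{"query": {"query": "Testing"}}'}).hits
  END
);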
I figured it out. It's not an issue with the query; the fields were not being indexed. To fix it, I changed the index setting "Store Dynamic Fields" to "True". That said, the highlighting did return a lot of extra details, and I'm sure it also increases query times quite a bit. The Couchbase documentation seemed to imply it is only meant for debugging, so I would like to leave this open in case anyone has further suggestions.

Filter by attribute value in Orion Context Broker does not work

I do not understand why, but in some cases the filter does not work.
Below is my example:
/v2/entities?type=carparks&q=name==Parking+Tina+Balice+Krakow&options=keyValues
returns:
[
  {
    "id": "15217701",
    "type": "carparks",
    "agglomerations": "1",
    "name": "Parking Tina Balice Krakow"
  }
]
The above example works correctly, but the second query does not:
/v2/entities?type=carparks&q=agglomerations==1
This query returns an empty result.
How can I filter by this condition:
type = carparks and agglomerations==1
for this object?
Orion version: 1.2.0
Whitespace in the URL query needs to be correctly encoded, either with + or %20. Have a look at this document.
Thus, try it this way:
/v2/entities?type=carparks&q=name==Parking+Tina+Balice+Krakow&options=keyValues
or this other way:
/v2/entities?type=carparks&q=name==Parking%20Tina%20Balice%20Krakow&options=keyValues
EDIT: regarding
/v2/entities?type=carparks&q=agglomerations==1
Note that agglomerations is a string, while by default the equal filter searches for numbers (when the value to search for is a number, of course). Thus, you have two alternatives:
Force the value to be interpreted as a string, using single quotes:
/v2/entities?type=carparks&q=agglomerations=='1'
Create/update the entity using a numeric value for agglomerations. This is probably the option that makes more sense, as I understand the agglomerations semantics are numeric in nature.
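For example, a request along these lines could update the existing entity to a numeric value (entity id taken from your example; adapt to your setup):

PUT /v2/entities/15217701/attrs/agglomerations?type=carparks
Content-Type: application/json

{
  "value": 1,
  "type": "Number"
}

After that, the original q=agglomerations==1 filter should match the entity.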

FOSRestBundle: How to configure a fallback format

My goal is to have a default JSON output when the user agent asks for anything other than json or xml. I have configured FOSRestBundle 1.1.0 as follows:
fos_rest:
    format_listener: true
    param_fetcher_listener: true
    view:
        default_engine: php
        formats:
            json: true
            xml: true
        templating_formats:
            html: false
        view_response_listener: force
    routing_loader:
        default_format: json
This works. Now I added the following configuration:
format_listener:
    rules:
        - { fallback_format: json, prefer_extension: false, priorities: ['xml', 'json'] }
As soon as I do that, I can no longer switch between formats by appending either ?_format=json or ?_format=xml, and the Accept header also seems to be ignored; it always uses whatever I specify in the Accept header.
How do I configure FOSRestBundle so that it accepts json or xml via parameter/HTTP Accept header, and falls back to json if the format accepted by the browser is HTML?
According to the FOSRestBundle Docs:
Note that if _format is matched inside the route, then a virtual Accept header setting is added with a q setting one lower than the lowest Accept header, meaning that format is checked for a match in the priorities last. If prefer_extension is set to true then the virtual Accept header will be one higher than the highest q causing the extension to be checked first. Setting priorities to a non empty array enables Accept header negotiations.
Also, I noticed that in the rules section you are missing the path option, so the application doesn't know which paths to apply the rules to.
Take a look at the docs linked above; they have an example.
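For reference, a rule including path could look roughly like this (the ^/ path and the priorities are illustrative; adapt them to your routes):

fos_rest:
    format_listener:
        rules:
            - { path: '^/', priorities: ['json', 'xml'], fallback_format: json, prefer_extension: false }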