Question on different multiline.pattern examples

I am new to Filebeat and to multiline.pattern configuration as a whole.
I was reading up on multiline.pattern examples and came across this
multiline.pattern examples page,
where the example used was multiline.pattern: '^[[:space:]]'. But let's say each line after the initial line began with a symbol like { or " instead of whitespace. How would I write that?
multiline.pattern: '^{' or multiline.pattern: '^[[{]]' or something else entirely?
If I want to combine both options, would it be multiline.pattern: '^{|^"' instead?
Sorry if this sounds like a dumb and simple question, but I was not able to find any relevant or similar queries.

This is just an ordinary regex pattern.
If you have set:
multiline.negate: false
multiline.match: after
then Filebeat tests each line against the pattern and appends it to the previous line if it matches.
For your case, where a continuation line starts with ", { or whitespace, the pattern would be:
^([[:space:]]|\{|")
You can refer to https://www.elastic.co/guide/en/beats/filebeat/current/multiline-examples.html for more detailed examples.
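The effect of negate: false with match: after can be sketched outside Filebeat. The following is only a rough Python imitation, not Filebeat's implementation: the function name join_multiline and the sample lines are made up for illustration, and \s stands in for [[:space:]] because Python's re module has no POSIX character classes.

```python
import re

# Assumption: \s replaces [[:space:]], which Python's re module does not support.
CONTINUATION = re.compile(r'^(\s|\{|")')


def join_multiline(lines):
    """Append every line that matches the pattern to the previous event."""
    events = []
    for line in lines:
        if events and CONTINUATION.search(line):
            # Matching line: treat it as a continuation of the previous event.
            events[-1] += "\n" + line
        else:
            # Non-matching line (or the very first line): start a new event.
            events.append(line)
    return events


# Hypothetical log lines, invented for this sketch.
sample = [
    "2017-05-23 ERROR something failed",
    '{ "detail": 1 }',
    '"quoted continuation"',
    "  indented continuation",
    "2017-05-23 INFO next event",
]
events = join_multiline(sample)
```

Here the three lines starting with {, " and whitespace are folded into the ERROR event, and the INFO line starts a new one.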


How to modify a regular expression so that it extracts the same field from both log formats?

When looking through the logs in one file I found two types of log lines (one with whitespace around the colon and one without).
I would now like to extract doSomething and doAnotherThing from these logs with a single regular expression.
Logfile 1:
"taskType":"doSomething"
Logfile 2:
"taskType" : "doAnotherThing"
I wrote this regular expression: taskType.....(?<taskType1>\w+)
It works well for Logfile 1 but not for Logfile 2, because it cuts off the first two characters of the word. Is there a way to fix this?
Thanks!
"taskType"\s?:\s?"(doSomething|doAnotherThing)" works for me, try it here https://regex101.com/r/Mx1RtT/1

REGEXP MySQL - several Groups no matter which order

I am trying to create a regex in MySQL, but after 4 hours of reading other examples and trial and error, I hope someone can help me fix my regex and get it to work.
What I need is:
to match several strings in a text separated by "###", no matter in which position or order they appear inside "###TEXT###TEXT###".
My regex so far works, but only if the strings appear in exactly the same order as in the regex. I need it to find STRING1 and STRING2 between two ### delimiters, no matter in which position.
###([^#]*)(9034==1-wellig)([^#]*)(9037==DIN C4)([^#]*)###
My Text:
###9021==220|9034==1-wellig|9023==356|9024==230|9037==DIN C4###9021==220|9034==2-wellig|9023==356|9037==DIN C4|9024==230###9021==220|9034==1-wellig|9023==356|9037==DIN C4|9024==230###
When I modify my text so that "9037==DIN C4" comes before "9034==1-wellig", it does not work:
###9021==220|9037==DIN C4|9034==1-wellig|9023==356|9024==230###9021==220|9034==2-wellig|9023==356|9037==DIN C4|9024==230###9021==220|9037==DIN C4|9034==1-wellig|9023==356|9024==230###
Example: https://regex101.com/r/amal7c/1
I hope I explained my problem clearly. I'm sure it's only one small change, but I cannot get it to work...
Best Regards
Tom
You cannot use lookarounds in MySQL regex (at least not before MySQL 8.0, which switched to the ICU regex library), so you need to use alternation:
WHERE col REGEXP '###[^#]*(9034==1-wellig[^#]*9037==DIN C4|9037==DIN C4[^#]*9034==1-wellig)[^#]*###'
See this regex demo.
Details:
### - literal ###
[^#]* - zero or more chars other than #
(9034==1-wellig[^#]*9037==DIN C4|9037==DIN C4[^#]*9034==1-wellig) - a capturing group with two alternatives:
    9034==1-wellig[^#]*9037==DIN C4 - 9034==1-wellig, zero or more chars other than #, and 9037==DIN C4
    | - or
    9037==DIN C4[^#]*9034==1-wellig - 9037==DIN C4, zero or more chars other than #, and 9034==1-wellig
[^#]*### - zero or more chars other than #, followed by ###.
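The alternation pattern can be checked against the sample texts from the question outside MySQL. This is only a sketch using Python's re module, whose syntax agrees with MySQL for the constructs used here; the helper name row_matches and the single-record negative example are made up for illustration.

```python
import re

# Same alternation as in the WHERE clause above: both orders are spelled out.
PATTERN = re.compile(
    r"###[^#]*(9034==1-wellig[^#]*9037==DIN C4|9037==DIN C4[^#]*9034==1-wellig)[^#]*###"
)

# First record of each sample text from the question, plus the "2-wellig" record.
original = (
    "###9021==220|9034==1-wellig|9023==356|9024==230|9037==DIN C4"
    "###9021==220|9034==2-wellig|9023==356|9037==DIN C4|9024==230###"
)
reordered = (
    "###9021==220|9037==DIN C4|9034==1-wellig|9023==356|9024==230"
    "###9021==220|9034==2-wellig|9023==356|9037==DIN C4|9024==230###"
)
# Hypothetical row that only contains the non-matching "2-wellig" record.
two_wellig_only = "###9021==220|9034==2-wellig|9023==356|9037==DIN C4|9024==230###"


def row_matches(text):
    """Mimic WHERE col REGEXP ...: true if any record contains both strings."""
    return PATTERN.search(text) is not None
```

With these inputs, row_matches is true for both orderings and false for the row that only contains 9034==2-wellig.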
OK, I think I just found a solution myself. It works on regex101 but not in MySQL :/
###(?=[^#]*(9034==1-wellig))(?=[^#]*(9037==DIN C4))[^#]*###
If any of you experts can improve my regex, I would also be very thankful :)
https://regex101.com/r/amal7c/2
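For comparison, the lookahead version does behave as intended in an engine that supports lookarounds. A minimal sketch in Python (the sample rows are shortened from the question, and the helper name is hypothetical):

```python
import re

# Two lookaheads, each scanning only up to the next '#', so order does not matter.
LOOKAHEAD = re.compile(
    r"###(?=[^#]*(9034==1-wellig))(?=[^#]*(9037==DIN C4))[^#]*###"
)

in_order = "###9021==220|9034==1-wellig|9023==356|9024==230|9037==DIN C4###"
swapped = "###9021==220|9037==DIN C4|9034==1-wellig|9023==356|9024==230###"
wrong_value = "###9021==220|9034==2-wellig|9023==356|9037==DIN C4|9024==230###"


def has_both(text):
    """True if some ###...### record contains both strings in any order."""
    return LOOKAHEAD.search(text) is not None
```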

I need to remove a piece of every line in my json file

I have JSON output open in Notepad and I know it is not in the correct format. At the end of each line there is a timestamp which is causing the bad format. I want to get rid of it using find and replace since the file is pretty big. The format is as follows:
"eventtimestamp": "05 23 2017 04:01:02"}
The above piece appears at the end of every line. How can I get rid of it using find and replace or any other way?
All help is appreciated.
Thank you
If you need to alter every line in a consistent way then regex find/replace is a good option. Free tools like atom.io, Notepad++, and plenty of others offer this feature.
Assuming "eventtimestamp" is constant, a simple regex that says: find everything starting with "eventtimestamp" and up to, but not including, a } will work.
"eventtimestamp".*(?=})
And "replace" that with an empty string.
P.S. Here's a demo of the regex on regexr.com; hovering over the parts of the pattern will explain what they do.
If you are not sure that the eventtimestamp field always comes at the end of a line and/or as the last element of the object, prefer this kind of pattern: "eventtimestamp":\s*"[^"]+",?.
Note the useful pattern "[^"]+", a negated character class surrounded by its delimiter, which can be adapted to any other delimiter.
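Both patterns can be tried on a sample line before running them on the whole file. The sketch below uses Python's re.sub as a stand-in for an editor's find/replace; the sample lines and the "user" field are invented for illustration.

```python
import json
import re

# Pattern from the first answer: everything from the key up to, not including, '}'.
END_OF_LINE = re.compile(r'"eventtimestamp".*(?=})')
# Pattern from the second answer: the key, its quoted value and an optional comma.
ANYWHERE = re.compile(r'"eventtimestamp":\s*"[^"]+",?')

# Hypothetical lines; the real file's other fields are unknown.
field_last = '{"user": "a", "eventtimestamp": "05 23 2017 04:01:02"}'
field_first = '{"eventtimestamp": "05 23 2017 04:01:02", "user": "a"}'

stripped_last = END_OF_LINE.sub("", field_last)
stripped_first = ANYWHERE.sub("", field_first)

# When the field is not last, the second pattern leaves valid JSON behind.
parsed_first = json.loads(stripped_first)
```

Note that when the field is the last one and is preceded by a comma, as in field_last, that comma is left in place by either pattern and would need separate handling.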

word2vec : find words similar in a case insensitive manner

I have access to word vectors on a text corpus of interest. The issue I am faced with is that these vectors are case sensitive: for example, "Him" is different from "him", which is different from "HIM".
I would like to find the words most similar to the word "Him" in a case-insensitive manner. I use the distance.c program that comes bundled with the Google word2vec package. Here is where I am faced with an issue.
Should I pass "Him him HIM" as arguments to the distance.c executable? This would return the set of words closest to the 3 words.
Or should I run the distance.c program separately with each of the 3 arguments ("Him" and "him" and "HIM"), and then put together these lists in a sensible way to arrive at the most similar words? Please suggest.
If you want to find similar words in a case-insensitive manner, you should convert all your word vectors to lowercase or uppercase, and then run the compiled version of distance.c.
This is fairly easy to do using standard shell tools.
For example, if your original data is in a file called input.txt, the following will work on most Unix-like shells.
tr '[:upper:]' '[:lower:]' < input.txt > output.txt
You can transform the binary format to text, then manipulate as you see fit.
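If the data is already available as text, the same lowercasing step can be sketched in Python instead of tr. This is only an illustration: the function name and sample lines are hypothetical, and it does not touch the binary vector format.

```python
def lowercase_lines(lines):
    """Return the given text lines with every character lowercased."""
    return [line.lower() for line in lines]


# Hypothetical sample text standing in for the contents of input.txt.
sample = ["Him and HIM", "him again"]
lowered = lowercase_lines(sample)
```

After this step "Him", "him" and "HIM" are all the same token.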

Find and Replace with Notepad++

I have a document that was converted from PDF to HTML for use on a company website, to be referenced and indexed for search. I'm trying to format the converted document to meet my needs, and in doing so I am trying to clean up some of the junk that was carried over from the PDF, such as page numbers, headers, and footers. Luckily, all of the lines that need to be removed are in blocks of 4 lines. Unfortunately, they are not exactly the same, so they cannot be removed with a simple literal replace. The lines contain numbers which are incremental, as they correlate with the pages. How can I remove the following example from my HTML file?
Title<br>
10<br>
<hr>
<A name=11></a>Footer<br>
I've tried many different regular expressions, but as my skill in that area is limited I can't find the proper syntax. I'm sure I'm missing something fairly easy, as it would seem all I need is a wildcard replace for the two numbers in the code, and the rest is literal.
Any help is appreciated.
The search and replace in Notepad++ is quite odd. I can't find newline characters with a regular expression, although the documentation says:
As of v4.9 the Simple find/replace (control+h) has changed, allowing the use of \r \n and \t in regex mode and the extended mode.
I updated to the latest version, but it just doesn't work. Using the extended mode allows me to find newlines, but then I can't specify wildcards.
However, you can use macros to overcome this problem:
1. Prepare a search that will find a unique passage (like Title<br>\r\n; here you can use the extended mode).
2. Start recording a macro.
3. Press F3 to use your search.
4. Mark the four lines and delete them.
5. Stop recording the macro. Done!
Just replay it and it deletes what you wanted to delete.
If I have understood your request correctly this pattern matches your string:
Title<br>( ?)\n([0-9]+)<br>( ?)\n<hr>( ?)\n<A name=([0-9]+)></a>Footer<br>
I use the Regex Coach to try out complicated regex patterns. Other utilities are available.
Edit:
As I do not use Notepad++ I cannot be sure that this pattern will work for you. Apologies if that transpires to be the case. (I'm a TextPad man myself, and it does work with that tool).
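Outside an editor, the same idea can be sketched in Python. This is an adaptation of the pattern above rather than a Notepad++ recipe: the \r? parts and the trailing newline are added assumptions so that Windows line endings are handled and no empty line is left behind, and the sample HTML is invented.

```python
import re

# Four-line junk block: literal text with [0-9]+ standing in for the two page numbers.
# Assumption: an optional space and an optional \r may precede each newline.
BLOCK = re.compile(
    r"Title<br> ?\r?\n"
    r"[0-9]+<br> ?\r?\n"
    r"<hr> ?\r?\n"
    r"<A name=[0-9]+></a>Footer<br> ?\r?\n?"
)

# Hypothetical document fragment containing one junk block.
html = (
    "first paragraph<br>\n"
    "Title<br>\n"
    "10<br>\n"
    "<hr>\n"
    "<A name=11></a>Footer<br>\n"
    "second paragraph<br>\n"
)

cleaned = BLOCK.sub("", html)
```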