Splunk: How to extract a field directly in the search command using regular expressions?

I have some log files that look like this:
2020-11-18 00:11:22.333 INFO [ABC_service,[{"method":"doSomething","id":"123456789","jsonrpc":"2.0","params":{"taskType":"certainType","clientNotificationInfo":{"priority":xy,"expirationDate":111111111},"priority":xy,"deviceId":"000000000000000","taskPayload":{},"timeout":22222222}}, XYZ]
I would now like to extract fields directly in my search and make a table of the extracted values. Specifically, I want to extract taskType (here: certainType). How can I do this?
I tried this command:
source="/log/ABCDE/ABCDE_service.log" doSomething | rex field=_raw "taskType: (?<taskType>.*)" | table taskType
But I got an empty table. What is wrong here?

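You have the right idea, but the regular expression in the rex command does not match the sample data: it looks for taskType: followed by a space, whereas the raw event contains "taskType":"certainType" (quote, colon, quote, no space). Try this: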
source="/log/ABCDE/ABCDE_service.log" doSomething
| rex field=_raw "taskType\\\":\\\"(?<taskType>[^\\\"]+)"
| table taskType
The extra backslashes are needed because of the multiple layers of escaping required to get the quotation marks through to the regex processor.
BTW, I like to use regex101.com to test regular expressions.
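To see why the original pattern returns nothing, the two patterns can be compared outside Splunk. The sketch below uses Python's re module as a stand-in for rex (an assumption for illustration: rex uses PCRE syntax, and Python spells named groups (?P<name>...) instead of (?<name>...)). The answer's pattern is shown with the Splunk string escaping already removed.

```python
import re

# Sample event from the question ("xy" is a placeholder, so this is not valid JSON).
LOG_LINE = (
    '2020-11-18 00:11:22.333 INFO [ABC_service,[{"method":"doSomething",'
    '"id":"123456789","jsonrpc":"2.0","params":{"taskType":"certainType",'
    '"clientNotificationInfo":{"priority":xy,"expirationDate":111111111},'
    '"priority":xy,"deviceId":"000000000000000","taskPayload":{},'
    '"timeout":22222222}}, XYZ]'
)

# Pattern from the question: expects "taskType: " (colon + space),
# which never occurs in the raw event.
ORIGINAL = re.compile(r'taskType: (?P<taskType>.*)')

# Pattern from the answer, minus Splunk's string escaping:
# the literal text taskType":" followed by everything up to the next double quote.
FIXED = re.compile(r'taskType":"(?P<taskType>[^"]+)')


def extract_task_type(line, pattern):
    """Return the captured taskType, or None if the pattern does not match."""
    match = pattern.search(line)
    return match.group("taskType") if match else None


print(extract_task_type(LOG_LINE, ORIGINAL))  # None
print(extract_task_type(LOG_LINE, FIXED))     # certainType
```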

Related

SQL replace all specified keys

I have a table with one VARCHAR column that contains only JSON strings. I want to replace all the keys in that column with "". How can I do that using SQL? My database is MySQL.
For example:
|--------------------------------------------------------------------|
| t_column |
|--------------------------------------------------------------------|
| {"name":"mike","email":"xxx#example.com","isManage":false,"age":22}|
|--------------------------------------------------------------------|
SELECT replace(t_column, regexp, "") FROM t_table
I expect:
mikexxx#example.comfalse22
How do I write that regexp?
Start with:
select t_column->'$.*' from test
This will return a JSON array of attribute values:
[22, "mike", "xxx#example.com", false]
This might already be all you need, and you can try something like:
select *
from test
where t_column->'$.*' like '%mike%';
Unfortunately, there seems to be no native function (something like a JSON_ARRAY_CONCAT()) that joins array values into a single string. In MySQL 8.0 you can try REGEXP_REPLACE() and strip all JSON characters:
select regexp_replace(t_column->'$.*', '[" ,\\[\\]]', '') from test
which will return '22mikexxx#example.comfalse'.
If the values can contain one of those characters, they will also be removed.
Note: this isn't very reliable, but it's the best I can do in a "simple" way.
See demo on db-fiddle.
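For experimenting outside the database, here is a rough Python analogue of the REGEXP_REPLACE() approach above (strip_json_values is a made-up name for illustration). json.dumps() of the values list plays the role of t_column->'$.*'. Python keeps the keys' insertion order, so the result starts with mike rather than MySQL's 22mike..., and the last line reproduces the caveat that matching characters inside values are stripped too.

```python
import json
import re

# Same character class as the REGEXP_REPLACE call: double quote, space, comma, [ and ].
STRIP_CHARS = re.compile(r'[" ,\[\]]')


def strip_json_values(doc):
    """Rough Python analogue of the REGEXP_REPLACE() call above."""
    # Like t_column->'$.*': serialize the object's values as a JSON array.
    # Python keeps insertion order; MySQL may order the values differently.
    values_array = json.dumps(list(json.loads(doc).values()))
    return STRIP_CHARS.sub("", values_array)


row = '{"name":"mike","email":"xxx#example.com","isManage":false,"age":22}'
print(strip_json_values(row))  # mikexxx#example.comfalse22

# The caveat: spaces and commas inside values are removed as well.
print(strip_json_values('{"name":"mike smith","note":"a,b"}'))  # mikesmithab
```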
I could be making it too simplistic, but this is just a mockup based on your comment. I can formalize it into a query if it fits your requirement.
Suppose you transform your JSON string by removing all the double quotes and curly brackets and adding a comma at the end. After playing with replace() and concat_ws(), you are left with:
name:mike,email:xxx#example.com,isManage:false,age:22,
In this format, every value is preceded by a colon and followed by a comma, which is not true for any key. Now suppose you want to check whether this JSON string contains the value "mike". You could do that with:
select * from your_table where json_col like '%:mike,%';
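The same idea can be sketched in Python to check the reasoning. normalize() and has_value() are hypothetical helpers, and the `in` test stands in for LIKE '%:mike,%'. As with the SQL version, values that themselves contain a colon or a comma would confuse it.

```python
def normalize(doc):
    """Drop double quotes and curly braces, then append a trailing comma."""
    flat = doc.replace('"', "").replace("{", "").replace("}", "")
    return flat + ","


def has_value(doc, value):
    """Python stand-in for: json_col LIKE '%:<value>,%'."""
    return f":{value}," in normalize(doc)


doc = '{"name":"mike","email":"xxx#example.com","isManage":false,"age":22}'
print(normalize(doc))          # name:mike,email:xxx#example.com,isManage:false,age:22,
print(has_value(doc, "mike"))  # True
print(has_value(doc, "name"))  # False: a key is followed by ':' not ','
```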
If you really want to solve the problem with your approach, then the question becomes:
What is the regex that selects all the undesired text from the string {"name":"mike","email":"xxx#example.com","isManage":false,"age":22} ?
Then the answer would be: {\"name\":\"|\",\"email\":\"|\",\"isManage\":|,\"age\":|}
But as others have pointed out, I would actually approach the problem by parsing the JSON. Look up the functions json_value and json_query.
Hope this helps.
PS: Pay close attention to how I phrased the question above. Any difference changes the problem.
EDIT:
If you want a more generic expression, one that selects all the text that is not a value in a JSON-formatted string, you can use this one:
{|",|"\w+\":|"|,|}
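Both expressions can be checked with Python's re.sub() (a sketch; the escaping is adapted to Python: braces are escaped and the double quotes need no backslash). Note that the email branch here is written as ","email":", including the ", that follows "mike"; without it, that separator survives the replacement.

```python
import re

doc = '{"name":"mike","email":"xxx#example.com","isManage":false,"age":22}'

# Key-specific alternation; every branch consumes the separator before a value.
SPECIFIC = re.compile(r'\{"name":"|","email":"|","isManage":|,"age":|\}')

# Generic version from the EDIT: braces, '",', '"key":', lone quotes and commas.
GENERIC = re.compile(r'\{|",|"\w+":|"|,|\}')

print(SPECIFIC.sub("", doc))  # mikexxx#example.comfalse22
print(GENERIC.sub("", doc))   # mikexxx#example.comfalse22
```

The generic version also strips commas and quotes that occur inside values, so it carries the same caveat as the REGEXP_REPLACE approach.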

MYSQL REGEXP with JSON array

I have a JSON string stored in the database and I need an SQL COUNT based on a WHERE condition on a value inside that JSON string. It needs to work on MySQL 5.5.
The only solution that I found and could work is to use the REGEXP function in the SQL query.
Here is my JSON string stored in the custom_data column:
{"language_display":["1","2","3"],"quantity":1500,"meta_display:":["1","2","3"]}
https://regex101.com/r/G8gfzj/1
I now need to create an SQL statement like:
SELECT COUNT(..) WHERE custom_data REGEXP '[HELP_HERE]'
The condition I'm looking for is that language_display has to contain either 1, 2 or 3... or whatever value I define when I build the SQL statement.
So far I have come up with this regular expression, but it does not work:
(?:\"language_display\":\[(?:"1")\])
where 1 is replaced with the value I'm looking for. I could also just search for "1" (with quotes), but that would also match inside the meta_display array, which will have different values.
I am not good with REGEX! Any suggestions?
I used the following regex to get matches on your test string
\"language_display\":\[(:?\"[0-9]\"\,)*?\"3\"(:?\,\"[0-9]\")*?\]
https://regex101.com/ is a free online regex tester and it works well. Start small and build up.
Sorry it doesn't work for you. It is probably failing on the non-greedy '*?', which MySQL 5.5's REGEXP does not support; try it without the '?'.
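A quick way to test the suggestion is to drop the lazy *? and try the resulting pattern in Python. This sketch uses plain ( ) groups instead of (:? ...), keeps the answer's single-digit [0-9] assumption, and counts matching rows the way SELECT COUNT(*) ... WHERE custom_data REGEXP ... would. language_pattern() and count_rows() are illustrative names, and the second row is made up to show that a value found only in meta_display is not counted.

```python
import re


def language_pattern(value):
    """The answer's regex without lazy quantifiers (single-digit values assumed)."""
    return re.compile(
        r'"language_display":\[("[0-9]",)*"' + re.escape(value) + r'"(,"[0-9]")*\]'
    )


rows = [
    '{"language_display":["1","2","3"],"quantity":1500,"meta_display:":["1","2","3"]}',
    '{"language_display":["1"],"quantity":10,"meta_display:":["5"]}',  # hypothetical row
]


def count_rows(rows, value):
    """Python stand-in for SELECT COUNT(*) ... WHERE custom_data REGEXP <pattern>."""
    pattern = language_pattern(value)
    return sum(1 for row in rows if pattern.search(row))


print(count_rows(rows, "1"))  # 2
print(count_rows(rows, "3"))  # 1
print(count_rows(rows, "5"))  # 0: "5" only appears in meta_display
```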
Have a look at how to serialize this data, with an eye to serializing the language display fields.
How to store a list in a column of a database table
Even if you got your idea working, it would be very slow. You are better off processing each row once and generating something that is easier to search with SQL; even a field containing a comma-separated list would be better.
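As a sketch of that one-time processing step, each row's language_display array could be flattened into a comma-separated string stored in a new column (language_csv is a hypothetical name) and then queried with MySQL's FIND_IN_SET(), e.g. WHERE FIND_IN_SET('2', language_csv) > 0. The Python below shows the conversion plus a rough imitation of FIND_IN_SET()'s return value (1-based position, 0 when absent); it is not MySQL's implementation.

```python
import json


def language_display_csv(custom_data):
    """One-off migration step: turn the JSON array into a comma-separated string."""
    return ",".join(json.loads(custom_data)["language_display"])


def find_in_set(needle, stored):
    """Rough Python analogue of FIND_IN_SET(): 1-based position, or 0 if absent."""
    items = stored.split(",") if stored else []
    return items.index(needle) + 1 if needle in items else 0


stored = language_display_csv(
    '{"language_display":["1","2","3"],"quantity":1500,"meta_display:":["1","2","3"]}'
)
print(stored)                      # 1,2,3
print(find_in_set("2", stored))    # 2
print(find_in_set("4", stored))    # 0
```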

Replace not working

I have a text file of key-value data that I have converted to a format where the pairs are separated from each other by an underscore and each key is separated from its value by a colon. I thought this format would be useful for keeping spaces intact within the data. Here's an example with the data replaced by ~~~~~~~s.
_ID:~~~_NAME:~~~~~_DESCRIPTION:~~~~~~~_TYPE1:~~~~~~_TYPE2:~~~~~~ ...etc
I want to convert this into a MySQL script that inserts the data into a table. My problem is that some nullable fields aren't included in every record; for example, a record has a _TYPE1: and may or may not have a _TYPE2:
... _DESCRIPTION:~~~~~~_TYPE1:~~~~~~_TYPE2:~~~~~~_ADDRESS:~~~~~~~ ...
... _DESCRIPTION:~~~~~~_TYPE1:~~~~~~_ADDRESS:~~~~~~~ ...
... _DESCRIPTION:~~~~~~_TYPE1:~~~~~~_ADDRESS:~~~~~~~ ...
... _DESCRIPTION:~~~~~~_TYPE1:~~~~~~_TYPE2:~~~~~~_ADDRESS:~~~~~~~ ...
... _DESCRIPTION:~~~~~~_TYPE1:~~~~~~_ADDRESS:~~~~~~~ ...
I thought I could fix this by inserting _TYPE2: after every _TYPE1 that has no _TYPE2:. Since there are only a few possible types, I managed to select the _ after each _TYPE1:~~~~~~ that is not followed by TYPE2:. I used the following regex, where egtype is one example of a possible type:
(?<=_TYPE1:egtype)_(?!TYPE2:)
At this point, all I have to do is replace that _ with _TYPE2:_ and every field is present in every line, which makes it easy to convert every row to a MySQL insert statement! Unfortunately, Notepad++ is not replacing it when I click the Replace button. I'm not sure why.
Does anyone know why it wouldn't replace an _ with _TYPE2:_ using that particular regex? Or does anyone have any other suggestions on how to turn all this data into a MySQL insert script?
Regex
To do what you want, try this:
Find:
_TYPE1:[^_]+\K(?!.*_TYPE2)
Replace:
_TYPE2:
You can test it with your sample data and have it explained here.
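Python's re module has no \K, so a sketch of the same insertion (for example, to test the idea outside Notepad++) captures the TYPE1 field and writes it back in front of _TYPE2:. The [^_\n] class is an adjustment made here so that a match cannot run onto the next line; add_missing_type2 and the sample lines are illustrative.

```python
import re

# Capture "_TYPE1:value" when no _TYPE2 follows later on the same line
# ('.' does not match a newline, so the lookahead stays on one line).
ADD_TYPE2 = re.compile(r"(_TYPE1:[^_\n]+)(?!.*_TYPE2)")


def add_missing_type2(text):
    # Put the captured field back and append an empty _TYPE2: after it.
    return ADD_TYPE2.sub(r"\1_TYPE2:", text)


sample = "_ID:1_TYPE1:a_TYPE2:b_ADDRESS:x\n_ID:2_TYPE1:c_ADDRESS:y\n"
print(add_missing_type2(sample))
```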
Python Script plugin
As a side note, I don't think it's possible to convert your data into SQL INSERT statements with a single regular expression, and while I see what you are trying to do by adding a fake TYPE2, I don't think it is your best option.
So, my suggestion is to use Notepad++'s Python Script plugin.
Install the Python Script plugin from the Plugin Manager or from its official website.
Then go to Plugins > Python Script > New Script. Choose a filename for your new file (e.g. sql_insert.py) and copy in the code that follows.
Run Plugins > Python Script > Scripts > sql_insert.py and a new tab will show the desired result.
Script:
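columns = [[]]
values = [[]]
current_line = 0

def insert(line, match):
    # Callback for editor.pysearch: called once per match with the 0-based line number
    global current_line
    # Start a new row for every line reached ("while" rather than "if", so that
    # lines without any match do not leave columns/values out of step)
    while line > current_line:
        current_line += 1
        columns.append([])
        values.append([])
    if match:
        # group 1 is the key (e.g. TYPE1), group 2 is its value
        columns[line].append(match.group(1))
        values[line].append(match.group(2))

# Collect every _KEY:value pair in the current document
editor.pysearch("_([A-Z0-9]+):([^_\n]+)", insert)

# Write one INSERT statement per non-empty line into a new tab
notepad.new()
for line in range(len(columns)):
    if columns[line]:
        editor.addText("INSERT INTO table (" + ",".join(columns[line]) + ") values (" + ",".join(values[line]) + ");\n")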
Note: I'm still learning Python and I've a feeling that this one could be written in a better way. Feel free to edit my answer or drop a comment if you can suggest improvements!
Example input:
_ID:~~~_NAME:~~~~~_DESCRIPTION:~~~~~~~_TYPE1:~~~~~~_TYPE2:~~~~~~
_ID:~~~_NAME:~~~~~_DESCRIPTION:~~~~~~_TYPE1:~~~~~~_TYPE2:~~~~~~_ADDRESS:~~~~~~~
_ID:~~~_NAME:~~~~~_DESCRIPTION:~~~~~~_TYPE1:~~~~~~_ADDRESS:~~~~~~~
Example output:
INSERT INTO table (ID,NAME,DESCRIPTION,TYPE1,TYPE2) values (~~~,~~~~~,~~~~~~~,~~~~~~,~~~~~~);
INSERT INTO table (ID,NAME,DESCRIPTION,TYPE1,TYPE2,ADDRESS) values (~~~,~~~~~,~~~~~~,~~~~~~,~~~~~~,~~~~~~~);
INSERT INTO table (ID,NAME,DESCRIPTION,TYPE1,ADDRESS) values (~~~,~~~~~,~~~~~~,~~~~~~,~~~~~~~);
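Outside Notepad++ there is no editor or notepad object, so here is a standalone sketch of the same logic using plain re (build_inserts and its table parameter are illustrative names). Like the script, it emits values unquoted to match the example output; real SQL would need the values quoted or, better, passed as parameters.

```python
import re

# One _KEY:value pair: group 1 is the key, group 2 the value (up to the next _ or newline).
PAIR = re.compile(r"_([A-Z0-9]+):([^_\n]+)")


def build_inserts(text, table="table"):
    """Return one INSERT statement per line that contains at least one pair."""
    statements = []
    for line in text.splitlines():
        pairs = PAIR.findall(line)
        if not pairs:
            continue  # skip blank or unrecognised lines
        columns = ",".join(key for key, _ in pairs)
        values = ",".join(value for _, value in pairs)
        statements.append(f"INSERT INTO {table} ({columns}) values ({values});")
    return statements


sample = "_ID:1_NAME:Bob_TYPE1:a_ADDRESS:Main St\n_ID:2_NAME:Ann_TYPE1:b_TYPE2:c"
for statement in build_inserts(sample):
    print(statement)
```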
Try searching for (_TYPE1:)(\S\S\S\S\S\S)(_ADDRESS:)
and replacing it with \1\2_TYPE2:~~~~~~\3
(this assumes each value is exactly six characters long, as in your sample).
I tested this in Notepad++ with your data and it works.
Don't forget to set the Search Mode to Regular expression.
To turn it into an INSERT script, just keep using regular expressions as above: put brackets around whichever fields you want, then use \number in the replacement to reference each field and move them around. It should be fairly simple manual labor. Have fun.
For example, search for the whole line; here I am only doing DESCRIPTION, TYPE1 and TYPE2.
Search for (using regular expression mode):
(_DESCRIPTION)(:)(\S\S\S\S\S\S)(_TYPE1)(:)(\S\S\S\S\S\S)(_TYPE2)(:)(\S\S\S\S\S\S)
Then replace it with something like:
INSERT INTO table1\(desc,type1,type2\)values\('\3','\6','\9'\); (in notepad++)
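The same capture-group rewrite can be tried with Python's re.sub(), where the parentheses in the replacement need no escaping (to_insert is an illustrative name). Each \S\S\S\S\S\S still assumes a value exactly six characters long. Also, DESC is a reserved word in MySQL, so a real script would need to backquote that column name (`desc`).

```python
import re

# Nine groups, as in the Notepad++ search; each \S\S\S\S\S\S assumes a six-character value.
LINE = re.compile(
    r"(_DESCRIPTION)(:)(\S\S\S\S\S\S)(_TYPE1)(:)(\S\S\S\S\S\S)(_TYPE2)(:)(\S\S\S\S\S\S)"
)


def to_insert(line):
    # \3, \6 and \9 are the three values; literal parentheses need no escaping in Python.
    return LINE.sub(r"INSERT INTO table1(desc,type1,type2)values('\3','\6','\9');", line)


print(to_insert("_DESCRIPTION:abcdef_TYPE1:ghijkl_TYPE2:mnopqr"))
```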
If this is a one-off problem, a two-step process would work. The first step adds _TYPE2:SomeDefaultValue to the end of every line. The second step removes it from lines where it is not needed.
Step 1: Find what: $, Replace with: _TYPE2:xxx
Step 2: Find what: (_TYPE2:.*)_TYPE2:xxx$, Replace with: \1
In both steps, select "Regular expression" and deselect ". matches newline". Also change xxx to your default value.
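The two steps can be sketched in Python, working line by line so that . never has to cross a newline (two_step and the sample lines are illustrative; xxx is the placeholder default from the answer). Note that the default ends up at the end of the line rather than directly after TYPE1.

```python
import re

DEFAULT = "xxx"  # placeholder default value, as in the answer


def two_step(lines):
    # Step 1: append _TYPE2:<default> to every line (the "$" -> "_TYPE2:xxx" replace).
    step1 = [line + "_TYPE2:" + DEFAULT for line in lines]
    # Step 2: remove it again where the line already had its own _TYPE2 field.
    undo = re.compile(r"(_TYPE2:.*)_TYPE2:" + re.escape(DEFAULT) + r"$")
    return [undo.sub(r"\1", line) for line in step1]


print(two_step(["_TYPE1:a_TYPE2:b_ADDRESS:x", "_TYPE1:c_ADDRESS:y"]))
```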

Query a table to find A OR B using REGEX

I am trying to use a regex match in a MySQL (actually MariaDB) query to find every file path in a table that contains the string "!Mutex" or where the folder ends with a capital "M".
For example, suppose the column contains the following paths:
-------------
|Path_Folder|
-------------------------------------------------------
|E:\folder01\folder01\folder03\!Mutex\folder05 |
|E:\folder01\folder01\folder03\folder4\!Mutex\folder06|
|E:\folder01\folder01\folder03\folder04\folderM |
-------------------------------------------------------
I'm NOT trying to port this anywhere (no php), just trying to find the results.
I know you asked for a regex solution, but sometimes that's not the answer. :-)
You can do this instead with a normal SQL LIKE expression.
SELECT
Path_Folder
FROM
Your_Table
WHERE
(Path_Folder LIKE '%!Mutex%')
OR
(Path_Folder LIKE '%M')
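A LIKE should work just fine for what you need, and be faster (and easier to read/maintain) than a regex. Note that with a case-insensitive collation (the usual default), LIKE '%M' also matches a lowercase m; use LIKE BINARY '%M' if the capital M matters.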
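If a regex is still wanted, the equivalent pattern would be something like !Mutex|M$. The Python sketch below (an illustration, not MariaDB) checks that the LIKE conditions and that pattern select the same sample paths when compared case-sensitively; the fourth path is a made-up non-matching row. Raw strings keep the backslashes in the Windows paths literal.

```python
import re

paths = [
    r"E:\folder01\folder01\folder03\!Mutex\folder05",
    r"E:\folder01\folder01\folder03\folder4\!Mutex\folder06",
    r"E:\folder01\folder01\folder03\folder04\folderM",
    r"E:\folder01\folder02\folderm",  # hypothetical row: lowercase m should not match
]


def like_match(path):
    # Path_Folder LIKE '%!Mutex%' OR Path_Folder LIKE '%M' (case-sensitive here)
    return "!Mutex" in path or path.endswith("M")


MUTEX_OR_M = re.compile(r"!Mutex|M$")


def regex_match(path):
    # Path_Folder REGEXP '!Mutex|M$', again compared case-sensitively
    return MUTEX_OR_M.search(path) is not None


print([like_match(p) for p in paths])   # [True, True, True, False]
print([regex_match(p) for p in paths])  # [True, True, True, False]
```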

mysql selecting multiple values wrapped in identical tags

I don't know if there is a better description of my problem, but here is what I need help with:
I have a field with lots of data, and the part I need to solve looks like this:
::field_x::<br />||field_x||519||/field_x||<br />||field_x||281||/field_x||<br />::/field_x::
I have to extract each number (id) from this, 519 and 281 in this example, and insert them into a field in another table, separated by spaces or commas. I know how to use the SUBSTRING/LOCATE method, but that returns only the first instance. Is there a way to extract them all in one go?
SUBSTRING_INDEX() with LOCATE() will work. There is no built-in function for extracting text with regular expressions (MySQL only added REGEXP_SUBSTR() in 8.0), so unless you handle it before it gets to MySQL, you're stuck using the SUBSTRING_INDEX/LOCATE method.
If you need to iterate through a dataset, you will need a cursor or a loop inside a stored procedure.
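If the string can be processed before it reaches MySQL, as suggested above, all the ids can be pulled out in one pass with a regex. A Python sketch (extract_ids and its field parameter are hypothetical); the returned comma-separated string is what would then be written to the other table.

```python
import re


def extract_ids(raw, field="field_x"):
    """Return every number wrapped in ||<field>||...||/<field>|| as a comma-separated string."""
    f = re.escape(field)
    pattern = r"\|\|" + f + r"\|\|(\d+)\|\|/" + f + r"\|\|"
    return ",".join(re.findall(pattern, raw))


raw = "::field_x::<br />||field_x||519||/field_x||<br />||field_x||281||/field_x||<br />::/field_x::"
print(extract_ids(raw))  # 519,281
```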