Splunk: How to extract fields directly in search bar without having to use regular expressions? - extract

I find it very hard to use regular expressions directly in the search bar to extract fields. Another problem is that I do not have the permission to share my extracted fields (extracted by the field extractor and stated in field extractions) with other people. I am now looking for another way to extract fields directly in the search bar. Is there something like this possible in Splunk?
Thanks!

Regular expressions aren't so bad once you've had some practice. Think of them as another programming language to know.
There are other ways to extract fields, but most are less efficient and all are less flexible.
The spath and xpath commands will extract fields from JSON and XML, respectively.
multikv extracts fields from table-formatted data (like from top).
The extract command can be used to parse key/value pairs into fields.
The eval command can be used in combination with various functions to parse events into fields.

Related

Regex for replacing unnecessary quotation marks within a JSON object containing an array

I am currently trying to format a JSON object using LabVIEW and have ran into the issue where it adds additional quotation marks invalidating my JSON formatting. I have not found a way around this so I thought just formatting the string manually would be enough.
Here is the JSON object that I have:
{
"contentType":"application/json",
"content":{
"msgType":2,
"objects":"["cat","dog","bird"]",
"count":3
}
}
Here is the JSON object I want with the quotation marks removed.
{
"contentType":"application/json",
"content":{
"msgType":2,
"objects":["cat","dog","bird"],
"count":3
}
}
I am still not an expert with regex and using a regex tester I was only able to grab the "objects" and "count" fields but I would still feel I would have to utilize substrings to remove the quotation marks.
Example I am using (would use a "count" to find the start of the next field and work backwards from there)
"([objects]*)"
Additionally, all the other Regex I have been looking at removes all instances of quotation marks whereas I only need a specific area trimmed. Thus, I feel that a specific regex replace would be a much more elegant solution.
If there is a better way to go about this I am happy to hear any suggestions!
Your question suggests that the built-in LabVIEW JSON tools are insufficient for your use case.
The built-in library converts LabVIEW clusters to JSON in a one-shot approach. Bundle all your data into a cluster and then convert it to JSON.
When it comes to parsing JSON, you use the path input terminal and the default type terminals to control what data is parsed from a JSON string.
If you need to handle JSON in a manner similar to say JavaScript, I would recommend something like the JSONText Toolkit which is free to use (and distribute) under the BSD licence. This allows more complex and iterative building of JSON strings from LabVIEW types and has text-path style element access along with many more features.
The Output controls from both my examples are identical - although JSONText provides a handy Pretty Print vi.
After using a regex from one of the comments, I ended up with this regex which allowed me to match the array itself.
(\[(?:"[^"]*"|[^"])+\])
I was able to split the the JSON string into before match, match and after match and removed the quotation marks from the end of 'before match' and start of 'after match' and concatenated the strings again to form a new output.

Remove JSON keys with wildcards from a MySQL field

I have a MySQL 8.0.20 database with a table that describes metadata about uploaded image files. One column contains a JSON object with a whole bunch of auto-generated data that I'm trying to clean up.
This JSON object sometimes contains one or more variable key names that match a specific pattern. Something like
{
"image_name": "P10043983",
"image_size": "60138",
"image_original_exifdata": "{
'FileName':'P10043983.jpg',
'MimeType':'image/jpeg',
'UndefinedTag:0xA435':'\u0000\u0000\u0000\u0000\u0000\u0000'
}"
}
That UndefinedTag:0xA435 (with many permutations) is the problem. It's referring to various image Exif details like lens type, GPS data, etc. It's stuff that I'm not interested in and that these cameras mostly don't provide, so I've ended up with a table full of long strings of useless characters just taking up space. I want those JSON fields gone for performance and cleanliness.
Is there a way to run a SQL query that would use wildcards or regular expressions to find (and, ideally, remove) all of these pesky variable keys? I'd like to avoid manually making a list of all of the possible "UndefinedTag" keys to search against, and I also didn't like the results when I just treated the whole thing as a string and did REGEXP_REPLACE calls (it sometimes left trailing commas that broke my JSON and were difficult for me to avoid/resolve).
I know some of the JSON functions like JSON_SEARCH() accept wildcards, but it explicitly says the search path can't end in a wildcard (so no UndefinedTag:0x** allowed). Many of the functions I'm after (e.g., JSON_REMOVE()) don't accept wildcards at all. Hell, I've even had trouble finding known keys, and I suspect that silly colon in the key name might have something to do with it.
So, how can I clean up my table and remove the many forms of this UndefinedTag problem? Maybe it's easier to just go back to the regex_replace plan and deal instead with the trailing commas?

Regex for matching with and without quotes for dynamic JSON

I have the following text strings:
"Name":"John"}]
"Age":36
"Address":"ABC,PQR234[]/.,#ANYCHARACTERS"
"Gender":null
I need to get two groups (key value pair) from this such that the output would be only:
Key|Value
Name|John
Age|36
Address|ABC,PQR234[]/.,#ANYCHARACTERS
The requirement is to have a single regex to grab everything in the double quotes if the double quotes are present. If not, take the value without the quotes.
In our example above, 36 and null are the one without the quotes and they need to be captured as well.
I have tried a lot but have failed to do so.
UPDATE:
I don't know why I am getting down votes for this question. Yes this is JSON that I am trying to parse but there is a reason behind why I am doing this and not using any document parser.
I am supposed to use Talend for getting a dynamic JSON converted into Key Value Pair. What I mean by dynamic is the fields of the JSON can vary and hence I do not have a fixed schema and hence cannot use a document parser (which demands a fixed structure of JSON). I am devising a solution to get around this using Normalizer (on comma) and then extracting the key value pair which will be in double quotes using Regular Expressions. I tried many things on my own and since I am not an expert in Regular expressions, I have come here to get inputs.
If you know any better solution to this, I would be very happy to get your inputs.
How about this?
/"?([^\n"]*)"?:"?([^\n"]*)"?/
Explained in detail at:
https://regex101.com/r/UM0rl2/1/

Using regex to extract data from structured data

The problem I'm facing here is that I have a blob of text which contains structured data (in the form of a JSON payload) and I'm interested in extracting the value of one of the keys for a specific JSON instance, picture the structured data inside as the following:
"Item 1": {"key1":"item1_key1_value", "key2":"item1_key2_value", "key3":"item1_key3_value"}, "Item 2": {"key1":"item2_key1_value", "key2":"item2_key2_value", "key3":"item2_key3_value"}
What I would like to use is use regex to grab item1_key2_value for instance. The keys all have the same name but the items are different. So I know which key for which Item I need but am not quite sure of the regex to retrieve that value. I've tried a few approaches to some basic matching but was wondering if any other more experienced regex users could direct me a bit here and explain what I'm doing wrong
1(.)(?=item1_key2_value.) will match a chunk of data from here but I'm not sure of the best way to reduce it to the value that I need.
The regex syntax for JSON is clearly specified at http://www.json.org. If you scroll down a little to where it says "A string is a sequence of", you will find the proper string structure.
Assuming the string follows the correct JSON structure, you could use
"key2"\s*:\s*"((\\.|[^\\"])*)"
where \s means whitespace and * means 0 or more times. \\ means a slosh (backslash) character and can be followed by . (any character). If it does not encounter a slosh, then it instead looks for [^\\"], which means not slosh nor quote.
If you want to be a little more strict to the exact JSON form, you could try
"key2"\s*:\s*"((\\["\\/bfnrtu]|[^\\"])*)"
which you can see follows the string form on the webpage more closely.

Concatenate multiple PostreSQL JSON columns in a single JSON root object

I have a table (preferences), with several JSON columns (design_prefs, country_prefs, view_prefs, payment_prefs, etc.)
design_prefs --> {"color":"red", "height":15}
country_prefs --> {"language":"en", "currency":"pound"}
view_prefs --> {"layout":"list", "sortby":"category1", "view_brand":1, "view_price":1}
etc.
I retrieve a single row, but only with the columns required by the web context. Maybe just design_prefs, or design_prefs AND country_prefs, or design_prefs AND payment_prefs or whatever combination.
In the application (using Perl, Mojolicious and Mojo::Pg) I convert each JSON column in a Hash (key/value dictionary object used in Perl). And then I concatenate the multiple resulting hashes in a big "preferences" hash object.
It is working, but I would like to do the concatenation in Postgres, so in Mojolicious I only need to do a "JSON to hash" process.
Looking for aggregation functions in Postgresql I have found ways to do something like this:
{"color":"red", "height":15 }, {"language":"en", "currency":"pound"}
But I need this:
{"color":"red", "height":15, "language":"en", "currency":"pound"}
JSON type is stored as text in Pg 9.3, including the initial and ending bracket characters, so direct text concatenation is not a way.
Any guidance is appreciated, even recommendation to forget the idea and keep using the application instead the database.