Splunk not recognizing regex - json

I'm struggling to make a regex work with splunk. It works with regex 101, but splunk doesn't seem to recognize it!
Regex: \"([\w]+)\":([^,}]+)
Log entry:
May 20 12:22:21 127.0.0.1 {"rootId": "AXIxikL8ao-yaSvA", "requestId": "f6a873jkjjkjk:-8000:5738",
"details": {"flag": false, "title": "task 1", "status": "Waiting", "group": "", "order": 0},
"operation": "Creation", "objectId": "AXIyCN5Oao-H5aYyaSvd", "startDate": 1589977341890,
"objectType": "case_task", "base": true, "object": {"_routing": "AXIxikL8ao-H5aYyaSvA", "flag":
false, "_type": "case_task", "title": "task 1", "createdAt": 1589977341516, "_parent": "AXIxikL8ao-
H5aYyaSvA", "createdBy": "user", "_id": "AXIyCN5Oao-H5aYyaSvd", "id": "AXIyCN5Oao-H5aYyaSvd",
"_version": 1, "order": 0, "status": "Waiting", "group": ""}}
Regex 101 link:
https://regex101.com/r/XBuz9Y/2/
I suspect splunk may have a different regex syntax, but i don't really know how to adapt it.
Any help?
Thanks!

You may use
... | rex max_match=0 "\"(?<key>\w+)\":(?<value>[^,}]+)"
Here, max_match=0 will enable multiple matching (by defauly, if you do not use max_match parameter, only the first match is returned) and the named capturing groups (here, see (?<key>...) and (?<value>...)) will ensure field creation.
See more about the Splunk rex command.

Grab the JSON fragment of your event using rex, and then use spath to do the extraction.
rex field=_raw "^[^{]+(?<json>.*)" | spath input=json
This should extract the JSON fields with the appropriate structure.

Related

How can I parse nested JSON in PowerShell?

I'm trying to parse the results of a cURL command and the information I need is in a structure.
I tried getting to the data unsuccessfully and tried converting to PS Object but not sure how to access the structure as I'm new to PS.
Below is a sample of our cURL response.
I have a git commit hash ('c64a568399a572e82c223d55cb650b87ea1c22b8' matches latestCommit in fromRef for entry id 1101) and I need to find the corresponding displayId ('develop' in toRef)
I've done this in Linux using jq but need to replicate this in PS.
jq '.values | map(select(.fromRef.latestCommit=="'"$HASH"'")) | .[0].toRef.displayId'
I'm having 2 issues.
I can get to fromRef but it looks like #{id=refs/heads/feature/add-support; displayId=feature/add-support; latestCommit=c64a568399a572e82c223d55cb650b87ea1c22b8; repository=} and I cannot figure out how to parse
I'm not sure how to get the id so I can find the correct corresponding toRef
Any help would be greatly appreciated.
{
"size": 15,
"limit": 20,
"isLastPage": true,
"values": [
{
"id": 1101,
"version": 0,
"title": "Added header",
"description": "Added notes in header",
"state": "OPEN",
"open": true,
"closed": false,
"createdDate": 1595161367863,
"updatedDate": 1595161367863,
"fromRef": "#{id=refs/heads/feature/add-support; displayId=feature/add-support; latestCommit=c64a568399a572e82c223d55cb650b87ea1c22b8; repository=}",
"toRef": "#{id=refs/heads/develop; displayId=develop; latestCommit=58b3e3482bb35f3a735048849c2474cc676fbd9b; repository=}",
"locked": false,
"author": "#{user=; role=AUTHOR; approved=False; status=UNAPPROVED}",
"reviewers": " ",
"participants": "",
"properties": "#{mergeResult=; resolvedTaskCount=0; openTaskCount=0}",
"links": "#{self=System.Object[]}"
},
{
"id": 1053,
"version": 4,
"title": "Help with checking,",
"description": "fixed up code.",
"state": "OPEN",
"open": true,
"closed": false,
"createdDate": 1591826401310,
"updatedDate": 1595018917357,
"fromRef": "#{id=refs/heads/bugfix/checking-2.7; displayId=bugfix/checking-2.7; latestCommit=cf7d8860262c6a46b0b65ef5b6d66ae8cd698b75; repository=}",
"toRef": "#{id=refs/heads/hotfix/2.7_Improvements; displayId=hotfix/2.7_Improvements; latestCommit=01f1100c559ba41ec317421399c3bfb9a0aea91f; repository=}",
"locked": false,
"author": "#{user=; role=AUTHOR; approved=False; status=UNAPPROVED}",
"reviewers": " ",
"participants": "",
"properties": "#{mergeResult=; resolvedTaskCount=0; commentCount=4; openTaskCount=0}",
"links": "#{self=System.Object[]}"
}
],
"start": 0
}
Once you have converted the result with ConvertTo-Json and the correct -Depth parameter, you can get the values of the returned object quite easily in PowerShell.
Let's say you have used something like $json = $curlResult | ConvertTo-Json -Depth 100, then finding the displayId from the corresponding toRef can be done like this:
# this is the known hashvalue of the `fromRef` value to look for
$latestCommitHash = "c64a568399a572e82c223d55cb650b87ea1c22b8"
# get the value item. from here you can get all other properties belonging to that item
$valueItem = $json.values | Where-Object { $_.fromRef.latestCommit -eq $latestCommitHash }
# get the displayId value of the corresponding 'toRef' element:
$displayId = $valueItem.toRef.displayId
Returns
develop

How can I can improve raw JSON data in order to use it?

I'm trying to use some results exported in JSON of a script called "Mixed Content Scan" (it's a script in order to search on a website if there is some mixed HTTP/HTTPS content and if all your pages are ok in HTTPS).
I'm a beginner with JSON, I read and watched a lot of tutorials in order to understand how to structure JSON data but I'm stumbling on something.
Here is a sample of my data (first 3 lines) :
{"message":"Scanning https://mywebsite.com/","context":[],"level":250,"level_name":"NOTICE","channel":"MCS","datetime":{"date":"2018-10-05 23:48:50.268196","timezone_type":3,"timezone":"America/New_York"},"extra":[]}
{"message":"00000 - https://mywebsite.com/","context":[],"level":400,"level_name":"ERROR","channel":"MCS","datetime":{"date":"2018-10-05 23:48:50.760948","timezone_type":3,"timezone":"America/New_York"},"extra":[]}
{"message":"http://mywebsite.com/wp-content/uploads/2015/03/image.jpg","context":[],"level":300,"level_name":"WARNING","channel":"MCS","datetime":{"date":"2018-10-05 23:48:50.761082","timezone_type":3,"timezone":"America/New_York"},"extra":[]}
I know I need to wrap my data around some {} or [] (tried both), but I think I'm missing something, for example, every JSON data validator websites are telling me that I have an error between 2 lines when I add a "," when I try to have multiple results into it.
How can I upgrade this raw data in order for a JSON validator to validate it?
Thanks!
How's this
[{
"message": "Scanning https://mywebsite.com/",
"context": [],
"level": 250,
"level_name": "NOTICE",
"channel": "MCS",
"datetime": {
"date": "2018-10-05 23:48:50.268196",
"timezone_type": 3,
"timezone": "America/New_York"
},
"extra": []
}, {
"message": "00000 - https://mywebsite.com/",
"context": [],
"level": 400,
"level_name": "ERROR",
"channel": "MCS",
"datetime": {
"date": "2018-10-05 23:48:50.760948",
"timezone_type": 3,
"timezone": "America/New_York"
},
"extra": []
}, {
"message": "http://mywebsite.com/wp-content/uploads/2015/03/image.jpg",
"context": [],
"level": 300,
"level_name": "WARNING",
"channel": "MCS",
"datetime": {
"date": "2018-10-05 23:48:50.761082",
"timezone_type": 3,
"timezone": "America/New_York"
},
"extra": []
}]
Entries in an array need to be separated by commas.

JSON Structure Types Naming Conventions

I'm seeing JSON presented in a couple of different formats/styles, and I'm wondering if there are any standard names for these different formats/styles.
My searches haven't turned up any info - I'd appreciate anything anyone could share.
Format 1:
{
"KEYS": ["first", "last", "middle", "age"],
"VALUES": [
["joe", "smith", "a", 34],
["mary", "morris", "p", 65],
["phillip", "jones", "a", 33]
]
}
Format 2:
[{
"first": "joe",
"last": "smith",
"middle": "a",
"age": 34
}, {
"first": "mary",
"last": "morris",
"middle": "p",
"age": 33
}, {
"first": "phillip",
"last": "jones",
"middle": "a",
"age": 33
}]
The first JSON structure is more suitable for table representation while the second is a classic JSON representation of a list of objects.
The second format seems much more standard, as it's using key-value pairs the way JSON intends. It's the format produced by d3.dsv for instance.
It's a bit hard to be definitive though.
A colleague suggested "tabular" for format 1 (I also like "mirrored arrays") and "standard" for format 2. Unless someone knows of some more formal/common names, I'll stick with these for now.

Access deeper elements of a JSON using postgresql 9.4

I want to be able to access deeper elements stored in a json in the field json, stored in a postgresql database. For example, I would like to be able to access the elements that traverse the path states->events->time from the json provided below. Here is the postgreSQL query I'm using:
SELECT
data#>> '{userId}' as user,
data#>> '{region}' as region,
data#>>'{priorTimeSpentInApp}' as priotTimeSpentInApp,
data#>>'{userAttributes, "Total Friends"}' as totalFriends
from game_json
WHERE game_name LIKE 'myNewGame'
LIMIT 1000
and here is an example record from the json field
{
"region": "oh",
"deviceModel": "inHouseDevice",
"states": [
{
"events": [
{
"time": 1430247045.176,
"name": "Session Start",
"value": 0,
"parameters": {
"Balance": "40"
},
"info": ""
},
{
"time": 1430247293.501,
"name": "Mission1",
"value": 1,
"parameters": {
"Result": "Win ",
"Replay": "no",
"Attempt Number": "1"
},
"info": ""
}
]
}
],
"priorTimeSpentInApp": 28989.41467999999,
"country": "CA",
"city": "vancouver",
"isDeveloper": true,
"time": 1430247044.414,
"duration": 411.53,
"timezone": "America/Cleveland",
"priorSessions": 47,
"experiments": [],
"systemVersion": "3.8.1",
"appVersion": "14312",
"userId": "ef617d7ad4c6982e2cb7f6902801eb8a",
"isSession": true,
"firstRun": 1429572011.15,
"priorEvents": 69,
"userAttributes": {
"Total Friends": "0",
"Device Type": "Tablet",
"Social Connection": "None",
"Item Slots Owned": "12",
"Total Levels Played": "0",
"Retention Cohort": "Day 0",
"Player Progression": "0",
"Characters Owned": "1"
},
"deviceId": "ef617d7ad4c6982e2cb7f6902801eb8a"
}
That SQL query works, except that it doesn't give me any return values for totalFriends (e.g. data#>>'{userAttributes, "Total Friends"}' as totalFriends). I assume that part of the problem is that events falls within a square bracket (I don't know what that indicates in the json format) as opposed to a curly brace, but I'm also unable to extract values from the userAttributes key.
I would appreciate it if anyone could help me.
I'm sorry if this question has been asked elsewhere. I'm so new to postgresql and even json that I'm having trouble coming up with the proper terminology to find the answers to this (and related) questions.
You should definitely familiarize yourself with the basics of json
and json functions and operators in Postgres.
In the second source pay attention to the operators -> and ->>.
General rule: use -> to get a json object, ->> to get a json value as text.
Using these operators you can rewrite your query in the way which returns correct value of 'Total Friends':
select
data->>'userId' as user,
data->>'region' as region,
data->>'priorTimeSpentInApp' as priotTimeSpentInApp,
data->'userAttributes'->>'Total Friends' as totalFriends
from game_json
where game_name like 'myNewGame';
Json objects in square brackets are elements of a json array.
Json arrays may have many elements.
The elements are accessed by an index.
Json arrays are indexed from 0 (the first element of an array has an index 0).
Example:
select
data->'states'->0->'events'->1->>'name'
from game_json
where game_name like 'myNewGame';
-- returns "Mission1"
select
data->'states'->0->'events'->1->>'name'
from game_json
where game_name like 'myNewGame';
This did help me

Elasticseach no results for json query

I am learning elasticsearch and following along with the tutorial. I uploaded three documents into an index. When I supply the following query:
curl 'localhost:9200/vehicles/_search?query=driver.name:Jon'
I as expected get back object two and object three. However when I try querying using json:
curl localhost:9200/vehicles/_search -d'
{
"query":{
"prefix":{
"driver.name":"Jon"
}}}'
I get no results back. I am following the tutorial very closely, so I don't understand what the issue is. Any help would be really appreciated. The uploaded objects are below.
Thank you!
id:one
'{
"color": "green",
"driver": {
"born":"1989-09-12",
"name": "Ben"
},
"make": "BMW",
"model": "Aztek",
"value": 3000.0,
"year": 2003
}'
id:two
'{
"color": "black",
"driver": {
"born":"1934-09-08",
"name": "Jon"
},
"make": "Mercedes",
"model": "Benz",
"value": 10000.0,
"year": 2012
}'
id:three
'{
"color": "green",
"driver": {
"born":"1934-09-08",
"name": "Jon"
},
"make": "BMW",
"model": "Benz",
"value": 10000.0,
"year": 2012
}'
The prefix-query "matches documents that have fields containing terms with a specified prefix (not analyzed)".
Note the "not analyzed"-part. Lucene is looking for anything starting with "Jon" in the index, but the standard analyzer lowercases terms. That is, "jon" is in the index, but "Jon" is not.
Thus, if you lowercase the text in your prefix-query, it should work. Here is a runnable example: https://www.found.no/play/gist/7629456
Try:
curl -XGET "http://localhost:9200/vehicles/_search" -d '
{
"query": {"query_string" : { "query" : "driver.name:Jon" }}
}'
In any case, If you are new to elasticsearch I really recommend you read the documentation because there are lots of types of queries. Besides, the results of queries also depends on how you index the documents, how you define the mapping, etc.
In order to use the prefix query, you need to hit a non-analyzed field. In your mappings for driver.name, if you set "index" to "not_analyzed", you can use the prefix query. Otherwise, you should use a match query or something similar.