Regular expression to extract a JSON array

I'm trying to use a PCRE regular expression to extract some JSON. I'm using a version of MariaDB which does not have JSON functions but does have REGEX functions.
My string is:
{"device_types":["smartphone"],"isps":["a","B"],"network_types":[],"countries":[],"category":["Jebb","Bush"],"carriers":[],"exclude_carriers":[]}
I want to grab the contents of category. I'd like a matching group that contains 2 items, Jebb and Bush (or however many items are in the array).
I've tried this pattern but it only matches the first occurrence: /(?<=category":\[).([^"]*).*?(?=\])/g

Does this match your needs? It should match the category array regardless of its size.
"category":(\[.*?\])
regex101 example
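Here is a minimal TypeScript sketch of how the capture could be used. It assumes the array holds plain strings with no ] inside them. Group 1 is the whole bracketed array, which a JSON parser can then split into items. The helper name extractCategory is hypothetical. MariaDB in the question has no JSON functions, so this post-processing step would have to happen in the application.

```typescript
// Hypothetical helper: grab the "category" array with a regex,
// then let JSON.parse split it into items.
function extractCategory(input: string): string[] {
  const match = /"category":(\[.*?\])/.exec(input);
  if (match === null) {
    return [];
  }
  // match[1] is the bracketed array text, e.g. ["Jebb","Bush"]
  return JSON.parse(match[1]) as string[];
}

const sample =
  '{"device_types":["smartphone"],"isps":["a","B"],"network_types":[],"countries":[],' +
  '"category":["Jebb","Bush"],"carriers":[],"exclude_carriers":[]}';

console.log(extractCategory(sample)); // [ 'Jebb', 'Bush' ]
```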

JSON is not a regular language. Because it allows arbitrarily deep nesting of balanced delimiters, it is at least context-free, so no true regular expression can match it in general.
For example, consider an array of arrays of arrays:
[ [ [ 1, 2], [2, 3] ] , [ [ 3, 4], [ 4, 5] ] ]
Clearly you couldn't parse that with true regular expressions.
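The TypeScript sketch below makes this concrete with a hypothetical nested input. A lazy pattern like "category":(\[.*?\]) stops at the first ], so it returns an unbalanced fragment. A few lines of code that count bracket depth, skipping brackets inside strings, recover the whole array. The function extractBalancedArray was written for this illustration and is not a library API. It assumes the key appears exactly as "category":[ with no whitespace.

```typescript
// Hypothetical input: the category array contains nested arrays.
const nested = '{"category":[["Jebb","Bush"],["Donald"]],"carriers":[]}';

// The lazy regex stops at the first "]" and returns an unbalanced fragment.
const lazy = /"category":(\[.*?\])/.exec(nested);
console.log(lazy?.[1]); // [["Jebb","Bush"]

// Illustrative scanner: returns the balanced array text that follows "key":[
function extractBalancedArray(input: string, key: string): string | null {
  const start = input.indexOf(`"${key}":[`);
  if (start < 0) return null;
  const open = start + key.length + 3; // index of the opening "["
  let depth = 0;
  let inString = false;
  for (let i = open; i < input.length; i++) {
    const ch = input[i];
    if (inString) {
      if (ch === "\\") i++; // skip the escaped character
      else if (ch === '"') inString = false;
    } else if (ch === '"') {
      inString = true;
    } else if (ch === "[") {
      depth++;
    } else if (ch === "]") {
      depth--;
      if (depth === 0) return input.slice(open, i + 1);
    }
  }
  return null; // no balancing "]" found
}

console.log(extractBalancedArray(nested, "category")); // [["Jebb","Bush"],["Donald"]]
```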
See this topic, which may be helpful:
Regex for parsing single key: values out of JSON in Javascript

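Using a set of non-capturing groups, you can extract a predefined JSON array.
Regex answer: (?:\"category\":)(?:\[)(.*?)(?:\"\])
That expression matches "category":["Jebb","Bush"]. The first (and only capturing) group holds the array contents, "Jebb","Bush. The lazy .*? stops at the first "] instead of running on to the last one in the string. Sample Java code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.is;

// Group 1 captures the raw array contents: "Jebb","Bush
Pattern pattern = Pattern.compile("(?:\"category\":)(?:\\[)(.*?)(?:\"\\])");
String body = "{\"device_types\":[\"smartphone\"],\"isps\":[\"a\",\"B\"],\"network_types\":[],\"countries\":[],\"category\":[\"Jebb\",\"Bush\"],\"carriers\":[],\"exclude_carriers\":[]}";
Matcher matcher = pattern.matcher(body);
assertThat(matcher.find(), is(true));
// Strip the quotes, then split on commas (assumes no item contains a comma or quote)
String[] categories = matcher.group(1).replaceAll("\"","").split(",");
assertThat(categories.length, is(2));
assertThat(categories[0], is("Jebb"));
assertThat(categories[1], is("Bush"));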
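Whether the group is greedy or lazy matters once a later array also contains strings, because a greedy (.*) runs on to the last "] in the whole string. The small TypeScript comparison below runs both forms of the pattern on a hypothetical input where "carriers" is non-empty. It then applies the same quote-stripping and splitting as the Java code.

```typescript
// Hypothetical input: a later array ("carriers") also contains strings.
const body = '{"category":["Jebb","Bush"],"carriers":["Verizon"]}';

// Greedy (.*) runs to the LAST "] in the string.
const greedy = /(?:"category":)(?:\[)(.*)(?:"\])/.exec(body);
console.log(greedy?.[1]); // "Jebb","Bush"],"carriers":["Verizon

// Lazy (.*?) stops at the FIRST "] after the opening bracket.
const lazyMatch = /(?:"category":)(?:\[)(.*?)(?:"\])/.exec(body);
console.log(lazyMatch?.[1]); // "Jebb","Bush

// Same post-processing as the Java code: strip quotes, split on commas.
const categories = (lazyMatch?.[1] ?? "").replace(/"/g, "").split(",");
console.log(categories); // [ 'Jebb', 'Bush' ]
```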

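There are many ways. One sloppy way to do it is /([A-Z])\w+/g. It simply matches every word that starts with a capital letter, so it only works here because Jebb and Bush are the only such words with two or more characters.
Try it in your console: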
var data = '{"device_types":["smartphone"],"isps":["a","B"],"network_types":[],"countries":[],"category":["Jebb","Bush"],"carriers":[],"exclude_carriers":[]}',
res = [];
data.match(/([A-Z])\w+/g); // ["Jebb", "Bush"]
OK, the above was pretty sloppy. The following is a more targeted single-regex solution. It extracts every element one by one, regardless of how many there are, and places them in an array (res). Note that it relies on "carriers" being the key that immediately follows category:
var rex = /[",]+(\w*)(?=[",\w]*"],"carriers)/g,
str = '{"device_types":["smartphone"],"isps":["a","B"],"network_types":[],"countries":[],"category":["Jebb","Bush","Donald","Trump"],"carriers":[],"exclude_carriers":[]}',
arr = [],
res = [];
while ((arr = rex.exec(str)) !== null) {
res.push(arr[1]); // <- ["Jebb", "Bush", "Donald", "Trump"]
}
Check it out at http://regexr.com/3d4ee
OK, let's do it. I have come up with a devilish idea. If JS had lookbehinds, this could have been done simply by reversing the logic of the previous example, where I used a lookahead. Alas, it doesn't, so I decided to turn the world the other way around. I reverse the string, match with a lookahead, and then reverse the results. Check this out.
String.prototype.reverse = function(){
return this.split("").reverse().join("");
};
var rex = /[",]+(\w*)(?=[",\w]*"\[:"yrogetac)/g,
str = '{"device_types":["smartphone"],"isps":["a","B"],"network_types":[],"countries":[],"category":["Jebb","Bush","Donald","Trump"],"carriers":[],"exclude_carriers":[]}',
rev = str.reverse(),
arr = [],
res = [];
while ((arr = rex.exec(rev)) !== null) {
res.push(arr[1].reverse()); // <- ["Trump", "Donald", "Bush", "Jebb"]
}
res.reverse(); // <- ["Jebb", "Bush", "Donald", "Trump"]
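Since ES2018, JavaScript regular expressions do support lookbehind. Unlike classic PCRE, which requires fixed-length lookbehind, JavaScript also allows variable-length lookbehind. In current engines, the reversal trick can therefore be replaced by a direct lookbehind. Below is a TypeScript sketch using a pattern of my own, not the one from the answer above. The RegExp is built from a string so that older TypeScript targets do not reject the literal.

```typescript
// Matches each quoted item preceded (back to the opening "[") by "category":[
// with no "]" in between. Requires an ES2018+ engine for lookbehind.
const itemRe = new RegExp('(?<="category":\\[[^\\]]*)"([^"]*)"', "g");

const str =
  '{"device_types":["smartphone"],"isps":["a","B"],"network_types":[],"countries":[],' +
  '"category":["Jebb","Bush","Donald","Trump"],"carriers":[],"exclude_carriers":[]}';

const res: string[] = [];
let m: RegExpExecArray | null;
while ((m = itemRe.exec(str)) !== null) {
  res.push(m[1]); // group 1 is the item without quotes
}
console.log(res); // [ 'Jebb', 'Bush', 'Donald', 'Trump' ]
```

Unlike the lookahead version, this does not depend on which key follows category.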
Just use your console to confirm.

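In C++ you can do it like this:
#include <regex>
#include <string>

// subject is a std::string holding the JSON text; m[1] receives the raw array contents, e.g. "Jebb","Bush"
std::smatch m;
bool foundmatch = false;
try {
    std::regex re("\"category\"\\s*:\\s*\\[([^\\]\r\n]*)\\]");
    foundmatch = std::regex_search(subject, m, re);
} catch (std::regex_error& e) {
    // Syntax error in the regular expression
}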

If the number of items in the array is limited (and manageable), you could match it with a fixed number of nested optional groups. For example, this one handles at most 5 items:
"category":\["([^"]*)"(?:,"([^"]*)"(?:,"([^"]*)"(?:,"([^"]*)"(?:,"([^"]*)")?)?)?)?
Regards.
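The short TypeScript sketch below collects the groups of that bounded pattern, assuming the same key and quoting as in the question. Groups for items that are not present come back as undefined and are filtered out. The helper name categoryItems is hypothetical.

```typescript
// The bounded pattern (at most 5 items); unused optional groups are undefined.
const boundedRe =
  /"category":\["([^"]*)"(?:,"([^"]*)"(?:,"([^"]*)"(?:,"([^"]*)"(?:,"([^"]*)")?)?)?)?/;

function categoryItems(input: string): string[] {
  const m = boundedRe.exec(input);
  if (m === null) return [];
  // Drop the full match (index 0) and any groups that did not participate.
  return m.slice(1).filter((g) => g !== undefined);
}

const sample = '{"isps":["a","B"],"category":["Jebb","Bush"],"carriers":[]}';
console.log(categoryItems(sample)); // [ 'Jebb', 'Bush' ]
```

Anything beyond the fifth item is silently ignored, and an empty array does not match at all. This approach only fits when the upper bound is genuinely known.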

Related

How to merge a dynamically named record with a static one in Dhall?

I'm creating an AWS Step Function definition in Dhall. However, I don't know how to create a common structure they use for Choice states such as the example below:
{
"Not": {
"Variable": "$.type",
"StringEquals": "Private"
},
"Next": "Public"
}
The Not is pretty straightforward using mapKey and mapValue. If I define a basic Comparison:
{ Type =
{ Variable : Text
, StringEquals : Optional Text
}
, default =
{ Variable = "foo"
, StringEquals = None Text
}
}
And the types:
let ComparisonType = < And | Or | Not >
And adding a helper function to render the type as Text for the mapKey:
let renderComparisonType = \(comparisonType : ComparisonType )
-> merge
{ And = "And"
, Or = "Or"
, Not = "Not"
}
comparisonType
Then I can use them in a function to generate the record halfway:
let renderRuleComparisons =
\( comparisonType : ComparisonType ) ->
\( comparisons : List ComparisonOperator.Type ) ->
let keyName = renderComparisonType comparisonType
let compare = [ { mapKey = keyName, mapValue = comparisons } ]
in compare
If I run that using:
let rando = ComparisonOperator::{ Variable = "$.name", StringEquals = Some "Cow" }
let comparisons = renderRuleComparisons ComparisonType.Not [ rando ]
in comparisons
Using dhall-to-json, she'll output the first part:
{
"Not": {
"Variable": "$.name",
"StringEquals": "Cow"
}
}
... but I've been struggling to merge that with "Next": "Sup". I've used all the record merges like /\, //, etc. and it keeps giving me various type errors I don't truly understand yet.
First, I'll include an approach that does not type-check as a starting point to motivate the solution:
let rando = ComparisonOperator::{ Variable = "$.name", StringEquals = Some "Cow" }
let comparisons = renderRuleComparisons ComparisonType.Not [ rando ]
in comparisons # toMap { Next = "Public" }
toMap is a keyword that converts records to key-value lists, and # is the list concatenation operator. The Dhall CheatSheet has a few examples of how to use both of them.
The above solution doesn't work because # cannot merge lists with different element types. The left-hand side of the # operator has this type:
comparisons : List { mapKey : Text, mapValue : Comparison.Type }
... whereas the right-hand side of the # operator has this type:
toMap { Next = "Public" } : List { mapKey : Text, mapValue : Text }
... so the two Lists cannot be merged as-is due to the different types for the mapValue field.
There are two ways to resolve this:
Approach 1: Use a union whenever there is a type conflict
Approach 2: Use a weakly-typed JSON representation that can hold arbitrary values
Approach 1 is the simpler solution for this particular example and Approach 2 is the more general solution that can handle really weird JSON schemas.
For Approach 1, dhall-to-json will automatically strip non-empty union constructors (leaving behind the value they were wrapping) when translating to JSON. This means that you can transform both arguments of the # operator to agree on this common type:
List { mapKey : Text, mapValue : < State : Text | Comparison : Comparison.Type > }
... and then you should be able to concatenate the two lists of key-value pairs and dhall-to-json will render them correctly.
There is a second solution for dealing with weakly-typed JSON schemas that you can learn more about here:
Dhall Manual - How to convert an existing YAML configuration file to Dhall
The basic idea is that all of the JSON/YAML integrations recognize and support a weakly-typed JSON representation that can hold arbitrary JSON data, including dictionaries with keys of different shapes (like in your example). You don't even need to convert the entire expression to this weakly-typed representation. You only need to use it for the subset of your configuration where you run into schema issues.
For your example, this means that you would change both arguments of the # operator to have this type:
let Prelude = https://prelude.dhall-lang.org/v12.0.0/package.dhall
in List { mapKey : Text, mapValue : Prelude.JSON.Type }
The documentation for Prelude.JSON.Type also has more details on how to use this type.

Iterating over a list in JSON using TypeScript

I'm having a problem iterating over my json in TypeScript. I'm having trouble with one specific json field, the tribe. For some reason I can't iterate over that one. In the debugger, I'm expecting the Orc to show up but instead I get a 0. Why is this? How do I iterate correctly over my tribe data?
// Maps a profession or tribe group name to a bucket of characters
let professionMap = new Map<string, Character[]>()
let tribeMap = new Map<string, Character[]>()
let herolistJson = require('./data/HeroList.json')
for (let hero of herolistJson){
// Certain characters can have more than one tribe
// !!!!! The trouble begins here, tribe is 0???
for (let tribe in hero.tribe){
let tribeBucket = tribeMap.get(tribe) as Character[]
// If the hero does not already exist in this tribe bucket, add it
if(tribeBucket.find(x => x.name == hero.name) === undefined )
{
tribeBucket.push(new Character(hero.name, hero.tribe, hero.profession, hero.cost))
}
}
}
My json file looks like this
[
{
"name": "Axe",
"tribe": ["Orc"],
"profession": "Warrior",
"cost": 1
},
{
"name": "Enchantress",
"tribe": ["Beast"],
"profession": "Druid",
"cost": 1
}
]
for...in iterates over the keys of an object, not its values. The keys of an array are its indices, as strings, which is why you see "0" instead of "Orc". If you use for...of instead, you use the newer iterator protocol, and an array's iterator yields values instead of keys.
for (let tribe of /* in */ hero.tribe) {
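Note that native for...of won't work in IE 11. It does work in most other browsers and in many ES2015-compatible JS environments; kangax/compat has a partial list. When TypeScript targets ES5, for...of over an array is compiled to an indexed loop, so that case also runs in IE 11.
Change the "in" to "of" in the second loop.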
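The TypeScript sketch below shows the question's loop with for...of. It also creates each tribe bucket on first use, because tribeMap.get(tribe) returns undefined for a tribe that has not been seen yet. The Character class and HeroJson interface are minimal stand-ins, assumed because the question does not show their definitions. The JSON is inlined instead of loaded with require.

```typescript
// Minimal stand-in for the question's Character class (its real definition is not shown).
class Character {
  name: string;
  tribe: string[];
  profession: string;
  cost: number;
  constructor(name: string, tribe: string[], profession: string, cost: number) {
    this.name = name;
    this.tribe = tribe;
    this.profession = profession;
    this.cost = cost;
  }
}

// Assumed shape of one entry in HeroList.json.
interface HeroJson {
  name: string;
  tribe: string[];
  profession: string;
  cost: number;
}

const herolistJson: HeroJson[] = [
  { name: "Axe", tribe: ["Orc"], profession: "Warrior", cost: 1 },
  { name: "Enchantress", tribe: ["Beast"], profession: "Druid", cost: 1 },
];

// for...in visits the array's keys: this prints 0 (the index, as a string), not Orc.
for (const key in herolistJson[0].tribe) console.log(key);

const tribeMap = new Map<string, Character[]>();

for (const hero of herolistJson) {
  // for...of visits the values: "Orc", "Beast", ...
  for (const tribe of hero.tribe) {
    let bucket = tribeMap.get(tribe);
    if (bucket === undefined) {
      // First hero seen for this tribe: create its bucket
      bucket = [];
      tribeMap.set(tribe, bucket);
    }
    // Only add the hero once per tribe
    if (bucket.find((x) => x.name === hero.name) === undefined) {
      bucket.push(new Character(hero.name, hero.tribe, hero.profession, hero.cost));
    }
  }
}
```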

Azure tables unable to store flattened JSON

I am using the npm flat package. Arrays/objects are flattened, but the resulting keys are surrounded by quotes, like 'task_status.0.date', for the object below.
These specific fields do not get stored into AzureTables - other fields go through, but these are silently ignored. How would I fix this?
var obj1 = {
"studentId": "abc",
"task_status": [
{
"status":"Current",
"date":516760078
},
{
"status":"Late",
"date":1516414446
}
],
"student_plan": "n"
}
Here is how I am using it (simplified code example). Again, the entity is successfully written to the table, but the flattened properties are not (see further below):
var azure = require('azure-storage'); // assuming the azure-storage package provides TableUtilities
var flatten = require('flat');
var newObj1 = flatten(obj1);
var entGen = azure.TableUtilities.entityGenerator;
newObj1.PartitionKey = entGen.String(uniqueIDFromMyDB);
newObj1.RowKey = entGen.String(uniqueStudentId);
tableService.insertEntity(myTableName, newObj1, myCallbackFunc);
In the above example, the flattened object would look like this:
var newObj1 = {
studentId: "abc",
'task_status.0.status': 'Current',
'task_status.0.date': 516760078,
'task_status.1.status': 'Late',
'task_status.1.date': 1516414446,
student_plan: "n"
}
Then I add PartitionKey and RowKey.
All the task_status fields silently fail to be inserted.
EDIT: This has nothing to do with the flattening process itself. I just checked a perfectly good JSON object with keys like 'x.y.z', and Azure Tables doesn't seem to accept these column names. That almost completely destroys the value proposition of storing schema-less data without significant rework.
A . in a property (column) name is not supported, because Azure Table property names follow the naming rules for C# identifiers. You can use a custom delimiter to flatten your objects instead.
For example:
newObj1 = flatten(obj1, {delimiter: '__'});
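To illustrate the effect of the delimiter, below is a small self-contained TypeScript re-implementation of the flattening, written only for this example. It is not the flat package's code, and it only handles plain objects and arrays. It also checks each key against an identifier-style rule of letters, digits and underscores. That rule is an assumption about what Azure Tables accepts, based on its property names following C# identifier rules.

```typescript
type Flat = { [key: string]: unknown };

// Hypothetical helper: flattens nested objects/arrays, joining keys with `delimiter`.
// Array elements use their index as the key segment, like the flat package does.
function flattenWith(value: unknown, delimiter: string, prefix = "", out: Flat = {}): Flat {
  if (value !== null && typeof value === "object") {
    const obj = value as { [key: string]: unknown };
    for (const k of Object.keys(obj)) {
      flattenWith(obj[k], delimiter, prefix === "" ? k : prefix + delimiter + k, out);
    }
  } else {
    out[prefix] = value;
  }
  return out;
}

const obj1 = {
  studentId: "abc",
  task_status: [
    { status: "Current", date: 516760078 },
    { status: "Late", date: 1516414446 },
  ],
  student_plan: "n",
};

const flat = flattenWith(obj1, "__");
console.log(Object.keys(flat)); // keys such as task_status__0__status, task_status__1__date

// Assumed rule: letters, digits and underscores only, not starting with a digit.
const identifierLike = /^[A-Za-z_][A-Za-z0-9_]*$/;
console.log(Object.keys(flat).every((k) => identifierLike.test(k))); // true
```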

Which JSONPath expression should I use to split my JSON string?

I want to apply SplitJson in order to split the following JSON file into 2 FlowFiles (according to hits):
{"took":0,"timed_out":false,"_shards":
{"total":5,"successful":5,"failed":0},
"hits":{"total":2,"max_score":0.0,
"hits":
[
{"_index":"my_index","_type":"my_entry","_id":"111","_score":0.0,"_source":{"ZoneId":"1","OriginId":"1"},
"fields":{"ttime":[11000]}},
{"_index":"my_index","_type":"my_entry","_id":"222","_score":0.0,"_source":{"ZoneId":"1","OriginId":"2"},
"fields":{"ttime":[5000]}}
]
}
}
Which JsonPath expression should I use? I tried $.hits[*], but it splits the content by the first-level hits. In my case I have hits[hits[...]]; how should I specify that in the expression?
UPDATE:
I want to get two FlowFiles:
FlowFile #1: {"_index":"my_index","_type":"my_entry","_id":"111","_score":0.0,"_source":{"ZoneId":"1","OriginId":"1"},"fields":{"ttime":[11000]}}
FlowFile #2:
{"_index":"my_index","_type":"my_entry","_id":"222","_score":0.0,"_source":{"ZoneId":"1","OriginId":"2"},"fields":{"ttime":[5000]}}
Assuming the document has been parsed into a JavaScript object named data,
var arr = data.hits.hits;
gives you the array with the 2 objects you want.
var o1 = arr[0];
var o2 = arr[1];
gives you the 2 objects themselves.
var json1 = JSON.stringify(arr[0]);
var json2 = JSON.stringify(arr[1]);
gives you the 2 JSON strings you asked for.
Is this what you needed?
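If the number of hits is not known in advance, the stringify step above generalizes with map. The TypeScript sketch below assumes the response has already been parsed into an object. The SearchResponse type is hypothetical, and the document is a trimmed copy of the question's with some fields removed.

```typescript
interface SearchResponse {
  hits: { total: number; hits: unknown[] };
}

const response: SearchResponse = JSON.parse(
  '{"took":0,"hits":{"total":2,"max_score":0.0,"hits":[' +
    '{"_id":"111","_source":{"ZoneId":"1","OriginId":"1"},"fields":{"ttime":[11000]}},' +
    '{"_id":"222","_source":{"ZoneId":"1","OriginId":"2"},"fields":{"ttime":[5000]}}' +
    "]}}",
);

// One JSON string per element of hits.hits, i.e. what $.hits.hits[*] selects.
const parts: string[] = response.hits.hits.map((hit) => JSON.stringify(hit));
parts.forEach((p) => console.log(p));
```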
You can use an online JSONPath tester to check the expression for your case.
The right answer is $.hits.hits[*].
As DanteTheSmith mentioned, you can also simply use $.hits.hits in your case. Which one to use depends on the post-processing; both work fine.

JSON - look up values in array

With the following json
{
"Count":0,
"Message":{
"AppId":0
},
"Data":"[{\"application_name\": \"Grand Central\",\"feature_name\": \"1 Click Fix\",\"access_type_id\": 2,\"member_name\": \"GC_Remote_Support_Security\"},{\"application_name\": \"Grand Central\",\"feature_name\": \"Account Details\",\"access_type_id\": 2,\"member_name\": \"GC_Remote_Support_Security\"},{\"application_name\": \"Grand Central\",\"feature_name\": \"Account Summary\",\"access_type_id\": 2,\"member_name\": \"GC_Remote_Support_Security\"}]"
}
how do I go through the Data array, in the most succinct coding manner possible, to see if any feature_name matches a given string?
Since your JSON contains nested, quoted JSON, you need to deserialize twice using LINQ to JSON to parse your Data array. Having done so, you can use SelectTokens with a JSONPath query to find nested properties named feature_name, then check their values:
var testString = "Account Summary";
var found = JToken.Parse(JObject.Parse(jsonString)["Data"].ToString()).SelectTokens("..feature_name").Any(t => (string)t == testString);
Debug.Assert(found == true); // No assert.
Update
If you want all the JObjects with a "feature_name" property matching a given value, you can do:
var foundItems = JToken.Parse(JObject.Parse(jsonString)["Data"].ToString())
.SelectTokens("..feature_name")
.Where(t => (string)t == testString)
.Select(t => t.Ancestors().OfType<JObject>().First()) // Get the immediate parent JObject of the matching value
.ToList();
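The key step above is parsing twice, because Data is a string that itself contains JSON. The sketch below applies the same idea in TypeScript, using a hypothetical Feature type and a shortened copy of the question's data.

```typescript
interface Feature {
  application_name: string;
  feature_name: string;
  access_type_id: number;
  member_name: string;
}

// Outer document: "Data" holds a JSON *string*, not an array.
const jsonString = JSON.stringify({
  Count: 0,
  Message: { AppId: 0 },
  Data: JSON.stringify([
    { application_name: "Grand Central", feature_name: "1 Click Fix", access_type_id: 2, member_name: "GC_Remote_Support_Security" },
    { application_name: "Grand Central", feature_name: "Account Summary", access_type_id: 2, member_name: "GC_Remote_Support_Security" },
  ]),
});

// First parse: the outer object. Second parse: the embedded Data string.
const outer = JSON.parse(jsonString) as { Data: string };
const features = JSON.parse(outer.Data) as Feature[];

const testString = "Account Summary";
const found = features.some((f) => f.feature_name === testString);
const foundItems = features.filter((f) => f.feature_name === testString);
console.log(found, foundItems.length); // true 1
```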