Related
I'm extracting AVRO data which has a JSON field that I need to get values from. The JSON has an array, and I don't know what order the different elements of the array may appear in. How can I target specific node/values?
For example, Filters[0] could be Category one time, but could be AddressType another time.
I'm extracting AVRO data - i.e.
#rs =
EXTRACT date DateTime,
Body byte[]
FROM #input_file
USING new Microsoft.Analytics.Samples.Formats.ApacheAvro.AvroExtractor(#"
...
The Body is JSON that can look like this (but Category is not always Filter[0]. This is a small example; there are 7 different types of "Field"s):
{
""TimeStamp"": ""2019-02-19T15:00:29.1067771-05:00"",
""Filters"": [{
""Operator"": ""eq"",
""Field"": ""Category"",
""Value"": ""Sale""
}, {
""Operator"": ""eq"",
""Field"": ""AddressType"",
""Value"": ""Home""
}
]
}
My U-SQL looks like this, which of course does not always work.
#keyvalues =
SELECT JsonFunctions.JsonTuple(Encoding.UTF8.GetString(Body),
"TimeStamp",
"$.Filters[?(#.Field == 'Category')].Value",
"$.Filters[?(#.Field == 'AddressType')].Value"
) AS message
FROM #rs;
#results =
SELECT
message["TimeStamp"] AS TimeStamp,
message["Filters[0].Value"] AS Category,
message["Filters[1].Value"] AS AddressType
FROM #keyvalues;
Although this does not actually answer my question, as a workaround, I modified the Microsoft 'sample' JsonFunctions.JsonTuple method to be able to specify my own key name for extracted values:
/// Added - Prefix a path expression with a specified key. Use key~$e in the expression.
/// eg:
/// JsonTuple(json, "myId~id", "myName~name") -> field names MAP{ {myId, 1 }, {myName, Ed } }
The modified code:
private static IEnumerable<KeyValuePair<string, T>> ApplyPath<T>(JToken root, string path)
{
var keySeparatorPos = path.IndexOf("~");
string key = null;
var searchPath = path;
if (keySeparatorPos > 0) // =0?if just a leading "=", i.e. no key provided, then don't parse out a key.
{
key = path.Substring(0, keySeparatorPos).Trim();
searchPath = path.Substring(keySeparatorPos + 1);
}
// Children
var children = SelectChildren<T>(root, searchPath);
foreach (var token in children)
{
// Token => T
var value = (T)JsonFunctions.ConvertToken(token, typeof(T));
// Tuple(path, value)
yield return new KeyValuePair<string, T>(key ?? token.Path, value);
}
}
For example, I can access vales and name them
#keyvalues =
SELECT JsonFunctions.JsonTuple(Encoding.UTF8.GetString(Body),
"TimeStamp",
"EventName",
"Plan~ $.UrlParams.plan",
"Category~ $.Filters[?(#.Field == 'Category')].Value",
"AddressType~ $.Filters[?(#.Field == 'AddressType')].Value"
) AS message
FROM #rs;
#results =
SELECT
message["TimeStamp"] AS TimeStamp,
message["EventName"] AS EventName,
message["Plan"] AS Plan,
message["Category"] AS Category,
message["AddressType"] AS AddressType
FROM #keyvalues;
(I've not tested to see what would happen if the same Field appears multiple times in the array; That won't happen in my case)
What is the relationship between the kid supplied to MediaKeySession.generateRequest() and the one provided via MediaKeyMessageEvent?
If they are supposed to be the same - why are they different in the code below? note, this won't run here due to security restrictions
navigator.requestMediaKeySystemAccess("org.w3.clearkey", [{
initDataTypes: ['webm'],
audioCapabilities: [{
contentType: 'audio/webm; codecs="opus"'
}],
videoCapabilities: [{
contentType: 'video/webm; codecs="vp8"'
},
{
contentType: 'video/webm; codecs="vp9"'
}
],
}]).then((keySystemAccess) => {
return keySystemAccess.createMediaKeys();
}).then((mediaKeys) => {
var session = mediaKeys.createSession("temporary");
var keyId = "VHM2iIMGiSg";
var initData = '{"kids":["' + keyId + '"]}';
console.log(keyId);
session.addEventListener('message', (evt) => {
var requestJson = new TextDecoder().decode(evt.message);
var request = JSON.parse(requestJson);
console.log(request.kids[0]);
});
this.session.generateRequest("webm", new TextEncoder().encode(initData));
});
output:
VHM2iIMGiSg
eyJraWRzIjpbIlZITTJpSU1HaVNnIl19
expected output is for second line to also be VHM2iIMGiSg
eyJraWRzIjpbIlZITTJpSU1HaVNnIl19 is the base64url encoded value of initData that was passed to generateRequest.
The reason that request.kids[0] is the full value of initData and not the value of keyId is because generateRequest was invoked with the initDataType parameter set to webm. Had the initDataType parameter been set to keyids then request.kids[0] would be the value of keyId.
When the initDataType parameter is set to webm the initData parameter is expected to be a single key ID of one or more bytes. Whereas when the initDataType parameter is set to keyids the initData parameter is expected to be a JSON object encoded as UTF-8, containing a single member kids which is an array of base64url encoded Key ID(s).
I'm trying to use a PCRE regular expression to extract some JSON. I'm using a version of MariaDB which does not have JSON functions but does have REGEX functions.
My string is:
{"device_types":["smartphone"],"isps":["a","B"],"network_types":[],"countries":[],"category":["Jebb","Bush"],"carriers":[],"exclude_carriers":[]}
I want to grab the contents of category. I'd like a matching group that contains 2 items, Jebb and Bush (or however many items are in the array).
I've tried this pattern but it only matches the first occurrence: /(?<=category":\[).([^"]*).*?(?=\])/g
Does this match your needs? It should match the category array regardless of its size.
"category":(\[.*?\])
regex101 example
JSON not a regular language. Since it allows arbitrary embedding of balanced delimiters, it must be at least context-free.
For example, consider an array of arrays of arrays:
[ [ [ 1, 2], [2, 3] ] , [ [ 3, 4], [ 4, 5] ] ]
Clearly you couldn't parse that with true regular expressions.
See This Topic:
Regex for parsing single key: values out of JSON in Javascript
Maybe Helpful for you.
Using a set of non-capturing group you can extract a predefined json array
regex answer: (?:\"category\":)(?:\[)(.*)(?:\"\])
That expression extract "category":["Jebb","Bush"], so access the first group
to extract the array, sample java code:
Pattern pattern = Pattern.compile("(?:\"category\":)(?:\\[)(.*)(?:\"\\])");
String body = "{\"device_types\":[\"smartphone\"],\"isps\":[\"a\",\"B\"],\"network_types\":[],\"countries\":[],\"category\":[\"Jebb\",\"Bush\"],\"carriers\":[],\"exclude_carriers\":[]}";
Matcher matcher = pattern.matcher(body);
assertThat(matcher.find(), is(true));
String[] categories = matcher.group(1).replaceAll("\"","").split(",");
assertThat(categories.length, is(2));
assertThat(categories[0], is("Jebb"));
assertThat(categories[1], is("Bush"));
There are many ways. One sloppy way to do it is /([A-Z])\w+/g
Please try it on your console like
var data = '{"device_types":["smartphone"],"isps":["a","B"],"network_types":[],"countries":[],"category":["Jebb","Bush"],"carriers":[],"exclude_carriers":[]}',
res = [];
data.match(/([A-Z])\w+/g); // ["Jebb", "Bush"]
OK the above was pretty sloppy however a solid single regex solution to extract every single element regardless of the number, one by one and to place them in an array (res) is the following...
var rex = /[",]+(\w*)(?=[",\w]*"],"carriers)/g,
str = '{"device_types":["smartphone"],"isps":["a","B"],"network_types":[],"countries":[],"category":["Jebb","Bush","Donald","Trump"],"carriers":[],"exclude_carriers":[]}',
arr = [],
res = [];
while ((arr = rex.exec(str)) !== null) {
res.push(arr[1]); // <- ["Jebb", "Bush", "Donald", "Trump"]
}
Check it out # http://regexr.com/3d4ee
OK lets do it. I have come up with a devilish idea. If JS had look-behinds this could have been done simply by reversing the applied logic in the previous example where i had used a look-forward. Alas, there aren't... So i decided to turn the world the other way around. Check this out.
String.prototype.reverse = function(){
return this.split("").reverse().join("");
};
var rex = /[",]+(\w*)(?=[",\w]*"\[:"yrogetac)/g,
str = '{"device_types":["smartphone"],"isps":["a","B"],"network_types":[],"countries":[],"category":["Jebb","Bush","Donald","Trump"],"carriers":[],"exclude_carriers":[]}',
rev = str.reverse();
arr = [],
res = [];
while ((arr = rex.exec(rev)) !== null) {
res.push(arr[1].reverse()); // <- ["Trump", "Donald", "Bush", "Jebb"]
}
res.reverse(); // <- ["Jebb", "Bush", "Donald", "Trump"]
Just use your console to confirm.
In c++ you can do it like this
bool foundmatch = false;
try {
std::regex re("\"([a-zA-Z]+)\"*.:*.\\[[^\\]\r\n]+\\]");
foundmatch = std::regex_search(subject, re);
} catch (std::regex_error& e) {
// Syntax error in the regular expression
}
If the number of items in the array is limited (and manageable), you could define it with a finite number of optional items. Like this one with a maximum of 5 items:
"category":\["([^"]*)"(?:,"([^"]*)"(?:,"([^"]*)"(?:,"([^"]*)"(?:,"([^"]*)")?)?)?)?
regex101 example here.
Regards.
I am confused about accessing the contents of some JSON data that I have decoded. Here is an example
I don't understand why this solution works and my own does not. My questions are rephrased below
my $json_raw = getJSON();
my $content = decode_json($json_raw);
print Data::Dumper($content);
At this point my JSON data has been transformed into this
$VAR1 = { 'items' => [ 1, 2, 3, 4 ] };
My guess tells me that, once decoded, the object will be a hash with one element that has the key items and an array reference as the value.
$content{'items'}[0]
where $content{'items'} would obtain the array reference, and the outer $...[0] would access the first element in the array and interpret it as a scalar. However this does not work. I get an error message use of uninitialized value [...]
However, the following does work:
$content->{items}[0]
where $content->{items} yields the array reference and [0] accesses the first element of that array.
Questions
Why does $content{'items'} not return an array reference? I even tried #{content{'items'}}, thinking that, once I got the value from content{'items'}, it would need to be interpreted as an array. But still, I receive the uninitialized array reference.
How can I access the array reference without using the arrow operator?
Beginner's answer to beginner :) Sure not as profesional as should be, but maybe helps you.
use strict; #use this all times
use warnings; #this too - helps a lot!
use JSON;
my $json_str = ' { "items" : [ 1, 2, 3, 4 ] } ';
my $content = decode_json($json_str);
You wrote:
My guess tells me that, once decoded, the object will be a hash with
one element that has the key items and an array reference as the value.
Yes, it is a hash, but the the decode_json returns a reference, in this case, the reference to hash. (from the docs)
expects an UTF-8 (binary) string and tries to parse that
as an UTF-8 encoded JSON text,
returning the resulting reference.
In the line
my $content = decode_json($json_str);
you assigning to an SCALAR variable (not to hash).
Because you know: it is a reference, you can do the next:
printf "reftype:%s\n", ref($content);
#print: reftype:HASH ^
#therefore the +------- is a SCALAR value containing a reference to hash
It is a hashref - you can dump all keys
print "key: $_\n" for keys %{$content}; #or in short %$content
#prints: key: items
also you can assing the value of the "items" (arrayref) to an scalar variable
my $aref = $content->{items}; #$hashref->{key}
#or
#my $aref = ${$content}{items}; #$hash{key}
but NOT
#my $aref = $content{items}; #throws error if "use strict;"
#Global symbol "%content" requires explicit package name at script.pl line 20.
The $content{item} is requesting a value from the hash %content and you never defined/assigned such variable. the $content is an scalar variable not hash variable %content.
{
#in perl 5.20 you can also
use 5.020;
use experimental 'postderef';
print "key-postderef: $_\n" for keys $content->%*;
}
Now step deeper - to the arrayref - again you can print out the reference type
printf "reftype:%s\n", ref($aref);
#reftype:ARRAY
print all elements of array
print "arr-item: $_\n" for #{$aref};
but again NOT
#print "$_\n" for #aref;
#dies: Global symbol "#aref" requires explicit package name at script.pl line 37.
{
#in perl 5.20 you can also
use 5.020;
use experimental 'postderef';
print "aref-postderef: $_\n" for $aref->#*;
}
Here is an simple rule:
my #arr; #array variable
my $arr_ref = \#arr; #scalar - containing a reference to #arr
#{$arr_ref} is the same as #arr
^^^^^^^^^^ - array reference in curly brackets
If you have an $arrayref - use the #{$array_ref} everywhere you want use the array.
my %hash; #hash variable
my $hash_ref = \%hash; #scalar - containing a reference to %hash
%{$hash_ref} is the same as %hash
^^^^^^^^^^^ - hash reference in curly brackets
If you have an $hash_ref - use the %{$hash_ref} everywhere you want use the hash.
For the whole structure, the following
say $content->{items}->[0];
say $content->{items}[0];
say ${$content}{items}->[0];
say ${$content}{items}[0];
say ${$content->{items}}[0];
say ${${$content}{items}}[0];
prints the same value 1.
$content is a hash reference, so you always need to use an arrow to access its contents. $content{items} would refer to a %content hash, which you don't have. That's where you're getting that "use of uninitialized value" error from.
I actually asked a similar question here
The answer:
In Perl, a function can only really return a scalar or a list.
Since hashes can be initialized or assigned from lists (e.g. %foo = (a => 1, b => 2)), I guess you're asking why json_decode returns something like { a => 1, b => 2 } (a reference to an anonymous hash) rather than (a => 1, b => 2) (a list that can be copied into a hash).
I can think of a few good reasons for this:
in Perl, an array or hash always contains scalars. So in something like { "a": { "b": 3 } }, the { "b": 3 } part has to be a scalar; and for consistency, it makes sense for the whole thing to be a scalar in the same way.
if the hash is quite large (many keys at top-level), it's pointless and expensive to iterate over all the elements to convert it into a list, and then build a new hash from that list.
in JSON, the top-level element can be either an object (= Perl hash) or an array (= Perl array). If json_decode returned a list in the former case, it's not clear what it would return in the latter case. After decoding the JSON string, how could you examine the result to know what to do with it? (And it wouldn't be safe to write %foo = json_decode(...) unless you already knew that you had a hash.) So json_decode's behavior works better for any general-purpose library code that has to use it without already knowing very much about the data it's working with.
I have to wonder exactly what you passed as an array to json_decode, because my results differ from yours.
#!/usr/bin/perl
use JSON qw (decode_json);
use Data::Dumper;
my $json = '["1", "2", "3", "4"]';
my $fromJSON = decode_json($json);
print Dumper($fromJSON);
The result is $VAR1 = [ '1', '2', '3', '4' ];
Which is an array ref, where your result is a hash ref
So did you pass in a hash with element items which was a reference to an array?
In my example you would get the array by doing
my #array = #{ $fromJSON };
In yours
my #array = #{ $content->{'items'} }
I don't understand why you dislike the arrow operator so much!
The decode_json function from the JSON module will always return a data reference.
Suppose you have a Perl program like this
use strict;
use warnings;
use JSON;
my $json_data = '{ "items": [ 1, 2, 3, 4 ] }';
my $content = decode_json($json_data);
use Data::Dump;
dd $content;
which outputs this text
{ items => [1 .. 4] }
showing that $content is a hash reference. Then you can access the array reference, as you found, with
dd $content->{items};
which shows
[1 .. 4]
and you can print the first element of the array by writing
print $content->{items}[0], "\n";
which, again as you have found, shows just
1
which is the first element of the array.
As #cjm mentions in a comment, it is imperative that you use strict and use warnings at the start of every Perl program. If you had those in place in the program where you tried to access $content{items}, your program would have failed to compile, and you would have seen the message
Global symbol "%content" requires explicit package name
which is a (poorly-phrased) way of telling you that there is no %content so there can be no items element.
The scalar variable $content is completely independent from the hash variable %content, which you are trying to access when you write $content{items}. %content has never been mentioned before and it is empty, so there is no items element. If you had tried #{$content->{items}} then it would have worked, as would #{${$content}{items}}
If you really have a problem with the arrow operator, then you could write
print ${$content}{items}[0], "\n";
which produces the same output; but I don't understand what is wrong with the original version.
I've a function that gets 2 values name and value and now I'd like to turn them into a JSON, so for example lets say ive this function
function addin(name,val)
and then i'll call it
addin("age","28") -> this will return -> {age: 28}
addin("name","Roni") -> {name: Roni}
I've tried many things to find out how to make it done, Im getting many data so I tried this
var full_js = { };
_.forEach(data, function(val,name) {
full_js.name = val;
console.log(JSON.stringify(full_js));
});
again it can be any value and any name both of them are Random Strings.
but its not working, I get it as {"name": "Roni"} and {"name": "55"}.
thanks for the help.
use a different notation:
function addin(name, val) {
var full_js = {};
full_js[name] = val;
return JSON.stringify(full_js);
}
Ex:
at the moment you are always assigning the value to the name key, instead of the dynamic name key, keep in mind that the key will always be the string representation of the variable used.
for example, addin({}, 'rony'), will return {"[object Object]":"test"}