Elixir: find by value prefix in nested JSON

I'm trying to find URLs in a nested JSON response and map them. My function so far looks like this:
def list(env, id) do
  Service.get_document(env, id)
  |> Poison.decode!
  |> Enum.find(fn {_key, val} -> String.starts_with?(val, 'https') end)
end
The JSON looks roughly like this:
"stacks": [
{
"boxes": [
{
"content": "https://ddd.cloudfront.net/photos/uploaded_images/000/001/610/original/1449447147677.jpg?1505956120",
"box": "photo"
}
]
}
],
"logo": "https://ddd.cloudfront.net/users/cmyk_banners/000/000/002/original/banner_CMYK.jpg?1397201875"
So URLs can have any key, and be at any level.
With that code I get this error:
no function clause matching in String.starts_with?/2
Does anyone have a better way to search nested JSON responses?

You'll have to use a recursive function for this. (The error occurs because Enum.find only walks the top level of the decoded map, so val can be a map or a list rather than a string; note also that 'https' is a charlist, while String.starts_with?/2 expects the binary "https".) The function handles three types of data:
- For a map, it recurses over all its values.
- For a list, it recurses over all its elements.
- For a string, it selects strings that start with "https".
Here's a simple implementation which accepts a term and a string to check with starts_with?:
defmodule A do
  def recursive_starts_with(thing, start, acc \\ [])

  def recursive_starts_with(binary, start, acc) when is_binary(binary) do
    if String.starts_with?(binary, start) do
      [binary | acc]
    else
      acc
    end
  end

  def recursive_starts_with(map, start, acc) when is_map(map) do
    Enum.reduce(map, acc, fn {_, v}, acc -> A.recursive_starts_with(v, start, acc) end)
  end

  def recursive_starts_with(list, start, acc) when is_list(list) do
    Enum.reduce(list, acc, fn v, acc -> A.recursive_starts_with(v, start, acc) end)
  end

  # Numbers, booleans and nil (all valid JSON values) fall through unchanged.
  def recursive_starts_with(_other, _start, acc), do: acc
end
data = %{
  "stacks" => [
    %{
      "boxes" => [
        %{
          "content" => "https://ddd.cloudfront.net/photos/uploaded_images/000/001/610/original/1449447147677.jpg?1505956120",
          "box" => "photo"
        }
      ]
    }
  ],
  "logo" => "https://ddd.cloudfront.net/users/cmyk_banners/000/000/002/original/banner_CMYK.jpg?1397201875"
}
data |> A.recursive_starts_with("https") |> IO.inspect
Output:
["https://ddd.cloudfront.net/photos/uploaded_images/000/001/610/original/1449447147677.jpg?1505956120",
"https://ddd.cloudfront.net/users/cmyk_banners/000/000/002/original/banner_CMYK.jpg?1397201875"]

Related

Auto-generate json paths from given json structure using Python

I am trying to auto-generate JSON paths from a given JSON structure, but I am stuck on the programmatic part. Can someone please help out with an idea to take it further?
Below is the code I have so far.
import json

def iterate_dict(dict_data, key, tmp_key):
    for k, v in dict_data.items():
        key = key + tmp_key + '.' + k
        key = key.replace('$$', '$')
        if type(v) is dict:
            tmp_key = key
            key = '$'
            iterate_dict(v, key, tmp_key)
        elif type(v) is list:
            str_encountered = False
            for i in v:
                if type(i) is str:
                    str_encountered = True
                    tmp_key = key
                    break
                tmp_key = key
                key = '$'
                iterate_dict(i, key, tmp_key)
            if str_encountered:
                print(key, v)
                if tmp_key is not None:
                    tmp_key = str(tmp_key)[:-str(k).__len__() - 1]
                key = '$'
        else:
            print(key, v)
            key = '$'

iterate_dict(dict(json.loads(d_data)), '$', '')
Consider the below JSON structure:
{
  "id": "1",
  "categories": [
    {
      "name": "author",
      "book": "fiction",
      "leaders": [
        {
          "ref": ["wiki", "google"],
          "athlete": {
            "$ref": "some data"
          },
          "data": {
            "$data": "some other data"
          }
        }
      ]
    },
    {
      "name": "dummy name"
    }
  ]
}
Expected output from the Python script:
$id = 1
$categories[0].name = author
$categories[0].book = fiction
$categories[0].leaders[0].ref[0] = wiki
$categories[0].leaders[0].ref[1] = google
$categories[0].leaders[0].athlete.$ref = some data
$categories[0].leaders[0].data.$data = some other data
$categories[1].name = dummy name
Current output from the above Python script:
$.id 1
$$.categories.name author
$$.categories.book fiction
$$$.categories.leaders.ref ["wiki", "google"]
$$$$$.categories.leaders.athlete.$ref some data
$$$$$$.categories.leaders.athlete.data.$data some other data
$$.name dummy name
The following recursive function is similar to yours, but instead of just displaying a dictionary, it can also take a list. This means that if you passed in a dictionary where one of the values was a nested list, then the output would still be correct (printing things like dict.key[3][4] = element).
def disp_paths(it, p='$'):
    for k, v in (it.items() if type(it) is dict else enumerate(it)):
        if type(v) is dict:
            disp_paths(v, '{}.{}'.format(p, k))
        elif type(v) is list:
            for i, e in enumerate(v):
                if type(e) is dict or type(e) is list:
                    disp_paths(e, '{}.{}[{}]'.format(p, k, i))
                else:
                    print('{}.{}[{}] = {}'.format(p, k, i, e))
        else:
            f = '{}.{} = {}' if type(it) is dict else '{}[{}] = {}'
            print(f.format(p, k, v))
which, when run with your dictionary (disp_paths(d)), gives the expected output of:
$.categories[0].leaders[0].athlete.$ref = some data
$.categories[0].leaders[0].data.$data = some other data
$.categories[0].leaders[0].ref[0] = wiki
$.categories[0].leaders[0].ref[1] = google
$.categories[0].book = fiction
$.categories[0].name = author
$.categories[1].name = dummy name
$.id = 1
Note that the output is unfortunately not ordered; on Python versions before 3.7, dictionaries have no guaranteed order (from 3.7 on, insertion order is preserved).
If you need help understanding my modifications, just drop a comment!
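If you need the paths as data rather than printed output, here is a minimal variant of the same traversal (collect_paths is a hypothetical helper, not part of the original answer) that returns (path, value) pairs instead of printing:
def collect_paths(it, p='$', out=None):
    # Same traversal as disp_paths, but collects results instead of printing.
    if out is None:
        out = []
    for k, v in (it.items() if type(it) is dict else enumerate(it)):
        path = '{}.{}'.format(p, k) if type(it) is dict else '{}[{}]'.format(p, k)
        if type(v) in (dict, list):
            collect_paths(v, path, out)
        else:
            out.append((path, v))
    return out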

Play JSON Reads[T]: split a JsArray into multiple subsets

I have a JSON structure that contains an array of events. The array is "polymorphic" in the sense that there are three possible event types A, B and C:
{
  ...
  "events": [
    { "eventType": "A", ... },
    { "eventType": "B", ... },
    { "eventType": "C", ... },
    ...
  ]
}
The three event types don't have the same object structure, so I need different Reads for them. And apart from that, the target case class of the whole JSON document distinguishes between the events:
case class Doc(
  ...,
  aEvents: Seq[EventA],
  bEvents: Seq[EventB],
  cEvents: Seq[EventC],
  ...
)
How can I define the internals of Reads[Doc] so that the json array events is split into three subsets which are mapped to aEvents, bEvents and cEvents?
What I tried so far (without being successful):
First, I defined a Reads[JsArray] to transform the original JsArray to another JsArray that only contains events of a particular type:
def eventReads(eventTypeName: String) = new Reads[JsArray] {
  override def reads(json: JsValue): JsResult[JsArray] = json match {
    case JsArray(seq) =>
      val filtered = seq.filter { jsVal =>
        (jsVal \ "eventType").asOpt[String].contains(eventTypeName)
      }
      JsSuccess(JsArray(filtered))
    case _ => JsError("Must be an array")
  }
}
Then the idea is to use it like this within Reads[Doc]:
implicit val docReads: Reads[Doc] = (
  ...
  (__ \ "events").read[JsArray](eventReads("A")).andThen... and
  (__ \ "events").read[JsArray](eventReads("B")).andThen... and
  (__ \ "events").read[JsArray](eventReads("C")).andThen... and
  ...
)(Doc.apply _)
However, I don't know how to go on from here. I assume the andThen part should look something like this (in the case of event A):
.andThen[Seq[EventA]](EventA.reads)
But that doesn't work, since I'm passing a Reads[EventA] where the API expects a Reads[Seq[EventA]] in order to produce a Seq[EventA]. And apart from that, since I've never got it running, I'm not sure if this whole approach is reasonable in the first place.
Edit: in case the original JsArray contains unknown event types (e.g. D and E), these types should be ignored and left out of the final result (instead of making the whole Reads fail).
Put an implicit Reads for every event type, like:
def eventRead[A](et: String, er: Reads[A]) = (__ \ "eventType").read[String].filter(_ == et).andKeep(er)
implicit val eventARead = eventRead("A", Json.reads[EventA])
implicit val eventBRead = eventRead("B", Json.reads[EventB])
implicit val eventCRead = eventRead("C", Json.reads[EventC])
and use a Reads[Doc] that folds the event list into separate sequences by type and applies the result to Doc:
implicit val docReads: Reads[Doc] = (__ \ "events").read[List[JsValue]].map(
  _.foldLeft[JsResult[(Seq[EventA], Seq[EventB], Seq[EventC])]](
    JsSuccess((Seq.empty[EventA], Seq.empty[EventB], Seq.empty[EventC]))
  ) {
    case (JsSuccess(a, _), v) =>
      (v.validate[EventA].map(e => a.copy(_1 = e +: a._1)) or
        v.validate[EventB].map(e => a.copy(_2 = e +: a._2)) or
        v.validate[EventC].map(e => a.copy(_3 = e +: a._3)))
    case (e, _) => e
  }
).flatMap(p => Reads[Doc] { js => p.map(Doc.tupled) })
It creates the Doc in one pass through the events list (each sequence comes out in reverse encounter order, since elements are prepended):
JsSuccess(Doc(List(EventA(a)),List(EventB(b2), EventB(b1)),List(EventC(c))),)
The source data:
val json = Json.parse("""{
  "events": [
    { "eventType": "A", "e": "a"},
    { "eventType": "B", "ev": "b1"},
    { "eventType": "C", "event": "c"},
    { "eventType": "B", "ev": "b2"}
  ]
}""")
case class EventA(e: String)
case class EventB(ev: String)
case class EventC(event: String)
I would model the different event types stored in your JS array as a class hierarchy, to keep things type-safe.
sealed abstract class Event
case class EventA() extends Event
case class EventB() extends Event
case class EventC() extends Event
Then you can store all your events in a single collection and use pattern matching later to refine them. For example:
case class Doc(events: Seq[Event]) {
  def getEventsA: Seq[EventA] = events.flatMap(_ match {
    case e: EventA => Some(e)
    case _ => None
  })
}
Doc(Seq(EventA(), EventB(), EventC())).getEventsA // res0: Seq[EventA] = List(EventA())
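As an aside (not part of the original answer), Scala's collect expresses the same refinement more tersely: def getEventsA: Seq[EventA] = events.collect { case e: EventA => e }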
For implementing your Reads, Doc will be naturally mapped to the case class; you only need to provide a mapping for Event. Here is what it could look like:
implicit val eventReads = new Reads[Event] {
  override def reads(json: JsValue): JsResult[Event] = json \ "eventType" match {
    case JsDefined(JsString("A")) => JsSuccess(EventA())
    case JsDefined(JsString("B")) => JsSuccess(EventB())
    case JsDefined(JsString("C")) => JsSuccess(EventC())
    case _ => JsError("???")
  }
}
implicit val docReads = Json.reads[Doc]
You can then use it like this:
val jsValue = Json.parse("""
{
  "events": [
    { "eventType": "A"},
    { "eventType": "B"},
    { "eventType": "C"}
  ]
}
""")
val docJsResults = docReads.reads(jsValue) // docJsResults: play.api.libs.json.JsResult[Doc] = JsSuccess(Doc(List(EventA(), EventB(), EventC())),/events)
docJsResults.get.events.length // res1: Int = 3
docJsResults.get.getEventsA // res2: Seq[EventA] = List(EventA())
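Note that eventReads returns a JsError for unknown event types, so a single unexpected eventType fails the whole document. If, per the question's edit, unknown types such as D should simply be dropped, one option (a sketch, not part of the original answer; it reuses the eventReads above) is to validate each array element individually and keep only the successes:
implicit val docReads: Reads[Doc] =
  (__ \ "events").read[Seq[JsValue]]
    // validate[Event].asOpt is None for unknown types, so they are skipped
    .map(vs => Doc(vs.flatMap(_.validate[Event].asOpt)))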
Hope this helps.

Logstash indexing JSON arrays

Logstash is awesome. I can send it JSON like this (multi-lined for readability):
{
  "a": "one",
  "b": {
    "alpha": "awesome"
  }
}
And then query for that line in kibana using the search term b.alpha:awesome. Nice.
However I now have a JSON log line like this:
{
  "different": [
    {
      "this": "one",
      "that": "uno"
    },
    {
      "this": "two"
    }
  ]
}
And I'd like to be able to find this line with a search like different.this:two (or different.this:one, or different.that:uno)
If I was using Lucene directly I'd iterate through the different array, and generate a new search index for each hash within it, but Logstash currently seems to ingest that line like this:
different: {this: one, that: uno}, {this: two}
Which isn't going to help me searching for log lines using different.this or different.that.
Anyone got any thoughts as to a codec, filter or code change I can make to enable this?
You can write your own filter (copy & paste, rename the class name and the config_name, and rewrite the filter(event) method) or modify the current JSON filter (source on GitHub).
You can find the JSON filter (Ruby class) source code under the path logstash-1.x.x\lib\logstash\filters, named json.rb. The JSON filter parses the content as JSON as follows:
begin
  # TODO(sissel): Note, this will not successfully handle json lists
  # like your text is '[ 1,2,3 ]' JSON.parse gives you an array (correctly)
  # which won't merge into a hash. If someone needs this, we can fix it
  # later.
  dest.merge!(JSON.parse(source))
  # If no target, we target the root of the event object. This can allow
  # you to overwrite @timestamp. If so, let's parse it as a timestamp!
  if !@target && event[TIMESTAMP].is_a?(String)
    # This is a hack to help folks who are mucking with @timestamp during
    # their json filter. You aren't supposed to do anything with
    # "@timestamp" outside of the date filter, but nobody listens... ;)
    event[TIMESTAMP] = Time.parse(event[TIMESTAMP]).utc
  end
  filter_matched(event)
rescue => e
  event.tag("_jsonparsefailure")
  @logger.warn("Trouble parsing json", :source => @source,
               :raw => event[@source], :exception => e)
  return
end
You can modify the parsing procedure to alter the original JSON:
json = JSON.parse(source)
if json.is_a?(Hash)
  json.each do |key, value|
    if value.is_a?(Array)
      value.each_with_index do |object, index|
        # modify as you need
        object["index"] = index
      end
    end
  end
end
# save modified json
# ......
dest.merge!(json)
Then you can modify your config file to use your new/modified JSON filter, placed in \logstash-1.x.x\lib\logstash\config.
This is my elastic_with_json.conf with a modified json.rb filter:
input {
  stdin {
  }
}
filter {
  json {
    source => "message"
  }
}
output {
  elasticsearch {
    host => localhost
  }
  stdout {
  }
}
If you want to use your new filter, you can configure it with the config_name:
class LogStash::Filters::Json_index < LogStash::Filters::Base
  config_name "json_index"
  milestone 2
  ....
end
and configure it:
input {
  stdin {
  }
}
filter {
  json_index {
    source => "message"
  }
}
output {
  elasticsearch {
    host => localhost
  }
  stdout {
  }
}
Hope this helps.
For a quick and dirty hack, I used the Ruby filter with the code below; no need to use the out-of-the-box 'json' filter anymore.
input {
  stdin {}
}
filter {
  grok {
    match => ["message", "(?<json_raw>.*)"]
  }
  ruby {
    init => "
      def parse_json obj, pname=nil, event
        obj = JSON.parse(obj) unless obj.is_a? Hash
        obj = obj.to_hash unless obj.is_a? Hash
        obj.each {|k,v|
          p = pname.nil? ? k : pname
          if v.is_a? Array
            v.each_with_index {|oo,ii|
              parse_json_array(oo,ii,p,event)
            }
          elsif v.is_a? Hash
            parse_json(v,p,event)
          else
            p = pname.nil? ? k : [pname,k].join('.')
            event[p] = v
          end
        }
      end

      def parse_json_array obj, i, pname, event
        obj = JSON.parse(obj) unless obj.is_a? Hash
        pname_ = pname
        if obj.is_a? Hash
          obj.each {|k,v|
            p = [pname_,i,k].join('.')
            if v.is_a? Array
              v.each_with_index {|oo,ii|
                parse_json_array(oo,ii,p,event)
              }
            elsif v.is_a? Hash
              parse_json(v,p, event)
            else
              event[p] = v
            end
          }
        else
          n = [pname_, i].join('.')
          event[n] = obj
        end
      end
    "
    code => "parse_json(event['json_raw'].to_s,nil,event) if event['json_raw'].to_s.include? ':'"
  }
}
output {
  stdout { codec => rubydebug }
}
Test JSON structure:
{"id":123, "members":[{"i":1, "arr":[{"ii":11},{"ii":22}]},{"i":2}], "im_json":{"id":234, "members":[{"i":3},{"i":4}]}}
and this is what's output:
{
  "message" => "{\"id\":123, \"members\":[{\"i\":1, \"arr\":[{\"ii\":11},{\"ii\":22}]},{\"i\":2}], \"im_json\":{\"id\":234, \"members\":[{\"i\":3},{\"i\":4}]}}",
  "@version" => "1",
  "@timestamp" => "2014-07-25T00:06:00.814Z",
  "host" => "Leis-MacBook-Pro.local",
  "json_raw" => "{\"id\":123, \"members\":[{\"i\":1, \"arr\":[{\"ii\":11},{\"ii\":22}]},{\"i\":2}], \"im_json\":{\"id\":234, \"members\":[{\"i\":3},{\"i\":4}]}}",
  "id" => 123,
  "members.0.i" => 1,
  "members.0.arr.0.ii" => 11,
  "members.0.arr.1.ii" => 22,
  "members.1.i" => 2,
  "im_json" => 234,
  "im_json.0.i" => 3,
  "im_json.1.i" => 4
}
The solution I liked is the Ruby filter, because it doesn't require us to write another filter. However, that solution creates fields at the "root" of the JSON, and it's hard to keep track of how the original document looked.
I came up with something similar that's easier to follow; it's recursive, so it's cleaner:
ruby {
  init => "
    def arrays_to_hash(h)
      h.each do |k,v|
        # If v is nil, an array is being iterated and the value is k.
        # If v is not nil, a hash is being iterated and the value is v.
        value = v || k
        if value.is_a?(Array)
          # "value" is replaced with "value_hash" later.
          value_hash = {}
          value.each_with_index do |v, i|
            value_hash[i.to_s] = v
          end
          h[k] = value_hash
        end
        if value.is_a?(Hash) || value.is_a?(Array)
          arrays_to_hash(value)
        end
      end
    end
  "
  code => "arrays_to_hash(event.to_hash)"
}
It converts arrays to hashes, with each key being the index number. More details: http://blog.abhijeetr.com/2016/11/logstashelasticsearch-best-way-to.html
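To illustrate (an example constructed from the function's logic, not taken from the original answer), the "different" array from the question:
{ "different": [ { "this": "one", "that": "uno" }, { "this": "two" } ] }
would be rewritten as:
{ "different": { "0": { "this": "one", "that": "uno" }, "1": { "this": "two" } } }
so Elasticsearch indexes leaf fields you can query, such as different.0.this:two.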

Scala: Refactoring a case-statement to use for-comprehension

I'm trying to parse the following Json into a Scala object:
{
  "oneOf": [
    { "$ref": "..." },
    { "$ref": "..." },
    { "$ref": "..." }
  ]
}
The field "oneOf" could also be "anyOf" or "allOf"; it will only be one of these values. I am constructing a case class, ComplexType, using Play's JSON library. The logic is simple; it looks for a given field and reads it if present, otherwise, checks a different field.
(json \ "allOf") match {
case a:JsArray => ComplexType("object", "allOf", a.as[Seq[JsObject]].flatMap(_.values.map(_.as[String])))
case _ =>
(json \ "anyOf") match {
case a:JsArray => ComplexType("object", "anyOf", a.as[Seq[JsObject]].flatMap(_.values.map(_.as[String])))
case _ =>
(json \ "oneOf") match {
case a:JsArray => ComplexType("object", "oneOf", a.as[Seq[JsObject]].flatMap(_.values.map(_.as[String])))
case _ => ComplexType("object", "oneOf", "Unspecified" :: Nil)
}
}
}
I'm not happy with this syntax; even though it works, I don't see why I need nested match statements for the no-match cases. I believe a for-comprehension would work well: I could check for (json \ "allOf"), (json \ "oneOf") etc. in a guard clause and yield the available result, but I'm not sure how to get the syntax correct.
Is there a more elegant way to build this case class?
Thanks,
Mike
I don't think for-comprehension is useful here.
You could make your code more readable with custom extractors like this:
class Extractor(s: String) {
  def unapply(v: JsValue): Option[JsArray] = v \ s match {
    case a: JsArray => Some(a)
    case _ => None
  }
}
val AllOf = new Extractor("allOf")
val AnyOf = new Extractor("anyOf")
val OneOf = new Extractor("oneOf")
val (name, es) = json match {
  case AllOf(a) => "allOf" -> Some(a)
  case AnyOf(a) => "anyOf" -> Some(a)
  case OneOf(a) => "oneOf" -> Some(a)
  case _ => "oneOf" -> None
}
val result =
  es
    .map { a => ComplexType("object", name, a.as[Seq[JsObject]].flatMap(_.values.map(_.as[String]))) }
    .getOrElse(ComplexType("object", name, "Unspecified" :: Nil))

Scala 2.10: Array + JSON arrays to hashmap

After reading a JSON result from a web service response:
val jsonResult: JsValue = Json.parse(response.body)
Containing content something like:
{
  result: [
    ["Name 1", "Row1 Val1", "Row1 Val2"],
    ["Name 2", "Row2 Val1", "Row2 Val2"]
  ]
}
How can I efficiently map the contents of the result array in the JSON with a list (or something similar) like:
val keys = List("Name", "Val1", "Val2")
To get an array of hashmaps?
Something like this?
This solution is functional and handles None/Failure cases "properly" (by returning a None):
import scala.util.parsing.json.JSON

val j = JSON.parseFull(json).asInstanceOf[Option[Map[String, List[List[String]]]]]
val res = j.map { m =>
  val r = m.get("result")
  r.map { ll =>
    ll.foldRight(List.empty[Map[String, String]]) { (l, acc) =>
      Map("Name" -> l(0), "Val1" -> l(1), "Val2" -> l(2)) :: acc
    }
  }.getOrElse(None)
}.getOrElse(None)
Note 1: I had to put double quotes around result in the JSON String to get the JSON parser to work
Note 2: the code could look nicer using more "monadic" sugar such as for comprehensions or using applicative functors
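For instance, the for-comprehension version hinted at in Note 2 could look like this (a sketch under the same assumptions about the parsed shape; res2 is a hypothetical name):
import scala.util.parsing.json.JSON

val res2 = for {
  m  <- JSON.parseFull(json).asInstanceOf[Option[Map[String, List[List[String]]]]]
  ll <- m.get("result")
} yield ll.map(l => Map("Name" -> l(0), "Val1" -> l(1), "Val2" -> l(2)))
// res2: Option[List[Map[String, String]]]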