KNIME MongoDB Reader: reading an object with an array in it

I tried to read an Object with the MongoDB reader and I always get the following error message:
"ERROR MongoDB Reader 0:19 Execute failed: Invalid type 19 for field value".
The type of the field value is an array.
I want to read the object and get the array inside it.
Here you can see the MongoDB collection with the object I want to read.

“Type 19” indicates a Decimal128 type; see this link.
I’d assume that this type is just not supported by the MongoDB nodes (note how the above link says “New in version 3.4.”).
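If you want to confirm that Decimal128 values are what triggers this, a quick check in the mongo shell could look like the sketch below (the collection name "mycollection" is a placeholder and the field name "value" is only taken from the error message; note that $type on an array field also matches arrays containing an element of that type):

// match documents whose "value" field holds (or contains) a Decimal128
db.mycollection.find({ value: { $type: "decimal" } })
// equivalent, using the numeric BSON type code from the error message
db.mycollection.find({ value: { $type: 19 } })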
[personal opinion] The KNIME MongoDB nodes are not very well maintained. Besides some basic “hello world” attempts, I wasn’t able to use them for any of my real-world usage scenarios.

Related

Filtering with regex vs json

When filtering logs, Logstash may use grok to parse the received log file (let's say it is Nginx logs). Parsing with grok requires you to properly set the field type - e.g., %{HTTPDATE:timestamp}.
However, if Nginx starts logging in JSON format then Logstash does very little processing. It simply creates the index and outputs to Elasticsearch. This leads me to believe that only Elasticsearch benefits from the "way" it receives the index.
Is there any advantage for Elasticsearch in having index data that was processed with regex vs. JSON? E.g., does it impact query time?
For Elasticsearch it doesn't matter how you parse the messages; it has no information about that. You only need to send a JSON document with the fields that you want to store and search on, according to your index mapping.
However, how you parse the message matters for Logstash, since it directly impacts performance.
For example, consider the following message:
2020-04-17 08:10:50,123 [26] INFO ApplicationName - LogMessage From The Application
If you want to be able to search and apply filters on each part of this message, you will need to parse it into fields.
timestamp: 2020-04-17 08:10:50,123
thread: 26
loglevel: INFO
application: ApplicationName
logmessage: LogMessage From The Application
To parse this message you can use different filters. One of them is grok, which uses regex; but if your message always has the same format, you can use another filter, such as dissect. Both will achieve the same thing here, but while grok uses regex to match the fields, dissect is purely positional, which makes a huge difference in CPU use when you have a high number of events per second.
Consider now that you have the same message, but in a JSON format.
{ "timestamp":"2020-04-17 08:10:50,123", "thread":26, "loglevel":"INFO", "application":"ApplicationName","logmessage":"LogMessage From The Application" }
It is easier and faster for Logstash to parse this message; you can do it in your input using the json codec, or you can use the json filter in your filter block.
If you have control over how your log messages are created, choose a format that lets you avoid grok.
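To make the comparison concrete, here is a minimal filter sketch for the sample message above (the grok pattern and dissect mapping are assumptions based on that exact format, so adjust them to your real logs; only one of the three options would normally be active):

filter {
  # Option A: grok, regex-based (more CPU per event)
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} \[%{NUMBER:thread}\] %{LOGLEVEL:loglevel} %{WORD:application} - %{GREEDYDATA:logmessage}" }
  }

  # Option B: dissect, purely positional and cheaper, but needs a fixed layout
  # (the timestamp is split into date + time here to keep the mapping simple)
  # dissect {
  #   mapping => { "message" => "%{date} %{time} [%{thread}] %{loglevel} %{application} - %{logmessage}" }
  # }

  # Option C: if the application already logs JSON, just parse it
  # json {
  #   source => "message"
  # }
}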

Mapping JSON to Apama Events

I am trying to use the Mapper codec in my connectivity chain to convert a JSON object that looks like this:
{"test2":[
["column1","column2","column3"],
["16091", "449", "05302018"],
["16092", "705", "05302018"]
]}
to an EPL type. To me it looks like a sequence of sequences, so I used
event test1 {
    sequence<string> values;
}
event test2 {
    sequence<test1> tests;
}
But this gives me the error
Unable to parse event test.1: Incorrect type in get (you asked for map but its' actually list)
Any ideas how I should be using the Mapper codec to this end?
Unless explicitly remapped, that won't quite work. You have to consider the entire structure of the document from top to bottom. It's not a sequence of strings - it's a JSON object/dictionary at top-level, with a value that is a sequence of sequences of string.
A JSON object/dictionary can map to an event type based on field names. So as Matt's answer said, a JSON document like yours would need an event type like
event SomeEventType {
    sequence<sequence<string> > test2;
}
If it's not appropriate to create an event type that exactly corresponds to the JSON document's structure, then you'll need to use the mapping codec to rearrange the fields in the JSON document to match the fields and sub-fields in an event type. Or possibly a custom codec; I think Matt's right that the mapper can't do exactly what you want.
Further, because JSON documents are type-less at the top-level, you'll need to make sure that the event type is defined somehow. There are multiple ways of doing that.
(1) If this particular connectivity will only send you events of one type, you can use the 'defaultEventType' configuration option of the apama.eventMap host plug-in at the top of your chain e.g.
apama.eventMap:
  defaultEventType: SomeEventType
(2) If it depends on the structure of the document, you'll need to use the classifier codec. That can take a message going towards the correlator, and assign it an event type based on the content of fields (or simply their presence). You can learn about it in the documentation.
(3) The transport will sometimes define it on messages being sent towards the correlator. For example, in the case of the Universal Messaging transport, then the 'tag' of the UM event will be used as the type. This may or may not be appropriate.
If you do end up doing anything non-trivial with the classifier or mapper, I'd strongly recommend use of the 'diagnostic codec' to help in developing the classifier or mapper rules. This is a codec you can put anywhere in the chain of codecs that will log every event going through it, so you can see how your rules are operating by seeing what happens before and after classification/mapping. You can read about it in the documentation, but it's usually as simple as putting '- diagnosticCodec' somewhere in your chain. I've found it absolutely invaluable when debugging connectivity chains.
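As a rough illustration of where the diagnostic codec sits, a chain definition could look something like the sketch below; the chain name, the transport name and the overall YAML shape are placeholders on my part, so check them against the connectivity chapter of the Apama documentation:

startChains:
  jsonSample:
    - apama.eventMap:
        defaultEventType: SomeEventType
    - diagnosticCodec        # logs each typed event just before it reaches the correlator
    # - classifierCodec: ...   (option 2: assign the event type from field contents)
    # - mapperCodec: ...       (only needed if the JSON shape has to be rearranged)
    - diagnosticCodec        # logs each message as it comes out of the JSON codec
    - jsonCodec
    - myTransport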
you want your event type to look like:
event type1 {
sequence<sequence<string> > data;
}
It's not possible in the mapper directly to convert to your test2/test1 schema, but you'd be able to write your own codec to do that, or do post-filtering in EPL.
HTH,
Matt
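For what it's worth, once the document arrives as an event of the SomeEventType shape shown further up, consuming it in EPL could look roughly like this (a sketch only; the monitor name and the log line are illustrative):

monitor ConsumeTest2 {
    action onload() {
        on all SomeEventType() as e {
            // e.test2[0] is the header row; the later rows hold the values
            log "first data row, first column: " + e.test2[1][0] at INFO;
        }
    }
}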

JSON object with one attribute, or a primitive JSON data type?

I am building a REST API which creates a resource. The resource has only one attribute which is a rather long and unique string. I am planning to send this data to the API as JSON. I see two choices for modeling the data as JSON
A primitive JSON String data type
A JSON object with one String attribute.
Both the options work.
Which of these two options is preferred for this context? And why?
Basic Answer for Returning
I would personally use option 2, which is: "A JSON object with one String attribute."
Also, in terms of design: I prefer to return an object that has a key/value pair. The key is a name that provides context as to what has been returned.
Returning just a string, basically a "" or {""}, lacks that context (the name of the returned variable).
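To make the two options concrete (the key name "value" below is just an illustrative choice):

Option 1, the request body is a bare JSON string:
"some-long-unique-string"

Option 2, the request body is an object with one named attribute:
{ "value": "some-long-unique-string" }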
Debate: is a primitive string valid JSON?
There also seems to be some confusion as to whether a string by itself is a valid JSON document.
This confusion and debate, are quite evident in the following posts where various technical specs are mentioned: Is a primitive type considered JSON?
The only thing for sure is that a JSON object with a key-value pair is definitely valid!
As to a string by itself... I'm not sure (it requires more reading).
Update: answer in terms of creating/updating an entity (POST/PUT)
In the specific case above, relating to such a large string that "runs into a few kilobytes"... my feeling is that this would be included within the request body.
In the specific context of sending data, I would actually be comfortable with using either 1 or 2. Additionally, option 1 seems more lightweight (if your frameworks support it), since the context about what the data is comes from the REST API method itself.
However, if in the future you need to add one more parameter, you will have to switch to a JSON object with more than one key.

Deserialize an anonymous JSON array?

I have an anonymous array which I want to deserialize; here is an example, with the first array object shown expanded:
[
{ "time":"08:55:54",
"date":"2016-05-27",
"timestamp":1464332154807,
"level":3,
"message":"registerResourcePath ('', '/sap/bc/ui5_ui5/ui2/ushell/resources/')",
"details":"","component":"sap.ui.ModuleSystem"},
{"time":"08:55:54","date":"2016-05-27","timestamp":1464332154808,"level":3,"message":"URL prefixes set to:","details":"","component":"sap.ui.ModuleSystem"},
{"time":"08:55:54","date":"2016-05-27","timestamp":1464332154808,"level":3,"message":" (default) : /sap/bc/ui5_ui5/ui2/ushell/resources/","details":"","component":"sap.ui.ModuleSystem"}
]
I tried deserializing using CL_TREX_JSON_SERIALIZER, but it is broken and does not work with my JSON; here is why.
Then I tried /UI2/CL_JSON, but it needs a "structure" that perfectly fits the object given by the JSON. "Structure" means in my case an internal table of objects with the attributes time, date, timestamp, level, message and details. And there was the problem: it does not properly handle references and uses the class description to describe the field assigned to the field symbol. Since I cannot have a list of objects but only a list of references to objects, that solution also doesn't work.
As a third attempt I tried CALL TRANSFORMATION as described by Horst Keller, but with this method I was not able to read in an anonymous array; here is why.
My major points:
I do not want to change the JSON, since that is what I get from sap.ui.log
I prefer to use built-in functionality and not a third-party framework.
Your problem comes not from the anonymity of the array, but from the awkwardness of the SAP JSON (de)serializer, which doesn't respect the double quotes that enclose JSON attributes. The issue is thoroughly described in this answer.
If you don't want to change your JSON on the fly, the only way you have is to change the CL_TREX_JSON_DESERIALIZER class like this.
/UI5/CL_JSON_PARSER parses JSONs with unknown format.
Note that it has "for internal use" written on it so many times that you should probably take that seriously and clone its code so you have a stable copy.

Parsing JSON with duplicate keys (json-cpp)

I'm using JsonCpp v0.6.0 to parse the following JSON string:
{
    "3.7":"de305d54-75b4-431b-adb2-eb6b9e546011",
    "3.7":"de305d54-75b4-431b-adb2-eb6b9e546012",
    "3.8":"de305d54-75b4-431b-adb2-eb6b9e546013"
}
as follows:
Json::Value root;
Json::Reader reader;
// value contains the JSON string
if (!reader.parse(value, root, false))
{
    // parse error
}
After the call to parse, root contains two entries in a map:
[0] first = "3.7", second = "de305d54-75b4-431b-adb2-eb6b9e546012",
[1] first = "3.8", second = "de305d54-75b4-431b-adb2-eb6b9e546013",
i.e. the first JSON record has been overwritten by the second. No errors are reported.
Is this behaviour expected? Is it correct?
I thought that an error might have been reported indicating that there is a duplicate key in the JSON string.
As the JSON RFC says, the object names (keys) should be unique.
The names within an object SHOULD be unique.
The RFC also says that if they are not unique, the behavior is unpredictable.
See this quote from the RFC:
An object whose names are all unique is interoperable in the sense that all software implementations receiving that object will agree on the name-value mappings. When the names within an object are not unique, the behavior of software that receives such an object is unpredictable. Many implementations report the last name/value pair only. Other implementations report an error or fail to parse the object, and some implementations report all of the name/value pairs, including duplicates.
I agree with what you say, but I think JsonCpp tries to be a helpful tool, not something that tries to scrape by with minimal conformance to the RFCs.
It would make more sense if it either maintained the structure of the input stream and supported duplicate keys, or (and this is what the OP and I would expect) flagged an error when it doesn't accept the input.
Silently changing the structure is unhelpful, as a check for the validity of the JSON input would have to be made with some other JSON tool prior to sending it to JsonCpp.
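For what it's worth, newer JsonCpp releases (this is not available in 0.6.0) can be told to do exactly that through the CharReaderBuilder settings; a minimal sketch:

#include <json/json.h>
#include <iostream>
#include <sstream>
#include <string>

int main() {
    // same duplicate-key document as above
    const std::string value =
        "{ \"3.7\":\"de305d54-75b4-431b-adb2-eb6b9e546011\","
        "  \"3.7\":\"de305d54-75b4-431b-adb2-eb6b9e546012\","
        "  \"3.8\":\"de305d54-75b4-431b-adb2-eb6b9e546013\" }";

    Json::CharReaderBuilder builder;
    builder["rejectDupKeys"] = true;   // turn duplicate keys into a parse error

    Json::Value root;
    std::string errs;
    std::istringstream in(value);
    if (!Json::parseFromStream(builder, in, &root, &errs)) {
        std::cout << "parse error: " << errs << std::endl;
        return 1;
    }
    return 0;
}

With rejectDupKeys enabled, parsing the document above fails instead of silently keeping only the last "3.7" entry.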