Can Elasticsearch index values such as
"key": [
14.0,
"somestring"
]
If I try to ingest this data, I get this error:
org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: Found unrecoverable error [Bad Request(400) - [WriteFailureException; nested: MapperParsingException[failed to parse [FIXMessage.key]]; nested: NumberFormatException[For input string: "somestring"]; ]]; Bailing out..
First, is the above valid JSON? If so, then why is Elasticsearch not able to index it?
Elasticsearch is trying to convert "somestring" to a number, and failing.
Your JSON is valid, and Elasticsearch will happily index multiple values for the same field from an array of values, but they all have to be the same type, and that type must match the type of the corresponding field in the mapping.
If the field doesn't exist in the mapping, Elasticsearch will create it using a sensible default type based on the first value it sees for that field. So if you tried to index what you posted above, and the "key" field didn't exist, Elasticsearch would create the field, look at the first value it found (14.0) and decide to use the float type. Then it would try to index the second value, "somestring", fail to convert it to a float, and send back an error.
Make sense?
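For a concrete illustration, here is a minimal sketch of that behaviour using the @elastic/elasticsearch Node client (v8-style API; the index name "demo-index" is just a placeholder):

import { Client } from "@elastic/elasticsearch";

const client = new Client({ node: "http://localhost:9200" });

async function demo() {
  // The first value Elasticsearch sees for "key" is a float, so dynamic
  // mapping creates the field with a numeric type.
  await client.index({ index: "demo-index", document: { key: [14.0, 15.5] } });

  // "somestring" cannot be converted to that numeric type, so this request
  // fails with a mapping/parse error - the same failure as the
  // NumberFormatException in the question.
  await client.index({ index: "demo-index", document: { key: [14.0, "somestring"] } });
}

demo().catch((err: any) => console.error(err?.meta?.body ?? err));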
You might find this post helpful.
It depends on what you plan to do with the data.
If you only want to store and retrieve the full array, you can simply convert the JSON array to a string.
When you want to get the array back from Elasticsearch, you have to convert it from a string back to a JSON array.
If you also want to search inside this array, you can split it into multiple arrays, one per value type.
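A rough TypeScript sketch of both options (the field names key_numbers / key_strings are made up):

// Option 1: store the mixed array as a single string field, and parse it
// back into an array after reading the document from Elasticsearch.
const doc = { key: JSON.stringify([14.0, "somestring"]) };
const restored: unknown[] = JSON.parse(doc.key);

// Option 2: split the array by type so every field holds only one type.
const values: unknown[] = [14.0, "somestring"];
const splitDoc = {
  key_numbers: values.filter((v): v is number => typeof v === "number"),
  key_strings: values.filter((v): v is string => typeof v === "string"),
};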
I'm consuming a Kafka topic published by another team (so I have very limited influence over the message format). The message has a field that holds an ARRAY of STRUCTS (an array of objects), but if the array has only one value then it just holds that STRUCT (no array, just an object). I'm trying to transform the message using Confluent KSQL. Unfortunately, I cannot figure out how to do this.
For example:
{ "field": {...} } <-- STRUCT (single element)
{ "field": [ {...}, {...} ] } <-- ARRAY (multiple elements)
{ "field": [ {...}, {...}, {...} ] <-- ARRAY (multiple elements)
If I configure the field in my message schema as a STRUCT then all messages with multiple values error. If I configure the field in my message schema as an ARRAY then all messages with a single value error. I could create two streams and merge them, but then my error log will be polluted with irrelevant errors.
I've tried capturing this field as a STRING/VARCHAR which is fine and I can split the messages into two streams. If I do this, then I can parse the single value messages and extract the data I need, but I cannot figure out how to parse the multivalue messages. None of the KSQL JSON functions seem to allow parsing of JSON Arrays out of JSON Strings. I can use EXTRACTJSONFIELD() to extract a particular element of the array, but not all of the elements.
Am I missing something? Is there any way to handle this reasonably?
In my experience, this is one use-case where KSQL just doesn't work. You would need to use Kafka Streams or a plain consumer to deserialize the event as a generic JSON type, then check object.get("field").isArray() or isObject(), and handle accordingly.
Even if you used a UDF in KSQL, the STREAM definition would be required to know ahead of time if you have field ARRAY<?> or field STRUCT<...>
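As a rough sketch of the plain-consumer route, here is what that normalisation could look like in TypeScript with kafkajs (broker address, topic, and the field name are assumptions, and kafkajs v2's subscribe signature is used):

import { Kafka } from "kafkajs";

const kafka = new Kafka({ clientId: "struct-or-array", brokers: ["localhost:9092"] });
const consumer = kafka.consumer({ groupId: "struct-or-array-group" });

async function run() {
  await consumer.connect();
  await consumer.subscribe({ topics: ["my-topic"], fromBeginning: true });
  await consumer.run({
    eachMessage: async ({ message }) => {
      const event = JSON.parse(message.value?.toString() ?? "{}");
      // Normalise the ambiguous field: wrap a lone object in an array so the
      // STRUCT and ARRAY variations go through the same code path.
      const items = Array.isArray(event.field) ? event.field : [event.field];
      for (const item of items) {
        // ...handle each object here
      }
    },
  });
}

run().catch(console.error);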
I finally solved this in a roundabout way...
First, I created an initial stream reading the transaction as a stream of bytes using the KAFKA format instead of the JSON format. This allows me to put a conditional filter on the data so I can fork the stream into a version for the single (STRUCT) variation and a version for the multiple (ARRAY) variation.
The initial stream looks like:
CREATE OR REPLACE STREAM `my-topic-stream` (
    id STRING KEY,
    data BYTES
)
WITH (
    KAFKA_TOPIC='my-topic',
    VALUE_FORMAT='KAFKA'
);
Forking that stream looks like this (the second, "multiple" version is the same but filters on IS NOT NULL instead):
CREATE OR REPLACE STREAM `my-single-stream`
WITH (
kafka_topic='my-single-topic'
) AS
SELECT *
FROM `my-topic-stream`
WHERE JSON_ARRAY_LENGTH(EXTRACTJSONFIELD(FROM_BYTES(data, 'utf8'), '$.field')) IS NULL;
At this point I can create a schema for both variations, explode field, and merge the two streams back together. I don't know if this can be refined to be more efficient, but this successfully processes the transactions as I wanted.
I'm using the SugarCRM REST API, and according to the documentation, to get a set of records I have to use the /<module> GET endpoint and pass JSON in the body to filter the query.
First, is it even possible to have a body in a GET request?
And how can I build this kind of request then?
I'm using Postman and tried to pass the parameters as query strings, but that doesn't seem to work.
As far as I know you have to put everything in the query string, which might look different to what you'd expect.
Example for a request to /Users:
{
    max_num: 100,
    fields: ["first_name", "last_name"],
    filter: [
        {"user_name": "admin"},
        {"status": "Active"}
    ]
}
Written as query string this request will look like this:
/rest/v10/Users?max_num=100&fields=first_name,last_name&filter[0][user_name]=admin&filter[1][status]=Active
Observations regarding the query string format:
There is no { or }; the values of the request object are placed directly in the query string
Key-value pairs are assigned with =, and separated by & (instead of : and ,)
There are no " or ' quotes at all; strings are written without them
An array of values (here: fields) is just one assignment with all values separated by ,
An array of objects (here: filter) has one key-value pair per bottom value and uses [ and ] to indicate the "path" to each value, using 0-based numerical indices for arrays
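If it helps, here is a small TypeScript sketch (not official SugarCRM tooling) that turns a request object like the one above into that query-string format:

type FilterClause = Record<string, string>;

interface ModuleRequest {
  max_num: number;
  fields: string[];
  filter: FilterClause[];
}

function toQueryString(req: ModuleRequest): string {
  const parts: string[] = [`max_num=${req.max_num}`];
  // Array of values: one assignment with the values joined by commas.
  parts.push(`fields=${req.fields.join(",")}`);
  // Array of objects: one key-value pair per bottom value, with 0-based indices.
  req.filter.forEach((clause, i) => {
    for (const [key, value] of Object.entries(clause)) {
      parts.push(`filter[${i}][${key}]=${value}`);
    }
  });
  return parts.join("&");
}

// toQueryString({ max_num: 100, fields: ["first_name", "last_name"],
//                 filter: [{ user_name: "admin" }, { status: "Active" }] })
// => "max_num=100&fields=first_name,last_name&filter[0][user_name]=admin&filter[1][status]=Active"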
Notes
Keep in mind there are length limits on URLs, including the query string: e.g. 4096 bytes/characters for Apache 2, if I remember correctly. If you have to send very elaborate requests, you might want to use POST /rest/v10/<module>/filter instead.
With the brackets URL-escaped (usually not necessary; the = and & separators themselves must stay unescaped), the example request would look like this:
/rest/v10/Users?max_num=100&fields=first_name,last_name&filter%5B0%5D%5Buser_name%5D=admin&filter%5B1%5D%5Bstatus%5D=Active
When I insert an object into CosmosDB (MongoDB API), the result contains a property insertedIds.
When I console.log(insertedIds) I get
[ 5a6c46c85ac3cc4bb01ebcbb,
5a6c46c85ac3cc4bb01ebcbc,
5a6c46c85ac3cc4bb01ebcbd ]
and typeof reports "object" for each element, although I'm not sure why - they just seem to be strings.
When I go through and JSON.stringify each element, I get (with surrounding double quotes) "5a6c46c85ac3cc4bb01ebcbb","5a6c46c85ac3cc4bb01ebcbc","5a6c46c85ac3cc4bb01ebcbd"
What is the right way to parse a CosmosDB Insert Result, and get the insertedIds as an array of strings?
Do I really have to go through and "stringify" then "de-string by removing quotes" for each returned Id? That is a huge overhead with large arrays.
Note: I believe this has something to do with MongoDB's bson/strict json: https://docs.mongodb.com/manual/reference/mongodb-extended-json/ but still not sure how to parse it.
You could use the following example:
JSON.parse(this_is_double_quoted);
JSON.parse('"House"');
JSON.parse returns the value without the surrounding quotes. More information can be found in the following post: remove double quotes from Json return data using Jquery
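Applied to the insert result from the question, that round trip looks roughly like this (a sketch assuming the MongoDB Node driver, where the inserted ids are ObjectId instances):

import { ObjectId } from "mongodb";

// insertedIds from an insertMany() result - an array (older drivers) or an
// object keyed by index (newer drivers); Object.values() handles both.
function idsAsStrings(insertedIds: ObjectId[] | Record<number, ObjectId>): string[] {
  // ObjectId serialises to its hex string, so the stringify/parse round trip
  // yields plain strings without the surrounding quotes.
  return Object.values(insertedIds).map((id) => JSON.parse(JSON.stringify(id)));
}

Calling id.toString() (or toHexString()) on each ObjectId gives the same hex string without the JSON round trip.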
I've got some JSON within Google Refine - http://mapit.mysociety.org/point/4326/0.1293497,51.5464828 for the full version, but abbreviated it's like this:
{ "1234": {"name": "Barking", "type": "WMC"},
  "5678": {"name": "England", "type": "EUR"} }
I only want to extract the name for the object with the (presumed unique) type WMC.
"Parse JSON in Google Refine" doesn't help; that's working with arrays, not dicts.
Any suggestions what I should be looking at to fix this?
Edit: I don't know what the initial keys are: I believe they're unique identifiers which I can't predict ahead of time.
Refine doesn't currently know how to iterate through the keys of a dict where the keys are unknown (although I'm about to implement that functionality).
The trick to getting this working with the current implementation is to convert the JSON object to a JSON array. The following GREL expression will do that, parse the result as JSON, iterate through all elements of the array and give you the first name of type 'WMC'.
filter(('['+(value.replace(/"[0-9]+":/,""))[1,-1]+']').parseJson(),v,v['type']=='WMC')[0]['name']
Use that expression with the "Add column based on this column" command to create a new WMC name column. If there's a chance that there'll be more than one name of this type and you want them all, you can add in a forEach loop and join along the lines of
forEach(filter(('['+(value.replace(/"[0-9]+":/,""))[1,-1]+']').parseJson(),v,v['type']=='WMC'),x,x['name']).join('|')
This will give you a pipe separated list of names that you can split apart using "Split multi-valued cells."
It'll be easier in the next release hopefully!
Are there any good UDFs in MySQL to deal with JSON data that support the ability to retrieve a particular value in JSON (by dot-notation key, e.g. json_get('foo.bar.baz')), as well as the ability to set the value of a particular key, e.g. json_set('foo.bar.baz', 'value')?
I found http://www.mysqludf.org/lib_mysqludf_json/ - but it seems to only provide the ability to create json data structures from non-json column values, as opposed to interacting with json column values.
This UDF is able to parse JSON and return the value of an attribute:
https://github.com/kazuho/mysql_json
This other one too: https://github.com/webaroo/mysql-json-udf