Change resultset structure elasticsearch - json

is it possible to change the resultset structure when retrieving data from elastic search?
the problem ist, the timeseries data are sometimes from 3000-8000 records which are a json array with json objects in it ... parsing it in this case not really efficient or necessary so i thought - could a resultset be transformed to just lets say a simple json object with an array of time and array of values? nothing more?
i could do this in java or php but since we want to have an efficent way of dealing with large datasets we are currently evaluating our options.

You can control what elasticsearch returns using source filtering:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-source-filtering.html
It can let you pick which part of the indexed document it will return, which, depending on your index structure could be an array of times and values, or at least, very easily mapped to it using the language of your choice.

Another possibility is to use scripting to control the results. If you map the result in this way, you should be able to get the hits object to be a JSON array of key: value.

Related

Is Athena a viable/sensible choice for infrequently searching *unstructured* JSON?

I'm recording request and response headers and bodies for all traffic to our API and from our API to 3rd party services into S3 as into tiny objects.
I want to be able to query this data infrequently. For example (pseudo-code):
select $.cars[0].color from "objects" where object_path in (....);
Other info:
Many "objects" in S3 won't have a valid path to $.cars[0].color (it's just one example).
I hope to not use Glue.
Cost is important - this is something that will be queried very infrequently. Configuring some ElasticSearch/similar solution is terribly out of budget for the use case.
I hope to not define my own set of schemas (this is simply not feasible).
Athena says it can search unstructured JSON. I'm having trouble creating a proof-of-concept to show this is true.
Is Athena right fr me? Am I missing a better solution?
I think Athena will work for your case.
Athena handles missing properties in JSON objects. For example, if you define the cars column as array<struct<color:string>>:
the property can be missing ⇒ SELECT cars … will be NULL
it can be an empty list ⇒ SELECT cars[1] … (Athena arrays start at 1) will result in an error, but element_at(cars, 1) and try(cars[1]) will return NULL
the object may be missing the color property ⇒ SELECT cars[1].color … will be NULL
for completely free-form JSON define the column as string and use the JSON functions to query it.
Glue is not necessary. Create the table manually, from your application, or with CloudFormation, and configure it to use partition projection and you will not have to think about using Glue crawlers at all.
Athena doesn't cost anything when you don't use it, and if you will query only infrequently this is key. Make sure to compress your data, and partition it in a way that supports your query patterns (e.g. by date or month if you most often will query recent data).
Not sure what you mean by having to define your "own set of schemas", so perhaps you can clarify that part?

Spark from_avro 2nd argument is a constant string, any way to obtain schema string from some column of each record?

suppose we are developing an application that pulls Avro records from a source
stream (e.g. Kafka/Kinesis/etc), parses them into JSON, then further processes that
JSON with additional transformations. Further assume these records can have a
varying schema (which we can look up and fetch from a registry).
We would like to use Spark's built in from_avro function, But it is pretty clear that
Spark from_avro wants you to hard code a >Fixed< schema into your code. It doesn't seem
to allow the schema to vary row by incoming row.
That sort of makes sense if you are parsing the Avro to Internal row format.. One would need
a consistent structure for the dataframe. But what if we wanted something like
from_avro which grabbed the bytes from some column in the row and also grabbed the string
representation of the Avro schema from some other column in the row, and then parsed that Avro
into a JSON string.
Does such built-in method exist? Or is such functionality available in a 3rd party library ?
Thanks !

How do I identify this JSON-like data structure?

I just came across a JSON wannabe that decides to "improve" it by adding datatypes... of course, the syntax makes it nearly impossible to google.
a:4:{
s:3:"cmd";
s:4:"save";
s:5:"token";
s:22:"5a7be6ad267d1599347886";
}
Full data is... much larger...
The first letter seems to be a for array, s for string, then the quantity of data (# of array items or length of string), then the actual piece of data.
With this type of syntax, I currently can't Google meaningful results. Does anyone recognize what god-forsaken language or framework this is from?
Note: some genius decided to stuff this data as a single field inside a database, and it included critical fields that I need to perform aggregate functions on. The rest I can handle if I can get a way to parse this data without resorting to ugly serial processing.
If this can be parsed using MSSQL 2008 that results in a view, I'll throw in a bounty...
I would parse it with a UDF written in .NET - https://learn.microsoft.com/en-us/sql/relational-databases/clr-integration-database-objects-user-defined-functions/clr-user-defined-functions
You can either write a custom aggregate function to parse and calculate these nutty fields, or a scalar value function that returns the field as JSON.
I'd probably opt for the latter in the name of separation of concerns.

Is {0:{"id":1,...},{"id:2,....}} a other reprensation of a JSON list like [{"id":1,...},{"id:2,....}]

I have a little dilema. I have a backend/Frontend Application that comunicates with a JSON based REST Api.
The backend is written in PHP(Symfony/jmsserializer) and the Frontend in Dart
The communication between these two has a little Problem.
For most List Data the backend responds with a JSON like
[{"id":1,...},{"id:2,....}]
But for some it responds with
{"0":{"id":1,...}, "1":{"id:2,....}}
Now my Question is should the backend respond with the later at all or only with the first?
Problem
You usually have a list of objects. You sometimes get an object with sub-objects as properties.
Underlying issue
JS/JSON-Lists are ordered from 0 upwards which means that if you have PHP-Array which does not respect this rule json_encode will output a JS/JSON-Object instead using the numeric indices as keys.
PHP-Arrays are ordered maps which have more features that the JSON-Lists. Whenever you're using those extra features you won't be able to translate directly into JSON-Lists without loosing some information (ordering, keys, skipped indices, etc.).
PHP-Arrays and JSON-Objects on the other hand are more ore less equivalent in terms of features and can be correctly translated between each other without any loss of information.
Occurence
This happens if you have an initial PHP-Array of values which respects the JS/JSON-List rules but the keys in the list of objects are modified somehow. For example if you have a custom indexing order {"3":{}, "0":{}, "1":{}, "2":{}} or if you have (any) keys that are strings (ie. not numeric).
This always happens if you want to use the numeric id of the object as the numeric index of the list {"123":{"id": 123, "name": "obj"}} even if the numeric ids are in ascending order... so long as they are not starting from 0 upwards it's not a json-list it's a json-object.
Specific case
So my guess is that the PHP code in the backend is doing something like fetching a list of objects but its modifying something about it like inserting them by (string) keys into the array, inserting them in a specific order, removing some of them.
Resolution
The backend can easily fix this by using array_values($listOfObjects) before using json_encode which will reindex the entire list by numeric indices of ascending value.
Arrays and dictionaries are two separate types in JSON ("array" and "object" respectively), but PHP combines this functionality in a single array type.
PHP's json_encode deals with this as follows: an array that only contains numeric keys ($array = ['cat', 'dog']) is serialized as JSON array, an associative array that contains non-numeric keys ($array = ['cat' => 'meow', 'dog' => 'woof']) is serialized as JSON object, including the array's keys in the output.
If you end up with an associative array in PHP, but want to serialize it as a plain array in JSON, just use this to convert it to a numerical array before JSON encoding it: $array = array_values($array);

Use JSON.stringify but the data still have array, what does it mean?

I have a data which are object array. It contains object arrays in a tree structure. I use JSON.stringify(myArray) but the data still contain array because I see [] inside the converted data.
In my case, I want all the data to be converted into json object not array regarding I need to used the data on TreeTable of SAPUI5.
Maybe I misunderstand. Please help me clear.
This is the example of the data that I got from JSON.stringify.
[{"value":{"Id":"00145E5BB2641EE284F811A7907717A3",
"Text":"BI-RA Reporting, analysis, and dashboards",
"Parent":"00145E5BB2641EE284F811A79076F7A3","Type":"BMF"},
"children":[{"value":{"Id":"00145E5BB2641EE284F811A7907737A3",
"Text":"WebIntelligence_4.1","Parent":"00145E5BB2641EE284F811A7907717A3",
"Type":"TWB"},"children":[{"value":{"Id":"00145E5BB2641EE284F811A7907757A3",
"Text":"Functional Areas","Parent":"00145E5BB2641EE284F811A7907737A3","Type":"TWB"},
"children":[{"value":{"Id":"00145E5BB2641EE284F811A7907777A3",
"Text":"CHARTING","Parent":"00145E5BB2641EE284F811A7907757A3","Type":"TWB"},
"children":[{"value":{"Id":"001999E0B9081EE28AB706BE26631E93",
"Text":"Drill","Parent":"00145E5BB2641EE284F811A7907777A3","Type":"TWB"},
"children":[{"value":{"Id":"001999E0B9081EE28AB706BE26633E93",
"Text":"[AUTO][ACCEPT] Drill on charts DHTML","Parent":"001999E0B9081EE28AB706BE26631E93",
"Type":"TWB","Ref":"UT_WEBI_CHARTS_DRILL_HTML"}},{"value":{"Id":"001999E0B9081EE28AB706BE26635E93",
"Text":"[AUTO][ACCEPT] Drill on charts JAVA","Parent":"001999E0B9081EE28AB706BE26631E93",
"Type":"TWB","Ref":"UT_WEBI_CHARTS_DRILL_JAVA"}}]},...
The output that I want shouldn't be array of object but should be something like...
{{"value":{
"Id":"00145E5BB2641EE284F811A7907717A3",
"Text":"BI-RA Reporting, analysis, and dashboards",
"Parent":"00145E5BB2641EE284F811A79076F7A3","Type":"BMF"},
"children":{
{"value":{
"Id":"00145E5BB2641EE284F811A7907737A3",
"Text":"WebIntelligence_4.1",
"Parent":"00145E5BB2641EE284F811A7907717A3",
"Type":"TWB"},
"children":{
{"value":{
"Id":"00145E5BB2641EE284F811A7907757A3",
"Text":"Functional Areas",
"Parent":"00145E5BB2641EE284F811A7907737A3",
"Type":"TWB"},...
JSON.stringify merely converts JavaScript data structures to a JSON-formatted string for consumption by other parsers (including JSON.parse). If you want it to stringify to a different value, you must change the source data structures first.
However, it seems that this can't be represented as anything other than an array because you have duplicate keys (i.e. value appears more than once). That would not be valid for a JavaScript object or a JSON representation of such.
I think what you want is
JSON.stringify(data[0]);
or perhaps
JSON.stringify(data[0].value);
where data is the object you passed in the question