How do I identify this JSON-like data structure?

I just came across a JSON wannabe that decides to "improve" it by adding datatypes... of course, the syntax makes it nearly impossible to google.
a:4:{
s:3:"cmd";
s:4:"save";
s:5:"token";
s:22:"5a7be6ad267d1599347886";
}
Full data is... much larger...
The first letter seems to be a for array, s for string, then the quantity of data (# of array items or length of string), then the actual piece of data.
With this type of syntax, I currently can't Google meaningful results. Does anyone recognize what god-forsaken language or framework this is from?
Note: some genius decided to stuff this data as a single field inside a database, and it included critical fields that I need to perform aggregate functions on. The rest I can handle if I can get a way to parse this data without resorting to ugly serial processing.
If this can be parsed using MSSQL 2008 that results in a view, I'll throw in a bounty...

I would parse it with a UDF written in .NET - https://learn.microsoft.com/en-us/sql/relational-databases/clr-integration-database-objects-user-defined-functions/clr-user-defined-functions
You can either write a custom aggregate function to parse and calculate these nutty fields, or a scalar-valued function that returns the field as JSON.
I'd probably opt for the latter in the name of separation of concerns.
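For the record, the format itself is PHP's native serialization - the output of PHP's serialize(), which PHP reads back with unserialize(). If you'd rather pre-process outside the database, below is a minimal sketch of a parser in Python. It handles only the a/s/i type tags relevant to the sample (PHP also has b, d, N, O), and note that PHP's string lengths are byte counts, which matters for non-ASCII data.

def parse(data, pos=0):
    """Parse one serialized PHP value starting at pos; return (value, next_pos)."""
    tag = data[pos]
    if tag == "i":                        # i:<int>;
        end = data.index(";", pos)
        return int(data[pos + 2:end]), end + 1
    if tag == "s":                        # s:<byte length>:"<chars>";
        colon = data.index(":", pos + 2)
        length = int(data[pos + 2:colon])
        start = colon + 2                 # skip the :" prefix
        return data[start:start + length], start + length + 2   # skip the "; suffix
    if tag == "a":                        # a:<pair count>:{<key><value>...}
        colon = data.index(":", pos + 2)
        count = int(data[pos + 2:colon])
        pos = colon + 2                   # skip the :{ prefix
        result = {}
        for _ in range(count):
            key, pos = parse(data, pos)
            result[key], pos = parse(data, pos)
        return result, pos + 1            # skip the closing }
    raise ValueError(f"unsupported type tag {tag!r} at offset {pos}")

# The a:4 sample above is truncated, so this uses just the two pairs shown:
sample = 'a:2:{s:3:"cmd";s:4:"save";s:5:"token";s:22:"5a7be6ad267d1599347886";}'
print(parse(sample)[0])    # {'cmd': 'save', 'token': '5a7be6ad267d1599347886'}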

Related

Is Athena a viable/sensible choice for infrequently searching *unstructured* JSON?

I'm recording request and response headers and bodies for all traffic to our API, and from our API to 3rd party services, into S3 as tiny objects.
I want to be able to query this data infrequently. For example (pseudo-code):
select $.cars[0].color from "objects" where object_path in (....);
Other info:
Many "objects" in S3 won't have a valid path to $.cars[0].color (it's just one example).
I hope to not use Glue.
Cost is important - this is something that will be queried very infrequently. Configuring some ElasticSearch/similar solution is terribly out of budget for the use case.
I hope to not define my own set of schemas (this is simply not feasible).
Athena says it can search unstructured JSON. I'm having trouble creating a proof-of-concept to show this is true.
Is Athena right for me? Am I missing a better solution?
I think Athena will work for your case.
Athena handles missing properties in JSON objects. For example, if you define the cars column as array<struct<color:string>>:
the property can be missing ⇒ SELECT cars … will be NULL
it can be an empty list ⇒ SELECT cars[1] … (Athena arrays start at 1) will result in an error, but element_at(cars, 1) and try(cars[1]) will return NULL
the object may be missing the color property ⇒ SELECT cars[1].color … will be NULL
for completely free-form JSON define the column as string and use the JSON functions to query it.
Glue is not necessary. Create the table manually, from your application, or with CloudFormation, and configure it to use partition projection and you will not have to think about using Glue crawlers at all.
Athena doesn't cost anything when you don't use it, and if you will query only infrequently this is key. Make sure to compress your data, and partition it in a way that supports your query patterns (e.g. by date or month if you most often will query recent data).
Not sure what you mean by having to define your "own set of schemas", so perhaps you can clarify that part?
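To make the no-Glue setup concrete, here is a minimal sketch (Python + boto3; the bucket, database, result location, and partition layout are hypothetical placeholders, and in practice you would wait for the DDL statement to finish before querying):

import boto3

athena = boto3.client("athena")

# Table created manually - no Glue crawler. With partition projection, new
# date partitions become visible without crawlers or MSCK REPAIR runs.
DDL = """
CREATE EXTERNAL TABLE IF NOT EXISTS objects (
  cars array<struct<color:string>>
)
PARTITIONED BY (dt string)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://my-bucket/objects/'
TBLPROPERTIES (
  'projection.enabled' = 'true',
  'projection.dt.type' = 'date',
  'projection.dt.range' = '2020/01/01,NOW',
  'projection.dt.format' = 'yyyy/MM/dd',
  'storage.location.template' = 's3://my-bucket/objects/${dt}/'
)
"""

# element_at() returns NULL (instead of erroring) when the array is empty.
# For completely free-form payloads, define one string column and use e.g.
# json_extract_scalar(payload, '$.cars[0].color') instead.
QUERY = """
SELECT element_at(cars, 1).color
FROM objects
WHERE dt >= '2024/01/01'
"""

for sql in (DDL, QUERY):
    athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": "default"},
        ResultConfiguration={"OutputLocation": "s3://my-bucket/athena-results/"},
    )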

What's the difference between json data being encoded or not

What's the purpose (not what it becomes) of doing json_encode on this before putting it into the database
rating: {cleanliness: 3, publicFacility: 1, roomFacility: 2, security: 2}
to become this
rating: "{"cleanliness":3,"publicFacility":1,"roomFacility":2,"security":2}"
I see no point of doing this since I need to json_decode it again before serving it back... can anybody clear this up for me?
Do not store JSON-encoded data in the database. You defeat the whole point of a relational database that way and make searching for values an expensive task. I see in your sample the attributes cleanliness, publicFacility, roomFacility and security. Those should be columns in your database so you can search for something like "all entries with a cleanliness higher than 3".
It works with the JSON column type but it is more expensive than using normal columns.
Edit: Check the use case for your database entry. If you are sure you never need to search in or order by the encoded attributes, you can store the data encoded as a JSON string. However, if your database supports the JSON column type, you should use that one because it allows searching in the stored JSON (though it is more expensive than searching in normal columns).
Second point: the second code snippet (with the quotation marks) looks like invalid syntax for JSON.
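To make the search-cost tradeoff concrete, here is a small runnable sketch (Python + SQLite; the attribute names come from the sample, everything else is illustrative):

import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Design 1: one column per attribute - indexable, cheap to filter on.
conn.execute("""CREATE TABLE rating (
    id INTEGER PRIMARY KEY,
    cleanliness INTEGER, publicFacility INTEGER,
    roomFacility INTEGER, security INTEGER)""")
conn.execute("INSERT INTO rating VALUES (1, 3, 1, 2, 2)")
rows = conn.execute("SELECT id FROM rating WHERE cleanliness > 2").fetchall()

# Design 2: a single json_encode'd string - every attribute must be decoded
# (or parsed by the DB's JSON functions) before it can be filtered or sorted.
conn.execute("CREATE TABLE rating_blob (id INTEGER PRIMARY KEY, data TEXT)")
payload = json.dumps({"cleanliness": 3, "publicFacility": 1,
                      "roomFacility": 2, "security": 2})
conn.execute("INSERT INTO rating_blob VALUES (1, ?)", (payload,))
decoded = [json.loads(d) for (d,) in
           conn.execute("SELECT data FROM rating_blob")]
matches = [r for r in decoded if r["cleanliness"] > 2]   # full scan in app code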

Change resultset structure elasticsearch

Is it possible to change the result set structure when retrieving data from Elasticsearch?
The problem is, the time series data are sometimes 3000-8000 records, which come back as a JSON array with JSON objects in it ... parsing it in this case is not really efficient or necessary, so I thought - could a result set be transformed into, let's say, a simple JSON object with an array of times and an array of values? Nothing more?
I could do this in Java or PHP, but since we want an efficient way of dealing with large datasets, we are currently evaluating our options.
You can control what elasticsearch returns using source filtering:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-source-filtering.html
It can let you pick which part of the indexed document it will return, which, depending on your index structure could be an array of times and values, or at least, very easily mapped to it using the language of your choice.
Another possibility is to use scripting to control the results. If you map the result in this way, you should be able to get the hits object to be a JSON array of key: value.
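For illustration, a minimal sketch of source filtering plus the reshaping step (Python standard library only; the index name and the time/value field names are hypothetical):

import json
import urllib.request

# _source filtering: Elasticsearch strips every field except the two we need.
body = {
    "size": 8000,
    "_source": ["time", "value"],
    "query": {"match_all": {}},
}
req = urllib.request.Request(
    "http://localhost:9200/timeseries/_search",
    data=json.dumps(body).encode(),
    headers={"Content-Type": "application/json"},
)
hits = json.load(urllib.request.urlopen(req))["hits"]["hits"]

# Reshape 3000-8000 hit wrappers into one small object: an array of times
# and an array of values, nothing more.
result = {
    "time": [h["_source"]["time"] for h in hits],
    "values": [h["_source"]["value"] for h in hits],
}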

http rest: Disadvantages of using json in query string?

I want to use query string as json, for example: /api/uri?{"key":"value"}, instead of /api/uri?key=value. Advantages, from my point of view, are:
JSON keeps the types of parameters, such as booleans, ints, floats and strings. A standard query string treats all parameters as strings.
JSON needs fewer symbols for deeply nested structures. For example, ?{"a":{"b":{"c":[1,2,3]}}} vs ?a[b][c][]=1&a[b][c][]=2&a[b][c][]=3
Easier to build on client side
What disadvantages could be in that case of json usage?
It looks good if it's
/api/uri?{"key":"value"}
as stated in your example, but since it's part of the URL it gets encoded to:
/api/uri?%7B%22key%22%3A%22value%22%7D
or something similar, which makes /api/uri?key=value simpler than the encoded one, in this case both for debugging outbound calls (i.e. when you want to check the actual request via Wireshark etc.). Also notice that the JSON form takes up more characters when encoded to a valid URL, which matters given browser URL length limits.
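To see the difference concretely, a small Python sketch of both styles after percent-encoding:

import json
from urllib.parse import quote, urlencode

params = {"a": {"b": {"c": [1, 2, 3]}}}

# JSON-in-query-string: compact before encoding, opaque after it.
as_json = quote(json.dumps(params, separators=(",", ":")))
print(f"/api/uri?{as_json}")
# /api/uri?%7B%22a%22%3A%7B%22b%22%3A%7B%22c%22%3A%5B1%2C2%2C3%5D%7D%7D%7D

# Conventional bracket-style params: longer, but the values stay readable.
flat = urlencode([("a[b][c][]", v) for v in (1, 2, 3)])
print(f"/api/uri?{flat}")
# /api/uri?a%5Bb%5D%5Bc%5D%5B%5D=1&a%5Bb%5D%5Bc%5D%5B%5D=2&a%5Bb%5D%5Bc%5D%5B%5D=3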
For the case of 'fewer symbols for nested structures', it might be more appropriate to create a new resource for your sub-resource and handle the filtering through your query parameters;
api/a?{"b":1}
to
api/b?bfield1=1
api/a?aBfield1=1 // or something similar
Lastly, for 'easier to build on the client side', I think it depends on what you use to create your client; usually query params are represented as maps, so it is still simple.
Also if you need a collection for a single param then:
/uri/resource?param1=value1,value2,value3

strategy for returning JSON from XML canonical model

I am using the envelope pattern, and my canonical model part is in XML format. I usually return the model in full or in a summary version. Retrieval of documents is pretty quick, but when returning as part of my REST call, where I need to return JSON to the browser, the version with json:transform-to-json takes double the time of the call that just returns the XML.
Is it a good strategy to also keep the canonical model in JSON format in the envelope, or maybe to keep rendered JSON in full and summary formats in other documents outside of the envelope, which don't get searched but are mainly used when returning results? This way I don't have to incur the hit of transforming the canonical model to JSON all the time.
Are there any other ways that this has been done?
Conversion from XML to JSON should be relatively light, but the mere fact it has to do something will take overhead. Doing that work upfront will definitely save time. You can put both formats in the same envelope (though JSON will have to be stored as a string then), or in a different document as you suggest. Alternatively you could also store it in document-properties. Unfortunately, that only takes XML as well, so you would be storing your JSON as a string in there too.
Alternatively, have you profiled the transform to see if there is a particular reason why it slows down so much? Using XSLT versus XQuery for the transform could make a difference too.
HTH!
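As a sketch of the 'store both formats' approach over MarkLogic's REST API (Python; host, credentials and document URIs are placeholders, and your ingest pipeline would rewrite the JSON rendition whenever the XML changes):

import requests
from requests.auth import HTTPDigestAuth   # MarkLogic REST defaults to digest auth

BASE = "http://localhost:8000/v1/documents"
AUTH = HTTPDigestAuth("user", "password")

xml_doc = "<envelope><canonical><order id='1'/></canonical></envelope>"
json_rendition = '{"order": {"id": 1}}'

# Canonical XML: stays searchable and indexed exactly as before.
requests.put(BASE, params={"uri": "/orders/1.xml"}, auth=AUTH,
             data=xml_doc, headers={"Content-Type": "application/xml"})

# Pre-rendered JSON under a parallel URI: written once at ingest/update time,
# so reads skip json:transform-to-json entirely.
requests.put(BASE, params={"uri": "/orders/1.json"}, auth=AUTH,
             data=json_rendition,
             headers={"Content-Type": "application/json"})

# Read path for the REST call that feeds the browser: fetch the rendition.
response = requests.get(BASE, params={"uri": "/orders/1.json"}, auth=AUTH)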
json:transform-to-json has 3 algorithms optimized for different purposes, with different tradeoffs of flexibility, fidelity and performance.
"basic" (default) - useful only to reverse json:transform-from-json()
"full" - preserves as much information fidelity as possible, in exchange for a non-'pretty' format in many cases
"custom" - designed for when the JSON format is fixed, or when you want control over the JSON output at the expense of handling only a subset of XML accurately
Basic and full are the most efficient. However, all variants are fairly involved and require completely traversing the XML node tree and creating a JSON object tree bottom-up. In ML version 8 this is then translated into the native JSON node structure. In a REST call it would then be serialized as text.
Compared to a direct return of an XML document via fn:doc("file.xml"), there are at least 2 orders of magnitude more operations involved in the transform case.
For small documents in a REST call that is still a small fraction of the total request time, especially if the REST call was performing a complex operation itself and then returning a small result. Your use case seems the opposite - returning an XML document directly bypasses almost all of the XQuery processing and is sent directly from internal storage to the output or assigned to a variable.
If that is an important use case to optimize, especially if the documents can be large, then saving them as text or binary will be much faster - at the expense of more storage used. If this is only a variant representation of the XML, try storing the text JSON as binary, as it will not incur any indexing overhead.
Otherwise, if you need to query over the JSON: in ML7 storing as text gives you simple word queries, in ML8 storing as native JSON gives you structured queries - both with efficient text serialization.