Athena + Glue with MongoDB not working with fields in camelCase - json

I'm using Athena + Glue to query some data in MongoDB. I am getting the data without any problem (even from structs or arrays). I'm using the Glue console (the UI, nothing scripted through Python or anything).
Everything goes well until we ask for fields in camelCase: I cannot get information from columns/attributes in camelCase.
According to this recent article from the Amazon site, it seems that everything should work when using the SerDe org.openx.data.jsonserde.JsonSerDe and setting some properties. This is my SerDe configuration from the Glue console. Please note that I have removed the paths property (I don't know if it is relevant here).
"SerDeInfo": {
"serializationLib": "org.openx.data.jsonserde.JsonSerDe",
"parameters": {
"case.insensitive": "false",
"mapping.camelcaseproperty": "camelCaseProperty",
... other properties
}
},
where camelCaseProperty is a property of a document in our MongoDB.
According to the comment in this SO question, it seems that maybe they are using an old SerDe version. However, it seems a bit strange that AWS wouldn't have fixed this issue yet, given that it was reported 4 years ago.
I am pretty sure that my issue is related to this problem, because I manually changed the name of the attribute of one Mongo document to lowercase and Athena showed me the information only for that row. It seems that this is due to the name matching the Glue schema.
Nevertheless, I'm open to new approaches, because this is only a symptom and I don't actually know where the problem is.

Related

Azure Data Explorer / Kusto JSON Ingestion Transform (GetPathElement)

I have some trouble understanding the ingestion of JSON entries (from Event Hubs) into Kusto / ADX. I can't seem to get the GetPathElement transform statement to work. I'd expect that something like
[{"test":"name","path":"$.content.something","transform":"GetPathElement(0)"}]
would work (according to the documentation). Unfortunately, I get the (IMO) undocumented error:
Value 'GetPathElement(0)' used in a switch/case is invalid
Can someone give me a hint/example on how GetPathElement should work?
This transformation option indeed doesn't work.
I am not sure this transformation is meant to do what you are expecting from it.
If it had worked correctly, it would put the constant word 'something' into each ingested row.
Is this what you meant to get? If yes, you can use the 'ConstValue' property of the mapping.
If you mean anything different, please explain in more detail.
Can you share your ingest statement?
Also, you can do a one-click ingestion: don't complete the setup, just point it at the JSON file and it will give you the ingest commands to run, along with the table schema commands and so on, which you can then run in Kusto Explorer or through the Azure portal.
https://learn.microsoft.com/en-us/azure/data-explorer/ingest-data-one-click

Processing json data from kafka using structured streaming

I want to convert incoming JSON data from Kafka into a dataframe.
I am using structured streaming with Scala 2.12.
Most people add a hard-coded schema, but if the JSON can have additional fields, that requires changing the code base every time, which is tedious.
One approach is to write the data to a file and infer the schema from it, but I'd rather avoid doing that.
Is there any other way to approach this problem?
Edit: I found a way to turn a JSON string into a dataframe, but I can't extract it from the stream source. Is it possible to extract it?
One way is to store the schema itself in the message headers (not in the key or value).
Though this increases the message size, it makes it easy to parse the JSON value without the need for any external resource like a file or a schema registry.
New messages can have new schemas, while old messages can still be processed using their old schema, because the schema travels within the message itself.
Alternatively, you can version the schemas and include an id for every schema in the message headers, or a magic byte in the key or value, and infer the schema from there.
This approach is followed by the Confluent Schema Registry. It allows you to basically go through different versions of the same schema and see how your schema has evolved over time.
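For reference, Kafka record headers can be exposed in structured streaming in recent Spark versions (3.0+). A minimal sketch, where the broker address and topic name are placeholders and how you interpret the header values is up to your own convention:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("kafka-headers").getOrCreate()

// Expose Kafka record headers alongside key/value (available since Spark 3.0).
val raw = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092") // placeholder broker
  .option("subscribe", "my-topic")                      // placeholder topic
  .option("includeHeaders", "true")
  .load()

// 'headers' is an array of structs with a string 'key' and a binary 'value';
// a schema (or schema id) stored there can be read per message and used to
// parse the 'value' column downstream.
raw.printSchema()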
Read the data as a string and then convert it to a Map[String, String]; this way you can process any JSON without even knowing its schema.
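A minimal sketch of that idea (broker address and topic name are placeholders); note that nested objects simply come back as raw JSON strings inside the map:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{MapType, StringType}

val spark = SparkSession.builder.appName("kafka-json-map").getOrCreate()
import spark.implicits._

val raw = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092") // placeholder broker
  .option("subscribe", "my-topic")                      // placeholder topic
  .load()

// Parse each Kafka value into a Map[String, String]; no fixed schema is needed,
// but nested structures remain JSON strings in the map values.
val parsed = raw
  .select(col("value").cast("string").as("json"))
  .select(from_json($"json", MapType(StringType, StringType)).as("data"))

// Individual fields are then read by key, e.g. parsed.select($"data"("someField")).
val query = parsed.writeStream.format("console").start()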
Based on JavaTechnical's answer, the best approach would be to use a schema registry and
Avro data instead of JSON; there is no way around hard-coding a schema (for now).
Include your schema name and id as a header and use them to read the schema from the schema registry.
Use the from_avro function to turn that data into a DataFrame (see the sketch below).
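A minimal sketch of that flow, assuming Spark 3.x with the external spark-avro package; fetchSchemaJson is a hypothetical helper standing in for however you look the schema up in your registry, and the broker, topic and subject names are placeholders:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.avro.functions.from_avro // needs the external spark-avro package

val spark = SparkSession.builder.appName("kafka-avro").getOrCreate()

// Hypothetical helper: look up the writer schema (as a JSON string) in the
// schema registry, e.g. by the subject/id carried in the message headers.
def fetchSchemaJson(subject: String): String = ???

val schemaJson = fetchSchemaJson("my-topic-value") // placeholder subject

val raw = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092") // placeholder broker
  .option("subscribe", "my-topic")                      // placeholder topic
  .load()

// Caveat: if producers use the Confluent wire format, each value starts with a
// 5-byte prefix (magic byte + schema id) that must be stripped before from_avro.
val decoded = raw.select(from_avro(col("value"), schemaJson).as("data"))

val query = decoded.select("data.*").writeStream.format("console").start()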

How can I convert postgres database to JSON file?

I'm working on a Flutter mobile application which should be connected to the web platform that I developed with Laravel. I want to generate a JSON file from Postgres dynamically; I mean, when I update anything in the database it should be updated in the mobile app as well, and I need to display the data in the mobile application.
I followed this tutorial and I understood that I must convert the database or the tables into a JSON file. How am I going to do that, please? It's the first time I'm working with Flutter and JSON.
https://www.youtube.com/watch?v=m7b7_Nq7XSs&list=PLK7ZDJTUghFAmRR4mueiai7zq1RJfMQ62&index=11&t=1s
If you're just getting started, please take your time and get familiar with the basics and with how Flutter treats data coming from a database.
Something else you should read up on and understand is JSON and serialization.
Based on that, it is not advisable to retrieve JSON right from the database. Instead, JSON serialization happens in one way or another inside Flutter, using one of the recommended approaches.
Specifically for working with PostgreSQL, there seems to be a decent tutorial.
Please keep in mind that what you actually asked for here ("... database to JSON file") indicates you really want a file output, which is completely contrary to the API that you're going to provide to Flutter.
Of course it is possible to query PostgreSQL and get the result already in JSON format, but that also means you won't be able to work with the data model inside Flutter.
However, if you finally know what you are doing, here is a way to get the result of any PostgreSQL query directly as JSON:
SELECT json_agg(t) FROM (
SELECT ...whatever you can think of...
) AS t;
If you are using a Laravel version greater than 5, you can use API resources to create an API and a connection to PostgreSQL (https://laravel.com/docs/5.7/database). It is very simple to create an API using Laravel API resources. Then in Flutter the only thing you have to do is request the endpoints you created with Laravel.

breeze sequelize: get entities in the server + transaction

We have been using breeze for the past year in our project and are quite happy with it. Previously our server was an ASP.NET application with Entity Framework. Now we are moving toward Node.js and MySQL. We have installed the breeze-sequelize package and everything is working fine.
The breeze Node server w/ sequelize documentation says that the result of a query is a promise whose resolved result is formatted so that it can be returned directly to the breeze client. And this is indeed what happens: the result of a query is just a plain old JSON object with values from the database, not entities the way breeze understands entities.
My question is this: I have a scenario where a heavy server process is instantiated by the client. No data is expected back on the client. The process will run entirely on the server, make queries, modify data and then save it, all on the server. How can I transform those plain old JSON objects into entities during my process? I would like to know, for example, which objects have been modified, which have been deleted, and send an appropriate message to the client.
Of course I could build some kind of mechanism myself that tracks changes in my objects, but I would rather rely on the breeze manager for that.
Should I create a breeze manager in the server?
var manager = new breeze.EntityManager(...)
A second concern: with breeze-sequelize, how do we handle transactions? start-transaction, complete-transaction and rollback-transaction?
Thank you for your input
To turn JSON with the attribute values of a Sequelize instance into an actual Instance use:
Model.build({ /* attributes-hash */ }, { isNewRecord: false })
For an example demonstrating this, see here. The Sequelize Instance documentation (here, see especially the function changed) might also be helpful. I am not familiar with Breeze and might have misunderstood your question here; does this help?

What is JSONC? Are JSONC and JSON-C different?

I recently came across the term JSONC in a YouTube API. I browsed the web but found nothing much about it. Can someone explain whether these two are the same or different?
There is also jsonc aka "JSON with comments", created by Microsoft and used by Visual Studio Code. The logic for it can be found here, alas without exhaustive specification (though I'd like to be proven wrong on this).
On top of that there is this project with an actual specification which is also called jsonc, but also does far more than just adding comments.
While there definitely is a use for these technologies, some critical thinking is advised. JSON containing comments is not JSON.
JSON-C seems to just be a variation of JSON mainly targeted at C development. I.e., from the open source docs, "JSON-C implements a reference counting object model that allows you to easily construct JSON objects in C, output them as JSON formatted strings and parse JSON formatted strings back into the C representation of JSON objects."ref^1
From the YouTube API perspective (specifically version 2, not the newer version 3), the JSON-C response is just a condensed version of the JSON response (removing "duplicate, irrelevant or easily calculated values").ref^2
Why would the JSON response have "duplicate, irrelevant or easily calculated values" anyway? Because it is converting the original ATOM XML format directly to JSON in a lossless conversion. You can find out more details here.
However, I would suggest using version 3 of the YouTube Data API. It is much easier to use. =)
JSONC is an open-source JavaScript API created by Tomás Corral Casas for reducing the amount of JSON data transported between clients and servers. It uses two different approaches to achieve this: JSONC.compress and JSONC.pack. More information can be found on the JSONC GitHub page:
https://github.com/tcorral/JSONC