Saving JSON data to DynamoDB

I am a newbie on the AWS side, working on an AWS IoT project where all devices update their state and send JSON to AWS IoT. A rule is in place to save the data to DynamoDB, and I have created a table in DynamoDB.
I am sending the data below to AWS:
{
  "state": {
    "reported": {
      "color": "Blue",
      "mac": "123:123"
    }
  }
}
But in DynamoDB it is saving three items: one for state, another for current, and one for metadata.
I want to save only the data that comes in for state. Is there any condition I have to write for this?

Instead of creating a rule to write directly to DynamoDB, which IMHO is not a good practice, have the rule trigger a Lambda function, which then processes the payload (maybe even does some error checking) and writes to DynamoDB.
I don't believe there is any way to configure how you want the data mapped to DynamoDB, so you need something like Lambda to map it.
Longer term, if you need to change your schema (or even change the database), you can change the Lambda to do something else.
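As a rough sketch of that approach (not from the original answer), here is a minimal Python Lambda that keeps only state.reported from the incoming payload and writes it to a table; the table name DeviceState and the assumption that mac serves as the partition key are made up for the example.

import boto3

# Table name and key attribute are assumptions for this sketch.
table = boto3.resource("dynamodb").Table("DeviceState")

def handler(event, context):
    # The IoT rule forwards the shadow update document; keep only state.reported.
    reported = event.get("state", {}).get("reported")
    if not reported:
        return {"saved": False}
    # "mac" is assumed to be the table's partition key, so it must be present.
    table.put_item(Item=reported)
    return {"saved": True}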

Related

Importing Well-Structured JSON Data into ElasticSearch via Cloud Watch

Is there a known approach for getting JSON data logged via CloudWatch imported into an Elasticsearch instance as well-structured JSON?
That is, I'm logging JSON data during the execution of an AWS Lambda function.
This data is available via Amazon's CloudWatch service.
I've been able to import this data into an Elasticsearch instance using Functionbeat, but the data comes in as an unstructured message:
"_source" : {
"#timestamp" : "xxx",
"owner" : "xxx",
"message_type" : "DATA_MESSAGE",
"cloud" : {
"provider" : "aws"
},
"message" : ""xxx xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx INFO {
foo: true,
duration_us: 19418,
bar: 'BAZ',
duration_ms: 19
}
""",
What I'm trying to do is get a document indexed into Elasticsearch that has a foo field, a duration_us field, a bar field, etc., instead of one that just has a plain-text message field.
It seems like there are a few different ways to do this, but I'm wondering if there's a well-trodden path for this sort of thing using Elastic's default tooling, or if I'm doomed to one more one-off hack.
Functionbeat is a good starting point and will allow you to keep it as "serverless" as possible.
To process the JSON, you can use the decode_json_fields processor.
The problem is that your message isn't really JSON though. Possible solutions I could think of:
A dissect processor that extracts the JSON message and passes it on to decode_json_fields, both in Functionbeat (see the sketch after this list). I'm also wondering if trim_chars couldn't be abused for that: trim any possible characters except for curly braces.
If that is not enough, you could do all the processing in Elasticsearch's Ingest pipeline where you probably stitch this together with a Grok processor and then the JSON processor.
Only log a JSON message if you can, to make your life simpler; potentially move the log level into the JSON structure.
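As a rough sketch of the first option (not from the original answer), the Functionbeat processors section could look something like the following. The tokenizer pattern is an assumption based on the sample message above, and decode_json_fields will only succeed if the part after INFO is actually valid JSON, which is not the case for the unquoted keys shown in the question.

processors:
  - dissect:
      # Split the plain-text prefix from the JSON-ish payload (pattern is assumed).
      tokenizer: "%{prefix} INFO %{json_payload}"
      field: "message"
      target_prefix: ""
  - decode_json_fields:
      # Decode the extracted payload into top-level fields.
      fields: ["json_payload"]
      target: ""
      max_depth: 2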

Retrieving forecast data from OpenWeatherMap in FIWARE ORION

I am trying to get weather forecast data from OpenWeatherMap and integrate it into Orion by performing a registration request.
I was able to register and get the API key from OpenWeatherMap; however, the latter returns a JSON file with all the data inside, which is not supported by Orion.
I have followed the step-by-step tutorial https://fiware-tutorials.readthedocs.io/en/latest/context-providers/index.html#context-provider-ngsi-proxy where they acquire the data from OpenWeatherMap using an NGSI proxy, and an API key has to be indicated in the docker-compose file as an environment variable. However, the data acquired is the "current" data, not the forecast, and it is also specific to Berlin.
I have tried to access the files inside the container "fiware/tutorials.context-provider" and modify the parameters to match my needs, but I feel like I am going down a long, blocked path.
I don't think that's even considered good practice, but I have run out of ideas :(
Can anyone suggest how I could bring the forecast data to Orion and register it as a context provider?
Thank you in advance.
I imagine you aim to implement a context provider, able to speak NGSI with Orion.
OpenWeatherMap surely doesn't implement NGSI ...
If you have the data from OpenWeatherMap as a JSON string, perhaps you should parse the JSON and create your entities using some selected key-values from the parsed OpenWeatherMap response? Save the entity (entities) locally and then register those keys in Orion (a rough sketch follows at the end of this answer).
Alternatively (easier but I wouldn't recommend it), create local entities with the entire OpenWeatherMap data as the value of an attribute of the entity:
{
  "id": "id-from-OpenWeatherMap",
  "type": "OpenWeatherMap",
  "weatherData": {
    "value": ...
  },
  ...
}
Then you register id/weatherData in Orion.
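To make the first suggestion a bit more concrete, here is a rough Python sketch (not from the original answer) that pulls the forecast, picks a few key-values and shapes them as an NGSI-v2 entity. Whether you then keep that entity in your own context provider (and register it in Orion) or create it directly in Orion is up to you; all ids, attribute names and URLs below are placeholders.

import requests

OWM_KEY = "YOUR_OPENWEATHERMAP_API_KEY"   # placeholder
ORION = "http://localhost:1026"           # assumed Orion endpoint

# 1. Fetch the 5 day / 3 hour forecast for a city.
forecast = requests.get(
    "https://api.openweathermap.org/data/2.5/forecast",
    params={"q": "Berlin", "appid": OWM_KEY, "units": "metric"},
).json()

# 2. Pick a few key-values from the first forecast slot and shape them
#    as an NGSI-v2 entity (id, type and attribute names are made up).
first_slot = forecast["list"][0]
entity = {
    "id": "urn:ngsi-ld:WeatherForecast:Berlin",
    "type": "WeatherForecast",
    "temperature": {"type": "Number", "value": first_slot["main"]["temp"]},
    "humidity": {"type": "Number", "value": first_slot["main"]["humidity"]},
}

# 3. Either store the entity in your own context provider and register it,
#    or create it directly in Orion as a regular (local) entity:
requests.post(f"{ORION}/v2/entities", json=entity).raise_for_status()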

AWS Glue: Import JSON from Data Lake (S3) with mixed data

I'm currently struggling to understand how to create a data catalog of our data lake (=Source).
Background:
We have an event-driven architecture and started to store all events produced by our application to a data lake (S3 Bucket). Before the events are stored we sanitize them (remove sensitive information) and add an envelope around each event with some general data:
event origin (which application generated the event)
event type (what kind of event was generated)
timestamp (when was the event generated)
...
With Kinesis Streams and Firehose, we batch those events together and store them as a JSON file in an S3 bucket. The bucket is structured like this:
/////
In there, we store the batched events with the envelope as JSON files. That means one JSON file contains multiple events:
{
  "origin": "hummingbird",
  "type": "AuthenticationFailed",
  "timestamp": "2019-06-30T18:24:13.868Z",
  "correlation_id": "2ff0c077-542d-4307-a58b-d6afe34ba748",
  "data": {
    ...
  }
}
{
  "origin": "hummingbird",
  "type": "PostingCreated",
  "timestamp": "2019-06-30T18:24:13.868Z",
  "correlation_id": "xxxx",
  "data": {
    ...
  }
}
The data object contains specific data of the events.
Now I thought I could use AWS Glue to hook into the raw data and use ETL jobs to aggregate the event data. As I understand it, I need a data catalog for my source data, and this is where I'm struggling, since each JSON file always contains different events batched together. The standard crawler cannot handle this; well, it does, but it creates nonsensical schemas based on every JSON file.
What I wanted to achieve:
Parse through the data lake to filter out events that I'm interested in
Use the events that I'm interested in and do some transformation/aggregation/calculation with it
Store results into our current Analytics RDS or wherever (enough for our purposes right now)
Parse through new events on a daily basis and insert/append/update them in our analytics RDS
The questions I have:
What's the best way to use glue with our data lake?
Are there possible ways to use crawlers with custom classifiers and some sort of filter together with our datalake?
Do I need to transform the data beforehand to actually be able to use AWS Glue?
Let me give it a try.
Parse through the data lake to filter out events that I'm interested in
Use the events that I'm interested in and do some transformation/aggregation/calculation with it
--> You can flatten the JSON for each event, then export it into a different S3 bucket (a rough sketch of this step is below). Refer to some Python code here: https://aws.amazon.com/blogs/big-data/simplify-querying-nested-json-with-the-aws-glue-relationalize-transform/
--> Use Glue to crawl your new bucket and generate a new table schema; then in Athena you should be able to see it and do your filter/query/aggregation on top of the table. Once you're happy with the transformed data, you can further import it into Redshift or RDS.
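A minimal Glue job sketch of the flattening step, based on the Relationalize transform from the linked blog post; the database, table and bucket names are placeholders, not something from the question:

import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.transforms import Relationalize
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the raw events via the Data Catalog (database/table names are placeholders).
events = glue_context.create_dynamic_frame.from_catalog(
    database="datalake_raw", table_name="events"
)

# Flatten the nested JSON; Relationalize returns a collection of frames.
flattened = Relationalize.apply(
    frame=events, staging_path="s3://my-temp-bucket/staging/", name="root"
)

# Write the flattened root frame to a new bucket so a crawler/Athena can pick it up.
glue_context.write_dynamic_frame.from_options(
    frame=flattened.select("root"),
    connection_type="s3",
    connection_options={"path": "s3://my-flattened-bucket/events/"},
    format="json",
)

job.commit()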
Store results into our current Analytics RDS or wherever (enough for our purposes right now)
--> From the Glue Catalog above, add a Redshift/RDS connection, then use PySpark (you need some basic knowledge of working with DataFrames) to load the data into Redshift or RDS.
https://www.mssqltips.com/sqlservertip/5952/read-enrich-and-transform-data-with-aws-glue-service/
Parse through new events on a daily basis and insert/append/update them in our analytics RDS
--> You can schedule your Glue crawler to discover new data from the new bucket.
Alternatively, Lambda is also a good option for this. You can use S3 object creation (on the new bucket with the flattened JSON) to trigger a Lambda to pre-process, ETL and then insert into Redshift/RDS (using a JDBC driver); a rough sketch of such a Lambda is below.
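A minimal sketch of that Lambda (not from the original answer), assuming the flattened files contain one JSON object per line and the target is a MySQL RDS instance; in Python a driver such as pymysql stands in for the JDBC driver mentioned above, and the table, column and connection names are placeholders:

import json
import os

import boto3
import pymysql  # stands in for the JDBC driver mentioned above

s3 = boto3.client("s3")

def handler(event, context):
    # Triggered by an S3 "ObjectCreated" event on the flattened-JSON bucket.
    record = event["Records"][0]["s3"]
    body = s3.get_object(
        Bucket=record["bucket"]["name"], Key=record["object"]["key"]
    )["Body"].read().decode("utf-8")

    # Assumption: one flattened event per line.
    rows = [json.loads(line) for line in body.splitlines() if line.strip()]

    # Connection details and target table are placeholders.
    conn = pymysql.connect(
        host=os.environ["RDS_HOST"],
        user=os.environ["RDS_USER"],
        password=os.environ["RDS_PASSWORD"],
        database=os.environ["RDS_DATABASE"],
    )
    try:
        with conn.cursor() as cur:
            cur.executemany(
                "INSERT INTO analytics_events (origin, type, timestamp) VALUES (%s, %s, %s)",
                [(r.get("origin"), r.get("type"), r.get("timestamp")) for r in rows],
            )
        conn.commit()
    finally:
        conn.close()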

How to update json field in Firebase DB with JMeter HTTP Request

I'm working with JMeter to make some HTTP requests to my Firebase database. I am able to create JSON data with a regular request, as well as with a CSV file. I'm wondering if it's possible to update, or add to, a JSON object.
My JSON data looks something like what is below. Let's say I wanted to add a boolean node called "sold", which I could set to true or false. Could I create it within that JSON object? If so, could I also make it so that only fields with a specific "name" get updated?
{
  "Price": "5.00",
  "name": "buyer#gmail.com",
  "seller_name": "seller#gmail.com",
  "time": 1496893589683
}
Looking into the Updating Data with PATCH chapter of the Saving Data article, you can update a single field using the HTTP PATCH method.
JMeter has supported the HTTP PATCH method since version 2.8, so you should be in a position to use it in your test as well.
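As a rough illustration of what the request needs to look like (the database URL and node path below are placeholders), the equivalent curl call would be something like this; in JMeter you would set the HTTP Request sampler's method to PATCH, point it at the same path and put the JSON in the Body Data tab:

curl -X PATCH -d '{"sold": true}' \
  'https://<your-project>.firebaseio.com/listings/<listing-id>.json'

A PATCH only touches the fields included in the body, so the other fields of that node stay as they are; updating "only fields with a specific name" is then a matter of targeting the right node path.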

breeze sequelize: get entities in the server + transaction

We have been using Breeze for the past year in our project and are quite happy with it. Previously our server was an ASP.NET application with Entity Framework. Now we are moving toward Node.js and MySQL. We have installed the breeze-sequelize package and everything is working fine.
The documentation, Breeze Node server w/ sequelize, says that the result of a query is a promise whose resolved result is formatted so that it can be directly returned to the Breeze client. And this is effectively what happens: the result of a query is just a plain old JSON object with values from the database, not entities the way Breeze understands entities.
My question is this: I have a scenario where a heavy server process is instantiated by the client. No data is expected in the client. The process will run entirely on the server, make queries, modify data and then save it, all on the server. How can I transform those plain old JSON objects into entities during my process? I would like to know, for example, which objects have been modified and which have been deleted, and send an appropriate message to the client.
Of course I could build some kind of mechanism that tracks changes in my objects, but I would rather rely on the Breeze manager for that.
Should I create a Breeze manager on the server?
var manager = new breeze.EntityManager(...)
A second concern is: with breeze-sequelize, how do we handle transactions? Start-transaction, complete-transaction and rollback-transaction?
Thank you for your input
To turn JSON with the attribute values of a Sequelize instance into an actual Instance, use:
Model.build({ /* attributes-hash */ }, { isNewRecord: false })
For an example demonstrating this, see here. The Sequelize Instance documentation (here, see especially the changed function) might also be helpful. I am not familiar with Breeze and might have misunderstood your question; does this help?