How to read data from a Kinesis stream using the AWS CLI?

I have a Kinesis stream in AWS and can send JSON data to it using the kinesis command. I can read it back from the stream with:
SHARD_ITERATOR=$(aws kinesis get-shard-iterator --shard-id shardId-000000000000 --shard-iterator-type TRIM_HORIZON --stream-name mystream --query 'ShardIterator' --profile myprofile)
aws kinesis get-records --shard-iterator $SHARD_ITERATOR --profile myprofile
The output of this looks something like:
HsKCQkidmlkZW9Tb3VyY2UiOiBbCgkJCXsKCQkJCSJicmFuZGluZyI6IHt9LAoJCQkJInByb21vUG9vbCI6IFtdLAoJCQkJImlkIjogbnVsbAoJCQl9CgkJXSwKCQkiaW1hZ2VTb3VyY2UiOiB7fSwKCQkibWV0YWRhdGFBcHByb3ZlZCI6IHRydWUsCgkJImR1ZURhdGUiOiAxNTgzMzEyNTA0ODAzLAoJCSJwcm9maWxlIjogewoJCQkiY29tcG9uZW50Q291bnQiOiAwLAoJCQkibmFtZSI6ICJTUUVfQVRfUFJPRklMRSIsCgkJCSJpZCI6ICJTUUVfQVRfUFJPRklMRV9JRCIsCgkJCSJwYWNrYWdlQ291bnQiOiAwLAoJCQkicGFja2FnZXMiOiBbCgkJCQl7CgkJCQkJIm5hbWUiOiAiUEVBQ09DSy1MVEEiLAoJCQkJCSJpZCI6ICJmZDk5NTRmZC03NDYwLTRjZjItOTU5Ni05YzBhMjcxNTViODgiCgkJCQl9CgkJCV0KCQl9LAoJCSJ3b3JrT3JkZXJJZCI6ICJTUUVfQVRfSk9CX1NVQk1JU1
How do I get the actual JSON message in raw format (so it looks like JSON), the same way it was originally when I sent it?
Thanks

As per the docs, you need to use a Base64 decoding tool or the KCL library to get the data back in the format it was sent:
The first thing you'll likely notice about your record in this part of the tutorial is that the data appears to be garbage; it's not the clear text testdata we sent. This is due to the way put-record uses Base64 encoding to allow you to send binary data. However, the Kinesis Data Streams support in the AWS CLI does not provide Base64 decoding because Base64 decoding to raw binary content printed to stdout can lead to undesired behavior and potential security issues on certain platforms and terminals. If you use a Base64 decoder (for example, https://www.base64decode.org/) to manually decode dGVzdGRhdGE= you will see that it is, in fact, testdata. This is sufficient for the sake of this tutorial because, in practice, the AWS CLI is rarely used to consume data, but more often to monitor the state of the stream and obtain information, as shown previously (describe-stream and list-streams). Future tutorials will show you how to build production-quality consumer applications using the Kinesis Client Library (KCL), where Base64 is taken care of for you. For more information about the KCL, see Developing KCL 1.x Consumers.

On Unix, you can use the base64 --decode command to decode the base64-encoded Kinesis record data.
For example, to decode the data of the first record:
# define the name of the stream you want to read
KINESIS_STREAM_NAME='__your_stream_name_goes_here__';
# define the shard iterator to use (--output text strips the surrounding JSON quotes from the value)
SHARD_ITERATOR=$(aws kinesis get-shard-iterator --shard-id shardId-000000000000 --shard-iterator-type TRIM_HORIZON --stream-name "$KINESIS_STREAM_NAME" --query 'ShardIterator' --output text);
# read the records, use `jq` to grab the data of the first record, and base64 decode it
aws kinesis get-records --shard-iterator "$SHARD_ITERATOR" | jq -r '.Records[0].Data' | base64 --decode
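To decode every record in the batch rather than just the first one, a small loop over jq's output works as well (a sketch; it assumes the records fit in a single get-records call and that each payload is UTF-8 text):
# decode all records returned by this get-records call
aws kinesis get-records --shard-iterator "$SHARD_ITERATOR" \
  | jq -r '.Records[].Data' \
  | while read -r data; do
      echo "$data" | base64 --decode
      echo    # newline between records
    done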

Related

Microsoft Azure IoT Hub: The measured sensor values sent via device to IoT Hub cannot be read from the stored JSON data

We are using a data acquisition system as a device and send some signal values via the MQTT protocol into a container that is assigned to an IoT Hub. The connection between the device and the IoT Hub works well, and we receive some JSON data. When we open the JSON data, we cannot read the temperature values in "Body" inside the JSON data, since they are encoded. I would be thankful if you could tell us how we can automatically convert the JSON data to a proper format so that we can read the values as numbers.
Please find below three lines of our JSON data. The rest of the lines are the same, but they are encoded differently.
{"EnqueuedTimeUtc":"2022-02-09T10:00:30.8600000Z","Properties":{"Sensor":""},"SystemProperties":{"connectionDeviceId":"Iba","connectionAuthMethod":"{"scope":"device","type":"sas","issuer":"iothub","acceptingIpFilterRule":null}","connectionDeviceGenerationId":"637799949903534194","enqueuedTime":"2022-02-09T10:00:30.8600000Z"},"Body":"My42MjI3NTQ="}
{"EnqueuedTimeUtc":"2022-02-09T10:00:30.8750000Z","Properties":{"Sensor":""},"SystemProperties":{"connectionDeviceId":"Iba","connectionAuthMethod":"{"scope":"device","type":"sas","issuer":"iothub","acceptingIpFilterRule":null}","connectionDeviceGenerationId":"637799949903534194","enqueuedTime":"2022-02-09T10:00:30.8750000Z"},"Body":"My42ODEyNDY="}
{"EnqueuedTimeUtc":"2022-02-09T10:00:30.9070000Z","Properties":{"Sensor":""},"SystemProperties":{"connectionDeviceId":"Iba","connectionAuthMethod":"{"scope":"device","type":"sas","issuer":"iothub","acceptingIpFilterRule":null}","connectionDeviceGenerationId":"637799949903534194","enqueuedTime":"2022-02-09T10:00:30.9070000Z"},"Body":"My43Mzk1OTI="}
Thanks in advance!
Br
Masoud
You should add two parameters to the message topic, the content type (ct) and the content encoding (ce), as shown in the following example:
devices/device1/messages/events/$.ct=application%2Fjson&$.ce=utf-8
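With those properties set, IoT Hub treats the payload as UTF-8 JSON instead of opaque bytes. For the messages already stored, the Body values can be recovered with a plain base64 decode; for example, the first Body above decodes like this on any Unix shell:
# "My42MjI3NTQ=" is the Body of the first stored message above
echo 'My42MjI3NTQ=' | base64 --decode    # prints 3.622754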

Kafka S3 sink connector - many JSONs in one JSON

I'm having an issue with the S3 sink connector. I set my flush.size to 3 (for tests) and my S3 bucket is receiving the JSON file properly. But when I open the file, I don't have a list of JSON objects, I only have one after the other. Is there any way to get the JSON objects "properly" in a list when they are sent to my bucket? I want to try a "good way" to solve this; otherwise I'll fix it in a Lambda function (but I would rather not...)
What I have:
{"before":null,"after":{"id":10230,"nome":"John","idade":30,"cidade":"São Paulo","estado":"SP","sexo":"M"}
{"before":null,"after":{"id":10231,"nome":"Alan","idade":30,"cidade":"São Paulo","estado":"SP","sexo":"M"}
{"before":null,"after":{"id":10232,"nome":"Rodrigo","idade":30,"cidade":"São Paulo","estado":"SP","sexo":"M"}
What I want:
[{"before":null,"after":{"id":10230,"nome":"John","idade":30,"cidade":"São Paulo","estado":"SP","sexo":"M"}},
{"before":null,"after":{"id":10231,"nome":"Alan","idade":30,"cidade":"São Paulo","estado":"SP","sexo":"M"}},
{"before":null,"after":{"id":10232,"nome":"Rodrigo","idade":30,"cidade":"São Paulo","estado":"SP","sexo":"M"}}]
The S3 sink connector writes each message to the S3 object as its own record, one after the other.
You want to do something different, which is to batch messages together into discrete array objects.
To do this you'll need some kind of stream processing. For example, you could write a Kafka Streams processor that processes the topic and merges each batch of x messages into one message holding an array, as you want.
It's not clear how you expect to read these files other than manually, but most analytical tools that read from S3 buckets (Hive, Athena, Spark, Presto, etc.) all expect JSON Lines.
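If you only need the array form for manual inspection, jq can also slurp a JSON Lines file into a single array after the fact (a sketch; the bucket and object key are placeholders):
# stream one of the connector's output objects and wrap its records in an array
aws s3 cp s3://my-bucket/path/to/records.json - | jq -s '.' > records-array.json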

How can I store JSON in Drone and write it to a file without it getting malformed?

Here's the context of what I'm trying to do.
I would like to have a Drone step that runs database migrations against a Google Cloud SQL Postgres instance.
I need to use Cloud SQL Proxy in order to access the database. Cloud SQL Proxy requires you provide a credential file to the proxy.
The problem I'm having is that when I try to echo or printf the environment variable to a file (as suggested here) the JSON comes out malformed.
Note: I've tried adding the JSON via Drone GUI and Drone CLI.
The best solution I found to this problem is to simply base64 encode the JSON before putting it into Drone.
Decode the base64 when you need it in your step.
Example commands:
Encode: base64 data.txt > data.b64
Decode: echo "$CREDS_B64" | base64 --decode > sql-deploy-creds.json
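Put together inside the pipeline step, it might look like this (a sketch, assuming the secret is exposed to the step as CREDS_B64 and the v1 cloud_sql_proxy binary is available in the step's image; the instance connection name is a placeholder):
# decode the secret back into the credential file the proxy expects
echo "$CREDS_B64" | base64 --decode > sql-deploy-creds.json
# start the proxy in the background, then run the migrations against localhost:5432
cloud_sql_proxy -credential_file=sql-deploy-creds.json \
                -instances=my-project:my-region:my-instance=tcp:5432 &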

How to send a .csv file from the local machine (GetFile) to Hive (PutHiveQL) in Apache NiFi using cURL?

I want to send a .csv file or a MySQL table from the local machine (GetFile) to Hive (PutHiveQL) in Apache NiFi using cURL. Please let me know if there is any command to do this using cURL.
The question doesn't make sense as formed. If you want to ingest the content of a CSV file into Apache NiFi, route and transform it, and eventually write it to a Hive table, your flow would be as follows:
GetFile -> ConvertRecord (CSVReader to AvroRecordSetWriter) -> [Optional processors] -> PutHiveStreaming
PutHiveStreaming expects the incoming flowfile content to be in Avro format, so the ConvertRecord processor will translate the ingested data into the correct syntax.
I am unsure of how cURL fits into this question at all. NiFi does provide the InvokeHTTP processor to allow arbitrary outgoing HTTP requests, as well as the ExecuteStreamCommand processor to invoke arbitrary command-line activity, including cURL. I don't know why you would need to invoke either in this flow. If you are asking how you could trigger the entire flow via an external cURL command, NiFi provides both ListenHTTP and HandleHTTPRequest processors which start local web servers and listen for incoming HTTP requests. You can connect these processors to a pair of Wait/Notify processors to control the flow of the ingested file data, as GetFile is a source processor, and does not allow incoming flowfiles to trigger it.
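If the goal is simply to push the CSV into NiFi over HTTP, a cURL call against a ListenHTTP processor could look like this (a sketch, assuming ListenHTTP is configured to listen on port 8011 with its default base path of contentListener; the host and file name are placeholders):
# POST the CSV file's bytes to the ListenHTTP processor
curl -X POST --data-binary @mydata.csv \
     -H "Content-Type: text/csv" \
     http://nifi-host:8011/contentListener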

Logging Multiple JSON Objects to a Single File - File Format

I have a solution where I need to be able to log multiple JSON Objects to a file. Essentially doing one log file per day. What is the easiest way to write (and later read) these from a single file?
How does MongoDB handle this with BSON? What does it use as a separator between "records"?
Does Protocol Buffers, BSON, MessagePack, etc... offer compression and the record concept? Compression would be a nice benefit.
With Protocol Buffers you could define the messages as follows:
message JSONObject {
  required string JSON = 1;
}

message DailyJSONLog {
  repeated JSONObject JSON = 1;
}
This way you would just read the file into memory and deserialize it. It's essentially the same for serializing them as well. Once you have the file (a serialized DailyJSONLog) on disk, you can simply append serialized JSONObjects to the end of that file (since the DailyJSONLog message is just a repeated field).
The only issue with this is if you have a lot of messages each day or if you want to start at a certain location during the day (you're not able to easily seek to the middle, or to an arbitrary position, of the repeated list).
I've gotten around this by taking a JSONObject, serializing it, and then base64-encoding it. I'd store these in a file, separated by newlines. This lets you very easily see how many records are in each file, access any arbitrary JSON object within the file, and trivially keep expanding the file (you can also grow the 'repeated' message above fairly trivially, but only as a one-way, append-only operation...)
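As a concrete illustration of that newline-delimited base64 layout (a sketch; record.bin stands for a hypothetical serialized JSONObject, the log file name is an example, and -w0 is the GNU base64 flag that disables line wrapping):
# append one record per line, base64-encoded
base64 -w0 record.bin >> daily.log && echo >> daily.log

# count the records in the file
wc -l < daily.log

# pull out and decode an arbitrary record, e.g. the third one
sed -n '3p' daily.log | base64 --decode > record-3.bin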
Compression is a different topic. Protocol Buffers will not compress strings. If you were to define a pb message to match your JSON message, then you would get the benefit of having pb possibly 'compress' any integers into their varint-encoded format. You will get 'less' compression if you go the base64-encoding route above as well.