Kinesis Firehose putting JSON objects in S3 without separator comma - json

Before sending the data I apply JSON.stringify to it, and it looks like this:
{"data": [{"key1": value1, "key2": value2}, {"key1": value1, "key2": value2}]}
But once it passes through AWS API Gateway and Kinesis Firehose puts it into S3, it looks like this:
{
"key1": value1,
"key2": value2
}{
"key1": value1,
"key2": value2
}
The separator commas between the JSON objects are gone, but I need them in order to process the data properly.
Template in the API Gateway:
#set($root = $input.path('$'))
{
    "DeliveryStreamName": "some-delivery-stream",
    "Records": [
#foreach($r in $root.data)
#set($data = "{
    ""key1"": ""$r.value1"",
    ""key2"": ""$r.value2""
}")
        {
            "Data": "$util.base64Encode($data)"
        }#if($foreach.hasNext),#end
#end
    ]
}

I had this same problem recently. The only answers I was able to find were basically to add a line break ("\n") to the end of every JSON message when posting it to the Kinesis stream, or to use a raw JSON decoder of some sort that can process concatenated JSON objects without delimiters.
I posted a python code solution which can be found over here on a related Stack Overflow post:
https://stackoverflow.com/a/49417680/1546785
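For reference, here is a minimal sketch of the first approach (appending a newline as each record is put), assuming you write to the delivery stream directly with boto3 rather than through API Gateway; the stream name mirrors the one in the question and the payload is a placeholder:

import json
import boto3

firehose = boto3.client('firehose')

def put_json_record(stream_name, obj):
    # Append '\n' so the records stay delimited when Firehose
    # concatenates them into a single S3 object.
    firehose.put_record(
        DeliveryStreamName=stream_name,
        Record={'Data': json.dumps(obj) + '\n'}
    )

put_json_record('some-delivery-stream', {'key1': 'value1', 'key2': 'value2'})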

One approach you could consider is to configure data processing for your Kinesis Firehose delivery stream by adding a Lambda function as its data processor, which is executed before the data is finally delivered to the S3 bucket.
DeliveryStream:
  ...
  Type: AWS::KinesisFirehose::DeliveryStream
  Properties:
    DeliveryStreamType: DirectPut
    ExtendedS3DestinationConfiguration:
      ...
      BucketARN: !GetAtt MyDeliveryBucket.Arn
      ProcessingConfiguration:
        Enabled: true
        Processors:
          - Parameters:
              - ParameterName: LambdaArn
                ParameterValue: !GetAtt MyTransformDataLambdaFunction.Arn
            Type: Lambda
      ...
In the Lambda function, ensure that '\n' is appended to the record's JSON string; see the Lambda function myTransformData.ts (Node.js) below:
import {
  FirehoseTransformationEvent,
  FirehoseTransformationEventRecord,
  FirehoseTransformationHandler,
  FirehoseTransformationResult,
  FirehoseTransformationResultRecord,
} from 'aws-lambda';

const createDroppedRecord = (
  recordId: string
): FirehoseTransformationResultRecord => {
  return {
    recordId,
    result: 'Dropped',
    data: Buffer.from('').toString('base64'),
  };
};

const processData = (
  payloadStr: string,
  record: FirehoseTransformationEventRecord
) => {
  let jsonRecord;
  // ...
  // Process the original payload
  // and create the record as JSON.
  return jsonRecord;
};

const transformRecord = (
  record: FirehoseTransformationEventRecord
): FirehoseTransformationResultRecord => {
  try {
    const payloadStr = Buffer.from(record.data, 'base64').toString();
    const jsonRecord = processData(payloadStr, record);
    if (!jsonRecord) {
      console.error('Error creating json record');
      return createDroppedRecord(record.recordId);
    }
    return {
      recordId: record.recordId,
      result: 'Ok',
      // Ensure that '\n' is appended to the record's JSON string.
      data: Buffer.from(JSON.stringify(jsonRecord) + '\n').toString('base64'),
    };
  } catch (error) {
    console.error(`Error processing record ${record.recordId}: `, error);
    return createDroppedRecord(record.recordId);
  }
};

const transformRecords = (
  event: FirehoseTransformationEvent
): FirehoseTransformationResult => {
  const records: FirehoseTransformationResultRecord[] = [];
  for (const record of event.records) {
    const transformed = transformRecord(record);
    records.push(transformed);
  }
  return { records };
};

export const handler: FirehoseTransformationHandler = async (
  event,
  _context
) => {
  const transformed = transformRecords(event);
  return transformed;
};
Once the newline delimiter is in place, AWS services such as Athena will be able to work properly with the JSON record data in the S3 bucket, rather than seeing only the first JSON record.
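As a rough illustration of why this helps (a sketch, assuming a recent boto3 and placeholder bucket/key names): once every record ends with '\n', a consumer can read the delivered S3 object line by line instead of needing a stacked-JSON decoder.

import json
import boto3

s3 = boto3.client('s3')
body = s3.get_object(Bucket='my-delivery-bucket', Key='my-firehose-object')['Body']

# Each non-empty line is now exactly one JSON record.
for line in body.iter_lines():
    if line:
        record = json.loads(line.decode('utf-8'))
        print(record)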

Once AWS Firehose dumps the JSON objects to S3, it's perfectly possible to read the individual JSON objects from the files.
Using Python, you can use the raw_decode method of JSONDecoder from the json package:
from json import JSONDecoder, JSONDecodeError
import re
import json
import boto3

NOT_WHITESPACE = re.compile(r'[^\s]')

def decode_stacked(document, pos=0, decoder=JSONDecoder()):
    while True:
        match = NOT_WHITESPACE.search(document, pos)
        if not match:
            return
        pos = match.start()
        try:
            obj, pos = decoder.raw_decode(document, pos)
        except JSONDecodeError:
            # do something sensible if there's some error
            raise
        yield obj

s3 = boto3.resource('s3')
obj = s3.Object("my-bucket", "my-firehose-json-key.json")
# Decode the S3 body from bytes to text before parsing.
file_content = obj.get()['Body'].read().decode('utf-8')

for obj in decode_stacked(file_content):
    print(json.dumps(obj))
    # {"key1": value1, "key2": value2}
    # {"key1": value1, "key2": value2}
source: https://stackoverflow.com/a/50384432/1771155
Using Glue / PySpark you can use:
import json
rdd = sc.textFile("s3a://my-bucket/my-firehose-file-containing-json-objects")
df = rdd.map(lambda x: json.loads(x)).toDF()
df.show()
source: https://stackoverflow.com/a/62984450/1771155

Please use this code to solve your issue:
__Author__ = "Soumil Nitin Shah"
import json
import boto3
import base64
class MyHasher(object):
def __init__(self, key):
self.key = key
def get(self):
keys = str(self.key).encode("UTF-8")
keys = base64.b64encode(keys)
keys = keys.decode("UTF-8")
return keys
def lambda_handler(event, context):
output = []
for record in event['records']:
payload = base64.b64decode(record['data'])
"""Get the payload from event bridge and just get data attr """""
serialize_payload = str(json.loads(payload)) + "\n"
hasherHelper = MyHasher(key=serialize_payload)
hash = hasherHelper.get()
output_record = {
'recordId': record['recordId'],
'result': 'Ok',
'data': hash
}
print("output_record", output_record)
output.append(output_record)
return {'records': output}

Related

Serializing scalar JSON in Flutter's Ferry GraphQL for a flexible query

I have the following JSON scalar:
"""
The `JSON` scalar type represents JSON values as specified by [ECMA-404](http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf).
"""
scalar JSON
which I am trying to convert, since my query accepts an input of type JSON. When testing using the GraphQL Playground, the query input is a JSON object, so the following works:
query {
  carts(where: {
    owner: {id: "xxx"}
    store: {name: "yyy"}
  }) {
    id
  }
}
# the query input starts from the where: {...}
# build.yaml
gql_build|schema_builder: # same for gql_build|var_builder + ferry_generator|req_builder
  options:
    type_overrides:
      DateTime:
        name: DateTime
      JSON:
        name: BuiltMap<String, dynamic>
        import: 'package:built_collection/built_collection.dart'
gql_build|serializer_builder:
  enabled: true
  options:
    schema: myapp|lib/graphql/schema.graphql
    custom_serializers:
      - import: 'package:myapp/app/utils/builtmapjson_serializer.dart'
        name: BuiltMapJsonSerializer
This is the custom serializer (builtmapjson_serializer.dart)
// lib/app/utils/builtmapjson_serializer.dart
import 'package:built_collection/built_collection.dart';
import "package:gql_code_builder/src/serializers/json_serializer.dart";

class BuiltMapJsonSerializer extends JsonSerializer<BuiltMap<String, dynamic>> {
  @override
  BuiltMap<String, dynamic> fromJson(Map<String, dynamic> json) {
    print('MyJsonSerializer fromJson: $json');
    return BuiltMap.of(json);
  }

  @override
  Map<String, dynamic> toJson(BuiltMap<String, dynamic> operation) {
    print('MyJsonSerializer toJson: ${operation.toString()}');
    return operation.asMap();
  }
}
and the usage:
Future testQuery() async {
  Map<String, dynamic> queryMap = {
    "where": {
      "owner": {
        "id": "xxx",
        "store": {"name": "yyy"}
      }
    }
  };
  final req = GFindCartsReq((b) {
    return b..vars.query.addAll(queryMap);
  });
  var resStream = _graphQLService.client.request(req);
  var res = await resStream.first;
  print(
      'linkExceptions: ${res.linkException}'); // Map: LinkException(Bad state: No serializer for '_InternalLinkedHashMap<String, Map<String, Object>>'.)
}
So whenever I try to query, it throws the linkException noted in the comment on the last line of the usage. Any idea what the right way to serialize this would be?
// Write the query like this:
query FindCarts($owner_id: String!, $store_name: String!) {
  carts(where: {
    owner: {id: $owner_id}
    store: {name: $store_name}
  }) {
    id
  }
}

// And make the request like this:
final req = GFindCartsReq((b) => b..vars.store_name = 'XXX'..vars.owner_id = 'YYY');
I think you may be misunderstanding the use case. Custom serializers are there to serialize and deserialize the response when you want to end up with a Dart object that differs from the GraphQL representation. You may want to try rereading this section:
https://ferrygraphql.com/docs/custom-scalars/#create-a-custom-serializer
In the example in the docs, the GraphQL schema returns an int for the timestamp, but we want to use a Date object, so that is the purpose of the serializer: it tells Ferry to deserialize the int in our responses to a Date, so we can use a Date in our Dart code. You could still use a JSON serializer (like in the examples you linked to), but it still would not work the way you are trying to use it; it would apply if your schema returned a JSON string and you wanted to deserialize that JSON string. For example, in my case, my GraphQL schema actually does return a "jsonb" type on some objects. In order to handle that, I am using built_value's default JsonObject like this:
...
type_overrides:
  jsonb:
    name: JsonObject
    import: "package:built_value/json_object.dart"
custom_serializers:
  - import: "package:built_value/src/json_object_serializer.dart"
    name: JsonObjectSerializer

json.lua json.encode returns nil

I'm new to Lua and tried learning to code in this language with Garry's Mod.
I want to get the messages from the Garry's Mod chat and send them into a Discord channel with a webhook.
It works, but I tried expanding this project with embedded messages. I need JSON for this and used json.lua as a library.
But as soon as I send a message I receive the following error message:
attempt to index global 'json' (a nil value)
The code that causes the error is the following:
json.encode({ {
    ["embeds"] = {
        ["description"] = text,
        ["author"] = {
            ["name"] = ply:Nick()
        },
    },
} }),
The complete code:
AddCSLuaFile()

json = require("json")

webhookURL = "https://discordapp.com/api/webhooks/XXX"

local DiscordWebhook = DiscordWebhook or {}

hook.Add( "PlayerSay", "SendMsg", function( ply, text )
    t_post = {
        content = json.encode({ {
            ["embeds"] = {
                ["description"] = text,
                ["author"] = {
                    ["name"] = ply:Nick()
                },
            },
        } }),
        username = "Log",
    }
    http.Post(webhookURL, t_post)
end )
I hope somebody can help me
Garry's Mod does provide two functions to work with JSON.
They are:
util.TableToJSON( table table, boolean prettyPrint=false )
and
util.JSONToTable( string json )
There is no need to import json and if I recall correctly it isn't even possible.
For what you want to do you need to build your arguments as a table like this:
local arguments = {
    ["key"] = "Some value",
    ["42"] = "Not always the answer",
    ["deeper"] = {
        ["my_age"] = 22,
        ["my_name"] = getMyName()
    },
    ["even more"] = from_some_variable
}
and then call
local args_as_json = util.TableToJSON(arguments)
Now you can pass args_as_json to your
http.Post( string url, table parameters, function onSuccess=nil, function onFailure=nil, table headers={} )

Parse JSON using a Groovy script (using JsonSlurper)

I am just two days into Groovy, and I need to parse a JSON file with the structure below. My actual goal is to run a set of jobs in different environments based on different sequences, so I came up with this JSON format as the input file to my Groovy script:
{
  "services": [{
    "UI-Service": [{
      "file-location": "/in/my/server/location",
      "script-names": "daily-batch,weekly-batch,bi-weekly-batch",
      "seq1": "daily-batch,weekly-batch",
      "seq2": "daily-batch,weekly-batch,bi-weekly-batch",
      "DEST-ENVT_seq1": ["DEV1", "DEV2", "QA1", "QA2"],
      "DEST-ENVT_seq2": ["DEV3", "DEV4", "QA3", "QA4"]
    }]
  }, {
    "Mobile-Service": [{
      "file-location": "/in/my/server/location",
      "script-names": "daily-batch,weekly-batch,bi-weekly-batch",
      "seq1": "daily-batch,weekly-batch",
      "seq2": "daily-batch,weekly-batch,bi-weekly-batch",
      "DEST-ENVT_seq1": ["DEV1", "DEV2", "QA1", "QA2"],
      "DEST-ENVT_seq2": ["DEV3", "DEV4", "QA3", "QA4"]
    }]
  }]
}
I tried the script below for parsing the JSON:
import groovy.json.JsonSlurper

def jsonSlurper = new JsonSlurper()
//def reader = new BufferedReader(new InputStreamReader(new FileInputStream("in/my/location/config.json"),"UTF-8"))
//def data = jsonSlurper.parse(reader)
File file = new File("in/my/location/config.json")
def data = jsonSlurper.parse(file)

try {
    Map jsonResult = (Map) data;
    Map compService = (Map) jsonResult.get("services");
    String name = (String) compService.get("UI-Service");
    assert name.equals("file-location");
} catch (Exception e) {
    println e
}
I need to first read all the services (UI-Service, Mobile-Service, etc.), then their elements and their values.
Or you could do something like:
new JsonSlurper().parseText(jsonTxt).services*.each { serviceName, elements ->
    println serviceName
    elements*.each { name, value ->
        println " $name = $value"
    }
}
But it depends on what you want (and you don't really explain that in the question).
Example of reading from the parsed JSON object:
def data = jsonSlurper.parse(file)
data.services.each {
    def serviceName = it.keySet()
    println "**** key:${serviceName} ******"
    it.each { k, v ->
        println "element name: ${k}, element value: ${v}"
    }
}
other options:
println data.services[0].get("UI-Service")["file-location"]
println data.services[1].get("Mobile-Service").seq1

How to PUT pretty JSON into a REST API

I am trying to build a REST service which accepts XML, converts it into JSON, and calls an external service which accepts JSON, PUTting my JSON into it. I am able to PUT the JSON without pretty printing, but I want to PUT the JSON in pretty format. Please suggest how to do this; below is my code ...
package com.mypackge

import grails.converters.JSON
import grails.rest.RestfulController
import grails.plugins.rest.client.RestBuilder

class RestCustomerController extends RestfulController {
    /*
    static responseFormats = ['json', 'xml']
    RestCustomerController() {
        super(Customer)
    }
    */
    def index() {
        convertXmlToJson()
    }

    def myJson = ''

    def convertXmlToJson() {
        def xml = ''' <Customer>
<customerid>9999999999999</customerid>
<ssn>8888</ssn>
<taxid>8888</taxid>
<address>
<addressline1>Yamber Ln</addressline1>
<addressline1>8664 SE</addressline1>
<city>CCCCC</city>
<state>CC</state>
<zipcode>97679</zipcode>
</address>
<firstname>Scott</firstname>
<middlename></middlename>
<lastname>David</lastname>
<account>
<accountno>576-294738943</accountno>
<accounttype>Lease</accounttype>
<accountsubtype></accountsubtype>
<accountstatus>complete</accountstatus>
<firstname>Scott</firstname>
<middlename></middlename>
<lastname>David</lastname>
<businessname></businessname>
<billingsystem>yoiuhn</billingsystem>
<brand></brand>
<plantype></plantype>
<billingaddress>
<addressline1>Yamber Ln</addressline1>
<addressline1>8664 SE </addressline1>
<city>CCCCC</city>
<state>CC</state>
<zipcode>97679</zipcode>
</billingaddress>
<job>
<jobid>8276437463728</jobid>
<jobstatus>SUCCESS</jobstatus>
</job>
</account>
</Customer>
'''.stripMargin()

        // Parse it
        def parsed = new XmlParser().parseText( xml )
        def myId = parsed.customerid.text()

        // Deal with each node:
        def handle
        handle = { node ->
            if( node instanceof String ) {
                node
            }
            else {
                [ (node.name()): node.collect( handle ) ]
            }
        }

        // Convert it to a Map containing a List of Maps
        def jsonObject = [ (parsed.name()): parsed.collect { node ->
            [ (node.name()): node.collect( handle ) ]
        } ]
        def json = new groovy.json.JsonBuilder(jsonObject) //.toPrettyString()

        // Check it's what we expected
        def mmyresp
        try {
            mmyresp = putRequest(myId, json)
        } catch(Exception e) {
            mmyresp = 'Please Validate JSON ....'
        }
    }

    def putRequest(String id, JSON myJson) {
        String url = "http://foo.com/customer/external/" + id
        def rest = new RestBuilder()
        def resp = rest.put(url) {
            contentType "application/json"
            json {
                myJson
            }
        }
        return resp
    }
}
The record is added in the format below ...
{"Customer":[{"customerid":["9999999999999"]},{"ssn":["8888"]},
{"taxid":["8888"]},{"address":[{"addressline1":["Yamber Ln"]},
{"addressline1":["8664 SE"]},{"city":["CCCCC"]},{"state":["CC"]},{"zipcode":["97679"]}]},
{"firstname":["Scott"]},{"middlename":[]},{"lastname":["David"]},{"businessname":[]},
{"account":[{"accountno":["576-294738943"]},{"accounttype":["Lease"]},{"accountsubtype":[]},
{"accountstatus":["complete"]},{"firstname":["Scott"]},{"middlename":[]},{"lastname":["David"]},
{"businessname":[]},{"billingsystem":["yoiuhn"]},{"brand":[]},{"plantype":[]},
{"billingaddress":[{"addressline1":["Yamber Ln"]},{"addressline1":["8664 SE"]},
{"city":["CCCCC"]},{"state":["CC"]},{"zipcode":["97679"]}]},{"job":[{"jobid":["8276437463728"]},
,{"jobstatus":["SUCCESS"]}]}]}]}
But I want this to be inserted in pretty format. I tried .toPrettyString() but got a casting exception when trying to PUT it as JSON. I am trying REST services for the first time and am not sure where I am going wrong. Please advise.
You should set the following field in your Config.groovy.
grails.converters.default.pretty.print = true
This will pretty print both the XML and JSON.
You could optionally set it up for XML or JSON only, like below.
For JSON:
grails.converters.json.pretty.print = true
For XML:
grails.converters.xml.pretty.print = true
A sample Config.groovy entry is:
environments {
    development {
        grails.converters.json.pretty.print = true
    }
}
Hope it helps!!!
For Grails 4, try this:
def json = x as JSON
json.prettyPrint = true;
log.info(json.toString())

<django.utils.functional.SimpleLazyObject object at 0x2b4d1fe47650> is not JSON serializable

I have a Django app. In the view I call another function (in stats.py) which then makes an HTTP POST.
views.py
from stats import Stat
a = Stat(example="12345")
a.use(id='query')
stats.py
self.data = { example : "12345" }
req = urllib2.Request(api_url)
req.add_header('Content-Type', 'application/json')
response = urllib2.urlopen(req, json.dumps(self.data))
The problem that occurs is that I get the error,
<django.utils.functional.SimpleLazyObject object at 0x2b4d1fe47650> is not JSON serializable
Django Traceback
From looking at the Django Traceback I get the following,
/prod/tools/lx/views.py in update_input
a.use(id='query')
...
/prod/tools/main/stats.py in log_use
response = urllib2.urlopen(req, json.dumps(self.data))
...
/usr/local/lib/python2.7/json/__init__.py in dumps
return _default_encoder.encode(obj)
...
/usr/local/lib/python2.7/json/encoder.py in encode
chunks = self.iterencode(o, _one_shot=True)
...
/usr/local/lib/python2.7/json/encoder.py in iterencode
return _iterencode(o, 0)
...
/usr/local/lib/python2.7/json/encoder.py in default
raise TypeError(repr(o) + " is not JSON serializable")
...
Any ideas?
Thanks,
Try this, using urllib:
import urllib
...
self.data = { example : "12345", 'Content-type':'application/json' }
self.data = urllib.urlencode(self.data)
req = urllib2.Request(api_url, self.data)
response = urllib2.urlopen(req)
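Alternatively, if you want to keep POSTing JSON, the error usually means a lazily evaluated object (such as request.user, which is a SimpleLazyObject) ended up in self.data. A minimal sketch, assuming that is the case (api_url and the 'user' entry are placeholders, not from the original code): force such values to plain strings before encoding.

import json
import urllib2

# 'user' stands in for whatever SimpleLazyObject ended up in the payload.
data = {"example": "12345", "user": request.user}

# default=str coerces anything json can't serialize (like SimpleLazyObject)
# to its string representation instead of raising TypeError.
body = json.dumps(data, default=str)

req = urllib2.Request(api_url)
req.add_header('Content-Type', 'application/json')
response = urllib2.urlopen(req, body)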