Multiple JSON payloads to CSV file - csv

I have a task to generate a CSV file from multiple (2) JSON payloads. Below is my sample data, provided for illustration purposes.
- Payload-1
[
  {
    "id": "Run",
    "errorMessage": "Cannot Run"
  },
  {
    "id": "Walk",
    "errorMessage": "Cannot Walk"
  }
]
- Payload-2 (**Source Input**) in flowVars
[
  {
    "Action1": "Run",
    "Action2": ""
  },
  {
    "Action1": "",
    "Action2": "Walk"
  },
  {
    "Action1": "Sleep",
    "Action2": ""
  }
]
Now I have to generate a CSV file with one extra column, ErrorMessage, added to the Source Input, based on one condition: where the id in Payload-1 matches the sourceInput field, that errorMessage should be assigned to the matching record, and a CSV file should be generated as the output.
I tried the DataWeave below:
%dw 1.0
%output application/csv header=true
---
flowVars.InputData map (val, index) -> {
    Action1: val.Action1,
    Action2: val.Action2,
    (
        payload filter ($.id == val.Action1 or $.id == val.Action2) map (val2, index) -> {
            ErrorMessage: val2.errorMessage replace /([\n,\/])/ with ""
        }
    )
}
But here I'm facing an issue: I'm able to generate the file with the data as expected, but the ErrorMessage header is missing/not appearing in the file with my real data (in production). Kindly assist me.
I'm expecting the CSV output below:
Action1,Action2,ErrorMessage
Run,,Cannot Run
,Walk,Cannot Walk
Sleep,,

Hello, the best way to solve this kind of problem is to use groupBy. The idea is that you group one of the two parts by the join key, and then you iterate the other part and do a lookup. This way you avoid O(n^2) and bring it down to O(n).
%dw 1.0
%var payloadById = payload groupBy $.id
%output application/csv
---
flowVars.InputData map ((value, index) ->
    using (locatedError = payloadById[value.Action2][0] default payloadById[value.Action1][0]) (
        (value ++ {ErrorMessage: locatedError.errorMessage replace /([\n,\/])/ with ""}) when locatedError != null otherwise value
    )
) filter $ != null
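For reference, with the sample Payload-1 above, the payloadById lookup built by groupBy would look roughly like this (an illustration of the intermediate structure, not part of the script):
{
  "Run": [{"id": "Run", "errorMessage": "Cannot Run"}],
  "Walk": [{"id": "Walk", "errorMessage": "Cannot Walk"}]
}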

Assuming "Payload-1" is payload, and "Payload-2" is flowVars.actions, I would first create a key-value lookup with the payload. Then I would use that to populate flowVars.actions:
%dw 1.0
%output application/csv header=true
// Creates lookup, e.g.:
// {"Run": "Cannot run", "Walk": "Cannot walk"}
%var errorMsgLookup = payload reduce ((obj, lookup = {}) ->
    lookup ++ {(obj.id): obj.errorMessage})
---
flowVars.actions map ((action) ->
    action ++ {ErrorMessage: errorMsgLookup[action.Action1] default errorMsgLookup[action.Action2]})
Note: I'm also assuming that the id field in the payload is unique across the array.


DataWeave 2.0 how to build dynamically populated accumulator for reduce()

I'm trying to convert an array of strings into an object for which each member uses the string for a key, and initializes the value to 0. (Classic accumulator for Word Count, right?)
Here's the style of the input data:
%dw 2.0
output application/dw
var hosts = [
"t.me",
"thewholeshebang.com",
"thegothicparty.com",
"windowdressing.com",
"thegothicparty.com"
]
To get the accumulator, I need a structure in this style:
var histogram_acc = {
"t.me" : 1,
"thewholeshebang.com" : 1,
"thegothicparty.com" : 2,
"windowdressing.com" : 1
}
My thought was that this is a slam-dunk case for reduce(), right?
So to get the de-duplicated list of hosts, we can use this phrase:
hosts distinctBy $
Happy so far. But now for me, it turns wicked.
I thought this might be the gold:
hosts distinctBy $ reduce (ep,acc={}) -> acc ++ {ep: 0}
But the problem is that this didn't work out so well. The first argument to the lambda for reduce() represents the iterating element, in this case the endpoint or address. The lambda appends the new object to the accumulator.
Well, that's how I hoped it would happen, but I got this instead:
{
ep: 0,
ep: 0,
ep: 0,
ep: 0
}
I kind of need it to do better than that.
As you said, reduce is a good fit for this problem. Alternatively, you can use the "dynamic elements" feature of objects to flatten an array of objects into a single object:
%dw 2.0
output application/dw
var hosts = [
"t.me",
"thewholeshebang.com",
"thegothicparty.com",
"windowdressing.com",
"thegothicparty.com"
]
---
{(
hosts
distinctBy $
map (ep) -> {"$ep": 0}
)}
See https://docs.mulesoft.com/mule-runtime/4.3/dataweave-types#dynamic_elements
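For the sample hosts above, this dynamic-elements version produces the same object as the reduce approach:
{
  "t.me": 0,
  "thewholeshebang.com": 0,
  "thegothicparty.com": 0,
  "windowdressing.com": 0
}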
Scenario 1:
The trick, I think, for this scenario is that you need to enclose the distinctBy ... map expression in {}.
Example:
Input:
%dw 2.0
var hosts = [
"t.me",
"thewholeshebang.com",
"thegothicparty.com",
"windowdressing.com",
"thegothicparty.com"
]
output application/json
---
{ // This opening brace will do the trick.
    (hosts distinctBy $ map {($): 0})
} // See Scenario 2 if you remove or comment out this pair of braces.
Output:
{
"t.me": 0,
"thewholeshebang.com": 0,
"thegothicparty.com": 0,
"windowdressing.com": 0
}
Scenario 2: If you remove the {} around the distinctBy ... map expression, the output will be an array.
Example:
Input:
%dw 2.0
var hosts = [
"t.me",
"thewholeshebang.com",
"thegothicparty.com",
"windowdressing.com",
"thegothicparty.com"
]
output application/json
---
//{ // This is now commented
(hosts distinctBy $ map {($):0})
//} // This is now commented
Output:
[
{
"t.me": 0
},
{
"thewholeshebang.com": 0
},
{
"thegothicparty.com": 0
},
{
"windowdressing.com": 0
}
]
Scenario 3: If you want to count the total occurrences of each item, you can use groupBy and sizeOf.
Example:
Input:
%dw 2.0
var hosts = [
"t.me",
"thewholeshebang.com",
"thegothicparty.com",
"windowdressing.com",
"thegothicparty.com"
]
output application/json
---
hosts groupBy $ mapObject (value,key) -> {
(key): sizeOf(value)
}
Output:
{
"t.me": 1,
"thewholeshebang.com": 1,
"thegothicparty.com": 2,
"windowdressing.com": 1
}
Hilariously (but perhaps only to me), I discovered the answer to this while I was writing my question. Hoping that someone will pose this same question, here is what I found.
In order to present the lambda argument in my example (ep) as the key in a structure, I must quote and interpolate it:
"$ep"
Once I did that, it was a quick passage to:
hosts distinctBy $ reduce (ep,acc={}) -> acc ++ {"$ep": 0}
...and then of course this:
{
"t.me": 0,
"thewholeshebang.com": 0,
"thegothicparty.com": 0,
"windowdressing.com": 0
}

JSON key iteration in DW Mule

I have the following requirement: the JSON keys are dynamic, and I need to use each key to iterate through the data.
This is my input:
[
  {
    "eventType": "ORDER_SHIPPED",
    "entityId": "d0594c02-fb0e-47e1-a61e-1139dc185657",
    "userName": "educator#school.edu",
    "dateTime": "2010-11-11T07:00:00Z",
    "status": "SHIPPED",
    "additionalData": {
      "quoteId": "d0594c02-fb0e-47e1-a61e-1139dc185657",
      "clientReferenceId": "Srites004",
      "modifiedDt": "2010-11-11T07:00:00Z",
      "packageId": "AIM_PACKAGE",
      "sbsOrderId": "TEST-TS-201809-79486",
      "orderReferenceId": "b0123c02-fb0e-47e1-a61e-1139dc185987",
      "shipDate_1": "2010-11-11T07:00:00Z",
      "shipDate_2": "2010-11-12T07:00:00Z",
      "shipDate_3": "2010-11-13T07:00:00Z",
      "shipMethod_1": "UPS Ground",
      "shipMethod_3": "UPS Ground3",
      "shipMethod_2": "UPS Ground2",
      "trackingNumber_3": "333",
      "trackingNumber_1": "2222",
      "trackingNumber_2": "221"
    }
  }
]
I need output like the following:
{
  "trackingInfo": [
    {
      "shipDate": "2010-11-11T07:00:00Z",
      "shipMethod": "UPS Ground",
      "trackingNbr": "2222"
    },
    {
      "shipDate": "2010-11-12T07:00:00Z",
      "shipMethod": "UPS Ground2",
      "trackingNbr": "221"
    },
    {
      "shipDate": "2010-11-13T07:00:00Z",
      "shipMethod": "UPS Ground3",
      "trackingNbr": "333"
    }
  ]
}
The shipDate, shipMethod, and trackingNumber fields can occur n times.
How can I iterate using the JSON keys?
First map the array to iterate it, and then use pluck to get the list of keys.
Then, as long as there is always the same number of shipDate, shipMethod, etc. fields, filter the list of keys so you only iterate as many times as those field combinations exist.
Then construct each output object by dynamically looking up the key, using 'shipDate_' concatenated with the index (incremented by 1, because your example starts at 1 and DataWeave arrays start at 0):
%dw 2.0
output application/json
---
payload map ((item, index) ->
    item.additionalData pluck ($$) filter ($ contains 'shipDate') map ((item2, index2) ->
        using (incIndex = ((index2 + 1) as String)) {
            "shipDate": item.additionalData[('shipDate_' ++ incIndex)],
            "shipMethod": item.additionalData[('shipMethod_' ++ incIndex)],
            "trackingNbr": item.additionalData[('trackingNumber_' ++ incIndex)]
        }
    )
)
In DW 1.0 syntax:
%dw 1.0
%output application/json
---
payload map ((item, index) ->
    item.additionalData pluck ($$) filter ($ contains 'shipDate') map ((item2, index2) ->
        using (incIndex = ((index2 + 1) as :string)) {
            "shipDate": item.additionalData[('shipDate_' ++ incIndex)],
            "shipMethod": item.additionalData[('shipMethod_' ++ incIndex)],
            "trackingNbr": item.additionalData[('trackingNumber_' ++ incIndex)]
        }))
It's mostly the same, except:
output => %output
String => :string

Emit Python embedded object as native JSON in YAML document

I'm importing webservice tests from Excel and serialising them as YAML.
But taking advantage of YAML being a superset of JSON, I'd like the request part of the test to be valid JSON, i.e. to have delimiters, quotes and commas.
This will allow us to cut and paste requests between the automated test suite and manual test tools (e.g. Postman.)
So here's how I'd like a test to look (simplified):
- properties:
    METHOD: GET
    TYPE: ADDRESS
    Request URL: /addresses
    testCaseId: TC2
  request:
    {
      "unitTypeCode": "",
      "unitNumber": "15",
      "levelTypeCode": "L",
      "roadNumber1": "810",
      "roadName": "HAY",
      "roadTypeCode": "ST",
      "localityName": "PERTH",
      "postcode": "6000",
      "stateTerritoryCode": "WA"
    }
In Python, my request object has a dict attribute called fields which is the part of the object to be serialised as JSON. This is what I tried:
import json
import yaml

def request_presenter(dumper, request):
    json_string = json.dumps(request.fields, indent=8)
    return dumper.represent_str(json_string)

yaml.add_representer(Request, request_presenter)

test = Test(...including embedded request object)
serialised_test = yaml.dump(test)
I'm getting:
- properties:
    METHOD: GET
    TYPE: ADDRESS
    Request URL: /addresses
    testCaseId: TC2
  request: "{
    \"unitTypeCode\": \"\",\n
    \"unitNumber\": \"15\",\n
    \"levelTypeCode\": \"L\",\n
    \"roadNumber1\": \"810\",\n
    \"roadName\": \"HAY\",\n
    \"roadTypeCode\": \"ST\",\n
    \"localityName\": \"PERTH\",\n
    \"postcode\": \"6000\",\n
    \"stateTerritoryCode\": \"WA\"\n
    }"
...only worse because it's all on one line and has white space all over the place.
I tried using the | style for literal multi-line strings which helps with the line breaks and escaped quotes (it's more involved but this answer was helpful.) However, escaped or multiline, the result is still a string that will need to be parsed separately.
How can I stop PyYaml analysing the JSON block as a string and make it just accept a block of text as part of the emitted YAML? I'm guessing it's something to do with overriding the emitter but I could use some help. If possible I'd like to avoid post-processing the serialised test to achieve this.
Ok, so this was the solution I came up with. Generate the YAML with a placemarker ahead of time. The placemarker marks the place where the JSON should be inserted, and also defines the root-level indentation of the JSON block.
import os
import itertools
import json

def insert_json_in_yaml(pre_insert_yaml, key, obj_to_serialise):
    marker = '%s: null' % key
    marker_line = line_of_first_occurrence(pre_insert_yaml, marker)
    marker_indent = string_indent(marker_line)
    serialised = json.dumps(obj_to_serialise, indent=marker_indent + 4)
    key_with_json = '%s: %s' % (key, serialised)
    serialised_with_json = pre_insert_yaml.replace(marker, key_with_json)
    return serialised_with_json

def line_of_first_occurrence(basestring, substring):
    """
    return the line containing the first occurrence of substring
    """
    lineno = lineno_of_first_occurrence(basestring, substring)
    return basestring.split(os.linesep)[lineno]

def string_indent(s):
    """
    return indentation of a string (number of spaces before a nonspace)
    """
    spaces = ''.join(itertools.takewhile(lambda c: c == ' ', s))
    return len(spaces)

def lineno_of_first_occurrence(basestring, substring):
    """
    return line number of first occurrence of substring
    """
    return basestring[:basestring.index(substring)].count(os.linesep)
embedded_object = {
    "unitTypeCode": "",
    "unitNumber": "15",
    "levelTypeCode": "L",
    "roadNumber1": "810",
    "roadName": "HAY",
    "roadTypeCode": "ST",
    "localityName": "PERTH",
    "postcode": "6000",
    "stateTerritoryCode": "WA"
}
yaml_string = """
---
- properties:
    METHOD: GET
    TYPE: ADDRESS
    Request URL: /addresses
    testCaseId: TC2
  request: null
  after_request: another value
"""
>>> print(insert_json_in_yaml(yaml_string, 'request', embedded_object))
- properties:
    METHOD: GET
    TYPE: ADDRESS
    Request URL: /addresses
    testCaseId: TC2
  request: {
      "unitTypeCode": "",
      "unitNumber": "15",
      "levelTypeCode": "L",
      "roadNumber1": "810",
      "roadName": "HAY",
      "roadTypeCode": "ST",
      "localityName": "PERTH",
      "postcode": "6000",
      "stateTerritoryCode": "WA"
  }
  after_request: another value

Extract values from JSON using Ruby

I need to extract only the values for 'admins' from this JSON using Ruby:
JSON -
{
  "Roles": [
    {
      "admins": [
        "me"
      ],
      "role": "cleanup"
    },
    {
      "admins": [
        "tester"
      ],
      "role": "create a mess"
    }
  ]
}
RUBY -
require 'json'
file = File.read('adminlist_Feb_2017.json')
thismonthlist=JSON.parse(file)
puts thismonthlist['admins']
Output: this gives me blank output. However, if I change the last line to:
puts thismonthlist['Roles']
it gives me everything. I just want the list of admins.
Try something like this (note that JSON.parse returns string keys by default, so use string keys rather than symbols):
thismonthlist['Roles'].flat_map { |role| role['admins'] }
=> ["me", "tester"]
admins = []
File.open('adminlist_Feb_2017.json', 'r') do |file|
  json = JSON.parse(file.read)
  admins = json["Roles"].flat_map { |role| role["admins"] }.uniq
end
admins
# => ["me", "tester"]
I open the file and process it in a block to ensure it's closed at the end. In the block I read the file content and parse the JSON string into a hash. Then I go through the "Roles" of the hash, grab the "admins" arrays and return them as a single array with Enumerable#flat_map. Finally, I use Enumerable#uniq to return each admin only once.

Select (ignore if it does not exist) for JSON logs in Spark SQL

I am new to Apache Spark and trying out a few POCs around it. I am trying to read JSON logs which are structured, but a few fields are not always guaranteed. For example:
{
  "item": "A",
  "customerId": 123,
  "hasCustomerId": true,
  ...
},
{
  "item": "B",
  "hasCustomerId": false,
  ...
}
Assume I want to transform these JSON logs into CSV. I was trying out Spark SQL to get hold of all the fields with simple select statements, but since the second JSON record is missing a field (although it does have an identifier), I am not sure how I can handle this.
I want to transform the above JSON logs to:
item, customerId, ....
A , 123 , ....
B , null/0 , ....
You should use SQLContext to read the JSON file directly: sqlContext.read.json("file/path"). But if you want to convert it to CSV first and then read the CSV with missing values, your CSV file should look like:
item,customerId,hasCustomerId, ....
A,123,true, ....
B,,false, ....   // customerId is empty, i.e. null
i.e. an empty field for the missing value. Then you have to read it like this:
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")      // Use first line of all files as header
  .option("inferSchema", "true") // Automatically infer data types
  .load("file/path")
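Alternatively, if you read the JSON logs directly, Spark infers the union of all fields across records and fills the missing ones with null, so you can skip the intermediate CSV step. A rough sketch under that assumption (the paths and the selected columns are placeholders for your real data):
// Read the JSON logs directly; the schema is the union of all fields,
// so records without customerId simply get null in that column.
val logs = sqlContext.read.json("file/path/logs.json")

// Select the fields you care about; missing values come back as null.
val flat = logs.select("item", "customerId", "hasCustomerId")

// Write the result as CSV using the same spark-csv package.
flat.write
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .save("file/path/output")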