Unable to bulk index JSON into Elasticsearch

I have a JSON file as shown here:
{ "index": { "_index": "volvo", "_type": "user" }}
{"dn": " cn=s,o=VCC\n", "changetype": " add\n", "mail": " com\n", "surname": " s\n", "givenname": " s\n", "cn": " su2\n", "objectclass": [" inetOrgPerson\n", " srvprvUserAux\n", " organizationalPerson\n", " Person\n", " ndsLoginProperties\n", " Top\n", " srvprvEntityAux\n"]}
{ "index": { "_index": "volvo", "_type": "user" }}
{"dn": " cn=s1,o=VCC\n", "changetype": " add\n", "mail": " com\n", "surname": " sa\n", "givenname": " su\n", "cn": " s\n", "objectclass": [" inetOrgPerson\n", " srvprvUserAux\n", " organizationalPerson\n", " Person\n", " ndsLoginProperties\n", " Top\n", " srvprvEntityAux\n"]}
When I try to bulk index this into Elasticsearch, I get the following error:
{"error":{"root_cause":[{"type":"json_parse_exception","reason":"Unexpected character ('�' (code 65533 / 0xfffd)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')\n at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput#4914595e; line: 2, column: 2]"}],"type":"json_parse_exception","reason":"Unexpected character ('�' (code 65533 / 0xfffd)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')\n at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput#4914595e; line: 2, column: 2]"},"status":500}
Can you figure out what is wrong with my JSON?

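I received the same error while bulk indexing and resolved it by changing the file encoding. Using Notepad++, I changed the file encoding from UTF-8-BOM to UTF-8 and was then able to complete the bulk index operation. The byte order mark at the start of the file is not valid JSON, which is most likely what the parser reports as an unexpected character.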
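If you would rather not depend on an editor, the same fix can be scripted. The following is only a minimal sketch in Python: the helper names and the file path are my own placeholders, not something from the question, and it assumes the file is otherwise valid UTF-8.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 byte order mark (EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def rewrite_without_bom(path: str) -> None:
    """Rewrite the file at `path` (hypothetical bulk file) in place without the BOM."""
    with open(path, "rb") as handle:
        cleaned = strip_utf8_bom(handle.read())
    with open(path, "wb") as handle:
        handle.write(cleaned)
```

Calling rewrite_without_bom on the bulk file before sending it should have the same effect as switching the encoding in Notepad++.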

Exploding a json column in Athena using Presto stored procedure

The Scenario
I've chosen an S3 folder location to create the table from a CSV file, which has one column in JSON format. This column needs to be exploded in a way that creates multiple rows for one particular user and event.
The Problem
The Athena table looks something like this:
agenda_data, event_id, partner_id, record_last_updated, user_id
"{'enclosed_data': {'task_active': 'true', 'status': 'completed'}, 'Agenda-1': {'currentProgress': '', 'timelines': '30/4/2020'}, 'Agenda-2': {'currentProgress': ' ', 'timelines': '25/4/2020'}, 'Agenda-3': {'currentProgress': ' ', 'timelines': '25/4/2020'}, 'Agenda-4': {'currentProgress': ' ', 'timelines': '28/4/2020'}, 'meta': {'foo': 'bar'}, 'Summary': {'finYear': '2020'}}, 'event_id': '20200407181839', 'record_last_updated': '2020-04-07T18:24:44.557362Z','user_id': '121000'}",20200407181839,Actionable,2020-04-06T13:20:31.114397Z,121000
"{'enclosed_data': {'consolidator': {'task_active': 'true', 'status': 'completed'},'Agenda-1': {'currentProgress': '', 'timelines': '25/4/2020'},'Agenda-2': {'currentProgress': 'On Going', 'timelines': '20/4/2020'},'Agenda-3': {'currentProgress': 'Completed', 'timelines': '07/4/2020'},'Agenda-4': {'currentProgress': ' ', 'timelines': '13/4/2020'},'meta': {'foo': 'bar'}, 'Summary': {'finYear': '2020'}}, event_id': '20200407202551',record_last_updated': '2020-04-07T20:32:48.215545Z', user_id': '12354'}",20200407202551,Actionable,2020-04-07T20:32:48.215545Z,12354
The column agenda_data contains JSON data, which needs to be exploded. To make it clearer, here is a minimized version of the JSON structure:
{
"enclosed_data": {
"task_active": "true",
"status": "completed"
},
"Agenda-1": {
"currentProgress": "",
"timelines": "25/4/2020"
},
"Agenda-2": {
"currentProgress": "On Going",
"timelines": "20/4/2020"
},
"meta": {
"foo": "bar"
},
"Summary": {
"finYear": "2020"
},
"event_id": "20200407202551",
"record_last_updated": "2020-04-07T20:32:48.215545Z",
"user_id": "121000"
}
When exploding, I only need to project the agenda data. I went through several blog posts and documents looking for a solution; these are the ones that seemed relevant:
link1: Which helps very little
link2: Doesn't apply since I don't have Arrays in here
link3: Couldn't get either
The Expected output
The expected output is as follows:
event_id, partner_id, record_last_updated, user_id, agenda, currentProgress, timelines
20200407181839, Actionable, 2020-04-07T20:32:48.215545Z, 121000, Agenda-1, " ", "30/4/2020"
20200407181839, Actionable, 2020-04-07T20:32:48.215545Z, 121000, Agenda-2, " ", "25/4/2020"
20200407181839, Actionable, 2020-04-07T20:32:48.215545Z, 121000, Agenda-3, " ", "25/4/2020"
20200407181839, Actionable, 2020-04-07T20:32:48.215545Z, 121000, Agenda-4, " ", "28/4/2020"
20200407202551, Actionable, 2020-04-07T20:32:48.215545Z, 12354, Agenda-1, " ", "25/4/2020"
20200407202551, Actionable, 2020-04-07T20:32:48.215545Z, 12354, Agenda-2, "On Going", "20/4/2020"
20200407202551, Actionable, 2020-04-07T20:32:48.215545Z, 12354, Agenda-3, "Completed", "07/4/2020"
20200407202551, Actionable, 2020-04-07T20:32:48.215545Z, 12354, Agenda-4, " ", "13/4/2020"
EDIT #1
So far I have managed to parse the JSON using a Presto function, as follows:
QUERY
with meeting_data AS
(SELECT '{
"enclosed_data": {
"task_active": "true",
"status": "completed"
},
"Agenda-1": {
"currentProgress": "",
"timelines": "25/4/2020"
},
"Agenda-2": {
"currentProgress": "On Going",
"timelines": "20/4/2020"
},
"meta": {
"foo": "bar"
},
"Summary": {
"finYear": "2020"
},
"event_id": "20200407202551",
"record_last_updated": "2020-04-07T20:32:48.215545Z",
"user_id": "121000"
}' AS blob)
SELECT json_extract(blob, '$["Agenda-1"]') AS agenda1,
json_extract(blob, '$.enclosed_data.status') AS m_status,
json_extract(blob, '$.Summary.finYear') AS finYear
FROM meeting_data
OUTPUT
agenda1, m_status, finYear
{"Agenda-1": {"currentProgress": "", "timelines":"25/4/2020"}},"completed", "2020-21"
OPEN QUESTIONS
I understand that I can access the JSON when I supply it manually, but I need it to be read from the column row by row. How do I do that?
Once that works, how do I explode it and get the expected output, repeating the values of the other columns that aren't in JSON format?
Can this be achieved by writing a function/stored procedure in Presto?
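To make the intended transformation concrete, here is a rough sketch of the explode step in Python rather than Presto. It assumes agenda_data holds valid, double-quoted JSON with the "Agenda-N" objects at the top level (the sample rows above use single quotes, so they would need cleaning first). The function name and the sample row are only illustrative.

```python
import json


def explode_agendas(row):
    """Yield one output row per "Agenda-*" key found in row["agenda_data"]."""
    blob = json.loads(row["agenda_data"])
    for key, value in blob.items():
        if not key.startswith("Agenda-"):
            continue  # skip enclosed_data, meta, Summary and the scalar fields
        yield {
            # the non-JSON columns are simply repeated on every exploded row
            "event_id": row["event_id"],
            "partner_id": row["partner_id"],
            "record_last_updated": row["record_last_updated"],
            "user_id": row["user_id"],
            "agenda": key,
            "currentProgress": value.get("currentProgress", ""),
            "timelines": value.get("timelines", ""),
        }


sample_row = {
    "agenda_data": json.dumps({
        "enclosed_data": {"task_active": "true", "status": "completed"},
        "Agenda-1": {"currentProgress": "", "timelines": "25/4/2020"},
        "Agenda-2": {"currentProgress": "On Going", "timelines": "20/4/2020"},
        "meta": {"foo": "bar"},
        "Summary": {"finYear": "2020"},
    }),
    "event_id": "20200407202551",
    "partner_id": "Actionable",
    "record_last_updated": "2020-04-07T20:32:48.215545Z",
    "user_id": "12354",
}

rows = list(explode_agendas(sample_row))
```

In Presto itself, the same idea is usually written by casting the parsed JSON to a map and using CROSS JOIN UNNEST over it, then filtering on keys that start with 'Agenda-'; I have not run that against this table, so treat it as a direction rather than a finished query.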

Generate Case Classes from Schema and Parse JSON

I have the following JSON schema:
{
"type ": "record ",
"name ": "JSONSchema",
"namespace ": "com.jsonschema ",
"fields ": [{
"name ": "schema ",
"type ": "string "
},
{
"name ": "body ",
"type ": {
"type ": "record ",
"name ": "BodyFinal ",
"fields ": [{
"name ": "schema ",
"type ": "string "
},
{
"name ": "data ",
"type ": {
"type ": "array ",
"items ": {
"type ": "record ",
"name ": "DataFinal ",
"fields ": [{
"name ": "tna ",
"type ": [
"null ",
"string "
],
"default ": null
},
{
"name ": "aid ",
"type ": [
"null ",
"string "
]
}
]
}
}
}
]
}
}
]
}
How can I automatically generate case classes like the following from the schema?
case class JSONSchema(schema: String, body: BodyFinal)
case class BodyFinal(schema: String,data: List[DataFinal])
case class DataFinal(tna: Option[String], aid: Option[String])
I would then like to write a parser that validates any JSON received against the case classes, so that if I change the schema in the future and add or remove fields, the case classes can be regenerated and the JSON validated against them.
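One way to picture the generation step is a small recursive walk over the schema that emits case class source text. The sketch below is in Python and is only an illustration: it assumes the schema keys and names carry no trailing whitespace, handles only records, arrays, a few primitives and nullable unions, and every name in it is my own. It does not cover the validation part of the question.

```python
# Assumed mapping from schema primitive names to Scala types.
PRIMITIVES = {"string": "String", "int": "Int", "long": "Long",
              "boolean": "Boolean", "double": "Double", "float": "Float"}


def scala_type(schema_type, classes):
    """Return the Scala type for schema_type, appending case class text to classes."""
    if isinstance(schema_type, str):
        return PRIMITIVES[schema_type]
    if isinstance(schema_type, list):
        # union: ["null", X] becomes Option[X]; only the first non-null branch is used
        non_null = [t for t in schema_type if t != "null"]
        inner = scala_type(non_null[0], classes)
        return f"Option[{inner}]" if "null" in schema_type else inner
    if schema_type["type"] == "array":
        return f"List[{scala_type(schema_type['items'], classes)}]"
    if schema_type["type"] == "record":
        fields = ", ".join(f"{f['name']}: {scala_type(f['type'], classes)}"
                           for f in schema_type["fields"])
        classes.append(f"case class {schema_type['name']}({fields})")
        return schema_type["name"]
    raise ValueError(f"unsupported type: {schema_type!r}")


schema = {
    "type": "record", "name": "JSONSchema", "namespace": "com.jsonschema",
    "fields": [
        {"name": "schema", "type": "string"},
        {"name": "body", "type": {
            "type": "record", "name": "BodyFinal",
            "fields": [
                {"name": "schema", "type": "string"},
                {"name": "data", "type": {"type": "array", "items": {
                    "type": "record", "name": "DataFinal",
                    "fields": [
                        {"name": "tna", "type": ["null", "string"], "default": None},
                        {"name": "aid", "type": ["null", "string"]},
                    ]}}},
            ]}},
    ],
}

classes = []
scala_type(schema, classes)
generated = "\n".join(classes)
```

Nested records are emitted before the records that use them, so the generated text lists DataFinal first and JSONSchema last.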

Groovy: Why the node is returning null

I want to add values from my JSON response to an array. My Groovy script:
import groovy.json.*
def ResponseMessage = '''{
"Unit": {
"Screen": [{
"Profile ": {
"ID ": 12,
"Rate ": 0
},
"Rate ": 600,
"Primary ": 1,
"Audio ": [{
"Id ": 1,
"Name ": null
}],
"Pre ": 5,
"Post ": 1
}]
}
} '''
def json = new JsonSlurper().parseText(ResponseMessage)
def Screen = json.Unit.Screen
log.info Screen
def array= []
Screen.each { s ->
array.addAll(s.Rate,s.Primary,s.Pre)
log.info "array : " + array
}
The array comes back as:
INFO:array : [null, null, null]
Instead of the "create an array, call addAll in a loop" pattern, try this:
def array = Screen.collectMany { s ->
[s.Rate,s.Primary,s.Pre]
}
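(Of course, this only works once you've removed the trailing spaces from your JSON keys. The key "Rate " is not the same as "Rate", which is why s.Rate returns null.)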
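The key mismatch is easy to reproduce outside Groovy. Below is a small Python sketch (not the asker's code) showing that a lookup for "Rate" misses the key "Rate ", plus a hypothetical strip_keys helper for the case where the response itself cannot be changed.

```python
import json


def strip_keys(value):
    """Recursively strip surrounding whitespace from every object key."""
    if isinstance(value, dict):
        return {key.strip(): strip_keys(inner) for key, inner in value.items()}
    if isinstance(value, list):
        return [strip_keys(inner) for inner in value]
    return value


response = '{"Unit": {"Screen": [{"Rate ": 600, "Primary ": 1, "Pre ": 5}]}}'

# With the keys as-is, looking up "Rate" finds nothing (like null in Groovy).
raw_screen = json.loads(response)["Unit"]["Screen"]
missing = [screen.get("Rate") for screen in raw_screen]

# After normalizing the keys, the lookups succeed.
cleaned = strip_keys(json.loads(response))
values = [v for screen in cleaned["Unit"]["Screen"]
          for v in (screen["Rate"], screen["Primary"], screen["Pre"])]
```

Here missing is [None] and values is [600, 1, 5].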

AngularJS $http.get: JSON Has Numeric Keys - No Keys Wanted

I am testing an AngularJS website locally. I'm having problems parsing JSON data using $http.get from a local JSON file.
When I define the JSON in my controller, I have no problems. However, when I get the JSON from a file (data.json), the JSON format is different, according to the JavaScript console.
How come the JSON formats are different? Specifically, the $http.get retrieved JSON has numeric keys. Can I simply remove the numeric keys? Or is there something wrong with my JSON declaration/syntax? Below is a slew of additional information.
Here is how I define it in my controller:
$scope.customerReviews = [
{
'id': '0',
'title': 'Outstanding Employee!',
'text': 'bar foo bar foo',
'image': '<img class="img-responsive img-hover" src="images/bob.jpg">',
'href': '',
'date': 'June 17, 2014',
'author': 'john',
'articleType': 'article',
'neverSettle': 'partnering',
'category': 'customerReviews'
},
{
'id': '1',
'title': 'hooray!',
'text': 'congratulations',
'image': '<img class="img-responsive img-hover" src="images/bob.png">',
'href': '',
'date': 'June 17, 2014',
'author': 'sir charles',
'articleType': 'article',
'neverSettle': 'innovating',
'category': 'customerReviews'
},
{
'id': '2',
'title': 'Outstanding Employee',
'text': 'bar foo foo',
'image': '<img class="img-responsive img-hover" src="images/bilbo.jpg">',
'href': '',
'date': 'June 17, 2014',
'author': 'johnny',
'articleType': 'article',
'neverSettle': 'engaging',
'category': 'customerReviews'
},
{
'id': '3',
'title': 'Thank you',
'text': 'much thanks',
'image': '<img class="img-responsive img-hover" src="images/x.jpg">',
'href': '',
'date': 'June 17, 2014',
'author': 'The Graduate College',
'articleType': 'article',
'neverSettle': 'innovating',
'category': 'customerReviews'
}
];
When I copy paste from [ to ]; into the Chrome developer tools console, I get the following output:
Like I said above, my current code prints my content perfectly. But if I try to get the JSON in an external file using $http.get, it doesn't print my content, and the JavaScript console shows a different JSON format.
Here is my $http.get code (in the controller):
// http get json content
$scope.customerReviews = [];
$http.get("js/models/data.json").success(function(data){
console.log("success!");
$scope.customerReviews = data;
console.log($scope.customerReviews);
return $scope.customerReviews;
});
Here is data.json. As you can see, this JSON file is different from how I define the data in my controller. Specifically, the " and ' are switched so that it passes JSON validation. I ran this one through a JSON validator and it is formatted correctly. Also, when I copy and paste this into the console, I get the first console output. Only when I use $http.get do I get the "numeric keys", and my printing functions don't work.
[
{
"id ": "0 ",
"title ": "Outstanding Employee! ",
"text ": "too lazy to obfuscate all of my content",
"image ": "<img class='img-responsive img-hover' src='images/GladisTolsa.jpg'> ",
"href ": " ",
"date ": "June 17, 2014 ",
"author ": "Martha Castleberry ",
"articleType ": "article ",
"neverSettle ": "partnering ",
"category ": "customerReviews "
},
{
"id ": "1 ",
"title ": "Facilities Help ",
"text ": "too lazy to obfuscate all of my content",
"image ": "<img class='img-responsive img-hover' src='images/FernandoLopez.png'> ",
"href ": " ",
"date ": "June 17, 2014 ",
"author ": "Lucy Valenzuela ",
"articleType ": "article ",
"neverSettle ": "innovating ",
"category ": "customerReviews "
},
{
"id ": "2 ",
"title ": "Outstanding Employee ",
"text ": "too lazy to obfuscate all of my content",
"image ": "<img class='img-responsive img-hover' src='images/MariaAlvarado.jpg'> ",
"href ": " ",
"date ": "June 17, 2014 ",
"author ": "Martha Castleberry ",
"articleType ": "article ",
"neverSettle ": "engaging ",
"category ": "customerReviews "
},
{
"id ": "3 ",
"title ": "Thank you ",
"text ": "too lazy to obfuscate all of my content",
"image ": "<img class='img-responsive img-hover' src='images/MovingServices.jpg'> ",
"href ": " ",
"date ": "June 17, 2014 ",
"author ": "The Graduate College ",
"articleType ": "article ",
"neverSettle ": "innovating ",
"category ": "customerReviews "
}
]
So the $http.get request works. Here is the console output:
Phew. I apologize for the lengthiness of my question.
My Question: How come the seemingly equivalent JSONs are outputting different formats? Specifically, why does the $http.get retrieved JSON (the second one) have numeric keys? I need the second console output to have the same output as the first console output. Can I just remove the numeric keys? Or is there something wrong with my JSON declaration/syntax?
Any input is appreciated. Especially anything that could improve my AngularJS skills, and JSON knowledge. Thanks in advance.
EDIT: Thanks to everyone so far. Apparently those are array indexes written by Chrome developer tools, not numeric keys. I won't change my post title to avoid confusion for others. On request, here is how my printing works:
<!-- ng repeat of Blog Preview Rows (reversed) -->
<div ng-repeat="x in getCategory().slice().reverse() | limitTo:quantity " close="getCategory().splice(index, 1)">
<previews></previews>
<hr />
</div>
getCategory() is a function that gets the querystring of the URL using regex. As stated before, this works when the JSON is declared in the controller. Perhaps getCategory() is run after $http.get, therefore not printing anything? Also note that I simply reverse the ng-repeat.
Here is the <previews> directive:
.directive('previews', function () {
return {
restrict: 'AEC',
replace: 'true',
templateUrl: 'js/views/articleCollection.htm'
};
});
articleCollection.htm:
<div class="row">
<div class="col-md-1 text-center">
<p><span ng-bind-html="x.articleType"></span></p>
<p><span ng-bind-html="x.neverSettle"></span></p>
<p><span ng-bind-html="x.date"></span></p>
</div>
<div class="col-md-5">
<a href="{{ x.href }}">
<span ng-bind-html="x.image"></span>
</a>
</div>
<div class="col-md-6">
<h3>
<span ng-bind-html="x.title"></span>
</h3>
<p>
by <span ng-bind-html="x.author"></span>
</p>
<p><span ng-bind-html="x.text"></span></p>
<a class="btn btn-default" href="{{ x.href }}">Read More <i class="fa fa-angle-right"></i></a>
</div>
</div>
Thanks again. Let me know how I can further clarify my question. Also let me know how I can improve anything AngularJS related. So far, the journey has been a doozy.
Q: How come the seemingly equivalent JSONs are outputting different formats?
A: Because they are valid either way. See more info on JSON syntax here
Q: Specifically, why does the $http.get retrieved JSON (the second one) have numeric keys?
A: I am guessing you mean the array index shown next to each object in the array. The console adds these for display only, to make the array easier to read.
Q: I need the second console output to have the same output as the first console output. Can I just remove the numeric keys?
A: Same as above. The "numeric keys" that Google Chrome prints are just there so that developers like us can easily see the position of each object in the array. You don't need them in your .json file.
Q: Or is there something wrong with my JSON declaration/syntax?
A: Nope. According to the examples you provided, you are doing just fine. Keep up the good work!
EDIT
I've done some research, and, ahhhhhh, I see your problem now.
Apparently reading JSON locally can cause problems, so you need to modify your code a little.
See this:
AngularJS: factory $http.get JSON file
EDIT 2
Let me give it another go.
Personally, I avoid relying on $scope, and in particular I would not recommend returning $scope from a function.
Try this:
app.factory("factoryExample", ['$http', function ($http) {
return {
Main: $http.get("js/models/data.json")
}
}]);
//in controller
app.controller('MainController', ['$scope', 'factoryExample', function ($scope, factoryExample) {
factoryExample.Main.success(function(data){
$scope.customerReviews = data;
});
}]);
As your post mentions, you are already able to load the JSON locally, so my mistake. With this code, $scope.customerReviews should work!
EDIT3
Give your JSON a name, for your example:
{ "foo":
[
{
"id ": "0 ",
"title ": "Outstanding Employee! ",
"text ": "too lazy to obfuscate all of my content",
"image ": "<img class='img-responsive img-hover' src='images/GladisTolsa.jpg'> ",
"href ": " ",
"date ": "June 17, 2014 ",
"author ": "Martha Castleberry ",
"articleType ": "article ",
"neverSettle ": "partnering ",
"category ": "customerReviews "
},
{
"id ": "1 ",
"title ": "Facilities Help ",
"text ": "too lazy to obfuscate all of my content",
"image ": "<img class='img-responsive img-hover' src='images/FernandoLopez.png'> ",
"href ": " ",
"date ": "June 17, 2014 ",
"author ": "Lucy Valenzuela ",
"articleType ": "article ",
"neverSettle ": "innovating ",
"category ": "customerReviews "
},
{
"id ": "2 ",
"title ": "Outstanding Employee ",
"text ": "too lazy to obfuscate all of my content",
"image ": "<img class='img-responsive img-hover' src='images/MariaAlvarado.jpg'> ",
"href ": " ",
"date ": "June 17, 2014 ",
"author ": "Martha Castleberry ",
"articleType ": "article ",
"neverSettle ": "engaging ",
"category ": "customerReviews "
},
{
"id ": "3 ",
"title ": "Thank you ",
"text ": "too lazy to obfuscate all of my content",
"image ": "<img class='img-responsive img-hover' src='images/MovingServices.jpg'> ",
"href ": " ",
"date ": "June 17, 2014 ",
"author ": "The Graduate College ",
"articleType ": "article ",
"neverSettle ": "innovating ",
"category ": "customerReviews "
}
]
}
Then use:
<div ng-repeat="items in customerReviews.foo">{{items.id}}</div>
and so on.
I finally got the website onto a web server, and the same code threw $sce unsafe errors.
I just had to mark the values as trusted HTML before returning them!
Helper function that marks each field as trusted HTML:
function arrayToHTML(data) {
for (var i = 0; i < data.length; i++) {
data[i]["id"] = $sce.trustAsHtml(data[i]["id"]);
data[i]["title"] = $sce.trustAsHtml(data[i]["title"]);
data[i]["text"] = $sce.trustAsHtml(data[i]["text"]);
data[i]["image"] = $sce.trustAsHtml(data[i]["image"]);
data[i]["date"] = $sce.trustAsHtml(data[i]["date"]);
data[i]["author"] = $sce.trustAsHtml(data[i]["author"]);
data[i]["articleType"] = $sce.trustAsHtml(data[i]["articleType"]);
data[i]["neverSettle"] = $sce.trustAsHtml(data[i]["neverSettle"]);
data[i]["category"] = $sce.trustAsHtml(data[i]["category"]);
data[i]["href"] = $sce.trustAsHtml(data[i]["href"]);
}
}
Working Code:
// http get json content
$scope.customerReviews = [];
$http.get("js/models/data.json").success(function(data){
console.log("success!");
$scope.customerReviews = data;
console.log($scope.customerReviews);
arrayToHTML($scope.customerReviews); // This fixed it!
return $scope.customerReviews;
});

JSON Deserialize (Newtonsoft JSON.NET) to XML Failure

I have no experience with JSON at all, but I have to work with a web service that returns data in that format. I need to convert the data from JSON to XML so that I can import it into our own system.
I receive the data from the Web Service in this format:
{
"httpStatusCode": 200,
"messages": [],
"succesfulResponses": [
{
"position": 0,
"response": {
"dln": "AAAPY459037VB9SV",
"dvlaServiceVersion": "1",
"hubServiceVersion": "1.0.0.0",
"dvlaProcessingDate": "2014-12-22T14:03:43.557Z",
"hubProcessingDate": "2015-05-29T16:50:51.4364004+01:00",
"licence": {
"status": "FC",
"validFrom": "1986-01-22",
"validTo": "2017-09-02",
"directiveIndicator": 0,
"entitlements": [
{
"code": "A",
"validFrom": null,
"validTo": null,
"priorTo": false,
"type": "F",
"restrictions": []
}
],
"endorsements": []
},
"httpStatusCode": 200
},
"messages": []
}
],
"errorResponses": []
}
I tried the following with the Newtonsoft JSON.NET library:
Dim doc As XmlDocument = DirectCast(JsonConvert.DeserializeXmlNode(sAnswer, "root"), XmlDocument)
Unfortunately it returned this:
2000AAAPY459037VB9SV11.0.0.02014-12-22T14:03:43.557Z2015-05-29T16:59:08.6833762+01:00FC1986-01-222017-09-020AfalseF200
That is of no use to me at all. I need the XML to include the node names/elements so that I can import it correctly. Is anyone able to point me in the right direction?
Cheers,
J
I managed to format the JSON to XML correctly. I added the following to my returned JSON string:
jSON = "{" & vbCr & vbLf & " '?xml': {" & vbCr & vbLf & " '#version': '1.0'," & vbCr & vbLf & " '#standalone': 'no'" & vbCr & vbLf & " }," & vbCr & vbLf & " 'root': " + sAnswer
Then I specified 'root' as the root element when deserializing.
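For anyone who wants to see the shape of the output being aimed for, here is a rough Python sketch of the same JSON-to-XML mapping under a single root element. It is not JSON.NET and does not reproduce its exact conventions; it assumes the top-level JSON value is an object and that every key is a valid XML element name. The function names are mine.

```python
import json
import xml.etree.ElementTree as ET


def add_value(parent, name, value):
    """Append value to parent as one or more <name> elements."""
    if isinstance(value, list):
        for item in value:  # arrays become repeated elements; empty arrays add nothing
            add_value(parent, name, item)
        return
    child = ET.SubElement(parent, name)
    if isinstance(value, dict):
        for key, inner in value.items():
            add_value(child, key, inner)
    elif isinstance(value, bool):
        child.text = json.dumps(value)  # "true" / "false", as in the JSON text
    elif value is not None:
        child.text = str(value)  # null values are left as empty elements


def json_to_xml(text, root_name="root"):
    """Wrap the parsed JSON object in a single root element and serialize it."""
    root = ET.Element(root_name)
    for key, inner in json.loads(text).items():
        add_value(root, key, inner)
    return ET.tostring(root, encoding="unicode")


sample = ('{"httpStatusCode": 200, "licence": {"status": "FC", '
          '"entitlements": [{"code": "A", "priorTo": false}]}}')
xml_text = json_to_xml(sample)
```

Each JSON property name ends up as an element name, which is the part that was missing from the concatenated text shown in the question.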