How to put a mapping in JSON files with Elasticsearch

I have a lot of JSON documents with this structure:
"positions": [
{
"millis": 12959023,
"lat": 49.01525113731623,
"lon": 2.4971945118159056,
"rawX": -3754,
"rawY": 605,
"rawVx": 0,
"rawVy": 0,
"speed": 9.801029291617944,
"accel": 0.09442740907572084,
"grounded": true
},
{
"millis": 12959914,
"lat": 49.01536940596998,
"lon": 2.4967825412750244,
"rawX": -3784,
"rawY": 619,
"rawVx": -15,
"rawVy": 7,
"speed": 10.841861737855924,
"accel": -0.09534648619563282,
"grounded": true
}
...
}
I'm trying to map these JSON documents in Elasticsearch by introducing a geo_point field, so that each document looks like the one below:
"positions": [
{
"millis": 12959023,
"location" : {
"lat": 49.01525113731623,
"lon": 2.4971945118159056,
}
"rawX": -3754,
"rawY": 605,
"rawVx": 0,
"rawVy": 0,
"speed": 9.801029291617944,
"accel": 0.09442740907572084,
"grounded": true
},
...
}
PS: these documents are provided by an API.
Thanks

You could do something like this:
curl -XPUT 'http://localhost:9200/<indexname>/positions/_mapping' -d @yourjsonfile.json
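For reference, here is a rough sketch of what yourjsonfile.json could contain for this case, assuming a version of Elasticsearch that still uses mapping types (the outer "positions" key is the type from the URL above, the inner "positions" is the array field; nested and the other field types are guesses from the sample values, so adjust as needed):
# sketch of yourjsonfile.json -- field types guessed from the sample data
{
  "positions": {
    "properties": {
      "positions": {
        "type": "nested",
        "properties": {
          "millis":   { "type": "long" },
          "location": { "type": "geo_point" },
          "rawX":     { "type": "integer" },
          "rawY":     { "type": "integer" },
          "rawVx":    { "type": "integer" },
          "rawVy":    { "type": "integer" },
          "speed":    { "type": "double" },
          "accel":    { "type": "double" },
          "grounded": { "type": "boolean" }
        }
      }
    }
  }
}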
Hope this helps!

If you can't modify the source, you need to pre-process your documents before they get indexed into Elasticsearch.
If you are using Elasticsearch < 5.0, you can use Logstash and the mutate filter.
If you are using Elasticsearch >= 5.0 (my advice), use an ingest pipeline and the rename processor.
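For illustration, a minimal sketch of such a pipeline, assuming Elasticsearch 5.x or later (the pipeline name geopointify is made up; this handles a top-level lat/lon pair, so for the nested positions array you would additionally wrap the renames in a foreach processor or use a script processor):
# illustrative pipeline; rename moves each field to a path under "location"
curl -XPUT 'http://localhost:9200/_ingest/pipeline/geopointify' -H 'Content-Type: application/json' -d '
{
  "description": "move lat/lon under a location object",
  "processors": [
    { "rename": { "field": "lat", "target_field": "location.lat" } },
    { "rename": { "field": "lon", "target_field": "location.lon" } }
  ]
}'
Indexing requests can then reference it with ?pipeline=geopointify so documents are rewritten before they are stored.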

Related

Apache VTL - Copy node

Is there a way to do a deep copy using Apache VTL?
I was trying to use x-amazon-apigateway-integration with requestTemplates.
The input JSON is as shown below:
{
  "userid": "21d6523137f6",
  "time": "2020-06-16T15:22:33Z",
  "item": {
    "UserID": { "S": "21d6523137f6" },
    <... some complex json nodes here ...>,
    "TimeUTC": { "S": "2020-06-16T15:22:33Z" }
  }
}
The requestTemplate is as shown below:
requestTemplates:
  application/json: !Sub
    - |
      #set($inputRoot = $input.path('$'))
      {
        "TableName": "${tableName}",
        "ConditionExpression": "attribute_not_exists(TimeUTC) OR TimeUTC > :sk",
        "ExpressionAttributeValues": {
          ":sk": {
            "S": "$util.escapeJavaScript($input.path('$.time'))"
          }
        },
        "Item": "$input.path('$.item')", <== Copy the entire item over to Item.
        "ReturnValues": "ALL_OLD",
        "ReturnConsumedCapacity": "INDEXES",
        "ReturnItemCollectionMetrics": "SIZE"
      }
    - {
        tableName: !Ref EventsTable
      }
The problem is that the item gets copied like this:
"Item": "{UserID={S=21d6523137f6}, Lat={S=37.33180957}, Lng={S=-122.03053391}, ... other json elements..., TimeUTC={S=2020-06-16T15:22:33Z}}",
As you can see, the whole nested JSON becomes a single string attribute, while I expected it to become a fully blown JSON node of its own, like below:
"Item": {
"UserID" : { "S": "21d6523137f6" },
"Lat": { "S": "37.33180957" },
"Lng": { "S": "-122.03053391" },
<.... JSON nodes ...>
"TimeUTC" : { "S": "2020-06-20T15:22:33Z" }
},
Is it possible to do a deep/nested copy of a JSON node like the one above, without the kung-fu of iterating over the node and appending the children to a JSON node variable, etc.?
By the way, I'm using an AWS API Gateway request template, so it may not support all the Apache VTL templating options.
You need to use the $input.json method instead of $input.path.
"Item": $input.json('$.item'),
Note that I removed the double quotes.
If you had the double quotes because you want to stringify $.item, you can do that like so:
"Item": "$util.escapeJavaScript($input.json('$.item'))",

Copy JSON Array data from REST data factory to Azure Blob as is

I have used the REST connector to get data from an API, and the JSON output contains arrays. When I try to copy the JSON as-is to Blob using a copy activity, I only get the first object's data and the rest is ignored.
The documentation says we can copy the JSON as-is by skipping the schema section on both the dataset and the copy activity. I followed that and I am getting the output below.
https://learn.microsoft.com/en-us/azure/data-factory/connector-rest#export-json-response-as-is
I tried the copy activity without a schema, using the header as the first row, and outputting files to Blob as .json and .txt.
Sample REST output:
{
"totalPages": 500,
"firstPage": true,
"lastPage": false,
"numberOfElements": 50,
"number": 0,
"totalElements": 636,
"columns": {
"dimension": {
"id": "variables/page",
"type": "string"
},
"columnIds": [
"0"
]
},
"rows": [
{
"itemId": "1234",
"value": "home",
"data": [
65
]
},
{
"itemId": "1235",
"value": "category",
"data": [
92
]
},
],
"summaryData": {
"totals": [
157
],
"col-max": [
123
],
"col-min": [
1
]
}
}
The Blob output as text is below; it contains only the first object's data:
totalPages,firstPage,lastPage,numberOfElements,number,totalElements
500,True,False,50,0,636
If you want to write the JSON response as-is, you can use the HTTP connector. However, please note that the HTTP connector doesn't support pagination.
If you want to keep using the REST connector and write a CSV file as output, can you please specify how you want the nested objects and arrays to be written?
We cannot write arrays to CSV files. You could always use a custom activity or an Azure Function activity to call the REST API, parse it the way you want, and write to a CSV file.
Hope this helps.

Insert JSON file into Elasticsearch + Kibana

I am currently working on an anomaly detection project based on Elasticsearch and Kibana. Recently I converted a CSV file to JSON and tried to import this data into Elasticsearch via Postman using the Bulk API. Unfortunately all of the queries were wrong.
Then I found this topic: Import/Index a JSON file into Elasticsearch
and tried the following approach:
curl -XPOST 'http://localhost:9200/yahoodata/a4benchmark/4' --data-binary #Anomaly1.json
The response I got:
{"error":{"root_cause":[{"type":"mapper_parsing_exception","reason":"failed
to parse"}],"type":"mapper_parsing_exception","reason":"failed to
parse","caused_by":{"type":"not_x_content_exception","reason":"Compressor
detection can only be called on some xcontent bytes or compressed
xcontent bytes"}},"status":400}
The data I am trying to insert has the following structure (Anomaly1.json):
[
{
"timestamps": 11,
"value": 1,
"anomaly": 1,
},
{
"timestamps": 1112,
"value": 211,
"anomaly": 0,
},
{
"timestamps": 2,
"value": 1,
"anomaly": 0,
}
]
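For reference, the _bulk endpoint expects newline-delimited JSON, with one action line before each document, rather than a pretty-printed array. A minimal sketch for this data (the file name Anomaly1_bulk.json is made up, and the trailing commas from Anomaly1.json are removed because they make the JSON invalid):
# hypothetical reworked file; on Elasticsearch 5.x+ the Content-Type header is required
curl -XPOST 'http://localhost:9200/yahoodata/a4benchmark/_bulk' -H 'Content-Type: application/x-ndjson' --data-binary @Anomaly1_bulk.json
where Anomaly1_bulk.json contains (ending with a newline):
{"index":{}}
{"timestamps": 11, "value": 1, "anomaly": 1}
{"index":{}}
{"timestamps": 1112, "value": 211, "anomaly": 0}
{"index":{}}
{"timestamps": 2, "value": 1, "anomaly": 0}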

Can we add array of objects in amazon cloudsearch in json format?

I am trying to create a domain and upload sample data like this:
[
{
"type": "add",
"id": "1371964",
"version": 1,
"lang": "eng",
"fields": {
"id": "1371964",
"uid": "1200983280",
"time": "2013-12-23 13:00:26",
"orderid": "1200983280",
"callerid": "66580662",
"is_called": "1",
"is_synced": "1",
"is_sent": "1",
"allcaller": [
{
"sno": "1085770",
"uid": "1387783883.30547",
"lastfun": null,
"callduration": "00:00:46",
"request_id": "1371964"
}
]
}
}]
When I upload this sample data while creating the domain, CloudSearch does not accept it.
If I remove the allcaller array, it accepts the data without any problem.
If CloudSearch does not allow arrays of objects, how should I format this JSON?
Just found after searching the AWS forums: CloudSearch does not allow nested JSON (arrays of objects) :(
https://forums.aws.amazon.com/thread.jspa?messageID=405879&#405879
Time to try Elasticsearch.
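If switching engines is not an option, one common workaround is to flatten the nested objects into multi-valued top-level fields, roughly like the sketch below (the allcaller_* field names are illustrative, and each of them would have to be defined as an array-type field in the domain's indexing options):
[
  {
    "type": "add",
    "id": "1371964",
    "fields": {
      "id": "1371964",
      "uid": "1200983280",
      "callerid": "66580662",
      "allcaller_sno": ["1085770"],
      "allcaller_uid": ["1387783883.30547"],
      "allcaller_callduration": ["00:00:46"],
      "allcaller_request_id": ["1371964"]
    }
  }
]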

How to open a .mongo file and export the content into csv?

EDIT 2014-05-01: I tried fromJSON first (as suggested below), but that only parsed the first line. I found out that there were commas missing between the brackets of each JSON line so I changed that in TextEdit and saved the file. I also added [ at the beginning of the file and ] at the end and then it worked with JSON. Now the next step: from a list (with embedded lists) to a dataframe (or csv).
I get a data package from edX every now and then for the courses we are evaluating. Some of these are just plain .csv files which are quite easy to handle; others are more difficult for me (not having a CS or programming background).
I have 2 files I want to open and parse into csv files for analysis in R. I have tried many json2csv tools out there, but to no avail. I also tried the simple methods described here to turn json into csv.
The data is confidential, so I cannot share the entire data set, but I will share the first two lines of the file; maybe that helps. The problem is that I can find nothing anywhere about .mongo files, which seems quite strange to me. Do they even exist? Or is this just a JSON file that may be corrupted (which could explain the errors)?
Any suggestions are welcome.
The first 2 lines in one of the .mongo files:
{
"_id": {
"$oid": "52d1e62c350e7a3156000009"
},
"votes": {
"up": [
],
"down": [
],
"up_count": 0,
"down_count": 0,
"count": 0,
"point": 0
},
"visible": true,
"abuse_flaggers": [
],
"historical_abuse_flaggers": [
],
"parent_ids": [
],
"at_position_list": [
],
"body": "the delft university accredited course with the scholarship (fundamentals of water treatment) is supposed to start in about a month's time. But have the scholarship list been published? Any tentative date??",
"course_id": "DelftX/CTB3365x/2013_Fall",
"_type": "Comment",
"endorsed": false,
"anonymous": false,
"anonymous_to_peers": false,
"author_id": "269835",
"comment_thread_id": {
"$oid": "52cd40c5ab40cf347e00008d"
},
"author_username": "tachak59",
"sk": "52d1e62c350e7a3156000009",
"updated_at": {
"$date": 1389487660636
},
"created_at": {
"$date": 1389487660636
}
}{
"_id": {
"$oid": "52d0a66bcb3eee318d000012"
},
"votes": {
"up": [
],
"down": [
],
"up_count": 0,
"down_count": 0,
"count": 0,
"point": 0
},
"visible": true,
"abuse_flaggers": [
],
"historical_abuse_flaggers": [
],
"parent_ids": [
{
"$oid": "52c63278100c07c0d1000028"
}
],
"at_position_list": [
],
"body": "I got it. Thank you!",
"course_id": "DelftX/CTB3365x/2013_Fall",
"_type": "Comment",
"endorsed": false,
"anonymous": false,
"anonymous_to_peers": false,
"parent_id": {
"$oid": "52c63278100c07c0d1000028"
},
"author_id": "2655027",
"comment_thread_id": {
"$oid": "52c4f303b03c4aba51000013"
},
"author_username": "dmoronta",
"sk": "52c63278100c07c0d1000028-52d0a66bcb3eee318d000012",
"updated_at": {
"$date": 1389405803386
},
"created_at": {
"$date": 1389405803386
}
}{
"_id": {
"$oid": "52ceea0cada002b72c000059"
},
"votes": {
"up": [
],
"down": [
],
"up_count": 0,
"down_count": 0,
"count": 0,
"point": 0
},
"visible": true,
"abuse_flaggers": [
],
"historical_abuse_flaggers": [
],
"parent_ids": [
{
"$oid": "5287e8d5906c42f5aa000013"
}
],
"at_position_list": [
],
"body": "if u please send by mail \n",
"course_id": "DelftX/CTB3365x/2013_Fall",
"_type": "Comment",
"endorsed": false,
"anonymous": false,
"anonymous_to_peers": false,
"parent_id": {
"$oid": "5287e8d5906c42f5aa000013"
},
"author_id": "2276302",
"comment_thread_id": {
"$oid": "528674d784179607d0000011"
},
"author_username": "totah1993",
"sk": "5287e8d5906c42f5aa000013-52ceea0cada002b72c000059",
"updated_at": {
"$date": 1389292044203
},
"created_at": {
"$date": 1389292044203
}
}
R doesn't have "native" support for these files, but there is a JSON parser in the rjson package. So I might load my .mongo file with:
library(rjson)
myfile <- "path/to/myfile.mongo"
myJSON <- readLines(myfile)
myNiceData <- fromJSON(myJSON)
Since rjson converts into a data structure that fits the object being read, you'll have to do some additional snooping, but once you have an R data type you shouldn't have any trouble working with it from there.
Another package to consider when parsing JSON data is jsonlite. It will make data frames for you so you can write them to a csv format with write.table or some other applicable method for writing objects.
NOTE: if it is easier to connect to the MongoDB and get the data from a request, then RMongo may be a good bet. The R-Bloggers also made a post about using RMongo that has a nice little walkthrough.
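For example, a minimal jsonlite sketch, assuming the file has already been repaired into a valid JSON array as described in the edit above (the file name comments_fixed.json is illustrative):
library(jsonlite)
# read the repaired array of comment documents into a data frame;
# flatten = TRUE turns nested objects such as votes into prefixed columns
comments <- fromJSON("comments_fixed.json", flatten = TRUE)
# drop list-columns (e.g. the empty up/down vote arrays), which write.csv cannot handle
atomic_cols <- sapply(comments, is.atomic)
write.csv(comments[, atomic_cols], "comments.csv", row.names = FALSE)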
I used rjson as suggested by @theWanderer and, with the help of a colleague, wrote the following code to parse the data into columns, choosing the specific columns that are needed and checking each of the instances to see whether they return the right variables.
Entire workflow:
- Checked some of the data in jsonlint and corrected the errors → },{ instead of }{ between each document, and [ and ] at the beginning and end of the file.
- Made a smaller file to play with, containing about 11 JSON documents.
- Used the code below to parse the data file, first checking whether the different listItems are not lists themselves (that causes problems). As you will see, I also removed things like \n because they gave errors, and added an empty value for parent_id when there is none in the data (otherwise it would mix up the data).
The code to import the .mongo file into R and then parse it into CSV:
library(rjson)
###### set working directory to write out the data file
setwd("/your/favourite/dir/json to csv/")
#never ever convert strings to factors
options(stringsAsFactors = FALSE)
#import the .mongo file to R
temp.data = fromJSON(file="temp.mongo", method="C", unexpected.escape="error")
file.remove("temp.csv") ## removes the old datafile if there is one
## (so the data is not appended to the file,
## but a new file is created)
listItem = temp.data[[1]] ## prepare the listItem the first time
for (listItem in temp.data){
  parent_id = ""
  if (length(listItem$parent_id) > 0){
    parent_id = listItem$parent_id
  }
  write.table(t(c(
      listItem$votes$up_count, listItem$visible, parent_id,
      gsub("\n", "", listItem$body), listItem$course_id, unlist(listItem["_type"]),
      listItem$endorsed, listItem$anonymous, listItem$author_id,
      unlist(listItem$comment_thread_id), listItem$author_username,
      as.POSIXct(unlist(listItem$created_at)/1000, origin="1970-01-01"))), # end t(), c()
    file="temp.csv", sep="\t", append=TRUE, row.names=FALSE, col.names=FALSE)
}