I am currently working on an anomaly detection project based on Elasticsearch and Kibana. Recently I converted a CSV file to JSON and tried to import this data into Elasticsearch via Postman using the Bulk API. Unfortunately, all of the requests failed.
Then I found this topic: Import/Index a JSON file into Elasticsearch
and tried the following approach:
curl -XPOST 'http://localhost:9200/yahoodata/a4benchmark/4' --data-binary @Anomaly1.json
The response I got:
{"error":{"root_cause":[{"type":"mapper_parsing_exception","reason":"failed
to parse"}],"type":"mapper_parsing_exception","reason":"failed to
parse","caused_by":{"type":"not_x_content_exception","reason":"Compressor
detection can only be called on some xcontent bytes or compressed
xcontent bytes"}},"status":400}
The data I am trying to insert has the following structure (Anomaly1.json):
[
  {
    "timestamps": 11,
    "value": 1,
    "anomaly": 1
  },
  {
    "timestamps": 1112,
    "value": 211,
    "anomaly": 0
  },
  {
    "timestamps": 2,
    "value": 1,
    "anomaly": 0
  }
]
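For reference, neither endpoint accepts a bare JSON array: the single-document URL used above expects exactly one JSON object, and the _bulk endpoint expects newline-delimited action/source pairs. A minimal bulk sketch for this data (the file name Anomaly1_bulk.ndjson is made up for illustration) puts one action line before each document:
{"index":{}}
{"timestamps": 11, "value": 1, "anomaly": 1}
{"index":{}}
{"timestamps": 1112, "value": 211, "anomaly": 0}
{"index":{}}
{"timestamps": 2, "value": 1, "anomaly": 0}
and sends the file (which must end with a newline) like this:
curl -H 'Content-Type: application/x-ndjson' -XPOST 'http://localhost:9200/yahoodata/a4benchmark/_bulk' --data-binary @Anomaly1_bulk.ndjson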
Related
I uploaded CSV data into Elasticsearch using the machine-learning approach described here.
This created an index and a pipeline with a csv preprocessor. The import was successful.
What is the corresponding curl command line to upload CSV data into elasticsearch, assuming the index is called iislog and the pipeline iislog-pipeline?
The csv ingest processor will only work on a JSON document that contains a field with CSV data. You cannot throw raw CSV data at it using curl.
The CSV-to-JSON transformation happens in Kibana (when you drop the raw CSV file in the browser window), and only then does Kibana send the JSON-ified CSV.
If your CSV looks like this:
column1,column2,column3
1,2,3
4,5,6
7,8,9
Kibana will transform each line into
{"message": "1,2,3"}
{"message": "4,5,6"}
{"message": "7,8,9"}
And then Kibana will send each of those raw CSV/JSON documents to your iislog index through the iislog-pipeline ingest pipeline. The pipeline looks like this:
{
"description" : "Ingest pipeline created by file structure finder",
"processors" : [
{
"csv" : {
"field" : "message",
"target_fields" : [
"column1",
"column2",
"column3"
],
"ignore_missing" : false
}
},
{
"remove" : {
"field" : "message"
}
}
]
}
In the end, the documents will look like this in your index:
{"column1": 1, "column2": 2, "column3": 3}
{"column1": 4, "column2": 5, "column3": 6}
{"column1": 7, "column2": 8, "column3": 9}
That's the way it works. So if you want to use curl, you need to do Kibana's pre-parsing job yourself: wrap each CSV line in a message field and send those documents through the ingest pipeline.
curl -H 'Content-Type: application/json' -XPOST 'http://localhost:9200/iislog/_doc?pipeline=iislog-pipeline' -d '{"message": "1,2,3"}'
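For a whole file of such lines, the same idea works against the _bulk endpoint, since the pipeline parameter can also be applied to a bulk request (the file name messages.ndjson is just an illustration):
{"index":{}}
{"message": "1,2,3"}
{"index":{}}
{"message": "4,5,6"}
{"index":{}}
{"message": "7,8,9"}
curl -H 'Content-Type: application/x-ndjson' -XPOST 'http://localhost:9200/iislog/_bulk?pipeline=iislog-pipeline' --data-binary @messages.ndjson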
There is another approach to inserting CSV into Elasticsearch using an ingest pipeline, described here: https://www.elastic.co/de/blog/indexing-csv-elasticsearch-ingest-node
In the end, it wraps each line in a JSON document and grok-parses each line in order to map the CSV columns to specific document fields.
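As a rough sketch of what that blog's approach looks like (the pipeline name csv-grok and the three numeric columns are assumptions carried over from the example above, not copied from the blog), a grok processor takes the place of the csv processor:
curl -H 'Content-Type: application/json' -XPUT 'http://localhost:9200/_ingest/pipeline/csv-grok' -d '
{
  "description" : "grok-parse one CSV line from the message field",
  "processors" : [
    {
      "grok" : {
        "field" : "message",
        "patterns" : ["%{NUMBER:column1},%{NUMBER:column2},%{NUMBER:column3}"]
      }
    },
    {
      "remove" : {
        "field" : "message"
      }
    }
  ]
}'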
I have used the REST connector to get data from an API, and the JSON output contains arrays. When I try to copy the JSON as-is to Blob storage using a copy activity, only the first object's data is copied and the rest is ignored.
The documentation says we can copy the JSON as-is by skipping the schema section on both the dataset and the copy activity. I followed that and am getting the output below.
https://learn.microsoft.com/en-us/azure/data-factory/connector-rest#export-json-response-as-is
I tried the copy activity without a schema, using the header as the first row, and output the files to Blob storage as .json and .txt.
Sample REST output:
{
"totalPages": 500,
"firstPage": true,
"lastPage": false,
"numberOfElements": 50,
"number": 0,
"totalElements": 636,
"columns": {
"dimension": {
"id": "variables/page",
"type": "string"
},
"columnIds": [
"0"
]
},
"rows": [
{
"itemId": "1234",
"value": "home",
"data": [
65
]
},
{
"itemId": "1235",
"value": "category",
"data": [
92
]
},
],
"summaryData": {
"totals": [
157
],
"col-max": [
123
],
"col-min": [
1
]
}
}
The Blob output as text is below; it contains only the first object's data:
totalPages,firstPage,lastPage,numberOfElements,number,totalElements
500,True,False,50,0,636
If you want to write the JSON response as is, you can use an HTTP connector. However, please note that the HTTP connector doesn't support pagination.
If you want to keep using the REST connector and write a CSV file as output, can you please specify how you want the nested objects and arrays to be written?
In CSV files, we cannot write arrays. You could always use a custom activity or an Azure Function activity to call the REST API, parse it the way you want, and write to a CSV file.
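For instance, one possible flattening of the rows array from the sample above, with the inner data array reduced to its single element, would be (this is exactly the kind of choice you would need to specify):
itemId,value,data
1234,home,65
1235,category,92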
Hope this helps.
My goal is to retrieve JSON-type fields from a Solr index and also perform search queries on such fields.
I have the following documents in the Solr index and am using the auto-generated schema provided by Solr's schemaless feature.
POST http://localhost:8983/solr/test1/update?commitWithin=1000
[
{"id" : "1", "type_s":"book", "title_t" : "The Way of Kings", "author_s" : "Brandon Sanderson",
"miscinfo": {"provider": "orielly", "site": "US"}
},
{"id" : "2", "type_s":"book", "title_t" : "The Game of Thrones", "author_s" : "James Sanderson",
"miscinfo": {"provider": "pacman", "site": "US"}
}
]
I see the JSON objects are stored as strings, as shown by the field type in the output of the following:
GET http://localhost:8983/solr/test1/schema/fields
{
  "name": "miscinfo",
  "type": "strings"
}
I tried using srcField as mentioned in this post. However, a query to retrieve the JSON field returns an empty response. Below are the GET requests used:
GET http://localhost:8983/solr/test1/select?q=1&fl=miscinfo&wt=json
GET http://localhost:8983/solr/test1/select?q=1&fl=miscinfo,source_s:[json]&wt=json
Also, search queries for values inside the JSON-type fields return an empty response:
http://localhost:8983/solr/test1/select?q=pacman&wt=json
{
"responseHeader": {
"status": 0,
"QTime": 0,
"params": {
"q": "pacman",
"json": "",
"wt": "json"
}
},
"response": {
"numFound": 0,
"start": 0,
"docs": []
}
}
Please help with searching object-type fields in Solr.
Have you checked this: https://cwiki.apache.org/confluence/display/solr/Response+Writers
JSON Response Writer: A very commonly used Response Writer is the JsonResponseWriter, which formats output in JavaScript Object Notation (JSON), a lightweight data interchange format specified in RFC 4627. Setting the wt parameter to json invokes this Response Writer. Here is a sample response for a simple query like q=id:VS1GB400C3&wt=json.
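Beyond the response writer, one option for making the nested values searchable (not covered by the quote above, and only a sketch) is Solr's custom JSON indexing endpoint /update/json/docs, which can map JSON paths to flat fields; the target field names provider_s and site_s below are made up for illustration:
curl 'http://localhost:8983/solr/test1/update/json/docs?split=/&f=id:/id&f=type_s:/type_s&f=title_t:/title_t&f=author_s:/author_s&f=provider_s:/miscinfo/provider&f=site_s:/miscinfo/site&commitWithin=1000' -H 'Content-type: application/json' -d '{"id": "2", "type_s": "book", "title_t": "The Game of Thrones", "author_s": "James Sanderson", "miscinfo": {"provider": "pacman", "site": "US"}}'
A value from inside the former JSON object can then be searched directly:
curl 'http://localhost:8983/solr/test1/select?q=provider_s:pacman&wt=json'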
I have a lot of JSON documents with this structure:
"positions": [
{
"millis": 12959023,
"lat": 49.01525113731623,
"lon": 2.4971945118159056,
"rawX": -3754,
"rawY": 605,
"rawVx": 0,
"rawVy": 0,
"speed": 9.801029291617944,
"accel": 0.09442740907572084,
"grounded": true
},
{
"millis": 12959914,
"lat": 49.01536940596998,
"lon": 2.4967825412750244,
"rawX": -3784,
"rawY": 619,
"rawVx": -15,
"rawVy": 7,
"speed": 10.841861737855924,
"accel": -0.09534648619563282,
"grounded": true
}
...
}
I'm trying to map this JSON document in Elasticsearch by introducing a geo_point field, to get documents like the one below:
"positions": [
{
"millis": 12959023,
"location" : {
"lat": 49.01525113731623,
"lon": 2.4971945118159056,
}
"rawX": -3754,
"rawY": 605,
"rawVx": 0,
"rawVy": 0,
"speed": 9.801029291617944,
"accel": 0.09442740907572084,
"grounded": true
},
...
}
PS: these documents are provided by an API.
Thanks
You could do something like this:
curl -XPUT 'http://localhost:9200/<indexname>/positions/_mapping' -d @yourjsonfile.json
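A minimal sketch of what yourjsonfile.json could contain, assuming you only want the location sub-object from the question mapped as a geo_point (the remaining fields can keep their dynamic mappings, and the mapping has to be in place before the documents are indexed):
{
  "properties": {
    "positions": {
      "properties": {
        "location": {
          "type": "geo_point"
        }
      }
    }
  }
}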
Hope this SO helps!
If you can't modify the source, you need to pre-process your document before it gets indexed into elasticsearch.
If you are using Elasticsearch < 5.0, you can use Logstash and the mutate filter.
If you are using Elasticsearch >= 5.0 (my advice), use an ingest pipeline and the rename processor.
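As a rough, untested sketch of that second option (the pipeline name positions-location is made up, and a foreach processor wraps each rename because positions is an array of objects):
curl -H 'Content-Type: application/json' -XPUT 'http://localhost:9200/_ingest/pipeline/positions-location' -d '
{
  "description" : "move lat/lon into a location sub-object inside each positions entry",
  "processors" : [
    {
      "foreach" : {
        "field" : "positions",
        "processor" : {
          "rename" : {
            "field" : "_ingest._value.lat",
            "target_field" : "_ingest._value.location.lat"
          }
        }
      }
    },
    {
      "foreach" : {
        "field" : "positions",
        "processor" : {
          "rename" : {
            "field" : "_ingest._value.lon",
            "target_field" : "_ingest._value.location.lon"
          }
        }
      }
    }
  ]
}'
Documents indexed with ?pipeline=positions-location would then match the target structure shown in the question.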
The JSON data I am retrieving is shown below. Notice that the format is not right in terms of objects and key/value pairs; why is this so? I am retrieving this JSON data from a DataSet using the Jayrock RPC service for ASP.NET 3.5.
{
"Table":{
"columns":[
"i_member_id",
"i_group_id",
"u_name",
"u_tel",
"u_email",
"u_password",
"d_timestamp",
"b_activated"
],
"rows":[
[
1,
0,
"kevin",
"1231234",
"kevin#creaworld.com.sg",
"123",
"2011-01-05T09:51:36.8730000+08:00",
true
],
[
2,
0,
"kevin2",
"asdads",
"kevin2#creaworld.com.sg",
"123123",
"2011-01-05T10:01:46.1530000+08:00",
true
]
]
}
}
Here is a link to a JSON formatter:
http://jsonformatter.curiousconcept.com/
Better still, you can use the third-party DLL Json.NET.
It will be useful for you.