I am trying to index a JSON file of around 30,000 lines into Elasticsearch, but it is not formatted the way the Bulk API expects. I'm trying to upload it via the Bulk API, but I can't find a way to format it that actually works. I'm using Ubuntu 16.04 LTS.
This is the format of the JSON:
{
"rt": "2018-11-20T12:57:32.292Z",
"source_info": { "ip": "0.0.60.50" },
"end": "2018-11-20T12:57:32.284Z",
"severity": "low",
"duid": "5b8d0a48ba59941314e8a97f",
"dhost": "004678",
"endpoint_type": "computer",
"endpoint_id": "8e7e2806-eaee-9436-6ab5-078361576290",
"suser": "Katerina",
"group": "PERIPHERALS",
"customer_id": "a263f4c8-942f-d4f4-5938-7c37013c03be",
"type": "Event::Endpoint::Device::AlertedOnly",
"id": "83d63d48-f040-2485-49b9-b4ff2ac4fad4",
"name": "Peripheral allowed: Samsung Galaxy S7 edge"
}
I know that the Bulk API needs an action line such as {"index":{"_id":*}} before each JSON object in the file, so it would look like this:
{"index":{"_id":1}}
{
"rt": "2018-11-20T12:57:32.292Z",
"source_info": { "ip": "0.0.60.50" },
"end": "2018-11-20T12:57:32.284Z",
"severity": "low",
"duid": "5b8d0a48ba59941314e8a97f",
"dhost": "004678",
"endpoint_type": "computer",
"endpoint_id": "8e7e2806-eaee-9436-6ab5-078361576290",
"suser": "Katerina",
"group": "PERIPHERALS",
"customer_id": "a263f4c8-942f-d4f4-5938-7c37013c03be",
"type": "Event::Endpoint::Device::AlertedOnly",
"id": "83d63d48-f040-2485-49b9-b4ff2ac4fad4",
"name": "Peripheral allowed: Samsung Galaxy S7 edge"
}
If I insert the index IDs manually and then run curl -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/ivc/default/_bulk?pretty --data-binary @results.json, it uploads with no errors.
My question is: how can I add the action line {"index":{"_id":*}} before each document in the JSON file to make it ready to upload? The _id has to increase by 1 for each document. Is there any way to do this from the CLI?
Sorry if this post doesn't look as it should, I read millions of posts in Stack Overflow but this is my first one! #Desperate
Thank you very much in advance!
Your problem is that Elasticsearch expects each document to be valid JSON on ONE line, like this:
{"index":{"_id":1}}
{"rt":"2018-11-20T12:57:32.292Z","source_info":{"ip":"0.0.60.50"},"end":"2018-11-20T12:57:32.284Z","severity":"low","duid":"5b8d0a48ba59941314e8a97f","dhost":"004678","endpoint_type":"computer","endpoint_id":"8e7e2806-eaee-9436-6ab5-078361576290","suser":"Katerina","group":"PERIPHERALS","customer_id":"a263f4c8-942f-d4f4-5938-7c37013c03be","type":"Event::Endpoint::Device::AlertedOnly","id":"83d63d48-f040-2485-49b9-b4ff2ac4fad4","name":"Peripheral allowed: Samsung Galaxy S7 edge"}
You have to find a way to transform your input file so that you have a document per line, then you'll be good to go with Val's solution.
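As a rough sketch of that transformation, assuming every object in the file starts with a line containing only { and ends with a line containing only } (as in the sample in the question), awk can join each object onto one line and emit a numbered action line before it. The function name to_bulk is made up for illustration, and nested objects that span several lines are not handled.

```shell
#!/bin/bash
# to_bulk: read pretty-printed JSON objects on stdin and write Bulk API NDJSON
# on stdout. Assumption: each object opens with a line that is exactly "{" and
# closes with a line that is exactly "}".
to_bulk() {
  awk '
    $0 == "{" { doc = "{"; inside = 1; next }      # start of a new object
    inside && $0 == "}" {                          # end of the object
      n++
      printf "{\"index\":{\"_id\":%d}}\n", n       # action line with incrementing _id
      print doc "}"                                # the document on a single line
      inside = 0
      next
    }
    inside { gsub(/^[ \t]+/, ""); doc = doc $0 }   # strip indentation and append
  '
}
```

It could be used as to_bulk < results.json > bulk.ndjson, followed by the curl command from the question with --data-binary @bulk.ndjson.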
Thank you for all the answers, they helped point me in the right direction.
I've made a bash script to automate the download, formatting and upload of the logs to Elasticsearch:
#!/bin/bash
echo "Downloading logs from Sophos Central. Please wait."
cd /home/user/ELK/Sophos-Central-SIEM-Integration/log
#This deletes the last batch of results
rm result.json
cd ..
#This triggers the script to download a new batch of logs from Sophos
./siem.py
cd /home/user/ELK/Sophos-Central-SIEM-Integration/log
#Inserts the first {"index":{}} action line at the top of the logs file
sed -i '1 i\{"index":{}}' result.json
#Prepends {"index":{}} to every second line from line 3 onwards (GNU sed first~step address)
sed -i '3~2s/^/{"index":{}}/' result.json
#Uploads the JSON file to Elasticsearch
curl -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/ivc/default/_bulk?pretty --data-binary @result.json
So that's how I achieved this. There might be easier options but this one did the trick for me. Hope it can be useful for someone else!
Again thank you everyone! :D
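The script above lets Elasticsearch generate the document IDs, since {"index":{}} carries no _id. If sequential IDs are needed, as the original question asked, a minimal sketch could look like the following. It assumes result.json already holds one JSON document per line (blank lines are skipped), and add_actions is a hypothetical helper name.

```shell
#!/bin/bash
# add_actions: for every non-empty input line, print a Bulk API action line
# with an incrementing _id, followed by the document line itself.
add_actions() {
  awk 'NF { printf "{\"index\":{\"_id\":%d}}\n", ++n; print }'
}
```

For example, add_actions < result.json > bulk.ndjson would produce a file that can be sent with --data-binary @bulk.ndjson.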
The Elasticsearch version I am using is 6.6.1.
I created the index by running the following command:
curl -XPUT http://localhost:9200/incident_422? -H 'Content-Type: application/json' -d @elasticsearch.json
I need to update the index mapping with the sample JSON data below (sample.json):
{
"properties": {
"id185": {
"type": "byte"
},
"id255": {
"type": "text"
},
"id388": {
"type": "text"
}
}
}
I tried running the command
curl -XPUT http://localhost:9200/incident_422/mapping/_doc? -H 'Content-Type: application/json' -d @sample.json
but I get the following error message:
{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"Rejecting mapping update to [incident_422] as the final mapping would have more than 1 type: [mapping, doc]"}]
I have read somewhere that Elasticsearch 6 doesn't support more than one type per index.
Could anyone please tell me how this can be achieved without downgrading the version?
This seems to be related to the removal of mapping types: you need to specify the type name when indexing documents.
Try adding the type to your index request, i.e. http://localhost:9200/incident_422/<your-type-name> in your URL, and it should solve the issue.
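For reference, in Elasticsearch 6.x the put-mapping endpoint is written with a leading underscore, as /<index>/_mapping/<type>, whereas /incident_422/mapping/_doc appears to be read as a document request with type mapping and ID _doc. A hedged sketch follows; the helper name mapping_url is made up, and the type doc is only a guess taken from the error message, so it has to match whatever type elasticsearch.json actually defined.

```shell
#!/bin/bash
# mapping_url: build the 6.x put-mapping URL from a host, an index and a type.
mapping_url() {
  local host=$1 index=$2 type=$3
  printf '%s/%s/_mapping/%s\n' "$host" "$index" "$type"
}

# "doc" is an assumed type name; replace it with the index's real type.
url=$(mapping_url http://localhost:9200 incident_422 doc)
echo "$url"

# To apply the mapping (not executed in this sketch):
# curl -XPUT "$url" -H 'Content-Type: application/json' -d @sample.json
```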
I'm trying to script a mailing using a curl API call (this is the base API call; in mine the HTML part is replaced with "xmessage"):
curl -s \
-X POST \
--user "$MJ_APIKEY_PUBLIC:$MJ_APIKEY_PRIVATE" \
https://api.mailjet.com/v3.1/send \
-H 'Content-Type: application/json' \
-d '{
"Messages":[
{
"From": {
"Email": "pilot@mailjet.com",
"Name": "Mailjet Pilot"
},
"To": [
{
"Email": "passenger1@mailjet.com",
"Name": "passenger 1"
}
],
"Subject": "Your email flight plan!",
"TextPart": "Dear passenger 1, welcome to Mailjet! May the delivery force be with you!",
"HTMLPart": "<h3>Dear passenger 1, welcome to Mailjet!</h3><br />May the delivery force be with you!",
"CustomCampaign": "SendAPI_campaign",
"DeduplicateCampaign": true
}
]
}'
My script looks like this:
...
message=$(cat ./message.txt)
message=${message//"xdate"/$courseDate}
message=${message//"xcoursecode"/$courseCode}
message=${message//"xsubtitle"/$subtitle}
message=${message//"\r"/""}
message=${message//"\r\n"/""}
message=${message//"\n"/""}
message=${message//"\""/"\\\""}
message=${message//"'"/"'"}
mailJet=$(cat ./mailjet.txt) # containing my API as described as above
mailJet=${mailJet//"xmessage"/$message}
echo $mailJet
eval $mailJet
The command eval $mailJet does not work, but if I copy the output of echo $mailJet and paste it into the terminal, the command works.
The eval $mailJet gives the following error:
{"ErrorIdentifier":"5cce36c5-373c-48ca-90b8-2b6bfc5df526","ErrorCode":"mj-0031","StatusCode":400,"ErrorMessage":"Request payload contains not valid UTF-8 encoded characters"}
Something that partially worked was to put the mailjet.txt content directly in the script,
but I'm struggling to find the syntax to replace xmessage with the contents of $message.
Like this, it did not work:
...
message=$(cat ./message.txt)
message=${message//"xdate"/$courseDate}
message=${message//"xcoursecode"/$courseCode}
message=${message//"xsubtitle"/$subtitle}
message=${message//"\r"/""}
message=${message//"\r\n"/""}
message=${message//"\n"/""}
message=${message//"\""/"\\\""}
message=${message//"'"/"'"}
curl -s \
-X POST \
--user "$MJ_APIKEY_PUBLIC:$MJ_APIKEY_PRIVATE" \
https://api.mailjet.com/v3.1/send \
-H 'Content-Type: application/json' \
-d '{
"Messages":[
{
"From": {
"Email": "pilot@mailjet.com",
"Name": "Mailjet Pilot"
},
"To": [
{
"Email": "passenger1@mailjet.com",
"Name": "passenger 1"
}
],
"Subject": "Your email flight plan!",
"TextPart": "Dear passenger 1, welcome to Mailjet! May the delivery force be with you!",
"HTMLPart": "$message", ## neither like this : "HTMLPart": "'$message'",
"CustomCampaign": "SendAPI_campaign",
"DeduplicateCampaign": true
}
]
}'
Whereas if I put any HTML directly in place of $message in the curl call, the script runs without any issue.
I'm stuck (and not a great bash coder, or even a coder at all).
Many thanks in advance for your help.
I think your problems start at
-d '{
Everything inside single quotes is taken literally by bash, so "$message" later on will be treated as the literal text $message rather than being expanded.
If this is the issue, close the single quote before the variable and wrap the variable in double quotes, so you write '"$message"'. Or, if the double quotes need to appear in the JSON sent by curl, write "'"$message"'".
Note that you can't have the ## comment inside the payload either, but I assume you put that in for our benefit.
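The difference between the two quoting styles can be seen in a small sketch. The value of message here is an invented example, and it is assumed to be already escaped for use inside a JSON string.

```shell
#!/bin/bash
message='<h3>Hello</h3>'   # example value, assumed to be JSON-safe already

# Single quotes only: $message is NOT expanded and stays as literal text.
literal='{"HTMLPart": "$message"}'

# Close the single quote, expand the variable inside double quotes,
# then reopen the single quote for the rest of the JSON.
expanded='{"HTMLPart": "'"$message"'"}'

printf '%s\n' "$literal" "$expanded"
```

The expanded form is what would be passed to curl, for example as -d "$expanded".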
I'm using the Conversocial API:
https://api-docs.conversocial.com/1.1/reports/
Using the sample from the documentation, after all my tweaks I receive this output:
{
"report": {
"name": "dump", "generation_start_date": "2012-05-30T17:09:40",
"url": "https://api.conversocial.com/v1.1/reports/5067",
"date_from": "2012-05-21",
"generated_by": {
"url": "https://api.conversocial.com/v1.1/moderators/11599",
"id": "11599"
},
"generated_date": "2012-05-30T17:09:41",
"channel": {
"url": "https://api.conversocial.com/v1.1/channels/387",
"id": "387"
},
"date_to": "2012-05-28",
"download": "https://s3.amazonaws.com/conversocial/reports/70c68360-1234/#twitter-from-may-21-2012-to-may-28-2012.zip",
"id": "5067"
}
}
Currently, I can filter this JSON output down to the download field only, which gives this output:
{
"report" : {
"download" : "https://s3.amazonaws.com/conversocial/reports/70c68360-1234/#twitter-from-may-21-2012-to-may-28-2012.zip"
}
}
Is there any way to automate this process so that curl downloads this file?
To download it, I'm planning to use something simple like:
curl URL_LINK > FILEPATH/EXAMPLE.ZIP
I'm wondering whether there is a way to replace URL_LINK with the download link, or any other way around this.
Give this a try:
curl $(curl -s https://httpbin.org/get | jq ".url" -r) > file
Just replace the URL and the jq filter based on your JSON; in your case that may be:
jq ".report.download" -r
The -r option makes jq print the raw string, without the surrounding double quotes.
It works by using a command substitution $():
$(curl -s https://httpbin.org/get | jq ".url" -r)
This fetches your URL and extracts the download URL from the returned JSON using jq; the result is then passed to the outer curl as an argument.
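Applied to the report JSON from the question, the extraction step might be wrapped as below. This is a sketch only: extract_download_url is a made-up helper name, REPORT_URL and report.zip are placeholders, and any authentication the API needs is left out.

```shell
#!/bin/bash
# extract_download_url: read the report JSON on stdin and print the value of
# .report.download as a raw string (requires jq to be installed).
extract_download_url() {
  jq -r '.report.download'
}

# Hypothetical usage (not executed here):
# curl -o report.zip "$(curl -s "$REPORT_URL" | extract_download_url)"
```

Quoting the command substitution keeps the URL intact as a single argument even if it contains characters the shell would otherwise treat specially.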
I want to write a line of code which will take the results of:
du -sh -c --time /00-httpdocs/*
and output it in JSON format. The goal is to get three pieces of information for each project file in a site: directory path, date last modified, and disk space usage in human readable format. This command will output that data in tab-delimited format with each entry on a new line in the terminal:
4.6G 2014-08-22 12:26 /00-httpdocs/00
1.1G 2014-08-22 13:32 /00-httpdocs/01
711M 2014-02-14 23:39 /00-httpdocs/02
The goal is to get it to export to a JSON file so it would need to be formatted something like this:
{"httpdocs": [
{
"size": "4.6G",
"modified": "2014-08-22 12:26",
"path": "/00-httpdocs/00-PREVIEW"}
{
"size": "1.1G",
"modified": "2014-08-22 13:32",
"path": "/00-httpdocs/8oclock"}
{
"size": "711M",
"modified": "2014-02-14 23:39",
"path": "/00-httpdocs/8oclock.new"}
]}
(I know that's not quite proper JSON, I just wrote it as an example. Apologies to the pedantic among us.)
I need size to return as an integer (so maybe remove '-sh' and handle conversion later?).
I've tried using awk and sed but I'm a total novice and can't quite get the formatting right.
I've made it about this far:
du -sh -c --time /00-httpdocs/* | awk ' BEGIN {print "\"httpdocs:\": [";} {print "{"$0"},\n";} END {print "]";}'
The goal is to have this trigger twice a day so that we can get the data and use it inside of a JavaScript application.
sed '1 i\
{"httpdocs": [
s/\([^[:space:]]*\)[[:space:]]*\([^[:space:]]*[[:space:]][^[:space:]]*\)[[:space:]]*\([^[:space:]]*\)/ {\
"size" : "\1",\
"modified": "\2",\
"path": "\3"}/
$ a\
]}' YourFile
Quick and dirty (POSIX version, so use --posix with GNU sed).
Take the three fields and place them (s/../../) into a "template" using groups (\( ... \) and \1).
Insert the header before the first line (i\ ...) and append the footer after the last line (a\ ...).
[:space:] may be replaced with [:blank:].
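An alternative sketch with awk, which also puts commas between the entries so the result is valid JSON. It assumes the du output is tab-delimited as described in the question, skips the "total" row that -c adds, and does not escape quotes or backslashes in paths. The name du_to_json is made up for illustration.

```shell
#!/bin/bash
# du_to_json: convert "size<TAB>modified<TAB>path" lines on stdin into a JSON
# object with an "httpdocs" array. Rows whose path is "total" are skipped.
du_to_json() {
  awk -F '\t' '
    BEGIN { print "{\"httpdocs\": [" }
    NF == 3 && $3 != "total" {
      if (count++) print ","      # close the previous entry with a comma
      printf "  {\"size\": \"%s\", \"modified\": \"%s\", \"path\": \"%s\"}", $1, $2, $3
    }
    END { if (count) print ""; print "]}" }
  '
}
```

It could be run as du -sh -c --time /00-httpdocs/* | du_to_json > httpdocs.json. Dropping -h from du would give numeric sizes in blocks, which could then be written without quotes if an integer is required.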
I have a shell script with curl -s http://ifconfig.me/all.json command which prints below output in the terminal window.
{
"version" : {
"ip_addr": "201.73.103.12",
"lang": "java",
"remote_host": "",
"user_agent": "curl/7.23.1 (i386-sun-solaris2.11) libcurl/7.23.1 OpenSSL/0.9.8w zlib/1.2.3 libidn/1.23 libssh2/1.2.2",
"charset": "",
"port": "63713"}
}
I need to display the JSON value in table format.
Someone, please help me with this to implement in UNIX shell script. Thanks!
Look into the cool command-line JSON query parser / formatter tool:
http://stedolan.github.io/jq/
You can pipe your JSON input into this tool and extract out / format the keys you need.
Or, just run a separate curl -s http://ifconfig.me/ call for each "row/column" you need and output it in some sane format, e.g.:
#!/bin/bash
ip=$(curl -s http://ifconfig.me/ip)
host=$(curl -s http://ifconfig.me/host)
printf 'ip\thost\n'
printf '%s\t%s\n' "$ip" "$host"
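If the columns should line up regardless of value length, printf with a fixed field width is one option. This is only a sketch: print_row is a made-up helper, the width of 12 is arbitrary, and the values are copied from the sample output in the question.

```shell
#!/bin/bash
# print_row: left-align the label in a 12-character column, then print the value.
print_row() {
  printf '%-12s%s\n' "$1" "$2"
}

print_row "ip_addr" "201.73.103.12"
print_row "port" "63713"
```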