Import/Index a JSON file into Elasticsearch - json

I am new to Elasticsearch and have been entering data manually up until this point. For example I've done something like this:
$ curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '{
"user" : "kimchy",
"post_date" : "2009-11-15T14:12:12",
"message" : "trying out Elastic Search"
}'
I now have a .json file and I want to index it into Elasticsearch. I've tried something like this too, but with no success:
curl -XPOST 'http://jfblouvmlxecs01:9200/test/test/1' -d lane.json
How do I import a .json file? Are there steps I need to take first to ensure the mapping is correct?

The right command if you want to use a file with curl is this:
curl -XPOST 'http://jfblouvmlxecs01:9200/test/_doc/1' -d @lane.json
Elasticsearch is schemaless, therefore you don't necessarily need a mapping. If you send the JSON as it is and use the default mapping, every field will be indexed and analyzed using the standard analyzer.
If you want to interact with Elasticsearch through the command line, you may want to have a look at the elasticshell which should be a little bit handier than curl.
2019-07-10: It should be noted that custom mapping types are deprecated and should not be used. I updated the type in the URL above to make it easier to see which was the index and which was the type, as having both named "test" was confusing.

Per the current docs, https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html:
If you’re providing text file input to curl, you must use the
--data-binary flag instead of plain -d. The latter doesn’t preserve newlines.
Example:
$ curl -s -XPOST localhost:9200/_bulk --data-binary @requests

We made a little tool for this type of thing https://github.com/taskrabbit/elasticsearch-dump

One thing I've not seen anyone mention: for the Bulk API, the JSON file must contain one action line specifying the index that the next line belongs to, before every line of the "pure" JSON file.
I.e.:
{"index":{"_index":"shakespeare","_type":"act","_id":0}}
{"line_id":1,"play_name":"Henry IV","speech_number":"","line_number":"","speaker":"","text_entry":"ACT I"}
Without that, nothing works, and it won't tell you why
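A minimal sketch of generating that action-plus-document format from an ordinary JSON array, using only Python's standard library (the index name and documents here are just examples):

```python
import json

def to_bulk_body(docs, index_name):
    """Build an NDJSON bulk body: an action line before every document line."""
    lines = []
    for i, doc in enumerate(docs):
        lines.append(json.dumps({"index": {"_index": index_name, "_id": i}}))
        lines.append(json.dumps(doc))
    # The Bulk API requires the body to be terminated by a newline.
    return "\n".join(lines) + "\n"

docs = [{"line_id": 1, "text_entry": "ACT I"},
        {"line_id": 2, "text_entry": "SCENE I. London. The palace."}]
print(to_bulk_body(docs, "shakespeare"), end="")
```

Write the result to a file and POST it with --data-binary, as in the other answers.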

I'm the author of elasticsearch_loader
I wrote ESL for this exact problem.
You can install it with pip:
pip install elasticsearch-loader
And then you will be able to load json files into elasticsearch by issuing:
elasticsearch_loader --index incidents --type incident json file1.json file2.json

I just made sure that I am in the same directory as the json file and then simply ran this
curl -s -H "Content-Type: application/json" -XPOST localhost:9200/product/default/_bulk?pretty --data-binary @product.json
So make sure you too are in the same directory as the file when you run it this way.
Note: product/default/ in the command is specific to my environment; you can omit it or replace it with whatever is relevant to you.

Adding to KenH's answer
$ curl -s -XPOST localhost:9200/_bulk --data-binary @requests
You can replace @requests with @complete_path_to_json_file
Note: the @ is important before the file path

Just get Postman from https://www.getpostman.com/docs/environments and give it the file location with the /test/test/1/_bulk?pretty command.

You are using
$ curl -s -XPOST localhost:9200/_bulk --data-binary @requests
If 'requests' is a json file then you have to change this to
$ curl -s -XPOST localhost:9200/_bulk --data-binary @requests.json
Now before this, if your JSON file is not indexed, you have to insert an index line before each line inside the file. You can do this with jq; see the link below:
http://kevinmarsh.com/2014/10/23/using-jq-to-import-json-into-elasticsearch.html
Go to the Elasticsearch tutorials (for example the Shakespeare tutorial), download the sample JSON file used, and have a look at it. In front of each JSON object (each individual line) there is an index line. This is what you are looking for after using the jq command. This format is mandatory for the Bulk API; plain JSON files won't work.

As of Elasticsearch 7.7, you have to specify the content type also:
curl -s -H "Content-Type: application/json" -XPOST localhost:9200/_bulk --data-binary @<absolute path to JSON file>

I wrote some code to expose the Elasticsearch API via a filesystem API.
It is useful, for example, for clean export/import of data.
I created the prototype elasticdriver. It is based on FUSE.

If you are using Elasticsearch 7.7 or above, use the command below.
curl -H "Content-Type: application/json" -XPOST "localhost:9200/bank/_bulk?pretty&refresh" --data-binary @"/Users/waseem.khan/waseem/elastic/account.json"
Here the file path is /Users/waseem.khan/waseem/elastic/account.json.
If you are using Elasticsearch 6.x, you can use the command below.
curl -X POST "localhost:9200/bank/_bulk?pretty&refresh" --data-binary @"/Users/waseem.khan/waseem/elastic/account.json" -H 'Content-Type: application/json'
Note: make sure your .json file ends with one empty line, otherwise you will get the exception below.
"error" : {
"root_cause" : [
{
"type" : "illegal_argument_exception",
"reason" : "The bulk request must be terminated by a newline [\n]"
}
],
"type" : "illegal_argument_exception",
"reason" : "The bulk request must be terminated by a newline [\n]"
},
"status" : 400

If you are using VirtualBox with Ubuntu in it, or simply Ubuntu, then this can be useful:
wget https://github.com/andrewvc/ee-datasets/archive/master.zip
sudo apt-get install unzip (only if unzip is not installed)
unzip master.zip
cd ee-datasets
java -jar elastic-loader.jar http://localhost:9200 datasets/movie_db.eloader

If you want to import a json file into Elasticsearch and create an index, use this Python script.
import json
from elasticsearch import Elasticsearch

es = Elasticsearch([{'host': 'localhost', 'port': 9200}])

i = 0
with open('el_dharan.json') as raw_data:
    json_docs = json.load(raw_data)
    for json_doc in json_docs:
        i = i + 1
        es.index(index='ind_dharan', doc_type='doc_dharan', id=i, body=json.dumps(json_doc))
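Indexing one document per HTTP request like this gets slow for big files; the official Python client also ships a bulk helper. Here is a sketch of preparing actions for it, using only the standard library (the file and index names follow the script above; the commented-out usage assumes the elasticsearch package is installed and a cluster is running):

```python
import json

def make_actions(path, index_name):
    """Yield one bulk action per document in a JSON array file."""
    with open(path) as f:
        for i, doc in enumerate(json.load(f), start=1):
            yield {"_index": index_name, "_id": i, "_source": doc}

# Usage sketch (requires the official client and a running cluster):
#   from elasticsearch import Elasticsearch, helpers
#   es = Elasticsearch([{'host': 'localhost', 'port': 9200}])
#   helpers.bulk(es, make_actions('el_dharan.json', 'ind_dharan'))
```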

Related

Generate json file using curl xml

I'm trying to generate a JSON file using curl, and also assign a specific path where the JSON file will be stored once generated, but the commands I've tried produce no JSON output.
May I know what I need to add or change with my command?
curl -v -H "Accept: application/json" --user "admin:Test1234" https://test.com/adventure/
I tested a simple curl against a public JSON API and sent the response to a file, and the result is JSON output.
curl -H "Accept: application/json" https://catfact.ninja/fact >> cat.json
You can try it using https://reqbin.com/req/javascript/c-vdhoummp/curl-get-json-example
Or simply use Postman and check its code snippet option to see the cURL code snippet.
https://imgur.com/a/LXqN8YH
I'm able to generate the JSON file now. I added -k to my command since my URL is HTTPS.

Error parsing JSON while using Github content API with Curl

I am trying to upload a file to Github using Github content API. Referring docs from the official documentation here.
As per the docs, the data should be base64 encoded before uploading. I have a .tar.gz file which I am converting to base64 using the following method:
base64_logs=$(base64 logs.tar.gz)
I am using following curl command:
content_response=$(curl -v \
-X PUT \
-u some-user:$(params.git-token) \
-H "Accept: application/vnd.github.v3+json" \
$content_url \
-d '{"message": "some message", "content": "'"$base64_logs"'"}')
The error message I get is
{ "message": "Problems parsing JSON", "documentation_url": "https://docs.github.com/enterprise/2.22/rest/reference/repos#create-or-update-file-contents" } 400
I am not sure where I am getting this wrong. I tried to use a hardcoded base64 string, and it worked.
Solved.
The base64 output contained newlines that had to be stripped:
base64_logs=$(base64 logs.tar.gz | tr -d \\n)
or you may need to use \\r depending on your OS. See the related answer below for more info.
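For context, the base64 command-line tool wraps its output at 76 characters by default, which is what injects the newlines into the JSON. Python's base64 module never wraps, which a short sketch can demonstrate:

```python
import base64

# 200 input bytes encode to well over 76 output characters,
# yet b64encode produces a single unwrapped line.
payload = b"x" * 200
encoded = base64.b64encode(payload).decode("ascii")
assert "\n" not in encoded
```

On GNU coreutils, `base64 -w 0` also disables wrapping, but the `tr -d '\n'` approach above is more portable across platforms.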
Related answer: How to echo base64 within CURL?

Elasticsearch: Bulk request throws error in Elasticsearch 6.1.1

I recently upgraded to Elasticsearch version 6.1.1 and now I can't bulk index documents from a JSON file. When I do it inline, it works fine. Here are the contents of the document:
{"index" : {}}
{"name": "Carlson Barnes", "age": 34}
{"index":{}}
{"name": "Sheppard Stein","age": 39}
{"index":{}}
{"name": "Nixon Singleton","age": 36}
{"index":{}}
{"name": "Sharron Sosa","age": 33}
{"index":{}}
{"name": "Kendra Cabrera","age": 24}
{"index":{}}
{"name": "Young Robinson","age": 20}
When I run this command,
curl -XPUT 'localhost:9200/subscribers/ppl/_bulk?pretty' -H 'Content-Type: application/json' -d @customers_full.json
I get this error:
"error" : {
"root_cause" : [
{
"type" : "illegal_argument_exception",
"reason" : "The bulk request must be terminated by a newline [\n]"
}
],
"type" : "illegal_argument_exception",
"reason" : "The bulk request must be terminated by a newline [\n]"
},
"status" : 400
It works fine if I send the data inline and in Elasticsearch 5.x. I tried adding newlines as well as the newline character to the end of the file. Doesn't seem to work.
Add an empty line at the end of the JSON file, save it, and then try running the command below:
curl -XPOST localhost:9200/subscribers/ppl/_bulk?pretty --data-binary @customers_full.json -H 'Content-Type: application/json'
As the documentation says:
use the --data-binary flag instead of plain -d
-d doesn't preserve newlines and doesn't format the JSON.
I faced this problem because of JSON formatting.
The error is pretty clear:
The bulk request must be terminated by a newline [\n]
So you simply need to add a newline at the end of your customers_full.json file and you'll be ok.
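If you'd rather fix the file from a script than in an editor, here is a small sketch using only Python's standard library; it appends the required final newline only when it is missing (pass it the path to your bulk file, e.g. customers_full.json):

```python
def ensure_trailing_newline(path):
    """Append a final newline to the file if it doesn't already end with one."""
    with open(path, "rb+") as f:
        f.seek(0, 2)            # go to end of file
        if f.tell() == 0:       # empty file: nothing to do
            return
        f.seek(-1, 2)           # inspect the last byte
        if f.read(1) != b"\n":
            f.write(b"\n")

# e.g. ensure_trailing_newline("customers_full.json")
```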
I ran into the same issue and spent hours adding and removing newlines before somebody pointed out I mis-typed the file name... So note that curl will throw the same error if the file is not actually present, making this super-confusing.
I had a similar issue when working with Elasticsearch 7.3.
Here's how I solved it.
Locate the .json file, say products.json file.
Double click to open the .json file in your text editor.
Scroll to the end of the .json file and press the ENTER key on your keyboard.
Save and close the .json file. This leaves a new line at the end of the .json file.
Go back to your terminal and run the command below.
N.B.: For the command below, the .json file name is products.json, which I am importing to http://localhost:9200/ecommerce/product
curl -H "Content-type: application/json" -XPOST "http://localhost:9200/ecommerce/product/_bulk?pretty" --data-binary "@products.json"
That's all.
I hope this helps
For anyone using postman to make requests to ElasticSearch
Just press enter to create an empty new line!
And voila, problem solved
This worked for me:
curl -H "Content-Type: application/x-ndjson" -XPOST "localhost:9200/bank/_bulk?pretty&refresh" --data-binary "#C:\Program Files\Elastic\Elasticsearch\7.2.0\accounts.json"
I had the same problem running on Windows 10, using ElasticSearch 7.5.1.
I tried all the answers; none of them worked. I was certain I had a newline at the end of my file.
To get it to work, I had to ensure the file I was uploading was using UNIX end-of-line characters (0A only, no 0D), and also the encoding had to be UTF-8.
Using Notepad++, you can edit the metadata of the file.
Finally some good news:
Press Enter at the end of the last line inside the JSON file and run the command again.
curl -H "Content-Type: application/x-ndjson" -XPOST 'localhost:9200/customers/personal/_bulk?pretty&refresh' --data-binary @"generated.json"
I had just forgotten to add an @ symbol before the file name, like this:
--data-binary "@products.json"
You just need to open the JSON file, go to the end of the file (Ctrl+End), and press Enter to break a new line.
I was struggling with this for a hot minute. Mine was caused by a space in my curl request between --data and -binary, which gave the same error: must end with new line [\n].
So double-check that in the curl request it's --data-binary, not --data -binary.
For me, the issue was only due to the wrong file name.
I have used customer_full.json in command whereas the file was named customer_full in my file system (without the extension).
So in my case,this command worked for me:
curl -H "Content-Type: application/x-ndjson" -XPOST 'http://localhost:9200/customers/personal/_bulk?pretty&refresh' --data-binary #"customer_full"
I faced a similar issue on Windows using Elasticsearch 7.9.1 when I used the curl command below.
curl -s -H "Content-Type: application/json" -XPOST localhost:9200/accounts/docs/_bulk?filter_path=items.*.error --data-binary "@textoutES.json" >> erroredAtES.json
I tried to manually add a newline at the end of the file, but it did not work.
I created my JSON by extracting data from a MySQL database as below, to make sure my records end with a LINE FEED and CARRIAGE RETURN.
Then it worked for me:
SELECT CONCAT('{"index":{"_id":"',id,'"}}\r\n',request_data,'\r\n') reqestData FROM cards
More importantly, your end-of-file should have a carriage return and line feed (CRLF) if you are using Windows. Also, if any line in the JSON contains a CR but no LF, you will get a parsing exception from the bulk parser.
(Screenshot: Windows CRLF and EOF)
You need to use --data-binary instead of -d in your curl request. Please see: Bulk API
This worked in my local set-up.
curl -H "Content-type:application/json" -XPOST "http://localhost:9200/customer/personal/_bulk?pretty" --data-binary #"generated.json"
How do you do that if you are not using a data-file? I am having the issue but not sending data from a file.
const data1 = {
  "amount" : "100",
  "@timestamp" : `${UTC_timestamp}`,
  "transaction_attributes" : {
    "channel" : "channel-foobarbaz",
    "session_id" : "session-1234",
    "information" : "iinformation-foobarbaznformation-foobarbaz"
  },
  "currency" : {
    "currency_description" : "my currency description"
  },
  "external_timestamp" : "2021-12-03T11:22:55.206229500Z"
};

// execute a post
let res = http.post(url, JSON.stringify(data1), params);
A few things to check:
The file ends with new line (\n).
The new line is using Unix eol (LF) and not mac or windows eol.
When specifying the file name in the curl command, make sure "@" was added before the file name.
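Those checks can be sketched as a small Python helper (standard library only; the messages are illustrative). A missing file is listed too, since curl reports the same newline error when the file simply isn't there:

```python
def bulk_file_problems(path):
    """Return a list of problems that commonly break _bulk uploads."""
    try:
        with open(path, "rb") as f:
            data = f.read()
    except FileNotFoundError:
        return ["file not found"]
    problems = []
    if not data.endswith(b"\n"):
        problems.append("missing trailing newline")
    if b"\r" in data:
        problems.append("contains CR bytes; use Unix (LF) line endings")
    return problems
```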

Use cURL to add a JSON web page's data in Solr

I see from the UpdateJSON page how to use a command prompt to index a standalone file stored locally. Using this example I was able to successfully make a .json file accessible through Solr:
curl 'http://localhost:8983/solr/update/json?commit=true' --data-binary @books.json -H 'Content-type:application/json'
What I'm not able to find is the proper syntax to do the same for a webpage containing JSON data. I've tried with the @:
curl 'http://localhost:8983/solr/update/json?commit=true' --data-binary @[URL] -H 'Content-type:application/json'
and without:
curl 'http://localhost:8983/solr/update/json?commit=true' --data-binary [URL] -H 'Content-type:application/json'
Both ways lead to errors. How do I configure a command to prompt Solr to index the contents at [URL]?
According to the documentation (https://wiki.apache.org/solr/ContentStream), you should first ensure remote streaming is enabled (in solrconfig.xml, search for enableRemoteStreaming).
Then the command should be of the kind:
curl 'http://localhost:8983/solr/update/json?commit=true&stream.url=YOURURL' -H 'Content-type:application/json'

How Can I Post Files and JSON Data Together With Curl?

I've been posting a file with this curl command:
curl -i -F file=@./File.xlsm -F name=file -X POST http://example.com/new_file/
Now I want to send some information about the file (as JSON) along with the file.
curl -i -H "Content-Type: application/json" -d '{"metadata": {"comment": "Submitting a new data set.", "current": false }, "sheet": 1, "row": 7 }' -F file=#./File.xlsm -F name=file http://example.com/new_file/
Curl is very grumpy about being used in this completely incorrect way, and in this case it says "You can only select one HTTP request!" OK, fair enough, so how do I get the file attachment and those POST variables into a single curl HTTP request?
I've had success developing similar endpoints that accept multiple files along with their metadata in JSON format.
curl -i -X POST -H "Content-Type: multipart/mixed" -F "blob=#/Users/username/Documents/bio.jpg" -F "metadata={\"edipi\":123456789,\"firstName\":\"John\",\"lastName\":\"Smith\",\"email\":\"john.smith#gmail.com\"};type=application/json" http://localhost:8080/api/v1/user/
Notice the addition of ;type=application/json at the end of the metadata request part. When uploading multiple files of different types, you can define the mime type at the end of the -F value.
I have confirmed that this works for Spring MVC 4.3.7 using @RequestPart. The key in that instance is to not provide the consumes value on the @RequestMapping annotation.
You could just add another form field:
curl -X POST http://someurl/someresource -F upload=@/path/to/some/file -F data="{\"test\":\"test\"}"
Note: due to the content type, this does not really equate to sending json to the web service.
This worked for me:
curl -v -H "Content-Type:multipart/form-data" -F "meta-data=#C:\Users\saurabh.sharma\Desktop\test.json;type=application/json" -F "file-data=#C:\Users\saurabh.sharma\Pictures\Saved Pictures\windows_70-wallpaper.jpg" http://localhost:7002/test/upload
test.json has the json data I want to send.
From @nbrooks' comment: adding additional HTTP headers works fine, as shown below, by using several -H or --header flags in your curl command:
curl -H "comment: Submitting a new data set." -H "current: false" -H "sheet: 1" -H "row: 7" -F file=@./File.xlsm -F name=file http://example.com/new_file/
comment and current can be combined into "metadata" in the request.headers processing part on the Flask web server.