Is there any way to import a JSON file (containing 100 documents) into an Elasticsearch server? - json

Is there any way to import a JSON file (containing 100 documents) into an Elasticsearch server? I want to import a big JSON file into my ES server.

As dadoonet already mentioned, the bulk API is probably the way to go. To transform your file for the bulk protocol, you can use jq.
Assuming the file contains just the documents themselves:
$ echo '{"foo":"bar"}{"baz":"qux"}' |
jq -c '
{ index: { _index: "myindex", _type: "mytype" } },
. '
{"index":{"_index":"myindex","_type":"mytype"}}
{"foo":"bar"}
{"index":{"_index":"myindex","_type":"mytype"}}
{"baz":"qux"}
And if the file contains the documents in a top-level list, they have to be unwrapped first:
$ echo '[{"foo":"bar"},{"baz":"qux"}]' |
jq -c '
.[] |
{ index: { _index: "myindex", _type: "mytype" } },
. '
{"index":{"_index":"myindex","_type":"mytype"}}
{"foo":"bar"}
{"index":{"_index":"myindex","_type":"mytype"}}
{"baz":"qux"}
jq's -c flag makes sure that each document is on a line by itself.
If you want to pipe straight to curl, you'll want to use --data-binary @-, and not just -d, otherwise curl will strip the newlines again.
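Putting those two steps together, a complete pipeline might look like the sketch below. The index and type names and the file name documents.json are placeholders, and the Content-Type header is only strictly required on newer Elasticsearch versions:
$ jq -c '.[] | { index: { _index: "myindex", _type: "mytype" } }, .' documents.json |
curl -s -XPOST 'localhost:9200/_bulk' -H 'Content-Type: application/x-ndjson' --data-binary @-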

You should use the Bulk API. Note that you will need to add an action header line before each JSON document.
$ cat requests
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
$ curl -s -XPOST localhost:9200/_bulk --data-binary @requests; echo
{"took":7,"items":[{"create":{"_index":"test","_type":"type1","_id":"1","_version":1,"ok":true}}]}

I'm sure someone wants this so I'll make it easy to find.
FYI - This is using Node.js (essentially as a batch script) on the same server as the brand new ES instance. Ran it on 2 files with 4000 items each and it only took about 12 seconds on my shared virtual server. YMMV
var elasticsearch = require('elasticsearch'),
    fs = require('fs'),
    pubs = JSON.parse(fs.readFileSync(__dirname + '/pubs.json')), // name of my first file to parse
    forms = JSON.parse(fs.readFileSync(__dirname + '/forms.json')); // and the second set

var client = new elasticsearch.Client({ // default is fine for me, change as you see fit
    host: 'localhost:9200',
    log: 'trace'
});

for (var i = 0; i < pubs.length; i++) {
    client.create({
        index: "epubs", // name your index
        type: "pub", // describe the data that's getting created
        id: i, // increment ID every iteration - I already sorted mine but not a requirement
        body: pubs[i] // *** THIS ASSUMES YOUR DATA FILE IS FORMATTED LIKE SO: [{prop: val, prop2: val2}, {prop:...}, {prop:...}] - I converted mine from a CSV so pubs[i] is the current object {prop:..., prop2:...}
    }, function(error, response) {
        if (error) {
            console.error(error);
            return;
        } else {
            console.log(response); // I don't recommend this but I like having my console flooded with stuff. It looks cool. Like I'm compiling a kernel really fast.
        }
    });
}

for (var a = 0; a < forms.length; a++) { // Same stuff here, just slight changes in type and variables
    client.create({
        index: "epubs",
        type: "form",
        id: a,
        body: forms[a]
    }, function(error, response) {
        if (error) {
            console.error(error);
            return;
        } else {
            console.log(response);
        }
    });
}
Hope I can help more than just myself with this. Not rocket science but may save someone 10 minutes.
Cheers

jq is a lightweight and flexible command-line JSON processor.
Usage:
cat file.json | jq -c '.[] | {"index": {"_index": "bookmarks", "_type": "bookmark", "_id": .id}}, .' | curl -XPOST localhost:9200/_bulk --data-binary @-
We’re taking the file file.json and piping its contents to jq first with the -c flag to construct compact output. Here’s the nugget: We’re taking advantage of the fact that jq can construct not only one but multiple objects per line of input. For each line, we’re creating the control JSON Elasticsearch needs (with the ID from our original object) and creating a second line that is just our original JSON object (.).
At this point we have our JSON formatted the way Elasticsearch’s bulk API expects it, so we just pipe it to curl which POSTs it to Elasticsearch!
Credit goes to Kevin Marsh

Import no, but you can index the documents by using the ES API.
You can use the index API to load each line (using some kind of code to read the file and make the curl calls) or the bulk index API to load them all, assuming your data file can be formatted to work with it.
Read more here : ES API
A simple shell script would do the trick if you're comfortable with shell, something like this maybe (not tested):
while read line
do
curl -XPOST 'http://localhost:9200/<indexname>/<typeofdoc>/' -d "$line"
done <myfile.json
Personally, I would probably use Python with either pyes or the elasticsearch Python client.
pyes on github
elastic search python client
Stream2es is also very useful for quickly loading data into ES and may have a way to simply stream a file in. (I have not tested a file, but I have used it to load Wikipedia docs for ES performance testing.)

Stream2es is the easiest way IMO.
e.g. assuming a file "some.json" containing a list of JSON documents, one per line:
curl -O download.elasticsearch.org/stream2es/stream2es; chmod +x stream2es
cat some.json | ./stream2es stdin --target "http://localhost:9200/my_index/my_type"

You can use esbulk, a fast and simple bulk indexer:
$ esbulk -index myindex file.ldj
Here's an asciicast showing it loading Project Gutenberg data into Elasticsearch in about 11s.
Disclaimer: I'm the author.

You can use the Elasticsearch Gatherer Plugin.
The gatherer plugin for Elasticsearch is a framework for scalable data fetching and indexing. Content adapters are implemented in gatherer zip archives which are a special kind of plugins distributable over Elasticsearch nodes. They can receive job requests and execute them in local queues. Job states are maintained in a special index.
This plugin is under development.
Milestone 1 - deploy gatherer zips to nodes
Milestone 2 - job specification and execution
Milestone 3 - porting JDBC river to JDBC gatherer
Milestone 4 - gatherer job distribution by load/queue length/node name, cron jobs
Milestone 5 - more gatherers, more content adapters
Reference: https://github.com/jprante/elasticsearch-gatherer

One way is to create a bash script that does a bulk insert:
curl -XPOST http://127.0.0.1:9200/myindexname/type/_bulk?pretty=true --data-binary @myjsonfile.json
After you run the insert, run this command to get the count:
curl http://127.0.0.1:9200/myindexname/type/_count
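On Elasticsearch 6 and later the bulk endpoint also requires an explicit content type, so the same insert may need a header (a sketch, keeping the placeholder names above):
curl -XPOST 'http://127.0.0.1:9200/myindexname/type/_bulk?pretty=true' -H 'Content-Type: application/x-ndjson' --data-binary @myjsonfile.json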

Related

How to add commas in between JSON objects using Linux Shell and SnowSQL?

While there are several posts about this topic on Stack Overflow, none match my exact use case. I am using a Linux shell script to run SnowSQL to generate a JSON file.
My json file needs to have a comma between json objects.
This:
{
"CAMPAIGN": "Welcome_New",
"UUID": "fe881781-bdc2-41b2-95f2-e0e8c19dc597"
}
{
"CAMPAIGN": "Welcome_Existing",
"UUID": "77a41c02-beb9-48bf-ada4-b2074c1a78cb"
}
...needs to look like this:
{
"CAMPAIGN": "Welcome_New",
"UUID": "fe881781-bdc2-41b2-95f2-e0e8c19dc597"
},
{
"CAMPAIGN": "Welcome_Existing",
"UUID": "77a41c02-beb9-48bf-ada4-b2074c1a78cb"
}
Here is my complete ksh script:
#!/usr/bin/ksh
. /appl/.snf_logon
export SNOW_PKEY_FILE=$(mktemp ./pkey-XXXXXX)
trap "rm -f ${SNOW_PKEY_FILE}" EXIT
LibGetSnowCred
{
outFile=JSON_FILE_TYPE_TEST.json
inDir=/testing
outFileNm=@my_db.my_schema.my_file_stage/${outFile}
snowsql \
--private-key-path $SNOW_PKEY_FILE \
-o exit_on_error=true \
-o friendly=false \
-o timing=false \
-o log_level=ERROR \
-o echo=true <<!
COPY INTO ${outFileNm}
FROM (SELECT object_construct(
'UUID',UUID
,'CAMPAIGN',CAMPAIGN)
FROM my_db.my_schema.JSON_Test_Table
LIMIT 2)
FILE_FORMAT=(
TYPE=JSON
COMPRESSION=NONE
)
OVERWRITE=True
HEADER=False
SINGLE=True
MAX_FILE_SIZE=4900000000
;
get ${outFileNm} file://${inDir}/;
rm ${outFileNm};
!
if [ $? -eq 0 ]; then
echo "Export successful"
else
echo "ERROR in export"
fi
}
Is the best practice to add the comma during the SELECT or after the file is generated, and how?
With or without that comma, the text is still not JSON but just random text that looks like JSON. You export several rows, each row as an independent object. You need to gather all these objects into an array to produce valid JSON.
A JSON document that encodes an array of rows looks like this:
[
{
"CAMPAIGN": "Welcome_New",
"UUID": "fe881781-bdc2-41b2-95f2-e0e8c19dc597"
},
{
"CAMPAIGN": "Welcome_Existing",
"UUID": "77a41c02-beb9-48bf-ada4-b2074c1a78cb"
}
]
The easiest way to produce this output would be to ask the database to do it, if it supports this option (that is, wrap all the records into a list before generating the JSON, rather than exporting each record as a separate JSON object).
If this is not possible, then you have a file that contains multiple JSON objects. You can use jq to combine these individual objects into a single JSON document similar to the one described above (encoding an array of objects).
It is as simple as that:
jq --slurp '.' input_file > output_file
The option --slurp tells jq to read all the JSON objects from the file input_file into memory, parse them, and put them into an array. That is the program input.
'.' is the jq program. It says "dump the current object". It does not do any processing to the input data. The current object is the array.
After it executes the program (which, in this case doesn't do anything), jq dumps the modified value (as JSON, of course) to the standard output (by default, on screen).
The > output_file part redirects this output to a file (named output_file) instead of showing it on screen.
You can see how it works on the jq playground.
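For a quick sanity check, the same idea can be tried directly on the command line; this sketch uses two trimmed-down objects in the spirit of the question:
$ echo '{"CAMPAIGN":"Welcome_New"} {"CAMPAIGN":"Welcome_Existing"}' | jq --slurp '.'
[
  {
    "CAMPAIGN": "Welcome_New"
  },
  {
    "CAMPAIGN": "Welcome_Existing"
  }
]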

unix command to filter the json

[
{
"name":"sandboxserver.tar.gz.part-aa",
"hash":"010d126f8ccf199f3cd5f468a90d5ae1",
"bytes":4294967296,
"last_modified":"2018-10-10T01:32:00.069000",
"content_type":"binary/octet-stream"
},
{
"name":"sandboxserver.tar.gz.part-ab",
"hash":"49a6f22068228f51488559c096aa06ce",
"bytes":397973601,
"last_modified":"2018-10-10T01:32:22.395000",
"content_type":"binary/octet-stream"
},
{
"name":"sandboxserver.tar.gz.part-ac",
"hash":"2c5e845f46357e203214592332774f4c",
"bytes":5179281858,
"last_modified":"2018-10-11T08:20:11.566000",
"content_type":"binary/octet-stream"
}
]
I am getting the above JSON as a response while listing the objects in cloud object storage using curl -l -X GET. How can I get the object "name" values assigned to an array while looping through all the objects?
For example:
array[1]="sandboxserver.tar.gz.part-aa"
array[2]="sandboxserver.tar.gz.part-ab"
array[3]="sandboxserver.tar.gz.part-ac"
You can use jq.
jq is a powerful tool that lets you read, filter, and write JSON in bash.
You might need to install it first.
Try this:
I've pasted your json into a file:
~$ cat n1.json
[
{
"name":"sandboxserver.tar.gz.part-aa",
"hash":"010d126f8ccf199f3cd5f468a90d5ae1",
"bytes":4294967296,
"last_modified":"2018-10-10T01:32:00.069000",
"content_type":"binary/octet-stream"
},
{
"name":"sandboxserver.tar.gz.part-ab",
"hash":"49a6f22068228f51488559c096aa06ce",
"bytes":397973601,
"last_modified":"2018-10-10T01:32:22.395000",
"content_type":"binary/octet-stream"
},
{
"name":"sandboxserver.tar.gz.part-ac",
"hash":"2c5e845f46357e203214592332774f4c",
"bytes":5179281858,
"last_modified":"2018-10-11T08:20:11.566000",
"content_type":"binary/octet-stream"
}
]
And then used jq to find the names:
~$ jq -r '.[].name' n1.json
sandboxserver.tar.gz.part-aa
sandboxserver.tar.gz.part-ab
sandboxserver.tar.gz.part-ac
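To get those names into a bash array, as asked, the jq output can be captured with mapfile (a sketch; requires bash 4+, and note that bash arrays are zero-indexed):
~$ mapfile -t array < <(jq -r '.[].name' n1.json)
~$ echo "${array[0]}"
sandboxserver.tar.gz.part-aa
~$ echo "${#array[@]}"
3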
If you don't want to depend on an external utility like jq, you can use a Python + bash combo to do the trick.
response="$(cat data.json)"
declare -a array
array=($(python -c "import json,sys; data=[arr['name'] for arr in json.loads(sys.argv[1])]; print('\n'.join(data));" "$response"))
echo "${array[#]}"
Advice: embedded Python code can quickly become unreadable, so you may want to put the Python code in a separate script and run that script.

Parsing JSON from shell script using JSON.sh

I'm working on parsing JSON data using JSON.sh, and I want to read data from a JSON file (test.json) whose content is something like this:
{
"/home/ukrishnan/projects/test.yml": {
"LOG_DRIVER": "syslog",
"IMAGE": "mysql:5.6"
},
"/home/ukrishnan/projects/mysql/app.xml": {
"ENV_ACCOUNT_BRIDGE_ENDPOINT": "/u01/src/test/sample.txt"
}
}
I try to parse this JSON using JSON.sh like so:
test_parser=`sh ./lib/JSON.sh < test/test.json`
echo $test_parser
It prints,
["/home/ukrishnan/projects/test.yml","LOG_DRIVER"] "syslog" ["/home/ukrishnan/projects/test.yml","IMAGE"] "mysql:5.6" ["/home/ukrishnan/projects/test.yml"] {"LOG_DRIVER":"syslog","IMAGE":"mysql:5.6"} ["/home/ukrishnan/projects/mysql/app.xml","ENV_ACCOUNT_BRIDGE_ENDPOINT"] "/u01/src/test/sample.txt" ["/home/ukrishnan/projects/mysql/app.xml"] {"ENV_ACCOUNT_BRIDGE_ENDPOINT":"/u01/src/test/sample.txt"} [] {"/home/ukrishnan/projects/test.yml":{"LOG_DRIVER":"syslog","IMAGE":"mysql:5.6"},"/home/ukrishnan/projects/mysql/app.xml":{"ENV_ACCOUNT_BRIDGE_ENDPOINT":"/u01/src/test/sample.txt"}}
Whereas the same command (sh ./lib/JSON.sh < test/test.json), when run directly in the terminal, prints with line breaks:
["/home/ukrishnan/projects/test.yml","LOG_DRIVER"] "syslog"
["/home/ukrishnan/projects/test.yml","IMAGE"] "mysql:5.6"
["/home/ukrishnan/projects/test.yml"] {"LOG_DRIVER":"syslog","IMAGE":"mysql:5.6"}
["/home/ukrishnan/projects/mysql/app.xml","ENV_ACCOUNT_BRIDGE_ENDPOINT"] "/u01/src/test/sample.txt"
["/home/ukrishnan/projects/mysql/app.xml"] {"ENV_ACCOUNT_BRIDGE_ENDPOINT":"/u01/src/test/sample.txt"}
[] {"/home/ukrishnan/projects/test.yml":{"LOG_DRIVER":"syslog","IMAGE":"mysql:5.6"},"/home/ukrishnan/projects/mysql/app.xml":{"ENV_ACCOUNT_BRIDGE_ENDPOINT":"/u01/src/test/sample.txt"}}
I want to read this and assign it to bash variables like:
file_name='/home/ukrishnan/projects/test.yml'
key='LOG_DRIVER'
value='syslog'
As I'm almost completely new to shell scripting and grep or awk, I don't have much idea of how to achieve this. Any help would be greatly appreciated.
I wrote a JSON serializer / deserializer for gawk, if you're interested. Save that script and modify it, replacing everything above # === FUNCTIONS === with the following:
#!/usr/bin/gawk -f

# capture JSON string from beginning to end into a scalar variable
{ json = json ORS $0 }

END {
    # objectify JSON string to the multilevel array "obj"
    deserialize(json, obj)
    for (filename in obj) {
        print "file_name=" quote(filename)
        for (key in obj[filename]) {
            # print key="value"
            print key "=" quote(obj[filename][key])
        }
    }
}
Do chmod 755 json.awk and execute it. Output will resemble this:
$ ./json.awk test5.json
file_name="/home/ukrishnan/projects/mysql/app.xml"
ENV_ACCOUNT_BRIDGE_ENDPOINT="/u01/src/test/sample.txt"
file_name="/home/ukrishnan/projects/test.yml"
LOG_DRIVER="syslog"
IMAGE="mysql:5.6"
Hopefully the logic is reasonably easy to follow. If you prefer to output filename=, key=, and value= on every loop iteration, modify the nested for loops accordingly:
for (filename in obj) {
    for (key in obj[filename]) {
        print "file_name=" quote(filename)
        print "key=" quote(key)
        print "value=" quote(obj[filename][key])
    }
}
That change will result in the following output:
$ ./json.awk test5.json
file_name="/home/ukrishnan/projects/mysql/app.xml"
key="ENV_ACCOUNT_BRIDGE_ENDPOINT"
value="/u01/src/test/sample.txt"
file_name="/home/ukrishnan/projects/test.yml"
key="LOG_DRIVER"
value="syslog"
file_name="/home/ukrishnan/projects/test.yml"
key="IMAGE"
value="mysql:5.6"
Anyway, with that output, you can do something silly in BASH like this to populate and act upon the variables:
#!/bin/bash
./json.awk test5.json | while read -r line; do {
    eval $line
    [ "${line/=*/}" = "value" ] && {
        echo "bash: file_name=$file_name"
        echo "bash: key=$key"
        echo "bash: value=$value"
        echo "------"
    }
}; done
It'd probably be more graceful just to do all processing within gawk from start to finish and not mess with the polyglot handoff, though.
Getting back to json.awk, if you prefer to keep json.awk modular for easy reuse in future projects, you could remove everything above # === FUNCTIONS ===, create a separate main.awk containing the code block at the top of this answer, and #include "json.awk" as a helper library pretty much anywhere outside of END {...} (just below the shbang, for example).
JSON.sh (from http://json.org) offers a nice bash-friendly means of flattening out a JSON file, and you've already shown in your question what that output looks like. The flattened form has the format:
[node] tab value
You have to think in UNIX scripting terms when extracting the information you want; you'll note the lines you're interested in actually follow this pattern:
["filename","key"] tab ["value"]
In regex notation, we replace:
filename with (.*)
key with (.*)
tab with \t
value with (.*)
We can retrieve the first, second and third matching groups with \1, \2, \3 respectively.
When used in sed we also note that these symbols []() need to be escaped with a backslash \, resulting in the following script:
./lib/JSON.sh < test/test.json | sed 's/\["\(.*\)","\(.*\)\"]\t"\(.*\)"/\1,\2,\3/;t;d'
/home/ukrishnan/projects/test.yml,LOG_DRIVER,syslog
/home/ukrishnan/projects/test.yml,IMAGE,mysql:5.6
/home/ukrishnan/projects/mysql/app.xml,ENV_ACCOUNT_BRIDGE_ENDPOINT,/u01/src/test/sample.txt
Now we put the lines in a loop, and for each line we can extract filename, key, and value:
for line in $(./lib/JSON.sh < test/test.json | sed 's/\["\(.*\)","\(.*\)\"]\t"\(.*\)"/\1,\2,\3/;t;d')
do
    IFS="," read -ra arr <<< $line
    filename=${arr[0]}
    key=${arr[1]}
    value=${arr[2]}
    cat <<EOF
filename : $filename
key : $key
value : $value
EOF
done
Which outputs:
filename : /home/ukrishnan/projects/test.yml
key : LOG_DRIVER
value : syslog
filename : /home/ukrishnan/projects/test.yml
key : IMAGE
value : mysql:5.6
filename : /home/ukrishnan/projects/mysql/app.xml
key : ENV_ACCOUNT_BRIDGE_ENDPOINT
value : /u01/src/test/sample.txt
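For comparison, if jq is available, the same filename/key/value triples can be extracted without JSON.sh or sed at all. This is a separate sketch, not part of the JSON.sh approach above:
jq -r 'to_entries[] | .key as $f | .value | to_entries[] | [$f, .key, .value] | @tsv' test/test.json |
while IFS=$'\t' read -r filename key value
do
    echo "filename : $filename"
    echo "key : $key"
    echo "value : $value"
done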

Import data from Mongodb on GCE to Bigquery

My task is to import data from a MongoDB collection hosted on GCE into BigQuery. I tried the following. Since BigQuery does not accept the '$' symbol in field names, I ran this to remove the $oid field:
mongo test --quiet \
--eval "db.trial.find({}, {_id: 0})
.forEach(function(doc) {
print(JSON.stringify(doc)); });" \
> trial_noid.json
But while importing the result file, I get an error that says:
parse error: premature EOF (error code: invalid)
Is there a way to avoid these steps and directly transfer the data to bigquery from mongodb hosted on GCE?
In my opinion, the best practice is building your own extractor. That can be done with the language of your choice and you can extract to CSV or JSON.
But if you are looking for a fast way, and your data is not huge and can fit on one server, then I recommend using mongoexport to extract to JSON. Let's assume you have a simple document structure such as the one below:
{
"_id" : "tdfMXH0En5of2rZXSQ2wpzVhZ",
"statuses" : [
{
"status" : "dc9e5511-466c-4146-888a-574918cc2534",
"score" : 53.24388894
}
],
"stored_at" : ISODate("2017-04-12T07:04:23.545Z")
}
Then you need to define your BigQuery Schema (mongodb_schema.json) such as:
$ cat > mongodb_schema.json <<EOF
[
{ "name":"_id", "type": "STRING" },
{ "name":"stored_at", "type": "record", "fields": [
{ "name":"date", "type": "STRING" }
]},
{ "name":"statuses", "type": "record", "mode": "repeated", "fields": [
{ "name":"status", "type": "STRING" },
{ "name":"score", "type": "FLOAT" }
]}
]
EOF
Now, the fun part starts :-) Extracting data as JSON from your MongoDB. Let's assume you have a cluster with replica set name statuses, your db is sample, and your collection is status.
mongoexport \
--host statuses/db-01:27017,db-02:27017,db-03:27017 \
-vv \
--db "sample" \
--collection "status" \
--type "json" \
--limit 100000 \
--out ~/sample.json
As you can see above, I limit the output to 100k records because I recommend running a sample and loading it to BigQuery before doing it for all your data. After running the above command you should have your sample data in sample.json, BUT there is a field $date which will cause an error when loading to BigQuery. To fix that, we can use sed to replace it with a simple field name:
# Fix Date field to make it compatible with BQ
sed -i 's/"\$date"/"date"/g' sample.json
Now you can compress, upload to Google Cloud Storage (GCS), and then load to BigQuery using the following commands:
# Compress for faster load
gzip sample.json
# Move to GCloud
gsutil mv ./sample.json.gz gs://your-bucket/sample/sample.json.gz
# Load to BQ
bq load \
--source_format=NEWLINE_DELIMITED_JSON \
--max_bad_records=999999 \
--ignore_unknown_values=true \
--encoding=UTF-8 \
--replace \
"YOUR_DATASET.mongodb_sample" \
"gs://your-bucket/sample/*.json.gz" \
"mongodb_schema.json"
If everything was okay, then go back, remove --limit 100000 from the mongoexport command, and re-run the above commands to load everything instead of the 100k sample.
With this solution you can import your data into BigQuery with the same hierarchy, but if you want to flatten your data, then the alternative solution below would work better.
ALTERNATIVE SOLUTION:
If you want more flexibility and performance is not your concern, then you can use the mongo CLI tool as well. This way you can write your extract logic in JavaScript, execute it against your data, and then send the output to BigQuery. Here is what I did for the same process, but using JavaScript to output CSV so I could load it into BigQuery much more easily:
# Export Logic in JavaScript
cat > export-csv.js <<EOF
var size = 100000;
var maxCount = 1;
for (x = 0; x < maxCount; x = x + 1) {
    var recToSkip = x * size;
    db.entities.find().skip(recToSkip).limit(size).forEach(function(record) {
        var row = record._id + "," + record.stored_at.toISOString();
        record.statuses.forEach(function (l) {
            print(row + "," + l.status + "," + l.score)
        });
    });
}
EOF
# Execute on Mongo CLI
_MONGO_HOSTS="db-01:27017,db-02:27017,db-03:27017/sample?replicaSet=statuses"
mongo --quiet \
"${_MONGO_HOSTS}" \
export-csv.js \
| split -l 500000 --filter='gzip > $FILE.csv.gz' - sample_
# Load all Splitted Files to Google Cloud Storage
gsutil -m mv ./sample_* gs://your-bucket/sample/
# Load files to BigQuery
bq load \
--source_format=CSV \
--max_bad_records=999999 \
--ignore_unknown_values=true \
--encoding=UTF-8 \
--replace \
"YOUR_DATASET.mongodb_sample" \
"gs://your-bucket/sample/sample_*.csv.gz" \
"ID,StoredDate:DATETIME,Status,Score:FLOAT"
TIP: In the above script I used a small trick: piping the output so it can be split into multiple files with the sample_ prefix. Also, during the split it gzips the output so you can upload it to GCS more easily.
When using NEWLINE_DELIMITED_JSON to import data into BigQuery, one JSON object, including any nested/repeated fields, must appear on each line.
The issue with your input file appears to be that each JSON object is split across many lines; collapsing each object to a single line will resolve this error.
Requiring this format allows BigQuery to split the file and process it in parallel without being concerned that splitting the file will put one part of a JSON object in one split, and another part in the next split.
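In practice, jq can do that collapsing for you. A minimal sketch, assuming trial_noid.json holds a stream of JSON documents (use jq -c '.[]' instead if the file is one top-level array):
jq -c '.' trial_noid.json > trial_oneline.json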

Recognising timestamps in Kibana and ElasticSearch

I'm new to ElasticSearch and Kibana and am having trouble getting Kibana to recognise my timestamps.
I have a JSON file with lots of data that I wish to insert into Elasticsearch using curl. Here is an example of one of the JSON entries:
{"index":{"_id":"63"}}
{"account_number":63,"firstname":"Hughes","lastname":"Owens", "email":"hughesowens#valpreal.com", "_timestamp":"2013-07-05T08:49:30.123"}
I have tried to create an index in Elasticsearch using the command:
curl -XPUT 'http://localhost:9200/test/'
I have then tried to set up an appropriate mapping for the timestamp:
curl -XPUT 'http://localhost:9200/test/container/_mapping' -d'
{
"container" : {
"_timestamp" : {
"_timestamp" : {"enabled: true, "type":"date", "format": "date_hour_minute_second_fraction", "store":true}
}
}
}'
// format of timestamp from http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-date-format.html
I then have tried to bulk insert my data:
curl -XPOST 'localhost:9200/test/container/_bulk?pretty' --data-binary @myfile.json
All of these commands run without fault; however, when the data is viewed in Kibana the _timestamp field is not being recognised. Sorting via the timestamp does not work, and trying to filter the data using different periods does not work. Any ideas on why this problem is occurring are appreciated.
Managed to solve the problem. So for anyone else having this problem:
The format we had our date saved in was incorrect; it needed to be:
"_timestamp":"2013-07-05 08:49:30.123"
then our mapping needed to be:
curl -XPUT 'http://localhost:9200/test/container/_mapping' -d'
{
"container" : {
"_timestamp" : {"enabled": true, "type":"date", "format": "yyyy-MM-dd HH:mm:ss.SSS", "store":true, "path" : "_timestamp"}
}
}'
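To double-check that the mapping is actually in place before re-indexing, something like this can help (a quick check, not from the original post):
curl -XGET 'http://localhost:9200/test/container/_mapping?pretty'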
Hope this helps someone.
There is no need to construct an ISO 8601 date if you have an epoch timestamp. To make Kibana recognize the field as a date, it has to be mapped as a date field though.
Please note that you have to set the field to the date type BEFORE you put any data into the /index/type. Otherwise it will be stored as long and cannot be changed.
Simple example that can be pasted into the marvel/sense plugin:
# Make sure the index isn't there
DELETE /logger
# Create the index
PUT /logger
# Add the mapping of properties to the document type `mem`
PUT /logger/_mapping/mem
{
"mem": {
"properties": {
"timestamp": {
"type": "date"
},
"free": {
"type": "long"
}
}
}
}
# Inspect the newly created mapping
GET /logger/_mapping/mem
Run each of these commands in sequence.
Generate free mem logs
Here is a simple script that echo to your terminal and logs to your local elasticsearch:
while (( 1==1 )); do memfree=`free -b|tail -n 1|tr -s ' ' ' '|cut -d ' ' -f4`; echo $memfree; curl -XPOST "localhost:9200/logger/mem" -d "{ \"timestamp\": `date +%s%3N`, \"free\": $memfree }"; sleep 1; done
Inspect data in elastic search
Paste this in your marvel/sense
GET /logger/mem/_search
Now you can move to Kibana and do some graphs. Kibana will autodetect your date field.
This solution works for older ES (< 2.4). For newer versions of ES you can use the "date" field type along with the parameters described here:
https://www.elastic.co/guide/en/elasticsearch/reference/current/date.html
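For reference, on recent Elasticsearch versions (7+, where mapping types are gone) a date field that accepts both ISO 8601 strings and epoch milliseconds could be declared like this (a sketch with a placeholder index name):
curl -XPUT 'localhost:9200/logger' -H 'Content-Type: application/json' -d '
{
  "mappings": {
    "properties": {
      "timestamp": {
        "type": "date",
        "format": "strict_date_optional_time||epoch_millis"
      }
    }
  }
}'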