Insert a header before each document before uploading to Elasticsearch - json

I have an NDJSON file in the format below:
{"field1": "data1" , "field2": "data2"}
{"field1": "data1" , "field2": "data2"}
....
I want to add a header like
{"index": {}}
before each document before using the bulk operation
I found a similar question: Elasticsearch Bulk JSON Data
The solution is this jq command:
jq -cr ".[]" input.json | while read line; do echo '{"index":{}}'; echo $line; done > bulk.json
But I get this error:
'while' is not recognized as a internal or external command
What am I doing wrong? I'm running Windows.
Or is there a better solution?
Thanks

The while in your sample is a loop construct built into Unix-style shells such as sh, bash, or zsh. The Windows command prompt does not provide it out of the box. See the bash docs, for example.
If this is a one-time task, the fastest solution is probably to open the file in a text editor and add the required action lines using multi-cursor editing.
If you are restricted to Windows but want a better shell for doing this more often, have a look at the cmder project. Its full version is packaged with git-for-windows and brings a bash environment to your Windows desktop, which should let you use such scripting features even outside Linux or macOS.
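If Python is available on the Windows machine, the same transformation can be done without any Unix shell. The following is only a minimal sketch: add_bulk_headers and convert are hypothetical helper names, and it assumes the input really is NDJSON (one document per line), so the `.[]` step from the jq command is not needed.

```python
import json


def add_bulk_headers(lines, action=None):
    """Yield an action line before every non-empty NDJSON document line."""
    # Default action is the empty index header from the question.
    if action is None:
        action = {"index": {}}
    header = json.dumps(action, separators=(",", ":"))
    for line in lines:
        doc = line.strip()
        if not doc:
            continue  # skip blank lines instead of emitting an orphan header
        yield header
        yield doc


def convert(src_path, dst_path):
    """Read NDJSON from src_path and write a bulk body to dst_path."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8", newline="\n") as dst:
        for out_line in add_bulk_headers(src):
            # Every line, including the last one, ends with a newline,
            # which the bulk API expects.
            dst.write(out_line + "\n")
```

Calling convert("input.json", "bulk.json") would then produce the file that the jq pipeline was meant to create.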

Related

How to always have json output show up nicely formatted using jq without appending the jq command

I just installed jq on Ubuntu 18.04, and its output is really nicely formatted when I make curl calls. Is there a way to always have my terminal(s) show JSON data nicely formatted without having to append | jq . to the end of the command? I am familiar with setting up aliases in my .zshrc file, but there is no single command I would use this for, so I'm not sure how to make this happen.
Dummy command I ran
curl -X GET localhost:3000/products | jq .
I want the jq formatting to always be in effect when JSON is returned on the CLI.
Thanks!
While I won't go so far as to say this cannot be done, not only am I not aware of any straightforward way of doing so, I think it would be extremely challenging to do at all for the following reasons:
It's hard to reliably identify that a command is outputting JSON. Many things might look like JSON but not be intended to be JSON. A command might output something that looks like JSON in large part but that isn't entirely JSON.
It's impossible to know if a long-running command is outputting valid JSON until it has finished. This creates a dilemma about what to do in the interim: wait for it to finish, potentially indefinitely (especially if it's waiting for your input!), or find some way to roll back the output if it turns out we guessed wrong.
It's hard to predict up-front if a command is going to return JSON. As in your example, you can't tell what the remote server is going to return from a curl command.
It's unclear what's the "right" thing to do for non-trivial commands, such as invoking shell functions or pipelines or applying redirections, or for programs such as text editors that use advanced control codes for redrawing the screen.
For these reasons, I think that, if this is possible at all, it would require quite an advanced shell: one that captures the output of all commands the user executes, analyses and formats that output continuously, and can roll back and reformat (or unformat) the output if it realises the output is incompatible with its method of display. I think that would be pretty interesting to see, but I'm not aware of such a shell.
You could start with a simple script, e.g.
$ cat curljq
#!/bin/bash
curl "$@" | jq .
$ chmod +x curljq
$ ./curljq -Ss -H "Accept: application/json" https://reqbin.com/echo/get/json
{
  "success": "true"
}
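A wrapper like this still fails when the response is not JSON, which is exactly the detection problem described above. As a rough illustration of the "buffer everything, then decide" approach, here is a minimal Python sketch. format_if_json is a hypothetical helper name, and the sketch assumes the whole output fits in memory and has already finished, so it does not address the long-running-command case.

```python
import json


def format_if_json(text, indent=2):
    """Return text pretty-printed if it parses as JSON, otherwise unchanged."""
    try:
        parsed = json.loads(text)
    except ValueError:
        # Not valid JSON (json.JSONDecodeError is a ValueError subclass):
        # pass the original output through untouched.
        return text
    # ensure_ascii=False keeps non-ASCII characters readable.
    return json.dumps(parsed, indent=indent, ensure_ascii=False)
```

Note that even this simple check is ambiguous: a bare number such as 42 is valid JSON, so plain numeric output would be treated as JSON too.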

JQ under windows: Direct JSON Input

I use a PowerShell script to get JSON information from a web interface. Currently I store that JSON in a file before I use jq 1.5 (under Windows 10) to transform it into a format that I can upload into a database. But since I run jq in the same PowerShell environment, I think I can avoid that detour and work directly with the JSON text (either in a variable or inline in the jq command). I checked the manual but found no clear answer to that question. I did find the --argjson option, which looks like what I need, but the manual is not clear on how to define the variable under Windows/PowerShell.
Regards
Timo
Sorry if the question was confusing. I found a way to work with variables directly on the command line, e.g.:
$variable | C:\jq.exe [Filter]
The jq manual had no explanation of how to pass JSON text directly to jq on the command line, but I found it. Thanks for your help.
Regards
Timo
You could use the ConvertFrom-Json cmdlet to convert any piece of JSON directly into a PowerShell object.

How can I find out what is contained in a saved Docker image tar file?

Let's say that I have saved one or more Docker images to a tar file, e.g.
docker save foo:1.2.0 bar:1.4.5 bar:latest > images.tar
Looking at the tar file, I can see that, in addition to the individual layer directories, there is a manifest.json file that contains some meta information about the archive's contents, including a RepoTags array for each image:
[{
  "Config": "...",
  "Layers": [...],
  "RepoTags": [
    "foo:1.2.0"
  ]
},
{
  "Config": "...",
  "Layers": [...],
  "RepoTags": [
    "bar:latest",
    "bar:1.4.5"
  ]
}]
Is there an easy way to extract that info from the tar file, e.g. through a Docker command - or do I have to extract the manifest.json file, run it through a tool like jq and then collect the tag info myself?
The purpose is to find out what is contained in the archive before importing/loading it on a different machine. I imagine that there must be some way to find out what's in the archive...
Since I have not received any answers so far, I'll try to answer myself using what I have tried, and it's working for me.
Using a combination of tar and jq, I came up with this command (drop the -z option if the archive is not gzipped, as with the plain images.tar from the question):
tar -xzOf images.tar.gz manifest.json | jq '[.[] | .RepoTags] | add'
This extracts the manifest.json file to stdout and pipes it into jq, which combines the various RepoTags arrays into a single array:
[
  "foo:1.2.0",
  "bar:1.4.5",
  "bar:latest"
]
The result is easy to read and works for me. Only downside is that it requires an installation of jq. I would love to have something that works without dependencies.
Still looking for a better answer, don't hesitate to post an answer if you have something that's easier to use!
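For a variant that avoids the jq dependency, here is a minimal sketch using only Python's standard library (so it still depends on Python being installed). list_repo_tags is a hypothetical helper name, and the sketch assumes manifest.json sits at the root of the archive, as in the `docker save` output shown above.

```python
import json
import tarfile


def list_repo_tags(archive_path):
    """Return all RepoTags listed in manifest.json of a `docker save` archive."""
    # Mode "r:*" lets tarfile detect the compression itself, so this
    # works for both images.tar and images.tar.gz.
    with tarfile.open(archive_path, mode="r:*") as archive:
        member = archive.extractfile("manifest.json")
        if member is None:
            raise ValueError("manifest.json is not a regular file")
        manifest = json.load(member)
    tags = []
    for image in manifest:
        # Guard against a missing or null RepoTags entry.
        tags.extend(image.get("RepoTags") or [])
    return tags
```

Only the manifest member is read, so the layer data does not have to be unpacked to disk.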

Prevent packer printing output for certain inline scripts?

I am running a bunch of Packer scripts, but some of them generate too much log output, and it's getting really annoying. Is there any way I can change my JSON file to disable output for one of these shell scripts in Packer?
One example of my packer shell script calls that I'd like silenced:
{
  "type": "shell",
  "scripts": [
    "scripts/yum_install_and_update",
    "scripts/do_magic"
  ]
}
Packer doesn't natively support this, but if you can modify the scripts, you could have them suppress the output internally, for example by redirecting the output of the noisy commands to /dev/null.

Best way to format large JSON file? (~30 mb)

I need to format a large JSON file for readability, but every resource I've found (mostly online) doesn't deal with data say, above 1-2 MB. I need to format about 30 MB. Is there any way to do this, or any way to code something to do this?
With python >= 2.6 you can do the following:
For Mac/Linux users:
cat ugly.json | python -mjson.tool > pretty.json
For Windows users (thanks to the comment from dnk.nitro):
type ugly.json | python -mjson.tool > pretty.json
jq can format or beautify a ~100MB JSON file in a few seconds:
jq '.' myLargeUnformattedFile.json > myLargeBeautifiedFile.json
The command above beautifies a single-line ~120MB file in ~10 seconds. jq also gives you many JSON manipulation capabilities beyond simple formatting; see its tutorials.
jsonpps is the only one that worked for me (https://github.com/bazaarvoice/jsonpps).
Unlike jq, jsonpp, and the other tools I tried, it doesn't load everything into RAM.
Some useful tips regarding installation and usage:
Download url: https://repo1.maven.org/maven2/com/bazaarvoice/jsonpps/jsonpps/1.1/jsonpps-1.1.jar
Shortcut (for Windows):
Create file jsonpps.cmd in the same directory with the following content:
@echo off
java -Xms64m -Xmx64m -jar "%~dp0jsonpps-1.1.jar" %*
Shortcut usage examples:
Format stdin to stdout:
echo { "x": 1 } | jsonpps
Format stdin to file:
echo { "x": 1 } | jsonpps -o output.json
Format file to file:
jsonpps input.json -o output.json
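To illustrate what formatting without loading everything into RAM can look like, here is a minimal Python sketch of a streaming re-indenter. It is not how jsonpps is implemented; pretty_stream and pretty_file are hypothetical names, and the sketch assumes the input is already valid JSON, since it only tracks strings and nesting depth and does no validation.

```python
def pretty_stream(src, dst, indent="  ", chunk_size=65536):
    """Re-indent JSON text from src to dst, one fixed-size chunk at a time."""
    depth = 0             # current nesting level
    in_string = False     # inside a "..." literal
    escaped = False       # previous character in the string was a backslash
    just_opened = False   # last token was { or [, container may be empty
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        out = []
        for ch in chunk:
            if in_string:
                # Copy string contents verbatim, honouring escapes.
                out.append(ch)
                if escaped:
                    escaped = False
                elif ch == "\\":
                    escaped = True
                elif ch == '"':
                    in_string = False
                continue
            if ch in " \t\r\n":
                continue  # drop the original whitespace between tokens
            if just_opened:
                just_opened = False
                if ch in "}]":
                    depth -= 1
                    out.append(ch)  # empty container stays on one line
                    continue
                out.append("\n" + indent * depth)
            if ch == '"':
                in_string = True
                out.append(ch)
            elif ch in "{[":
                depth += 1
                just_opened = True
                out.append(ch)
            elif ch in "}]":
                depth -= 1
                out.append("\n" + indent * depth + ch)
            elif ch == ",":
                out.append(",\n" + indent * depth)
            elif ch == ":":
                out.append(": ")
            else:
                out.append(ch)  # part of a number, true, false or null
        dst.write("".join(out))


def pretty_file(src_path, dst_path):
    """Format the JSON file at src_path into dst_path."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        pretty_stream(src, dst)
        dst.write("\n")
```

Memory use is bounded by chunk_size rather than by the file size, at the cost of not detecting malformed input.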
Background: I was trying to format a huge JSON file (~89 MB) in VS Code with the format command (Alt+Shift+F), but as usual it crashed. I used jq to format the file and store the result in another file.
A Windows 11 use case is shown below.
Step 1: download jq for your OS from the official site: https://stedolan.github.io/jq/
Step 2: create a folder named jq on the C drive and move the downloaded executable into it. Rename the file to jq. (Error 1: the file is already an .exe and Windows may hide the extension, so rename it to just 'jq', not 'jq.exe'; otherwise it can end up as 'jq.exe.exe'.)
Step 3: add the folder containing the executable (here C:\jq) to your PATH environment variable.
Step 4: open cmd in the directory where the JSON file is stored and run: jq . currentfilename.json > targetfilename.json
Replace currentfilename with the name of the file you want to format.
Replace targetfilename with the name of the file that should receive the formatted data.
Within seconds you should see the formatted target file in the same directory, and you can open it in VS Code or any other editor. If jq is not recognized as a command, the cause is most likely Error 1.
You can use Notepad++ (https://notepad-plus-plus.org/downloads/) for formatting large JSON files (tested in Windows).
Install Notepad++
Go to Plugins -> Plugins Admin and install the 'Json Viewer' plugin. The plugin source code is available at https://github.com/kapilratnani/JSON-Viewer
After the plugin is installed, go to Plugins -> JSON Viewer -> Format JSON.
This will format your JSON file.