Monitoring disk usage in CentOS using 'du' to output JSON

I want to write a line of code which will take the results of:
du -sh -c --time /00-httpdocs/*
and output it in JSON format. The goal is to get three pieces of information for each project file in a site: directory path, date last modified, and disk space usage in human readable format. This command will output that data in tab-delimited format with each entry on a new line in the terminal:
4.6G 2014-08-22 12:26 /00-httpdocs/00
1.1G 2014-08-22 13:32 /00-httpdocs/01
711M 2014-02-14 23:39 /00-httpdocs/02
The goal is to get it to export to a JSON file so it would need to be formatted something like this:
{"httpdocs": [
{
"size": "4.6G",
"modified": "2014-08-22 12:26",
"path": "/00-httpdocs/00-PREVIEW"}
{
"size": "1.1G",
"modified": "2014-08-22 13:32",
"path": "/00-httpdocs/8oclock"}
{
"size": "711M",
"modified": "2014-02-14 23:39",
"path": "/00-httpdocs/8oclock.new"}
]}
(I know that's not quite proper JSON, I just wrote it as an example. Apologies to the pedantic among us.)
I need size to return as an integer (so maybe remove '-sh' and handle conversion later?).
I've tried using awk and sed but I'm a total novice and can't quite get the formatting right.
I've made it about this far:
du -sh -c --time /00-httpdocs/* | awk ' BEGIN {print "\"httpdocs:\": [";} {print "{"$0"},\n";} END {print "]";}'
The goal is to have this trigger twice a day so that we can get the data and use it inside of a JavaScript application.

sed '1 i\
{"httpdocs": [
s/\([^[:space:]]*\)[[:space:]]*\([^[:space:]]*[[:space:]][^[:space:]]*\)[[:space:]]*\([^[:space:]]*\)/ {\
"size": "\1",\
"modified": "\2",\
"path": "\3"}/
$ a\
]}' YourFile
Quick and dirty (POSIX version, so use --posix on GNU sed).
It takes the 3 arguments and places them (s/../../) into a "template" using groups (\( ... \) and \1). The second group spans both the date and the time, since du --time prints them separated by a space.
It inserts the header at the 1st line (1 i\...) and appends the footer at the last ($ a\...).
[:space:] may be [:blank:].
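If jq is available, a hedged alternative (a sketch, assuming GNU du's tab-separated output): dropping -h keeps size numeric (in 1K blocks by default), which also covers the integer requirement, and dropping -c avoids a trailing "total" line that would need special-casing.

du -s --time /00-httpdocs/* |
jq -Rn '{httpdocs: [inputs | split("\t") | {size: (.[0] | tonumber), modified: .[1], path: .[2]}]}'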

Related

JQ write each object to subdirectory file

I'm new to jq (around 24 hours). I'm getting the filtering/selection already, but I'm wondering about advanced I/O features. Let's say I have an existing jq query that works fine, producing a stream (not a list) of objects. That is, if I pipe them to a file, it produces:
{
  "id": "foo",
  "value": "123"
}
{
  "id": "bar",
  "value": "456"
}
Is there some fancy expression I can add to my jq query to output each object individually in a subdirectory, keyed by the id, in the form id/id.json? For example current-directory/foo/foo.json and current-directory/bar/bar.json?
As @pmf has pointed out, an "only-jq" solution is not possible. A solution using jq and awk is as follows, though it is far from robust:
<input.json jq -rc '.id, .' | awk '
# jq emits pairs of lines: first the raw id, then the whole object.
id=="" { id=$0; next; }
{
  path=id; gsub(/[/]/, "_", path);   # sanitize slashes for the directory name
  system("mkdir -p " path);
  print >> (path "/" id ".json");    # parentheses keep the redirection target unambiguous
  id="";
}
'
As you will need help from outside jq anyway (see @peak's answer using awk), you also might want to consider using another JSON processor instead which offers more I/O features. One that comes to my mind is mikefarah/yq, a jq-inspired processor for YAML, JSON, and other formats. It can split documents into multiple files, and since its v4.27.2 release it also supports reading multiple JSON documents from a single input source.
$ yq -p=json -o=json input.json -s '.id'
$ cat foo.json
{
  "id": "foo",
  "value": "123"
}
$ cat bar.json
{
  "id": "bar",
  "value": "456"
}
The argument following -s defines the evaluation filter for each output file's name, .id in this case (the .json suffix is added automatically), and can be adapted as needed, e.g. -s '"file_with_id_" + .id'. However, adding slashes will not result in subdirectories being created, so this (from here on comparatively easy) part is left for post-processing in the shell.
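That post-processing can be a short loop; a hedged sketch (assuming the split files landed in the current directory and each id is a safe file name):

# Move each yq-produced file, e.g. foo.json, into a same-named subdirectory: foo/foo.json
for f in *.json; do
  id=${f%.json}                # strip the .json suffix to recover the id
  mkdir -p "$id" && mv "$f" "$id/$id.json"
done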

Iteratively split output of a stream in stdin

I have a large JSON file that I am streaming with jq.
This can be used as a test file:
{
  "a": "some",
  "b": [
    {
      "d": "some"
    },
    {
      "d": "some"
    },
    {
      "d": "some"
    },
    {
      "d": "some"
    },
    {
      "d": "some"
    },
    {
      "d": "some"
    },
    {
      "d": "some"
    },
    {
      "d": "some"
    },
    {
      "d": "some"
    },
    {
      "d": "some"
    }
  ]
}
I am trying to save separate files once a defined number of lines has been provided in STDIN. Multiple answers (
How can I split one text file into multiple *.txt files?,
How can I split a large text file into smaller files with an equal number of lines?,
Using jq how can I split a very large JSON file into multiple files, each a specific quantity of objects?,
Split a JSON array into multiple files using command line tools)
suggest the use of split piped to the initial command.
jq -c --stream 'fromstream(0|truncate_stream(inputs|select(.[0][0]=="b")| del(.[0][0:2])))' ex.json | split -l 4 --numeric-suffixes=1 - part_ --additional-suffix=.json
This works. However, based on my knowledge of the | in Unix, it takes the output of the first command and sends it to the second, so STDIN will contain all of the lines (making the streaming useless, although STDIN will likely not run out of memory, as it can be buffered on disk).
I have read that xargs can send a predefined number of lines to a command, so I tried this:
jq -c --stream 'fromstream(0|truncate_stream(inputs|select(.[0][0]=="b")| del(.[0][0:2])))' ex.json | xargs -I -l5 split -l 4 --numeric-suffixes=1 - part_ --additional-suffix=.json
However, no output is generated; plus, the | is still there, so I am assuming I would get the same behavior. In addition, I believe split will overwrite the previously created files, as each call would be a new invocation.
Does anyone have any advice? Am I missing something in my unix terminal knowledge?
(This question How to 'grep' a continuous stream? lists how to grep a continuous stream using the --line-buffered approach, is there an equivalent for split?)
As commented by @Fravadona:
"the | in unix, it takes the output of the first command and sends it to the second so STDIN will contain all of the lines"
No, the commands in a pipe run in parallel; each command might do a little buffering internally for optimizing IO though.
So the indicated command has the expected behavior.
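A quick way to convince yourself: in the hedged demonstration below (file names are just for illustration), the writer emits one line per second, yet split creates part_01.txt as soon as the first four lines arrive, long before the writer finishes.

# The writer and split run concurrently; watch the part_ files appear incrementally.
for i in $(seq 1 8); do echo "line $i"; sleep 1; done |
split -l 4 --numeric-suffixes=1 - part_ --additional-suffix=.txt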

Json to Elasticsearch via API

I am trying to add a JSON file to Elasticsearch which has around 30,000 lines and is not properly formatted. I'm trying to upload it via the Bulk API but I can't find a way to format it properly so that it actually works. I'm using Ubuntu 16.04 LTS.
This is the format of the json:
{
  "rt": "2018-11-20T12:57:32.292Z",
  "source_info": { "ip": "0.0.60.50" },
  "end": "2018-11-20T12:57:32.284Z",
  "severity": "low",
  "duid": "5b8d0a48ba59941314e8a97f",
  "dhost": "004678",
  "endpoint_type": "computer",
  "endpoint_id": "8e7e2806-eaee-9436-6ab5-078361576290",
  "suser": "Katerina",
  "group": "PERIPHERALS",
  "customer_id": "a263f4c8-942f-d4f4-5938-7c37013c03be",
  "type": "Event::Endpoint::Device::AlertedOnly",
  "id": "83d63d48-f040-2485-49b9-b4ff2ac4fad4",
  "name": "Peripheral allowed: Samsung Galaxy S7 edge"
}
I do know that the format for the Bulk API needs {"index":{"_id":*}} before each JSON object in the file, so it'd look like this:
{"index":{"_id":1}}
{
  "rt": "2018-11-20T12:57:32.292Z",
  "source_info": { "ip": "0.0.60.50" },
  "end": "2018-11-20T12:57:32.284Z",
  "severity": "low",
  "duid": "5b8d0a48ba59941314e8a97f",
  "dhost": "004678",
  "endpoint_type": "computer",
  "endpoint_id": "8e7e2806-eaee-9436-6ab5-078361576290",
  "suser": "Katerina",
  "group": "PERIPHERALS",
  "customer_id": "a263f4c8-942f-d4f4-5938-7c37013c03be",
  "type": "Event::Endpoint::Device::AlertedOnly",
  "id": "83d63d48-f040-2485-49b9-b4ff2ac4fad4",
  "name": "Peripheral allowed: Samsung Galaxy S7 edge"
}
If I insert the index id manually and then use this expression curl -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/ivc/default/_bulk?pretty --data-binary @results.json it will upload it with no errors.
My question is, how can I add the index id {"index":{"_id":*}} to each line of the JSON to make it ready to upload? Obviously the index id has to increase by 1 for each document, is there any way to do it from the CLI?
Sorry if this post doesn't look as it should, I read millions of posts in Stack Overflow but this is my first one! #Desperate
Thank you very much in advance!
Your problem is that Elasticsearch expects the document to be valid JSON on ONE line, like this:
{"index":{"_id":1}}
{"rt":"2018-11-20T12:57:32.292Z","source_info":{"ip":"0.0.60.50"},"end":"2018-11-20T12:57:32.284Z","severity":"low","duid":"5b8d0a48ba59941314e8a97f","dhost":"004678","endpoint_type":"computer","endpoint_id":"8e7e2806-eaee-9436-6ab5-078361576290","suser":"Katerina","group":"PERIPHERALS","customer_id":"a263f4c8-942f-d4f4-5938-7c37013c03be","type":"Event::Endpoint::Device::AlertedOnly","id":"83d63d48-f040-2485-49b9-b4ff2ac4fad4","name":"Peripheral allowed: Samsung Galaxy S7 edge"}
You have to find a way to transform your input file so that you have a document per line, then you'll be good to go with Val's solution.
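For reference, a hedged sketch of that transformation, assuming jq is available (bulk.ndjson is a hypothetical output name): jq -c compacts each document onto one line, then awk interleaves an action line whose _id increments per document, which also covers the "+1 each line" part of the question.

# Compact every document to a single line, prepend an auto-incrementing action line,
# then feed the result to the Bulk API.
jq -c . results.json |
awk '{ printf "{\"index\":{\"_id\":%d}}\n%s\n", NR, $0 }' > bulk.ndjson
curl -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/ivc/default/_bulk?pretty --data-binary @bulk.ndjson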
Thank you for all the answers, they did help to get in me in the right direction.
I've made a bash script to automate the download, formatting and upload of the logs to Elasticsearch:
#!/bin/bash
echo "Downloading logs from Sophos Central. Please wait."
cd /home/user/ELK/Sophos-Central-SIEM-Integration/log
#This deletes the last batch of results
rm result.json
cd ..
#This triggers the script to download a new batch of logs from Sophos
./siem.py
cd /home/user/ELK/Sophos-Central-SIEM-Integration/log
#Inserts the first bulk action line at the top of the logs file
sed -i '1 i\{"index":{}}' result.json
#Adds an action line for each following document
sed -i '3~2s/^/{"index":{}}/' result.json
#Adds json file to elasticsearch
curl -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/ivc/default/_bulk?pretty --data-binary @result.json
So that's how I achieved this. There might be easier options but this one did the trick for me. Hope it can be useful for someone else!
Again thank you everyone! :D

Find and edit a Json file using bash

I have multiple files in the following format with different categories like:
{
  "id": 1,
  "flags": ["a", "b", "c"],
  "name": "test",
  "category": "video",
  "notes": ""
}
Now I want to append the string "d" to the flags array in every file whose category is "video". So my final file should look like the file below:
{
  "id": 1,
  "flags": ["a", "b", "c", "d"],
  "name": "test",
  "category": "video",
  "notes": ""
}
Using the following command I am able to find the files of interest, but I am stuck on the editing part, which I am unable to figure out, as there are hundreds of files to edit manually, e.g.
find . -name "*" | xargs grep "\"category\": \"video\"" | awk '{print $1}' | sed 's/://g'
You can do this
find . -type f | xargs grep -l '"category": "video"' | xargs sed -i -e '/flags/ s/]/, "d"]/'
This will find all the filenames which contain a line with "category": "video", and then add the "d" flag.
Details:
find . -type f
=> Will get all the filenames in your directory
xargs grep -l '"category": "video"'
=> Will get those filenames which contain the line "category": "video"
xargs sed -i -e '/flags/ s/]/, "d"]/'
=> Will add the "d" entry to the "flags" line.
"TWEET!!" ... (yellow flag thown to the ground) ... Time Out!
What you have, here, is "a JSON file." You also have, at your #!shebang command, your choice of(!) full-featured programming languages ... with intimate and thoroughly-knowledgeale support for JSON ... with which you can very-speedily write your command-file.
Even if it is "theoretically possible" to do this using "bash scripts," this is roughly equivalent to "putting a beautiful stone archway over the front-entrance to a supermarket." Therefore, "waste ye no time" in such an utterly-profitless pursuit. Write a script, using a language that "honest-to-goodness knows about(!) JSON," to decode the contents of the file, then manipulate it (as a data-structure), then re-encode it again.
Here is a more appropriate approach using PHP in shell:
FILE=foo2.json php -r '$file = $_SERVER["FILE"]; $arr = json_decode(file_get_contents($file)); if ($arr->category == "video") { $arr->flags[] = "d"; file_put_contents($file,json_encode($arr)); }'
Which will load the file, decode into array, add "d" into flags property only when category is video, then write back to the file in JSON format.
To run this for every json file, you can use find command, e.g.
find . -name "*.json" -print0 | while IFS= read -r -d '' file; do
FILE=$file
# run above PHP command in here
done
If the files are in the same format, this command may help (version for a single file):
ex +':/category.*video/norm kkf]i, "d"' -scwq file1.json
or:
ex +':/flags/,/category/s/"c"/"c", "d"/' -scwq file1.json
which is basically using Ex editor (now part of Vim).
Explanation:
+ - executes Vim command (man ex)
:/pattern_or_range/cmd - find pattern, if successful execute another Vim command (:h :/)
norm kkf]i - executes keystrokes in normal mode
kk - move cursor up twice
f] - find ]
i, "d" - insert , "d"
-s - silent mode
-cwq - executes wq (write & quit)
For multiple files, use find and -execdir or extend above ex command to:
ex +'bufdo!:/category.*video/norm kkf]i, "d"' -scxa *.json
Where bufdo! executes command for every file, and -cxa saves every file. Add -V1 for extra verbose messages.
If the "flags" line is not 2 lines above, then you may perform a backward search instead. Or use a similar approach to @sps by replacing ] with , "d"].
See also: How to change previous line when the pattern is found? at Vim.SE.
Using jq:
find . -type f | xargs cat | jq 'select(.category=="video") | .flags |= . + ["d"]'
Explanation:
jq 'select(.category=="video") | .flags |= . + ["d"]'
# select(.category=="video") => filters by category field
# .flags |= . + ["d"] => Updates the flags array
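Note that this prints the modified documents to stdout rather than editing the files in place. A hedged in-place variant (a sketch, assuming each *.json file holds a single top-level object; jq has no in-place flag, hence the temporary file):

# Rewrite each file through jq, appending "d" only when category is "video".
find . -name "*.json" -type f | while IFS= read -r f; do
  jq 'if .category == "video" then .flags += ["d"] else . end' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done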

Linux convert a table from bash command into json

While working with JSON in Windows is very easy, in Linux I'm running into trouble.
I found a way to convert list into json using jq:
For example:
ls | jq -R -s -c 'split("\n")'
output:
["bin","boot","dev","etc","home","lib","lib64","media","mnt","opt","proc","root","run","sbin","srv","sys","tmp","usr","var"]
I'm having trouble converting a table into JSON.
I'm looking for a way to convert a table that I get from a bash command into JSON. I have already searched for many tools, but none of them are generic; you need to adjust the commands for each different table.
Do you know how I can convert a table that I get from a bash command into JSON, generically?
table output for example:
rpm -qai
output:
Name : gnome-session
Version : 3.8.4
Release : 11.el7
Architecture: x86_64
Install Date: Mon 21 Dec 2015 04:12:41 PM EST
Group : User Interface/Desktops
Size : 1898331
License : GPLv2+
Signature : RSA/SHA256, Thu 03 Jul 2014 09:39:10 PM EDT,
Key ID 24c6a8a7f4a80eb5
Source RPM : gnome-session-3.8.4-11.el7.src.rpm
Build Date : Mon 09 Jun 2014 09:12:26 PM EDT
Build Host : worker1.bsys.centos.org
Relocations : (not relocatable)
Packager : CentOS BuildSystem <http://bugs.centos.org>
Vendor : CentOS
URL : http://www.gnome.org
Summary : GNOME session manager
Description : gnome-session manages a GNOME desktop or GDM login session. It starts up the other core GNOME components and handles logout and saving the
session.
Thanks!
There are too many poorly-specified textual formats to create a single tool for what you are asking for, but Unix is well-equipped for the task. Usually, you would create a simple shell or Awk script to convert from one container format to another. Here's one example:
printf '"%s", ' * | sed 's/, $//;s/.*/[ & ]/'
The printf will produce a comma-separated, double-quoted list of wildcard matches. The sed will trim the final comma and add a pair of square brackets around the entire output. The results will be incorrect if a file name contains a double quote, for example, but in the name of simplicity, let's not embellish this any further.
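A slightly more robust variant (a hedged sketch, assuming jq is available) lets jq do the quoting, so double quotes in file names are escaped properly (newlines in names would still break it):

# Emit one file name per line and let jq collect them into a JSON array.
printf '%s\n' * | jq -Rn '[inputs]'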
Here's another:
rpm -qai | awk -F ' *: ' 'BEGIN { print "{\n"; }
{ printf "%s\"%s\": \"%s\"", delim, $1, substr($0, 15); delim="\n," }
END { print "\n}"; }'
The -qf output format is probably better but this shows how you can extract fields from a reasonably free-form line-oriented format using a simple Awk script. The first field before the colon is extracted as the key, and everything from the 15th column onwards is extracted as the value. Again, we ignore the possible complications (double quotes in the values would need to be escaped, again, for example) to keep the example simple.
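For instance, a hedged sketch of the --queryformat route (NAME, VERSION, and SIZE are standard rpm query tags; the surrounding literal braces pass through verbatim), which emits JSON directly and skips the ad-hoc parsing:

rpm -q --queryformat '{"name": "%{NAME}", "version": "%{VERSION}", "size": %{SIZE}}\n' gnome-session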
If your needs are serious, you will need to spend more time on creating a robust parser; but then, you will usually want to work with tools which have a well-defined output format in the first place (XML, JSON, etc) and spend as little time as possible on ad-hoc parsers. Unfortunately, there is still a plethora of tools out there which do not support an --xml or --json output option out of the box, but JSON output is fortunately becoming more widely available.
You can convert a table from a bash command into JSON using jq.
This command will return a detailed report on the system's disk space usage:
df -h
The output is something like this
Filesystem Size Used Avail Capacity iused ifree %iused Mounted on
/dev/disk3s1s1 926Gi 20Gi 803Gi 3% 502068 4293294021 0% /
devfs 205Ki 205Ki 0Bi 100% 710 0 100% /dev
/dev/disk3s6 926Gi 7.0Gi 803Gi 1% 7 8418661400 0% /System/Volumes/VM
/dev/disk3s2 926Gi 857Mi 803Gi 1% 1811 8418661400 0% /System/Volumes/Preboot
/dev/disk3s4 926Gi 623Mi 803Gi 1% 267 8418661400 0% /System/Volumes/Update
Now we can convert the output of this command into json with jq
command=$(df -h | tr -s ' ' | jq -c -Rn 'input | split(" ") as $head | inputs | split(" ") | to_entries | map(.key = $head[.key]) | from_entries')
echo "$command" | jq
{
  "Filesystem": "/dev/disk3s1s1",
  "Size": "926Gi",
  "Used": "20Gi",
  "Avail": "803Gi",
  "Capacity": "3%",
  "iused": "502068",
  "ifree": "4293294021",
  "%iused": "0%",
  "Mounted": "/"
}
{
  "Filesystem": "devfs",
  "Size": "205Ki",
  "Used": "205Ki",
  "Avail": "0Bi",
  "Capacity": "100%",
  "iused": "710",
  "ifree": "0",
  "%iused": "100%",
  "Mounted": "/dev"
}
{
  "Filesystem": "/dev/disk3s6",
  "Size": "926Gi",
  "Used": "7.0Gi",
  "Avail": "803Gi",
  "Capacity": "1%",
  "iused": "7",
  "ifree": "8418536520",
  "%iused": "0%",
  "Mounted": "/System/Volumes/VM"
}
{
  "Filesystem": "/dev/disk3s2",
  "Size": "926Gi",
  "Used": "857Mi",
  "Avail": "803Gi",
  "Capacity": "1%",
  "iused": "1811",
  "ifree": "8418536520",
  "%iused": "0%",
  "Mounted": "/System/Volumes/Preboot"
}