remove new line \n in large file (10GB) - language-agnostic

I have a large file, 1.txt, containing:
User: Test1
Password: P#sawFia1_f
User: Test2
Password: C99vijJiDB9fo#K!!1
I'm using sed -i '/\nPassword/ s///g' 1.txt to remove the newline before Password:, but it's not removing it. Why? The final output needs to be:
User: Test1;P#sawFia1_f
User: Test2;C99vijJiDB9fo#K!!1

Assuming the lines are paired like that, you can use the following:
perl -pe'
s/^User:.*\K\n/;/;
s/^Password:\s*//;
' file.in >file.out
(It can be used as-is or placed all on one line.)

Using any awk, given your provided sample input/output all you'd need is:
$ awk -v RS= '{print $1, $2 ";" $4}' file1.txt
User: Test1;P#sawFia1_f
User: Test2;C99vijJiDB9fo#K!!1
or if you really do need a blank line between each output line:
$ awk -v RS= -v ORS='\n\n' '{print $1, $2 ";" $4}' file1.txt
User: Test1;P#sawFia1_f

User: Test2;C99vijJiDB9fo#K!!1
If that's not all you need then please edit your question to include more truly representative sample input/output including cases that the above doesn't work for.

Assumptions:
every User: line is followed by a Password: line
the actual password value does not contain white space
each User/password combo is followed by a blank line
all other lines in the file are ignored/discarded (otherwise OP should update the sample input to show how other lines of data are to be processed)
One awk approach:
$ awk '/^User:/ {printf "%s",$0} /^Password:/ {printf ";%s\n\n",$2}' 1.txt
User: Test1;P#sawFia1_f

User: Test2;C99vijJiDB9fo#K!!1
Once OP confirms the script works as needed, and assuming OP wants to overwrite the original file and is running GNU awk, OP can add the -i inplace flag to have 1.txt overwritten, e.g.:
awk -i inplace '/^User:/ { printf "%s", $0 } /^Password:/ { printf ";%s\n\n",$2}' 1.txt

Assuming the shown structure of User and Password line pairs, each followed by an empty line:
perl -i.bak -00 -wpe's/\nPassword:\s*/;/' file
Reads the file in paragraphs (via the -00 switch), so the regex is applied to each pair of lines as a single string.
The -i.bak changes the input file "in-place" but also keeps a backup (file.bak).
If you don't want a backup just remove .bak part, once it's all well tested.
Or, process line by line
perl -i.bak -wnlE'/^Password:\s*(.*)/ ? say "$u;$1" : /^User/ ? $u=$_ : say' file
This works with, and reprints, any other lines as well.
If there is only an empty line in between, which needn't be retained, it simplifies to
perl -i.bak -wnlE'/^Password:\s*(.*)/ ? say "$u;$1" : ($u=$_)' file

With your shown samples, please try the following awk code, written and tested with GNU awk.
awk -v RS='(^|\n)User:[^\n]*\nPassword:[^\n]*' '
RT{
sub(/^\n/,"",RT)
sub(/\n/,";",RT)
print RT
}
' Input_file
Explanation: Using GNU awk, set RS (the record separator) to (^|\n)User:[^\n]*\nPassword:[^\n]* (explained further in the post). In the main section of the awk program, check whether RT is non-null; if so, strip the leading newline from it, replace the remaining newline with ;, and print its value, giving the required output.
NOTE: The above prints the output to the terminal; once you are happy with the results you can use GNU awk's -i inplace option by changing awk to awk -i inplace in the code above.
One liner form of above code:
awk -v RS='(^|\n)User:[^\n]*\nPassword:[^\n]*' 'RT{sub(/^\n/,"",RT);sub(/\n/,";",RT);print RT}' Input_file

I'm using sed -i '/\nPassword/ s///g' 1.txt for remove new line with
Password: but it's not removing it. Why?
You are misunderstanding how GNU sed works. In basic usage it applies changes to each line, where a line is understood as the characters between the start of the file or a newline and the end of the file or a newline; such a line therefore does not itself contain a newline. Your task requires knowing 2 lines of input before producing 1 line of output. This can be done by exploiting the GNU sed feature dubbed the hold space, in the following way. Let file.txt content be
User: Test1
Password: P#sawFia1_f
User: Test2
Password: C99vijJiDB9fo#K!!1
then
sed -e '/^User/{h;d}' -e '/^Password/{H;g;s/\nPassword: /;/}' file.txt
gives output
User: Test1;P#sawFia1_f
User: Test2;C99vijJiDB9fo#K!!1
Explanation:
for a line starting with User, save the current line into the hold space (h) and go to the next line (d)
for a line starting with Password, append a newline and the current line to the hold space (H), then set the current line's content to that of the hold space (g), then replace the newline followed by Password: and a space with a semicolon
Disclaimer: this solution assumes that every line starting with User is always followed by line starting with Password and every line starting with Password is preceded by line starting with User.
(tested in GNU sed 4.2.2)
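For the paired input shown (no blank lines between records), GNU sed's N command offers a shorter variant of the same idea: it appends the next input line to the pattern space, so both lines are available for one substitution. A minimal sketch, assuming every User line is immediately followed by its Password line:

```shell
# N pulls the Password line into the pattern space with the User line,
# then the embedded newline plus "Password: " is replaced by ";"
printf 'User: Test1\nPassword: P#sawFia1_f\n' |
  sed 'N;s/\nPassword: /;/'
# → User: Test1;P#sawFia1_f
```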

Related

Why is JSON from aws rds run in Docker "malformed" according to other tools?

To my eyes the following JSON looks valid.
{
"DescribeDBLogFiles": [
{
"LogFileName": "error/postgresql.log.2022-09-14-00",
"LastWritten": 1663199972348,
"Size": 3032193
}
]
}
A) But jq, json_pp, and Python's json.tool module deem it invalid:
# jq 1.6
> echo "$logfiles" | jq
parse error: Invalid numeric literal at line 1, column 2
# json_pp 4.02
> echo "$logfiles" | json_pp
malformed JSON string, neither array, object, number, string or atom,
at character offset 0 (before "\x{1b}[?1h\x{1b}=\r{...") at /usr/bin/json_pp line 51
> python3 -m json.tool <<< "$logfiles"
Expecting value: line 1 column 1 (char 0)
B) On the other hand, if the above JSON is copy-pasted into an online validator, both validators 1 and 2 deem it valid.
As hinted by json_pp's error above, hexdump <<< "$logfiles" indeed shows additional, surrounding characters. Here's the prefix: 5b1b 313f 1b68 0d3d 1b7b ...., where 7b is {.
The JSON is output to a logfiles variable by this command:
logfiles=$(aws rds describe-db-log-files \
--db-instance-identifier somedb \
--filename-contains 2022-09-14)
# where `aws` is
alias aws='docker run --rm -it -v ~/.aws:/root/.aws amazon/aws-cli:2.7.31'
> bash --version
GNU bash, version 5.0.17(1)-release (x86_64-pc-linux-gnu)
Have perused this GitHub issue, yet can't figure out the cause. I suspect that double quotes get mangled somehow when using echo - some reported that printf "worked" for them.
The docker run --rm -it -v command used to produce the JSON added additional unprintable characters to the start of the JSON data. That makes the resulting $logfiles content invalid.
The -t option allocates a tty and the -i keeps stdin open for an interactive session. In this case the -t is allowing terminal-oriented startup behavior (e.g. from shell startup scripts such as .bashrc), and something in that startup is outputting ANSI escape codes. Often this is to clear the screen, set up other things for the interactive shell, or make the output more visually appealing by colorizing portions of the data.
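Two hedged ways to act on that diagnosis (sketches; the alias and variable names follow the question): drop -t so no tty is allocated, or strip the escape bytes from output that was already captured.

```shell
# Option 1: keep -i (stdin) but drop -t, so no tty is allocated
# and no ANSI escape codes are emitted
alias aws='docker run --rm -i -v ~/.aws:/root/.aws amazon/aws-cli:2.7.31'

# Option 2: scrub CSI sequences (ESC[...X), ESC= and carriage returns
# from output already captured with -t (GNU sed syntax)
logfiles_clean=$(printf '%s' "$logfiles" \
  | sed -e 's/\x1b\[[0-9;?]*[a-zA-Z]//g' -e 's/\x1b=//g' \
  | tr -d '\r')
```

Option 1 is the cleaner fix, since the escape codes never enter the variable in the first place.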

How to use shell variable in MQTT

I am new to shell scripting and MQTT.
I need to publish a JSON file using MQTT. This can be done by storing the JSON contents in a shell variable, but it is not working for me.
my shell script:
#!/bin/sh
var1='{"apiVersion":"2.1","data":{"id":"4TSJhIZmL0A","uploaded":"2008-07-15T18:11:59.000Z","updated":"2013-05-01T21:01:49.000Z","uploader":"burloandbardsey","category":"News","title":"bbc news start up theme","description":"bbc","thumbnail":{"sqDefault":"http://i.ytimg.com/vi/4TSJhIZmL0A/default.jpg","hqDefault":"http://i.ytimg.com/vi/4TSJhIZmL0A/hqdefault.jpg"},"player":{"default":"http://www.youtube.com/watch?v=4TSJhIZmL0A&feature=youtube_gdata_player","mobile":"http://m.youtube.com/details?v=4TSJhIZmL0A"},"content":{"5":"http://www.youtube.com/v/4TSJhIZmL0A?version=3&f=videos&app=youtube_gdata","1":"rtsp://v5.cache7.c.youtube.com/CiILENy73wIaGQlAL2aGhIk04RMYDSANFEgGUgZ2aWRlb3MM/0/0/0/video.3gp","6":"rtsp://v5.cache7.c.youtube.com/CiILENy73wIaGQlAL2aGhIk04RMYESARFEgGUgZ2aWRlb3MM/0/0/0/video.3gp"},"duration":15,"aspectRatio":"widescreen","rating":4.6683936,"likeCount":"354","ratingCount":386,"viewCount":341066,"favoriteCount":0,"commentCount":155,"accessControl":{"comment":"allowed","commentVote":"allowed","videoRespond":"allowed","rate":"allowed","embed":"allowed","list":"allowed","autoPlay":"allowed","syndicate":"allowed"}}}'
mosquitto_pub -h localhost -t test -m "$var1"
echo "$var1"
my Mosquitto commands:
Publisher: mosquitto_pub -h localhost -t "test" -m "{"Contents":$var1}"
Subscriber: mosquitto_sub -h localhost -t "test"
Output I got:
{"Contents":}
Expected Output:
{"Contents":{"name":"Harini", "age":24, "city":"NewYork", "message":"Hello world"}}
I can see the output only at the terminal, because of the echo. But I want to publish and subscribe to the contents of the shell variable (var1).
Please help me get this output. Do I need to add more code to the shell script? I don't know how to proceed. Or can you suggest another method?
The following works just fine, it's all about which quotes you use where:
#!/bin/sh
var1='{"name":"Harini", "age":24, "city":"NewYork","message":"Hello world"}'
echo $var1
mosquitto_pub -t test -m "{\"Content\": $var1}"
You need to wrap the -m argument in quotes because it contains spaces, which in turn means you need to escape the double quotes around Content.
Wrapping the content of var1 in single quotes means you don't need to escape the double quotes in it.
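A variation on the same quoting idea: build the wrapper with printf, so no double quotes need escaping by hand (a sketch; payload is a hypothetical variable name, the rest follows the answer above):

```shell
#!/bin/sh
var1='{"name":"Harini", "age":24, "city":"NewYork","message":"Hello world"}'
# printf substitutes var1 into the wrapper without any backslash-escaping
payload=$(printf '{"Content": %s}' "$var1")
mosquitto_pub -t test -m "$payload"
```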

Printing column separated by comma using Awk command line

I have a problem here. I have to print a column from a text file using awk. However, the columns are not separated by spaces at all, only by a single comma. It looks something like this:
column1,column2,column3,column4,column5,column6
How would I print out 3rd column using awk?
Try:
awk -F',' '{print $3}' myfile.txt
Here the -F option tells awk to use , as the field separator.
If your only requirement is to print the third field of every line, with each field delimited by a comma, you can use cut:
cut -d, -f3 file
-d, sets the delimiter to a comma
-f3 specifies that only the third field is to be printed
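For example, against the one-line input shown in the question:

```shell
echo 'column1,column2,column3,column4,column5,column6' | cut -d, -f3
# → column3
```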
Try this awk
awk -F, '{$0=$3}1' file
column3
-F, divides the fields by ,
$0=$3 sets the line to only field 3
1 prints everything out (explained here)
This could also be used:
awk -F, '{print $3}' file
A simple, although awk-less solution in bash:
while IFS=, read -r a a a b; do echo "$a"; done <inputfile
It works faster than awk for small files (<100 lines), as it uses fewer resources (it avoids the expensive fork and execve system calls).
EDIT from Ed Morton (sorry for hi-jacking the answer, I don't know if there's a better way to address this):
To put to rest the myth that shell will run faster than awk for small files:
$ wc -l file
99 file
$ time while IFS=, read -r a a a b; do echo "$a"; done <file >/dev/null
real 0m0.016s
user 0m0.000s
sys 0m0.015s
$ time awk -F, '{print $3}' file >/dev/null
real 0m0.016s
user 0m0.000s
sys 0m0.015s
I expect if you get a REALLY small enough file then you will see the shell script run a fraction of a blink of an eye faster than the awk script, but who cares?
And if you don't believe that it's harder to write robust shell scripts than awk scripts, look at this bug in the shell script you posted:
$ cat file
a,b,-e,d
$ cut -d, -f3 file
-e
$ awk -F, '{print $3}' file
-e
$ while IFS=, read -r a a a b; do echo "$a"; done <file
$
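The empty output above is echo interpreting the field -e as one of its own options. Swapping echo for printf sidesteps that particular failure (a sketch; the awk version remains the more robust choice):

```shell
# printf '%s\n' prints its argument literally, even when it looks
# like an option such as "-e"
printf 'a,b,-e,d\n' > file
while IFS=, read -r a a a b; do printf '%s\n' "$a"; done < file
# → -e
```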

Bash for loop picking up filenames and a column from read -r and gnu plot

The top part of the following script works great: the .dat files are created via the MySQL command and work perfectly with gnuplot (via the command line). The problem is getting the bottom part (gnuplot) to work correctly. I'm pretty sure I have a couple of problems in the code: the variables and the array. I need to plot each .dat file, put the title in the graph (from the title in customers.txt), and name the output .png.
any guidance would be appreciated. Thanks a lot -- RichR
#!/bin/bash
set -x
databases=""
titles=""
while read -r ipAddr dbName title; do
dbName=$(echo "$dbName" | sed -e 's/pacsdb//')
rm -f "$dbName.dat"
touch "$dbName.dat"
databases=("$dbName.dat")
titles="$titles $title"
while read -r period; do
mysql -uroot -pxxxx -h "$ipAddr" "pacsdb$dbName" -se \
"SELECT COUNT(*) FROM tables WHERE some.info BETWEEN $period;" >> "$dbName.dat"
done < periods.txt
done < customers.txt
for database in "${databases[#]}"; do
gnuplot << EOF
set a bunch of options
set output "/var/www/$dbName.png"
plot "$dbName.dat" using 2:xtic(1) title "$titles"
EOF
done
exit 0
customers.txt example line-
192.168.179.222 pacsdbgibsonia "Gibsonia Animal Hospital"
Error output.....
+ for database in '"${databases[#]}"'
+ gnuplot
line 0: warning: Skipping unreadable file ".dat"
line 0: No data in plot
+ exit 0
To initialise the databases array:
databases=()
To append $dbName.dat to the databases array:
databases+=("$dbName.dat")
To retrieve dbName, remove the suffix pattern .dat:
dbName=${database%.dat}
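Those three fixes combined look like this (a sketch, with a placeholder entry standing in for the real customers.txt loop):

```shell
databases=()                 # initialise as an empty array
databases+=("gibsonia.dat")  # append inside the while read loop
for database in "${databases[@]}"; do
    dbName=${database%.dat}  # strip the .dat suffix for output names
    printf '%s\n' "$dbName"
done
# → gibsonia
```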

vim tidy makeprg with stdin

I've got this in .vim/ftplugin/html.vim:
set makeprg=%!tidy\ -q\ -i\ --show-warnings\ no
If I do make in a html file I get this error:
E499: Empty file name for '%' or '#', only works with ":p:h"
When I execute this:
:%!tidy -q -i --show-warnings no
It works beautifully. What am I doing wrong with my set makeprg?
I don't think makeprg was intended to be used that way. I suggest you simply define your own mapping or command
:map ,m :%!tidy -q -i --show-warnings no<CR>
:command! Make %!tidy -q -i --show-warnings no
%! replaces the contents of the buffer with the output of the following command, but when calling :make, the % is replaced with the file name of the current buffer. The error occurs because your current buffer is not editing a file, so the % replacement can't take place.