Get tsv file with header using sed - csv

So I wrote these sed commands to filter .tsv files by chromosome (chromosome 19 in this case). Unfortunately, I don't know how to keep the header of the .tsv file as well; so far I only get headerless data. How should I modify my code?
wget https://www.dropbox.com/s/dataset.tsv.bgz -O temp.data.99.tsv.bgz
gunzip -c temp.data.99.tsv.bgz > temp.data.99.tsv
sed -n '/^19:/p' temp.data.99.tsv | sed 's/:/ /g' > finished_tsv_files/temp.data.99_Chr_19.tsv
rm temp.data.99.tsv

Replace
/^19:/p
with
1p; /^19:/p
to output first line, too.
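As a rough sketch of the effect, here is the modified filter run on a made-up four-line sample (the column names and values are invented for illustration; the real file's layout is not shown in the question). Note that the second sed in the pipeline also replaces any colons that appear in the header line.

```shell
# Hypothetical sample: a header line followed by chr:pos-style rows.
sample=$(printf '%s\n' 'variant pval' '19:1234 0.5' '2:99 0.1' '19:777 0.9')

# 1p prints line 1 (the header); /^19:/p prints rows starting with "19:".
result=$(printf '%s\n' "$sample" | sed -n '1p; /^19:/p' | sed 's/:/ /g')

printf '%s\n' "$result"
```

This should print the header followed by the two chromosome 19 rows, with the colons replaced by spaces.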

Related

converting \s+ delimited file to csv using sed

I'm trying to convert a file that has two or more white spaces separating each column.
YP_010083342.1 - 258 VOG00003 - 582 8.6e-22 80.7 0.2 1 1 5.3e-25 1e-21 80.4 0.2 193 363 5 185 1 251 0.60 anti-repressor protein [Staphylococcus phage LH1]
I'd like to convert this to a csv using sed. The following sed commands make no apparent changes to the file.
sed -i 's/\s+/,/g' file.ouput
sed -i 's/$\s+/,/g' file.ouput
sed -i 's/\t+/,/g' file.ouput
sed -i 's/$\t+/,/g' file.ouput
but the following command results in the following
sed -i 's/\s\s/,/g' file.ouput
YP_010083342.1,,, -,,,,,,258 VOG00003,,,,,, -,,,,,,582, 8.6e-22, 80.7, 0.2, 1, 1, 5.3e-25,, 1e-21, 80.4, 0.2, 193, 363,, 5, 185,, 1, 251 0.60 anti-repressor protein [Staphylococcus phage LH1]
Is anyone able to explain why this is occurring and how to properly solve this?
You can use this sed:
sed -E 's/ {2,}/,/g' file
YP_010083342.1,-,258 VOG00003,-,582,8.6e-22,80.7,0.2,1,1,5.3e-25,1e-21,80.4,0.2,193,363,5,185,1,251 0.60 anti-repressor protein [Staphylococcus phage LH1]
Or this awk:
awk -F ' {2,}' -v OFS=, '{$1=$1} 1' ff
The problem is that + is part of extended regular expressions, which have to be enabled using sed -r (or -E). Some seds such as GNU sed support it as an extension also in basic regular expressions, but it has to be escaped: \+. \s is also an extension, by the way.
Assuming GNU sed, any of these would work:
sed -i 's/\s\s\+/,/g' file.output
sed -E -i 's/\s\s+/,/g' file.output
sed -E -i 's/\s{2,}/,/g' file.output
More portable, working with any sed (redirect output to another file, then rename):
sed 's/[[:blank:]]\{2,\}/,/g' file.output
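This also explains the repeated commas in the question: replacing exactly two whitespace characters at a time turns a run of six spaces into three commas, whereas "two or more" consumes the whole run at once. A minimal sketch with the portable form, on a shortened, made-up line:

```shell
# Hypothetical sample line: fields separated by runs of 2+ spaces,
# with single spaces inside the last field.
line='YP_1  -    258  anti-repressor protein'

# POSIX BRE: [[:blank:]] matches a space or tab, \{2,\} means "two or more".
csv=$(printf '%s\n' "$line" | sed 's/[[:blank:]]\{2,\}/,/g')

printf '%s\n' "$csv"   # YP_1,-,258,anti-repressor protein
```

Single spaces inside a field are left alone, so the free-text description at the end of each row stays in one column.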

BASH script reading variable from another .txt file

I have an output.txt file where my script stores some output. I just need to get the ID, which is on the 1st line of output.txt, from within myscript.sh. Can someone suggest a way to do that?
{"id":"**dad04f6d-4e06-4420-b0bc-cb2dcfee2dcf**","name":"new","url":"https://dev.azure.com/vishalmishra2424/82c93136-470c-4be0-b6da-a8234f49a695/_apis/git/repositories/dad04f6d-4e06-4420-b0bc-cb2dcfee2dcf","project":{"id":"82c93136-470c-4be0-b6da-a8234f49a695","name":"vishalmishra","url":"https://dev.azure.com/vishalmishra2424/_apis/projects/82c93136-470c-4be0-b6da-a8234f49a695","state":"wellFormed","revision":12,"visibility":"public","lastUpdateTime":"2021-04-22T14:24:47.2Z"},"size":0,"remoteUrl":"https://vishalmishra2424#dev.azure.com/vishalmishra2424/vishalmishra/_git/new","sshUrl":"git#ssh.dev.azure.com:v3/vishalmishra2424/vishalmishra/new","webUrl":"https://dev.azure.com/vishalmishra2424/vishalmishra/_git/new","isDisabled":false}
The snippet you posted looks like JSON, and a utility named file, which can guess the type of a file, says so too:
$ file output.txt
output.txt: JSON data
You should use JSON-aware tools to extract the value of id, for example jq:
$ jq -r '.id' output.txt
**dad04f6d-4e06-4420-b0bc-cb2dcfee2dcf**
or jshon:
$ jshon -e id < output.txt
"**dad04f6d-4e06-4420-b0bc-cb2dcfee2dcf**"

How to replace a html comments using shell script

I am trying to uncomment a line in an HTML file using a shell script, but I am not able to write a sed command for this.
I have a line
<!--<url="/">-->
I need to uncomment this line using shell script
<url="/"/>
sed -i -e "s|'<!--<url="/"/>-->'|<url="/">|g" myFile.html
Any idea how to replace this comment?
Use:
sed -re 's/(<!--)|(-->)//g'
e.g:
echo '<HTML><!--<url="/">--> <BODY>Test</BODY></HTML>' | sed -re 's/(<!--)|(-->)//g'
Like this?
sed -i 's|<!--<url="/">-->|<url="/">|g' myFile.html
It's better to use single quotes because they prevent the shell from interpreting anything inside them, including double quotes.
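If you use / as the delimiter of the s command, you need to escape every / in the pattern and the replacement with a backslash. (sed accepts other delimiters such as |, in which case no escaping is needed.) With / as the delimiter, use the following line: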
sed -i 's/<!--<url="\/">-->/<url="\/">/g' myFile.html
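The two sed commands above differ only in the delimiter of the s command. A quick check on the sample line, run on standard input rather than with -i so no file is modified, suggests both produce the same result:

```shell
html='<!--<url="/">-->'

# "|" as delimiter: the "/" in the pattern needs no escaping.
with_pipe=$(printf '%s\n' "$html" | sed 's|<!--<url="/">-->|<url="/">|g')

# "/" as delimiter: every "/" in pattern and replacement is escaped.
with_slash=$(printf '%s\n' "$html" | sed 's/<!--<url="\/">-->/<url="\/">/g')

printf '%s\n%s\n' "$with_pipe" "$with_slash"
```

Both variables end up holding the uncommented tag, so the choice of delimiter is a matter of readability.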

How to add a comma to the end of every line in a json file [duplicate]

Given a plain text document with several lines like:
c48 7.587 7.39
c49 7.508 7.345983
c50 5.8 7.543
c51 8.37454546 7.34
I need to add some info 2 spaces after the end of the line, so for each line I would get:
c48 7.587 7.39 def
c49 7.508 7.345983 def
c50 5.8 7.543 def
c51 8.37454546 7.34 def
I need to do this for thousands of files. I guess this is possible with sed, but I do not know how. Any hint? Could you also give me a link to a tutorial or reference table for these cases?
Thanks
If all your files are in one directory:
sed -i.bak 's/$/ def/' *.txt
To do it recursively (GNU find):
find /path -type f -iname '*.txt' -exec sed -i.bak 's/$/ def/' {} +
you can see here for introduction to sed
Other ways you can use:
awk
for file in *
do
awk '{print $0" def"}' "$file" >temp
mv temp "$file"
done
Bash shell
for file in *
do
while IFS= read -r line
do
printf '%s def\n' "$line"
done < "$file" >temp
mv temp "$file"
done
for file in ${thousands_of_files} ; do
sed -i.bak -e 's/$/  def/' "$file"
done
The key here is the search-and-replace s/// command. Here we replace the end of the line $ with 2 spaces and your string.
Find the sed documentation at http://sed.sourceforge.net/#docs
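To try the in-place approach safely before running it on thousands of files, here is a small sketch that works in a throwaway directory. The file names and the temporary directory are made up for illustration, and the replacement uses the two spaces the question asks for.

```shell
# Work in a throwaway directory with two hypothetical sample files.
workdir=$(mktemp -d)
printf '%s\n' 'c48 7.587 7.39' 'c49 7.508 7.345983' > "$workdir/a.txt"
printf '%s\n' 'c50 5.8 7.543' > "$workdir/b.txt"

# $ matches the end of each line; the replacement is two spaces plus "def".
# -i.bak edits in place and keeps the originals as a.txt.bak and b.txt.bak.
sed -i.bak 's/$/  def/' "$workdir"/*.txt

cat "$workdir/a.txt" "$workdir/b.txt"
```

The attached form -i.bak is accepted by both GNU and BSD sed, and the .bak copies make it easy to undo the change if the result is not what you expected.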

Search and replace html tags in sed recursively

I am trying to write a script to search and remove htm and html tags from all files recursively. The starting point is given as input in the command to run the script. The resultant files should be saved in new file at the same place ending with _changed. e.g., start.html > start.html_changed.
Here is the script I wrote so far. It works fine, but the output prints out to the terminal, and I want it to be saved in files respectively.
#!/bin/bash
sudo find $1 -name '*.html' -type f -print0 | xargs -0 sed -n '/<div/,/<\/div>/p'
sudo find $1 -name '*.htm' -type f -print0 | xargs -0 sed -n '/<div/,/<\/div>/p'
Any help is much appreciated.
The following script works just fine, but it is not recursive. How can I make it recursive?
#!/bin/bash
for l in /$1/*.html
do
sed -n '/<div/,/<\/div>/p' $l > "${l}_nobody"
done
for m in /$1/*.htm
do
sed -n '/<div/,/<\/div>/p' $m > "${m}_nobody"
done
Just edit the xargs part as follows:
xargs -0 -I {} sh -c "sed -n '/<div/,/<\/div>/p' {} > {}_changed"
Explanation:
-I {}: sets {} as the placeholder that xargs replaces with each file name
> {}_changed: redirects the output to a file with the _changed suffix
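One caveat with substituting {} directly into the sh -c string is that file names containing spaces or quotes become part of the command text. A possible variation, sketched here with find -exec instead of xargs, passes the names as arguments to the inner shell. The directory tree and file names below are invented for the demonstration.

```shell
# Throwaway tree with hypothetical files, including a directory with a space.
root=$(mktemp -d)
mkdir -p "$root/sub dir"
printf '%s\n' '<html>' '<div>' 'keep' '</div>' '</html>' > "$root/start.html"
printf '%s\n' 'x' '<div>' 'inner' '</div>' 'y' > "$root/sub dir/page.htm"

# find recurses; each file name reaches the inner shell as a positional
# argument, so it is never pasted into the command text.
find "$root" -type f \( -name '*.html' -o -name '*.htm' \) -exec sh -c '
    for f do
        sed -n "/<div/,/<\/div>/p" "$f" > "${f}_changed"
    done
' sh {} +

cat "$root/start.html_changed"
```

In a script, "$root" would be replaced by "$1", quoted so that a starting directory with spaces in its name still works.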