File composition using command line tools (Linux / Mac) - html

I have a file containing some text and some kind of placeHolder, and another file with some other text
Eg:
myText.txt:
some text strings plus a {{myPlaceholderText}} and some more text
myPlaceholderText.txt:
more text here
I want to be able to create a 3rd file containing the string:
"some text strings plus a more text here and some more text"
Is it possible to do that using command line tools?

I think sed is the easiest way to do it:
$ sed "s/{{myPlaceholderText}}/$(<myPlaceholderText.txt)/g" myText.txt
some text strings plus a more text here and some more text
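Since the question asks for a third file, here is a minimal sketch of the same substitution with the output redirected. The name result.txt is an assumption, not something from the question, and `$(cat …)` is used instead of `$(<…)` so it also runs in a plain POSIX shell. It will misbehave if the replacement text contains `/`, `&`, a backslash or a newline, because sed treats those specially.

```shell
# Recreate the two sample files from the question.
printf '%s\n' 'some text strings plus a {{myPlaceholderText}} and some more text' > myText.txt
printf '%s\n' 'more text here' > myPlaceholderText.txt

# Substitute the placeholder and write the result to a third file.
# result.txt is a hypothetical output name.
sed "s/{{myPlaceholderText}}/$(cat myPlaceholderText.txt)/g" myText.txt > result.txt

cat result.txt
```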

Yes. Besides interpreted languages, bash is the safest of the common tools for this.
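#!/bin/bash
# Read the replacement text once.
R=$(<myPlaceholderText.txt)
# IFS= and -r keep leading whitespace and backslashes intact.
while IFS= read -r LINE; do
    printf '%s\n' "${LINE//'{{myPlaceholderText}}'/$R}"
done < myText.txt > another_file.txt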
Output to another_file.txt:
some text strings plus a more text here and some more text
Another through awk:
awk 'BEGIN{getline r < ARGV[1];ARGV[1]=""}{gsub(/{{myPlaceholderText}}/,r)}1' myPlaceholderText.txt myText.txt > another_file.txt

Related

Shell bash how to delete text between pattern not inclusive?

I am trying to delete the text between <pre></pre> html tags using:
sed -i '/<pre>/,/<\/pre>/d' file.html
But this deletes the <pre></pre> tags too. There is only one pre tag pair in the file.
How can I avoid deleting the pre tags?
Thanks.
This might work for you (GNU sed):
sed -n '/<pre>/{p;:a;n;/<\/pre>/!ba};p' file
Turn off implicit printing by using the -n option.
If a line contains <pre>, print it and fetch the next line.
If that line does not contain </pre> loop back and repeat.
Otherwise print all other lines.
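To check that behaviour, the script can be run on a small sample. This is only a sketch: sample.html and stripped.html are hypothetical names, and the script is split across several -e options so that each label ends at the end of its chunk, which also suits seds other than GNU sed.

```shell
# Hypothetical sample file with a single <pre> block.
cat > sample.html <<'EOF'
<html>
<pre>
line one
line two
</pre>
<p>kept</p>
</html>
EOF

# Print the <pre> line, skip lines until </pre>, then print everything else.
sed -n -e '/<pre>/{p;:a' -e 'n;/<\/pre>/!ba' -e '};p' sample.html > stripped.html

cat stripped.html
```

The two lines between the tags are dropped, while the tags themselves and the surrounding lines are kept.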

Bash: Content between two complex Patterns - html

I have tried multiple times to get digits between two html patterns.
Neither sed nor awk worked for me, since the examples on the internet were too simple to fit my task.
Here is the code I want to filter:
....class="a-size-base review-text">I WANT THIS TEXT</span></div> ....
So I need a command that outputs I WANT THIS TEXT, i.e. the text between ...review-text"> and </span>
Do you have a clue? Thanks for the effort and greetings from Germany.
Try:
tr '\n' ' ' < file.html | grep -o 'review-text">[^<>]*</span> *</div>' | cut -d'>' -f2 | cut -d'<' -f 1
It should work if there are no tags inside "I WANT THIS TEXT".
I can't see the problem here, supposing the text you want to extract contains neither < nor >.
For instance, with a POSIX regular expression:
$ HTML_FILE=/tmp/myfile.html
$ sed -n "s/.*review-text.>\([^<]*\)<.*/\1/gp" "$HTML_FILE"
This prints the text between the HTML tags.
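One thing worth knowing about the sed approach: the leading `.*` is greedy, so when a line holds several reviews only the last one is printed. The sketch below shows this on a made-up file (reviews.html and its contents are assumptions) and uses `grep -o` to list every match on the line instead.

```shell
# Hypothetical input: two reviews on one line.
printf '%s\n' '<div><span class="a-size-base review-text">first review</span></div><div><span class="a-size-base review-text">second review</span></div>' > reviews.html

# sed: the greedy .* skips ahead to the last review-text on the line.
last=$(sed -n 's/.*review-text">\([^<]*\)<.*/\1/p' reviews.html)

# grep -o: one output line per match, then cut away the attribute remainder.
all=$(grep -o 'review-text">[^<]*' reviews.html | cut -d'>' -f2)

printf 'sed:\n%s\n' "$last"
printf 'grep -o:\n%s\n' "$all"
```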

Format XML using command line

I have an html text file and I want to format it so that paragraphs are always on the same line, e.g.
<p>paragraph info here</p>
instead of
<p>paragraph
info here </p>
Is there a tool that enables me to do this?
You can use sed:
cat test.html |sed ':a;N;$!ba;s/\n/ /g' |sed 's/<\/p> /<\/p>\n/g'
The first sed removes all line breaks, and the second adds one back after each closing paragraph tag.
It is not very readable, but it works.
While the requirement "paragraphs are always on the same line" would be met by simply joining the whole file into a single line, this solution is less radical:
perl -pe 'if (/<p>/../<\/p>/) { s/\n/ / unless /<\/p>/ }' test.html

sed search for html tags and leave only first and last entry

I use sed to manipulate an html file so that I can import it into WordPress.
Now I have a problem unifying tags,
e.g.
`<Article> .... <ShortCut>... some text <ShortCut> some more text ... </ShortCut>
<ShortCut> some more text ... </ShortCut></ShortCut> </Article>...`
The result should be:
`<Article> .... <ShortCut>... some text some more text ... some more text ... </ShortCut>
</Article>...`
Is there a way with sed to remove all these ShortCut tags and leave only the first and the last one between the Article tags?
Thanks for any help!
Update: the input file contains more than one article, so I can only consolidate the ShortCuts per Article section.
Using awk
awk -F"</?ShortCut>" '{printf "%s <ShortCut>",$1; for (i=2;i<NF;i++) printf "%s",$i;print "</ShortCut> " $NF}' file
<Article> .... <ShortCut>... some text some more text ... some more text ... </ShortCut> </Article>...
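The awk command works line by line, so it handles several articles only if each Article section sits on its own line. That layout is an assumption about the input, as are the file name and contents in this sketch.

```shell
# Hypothetical input: one article per line, with nested and repeated ShortCut tags.
cat > articles.txt <<'EOF'
<Article>a<ShortCut>b<ShortCut>c</ShortCut><ShortCut>d</ShortCut></ShortCut></Article>
<Article>x<ShortCut>y</ShortCut></Article>
EOF

# Split each line on opening/closing ShortCut tags and re-wrap the middle fields once.
merged=$(awk -F"</?ShortCut>" '{printf "%s <ShortCut>",$1; for (i=2;i<NF;i++) printf "%s",$i; print "</ShortCut> " $NF}' articles.txt)

printf '%s\n' "$merged"
```

Each output line ends up with exactly one opening and one closing ShortCut tag.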

New lines after grep a binary file are missing

I'm trying to get the text from text layers in a PSD file, under Linux.
Now I'm using:
egrep -a 'LayerText' file.psd
<photoshop:LayerText>免费获得宝贵资源! \ 工业现场过程仪表校准测试和维护诊断的必备工具 福禄克过程校准器,为工作在过程行业的技术工程师,自动化系统维护和仪表工程师,质量控制工程师,计量人员提供全面的工业校准测试和维护诊断工具:包括智能认证校准器,多功能信号校准器,压力校准器,温度校准器,环路校准器以及其他过程信号故障诊断和检测工具。FLUKE过程校准及检测工具,在化工、电力、石油、纸浆、食品饮料、制造业和污水处 理/给排水等行业的现场校准及检测维护方面处于世界领先水平。过程校准的全系列产品,从简单的回路校准器到复杂的文档化全功能过程校准器,可以提供各种必需的温度、压力、电流、电压以及电阻和频率的校准。来自福禄克750系列的校准管理软件,更是满足了用户 日益增长的对现场仪表校准数据进行归档整理的需求</photoshop:LayerText>
But in Photoshop the text layer has line breaks after:
免费获得宝贵资源!
工业现场过程仪表校准测试和维护诊断的必备工具
How can I parse and output the text separated by real newlines instead of all on one single line?
Thanks in advance.
It looks like your file does not contain a regular newline character but something else (it looks like a backslash with a space on either side).
If you want to split the text into lines at this rather unusual-looking separator, you can do that with sed(1), for example.
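A minimal sketch of that idea, assuming the separator really is space, backslash, space as it appears in the grep output (the actual bytes in the PSD may differ). The sample string is made up. The replacement uses a backslash followed by a literal line break, which works in both GNU and BSD sed.

```shell
# Hypothetical sample containing the assumed separator " \ ".
sample='first line \ second line \ third line'

# Replace every " \ " with a real newline.
split=$(printf '%s\n' "$sample" | sed 's/ \\ /\
/g')

printf '%s\n' "$split"
```

With the real file, the same sed command could be placed after the egrep call in a pipeline.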