Print text from specific line number of an HTML file

I'm trying to find a way to print the text from a specific line number of an HTML file.
I've found ways to print the line numbers of lines that match a text search, but I want to do the reverse: print the text at a given line number. The line number stays constant, but the text there may change.

One way
$ cat foo.txt
dog
bird
monkey
$ sed '2!d' foo.txt
bird
In simple terms: delete every line that is not line 2.
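The same selection can also be written with sed -n and the p command, which makes it easy to pass the line number in a variable. This is a minimal sketch; the file page.html, its contents, and the variable name n are made up for illustration and are not from the question:

```shell
# Work in a scratch directory so nothing existing is overwritten.
workdir=$(mktemp -d)

# Hypothetical sample file, for illustration only.
printf '%s\n' '<html>' '<title>dog</title>' '<body>bird</body>' '</html>' > "$workdir/page.html"

n=3   # the line number to print; it stays fixed while the text may change

# -n turns off automatic printing; on line n, print it and quit,
# so the rest of the file is never read.
result=$(sed -n "${n}{p;q;}" "$workdir/page.html")
printf '%s\n' "$result"
```

Quitting right after the match should matter only for large files; for small ones, sed -n "${n}p" behaves the same.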

Related

Print a line and color the word that matches a variable

I'm writing in Tcl and want to print a whole line, coloring a word (or part of a word) if it matches a variable.
Example:
set line "size_cell inst BUFF8"
set cell "BUFF"
When printing the line, I want to color the word "BUFF".
Any advice would be appreciated.

Format XML using command line

I have an HTML text file and I want to format it so that each paragraph is always on a single line, e.g.
<p>paragraph info here</p>
instead of
<p>paragraph
info here </p>
Is there a tool that enables me to do this?
You can use sed
cat test.html |sed ':a;N;$!ba;s/\n/ /g' |sed 's/<\/p> /<\/p>\n/g'
The first sed removes all line breaks, and the second adds one back after each closing paragraph tag.
It is not very readable, but it works.
While the requirement "paragraphs are always on the same line" would be met by simply joining the whole file into a single line, this solution is less radical:
perl -pe 'if (/<p>/../<\/p>/) { s/\n/ / unless /<\/p>/ }' test.html

sed search for html tags and leave only first and last entry

I use sed to manipulate an HTML file so that I can import it into WordPress.
Now I have a problem unifying tags,
e.g.
`<Article> .... <ShortCut>... some text <ShortCut> some more text ... </ShortCut>
<ShortCut> some more text ... </ShortCut></ShortCut> </Article>...`
The result should be:
`<Article> .... <ShortCut>... some text some more text ... some more text ... </ShortCut>
</Article>...`
Is there a way with sed to remove all these ShortCut tags and keep only the first opening tag and the last closing tag within each Article?
Thanks for any help!
Update: the input file contains more than one article, so I can only consolidate the ShortCuts per Article section.
Using awk
awk -F"</?ShortCut>" '{printf "%s <ShortCut>",$1; for (i=2;i<NF;i++) printf "%s",$i;print "</ShortCut> " $NF}' file
<Article> .... <ShortCut>... some text some more text ... some more text ... </ShortCut> </Article>...

File composition using command line tools (Linux / Mac)

I have a file containing some text and a placeholder, and another file with some other text.
E.g.:
myText.txt:
some text strings plus a {{myPlaceholderText}} and some more text
myPlaceholderText.txt:
more text here
I want to be able to create a 3rd file containing the string:
"some text strings plus a more text here and some more text"
Is it possible to do that using command line tools?
I think sed is the easiest way to do it:
$ sed "s/{{myPlaceholderText}}/$(<myPlaceholderText.txt)/g" myText.txt
some text strings plus a more text here and some more text
Yes. And bash is the safest common tool besides interpreted languages.
#!/bin/bash
R=$(<myPlaceholderText.txt)
while IFS= read -r LINE; do
printf '%s\n' "${LINE//'{{myPlaceholderText}}'/$R}"
done < myText.txt > another_file.txt
Output to another_file.txt:
some text strings plus a more text here and some more text
Another through awk:
awk 'BEGIN{getline r < ARGV[1];ARGV[1]=""}{gsub(/{{myPlaceholderText}}/,r)}1' myPlaceholderText.txt myText.txt > another_file.txt
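One caveat with the sed answer above: the placeholder file's content is pasted into the replacement part of the s command, so a /, & or backslash in it would be interpreted by sed. A minimal sketch of escaping those characters first, assuming the replacement is a single line; the sample replacement text and the variable names are made up for illustration:

```shell
# Work in a scratch directory so nothing existing is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample inputs; the replacement text is hypothetical and deliberately
# contains characters that are special in a sed replacement.
printf '%s\n' 'some text strings plus a {{myPlaceholderText}} and some more text' > myText.txt
printf '%s\n' 'more text/here & there' > myPlaceholderText.txt

replacement=$(cat myPlaceholderText.txt)

# Put a backslash in front of every /, & and \ so sed treats them literally.
# The | delimiter keeps the slash inside the bracket expression unambiguous.
escaped=$(printf '%s' "$replacement" | sed 's|[/&\\]|\\&|g')

sed "s/{{myPlaceholderText}}/$escaped/g" myText.txt > another_file.txt
result=$(cat another_file.txt)
printf '%s\n' "$result"
```

A replacement that spans several lines would still need extra handling, which is one reason the bash and awk variants may be the more robust choice.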

New lines after grep a binary file are missing

I'm trying to get the text from text layers in a PSD file under Linux.
Right now I'm using:
egrep -a 'LayerText' file.psd
<photoshop:LayerText>免费获得宝贵资源! \ 工业现场过程仪表校准测试和维护诊断的必备工具 福禄克过程校准器,为工作在过程行业的技术工程师,自动化系统维护和仪表工程师,质量控制工程师,计量人员提供全面的工业校准测试和维护诊断工具:包括智能认证校准器,多功能信号校准器,压力校准器,温度校准器,环路校准器以及其他过程信号故障诊断和检测工具。FLUKE过程校准及检测工具,在化工、电力、石油、纸浆、食品饮料、制造业和污水处 理/给排水等行业的现场校准及检测维护方面处于世界领先水平。过程校准的全系列产品,从简单的回路校准器到复杂的文档化全功能过程校准器,可以提供各种必需的温度、压力、电流、电压以及电阻和频率的校准。来自福禄克750系列的校准管理软件,更是满足了用户 日益增长的对现场仪表校准数据进行归档整理的需求</photoshop:LayerText>
But in Photoshop, the text layer has line breaks:
免费获得宝贵资源!
工业现场过程仪表校准测试和维护诊断的必备工具
How can I parse and output the text separated by real newlines, instead of all on one single line?
Thanks in advance.
It looks like your file does not contain a regular newline character at that point but something else (it looks like a backslash with a space on either side).
If you want to split the text on this rather unusual line separator, you can do that with, for example, sed(1).
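A minimal sketch of that sed approach, assuming the separator really is the three characters space, backslash, space, as guessed above. The sample string and variable names are made up for illustration:

```shell
# Hypothetical sample line standing in for the egrep output.
sample='<photoshop:LayerText>first line \ second line</photoshop:LayerText>'

# "\\" in the pattern matches one literal backslash. In the replacement,
# a backslash followed by an actual line break inserts a newline, which
# works in both GNU and BSD sed.
result=$(printf '%s\n' "$sample" | sed 's/ \\ /\
/g')
printf '%s\n' "$result"
```

If the separator turns out to be a different byte sequence, piping the egrep output through od -c should show what to match instead.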