I have a .txt file that contains escape characters. You can't see them when you cat or more or "less -r" the file, but if you vi the file you see the following:
session_cache ^[[27G: 375755
Normal output from "cat" is:
session_cache : 375755
The ^[[27G is, I believe, an ANSI terminal code, maybe for a tab? I would like to be able to display this text file on a standard web page without seeing those escape characters.
Is this possible without having to convert the .txt file to HTML and manually remove all the different escape sequences (i.e. make HTML behave like cat does)?
Dan
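For reference, ^[[27G is a CSI cursor-positioning sequence ("move to column 27"), which is why cat looks aligned: the terminal interprets the sequence rather than printing it. One option is to strip such sequences before publishing the file. Here is a minimal Python sketch, with placeholder file names, that only covers CSI-style sequences like this one:
import re

# Matches CSI escape sequences such as ^[[27G (ESC [ ... final byte).
# Assumption: the file only contains this style of escape; other kinds
# (e.g. OSC sequences) would need additional patterns.
csi = re.compile(r'\x1b\[[0-9;?]*[ -/]*[@-~]')

with open("session.txt", encoding="utf-8", errors="replace") as src:
    cleaned = csi.sub("", src.read())

with open("session_clean.txt", "w", encoding="utf-8") as dst:
    dst.write(cleaned)
Note that stripping the sequence also drops the column alignment the terminal was producing; substituting a space or a tab for each match instead keeps the output readable.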
I'm using InDesign's data merge to make playing cards. Is there a way to include a non-breaking space in the data? I would like some words to be kept on the same line.
I tried copying and pasting a non-breaking space from InDesign and a web browser into the CSV file without success. &nbsp; doesn't work either.
I would recommend using a dummy string in your source and then doing the replacement later within InDesign (e.g. ##NBSP## > non-breaking space character).
Try using the Unicode character for the non-breaking space in InDesign: U+00A0. If it is anything like InDesign scripting, it should be entered as \u00A0 or "\u00A0".
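If the merge data is generated by a script, another option along the lines of the U+00A0 suggestion above is to write the character straight into the CSV. A minimal Python sketch (cards.csv and the column contents are made up for illustration):
import csv

NBSP = "\u00a0"  # the non-breaking space character (U+00A0)

# Hypothetical merge data; the NBSP keeps "3 damage" on one line in InDesign.
rows = [
    ["Name", "Rule"],
    ["Fireball", "Deal 3" + NBSP + "damage"],
]

with open("cards.csv", "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(rows)
When placing or merging the file, pick UTF-8 in the import options so the character survives the round trip.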
You should combine your copy and paste with Einar's Unicode method. That is, copy a non-breaking space from InDesign and paste it into your CSV file. Then, when you read the CSV file, either via a script or using InDesign's Place option, make sure you read the file as Unicode. That will ensure the character is preserved, because the non-breaking space is a Unicode character.
I copied and pasted the non-breaking space from InDesign into the CSV file using Brackets. When I placed the file using the Place option, I made sure I used UTF-8 (in the Import Options dialog box), and it preserved the space for me.
Is that what you are trying to achieve? Does that help?
Thanks,
Abdul
I have the following HTML:
<html><body><p>n<sup>th</sup></p></body></html>
I am using the command:
$ libreoffice --convert-to docx:"MS Word 2007 XML" test.html
to convert that HTML into a DOCX file. However, I notice that the resulting DOCX file does not actually contain a <w:vertAlign> element for the superscript. It looks like it is using position and size to replicate it instead:
<w:position w:val="8"/><w:sz w:val="19"/>
What I would need to know is how to make libreoffice put in the <w:vertAlign> tag instead of using position and size.
Additional Info:
I had a similar problem with bold and italics (<strong> and <em>), but I was able to get the conversion to work correctly by converting the strong and em tags to b and i tags, respectively.
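For reference, that tag rewrite is the kind of thing a small pre-processing step can do before calling libreoffice. A minimal Python sketch (file names are placeholders, and it only handles attribute-less tags):
import re

with open("test.html", encoding="utf-8") as f:
    html = f.read()

# Rewrite <strong>/<em> (and their closing tags) to <b>/<i>.
# Assumption: the tags carry no attributes.
html = re.sub(r"<(/?)strong>", r"<\1b>", html)
html = re.sub(r"<(/?)em>", r"<\1i>", html)

with open("test-pre.html", "w", encoding="utf-8") as f:
    f.write(html)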
If you are looking to edit the HTML, it would be much better to use a tool that is suited for editing HTML, such as Notepad++ or Sublime (as examples).
If you need to have the HTML as a LibreOffice document for a specific reason, you could open the HTML file in Notepad and save as a text file with .txt as the extension. That should allow you to open the document in LibreOffice.
You can try using a WYSIWYG (What You See Is What You Get) editor like TinyMCE (http://www.tinymce.com/). There are lots of them online, and you can also find some desktop applications. But if you want to convert to DOCX, you can try http://htmltodocx.codeplex.com/; it is written in PHP, uses PHPWord, and is quite efficient.
Just create a Python script that replaces your unwanted tags with the <w:vertAlign> tag wherever needed.
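A DOCX file is just a ZIP archive with the main XML in word/document.xml, so one way to sketch such a script is to rewrite the run properties after the conversion. A minimal sketch (the position/size values are the ones from the question and may need adjusting to whatever LibreOffice actually emits; file names are placeholders):
import re
import zipfile

SRC = "test.docx"        # output of the libreoffice conversion
DST = "test-fixed.docx"

# Replace the position/size pair LibreOffice emitted with a real vertAlign.
pattern = re.compile(r'<w:position w:val="8"/><w:sz w:val="19"/>')
replacement = '<w:vertAlign w:val="superscript"/>'

with zipfile.ZipFile(SRC) as zin, zipfile.ZipFile(DST, "w", zipfile.ZIP_DEFLATED) as zout:
    for item in zin.infolist():
        data = zin.read(item.filename)
        if item.filename == "word/document.xml":
            data = pattern.sub(replacement, data.decode("utf-8")).encode("utf-8")
        zout.writestr(item, data)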
The command works fine if you replace 'docx' with 'xml', like this:
libreoffice --convert-to xml:"MS Word 2003 XML" test.html
I have to load a CSV file as a resource file, and the output needs to be displayed. Now the issue is how I should display a message like "Thanks for answering" in red colour from the CSV file. I need to use HTML tags in the CSV file, like
<font color="red"></font>
but the page that is reading this CSV file displays the content along with the HTML tags.
CSV stands for "Comma/Character Separated Values" – plain text, no formatting, except for the optional header line and the character that separates the values.
If you need to define formatting like font colors in the source file, you would have to use another file format like (X)HTML, XLS or RTF.
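If the page that reads the CSV is under your control, a different approach is to keep the CSV as plain text and add the markup at display time instead. A rough Python sketch of that idea (messages.csv and the single-column layout are assumptions):
import csv
import html

# Assumption: messages.csv holds one message per row in its first column.
with open("messages.csv", newline="", encoding="utf-8") as f:
    for row in csv.reader(f):
        if row:
            print('<span style="color: red;">%s</span>' % html.escape(row[0]))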
I've been attempting to open .epub files in vim for reading (yes it's silly, let's ignore that for now) and I'm having trouble with how the internal html of epubs displays characters such as ' and " among other things.
Vim displays ' as â~#~Y while opening the file with less gives me <E2><80><99>. I'm not sure how vim deals with this (it seems to treat ~# and ~Y as single characters) and as such I'm not sure how to go about replacing the special HTML characters with their utf-8 equivalent.
Is there a encoding setting that will display this properly? Or a way to manually input these characters such that I could create a search and replace macro?
Thanks
It looks like Vim doesn't properly detect the UTF-8 encoding; you can check with
:setlocal fileencoding?
and force UTF-8 with
:edit ++enc=utf-8 file.epub
(or tweak your 'fileencodings' option to have it automatically detected).
I have a collection of HTML files that I gathered from a website using wget. Each file name is of the form details.php?id=100419&cid=13%0D, where the id and cid vary. Portions of the HTML files contain articles in an Asian language (Unicode text). My intention is to extract the Asian-language text only. Dumping the rendered HTML using a command-line browser is the first step I have thought of; it will eliminate some of the frills.
The problem is, I cannot dump the rendered HTML of a local file (using, say, w3m -dump). The dumping works only if I point the browser (at the command line) to the properly formed URL: http://<blah-blah>/<filename>. But that way I would have to spend the time downloading the files once again from the web. How do I get around this, and what other tools could I use?
w3m -dump <filename> complains saying:
w3m: Can't load details.php?id=100419&cid=13%0D.
file <filename> shows:
details.php?id=100419&cid=13%0D: Non-ISO extended-ASCII HTML document text, with very long lines, with CRLF, CR, LF, NEL line terminators
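One thing to try, sketched in Python below, is to normalise the awkward file names (the & and the literal %0D in the name) and tell w3m the content type explicitly with -T text/html, since the files have no .html extension. Whether that is enough to cure the "Can't load" error is an assumption, and the directory layout and output naming here are made up:
import glob
import os
import subprocess

# Assumption: the wget output sits in the current directory.
for path in glob.glob("details.php*"):
    safe = path.replace("&", "_").replace("%0D", "") + ".html"
    os.rename(path, safe)
    # -T text/html tells w3m how to render a file whose name lacks .html
    dump = subprocess.run(
        ["w3m", "-dump", "-T", "text/html", safe],
        capture_output=True, encoding="utf-8", errors="replace",
    ).stdout
    with open(safe + ".txt", "w", encoding="utf-8") as out:
        out.write(dump)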