HTML/RTF string into RTF file - html

Does anybody know how to insert a formatted text string into an RTF file?
I am able to insert plain text into an RTF file (at any place in the document I want), but not formatted strings.
I know that when such a string is added to an RTF file, the RTF header also has to be updated. And here is the problem: I need to find out what has to be placed in the RTF header, and exactly where. Maybe there is a ready-made solution, but so far I cannot find one anywhere.
Normally I work with Java, but the problem is not necessarily tied to any language.

The .rtf specification talks about using a valid header. I hope this helps you get a validly formatted result.
Plain text in an .rtf file, without any valid formatting syntax, will produce nothing other than the given plain text.
Another way to get a neatly formatted rich text document is to use an .rtf editor or an .rtf compiler for the programming language you are using.
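To make the header point concrete, here is a minimal Python sketch (the helper names rtf_escape and append_bold are made up for illustration, and the same string handling carries over to Java). It assumes the document is a single well-formed RTF group and simply places a bold group before the final closing brace. Character formatting such as \b needs no header change; a new font or colour would need a matching entry in the header's \fonttbl or \colortbl, which this sketch does not handle.

```python
import struct


def rtf_escape(text: str) -> str:
    """Escape plain text for use inside an RTF group (hypothetical helper)."""
    out = []
    for ch in text:
        if ch in "\\{}":
            # Backslash and braces are RTF syntax and must be escaped.
            out.append("\\" + ch)
        elif ord(ch) < 128:
            # Note: newlines are passed through unchanged; RTF ignores them.
            out.append(ch)
        else:
            # \uN takes a signed 16-bit value per UTF-16 code unit;
            # the trailing "?" is the fallback for readers without Unicode.
            raw = ch.encode("utf-16-le")
            units = struct.unpack("<%dh" % (len(raw) // 2), raw)
            out.extend("\\u%d?" % unit for unit in units)
    return "".join(out)


def append_bold(rtf_doc: str, text: str) -> str:
    """Insert {\\b text} just before the closing brace of the document."""
    end = rtf_doc.rfind("}")
    if end == -1:
        raise ValueError("not an RTF document: no closing brace found")
    snippet = r"{\b " + rtf_escape(text) + "}"
    return rtf_doc[:end] + snippet + rtf_doc[end:]


# Assumed sample document: header with a one-entry font table, then body text.
doc = r"{\rtf1\ansi{\fonttbl{\f0 Arial;}}\f0 Plain text. }"
result = append_bold(doc, "bold {text}")
```

Wrapping the inserted text in its own group keeps the bold formatting from leaking into whatever follows it.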

Instead of dealing with the rtf property, you can use the Text property. Set the cursor to where you want to insert text. Then paste formatted text from another richtext box, or paste normal text and change its formatting.

Related

Can Excel functions recognize bold text?

For convenience's sake in something work-related, I need to convert text styles into HTML format. If I have, for example, the sentence "the sky is Blue" in an MS Word .doc document, I want to be able to copy it to Excel and have the bold portion written with HTML tags.
The question is: can Excel functions detect text styles? If so, which function would be the right one? I was thinking of SUBSTITUTE, but I'm not so sure anymore.
Any help would be appreciated!
I think this is better done in Word before you copy the text to Excel. I found this article about it (https://word.tips.net/T001904_Adding_Tags_to_Text.html). Basically, use Find and Replace: set the format you are looking for (italic, for example) and replace it with tags like this:
<i>^&</i>
The ^& part tells Word to include the string it found, so you do not lose the content; the tags are added before and after each string in the given format.
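The same wrapping can be done in code once the text is available as runs with a bold flag. The sketch below is Python and assumes the runs have already been extracted as (text, is_bold) pairs; the function name runs_to_html is hypothetical and not part of Word, Excel, or any library.

```python
from html import escape


def runs_to_html(runs):
    """Wrap bold runs in <b> tags; `runs` is a list of (text, is_bold) pairs."""
    parts = []
    for text, is_bold in runs:
        safe = escape(text)  # keep literal <, > and & from being read as markup
        parts.append(f"<b>{safe}</b>" if is_bold else safe)
    return "".join(parts)


# The sentence from the question, with only "Blue" in bold.
sentence = runs_to_html([("the sky is ", False), ("Blue", True)])
```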

write_html() method in fpdf not using font/encoding specified

I'm creating a PDF with a large collection of quotes that I've imported into python with docx2python, using html=True so that they have some tags. I've done some processing to them so they only really have the bold, italics, underline, or break tags. I've sorted them and am trying to write them onto a PDF using the fpdf library, specifically the pdf.write_html(quote) method. The trouble comes with several special characters I have, so I am hoping to encode the PDF to UTF-8. To write with .write_html(), I had to create a new class as shown in their readthedocs under the .write_html() method at the very bottom of the left hand side:
from fpdf import FPDF, HTMLMixin
class htmlFPDF(FPDF, HTMLMixin):
    pass
pdf = htmlFPDF()
pdf.add_page()
# set the overall PDF to utf-8 to preserve special characters
pdf.set_doc_option('core_fonts_encoding', 'utf-8')
pdf.write_html(quote)
The list of quotes that I have going into the pdf all appear with their special characters and the html tags (<u> or <i>) in the debugger, but after the .write_html() step they then show up in the pdf file with mojibake, even before being saved, as seen through debugger. An example being "dayâ€ÂTMs demands", when it should be "day's demands" (the apostrophe is curled clockwise in the quote, but this textbox doesn't support).
I've tried updating the font I use by
pdf.add_font('NotoSans', '', 'NotoSans-Regular.ttf', uni=True)
pdf.set_font('NotoSans', '', size=12)
added after the .add_page() method, but this doesn't change the current font (or fix mojibake) on the PDF unless I use the more common .write(text_height, quote) method, which renders the underline/italicize tags into the PDF as text. The .write() method does preserve the special characters. I'm not trying to change the font really, but make sure that what's written onto the PDF preserves the special characters instead of mojibake them.
I've also attempted some .encode/.decode action before going into .write_html(), as well as some methods from the ftfy library. I also tried adding '' to the start of each quote, to no effect.
If anyone has ideas for a way to iterate through each line on the PDF that'd be terrific, since then I could use ftfy to fix the mojibake. But ideally, it would be some other html tag at the start of each quote or a way to change the font/encoding of the .write_html() method, maybe in the class declaration?
Or perhaps I'm at a dead end and should just split each quote on '<', use if statements to detect underline, italics, etc., and use the .write() method after all?
Converting docx to HTML works really badly with docx2python. I did this a few months ago. I recommend PyDocX. docx2python is good for extracting the content of a docx file, not for converting it into HTML.
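The fallback mentioned in the question (splitting each quote on its tags and using .write()) could be sketched as follows. This uses only the standard library's html.parser; the class and function names are hypothetical, and it assumes the quotes contain only b, i, u and br tags, as described.

```python
from html.parser import HTMLParser


class StyleSegmenter(HTMLParser):
    """Split simple HTML into (text, style) segments, style being e.g. 'BI'."""

    TAG_STYLES = {"b": "B", "i": "I", "u": "U"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.active = []    # style letters of the tags currently open
        self.segments = []  # collected (text, style) pairs

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self.segments.append(("\n", ""))
        elif tag in self.TAG_STYLES:
            self.active.append(self.TAG_STYLES[tag])

    def handle_endtag(self, tag):
        style = self.TAG_STYLES.get(tag)
        if style in self.active:
            self.active.remove(style)

    def handle_data(self, data):
        # Combine all open styles into one sorted string such as "BU".
        self.segments.append((data, "".join(sorted(set(self.active)))))


def split_styled(html_text):
    parser = StyleSegmenter()
    parser.feed(html_text)
    parser.close()
    return parser.segments


segments = split_styled("day\u2019s <i>demands</i><br>next <b><u>x</u></b>")
```

Each segment could then be written with pdf.set_font('NotoSans', style, 12) followed by pdf.write(text_height, text). Bold or italic styles would presumably need the matching font files registered with add_font first; that part is untested here.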

All paragraphs are empty in an opened document in python-docx

I do the following:
from docx import Document
document = Document('text.docx')
document.paragraphs[42].text
And it gives me '' whatever index I enter, and my for loop to find and replace a word does not work. But if I save the document with document.save('text2.docx'), the saved document is not empty.
The document is relatively big and contains many kinds of formatting, images, tables, and styles.
My task is to find and replace a word in a docx document, with some correction of the word that follows it, so I will be glad if you can suggest another tool.
I ran into this problem and was able to read the document using docx2txt: https://pypi.org/project/docx2txt/
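Another way to check what the file actually contains is to read it directly: a .docx file is a zip archive whose main text lives in word/document.xml. The Python sketch below (the function name is hypothetical) collects the text of every paragraph element wherever it is nested, including paragraphs inside tables, which document.paragraphs in python-docx does not return. Whether nesting is the cause in this particular document is an assumption.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used by the w:p and w:t elements.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraph_texts(source):
    """Return the text of every w:p element in a .docx (path or file object)."""
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    texts = []
    for para in root.iter(W + "p"):
        # A paragraph's text is spread over one or more w:t elements.
        texts.append("".join(node.text or "" for node in para.iter(W + "t")))
    return texts
```

Calling docx_paragraph_texts('text.docx') and searching the result for the target word would show whether the text is there but nested somewhere python-docx's top-level paragraph list does not reach.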

Javascript: How to preserve line feeds, tabs and spaces in xml text nodes

An example will be simplest:
<?xml version="1.0"?><topNode>Some text.
A new line.
A new line after a blank line.
A new line after four spaces.
A new line after a tab.
</topNode>
These texts are edited by users in textareas, so I have no idea what will be in them, but the users will want to preserve the formatting.
So then I need to save the file.
First I use jQuery.parseXML() to put this into an xml object. Then I add lots of other xml nodes. Then I use new XMLSerializer().serializeToString(critXML) to get the whole thing back into text for saving. All the formatting is lost.
Code beautifying after the fact won't know where the original formatting was.
Question 1: How do I preserve the formatting in the xml?
Question 2: Is json better at preserving formatting? I would think it would be since json operations seem to preserve formatting generally. But I don't want to go down that road unless I know it will work.
I could manually construct the xml as text, but that seems like a pain.
Thanks in advance for any help!!!
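One way to narrow this down is to test whether a parse/serialize round trip by itself drops whitespace. The sketch below is Python rather than the JavaScript in the question, so it only illustrates the principle: XML parsers are expected to keep whitespace inside text nodes, and JSON strings keep it as well. The sample text is an assumed reconstruction of the example above, with the blank line, four spaces, and tab put back in.

```python
import json
import xml.etree.ElementTree as ET

text = (
    "Some text.\n"
    "A new line.\n"
    "\n"
    "A new line after a blank line.\n"
    "    A new line after four spaces.\n"
    "\tA new line after a tab.\n"
)

# XML round trip: build a node, serialize it, parse it back.
root = ET.Element("topNode")
root.text = text
xml_bytes = ET.tostring(root, encoding="utf-8")
xml_roundtrip = ET.fromstring(xml_bytes).text

# JSON round trip: the same text as a string value.
json_roundtrip = json.loads(json.dumps({"topNode": text}))["topNode"]
```

If the equivalent check in the browser also preserves the text, the formatting is more likely being lost elsewhere, for example when the saved text is displayed. One caveat: XML parsers normalize carriage returns to line feeds, so "\r\n" line endings do not survive unchanged.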

Saving text as HTML from form

I have a form with a text field that users input text into. They can use multiple lines, put in bold text, underlined text, etc., but the text, when saved to SQL Server doesn't have any formatting saved, just the text is saved. What is the best way to save the text with the HTML so that when it gets viewed by another user and pulled up from Sql Server the HTML is saved and the formatting is saved?
Ex.
hello
Paul
This would be saved as
helloPaul
You can't see it, but there are bold and carriage return HTML tags wrapped around the text.
When receiving data from the user, on the server side code, use HTML encode to safely store the data:
var inputData = Server.HtmlEncode("<strong>some data input from user</strong>"); //insert your user input data variable here
Then when displaying the data in your cshtml page, decode the data to display it as the user entered it:
HttpUtility.HtmlDecode(saveUserDataFromDatabaseVariable);
All this is assuming you have a rich text editor being plugged into the input field. CKEditor and TinyMCE are good ones.
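The encode-on-save, decode-on-display round trip described above can be illustrated outside ASP.NET as well. This is a Python sketch using the standard library's html.escape and html.unescape as stand-ins for Server.HtmlEncode and HttpUtility.HtmlDecode; the variable names are made up for the example.

```python
import html

# What a rich text editor might submit for "hello" in bold, a break, then "Paul".
user_input = "<strong>hello</strong><br>Paul"

# Encode before storing: the markup becomes inert text.
stored = html.escape(user_input)

# Decode when displaying: the original markup comes back unchanged.
displayed = html.unescape(stored)
```

Note that decoding restores whatever markup the user submitted, so the display step should only be applied to content that is trusted or has been sanitized.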
You can use a text editor. Take a look at CKEditor. It's free and easy to use :)
Can you post some code and more details?
I have had good success with CKEditor. It is customizable, and its content can easily be saved via postback to a standard asp:TextBox.
It is possible that the editor you are using is not actually updating the input/textarea that you are using, it may be cloning the text and drawing the formatting in an overlay. You can use developer tools, or javascript, to verify this by checking the value property of the input or textarea element. If it is being saved via AJAX or javascript the code may be using the textContent or innerText properties instead of innerHTML.
I used the richtexteditor DLL that's free online. It gave me a WYSIWYG box that the user can edit text in.