HTML to Excel: simple question

I have an "export to Excel" function for some tables. It works fine, but I have one problem.
To move to the next line I use <br />, but what if I want to move to the next column? What tag can I use to switch to the next column?
Thanks

Simple HTML tags are supported on a limited basis by Excel. There used to be a list of supported HTML tags as well as some HTML extensions supported by Excel (from Excel 97 onwards), but I can't find it on MSDN anymore. Here's an alternate link:
http://www.code4lifesoftware.com/articles/msexcelreadme.htm
The new XML/HTML format supported from Excel 2000 onwards is a lot more complex, and requires more work:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnoffxml/html/ofxml2k.asp
Take a look at these links, hopefully you'll find the syntax you're looking for!

In every Excel version where I've used this approach, the only way to move to another column was to use a table. You can mark up your HTML file with a table-based layout (although the W3C discourages layout tables) and place all of the nested data tables inside the main layout table. Unfortunately, there is no other way.
P.S.: See the Excel HTML format documentation, "Saving and Opening HTML Files".
The BR tag has an mso-data-placement style attribute that specifies where the data is stored. It can take one of the following string constants: new-cell starts a new cell in the next row after the break, and same-cell keeps the break inside the current cell.
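Following the table-based approach above, here is a minimal Python sketch of how an export routine might build such a table: each `<td>` becomes a separate Excel column, each `<tr>` a new row, and newlines inside a value become `<br>` tags styled with `mso-data-placement:same-cell` so the break stays within one cell. The helper name `rows_to_excel_html` is hypothetical, and the sketch assumes the export already produces an HTML string that Excel opens.

```python
from html import escape

# In-cell line break, per the mso-data-placement attribute described above.
SAME_CELL_BR = '<br style="mso-data-placement:same-cell" />'

def rows_to_excel_html(rows):
    """Build an HTML table: one <tr> per row, one <td> (Excel column) per value."""
    lines = ["<table>"]
    for row in rows:
        cells = []
        for value in row:
            # Escape &, <, > in the data, then join the lines with in-cell breaks.
            parts = str(value).split("\n")
            cells.append("<td>" + SAME_CELL_BR.join(escape(p) for p in parts) + "</td>")
        lines.append("<tr>" + "".join(cells) + "</tr>")
    lines.append("</table>")
    return "\n".join(lines)

html_doc = rows_to_excel_html([["Name", "Notes"], ["Alice", "line one\nline two"]])
print(html_doc)
```

Moving to the next column is simply a matter of closing one `<td>` and opening the next; `<br>` is only needed for breaks inside a cell.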

If you use commas and make your file a .csv, that would be one way. If you use tabs, then have it read as a tab delimited file. Basically, you need to tell Excel what your delimiter (separator character) is, and it will handle it from there.
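A minimal sketch of the delimited-file approach using Python's standard `csv` module; `rows_to_delimited` is a hypothetical helper name. The writer quotes any field that contains the delimiter, so a comma inside a value does not spill into the next column.

```python
import csv
import io

def rows_to_delimited(rows, delimiter=","):
    """Serialize rows as delimited text; the delimiter separates columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\r\n")
    writer.writerows(rows)
    return buffer.getvalue()

rows = [["Name", "City"], ["Alice", "Paris, France"]]
csv_text = rows_to_delimited(rows)                   # comma-separated (.csv)
tsv_text = rows_to_delimited(rows, delimiter="\t")   # tab-delimited
print(csv_text)
print(tsv_text)
```

With the comma delimiter, "Paris, France" is written in quotes; with tabs it needs no quoting, which is one reason tab-delimited output can be simpler for free-text data.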

Related

In LibreOffice Calc, How To Access Raw HTML in a Cell?

I have a LibreOffice Calc spreadsheet that has a column with HTML formatting in it, including links. There is some data in the HTML that I need to preserve and extract. This would be easy to do if I could access the raw HTML, but I cannot figure out how to access the raw HTML within LibreOffice itself, nor can I figure out how to export it.
I would ideally like to find a solution of some sort of formula or function I could place in a new column, which would make that column display the raw HTML so that I can work with it using basic string operations. Alternatively if there is some sort of in-app command like a change of formatting in the menu or a way to use "Paste Special..." to view the HTML, I could do that too.
Failing this, I could settle for a solution that would preserve the raw HTML when exporting the data to CSV format or some other easily-parsable format. I have unfortunately not figured out how to do this either; when I export to CSV the HTML formatting is lost.
As a last resort I would be open to a custom macro in Basic, but I would rather find a simpler solution.
I have also tagged this as OpenOffice because I suspect this question may only be relevant to aspects of LibreOffice that have not changed from OpenOffice, although if this is not the case I would be interested to know about it!

Can Word automatically find the titles and apply the corresponding style to them?

I'm desperately trying to convert an HTML file to Word or PDF with an up-to-date table of contents and page numbers (the HTML was itself generated from an R Markdown document).
When I open my HTML in Word, the 650-page (!) document shows neither page numbers nor a table of contents, although the titles are saved as titles.
One suggested explanation was that the document may have become corrupted during conversion, so the fix was to copy and paste all the text into another document and save it.
Indeed, when I copy and paste the text leaving all styles, pagination is possible.
But I need to have my titles and create an automatic table of contents!
I have the list of titles at the beginning of the document, but they are not identified as titles by Word.
Is there a way for Word to automatically apply the Heading 1 style to all sentences starting with 1, 2, 3, etc., the Heading 2 style to all sentences starting with 1.1, 1.2, 2.1, 2.3, etc., and so on?
Maybe a macro? (I don't know anything about VBA :( )
Thanks in advance!
You can easily do this without any VBA code, just by using Find and Replace with wildcards.
If you don't know how to use wildcards see: https://wordmvp.com/FAQs/General/UsingWildcards.htm
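The rule in the question (the depth of the leading number decides the heading level) can also be expressed as a regular expression. This Python sketch illustrates only the matching logic; it is not the Find and Replace method recommended above, and `heading_level` is a hypothetical helper. Like any pattern-based approach, it would also catch body sentences that happen to start with a number, such as "3 samples were discarded".

```python
import re

# A leading outline number ("1", "1.2", "2.3.1"), an optional trailing dot,
# then whitespace and at least one non-space character of title text.
HEADING_NUMBER = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+\S")

def heading_level(text):
    """Map '1 Intro' -> 1, '1.2 Scope' -> 2, '2.3.1 Detail' -> 3; None otherwise."""
    match = HEADING_NUMBER.match(text.strip())
    if match is None:
        return None
    # The number of dot-separated components is the heading level.
    return match.group(1).count(".") + 1

for line in ["1 Introduction", "1.2 Scope", "2.3.1 Detail", "Plain body text."]:
    print(repr(line), heading_level(line))
```

If the document were saved as .docx, the third-party python-docx library could apply the result by assigning `p.style = f"Heading {level}"` to each matching paragraph `p` in `document.paragraphs`; that step assumes Word's built-in Heading styles exist in the document.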

All paragraphs are empty in an opened document in python-docx

I do the following:
from docx import Document
document = Document('text.docx')
document.paragraphs[42].text
It returns '' whatever index I use, and a for loop to find and replace a word doesn't work either. But if I save the document with document.save('text2.docx'), the saved document is not empty.
The document is relatively big and contains a lot of different formatting, images, tables and styles.
My task is to find and replace a word in a .docx document and adjust the word that follows it, so I'd be glad if you could suggest another tool.
I ran into this problem and was able to read the document using docx2txt: https://pypi.org/project/docx2txt/
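A .docx file is a ZIP archive whose main text lives in `word/document.xml`, so a standard-library alternative to docx2txt is to read that part directly. This sketch (the helper `docx_paragraph_texts` is hypothetical, and it is not docx2txt's own code) collects every `<w:t>` under each `<w:p>`, including runs nested inside hyperlinks or tables. It only reads text: headers and footers are stored in separate parts, text in text boxes may appear twice (in the inner and the enclosing paragraph), and it does not by itself explain why python-docx returned empty strings.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML elements such as <w:p> and <w:t>.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def docx_paragraph_texts(source):
    """Return the text of every <w:p> in word/document.xml, in document order.

    `source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    texts = []
    for paragraph in root.iter(W_NS + "p"):
        # Collect every <w:t> below this paragraph, not only direct child runs.
        texts.append("".join(t.text or "" for t in paragraph.iter(W_NS + "t")))
    return texts

if __name__ == "__main__":
    for i, text in enumerate(docx_paragraph_texts("text.docx")):
        print(i, text)
```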

Comparison of HTML and plain text from SQL

There are two columns. One of them contains HTML and the other contains plain text. How can I compare them as two plain texts? The HTML-to-plain-text conversion should work the same way a browser does when you copy selected HTML to the clipboard and paste it into Notepad.
The answer to this SO question links to a user-defined function for stripping HTML tags from text. After doing this you can then compare with the plain text field, e.g.
SELECT * FROM YourTable
WHERE plainText = udf_stripHTML(htmlText)
SQL doesn't know that one column is HTML and the other is not.
If you just want to compare the precise content, use = or LIKE.
If you want to remove the tags, do precisely that: remove the tags from the HTML column, then compare the result with the plain-text column.
When you pull the values from the database, they are whatever data type your field contains. You can manipulate the strings any way you want in your programming language of choice (they should already be text if that is how they were stored).
SQL Server 2008 (and earlier) does not contain any function or code that can "natively" convert HTML into, err, non-HTML. You either need to write such a function yourself, or find a third-party utility that can do this. (Is there application code that does this? Perhaps read the data and run it through that app?)
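A rough application-side approximation of the tag-stripping idea, using Python's standard `html.parser`: block-level tags become line breaks, entities such as `&nbsp;` are decoded, and whitespace is collapsed on both sides before comparing. This is a sketch under assumptions, not a faithful copy of browser behaviour: it ignores CSS, does not skip `<script>`/`<style>` contents, and the list of block tags is a hypothetical choice. `html_to_text` and `same_text` are hypothetical names.

```python
import re
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collects text content, turning block-level tags into line breaks."""
    BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &nbsp;, &amp;, ...
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)

def _normalize(text):
    # Collapse whitespace (including non-breaking spaces) and drop blank lines.
    lines = (re.sub(r"\s+", " ", line).strip() for line in text.split("\n"))
    return "\n".join(line for line in lines if line)

def html_to_text(markup):
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()
    return _normalize("".join(parser.parts))

def same_text(markup, plain):
    """Compare an HTML value with a plain-text value as normalized text."""
    return html_to_text(markup) == _normalize(plain)
```

Rows read from the table could then be filtered with `same_text(html_value, plain_value)`, instead of calling a user-defined function inside the SQL query.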

Source text contains simple HTML. How can I simply format the text in MS Word?

I've inherited a project that stores basic HTML formatting (i.e. <b> and <i> tags) in a database and writes it out to a Word document. This is my first Word automation assignment, so be gentle!
Currently, a complicated function runs after the document is complete and searches for and replaces these tags. However, because it runs after the document is complete, any logic that is determined at run time (e.g. "insert a page break here") can lead to disastrous results. For example, if I have a large chunk of bold text, it takes up more space and pushes the line break down to the next page, resulting in a mostly blank page.
I believe the fix for this is to format the text as it comes from the database so the positioning logic will be correct. I don't want to call the complicated procedure multiple times as it is time consuming and our end users need this document as quickly as possible.
Is there an easy way to write HTML formatted text to a Word document without needing to find and replace every supported tag? I would think that there would be something within Word that could handle this automatically. Thanks in advance if you can point me in the right direction.
Try this:
First, save the HTML you are about to insert as an ordinary ".htm" file.
Then use the Range object and its InsertFile method to insert the ".htm" file at any given position:
Dim r As Range
Set r = ActiveDocument.Range
r.InsertFile FileName:=TempFilePath, Link:=False, ConfirmConversions:=False
Word should be smart enough to handle the HTML and do all of the format conversion on its own. Use CSS to control the finer parts of the formatting.
Delete the ".htm" file when done.
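The "save as .htm, style with CSS, delete when done" steps can be sketched as follows. The question does not say which language generates the document, so this Python version only illustrates the temp-file handling; `write_temp_htm` and its default CSS are hypothetical, and the returned path is what would be passed as `TempFilePath` to `InsertFile` on the VBA side.

```python
import os
import tempfile

def write_temp_htm(fragment, css="body { font-family: Calibri; font-size: 11pt; }"):
    """Wrap a stored HTML fragment (e.g. text with <b>/<i>) in a full page,
    save it as a temporary .htm file, and return the file path."""
    page = (
        '<html><head>'
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        f"<style>{css}</style></head>"
        f"<body>{fragment}</body></html>"
    )
    fd, path = tempfile.mkstemp(suffix=".htm")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(page)
    return path

if __name__ == "__main__":
    path = write_temp_htm("<b>Bold</b> and <i>italic</i> text")
    try:
        print(path)  # hand this path to Range.InsertFile in Word
    finally:
        os.remove(path)  # delete the .htm file when done
```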
Maybe you can invoke an embedded IE control (IWebBrowser2) to lay out the text, copy it to the clipboard as rich text, and finally paste it into Word as formatted RichText.