In LibreOffice Calc, How To Access Raw HTML in a Cell? - html

I have a LibreOffice Calc spreadsheet that has a column with HTML formatting in it, including links. There is some data in the HTML that I need to preserve and extract. This would be easy to do if I could access the raw HTML, but I cannot figure out how to access the raw HTML within LibreOffice itself, nor can I figure out how to export it.
I would ideally like to find a solution of some sort of formula or function I could place in a new column, which would make that column display the raw HTML so that I can work with it using basic string operations. Alternatively if there is some sort of in-app command like a change of formatting in the menu or a way to use "Paste Special..." to view the HTML, I could do that too.
Failing this, I could settle for a solution that would preserve the raw HTML when exporting the data to CSV format or some other easily-parsable format. I have unfortunately not figured out how to do this either; when I export to CSV the HTML formatting is lost.
As a last resort I would be open to a custom macro written in Basic, but I would rather find a simpler solution.
I have also tagged this as OpenOffice because I suspect this question may only be relevant to aspects of LibreOffice that have not changed from OpenOffice, although if this is not the case I would be interested to know about it!

Related

How do I convert a dynamic coded hyperlink in excel to the equivalent HTML code?

I've found a similar question, but the answer didn't solve my issue. I'm trying to do a similar thing as this post, but I'm not sure if my original Excel cells are populated with the same thing.
Mine don't have a fixed link, but rather reference other cells using the following formula:
=HYPERLINK("https://www.website.com/search/?search="&B2, "View")
I've tried running the VBA code from the linked post above, but no luck. Is there a tweak for this to populate the resolved URL into the proper HTML code using "View" for the hyperlink text? The value in cell B2 is a number, let's say 12345.
So I'd like the end result to populate the cell with:
<a href="https://www.website.com/search/?search=12345">View</a>
End goal: I'm trying to export the Excel data as HTML table code, so trying to prepare the cells for proper HTML format to display the links on the website. Any export method I've found just exports the hyperlink cell as plain text "View" which is obviously not the desired result. If I can convert these cells before the export, then that solution would work fine.
Alternately, if there's a way to directly export the entire spreadsheet to an HTML encoded table (while also converting the hyperlinks as above), that would be even better. Note: the export to website function within Excel (using 2016) does not work...I need simple, plain HTML list code that doesn't reference the original spreadsheet.
Sorry if I've misunderstood your question, but does this help?
Function for Column B
="<a href='https://www.website.com/search/?search="&A2&"'>View</a>"
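If the export ends up being done by a script rather than by a worksheet formula, the same concatenation can be done outside Excel. This is only a minimal Python sketch of that idea, not something from the linked post: the base URL comes from the question, while `link_cell` and its parameters are hypothetical names made up for illustration.

```python
from html import escape
from urllib.parse import quote_plus

# Base URL taken from the question; everything else here is illustrative.
BASE_URL = "https://www.website.com/search/?search="


def link_cell(search_value, text="View"):
    """Build the HTML anchor for one cell value (hypothetical helper)."""
    # URL-encode the value so spaces or symbols cannot break the link.
    url = BASE_URL + quote_plus(str(search_value))
    # Escape both parts so the result is safe to drop into table markup.
    return '<a href="{}">{}</a>'.format(escape(url, quote=True), escape(text))


print(link_cell(12345))
# <a href="https://www.website.com/search/?search=12345">View</a>
```

Unlike the formula, this encodes the search value, so a cell containing a space or an ampersand would still produce a valid link.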

Extracting JSON data from html source for use with jsonlite in R

I have a background in data and have only just started scraping, so forgive me if my knowledge of web standards and languages is not up to scratch.
I am trying to scrape some data from a javascript component of a website I use. Viewing the page source I can actually see the data I need already there within javascript function calls in JSON format. For example it looks a little like this.
<script type="text/javascript">
$(document).ready(function () {
gameState = 4;
atView.init("/Data/FieldView/20152220150142207",{"a":[{"co":true,"col":"Red"}],"b":false,...)
meLine.init([{"c":100,"b":true,...)
</script>
Now, I only need the JSON data in meLine.init. If I physically copy/paste only the JSON data into a file I can then convert that with jsonlite in R and have exactly what I need.
However I don't want to have to copy/paste multiple pages so I need a way of extracting only this data and leaving everything else behind. I originally thought to save the html source code to R, convert to text and try and regex match "meLine.init(", but I'm not really getting anywhere with that. Could anyone offer some help?
Normally I'd use XML and xpath to parse an html page but in this case (since you know the exact structure you're looking for) you might be able to do it directly with a bit of regular expressions (this is generally not a good idea as emphasized here). Not sure if this gets you exactly to your goal but
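# Keep only the line containing the call, then strip everything except its argument.
sub("^.*meLine\\.init\\((.+)\\).*$", "\\1",
    grep("meLine.init", readLines("file://test.html"), fixed=TRUE, value=TRUE),
    perl=TRUE)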
will return the line you're looking for, and then you can work your magic with jsonlite. The idea is to read the page line by line, grep the (hopefully) single line that contains the string meLine.init, and then extract the JSON string from that line. Replace file://test.html with the URL you want to use.
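For comparison, the same line-by-line idea can be sketched in Python. This is an illustration under the same assumptions as above, not a general HTML parser: the call is assumed to sit on a single line and to take exactly one JSON argument, and `extract_init_args` is a hypothetical helper name.

```python
import json
import re


def extract_init_args(html, call="meLine.init"):
    """Return the parsed JSON argument of the first `call(...)` found, or None."""
    # Greedy match: capture everything up to the last ")" on the line.
    pattern = re.escape(call) + r"\((.+)\)"
    for line in html.splitlines():
        match = re.search(pattern, line)
        if match:
            return json.loads(match.group(1))
    return None


page = """<script type="text/javascript">
    gameState = 4;
    meLine.init([{"c":100,"b":true}]);
</script>"""
print(extract_init_args(page))
```

A call with several arguments, such as atView.init in the question, would need the argument list split before parsing, because the captured text as a whole is not valid JSON.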

convert special characters html to excel in vb.net

I have a web page which I want to convert to Excel. I have created an HTML file with a gridview and then convert it to Excel. The problem is that in the Excel file, some of the columns are shown in the General format, like this: 6.5E15. However, they are credit card numbers and should not be shown this way, so the user has to change the cell format to Number manually to see the whole credit card number. What should I do to make this right in my code?
Well, I read the posts but they did not help me. In the end I used the String.Format function to change the format of the string and put some spaces in the middle of it, so that Excel would not be able to convert it to a number. As a result, 6037991497126305 was shown as 6037 9914 9712 6305. That solved my situation, because card numbers are usually written this way and it won't confuse anyone. But I still do not know how to solve this in other situations.
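The grouping step itself is simple to express. The following is a minimal sketch in Python rather than VB.NET, and it is not the code I actually used; `group_digits` is a hypothetical name, and the input is assumed to be a string of digits.

```python
def group_digits(number, size=4, sep=" "):
    """Insert a separator every `size` characters so Excel treats the value as text."""
    digits = str(number)
    return sep.join(digits[i:i + size] for i in range(0, len(digits), size))


print(group_digits("6037991497126305"))
# 6037 9914 9712 6305
```

Because the result contains spaces, Excel has no numeric interpretation for it and leaves it as text.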

How can I convert an OpenOffice Writer document (.odt) to multiple HTML files with navigation?

I have an OpenOffice Writer document (.odt) with a table of contents, sections, subsections, etc.
Is there a quick way to convert (export) this into multiple HTML files with a navigation sidebar, converting the sections into links?
You can:
Unzip the odt, parse the XML and make the HTML file yourself.
Use OpenOffice to export the document to HTML.
There are several ways to export HTML from OpenOffice or LibreOffice:
Use File > Export, then select file type XHTML. However, this creates one big HTML file, not multiple files.
Use File > Save as, then select file type HTML document. This creates one big HTML file which is similar but not fully equal to the one above.
Use File > Send > Create HTML document. In the following dialog, you can select a style used in the document based on which the document is split into multiple HTML files. However, I did not get this to work properly. My document is always split on level 1, no matter what I selected here.
Use File > Wizards > Web page. You will get multiple settings to choose from. However, this does not work at all for me. It either fails completely or it does not produce the expected output.
The last two solutions were found on the OpenOffice Wiki at https://wiki.openoffice.org/wiki/Documentation/OOo3_User_Guides/Getting_Started/Saving_Writer_documents_as_web_pages
As a conclusion, I cannot provide a complete solution. I am still looking for a good way to solve this problem.
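For the first option (unzipping the .odt and parsing the XML yourself), a starting point could look like the Python sketch below. It only lists the headings, which is the information a navigation sidebar would need; splitting the body into separate pages is not covered. `list_headings` is a hypothetical helper, and the sketch assumes the headings are stored as text:h elements in content.xml, which is where ODF text documents keep them.

```python
import xml.etree.ElementTree as ET
import zipfile

# ODF "text" namespace, used by heading (text:h) elements in content.xml.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def list_headings(odt_file):
    """Return (outline_level, heading_text) pairs from an .odt file or file object."""
    # An .odt file is a zip archive; the document body lives in content.xml.
    with zipfile.ZipFile(odt_file) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    headings = []
    for element in root.iter("{%s}h" % TEXT_NS):
        level = int(element.get("{%s}outline-level" % TEXT_NS, "1"))
        headings.append((level, "".join(element.itertext()).strip()))
    return headings
```

Calling list_headings("document.odt") on a real file (the file name here is a placeholder) would give the section structure from which links and per-section files could then be generated.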

HTML to EXCEL -> simple question

I have an "export to Excel" function. I have some tables and it works fine, but I have one single problem.
For moving to the next line I use <br />, but what if I want to switch to the next column? What tag can I use to switch to the next column?
Thanks
Simple HTML tags are supported on a limited basis by Excel. There used to be a list of supported HTML tags as well as some HTML extensions supported by Excel (from Excel 97 onwards), but I can't find it on MSDN anymore. Here's an alternate link:
http://www.code4lifesoftware.com/articles/msexcelreadme.htm
The new XML/HTML format supported from Excel 2000 onwards is a lot more complex, and requires more work:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnoffxml/html/ofxml2k.asp
Take a look at these links, hopefully you'll find the syntax you're looking for!
In all the Excel versions where I have used this approach, there was no way to move to another column other than using a table. You can mark up your HTML file with a table layout (although this is not recommended by the W3C) and place all of the nested data tables inside the main layout table. Unfortunately, there is no other way.
P.S.: Look at Excel html format: Saving and Opening HTML Files.
The BR tag has an mso-data-placement style attribute specifying where the data is stored. The attribute can have one of the following string constants: new-cell means to start a new cell in the next row after the break, and same-cell means that the break stays inside the current cell.
If you use commas and make your file a .csv, that would be one way. If you use tabs, then have it read as a tab delimited file. Basically, you need to tell Excel what your delimiter (separator character) is, and it will handle it from there.
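As an illustration of the delimiter approach, here is a minimal Python sketch that writes rows as tab-delimited or comma-delimited text. `to_delimited` is a hypothetical helper name, and the sample rows are made up.

```python
import csv
import io


def to_delimited(rows, delimiter="\t"):
    """Serialize rows of cells into delimited text that Excel can open."""
    buffer = io.StringIO()
    # The csv module quotes any cell that contains the delimiter itself.
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


print(to_delimited([["Name", "Qty"], ["Widget", 3]]))
```

Each delimiter moves to the next column and each line ending moves to the next row, so no HTML tags are needed at all.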