Converting several HTML files into one Word file

I received web-service documentation in HTML format, but it is very unfriendly when it comes to searching for a specific word. The index file displays a list of the names of each request on the left, and when you click a particular one, the description and content of that request are displayed on the right.
Unfortunately I have to do some mapping against web services that we already have. When searching with Ctrl+F, it only goes through the left-hand list; it doesn't matter if I place the cursor over the description on the right, click, and try to search that way too - it doesn't work.
My idea is to combine all the HTML files that have been provided to us into one Word document (that way I can search through the descriptions, not only through the list of names). Unfortunately, all I have managed so far is that these files open in separate Word documents (one HTML file per Word file). There are almost 1000 requests to be mapped, and working this way is going to take forever...
So the question is: how do I combine more than one HTML file into one Word file?

There are two ways to merge HTML files.
Using the command line
Copy all the HTML files that you want to merge into a folder.
Navigate to that folder using a terminal or command prompt.
Execute the following command:
On Mac/Linux:
cat *.html > output.html
On Windows:
type *.html > output.html
Using already available tools
https://www.sobolsoft.com/howtouse/combine-html-files.htm, html-merge (Windows Only)
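Note that the command-line approach simply stacks complete HTML documents (each with its own html and head tags) one after another; Word usually tolerates this, but if you want a cleaner merge, a rough Python sketch like the one below pulls just the body of each file into one page. The file names are placeholders, and it assumes the beautifulsoup4 package is installed.
# merge_bodies.py - rough sketch: combine the <body> of every .html file in the
# current folder into a single output.html (names here are placeholders).
import glob
from bs4 import BeautifulSoup  # assumes beautifulsoup4 is installed

parts = []
for path in sorted(glob.glob("*.html")):
    if path == "output.html":              # skip the result of a previous merge
        continue
    with open(path, encoding="utf-8", errors="replace") as f:
        soup = BeautifulSoup(f.read(), "html.parser")
    body = soup.body or soup               # fall back to the whole document if there is no <body>
    parts.append("<h1>%s</h1>\n%s" % (path, body.decode_contents()))

with open("output.html", "w", encoding="utf-8") as f:
    f.write("<html><body>\n" + "\n<hr>\n".join(parts) + "\n</body></html>")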
To convert the merged HTML file to a Word document, read here.

Related

How can I replace some text in HTML files using Python?

My situation is...
I have a few hundred Chrome HTML files in one folder, and I want to replace certain text (e.g. james) with another text (e.g. tom) in every HTML file. Honestly, I'm just a beginner in Python, so may I get detailed code for this? I need to know: 1. how to open every HTML file in one folder, 2. how to find certain text in the HTML, 3. how to replace it with another text (in Python). Thanks a lot.
You can just open the directory in VS Code and bulk-replace all instances of any string in all the HTML files directly. I needed to do the same and found this to be a very convenient method.
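If you do want a script rather than an editor, a minimal Python sketch along these lines would cover the three steps. The folder path is a placeholder, and "james"/"tom" are just the example strings from the question.
# replace_in_folder.py - rough sketch; adjust the folder path and strings.
from pathlib import Path

folder = Path("path/to/your/html/folder")   # placeholder: the folder with the HTML files
old, new = "james", "tom"                   # the text to find and its replacement

for path in folder.glob("*.html"):
    text = path.read_text(encoding="utf-8", errors="replace")
    if old in text:
        path.write_text(text.replace(old, new), encoding="utf-8")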

Magento 2.1.1 We can't find required columns: sku

My apologies if a solution has been provided elsewhere. I have searched and could not find anything similar to what I am experiencing. I am trying to upload categories to a Magento CE 2.1.1 website. I have a file with almost 4000 categories and subcategories, and the only practical way is to upload via a CSV file.
I downloaded a sample file to use, and when I upload that same sample file it works fine when I click the "Check Data" button. However, when I replace the values in the rows with my own and save the file as CSV with UTF-8 text encoding, I get the error message below. This also happens when I save the file as CSV even without changing the values. I have tested this with a CSV file saved from both Mac Numbers and Windows Excel.
I only need to upload Categories (and not products) but I am not sure if this is possible.
File links:
Importing
Not importing
Actual project sample file
The files are quite similar but strangely one is working and the other is not.
Error
We can't find required columns: sku.
Column names: "sku;store_view_code;attribute_set_code;product_type;categories;product_websites;name;description;short_description;weight;product_online;tax_class_name;visibility;price;special_price;special_price_from_date;special_price_to_date;url_key;meta_title;meta_keywords;meta_description;base_image;base_image_label;small_image;small_image_label;thumbnail_image;thumbnail_image_label;swatch_image;swatch_image_label;created_at;updated_at;new_from_date;new_to_date;display_product_options_in;map_price;msrp_price;map_enabled;gift_message_available;custom_design;custom_design_from;custom_design_to;custom_layout_update;page_layout;product_options_container;msrp_display_actual_price_type;country_of_manufacture;additional_attributes;qty;out_of_stock_qty;use_config_min_qty;is_qty_decimal;allow_backorders;use_config_backorders;min_cart_qty;use_config_min_sale_qty;max_cart_qty;use_config_max_sale_qty;is_in_stock;notify_on_stock_below;use_config_notify_stock_qty;manage_stock;use_config_manage_stock;use_config_qty_increments;qty_increments;use_config_enable_qty_inc;enable_qty_increments;is_decimal_divided;website_id;related_skus;related_position;crosssell_skus;crosssell_position;upsell_skus;upsell_position;additional_images;additional_image_labels;hide_from_product_page;bundle_price_type;bundle_sku_type;bundle_price_view;bundle_weight_type;bundle_values;bundle_shipment_type;associated_skus" are invalid
This might be because you opened the file in Excel which will add a BOM to the start of the file. When the Magento importer tries to read the file, it expects the first header/cell to say sku, but it instead sees the BOM.
Two ways to solve this:
1) Don't open it in Excel - use Google Sheets, or a text editor if you are feeling brave.
2) If you opened the file in Excel, close it, open it in Notepad++, click Encoding at the top and set it to "Encode in UTF-8" (NOT "Encode in UTF-8-BOM"). Then save and you are good to go.
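If you would rather fix it from a script than from Notepad++, a small Python sketch like this rewrites the CSV without the BOM (the file names are placeholders):
# strip_bom.py - rough sketch: re-save a CSV as UTF-8 without a byte order mark.
with open("import.csv", encoding="utf-8-sig") as src:        # utf-8-sig silently drops a BOM if present
    data = src.read()
with open("import_nobom.csv", "w", encoding="utf-8", newline="") as dst:
    dst.write(data)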

R Writing Excel Document

My question is whether or not anybody knows of a better way to do what I'm already doing. I'm creating a report as a list, and trying to render it both in HTML and Excel.
I'm developing a shiny app that generates reports for Qualtrics surveys.
The results table is a list of HTML strings that I paste together and display in a shinydashboard. Here's a dput of the example results tables.
Here's how I'm creating the HTML results tables list -- the html_tabelize() function in my package. Here's a dput of the example input.
In the shiny server.R file the way I create the Excel file is with the following code:
output$downloadResults <- downloadHandler(
  filename = 'tables.xls',
  content = function(file) {
    write(html_tabelize(main()[['blocks']]), file)
  }
)
To summarize: I get the blocks, I run html_tabelize on them, and then I write the HTML output to a file called "tables.xls". When I open that file, because Excel can render HTML, it renders something like this:
My concerns with what I'm doing are two-fold:
If I were writing an Excel document instead of simply rendering HTML in Excel, then I could perhaps get a better formatted document. I'd like that.
When you download the results tables xls file and try to open it, you get a warning from Excel. I don't want the users of my app to see this warning, because it's distracting and could worry them about something that isn't really a concern.
I know that options exist for writing Excel files in R, but so far what I've seen indicates that their input must be either a data frame, or a list of data frames. The list I am rendering from has different types of components, like the question text, as well as data frames of results. Originally I was using pandoc, but pandoc, even when run from R, is a system binary, and it's difficult to list as a dependency (and if I can't list it as a dependency, it's tough to make sure it's installed for the users of my app). Additionally, I found out pandoc doesn't even convert to "real" Excel -- it also just saves HTML in a .xls file. Does anybody have any suggestions as to how I can improve this part of my app?

download links from a web page with renaming

I'm trying to find a way to automatically download all the links from a web page, but I also want to rename them. For example:
<a href = fileName.txt> Name I want to have </a>
I want to be able to get a file named 'Name I want to have' (I don't worry about the extension).
I am aware that I could get the page source, then parse all the links, and download them all manually, but I'm wondering if there are any built-in tools for that.
lynx --dump | grep http:// | cut -d ' ' -f 4
will print all the links that can be batch fetched with wget - but is there a way to rename the links on the fly?
I doubt anything does this out of the box. I suggest you write a script in Python or similar to download the page, and load the source (try the Beautiful Soup library for tolerant parsing). Then it's a simple matter of traversing the source to capture the links with their attributes and text, and download the files with the names you want. With the exception of Beautiful Soup (if you need to be able to parse sloppy HTML), all you need is built in with Python.
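As a rough illustration of that suggestion, a Python sketch using Beautiful Soup might look like this. The page URL is a placeholder, and beautifulsoup4 is assumed to be installed.
# fetch_and_rename.py - rough sketch: save every linked file under its link text.
import re
import urllib.request
from urllib.parse import urljoin
from bs4 import BeautifulSoup   # assumes beautifulsoup4 is installed

page_url = "http://example.com/files.html"   # placeholder: the page with the links

html = urllib.request.urlopen(page_url).read()
for a in BeautifulSoup(html, "html.parser").find_all("a", href=True):
    name = a.get_text(strip=True) or a["href"]      # the link text becomes the file name
    name = re.sub(r'[\\/:*?"<>|]', "_", name)       # drop characters filesystems dislike
    urllib.request.urlretrieve(urljoin(page_url, a["href"]), name)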
I solved the problem by converting the web page entirely to Unicode on the first pass (using Notepad++'s built-in conversion).
Then I wrote a small shell script that used cat, awk and wget to fetch all the data.
Unfortunately, I couldn't automate the process completely, since I didn't find any tools for Linux which would convert an entire page from KOI8-R to Unicode.

What's the best way to automate text replacing?

Here's the situation:
I have a lot of HTML files, and these HTML files link to a lot of documents. The documents have ALL been renamed. I have an excel sheet which has the old name of the file and the new name of the file.
What would be the quickest way to change the links inside the HTML files to accommodate the new names?
The method I'm using now:
Have all the HTML files opened in Notepad++
Use Notepad++'s 'Replace in All Opened Documents' function to replace all occurrences of a certain link with the new file name.
Is there a quicker, better way?
Perl's regular expressions.
Elaboration - a rough Perl sketch (the list of files and the old/new strings are placeholders you would fill in):
# read each file, do the desired text replacement, then write the result back
foreach my $file (@files) {
    open my $in, '<', $file or die "Can't read $file: $!";
    my $text = do { local $/; <$in> };   # slurp the whole file
    close $in;

    $text =~ s/\Q$oldtext\E/$newtext/g;  # do the desired text replacement

    open my $out, '>', $file or die "Can't write $file: $!";
    print $out $text;
    close $out;
}
It's not hard, but it requires some testing. If you have a lot of edits (and more may happen later), this is more efficient.
There are several free and open-source tools that replace text across multiple files; one of the open-source ones is FART.
If you prefer something with a GUI, try the free Text Crawler.
First, save the Excel sheet to something nice and simple like a CSV file, so it's easy to read in your favourite language, e.g. Perl. Then iterate over each file and do the search and replace. One gotcha, though: do it all in one pass, otherwise you can create problems if there are links that have changed in complex ways. I.e. if a.html changed to b.html and b.html changed to a.html, you can mess up the links if you do it in multiple passes. So load all the changes into memory, then cycle through each file and replace all the links in it simultaneously.
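As a rough sketch of that one-pass idea in Python (the "renames.csv" mapping file with old-name,new-name rows and the "site" folder are assumptions):
# relink.py - rough sketch: replace every old file name with its new name in one pass.
import csv
import re
from pathlib import Path

# assumption: renames.csv has two columns per row - old name, then new name
with open("renames.csv", newline="", encoding="utf-8") as f:
    mapping = {old: new for old, new in csv.reader(f)}

# one combined pattern, longest names first, so everything is replaced in a single pass
pattern = re.compile("|".join(re.escape(o) for o in sorted(mapping, key=len, reverse=True)))

for path in Path("site").rglob("*.html"):      # "site" is a placeholder for the HTML folder
    text = path.read_text(encoding="utf-8", errors="replace")
    path.write_text(pattern.sub(lambda m: mapping[m.group(0)], text), encoding="utf-8")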
Because it is specifically HTML search and replace, a tool like this would be ideal:
http://www.aliassoftware.com/
It finds and replaces multiple text strings in multiple files at once!