download html page for offline use

I want to make an HTML page available for offline viewing by downloading the HTML and all image/CSS resources it uses, but not the other pages it links to.
I was looking at httrack and wget but could not find the right set of arguments (I need the command line).
Any ideas?

If you want to use the newest version of wget on Windows, get it via the Cygwin installer
and use this version:
wget -m -w 2 -p -E -k -P {target-dir} http://{website}
to mirror {website} into {target-dir} (note that version 1.11.4 does not download the images).
Leave out -w 2 to speed up the process.
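For reference, here is the same command with each switch annotated (these are standard wget options; nothing is added beyond the line above):
# -m    mirror: recursive download with timestamping
# -w 2  wait 2 seconds between requests (polite to the server)
# -p    also fetch the images, CSS and other files needed to display each page
# -E    save files with an .html extension where appropriate
# -k    convert links so the local copy works offline
# -P    directory prefix, i.e. where to store the mirror
wget -m -w 2 -p -E -k -P {target-dir} http://{website}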

For one page, the following wget command-line parameters should be enough. Keep in mind that it might not download everything, e.g. background images referenced from CSS files.
wget -p <webpage>
Also try wget --help for a list of all command line parameters.
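If plain -p leaves links pointing back to the web or misses requisites hosted on other domains, the wget documentation suggests combining -p with a few more switches for exactly this case; a sketch of that fuller form (the extra flags are not part of the answer above) would be:
# -E  give downloaded HTML files an .html extension
# -H  allow page requisites to be fetched from other hosts (e.g. a CDN)
# -k  convert links so the saved page works offline
# -p  download all files needed to display the page
wget -E -H -k -p <webpage>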

Related

How to extract whole html with complete styling after a user has designed their page on my website?

Like Weebly or Wix, I want to make a website on which users can design their web page with predefined controls and styling. So how can I get or extract the whole web page's HTML with complete styling? Please mention any link or solution.
If you are using Linux or Mac, then I'd suggest using wget. As long as the website isn't blocking these kinds of download requests, wget will download the entire website, including resource files (-r), and create a folder structure that makes sense.
wget -r -p -e robots=off http://www.example.com
If the URL you want to retrieve blocks this sort of download request, wget will only get you the index.html.
On Windows I use https://www.httrack.com/. It's free and downloads the website just fine. I believe someone has created a Windows version of wget as well.
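If you want to limit how deep the recursion goes instead of pulling in the entire site, a depth-limited variant of the same command (the -l 1, -np and -k flags are my additions, not part of the answer above) might look like:
# -l 1  only recurse one level down from the starting page
# -np   never ascend to the parent directory
# -k    convert links so the local copy works offline
wget -r -l 1 -np -k -p -e robots=off http://www.example.com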

How to translate my cURL command into Chrome command?

I want to fire a POST request from the command line, to post my image to an image-searching site. At first I tried cURL and got this command, which works:
curl -i -X POST -F file=@search.png http://saucenao.com/search.php
It posts a file as a form to the search site and returns an HTML page full of JavaScript, which makes the result hard to read in a terminal. And it's also hard to preview an online image in a terminal.
Then I remembered that I can open Chrome with arguments on the command line, which I thought might solve my problem. After some digging, I found Chrome's switches, but it seems they are just startup flags (I'm not sure if this is right, but I didn't find a way to fire a POST request the way cURL does).
So, can I start Chrome from the command line with a POST request, just like my cURL command above?
There are a couple of things you could do:
1. Write a script in JavaScript that sends the POST request and displays the result inside the <body> element or the like.
2. Keep the cURL command and use -o (or --output) to save the resulting HTML to a file (but drop the -i switch so the headers don't end up in the file), then open that file in Chrome or whichever browser you prefer. You can combine the two commands into a one-liner on any operating system. If you use Ubuntu, for example:
$ curl -o search.html -X POST -F file=@search.png http://saucenao.com/search.php && google-chrome search.html && rm search.html
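On macOS (an addition of mine, not from the answer above) the equivalent would use open -a to hand the file to Chrome; since open returns immediately, delete the temporary file afterwards rather than chaining rm:
$ curl -o search.html -X POST -F file=@search.png http://saucenao.com/search.php && open -a "Google Chrome" search.html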
According to this answer you could use bcat to avoid using a temporary file. Install it with apt-get install ruby-bcat and then just run
$ curl -X POST -F file=@search.png http://saucenao.com/search.php | bcat
I think the easier option is #2, but use whichever you prefer.

How to parse HTML into .txt format using C

I need to parse HTML into .txt format using C.
An example - it has to detect each
1. <p>
2. <tr>
3. <ul> etc...
and convert them into text (in a document)
Can somebody help please?
I think the easiest way to download an HTML webpage in C is to use libcurl. Assuming you have already set up your development environment, follow these steps:
Visit the download page of libcurl and download its latest version.
Take a look at the install page to learn how to install the library. On Linux the installation is pretty straightforward: just type ./configure && make && make install in the terminal.
Download the url2file.c example from libcurl. The <curl/curl.h> header included in that file provides the functions you need to communicate with the web server.
Next, compile url2file.c using gcc -o url2file url2file.c -lcurl.
Finally, test url2file using ./url2file http://example.com. The result is stored in the file page.out as plain text.
NOTES:
You need to install libcurl in order to compile url2file.c; otherwise the compiler will throw a fatal error.
If you already have the curl program installed on your machine, you can download webpages with curl http://example.com > page.out in the terminal.
Also, wget lets you download and store webpages: wget http://example.com.
This answer stores the webpage as plain text (the raw HTML); it doesn't perform any specific HTML tag processing.
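For completeness, here are the two command-line shortcuts from the notes written with explicit output-file options instead of shell redirection (-o and -O are curl's and wget's standard flags for naming the output file):
curl -o page.out http://example.com
wget -O page.out http://example.com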

problems of loading html using wget

I would like to download an HTML page from Twitter. I use wget, but I can only download a small part of the members because the list needs more time to load. How can I download the whole HTML page?
This is what I use now; it stores the information in usa.dat:
wget -c -N -p -O usa.dat https://twitter.com/IABM1995/lists/usa/members
Check this:
scrape websites with infinite scrolling

Wget copy text from html

I'm really new to programming and Linux/Unix, so I was wondering what command I can use to copy only the text of a webpage and save it to a file in the directory. I want to copy the text of something like this:
http://cseweb.ucsd.edu/classes/wi12/cse130-a/pa5/words
Would wget do it? Also, what specific commands get it saved into the directory?
Another option, using wget as you wondered about, would be:
wget -O file.txt "http://cseweb.ucsd.edu/classes/wi12/cse130-a/pa5/words"
The -O option lets you specify which file name you want to save it to.
One option would be:
curl -s http://cseweb.ucsd.edu/classes/wi12/cse130-a/pa5/words > file