Problems loading HTML using wget

I would like to download an HTML page from Twitter. I use wget, but I can only download a small part of the member list, because the list needs more time to load. How can I download the whole page?
This is what I use now; it stores the information in usa.dat:
wget -c -N -p -O usa.dat https://twitter.com/IABM1995/lists/usa/members

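wget only saves the HTML the server sends at first. It does not run the JavaScript that loads more members as you scroll, so the rest of the list never reaches usa.dat. Check this related question:
scrape websites with infinite scrolling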

Related

How to extract the whole HTML with complete styling after a user has designed their page on my website?

Like Weebly or Wix, I want to make a website on which users can design their own web pages with predefined controls and styling. How can I get or extract a whole web page's HTML with complete styling? Please mention any link or solution.
If you are using Linux or macOS, I'd suggest using wget. As long as the website isn't blocking this type of download request, wget will download the entire website (-r, recursive) including resource files (-p, page requisites) and recreate its folder structure locally.
wget -r -p -e robots=off http://www.example.com
If the URL you want to retrieve blocks this sort of download request, wget will only get you the index.html file.
On Windows I use HTTrack (https://www.httrack.com/). It's free and downloads websites just fine. Windows builds of wget are also available.
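As a rough sketch (not part of the original answer), the same download can be wrapped in a small shell function. The name mirror_site is hypothetical, and the extra -k (convert links for local viewing) and -np (never climb above the starting directory) flags are additions you may or may not want.

```shell
#!/bin/sh
# Hypothetical helper: recursively download a site for offline viewing.
#   -r             recurse into linked pages
#   -p             also fetch page requisites (images, CSS, scripts)
#   -k             rewrite links so the local copy works offline (added here)
#   -np            never ascend above the starting directory (added here)
#   -e robots=off  ignore robots.txt, as in the answer above
mirror_site() {
    if [ "$#" -ne 1 ]; then
        echo "usage: mirror_site URL" >&2
        return 2
    fi
    wget -r -p -k -np -e robots=off "$1"
}
```

Usage: mirror_site http://www.example.com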

How to translate my cURL command into Chrome command?

I want to send a POST request from the command line to post my image to an image search site. At first I tried cURL and came up with this command, which works:
curl -i -X POST -F file=@search.png http://saucenao.com/search.php
It posts the file as a form to the search site and returns an HTML results page full of JavaScript, which is hard to read in a terminal. It's also hard to preview online images in a terminal.
Then I remembered that I can open Chrome with arguments from the command line, which I thought might solve my problem. After some digging I found the Chrome switches, but it seems they are just Chrome startup flags (I'm not sure this is right, but I didn't find how to send a POST request the way cURL does).
So, can I use Chrome in command line to start it with a POST request just like my cURL command above?
There are a couple of things you could do.
You could write a script in JavaScript that will send the POST request and display the results inside the <body> element or the like;
You could keep the cURL command and use the -o (or --output) to save the resulting HTML in a file (but lose the -i switch, to avoid having the headers in the file), then open the file in Chrome or whichever browser you prefer. You could combine the two commands as a one-liner in any operating system. If you use Ubuntu, for example:
$ curl -o search.html -X POST -F file=@search.png http://saucenao.com/search.php && google-chrome search.html && rm search.html
According to this answer, you could use bcat to avoid the temporary file. Install it with apt-get install ruby-bcat and then run:
$ curl -X POST -F file=@search.png http://saucenao.com/search.php | bcat
I think the second option is the easier one, but use whichever you prefer.
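A sketch of the second option that saves the reply in a fresh temporary directory instead of a fixed search.html, so no existing file gets overwritten. The function name post_and_view and the BROWSER variable are hypothetical; BROWSER defaults to google-chrome as in the Ubuntu example. Using -F already makes curl send a POST, so -X POST is dropped here. If Chrome is already running, google-chrome may hand the file to the open window and return at once, which is why this sketch does not delete the file right after launching the browser.

```shell
#!/bin/sh
# Hypothetical helper: upload an image with curl, save the HTML reply to a
# temporary file and pass that file's path to a browser command.
post_and_view() {
    image=$1
    url=$2
    # A fresh directory lets the file keep a .html name for the browser.
    dir=$(mktemp -d) || return 1
    out="$dir/result.html"
    # -F 'file=@PATH' sends PATH as the multipart form field "file";
    # using -F makes curl issue a POST, so -X POST is unnecessary.
    if ! curl -s -o "$out" -F "file=@$image" "$url"; then
        rm -rf "$dir"
        return 1
    fi
    # BROWSER is an assumption; google-chrome matches the Ubuntu example.
    "${BROWSER:-google-chrome}" "$out"
}
```

Usage: post_and_view search.png http://saucenao.com/search.php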

Wget copy text from html

I'm really new to programming and Linux/Unix, so I was wondering what command I can use to copy only the text of a web page and save it to a file in a directory. I want to copy the text of something like this:
http://cseweb.ucsd.edu/classes/wi12/cse130-a/pa5/words
Would wget do it? Also, what specific commands get it saved into the directory?
Yes, wget can do it:
wget -O file.txt "http://cseweb.ucsd.edu/classes/wi12/cse130-a/pa5/words"
The -O option lets you specify which file name you want to save it to.
Another option is curl:
curl -s http://cseweb.ucsd.edu/classes/wi12/cse130-a/pa5/words > file
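The words file is plain text, so either command saves it as is. For an ordinary HTML page, the saved file would also contain the markup. A crude sketch for stripping it is below; strip_tags is a hypothetical name, and the sed expression assumes every tag opens and closes on the same line. It is not a real HTML parser: script and style contents and entities such as &amp; are left in place.

```shell
#!/bin/sh
# Hypothetical filter: read HTML on stdin and write it to stdout with every
# <...> tag removed. Tags that span several lines are NOT handled.
strip_tags() {
    sed -e 's/<[^>]*>//g'
}
```

Usage: curl -s http://www.example.com/ | strip_tags > file.txt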

download html page for offline use

I want to make an HTML page available for offline viewing by downloading the HTML and all of its images and CSS resources, but not the other pages it links to.
I was looking at HTTrack and wget but could not find the right set of arguments (I need to use the command line).
Any ideas?
If you want the newest version of wget on Windows, get it through the Cygwin installer and use:
wget -m -w 2 -p -E -k -P {target-dir} http://{website}
to mirror {website} into {target-dir} (without images in wget 1.11.4). Note that -m mirrors recursively, so it also follows links to other pages on the site.
Leave out -w 2 (a two-second wait between requests) to speed up the download.
For a single page, the following wget command should be enough. Keep in mind that it might not download everything, such as background images referenced from CSS files.
wget -p <webpage>
Also try wget --help for a list of all command line parameters.
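For the single-page case, the wget documentation for -p recommends combining it with a few more options so the saved page displays properly offline. A minimal sketch under that assumption; save_page_offline and its optional directory argument are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: save one page plus the files needed to display it.
#   -p  fetch page requisites (images, CSS, scripts) but not linked pages
#   -k  convert links in the saved files so they work offline
#   -E  add a .html (or .css) extension when the URL lacks one
#   -H  allow requisites hosted on other domains (e.g. a CDN)
#   -P  directory to save into (defaults to the current one)
save_page_offline() {
    if [ "$#" -lt 1 ]; then
        echo "usage: save_page_offline URL [DIR]" >&2
        return 2
    fi
    wget -p -k -E -H -P "${2:-.}" "$1"
}
```

Usage: save_page_offline https://www.example.com/page.html offline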

Wget recognizes some part of my URL address as a syntax error

I am quite new to wget, and I have searched on Google but found no clue.
I need to save a single HTML file of a webpage:
wget yahoo.com -O test.html
and it works, but when I try to be more specific:
wget http://search.yahoo.com/404handler?src=search&p=food+delicious -O test.html
here comes the problem: wget treats &p=food+delicious as a syntax error and says: 'p' is not recognized as an internal or external command
How can I solve this problem? I really appreciate your suggestions.
The & has a special meaning in the shell. Escape it with \ or put the URL in quotes to avoid this problem:
wget http://search.yahoo.com/404handler?src=search\&p=food+delicious -O test.html
or
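wget "http://search.yahoo.com/404handler?src=search&p=food+delicious" -O test.html
The message "is not recognized as an internal or external command" comes from Windows cmd.exe, where & also separates commands. There the backslash does not escape it and single quotes do not quote, so use double quotes as above, or escape the ampersand as ^&.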
In many Unix shells, putting an & after a command causes it to be executed in the background.
Wrap your URL in single quotes to avoid this issue, e.g.:
wget 'http://search.yahoo.com/404handler?src=search&p=food+delicious' -O test.html
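To see what the quoting changes, a small helper (hypothetical, named show_args here) can print each argument the shell actually passes to a command:

```shell
#!/bin/sh
# Hypothetical helper: print every argument it receives on its own line,
# wrapped in brackets, so you can see exactly what wget would be given.
show_args() {
    printf '[%s]\n' "$@"
}

# Quoted: the whole URL, & included, arrives as a single argument.
url='http://search.yahoo.com/404handler?src=search&p=food+delicious'
show_args "$url" -O test.html
# [http://search.yahoo.com/404handler?src=search&p=food+delicious]
# [-O]
# [test.html]
```

Without the quotes, a Unix shell ends the command at &, runs wget in the background with the URL cut off after src=search, and then treats the rest of the line as a new command.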
If you are using a Jupyter notebook, check that you have installed the Python wget package before downloading from a URL:
pip install wget