Sometimes it is hard to do even simple things with Puppeteer. I was wondering whether it is possible to use Puppeteer to log in to a password-protected Dropbox link and then use something like wget or curl to do the rest. I imagine I would need to read, and then pass on, some sort of access token or cookie after the login.
Would this be possible?
(yes, I know that using the dropbox API would perhaps be an easier and more correct solution)
I am not familiar with how Puppeteer stores cookies, but I am sure you can do this; see the references below and e.g. the Puppeteer API documentation on cookies.
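As a sketch of the idea (untested; the password-field selector and the session-cookie expiry handling below are assumptions), you could log in with Puppeteer, read the cookies with page.cookies(), and write them out in the Netscape cookie-file format that wget's --load-cookies expects:

// Sketch: log in to the password-protected Dropbox link, then export the
// session cookies in Netscape format for wget/curl.
const fs = require('fs');
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://www.dropbox.com/replaceWithDropboxLink');
  await page.type('input[type=password]', 'yourLinkPassword'); // assumed selector
  await page.keyboard.press('Enter');
  await page.waitForNavigation();

  // One line per cookie: domain, subdomain flag, path, secure flag,
  // expiry (epoch seconds), name, value, separated by tabs.
  const cookies = await page.cookies();
  const lines = cookies.map(c => [
    c.domain,
    c.domain.startsWith('.') ? 'TRUE' : 'FALSE',
    c.path,
    c.secure ? 'TRUE' : 'FALSE',
    // session cookies report expires = -1; give them an hour so wget keeps them
    Math.floor(c.expires > 0 ? c.expires : Date.now() / 1000 + 3600),
    c.name,
    c.value,
  ].join('\t'));
  fs.writeFileSync('myAccessCookies.txt',
    '# Netscape HTTP Cookie File\n' + lines.join('\n') + '\n');

  await browser.close();
})();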
Here are my two cents on wget and/or curl with access control. In both cases it is possible to load cookies that give you access to Dropbox. Assuming you have them stored in myAccessCookies.txt, you can reload and use them, e.g. with wget:
wget -qO- --load-cookies myAccessCookies.txt http://www.example.com/replaceWithDropboxLink
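curl can load the same Netscape-format cookie file with its -b/--cookie option:
curl -s -L -b myAccessCookies.txt http://www.example.com/replaceWithDropboxLink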
Another hint: to my knowledge, Dropbox allows sharing direct, temporary links to files that do not require further authentication; see e.g. https://help.dropbox.com/files-folders/share/set-link-permissions. If this is not a security risk and you can get your counterparty to use these, everything is easy.
References and further reading
Superuser: How to download dropbox files using wget
https://www.apharmony.com/software-sagacity/2014/10/using-wget-with-cookies/ on how to use wget with cookies
Download Folder including Subfolder via wget from Dropbox link to Unix Server
How to save cookies and load it in another puppeteer session?
https://stackoverflow.com/a/56515357/6189678 - how to store and reuse cookies (suggestion by V. Kostenko)
Related
I'm moving the Mercurial repositories for all my open-source projects to OSDN (OSDN.net) from Bitbucket because Bitbucket will soon drop support for Mercurial. However, OSDN only supports SSH, not HTTPS, as a file exchange protocol, and ReadTheDocs does not support SSH URLs. The ReadTheDocs public API allows builds to be triggered, but does not support any way to provide the source files with the build trigger.
Or any documented way, at least. Does anybody know of a way to either push document source files to RTD with a build trigger, or connect an OSDN repository to RTD so that RTD can clone the source files itself?
Thanks.
OSDN does support both SSH and HTTP(S); it is only for writing that SSH is the sole option. However, Read the Docs only needs to read, so HTTPS is fine (and supported, although a bit hard to find).
On OSDN, toggle the "RO|r/w" control to see the other URL. It is not really a button or toggle, but it looks like one; the UX/UI design is not great.
Copy that RO (read-only) value (again, ignore the UI feedback; you can copy the HTTPS URL) and paste it into Read the Docs.
Note: so far I could not get the webhooks/integration working, so after a push you have to go to Read the Docs and trigger a rebuild yourself, or call the webhook with curl, e.g. from a local Makefile; see https://docs.readthedocs.io/en/stable/webhooks.html#parameters
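For reference, a call to the generic webhook could look roughly like this (the project slug, integration id, and token are placeholders you get from the Integrations section of your project's admin settings; check the page linked above for the exact parameters):
curl -X POST -d "branches=master" -d "token=yourIntegrationToken" https://readthedocs.org/api/v2/webhook/your-project-slug/12345/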
I have a PDF link like www.xxx.org/content/a.pdf, and I know that there are many PDF files in the www.xxx.org/content/ directory, but I don't have a list of the filenames. When I access www.xxx.org/content/ in a browser, it redirects to www.xxx.org/home.html.
I tried wget, like "wget -c -r -np -nd --accept=pdf -U NoSuchBrowser/1.0 www.xxx.org/content", but it returned nothing.
So, does anyone know how to download or list all the files in the www.xxx.org/content/ directory?
If the site www.xxx.org blocks directory listing (for example via its .htaccess or server configuration), you can't list the files directly.
If you have FTP access to the server, you can list and download all the files that way: find the absolute path on the server corresponding to www.xxx.org/content/ and fetch the directory with an FTP client.
WARNING: This may be illegal without permission from the website owner, so get permission first before pointing a tool like this at a site. If misconfigured (or if the site cannot handle the volume of requests), it can cause a denial of service (DoS), and it can cost the site owner money if they pay for bandwidth.
You can use tools like dirb or DirBuster to search a website for folders and files using a wordlist; you can find one by searching for a "dictionary file" online. A sample invocation follows the links below.
http://dirb.sourceforge.net/
https://sectools.org/tool/dirbuster/
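For example, a dirb run restricted to .pdf files might look like this (common.txt stands for whatever wordlist you downloaded; again, only do this with permission):
dirb http://www.xxx.org/content/ common.txt -X .pdf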
I am using Octave 4.0.0 for Windows and want to download stock prices from a web page that is open to the public. I use the following call:
data = urlread("https://www.netfonds.no/quotes/paperhistory.php?paper=API.A&csv_format=csv")
However, I get the following error message:
urlread: Peer certificate cannot be authenticated with given CA certificates
I have searched the internet, including Stack Overflow, for this error message, but I do not understand the advice given there.
Q1: Is there something lacking on my PC? If so, what do I do?
Q2: Can I change the call somehow to work around whatever is lacking on my PC?
Thanks in advance for any help : )
It appears that this is a bug in urlread() in certain versions of Octave. For a course I'm doing, we changed this:
responseBody = urlread(submissionUrl, 'post', params);
to
[code, responseBody] = system(sprintf('echo jsonBody=%s | curl -k -X POST -d @- %s', body, submissionUrl));
Here curl's -k flag skips certificate verification, and -d @- makes curl read the POST body from standard input.
Although the page is publicly available, the connection is encrypted. For an encrypted connection to make sense, it must use a key that you trust. The typical user does not think about whether to trust it; they leave that decision to the OS or web browser (which in turn rely on certificate authorities). I am guessing this is your case.
The error you get means that the website you are accessing uses a key that was certified by something urlread does not "trust". Ideally, you would have a single list of trusted certificates and all applications would use it. If your web browser trusts the site but the rest of your system does not, you have a configuration issue: either your web browser keeps its own list of trusted certificates, or libcurl (the library that urlread uses) is not finding the certificates installed on your system.
This "configuration" will be a directory with several .pem files. The specific certificate required for this website will most likely be named GlobalSign_Root_CA_-_R2.pem.
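To see which CA bundle your curl/libcurl build uses, and to experiment with pointing it at another one, something like the following may help. This is only a sketch: curl-config comes with curl's development files and may not exist on a Windows install, and whether Octave's libcurl picks up CURL_CA_BUNDLE depends on the build, so treat it as an experiment.
curl-config --ca                          # print the compiled-in CA bundle path
export CURL_CA_BUNDLE=/path/to/cacert.pem # assumption: the libcurl-using program honors this
octave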
And it works here:
octave> data = urlread ("https://www.netfonds.no/quotes/paperhistory.php?paper=API.A&csv_format=csv")
data = quote_date,paper,exch,open,high,low,close,volume,value
20150508,API,Amex,0.39,0.40,0.39,0.40,85933,34194
20150507,API,Amex,0.40,0.41,0.38,0.39,163325,64062
...
For Windows, a workaround is to use the curl command in the Windows console, which Octave can call via the system command. With curl you can choose the option --insecure, which also allows connections to websites whose certificates cannot be verified. Only use this option if you are sure the website is safe.
sURLLink = 'https://www.netfonds.no/quotes/paperhistory.php?paper=API.A&csv_format=csv';
command = ['curl --insecure ', '"', sURLLink, '"'];
[status, output] = system(command);
We are planning to open a company account on Google Drive which will be accessible only to company people.
The issue is that we want to put several files on our drive and download them programmatically. We tried the Google Drive APIs, but the download speed is very low.
Then we also tried wget, but that requires all the files to be made public, which we cannot do.
Is there any way to use wget with credentials that will allow a file to be downloaded via a URL?
Our typical file size is 50GB.
wget actually has options for passing a user name and password. Have you tried the following?
wget --user='username' --ask-password https://docs.google.com/'ItemType'/Export='DocumentId'
I have a shared web host and I am trying to figure out a way to download the latest copy of a private project from bitbucket onto the server.
The server does not have any versioning tools installed, but it does have scp and ssh with a jailshell level of access. It also has wget and curl...
Can I do something like this?
scp ssh://hg@bitbucket.org/jespern/testrepo ~/public_html
I don't have a problem setting up the identity files / DSA keys, but I'm not exactly sure how the protocols are put together here so I need some help with the basic syntax.
Or, if scp is not the way to go, does ssh have an option for doing this? Or is it possible to use curl or wget to grab the latest version of the repository and then reconstruct it on the server?
I am sure there is a way to do this, so please don't respond saying "it can't be done."
Thanks!
You can download from Bitbucket over HTTP with a URL like this:
http://bitbucket.org/jespern/rewsfeed/get/tip.tar.bz2
Notice how tip can be used in place of a revision ID in that URL form to always get the latest snapshot.
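Since the repository is private, you will most likely have to authenticate. A sketch with curl, assuming these snapshot URLs accept HTTP basic auth (curl prompts for the password when it is omitted):
curl -u jespern -L -o tip.tar.bz2 https://bitbucket.org/jespern/rewsfeed/get/tip.tar.bz2
tar xjf tip.tar.bz2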
Alternatively, you can just install Mercurial in your home directory on the shared web host; people have succeeded in doing that on almost every web host out there, no matter how locked down it is.
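For example, if the host has Python and pip, a user-level install may be enough (a sketch; paths vary by host):
pip install --user mercurial
export PATH="$HOME/.local/bin:$PATH"   # where pip --user typically puts hg
hg version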
Then you can just do: /home/me/bin/hg clone ssh://hg@bitbucket.org/jespern/testrepo ~/public_html