Getting wget past the "infected with a virus" screen on Google Drive

So I've been trying to get wget to download a Google Drive file that I uploaded. Unfortunately, Google Drive incorrectly flags the file as a virus, so wget can't get the direct download link.
Things I've tried:
using the gdrive.pl file that someone made, but I'm on Windows, and /tmp/cookies.txt does not exist.
doing wget --no-check-certificate https://docs.google.com/uc?export=download&id=FILEID -O FILENAME, but it says 400 Bad Request
using https://docs.google.com/uc?export=download&id=ID, but it fails because of the download infected file warning.
Does anyone have any suggestions to solve this?

Here is what I was able to do, based on a starting point I found at https://medium.com/@acpanjan/download-google-drive-files-using-wget-3c2c025a8b99:
Edit: I noticed you said Windows, so this command with sed won't work natively there; I'll put sed-free steps for Windows below.
You of course start by sharing the file and getting the file ID from the share link on Google Drive. Then:
wget --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate "https://docs.google.com/uc?export=download&id=SHARE_LINK_ID" -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p' > /tmp/confirm && wget --load-cookies /tmp/cookies.txt --no-check-certificate "https://docs.google.com/uc?export=download&confirm="$(cat /tmp/confirm)"&id=SHARE_LINK_ID" -O YOUR_FILENAME && rm /tmp/cookies.txt /tmp/confirm
Replace SHARE_LINK_ID with your ID from your shared file link. Replace YOUR_FILENAME with your desired output file name.
This attempts to download the file and gets the HTML of the warning message about potential viruses in the file. It uses cookies because you need the same session for the subsequent download with the confirmation code.
It then gets the generated confirm code from that response and writes it to a temporary file.
It then does another wget, adding the confirmation code to the query string to download the file, using the saved cookie to allow the confirmation code to work for the saved session.
Most likely this could be worked into a script, passing an argument of the share link ID to make it more useful.
For Windows (without sed)
wget --save-cookies %TMP%/cookies.txt --keep-session-cookies --no-check-certificate "https://docs.google.com/uc?export=download&id=SHARE_LINK_ID" -O %TMP%/confirm.txt
Downloads the confirmation HTML.
notepad %TMP%/confirm.txt
Opens %TMP%/confirm.txt in Notepad to get the confirm code string (CTRL+F to look for "confirm=" and take the code right after it). Replace it in the command line below, along with the filename you want and the share link ID from Google Drive.
wget --load-cookies %TMP%/cookies.txt --no-check-certificate "https://docs.google.com/uc?export=download&confirm=CONFIRM_CODE&id=SHARE_LINK_ID" -O YOUR_FILENAME
Delete the temp files:
del %TMP%/cookies.txt %TMP%/confirm.txt

Try this. Don't forget to replace the two FILEID fields and the FILENAME field with your file's ID and your desired output file name, respectively.
wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=FILEID' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=FILEID" -O FILENAME && rm -rf /tmp/cookies.txt
source: https://medium.com/geekculture/wget-large-files-from-google-drive-336ba2e1c991

Related

wget command to download web-page & rename file with HTML title?

I would like to download an HTML web-page and have the filename be the title of the HTML page.
I have found a command to get the html title:
wget -qO- 'https://www.linuxinsider.com/story/Austrumi-Linux-Has-Great-Potential-if-You-Speak-Its-Language-86285.html/' | gawk -v IGNORECASE=1 -v RS='</title' 'RT{gsub(/.*<title[^>]*>/,"");print;exit}'
And it prints this: Austrumi Linux Has Great Potential if You Speak Its Language | Reviews | LinuxInsider
Found on: https://unix.stackexchange.com/questions/103252/how-do-i-get-a-websites-title-using-command-line
How could I pipe the title back into wget to use it as the filename when downloading that web-page?
EDIT: In case there is no way to do this directly in wget, I found a way to simply rename the HTML files once downloaded:
Renaming HTML files using <title> tags
You can't wget a file, analyze its contents, and then make the same wget execution that downloaded the file magically go back in time and output it to a new file named after the contents you analyzed in step 2. Just do this:
wget -qO- '...' > tmp &&
name=$(gawk '...' tmp) &&
mv tmp "$name"
Add protection against / in name as necessary.

How to translate my cURL command into Chrome command?

I want to fire a POST request in command line, to post my image to a image searching site. At first, I tried cURL and get this command which works:
curl -i -X POST -F file=@search.png http://saucenao.com/search.php
It posts the file as form data to the search site and returns an HTML result page full of JavaScript, which makes it hard to read in a terminal. It's also hard to preview online images in a terminal.
Then I remembered that I can open Chrome with arguments from the command line, which I thought might solve my problem. After some digging I found the Chrome switches, but it seems they are just startup flags (I'm not sure if that's right, but I didn't find a way to fire a POST request the way cURL does).
So, can I use Chrome in command line to start it with a POST request just like my cURL command above?
There are a couple of things you could do.
You could write a script in JavaScript that will send the POST request and display the results inside the <body> element or the like;
You could keep the cURL command and use the -o (or --output) option to save the resulting HTML in a file (but lose the -i switch, to avoid having the headers in the file), then open the file in Chrome or whichever browser you prefer. You could combine the two commands as a one-liner in any operating system. If you use Ubuntu, for example:
$ curl -o search.html -X POST -F file=@search.png http://saucenao.com/search.php && google-chrome search.html && rm search.html
According to this answer you could use bcat in order to avoid using a temporary file. Install it by apt-get install ruby-bcat and then just run
$ curl -X POST -F file=@search.png http://saucenao.com/search.php | bcat
I think the easier option is #2, but whichever you prefer.

Wget -i gives no output or results

I'm learning data analysis in Zeppelin; I'm a mechanical engineer, so this is outside my expertise.
I am trying to download two csv files using a file that contains the urls, test2.txt. When I run it I get no output, but no error message either. I've included a link to a screenshot showing my code and the results.
When I go into the Ambari Sandbox I cannot find any files created. I'm assuming the directory the file is in is where the csv files will be downloaded to. I've tried using -P as well, with no luck. I've checked man wget but it did not help.
So I have several questions:
How do I show the output from running wget?
Where is the default directory that wget stores files?
Do I need additional data in the file other than just the URLs?
Screenshot: Code and Output for %sh
Thanks for any and all help.
%sh
wget -i /tmp/test2.txt
%sh
# list the current working directory
pwd # output: /home/zeppelin
# make a new folder, created in "tmp" because it is temporary
mkdir -p /home/zeppelin/tmp/Folder_Name
# change directory to new folder
cd /home/zeppelin/tmp/Folder_Name
# transfer the file from the sandbox to the current working directory
hadoop fs -get /tmp/test2.txt /home/zeppelin/tmp/Folder_Name/
# download the URL
wget -i test2.txt

HTTrack returns file not found

I downloaded a website with HTTrack using the following command:
/usr/local/bin/httrack https://www.website.com -O /Users/mainuser/Desktop/website -n -j
I then located the index.html file in the website folder and opened it. Chrome returned the message: file not found. That's funny, because normally the websites I mirror with httrack work just fine on my file system. What could be the reason for this behaviour?
Try the following, using the -v (verbose) flag to make HTTrack give you as much debugging information as possible:
/usr/local/bin/httrack "https://www.website.com" -O "/Users/mainuser/Desktop/website" -n -j -v

Wget copy text from html

I'm really new to programming and Linux/Unix, so I was wondering what command I can use to copy only the text of a webpage and save it to a file in the directory. I want to copy the text of something like this:
http://cseweb.ucsd.edu/classes/wi12/cse130-a/pa5/words
Would wget do it? Also, what specific command gets it saved into the directory?
Another option using wget like you wondered about would be:
wget -O file.txt "http://cseweb.ucsd.edu/classes/wi12/cse130-a/pa5/words"
The -O option lets you specify which file name you want to save it to.
One option would be:
curl -s http://cseweb.ucsd.edu/classes/wi12/cse130-a/pa5/words > file