How to download HTTP directory with all files and sub-directories as they appear on the online files/folders list? - html

There is an online HTTP directory that I have access to. I have tried to download all sub-directories and files via wget. But the problem is that when wget downloads sub-directories it downloads the index.html file, which contains the list of files in that directory, without downloading the files themselves.
Is there a way to download the sub-directories and files without a depth limit (as if the directory I want to download were just a folder which I want to copy to my computer)?

Solution:
wget -r -np -nH --cut-dirs=3 -R index.html http://hostname/aaa/bbb/ccc/ddd/
Explanation:
It will download all files and subfolders in the ddd directory:
-r : recursively
-np : not going to upper directories, like ccc/…
-nH : not saving files to the hostname folder
--cut-dirs=3 : saving it to ddd by omitting the first 3 folders aaa, bbb, ccc
-R index.html : excluding index.html files
Reference: http://bmwieczorek.wordpress.com/2008/10/01/wget-recursively-download-all-files-from-certain-directory-listed-by-apache/
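If the server's robots.txt blocks recursive crawling, or stray index pages still end up on disk, a variant that is often needed looks like this (both flags are standard GNU Wget options; they are not part of the original answer):
wget -r -np -nH --cut-dirs=3 -e robots=off --reject "index.html*" http://hostname/aaa/bbb/ccc/ddd/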

I was able to get this to work thanks to this post utilizing VisualWGet. It worked great for me. The important part seems to be to check the -recursive flag (see image).
Also found that the -no-parent flag is important, otherwise it will try to download everything.

You can use lftp, the Swiss army knife of downloading. If you have bigger files you can add --use-pget-n=10 to the command (see the second example below):
lftp -c 'mirror --parallel=100 https://example.com/files/ ;exit'
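Combining the parallel mirror with segmented downloads for larger files, as mentioned above, would look like this (the URL is a placeholder):
lftp -c 'mirror --parallel=100 --use-pget-n=10 https://example.com/files/ ; exit'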

wget -r -np -nH --cut-dirs=3 -R index.html http://hostname/aaa/bbb/ccc/ddd/
From man wget
‘-r’
‘--recursive’
Turn on recursive retrieving. See Recursive Download, for more details. The default maximum depth is 5.
‘-np’
‘--no-parent’
Do not ever ascend to the parent directory when retrieving recursively. This is a useful option, since it guarantees that only the files below a certain hierarchy will be downloaded. See Directory-Based Limits, for more details.
‘-nH’
‘--no-host-directories’
Disable generation of host-prefixed directories. By default, invoking Wget with ‘-r http://fly.srk.fer.hr/’ will create a structure of directories beginning with fly.srk.fer.hr/. This option disables such behavior.
‘--cut-dirs=number’
Ignore number directory components. This is useful for getting a fine-grained control over the directory where recursive retrieval will be saved.
Take, for example, the directory at ‘ftp://ftp.xemacs.org/pub/xemacs/’. If you retrieve it with ‘-r’, it will be saved locally under ftp.xemacs.org/pub/xemacs/. While the ‘-nH’ option can remove the ftp.xemacs.org/ part, you are still stuck with pub/xemacs. This is where ‘--cut-dirs’ comes in handy; it makes Wget not “see” number remote directory components. Here are several examples of how ‘--cut-dirs’ option works.
No options -> ftp.xemacs.org/pub/xemacs/
-nH -> pub/xemacs/
-nH --cut-dirs=1 -> xemacs/
-nH --cut-dirs=2 -> .
--cut-dirs=1 -> ftp.xemacs.org/xemacs/
...
If you just want to get rid of the directory structure, this option is similar to a combination of ‘-nd’ and ‘-P’. However, unlike ‘-nd’, ‘--cut-dirs’ does not lose with subdirectories—for instance, with ‘-nH --cut-dirs=1’, a beta/ subdirectory will be placed to xemacs/beta, as one would expect.
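Applied to the command quoted above, the same logic means a file at http://hostname/aaa/bbb/ccc/ddd/file.txt would be saved locally as:
No options        -> hostname/aaa/bbb/ccc/ddd/file.txt
-nH               -> aaa/bbb/ccc/ddd/file.txt
-nH --cut-dirs=3  -> ddd/file.txt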

No Software or Plugin required!
(only usable if you don't need recursive depth)
Use a bookmarklet. Drag this link into your bookmarks, then edit it and paste in this code:
javascript:(function(){ var l=document.links; var ext=prompt("Select extension for download (all links containing it will be downloaded)", ".mp3"); for(var i=0; i<l.length; i++){ if(l[i].href.indexOf(ext) !== -1){ l[i].setAttribute("download", l[i].text); l[i].click(); } } })();
Then go to the page you want to download files from and click that bookmarklet.

wget is an invaluable resource and something I use myself. However, sometimes there are characters in the address that wget identifies as syntax errors. I'm sure there is a fix for that, but as this question did not ask specifically about wget, I thought I would offer an alternative for those people who will undoubtedly stumble upon this page looking for a quick fix with no learning curve required.
There are a few browser extensions that can do this, but most require installing download managers, which aren't always free, tend to be an eyesore, and use a lot of resources. Here's one that has none of these drawbacks:
"Download Master" is an extension for Google Chrome that works great for downloading from directories. You can choose to filter which file-types to download, or download the entire directory.
https://chrome.google.com/webstore/detail/download-master/dljdacfojgikogldjffnkdcielnklkce
For an up-to-date feature list and other information, visit the project page on the developer's blog:
http://monadownloadmaster.blogspot.com/

You can use this Firefox addon to download all files in an HTTP directory.
https://addons.mozilla.org/en-US/firefox/addon/http-directory-downloader/

wget generally works in this way, but some sites may have problems and it may create too many unnecessary html files. In order to make this work easier and to prevent unnecessary file creation, I am sharing my getwebfolder script, which is the first Linux script I wrote for myself. This script downloads all content of a web folder entered as a parameter.
When you try to download an open web folder with wget which contains more than one file, wget downloads a file named index.html. This file contains the file list of the web folder. My script converts the file names written in the index.html file to web addresses and downloads them with wget.
Tested on Ubuntu 18.04 and Kali Linux; it may work on other distros as well.
Usage :
extract getwebfolder file from zip file provided below
chmod +x getwebfolder (only needed the first time)
./getwebfolder webfolder_URL
such as ./getwebfolder http://example.com/example_folder/
Download Link
Details on blog
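The actual script is in the download link above; as a rough sketch of the same idea (this is not the author's script, and it assumes a plain Apache-style index page with relative links and no authentication):
url="$1"
# fetch the index page, extract href targets, skip sort/absolute/parent links, download the rest
wget -qO- "$url" | grep -oP 'href="\K[^"]+' | grep -vE '^(\?|/|\.\.)' | while read -r f; do wget "$url$f"; done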

Related

Keyboard shortcut for uploading two files in PhpStorm

The problem
In PhpStorm I have a style.css and an app.js file that I have to upload to a server over and over again. I'm trying to automate it.
They're compiled by Webpack, so they are generated/compiled. Which means that I can't simply use 'Tools' >> 'Deployment' >> 'Upload to...' (since that file isn't and won't ever be open).
What I currently do
At the moment, every time I want to see the changes I've made, I do this (for each file):
Navigate to the files in the file-tree (using the mouse)
Select it
Then use a shortcut I've set up for Main menu >> Tools >> Deployment >> Upload to..., after which I select the server I want to upload to.
I do this approximately 100+ times per day.
The ideal solution
The ideal solution would be, that if I pressed a shortcut like CMD + Option + Shift + G
That it then uploaded a selection of files (a scope?) to a predefined remote server.
Solution attempts
Open and upload.
Changing to those files (using CMD + p) and then uploading them (once they're open). But the files are generated, which means that it takes PhpStorm a couple of seconds to render the content (which is necessary before I can do anything with the file) - so that's not faster.
Macro.
Recording a macro, uploading the two files, looking like this:
If I go to the menu and trigger the Macro, then it works. So far so good.
But if I assign a shortcut key and trigger that shortcut while in a file, then it shows me this:
And if I press '1' (for it to upload to number 1 on the list), then it uploads the file that I'm currently in(!?), and not the two files from my macro.
I've tried several different shortcuts (to rule out some kind of keyboard-shortcut-clash):
CMD + Option + CTRL + 0
CMD + Shift + 0
CMD + ;
... Same result.
And the PhpStorm Macro's doesn't seem to give me that many options anyways.
Keyboard Maestro.
I've tried doing it using Keyboard Maestro.
But I can't get it set up right. Because if it can't find the folders (if they're off-screen or if I'm in a different project and forgot to adjust the shortcuts), then it blasts through the rest of the recorded actions, resulting in chaos. Ideally it should stop if it can't find the file on the screen.
Update1 - External program
Even if it's not possible to do in PhpStorm, - are there then another program that I could achieve this with?
Update2 - Automatic Deployment in PhpStorm
I've previously used this, but I've had it happen a few times that I started syncing waaaay too many files, overwriting critical core files. It seems smart, but can possibly tear down walls if I've forgotten to define an ignore properly.
I wish there was an 'Automatic Deployment for these files' function.
Update3 - File Watchers
I looked into file watchers (recommendation from @LazyOne). Based on this forum thread, file watchers cannot be used to upload files.
It is possible to accomplish it using the external program scp (Secure Copy Protocol):
Steps:
1. Create a Scope (for compiled files app.js and style.css)
2. Create a Custom File Watcher with scp over that Scope
Start with Scope:
Create a Local Scope with name scp files for your compiled files directory (I will assume that your webpack compiles into dist directory):
Then, to add the dist directory into the Scope, select that folder and click on Include Recursively. Apply, and move on to File Watchers.
Create a custom template for File Watcher:
Choose a Name
Choose File type as Any
Choose Scope as scp files(created earlier)
Choose Program as scp
Choose Arguments as $FileName$ REMOTE_USER@REMOTE_HOST:/REMOTE_DIR_PATH/$FileName$
Choose Working directory as $FileDir$
That's it. Basically, what we have done is: every time a file in that scope changes, it is copied with scp to the corresponding path on the remote server.
Voila. Apply everything, recompile your project, and you will see that everything is uploaded to the server.
(I assumed that you have already set up your ssh client; Generated public/private keys; Added a public key in your remote server; And, know ssh credentials to connect to your remote server)
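With that configuration, when app.js in dist changes, the watcher effectively runs something like the following (user, host and remote path are placeholders):
scp app.js deploy@example.com:/REMOTE_DIR_PATH/app.js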
I figured this out myself. I posted the answer here.
The two questions are kind of similar but not identical.
The way I found is also not the best, since it stores the server password in clear text. So I'll leave the question open, in case someone can come up with a better way to achieve this.
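If the cleartext password is the main concern, the scp approach above avoids it entirely once key-based authentication is set up (standard OpenSSH commands; the user and host are placeholders):
ssh-keygen -t ed25519
ssh-copy-id deploy@example.com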

How to include only a single folder in Bamboo build plan

I need Bamboo to build the project automatically when a file in the "api" subfolder changes. When a file in any other subfolder changes, the Bamboo build plan shouldn't run.
Folder structure:
project
- api
- ui
- core
In the Plan Configuration repositories tab, from the "Include / exclude files" dropdown I have selected the following option
Include only changes that matches the following pattern
and I have tried the following patterns:
.*/api/.*
api/
api/*
api\/*
api/**
/api/*
but the build plan isn't triggered. With the "Include / exclude files" dropdown set to None, the build plan runs (but it also does so when a file changes in any other subfolder).
I can't split the project up to different repositories.
What pattern should I use or is there any other solution for this?
The pattern that ended up working was:
api/.*
It's a regular expression from the root of the checkout supposedly, although I have not used this feature. Here are some of their examples:
https://confluence.atlassian.com/display/BAMBOO052/_planRepositoryIncludeExcludeFilesExamples?_ga=2.91083610.1778956526.1502832020-118211336.1443803386
What you might try is to let it check out the whole thing without the include filter set, and don't let it delete the working directory. Look on the filesystem and verify the path from the root of the working directory. Then test your regex against the whole path relative to that working directory.
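As a quick local sanity check of the include pattern against a path you expect to trigger the build (the path here is just an example; Bamboo applies the regex itself, this only tests the expression):
echo "api/src/UserController.java" | grep -E '^api/.*' && echo "would match"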

git and html files

I would like to keep two versions of a static html file in my git repository. Both are basically identical, except for links for scripts, media etc (dev version vs. live version).
Right now I keep the dev version in repo, and overwrite the live version values manually on the live machine (=I have local git changes there). I am not happy with this setup, because there's manual labour for each push/pull.
What is the best flow for managing files that cannot be split into config/rest sections (like HTML)?
You could...
Remove the file from your repository and just manually populate it. If it doesn't change very often, this works just fine.
Remove the file from your repository, and generate it from a template via a post-merge script in .git/hooks/post-merge (this hook is run, for example, after git pull).
Name the file after the branch or hostname or some other variable (e.g., static.master.html vs. static.develop.html, etc) and dynamically determine which one to use at runtime.
Those are some ideas. I imagine other folks will contribute additional suggestions.
Expanding on the 2nd bullet point by larsks:
You could keep two copies in the repo (say it's your homepage), index.dev.html and index.prod.html. On the remote, your post-merge script could do something like:
cp -a index.prod.html index.html
or
truncate -s 0 index.html
cat index.prod.html >> index.html
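Put together, a minimal .git/hooks/post-merge along those lines might look like this sketch (the branch check is an assumption; remember to make the hook executable):
#!/bin/sh
# regenerate index.html after every git pull, based on the current branch
branch=$(git rev-parse --abbrev-ref HEAD)
if [ "$branch" = "master" ]; then
    cp -a index.prod.html index.html
else
    cp -a index.dev.html index.html
fi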
Another problem besides renaming is keeping the content of both files in sync. Having dedicated files that differ only in one minor path is a lot of redundancy: if you change one, you have to remember to update the other as well.
OK, you stated that the HTML file is static, but here a single line of PHP to generate the difference would solve the problem.
Achim

Selectively updating working directory

I'm working on some code with a partner. Our make files differ slightly courtesy of different build setups. Because of this, so far we have not been tracking this file. However it would be nice to have at least one of ours tracked. The problem is, when that is done and the other person runs hg update, their copy gets updated and the code won't compile.
Is there a way to track the file, but have it such that you can update the working directory selectively? Or is there some other way I should deal with this problem?
This is a slight variant of the standard "how do I deal with a config file" question. The standard answer in SVN, Mercurial, and Git is: don't track the file, instead track <file>.example. Then each user copies that over to <file> and tweaks it as needed.
But Makefiles are a bit smarter than config files: they execute code and can include other files. In which case, it starts making sense to track the Makefile normally and have it include another local file, if present, that overrides the default rules. For instance, the following will work with GNU Make:
# pull in any local user tweaks
-include Makefile.local
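A developer's Makefile.local then only needs to hold the personal overrides, for example (the variable names here are made up):
# Makefile.local -- untracked, per-developer tweaks
CC = clang
CFLAGS += -DDEBUG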
The MQ extension is the best and The Right Way (tm) to do it (not the easiest, but...).
Store the common part of the file in the repo, and the individual personalisation in your own MQ patches.
Is it possible to combine your Makefiles? Then there is no chance of losing your different configurations by not storing them in version control.
For example, you could add a conditional statement based on the username. My username is ryan and this code echos my name, but if it is run on your computer, it probably will echo "not ryan."
all:
	if [ `whoami` = "ryan" ]; then echo "ryan"; else echo "not ryan"; fi

How do I get a file off google-api-java-client?

OK, I want to download the following open source code: http://code.google.com/p/google-api-java-client/source/browse/calendar-v2-atom-android-sample/?repo=samples
I am led to believe you need to use Mercurial for this but have yet to find a tutorial on how. Why is there not a download-as-zip kind of thing for this?
I am using eclipse.
How do I get this example?
You can get each file individually by clicking it, then right-clicking "View raw file" in the right column and choosing "Save" (which may say something slightly different depending on your browser).
I don't know about this project, but usually you can download the samples in the zips on the downloads tab.
If you want to do it faster than that, you can find a Mercurial client for most operating systems at https://www.mercurial-scm.org/downloads.
Once you have Mercurial installed, running the command
hg clone https://code.google.com/p/google-api-java-client.samples/ google-api-java-client-samples
will give you a full copy of the current version in the current directory.
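Assuming the samples repository keeps the same layout as the browse URL in the question, the Calendar sample should then be in:
cd google-api-java-client-samples/calendar-v2-atom-android-sample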