I am working on a project in which I need to find all the files on a website. For example, my site contains index.html and a PDF file.
How can others find out that there is a PDF file on my domain?
You need a scraper of some sort, for example Scrapy (http://scrapy.org/).
You can traverse a website by following its links.
If you think of each page as a node and the links on it as its children,
you can cover all files of that website.
This only works if the website exposes every file through a link on one of its pages.
If it does not, you have to use other means, such as a search engine,
to look through its indexed pages.
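The node-and-children idea can be sketched as a breadth-first walk over links. This is only an illustration using the Python standard library, not Scrapy; the names `LinkParser`, `fetch_html`, and `crawl`, and the `max_pages` cap, are made up for this example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.error import URLError
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href/src values found in one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def fetch_html(url):
    """Return the page text, or "" for failed requests and non-HTML files."""
    try:
        with urlopen(url, timeout=10) as response:
            if "html" not in response.headers.get("Content-Type", ""):
                return ""  # e.g. a PDF: record the URL but do not parse it
            return response.read().decode("utf-8", errors="replace")
    except URLError:
        return ""


def crawl(start_url, fetch=fetch_html, max_pages=500):
    """Breadth-first walk of same-host links; returns every URL discovered."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    # max_pages is an approximate cap (hypothetical safety limit).
    while queue and len(seen) < max_pages:
        page = queue.popleft()
        parser = LinkParser()
        parser.feed(fetch(page))
        for link in parser.links:
            url = urldefrag(urljoin(page, link)).url  # resolve and drop #fragment
            if urlparse(url).netloc == host and url not in seen:
                seen.add(url)
                queue.append(url)
    return seen
```

Calling `crawl("https://example.com/index.html")` would return the set of same-host URLs reachable by links, including a PDF if some page links to it. A file that no page links to will not show up, which is the limitation described above.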
I made a directory called downloads for files that I want people to be able to access by direct link on my Jekyll site. However, I don't want them to be able to go to https://example.com/downloads and see a list of all the downloads.
How can I prevent visitors from seeing the index page?
There are two approaches.
You can make an actual /downloads/index.html page that will be shown instead of the default directory-like page.
You can use the jekyll-redirect-from plugin and redirect the url /downloads/ to your 404 page, for example.
I am trying to convert a WordPress website into a simple HTML/CSS website, but whenever I use httrack it downloads all of the WordPress files, making it hard for me to extract the simple HTML/CSS files.
Is there a way to solve that using httrack, or any other method?
In the httrack URL list, add only the pages you actually need. For example, if the website you wish to extract is abc.com, then instead of just writing abc.com, list the pages you want, such as abc.com/index.php and abc.com/about.php.
This cuts down on unnecessary files. httrack also tries to create a folder structure similar to the host's, so this is the most you can do. Inspect Element comes in handy while editing these mirrored sites.
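As for "any other method": if only a handful of pages are needed, a short script can fetch just those URLs and save the rendered output. This is a rough sketch, not a replacement for httrack; `fetch_bytes` and `save_pages` are hypothetical names, and the `http://` scheme on the abc.com URLs is an assumption.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def fetch_bytes(url):
    """Download one URL and return the raw response body."""
    with urlopen(url, timeout=10) as response:
        return response.read()


def save_pages(urls, out_dir, fetch=fetch_bytes):
    """Save the rendered output of each listed URL as a static .html file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for url in urls:
        # "/about.php" becomes "about.html"; the site root becomes "index.html".
        # Pages with the same file name in different folders would collide.
        stem = Path(urlparse(url).path).stem or "index"
        target = out / f"{stem}.html"
        target.write_bytes(fetch(url))
        saved.append(target)
    return saved
```

For example, `save_pages(["http://abc.com/index.php", "http://abc.com/about.php"], "site")` would write `site/index.html` and `site/about.html`. Links inside the saved pages still point at the original `.php` addresses and the stylesheets are not downloaded, so some hand editing remains.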
I run a website for my photography where I have a stories page (http://www.traumantic.com/stories.htm) that is a long list of choices that lead to a sub folder and a gallery of images for that session.
I have an index.htm file in each of those folders that displays the gallery chosen.
I am trying to develop a new format for my pages, and putting it in place means replacing dozens of index.htm files and editing each one for that new format. A boatload of work.
I have noted that a lot of news sites seem to have a method of using a single template for the main body of the page, with the elements of the news story pulled in from another source.
I figured I could do this with XML like I did with my galleries, but I am lost.
I tried creating an XML file in a couple of text folders and then reading that from an HTM file two levels up. It didn't work.
Currently when you click on a link on my stories page, it opens the index.htm file in a sub-folder.
What I want to happen is this.
Clicking on a choice on my stories page launches an html template that reads the details from the folder.
The one html template would be used for all of the different story folders below. Making it far easier to modify the look of my web site quickly.
I'd rather put a ton of work into designing this system than do a mass replace-and-edit project on hundreds of files.
I hope this makes sense to some of you and that you can guide me to some study topics that will help me learn how to do this.
I am seeking advice on places where I can see examples of this process.
The simplest option is to use an iframe
https://www.w3schools.com/tags/tag_iframe.asp
<iframe src="/path/to/file.html"></iframe>
Searching "html include" will yield a few guides that have various JavaScript implementations. (e.g., https://www.w3schools.com/howto/howto_html_include.asp)
If you're able to run PHP, you could use include:
https://www.w3schools.com/php/php_includes.asp
But at that point, you might want to consider installing a template engine such as Twig: https://twig.symfony.com/doc/2.x/intro.html
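If the server can't run PHP, the same one-template idea can also be done offline: keep a small data file in each story folder and regenerate every index.htm from a single template. This is a different technique from the options above (static generation on your own machine), shown only as a sketch. The `story.json` file name and its `title` and `images` keys are assumptions made up for this example.

```python
import json
from html import escape
from pathlib import Path
from string import Template

# One shared layout; edit this once to restyle every story page.
PAGE = Template("""<!DOCTYPE html>
<html>
<head><title>$title</title></head>
<body>
<h1>$title</h1>
$gallery
</body>
</html>
""")


def build_story_pages(stories_root):
    """Write index.htm into every sub-folder that has a story.json file."""
    written = []
    for data_file in sorted(Path(stories_root).glob("*/story.json")):
        story = json.loads(data_file.read_text(encoding="utf-8"))
        # One <img> tag per image file listed in the data file.
        gallery = "\n".join(
            f'<img src="{escape(name)}" alt="">' for name in story["images"]
        )
        page = PAGE.substitute(title=escape(story["title"]), gallery=gallery)
        target = data_file.with_name("index.htm")
        target.write_text(page, encoding="utf-8")
        written.append(target)
    return written
```

A folder would then hold something like `{"title": "Smith Wedding", "images": ["01.jpg", "02.jpg"]}` in its `story.json`, and running `build_story_pages("stories")` after a template change rewrites all of the index.htm files in one go. The links on the stories page stay exactly as they are.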
It's easy to find all child pages of a webpage, but it is not trivial to find all of its parent pages. How can I do that using Google?
You can't, and Google cannot help you, because it doesn't index all of the web.
At best it follows links on other pages, or is initiated by someone wanting to have something indexed explicitly.
Create a server. Put an HTML page on it that no other page on your server has a link to. Name the page with some non-guessable UUID in the name.
Google will not find this unless they start to randomly change parts of URLs to test for existing pages (a lengthy process).
Within that page you can have links pointing to other pages. It is a parent page for those pages, is a web page, and will not be found via Google.
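To make the asymmetry concrete: parent pages can only be computed from pages that have already been crawled, by inverting the child links. A small sketch follows; `parents_of` and the page names in the usage note are hypothetical.

```python
from collections import defaultdict


def parents_of(link_map):
    """Invert {page: [pages it links to]} into {page: {pages linking to it}}."""
    parents = defaultdict(set)
    for page, children in link_map.items():
        for child in children:
            parents[child].add(page)
    return dict(parents)
```

Given `{"/index.html": ["/a.html"], "/hidden-uuid.html": ["/a.html"]}`, the result lists both pages as parents of `/a.html`. But the hidden page only appears because it was put into the map by hand; a crawler that never discovers it cannot report it as a parent, and the same holds for a search engine.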
I'm trying to create an editable page in SharePoint. I already have the page in HTML (it's quite large) and it has many images in it. Previously I have just created a new page in SharePoint and pasted the HTML source in, then uploaded/inserted the images manually, one at a time.
Unfortunately, I am not able to do this in a reasonable amount of time because the HTML file uses so many images.
So, I want an editable SharePoint page that keeps the images intact from a directory that looks like this:
thepage.html
1.png
2.png
...
...
...
343.png
etc
Any ideas?
EDIT: For more clarity - this is a specifications document in HTML form, so it has a lot of text and headers integrated with images. I'd like it to be converted to an actual SharePoint page that is editable from SharePoint's interface.
A low-tech solution seems best here: some HTML editing, plus whichever method of uploading multiple files works best for you.
Assuming
C:\mypage
-> \page.html
-> \images\1.png
-> \images\2.png
...
-> \images\100.png
Via the UI
Go to a Document or Image library, and use the "Upload Multiple files/images" (this only appears on Internet Explorer)
Let's say you uploaded it to //sharepoint/myimages
Create a new content page (say an Article page, or WebPart Page with a Content Editor WebPart)
Let's say your page now resides at //sharepoint/pages/mypage.aspx
Change your html to point from <img src="images/1.png" /> to <img src="../myimages/1.png" />
Edit the HTML for your newly created page (Ribbon > Edit HTML Source), paste your HTML code
Via SharePoint Designer
Drag and drop all the images into your desired location
Repeat the HTML steps above
To replace text in bulk, SharePoint Designer, your favorite HTML editor, or even Notepad can do that well using the CTRL+H menu / Edit > Find & Replace options.
NOTE: the //sharepoint address up there is the http url for your site, SO won't let me use a full fake address as a sample.
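With hundreds of images, the src change can also be scripted instead of done through Find & Replace. A minimal sketch, assuming the layout above (`images/` locally, `../myimages/` on the site); `repoint_images` is a hypothetical helper name.

```python
import re

# Matches src="images/<file>" or src='images/<file>' inside an <img> tag.
IMG_SRC = re.compile(
    r'(<img\b[^>]*?\bsrc=)(["\'])images/([^"\']+)\2', re.IGNORECASE
)


def repoint_images(html, new_prefix="../myimages/"):
    """Rewrite relative images/ paths so they point at the uploaded library."""
    return IMG_SRC.sub(
        lambda m: f"{m.group(1)}{m.group(2)}{new_prefix}{m.group(3)}{m.group(2)}",
        html,
    )
```

Read page.html into a string, pass it through `repoint_images`, and paste the result into the Edit HTML Source dialog. Absolute image URLs are left alone because the pattern only matches paths that start with `images/`.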
From IE or from Word, save the page as a complete webpage so it creates an HTML file plus a folder with the images.
In Network Places, create a web folder (WebDAV) pointing to SharePoint. This way, you can access it from the file system in Explorer.
Open your new network place, navigate to the library where you want your HTML file to be, and drag-n-drop the file and folder into there.
The file will then be visible in the browser, with the pictures, but the folder will be hidden.
If I have understood your question correctly, you can use the answer to this post to load a list of images with JavaScript and PHP:
Load list of image from folder.
Upload the files to the SharePoint server and use that folder.
Or you can write C# code to read the SharePoint folder dynamically and display the images.