How do I Prevent Httrack From Downloading the Same File Again? - html

I am using HTTrack to download this website:
http://4minutearticles.com/
The problem is that the author links back to the main page from every page of the site.
For example, on http://4minutearticles.com/ext/
the Parent Directory link redirects to the main page,
and HTTrack starts downloading everything again.
How do I prevent this loop from happening?

Read the answer to this question in the HTTrack FAQ:
"I have duplicate files! What's going on?"
Link: http://www.httrack.com/html/faq.html#Q1b11
Also have a look at the "Filters: Advanced" section at the following link:
http://www.httrack.com/html/filters.html
It may help with your issue.

You can use filters to stop HTTrack from downloading the same files or folders. Click the "Set options" button next to the "Preferences and Mirror options" label, open the "Scan Rules" tab, and then use the "Exclude links" button to set the rules you want.
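As a rough illustration of how include/exclude rules of this kind behave, here is a minimal Python sketch. It assumes rules prefixed with + (accept) or - (reject), * wildcards, and "the later rule wins", which is how the HTTrack filters page describes scan rules. The matching itself uses Python's fnmatch, not HTTrack's own matcher, and the two rules shown are hypothetical examples, not rules taken from the question.

```python
from fnmatch import fnmatchcase


def is_allowed(url, rules, default=True):
    """Return True if url passes the rules; the last matching rule wins."""
    allowed = default
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if fnmatchcase(url, pattern):
            allowed = (sign == "+")
    return allowed


# Hypothetical rules: accept the whole site, then exclude one folder.
rules = [
    "+4minutearticles.com/*",
    "-4minutearticles.com/ext/*",
]

print(is_allowed("4minutearticles.com/index.html", rules))
print(is_allowed("4minutearticles.com/ext/page.html", rules))
```

The second URL matches both rules, and because the exclude rule comes last, it is rejected.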

This is generally the case for top indexes (index.html and
index-2.html).
This is a common issue, but one that cannot be easily avoided!
For example, http://www.foobar.com/ and
http://www.foobar.com/index.html might be the same page. But if links
in the website refer both to http://www.foobar.com/ and
http://www.foobar.com/index.html, these two pages will be caught. And
because http://www.foobar.com/ must have a name, as you may want to
browse the website locally (the / would give a directory listing, NOT
the index itself!), HTTrack must find one. Therefore, two index.html
will be produced, one with the -2 to show that the file had to be
renamed.
It might seem like a good idea to treat http://www.foobar.com/ and
http://www.foobar.com/index.html as the same link, to avoid
duplicate files. But NO, because the top index (/) can refer to
ANY filename: index.html is generally the default name, but
index.htm can be chosen, or index.php3, mydog.jpg, or anything you
may imagine. (Some webmasters are really crazy.)
Note: In some rare cases, duplicate data files can be found when the
website redirects to another file. This issue should be rare, and might
be avoided using filters.
See also: Updating a project

Related

Custom Tumblr theme wont save because of non-https urls?

So yesterday I was happily editing the theme of my Tumblr blog and everything was working fine. I went into the same blog today and it brings up this when I click save:
"Uh oh! We couldn't save your theme. Looks like your custom theme references assets from non-HTTPS URLs. Please try again using only HTTPS URLs."
This is super confusing because no URLs have been added since yesterday, and everything was fine then. The same thing is happening with my other blogs with custom themes. I even went through and deleted all the URLs on the HTML page just to see if it would do anything, and the same alert came up. What is going on?
Please help
Cheers
Just had the same problem. Tumblr updated their Encryption policies.
If you're a theme developer and you'd like to ensure your themes
support HTTPS, make sure that any externally hosted resources, such as
Cascading Style Sheets (CSS) or Javascript files, and even images, are
served using HTTPS.
Now that we know Tumblr requires HTTPS instead of HTTP, here's how to solve the error:
Make sure that you are in the customize section and open "Edit HTML".
In "Edit HTML", press Ctrl+F (or press the Settings button and then "Find and replace").
Search for "http" and replace with "https", and apply that to all.
Because the previous step turns links that were already "https" into "httpss", this needs to be fixed: open Find and Replace again, search for "httpss", and replace with "https".
The steps above should solve your problem. If they don't, see "Extra considerations" below, specifically point 1.
Extra considerations
I've done all the above, but it didn't solve my problem. What should I do?
When one has android-app://, for example:
<link rel="alternate" href="android-app://com.tumblr/tumblr/x-callback-url/blog?blogName=goncalomperes" />
One will need to add https:, as follows:
<link rel="alternate" href="android-app:https://com.tumblr/tumblr/x-callback-url/blog?blogName=goncalomperes" />
As @mchid suggested in the comments, apart from android-app://, we will also need to do the above for "//, ios-app://, and http-equiv.
According to Tumblr support:
Yet another update: SSL is now being turned on by default for ALL
Tumblrs that use our Official theme on the web. Even though we don’t
recommend it, you can still turn it off in your blog settings.
So changing the Encryption section to allow SSL should not be the problem.
OK, I'm a goose. It looks like Tumblr has changed its requirements on HTTP. I know it sounds obvious, but I couldn't tell why it was happening on every theme apart from their default theme. The reason is that you need to go in and change the Tumblr links to CSS and JavaScript from http to https: "http://static.tumblr.com/xlsgtjb/WEMoeha97/style.css" becomes "https://static.tumblr.com/xlsgtjb/WEMoeha97/style.css". If you still get the alert after this, try searching for other URLs and delete them or change them to https.
I have the same problem, and I thought all I had to do was change the encryption setting to "Always serve blog over SSL" in the blog settings.
Apparently not, because the problem isn't just in the blog URL but also in the customization section.
So you need to enter the section, go through all the code, find the http URLs and change them into https URLs.
Before you begin, make a backup of your existing html in case there is an issue. There are a few ways to do this but I recommend doing both of the following.
First, select all in the Tumblr HTML editor, copy, paste the contents into a text editor on your computer, and save the file. This backup is preferred.
Next, save a copy of the html for your main tumblr page. You can use wget which will result in an index.html file or you can right-click on your page, select "view source" and then select all, copy, and then paste that into a text editor. If the preferred backup fails for whatever reason, this one can be used as an alternative.
Now, to fix the problem.
First, open the Tumblr html editor and left-click anywhere in the html code and then press CTRL+F to use the "Search For" and "Replace With" feature.
Search for: http:// and replace with: https:// and then click on All to replace all.
Search for: "// and replace with: "https:// and then click on All to replace all.
Search for: android-app:// and replace with: android-app:https:// and then click on All to replace all.
Search for: ios-app:// and replace with: ios-app:https:// and then click on All to replace all.
Search for: http-equiv and replace with: https-equiv and then click on All to replace all.
Finally, click on Update Preview to verify your changes. If everything looks good and your page displays fine, click Save.
As mentioned by others, the CSS fields are most important. If you are still getting an error, search for css and click through the results while inspecting the code that follows under each CSS section to make sure all links are https. This is how I discovered the "// case.
However it should be noted that, at least for me, the code did update despite the error. Even when I got an error, I noticed that the changes were applied and remained after closing and reopening the html editor.
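The first two search-and-replace steps can also be run on the saved backup copy of the theme rather than in the Tumblr editor. The Python sketch below is only an illustration under that assumption: the function name to_https is made up for this example, and it covers only the http:// and "// replacements, not the other steps listed.

```python
def to_https(theme_html: str) -> str:
    """Rewrite plain-HTTP and protocol-relative URLs in theme markup to HTTPS."""
    # "http://" never matches inside "https://", so no "httpss" cleanup is needed.
    result = theme_html.replace("http://", "https://")
    # Protocol-relative URLs in double-quoted attributes: "//host/... -> "https://host/...
    result = result.replace('"//', '"https://')
    return result


sample = '<link rel="stylesheet" href="http://static.tumblr.com/xlsgtjb/WEMoeha97/style.css">'
print(to_https(sample))
```

Searching for the full "http://" prefix instead of the bare word "http" leaves links that are already HTTPS untouched.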

How to make working path in HTML?

So, currently I'm making a website. It's an assignment. And when I tried to open it on a different computer, it didn't work.
So, for example: <a href="file:///E:/assignment/main page/index.html#">
It did work on my computer, but it won't work on another. I need it to work on any computer.
There are two halves to your question:
How do I make my website accessible anywhere?
You need a web server, or you need to use a hosting company. GoDaddy, 1and1, HostGator, and other hosting companies have computers (web servers) that are configured to show their webpages to anyone in the world. They cost around $10 per month, and you end up with the ability to create links such as http://example.com/myproject/index.html
It's possible that your professor will let you put your web pages on one of his drives that are accessible anywhere on campus. Otherwise, a flash drive can do in a pinch. Put the files onto a flash drive and then bring the flash drive to class.
Is there a better way to write links?
Most websites use relative URLs in their links. For example, Stack Overflow, instead of writing every link as http://stackoverflow.com/whatever, will usually use a relative URL instead: /whatever.
There are a few simple rules that your browser follows when turning an href attribute into a web address (in this example, we're starting from this page: http://stackoverflow.com/questions/15078748/how-to-make-working-path-in-html#15078792):
If the link starts with http:// (or anything else that comes before a ://), then your browser will take you exactly there. For example: http://stackoverflow.com takes you to the Stack Overflow home page.
If the link starts with /, then the browser will take you out of any subfolders before following the rest of the link. For example: /election will take you here: http://stackoverflow.com/election
If the link starts with ../, then it will send you exactly one folder up. This can be done multiple times. For example, ../ will send you here: http://stackoverflow.com/questions/
If the link starts with a question mark or hash (?, #), then it is applied to whatever page you are currently on: a ? sets the page's query string and a # sets its fragment. For example, #example would take you to http://stackoverflow.com/questions/15078748/how-to-make-working-path-in-html#example
Otherwise, the browser will keep you in your current folder and then send you to that link. For example: example will send you here: http://stackoverflow.com/questions/15078748/example
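These resolution rules can be tried out with Python's standard urllib.parse.urljoin, which resolves a relative reference against a base URL in roughly the way a browser does. This is a small sketch; the helper name resolve is just for illustration, and the base URL is this page's address without its fragment.

```python
from urllib.parse import urljoin

BASE = "http://stackoverflow.com/questions/15078748/how-to-make-working-path-in-html"


def resolve(href: str) -> str:
    """Resolve an href against BASE, approximating what a browser does."""
    return urljoin(BASE, href)


# One href per rule: absolute, root-relative, parent folder, fragment, same folder.
for href in ["http://stackoverflow.com", "/election", "../", "#example", "example"]:
    print(f"{href!r:28} -> {resolve(href)}")
```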
You must use relative paths, not absolute paths.
In simple words, you have to write:
...
to link index.html to a page in the same directory as index.html;
examples:
./my_page.html
Use "./" to link to pages in the same directory.
If the source and destination pages are in different folders, use:
../my_page.html
or
./folder_path/my_page.html
according to the relative paths of the pages.
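If you are unsure which relative path to write, it can be computed from the locations of the two files. The sketch below uses Python's posixpath.relpath; the function name relative_href and the folder layout (an assignment folder containing index.html) are hypothetical and only serve the example.

```python
import posixpath


def relative_href(source_page: str, target_page: str) -> str:
    """Return the relative href that links source_page to target_page."""
    source_dir = posixpath.dirname(source_page)
    return posixpath.relpath(target_page, start=source_dir or ".")


# Hypothetical site layout, with paths given relative to the site's top folder.
print(relative_href("assignment/index.html", "assignment/my_page.html"))
print(relative_href("assignment/index.html", "my_page.html"))
print(relative_href("assignment/index.html", "assignment/folder_path/my_page.html"))
```

The results omit the leading "./", which is equivalent: my_page.html and ./my_page.html point to the same file.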

The links to .xls files will not work on my intranet site?

I am currently building an ASP intranet site.
There are various helpful links that I need to include, and some of them happen to be .xls files located on a local network within the company.
I link these documents just like I would any Word docs (which work fine, by the way).
<span>Schedule</span>
The link above works if I simply copy and paste the raw address into my browser (a pop-up window comes up asking me to open the file in Excel). But when I make this a link on the intranet site and try to click on it, nothing happens. I can see the link when I hover over it on the status bar but that's it. It is non-clickable. Anyone have any idea what is causing this and how to fix it?
I should mention that two of these .xls files are password-protected but one of them is simply a read-only file which can be opened by anyone.
I am 100% sure this has nothing to do with css styling because the same thing happens in the current (old) intranet site made by someone else and I use these links on different menu bars as well.
I think you are using the wrong syntax for shared files. Try this:
file:///P:\-Projects-\SCHEDULE.xls
Backslashes are still valid in the path part. Also, I'm not sure whether SharePoint will correctly recognize a path to P:, which is most likely a network drive.
For me, a link like this to a local share works:
file:///\\fs-1\Install\Windows\Servers\DB\MSSQL\SQL2005\en_sql_server_2005_service_pack_4_x64.exe
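For comparison, the same kind of link can also be written in the forward-slash form of a file: URI. The Python sketch below builds that form from a Windows drive path; the function name drive_path_to_file_uri is made up for this example, and it handles only drive-letter paths, not \\server\share paths. Whether the browser then opens the link still depends on its security settings.

```python
from urllib.parse import quote


def drive_path_to_file_uri(path: str) -> str:
    """Convert an absolute Windows drive path (e.g. P:\\dir\\file) to a file: URI."""
    # Use "/" as the separator and percent-encode everything except "/" and ":".
    return "file:///" + quote(path.replace("\\", "/"), safe="/:")


print(drive_path_to_file_uri(r"P:\-Projects-\SCHEDULE.xls"))
```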
The solution to this problem is to add the site to the "Trusted Sites" list.
Opening intranet files without the user knowing is considered a security threat.
In IE, go to Internet Options -> Security -> Trusted Sites, then add the site.
http://answers.microsoft.com/en-us/ie/forum/ie9-windows_7/after-latest-update-ie-wont-open-network-file/172e4ac3-1c1f-4948-8a3f-c8c344eae06d

Mediawiki: configuring the entry page, adding a new page

We have a wiki installed in our organization and want to start using it.
I failed to find the answers to the following two basic questions:
How do I configure the entry page to show a list of all existing pages?
How do I create a new page? I only succeeded in doing it by typing the URL of a nonexistent page. I guess there are nicer methods for this.
Thanks
Gidi
For how to show a list of all pages, look at the DynamicPageList extension for MediaWiki. (There's a more advanced third-party version, but it's not needed for such a simple task.)
Creating a new page really is exactly as you said: Type a URL and save some edits. Most beginning editors will edit a link into a page, and then use that link to browse to the page, so that they don't accidentally forget the spelling and lose the page to the Ether. (Of course it would show up in the recently edited and other special pages.)
This is more of a webapps.stackexchange.com question though.

How to link a relative html file in the scenario where user can call the files from the browser by adding a / at the end

(Sorry I am not able to frame question correctly.)
Following is the scenario.
I have 2 Html files.
File1.Html has
Click Me
File2.Html has
Click Me
Now, when I open file1.html in the browser by typing the following:
http://Localhost/File1.html
file1.html is shown with its link, and when the link is clicked it goes to
http://Localhost/File2.html
BUT
if I open file1.html in the browser by typing the following (note the / at the end):
http://Localhost/File1.html/
file1.html is shown with its link, and when the link is clicked it goes to
http://Localhost/File1.html/File2.html
I know this is not the right way to use the browser, but you can't stop a user from doing so.
The above example is just to simplify the issue. My real production issue arises when using MVC, where URLs are routed. A user can legitimately use http://example.com/Employee or http://example.com/Employee/, and because of this my jqGrid is not working.
Please guide me to a workaround.
UPDATE:
This works OK in Internet Explorer, which is weird.
You want a link relative to the root. The following:
Click Me
(note the '/' at the start of the href) will link to http://Localhost/File1.html wherever the page containing the link is (so long as it's on the same host).
Not relative to the root; I need it relative to the parent.
That's not possible. If you are using routed URIs there can be all sorts of /path/segments following the base name. The browser has no way of knowing what the real ‘parent’ is.
The usual solution is to use root-relative URIs as suggested by Joe. If you need to allow your application to be mounted at a configurable prefix under the root, that prefix will need to be copied out into the link.
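The difference between the two kinds of link can be seen by resolving them against both forms of the routed URL. This is a small Python sketch using urllib.parse.urljoin; the link targets Edit and /Employee/Edit are hypothetical and stand in for whatever the page actually links to.

```python
from urllib.parse import urljoin


def resolve_from(page_url: str, href: str) -> str:
    """Resolve href as a link found on page_url, approximating a browser."""
    return urljoin(page_url, href)


pages = ["http://example.com/Employee", "http://example.com/Employee/"]
for page in pages:
    # "Edit" is document-relative; "/Employee/Edit" is root-relative.
    print(page, "->", resolve_from(page, "Edit"), "|", resolve_from(page, "/Employee/Edit"))
```

The document-relative link resolves differently depending on the trailing slash, while the root-relative link gives the same address from both pages.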
Your question reminds me of a technique for search friendly URLs, implemented in PHP.
Things like:
http://localhost/index.php/2009/09/
It was described on Sitepoint.com. The idea was that index.php could retrieve the trailing part of the URL from the web server and decide what to do with it, including whether to deal with a final / or not.
It won't be relevant to HTML files (which could not, after all, retrieve the trailing part of a URL), but it might provide further ideas.