mass change link in html website - html

I took over an old HTML-based site with all hard-coded links, no frames, etc. There are who knows how many pages that link to abc.html (an example name).
I've been asked to go through the pages and change the abc.html link to 123.html (another example).
I could download the entire site via FTP, use find and replace to go through all the files, then upload the changes.
The problem is that the site is poorly organized and heavily nested, so there are probably several hundred MB of junk I'd have to download just to be sure.
The other option is to change the html code of abc.html and put in something like
We've moved, you are currently being
redirected.
And use some sort of redirect.
Anyone have any other ideas on how to do this?

Why not use software such as Actual Search and Replace?

You will need to return HTTP 301 Moved Permanently on old links so that the search engines know that the content has moved and not just disappeared.
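To illustrate what such a response looks like, here is a minimal sketch using Python's standard http.server. It is only an illustration: the real site would normally issue the redirect through its own web server configuration, and the MOVED mapping and the redirect_target helper are hypothetical names built from the question's abc.html / 123.html example.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical mapping of old paths to new ones (names from the question's example).
MOVED = {"/abc.html": "/123.html"}


def redirect_target(path):
    """Return the new location for a moved path, or None if it has not moved."""
    return MOVED.get(path)


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        target = redirect_target(self.path)
        if target is None:
            self.send_error(404)
            return
        # 301 tells browsers and crawlers that the move is permanent.
        self.send_response(301)
        self.send_header("Location", target)
        self.end_headers()
```

To try it locally, the handler could be served with http.server.HTTPServer(("127.0.0.1", 8000), RedirectHandler).serve_forever(); the address and port are arbitrary.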

I made a list of all the files that contained the old link using
grep -lir "some text" *
(above taken from commandlinefu.com)
I then used the following command to replace all the matching text accordingly.
find . -name '*.html' -exec sed -ir 's/old/new/g' {} \;
(also taken from commandlinefu.com)
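I used this form of the sed command because it creates a backup of each HTML file, named *.htmlr (GNU sed reads -ir as -i with the backup suffix r).
Not ideal, as I now have more junk, but I can easily delete the backups with
rm *.htmlr
(this only clears the current directory; backups in nested folders need find . -name '*.htmlr' -delete)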
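For anyone without grep and sed at hand, the same recursive replace can be sketched in Python. This is an assumption-laden illustration rather than the commands used above: replace_link is a hypothetical helper, it works on raw bytes so file encodings and line endings are left alone, and it makes no backups.

```python
import tempfile
from pathlib import Path


def replace_link(root, old, new, pattern="*.html"):
    """Replace `old` with `new` in every matching file under `root`; return changed files."""
    old_b, new_b = old.encode(), new.encode()
    changed = []
    for path in sorted(Path(root).rglob(pattern)):
        data = path.read_bytes()
        if old_b in data:
            path.write_bytes(data.replace(old_b, new_b))
            changed.append(path)
    return changed


# Small demonstration on a throwaway nested directory.
with tempfile.TemporaryDirectory() as tmp:
    nested = Path(tmp) / "a" / "b"
    nested.mkdir(parents=True)
    (nested / "page.html").write_text('<a href="abc.html">link</a>')
    changed = replace_link(tmp, 'href="abc.html"', 'href="123.html"')
    result = (nested / "page.html").read_text()

print(result)
```

Matching on href="abc.html" rather than the bare file name reduces the chance of touching unrelated text such as xabc.html.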

Related

wget downloads the same html for every version of the website

I'm attempting to download the html using wget for this website:
https://cxcfps.cfa.harvard.edu/cda/footprint/cdaview.html#Footprints|filterText%3D%24filterTypes%3D|query_string=&posfilename=&poslocalname=&inst=ACIS-S&inst=ACIS-I&inst=HRC-S&inst=HRC-I&RA=210.905648&Dec=39.609177&Radius=0.0006&Obsids=&preview=1&output_size=256&cutout_size=12.8|ra=&dec=&sr=&level=&image=&inst=ACIS-S%2CACIS-I%2CHRC-S%2CHRC-I&ds=
Which is a version of the main website:
https://cxcfps.cfa.harvard.edu/cda/footprint/cdaview.html
The only difference from the main website is that the first link takes you to a version that has already searched a database and displays the results in a table. But when I use wget to download the HTML for the longer link, it gives me exactly the same text as for the main (short) link. I'm confused, but maybe I just don't understand enough about HTML. I thought the two should be slightly different, with the longer link's HTML containing the database results, etc.
I also used the --mirror option to download all the necessary files, but they all look the same, too. I've tried cURL as well, with the same result. Can someone please explain why this is happening and whether it's fixable?
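The two links differ only in the part after the #, which is a URL fragment. Fragments are handled by the browser and are not sent to the server, so wget and cURL request exactly the same document for both links. In the browser, the page's JavaScript reads the fragment and then loads and displays the search results; that code is not executed when you download the page. The --mirror option will download the static files the page references, but it will not run the scripts, so the results table is not in the saved HTML and grep will not find it there. To get the results you would need either the request that the page's script sends to the server (visible in the network tab of the browser's developer tools) or a tool that runs the JavaScript, such as a headless browser.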
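A quick way to see why both downloads are identical: the part of a URL after # is a fragment, which HTTP clients strip before sending the request. The sketch below uses Python's urllib.parse.urldefrag; the long link is abbreviated here for readability, so the fragment shown is not the full one from the question.

```python
from urllib.parse import urldefrag

main = "https://cxcfps.cfa.harvard.edu/cda/footprint/cdaview.html"
# Abbreviated stand-in for the long link; only part of the original fragment is kept.
search = main + "#Footprints|RA=210.905648&Dec=39.609177&Radius=0.0006"

# urldefrag splits a URL into the part that is requested and the fragment.
requested, fragment = urldefrag(search)

print(requested == main)  # the server sees the same URL for both links
print(fragment)           # only the browser (and the page's scripts) see this part
```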

Pre-Linking Pages in HTML Before Uploading

I am currently creating a website with over 700 pages and I would like to be able to link them together before I upload the files to my host server if possible. Is there a good way to link pages together pre-upload without for sure knowing what the final URLs are going to be?
I am working in and plan to upload/manage my website files through Dreamweaver.
I have seen the prompt in Dreamweaver to update links before. If I link the file paths now, will it update to the URLs when the site is uploaded?
You need to use root-relative links; it's worth searching on that term. As long as you don't change your file structure, the links will work wherever the site is hosted.
Instead of using absolute links such as http://www.website.com/folder1/page1
you would use /folder1/page1
As long as the paths start from the site root, you can begin them with "/" as above.
There are cases where you would use a relative link from one folder to another, such as ../folder1/page1, but that is not something I would recommend here.
Good luck, and comment on this if you have more questions.
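As a rough illustration of why root-relative links survive a move between hosts, the sketch below resolves the same link with Python's urllib.parse.urljoin, which follows the standard URL resolution rules. The localhost address and the folder2/sub/page9.html pages are made-up examples.

```python
from urllib.parse import urljoin

link = "/folder1/page1"

# The same root-relative link, resolved on a local test server and on the live host.
local = urljoin("http://localhost/folder2/sub/page9.html", link)
live = urljoin("http://www.website.com/folder2/sub/page9.html", link)

# A ../ link depends on how deep the linking page sits in the folder tree.
shallow = urljoin("http://www.website.com/folder2/page9.html", "../folder1/page1")
deep = urljoin("http://www.website.com/folder2/sub/page9.html", "../folder1/page1")

print(local)    # http://localhost/folder1/page1
print(live)     # http://www.website.com/folder1/page1
print(shallow)  # http://www.website.com/folder1/page1
print(deep)     # http://www.website.com/folder2/folder1/page1
```

The last line shows why ../ links are fragile: moving a page one folder deeper silently changes where they point.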

No Static Footer or Header

My company just bought out another company and I have to change some links on their site to point to our site. However, this site doesn't have a static footer or header (as in, each link is recreated on each HTML page). So instead of changing the necessary files (30+), is there any other way to do a sweeping change?
Thanks.
While there are several methods, the one that I would recommend would be to use a server side include file.
My recommendation would be to follow these steps, approximately:
Copy the header / nav contents from one of your HTML files into a new PHP include file (called, for example, header.php).
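Edit each HTML file, removing the header / nav contents and including the file instead. That would look something like this: <?php require_once 'header.php'; ?> (the pages must be processed by PHP for this to work, for example by renaming them to .php or configuring the server to parse .html files as PHP).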
Repeat for the footer, if that has "common" links and markup.
While this may take longer initially, the very first time you have to make any updates it will pay off.
Lastly, if necessary, there are ways to use PHP to give the current nav item an active class, etc. That's beyond the scope of this answer, but the steps above should get you going in the right direction.
If you have access to these files on a GNU/Linux machine, use sed:
sed -i 's|http://oldcompany\.com|https://newcompany.net|g' /dir/of/static/files/*.html
The -i flag edits files in place, so each file matched by the path is searched for the first URL, which is replaced with the second. sed needs file names rather than a directory and does not descend into subfolders; for a nested site, combine it with find.
Please note that this will just change links like http://oldcompany.com/team to https://newcompany.net/team. It does not change links that start with https://oldcompany.com, which would require modifying the sed expression. Please give more information on how the links should be altered so we can provide solutions for your specific problem.
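One possible modification that covers both http and https links (and an optional www. prefix) is sketched below in Python. The pattern and the retarget helper are assumptions for illustration; whether www. links exist on the site, and whether paths should be kept as-is, depends on details the question does not give.

```python
import re

# Matches http:// or https://, with or without "www.", for the old domain.
OLD_SITE = re.compile(r"https?://(?:www\.)?oldcompany\.com")


def retarget(html, new_base="https://newcompany.net"):
    """Point every link to the old site at the new base URL, keeping the path."""
    return OLD_SITE.sub(new_base, html)


sample = (
    '<a href="http://oldcompany.com/team">Team</a> '
    '<a href="https://www.oldcompany.com/">Home</a>'
)
print(retarget(sample))
```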

Is it bad practise to start links with "/" in html?

My website code sample:
<a href=/post/64/page-name><img src=/img-folder/2015/09/image.jpg></a>
<div id=cont2><a href=/post/64/page2>page 2 link</a></div>
My first question is: can I start links with just /? Is it bad practice? All the website sources I have looked at start links with www.website.com/..., not just /.
My second question is about quotes. They have not been required since HTML 2.0, but are they important in the example above?
My website is having some problems showing correctly on Google. Could these issues be the cause?
It isn't bad practice. A URL starting with / is simply a relative URL that is resolved against the root of the site (the scheme and host of the page's base URL). You're using it just fine.
Another example usage is when you want to reference a CSS or JavaScript file and you're deep down into the path.
<script src="/scripts/main.js"></script>
Then, no matter where the user is on your site, they'd always request http://example.com/scripts/main.js, where example.com is your site's domain.
Additionally: Always quote attribute values. (attribute="value" and not attribute=value).
/ means the root of the site you are currently on. So if the resource is on the same site, you can start its path with /. If you refer to external resources, you can't start with /; you need the full URL. (E.g. http://www.google.com means the Google website, while /www.google.com means a folder named www.google.com under your site's root, like http://localhost/www.google.com)
Quotes are needed when an attribute value contains white-space (e.g. class="my super class-name that has white-space" works, while class=my super class-name that has white-space does not).
"My website is having some problems on google to show correctly": that is SEO stuff. What problems? Is your page not on the first page of Google search results? That is a separate topic.
It is not forbidden. When you start a link with a slash /, it is just a path resolved from the root of the base URL (the document's URL, or the BASE element's href if one is set).
You can read more about BASE element here: http://www.w3.org/TR/html4/struct/links.html#h-12.4
For example, if you are currently at: http://example.com/folder/index.html
/posts/index.html would link to: http://example.com/posts/index.html
posts/index.html would link to: http://example.com/folder/posts/index.html
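The two cases above can be checked with Python's urllib.parse.urljoin, which applies the same resolution rules a browser uses. This is just a sketch; the third call stands in for a hypothetical BASE element with href http://example.com/other/, which is not part of the original example.

```python
from urllib.parse import urljoin

page = "http://example.com/folder/index.html"

# Leading slash: resolved from the root of the site.
from_root = urljoin(page, "/posts/index.html")
# No leading slash: resolved from the folder of the current page.
from_folder = urljoin(page, "posts/index.html")
# With a (hypothetical) BASE element, its href replaces the page URL as the base.
with_base = urljoin("http://example.com/other/", "posts/index.html")

print(from_root)    # http://example.com/posts/index.html
print(from_folder)  # http://example.com/folder/posts/index.html
print(with_base)    # http://example.com/other/posts/index.html
```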
If you reference external sources, you have to give the full path/address.
If you reference local resources, it's up to you (more or less). Take a look at "How to properly reference local resources in HTML?"
You should use either double (") or single (') quotation marks; that's good practice at least.
But you don't have to if the value contains no whitespace.
When you start your link with "/", it means that the path starts from the site's root directory.
Example: your website is served from the directory /web/html.
A link that starts with "/" then resolves from that web root folder, not from the folder of the current page.
I know this is old and answered, but it came up on Google when I was searching for something similar, so I just wanted to add to the answers.
Sometimes, when I need to do something real quick with a simple HTML site that doesn't require a server, I just open index.html from Terminal to quickly preview the page in the browser. However, when you open your site like that, using a leading slash to load resources (e.g. /js/main.js) won't work. That's because when you load your website by opening a file in your browser, the browser takes the root of your drive as the base path for your website.
So if you have your files like this for instance:
drive/Users/username/Documents/www/index.html
drive/Users/username/Documents/www/js/main.js
And you reference your script like this:
<script src="/js/main.js"></script>
The browser will think you're actually pointing here (if you open the file directly in browser):
drive/js/main.js
Because / in this case means drive and not the website's root (www in this case) folder as it would on a server.
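The same behaviour can be reproduced with Python's urllib.parse.urljoin, which resolves links the way a browser does. The file:// URL below is a hypothetical rendering of the layout above, with the drive root written as /.

```python
from urllib.parse import urljoin

# Hypothetical file URL for the index.html in the layout above.
page = "file:///Users/username/Documents/www/index.html"

# A leading slash resolves from the root of the drive, not from the www folder.
root_relative = urljoin(page, "/js/main.js")
# Without the slash, the path resolves from the folder containing index.html.
doc_relative = urljoin(page, "js/main.js")

print(root_relative)  # file:///js/main.js
print(doc_relative)   # file:///Users/username/Documents/www/js/main.js
```

So for pages opened straight from disk, document-relative paths such as js/main.js are the ones that keep working.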
No, it is not a bad habit to start links with '/'. Leaving the quotes off HTML attributes is a different matter: I would suggest quoting (") every attribute value so the markup is more readable.

Is there a way to export a page with CSS/images/etc using relative paths?

I work on a very large enterprise web application - and I created a prototype HTML page that is very simple - it is just a list of CSS and JS includes with very little markup. However, it contains a total of 57 CSS includes and 271 javascript includes (crazy right??)
In production these CSS/JS files will be minified and combined in various ways, but for dev purposes I am not going to bother.
The HTML is being served by a simple apache HTTP server and I am hitting it with a URL like this: http://localhost/demo.html and I share this link to others but you must be behind the firewall to access it.
I would like to package up this one HTML file with all referenced JS and CSS files into a ZIP file and share this with others so that all one would need to do is unzip and directly open the HTML file.
I have 2 problems:
The CSS files reference images using URLs like this url(/path/to/image.png) which are not relative, so if you unzip and view the HTML these links will be broken
There are literally thousands of other JS/CSS files/images that are also in these same folders that the demo doesn't use, so just zipping up the entire folder will result in a very bloated zip file
Anyway -
I create these types of demos on a regular basis, is there some easy way to create a ZIP that will:
Have updated CSS files that use relative URLs instead
Only include the JS/CSS that this html references, plus only those images which the specific CSS files reference as well
If I could do this without a bunch of manual work, if it could be automatic somehow, that would be so awesome!
As an example, one CSS file might have the following path and file name.
/ui/demoapp/css/theme.css
In this CSS file you'll find many image references like this one:
url(/ui/common/img/background.png)
I believe for this to work the relative image path should look like this:
url(../../common/img/background.png)
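The conversion described above can be sketched in Python with posixpath.relpath and a regular expression. This is an illustrative sketch, not an existing tool: relativize_css and URL_RE are hypothetical names, and the pattern only handles simple url(...) values with root-relative paths.

```python
import posixpath
import re

# Matches url(/path), with optional quotes; protocol-relative //host URLs are skipped.
URL_RE = re.compile(r"""url\(\s*(['"]?)(/(?!/)[^'")\s]+)\1\s*\)""")


def relativize_css(css_text, css_path):
    """Rewrite root-relative url(...) references relative to the CSS file's folder."""
    css_dir = posixpath.dirname(css_path)

    def swap(match):
        quote, target = match.group(1), match.group(2)
        return "url(%s%s%s)" % (quote, posixpath.relpath(target, css_dir), quote)

    return URL_RE.sub(swap, css_text)


css = "body { background: url(/ui/common/img/background.png); }"
print(relativize_css(css, "/ui/demoapp/css/theme.css"))
# body { background: url(../../common/img/background.png); }
```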
I am going to answer my own question because I have solved the problem for my own purposes. There are 2 options that I have found useful:
Modern browsers have a "Save Page As..." option under the File menu or, in Chrome, in the main menu. This, however, does not always work properly when the page is generated by JavaScript.
I created my own custom application that can parse out all of the CSS/Javascript resources and transform the CSS references to relative URLs; however, this is not really a good answer for others.
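The custom application itself is not shown here, but the first step it describes, finding which CSS and JavaScript files a page actually references, might look roughly like the sketch below using Python's html.parser. ResourceCollector is a hypothetical name, and the sample page and its /ui/demoapp/js/app.js script are made up for the demonstration.

```python
from html.parser import HTMLParser


class ResourceCollector(HTMLParser):
    """Collect script and stylesheet URLs referenced by an HTML page."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self.stylesheets = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and attrs.get("src"):
            self.scripts.append(attrs["src"])
        elif tag == "link" and attrs.get("href"):
            # rel may hold several space-separated tokens, e.g. "stylesheet preload".
            if "stylesheet" in (attrs.get("rel") or "").lower().split():
                self.stylesheets.append(attrs["href"])


# Hypothetical page standing in for demo.html.
page = """
<html><head>
<link rel="stylesheet" href="/ui/demoapp/css/theme.css">
<script src="/ui/demoapp/js/app.js"></script>
</head><body></body></html>
"""

collector = ResourceCollector()
collector.feed(page)
print(collector.stylesheets)
print(collector.scripts)
```

The collected lists could then drive which files are copied into the ZIP, so unused files in the same folders are left out.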
If anyone else is aware of a commonly available utility or something like that which is better than using the browser built in "Save page as..." option - feel free to post another answer.