Parsing relative links on an HTML page - html

I'm trying to parse a page to find all valid URLs, but I've run into a problem. There are three types of links on a page: a full URL (http://site.com/dir/page.html), an absolute URI (/dir/page.html), and a relative URI (dir/page.html, without a leading slash). I may have the terminology wrong, as I'm not an HTML coder, but that's beside the point.
I need to find and collect all the full URLs (e.g. http://site.com/dir/subdir/page.html and so on). Here is the problem: if the page http://site.com/dir/page.html contains a relative link such as subdir/page.html, it is supposed to take us to http://site.com/dir/subdir/page.html. But if there is a <base href="/"> in the head section of the page, the same link leads to http://site.com/subdir/page.html, which is different from http://site.com/dir/subdir/page.html.
The question is whether anything else in a page's HTML code can influence the target URL.
Thanks in advance.

In HTML as such, there is nothing else besides the base href you mentioned.
What can become tricky, and should be considered, is that links on the page may be created by script execution, e.g. window.location.href = something. That is easy to handle if the links are stated literally, but they may also be computed by the script, in which case simple parsing could miss the link or misread it.
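To make the effect of the base href concrete, here is a minimal sketch in Python using only the standard library. The names LinkCollector and collect_links are made up for this illustration, and it only looks at <a href> and the first <base href>; links produced by scripts are not covered, as noted above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect <a href> values and resolve them, honoring the first <base href>."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url   # URL the document was fetched from
        self.base_url = page_url   # effective base; replaced by the first <base href>
        self.base_seen = False
        self.hrefs = []            # raw href values in document order

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if href is None:
            return
        if tag == "base" and not self.base_seen:
            # The base href may itself be relative, so resolve it against the page URL.
            self.base_url = urljoin(self.page_url, href)
            self.base_seen = True
        elif tag == "a":
            self.hrefs.append(href)

    def links(self):
        # Resolve after parsing so the base applies to every link on the page.
        return [urljoin(self.base_url, h) for h in self.hrefs]


def collect_links(page_url, html):
    """Return the resolved URLs of all <a href> links found in html."""
    parser = LinkCollector(page_url)
    parser.feed(html)
    parser.close()
    return parser.links()
```

With the page URL from the question, collect_links gives http://site.com/dir/subdir/page.html for a plain subdir/page.html link, and http://site.com/subdir/page.html once <base href="/"> is present in the head.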

Your problem is really about how URL linking works in HTML; please read: http://www.webdevelopersnotes.com/design/relative_and_absolute_urls.php3 . Say you're in /admin/ and you need /admin/login.aspx. The relative URL is login.aspx, while the absolute one is /admin/login.aspx. Make sense?
So basically what I'm saying is: consider which directory your page is being served out of. That determines the type and content of the link to use.
Other than that, as already stated, JavaScript and server-side code can also do linking.

Related

On which page should the "Canonical URL" meta tag be placed when two pages, a.html and b.html, hold the same content?

On which page should the "Canonical URL" meta tag be placed (and how should it be written on that page) when two pages, a.html and b.html, hold the same content?
If both files are on the same website, duplicated content is strongly discouraged, and one of the two pages should be removed (with a redirect from the removed URL to the remaining one, to avoid 404 errors). Apart from the Moz link above, the official guide from Google, one of the most prominent search engines, is also clear: avoid having duplicated content across your websites.
Canonical URLs are best used when parts of a website are republished elsewhere; they are a very elegant way of attributing authorship and relevance to the original content creator.
Now, if you can't delete one of the pages of your project and redirect its URL to the other one, you need to decide which one is to be considered the "main" page from the project's point of view, and set the canonical attribute to point to that URL.
I hope I helped you, and if I did solve your issue, don't forget to mark the answer :)
Regards,

HTML remove url variables, NO PHP or JS solution

My site uses both PHP and JS AJAX, so I'm fairly familiar with both of them, and I don't want a solution that involves either. I have a page structure where all my users stay on a single landing PHP page, which then fetches the right content depending on the URL's p variable.
http://www.example.com/?p=about
http://www.example.com/?p=aMap-anothermap-evenAnothermap-lastelyTheFile
This page structure works great for me, except that I don't know the right way to make a link that removes the whole ?p=home, because I want my home/start page to be variable-free. I want it to be
http://www.example.com/
rather than
http://www.example.com/?p=home
Now I could just make the link
http://www.example.com/?
And then just remove the ? with the JS pushState(), but that would look pretty silly and would only work for JS users.
Let's say I wanted to do the above example with just the ?; then I could create a link like this.
Link
<script src="SomeCoolPushStateScript"></script>
And I know from experience that this doesn't even work:
Link
So here comes the question: How do I remove the ?variable=something part of a URL when using an HTML href?
The path ./ should do the trick.
Link
If you want to preserve the main script name, like index.php, you will have to include that name.
Link
Alternatively, you could dynamically generate domain-relative or absolute URLs with PHP.
You don't need to use querystrings.
Link
would go to example.com's root.
I don't recommend using "./". This would do what you want if the user is on a page that is in the root directory of your website (e.g. http://www.example.com/page.html). However, this would not work if they were on a page in a subdirectory. E.g. if the user's on http://www.example.com/hello/page.html, it would just link to http://www.example.com/hello/.
Using "/" makes sure the user goes to the root of your website.
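The difference between "./" and "/" can be checked with a small sketch in Python. The helper name resolve_home_links is made up for this illustration, and urljoin from the standard library is used here as a stand-in for how a browser resolves an href against the current page URL.

```python
from urllib.parse import urljoin


def resolve_home_links(current_url):
    """Return where href="./" and href="/" would lead from the given page."""
    return {
        "./": urljoin(current_url, "./"),  # directory of the current page, query dropped
        "/": urljoin(current_url, "/"),    # root of the site, query dropped
    }
```

From http://www.example.com/?p=home both forms resolve to http://www.example.com/, but from http://www.example.com/hello/page.html the "./" form resolves to http://www.example.com/hello/ while "/" still resolves to the root.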

How to get website URL in a SharePoint custom masterpage?

I'm making a custom masterpage, and on the navigation bar I would like to have a button that goes back to the current site's main page. This masterpage will be used for other subsites as well. Is there a way for me to get the current site URL within the index file?
You can use URL Tokens to get your site URL.
<SharePoint:SPLinkButton runat="server" NavigateUrl="~site/" id="homelink">
</SharePoint:SPLinkButton>
Notice the URL token of ~site used in attribute NavigateUrl="~site/". You may have to use the $SPUrl command also.
<%$SPUrl:~site/myPage.aspx%>
You can also refer to this discussion for more usage examples.
NOTE: I haven't tried the above examples, but these should be sufficient to provide you a path to your answer.

How to link a webpage to itself without first knowing its name?

To link a page to itself (e.g. http://example.com/folder/ThisPage.html), we can simply create an href like this:
ThisPage.html:
Link
This works, but has the disadvantage of needing to be updated when the file name changes. For example, if the file name changes to ThatPage.html, our href needs to change accordingly to Link.
I'm looking for an alternative without that disadvantage. I've tried:
Link
Doesn't work as Link does, because it appends a "blank query part" (question mark) to the URL.
Link
Doesn't work as Link does, on some browsers (e.g. Opera).
How do we link a page to itself, without having to update the relevant portion when the name of the page changes?
Note: JavaScript not allowed.
Just use Link. Nobody cares about the question mark appended to the URL. It meets the requirement, and that is what counts, right?
It's very simple: just leave the href blank (href=""). Like this:
Click me to refresh page
But this is not necessarily a good idea, because the cache may not be cleared; if the page has changed in the meantime, the change may not appear despite the reload. A better option is probably the JavaScript call location.reload(). There are plenty of explanations of it on other sites, so I won't explain it here. You can of course also use a question mark (?), but that is unnecessary, is not what it is intended for, and can cause problems depending on the program.
Here is a short list of common hyperlinks:
Points to the root page
Link
Points to a file relative to the root page
Link
Points to a file relative to the current file
Link
Points to a file in the previous folder
Link
Points to a file in the second previous folder
Link
Points to a file in a folder below
Link
Points to the current file
Link
Points to a page with a different host but the same protocol
Link
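The href values of the list above are not shown, so the following Python sketch uses assumed values for each kind of link (the hrefs, the page URL, and the name resolve_all are all hypothetical) and resolves them with the standard library's urljoin to show where each one would lead.

```python
from urllib.parse import urljoin

# Hypothetical current page, two folders below the root.
PAGE = "http://example.com/a/b/page.html"

# Assumed href values, one per kind of link described in the list.
HREFS = [
    "/",                          # the root page
    "/file.html",                 # a file relative to the root
    "file.html",                  # a file relative to the current file
    "../file.html",               # a file in the previous folder
    "../../file.html",            # a file in the second previous folder
    "folder/file.html",           # a file in a folder below
    "",                           # the current file
    "//other.example/page.html",  # different host, same protocol
]


def resolve_all(page_url, hrefs):
    """Map each href to the URL it resolves to from page_url."""
    return {href: urljoin(page_url, href) for href in hrefs}
```

Calling resolve_all(PAGE, HREFS) shows, for instance, that "../file.html" becomes http://example.com/a/file.html and that the empty href resolves back to the page itself.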
I hope that my answer will help some people, because I found it via a search engine and saw that there is no correct answer. And it's my first answer here 😅
If you want it to go nowhere, you can use
link
But if you want it to reload the page, you'll have to go with JavaScript.
If you want to reload the page you could use the Meta refresh tag
http://www.w3.org/TR/WCAG20-TECHS/H76.html
If you want to reload the page, you really should look into JavaScript. It is the best way to do it.
Just do this:
This Very Site
Source: I saw this in the source code of Matthew Alger's website. Check it out for yourself!
Why not try ?
I looked some things up, and as it turns out, ./ refers to current directory.
You can just make a link to the same page.
Here ya go. Hope this is what you are looking for
Link

Paths relative to the file location and NOT the url

**** EDIT: SOLVED HERE Relative paths from file for img, a and header ****
Somewhat new to web design.
I just finished creating a dynamic site. It can read domain/category, domain/category/this-article-about-x, all redirecting to domain/index.php and working well.
However, I quickly learned that whenever I used a relative path such as ./include, the relative path was resolved against the current URL, and not against the actual location of the PHP file.
I have 2 questions that I couldn't answer when browsing the internet for a long time:
1) If domain/index.php tries to show an image with ./thumbnails/science/image.jpg, it won't work if the actual URL is domain/category/, but it WILL work if it's just domain/category (no slash at the end). Why this inconsistency? The HTML source shows the same src for the image in both cases.
2) My header has a dropdown menu with categories. Once it's submitted, it calls itself (header.php), sees which category the user chose, and redirects to domain/category. All works well. You can then change the dropdown menu to another category and everything loads again. But again, if you go directly to domain/category/ (with the slash at the end), the CSS won't load, the images won't load as described in question 1, and submitting the form causes a problem because it looks for header.php in domain/category/, and not in domain, where the header.php file actually is.
I have successfully used dirname(__FILE__) to make sure my includes all work (as far as my testing has gone, no errors). But I could not use dirname(__FILE__) to generate links or images ('img src=') that always point to the right files regardless of the URL.
On localhost, the HTML comes out as img src=c:\path\htdocs\thumbnails\img, which is right, but the image does not display anyway. Same with the 'a href' links. Creating links with dirname(__FILE__) produced links on localhost such as c:\correctpath, but clicking on them did absolutely nothing. Also, using header with dirname(__FILE__) to handle the redirection from the dropdown menu caused it to stop working as well (but if I used ./ . dropDownMenuValue, it would again correctly change the URL to domain/categoryChosen).
How can I use a consistent method for relative URLs that allows me to work on localhost and then upload to my web host without needing to make changes to the files, AND that works with dynamic websites that have pretty URLs through htaccess?
Thank you,
It is very puzzling to me why getting a relative path from the current file's path is so hard and causes so many different issues, and why only include statements seem to work well with dirname(__FILE__).
Edit: http://board.phpbuilder.com/showthread.php?10374336-RESOLVED-mod_rewrite-for-SEO-Friendly-URLs-and-relative-path-issue-fixes
I've found this on other sites, but it requires you to change the base every time you want to go from localhost to the web. I'm trying to avoid that. Clearly there has to be a simple way to do something so basic. I can't believe Google and everybody else are changing paths when they are ready to make something live.
You should just always use absolute paths for public files like media. If you don't want to do that for some reason, or your site is just too involved at this point, you'll have to create rewrite rules for the other file types so they can also be included from the correct path.
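The trailing-slash inconsistency from question 1 can be reproduced with a short Python sketch. The host domain.example and the helper name image_url are made up for this illustration, and urljoin from the standard library stands in for the browser's resolution of the src against the page URL.

```python
from urllib.parse import urljoin

RELATIVE_SRC = "./thumbnails/science/image.jpg"   # the src from the question
ROOT_SRC = "/thumbnails/science/image.jpg"        # root-relative alternative


def image_url(page_url, src):
    """Return the URL a browser would request for src on the given page."""
    return urljoin(page_url, src)
```

For http://domain.example/category the relative src resolves to http://domain.example/thumbnails/science/image.jpg, because "category" is treated as a file name in the root folder. For http://domain.example/category/ it resolves to http://domain.example/category/thumbnails/science/image.jpg, which does not exist. The root-relative src resolves to the same URL in both cases, and since it contains no host name it needs no change between localhost and the live server, as long as the site sits at the root of the host.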