HTML file structure's effect on website URL

I currently have a simple website, hosted on GitHub Pages, with its files structured in hierarchical directories as shown below:
/foobar.com
    /css
    /js
    /images
    /html
        /news
            /news_content
                fizz.html
                buzz.html
            news.html
        about.html
        contact.html
    index.html
However, when I am on the buzz webpage, for example, the URL becomes:
https://foobar.com/html/news/news_content/buzz.html
Is there a way to change this URL so that it doesn't show all the folder directories and instead, just the file itself i.e. https://foobar.com/buzz.html as I don't want to separate all the individual HTML files into separate folders?

Yes, you can change the URL to be more user-friendly by using server-side URL rewriting or client-side JavaScript.
For server-side URL rewriting, you need control over the web server configuration, for example Apache's mod_rewrite module or Nginx's rewrite directive. GitHub Pages does not give you access to its server configuration, so this option only applies if you host the site yourself.
For client-side URL rewriting, you can use JavaScript (for example history.replaceState()) to change the URL shown in the browser's address bar. However, this only changes what is displayed: if the user reloads or shares that URL, the server still has to find a file at that path. The approach also does not help search engines or users who have JavaScript disabled.
If you are using GitHub Pages to host your website, you may be able to get the URLs you want with Jekyll, the static site generator that GitHub Pages runs by default. Jekyll lets you set the output URL of each page with the permalink front matter variable (for example, permalink: /buzz.html), so the source files can stay in their folders while the published URLs are flat.

How a URL is resolved to a resource depends on the HTTP server and/or the server side programming language you are using.
GitHub Pages provides no features that allow anything other than a direct reflection of the directory layout in the URL.
The closest you could come would be to write a program that transforms the input (the file structure you want to work with) into the file structure that GitHub Pages will express as the URLs you want. You would then run it as a build step that takes the pages out of your working branch and into your gh-pages branch; you could possibly use GitHub Actions to automate this.
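A minimal sketch of such a build step in Python follows. It assumes the source tree is in one folder and the published output goes to another folder outside it; the function name and the folder names used below are hypothetical. Every .html file is copied to the top level of the output, and other assets (css/, js/, images/) keep their folders. The sketch does not rewrite links inside the pages. Relative links such as ../../css/style.css would still need adjusting, while root-relative links such as /css/style.css keep working.

```python
import shutil
from pathlib import Path


def flatten_html(src: Path, out: Path) -> list:
    """Copy every .html file under src to the top level of out.

    Other files (CSS, JS, images) keep their relative folders. Assumes out is
    not inside src. Raises ValueError on duplicate HTML file names, because
    flattening would otherwise silently overwrite one page with another.
    """
    out.mkdir(parents=True, exist_ok=True)
    seen = {}  # file name -> source path
    for path in sorted(src.rglob("*")):
        if not path.is_file():
            continue
        if path.suffix == ".html":
            if path.name in seen:
                raise ValueError(f"duplicate page name: {path} and {seen[path.name]}")
            seen[path.name] = path
            target = out / path.name  # html/news/news_content/buzz.html -> buzz.html
        else:
            target = out / path.relative_to(src)  # css/style.css stays css/style.css
            target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)
    return sorted(seen)
```

For example, flatten_html(Path("site"), Path("_site")) would turn site/html/news/news_content/buzz.html into _site/buzz.html. The _site folder is what you would then publish, for instance from a GitHub Actions workflow.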

Related

Specifying index.html in browser to load home page

If I want to load the homepage of https://medium.com/ by typing the exact index.html file address into my browser, how would I do that? Or is it not possible?
https://medium.com/index.html gives me a 404 error. I'm also curious how I would do this more broadly for any webpage whose URL, as displayed in my browser, does not end in .html.

Common static websites, hosted simply as files somewhere, usually have an index.html document. It can be requested directly, and it is also loaded by default when no particular document is specified, so https://example.com/ and https://example.com/index.html both work.
But this is not how most websites work. Pages can be dynamically generated on the server: you send a request, and if the path matches some server operation, the server creates a response for you. Unless https://example.com/ is served from a directory of static files (for example by something classic like the Apache web server configured to serve static files), requesting index.html won't work.

There is no general way to know what, if any, URLs for a given website resolve to duplicates of the homepage (or any other page).
Dynamically generated sites, in particular, tend not to have alternative URLs for pages.

Vue-Nuxt: Why can't I see the generated HTMLs correctly?

So when I run npm run generate, Nuxt generates my project into the dist folder. In that folder there is a folder called _nuxt containing .js files, plus an index.html file, but when I open index.html in a browser it doesn't show anything.
So my question is: aren't those static files?
When you work with the CDN-served vue.js, you have the HTML file, you click it, and everything is shown in the browser, because those .html files are static and don't need a local server. Why doesn't npm run generate do the same? Or how can I view those generated files?

As @aljazerzen explained, Vue.js doesn't do SSR out of the box; one of the aims of Nuxt.js is to provide SSR for you, and as a bonus you can also generate a static version of your website. If I understand correctly, you want to open the index.html that Nuxt.js generates for you and see your working webpage. However, when you access your website through a file:/// URL, your browser (at least I've seen this happen with Chrome) doesn't load your .js files.
I don't have any Nuxt-generated websites at hand, so I can't tell you exactly why this happens. But this is my guess: when Nuxt generates those files, it gives them a src that can't be accessed through file:///, maybe something like /your_js.js. The browser resolves that leading / against the root of your filesystem instead of against your website's root folder.
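The guess above can be checked with Python's urllib.parse.urljoin, which resolves a reference against a base URL in essentially the same way a browser resolves a src or href attribute. The file path and the _nuxt/app.js name below are hypothetical. The point is the difference between a root-relative path (leading /) and a document-relative one.

```python
from urllib.parse import urljoin

# Hypothetical location of the generated index.html on disk, and the same page served locally.
PAGE_ON_DISK = "file:///home/user/project/dist/index.html"
PAGE_SERVED = "http://localhost:8000/index.html"

# Root-relative reference: resolves against the filesystem root, not dist/.
print(urljoin(PAGE_ON_DISK, "/_nuxt/app.js"))  # file:///_nuxt/app.js

# Document-relative reference: stays next to index.html.
print(urljoin(PAGE_ON_DISK, "_nuxt/app.js"))   # file:///home/user/project/dist/_nuxt/app.js

# Served over HTTP, the root-relative reference lands where the file really is.
print(urljoin(PAGE_SERVED, "/_nuxt/app.js"))   # http://localhost:8000/_nuxt/app.js
```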
The solution to this problem is to serve your assets using any web server. According to Nuxt.js's documentation:
nuxt generate: Build the application and generate every route as an HTML file (used for static hosting).
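You could do a quick test with a simple web server by running the following command in the folder that contains your generated assets:
python -m http.server
(With Python 3, this serves the current folder at http://localhost:8000/ by default.)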
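If you would rather start the server from a script, the same idea can be sketched with Python's standard http.server module. The function name make_static_server is hypothetical. The sketch binds to 127.0.0.1 only, whereas python -m http.server listens on all interfaces by default.

```python
import functools
import http.server


def make_static_server(directory: str, port: int = 8000) -> http.server.ThreadingHTTPServer:
    """Build (but do not start) a server that serves files from `directory` on localhost."""
    # SimpleHTTPRequestHandler accepts a `directory` argument (Python 3.7+);
    # partial() fixes it so the server can construct a handler per request.
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=directory)
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
```

Calling make_static_server("dist").serve_forever() would then serve the generated dist folder at http://127.0.0.1:8000/ until you stop it with Ctrl+C. A request for / returns dist/index.html, just as the command-line version does.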
Hope this helps!

Nuxt uses server side rendering.
You can read more here.
To generate static HTML files, run:
nuxt generate
Explanation: a vanilla Vue.js application is rendered only when the page loads and JavaScript can start running. This means that clients that do not run JavaScript (such as some web crawlers) won't see the page. Also, for a brief moment before Vue.js can render the page, there is a blank screen, where plain HTML files could already be visible.
Now, server-side rendering (SSR) is a technique for rendering a single page app (SPA) on the server and then sending a fully rendered page to the client. The client’s JavaScript bundle can then take over and the SPA can operate as normal.
This can also help with SEO and with providing meta data to social media channels.
But on the downside (as you mentioned), such an application cannot be hosted on a CDN alone, since you need a running Node.js process to render the page.
In my opinion, SSR is redundant for SPAs if what you are building is actually an application and not a website. A website should mostly display information and should not be very interactive. It should leverage web-based mechanisms such as links, cookies and plain HTML with CSS. In contrast, a web application (e.g. a Vue.js application) should be more like a mobile application: it is larger to download, but it performs better and offers a much more interactive experience. Such an application does not need server-side rendering, since users can wait a little longer for it to load, and it shouldn't be indexed by search engines (it is not a website).

Why does a React build need to be served? Why can't I just open it in the browser?

Apologies for somewhat of a basic question, but I haven't been able to find the technical reason anywhere I've looked.
Basically, if I do npm run build I get a static html file and a bunch of css and javascript files in the build folder. I would think that I should then be able to open up that index.html file in the browser and have it work, just as would be the case for some static HTML built without React.
So my question is: what is React relying on that requires the build to be served by a static file server such as serve or webpack dev server?

It uses Ajax internally, and the Same Origin Policy prevents it from reading file: scheme URLs in most browsers.

Do I need to specify a webpage's url?

I've uploaded several files to my server and it's really quite baffling. The home page is saved as index.html, and when I type in the URL of said page it miraculously, and quite successfully, shows the right page. What about my other pages? I have linked to them from the home page with the following code:
<a href="http://www.example.com/about/">About Us</a>
How is my HTML file, presumably called about.html, supposed to know that its URL is "http://www.example.com/about/"? I am dubbing this "The Unanswered Question" because I have looked at numerous examples of metadata and found nothing about specifying the URL of a page.

It depends on what type of server you are running.
Static web servers
If it is the simplest kind of static file server with no URL aliasing or rewriting then URLs will map directly to files:
If your "web root" was /home/youruser/www/, then that means:
http://www.example.com -> /home/youruser/www/
And any paths (everything after the domain name) translate directly to paths under that web root:
http://www.example.com/about.html -> /home/youruser/www/about.html
Usually web servers will look automatically for an "index.html" file if no file is specified (i.e. the URL ends in a /):
http://www.example.com/ -> /home/youruser/www/index.html
http://www.example.com/about/ -> /home/youruser/www/about/index.html
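The direct mapping described above can be sketched in a few lines of Python. This is a simplified model rather than how any particular server is implemented. WEB_ROOT is the example path from the text, and url_to_file is a hypothetical name. The function only computes the file path: it does not check that the file exists, and it does not perform the automatic redirect from /about to /about/.

```python
import posixpath
from urllib.parse import unquote, urlsplit

WEB_ROOT = "/home/youruser/www"  # example web root from the text above


def url_to_file(url: str, web_root: str = WEB_ROOT) -> str:
    """Return the file a plain static server would look up for `url`."""
    path = unquote(urlsplit(url).path) or "/"
    wants_index = path.endswith("/")   # a trailing slash means "this directory"
    clean = posixpath.normpath(path)   # collapse "." and ".." so requests stay under the root
    file_path = posixpath.join(web_root, clean.lstrip("/"))
    if wants_index:
        file_path = posixpath.join(file_path, "index.html")
    return file_path


for url in ["http://www.example.com/",
            "http://www.example.com/about.html",
            "http://www.example.com/about/"]:
    print(url, "->", url_to_file(url))
```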
In Apache, the filename searched for is configurable with the DirectoryIndex directive:
DirectoryIndex index.html index.txt /cgi-bin/index.pl
This means that a request to any path ending in a / (and, to add yet another rule, under some common settings the server automatically appends a / if the path is the name of a directory, for example 'about') is answered with the first of these files that exists:
http://www.example.com/ -> /home/youruser/www/index.html
-> or /home/youruser/www/index.txt
-> or /home/youruser/www/cgi-bin/index.pl
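The fallback order can be modelled as "try each candidate name and use the first one that exists". Below is a rough Python sketch with a hypothetical function name. It only handles plain file names. Apache's DirectoryIndex also accepts a path such as /cgi-bin/index.pl, which it can run as a script, and this sketch does not model that.

```python
from pathlib import Path
from typing import Optional, Sequence

# Candidate names in priority order, like the DirectoryIndex example above.
DIRECTORY_INDEX = ("index.html", "index.txt")


def resolve_index(directory: Path, candidates: Sequence[str] = DIRECTORY_INDEX) -> Optional[Path]:
    """Return the first candidate file that exists in `directory`, or None if there is none."""
    for name in candidates:
        candidate = directory / name
        if candidate.is_file():
            return candidate
    # No index file: depending on configuration, a real server would
    # list the directory or refuse the request.
    return None
```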
Web servers with path interpretation
There are too many different types of servers which perform this functionality to list them all, but the basic idea is that a request to the server is captured by a program and then the program decides what to output based on the path.
For example, a program might route requests according to basic matching rules:
*.(gif|jpg|css|js) -> look for and return the file from /home/user/static
blog/* -> send to a "blog" program to generate the resulting page
using a combination of templates and database resources
Examples include:
Python web frameworks
Java Servlets
Apache mod_rewrite rules (used by WordPress, etc.)
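The two matching rules above can be expressed as an ordered list of patterns. The list is checked top to bottom, and the first match decides which handler runs. This is a toy dispatcher in Python, not any particular framework's API. serve_static and render_blog are hypothetical stand-ins that only describe what a real handler would do.

```python
import re


def serve_static(match: re.Match) -> str:
    # Stand-in: a real handler would read and return this file.
    return "static:/home/user/static/" + match["file"]


def render_blog(match: re.Match) -> str:
    # Stand-in: a real handler would fill a template from the database.
    return "blog:" + match["slug"]


# Rules are tried in order; the first full match wins.
ROUTES = [
    (re.compile(r"(?P<file>.+\.(?:gif|jpg|css|js))"), serve_static),
    (re.compile(r"blog/(?P<slug>.*)"), render_blog),
]


def dispatch(path: str) -> str:
    path = path.lstrip("/")
    for pattern, handler in ROUTES:
        match = pattern.fullmatch(path)
        if match:
            return handler(match)
    return "404 Not Found"


print(dispatch("/css/site.css"))      # static:/home/user/static/css/site.css
print(dispatch("/blog/hello-world"))  # blog:hello-world
```

Because the static rule comes first, a path like /blog/header.jpg is treated as a static file rather than a blog post. The order of the rules is part of their meaning.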
Links in HTML pages
Finally, the links in HTML pages only change the URL in the location bar. An HTML link behaves the same regardless of what exists on the server. The server, in turn, only responds to HTTP requests and produces resources (HTML, images, CSS, JavaScript, etc.) that your browser consumes; it has no special connection to the links inside them.
Absolute URLs are those that start with a scheme (such as http:, as you have done). When the user clicks the link, the whole content of the location bar is replaced with it.
Domain-relative (root-relative) URLs are those that start with a forward slash (/). Everything after the domain name is replaced with the contents of the link.
Relative URLs are everything else. Everything after the last slash (/) in the current URL is replaced with the contents of the link.
Examples:
My page on mydomain.com can link to your about page with an absolute URL (http://www.example.com/about/), just as you have done.
If I change my link to the domain-relative /about/, it will point to mydomain.com/about/ instead.
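The three kinds of link can be tried out with Python's urllib.parse.urljoin, which resolves a link against the URL of the page containing it much as a browser does. The page URL http://mydomain.com/articles/page.html is a made-up example.

```python
from urllib.parse import urljoin

page = "http://mydomain.com/articles/page.html"  # hypothetical current page

links = [
    "http://www.example.com/about/",  # absolute: replaces the whole URL
    "/about/",                        # domain-relative: keeps only scheme and domain
    "about.html",                     # relative: replaces what follows the last "/"
    "../about.html",                  # relative with "..": goes up one directory first
]

for link in links:
    print(f"{link!r:35} -> {urljoin(page, link)}")
```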
An answer to your question
How is my HTML file, presumably called about.html, supposed to know that its URL is "http://www.example.com/about/"?
First, the file itself has no idea what its URL is, unless:
the HTML was dynamically generated by a program (most server-side languages provide a way to get the requested URL), or
a client-side script reads the current URL after the page is served (for example through window.location in JavaScript).
Second, if the URL is /about and the file is actually about.html, then you probably have some kind of rewriting going on. Remember that paths, in the simplest case, are literal translations, and /about is not the same as about.html.

Just use /about.html to link to the page

Theoretically, it's better for URLs in your documents to be relative, so that you don't have to change them if you change the domain or the files' location.
For example, if you move it from localhost to your hosted server.
In your example, instead of www.example.com/about.html use /about.html.

Given the link above, you would need an about page named index.html located in a directory named about for your example to work. That is, however, not common practice.

I'm a bit confused, but here is some information. On most servers, a file named index (usually index.html) is the default page shown for a directory (folder) when you request that directory.
Relative links are always resolved relative to the directory of the page that contains them. So if your link is in a file in a different directory, you must include that directory along with the file name. If it is in the same directory, there is no need to include the directory; the file name alone is enough.

about.html doesn't know what its URL is; it's the link in your index.html file that requests about.html.
When you're in any given directory, linking to other pages within that directory is done via a simple relative link:
<a href="about.html">About Us</a>
To move up a directory, for example if you're in a subfolder such as users, you can use .. to go up one level:
<a href="../about.html">About Us</a>
In your case your about page is in the same directory as the page you're linking from so it just goes to the right page.
Additionally (and I think this may be what you're asking) if you have:
about.html
about.php
about.phtml
about.jpg
for example, and the server has content negotiation enabled (such as Apache's MultiViews option), visiting http://www.yoursite.com/about may automatically bring up the HTML page. The other files should be referenced explicitly somewhere if you want them to be used.

Why doesn't Wikipedia have extensions?

If I look at a random Wikipedia article like http://en.wikipedia.org/wiki/Impostor_syndrome, I see that there's no .html at the end of the address. In fact, if I try to add .html, Wikipedia tells me "Wikipedia does not have an article with this exact name." How come it doesn't need any file extensions?

More a superuser question?
There is no law saying that an HTML file has to end in .html or .htm, and since the wiki generates pages from a database, there is really no file there anyway (except in a cache).
Not having .htm or .php is more sensible: why should you care what technology they use when you request a URL? It would be like having to put the operating system of the recipient at the end of their email address.

If you make a request to a website, the URL probably looks like
www.example.com/siteA/index.html
This request just tells the web server that you want to see a resource called index.html in siteA.
The website that runs on this server has to determine what you want to see and how the data is loaded.
index.html could be a file in the siteA directory
or
it could be a row with the key "index.html" in the siteA table in your database.
So the part siteA/index.html is just a resource identifier. The grammar of this resource identifier is completely free and is determined per website.
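To make the "file or database row" point concrete, here is a minimal Python sketch in which the path siteA/index.html is looked up as a key in an in-memory SQLite table rather than as a file. The schema and function names are hypothetical. The sketch uses a single pages table with a site column instead of one table per site, because SQL table names cannot be passed as query parameters.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create a throwaway database holding one page (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pages (site TEXT, name TEXT, body TEXT, PRIMARY KEY (site, name))")
    conn.execute("INSERT INTO pages VALUES (?, ?, ?)",
                 ("siteA", "index.html", "<h1>Hello from the database</h1>"))
    return conn


def handle(conn: sqlite3.Connection, path: str) -> tuple:
    """Treat 'siteA/index.html' as a (site, name) key, not as a file path."""
    site, _, name = path.strip("/").partition("/")
    row = conn.execute("SELECT body FROM pages WHERE site = ? AND name = ?",
                       (site, name)).fetchone()
    return (200, row[0]) if row else (404, "Not Found")


conn = make_db()
print(handle(conn, "/siteA/index.html"))  # (200, '<h1>Hello from the database</h1>')
```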
URL rewriting is also common, to make URLs easier to read and remember.
For example, there could be a rewrite rule to accomplish the following:
if the user enters something like
www.example.com/download/demo.zip
rewrite it so that your website sees it as:
www.example.com/download.php?file=demo.zip
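The rewrite rule above amounts to a pattern match on the incoming path. A sketch in Python follows, with the pattern and the download.php target taken from the example. On Apache, the same job would typically be done by a mod_rewrite rule rather than by application code.

```python
import re
from urllib.parse import urlencode

# /download/<name> -> /download.php?file=<name>; anything else passes through unchanged.
DOWNLOAD_RULE = re.compile(r"/download/(?P<file>[^/]+)")


def rewrite(path: str) -> str:
    match = DOWNLOAD_RULE.fullmatch(path)
    if match:
        # urlencode percent-encodes the file name if it contains special characters.
        return "/download.php?" + urlencode({"file": match["file"]})
    return path


print(rewrite("/download/demo.zip"))  # /download.php?file=demo.zip
```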

Wikipedia's servers map the URL to the page you want. .html is just a naming convention that today is mostly historical, from the period of static pages when URLs actually were names of files on the server. In fact, there may be no file at all: the server queries the database and a web framework sends out the HTML on the fly.

Wikipedia is most likely using the Apache module mod_rewrite in order to not have to link paths directly to a file system path.
See: http://en.wikipedia.org/wiki/Rewrite_engine#Web_frameworks
However, programming languages can also take control of the incoming URLs and return data depending on the structure of the link, according to some set of rules; for example, the Django web framework employs a URL dispatcher.

That's because Wikipedia uses MediaWiki's short URL feature.
Actually, when you search for an article, it really loads a PHP file. Try searching for a word that doesn't exist, for example "Pazaz". The URL is http://en.wikipedia.org/w/index.php?title=Special%3ASearch&search=pazaz . Notice index.php in the URL.
To tell the truth, it's not really a MediaWiki feature; it's done by Apache. For further information, see http://www.mediawiki.org/wiki/Manual:Short_URL
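Both halves of this answer can be illustrated with Python's urllib.parse. The search URL is just index.php with query parameters, and a short URL such as /wiki/Impostor_syndrome can be mapped onto that script. The short_to_long function is a hypothetical sketch of the kind of rewrite the Short URL manual describes, not MediaWiki's or Wikipedia's actual configuration.

```python
from urllib.parse import parse_qs, urlsplit

search_url = "http://en.wikipedia.org/w/index.php?title=Special%3ASearch&search=pazaz"

# The query string decodes to ordinary parameters for index.php.
params = parse_qs(urlsplit(search_url).query)
print(params)  # {'title': ['Special:Search'], 'search': ['pazaz']}


def short_to_long(path: str) -> str:
    """Hypothetical rewrite: /wiki/<Title> -> /w/index.php?title=<Title>."""
    prefix = "/wiki/"
    if path.startswith(prefix):
        return "/w/index.php?title=" + path[len(prefix):]
    return path


print(short_to_long("/wiki/Impostor_syndrome"))  # /w/index.php?title=Impostor_syndrome
```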

URL routing is your answer. For example, here is how ASP.NET MVC describes it:
The ASP.NET MVC framework includes a flexible URL routing system that enables you to define URL mapping rules within your applications. The routing system has two main purposes:
Map incoming URLs to the application and route them so that the right Controller and Action method executes to process them
Construct outgoing URLs that can be used to call back to Controllers/Actions (for example: form posts, links, and AJAX calls)
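The two purposes listed above (matching incoming URLs and building outgoing ones) can be sketched in Python with a route template of the form {controller}/{action}/{id}. That is the pattern used by the default route in ASP.NET MVC project templates. This is only an illustration of the idea: match_route and build_url are hypothetical names, not ASP.NET APIs.

```python
from typing import Optional

TEMPLATE = "{controller}/{action}/{id}"
DEFAULTS = {"controller": "Home", "action": "Index", "id": None}  # id is optional


def match_route(path: str, template: str = TEMPLATE, defaults: dict = DEFAULTS) -> Optional[dict]:
    """Incoming: split the path into the named segments of the template."""
    names = [segment.strip("{}") for segment in template.split("/")]
    parts = [p for p in path.strip("/").split("/") if p]
    if len(parts) > len(names):
        return None  # more segments than the template allows
    values = dict(defaults)
    values.update(zip(names, parts))
    if any(name not in values for name in names):
        return None  # a required segment with no default is missing
    return values


def build_url(controller: str, action: str, item_id: Optional[object] = None) -> str:
    """Outgoing: turn a controller/action (and optional id) back into a path."""
    segments = [controller, action] + ([str(item_id)] if item_id is not None else [])
    return "/" + "/".join(segments)


print(match_route("/Products/Detail/42"))   # {'controller': 'Products', 'action': 'Detail', 'id': '42'}
print(match_route("/"))                     # {'controller': 'Home', 'action': 'Index', 'id': None}
print(build_url("Products", "Detail", 42))  # /Products/Detail/42
```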

I suspect that sites like this use some sort of Model-View-Controller (MVC) framework, similar to Ruby on Rails, where the URL 'directories' form part of a request route.
In MVC-based frameworks, the URL 'directories' can dictate which view/controller to use as well as what action should be taken with the data.
e.g.: shop.com/product/carrots
Here product is a view/controller and carrots is the data. The framework then determines which action/route to take; the default could be viewing the product information and price of the carrots.
