How exactly does one do something like create a unique URL.
Like how facebook does it facebook.com/mynamehere
One way would be to create multiple folders each time we have a new user..but that doesn't seem to be the best approach
You can try a program like Elgg if you are trying to build a social media site. Otherwise, a person's profile can be custom in a couple of ways. Most of them mentioned. You, as mentioned, can use .htaccess for rewrites. You can use an automated custom url plugin (this may help: How to generate a custom URL from a html input?). Similarly, you can use the previously mentioned Elgg for social media, and but also as a last resort can use your folder method, but only if absolutely required.
I think the question is: how is it done technically, so we don't need to have physical file for every valid URL?
The answer is URL rewriting. In case of Apache server, you want to enable mod_rewrite and configure it to translate particular URL pattern (like myfbclone.com/mynamehere to myfbclone.com/index.php?username=mynamehere). This way you need to have one script file that handles all the URLs accordingly.
Different servers have different means of rewriting URLs, like Nginx or IIS, so the exact way of configuration depends on your server, but the concept is usually the same.
Related
Let's say we have a site that's test.com, would test.com/ ever be a different site/file then test.com? I know that the url represents a path to the server to get that file. Going to test.com/file and test.com/file/ could potentially bring up different sites since the latter is a directory. So I was wondering if the same is true for the root.
Or am wrong about the url as well?
It all depends on your environment and how it handles the routing. Many frameworks treat url/ and url as aliases, but many frameworks don't. So the answer to your question is yes, it can be different.
the url represents a path to the server to get that file
This is correct, but this can be any path you choose.
When you create a simple website with nested folders yes you can create something like this:
/webroot/index.html
/blog.html
/myvideos/list.html
Which results in for example www.example.com, www.example.com/blog.html and www.example.com/myvideos/list.html
But with some server side settings called rewrites you can make your url behave like anything you want.
I could even redirect urls to entire different servers. Or make 2 different urls go to the same path. Anything you want.
I would like to know if there is a way to change the url of a .html file without changing the name of the file. Eg. if I have a website with a page called 1.html, is there a way to make it appear as mypagename.html to visitors and crawlers?
This is called URL-rewriting. URL-rewriting implementation differs between different platforms. So, you have to state your server application (e.g. Apache, IIS etc.).
If you're running on *nix, it may be easiest to just use a link (soft may require a configuration change, hard will require you to remember that both names refer to the same file).
Note on enabling soft links: https://serverfault.com/questions/244592/followsymlinks-on-apache-why-is-it-a-security-risk
You could use the HTML5 History API to push a new, possibly fake, address. I wouldn't suggest it tough.
https://github.com/browserstate/History.js/
Yes there is, but your host must provide URL rewriting. If you're on a Apache server look for .htaccess capabilities and mod_rewrite's RewriteRule directive
I was wondering how to keep images secure on my website. We have a site that requires login then then user can view thousands of different images all named after their ID in the database.
Even though you need to login to view the images the proper way...nothing is stopping a user from browsing through the images by typing <website-director>/image-folder/11232.jpg or something.
this is not the end of the world but definitely not ideal. I see that to stop this facebook just names the images something much more complicated + stores them in hashed folders.
Gmail does a very interesting thing, their image tags looks like this:
<img src=/mail/?attid=0.1&disp=emb&view=att&th=12d7d49120a940e5>
I thought the src attribute has to contain a reference to an image??...how does gmail get around this?
This is more for educational purposes at this point, as I think this gmail scheme might be overkill for our implementation.
Thanks for your feedback in advance,
Andrew
I thought the src attribute has to contain a reference to an image?
GMail is referencing an image. It's just being pulled dynamically, probably based off of that th=12d7d49120a940e5 string.
Try browsing to http://mail.google.com/mail/?attid=0.1&disp=emb&view=att&th=12d7d49120a940e5
Instead of it being a direct path to its location on the server's filesystem, it uses a dynamic script (the images may even be in a database, who knows).
Besides serving up an image dynamically from your webapp, it's also possible to use a webapp to dynamically authorize access to static resources that the webserver will serve -- commonly by putting the files somewhere that the webserver has access to, but not mapped to any public URI, and then using something like X-Sendfile (lighttpd, Apache with mod_sendfile, others), X-Accel-Redirect (nginx), X-Reproxy-File (Perlbal), etc. etc. Or with FastCGI you can configure an application in a FastCGI "authorizer" role rather than a content provider.
Any of these will let you check the image being authorized, and the user's session, and make whatever decision you need to, without tying up a proceses of your backend application for the entire time that the image is being sent to the client. It's not universally true, but usually a connection to the backend app represents a lot more resources being reserved than a connection to the webserver, so freeing them up ASAP is smart.
The code that runs after this GET request is issued:
/mail/?attid=0.1&disp=emb&view=att&th=12d7d49120a940e5
outputs an image to the browser. Something doesn't have to be named with a .jpg or .png or whatever ending to be considered an image by a browser. This is how captcha algorithms are able to serve up different images depending on a value in the id. For example, this link:
http://www.google.com/recaptcha/api/image?c=03AHJ_VusfT0XgPXYUae-4RQX2qJ98iyf_N-LjX3sAwm2tv1cxWGe8pkNqGghQKBbRjM9wQpI1lFM-gJnK0Q8G3Nirwkec-nY8Jqtl9rwEvVZ2EoPlwZrmjkHT7SM32cCE8PLYXWMpEOZr5Uo6cIXz1mWFsz5Qad1iwA
Serves up this image:
So the answer really is to just obfuscate your image names/links a bit like Facebook does so that people can't easily guess them.
Look at a random wikipedia article like http://en.wikipedia.org/wiki/Impostor_syndrome, I see that there's no .html attached to the end of the address. In fact, if I do try to put a .html after it, Wikipedia tells me "Wikipedia does not have an article with this exact name." How come it doesn't need any file extensions?
More a superuser question?
There is no law saying that an html file has to end in .html or .htm and since wiki generates pages from a database there is really no file page there anyway (except in a cache).
Not having .htm or .php is moresensible - why do you care what technology they use when you ask for a url? It would be like having to put the operating system of the recipient at the end of their email address.
if you make a call to a website it probably looks like
www.example.com/siteA/index.html
this request just tells the webserver you want to see a resource that is called index.html in siteA.
the website that runs on this server has to determine what you want to see and how the data is loaded.
index.html could be a file in the siteA directory
or
it can be row with the key "index.html" in the siteA-table in your database.
so the part siteA/index.html is just a resource identifier. the grammar of this resource identifier is completely free and is determined per website.
url rewriting is also common to make url easier to read and remember.
for example there could be a rewrite rule to accomplish the following:
if the user enters something like
www.example.com/download/demo.zip
rewrite it so your website sees it like:
www.example.com/download.php?file=demo.zip
Wikipedia's servers map the url to the page you want. .html is just a naming convention that, today is mostly historical from the period of static pages when urls actually were names of files on the server. In fact, there may be no file at all, where the server queries the database and a web framework sends out the html on the fly.
Wikipedia is most likely using the Apache module mod_rewrite in order to not have to link paths directly to a file system path.
See: http://en.wikipedia.org/wiki/Rewrite_engine#Web_frameworks
However programming languages can also take control of the incoming URLs and return data depending on the structure of the link according to some set of rules, for example the Django web framework employees a URL dispatcher.
That's because Wikipedia uses MediaWiki's feature of URL shortening.
Actually when you search for a file it really loads a php file. Try searching for a word that doesn't exist, for example "Pazaz". The URL is http://en.wikipedia.org/w/index.php?title=Special%3ASearch&search=pazaz . Notice index.php in the URL.
To tell the truth it's not a MediaWiki feature, it's Apache. For further info http://www.mediawiki.org/wiki/Manual:Short_URL .
URL routing is your answer for example in ASP read below source from
The ASP.NET MVC framework includes a flexible URL routing system that enables you to define URL mapping rules within your applications. The routing system has two main purposes:
Map incoming URLs to the application and route them so that the right Controller and Action method executes to process them
Construct outgoing URLs that can be used to call back to Controllers/Actions (for example: form posts, links, and AJAX calls)
I would suggest that sites like this use some sort of Model View Controller framework similar to Ruby on Rails where the url 'directories' form a part of a request/url route...
In frameworks that are MVC based, the url 'directories' can dictate what View/Controller to utilise as well as what action should be taken with the data.
eg: shop.com/product/carrots
Where product is a view/controller and carrots is the data. The framework then analyses which action/route to take. Default could be viewing the product information and price of the carrot.
I was wondering what's the best way to switch a website to a temporary "under costruction" page and switch it back to the new version.
For example, in a website, my customer decided to switch from Joomla to Drupal and I had to create a subfolder for the new CMS, and then move all the content to the root folder.
1) Moving all the content back to the root folder always create some problems with file permissions, links, etc...
2) Creating a rewrite rule in .htaccess or forward with php is not a solution because another url is shown including the top folder.
3) Many host services do not allow to change the root directory, so this is not an option since I don't have access to apache config file.
Thanks
Update: I can maybe forward only the domain (i.e. www.example.com) and leave the ip on the root folder (i.e. 123.24.214.22), so the access is finally different for me and other people? Can I do this in .htaccess file ?
One thing to consider is you don't want search engines to cache your under construction page - and you also don't want them to drop your homepage from the search index either (Hence just adding a "noindex" meta tag isn't the perfect solution).
A good way to deal with this is do a 302 redirect (temporarily moved) from your homepage to your under construction page - that way the search engine does not cache your homepage as an under construction page, does not index your under construction page (assuming it has a NOINDEX meta tag), and does not drop your homepage from the search index either.
One way would be the use of an include on your template page.
When you want the construction page to show, you set a redirect in the include to take all traffic to the construction page.
When you are done your remove the redirect.
What about hijacking your index.php file?
Something simple, along the lines of
<?php
if (SITE_OFFLINE)
include 'under_construction.html';
else
//normal content of your index page
?>
where you would naturally define SITE_OFFLINE in an appropriate place for your needs.
What I did when I used PHP for websites was to configure Apache to direct all requests to a front controller. You then would have full access to all requests no matter where they are pointing to. Then in your front controller (PHP file, static html file, etc.), you would do whatever you need to do there.
I believe you need to configure pathinfo in Apache and some other settings, it has been about 3 years since I have used that approach. But, this approach is also good for developing your own CMS or application so that you have full control over security.
You have to do something similar to this:
http://www.phpwact.org/pattern/front_controller
I am looking for more details, I know my configuration had more to it than that.
This is part of what I'm looking for too:
http://httpd.apache.org/docs/2.0/mod/core.html
Enabling path_info passes path information to the script, so all requests now go through a single point of entry. Let me find my configuration, I know vaguely how this works, but I'm sure it looks like a lot of hand waving.
Also, keep in mind that because all requests are going through this single PHP file, you are responsible for serving images, JavaScript, CSS, etc. So, if a requests is coming in for /css/default.css, that will go through your php script (index.php, most likely), then you'll need to determine how to handle the request. Serving static files is trivial, but it is a little more work.
If you don't want to go that route, you could possibly do something with mod_rewrite so that it only looks for .html, .htm pages or however you have your site configured. For me, I don't do extensions, so that made my regex a little more difficult. I also wanted to secure access to all files. The path_info was the solution for me, but if you don't need that granularity, then writing a front controller might be a bit too much work.
Walter