Reroute old content (.html/.php etc.) to Ruby on Rails - html

I have switched to Ruby on Rails and my current problem is to reroute
the old contents like XXX/dummy.html or XXX/dummy.php in RoR.
What exactly is the best solution for:
isolated content (XXX/onlyinstance.html)
content which has an internal structure, like XXX/dummy1.html, XXX/dummy2.html
http://guides.rubyonrails.org/routing.html does not explain how to
migrate old content.
Note: Changing the old links is NOT an option. The website is hosted, it is not my
own server. As the domain hasn't changed, the solution to redirect it seems to be
unnecessary...there should be a better solution.
EDIT: I have found out that the best solution is in fact rerouting it by the way weppos described.
So add a .htaccess file in the public directory
and write
RewriteEngine on
Redirect permanent /XXX.php http://XYZ/XXX
For whatever reason, RoR did not accept rerouting in routes.rb: while .html/.xml
work fine, .php does not. I haven't found out why. Because weppos's answer
was the best, I will award him the 50-point bounty, but as the other answers are valid
too, I will upvote them. Thank you all.

You can do this in multiple ways.
The best and most efficient way is to use your front end web server.
You can easily setup some configurations in order to redirect all the old URLs to the new ones.
With Apache, you can use mod_alias and mod_rewrite.
Redirect /XXX/onlyinstance.html /new/path
RedirectMatch ^/XXX/dummy(\d+)\.html$ /new/path/$1
This is the most efficient way for both server and client, because the redirect is handled at the server level without the need to initialize the Ruby interpreter.
If you can't or don't want to take advantage of server settings, you can use Rails itself.
In terms of performance, the best option there is a Rack middleware, which is much more efficient than creating a full controller/action.
class Redirector
  # Rack entry point: env is the request environment hash.
  def self.call(env)
    if env["PATH_INFO"] =~ %r{XXX/onlyinstance\.html}
      # Permanent redirect to the new location.
      [301, {"Content-Type" => "text/html", "Location" => "http://host/new/path/"}, ["Redirecting"]]
    else
      [404, {"Content-Type" => "text/html"}, ["Not Found"]]
    end
  end
end
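As a rough sketch of how the Rack approach could be extended to the numbered pages, and written as a middleware that passes unmatched requests on to the application instead of answering 404, something like the following might work. The class name LegacyRedirector, the rule table and the target paths are assumptions for illustration, not part of the original answer.

```ruby
# Hypothetical middleware: class name, rules and target paths are illustrative only.
class LegacyRedirector
  # Each rule pairs a regex on the old path with a lambda that builds the new path.
  RULES = [
    [%r{\A/XXX/onlyinstance\.html\z}, ->(_m) { "/new/path" }],
    [%r{\A/XXX/dummy(\d+)\.(?:html|php)\z}, ->(m) { "/new/path/#{m[1]}" }]
  ].freeze

  def initialize(app)
    @app = app
  end

  def call(env)
    path = env["PATH_INFO"].to_s
    RULES.each do |pattern, builder|
      match = pattern.match(path)
      next unless match
      # 301 tells clients and crawlers that the move is permanent.
      headers = { "Content-Type" => "text/html", "Location" => builder.call(match) }
      return [301, headers, ["Redirecting"]]
    end
    # Not a legacy URL: hand the request to the wrapped application.
    @app.call(env)
  end
end
```

In a Rails application a class like this is typically registered with config.middleware.use; the exact place for that line depends on the Rails version.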
There is also a Rack plugin called Redirect that provides a nice DSL for configuring redirects using a Rack middleware.
Just a footnote: I wouldn't create additional routes in routes.rb, because you'll end up duplicating your site URLs and wasting additional memory.
See also Redirect non-www requests to www urls in Rails

What do you mean by migrating? I recommend redirecting clients to the RoR URLs. This can be done using the HTTP 301 status code. See http://en.wikipedia.org/wiki/HTTP_301:
The HTTP response status code 301 Moved Permanently is used for permanent redirection.
This can be done in the configuration of your HTTP server.

You have to restructure your application, since Rails uses RESTful routing (as you have probably read). So to get the equivalent of a PHP file which handles showing, creating, destroying, etc. of items, you need to build an item model, controller and views for the different actions.
The static HTML files you can copy to the public directory, since that is the same. The structure you used can still be the same.
In order to modify your routing you have to add map.resource to your config/routes.rb file. This implements the RESTful routes to your controller. To start use the webserver provided by Rails (WEBrick), by entering the script/server command. Later when you have more experience you could think of switching to another server if WEBrick is not sufficient.
I suggest you start by writing a basic (blog) application with Rails first, see here. That way you see which parts Rails uses and how you can use them.
Afterwards you will be able to identify these parts in your PHP solution and be better able to convert your pages. At least, I followed this approach when I started converting from PHP to Rails.

Related

How to serve static files of a service using path-based routing?

I've been searching for quite a while on many forums, how to serve static files referenced by HTML pages like href="/", when the frontend service isn't located at the root / of my host, using Nginx Ingress.
This question has been asked many times here, but briefly: I have an example.com/ host, and I need to expose a frontend service in a path like example.com/front1. This simply cannot work. The index.html returns perfectly, but then a href="/styles.css" hits the gateway asking for example.com/styles.css instead of example.com/front1/styles.css. And nginx just returns a 404.
The only "clean" solution I've found is to manage one subdomain for each frontend service, so it would request something like front1.example.com/styles.css. I'm afraid it might not be a great solution.
Some people use hacks such as sub-filter annotations to replace href="/ strings with href="/front1/ in each file returned by that ingress, but this also falls short with pages generated dynamically by JavaScript and so on. And of course I could throw all my stuff into an S3 server, but think about it: if I have a cluster of many microfrontends and other third-party self-hosted frontend services (like signoz, whose source code I have no control over), it would be amazing if I could reference every service by its path, like example.com/service-name/, and not worry about anything else.
I've been thinking about a possible solution, mentioned in a lost comment, of using nginx.ingress.kubernetes.io/server-snippet to inject some njs logic that replaces the base URL based on the Referer header (http://example.com/front1) whenever it references a known service, but I have no idea how to use njs, whether it is even possible, let alone whether it is a good solution.
Anyway, my question is, has anyone solved that problem and could point me in the right direction? It is for a friend
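To make the sub-filter hack mentioned in the question concrete, here is a minimal Ruby sketch of the kind of string rewriting it performs. The helper name prefix_root_links is hypothetical, and this is not how the ingress itself is configured. It only touches literal href/src attributes in the returned markup, which is why URLs built at runtime by JavaScript slip through.

```ruby
# Hypothetical helper: rewrites root-relative href/src attributes to live under a prefix.
def prefix_root_links(html, prefix)
  # Match href="/... or src="/... but skip protocol-relative URLs such as "//cdn...".
  html.gsub(%r{\b(href|src)="/(?!/)}) { "#{Regexp.last_match(1)}=\"#{prefix}/" }
end

puts prefix_root_links('<link href="/styles.css">', "/front1")
```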

Is it possible to configure Sinatra .erb templates for offline using cache.manifest?

I've looked around at various posts on the web; but it looks like it's all only for static .html files. Mephisto and rack-offline looked like they could be useful, but I couldn't figure out if they could help with sinatra templates.
My views/index.erb has 3 get do's - /part1, /part2, /part3 which hold html output; would be great if they could be cached for offline. Any pointers?
I'll try to answer your question as best I can. I guess with "My views/index.erb has 3 get do's", you mean you have three routes in your application, /part1, /part2, and /part3, respectively. Those three routes are processed using ERB templates and return HTML. Now you'd like to put them into a cache manifest for offline use.
First of all: For the client, it doesn't matter if the resource behind a URL is generated dynamically or if it is a static file. You could just put part1 (notice the missing slash) into your manifest and be done.
The effect would be that a client requests /part1 just once and then uses the cached version until you update your manifest.
Here's the catch: If you process ERB templates, you obviously have something dynamic in the response. And that's why I don't get why you'd want to cache the response.
Don't get me wrong: There might be perfectly good reasons why you want to do this. And I don't see any reason why you can't put routes to dynamic resources into your cache manifest.
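A minimal sketch of what such a manifest could look like when generated from Ruby. The helper name build_cache_manifest and the version string are assumptions for illustration. In a Sinatra app the resulting text would presumably be returned from a route with the text/cache-manifest content type.

```ruby
# Hypothetical helper: builds the text of an HTML5 cache manifest.
def build_cache_manifest(version, paths)
  # Changing the version comment changes the file, which is what makes
  # browsers fetch the listed resources again.
  lines = ["CACHE MANIFEST", "# version #{version}", "", "CACHE:"]
  lines.concat(paths)
  # Everything not listed above still requires the network.
  lines.concat(["", "NETWORK:", "*"])
  lines.join("\n") + "\n"
end

# "v1" is a placeholder version; the paths are the three routes from the question.
puts build_cache_manifest("v1", %w[part1 part2 part3])
```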

Why doesn't Wikipedia have extensions?

Looking at a random Wikipedia article like http://en.wikipedia.org/wiki/Impostor_syndrome, I see that there's no .html attached to the end of the address. In fact, if I do try to put a .html after it, Wikipedia tells me "Wikipedia does not have an article with this exact name." How come it doesn't need any file extensions?
More a superuser question?
There is no law saying that an html file has to end in .html or .htm and since wiki generates pages from a database there is really no file page there anyway (except in a cache).
Not having .htm or .php is more sensible - why do you care what technology they use when you ask for a url? It would be like having to put the operating system of the recipient at the end of their email address.
If you make a call to a website, it probably looks like
www.example.com/siteA/index.html
This request just tells the web server you want to see a resource that is called index.html in siteA.
The website that runs on this server has to determine what you want to see and how the data is loaded.
index.html could be a file in the siteA directory
or
it could be a row with the key "index.html" in the siteA table in your database.
So the part siteA/index.html is just a resource identifier. The grammar of this resource identifier is completely free and is determined per website.
URL rewriting is also commonly used to make URLs easier to read and remember.
For example, there could be a rewrite rule to accomplish the following:
if the user enters something like
www.example.com/download/demo.zip
rewrite it so your website sees it like:
www.example.com/download.php?file=demo.zip
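The rewrite described above can be sketched in a few lines of Ruby. This is only an illustration of the mapping, not how a web server implements it, and the function name rewrite_download_path is hypothetical. A real rule would also need to escape the file name.

```ruby
# Hypothetical rewrite: maps the "pretty" download URL to the script URL the site handles.
def rewrite_download_path(path)
  match = %r{\A/download/([^/?]+)\z}.match(path)
  # Paths that do not look like /download/<file> are left untouched.
  return path unless match
  "/download.php?file=#{match[1]}"
end

puts rewrite_download_path("/download/demo.zip")
```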
Wikipedia's servers map the URL to the page you want. .html is just a naming convention that today is mostly historical, from the period of static pages when URLs actually were names of files on the server. In fact, there may be no file at all: the server queries the database and a web framework sends out the HTML on the fly.
Wikipedia is most likely using the Apache module mod_rewrite in order to not have to link paths directly to a file system path.
See: http://en.wikipedia.org/wiki/Rewrite_engine#Web_frameworks
However, programming languages can also take control of the incoming URLs and return data depending on the structure of the link according to some set of rules; for example, the Django web framework employs a URL dispatcher.
That's because Wikipedia uses MediaWiki's feature of URL shortening.
Actually when you search for a file it really loads a php file. Try searching for a word that doesn't exist, for example "Pazaz". The URL is http://en.wikipedia.org/w/index.php?title=Special%3ASearch&search=pazaz . Notice index.php in the URL.
To tell the truth it's not a MediaWiki feature, it's Apache. For further info http://www.mediawiki.org/wiki/Manual:Short_URL .
URL routing is your answer. For example, in ASP.NET (see the quoted source below):
The ASP.NET MVC framework includes a flexible URL routing system that enables you to define URL mapping rules within your applications. The routing system has two main purposes:
Map incoming URLs to the application and route them so that the right Controller and Action method executes to process them
Construct outgoing URLs that can be used to call back to Controllers/Actions (for example: form posts, links, and AJAX calls)
I would suggest that sites like this use some sort of Model View Controller framework similar to Ruby on Rails where the url 'directories' form a part of a request/url route...
In frameworks that are MVC based, the url 'directories' can dictate what View/Controller to utilise as well as what action should be taken with the data.
eg: shop.com/product/carrots
Where product is a view/controller and carrots is the data. The framework then analyses which action/route to take. Default could be viewing the product information and price of the carrot.
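A toy version of that dispatch step, assuming a single hypothetical ProductController and a hand-written lookup table. Real frameworks such as Rails do far more, so treat this only as a sketch of the idea that the URL segments select a controller and its data.

```ruby
# Hypothetical controller and router; names are illustrative, not a real framework API.
class ProductController
  def show(id)
    "Showing product: #{id}"
  end
end

# The first URL segment selects the controller class.
CONTROLLERS = { "product" => ProductController }.freeze

def dispatch(path, action = :show)
  # "/product/carrots" becomes controller name "product" and id "carrots".
  controller_name, id = path.sub(%r{\A/}, "").split("/", 2)
  controller = CONTROLLERS[controller_name]
  return "404 Not Found" unless controller
  controller.new.public_send(action, id)
end

puts dispatch("/product/carrots")
```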

web-development: how do you usually handle the "under construction" page?

I was wondering what's the best way to switch a website to a temporary "under construction" page and switch it back to the new version.
For example, in a website, my customer decided to switch from Joomla to Drupal and I had to create a subfolder for the new CMS, and then move all the content to the root folder.
1) Moving all the content back to the root folder always creates some problems with file permissions, links, etc...
2) Creating a rewrite rule in .htaccess or forward with php is not a solution because another url is shown including the top folder.
3) Many host services do not allow to change the root directory, so this is not an option since I don't have access to apache config file.
Thanks
Update: I can maybe forward only the domain (i.e. www.example.com) and leave the ip on the root folder (i.e. 123.24.214.22), so the access is finally different for me and other people? Can I do this in .htaccess file ?
One thing to consider is you don't want search engines to cache your under construction page - and you also don't want them to drop your homepage from the search index either (Hence just adding a "noindex" meta tag isn't the perfect solution).
A good way to deal with this is do a 302 redirect (temporarily moved) from your homepage to your under construction page - that way the search engine does not cache your homepage as an under construction page, does not index your under construction page (assuming it has a NOINDEX meta tag), and does not drop your homepage from the search index either.
One way would be the use of an include on your template page.
When you want the construction page to show, you set a redirect in the include to take all traffic to the construction page.
When you are done, you remove the redirect.
What about hijacking your index.php file?
Something simple, along the lines of
<?php
// SITE_OFFLINE is expected to be defined elsewhere (see the note below).
if (SITE_OFFLINE) {
    include 'under_construction.html';
} else {
    // normal content of your index page
}
?>
where you would naturally define SITE_OFFLINE in an appropriate place for your needs.
What I did when I used PHP for websites was to configure Apache to direct all requests to a front controller. You then would have full access to all requests no matter where they are pointing to. Then in your front controller (PHP file, static html file, etc.), you would do whatever you need to do there.
I believe you need to configure pathinfo in Apache and some other settings, it has been about 3 years since I have used that approach. But, this approach is also good for developing your own CMS or application so that you have full control over security.
You have to do something similar to this:
http://www.phpwact.org/pattern/front_controller
I am looking for more details, I know my configuration had more to it than that.
This is part of what I'm looking for too:
http://httpd.apache.org/docs/2.0/mod/core.html
Enabling path_info passes path information to the script, so all requests now go through a single point of entry. Let me find my configuration, I know vaguely how this works, but I'm sure it looks like a lot of hand waving.
Also, keep in mind that because all requests are going through this single PHP file, you are responsible for serving images, JavaScript, CSS, etc. So, if a request comes in for /css/default.css, it will go through your PHP script (index.php, most likely), and you'll need to determine how to handle it. Serving static files is trivial, but it is a little more work.
If you don't want to go that route, you could possibly do something with mod_rewrite so that it only looks for .html, .htm pages or however you have your site configured. For me, I don't do extensions, so that made my regex a little more difficult. I also wanted to secure access to all files. The path_info was the solution for me, but if you don't need that granularity, then writing a front controller might be a bit too much work.
Walter

How is "Is offline for maintenance" page implemented?

Occasionally when I try to open a site I will see a page saying something like "This site is offline for maintenance", followed by some comments on how long it will presumably take. Stack Overflow does that too.
How does it work? I mean if the site is shut down who replies to my HTTP request and serves this page?
There is a trick in asp.net where you place a file called
App_Offline.htm
All requests will go to this, until the page is deleted.
For other environments you can often just change where the server points, or another such plan.
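For environments without a built-in feature like App_Offline.htm, the same idea can be sketched as a small Rack-style middleware: while a flag file exists, every request gets its contents. The class name MaintenanceMode, the flag-file convention and the Retry-After value are assumptions for illustration. 503 Service Unavailable is the conventional status for temporary downtime.

```ruby
# Hypothetical middleware: the class name and flag-file convention are illustrative.
class MaintenanceMode
  def initialize(app, flag_file)
    @app = app
    @flag_file = flag_file
  end

  def call(env)
    if File.exist?(@flag_file)
      # While the flag file exists, answer every request with its contents.
      body = File.read(@flag_file)
      # Retry-After of one hour is a placeholder value.
      return [503, { "Content-Type" => "text/html", "Retry-After" => "3600" }, [body]]
    end
    # No flag file: the site is live, so pass the request through.
    @app.call(env)
  end
end
```

Deleting the flag file brings the site back without restarting anything, which mirrors the behaviour described for App_Offline.htm.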
-- Edit
A server-agnostic approach is achieved through something like load-balancing.
Under the hood you can send the requests to a given internal server. You may then decide to point all requests to your server 'a', which you configure to show the 'downtime' page. Then, you make changes to server 'b', confirm it as successful, and point all requests to 'b'. Then you update 'a', and let requests go to both.
In ASP.NET (and ASP.NET MVC as Stackoverflow uses) this is provided by the app_offline.htm feature. This works simply by forwarding all ASP.NET requests to the app_offline.htm file.
Incidentally, the Copy Web Site tool in ASP.NET places this file in the root of the web app, copies the web site files and then deletes the file.
Strategies for other technologies are discussed here.
In Apache you may use a .htaccess file with this content.
order deny,allow
allow from 192.168.1.151
deny from all
ErrorDocument 403 /404.html
ErrorDocument 404 /404.html
ErrorDocument 500 /404.html
This will deny access to everyone except one IP and serve a static 404.html file.
This works in the case you only have one server without load-balancing and other stuff. Should work for load-balancing too though.
The apache reverse proxy server can be configured to send that response - if it is being used as part of that architecture.