I will try to be succinct; you can ask for further information if you feel it would help.
We have designed and built a website for delivering training courses, and we are continuing to add Courses and Lessons. Our design approach has been to lay out each Lesson like a book: a Lesson contains many 'pages' of a fixed size, with no scrolling, and the text and images are carefully laid out individually, with attention to the flow of the content and the use of white space. A navigation bar at the bottom allows the user to go to the next or previous 'page', jump to any of the sub-sections of the Lesson, or jump to a specific 'page'.
We have created hundreds of Lessons, each approximately 20 'pages' in length. Our simple but effective approach has been to have a single html file for each Lesson and create each 'page' within its own Div. The visibility of the Divs is controlled by JavaScript functions called by the navigation bar at the bottom of the window (an swf file). This way we don't have thousands of individual html files to manage and lay out, and navigating a Lesson is simple. We can also easily open up a complete Lesson and review it in isolation in a browser.
Just to complete the picture: we have developed Course html files which act as wrappers to pull in and display collections of Lessons. The Lessons are displayed within an iFrame in the Course html file, and xml files determine which Lessons a Course contains.
The project has been very successful (here comes the 'but'), but our client is now increasing the length of new Lessons, and it is this which is forcing us to reconsider our approach. Our client has a very managed corporate intranet and all Users have IE8. When viewing a Course and clicking to view a Lesson, the whole Lesson has to be downloaded just to view the first 'page' (you knew that, of course!). It was slow but acceptable before; now it is becoming a real problem.
So, eventually, here is the question: how could we evolve our approach to deliver our content more efficiently, asking the server to deliver page by page rather than a whole Lesson up front?
When the project started we were told by our client, who hosts the website, that we could not create a dynamic website accessing SQL or similar, so we went static with xml data. We have more freedom now and could employ a more dynamic approach. However, I would prefer not to start again, as we have a huge amount of legacy content. The ideal would be to evolve our current approach but manage the downloading better.
I look forward to hearing your thoughts.
Regards
Chris
So you have some javascript like
openPage(pageId);
That takes a div id, hides the current "page" and opens the new one. You probably have a collection of those "pageIds" somewhere that provides the inter-page navigation (or you could be building it dynamically from the div ids, but that would be tricky, distinguishing "page" div ids from normal div ids, so I'm sticking with my first assertion: you are keeping a list of ids).
I'd suggest adding a url to each id, and having the javascript check the iFrame's location against the url mapped to the requested pageId; if it's different, load the new html file. It could default to "current location" so you don't have to modify all the existing content, just the javascript.
This would allow you to put the first page in a different html file than all the others, and to shred the Lessons into appropriately sized files.
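A minimal sketch of the idea, assuming a hypothetical "pages" map and an iframe with id "lessonFrame" (the names are illustrative, not from your code):

var pages = {
    page1: "lesson12_part1.html",
    page2: "lesson12_part1.html",
    page3: "lesson12_part2.html"
};

function openPage(pageId) {
    var frame = document.getElementById("lessonFrame");
    var current = frame.contentWindow.location.pathname;
    // Default to the current file so existing single-file Lessons
    // keep working without touching their content.
    var target = pages[pageId] || current;

    if (current.indexOf(target) === -1) {
        // The requested "page" lives in a different file: load it,
        // then reveal the right div once the new document is ready.
        // (A real version would detach this handler after it fires.)
        var show = function () { showDiv(frame, pageId); };
        if (frame.attachEvent) { frame.attachEvent("onload", show); } // IE8
        else { frame.onload = show; }
        frame.src = target;
    } else {
        showDiv(frame, pageId);
    }
}

function showDiv(frame, pageId) {
    var doc = frame.contentWindow.document;
    for (var id in pages) {
        var div = doc.getElementById(id);
        if (div) { div.style.display = (id === pageId) ? "block" : "none"; }
    }
}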
If you are clever, you will kick off a process after you load the first page to go ahead and start pulling the other pages for the lesson into the browser cache so that they are quick to display once the user is done with the first page.
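For example, a rough sketch of that prefetch; the file list is hypothetical, and it assumes the server sends cache headers so the XHR responses actually end up in the browser cache:

function prefetchLesson(files) {
    var i = 0;
    (function next() {
        if (i >= files.length) { return; }
        var xhr = new XMLHttpRequest(); // native in IE7+
        xhr.open("GET", files[i++], true);
        xhr.onreadystatechange = function () {
            // Fetch one file at a time so we don't compete with the
            // page the user is actually reading.
            if (xhr.readyState === 4) { next(); }
        };
        xhr.send(null);
    })();
}

// After the first page has rendered:
// prefetchLesson(["lesson12_part2.html", "lesson12_part3.html"]);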
For a few days now I have been learning HTML and CSS, and it is going pretty well, but the first quite serious problem I have faced concerns website structure: how do bigger projects (a few "main" pages and multiple subpages) deal with even small changes in their code/layout?
Every tutorial I watched or read was based on creating very small websites made of index.html and a few pages, say sub1.html, sub2.html, sub3.html. The idea was to create the layout of the page in index.html with all the hyperlinks we were going to use and then, once we were done with it, copy its content to sub1.html, sub2.html, sub3.html and change their content to our needs. This seems pretty reasonable for such a small website, because we do not have a lot of code and changes should not take much time.
But what if we are creating a website which will contain, e.g., 50 subpages? How should we deal with changes on every single page if we want to change the order of items in the menu, or do anything else to the repeated content of the website?
You're looking for a templating system of some kind, which will assemble full pages from components.
For example, you might have an outer template which sets up the basics like the doctype declaration, some common script and CSS includes, etc. From there, one layer in, you might have a common header/footer. Another template inside that might set up a page like the home page with featured content. A sibling to that template might be inner pages which have perhaps headings and regular content.
Sometimes these pages are assembled on-request. Other times, they are assembled when you change your content, and static pages are pushed to your web server.
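As a toy illustration in JavaScript (not any particular engine), the layers might compose like this:

function header() {
    return "<nav><a href='/'>Home</a> <a href='/about.html'>About</a></nav>";
}

function footer() {
    return "<footer>Example Site</footer>";
}

// Outer template: doctype, common CSS, shared header/footer.
function layout(title, body) {
    return "<!DOCTYPE html><html><head><title>" + title + "</title>" +
           "<link rel='stylesheet' href='site.css'></head><body>" +
           header() + body + footer() + "</body></html>";
}

// Inner template for a regular content page.
function contentPage(title, html) {
    return layout(title, "<h1>" + title + "</h1>" + html);
}

// Changing header() once updates the menu on all 50 subpages.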
I'm trying to build a crawler and scraper in Apache Nutch to find all the pages containing a section talking about a particular topic word (e.g. "election", "elections", "vote", etc.).
Once I have crawled, Nutch strips stop words and tags from the HTML, but it doesn't take out menu items (which appear on every page of the website).
So it could happen that when you look for all the pages talking about elections, you retrieve a whole website because it has the word "elections" in its menu and therefore on every page.
I was wondering whether techniques exist that analyze multiple pages of a website to work out what the main template of a page is. Useful papers and/or implementations/libraries would be welcome.
I was thinking about creating some kind of Hadoop job that analyzes similarities between multiple pages to extract a template. But the same website could have multiple templates, so it is hard to think of an effective way to do that.
E.g.:
Web page 1:
MENU HOME VOTE ELECTION NEWS
meaningful text... elections ....
Web page 2:
MENU HOME VOTE ELECTION NEWS
meaningful text... talking about swimming pools ....
You didn't mention which branch of Nutch (1.x/2.x) you are using, but at the moment I can think of a couple of approaches:
Take a look at NUTCH-585, which will be helpful if you are not crawling many different sites and you can specify which nodes of your HTML content to exclude from the indexed content.
If you're working with different sites and the previous approach is not feasible, take a look at NUTCH-961, which uses the boilerpipe feature inside Apache Tika to guess which texts matter in your HTML content. This library uses some algorithms and provides several extractors; you could try it and see what works for you. In my experience I've had some issues with news sites that had a lot of comments, and some of the comments ended up being indexed along with the main article content, but it was a minor issue after all. In any case this approach could work very well for a lot of cases.
Also you can take a peek at NUTCH-1870, which lets you specify XPath expressions to extract certain specific parts of the webpage as separate fields; using this with the right boost parameters in Solr could improve your precision.
Good day, everyone!
I'm currently building two sites. One of them is my 'personal website', which will contain contact information and current and finished projects (like a business card, you know what I'm talking about!).
The other one is a site for a tool that I'm currently developing: I want to make 3-4 sections with the classic things for a piece of software: what it is, what it does, news about development, a FAQ section and a download page.
Now, the problem is: I don't want to waste time on such a 'silly' website. I want to make it fast and update it easily.
I've got two approaches in mind:
1) Create a dynamic site (php) that will 'build' pages from a database containing things like finished projects, the news feed and so on. I have to create the backend for content insertion, but once I've done it I can insert new content in a few seconds.
2) Build a site based on static pages (classic html) filled MANUALLY with new content (like the weekly news feed); it isn't very 'professional', and while it's much faster to get up and running, it can be difficult to insert new content (every time I want to post a news item I have to write the title in an html tag, bold content with tags and so on) and manually move the old news to another page. Maybe an external tool exists to help me do that?
I always thought that static webpages weren't used even for sites that 'allow' new content to be added fairly often (once a week), but I found that isn't completely true: LOTS of sites that I like (moderately popular software sites) are just a bunch of text on static pages.
I guess it isn't a smart thing to waste time building a nice site for a barely developed piece of software, is it?
Also, isn't it kind of newbie-ish to build a site in such an old-fashioned way?
What tools can I use for quickly 'formatting' html news text?
Any suggestions for creating these websites with the least time spent?
When I develop websites, I use a basic template: a "toolbar" or "navbar" at the top of the page and an iframe tag that contains the content and the pages being browsed. You can study how the Joomla! and WordPress platforms work to see the idea behind them: a group of files builds an html page from data stored in XML files (either on disk or in a database), and those files and classes build and render pages until you get what you see as a static (sometimes dynamic, with JavaScript/jQuery) page. Open source is a great thing; humankind just has to use it wisely.
I will also recommend using JSON or some other data store to get the needed code and append it to the body. I use XMLHttpRequest() to fetch the JSON, parse it to get the html string, and append it to the body of my website. It works well for me.
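Something along these lines ("content.json" and its "html" field are just illustrative names):

var xhr = new XMLHttpRequest();
xhr.open("GET", "content.json", true);
xhr.onreadystatechange = function () {
    if (xhr.readyState === 4 && xhr.status === 200) {
        var data = JSON.parse(xhr.responseText); // e.g. {"html": "<p>...</p>"}
        var container = document.createElement("div");
        container.innerHTML = data.html;      // turn the string into elements
        document.body.appendChild(container); // append it to the body
    }
};
xhr.send(null);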
There is a website called file2hd.com which can download any type of content from your website, including audio, movies, links, applications, objects and style sheets. Of course this doesn't work for high-profile websites such as Google, but is there a method I can use to cloak content on my website and prevent this?
E.g. using HTML code, or a .htaccess method?
Answers are appreciated. :)
If you hide something from the software, you also hide it from regular users, unless you have a password-protected part of your website. But even then, those users with passwords will be able to fetch all loaded content; HTML is transparent. And since you didn't say what kind of content you are trying to hide, it's hard to give you a more accurate answer.
One thing you can do, though it works only for certain file types, is to serve just small portions of a file at a time. For example, you have a video on your page and you fetch 5-second bits of it from the server every 5 seconds. That way, in order for someone to download the whole thing, they'd have to get all the bits (by watching the whole thing) and then find a way to join the parts... and it's usually just not worth it. Think of Google Maps... Google uses this or a similar technique on a few other products as well.
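A rough sketch of the client side of that idea, assuming the server honors HTTP Range requests (the URL and chunk size are made up):

var CHUNK = 256 * 1024; // bytes per piece
var offset = 0;

function fetchNextChunk() {
    var xhr = new XMLHttpRequest();
    xhr.open("GET", "/media/video.mp4", true);
    // Ask for one slice of the file instead of the whole thing.
    xhr.setRequestHeader("Range", "bytes=" + offset + "-" + (offset + CHUNK - 1));
    xhr.onreadystatechange = function () {
        if (xhr.readyState === 4 && xhr.status === 206) { // Partial Content
            offset += CHUNK;
            // ...feed the slice to the player here, then take the next
            // one a few seconds later, like the 5-second bits above.
            setTimeout(fetchNextChunk, 5000);
        }
    };
    xhr.send(null);
}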
On this page, I want my scrolling dinosaur name window to keep the current dinosaur's name at the top so the person doesn't have to scroll all the way down to the next dinosaur.
I also want to know if there's an easier way to do this window.
My predicament is this....
I have over 30 dinosaurs on here. Each time I add a new one, I have to update each and every one of the dinosaur pages to add that one new dinosaur. It's not really time-efficient... Is there a better way without having to use frames?
My code is open so you can look at it and modify it at your leisure.
Thanks!
Vince
At this point I would suggest you go for server-side code. Since you have 30 dinosaurs, it would be much easier to create and maintain a single page using server-side scripts such as PHP or ASP.NET to load each dinosaur from a database.
What are server side scripts?
Server-side scripts allow you to dynamically generate a page on the fly whenever the user requests it. For example, take YouTube's search page. Rather than generate a separate page for every single possible search term, they simply have a base template and fetch the relevant results based on the search query. The same can be applied to your site: you can have one page for all the dinosaurs and just load the appropriate dinosaur based on the url.
Once you do that, putting the current dinosaur at the top of the page becomes a trivial task. Since it appears that you already have a fair amount of knowledge of HTML, it should be easy for you to pick up and use some PHP. Codecademy has some excellent tutorials.
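Just to show the shape of the "one template, many dinosaurs" idea, here is a hypothetical sketch in client-side JavaScript; in PHP the same lookup would happen on the server, pulling from the database instead of a hard-coded object, and the page would need an element with id "content":

var dinosaurs = { // hypothetical stand-in for the database
    camarasaurus: "<h1>Camarasaurus</h1><p>A large sauropod...</p>",
    triceratops:  "<h1>Triceratops</h1><p>A three-horned ceratopsian...</p>"
};

// dinosaur.html?name=camarasaurus -> "camarasaurus"
function currentDinosaur() {
    var match = /[?&]name=([^&]+)/.exec(window.location.search);
    return match ? match[1] : "camarasaurus";
}

// One page for every dinosaur, filled in from the url.
document.getElementById("content").innerHTML =
    dinosaurs[currentDinosaur()] || "<p>Unknown dinosaur.</p>";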
Along the same lines as Kevin's answer, but more specifically, I'd recommend you look into a PHP MVC framework such as CakePHP, Laravel or CodeIgniter.
You've done all the hard work of manually building these pages, which is awfully time-consuming.
Once you learn one of these frameworks, you'll be able to rebuild this site in a day.
If your links had id attributes on them you could scroll the list to a position by linking to #whatever. Here's a quick example of a list item with an id:
<li id="camarasaurus">Camarasaurus</li>
Here's a small example: http://jsbin.com/ExExEvAB/1/edit?html,css,output
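To keep the current dinosaur's entry at the top without the user clicking, a small script like this could run on page load (assuming the scrolling list lives in an element with id "dino-list" and each entry's id matches its dinosaur, as above):

function scrollCurrentToTop(currentId) {
    var list = document.getElementById("dino-list"); // hypothetical container id
    var item = document.getElementById(currentId);   // e.g. "camarasaurus"
    if (list && item) {
        // Line the entry up with the top of the scrolling window.
        list.scrollTop = item.offsetTop - list.offsetTop;
    }
}

// On the Camarasaurus page: scrollCurrentToTop("camarasaurus");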
As for making it easier to administrate, I'd look into PHP, since it's widely available and there are tons of resources to learn from. What you're basically looking for is <?php include "dinosaur-menu.html" ?>, since you're thinking in terms of frames. You can take it further, but this alone should make the site a ton easier to update.
I've really started to enjoy Mixture recently. It's great for prototyping and is, in my opinion, perfect for exactly what you're trying to do here.