template removal/detection/difference utility for HTML and other text - html

I remember reading a while back on some random website about a program that would look at multiple pages on an HTML site and detect the differences/similarities between the pages to automatically detect which parts were template "boilerplate" and which parts were new content, and then based on this, automatically spit out just the parts that are content.
Unfortunately, I don't remember enough details about this utility to actually find it on Google, so I wonder if any of you have run across anything like this and CAN remember the name of it.
Thanks.

Murphy's Law (or is it some other law?) has struck, and I found it just moments after I'd given up and posted this question. The project I was thinking of is this:
http://code.google.com/p/boilerpipe/
Thanks.
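For anyone curious about the idea described in the question (comparing several pages from one site and treating whatever they share as template), here is a minimal sketch. It only illustrates the multi-page comparison and makes no claim about how boilerpipe works internally. The function name `strip_shared_lines`, the `threshold` parameter, and the sample pages are all made up for the example, and comparing whole lines is a deliberately crude assumption.

```python
from collections import Counter


def strip_shared_lines(pages, threshold=1.0):
    """Return, for each page, the lines that are NOT shared across pages.

    pages: list of page sources (strings).
    threshold: fraction of pages a line must appear in to count as template.
    """
    # Count in how many pages each distinct (stripped, non-empty) line occurs.
    seen_in = Counter()
    for page in pages:
        unique_lines = {line.strip() for line in page.splitlines() if line.strip()}
        seen_in.update(unique_lines)

    min_pages = threshold * len(pages)
    boilerplate = {line for line, count in seen_in.items() if count >= min_pages}

    # Keep only the lines that are not boilerplate, preserving their order.
    return [
        [line.strip() for line in page.splitlines()
         if line.strip() and line.strip() not in boilerplate]
        for page in pages
    ]


# Hypothetical sample pages: same nav and footer, different article text.
pages = [
    "<nav>Home | About</nav>\n<p>First article text.</p>\n<footer>(c) Example</footer>",
    "<nav>Home | About</nav>\n<p>Second article text.</p>\n<footer>(c) Example</footer>",
]
content = strip_shared_lines(pages)
print(content)
```

Real pages rarely line up this neatly, so a practical tool would probably compare DOM subtrees or text blocks rather than raw lines, but the principle is the same.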

Related

HTML to WordPress

I've never used WordPress.org before. I'm wondering if I can convert this HTML website that I have to WordPress and keep it exactly the same?
https://reporting.pacificamerican.org/pas/
And if this website were a WordPress site, does that mean I wouldn't have to go into the code whenever I want to update my content? Right now, with the HTML site, it takes more time to update all the content.
Thank you.
Yes, you can, but looking at the content I wouldn't say it looks like a great idea. Mostly because of how static the current content seems to be.
Pros:
It looks like you are about to add a blog page. WordPress does make such recurring content easy.
It looks like you have repeated the menu on every page. (If you change the menu on one page, do you then have to make the same change on all the other pages as well?) WordPress would help with that and let you use one menu everywhere. But there are also tons of easier methods to accomplish the same thing without WordPress. (For example <?php include 'menu.php';?> using PHP).
Cons:
The "multiple sub-pages in one page" that you are using doesn't play naturally with WordPress. Absolutely possible yes, perhaps not even difficult, but not out-of-the-box for sure.
The time it would take to edit pages would likely not change as drastically as you hope. I believe that the current content looks so static that anyone with a bit of HTML/CSS knowledge would rather edit those static HTML files than click around in the WordPress admin interface.
The scroll-spy, the editable tables, and things like the yearly admissions do not come naturally either. I can think of a few dozen ways to solve such things with WordPress, but if you are going to do this work yourself, then the WordPress conversion will take some effort and the results will not always be as pretty as you might imagine.
You'll definitely take a performance hit over using only static html. (But that is true for any CMS/framework)
My suggestion would be to first look at your current workflow. Perhaps look at an IDE that can upload with a click or on save, have history so you can back up when things break, and predefined snippets that make static content changes easy, (and of course code syntax highlighting!).
What tools are you using now?
Also remember that you are asking on a coding site. Not many here would opt to use the WordPress editor over simply editing HTML files. In fact, I dare say many here carry a deep grudge after having to work around some specific quirks in the WP editor (aka TinyMCE).
Sure, you could replicate the layouts.
Sure, your content would be editable with just a form.
It would take a lot of effort, but certainly doable.

Rules to pull reader-view like content from website?

I'm trying to implement my own little reader view app (an app that would do the same thing as reader mode in Safari), and there are a few things I find myself asking:
Is there a technical term for this feature (reader-view doesn't really cut it)?
Is there a standard that websites are supposed to follow in order to indicate the content they would like to have in their reader views?
Is there an open-source set of HTML parsing rules to pull the "readable" content from a website?
Is the effort to implement such a thing simply too big for a single person in a few weeks and if so should I opt for services such as Instaparser?
I believe the original was implemented by Arc90, and they called it Readability. You can check out their page here.
It's been ported to many different languages over time, so you could take a look at the different implementations to learn more about it, how it's done etc.
Python readability
JReadability
JavaScript
Ruby
This is just a small sample; there are many more examples if you would like to find more.
Edit: Oops, after some more Googling I found this question with an answer that explains it very well.
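To give a feel for the kind of rule these libraries apply, here is a very rough sketch using only Python's standard library. It is not Arc90's algorithm or any port of it; it is an assumed, much simpler heuristic: keep the text of `<p>` elements, ignore anything inside tags that usually hold page chrome, and drop short paragraphs. The class name `ParagraphExtractor`, the `SKIP_TAGS` set, the `min_chars` cutoff, and the sample markup are all invented for illustration.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text of <p> elements that are not inside page-chrome tags."""

    SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside"}

    def __init__(self):
        super().__init__()
        self.skip_depth = 0        # > 0 while inside a skipped element
        self.in_paragraph = False
        self.buffer = []
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "p" and self.skip_depth == 0:
            self.in_paragraph = True
            self.buffer = []

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag == "p" and self.in_paragraph:
            self.in_paragraph = False
            # Collapse runs of whitespace into single spaces.
            text = " ".join("".join(self.buffer).split())
            if text:
                self.paragraphs.append(text)

    def handle_data(self, data):
        if self.in_paragraph and self.skip_depth == 0:
            self.buffer.append(data)


def readable_text(html, min_chars=40):
    """Return paragraphs of at least min_chars characters (assumed cutoff)."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return [p for p in parser.paragraphs if len(p) >= min_chars]


sample = """
<html><body>
<nav><p>Home | About | Contact</p></nav>
<article>
<p>Short.</p>
<p>This paragraph is long enough to be treated as part of the main article text.</p>
</article>
<footer><p>Copyright notice that is fairly long but sits inside the footer element.</p></footer>
</body></html>
"""
result = readable_text(sample)
print(result)
```

The real implementations go much further (scoring candidate containers, looking at class names and link density, and so on), which is why reading one of the ports listed above is worthwhile before deciding whether to build your own.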

Using a list of dynamic links throughout website

By "dynamic links", I mean a list of links that will constantly be updated.
To illustrate my question, I have a website that I am constantly writing new articles for. I currently have about 10 articles. If someone reads article #5, there is a list of links to all 10 articles in the right panel of the page. As I update the site and article #1 becomes out of date, I'd like to replace article #1 with article #11. Rather than updating the links within every article (so 10 times), is there a way to update the links once and have the change show up on every page? Could I create an iframe for this?
Thanks for any and all help!
What's your goal? Do you want to learn to be a web developer? Or are you mostly concerned with getting your articles published?
If you want to be a web developer, I'd recommend steering clear of large CMS systems like WordPress or Drupal. Those are great products. But you want to learn the basics first. I think starting a PHP tutorial is the way to go.
If you just want to publish your articles, I'd recommend you find a nice place to create a blog. There are so many to choose from. It all depends on how much you want to spend.
Feel free to ask follow-up questions. Web development sounds simple. But it's really a complex topic. I can't imagine what it must be like starting out these days with so many choices and competing technologies.
One way to do it would be to use Server-side includes. (Wikipedia) They work like this:
<!--#include file="some-content.html" -->
or
<!--#include virtual="some-folder/some-content.html" -->
The difference is that file="" takes a path relative to the directory of the current page, whereas virtual="" takes a URL path, which is resolved from the domain root when it begins with a slash (on Apache, at least). Either way, this method can use any type of regular text file as a source. The actual addition of the content is done by the server (hence the name), so its contents will be parsed as regular HTML and all CSS will apply to it as if the file were part of your page. I don't know about compatibility with different hosts, but if your web server supports it, this is probably the easiest way to go.
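If your host turns out not to support server-side includes, one possible workaround is to expand the same directives yourself before uploading. The sketch below is only an illustration of that idea, not part of SSI: the function name `expand_includes`, the regular expression, and the file names are assumptions, and it handles the file="" form only.

```python
import re
import tempfile
from pathlib import Path

# Matches directives such as <!--#include file="links.html" -->
INCLUDE_RE = re.compile(r'<!--#include\s+file="([^"]+)"\s*-->')


def expand_includes(page_path):
    """Return the page's text with each include directive replaced by the
    contents of the named file, resolved relative to the page's directory."""
    page_path = Path(page_path)

    def replace(match):
        fragment = page_path.parent / match.group(1)
        return fragment.read_text(encoding="utf-8")

    return INCLUDE_RE.sub(replace, page_path.read_text(encoding="utf-8"))


# Demo in a throwaway directory with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    site = Path(tmp)
    (site / "links.html").write_text(
        '<ul><li><a href="article-11.html">Article 11</a></li></ul>',
        encoding="utf-8",
    )
    (site / "article-5.html").write_text(
        '<main>Article 5</main>\n<!--#include file="links.html" -->\n',
        encoding="utf-8",
    )
    expanded = expand_includes(site / "article-5.html")

print(expanded)
```

With this approach the list of links still lives in a single file; you edit it once and re-run the script over all article pages.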

Common ways to target links?

Are iframes still widely in use today?
I am coding a site with divs, and I want everything to appear in the container div. Is it possible to do it without coding the header + nav into each page and have the content show at the exact same spot without using iframes?
I did a quick Google search and found a post that said it's not possible, but my site will have quite a few links.
As of right now, I am coding it with Tumblr, and the hashtags in the posts would act as links to a section of posts (Ex: #blog would retrieve every post under the "blog" link). What are some widely used ways to target links on a website?
If you are creating a multi-page website, it would be helpful to have the HTML content be generated dynamically or be built statically from template files. You don't want to manually update the same content across multiple HTML files.
Dynamic Pages
There are several options for dynamically generating HTML content depending on the software available to you. For example, PHP is a popular language for web development and is available through many web hosts.
Static Pages
It is possible to build static HTML documents from templates using something like Jekyll.
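As a rough illustration of the static approach, here is a small sketch that renders several pages through one shared layout using Python's standard `string.Template`. This is not how Jekyll works (Jekyll uses Liquid templates and its own directory conventions); the layout, the page names, and the `build_site` function are all made up for the example.

```python
from string import Template

# One shared layout; $title and $body are filled in per page.
LAYOUT = Template(
    "<!DOCTYPE html>\n"
    "<html>\n"
    "<head><title>$title</title></head>\n"
    "<body>\n"
    "<nav><a href=\"index.html\">Home</a> <a href=\"blog.html\">Blog</a></nav>\n"
    "<div id=\"container\">$body</div>\n"
    "</body>\n"
    "</html>\n"
)

# Hypothetical per-page content; only this part differs between pages.
PAGES = {
    "index.html": {"title": "Home", "body": "<p>Welcome.</p>"},
    "blog.html": {"title": "Blog", "body": "<p>Latest posts.</p>"},
}


def build_site(layout, pages):
    """Render every page through the shared layout; returns {filename: html}."""
    return {name: layout.substitute(fields) for name, fields in pages.items()}


site = build_site(LAYOUT, PAGES)
for name, html in site.items():
    print(name, len(html))
```

The header and nav exist in one place only, so changing the menu means editing the layout once and rebuilding, and every page keeps its content in the same container div.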
I'm not sure if I'm interpreting what you mean by "coding it with Tumblr" correctly or not, but I think you mean you're making a Tumblr site with their built-in HTML editing capability.
I think you'll have a very difficult time achieving the behavior you desire there. I think you're trying to create something resembling a single-page application. Tumblr probably just allows basic static HTML with little JavaScript. The suggestion Kyle made about using PHP or something like that won't work because that code must be executed on a server, and Tumblr doesn't provide that capability to my knowledge.
If you really want this kind of functionality, you probably should get some paid web hosting and develop your web development skills. It's not a simple task, but it's fun!
Sorry if I underestimated you or anything. Just trying to read between the lines. It seems to me that you may be relatively new to web development given the content of your post, and I'm trying to nudge you in the right direction constructively.

Templated HTML Editor

I'm looking for an HTML editor that kind of supports templated editing or live snippets or something like that.
Background: I'm working on a website for a friend. As there are no specifications what the webspace/webserver can or can't do, I decided to make it a pure HTML/CSS page, or rather 10 of them. I wrote a template, copied it 10 times and edited the content. And guess what, the template has to be changed.
Therefore I'm looking for an (HTML) editor that has some kind of live template system where I can edit the content as if it were plain text and then save the project into the 10 pure HTML/CSS files.
I thought about using PHP (the only scripting language I have some knowledge of), but writing the underlying template script would cost me so much time that I might as well change all the files by hand. I'm not familiar enough with AJAX to know if there's a way to load content from another file. If so, this would be an option if there already is a script. With Web Developer (a Firefox extension) I could save the generated source code as HTML/CSS.
Thanks in advance
Edit: any hints how to do this without an editor are welcome
Edit2: In my mind the tool looks like a plain old text editor like SciTe, but capable of editing multiple files simultaneously in the same text area, so it looks like editing one ordinary file, but actually it's a whole bunch of files.
Dreamweaver will do this for you; it has had HTML templating of the type you describe built in since very early versions. (From how you phrase the question, I do not think you're thinking along the lines of a PHP templating engine such as Smarty, but of some sort of HTML layout formatting.)
Although I regularly look around for Dreamweaver replacements, and I've certainly been impressed by Aptana, I still tend to use Dreamweaver in my development stack simply because whereas I can compensate for some of the more coding-orientated features it misses, I find the WYSIWYG nature of the editor invaluable.
I would have used a template engine.
I wrote a post about a dead simple script using the Dwoo template engine and mod_rewrite, where I take the URI and load the correct data and template based on that. You should be able to get it running in a few minutes.
Maybe I am way off on this, but why don't you look into an open source content management system (PHP/MySQL)? There are MANY light systems that are not like Drupal or Joomla (if you do not want the bulk of those CMSs).
There are even a few good ones for light web design that are flat file driven.
That would be my suggestion, at least if not for this project, look into it for future projects.
Here is an example of a great micro CMS that would seem to fit the bill for what you are doing:
http://www.mini-print.com/