I am wondering how one would go about using a permalink from another website to extract data about that particular permalink, especially in the case of looking for specific information. Kind of how youtube has websites that essentially use the link to the video to download and convert it to mp3 format. Its for a college project in HTML5, but upon researching the subject (for about a week) I didnt come up with alot of information on how to go about it using html. Any help in advance will be appreciated. Just basic structure is necessary. im not good or anything, I just want to actually learn, so i need some directional pointing that way i get on the right track.
Thanks in advance :)
Oh and to be more direct, I mean in such a way as to list certain products at the price they are being listed on the site they are being listed on. However, from within my own site. (all in html...)Figured I should be more direct.
Those websites that extract audio from Youtube probably doing it using Python or something similar
This might help you.
I'm trying to implement my own little reader view app (an app that would do the same thing as reader-mode on safari), and there are a few things I find asking myself:
Is there a technical term for this feature (reader-view doesn't really cut it)?
Is there a standard that websites are supposed to follow in order to indicate the content they would like to have in their reader views
Is there an open-source set of HTML parsing rules to pull the "readable" content from a website?
Is the effort to implement such a thing simply too big for a single person in a few weeks and if so should I opt for services such as Instaparser?
I believe the original to be implemented by arc90, and they called it readability. You can check out their page here.
It's been ported to many different languages over time, so you could take a look at the different implementations to learn more about it, how it's done etc.
Python readability
JReadability
JavaScript
Ruby
This is just a small sample here, there's many more examples if you would like to find more.
Edit: Oops, after some more Googling I found this question with an answer that explains it very well.
my question seems to be dumb, but because am making an application that all the pages needs to get the users authentification, and because am using HTML5 so,
Does Google robots parse those pages?
if no, so then, is it useful to use Microformats?
It depends on exactly what you are trying to do, but the question Microformats solve is "How can I make this HTML easily understandably by a computer?".
If you think that a computer will ever need to parse out the data in a meaningful way, then use it.
This could be either Google (though in your case, it won't be able to log in), an internal search you write at some point, a browser extension to highlight some sort of information etc.
In short though, it's unlikely to be useful, but on the other hand it's not hard to implement!
As I was working on this project for a friend of mine who is terrified of changing from HTML to flash, I realized that maybe there could be a bridge between them. So I started working on a flash project that would grab the HTML from his page and parse it to display it in flash. Although I am sure there are resources available for this already, I figured that the experts on SO might be willing to suffer through the logic of one user trying to develop this script.
So basically, I am not asking for an answer, I am asking for some step-by-step direction that could be posted so other people could see the logic behind breaking down this project. I think it would be really useful (not just for me, but for anyone wanting to learn more about objects and oop).
So, much like the thread between primarily Senocular and Rampage, this would be a thread where I would be the student asking the questions in a logical step-by-step manner and someone else (or someones else) could provide guidance.
Let me know if you are interested and I can start by posting what I have already written. We can go from there and I am sure it will prove insightful to anyone who reads it. If no one is interested, or no one has the time or inclination, no problem.
Best wishes,
Jase
Who in their right mind would change from html to flash for displaying a simple website? I don't see the logic behind it, it's more like you are trying too hard. Flash has its function in the web, as well as html does. If it's just for simple displaying, using flash is just the wrong way and won't make your website any better but worse because its loading time will be too long.
Goole Search retrieved these:
HTMLWrapper
Groe.org HTMLParser
There is an article about the 1st on *drawlogic. I think the seconds' home is on sourceforge here.
Thing is, browsers already do a fine job at parsing html code. Having the flash player parse html files not only does away with any accessibility advantage your markup can offer but it also feels like reinventing the wheel. If you need to display html content, leave it to the browser.
Slightly offtopic - Flashpaper can convert most HTML pages into swf format.
Given properly "disciplined" HTML, you can use the XML parser in the player for the basic parsing. Are you really talking about writing an HTML renderer in Flash though? Or just being able to pull information from HTML dynamically?
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 8 years ago.
Improve this question
Latex-to-html converters I've seen in the past have been pretty awful. Editing raw html is no fun and doesn't seem to translate well to the printed page. How do others solve this problem? Links to examples (both pdf and html) would be great.
Added: Another similar question was just asked:
What formatting language should I use for project documentation
For documenting code, I also recommend Sphinx. ReStructured Text is nice because it is readable and somewhat marked up in plaintext, and can do a nice job converting to html and to pdf. I still like LaTeX for certain things. My wife and I use LaTeX to write our christmas letter, which we mail out via snail mail. The pdf version is pretty fancy, with two columns, and headers and footers. The html version is simpler. I convert with plastex. Examples here:
http://fedibblety.com/annualReports
I don't think any binary format is a good choice (Word) for any sort of document that you might like to read 10 years from now. That is one of the nice things about LaTeX.
Yes, LaTeX-to-HTML converters used to suck (you've probably tried LaTeX2HTML), but of late they've got better. Tex4ht is highly configurable, and produces nice XHTML+CSS. See also other converters.
You can also use Docbook, if you can bear to write in it. There are converters from DocBook to both HTML and LaTeX (or to PDF directly); an example of the latter is dblatex.
See this post: LaTeX vs Docbook.
After many years of anguish and several false starts, I'm about to revisit this, and I'm going to give Sphinx a try. It can generate HTML or LaTeX from ReStructured Text.
I'm hoping it will be a much "lighter" option than full DocBook, but with many of the advantages.
You could take a step back and use something like DocBook and render to PDF via LaTeX and HTML straight from the DocBook files. Alternatively, Adobe Technical Communication Suite (Framemaker) will let you single-source a document to PDF and HTML. See this posting for a rundown on various technical documentation systems.
This is a personal choice but Latex in theory is perfect however in practice it's pain-in-the-arse. I'm using VS.NET HTML editor + raw HTML edit when I need it.
So I think using an WSIWYG HTML editor is best choice. You can always use a simple tool to convert it to PDF, and you can always edit HTML when you need something advanced. Also it's easier to put online when you need.
That's how I'm managing my software documentations and works fine for me.
PlasTeX looks like a nice latex-to-html converter, though I haven't tried it myself.
My friend Rob Felty wrote a blog post extolling its virtues:
http://blog.robfelty.com/2008/03/19/finally-a-better-latex-to-html-converter/
AsciiDoc looks like an interesting possibility.
Read about EPUB format. Its e-book format. http://en.wikipedia.org/wiki/EPUB
Since the answer mentioning Asciidoc was somewhat short on examples, here are some of the things your are looking for:
A pdf generated with Asciidoc
A cheatsheet with a side by side of the Asciidoc markup and the html result.
A list of publications done using Asciidoc, including O'Reilly books and the git documentation (to see both ends of the user scale).
I'm not sure that latex is really the best tool for this. The trouble you're having with the usual latex to html converter is indicative of the problem: html is simple not as expressive as latex.
If you insist on latex to html, take care to use a limited subset that can convert reasonably.
I've used TeXinfo in the past and it does a good job. Here's an example: http://yootles.com/api. I'd prefer to stick with LaTeX though instead of use another language.
If everything else fails you could grab an LaTeX to XML converter and write a simple XSLT stylesheet to convert it to HTML, or create a CSS style sheet and attach it to the XML file directly.
We've been using WebWorks ePublisher (www.webworks.com) which offers both multiple single-source formats (we are using Word) and the ability to output to many output formats (we output to Adobe PDF and Online Help (.CHM).
We were facing this problem in an academic project that involved Eclipse software, and we used plastex to convert Latex to HTML and Eclipse Help. Getting it to work was quite difficult, but the end result looks really nice. You can see all three versions here:
http://handbook.event-b.org/
Further, as this is an open project, the code (build scripts) are available. We have a continuous build system (Jenkins) that rebuilds everything when new Latex is checked in. This is particularly nice, as contributors don't need to install the toolchain on their systems. They just check in the new Latex and check on the server whether the HTML was produced correctly. Sources:
http://sourceforge.net/p/rodin-b-sharp/svn/HEAD/tree/trunk/Handbook/org.rodinp.handbook.feature/
Best, Michael
I don't have enough points to comment, but to bolster the plastex answer, here is the updated plastex example link:
http://robfelty.com/2008/03/19/finally-a-better-latex-to-html-converter
LaTeX? Seriously? I wasn't aware anyone outside academia still used it. I'd go with HTML, which you can save as PDF from the web browser. If you really must have some advanced typographic stuff, go with Word instead - it has a way to save to HTML (probably not as clean as one would like), and you can save as PDF with a free plug-in (downloadable separately).
Oh, and I wouldn't bother using things like InDesign - they are overkill. Also, don't bother paying for Acrobat Professional - there is a zillion free solutions available.