I'm currently downloading a website via an ActionScript HTMLLoader so that I can later access the DOM and get some information out of the page.
The problem is: each resource that is linked on the page (images, stylesheets, javascript) is also loaded which takes some additional time. I don't really need those resources, because only the plain HTML/DOM is interesting.
Is there any way to disable loading of linked resources? At first I tried using a URLLoader and parsing the result as XML, but this doesn't work when the website isn't valid XML. I also didn't find a library that validates/parses a given HTML string into valid XML.
I'm using Adobe AIR on desktop.
Perhaps convoluted, but you could load the file with URLLoader, convert it to a string, use regex to remove links to the external resources you don't want, and then load the result into the HTMLLoader.
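The stripping step suggested above might look roughly like the following. This is only a sketch in Python rather than ActionScript, and the function name strip_resources and the chosen tag list are assumptions made for illustration. Regex-based HTML editing is fragile, so treat it as a rough pre-filter, not a parser.

```python
import re

# Tags that typically trigger extra downloads. Note that removing every
# <link> tag also drops non-stylesheet links (e.g. rel="canonical").
_RESOURCE_PATTERNS = [
    re.compile(r"<script\b[^>]*>.*?</script\s*>", re.IGNORECASE | re.DOTALL),
    re.compile(r"<style\b[^>]*>.*?</style\s*>", re.IGNORECASE | re.DOTALL),
    re.compile(r"<link\b[^>]*>", re.IGNORECASE),
    re.compile(r"<img\b[^>]*>", re.IGNORECASE),
]


def strip_resources(html: str) -> str:
    """Return html with script, style, link and img tags removed."""
    for pattern in _RESOURCE_PATTERNS:
        html = pattern.sub("", html)
    return html


page = (
    '<html><head><link rel="stylesheet" href="a.css">'
    '<script src="a.js"></script></head>'
    '<body><p>Hi</p><img src="x.png"></body></html>'
)
print(strip_resources(page))
```

The cleaned string would then be handed to the HTMLLoader instead of the original URL.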
I have a Plone-based site with some custom Archetypes-based content types. Now we feel the need to support HTML5-based animations etc. for new multimedia content.
Is there some suitable content type already, or should we invent it on our own?
AFAICS, it will look like this:
There is some common createJs.js file somewhere which provides the API which is used for all HTML5 multimedia contents.
Each multimedia object features some JavaScript code (which could be saved as a file animation.js somewhere).
The same directory which holds the animation.js file will contain all further resources needed, e.g. images.
The view HTML code will need to src the animation.js file (which will in turn "create" the animation).
So, is there some object type already I should use?
If there isn't - should I put everything into the ZODB, or would it be better to store the resources plainly in the server file system (and let Apache serve them more or less directly)?
For video and audio you can just use the File content type and modify the templates to use the HTML5 video and audio tags, which make use of browser-built-in video and audio players.
plone.app.contenttypes is doing that, but this is Dexterity based. However, it can give you some hints: https://github.com/plone/plone.app.contenttypes/blob/master/plone/app/contenttypes/browser/templates/file.pt
Also, you have to provide the video and audio files in a web-suitable format: https://developer.mozilla.org/en-US/docs/Web/HTML/Supported_media_formats
For custom JavaScript code you need a custom Plone application instead. You only need a custom content type if animation-specific configuration options have to be passed to the JavaScript; otherwise a simple view which includes the JavaScript will do. For more information on these topics consult the Plone documentation: https://docs.plone.org/4/en/develop/plone/views/browserviews.html
Disclaimer: The docs above refer to Grok, which I would not recommend as it does not have Plone core support. We need to get the docs updated there.
I have one base HTML file in which I want to embed a few external HTML files as a sort of portfolio page. What is the best way to embed those external HTML files? I currently use iframes, but I've read that this is not the preferred way. What other options are there, and which one is the best?
Most modern websites use a server-side framework such as PHP or ASP.NET that can assemble the final HTML for each page and output it as a single response.
The only issue with iframes is that they cause additional round-trips to the server, as the client has to load each frame individually. But if you don't have access to any server-side scripting, any other solution will do the same thing.
I have created a SWF file using Flash that loads an FLV file on my local development machine. When publishing the SWF file and generating the appropriate HTML, I can successfully load the video by opening the generated HTML page that Flash creates. However, when placing the generated HTML code in my View, the Flash is loading, but the video is not playing. I think it's a reference error to the location of the FLV file, but I've tried every combination I can think of. I placed the SWF and FLV in the corresponding View folder where I want the video to load, but to no avail. Does anyone have a working example that I can look at, or any suggestions? Thanks.
"I think it's a reference error to the location of the FLV file, but I've tried every combination I can think of."
Yes, I think so as well. Have you tried using Url helpers to reference static resources on your site, like
@Url.Content("~/Content/Videos/MySupervideo.flv")
The actual solution to this for me was this...
In your Flash file, the Component Inspector should point to the location on the web server where the FLV file is located...
Publish the SWF, copy the appropriate HTML to the View you would like the video to play in...
And just as Darin has pointed out, use the helpers to write the path to the SWF file on your web server. The only difference is the following, which I discovered using Chrome's 'Inspect Element' feature: on the 'Network' tab, I clicked the path loaded for the SWF on the left, and on the right it showed a 404 Not Found status. Why?
@Url.Content("~/Content/video/name-of-swf.swf") actually produced
src='http://localhost/content/video/name-of-swf.swf.swf'
This is obviously incorrect, so here is the correct way to use the helpers:
@Url.Content("~/Content/video/name-of-swf")
Hope this helps someone... I am giving Darin credit because he pushed me in the right direction...
I would like to know whether it is possible to detect whether an HTML page contains a video.
I know that one possible way is to look for ".swf" in the HTML source code. But most of the pages do not contain the file name.
For example, given the following URL and possibly its source code, is it possible to find out whether it contains a video:
http://www.cnn.com/video/
There are many ways to embed video into an HTML page: as Flash video or instances of platform-specific players through <object> and <embed> tags (but not every one of those tags is a video, and the same holds true for .swf, which is just the file extension of Flash files, video or not), or with the new HTML5 <video> tag. These are not impossible to detect, but it's a lot of work to catch all possible player types, formats and embed codes, and it will result in a lot of false positives and negatives.
Then, there are JavaScript libraries that initialize players after the containing page has loaded - those are almost impossible to detect.
It's still a very complex issue to get video into a web page reliably, and subsequently, it's even more complex to find it out. Depending on what you are trying to achieve, I would consider dropping it.
For your case (the CNN site) you can parse the Open Graph micro-markup for video information.
Meta tags such as og:video:type and og:image will help you.
Video hosting services usually support micro-markup, e.g. Open Graph or schema.org.
So you can parse these markups.
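As a rough illustration of parsing that markup, here is a sketch using Python's standard html.parser module. The class name OpenGraphParser and the function has_og_video are hypothetical names, and the sample markup is made up; real pages may expose video information differently or not at all.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:..." content="..."> pairs."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        prop = attr.get("property") or ""
        content = attr.get("content")
        if prop.startswith("og:") and content is not None:
            # Keep the first value seen for each property.
            self.properties.setdefault(prop, content)


def has_og_video(html: str) -> bool:
    """True if the page declares og:video or any og:video:* property."""
    parser = OpenGraphParser()
    parser.feed(html)
    return any(
        key == "og:video" or key.startswith("og:video:")
        for key in parser.properties
    )


sample = (
    '<head><meta property="og:video:type" content="video/mp4">'
    '<meta property="og:image" content="http://example.com/a.png"/></head>'
)
print(has_og_video(sample))
```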
Check if an <object> tag exists in the DOM and check its content type and parameters. You will find the pattern by yourself.
You can also search for .flv, or .mp4 in the source code.
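The two heuristics above (inspecting <object> tags and searching for file extensions) could be combined along these lines. This is a sketch only: the names VideoTagParser and video_hints are hypothetical, the list of MIME types checked is an assumption, and a Flash <object> is not necessarily a video, so false positives are expected.

```python
import re
from html.parser import HTMLParser

# Extensions mentioned above; extend as needed.
VIDEO_EXTENSIONS = re.compile(r"\.(?:flv|mp4)\b", re.IGNORECASE)


class VideoTagParser(HTMLParser):
    """Records tags that hint at embedded video."""

    def __init__(self):
        super().__init__()
        self.hints = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "video":
            self.hints.append("video tag")
        elif tag in ("object", "embed"):
            mime = (attr.get("type") or "").lower()
            # Flash content may or may not be video; treat it as a weak hint.
            if mime.startswith("video/") or mime == "application/x-shockwave-flash":
                self.hints.append(f"{tag} with type {mime}")


def video_hints(html: str) -> list:
    """Return a list of human-readable hints that the page embeds video."""
    parser = VideoTagParser()
    parser.feed(html)
    hints = list(parser.hints)
    if VIDEO_EXTENSIONS.search(html):
        hints.append("video file extension in source")
    return hints


print(video_hints('<video src="a.mp4"></video>'))
```

An empty list means none of these heuristics fired, which does not prove the page has no video, since players initialized by JavaScript after page load leave no trace in the static source.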
I currently have a "PrintingWebService" that I call from an AJAX page with all the information needed to construct a highly customized PDF printout using PdfSharp in its GDI+ mode, which takes DrawString and other commands that work basically just like GDI+, only they are drawn to the PDF.
I then save the PDF file to a location on the webserver and return the file name from the web service, and the AJAX page opens a new window with the pdf file.
So far it works well. However, there is one part of my AJAX page that I want to print out and haven't come up with a solution for yet: I've got a string of the HTML content of a TinyMCE editor that I want to display in the bottom part of the PDF page.
I'm looking for some sort of tool I could use for this purpose. Even something opensource that prints to GDI+ I could use by taking the source code and translating it to use PdfSharp's GDI+ (the class names are like XGraphics, with each class having X before the GDI+ name).
If I have to I will limit what HTML can be generated by TinyMCE and write my own renderer, but that will be a big challenge, so I'm looking for other solutions first.
I've stayed away from a printer-friendly page approach because I wanted to construct a page that was nearly identical to an existing WinForms printout, using my existing code. With PdfSharp I was able to convert all the code except the text area stuff (which used the RichTextBox and RTF in the WinForms version).
Tony,
I personally have used WebSupergoo's ABCPdf library with much success. You can actually render HTML directly to the PDF and it does fairly well in regards to accuracy.
Another option, this one free, that gives you the flexibility of writing HTML to PDF and that I have used in the past with much success is iTextSharp.
Otherwise, I think you'll have to write something to render HTML to GDI.
Either way, you may want to consider using an HttpHandler that you map to using your web.config to generate the PDF file. This will allow for you to render the PDF to a bytestream and then dump it directly to the user (as opposed to having to save each PDF receipt to the web server). It will also allow for you to use the .pdf extension in the page that returns the receipt (PurchaseReceipt.pdf could be mapped to a HttpHandler)... making it more cross-browser friendly. Older versions of Adobe / Browsers will not display correctly if you start throwing a PDF byte stream from an ASPX page.
Hope this helps.