I would like to know whether it is possible to detect whether an HTML page contains a video.
I know that one possible way is to look for ".swf" in the HTML source code. But most of the pages do not contain the file name.
For example, given the following URL and possibly its source code, is it possible to find out whether it contains a video:
http://www.cnn.com/video/
There are many ways to embed video in an HTML page: as Flash video or as instances of platform-specific players through <object> and <embed> tags, or with the new HTML5 <video> tag. Not every <object> or <embed> tag is a video, though, and the same holds for .swf: it is just the file extension of Flash files, video or not. These cases are not impossible to detect, but catching all possible player types, formats and embed codes is a lot of work and will produce many false positives and false negatives.
Then, there are JavaScript libraries that initialize players after the containing page has loaded - those are almost impossible to detect.
Getting video into a web page reliably is still a very complex issue, and detecting it afterwards is even more complex. Depending on what you are trying to achieve, I would consider dropping it.
For your case (the CNN site) you can parse the Open Graph markup for video information.
Meta tags such as og:video:type and og:image will help you.
Video hosting services usually support such markup, e.g. Open Graph or schema.org.
So you can parse that markup.
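As a rough illustration of the Open Graph approach, here is a minimal Python sketch using only the standard library. The class and function names (OpenGraphParser, has_open_graph_video) are made up for this example, and treating any og:video* property or a video-like og:type as "has a video" is an assumption, not a guarantee. Note that og:image alone is not counted, since ordinary articles carry it too.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..."> tags into a dict (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Some pages use name= instead of property=, so accept both.
        prop = attr.get("property") or attr.get("name") or ""
        if prop.startswith("og:") and attr.get("content"):
            # Keep the first value seen for each property.
            self.properties.setdefault(prop, attr["content"])


def has_open_graph_video(html):
    """Heuristic: True if the page declares og:video* or a video og:type."""
    parser = OpenGraphParser()
    parser.feed(html)
    props = parser.properties
    og_type = props.get("og:type", "")
    return any(key.startswith("og:video") for key in props) or og_type.startswith("video")
```

A page without this markup can still contain a video, so a False result only means "no Open Graph hint found".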
Check if an <object> tag exists in the DOM and check its content type and parameters. You will find the pattern by yourself.
You can also search for .flv or .mp4 in the source code.
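The two checks above could be combined roughly as follows. This is a sketch, not a reliable detector: the names VideoHintParser and find_video_hints are invented for the example, the extension list is limited to the two mentioned above, and players injected later by JavaScript will not be seen at all.

```python
import re
from html.parser import HTMLParser

# Extensions taken from the suggestion above; extend the list as needed.
VIDEO_EXTENSIONS = (".flv", ".mp4")
VIDEO_FILE = re.compile(r"\.(?:flv|mp4)\b", re.IGNORECASE)


class VideoHintParser(HTMLParser):
    """Record tags that look like embedded video (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hints = []

    def handle_starttag(self, tag, attrs):
        attr = {name: (value or "") for name, value in attrs}
        if tag == "video":
            self.hints.append("video tag")
        elif tag in ("object", "embed", "source"):
            mime = attr.get("type", "").lower()
            url = (attr.get("data") or attr.get("src") or "").lower()
            path = url.split("?")[0]  # ignore any query string
            if mime.startswith("video/") or path.endswith(VIDEO_EXTENSIONS):
                self.hints.append(tag + " tag with video type or URL")


def find_video_hints(html):
    """Return a list of human-readable hints; an empty list means none found."""
    parser = VideoHintParser()
    parser.feed(html)
    hints = list(parser.hints)
    # Fallback: a plain text search of the raw source.
    if VIDEO_FILE.search(html):
        hints.append("video file extension in source")
    return hints
```

An empty list does not prove the page has no video, and a non-empty one can still be a false positive.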
Related
I've set up a site for watching videos I've uploaded to YouTube. I'm currently using multiple HTML documents, each with a different video, which is inconvenient. I'm wondering if there's a way I could read the URL of the video from an .xml file and run every video off one HTML document instead of the 10 I have at the moment. (Or if anyone has a better suggestion of how to do it I would appreciate that just as much.) Cheers
I'm afraid giving a clear answer on this question is hard because it's kind of a big question.
It can't be done with pure HTML so you have two options that both involve learning something new.
If the host you put your site on allows you to use some kind of back-end programming language (PHP for example is very common) you can learn how to render more dynamic html pages and load a video based on some parameter in the URL (?video=1) for example.
If you don't want to go the back-end way you are going to need to look into javascript, which you can use to modify the contents of a html page dynamically. Using this route you can add an embedded youtube video to your page after it has loaded based on some kind of variable.
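To make the back-end option a bit more concrete, here is a small sketch in Python (used here instead of PHP purely for illustration). The VIDEOS table, its placeholder IDs and the function name render_video_page are all hypothetical; the only real detail is the https://www.youtube.com/embed/ URL form that YouTube uses for embedded players.

```python
from html import escape
from urllib.parse import parse_qs, urlparse

# Hypothetical catalogue: value of the ?video= parameter -> YouTube video ID.
# The IDs are placeholders, not real videos.
VIDEOS = {"1": "VIDEO_ID_ONE", "2": "VIDEO_ID_TWO"}


def render_video_page(url):
    """Return the HTML snippet for the video selected by ?video=N."""
    query = parse_qs(urlparse(url).query)
    key = query.get("video", ["1"])[0]  # default to the first video
    video_id = VIDEOS.get(key)
    if video_id is None:
        return "<p>Unknown video.</p>"
    src = "https://www.youtube.com/embed/" + escape(video_id, quote=True)
    return f'<iframe width="560" height="315" src="{src}" allowfullscreen></iframe>'
```

With something like this, one page template can serve every video: watch?video=1, watch?video=2, and so on.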
Of course that's possible, just think about an XML structure to store your URLs.
An example might be:
<videos>
  <video>
    <url>youtube.com/asdf</url>
    <title>First Video</title>
  </video>
  <video>
    <url>youtube.com/ewqe</url>
    <title>Second Video</title>
  </video>
</videos>
To read out the videos, just iterate over the XML structure in JS and append them to your website, or print them directly at page load via PHP etc.
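For the server-side variant, iterating over such a file might look like the following Python sketch (Python standing in for PHP here). The function names load_videos and render_links are made up, and prefixing each URL with https:// is an assumption based on the scheme-less URLs in the example structure.

```python
import xml.etree.ElementTree as ET
from html import escape

# Same structure as the example above, inlined so the sketch is self-contained.
VIDEOS_XML = """<videos>
  <video><url>youtube.com/asdf</url><title>First Video</title></video>
  <video><url>youtube.com/ewqe</url><title>Second Video</title></video>
</videos>"""


def load_videos(xml_text):
    """Parse the XML and return a list of {"url": ..., "title": ...} dicts."""
    root = ET.fromstring(xml_text)
    return [
        {
            "url": video.findtext("url", default="").strip(),
            "title": video.findtext("title", default="").strip(),
        }
        for video in root.iter("video")
    ]


def render_links(videos):
    """Render the videos as an HTML list of links."""
    items = [
        f'<li><a href="https://{escape(v["url"], quote=True)}">{escape(v["title"])}</a></li>'
        for v in videos
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```

Adding a new video then only means adding one <video> entry to the XML file.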
I have a Plone based site with some custom Archetypes-based content types. Now we feel the need to support HTML5-based animations etc. for new multimedia contents.
Is there some suitable content type already, or should we invent it on our own?
AFAICS, it will look like this:
There is some common createJs.js file somewhere which provides the API which is used for all HTML5 multimedia contents.
Each multimedia object features some Javascript code (which could be saved as a file animation.js somewhere)
The same directory which holds the animation.js file will contain all further resources needed, e.g. images.
The view HTML code will need to src the animation.js file (which will in turn "create" the animation).
So, is there some object type already I should use?
If there isn't - should I put everything into the ZODB, or would it be better to store the resources plainly in the server file system (and let Apache serve them more or less directly)?
For video and audio you can just use the File content type and modify the templates to use the HTML5 video and audio tags, which make use of browser-built-in video and audio players.
plone.app.contenttypes is doing that, but this is Dexterity based. However, it can give you some hints: https://github.com/plone/plone.app.contenttypes/blob/master/plone/app/contenttypes/browser/templates/file.pt
Also, you have to provide the video and audio files in a web-suitable format: https://developer.mozilla.org/en-US/docs/Web/HTML/Supported_media_formats
For custom JavaScript code you need a custom Plone application rather than a particular content type. A custom content type is only necessary if animation-specific configuration options have to be passed to the JavaScript; otherwise a simple view that includes the JavaScript will do. For more information on these topics consult the Plone documentation: https://docs.plone.org/4/en/develop/plone/views/browserviews.html
Disclaimer: The docs above refer to using Grok, which I would not recommend as it does not have Plone core support. We need to get the docs updated there.
We have a feature in our application which allows users to select a set of assets (images, videos etc.) and generate an embed code for those, which can be embedded in another web page. As of now the embed code uses an iframe, and the page that the iframe's src attribute points to outputs the HTML to embed.
For some security reasons we want to get rid of iframes and replace that with something else like an object tag, script tag etc.
My question is about object tag. Primary use of object tag seems to be to embed a video, a pdf etc. I know it can be used to embed an entire webpage just like what we want. But my question is - is that recommended? The webpage we want to embed will have a set of assets with options to sort, download, share, preview those assets.
So will it be a good practice to use object tag for embedding such a complex web page? Or is it meant for minimal usage like embedding a video clip, a slideshow etc.?
Depending on the type of technology you are using, you can do something similar.
With your request you are getting into the portlet/web part discussion, where you want to embed portlets (mini apps). There is no object tag you can use from the client side (to my knowledge it's only image, applet and iframe, I think), but you might be able to pre-load the parts before you send the user the final page (say, like WordPress widgets in PHP).
Otherwise you need to go the Javascript route, and do some kind of lazy loading of your 'widget/applications' as needed.
I am playing with the idea of reading an MP3 file from the hard drive and playing it in the browser. I know one way of doing it - get list of File objects from <input> tag, then get their object URL and assign it to src of <audio>. However there are some drawbacks of this technique (for one, it has to be repeated on every page refresh).
Therefore I am exploring whether I can use an NPAPI plugin to read the music file from the HDD and then give it to the <audio> element somehow. However, I can't figure out how to convert the binary contents of the file into a File object that JavaScript can use. Any suggestions?
The only option I can think of would be to use a data: url. I don't know for sure if these are supported in audio tags or not, but they are definitely supported in img tags.
BTW, you are aware that a NPAPI plugin has to be injected into the page in an object tag before it does anything, right? Make sure you know the difference between an extension and a plugin, because what you're talking about sounds more like an extension problem.
Also take a look at FireBreath; if you do decide to use a NPAPI plugin, that'll simplify your life a lot.
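For reference, the data: URL idea mentioned above boils down to base64-encoding the file's bytes and prefixing them with a MIME type. The sketch below shows only that encoding step, in Python for brevity; the function name audio_data_url is made up, audio/mpeg is assumed as the MIME type for MP3, and whether a given browser accepts such a URL as the src of <audio> would still need to be tested.

```python
import base64


def audio_data_url(audio_bytes, mime_type="audio/mpeg"):
    """Build a data: URL from raw audio bytes (base64-encoded)."""
    encoded = base64.b64encode(audio_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

Keep in mind that base64 inflates the data by about a third, so a whole MP3 turns into a very long URL.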
Given an HTML page I would like to get all the 'x' files that are embedded in the HTML file or are linked by it, where 'x' equals:
Images (JPG,PNG,GIF...)
Documents (Word, PowerPoint, PDF...)
Flash (.flv, .swf)
How do I do this?
So images are easy to extract because they are either linked to with a link ending in (.png|.jpg|...) or embedded with an img tag.
Documents cannot be embedded, they can only be linked to (with a link ending in .doc|.ppt|.pdf|...). So they are also easy to get.
Here is my problem:
How do I get the flash files that are embedded in webpages?
Please give me a pseudo-algorithm or a regex pattern.
If I am wrong in my points above (1. and 2.) please tell me so too.
Thanks!
The Firefox extension DownThemAll lets you right-click a page and download all of the media of a specified extension. It's open source, so you might want to look at their code and see how they implemented it.
I'd use an event-based XML parser (like SAX) and write rules for the relevant tags to get their src and href attributes.
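A sketch of that event-based idea, using Python's html.parser (which is more forgiving of real-world HTML than a strict SAX parser). The names CATEGORIES, LinkedFileCollector and collect_files are invented for the example. Looking at the data and value attributes as well as src and href is an assumption, added because Flash is commonly referenced from <object data="..."> or <param name="movie" value="...">.

```python
import posixpath
from html.parser import HTMLParser
from urllib.parse import urlparse

# Extension groups taken from the question; extend as needed.
CATEGORIES = {
    "images": {".jpg", ".jpeg", ".png", ".gif"},
    "documents": {".doc", ".ppt", ".pdf"},
    "flash": {".flv", ".swf"},
}


class LinkedFileCollector(HTMLParser):
    """Collect URLs from src/href/data/value attributes, grouped by extension."""

    def __init__(self):
        super().__init__()
        self.found = {name: [] for name in CATEGORIES}

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name not in ("src", "href", "data", "value") or not value:
                continue
            # Look only at the path so query strings do not hide the extension.
            ext = posixpath.splitext(urlparse(value).path)[1].lower()
            for category, extensions in CATEGORIES.items():
                if ext in extensions:
                    self.found[category].append(value)


def collect_files(html):
    """Return {"images": [...], "documents": [...], "flash": [...]}."""
    collector = LinkedFileCollector()
    collector.feed(html)
    return collector.found
```

This only sees what is in the static markup; files referenced from scripts or loaded dynamically will be missed.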