H.264 RTP in-band parameter sets changing

I have an RFC 6184 stream that transmits SPS/PPS in its first packets.
After around 5 seconds there is another SPS/PPS that changes the resolution of the stream.
My decoder handles this nicely, as expected. My first question, though, is whether this is legal according to the ISO standard.
My second concern is whether it would cause any problems when the stream is dumped to an ISO MP4. As far as I know, the AVC Configuration Record can handle multiple parameter sets.

I would consider it undefined behavior. If it works for you, that's great, but it may not work forever or in all environments. MP4 can carry SPS/PPS alongside other frame data, so it is at least possible to package. Again, different players may or may not handle it correctly.
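For reference on the container side, the avcC box (AVCDecoderConfigurationRecord) stores counted arrays of SPS and PPS NAL units, so multiple parameter sets can at least be declared there; whether a given player honors all of them is another matter. A minimal parsing sketch follows (field layout per ISO/IEC 14496-15; parseAvcC is a hypothetical helper name):

```typescript
// Minimal sketch: parse an AVCDecoderConfigurationRecord (the payload of the
// 'avcC' box) and collect every SPS and PPS it carries.
function parseAvcC(data: Uint8Array): { sps: Uint8Array[]; pps: Uint8Array[] } {
  let pos = 0;
  const u8 = () => data[pos++];
  const u16 = () => (data[pos++] << 8) | data[pos++];

  pos += 5; // configurationVersion, profile, compatibility, level, lengthSizeMinusOne

  const sps: Uint8Array[] = [];
  const numSps = u8() & 0x1f;             // lower 5 bits = number of SPS entries
  for (let i = 0; i < numSps; i++) {
    const len = u16();
    sps.push(data.subarray(pos, pos + len));
    pos += len;
  }

  const pps: Uint8Array[] = [];
  const numPps = u8();                    // number of PPS entries
  for (let i = 0; i < numPps; i++) {
    const len = u16();
    pps.push(data.subarray(pos, pos + len));
    pos += len;
  }
  return { sps, pps };
}
```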

Related

How to calculate MF_MT_MPEG_SEQUENCE_HEADER for MPEG4 Sink in Win7?

I have an MF topology that captures video and audio and encodes it into H264, then writes it to an MPEG4 sink. However, the problem is that my H264 encoder (Intel QuickSync H264 Encoder) does not define a value for MF_MT_MPEG_SEQUENCE_HEADER in its output types. Thus, when I set the video media type of my MPEG4 sink, no sequence header is defined and the sink cannot correctly finalize, as mentioned in the MPEG4 Sink documentation:
https://msdn.microsoft.com/en-us/library/windows/desktop/dd757763(v=vs.85).aspx
After searching around, I learned that I need to get the SPS & PPS values for the MF_MT_MPEG_SEQUENCE_HEADER attribute. I am not sure about how to get these. My application is designed only for Windows 7, but in Windows 8 it seems like you can just set the MF_MPEG4SINK_SPSPPS_PASSTHROUGH attribute to have the sink grab the SPS & PPS from the input samples (see the above link). I have no interest in individual frame samples other than to obtain this value, and currently my application code is not looking at individual H264 samples.
What is an easy way to obtain the SPS & PPS values from an MF H264 stream on Windows 7?
I could explain exactly how to do it. But I believe the how will be confusing if you don't understand the why. I have another post that explains these concepts in pretty good detail, and writing the code to accomplish this should be trivial after understanding the bitstream format.
You should pay specific attention to the AVCC section of that post:
Possible Locations for Sequence/Picture Parameter Set(s) for H.264 Stream
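In case it helps, here is a minimal sketch of the extraction itself, assuming the encoder emits an Annex B byte stream (start-code-delimited NAL units), as the H.264 encoder MFTs typically do. It is shown in TypeScript only to illustrate the byte scan; in the actual application the same logic would run in C++ over the first encoded sample's buffer, and the resulting blob would be set as MF_MT_MPEG_SEQUENCE_HEADER. extractSequenceHeader is a hypothetical name:

```typescript
// Minimal sketch: collect the SPS (NAL type 7) and PPS (type 8) units from an
// Annex B H.264 buffer and return them, start codes included, concatenated in
// order. For H.264, that blob is what MF_MT_MPEG_SEQUENCE_HEADER expects.
function extractSequenceHeader(annexB: Uint8Array): Uint8Array {
  // Positions of every 00 00 01 start code (a preceding 00 makes it 4 bytes).
  const starts: number[] = [];
  for (let i = 0; i + 2 < annexB.length; i++) {
    if (annexB[i] === 0 && annexB[i + 1] === 0 && annexB[i + 2] === 1) {
      starts.push(i);
      i += 2;
    }
  }

  const pieces: Uint8Array[] = [];
  for (let n = 0; n < starts.length; n++) {
    const nalStart = starts[n] + 3;                        // first byte after the start code
    let end = n + 1 < starts.length ? starts[n + 1] : annexB.length;
    // Drop a trailing 00 that actually belongs to the next 4-byte start code.
    if (n + 1 < starts.length && annexB[end - 1] === 0) end--;
    const nalType = annexB[nalStart] & 0x1f;
    if (nalType === 7 || nalType === 8) {                  // 7 = SPS, 8 = PPS
      pieces.push(Uint8Array.of(0, 0, 0, 1), annexB.subarray(nalStart, end));
    }
  }

  // Concatenate the collected pieces into a single blob.
  const out = new Uint8Array(pieces.reduce((n, p) => n + p.length, 0));
  let off = 0;
  for (const p of pieces) { out.set(p, off); off += p.length; }
  return out;
}
```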

When decoding H264 packets, which of the sprop-parameter-sets has higher priority?

I am decoding a streaming video. It is sent as RTP packets.
Before sending the video, the receiver gets an SDP file that, among other things, has the sprop parameter sets.
However, the decoding works even if I remove that parameter. That's why I presume the sprop parameter sets are also present in the H264 packets (in the RTP payload).
So, we can have sprop parameter sets in two places; which one is considered to take priority?
There is no priority. The sprop contains an SPS/PPS. Each SPS/PPS has an ID, and when needed, the NALs in the stream simply indicate the specific SPS/PPS they require. It is also legal for an H.264 encoder to repeat the same SPS/PPS in the stream for protocols that have no method of transmitting out-of-band data and that require the ability to join a stream already in progress (like over-the-air TV).
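To make that ID linkage concrete: a coded slice names the PPS it needs in its slice header (pic_parameter_set_id), and that PPS in turn names its SPS (seq_parameter_set_id). A minimal sketch of reading those fields, ignoring emulation-prevention bytes for brevity; BitReader and the function names are hypothetical:

```typescript
// Minimal sketch: Exp-Golomb bit reader over RBSP bytes (the bytes after the
// 1-byte NAL header). Emulation-prevention bytes (00 00 03) are ignored here.
class BitReader {
  private bit = 0;
  constructor(private data: Uint8Array) {}
  readBit(): number {
    const byte = this.data[this.bit >> 3];
    const value = (byte >> (7 - (this.bit & 7))) & 1;
    this.bit++;
    return value;
  }
  readUE(): number {               // unsigned Exp-Golomb, ue(v)
    let zeros = 0;
    while (this.readBit() === 0) zeros++;
    let suffix = 0;
    for (let i = 0; i < zeros; i++) suffix = (suffix << 1) | this.readBit();
    return (1 << zeros) - 1 + suffix;
  }
}

// A slice header starts with first_mb_in_slice, slice_type, pic_parameter_set_id.
function picParameterSetIdOfSlice(sliceRbsp: Uint8Array): number {
  const r = new BitReader(sliceRbsp);
  r.readUE();                      // first_mb_in_slice
  r.readUE();                      // slice_type
  return r.readUE();               // pic_parameter_set_id -> selects the PPS
}

// A PPS starts with its own id followed by the id of the SPS it refers to.
function parameterSetIdsOfPps(ppsRbsp: Uint8Array): { ppsId: number; spsId: number } {
  const r = new BitReader(ppsRbsp);
  return { ppsId: r.readUE(), spsId: r.readUE() };
}
```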

Streaming MP3 with pre-processing

I was wondering if there's a way to get streaming audio data (mp3, flac, arbitrary chunks of data in any encoding) from a server, process it, and start playing after the first chunk?
Background: All of our data is stored in compressed chunks, with a LAMP environment serving the data. Decompression and reassembly are done client-side with XHR downloads, IndexedDB, FileHandle and/or chromefs. All currently available audio/video functionality requires the audio to be downloaded completely (otherwise decodeAudioData fails) or requires a URL to the source without giving me a chance to process incoming data client-side.
I am looking for a solution that squeezes my processing into the browser's built-in streaming/caching/decoding functionality (e.g. the audio/video tag). I don't want to pre-process anything server-side, I don't want Flash/Java applets, and I'd like to avoid aligning data client-side (e.g. processing mp3).
Question: Would it be possible to dynamically "grow" the storage that a blob URL points to? In other words: create a FileHandle/FileEntry, generate a blob URL, feed it into an audio tag, and grow the file with more data?
Any other ideas?
Michaela
Added: Well, after another day of fruitless attempts, I must confirm that there are two problems in dealing with streamed/chunked mp3|ogg data:
1) decodeAudioData is just too picky about what's fed into it. Even if I pre-align ogg audio (splitting at "OggS" boundaries), I am unable to get the second chunk decoded.
2) Even if I were able to get the chunks decoded, how would I go about playing them without setting timers, start positions or other head-banging detours? Maybe the webAudioAPI developers should take a look at aurora/mp3?
Added: Sorry to be bitching. But my newest experiments with recording audio from the microphone are not very promising either. 400K of WAV for a few seconds of recording? I have taken a few minutes to write about my experiences with the webAudioAPI and added a few suggestions, from a coder's perspective: http://blog.michaelamerz.com/wordpress/a-coders-perspective-on-the-webaudioapi/
Check out https://github.com/brion/ogv.js. Brion's project chunk-loads an .ogv video and outputs the raw data back to the screen through the Web Audio API and Canvas, playing at the original FPS/timing of the file itself.
There is a StreamFile object in the codebase that handles the chunked load, buffering and readout of the data, as well as an example of how it is being assembled for playback through WebAudio.
I actually emailed Brion directly for a little help and he got back to me within an hour. It wasn't built for exactly your use case, but the elements are there, and I highly recommend Brion, who is very knowledgeable about file formats, encoding and playback.
You cannot use the <audio> tag for this. However, here is what you can use:
Web Audio API - allows you to dynamically construct an audio stream in JavaScript
WebRTC - might need pre-processing of the streamed data on the server side; not sure
Buffers are recyclable, so you can discard already played audio.
How you load your data (XHR, WebSockets, chunked file downloads) really doesn't matter as long as you can get the raw data into a JavaScript buffer.
Please note that there is no universal audio format all browsers can decode, and your mileage with MP3 may vary. AAC (MPEG-4 audio) is more widely supported and has the best web and mobile coverage. You can also decode AAC in pure JavaScript in Firefox: http://jster.net/library/aac-js - and you can decode MP3 in pure JavaScript as well: http://audiocogs.org/codecs/mp3/
Note that localStorage supports only 5 MB data per origin without additional dialog boxes, so this severely limits storing audio on the client-side.
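Building on the buffer-recycling point above, here is a minimal sketch of the playback side, assuming each chunk is independently decodable (cut on frame boundaries) and using the modern promise-based decodeAudioData and fetch; the function names and chunk URLs are hypothetical:

```typescript
// Minimal sketch: play incoming audio chunks gaplessly with the Web Audio API.
// Assumes each chunk is independently decodable (e.g. a complete MP3/AAC segment).
const ctx = new AudioContext();
let nextStartTime = 0;

async function playChunk(chunk: ArrayBuffer): Promise<void> {
  // decodeAudioData rejects partial frames, so chunks must be cut on frame boundaries.
  const buffer = await ctx.decodeAudioData(chunk);
  const source = ctx.createBufferSource();
  source.buffer = buffer;
  source.connect(ctx.destination);

  // Schedule this chunk to start exactly where the previous one ends.
  const when = Math.max(nextStartTime, ctx.currentTime);
  source.start(when);
  nextStartTime = when + buffer.duration;

  // Source nodes (and their buffers) can be discarded once they have played.
  source.onended = () => source.disconnect();
}

// Example usage: fetch hypothetical chunk URLs one by one and feed the scheduler.
async function streamChunks(urls: string[]): Promise<void> {
  for (const url of urls) {
    const response = await fetch(url);
    await playChunk(await response.arrayBuffer());
  }
}
```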

Is there a reliable way to tell whether a video file is valid? VideoPlayer does not detect empty/small files as invalid

I need to detect whether a video file is valid (and delete it if it is not). It is an AIR application.
For large files this can be accomplished by handling the MediaPlayerState.PLAYBACK_ERROR state. No problems there.
However, when there's an empty file like empty.mp4, it does not dispatch any PLAYBACK_ERROR.
Then I created a file singleChar.mp4 (containing just a single 'x' character); the behavior was exactly the same.
It goes through the READY state without any problems. playing is true at that time (the documentation says it is true when the player is playing or attempting to play, so this is also not reliable).
Then I assumed it would treat those files as valid ones with duration=0... No! duration is not set and durationChange is not dispatched.
The best approach for now is to play() it, set a timeout for about 50 ms, and check duration when the timeout expires. At least it works in about 50% of cases. However, this is completely unreliable. Reliability may be improved by extending the delay, but I'd prefer to detect the problem quickly and handle it silently before the user notices anything.
Another possible approach is to check whether the file is smaller than some size (maybe 100 KB?), but that is just another unreliable hack.
So, is there a reliable way to detect if a file is a valid video file?

How to send H.264 parameter sets 'out of band' during an ongoing RTP session?

I am implementing an RTSP server in my application to serve H.264 video via RTP. I have read the relevant RFCs and spent a lot of time reading about H.264/RTP/RTSP, and one point I am still confused about is how to transmit the 'Sequence Parameter Set' and 'Picture Parameter Set' out of band.
The documentation that I have read states that these should preferably be transmitted via a reliable out-of-band mechanism, but I haven't been able to find anything that defines how to transmit them out of band other than using the sprop-parameter-sets attribute of an SDP file.
For instance, RFC 6184 Section 8.4 states:
Parameter set NALUs can be transported using three different
principles:
A. Using a session control protocol (out-of-band) prior to the
actual RTP session.
B. Using a session control protocol (out-of-band) during an ongoing
RTP session.
C. Within the RTP packet stream in the payload (in-band) during an
ongoing RTP session.
...
It is recommended to implement principles A and B within a session
control protocol ...
Section 8.2.2 includes a detailed discussion on transport of
parameter sets in-band or out-of-band in SDP Offer/Answer using media
type parameters sprop-parameter-sets ... This
section contains guidelines on how principles A and B should be
implemented within session control protocols.
...
Parameter sets MAY be added or updated during the lifetime of a
session using principles B and C.
I have read Sections 8.2.2 and 8.4 and cannot find any description of how to implement method 'B'. Everything I have read on this topic is incredibly vague; for instance, Wikipedia has the following to say on the subject:
In other applications, it can be advantageous to convey the parameter sets "out-of-band" using a more reliable transport mechanism than the video channel itself.
What am I missing here? Is there some other standard for transmitting this via RTSP? RTCP?
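For reference, the SDP mechanism mentioned above (which RFC 6184 recommends for both principle A and principle B, via the session control protocol) carries the parameter sets as comma-separated base64 NAL units in the fmtp line. A minimal sketch of building that attribute, with a hypothetical buildFmtpLine helper and assuming Node's Buffer for base64:

```typescript
// Minimal sketch: build the SDP fmtp attribute that carries SPS/PPS out of band.
// Per RFC 6184, sprop-parameter-sets is a comma-separated list of base64 NAL
// units, and profile-level-id is the hex of SPS bytes 1..3 (profile_idc,
// constraint flags, level_idc). Uses Node's Buffer; use btoa() or equivalent elsewhere.
function buildFmtpLine(payloadType: number, sps: Uint8Array, pps: Uint8Array): string {
  const b64 = (nal: Uint8Array) => Buffer.from(nal).toString("base64");
  const profileLevelId = Array.from(sps.subarray(1, 4))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
  return (
    `a=fmtp:${payloadType} packetization-mode=1;` +
    `profile-level-id=${profileLevelId};` +
    `sprop-parameter-sets=${b64(sps)},${b64(pps)}`
  );
}
```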