How many bytes does ffprobe need?

I would like to use ffprobe to look at the information of media files. However, the files are not on my local disk, and I have to read from a remote storage. I can read the first n bytes, write them to a temporary file and use ffprobe to read the information. I would like to know the least such n.
I tested with a few files, and 512KB worked with all the files that I tested. However, I am not sure if that will work for all media files.

ffprobe (and ffmpeg) aims to parse two things when opening an input:
the input container's header
payload data from each stream, enough to ascertain salient stream parameters like codec attributes and frame rate.
The header size is generally proportional to the number of packets in the file, i.e. a 3-hour MP4 file will have a larger header than a 3-minute one.
(if the header is at the end of the file, then access to the first 512 kB won't help)
From each stream, ffmpeg will decode packets till its stream attributes have been populated. The number of bytes consumed will depend on the stream bitrate and on how many streams are present.
So, the strict answer to 'I am not sure if that will work for all media files' is: it won't.
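As a rough sketch of the approach described in the question, the Python below copies the first n bytes of a remote object to a temporary file and runs ffprobe on it, doubling n when probing fails. The fetch_range() helper, the 512 KB starting value and the 8 MB cap are assumptions for illustration; how you read a byte range depends on your storage API, and no prefix size is guaranteed to be enough (e.g. MP4 files with the moov atom at the end).

    import subprocess
    import tempfile

    def fetch_range(url, n):
        # Hypothetical helper: return the first n bytes of the remote object.
        # Replace with your storage client's ranged-read call.
        raise NotImplementedError

    def probe_prefix(url, start=512 * 1024, max_bytes=8 * 1024 * 1024):
        n = start
        while n <= max_bytes:
            data = fetch_range(url, n)
            # NamedTemporaryFile works like this on Linux; on Windows you may
            # need delete=False so ffprobe can open the file by name.
            with tempfile.NamedTemporaryFile(suffix=".bin") as tmp:
                tmp.write(data)
                tmp.flush()
                result = subprocess.run(
                    ["ffprobe", "-v", "error", "-show_format", "-show_streams", tmp.name],
                    capture_output=True, text=True)
                if result.returncode == 0:
                    return result.stdout  # probing succeeded with this prefix
            n *= 2  # not enough data yet; try a larger prefix
        return None  # gave up, e.g. header located at the end of the file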

Related

Downsampling possible within RTSP/RTP?

I have a media server serving several cameras.
I'd like for the server to downsample the data from, say, 20 fps to 1 fps.
Obviously I could do this by decoding and recoding the video frames - however, the server is a little resource-constrained. I notice that if I simply drop RTP UDP packets, the output is not so good - I see both tearing and junk in the images (at least with an OpenCV/ffmpeg client).
Is it possible to downsample within an RTP stream by dropping more carefully chosen frames/packets, to avoid junk and tearing in the output? (Currently I'm able to extract RTP|H264 raw data chunks on the server, but am not running them through a full codec).
An H.264 stream consists of different frame types: I (or IDR), P and B.
I (or IDR) frames are full pictures and can be decoded without any other frames.
So you could filter out P and B frames and only pass on I frames.
Your resulting frame rate depends on the I (or IDR) frame frequency of the original stream. I am guessing you get somewhere between 0.1 and 2 fps.
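For illustration, here is a minimal Python sketch of that idea, assuming you already have the raw Annex B byte stream (start-code-separated NAL units) that the question mentions extracting on the server. It keeps SPS, PPS and IDR slices and drops everything else; real RTP handling (FU-A reassembly, timestamps, marker bits) is not shown.

    import re

    START_CODE = re.compile(b"\x00\x00\x00\x01|\x00\x00\x01")

    def split_nals(annexb):
        """Yield NAL unit payloads (start codes removed) from an Annex B buffer."""
        matches = list(START_CODE.finditer(annexb))
        for i, m in enumerate(matches):
            end = matches[i + 1].start() if i + 1 < len(matches) else len(annexb)
            yield annexb[m.end():end]

    KEEP_TYPES = {5, 7, 8}  # 5 = IDR slice, 7 = SPS, 8 = PPS

    def keep_idr_only(annexb):
        """Return a reduced Annex B stream containing only IDR frames and parameter sets."""
        out = bytearray()
        for nal in split_nals(annexb):
            if nal and (nal[0] & 0x1F) in KEEP_TYPES:  # NAL type = low 5 bits of first byte
                out += b"\x00\x00\x00\x01" + nal
        return bytes(out)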

Re encode an audio stream recording on the fly?

Is it possible to rip an audio stream with Variable Bit Rate encoding and re-encode it on the fly, as it is being recorded, with Constant Bit Rate encoding?
I am downloading an audio stream in AAC format with VBR encoding using cURL.
The duration of a VBR-encoded file is calculated from its byte length, resulting in a discrepancy in the reported duration on different players. This duration discrepancy does not allow me to seek and slice precisely. I would need to re-encode it somehow with a constant bit rate to get the seeking to work properly.
The audio stream is hours long, so re-encoding it afterwards takes way too much time and processing power.
Is there anything I can do about this?
Perhaps I can specify some settings in cURL to achieve a constant recording bit rate?
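One possible direction (a sketch under assumptions, not a verified solution): rather than changing cURL settings, pipe the download straight into ffmpeg and re-encode at a fixed bitrate while the data arrives, so no second pass is needed afterwards. In the Python sketch below the URL, the 128k bitrate and the output name are placeholders, and ffmpeg's -b:a gives an approximately constant bitrate rather than strict CBR.

    import subprocess

    URL = "https://example.com/stream.aac"  # placeholder URL

    # curl writes the incoming VBR AAC stream to stdout; ffmpeg reads it from
    # stdin and re-encodes it at a fixed audio bitrate as the download runs.
    curl = subprocess.Popen(["curl", "-s", URL], stdout=subprocess.PIPE)
    ffmpeg = subprocess.Popen(
        ["ffmpeg", "-y", "-i", "pipe:0", "-c:a", "aac", "-b:a", "128k", "recording.m4a"],
        stdin=curl.stdout)
    curl.stdout.close()  # so curl gets SIGPIPE if ffmpeg exits early
    ffmpeg.wait()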

Is it possible to remove start codes using NVENC?

I'm using NVENC SDK to encode OpenGL frames and stream them over RTSP. NVENC gives me encoded data in the form of several NAL units. In order to stream them with Live555 I need to find the start code (0x00 0x00 0x01) and remove it. I want to avoid this operation.
NVENC has a sliceOffset attribute which I can consult, but it indicates slices, not NAL units. It only points to the end of the SPS and PPS headers, where the actual data starts. I understand that a slice is not equal to a NAL (correct me if I'm wrong). I'm already forcing single slices for encoded data.
Is any of the following possible?
Force NVENC to encode individual NAL units
Force NVENC to indicate where the NAL units in each encoded data block are
Make Live555 accept the sequence parameters for streaming
There seems to be a point where every person trying to do H.264 over RTSP/RTP comes down to this question. Well here are my two cents:
1) There is a concept of an access unit. An access unit is a set of NAL units (possibly just one) that represent an encoded frame. That is the level of logic you should work at. If you say you want the encoder to give you individual NAL units, then what behaviour do you expect when the encoding procedure results in multiple NAL units for one raw frame (e.g. SPS + PPS + coded picture)? That being said, there are ways to configure the encoder to reduce the number of NAL units in an access unit (such as not including the AUD NAL, not repeating SPS/PPS, and excluding SEI NALs). With that knowledge you can actually know what to expect and, to a degree, force the encoder to give you a single NAL per frame (of course this will not work for all frames, but with the knowledge you have about the decoder you can handle that). I'm not an expert on the NVENC API; I've also just started using it. But at least with Intel Quick Sync, turning off AUD and SEI and disabling repetition of SPS/PPS gave me roughly 1 NAL per frame for frames 2...N.
2) I won't be able to answer this since, as I mentioned, I'm not familiar with the API, but I highly doubt it.
3) SPS and PPS should be in the first access unit (the first bit-stream you get from the encoder), and you could just find the right NALs in the bit-stream and extract them, or there may be a special API call to obtain them from the encoder.
All that being said, I don't think it is that hard to actually run through the bit-stream, parse the start codes, extract the NAL units and feed them to Live555 one by one. Of course, if the encoder can output the bit-stream in AVCC format (which, unlike the Annex B start codes, puts an interleaved length value between the NAL units, so you can just jump to the next one without looking for the prefix), then you should use it. When it is just RTP, it's easy enough to implement the transport yourself (I've had bad luck with GStreamer, which did not have proper support for FU-A packetization); in the case of RTSP the overhead of the transport infrastructure is bigger and it is reasonable to use a 3rd-party library like Live555.
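To illustrate the "parse the start codes and extract the NAL units" part, here is a minimal Python sketch of an Annex B splitter; in practice the same loop would live in your C++ code next to the NVENC output buffer, and each yielded unit would be handed to your Live555 framer one at a time. It is a sketch only; no NVENC or Live555 API is shown.

    def iter_nal_units(buf):
        """Yield NAL units from an Annex B buffer with their start codes stripped."""
        i = buf.find(b"\x00\x00\x01")
        while i != -1:
            start = i + 3
            nxt = buf.find(b"\x00\x00\x01", start)
            end = nxt if nxt != -1 else len(buf)
            # A 4-byte start code (00 00 00 01) leaves its leading zero just
            # before the 3-byte pattern; drop that byte from the current unit.
            if nxt != -1 and buf[end - 1] == 0:
                end -= 1
            yield buf[start:end]
            i = nxt

    # Hypothetical usage: feed each unit to the streaming layer individually.
    # for nal in iter_nal_units(encoded_block):
    #     send_to_live555(nal)  # placeholder for your Live555 delivery callback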

HLS - how to reduce delay?

Does anyone know how to configure an HLS media server to reduce the delay of live streaming video a little?
What types of parameters do I need to change?
I had heard that you could do some tuning using parameters like this: HLSMediaFileDuration
Thanks in advance
An HTTP Live Streaming (HLS) system typically has an encoder, which produces segments of a certain number of seconds, and a media server (web server), which serves playlists containing a list of URLs to these segments to player applications.
Media Files = Segments = .ts files = MPEG2-TS files (in HLS speak)
There are some ways to reduce the delay in HLS:
Reduce the encoded segment length from Apple's recommended 10 seconds to 5 seconds or less. Reducing segment length increases network overhead and load on the web server.
Use lower bitrates; larger .ts files take longer to upload and download. If you use multi-bitrate streams, make sure the first bitrate listed in the playlist is a little lower than the bitrate most of your users use. This will reduce the time to start playing back the stream.
Get the segments from the encoder to the web server faster. Upload while still encoding if possible. Update the playlist as soon as the segment has finished uploading
Also remember that the higher the delay the better the quality of your stream (low delay = lower quality). With larger segments, there is less overhead so more space for video data. Taking a longer time to encode results in better quality. More buffering results in less chance of video streams stuttering on playback.
HLS is all about trading quality of playback for longer delay, so you will never be able to use HLS for things like video conferencing. Typical delay in HLS is 30-60 sec, minimum in practice is around 15 sec. If you want low delay use RTP for streaming, but good luck getting good quality on low or variable speed networks.
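As an illustration of the first point above (shorter segments), here is a hedged Python sketch that drives the ffmpeg HLS muxer with 2-second segments; the input URL, codecs, bitrate and output path are placeholders, and your own media server or encoder may expose equivalent settings under different names.

    import subprocess

    # Segment a live input into ~2-second .ts chunks and keep a short playlist.
    subprocess.run([
        "ffmpeg", "-y",
        "-i", "rtmp://example.com/live/stream",  # placeholder input
        "-c:v", "libx264", "-b:v", "1500k", "-c:a", "aac",
        "-f", "hls",
        "-hls_time", "2",             # target segment length in seconds
        "-hls_list_size", "6",        # list only the most recent segments
        "-hls_flags", "delete_segments",
        "/var/www/hls/stream.m3u8",   # placeholder output path
    ])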
Please specify which media server you use. Generally speaking, yes - changing the chunk size will definitely affect delay time. The smaller the first chunk is, the sooner the video will be shown in the player.
Actually, Apple recommends dividing your file into small chunks of equal, integer-second length.
In practice, there is a huge difference between players. Some of them parse the manifest and adjust their behaviour based on these values.
A known practice is to pre-cache the first chunks in memory at low and medium resolution (or try to download them in the background of the app/page - Amazon does this, though their video is MSS).
I was having the same problem and the keys for me were:
Lower the segment length. I set it to 2s because I'm streaming on a local network. In other types of networks, you need to be careful with the overhead that a low segment length adds, which can impact your playback quality.
In your manifest, make sure the #EXT-X-TARGETDURATION is accurate. From here:
The EXT-X-TARGETDURATION tag specifies the maximum Media Segment duration. The EXTINF duration of each Media Segment in the Playlist file, when rounded to the nearest integer, MUST be less than or equal to the target duration; longer segments can trigger playback stalls or other errors. It applies to the entire Playlist file.
For some reason, the #EXT-X-TARGETDURATION in my manifest was set to 5 and I was seeing a 16-20s delay. After changing that value to 2, which is the correct one according to my segments' length, I am now seeing delays of 6-10s.
In summary, you should expect a delay of at least 3X your #EXT-X-TARGETDURATION. So, lowering the segment length and the #EXT-X-TARGETDURATION value should help reduce the delay.
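For reference, a minimal playlist in which #EXT-X-TARGETDURATION matches the actual ~2-second segment durations might look like this (the segment file names and sequence number are hypothetical):

    #EXTM3U
    #EXT-X-VERSION:3
    #EXT-X-TARGETDURATION:2
    #EXT-X-MEDIA-SEQUENCE:120
    #EXTINF:2.000,
    segment120.ts
    #EXTINF:2.000,
    segment121.ts
    #EXTINF:2.000,
    segment122.ts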

How does a stored image or video appear in binary on the hard drive?

In attempting to understand the concept of binary, my question is "How does a stored image or video look in binary on the hard drive?"
As for how it is physically stored, it depends on the technology of your storage device. For a hard disk drive you can read about it on Wikipedia.
The next layer is how the controller on the storage device sends the data to the motherboard.
Then how the motherboard sends the data to the operating system.
Then how the operating system stores the data on the disk (what file system it uses; NTFS is common in modern Windows installations.)
Finally, what you'll see when reading the data is groups of 8 bits (bytes), which are basically 8 on/off flags and together form 256 possible combinations. That is why most image formats store colors ranging from 0-255 for each channel (red, green, blue). Most raw formats are stored linearly, so you can actually try reading them yourself. A raw image whose first pixel is red (assuming it stores the pixels left-to-right, top-to-bottom) would look like this in bits:
11111111 00000000 00000000
red green blue
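To make that concrete, here is a small Python sketch (under the same assumption of raw, headerless RGB data) that writes a single red pixel as three raw bytes and prints them back as bits:

    # One red pixel in raw 24-bit RGB: R=255, G=0, B=0.
    pixel = bytes([255, 0, 0])

    with open("pixel.raw", "wb") as f:
        f.write(pixel)

    with open("pixel.raw", "rb") as f:
        data = f.read()

    print(" ".join(f"{byte:08b}" for byte in data))  # 11111111 00000000 00000000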
For more information, you'll have to be more specific.
Every file on disk is basically a number of bits in a row.
The difference between "binary" and "something else" (often called ASCII, or text, or...) is that non-binary is basically human-readable when opened in a text editor. In other words: the bytes in the file map to human-readable letters (and other characters) in some way a generic text editor knows how to handle.
So-called binary files can only be interpreted back into the data they actually contain when you know the format that was used to map the content (image, sound, movie, whatever) to a stream of zeros and ones. This mapping is called the file format, and it is usually indicated by part of the file name, in the form of an extension. You need a piece of software that knows the mapping and can interpret the row of bits back into the original content.
Mind you: this is usually only a hint. Renaming a JPEG image file to have a .mp3 extension doesn't change it into an audio file; it is still just an image file, containing the image (=dimensions of the image in pixels + the color values for each pixel, basically) encoded into a stream of zeros and ones in the way described in the JPEG file format encoding description.
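A small Python sketch illustrating that point: the content, not the name, determines what a file is. JPEG files begin with the bytes FF D8 FF, so the check below still recognizes a renamed JPEG (the file name here is a placeholder):

    def looks_like_jpeg(path):
        # JPEG files start with the SOI marker FF D8, followed by FF.
        with open(path, "rb") as f:
            return f.read(3) == b"\xff\xd8\xff"

    print(looks_like_jpeg("photo.mp3"))  # True if this is really a renamed JPEG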
Check out the link: Binary File Format
Images are a sequential flow of colored dots... But it's not hardware dependent, i.e. your hard disk will store anything in whatever form the OS provides it. However, the OS maintains standards for saving file formats; otherwise a JPG image would not be valid across different platforms...
Similarly, videos are flows of images and audio data multiplexed into a sequential stream.
All data on commercial computer systems are stored in binary format (we'll ignore scientific studies into quantum and optical computing).
At the lowest level all files and processing by a computer are performed in binary. This is because our computing systems are powered by the flow of electrons. They either flow or don't. Electric current is on or off. 1 and 0.
The data stored on a hard disk is there due to pulsing of the hard disk write head coil which magnetises spots of hard disk material. These magnetised spots cause a current pulse in the read coil (in actual fact the read and write coils are the same) as the hard disk head passes over them. Hence the data is read as a stream of current pulses, 1s and 0s.
Now, processors are built to accept and process a finite number of binary "pulses" or data bits simultaneously (it can be anything from 4 bits upwards). Hence a modern 64-bit PC can process 64 binary data bits, i.e. 64 1s and 0s, at any one time.
Now at a higher level, although all files are stored as binary and can be read in binary format, we help the processing of them by telling the processor what format to read them in. This is so that it processes the file data in small chunks, e.g. 8 bits or 1 byte for ASCII text.
The operating system keeps a template for any given file type, set up in an extension relation table. According to the file extension, the operating system expects the data to be in a particular format and links it to code that can interpret it. Hence changing a file name extension will confuse this process, as the data won't be interpreted correctly. That's why changing the filename from *.jpg to *.exe won't show the image: the system has been told to expect executable code, which the data within the file clearly isn't.
So back to your original question the image within the jpeg file has been encoded as series of 1s and 0s in a specific order.
I'm not sure how exactly they are arranged, but as an example:
A picture was captured and stored as a bitmap at a resolution of 800 x 600 in 24-bit colour. The first pixel is stored as 3 bytes (8-bit binary each) representing a red, green and blue value. The value of each byte dictates the intensity of that colour, from 0 to 255, with 0 being none at all and 255 being the highest value. (Unsigned 255 in binary is 11111111; I won't confuse you with 2's complement for signed values.) So the full picture will require a file of at least 1,440,000 bytes, or about 1,406 kilobytes (a kilobyte being 1024 bytes).
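The arithmetic from that example, as a quick Python check:

    width, height, bytes_per_pixel = 800, 600, 3  # 24-bit colour = 3 bytes per pixel
    size_bytes = width * height * bytes_per_pixel
    print(size_bytes)          # 1440000
    print(size_bytes / 1024)   # ~1406.25 kilobytes (using 1 KB = 1024 bytes)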
Binary such as 000010101011010101101010101 is stored on the hard drive by changing the polarity of the metallic grains on the disk in specific microscopic regions. Note that binary numbers are conventionally written with the least significant bit on the right, the opposite of how most people read text.
If your question is really "how does it look": See Figure 4 on this page; it shows high resolution measurements of a hard drive.
Although googletorp's answer does not look very helpful, it's not totally untrue. To store binary data, the only thing you need is the possibility to have two different states for each storage unit (be it an on/off switch, hole or no hole in a punchcard, or, as in the case of hard drives, the direction of ferromagnetic particles).
The Wikipedia page for the BMP File Format contains an example (including all hex values) of a 2x2-pixel bitmap image; it should be very good at explaining the basics of the binary representation of an image.
In general, if you're really curious how the binary of a file looks, you can always use a hex viewer and take a look yourself :) I normally use od on Linux to dump the binary information of a file. I'm sure you can google a good hex editor for Windows (or maybe someone can suggest one.)
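If you'd rather not install anything, a few lines of Python will produce a rough hex dump of the first bytes of a file, similar to od -A x -t x1 (the file name is a placeholder):

    with open("picture.jpg", "rb") as f:  # placeholder file name
        chunk = f.read(64)

    # Print 16 bytes per line as two-digit hex values, preceded by the offset.
    for offset in range(0, len(chunk), 16):
        row = chunk[offset:offset + 16]
        print(f"{offset:08x}  " + " ".join(f"{b:02x}" for b in row))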
Headers? Every file created contains header information, which is also stored as binary bits along with the data. The header bits of a file hold information such as the header length, file type, file location and length. Now, each application is designed to read certain file types. If an application tries to open a file on the hard disk whose header describes a file format the application does not support, it fails to read the file. Thus a text file cannot be opened using a media player, because a media player expects a file whose header carries an audio file format's binary pattern. The same applies to picture files.