How are access units aligned within PES packets in Apple's HLS? - h.264

Does Apple specify this? How many access units should one put in a PES packet payload?
Also, I'm wondering which prefix start codes (if any) are present in PES packets. I assume that the one preceding the first NAL unit within an access unit is useless and mustn't be put. Right?
I'd like to know how it's done specifically in HLS - not necessarily any other MPEG-2 TS application.

I'd like to know how it's done specifically in HLS - not necessarily
any other MPEG-2 TS application.
HLS is a standard MPEG-2 TS stream. HLS does not do it any differently, except that it limits the stream to a single audio and a single video stream, and limits codecs to AVC/AAC/MP3.
For the rest of the answers, I will assume you are referring to the AVC codec. (AAC and MP3 have different answers.)
How many access units should one put in a PES packet payload?
One. However for efficiency, the last NALU may be truncated on a TS boundary, and the remainder prepended to the next AU at the start of the next PES. This is optional, but it does reduce bitrate.
I'm wondering which prefix start codes (if any) are present in PES
packets.
MPEG-TS requires Annex B style start codes: one start code before every NALU. An AU will have several NALUs, as AUDs are required in MPEG-TS.
I assume that the one preceding the first NAL unit within an access
unit is useless and mustn't be put. Right?
Completely wrong. Every NALU must begin with a start code.
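To illustrate the point about start codes, here is a minimal Python sketch that splits an Annex B byte stream (such as a reassembled PES payload) into NAL units and reads each unit's type. The function names `split_annexb` and `nal_types` and the sample bytes are made up for illustration; this is not taken from any HLS tool.

```python
def split_annexb(data: bytes) -> list[bytes]:
    """Split an Annex B byte stream into NAL units (start codes removed)."""
    nalus = []
    i = data.find(b"\x00\x00\x01")
    while i != -1:
        start = i + 3
        nxt = data.find(b"\x00\x00\x01", start)
        end = len(data) if nxt == -1 else nxt
        # A 4-byte start code (00 00 00 01) leaves a zero byte behind the
        # previous NAL unit. A NAL unit never ends in 0x00, so trailing
        # zeros can be stripped safely.
        nalu = data[start:end].rstrip(b"\x00")
        if nalu:
            nalus.append(nalu)
        i = nxt
    return nalus


def nal_types(data: bytes) -> list[int]:
    """Return the H.264 nal_unit_type (low 5 bits of the first byte) of each NAL unit."""
    return [nalu[0] & 0x1F for nalu in split_annexb(data)]


# Hypothetical access unit: AUD (type 9), SPS (7), PPS (8), IDR slice (5).
sample_au = (
    b"\x00\x00\x00\x01\x09\xf0"
    b"\x00\x00\x00\x01\x67\x42"
    b"\x00\x00\x01\x68\xce"
    b"\x00\x00\x01\x65\x88"
)
print(nal_types(sample_au))
```

In this sample the access unit delimiter comes first and carries its own start code, just like every other NAL unit.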

"Play" sounds into a .wav

I'm trying to make a program that can convert ORG files into WAV files directly. The ORG format is similar to MIDI, in the sense that it is a list of "instructions" about when and how to play specific instruments, and a program plays these instruments for it to create the song.
However, as I said, I want to generate a WAV directly, instead of just playing the ORG. So, in a sense, I want to "play" the sounds into a WAV. I do know the WAV format and have created some files from raw PCM samples, but this isn't as simple.
The sounds generated by the ORG come from a bunch of files containing WAV samples I have. They're mono, 8-bit samples that should be played at 22050 Hz. They're all under a second long, and the largest aren't more than 11 KB. I would assume that to play them all after each other, I would simply put the samples into the WAV one after the other. It isn't that simple though, as the ORG can have up to 16 different instruments playing at once, and each note of each instrument also has a pan (i.e. a balance, allowing stereo sound). What's more, each ORG has its own tempo (i.e. milliseconds between each point a sound can be played), and some sounds may be longer than this tempo, which means that two sounds on the same instrument can overlap. For instance, a note plays on an instrument, and 90 milliseconds later the same note plays on the same instrument, but the first note hasn't finished, so the first note plays into the second.
I just thought to explain all of that to be sure the situation is clear. In any case, I'd basically like to know how I would go about converting or "playing" an ORG (or if you like, a MIDI (since they're essentially the same)) into a WAV. As I mentioned each note does have a pan/balance, so the WAV would also need to be stereo.
If it matters at all, I'll be doing this in ActionScript 3.0 in FlashDevelop. I don't need any code (as that would be asking someone to do the work for me), but I just want to know how I would go about doing this correctly. An algorithm or two may be handy as well.
First, let me say AS3 is not the best language for this kind of thing. SuperCollider would be a better and easier choice.
But if you want to do it in AS3 here's a general approach. I haven't tested any of it, this is pure theory.
First, put all your sounds into an array, and then find a way of matching the notes from your MIDI file to a position in the array.
I don't know the MIDI format in depth, but I know the smallest unit of time is a tick, and the length of a tick depends on the BPM. The formula for calculating a MIDI tick is covered in the question "Midi Ticks to Actual PlayBack Seconds".
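As a rough sketch of that formula in Python: the length of a tick is 60 / (BPM × PPQ) seconds, where PPQ (ticks per quarter note) comes from the MIDI file header. The function name and the example values below are assumptions for illustration only.

```python
def seconds_per_tick(bpm: float, ppq: int) -> float:
    """Length of one MIDI tick in seconds.

    bpm: tempo in quarter notes per minute.
    ppq: ticks per quarter note, as stored in the MIDI header.
    """
    return 60.0 / (bpm * ppq)


# Example values (assumed): 120 BPM and 480 ticks per quarter note.
tick = seconds_per_tick(120, 480)
print(f"{tick * 1000:.4f} ms per tick")
```

An ORG file states its tempo directly as milliseconds per step, so for ORG input the step length can be used as-is instead of this conversion.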
Let's say your tick is 2ms in length. So now you have a base value. You can fill a Vector (like an Array but faster) with what happens at every tick. If nothing happens at a particular tick, then insert a null value.
Now the big problem is reading that Vector. It's a problem because the Timer class does not work at small values like 2ms. But what you can do is check the elapsed time in ms since the app started using getTimer(). You can have some loop that will check the elapsed time, and whenever you have 2ms more, you read the next index in the Vector. If there are notes on that index, you play the sounds. If not, you wait for the next tick.
The problem with this, is that if a loop goes on for more than 15 seconds (I'm not sure of that value) Flash will think the program is not responding and will kill it. So you have to take care of that too, ending the loop and opening a new one before Flash kills your program.
Ok, so now you have sounds playing. You can record the sounds that flash is making (wavs, mp3, mic) with a library called Standing Wave 3.
https://github.com/maxl0rd/standingwave3
This is very theoretical... and I'm quite sure depending on the number of sounds you want to play you can freeze your program... but I hope it will help to get you going.
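A different technique from the real-time playback described above is to mix offline: add each note's samples into a stereo buffer at the note's start offset, then write the buffer out as a WAV. The Python sketch below (not AS3) assumes unsigned 8-bit mono input samples, a simple linear pan where 0.0 is fully left and 1.0 is fully right, and hypothetical function names. Overlapping notes simply add together.

```python
import struct
import wave


def mix_to_stereo(notes, sample_rate=22050):
    """Mix notes into left/right float buffers.

    notes: iterable of (start_ms, samples_u8, pan), where samples_u8 is a
    bytes object of unsigned 8-bit mono samples and pan is 0.0..1.0.
    """
    notes = list(notes)
    total = max(
        (start_ms * sample_rate // 1000 + len(samples)
         for start_ms, samples, _ in notes),
        default=0,
    )
    left = [0.0] * total
    right = [0.0] * total
    for start_ms, samples, pan in notes:
        offset = start_ms * sample_rate // 1000
        for i, b in enumerate(samples):
            v = (b - 128) / 128.0          # unsigned 8-bit -> -1.0..1.0
            left[offset + i] += v * (1.0 - pan)
            right[offset + i] += v * pan
    return left, right


def write_wav(path, left, right, sample_rate=22050):
    """Write the mixed buffers as 16-bit stereo PCM, clipping to full scale."""
    frames = bytearray()
    for l, r in zip(left, right):
        for v in (l, r):
            v = max(-1.0, min(1.0, v))     # clip sums that exceed full scale
            frames += struct.pack("<h", int(v * 32767))
    with wave.open(path, "wb") as w:
        w.setnchannels(2)
        w.setsampwidth(2)
        w.setframerate(sample_rate)
        w.writeframes(bytes(frames))
```

Hard clipping is the crudest way to handle 16 instruments summing past full scale; scaling each voice down before mixing is a common alternative.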

How to calculate the time-length of a midi-file

I am reading MIDI files in AS3 (Flash CS5) with the help of a library called midas (http://code.google.com/p/midas3/), the MIDI-AS3 library.
I am trying to figure out a simple way to calculate the whole duration of the midi file (for example - total time of 4 minutes or 6 minutes...). I assume I could calculate the last note of each track + check the tempo and figure it out, but I was wondering if:
Is the duration of the MIDI file written somewhere in the data, so that I could just pull it out and use it?
or
Is there an easy way to calculate it without running through the whole file and comparing last notes/tempos?
Nope, you need to read the entire file and determine the time when you read the last note. MIDI files are essentially streaming data, so there is no "length" field in the file's header.
Edit: Upon further thought, "streaming" isn't exactly a great way to describe MIDI files. MIDI files do have a fixed length in bytes, which is stored in the IFF-style chunk header. However, there is no field for the length of the file in seconds. Assuming that you can read all of the bytes into a sequence (and don't forget to take tempo changes into account!), it should not be too difficult to determine the length of the file in seconds.
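As a minimal sketch of that calculation (in Python rather than AS3, and not using the midas API): walk the events in order, convert each delta time to seconds at the tempo currently in effect, and update the tempo whenever a tempo change is seen. The event representation below is hypothetical, and it assumes all tracks have already been merged into one time-ordered list. The default of 500000 microseconds per quarter note (120 BPM) is the standard MIDI default tempo.

```python
def midi_duration_seconds(events, ppq, tempo_us=500_000):
    """Total duration of a merged MIDI event list.

    events: iterable of (delta_ticks, new_tempo_us), where new_tempo_us is
            None for ordinary events, or microseconds per quarter note
            for a tempo change event.
    ppq:    ticks per quarter note from the file header.
    """
    seconds = 0.0
    for delta_ticks, new_tempo_us in events:
        # The delta elapses at the tempo that was active before this event.
        seconds += delta_ticks * tempo_us / (ppq * 1_000_000)
        if new_tempo_us is not None:
            tempo_us = new_tempo_us
    return seconds


# One beat at 120 BPM, a second beat ending in a tempo change to 240 BPM,
# then one beat at the new tempo.
example = [(480, None), (480, 250_000), (480, None)]
print(midi_duration_seconds(example, ppq=480))
```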

Syncing two AS3 NetStreams

I'm writing an app that requires an audio stream to be recording while a backing track is played. I have this working, but there is an inconsistent gap in between playback and record starting.
I don't know if I can do anything to make the sync perfect every time, so I've been trying to track what time each stream starts so I can calculate the delay and trim it server-side. This also has proved to be a challenge as no events seem to be sent when a connection starts (as far as I know). I've tried using various properties like the streams' buffer sizes, etc.
I'm thinking now that as my recorded audio is only mono, I may be able to put some kind of 'control signal' on the second stereo track which I could use to determine exactly when a sound starts recording (or stick the whole backing track in that channel so I can sync them that way). This leaves me with the new problem of properly injecting this sound into the NetStream.
If anyone has any idea whether or not any of these ideas will work, how to execute them, or some alternatives, that would be extremely helpful! I've been working on this issue for a while.
The only thing that comes to mind is to use metadata: Flash media streams support metadata and the onMetaData callback. I assume you're using Flash Media Server for the audio coming in and to record the audio going out. If you use the send method while you're streaming the audio back to the server, you can put the backing track's playhead timestamp in it, so that when both streams are on the server you can mux them together properly. You can also try encoding the audio that is streamed to the client with metadata and use onMetaData to sync them up. A second approach, though I'm not sure how to do it, is to combine the two streams as the audio goes back so that you don't need to mux them later, or to attach it to a blank video stream with two audio tracks.
If you want to inject something as complex as sound into the NetStream, I guess it would be better to go with Socket instead. You'll be reading bytes directly. It's possible that the NetStream is compressed, so the data sent is not raw sound data, and you would need some class to decode the codec. When you finally get the raw sound data, add the input in there, using Socket.readUnsignedByte() or Socket.readFloat(), and write back the modified data using Socket.writeByte() or Socket.writeFloat().
That is the alternative that injects the backing track into the audio.
For syncing, it is actually quite simple. Even though the data might not be sent instantly, one thing still stays the same: time. So, when the user's audio is finished, just mix it as-is with the backing track; the timing should stay the same.
If the user has a slow download connection, so that the backing track has unwanted breaks, check in the SWF whether enough data is buffered to add the next sound buffer (usually 4096 bytes, if I remember correctly). If yes, continue streaming the user's audio.
If not, do NOT stream, and start again as soon as the data catches up.
In my experience NetStream is one of the most inaccurate and dirty features of Flash (NetStream:play2 ?!!), which btw is quite ironic seeing how Flash's primary use is probably video playback.
Trying to sync it with anything else in a reliable way is very hard: events and statuses are not very straightforward, and there are multiple issues that can spoil your syncing.
Luckily however, netStream.time will tell you quite accurately the current playhead position, so you can eventually use that to determine starting time, delays, dropped frames, etc... Notice that determining the actual starting time is a bit tricky though. When you start loading a netStream, the time value is zero, but when it shows the first frame and is waiting for the buffer to fill (not playing yet) the time value is something like 0.027 (depends on the video), so you need to very carefully monitor this value to accurately determine events.
An alternative to using NetStream is embedding the video in a SWF file, which should make synchronization much easier (especially if you use frequent keyframes on encoding). But you will lose quality/filesize ratio (if I remember correctly, you can only use FLV, not H.264).
no events seem to be sent when a connection starts
Sure there are. NetStatusEvent.NET_STATUS fires for a multitude of reasons for NetConnections and NetStreams; you just have to add a listener and process the contents of NET_STATUS.info.
See the AS3 reference docs; you're looking for NET_STATUS.info.

What are the various browser data cache sizes?

Browsers render content after "enough" data has been received or once data stops flowing in (Content Length reached, for example).
I want to slowly stream data to the browser; to do this, I have to work around this data caching.
For example, instead of sending 40 bytes of JavaScript, I have to send the 40 bytes of JS followed by about 4 KB of spaces in order to get the browser to interpret the script.
This works fine. But I don't remember where I first heard the number "4 KB" and was wondering what the true required amount is per browser.
I could of course write a bunch of tests to find these numbers, but I was curious if anyone has already done this work for me. I am also at a loss for what to ask the Google regarding this.
If you want to know what response size browsers need before rendering content when you flush the response early, I found these numbers buried in a comment in a post about flushing the document early:
IE: 255 bytes
Safari: 1K
Chrome: 2K
If you're looking into this so that you can implement streaming, you might want to look into how various comet implementations handle this.
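For reference, the padding workaround described in the question can be sketched in a few lines of Python. The helper name `pad_chunk` is hypothetical, and the default of 4096 bytes is the unverified "4 KB" figure from the question, not a measured browser threshold. Note that it pads up to a total size rather than appending a fixed 4 KB.

```python
def pad_chunk(payload: bytes, min_size: int = 4096) -> bytes:
    """Append spaces so the chunk is at least min_size bytes long."""
    padding = max(0, min_size - len(payload))
    return payload + b" " * padding


script = b"<script>update(42);</script>"
chunk = pad_chunk(script)
print(len(script), len(chunk))
```

Whitespace padding is only safe where the browser ignores it, such as between script tags in an HTML response.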

How would you go about reverse engineering a set of binary data pulled from a device?

A friend of mine brought up this question the other day. He recently bought a Garmin heart rate monitor, which keeps track of his heart rate and allows him to upload a day's heart rate stats to his computer.
The only problem is that there are no Linux drivers for the Garmin USB device. He's managed to interpret some of the data, such as the model number and his user details, and has identified some binary data tables that we assume represent a series of heart rate recordings and the times they were taken.
Where does one start when reverse engineering data when you know nothing about the structure?
I had the same problem and initially found this project at Google Code that aims to complete a cross-platform version of tools for the Garmin devices; see http://code.google.com/p/garmintools/. There's a link on the front page of that project to the protocols you need, which Garmin was thoughtful enough to release publicly.
And here's a direct link to the Garmin I/O specification: http://www.garmin.com/support/pdf/IOSDK.zip
I'd start looking at the data in a hexadecimal editor, hopefully a good one which knows the most common encodings (ASCII, Unicode, etc.) and then try to make sense of it out of the data you know it has stored.
As another poster mentioned, reverse engineering can be hairy, not in practice but in legality.
That being said, you may be able to find everything related to your root question by checking out this project and its code. They handle the runner's heart rate/GPS combo data as well.
http://www.gpsbabel.org/
I'd suggest you start with checking the legality of reverse engineering in your country of origin. Most countries have very strict laws about what is allowed and what isn't regarding reverse engineering devices and code.
I would start by seeing what data is being sent by the device, then consider how such data could be represented and packed.
I would first capture many samples, and see if any pattern presents itself, since heart beat is something which is regular and that would suggest it is measurement related to the heart itself. I would also look for bit fields which are monotonically increasing, as that would suggest some sort of time stamp.
Having formed a hypothesis for what is where, I would write a program to test it and graph the results and see if it makes sense. If it does but not quite, then closer inspection would probably reveal you need some scaling factors here or there. It is also entirely possible I need to process the data first before it looks anything like what their program is showing, i.e. might need to integrate the data points. If I get garbage, then it is back to the drawing board :-)
I would also check the manufacturer's website, or maybe run strings on their binaries. Finding someone who works in the field of biomedical engineering would also be on my list, as they would probably know what protocols are typically used, if any. I would also look for these protocols and see if any could be applied to the data I am seeing.
I'd start by creating a hex dump of the data. Figure it's probably blocked in some power-of-two-sized chunks. Start looking for repeating patterns. Think about what kind of data they're probably sending. Either they're recording each heart beat individually, or they're recording whatever the sensor is sending at fixed intervals. If it's individual beats, then there's going to be a time delta (since the last beat), a duration, and a max or avg strength of some sort. If it's fixed intervals, then it'll probably be a simple vector of readings. There'll probably be a preamble of some sort, with a start timestamp and the sampling rate. You can try decoding the timestamp yourself, or you might try simply feeding it to ctime() and see if they're using standard absolute time format.
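A small Python sketch of the first two steps, the hex dump and the timestamp probe. It assumes the candidate timestamp is a 32-bit little-endian count of seconds since the Unix epoch, which is only a guess to test; the device may well use a different width, byte order, or epoch. It uses UTC via `time.gmtime` rather than `ctime()` so the result does not depend on the local time zone.

```python
import struct
import time


def hexdump(data: bytes, width: int = 16) -> str:
    """Classic offset / hex / ASCII dump, one line per `width` bytes."""
    lines = []
    for off in range(0, len(data), width):
        chunk = data[off:off + width]
        hexpart = " ".join(f"{b:02x}" for b in chunk)
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{off:08x}  {hexpart.ljust(width * 3)} {text}")
    return "\n".join(lines)


def probe_unix_time(data: bytes, offset: int) -> str:
    """Interpret 4 bytes at `offset` as little-endian Unix seconds (UTC)."""
    (value,) = struct.unpack_from("<I", data, offset)
    return time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(value))


blob = b"HRM1" + struct.pack("<I", 86400)   # invented header + timestamp
print(hexdump(blob))
print(probe_unix_time(blob, 4))
```

If the decoded date lands within the period the device was in use, the guess is probably right; a date decades off suggests a different epoch or field layout.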
Keep in mind that lots of cheap A/D converters only produce 12-bit outputs, so your readings are unlikely to be larger than 16 bits (and the high-order 4 bits may be used for flags). I'd recommend resetting the device so that it's "blank", dumping and storing the contents, then take a set of readings, record the results (whatever the device normally reports), then dump the contents again and try to correlate the recorded results with whatever data appeared after the "blank" dump.
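The blank-versus-recorded comparison can be sketched as a byte-level diff in Python. The function name and the sample dumps are hypothetical. It reports the half-open byte ranges that differ between the two dumps, which is where the new readings most likely live.

```python
def changed_ranges(before: bytes, after: bytes):
    """Return (start, end) half-open ranges where the two dumps differ.

    Bytes present in only one dump count as changed.
    """
    n = max(len(before), len(after))
    ranges = []
    start = None
    for i in range(n):
        a = before[i] if i < len(before) else None
        b = after[i] if i < len(after) else None
        if a != b:
            if start is None:
                start = i
        elif start is not None:
            ranges.append((start, i))
            start = None
    if start is not None:
        ranges.append((start, n))
    return ranges


blank = bytes(8)                                # dump after a reset
recorded = bytes([0, 0, 1, 2, 0, 0, 0, 9, 9])   # dump after some readings
print(changed_ranges(blank, recorded))
```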
Unsure if this is what you're looking for but Garmin has created an API that runs with your browser. It seems OSX is supported, as well as Windows browsers... I would try it from Google Chromium to see if it can be used instead of this reverse engineering...
http://developer.garmin.com/web-device/garmin-communicator-plugin/
API Features
Auto-detection of devices connected to a computer
Access to device product information like product name and software version
Read tracks, routes and waypoints from supported recreational, fitness and navigation devices
Write tracks, routes and waypoints to supported recreational, fitness and navigation devices
Read fitness data from supported fitness devices
Geo-code address and save to a device as a waypoint or favorite
Read and write Garmin XML files (GPX and TCX) as well as binary files.
Support for most Garmin devices (USB, USB mass-storage, most serial devices)
Support for Internet Explorer, Firefox and Chrome on Microsoft Windows.
Support for Safari, Firefox and Chrome on Mac OS X.
Can you synthesize a heart beat using something like a computer speaker? (I have no idea how such devices actually work). Watch how the binary results change based on different inputs.
Ripping apart the device and checking out what's inside would probably help too.