In pyaudio, how can I get a stream to 'consistently' produce sound? - pyaudio

I am using an endless while loop to convert byte data (one chunk at a time) to an integer value, so that I can 'manipulate' those values, convert them back to bytes, and write the bytes to the PyAudio stream (for sound).
Everything plays smoothly until I write a complex function that takes up too much processing time. Then I hear a bunch of pops, snaps and clicks over the audio. As far as I can tell, this happens because, between the time one chunk is written to the PyAudio stream and the time the loop comes around to write the next one, there is a 'transition' of silence while the stream 'waits' for the loop to repeat. That silence is what creates the pops between chunks if the loop is 'too slow'.
Is there a way to keep the 'voltage' going to the speakers 'constant' at the last data value provided to the PyAudio stream? This would be a great way to smooth out the 'pops', 'snaps' and 'clicks' when playing sound, instead of there being silence until the next value is passed to the stream. The reason I don't use a chunk size greater than 1 is that I want to do many 'creative' things with PyAudio (through an endless loop), and have values inside the loop determine the 'voltage level' going to the speakers.
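To make the "hold the last value" idea concrete, here is a minimal sketch of the byte/integer round trip and of a filler buffer that repeats the last sample. It assumes mono, 16-bit, little-endian PCM; the helper names bytes_to_sample and hold_buffer are made up for illustration and are not part of PyAudio.

```python
import struct

def bytes_to_sample(frame: bytes) -> int:
    """Decode one mono 16-bit little-endian frame into a signed integer."""
    return struct.unpack("<h", frame)[0]

def hold_buffer(value: int, frame_count: int) -> bytes:
    """Return frame_count copies of value as 16-bit little-endian PCM bytes.

    Hypothetical helper: the value is clamped to the int16 range first.
    """
    clamped = max(-32768, min(32767, value))
    return struct.pack("<h", clamped) * frame_count

last_value = bytes_to_sample(b"\x10\x27")  # bytes 0x10 0x27 decode to 10000
filler = hold_buffer(last_value, 4)        # four frames held at the last level
```

In PyAudio's callback mode (the stream_callback argument of open), a callback could return (hold_buffer(last_value, frame_count), pyaudio.paContinue) whenever the processing loop has no fresh data, so the output holds its level instead of dropping to silence. Whether that is enough to remove the clicks depends on how large the jump to the next real value is.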

Related

Is a single integer considered an "audio frame" in MediaCodec?

I read the following from the official docs on MediaCodec:
Raw audio buffers contain entire frames of PCM audio data, which is one sample for each channel in channel order. Each PCM audio sample is either a 16 bit signed integer or a float, in native byte order.
https://source.android.com/devices/graphics/arch-sh
The way I read this is that a buffer contains an entire frame of audio but a frame is just one signed integer. This doesn't seem to make sense. Or is this two values for the left and right audio? Why call it a buffer when it only contains a single value? To me, a buffer refers to several values spanning several milliseconds.
Here's what the docs for AudioFormat say:
For linear PCM, an audio frame consists of a set of samples captured at the same time, whose count and channel association are given by the channel mask, and whose sample contents are specified by the encoding. For example, a stereo 16 bit PCM frame consists of two 16 bit linear PCM samples, with a frame size of 4 bytes.
You are right that it doesn't make sense to use a buffer for just one frame. And in practice buffers are filled with many frames.
You can figure out the number of frames in a buffer from the size property of MediaCodec.BufferInfo and the frame size.
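As a rough illustration of that arithmetic (written in Python rather than against the Android API, with hypothetical parameter names), the frame count is the buffer size in bytes divided by the frame size, where the frame size is the channel count times the bytes per sample:

```python
def frames_in_buffer(size_bytes: int, channel_count: int, bytes_per_sample: int) -> int:
    """Number of whole PCM frames in a buffer of size_bytes bytes."""
    frame_size = channel_count * bytes_per_sample  # e.g. stereo 16-bit -> 4 bytes
    if size_bytes % frame_size != 0:
        raise ValueError("buffer does not hold a whole number of frames")
    return size_bytes // frame_size

# A 4096-byte buffer of stereo 16-bit PCM
print(frames_in_buffer(4096, channel_count=2, bytes_per_sample=2))  # 1024
```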

Zero-padded h264 in mdat

I'd like to do some stuff with h.264 data recorded from Android phone.
My colleague told me there should be 4 bytes right after mdat which specify the NALU size, then one byte of NALU metadata, then the raw data, and then (after that many bytes) another 4 bytes with the next NALU size, and so on.
But I have a lot of zeros right after mdat:
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0e00000000000000000000000000000000000000000000000000000000000000
8100000000000000000000000000000000000000000000000000000000000000
65b84f0f87fa890001022e7fcc9feef3e7fabb8e0007a34000f2bbefffd07c3c
bfffff08fbfefff04355f8c47bdfd05fd57b1c67c4003e89fe1fe839705a699d
c6532fb7ecacbfffe82d3fefc718d15ffffbc141499731e666f1e4c5cce8732f
bf7eb0a8bd49cd02637007d07d938fd767cae34249773bf4418e893969b8eb2c
Before the mdat atom there are just ftyp mp42, isom mp42 and free atoms. All other atoms (moov, ...) are at the end of the file (that's what Android does when it writes to a socket and not to a file). But if necessary, I've got the PPS and SPS from another file recorded with the same camera and encoder settings just a second before this one.
So how exactly can I get the NALUs from that?
You can't. The moov atom contains the information required to parse the mdat; without it the mdat has little value. For instance, the first NALU does not need to start at the beginning of the mdat. It can start anywhere within the mdat, and the byte it starts at is recorded in (I believe) the stco box. If the file has audio, you will find audio and video mixed within the mdat, with no way to tell which is which without the chunk offsets. In addition, if the video has B-frames, there is no way to determine render order without the cts, which again is only available in the moov. And technically, the NALU size field does not need to be 4 bytes, and you can't know its width without the moov. I recommend not using mp4. Use a streamable container such as ts or flv. Now, if you can make some assumptions about the code that is producing the file, such as the chunk offset always being the same and there being no B-frames, you can hard-code these values. But that is not guaranteed to work after a software update.
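If you do go the hard-coded route, the walk over length-prefixed NALUs could look like the sketch below. It is only valid under the assumptions just listed: the payload is video only, and the start offset and the width of the size field (both of which would normally come from the moov) are supplied by you. The function names and parameters are made up for illustration.

```python
def split_nalus(mdat_payload: bytes, start: int = 0, length_size: int = 4) -> list:
    """Split length-prefixed NAL units out of a video-only mdat payload.

    Assumptions (not recoverable without the moov): `start` is the offset of
    the first NALU and `length_size` is the width of each size field in bytes.
    """
    nalus = []
    pos = start
    while pos + length_size <= len(mdat_payload):
        size = int.from_bytes(mdat_payload[pos:pos + length_size], "big")
        pos += length_size
        if size == 0 or pos + size > len(mdat_payload):
            break  # padding or a truncated unit: stop instead of guessing
        nalus.append(mdat_payload[pos:pos + size])
        pos += size
    return nalus

def nalu_type(nalu: bytes) -> int:
    """H.264 nal_unit_type: the low 5 bits of the first NALU byte."""
    return nalu[0] & 0x1F

# Synthetic payload: one 2-byte NALU (type 5, IDR slice) and one 1-byte NALU (type 1)
payload = (2).to_bytes(4, "big") + b"\x65\xb8" + (1).to_bytes(4, "big") + b"\x41"
units = split_nalus(payload)
```

The sketch stops at the first zero-length or truncated unit rather than trying to resynchronise, because without the chunk offsets there is no reliable way to tell padding from data.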

as3 bytearray splice

I'm pretty much an ActionScript novice and I'm just trying to slice the first and last X bytes out of a byte array in AS3, and I can't seem to find anything anywhere on how to do that.
If it matters, the byte array is a set of floats recorded from a microphone that I'm trying to cut the first and last 1/4 of a second off of before it's encoded as a .wav file.
Assuming you have an existing ByteArray, let's call it rawBytes:
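var trimmedBytes:ByteArray = new ByteArray();
var quarterSecond:int = 1000; // no. bytes per 1/4 second (arbitrary estimate)
// readBytes()'s second argument is the write offset into trimmedBytes, not a read offset,
// so skip the first quarter second by moving the read position instead.
rawBytes.position = quarterSecond;
rawBytes.readBytes(trimmedBytes, 0, rawBytes.length - quarterSecond * 2);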
Your trimmedBytes variable will now be populated with the recording minus the first and last quarter second, assuming that the quarterSecond variable has the right value. I don't know what that value should be; I'd imagine it depends on the sample rate and sample size at which you're recording. You could probably get there via trial and error though!
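Rather than trial and error, the byte count can be worked out from the recording format. The sketch below (in Python, for illustration) assumes the samples are mono 32-bit floats at 44100 Hz, which is an assumption about the microphone setup and not something stated in the question; adjust the parameters to match the actual recording.

```python
def bytes_per_quarter_second(sample_rate_hz: int, channels: int = 1,
                             bytes_per_sample: int = 4) -> int:
    """Bytes covering 1/4 second of audio, rounded down to a whole frame."""
    frames = sample_rate_hz // 4
    return frames * channels * bytes_per_sample

def trim_quarter_seconds(raw: bytes, sample_rate_hz: int) -> bytes:
    """Drop the first and last quarter second from a raw sample buffer."""
    n = bytes_per_quarter_second(sample_rate_hz)
    return raw[n:len(raw) - n]

# Assumed format: 44100 Hz, mono, 4-byte floats -> 44100 bytes per quarter second
quarter = bytes_per_quarter_second(44100)
```

Rounding down to a whole frame keeps the cut aligned to sample boundaries, so the remaining floats are not split in the middle.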

Unlimited Map Dimensions for a game in AS3

Recently I've been planning out how I would run a game with an environment/map that is capable of unlimited dimensions (unlimited being a loose term, as there are obviously limitations on how much data can be stored in memory, etc). I've achieved this using a "grid" that contains level data stored as a String that can be converted to a 2D Array that would represent objects and their properties.
Here's an example of two objects stored as a String:
"game.doodads.Tree#200#10#terrain$game.mobiles.Player#400#400#mobiles"
The "grid" is a 3D Array, of which the contents would represent the x/y coordinate of the grid cell. The grid cells would be, say, 600x600.
An example of this "grid" Array would be as follows:
var grid:Array = [[["leveldata: 0,0"],["leveldata 0,1"]],
[["leveldata: 1,0"],["leveldata 1,1"]]];
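For illustration, the conversion between a cell's String and the 2D Array of objects could look like the sketch below (Python rather than AS3). It assumes "$" separates objects and "#" separates an object's fields, as in the example string above; the function names are hypothetical.

```python
def parse_cell(data: str) -> list:
    """Turn a cell's level string into a list of [class, x, y, layer] records."""
    if not data:
        return []
    return [record.split("#") for record in data.split("$")]

def serialize_cell(objects: list) -> str:
    """Inverse of parse_cell: join fields with '#' and objects with '$'."""
    return "$".join("#".join(fields) for fields in objects)

cell = "game.doodads.Tree#200#10#terrain$game.mobiles.Player#400#400#mobiles"
objects = parse_cell(cell)
# objects[0] is ['game.doodads.Tree', '200', '10', 'terrain']
```

All fields stay strings here; converting the coordinates to numbers is left to whatever code instantiates the objects.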
The environment will handle loading a grid square and its 8 surrounding squares based on a given point, i.e. the position of the Player. It would have a function along the lines of
function loadCells(xp:int, yp:int):void
This would also handle the unloading of the previously loaded cells that are no longer close enough to be required. In the unload process, the data at grid[x][y] would be overwritten with the new data, which is created by looping through the objects in that cell and appending each new set of data to the grid cell data.
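A minimal sketch of the bookkeeping behind such a function, in Python for illustration: work out the 3x3 block of cells around a point, then compare it with what is currently loaded. The names cells_around and diff_cells are hypothetical, and the sketch allows negative cell indices, which a plain nested Array would not.

```python
def cells_around(xp: int, yp: int, cell_size: int = 600) -> set:
    """Grid coordinates of the cell containing (xp, yp) and its 8 neighbours."""
    cx, cy = xp // cell_size, yp // cell_size
    return {(cx + dx, cy + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)}

def diff_cells(loaded: set, xp: int, yp: int, cell_size: int = 600) -> tuple:
    """Return (cells to load, cells to unload) for a new player position."""
    wanted = cells_around(xp, yp, cell_size)
    return wanted - loaded, loaded - wanted

loaded = cells_around(10, 10)                # player starts in cell (0, 0)
to_load, to_unload = diff_cells(loaded, 650, 10)  # player moves into cell (1, 0)
```

Moving one cell to the right loads the three cells in the new right-hand column and unloads the three in the old left-hand column; the other six stay as they are.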
Everything works fine in terms of when you move in a direction, cells are unloaded/saved and new cells are loaded. The problem is this:
Say this is a large city infested by zombies. If you walk three grid squares in any direction and return, everything is as you left it. I'm struggling to find a way to at least simulate all objects still moving around and doing their thing. It looks silly when you for example throw a grenade, walk away, return and the grenade still hasn't detonated.
I've considered storing a timestamp on each object when I unload the level, and when it's initialized there's a loop that runs its "step" function the corresponding number of times. The problem here is obviously that when you come back 5 minutes later, 20 zombies are going to try and step 248932489 times and the game will crash.
Ideas?
I don't know AS3 but let me try give you some tips.
It seems you want to make a seamless world, since you load / unload cells as the player moves. That's a good idea. What you should do here is decide what data a cell should load / unload (or, even further, what data a cell should hold or handle).
For example, the grenade should not be unloaded by the cell as it needs to be updated even after the player leaves the cell. It's NOT a good idea for a cell to manage all game objects just because they are located in the cell. Instead, the player object could handle the grenade as an owner. Or, there could be one EntityManager which handles all game entities like grenades or zombies.
With this idea, you can update your 20 zombies even when they are in an unloaded zone (their location does not matter anymore) instead of calling update() 248932489 times at once. However, do you really need to keep updating the zombies? Perhaps unloading all of them and spawning new zombies, matching the number that were last active in the cell, would work. It depends on your game design but, usually, you don't have to update entities which are not visible or are far enough from the player. Hope it helps. Good luck! :D
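A rough sketch of the EntityManager idea, written in Python since I don't know AS3: entities are registered with one manager and updated every tick, regardless of which cell is loaded. The class and method names here are invented for the example.

```python
class Grenade:
    """Example entity: detonates when its fuse runs out."""
    def __init__(self, fuse_seconds: float):
        self.fuse_seconds = fuse_seconds
        self.detonated = False

    def update(self, dt: float) -> None:
        self.fuse_seconds -= dt
        if self.fuse_seconds <= 0:
            self.detonated = True

    @property
    def alive(self) -> bool:
        return not self.detonated

class EntityManager:
    """Owns entities independently of the grid cells they happen to be in."""
    def __init__(self):
        self.entities = []

    def add(self, entity) -> None:
        self.entities.append(entity)

    def update(self, dt: float) -> None:
        for entity in self.entities:
            entity.update(dt)
        # Drop entities that finished during this tick
        self.entities = [e for e in self.entities if e.alive]

manager = EntityManager()
grenade = Grenade(fuse_seconds=3.0)
manager.add(grenade)
for _ in range(3):       # three one-second ticks while the player is elsewhere
    manager.update(1.0)
```

Because the grenade belongs to the manager and not to a cell, unloading the cell never pauses its fuse.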
Interesting problem. Of course, a game cannot simulate an unlimited environment precisely. If you left some zone for a few minutes, you don't need every step of the zombies (or any actors there) to be simulated precisely. Every game has its own simplifications. Maybe approximate simulation will help you. For example, if a survivor was in a heavily infested zone for a long time, your simulator could decide that he turned into a zombie, without computing every step of the process. Or, if a horde was rampant in some part of the city, that part should be damaged randomly.
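One way to picture such an approximation, sketched in Python with made-up function names and tuning values: when a cell is reloaded, use the elapsed time since it was unloaded to jump each object straight to a plausible state in a single call, instead of replaying every step.

```python
import random

def catch_up_grenade(fuse_left: float, elapsed: float) -> tuple:
    """Resolve a grenade in one call: returns (remaining fuse, detonated?)."""
    remaining = max(0.0, fuse_left - elapsed)
    return remaining, remaining == 0.0

def catch_up_zombie(x: float, y: float, elapsed: float, speed: float = 20.0,
                    cell_size: float = 600.0, rng=random) -> tuple:
    """Move a zombie once to a random spot it could plausibly have reached.

    Assumption: zombies wander at most `speed` units per second and stay
    inside their cell, so the displacement is capped at the cell size.
    """
    reach = min(speed * elapsed, cell_size)
    new_x = min(max(x + rng.uniform(-reach, reach), 0.0), cell_size)
    new_y = min(max(y + rng.uniform(-reach, reach), 0.0), cell_size)
    return new_x, new_y

# Player returns after 5 minutes (300 seconds)
fuse, exploded = catch_up_grenade(3.0, 300.0)
zx, zy = catch_up_zombie(100.0, 100.0, 300.0, rng=random.Random(1))
```

The cost per object is constant no matter how long the player was away, which is what avoids the runaway step loop.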
I see two methods as to how you could handle this issue.
First: have the engine keep any active cells loaded, where active means there are object-initiated events involving cell-owned objects OR player-owned objects (if you have made such a distinction, that is). Each time such a cell is excluded from normal unloading behaviour it is assigned a number, and if memory runs out the cells with the lowest numbers are unloaded. This is probably the hardest method code-wise, but it might be the only one that truly does what you want.
Second: use very small cells and let the engine keep a narrow path loaded. The cells might then be 100x100 instead of 600x600, with 36 of them doing the work of one bigger cell (risk: more clutter code-wise). Each cell your player actually traverses (not all those loaded to produce a natural visibility range) is then kept in memory for a limited amount of time (e.g. 5 minutes), as is every cell that has player-owned objects in it.
I believe you can find out how to check for these conditions upon unloading yourself and hope to have been of help to you.

How to use FFT for large chunks of data to plot amplitude-frequency response?

I am a programmer and not a good mathematician, so FFT is like a black box to me. I would like to throw some data into an FFT library and get out plottable AFR (amplitude-frequency response) data, like some software such as RightMark Audio does:
http://www.ixbt.com/proaudio/behringer/3031a/fr-hf.png
Now I have a system which plays back a logarithmic swept sine (with short fade-in/fade-out to avoid sharp edges) and records the response from the audio system.
As far as I understand, I need to pad the input with zeros to a length of 2^n, use the audio samples as the real parts of complex numbers, set the imaginary parts to 0, and I'll get back from the FFT a frequency bins array with half the length of the input data.
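To check that understanding against concrete numbers, here is a small Python sketch of the padding step and of which frequency each of the N/2 bins corresponds to. The helper names are made up, and the 44100 Hz sample rate in the example is an assumption.

```python
def next_power_of_two(n: int) -> int:
    """Smallest power of two that is >= n."""
    p = 1
    while p < n:
        p *= 2
    return p

def pad_to_power_of_two(samples: list) -> list:
    """Append zeros until the length is a power of two."""
    size = next_power_of_two(len(samples))
    return list(samples) + [0.0] * (size - len(samples))

def bin_frequencies(fft_size: int, sample_rate_hz: float) -> list:
    """Centre frequency in Hz of each of the first fft_size / 2 bins."""
    return [k * sample_rate_hz / fft_size for k in range(fft_size // 2)]

padded = pad_to_power_of_two([1.0] * 1000)   # 1000 samples -> 1024
freqs = bin_frequencies(len(padded), 44100)  # 512 bins, about 43 Hz apart
```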
But if I do not need as fine a frequency resolution as an audio buffer several seconds long gives me, then what is the right way to use, let's say, a 1024-sample FFT window, feed it chunks of audio and get back 512 frequency points which take into account all the data I passed in? Or maybe this is not possible and I need to feed the entire swept sine at once to get back all the AFR data I need?
Also is there any smoothing needed? I have seen that the raw output from FFT may be really noisy. What is the right way to avoid the noise as early as possible, so I see the noise only as it comes from the AFR itself and not from FFT calculations (like the image in the link I have given - it seems pretty smooth)?
I am a C++/C# programmer. I would be grateful for any examples which show how to process chunks of a swept sine and get back AFR data. For now I have found only examples which process data in small chunks in real time, and that is not what I need.
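One common way to get a fixed number of frequency points out of a long recording is to window overlapping chunks, transform each one, and average the magnitude spectra. The Python sketch below shows that idea under stated assumptions: it uses a Hann window and 50% overlap as typical defaults, and a naive DFT stands in for a real FFT library call so the example has no dependencies. Whether averaging or a per-bin maximum suits a swept sine better is a measurement question this sketch does not settle.

```python
import cmath
import math

def hann(n: int) -> list:
    """Hann window of length n (one common choice of window)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def magnitude_spectrum(chunk: list) -> list:
    """Magnitudes of the first n/2 DFT bins (naive O(n^2) stand-in for an FFT)."""
    n = len(chunk)
    return [abs(sum(chunk[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2)]

def averaged_spectrum(samples: list, fft_size: int = 1024, hop: int = 0) -> list:
    """Average the windowed magnitude spectra of overlapping chunks."""
    hop = hop or fft_size // 2           # default: 50% overlap
    window = hann(fft_size)
    total = [0.0] * (fft_size // 2)
    count = 0
    for start in range(0, len(samples) - fft_size + 1, hop):
        chunk = [s * w for s, w in zip(samples[start:start + fft_size], window)]
        for k, mag in enumerate(magnitude_spectrum(chunk)):
            total[k] += mag
        count += 1
    if count == 0:
        raise ValueError("need at least fft_size samples")
    return [value / count for value in total]

# Small demo: a sine that sits exactly on bin 8 of a 64-point transform
signal = [math.sin(2 * math.pi * 8 * t / 64) for t in range(256)]
spectrum = averaged_spectrum(signal, fft_size=64)
peak_bin = spectrum.index(max(spectrum))
```

With a real FFT routine in place of magnitude_spectrum, a 1024-sample window yields the 512 points asked about, and every chunk of the recording contributes to them.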
A window function should help you reduce the noise.
All you need to do is multiply your input data by w(n) :