Behavior of Pa_GetDeviceInfo() and PaAsio_GetAvailableBufferSizes() - pyaudio

I'm building a simple application using PortAudio, but there is something I can't quite explain when it comes to ASIO drivers and the behavior of Pa_GetDeviceInfo() and PaAsio_GetAvailableBufferSizes().
Let's say I use an external application to record audio from my ASIO device at 48 kHz.
At this point, if I run the "pa_devs" example that ships with PortAudio, I get the following output:
[ Default ASIO Input, Default ASIO Output ]
Name = xxx
Host API = ASIO
Max inputs = 8, Max outputs = 8
Default low input latency = 0.0196
Default low output latency = 0.0196
Default high input latency = 0.0196
Default high output latency = 0.0196
ASIO minimum buffer size = 864
ASIO maximum buffer size = 864
ASIO preferred buffer size = 864
ASIO buffer granularity = 0
Default sample rate = 44100.00
Now if I make a second recording at 192 kHz, and subsequently run "pa_devs" again, I get this:
[ Default ASIO Input, Default ASIO Output ]
Name = xxx
Host API = ASIO
Max inputs = 8, Max outputs = 8
Default low input latency = 0.0784
Default low output latency = 0.0784
Default high input latency = 0.0784
Default high output latency = 0.0784
ASIO minimum buffer size = 3456
ASIO maximum buffer size = 3456
ASIO preferred buffer size = 3456
ASIO buffer granularity = 0
Default sample rate = 44100.00
So what seems to be happening is that the buffer size is automatically adjusted based on the sample rate, to maintain a fixed latency in absolute terms:
864/48000 = 3456/192000 = 18 ms
Now here is the real question. Unlike "pa_devs", my own application doesn't start and stop between recordings, but runs continuously on the side.
However, if between recordings my app calls Pa_GetDeviceInfo() and PaAsio_GetAvailableBufferSizes(), I invariably get the same latencies and the same ASIO buffer sizes. Basically, it doesn't seem to see that the external recording application has changed the sample rate between successive calls. Any ideas what could be causing that?
[edit]: Using pyaudio builds from here (with ASIO support) as a different observation mechanism, I find that executing pyaudio.PyAudio().get_device_info_by_index(N) also invariably returns the same answer, even though the sample rate has been changed (which I can also monitor in the ASIO control panel).
As yet another observation mechanism, I have an Audio Precision box. Its ASIO connector settings instantly reflect any change of sample rate or buffer size as soon as recording starts in the external application. How come pyaudio and my own app do not get notified of the changes too?

At least now I think I understand the root cause of the problem. I'm not sure whether it's a bug or a design intent.
When device parameters are changed externally (e.g. the current or default sample rate), PortAudio won't reflect these changes (in terms of sample rate, but also buffer sizes and latencies) until every previous call to Pa_Initialize() has been closed by a Pa_Terminate() and Pa_Initialize() is eventually called again.
So if an app calls Pa_Initialize() at startup, it will get the same replies from all subsequent calls to Pa_GetDeviceInfo() and PaAsio_GetAvailableBufferSizes(), even if device properties are changed externally in the meantime.
Instead, it seems an app should call Pa_Initialize() and Pa_Terminate() in pairs every time it wants to do anything, even something as simple as Pa_GetDeviceInfo() (a minimal sketch of this pattern follows the questions below). This raises further questions:
What about pyaudio? It exposes terminate() but doesn't expose initialize(). I haven't found a way to force an update of device properties; even reload(pyaudio) doesn't do the trick.
What about apps that run concurrently and load the same shared portaudio library? If they make interleaved calls to Pa_Initialize() and Pa_Terminate(), they presumably won't get correct values?
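To illustrate what I mean, here is a minimal C sketch of that re-initialize pattern, assuming an ASIO-enabled PortAudio build and a valid device index; the function name and the abbreviated error handling are just for the example:

#include <stdio.h>
#include "portaudio.h"
#include "pa_asio.h"

/* Query a device inside its own Pa_Initialize()/Pa_Terminate() pair, so that
 * PortAudio re-scans the hardware instead of returning the snapshot taken at
 * the first Pa_Initialize(). */
static void query_asio_device(PaDeviceIndex device)
{
    if (Pa_Initialize() != paNoError)
        return;

    const PaDeviceInfo *info = Pa_GetDeviceInfo(device);
    long minSize, maxSize, preferredSize, granularity;

    if (info != NULL &&
        PaAsio_GetAvailableBufferSizes(device, &minSize, &maxSize,
                                       &preferredSize, &granularity) == paNoError)
    {
        printf("%s: default rate %.0f Hz, preferred ASIO buffer %ld frames\n",
               info->name, info->defaultSampleRate, preferredSize);
    }

    Pa_Terminate();
}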

Related

Pytorch DirectML computational inconsistency

I am trying to train a DQN on the OpenAI LunarLander environment. I included an argument parser to control which device I use in different runs (CPU or GPU computing, via PyTorch's to("cpu") or to("dml") calls).
Here is my code:
# Putting networks to either CPU or DML e.g. .to("cpu") for CPU .to("dml") for Microsoft DirectML GPU computing.
self.Q = self.Q.to(self.args.device)
self.Q_target = self.Q_target.to(self.args.device)
However, pytorch-directml does not yet support some methods such as .gather(), .max(), F.mse_loss(), etc. That is why I need to move the data from the GPU to the CPU, do those computations, calculate the loss, and put everything back on the GPU for the further steps. See below.
Q_targets_next = self.Q_target(next_states.to("cpu")).detach().max(1)[0].unsqueeze(1).to("cpu") # Calculate target value from the Bellman equation
Q_targets = (rewards.to("cpu") + self.args.gamma * Q_targets_next.to("cpu") * (1 - dones.to("cpu"))) # Calculate expected value from local network
Q_expected = self.Q(states).contiguous().to("cpu").gather(1, actions.to("cpu"))
# Calculate loss (on CPU)
loss = F.mse_loss(Q_expected, Q_targets)
self.optimizer.zero_grad()
loss.backward()
self.optimizer.step()
# Put the networks back to DML
self.Q = self.Q.to(self.args.device)
self.Q_target = self.Q_target.to(self.args.device)
The strange thing is this:
The code is bug-free: when I run it with args.device = "cpu" it works perfectly; however, when I run the exact same code with args.device = "dml", it performs terribly and the network does not learn anything.
I noticed that in every iteration the CPU and GPU results differ just slightly (on the order of 1e-5), but after many iterations this compounds into a huge difference, and the GPU and CPU results end up almost completely different.
What am I missing here? Is there something I need to pay attention to when moving tensors between CPU and GPU? Should I make them contiguous()? Or is this simply a bug in the pytorch-directml library?

Do the number of parameters and the computation time per epoch increase with the number of channels of the input image?

I have different sets of images at the input (i.e. 56x56x3, 56x56x5, 56x56x10, 56x56x30, 56x56x61) with the same network.
1) Will the number of network parameters be the same for each input?
2) The computation time per epoch is slightly higher as the number of input channels increases; is that normal?
UPDATE
Parameter calculation for 3 channels
3*3*3*64 = 1728
3*3*64*128 = 73728
3*3*128*256 = 294912
5*5*256*512 = 3276800
1*1*512*1024 = 524288
1*1*1024*4 = 4096
Parameter calculation for 10 channels
3*3*10*64 = 5760
3*3*64*128 = 73728
3*3*128*256 = 294912
5*5*256*512 = 3276800
1*1*512*1024 = 524288
1*1*1024*4 = 4096
To perform a convolution, a kernel (or filter) must have the same number of channels as the input feature map (or image) of the corresponding layer. The number of parameters for that layer is therefore:
number of kernels x kernel height x kernel width x number of channels in the kernel
So the parameter count of the first layer is directly proportional to the number of input channels, while the later layers are unaffected; that is why your two calculations above differ only in the first entry. More input channels also mean more multiply-accumulate operations in that first layer, hence the slightly increased computation time per epoch.
You may see the detailed explanation of convolution operation in my post here.
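If it helps, here is a small, purely illustrative C program that redoes the arithmetic above (layer shapes taken from your calculation, biases ignored, conv_params is just a helper for this sketch) and shows that only the first layer's count changes:

#include <stdio.h>

/* Parameters of one convolutional layer: kernel height x width x input channels x output channels. */
static long conv_params(int kh, int kw, int in_ch, int out_ch)
{
    return (long)kh * kw * in_ch * out_ch;
}

int main(void)
{
    int in_channels[] = { 3, 10 };
    for (int i = 0; i < 2; i++) {
        int c = in_channels[i];
        long total = conv_params(3, 3, c, 64)      /* only this first layer depends on the input depth */
                   + conv_params(3, 3, 64, 128)
                   + conv_params(3, 3, 128, 256)
                   + conv_params(5, 5, 256, 512)
                   + conv_params(1, 1, 512, 1024)
                   + conv_params(1, 1, 1024, 4);
        printf("%2d input channels -> %ld parameters\n", c, total);
    }
    return 0;   /* prints 4175552 for 3 channels and 4179584 for 10 channels */
}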

Windows Phone IsolatedStorageSettings: capacity and dynamic allocation

I am going to save fairly large amounts of data in my WP8 app using the handy IsolatedStorageSettings dictionary. However, the first question that arises is: how big is it?
Second, in the documentation for the IsolatedStorageSettings.Save method we can find this:
If more space is required, use the IsolatedStorageFile.IncreaseQuotaTo
method to request more storage space from the host.
Can we estimate the amount of required memory and increase the room for IsolatedStorageSettings accordingly? What if we need to do that dynamically, as the user enters new portions of data to be stored persistently? Or maybe we need to use another technique for that (though I would like to stay with the handy IsolatedStorageSettings class)?
I have found the answer to the first part of my question in this article: How to find out the Space in isolated storage in Windows Phone?. Here is the code to get the required values on a particular device, with some enhancements:
long availablespace, Quota;
using (var store = IsolatedStorageFile.GetUserStoreForApplication())
{
    availablespace = store.AvailableFreeSpace;
    Quota = store.Quota;
}
MessageBox.Show("Available : " + availablespace.ToString("##,#") +
                "\nQuota : " + Quota.ToString("##,#"));
The 512 MB WP8 emulator gave me the following values for a minimal app with a few strings saved in IsolatedStorageSettings:
A Lumia 920 reports an even bigger value - about 20 GB - which gladdens my heart. Such a big value (which, I think, depends on the total memory available on the device) will allow me to use the IsolatedStorageSettings object for huge amounts of data.
As for a method to estimate the amount of data, I guess this can only be done experimentally. For instance, when I added some strings to my IsolatedStorageSettings, the available space was reduced by 4 KB. However, adding the same portion of data again did not show any new memory allocation. As far as I can see, it is allocated in 4 KB blocks.

How to diag imprecise bus fault after config of priority bit allocation, Cortex M3 STM32F10x w uC/OS-III

I have an issue in an app written for the ST Microelectronics STM32F103 (ARM Cortex-M3 r1p1). RTOS is uC/OS-III; dev environment is IAR EWARM v. 6.44; it also uses the ST Standard Peripheral Library v. 1.0.1.
The app is not new; it's been in development and in the field for at least a year. It makes use of two UARTs, I2C, and one or two timers. Recently I decided to review interrupt priority assignments, and I rearranged priorities as part of the review (things seemed to work just fine).
I discovered that there was no explicit allocation of group and sub-priority bits in the initialization code, including the RTOS, and so to make the app consistent with another app (same product, different processor) and with the new priority scheme, I added a call to NVIC_PriorityGroupConfig(), passing in NVIC_PriorityGroup_2. This sets the PRIGROUP value in the Application Interrupt and Reset Control Register (AIRCR) to 5, allocating 2 bits for group (preemption) priority and 2 bits for subpriority.
After doing this, I get an imprecise bus fault exception on execution, not immediately but very quickly thereafter. (More on where I suspect it occurs in a moment.) Since it's imprecise (BFSR.IMPRECISERR asserted), there's nothing of use in BFAR (BFSR.BFARVALID clear).
The STM32F10x family implements 4 bits of priority. While I've not found this stated explicitly anywhere, these are apparently the most significant nibble of the priority byte. This assumption seems to be validated by the PRIGROUP table in the STM32F10xxx/20xxx/21xxx/L1xxx Cortex-M3 Programming Manual (Doc 15491, Rev 5), sec. 4.4.5, Application interrupt and reset control register (SCB_AIRCR), Table 45, Priority grouping, p. 134.
In the ARM scheme, priority values comprise some number of group or preemption priority bits and some number of subpriority bits. Group priority bits are upper bits; subpriority are lower. The 3-bit AIRCR.PRIGROUP value controls how bit allocation for each are defined. PRIGROUP = 0 configures 7 bits of group priority and 1 bit of subpriority; PRIGROUP = 7 configures 0 bits of group priority and 8 bits of subpriority (thus priorities are all subpriority, and no preemption occurs for exceptions with settable priorities).
The reset value of AIRCR.PRIGROUP is defined to be 0.
For the STM32F10x, since only the upper 4 bits are implemented, it seems to follow that PRIGROUP = 0, 1, 2, 3 should all be equivalent, since they all correspond to >= 4 bits of group priority.
Given that assumption, I also tried calling NVIC_PriorityGroupConfig() with a value of NVIC_PriorityGroup_4, which corresponds to a PRIGROUP value of 3 (4 bits group priority, no subpriority).
This change also results in the bus fault exception.
Unfortunately, the STM32F103 is, I believe, r1p1, and so does not implement the Auxiliary Control Register (ACTLR; introduced in r2p0), so I can't try out the DISDEFWBUF bit (disables use of the write buffer during default memory map accesses, making all bus faults precise at the expense of some performance reduction).
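(For reference only, and purely as an illustration of what that would look like on an r2p0 or later core - not on the r1p1 part in question - the write buffer can be disabled through ACTLR at 0xE000E008, which newer CMSIS headers expose as SCnSCB->ACTLR; the macro names below are just for this sketch:)

#include <stdint.h>

#define CM3_ACTLR        (*(volatile uint32_t *)0xE000E008u)  /* Auxiliary Control Register, r2p0+ only */
#define ACTLR_DISDEFWBUF (1u << 1)                            /* disable write buffering -> precise bus faults */

static void make_bus_faults_precise(void)
{
    /* Costs some performance; typically enabled only in debug builds. */
    CM3_ACTLR |= ACTLR_DISDEFWBUF;
}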
I'm almost certain that the bus fault occurs in an ISR, and most likely in a UART ISR. I've set a breakpoint at a particular place in code, started the app, and had the bus fault before execution hit the breakpoint; however, if I step through code in the debugger, I can get to and past that breakpoint's location, and if I allow it to execute from there, I'll see the bus fault some small amount of time after I continue.
The next step will be to attempt to pin down what ISR is generating the bus fault, so that I can instrument it and/or attempt to catch its invocation and step through it.
So my questions are:
1) Anyone have any suggestions as to how to go about identifying the origin of imprecise bus fault exceptions more intelligently?
2) Why would setting PRIGROUP = 3 change the behavior of the system when PRIGROUP = 0 is the reset default? (PRIGROUP=0 means 7 bits group, 1 bit sub priority; PRIGROUP=3 means 4 bits group, 4 bits sub priority; STM32F10x only implements upper 4 bits of priority.)
Many, many thanks to everyone in advance for any insight or non-NULL pointers!
(And of course if I figure it out beforehand, I'll update this post with any information that might be useful to others encountering the same sort of scenario.)
Even if BFAR is not valid, you can still read the other related registers within your fault handler:
#include <stdio.h>
#include "stm32f10x.h"   /* pulls in the CMSIS core header that defines SCB */

/* hardfault_args points at the stacked exception frame: R0-R3, R12, LR, PC, xPSR. */
void HardFault_Handler_C(unsigned int* hardfault_args)
{
    printf("R0    = 0x%.8X\r\n", hardfault_args[0]);
    printf("R1    = 0x%.8X\r\n", hardfault_args[1]);
    printf("R2    = 0x%.8X\r\n", hardfault_args[2]);
    printf("R3    = 0x%.8X\r\n", hardfault_args[3]);
    printf("R12   = 0x%.8X\r\n", hardfault_args[4]);
    printf("LR    = 0x%.8X\r\n", hardfault_args[5]);
    printf("PC    = 0x%.8X\r\n", hardfault_args[6]);
    printf("PSR   = 0x%.8X\r\n", hardfault_args[7]);
    printf("BFAR  = 0x%.8X\r\n", *(unsigned int*)0xE000ED38);
    printf("CFSR  = 0x%.8X\r\n", *(unsigned int*)0xE000ED28);
    printf("HFSR  = 0x%.8X\r\n", *(unsigned int*)0xE000ED2C);
    printf("DFSR  = 0x%.8X\r\n", *(unsigned int*)0xE000ED30);
    printf("AFSR  = 0x%.8X\r\n", *(unsigned int*)0xE000ED3C);
    printf("SHCSR = 0x%.8X\r\n", SCB->SHCSR);
    while (1);
}
If you can't use printf at the point in the execution where this specific hard fault occurs, then save all of the above data into a global buffer instead, so you can inspect it after reaching the while (1).
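For reference, one common way to get that hardfault_args pointer into the C handler is a small naked shim at the HardFault vector; this is just a sketch in GCC/Clang inline-assembly syntax, and IAR EWARM uses its own syntax, so adapt as needed:

__attribute__((naked)) void HardFault_Handler(void)
{
    __asm volatile
    (
        "tst lr, #4            \n"   /* bit 2 of EXC_RETURN: which stack was in use? */
        "ite eq                \n"
        "mrseq r0, msp         \n"   /* main stack pointer */
        "mrsne r0, psp         \n"   /* process stack pointer */
        "b HardFault_Handler_C \n"   /* r0 = pointer to the stacked R0..xPSR frame */
    );
}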
Here is the complete description of how to connect this ISR to the interrupt vector (although, as I understand from your question, you already have it implemented):
Jumping from one firmware to another in MCU internal FLASH
You might be able to find additional information on top of what you already know at:
http://www.keil.com/appnotes/files/apnt209.pdf

Why doesn't RemoveAllCachedResponses completely clear the disk cache?

I have an application that uses the Google Map Javascript API v3 and a UIWebView to display a map onscreen. While on this map I can use the app to collect multiple points of GPS data to represent a line.
After collecting 1460-1480 points the app quits unexpectedly (pinch zooming on the map makes the app quit before the 1400+ threshold is reached). It appears to be a memory issue (my app is the blue wedge of the memory pie chart).
The map screen does receive multiple memory warnings, which are handled in an overridden DidReceiveMemoryWarning method on this screen. There was already some code there that called NSUrlCache.SharedCache.RemoveAllCachedResponses:
public override void DidReceiveMemoryWarning()
{
    // BEFORE
    uint diskUsage = NSUrlCache.SharedCache.CurrentDiskUsage;
    uint memUsage = NSUrlCache.SharedCache.CurrentMemoryUsage;
    int points = _currentEntityManager.GeometryPointCount;
    Console.WriteLine(string.Format("BEFORE - diskUsage = {0}, memUsage = {1}, points = {2}", diskUsage, memUsage, points));

    NSUrlCache.SharedCache.RemoveAllCachedResponses();

    // AFTER
    diskUsage = NSUrlCache.SharedCache.CurrentDiskUsage;
    memUsage = NSUrlCache.SharedCache.CurrentMemoryUsage;
    points = _currentEntityManager.GeometryPointCount;
    Console.WriteLine(string.Format("AFTER - diskUsage = {0}, memUsage = {1}, points = {2}", diskUsage, memUsage, points));

    base.DidReceiveMemoryWarning();
}
I added the BEFORE and AFTER sections so I could track cache contents before and after RemoveAllCachedResponses is called.
The shared cache is configured when the application starts (prior to my working on this issue it was not being configured at all).
uint cacheSizeMemory = 1024 * 1024 * 4;
uint cacheSizeDisk = 1024 * 1024 * 32;
NSUrlCache sharedCache = new NSUrlCache(cacheSizeMemory, cacheSizeDisk, "");
NSUrlCache.SharedCache = sharedCache;
When we're on this screen collecting point data and we receive a low memory warning, RemoveAllCachedResponses is called and the Before/After statistics are printed to the console. Here are the numbers for the first low memory warning we receive:
BEFORE - diskUsage = 2258864, memUsage = 605032, points = 1174
AFTER - diskUsage = 1531904, memUsage = 0, points = 1174
Which is what I would expect to happen - flushing the cache reduces disk and memory usage (though I would expect the disk usage number to also go to zero).
All subsequent calls to RemoveAllCachedResponses display these statistics (this Before/After is immediately prior to the app crashing).
BEFORE - diskUsage = 1531904, memUsage = 0, points = 1471
AFTER - diskUsage = 1531904, memUsage = 0, points = 1471
This leads me to believe one of two things: 1. RemoveAllCachedResponses is not working (unlikely), or 2. there is something in the disk cache that can't be removed because it is currently in use, something like the current set of map tiles.
Regarding #2, I'd like to believe this, figuring the reduction in disk usage on the first call represents a set of tiles that were no longer being used because of a pinch zoom in; but no pinch zooming or panning at all was done on this map, i.e. only one set of initial tiles should have been downloaded and cached.
Also, we are loading the Google Map Javascript API file as local HTML, so it could be that this file is what remains resident in the cache. But the file is only 18,192 bytes, which doesn't jibe with the 1,531,904 bytes remaining in the disk cache.
I should also mention that the Android version of this app (written with Xamarin.Android) has no such memory issue on its map screen - it is possible to collect 5500+ points without incident.
So why does the disk cache not go to zero when cleared?
Thanks in advance.
I am confused about this too. You can take a look at the question I asked:
removeAllCachedResponses can not clear sharedURLCache?
One commenter there said "it has some additional overhead in the cache and it's not related to real network data."