Using alternative LAPACK driver in numpy's svd method? - exception

I'm using numpy.linalg.svd to compute singular value decompositions of badly conditioned matrices. In some special cases the SVD does not converge and raises a LinAlgError. I've done some research and found that numpy uses the DGESDD routine from LAPACK. The standard implementation has a hardcoded limit of about 35 iterations. If I try to decompose the same matrix in Matlab, everything works fine, and I think there are two reasons for that:
1. Matlab uses DGESVD instead of DGESDD which in general seems to be more robust.
2. Matlab uses an iteration limit of 75 in the routine. (They changed it in the source and recompiled it.)
Now the question is: is there a simple way to change the backend used by numpy from DGESDD to DGESVD without having to modify the numpy source?
Thanks in advance
Mischa

What worked for me was to only compute the "economy size" SVD of that matrix X:
U,S,V = np.linalg.svd(X, full_matrices=False)

I'm a little late, but maybe this will help someone else...
I had a similar problem in julia.
I found this approach on the R help list, and it should work in any environment that uses the LAPACK library:
Basically, if svd(M) fails, try svd(M'), and swap the resulting U,V appropriately.
Here's how I'm doing it in julia:
try
    U,S,V = svd( E_restricted )
    failed = false
catch
    failed = true
end
if failed
    # try it with matrix transposed
    try
        V,S,U = svd( E_restricted' )
        failed = false
    catch
        failed = true
    end
end
if failed
    error("ERROR: svd(E) and svd(E') failed!")
end
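For NumPy users, the same transpose trick can be sketched as follows. This is a minimal sketch, not a tested fix for any particular matrix: the function name `svd_with_transpose_fallback` and its `svd` parameter are made up for illustration, and whether the transposed call converges where the first one failed depends on the matrix.

```python
import numpy as np

def svd_with_transpose_fallback(M, svd=np.linalg.svd):
    """Return (U, S, Vh) such that M is approximately U @ diag(S) @ Vh.

    Hypothetical helper: if the SVD of M does not converge, retry on M.T
    and swap the factors back.
    """
    try:
        return svd(M, full_matrices=False)
    except np.linalg.LinAlgError:
        # M.T = A @ diag(S) @ Bh  implies  M = Bh.T @ diag(S) @ A.T,
        # so the left and right factors trade places after transposing.
        A, S, Bh = svd(M.T, full_matrices=False)
        return Bh.T, S, A.T
```

Note that NumPy returns the third factor already transposed (usually called `Vh`), which is why the swap involves `.T` on both factors. Separately, if SciPy is an option, `scipy.linalg.svd` takes a `lapack_driver` argument (`'gesdd'` by default, `'gesvd'` as the alternative), which may address the original question directly.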

How to add to / amend / consolidate JRuby Profiler data?

Say I have inside my JRuby program the following loop:
loop do
  x = foo()
  break if x
  bar()
end
and I want to collect profiling information just for the invocations of bar. How can I do this? So far I have:
pd = []
loop do
  x = foo()
  break if x
  pd << JRuby::Profiler.profile { bar() }
end
This leaves me with an array pd of profile data objects, one for each invocation of bar. Is there a way to create a "summary" data object, by combining all the pd elements? Or even better, have a single object, where profile would just add to the existing profiling information?
I searched for documentation of the JRuby::Profiler API, but couldn't find anything except a few simple examples, none of them covering my case.
UPDATE: Here is another attempt, which does not work either.
Since the profile method initially clears the profile data inside the Profiler, I tried to separate the profiling steps from the data initializing steps, like this:
JRuby::Profiler.clear
loop do
  x = foo()
  break if x
  JRuby::Profiler.send(:current_thread_context).start_profiling
  bar()
  JRuby::Profiler.send(:current_thread_context).stop_profiling
end
profile_data = JRuby::Profiler.send(:profile_data)
This seems to work at first, but after investigation, I found that profile_data then contains only the profiling information from the last (most recent) execution of bar, not from all executions collected together.
I figured out a solution, though I have the feeling that I'm using a ton of undocumented features to get it working. I should also add that I am using JRuby 1.7.27, so later JRuby versions might need a different approach.
The problem with profiling is that start_profiling (corresponding to the Java method startProfiling in the class Java::OrgJrubyRuntime::ThreadContext) not only turns on the profiling flag, but also allocates a fresh ProfileData object. What we want to do is reuse the old object. stop_profiling, on the other hand, only toggles the profiling switch and is harmless.
Unfortunately, ThreadContext does not provide a method to manipulate the isProfiling toggle, so as a first step, we have to add one:
class Java::OrgJrubyRuntime::ThreadContext
  field_writer :isProfiling
end
With this, we can set/reset the internal isProfiling switch. Now my loop becomes:
context = JRuby::Profiler.send(:current_thread_context)
JRuby::Profiler.clear
profile_data_is_allocated = nil
loop do
  x = foo()
  break if x
  # The first time, we allocate the profile data
  profile_data_is_allocated ||= context.start_profiling
  context.isProfiling = true
  bar()
  context.isProfiling = false
end
profile_data = JRuby::Profiler.send(:profile_data)
In this solution, I tried to stay as close as possible to the capabilities of the JRuby::Profiler class, but the only public method still used is clear. Basically, I have reimplemented profiling in terms of the ThreadContext class, so if someone comes up with a better way to solve it, I would highly appreciate it.

Chisel randomly initialize register value when simulating with verilator

I'm using Chisel with a BlackBox to run my Chisel logic against a Verilog register file.
The register file does not have a reset signal, so I expect its registers to be randomly initialized.
I passed --x-initial unique to Verilator.
This is how I launch the test:
private val backendName = "verilator"
"NOCDMA" should s" do blkwrite and blkread correctly (with $backendName)" in {
  Driver.execute(Array("--fint-write-vcd", "--backend-name", s"$backendName",
    "--more-vcs-flags", "--trace-depth 1 --x-initial unique"),
    () => new DMANetworkWithMem(memAddrWidth, memDataWidth)(nocDataWidth)(nNodesX, nNodesY)) {
    c => new DMANetworkRWTest(c)
  }
}
But the data I read from the register file is all zero before I write anything to it.
The read data is correct after I write to it.
So, is there anything inside Chisel that I need to tune, or did I miss a step?
Any suggestions?
I'm not certain, but I found the following issue on Verilator with a similar issue: https://github.com/verilator/verilator/issues/1399.
From skimming the above issue, I think you also need to pass +verilator+seed+<value> and +verilator+rand+reset+<value> at runtime. I am not an expert in the iotesters, but I believe you can add these runtime values through the iotesters argument: --more-vcs-c-flags.
Side note: I would also set --x-assign unique in Verilator if there are cases in the Verilog where the runtime would otherwise inject an X (e.g. an out-of-bounds index).
I hope this helps!

sympy autowrap (cython): limit of # of arguments, arguments in array form?

I have the following issue:
I want to use autowrap to generate a compiled version of a sympy matrix, with cells containing sympy expressions. Depending on the specification of my problem, the number of arguments can get very large.
I ran into the following 2 issues:
The number of arguments that autowrap accepts seems to be limited to 509.
i.e., this works:
import sympy
from sympy.utilities.autowrap import autowrap
x = sympy.symbols("x:509")
exp = sum(x)
cyt = autowrap(exp, backend="cython", args=x)
and this fails to compile:
x = sympy.symbols("x:510")
exp = sum(x)
cyt = autowrap(exp, backend="cython", args=x)
The message I get is not very informative:
[...] (Full output upon request)
Generating code
c:\users\[classified]\appdata\local\temp\tmp2zer8vfe_sympy_compile\wrapper_module_17.c(6293) : fatal error C1001: An internal error has occurred in the compiler.
(compiler file 'f:\dd\vctools\compiler\utc\src\p2\hash.c', line 884)
To work around this problem, try simplifying or changing the program near the locations listed above.
Please choose the Technical Support command on the Visual C++
Help menu, or open the Technical Support help file for more information
LINK : fatal error LNK1257: code generation failed
error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio 14.0\\VC\\BIN\\x86_amd64\\link.exe' failed with exit status 1257
Is there any way around this? I would like to use versions of my program that need ~1000 input variables.
(I have no understanding of C/cython. Is this an autowrap limitation, a C limitation ...?)
Partly connected to the above:
Can one compile functions that accept the arguments as an array?
Is there any way to generate code that accepts a numpy array as input? I specifically mean one array for all the arguments, instead of providing the arguments as list. (Similar to lambdify using a DeferredVector). ufuncify supports array input, but as I understand only for broadcasting/vectorizing the function.
I would hope that an array as argument could circumvent the first problem above, which is most pressing for me. Apart from that, I would prefer array input anyways, both because it seems faster (no need to unpack the numpy array I have as input into a list), and also more straightforward and natural.
Does anyone have any suggestions what I can do?
Also, could anyone tell me whether f2py has similar limitations? This would also be an option for me if feasible, but I don't have it set up to work currently, and would prefer to know whether it helps at all before investing the time.
Thanks!
Edit:
I played around a bit with the different candidates for telling autowrap that the input argument will be something in array form, rather than a list of numbers. I'll document my steps here for posterity, and also to increase chances to get some input:
sympy.DeferredVector
Is what I use with lambdify for the same purpose, so I thought to give it a try. However, warning:
A = sympy.DeferredVector("A")
expression = A[0]+A[1]
cyt = autowrap(expression, backend="cython", args=A)
just completely crashed my OS - last statement started executing, (no feedback), everything got really slow, then no more reactions. (Can only speculate, perhaps it has to do with the fact that A has no shape information, which does not seem to bother lambdify, but might be a problem here. Anyways, seems not the right way to go.)
All sorts of array-type objects filled with the symbols in the expression to be wrapped.
e.g.
x0, x1 = sympy.symbols("x:2")
expression = x0 + x1
cyt = autowrap(expression, backend="cython", args=np.array([x0,x1]))
Still wants unpacked arguments. Replacing the last row by
cyt = autowrap(expression, backend="cython", args=[np.array([x0,x1])])
Gives the message
CodeGenArgumentListError: ("Argument list didn't specify: x0, x1 ", [InputArgument(x0), InputArgument(x1)])
Which is a recurrent theme to this approach: also happens when using a sympy matrix, a tuple, and so on inside the arguments list.
sympy.IndexedBase
This is actually used in the autowrap examples; however, in a (to me) unintuitive way, using an equation as the expression to be wrapped. Also, the way it is used there does not seem feasible for my case: the expression I want to cythonize is a matrix, but its cells are themselves longish expressions, which I cannot obtain via index operations.
The upside is that I got a minimal example to work:
X = sympy.IndexedBase("X",shape=(1,1))
expression = 2*X[0,0]
cyt = autowrap(expression, backend="cython", args=[X])
actually compiles, and the resulting function correctly evaluates - when passed a 2d-np.array.
So this seems the most promising avenue, even though the further extensions of this approach that I keep trying fail.
For example this
X = sympy.IndexedBase("X",shape=(1,))
expression = 2*X[0]
cyt = autowrap(expression, backend="cython", args=[X])
gets me
[...]\site-packages\sympy\printing\codeprinter.py", line 258, in _get_expression_indices " rhs indices in %s" % expr)
ValueError: lhs indices must match non-dummy rhs indices in 2*X[0]
even though I don't see how it should be different from the working one above.
Same error message when sticking to two dimensions, but increasing the size of X:
X = sympy.IndexedBase("X",shape=(2,2))
expression = 2*X[0,0]+X[0,1]+X[1,0]+X[1,1]
cyt = autowrap(expression, backend="cython", args=[X])
ValueError: lhs indices must match non-dummy rhs indices in 2*X[0, 0] + X[0, 1] + X[1, 0] + X[1, 1]
I tried snooping around the code for autowrap, but I feel a bit lost there...
So I'm still searching for a solution and happy for any input.
Passing the argument as an array seems to work OK:
import numpy as np
import sympy
from sympy.utilities.autowrap import autowrap
x = sympy.MatrixSymbol('x', 520, 1)
exp = 0
for i in range(x.shape[0]):
    exp += x[i]
cyt = autowrap(exp, backend='cython')
arr = np.random.randn(520, 1)
cyt(arr)
Out[48]: -42.59735861021934
arr.sum()
Out[49]: -42.597358610219345
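If the expression has already been built from individual scalar symbols, as in the question, one possible way to get to the MatrixSymbol form is to substitute each symbol with a matrix element. The following is only a sketch under that assumption: `pack_symbols` is a hypothetical helper name, and the final autowrap call is shown as a comment because it was not run here.

```python
import sympy

def pack_symbols(expr, symbols, name="X"):
    """Replace each scalar symbol with one element of a column MatrixSymbol.

    Hypothetical helper: returns the MatrixSymbol and the rewritten expression.
    """
    X = sympy.MatrixSymbol(name, len(symbols), 1)
    mapping = {s: X[i, 0] for i, s in enumerate(symbols)}
    return X, expr.xreplace(mapping)

x = sympy.symbols("x:3")
expr = x[0] + 2 * x[1] * x[2]
X, packed = pack_symbols(expr, x)
# packed now refers only to X[0, 0], X[1, 0] and X[2, 0]. It could then be
# handed to autowrap with a single array argument, for example:
#   cyt = autowrap(packed, backend="cython", args=[X])   # not run here
```

The same substitution should apply to a sympy Matrix of expressions, since `xreplace` is available there as well.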

Python backtracking

I have a basic problem in Python where I have to verify whether my backtracking code found any solutions (I have to find all sublists of the numbers 1 to n with the property |x[i] - x[i-1]| == m). How do I check whether there is a solution? The solutions I find are only printed, not saved in memory. I have to print a proper message if there are no solutions.
As I suggested in a comment, you might want to separate the computation from the I/O (printing) by creating a generator of your solutions of |x[i] - x[i-1]| == m.
Let's assume you defined a generator for yielding your solutions:
def mysolutions(...):
    ....
    # Something with 'yield', or maybe not.
    ....
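Here is a decorator that you can use to check whether a generator function yields at least one value:
from functools import wraps
from itertools import chain
def checkGen(genfunc):
    """Decorator used to check that we have data in the generator."""
    @wraps(genfunc)
    def wrapper(*args, **kwargs):
        generator = genfunc(*args, **kwargs)
        try:
            # Pull the first value eagerly; an empty generator raises StopIteration.
            first = next(generator)
        except StopIteration:
            raise RuntimeError("I had no value!")
        # Put the first value back in front of the remaining ones.
        return chain([first], generator)
    return wrapper
Using this decorator, you can now define your previous solution with:
@checkGen
def mysolutions(...):
    ....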
Then, you can simply use it as is, for dissociating your I/O:
try:
    for solution in mysolutions(...):
        print(solution)  # Probably needs some formatting
except RuntimeError:
    print("I found no value (or there was some error somewhere...)")
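To make the generator idea concrete, here is a minimal sketch of a backtracking generator together with a printing loop that reports when nothing was found. It rests on an assumption about the problem statement, which the question leaves open: a "solution" is taken to be a sequence of at least two distinct numbers from 1..n whose neighbours differ by exactly m. The names `solutions` and `print_solutions` are illustrative.

```python
def solutions(n, m):
    """Yield every sequence of distinct values from 1..n (length >= 2)
    whose consecutive elements differ by exactly m (assumed reading)."""
    def extend(seq):
        if len(seq) >= 2:
            yield list(seq)
        # Only two candidates can follow the last element.
        for nxt in (seq[-1] - m, seq[-1] + m):
            if 1 <= nxt <= n and nxt not in seq:
                seq.append(nxt)
                yield from extend(seq)
                seq.pop()  # backtrack
    for start in range(1, n + 1):
        yield from extend([start])

def print_solutions(n, m):
    """Print each solution as it is produced; return how many were printed."""
    count = 0
    for sol in solutions(n, m):
        print(sol)
        count += 1
    if count == 0:
        print("No solutions")
    return count
```

Counting while printing is a simpler alternative to the decorator when all that is needed is the final "no solutions" message; nothing is stored in memory either way.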

CUDA FORTRAN: function gives different answer if I pass variable instead of number

I'm trying to use the ISHFT() function to bitshift some 32-bit integers in parallel, using CUDA FORTRAN.
The problem is that I get different answers to ISHFT(-4,-1) and ISHFT(var,-1) even though var = -4. This is the test code I've written:
module testshift
  integer :: test
  integer, device :: d_test
contains
  attributes(global) subroutine testshft ()
    integer :: var
    var = -4
    d_test = ISHFT(var,-1)
  end subroutine testshft
end module testshift
program foo
  use testshift
  integer :: i
  call testshft<<<1,1>>>() ! carry out ishft on gpu
  test = d_test            ! copy device result to host
  i = ISHFT(-4,-1)         ! carry out ishft on cpu
  print *, i, test         ! print the results
end program foo
I then compile and execute:
pgf90 testishft.f90 -Mcuda
./a.out
2147483646 -2
Both should be 2147483646 if working correctly. I get the right answer if I replace var with 4.
How do I fix this problem?
Thanks for the help
When I remove the GPU-specific code from the above program I get 2147483646 2147483646 from the g95 compiler, as you expect. Have you tried running a "scalar" version of the program using the pgf90 compiler? If the scalar version works but the GPU version does not, that helps to isolate the problem. If the problem is pgf90/CUDA specific, perhaps the best place to ask your question is
PGI User Forum Forum Index -> Programming and Compiling
http://www.pgroup.com/userforum/viewforum.php?f=4 .
I've found a workaround, which is posted in this forum:
http://www.pgroup.com/userforum/viewtopic.php?t=2455&postdays=0&postorder=asc&start=15
Instead of using ISHFT I use IBITS, which is described here: http://gcc.gnu.org/onlinedocs/gfortran/IBITS.html
Also, the problem has since been fixed in version 11.3 of the PGI compiler:
http://www.pgroup.com/support/release_tprs_2011.htm
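The two numbers in the output above are what a logical and an arithmetic right shift of -4 produce on 32-bit integers, which also suggests why the IBITS workaround helps. The following Python sketch emulates both shifts and an IBITS-style bit extraction; the function names are illustrative, and it only models the arithmetic, not the compiler's behaviour.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(u):
    """Interpret the low 32 bits of u as a two's-complement integer."""
    u &= MASK32
    return u - (1 << 32) if u & 0x80000000 else u

def logical_shift_right32(value, n):
    """Zero-filling right shift, like ISHFT(value, -n) is specified to do."""
    return to_signed32((value & MASK32) >> n)

def arithmetic_shift_right32(value, n):
    """Sign-preserving right shift (Python's >> on int is arithmetic)."""
    return to_signed32(value >> n)

def ibits32(value, pos, length):
    """Extract `length` bits starting at bit `pos`, like IBITS(value, pos, length)."""
    return ((value & MASK32) >> pos) & ((1 << length) - 1)

print(logical_shift_right32(-4, 1))     # 2147483646
print(arithmetic_shift_right32(-4, 1))  # -2
print(ibits32(-4, 1, 31))               # 2147483646
```

Extracting bits 1 through 31 gives the same value as a zero-filling shift by one, so IBITS(var, 1, 31) can stand in for ISHFT(var, -1) on compiler versions that have the bug.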