Checking this example (API example at the end), I want to ask a few questions.
1) In the example we are supplying matrix a with non-zero elements. What is the real size of the matrix though? And are these the elements of the matrix, or the positions that contain non-zero elements?
2) Can I use in the calculations (i.e., in a function like culaSparseSetDcooData) a matrix A which contains both zero and non-zero elements?
If I want to create a sample matrix just to test, should I create a matrix of zero elements, then fill it with some elements, and go from there?
Regarding 1) Interestingly, the size of the matrix in COO format is not explicitly specified: It consists of coordinates of the non-zero elements of the matrix. If you have a COO matrix with 1 non-zero element, then this could be
double a[1] = { 1.0 };
int colInd[1] = { 10 };
int rowInd[1] = { 20 };
and (as you can tell from the row/column indices, assuming they are zero-based) describe an element of a matrix that has at least 21 rows and 11 columns, or it could be
double a[1] = { 1.0 };
int colInd[1] = { 1000 };
int rowInd[1] = { 2000 };
and describe an element of a matrix that has at least 2001 rows and 1001 columns.
However, in this example, it seems like this is a square matrix, and n=8 seems to be the size. (Unfortunately, there seems to be no detailed documentation of the culaSparseSetDcooData function...)
Regarding 2) This is not entirely clear. If your question is whether the "non-zero" values may (in reality) have a value of 0.0, then I can say: Yes, this should be allowed. However, the example that you referred to already shows how to create a simple test matrix.
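To make this concrete, here is a minimal sketch (plain C++, not the CULA API; the n and nnz names are illustrative and not part of culaSparseSetDcooData) of how a small COO test matrix could be laid out, including a deliberately stored 0.0 to show that a stored entry may hold that value:

#include <cstdio>

int main() {
    // A small 4x4 test matrix in COO format: one value per stored entry,
    // plus its row and column coordinates. Note that the matrix size
    // cannot be recovered from the arrays alone, and that a stored entry
    // is allowed to hold the value 0.0.
    //
    //     [ 10   0   0   0 ]   (plus an explicitly stored 0.0 at (0, 3))
    //     [  0  20   0   0 ]
    //     [  0   0   0   0 ]
    //     [  0   0  30   0 ]
    const int n   = 4;  // matrix dimension, must be tracked separately
    const int nnz = 4;  // number of stored ("non-zero") entries
    double a[]      = { 10.0, 0.0, 20.0, 30.0 };
    int    rowInd[] = { 0, 0, 1, 3 };
    int    colInd[] = { 0, 3, 1, 2 };

    std::printf("%dx%d matrix, %d stored entries\n", n, n, nnz);
    for (int i = 0; i < nnz; ++i)
        std::printf("A(%d,%d) = %g\n", rowInd[i], colInd[i], a[i]);
}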
Related
I am trying to visualize a spectrum where the frequency range is divided into N bars, either linearly or logarithmic. The FFT seems to work fine, but I am not sure how to interpret the values in order to decide the max height for the visualization.
I am using FMODAudio, a C# wrapper for FMOD. It's set up correctly.
In the case of a linear spectrum, the bars are defined as following:
public int InitializeSpectrum(int windowSize = 1024, int maxBars = 16)
{
    numSamplesPerBar_Linear.Clear();
    int barSamples = (windowSize / 2) / maxBars;
    for (int i = 0; i < maxBars; ++i)
    {
        numSamplesPerBar_Linear.Add(barSamples);
    }
    IsInitialized = true;
    Data = new float[numSamplesPerBar_Linear.Count];
    return numSamplesPerBar_Linear.Count;
}
Data is the array which holds the spectrum values received from the update loop.
The update looks like this:
public unsafe void UpdateSpectrum(ref ParameterFFT* fftData)
{
    int length = fftData->Length / 2;
    if (length > 0)
    {
        int indexFFT = 0;
        for (int index = 0; index < numSamplesPerBar_Linear.Count; ++index)
        {
            Data[index] = 0f; // reset the accumulator so values from the previous update don't leak in
            for (int freq = 0; freq < numSamplesPerBar_Linear[index]; ++freq)
            {
                for (int channel = 0; channel < fftData->ChannelCount; ++channel)
                {
                    var floatspectrum = fftData->GetSpectrum(channel); // a ReadOnlySpan<float> by default
                    Data[index] += floatspectrum[indexFFT];
                }
                ++indexFFT;
            }
            Data[index] /= (float)(numSamplesPerBar_Linear[index] * fftData->ChannelCount); // average over samples and channels for more meaningful values
        }
    }
}
The values I get when testing a song are very low across the bands.
A randomly chosen moment when playing a song gives these values:
16 bars = 0,0326 0,0031 0,001 0,0003 0,0004 0,0003 0,0001 0,0002 0,0001 0,0001 0,0001 0 0 0 0 0
I realize it's more useful to use a logarithmic spectrum in many cases, and I intend to, but I still need to figure out how to find the max values for each bar so that I can set up the visualization on a proper scale.
Q: How can I know the potential max values for each bar based on this setup (it's not 1.0)?
The output of an FFT call is an array in which each element is a complex number (A + Bi), where A is the real component and B the imaginary component. Element zero of this array represents frequency zero, i.e. DC, which is the offset bias and can typically be ignored. As you iterate across the elements of this array, you increment the frequency; this frequency increment is calculated using

Audio_samples <-- array of raw audio samples in PCM format which gets fed into the FFT call

num_fft_bins      := float64(len(Audio_samples)) / 2.0   // per the Nyquist theorem
freq_incr_per_bin := (input_audio_sample_rate / 2.0) / num_fft_bins

So, to answer your question: the output array from the FFT call is a linear progression of bins, evenly spaced by the frequency increment above. For example, at a 44100 Hz sample rate with a 1024-sample window there are 512 bins, each about 43 Hz wide.
Depends on your input data to the FFT, and the scaling that your particular FFT implementation uses (not all FFTs use the same scale factor).
With an energy preserving forward-FFT, Parseval's theorem applies. So the energy (sum of squares) of the input vector equals the energy of the FFT result vector. Note that for a single integer periodic in aperture sinusoidal input (a pure tone), all that energy can appear in a single FFT result element. So if you know the maximum possible input energy, you can use that to compute the maximum possible result element magnitude for scaling purposes.
The range is often large enough that visualizers commonly need to use log scaling, or else typical input can get pixel quantized to a graph of all zeros.
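As a concrete illustration (a sketch assuming an unscaled forward DFT and a full-scale input in [-1, 1]; FMOD's FFT DSP applies its own windowing and scaling, so treat these numbers as illustrative rather than what FmodAudio will report): a full-scale pure tone that lands exactly on a bin concentrates all of its energy there, and the peak bin magnitude works out to N/2.

#include <cmath>
#include <complex>
#include <cstdio>
#include <vector>

// Naive DFT of a full-scale sine, to show where a bin's theoretical
// maximum comes from. With an unscaled forward transform the peak
// magnitude is N/2; a library that divides by N (or sqrt(N)) reports
// proportionally smaller values.
int main() {
    const int N = 64;  // window size
    const int k = 5;   // the sine completes exactly k cycles per window
    const double PI = std::acos(-1.0);
    std::vector<std::complex<double>> X(N);
    for (int bin = 0; bin < N; ++bin)
        for (int n = 0; n < N; ++n) {
            double x = std::sin(2.0 * PI * k * n / N);  // amplitude 1.0
            X[bin] += x * std::polar(1.0, -2.0 * PI * bin * n / N);
        }
    // Prints 32 (= N/2): dividing every bin magnitude by N/2 maps a
    // full-scale tone to 1.0, a sensible ceiling for the bars.
    std::printf("peak |X[%d]| = %g (N/2 = %d)\n", k, std::abs(X[k]), N / 2);
}

If the library normalizes its spectrum output to [0, 1] (check the FMOD documentation for its FFT DSP), the averaging in your update loop keeps each bar in [0, 1] as well; otherwise the N/2 factor above applies.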
I am trying to bind part of a pitched array, starting from the middle rather than the beginning of the array, like the following.
/* 1. allocate */
cudaMallocPitch((void**)&d_texinput, &FloatPitch, cols * sizeof(float), rows);
cudaMallocPitch((void**)&d_output, &FloatPitch, cols * sizeof(float), rows);

/* 2. set row-length of target region (i.e., dividing rows 10 times) */
int row_div_times = 10;
int part_rows = rows / row_div_times;
int part_offset = part_rows * FloatPitch / sizeof(float);

dim3 threads(16, 16);
dim3 Part_Blocks((cols + threads.x - 1) / threads.x, (part_rows + threads.y - 1) / threads.y);

/* 3. processing divided rows, iteratively */
for (int i = 0; i < row_div_times; i++)
{
    size_t offsetsize = i * part_offset;

    /* computing values of "d_texinput" */
    calibration<<<Part_Blocks, threads, 0, stream[i]>>>(d_texinput + i * part_offset);

    /*
    //### (QUESTION point!) I want to bind the device memory "d_texinput" to texture "tex_mem" only partly, like below.
    cudaBindTexture2D(0, tex_mem, &d_texinput[i * part_offset], channelDesc_flt, cols, part_rows, FloatPitch); // tentative code a
    ... or something like ...
    cudaBindTexture2D(&offsetsize, tex_mem, &d_texinput, channelDesc_flt, cols, part_rows, FloatPitch); // tentative code b
    */

    // final computation with texture
    final_computationwithtexture<<<Part_Blocks, threads, 0, stream[i]>>>(d_output + i * part_offset);

    cudaUnbindTexture(tex_mem);
}
Could you advise me how to bind only the target region of the device memory array, by revising the (QUESTION point!) above?
I tried to understand the first argument of cudaBindTexture2D as an "offset", but according to the documentation it is not a value, it is an address, and I still could not understand what the documentation means. I hope I can understand what it is by learning the proper way of calling cudaBindTexture2D.
The offset parameter is not an input, it is an output. That's why it is a pointer. The function will set the offset in bytes. If you want to bind in the middle of an allocation, you set the devPtr argument (third) appropriately and then the function will give you the offset required for texture accesses.
Here is how to understand this: Textures can only be bound with a certain alignment. Memory allocations are always properly aligned. Therefore it is not an issue in most cases. However, if you provide an arbitrary memory address, CUDA has to round down to the alignment and you have to apply the proper offset later on.
Let's say you bind &float[66], the proper alignment might be &float[64], so CUDA starts its texture at that offset and you have to add an offset of 8 bytes for each access to get the desired result. I'm picking random numbers here, I don't know the alignment requirements.
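As a sketch of what that looks like for the code in the question (assuming the legacy texture-reference API the question uses, which was removed in CUDA 12 in favor of texture objects; tex_mem, part_rows, and FloatPitch are the question's names):

#include <cuda_runtime.h>

// Legacy texture reference at module scope, as in the question.
texture<float, 2, cudaReadModeElementType> tex_mem;

// Bind only the i-th horizontal slice of a pitched allocation and return
// the byte offset CUDA introduced to satisfy the texture alignment.
size_t bindSlice(float* d_texinput, size_t FloatPitch,
                 int cols, int part_rows, int i) {
    cudaChannelFormatDesc channelDesc_flt = cudaCreateChannelDesc<float>();
    size_t part_offset = part_rows * FloatPitch / sizeof(float);
    size_t byte_offset = 0;
    // devPtr (3rd argument) points at the start of the slice; the first
    // argument is an output parameter, not an input offset.
    cudaBindTexture2D(&byte_offset, tex_mem, d_texinput + i * part_offset,
                      channelDesc_flt, cols, part_rows, FloatPitch);
    return byte_offset;  // usually 0, since pitched rows are aligned
}

If the returned offset is non-zero, it has to be folded into the kernel's fetch coordinates, as described above.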
I read the Xception paper (there's even a Keras model for the described NN) and it talks about separable convolutions.
I was trying to understand how exactly they are calculated. Rather than leaving it to imprecise words, I have included the piece of pseudo-code below that summarizes my understanding. The code maps an 18x18x728 feature map to an 18x18x1024 one:
XSIZE = 18;
YSIZE = 18;
ZSIZE = 728;
ZSIZE2 = 1024;

float mapin[XSIZE][YSIZE][ZSIZE];   // Input map
float imap[XSIZE][YSIZE][ZSIZE2];   // Intermediate map
float mapout[XSIZE][YSIZE][ZSIZE2]; // Output map
float wz[ZSIZE][ZSIZE2];            // Weights for 1x1 convs
float wxy[3][3][ZSIZE2];            // Weights for 3x3 convs

// Apply 1x1 convs
for(y=0;y<YSIZE;y++)
    for(x=0;x<XSIZE;x++)
        for(o=0;o<ZSIZE2;o++){
            s=0.0;
            for(z=0;z<ZSIZE;z++)
                s+=mapin[x][y][z]*wz[z][o];
            imap[x][y][o]=s;
        }

// Apply 2D 3x3 convs
for(o=0;o<ZSIZE2;o++)
    for(y=0;y<YSIZE;y++)
        for(x=0;x<XSIZE;x++){
            s=0.0;
            for(i=-1;i<2;i++)
                for(j=-1;j<2;j++)
                    s+=imap[x+j][y+i][o]*wxy[j+1][i+1][o]; // treated as 0 if it falls off the edge
            mapout[x][y][o]=s;
        }
Is this correct? If not, can you suggest fixes similarly written in C or pseudo-C?
Thank you very much in advance.
I found tf.nn.separable_conv2d in TensorFlow, which does exactly this. So I built a very simple graph and, with the help of random numbers, I tried to get the code above to match its result. The correct code is:
XSIZE = 18;
YSIZE = 18;
ZSIZE = 728;
ZSIZE2 = 1024;

float mapin[XSIZE][YSIZE][ZSIZE];   // Input map
float imap[XSIZE][YSIZE][ZSIZE];    // Intermediate map
float mapout[XSIZE][YSIZE][ZSIZE2]; // Output map
float wxy[3][3][ZSIZE];             // Weights for 3x3 convs
float wz[ZSIZE][ZSIZE2];            // Weights for 1x1 convs

// Apply 2D 3x3 convs
for(o=0;o<ZSIZE;o++)
    for(y=0;y<YSIZE;y++)
        for(x=0;x<XSIZE;x++){
            s=0.0;
            for(i=-1;i<2;i++)
                for(j=-1;j<2;j++)
                    s+=mapin[x+j][y+i][o]*wxy[j+1][i+1][o]; // treated as 0 if it falls off the edge
            imap[x][y][o]=s;
        }

// Apply 1x1 convs
for(y=0;y<YSIZE;y++)
    for(x=0;x<XSIZE;x++)
        for(o=0;o<ZSIZE2;o++){
            s=0.0;
            for(z=0;z<ZSIZE;z++)
                s+=imap[x][y][z]*wz[z][o];
            mapout[x][y][o]=s;
        }
The main difference is in the order that the two groups of convolutions are performed.
To my surprise, the order is important even when ZSIZE==ZSIZE2.
Is there a language which supports the following concept, or is there a pattern to achieve something similar with an existing one?
Concept
I want to define a Rectangle with the following properties: Length, Height, Area, Perimeter; where Area = Length * Height and Perimeter = (2 * Length) + (2 * Height).
Given the statement above, if I want to create a Rectangle by giving it a Length and a Height, it should of course automatically fill out the rest of the properties.
However, it should go further and automatically allow you to create a Rectangle with any two properties (say Height and Perimeter) because that is also mathematically enough to create the same Rectangle.
Example
To help explain the idea, take this example:
//Declaration
Rectangle
{
    Height, Length, Area, Perimeter;
    Area = Height * Length;
    Perimeter = (2 * Length) + (2 * Height);
}

//Usage
main()
{
    var rectangleA = new Rectangle(Height, Length);
    var rectangleB = new Rectangle(Height, Area);
    Assert(rectangleA == rectangleB);
}
Notice how I didn't need to define constructors for Rectangle. Notice I did not need to specify the specific logic required if a Rectangle was created using Height and Area.
Edit: It should be a rectangle and not a square for a proper example.
What you are looking for is a language with an integrated computer algebra system. It has to be able to resolve equations with respect to different variables.
While it would be possible to implement something like this, I doubt that it would make sense because in many cases there will be either no solution or multiple solutions.
Even your simple example will not work if only area and perimeter are given because there will usually be two solutions. (I assume that your class actually represents a rectangle and not a square, otherwise you should not have separate variables for length and height.)
Example:
Input: area = 2, perimeter = 6
Solution 1: length = 2, height = 1
Solution 2: length = 1, height = 2
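(In general, given area A and perimeter P, length and height are the two roots of x^2 - (P/2)x + A = 0; for A = 2 and P = 6 this becomes x^2 - 3x + 2 = 0, whose roots are 1 and 2, giving exactly the two mirrored solutions above.)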
Another remark not really related to your question: Your class obviously contains redundant member variables. This is a bad thing for various reasons, the most important being the possibility of inconsistencies. Unless you have very strict performance constraints, you should store only two of them, say length and height, and provide methods to calculate the others when needed.
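As a minimal sketch of that suggestion (plain C++; the factory-method names are invented for illustration): store only the two defining values, compute the rest on demand, and use named constructors to stand in for "create from any two properties":

#include <cassert>

class Rectangle {
    double length_, height_;
    Rectangle(double l, double h) : length_(l), height_(h) {}
public:
    // Named factories replace the overloaded constructors the question
    // wants; only two of the possible property pairs are shown.
    static Rectangle fromLengthAndHeight(double l, double h) {
        return Rectangle(l, h);
    }
    static Rectangle fromHeightAndArea(double h, double area) {
        return Rectangle(area / h, h);
    }
    double length()    const { return length_; }
    double height()    const { return height_; }
    double area()      const { return length_ * height_; }  // never stale
    double perimeter() const { return 2 * (length_ + height_); }
};

int main() {
    Rectangle a = Rectangle::fromLengthAndHeight(4, 2);
    Rectangle b = Rectangle::fromHeightAndArea(2, 8);
    assert(a.length() == b.length() && a.height() == b.height());
}

Because area and perimeter are computed rather than stored, the inconsistency problem disappears by construction.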
Of course such a language exists. Many do, as you've now pointed out in your own comment to this answer.
In the example below I'll be using the Powerloom representation system, implemented in a language called STELLA.
You can play with it from within a Common Lisp environment.
Once you have everything installed you can load the language by running:
(cl:load "load-powerloom.lisp")
(in-package "STELLA")
(in-dialect "KIF")
That's about all you need to start building awesome geometrical objects.
Within STELLA you may define a concept with the primitive defconcept:
(defconcept Rectangle (?r)
  :documentation "Curious geometrical objects that live on a plane.")
And define its properties with deffunction:
(deffunction rect-height ((?t Rectangle)) :-> (?n INTEGER))
(deffunction rect-length ((?t Rectangle)) :-> (?n INTEGER))
(deffunction area ((?t Rectangle)) :-> (?n INTEGER))
(deffunction perimeter ((?t Rectangle)) :-> (?n INTEGER))
To establish the relations between area, perimeter, and the sides of your rectangle, you'll have to make some assertions. That's what assert is for.
(assert (forall (?t Rectangle)
  (= (area ?t) (* (rect-height ?t) (rect-length ?t)))))
(assert (forall (?t Rectangle)
  (= (perimeter ?t) (+ (* 2 (rect-height ?t))
                       (* 2 (rect-length ?t))))))
You are telling STELLA that for all rectangles, the area is the product of height and length, and that for all rectangles, the perimeter is twice the height plus twice the length.
Now you can instantiate your objects, and it doesn't matter what properties you give it, as long as they make sense.
(definstance rect1 :Rectangle true :rect-height 10 :rect-length 10)
(definstance rect2 :Rectangle true :area 40 :rect-height 20)
Here you instantiate rect1 with height and length as parameters, and rect2 with area and height.
But it's always good to check that the language is doing what you expect:
STELLA> (retrieve all ?x (= (area rect1) ?x))
There is 1 solution:
#1: ?X=100
STELLA> (retrieve all ?x (= (rect-length rect2) ?x))
There is 1 solution:
#1: ?X=2
If you ever get tired of rectangles and decide to build a beautiful square, why not derive a concept?
(defconcept Square ((?r Rectangle))
  :documentation "Weird rectangles that fascinated the Greeks"
  :<=> (= (rect-height ?r) (rect-length ?r)))
Simply tell STELLA that squares are rectangles where height and length are equal.
Now try it out:
STELLA> (definstance nice-rectangle :Rectangle true :rect-length 10 :area 100)
|i|NICE-RECTANGLE
STELLA> (ask (Square nice-rectangle))
TRUE
I'm not an expert at all, but I find the language fascinating. It's sad that there is so little information about it on the internet. Even the manual is incomplete.
For more information I'd suggest starting with these slides.
The famous book SICP teaches how to build a nondeterministic evaluator for such a language here.
And finally, a wonderful write up describing motivations and applications behind these ideas can be seen here.
In C#, you can use properties, which have implicit getters and setters. That way you can write something like:
public class Square {
    public int Length {
        get { return length; }
        set { length = value; }
    }
    public int Area {
        get { return length * length; }
        set { length = (int)Math.Sqrt(value); } // cast needed: Math.Sqrt returns double
    }
    public int Perimeter {
        get { return length * 4; }
        set { length = value / 4; }
    }
    private int length;
}
Now you can write:
Square square = new Square();
square.Length = 2;
Console.WriteLine(square.Length); // "2"
Console.WriteLine(square.Area); // "4"
Console.WriteLine(square.Perimeter); // "8"
square.Area = 9;
Console.WriteLine(square.Length); // "3"
Console.WriteLine(square.Area); // "9"
Console.WriteLine(square.Perimeter); // "12"
Edit:
C# also allows you to set properties of your choosing when instantiating an object:
Square square1 = new Square { Perimeter = 12 };
Square square2 = new Square { Length = 4 };
I don't think something like this exists in the form of a programming language.
Ontology
However, the first approach I can think of is defining an ontology, i.e. a set of rules about:
1. Entities: Rectangle, Square, Dog, Car, etc.
2. Attributes: Area, Height, Number of Wheels, etc.
3. Relations between (1) and (2): a Rectangle's Area is Height * Width, ...
Now given a list of attributes and the required output Entity
I have height and width and I need a Rectangle
the system could search for a path through the rules graph to produce the required outcome based on the provided inputs.
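A toy sketch of that idea (C++17; all names are invented for illustration): encode each relation in every direction it can be solved, then keep applying rules until no new attribute can be derived:

#include <cstdio>
#include <optional>

struct Rect {
    std::optional<double> height, length, area, perimeter;
};

// One pass over the rule graph; returns true if anything new was derived.
bool propagate(Rect &r) {
    bool progress = false;
    auto set = [&](std::optional<double> &slot, double v) {
        if (!slot) { slot = v; progress = true; }
    };
    // area = height * length, solved for each unknown
    if (r.height && r.length) set(r.area, *r.height * *r.length);
    if (r.area && r.height)   set(r.length, *r.area / *r.height);
    if (r.area && r.length)   set(r.height, *r.area / *r.length);
    // perimeter = 2 * (height + length), solved for each unknown
    if (r.height && r.length)    set(r.perimeter, 2 * (*r.height + *r.length));
    if (r.perimeter && r.height) set(r.length, *r.perimeter / 2 - *r.height);
    if (r.perimeter && r.length) set(r.height, *r.perimeter / 2 - *r.length);
    return progress;
}

int main() {
    Rect r;
    r.area = 40;
    r.height = 20;
    while (propagate(r)) {}
    std::printf("length = %g, perimeter = %g\n", *r.length, *r.perimeter);
}

This deliberately ignores the hard parts (multiple solutions, as in the area-plus-perimeter case, and unsolvable combinations), which is where a real computer algebra system or ontology reasoner earns its keep.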
Real world example
Wolfram Alpha probably follows the technique described above.
(my code is written in Java but the question is agnostic; I'm just looking for an algorithm idea)
So here's the problem: I made a method that simply finds the median of a data set (given in the form of an array). Here's the implementation:
public static double getMedian(int[] numset) {
    ArrayList<Integer> anumset = new ArrayList<Integer>();
    for (int num : numset) {
        anumset.add(num);
    }
    anumset.sort(null); // natural ordering
    if (anumset.size() % 2 == 0) {
        // even count: average the two middle elements
        return (anumset.get(anumset.size() / 2 - 1)
            + anumset.get(anumset.size() / 2)) / 2.0;
    } else {
        // odd count: take the single middle element
        return anumset.get(anumset.size() / 2);
    }
}
A teacher in the school that I go to then challenged me to write a method to find the median again, but without using any data structures. This includes anything that can hold more than one value, so that includes Strings, any forms of arrays, etc. I spent a long while trying to even conceive of an idea, and I was stumped. Any ideas?
The usual algorithm for the task is Hoare's Select algorithm. This is pretty much like a quicksort, except that in quicksort you recursively sort both halves after partitioning, but for select you only do a recursive call in the partition that contains the item of interest.
For example, let's consider an input like this in which we're going to find the fourth element:
[ 7, 1, 17, 21, 3, 12, 0, 5 ]
We'll arbitrarily use the first element (7) as our pivot. We initially split it like this (with the pivot marked with a *):
[ 1, 3, 0, 5 ] *7 [ 17, 21, 12 ]
We're looking for the fourth element, and 7 is the fifth element, so we then partition (only) the left side. We'll again use the first element as our pivot, giving the following (using { and } to mark the part of the input we're now ignoring):
[ 0 ] 1 [ 3, 5 ] { 7, 17, 21, 12 }
1 has ended up as the second element, so we need to partition the items to its right (3 and 5):
{0, 1} 3 [5] {7, 17, 21, 12}
Using 3 as the pivot element, we end up with nothing to the left, and 5 to the right. 3 is the third element, so we need to look to its right. That's only one element, so that (5) is our median.
By ignoring the unused side, this reduces the complexity from O(n log n) for sorting to O(n) (though I'm abusing the notation a bit; in this case we're dealing with expected behavior, not worst case as big-O normally does).
There's also a median of medians algorithm if you want to assure good behavior (at the expense of being somewhat slower on average).
This gives guaranteed O(N) complexity.
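For completeness, here is a minimal sketch of the select algorithm described above (C-style and in-place, so no storage beyond the input array; median-of-medians pivot selection is left out for brevity):

#include <cstdio>

// Hoare's selection (quickselect): returns the k-th smallest element
// (0-based) of a[lo..hi]. Partitions around the first element and then
// continues only in the side that contains index k.
static int quickselect(int a[], int lo, int hi, int k) {
    while (lo < hi) {
        int pivot = a[lo];
        int i = lo, j = hi;
        while (i < j) {
            while (i < j && a[j] >= pivot) --j;
            a[i] = a[j];
            while (i < j && a[i] <= pivot) ++i;
            a[j] = a[i];
        }
        a[i] = pivot;           // pivot is now in its final position i
        if (k == i) return a[i];
        if (k < i) hi = i - 1;  // target lies in the left part
        else       lo = i + 1;  // target lies in the right part
    }
    return a[lo];
}

int main() {
    int a[] = { 7, 1, 17, 21, 3, 12, 0, 5 };
    // The fourth-smallest element (k = 3, 0-based), matching the
    // walkthrough above; prints 5.
    std::printf("%d\n", quickselect(a, 0, 7, 3));
}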
Sort the array in place. Take the element in the middle of the array as you're already doing. No additional storage needed.
That'll take n log n time or so in Java. Best possible time is linear (you've got to inspect every element at least once to ensure you get the right answer). For pedagogical purposes, the additional complexity reduction isn't worthwhile.
If you can't modify the array in place, you have to trade significant additional time complexity to avoid using additional storage proportional to half the input's size. (If you're willing to accept approximations, that's not the case.)
Some not very efficient ideas:
For each value in the array, make a pass through the array counting the number of values lower than the current value. If that count is "half" the length of the array, you have the median. O(n^2) (Requires some thought to figure out how to handle duplicates of the median value.)
You can improve the performance somewhat by keeping track of the min and max values so far. For example, if you've already determined that 50 is too high to be the median, then you can skip the counting pass through the array for every value that's greater than or equal to 50. Similarly, if you've already determined that 25 is too low, you can skip the counting pass for every value that's less than or equal to 25.
In C++:
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

int Median(const std::vector<int> &values) {
    assert(!values.empty());
    const std::size_t half = values.size() / 2;
    int min = *std::min_element(values.begin(), values.end());
    int max = *std::max_element(values.begin(), values.end());
    for (auto candidate : values) {
        if (min <= candidate && candidate <= max) {
            const std::size_t count =
                std::count_if(values.begin(), values.end(),
                              [&](int x) { return x < candidate; });
            if (count == half) return candidate;
            else if (count > half) max = candidate;
            else min = candidate;
        }
    }
    return min + (max - min) / 2;
}
Terrible performance, but it uses no data structures and does not modify the input array.