RTMP_Write function use - publish

I'm trying to use the librtmp library, and it worked well for pulling a stream. But now I am trying to publish a stream, and for that I believe I have to use the RTMP_Write function.
What I am trying to accomplish here is a simple C++ program that reads from a file and pushes the stream to a crtmp server. The connection and stream creation are OK, but I'm quite puzzled by the use of RTMP_Write.
Here is what I did:
int Upload(RTMP * rtmp, FILE * file){
int nRead = 0;
unsigned int nWrite = 0;
int diff = 0;
int bufferSize = 64 * 1024;
int byteSum = 0;
int count = 0;
char * buffer;
buffer = (char *) malloc(bufferSize);
do{
nRead = fread(buffer+diff,1,bufferSize-diff,file);
if(nRead != bufferSize){
if(feof(file)){
RTMP_LogPrintf("End of file reached!\n");
break;
}else if(ferror(file)){
RTMP_LogPrintf("Error reading from file stream detected\n");
break;
}
}
count += 1;
byteSum += nRead;
RTMP_LogPrintf("Read %d from file, Sum: %d, Count: %d\n",nRead,byteSum,count);
nWrite = RTMP_Write(rtmp,buffer,nRead);
if(nWrite != nRead){
diff = nRead - nWrite;
memcpy(buffer,(const void*)(buffer+bufferSize-diff),diff);
}
}while(!RTMP_ctrlC && RTMP_IsConnected(rtmp) && !RTMP_IsTimedout(rtmp));
free(buffer);
return RD_SUCCESS;
}
In this Upload function I receive the already initialized RTMP structure and a pointer to an open file.
This actually works and I can see some video being displayed, but it soon gets lost and stops sending packets. I managed to work out that this happens whenever the buffer I set up (which I arbitrarily made 64 KB, for no particular reason) splits the FLV tag (http://osflash.org/flv#flv_format) of a new packet.
To handle that, I modified the RTMP_Write function to check whether it can decode the whole FLV tag header (packet type, body size, timestamp, etc.) and, if it can't, to return the number of useful bytes left in the buffer.
if(s2 - 11 <= 0){
rest = size - s2;
return rest;
}
The code above checks for this: if the value returned by RTMP_Write is not the number of bytes it was supposed to send, the caller treats that value as the number of useful bytes left in the buffer. I then copy these bytes to the beginning of the buffer and read more from the file.
But I keep having problems with it, so I was wondering: what is the correct use of this function anyway? Is there a specific buffer size I should be using (I don't think so), or is the function itself buggy?
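A minimal sketch of caller-side framing, assuming the goal is to hand RTMP_Write only whole FLV tags. It walks the tag headers described on the FLV format page linked above, reports how many leading bytes form complete tags, and moves the unsent remainder to the front of the buffer. The helper names flv_complete_prefix and flv_carry_over are hypothetical, and this does not claim that librtmp requires whole tags; it only keeps tag boundaries intact on the caller's side. It also assumes the 13-byte file header (the 9-byte FLV header plus the 4-byte PreviousTagSize0) has already been consumed. As far as I can tell, the Upload loop above has two bookkeeping problems this avoids: it passes only nRead to RTMP_Write even when diff carried-over bytes sit in front of the new data, and it copies the leftover from buffer+bufferSize-diff, which is only the end of the valid data when the buffer was completely filled (memcpy on overlapping regions is also undefined, whereas memmove is not).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* FLV tag layout: 1 byte type, 3 bytes body size (big-endian),
 * 3 + 1 bytes timestamp, 3 bytes stream id = 11-byte header,
 * followed by the body and a 4-byte PreviousTagSize field. */
#define FLV_TAG_HEADER_LEN 11
#define FLV_PREV_TAG_SIZE_LEN 4

/* Hypothetical helper: returns how many leading bytes of buf[0..len)
 * form complete tags (each including its trailing PreviousTagSize). */
size_t flv_complete_prefix(const uint8_t *buf, size_t len)
{
    size_t pos = 0;
    while (len - pos >= FLV_TAG_HEADER_LEN) {
        const uint8_t *h = buf + pos;
        size_t body = ((size_t)h[1] << 16) | ((size_t)h[2] << 8) | h[3];
        size_t total = FLV_TAG_HEADER_LEN + body + FLV_PREV_TAG_SIZE_LEN;
        if (len - pos < total)
            break; /* this tag is split across reads: stop before it */
        pos += total;
    }
    return pos;
}

/* Hypothetical helper: moves the unsent tail buf[used..filled) to the
 * start of the buffer and returns its length. memmove, because the
 * source and destination may overlap. */
size_t flv_carry_over(uint8_t *buf, size_t filled, size_t used)
{
    size_t rest = filled - used;
    memmove(buf, buf + used, rest);
    return rest;
}
```

In the Upload loop this would mean: filled = diff + nRead; used = flv_complete_prefix((uint8_t *)buffer, filled); write exactly those used bytes; then diff = flv_carry_over((uint8_t *)buffer, filled, used) before the next fread into buffer + diff.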


Using I2C_master library AVR

I am using the I2C_master library for AVR. Communication works fine, but I have a small problem: how can I get the data?
I am using this function
uint16_t i2c_2byte_readReg(uint8_t devaddr, uint16_t regaddr, uint8_t* data, uint16_t length){
devaddr += 1;
if (i2c_start(devaddr<<1|0)) return 1;
i2c_write(regaddr >> 8);
i2c_write(regaddr & 0xFF);
if (i2c_start(devaddr<<1| 1)) return 1;
for (uint16_t i = 0; i < (length-1); i++)
{
data[i] = i2c_read_ack();
}
data[(length-1)] = i2c_read_nack();
i2c_stop();
return 0;}
Now I need to use the received data and send it over UART to the PC:
uint8_t* DevId;
i2c_2byte_readReg(address,REVISION_CODE_DEVID,DevId,2);
deviceH=*DevId++;
deviceL=*DevId;
UART_send(deviceH);
UART_send(deviceL);
I think I am lost with pointers. Could you help me figure out how to get the received data for later use? (UART works fine in this case, but with this code it sends only 0x00.)
The function i2c_2byte_readReg takes as its third argument a pointer to the buffer where the data will be written. That buffer must be at least as large as the fourth argument, length. Your DevId pointer is uninitialized and doesn't point to any buffer, so the call writes through a garbage address. That is undefined behavior; on an AVR there is no memory protection, so it silently overwrites whatever memory the pointer happens to refer to.
To get the data you should define an array before calling the function:
const size_t size = 8;
uint8_t data[size];
Then you can call the function, passing the buffer as an argument (the array name decays to a pointer to its first element):
const uint16_t length = 2;
i2c_2byte_readReg(address, REVISION_CODE_DEVID, data, length);
Assuming the function works correctly, those two bytes will be stored in the data buffer. Remember that size must be greater than or equal to the length argument.
Then you can send the data over UART:
UART_send(data[0]);
UART_send(data[1]);
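Since the AVR calls can't run off-target, here is a host-side sketch of the same pattern. fake_2byte_readReg is a hypothetical stub standing in for i2c_2byte_readReg (it just writes a fixed ID, 0x1234, into the buffer), and read_device_id is likewise a hypothetical helper. The point is that the caller owns the storage and passes its address; the two bytes are then combined into one 16-bit value in deviceH:deviceL order.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for i2c_2byte_readReg(): same contract
 * (data must point to at least `length` bytes, 0 means success),
 * but it writes a fixed two-byte ID instead of using the bus. */
uint16_t fake_2byte_readReg(uint8_t devaddr, uint16_t regaddr,
                            uint8_t *data, uint16_t length)
{
    static const uint8_t id[2] = { 0x12, 0x34 };
    (void)devaddr;
    (void)regaddr;
    for (uint16_t i = 0; i < length && i < 2; i++)
        data[i] = id[i];
    return 0;
}

/* Hypothetical helper: reads the two ID bytes into a local array
 * (real storage, not an uninitialized pointer) and combines them. */
uint16_t read_device_id(uint8_t devaddr, uint16_t regaddr, uint16_t *id_out)
{
    uint8_t data[2];
    if (fake_2byte_readReg(devaddr, regaddr, data, 2) != 0)
        return 1;
    *id_out = (uint16_t)((data[0] << 8) | data[1]); /* high byte first */
    return 0;
}
```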

C - pass array as parameter and change size and content

UPDATE: I solved my problem (see the EDIT below).
I'm writing a small C program and I want to do the following:
The program is connected to a mysql database (that works perfectly) and I want to do something with the data from the database. I get about 20-25 rows per query and I created my own struct, which should contain the information from each row of the query.
So my struct looks like this:
typedef struct {
int timestamp;
double rate;
char* market;
char* currency;
} Rate;
I want to pass an empty array to a function, and the function should calculate the size of the array based on the number of rows returned by the query. E.g. if a single SQL query returns 20 rows, the array should contain 20 objects of my Rate struct.
I want something like this:
int main(int argc, char **argv)
{
Rate *rates = ?; // don't know how to initialize it
(void) do_something_with_rates(&rates);
// the size here should be ~20
printf("size of rates: %zu", sizeof(rates)/sizeof(Rate));
}
What does the function do_something_with_rates(Rate **rates) have to look like?
EDIT: I did it as Alex said, I made my function return the size of the array as size_t and passed my array to the function as Rate **rates.
In the function you can access and change the values like (*rates)[i].timestamp = 123 for example.
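A self-contained sketch of the approach described in the EDIT, with the MySQL part replaced by a hypothetical rows_from_query parameter (the real function would take the row count from the query result instead). The function allocates the array, fills it through a local pointer, stores it in *rates and returns its length; the filled-in timestamps are placeholder values.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int timestamp;
    double rate;
    char *market;
    char *currency;
} Rate;

/* rows_from_query is a hypothetical stand-in for the row count
 * that the real code would read from the MySQL result. */
size_t do_something_with_rates(Rate **rates, size_t rows_from_query)
{
    *rates = NULL;
    if (rows_from_query == 0)
        return 0;
    Rate *array = malloc(rows_from_query * sizeof *array);
    if (array == NULL)
        return 0;                          /* allocation failed */
    for (size_t i = 0; i < rows_from_query; ++i) {
        array[i].timestamp = (int)(1000 + i); /* placeholder: copy from row i */
        array[i].rate = 0.0;
        array[i].market = NULL;
        array[i].currency = NULL;
    }
    *rates = array;                        /* hand the array to the caller */
    return rows_from_query;                /* ...together with its length */
}
```

The caller then writes Rate *rates = NULL; size_t n = do_something_with_rates(&rates, 20); uses rates[0] through rates[n - 1], and calls free(rates) when done.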
In C, memory is either dynamically or statically allocated.
Something like int fifty_numbers[50] is statically allocated. The size is 50 integers no matter what, so the compiler knows how big the array is in bytes. sizeof(fifty_numbers) will give you 200 bytes here (assuming 4-byte ints).
Dynamic allocation: int *bunch_of_numbers = malloc(sizeof(int) * varying_size). As you can see, varying_size is not constant, so the compiler can't figure out how big the array is without executing the program. sizeof(bunch_of_numbers) gives you the size of the pointer, typically 4 bytes on a 32-bit system or 8 bytes on a 64-bit system. The only one who knows how big the array is would be the programmer. In your case, that's whoever wrote do_something_with_rates(), but you're discarding that information by neither returning it nor taking a size parameter.
It's not clear how do_something_with_rates() was declared exactly, but something like void do_something_with_rates(Rate **rates) won't work, as the function has no idea how big rates is. I recommend something like void do_something_with_rates(size_t array_size, Rate **rates). At any rate, going by your requirements, it's still a ways away from working. Possible solutions are below:
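A minimal sketch of that difference; the function names are hypothetical, the variable names come from the answer. The second result is the size of a pointer no matter how many elements were requested from malloc.

```c
#include <assert.h>
#include <stdlib.h>

/* sizeof on a real array sees the whole object: 50 * sizeof(int) bytes. */
size_t static_array_bytes(void)
{
    int fifty_numbers[50];
    return sizeof fifty_numbers;
}

/* sizeof on a pointer from malloc only sees the pointer itself. */
size_t dynamic_array_bytes(size_t varying_size)
{
    int *bunch_of_numbers = malloc(sizeof(int) * varying_size);
    size_t bytes = sizeof bunch_of_numbers; /* == sizeof(int *) */
    free(bunch_of_numbers);
    return bytes;
}
```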
You need to either return the new array's size:
size_t do_something_with_rates(size_t old_array_size, Rate **rates) {
size_t n = old_array_size; // placeholder: set n to the new element count, e.g. the number of rows the query returned
Rate *new_rates = malloc(sizeof(Rate) * n); // allocate n Rate objects
if (new_rates == NULL) return old_array_size; // allocation failed: keep the old array and its size
// carry out your operation on new_rates[0] .. new_rates[n - 1]
free(*rates); // release the memory taken up by the old array
*rates = new_rates; // make the caller's pointer point to the new array
return n; // return the new size so that the caller knows
}
int main() {
Rate *rates = malloc(sizeof(Rate) * 20);
size_t new_size = do_something_with_rates(20, &rates);
// now new_size holds the size of the new array, which may or may not be 20
free(rates);
return 0;
}
Or pass in a size parameter for the function to set:
void do_something_with_rates(size_t old_array_size, size_t *new_array_size, Rate **rates) {
size_t n = old_array_size; // placeholder: set n to the new element count
Rate *new_rates = malloc(sizeof(Rate) * n); // allocate n Rate objects
if (new_rates == NULL) { *new_array_size = old_array_size; return; } // allocation failed: keep the old array
*new_array_size = n; // set the new size so that the caller knows
// carry out your operation on new_rates[0] .. new_rates[n - 1]
free(*rates); // release the memory taken up by the old array
*rates = new_rates; // make the caller's pointer point to the new array
}
int main() {
Rate *rates = malloc(sizeof(Rate) * 20);
size_t new_size;
do_something_with_rates(20, &new_size, &rates);
// now new_size holds the size of the new array, which may or may not be 20
free(rates);
return 0;
}
Why do I need to pass the old size as a parameter?
void do_something_with_rates(Rate **rates) {
// You don't know what n is. How would you
// know how many Rate objects the caller wants
// you to process for any given call to this?
for (size_t i = 0; i < n; ++i) {
// carry out your operation on (*rates)[i]
}
}
Everything changes when you have a size parameter:
void do_something_with_rates(size_t size, Rate **rates) {
for (size_t i = 0; i < size; ++i) { // Now you know when to stop
// carry out your operation on (*rates)[i]
}
}
This is a very fundamental flaw with your program.
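I also want the function to change the contents of the array:
size_t do_something_with_rates(size_t old_array_size, Rate **rates) {
size_t n = old_array_size; // placeholder: set n to the new element count
Rate *new_rates = malloc(sizeof(Rate) * n); // allocate n Rate objects
if (new_rates == NULL) return old_array_size; // allocation failed: keep the old array
// carry out some operation on new_rates
for (size_t i = 0; i < n; ++i) {
new_rates[i].timestamp = (int)time(NULL); // needs #include <time.h>
// you can see the pattern
}
free(*rates); // release the old array
*rates = new_rates; // hand the new array to the caller
return n; // return the new size so that the caller knows
}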
sizeof yields the size of a type, or of the type of an expression, and except for variable-length arrays that size is fixed at compile time. The size of an expression therefore cannot change during the execution of the program. If you want that feature, use a variable, a sentinel (terminating) value, or a different programming language. Your choice. C's better than Java, though.
char foo[42];
foo has either static storage duration (which is only partially related to the static keyword) or automatic storage duration.
Objects with static storage duration exist from the start of the program until its termination. Those "global variables" are technically variables declared at file scope; they have static storage duration and external linkage (or internal linkage if declared static).
Objects with automatic storage duration exist from entry into the block in which they are declared until execution of that block ends. They usually live on the stack, though the standard does not require one. They're variables declared at block scope (without static or extern), and they have no linkage.
In either case, today's compilers will encode 42 into the machine code. I suppose it would be possible to modify the machine code at run time, but the several thousand lines you'd put into that task would be much better invested in storing the size externally (see the other answers), and at that point it isn't really a C question anymore. If you really want to look into this, the only programs I can think of that change their own machine code are viruses... How are you going to avoid that antivirus heuristic?
Another option is to encode the size in a struct that uses a flexible array member; then you can carry both the array and its size around as one allocation. Sorry, this is as close as you'll get to what you want, e.g.:
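#include <stdlib.h>
typedef struct { int x, y; } T; // example element type; substitute your own (e.g. Rate)
struct T_vector {
size_t size;
T value[]; // flexible array member
};
// Appends one element and returns a pointer to it (NULL if out of memory).
// Capacity doubles whenever the old size is 0 or a power of two.
T *T_make(struct T_vector **v) {
size_t index = *v ? (*v)->size : 0;
if ((index & (index - 1)) == 0) { // index is 0 or a power of two: grow
size_t capacity = index ? index * 2 : 1;
void *temp = realloc(*v, sizeof **v + capacity * sizeof(T));
if (!temp) {
return NULL; // the old vector is still valid
}
*v = temp;
}
(*v)->size = index + 1;
return (*v)->value + index;
}
#define T_size(v) ((v) == NULL ? 0 : (v)->size)
int main(void) {
struct T_vector *v = NULL; // T_size(v) == 0
{ T *x = T_make(&v); if (x) { x->x = 1; x->y = 2; } } // T_size(v) == 1
{ T *y = T_make(&v); if (y) { y->x = 3; y->y = 4; } } // T_size(v) == 2
free(v);
return 0;
}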
Disclaimer: I only wrote this as an example; I don't intend to test or maintain it unless the intent of the example suffers drastically. If you want something I've thoroughly tested, use my push_back.
This may seem innocent, yet even with that disclaimer and this upcoming warning I'll likely see a comment along the lines of: "Each successive call to T_make may render previously returned pointers invalid..." True, and I can't think of much more I could do about that. I would advise calling T_make, modifying the value pointed at by the return value and then discarding that pointer, as I've done above (rather explicitly).
Some compilers might even let you #define sizeof(x) T_size(x)... I'm joking; don't do this.
Technically we aren't changing the size of an array here; we're allocating ahead of time and where necessary, reallocating and copying to a larger array. It might seem appealing to abstract allocation away this way in C at times... enjoy :)

How can I record sound with 16 bits per sample (16 bit depth)?

I'm trying to record PCM sound from Flash (using the Microphone class). I use the org.bytearray.micrecorder.MicRecorder helper class.
In the Microphone class I cannot find a property like bitDepth or bitsPerSample.
I always get 32 bits.
Is it possible?
UPDATE: The asker, John812, was able to solve this by using bit16_bytes.writeShort( data.readFloat() * 32767 );
METHOD #2: Based on my experience with using the loadPCMFromByteArray method
I have something you could try, but I've only used it with an actual 32-bit WAVE file played via the loadPCMFromByteArray command.
The AS3 Microphone class records 32-bit samples (floating-point values between -1.0 and 1.0). You have to write the conversion of samples to a different bit depth yourself: read each 32-bit float and scale it to the signed 16-bit range by multiplying by 32767, which is exactly the fix in the UPDATE above. Reading the samples with readInt() instead of readFloat() reinterprets the float bits as an integer, which produces hiss/distortion no matter what multiplier you use.
CONVERT: Assuming your recorded ByteArray is called data
public var bit16_bytes : ByteArray; //will hold the 16-bit version
public function convert_to16Bit () : void
{
bit16_bytes = new ByteArray(); data.position = 0;
while (data.bytesAvailable >= 4) //one 32-bit float sample per iteration
{ bit16_bytes.writeShort( int(data.readFloat() * 32767) ); } //scale -1.0..1.0 to -32767..32767
data = new ByteArray(); //recycle for re-use
bit16_bytes.position = 0; //reset or else E-O-File error
bit16_bytes.readBytes( data ); //copy the 16-bit samples back into the data ByteArray
}
To run the above function, just add the line convert_to16Bit(); inside whatever function handles your "recording complete" situation.
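The same scaling as the readFloat() * 32767 fix, sketched in C for a buffer of float samples; float_to_pcm16 is a hypothetical name. It assumes the samples are floats in [-1.0, 1.0]. The clamping step is an addition not present in the AS3 code: there, writeShort keeps only the low 16 bits, so an out-of-range value would wrap around instead of saturating. Byte order is not handled here; if you write the result into a WAV file, it must be little-endian.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Converts float samples in [-1.0, 1.0] to signed 16-bit PCM,
 * clamping out-of-range values instead of letting them wrap. */
void float_to_pcm16(const float *in, int16_t *out, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        float s = in[i];
        if (s > 1.0f) s = 1.0f;
        if (s < -1.0f) s = -1.0f;
        out[i] = (int16_t)(s * 32767.0f); /* truncates toward zero */
    }
}
```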

Cost of OpenCL get_local_id()

I have a simple scan kernel, which calculates scans of several blocks in a loop. I noticed that performance somewhat rises when get_local_id() is stored inside a local variable instead of calling it inside the loop. So to summarize with code, this:
__kernel void LocalScan_v0(__global const int *p_array, int n_array_size, __global int *p_scan)
{
const int n_group_offset = get_group_id(0) * SCAN_BLOCK_SIZE;
p_array += n_group_offset;
p_scan += n_group_offset;
// calculate group offset
const int li = get_local_id(0); // *** local id cached ***
const int gn = get_num_groups(0);
__local int p_workspace[SCAN_BLOCK_SIZE];
for(int i = n_group_offset; i < n_array_size; i += SCAN_BLOCK_SIZE * gn) {
LocalScan_SingleBlock(p_array, p_scan, p_workspace, li);
p_array += SCAN_BLOCK_SIZE * gn;
p_scan += SCAN_BLOCK_SIZE * gn;
}
// process all the blocks in the array (each block size SCAN_BLOCK_SIZE)
}
has a throughput of 74 GB/s on a GTX 780, while this:
__kernel void LocalScan_v0(__global const int *p_array, int n_array_size, __global int *p_scan)
{
const int n_group_offset = get_group_id(0) * SCAN_BLOCK_SIZE;
p_array += n_group_offset;
p_scan += n_group_offset;
// calculate group offset
const int gn = get_num_groups(0);
__local int p_workspace[SCAN_BLOCK_SIZE];
for(int i = n_group_offset; i < n_array_size; i += SCAN_BLOCK_SIZE * gn) {
LocalScan_SingleBlock(p_array, p_scan, p_workspace, get_local_id(0));
// *** local id polled inside the loop ***
p_array += SCAN_BLOCK_SIZE * gn;
p_scan += SCAN_BLOCK_SIZE * gn;
}
// process all the blocks in the array (each block size SCAN_BLOCK_SIZE)
}
has only 70 GB/s on the same hardware. The only difference is whether the call to get_local_id() is inside or outside the loop. The code in LocalScan_SingleBlock() is pretty much as described in this GPU Gems article.
Now this raises some questions. I always imagined that the thread id is stored in some register, and that accessing it is as fast as accessing any thread-local variable. This doesn't seem to be the case. I've always had the habit of caching the local id in a variable, out of the reluctance of an old "C" programmer to call a function in a loop when it is expected to return the same value every time, but I didn't seriously think it would make any difference.
Any ideas as to why this might be? I didn't do any checking of the compiled binary code. Does anyone have the same experience? Is it the same with threadIdx.x in CUDA? How about ATI platforms? Is this behavior described somewhere? I quickly scanned through the CUDA Best Practices Guide, but didn't find anything.
This is just a guess, but according to the Khronos reference page
http://www.khronos.org/registry/cl/sdk/1.0/docs/man/xhtml/get_local_id.html
get_local_id() isn't declared to return a constant value (merely a size_t). That may mean that, as far as the compiler knows, it isn't allowed to perform certain optimisations it could do with a constant local id, because the function's return value might change between calls in the eyes of the compiler (even though it won't within a given work-item).
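A CPU-side C analogy of this guess (not OpenCL code, and not a statement about what NVIDIA's compiler actually emits). local_id_stub is a hypothetical stand-in for get_local_id(0) with an observable side effect, a call counter. Because the compiler must preserve that effect, it cannot hoist the call out of the loop on its own, so the polled version really calls it on every iteration, while the cached version calls it once, just like caching li in the first kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Counts how often the stub actually runs. */
static unsigned calls;

/* Hypothetical stand-in for get_local_id(0). */
static size_t local_id_stub(unsigned dim)
{
    ++calls;
    return dim + 7;
}

/* Re-queries the id on every iteration, as in the second kernel. */
size_t sum_polled(size_t iterations)
{
    size_t sum = 0;
    for (size_t i = 0; i < iterations; ++i)
        sum += local_id_stub(0);
    return sum;
}

/* Queries the id once and reuses it, as in the first kernel. */
size_t sum_cached(size_t iterations)
{
    const size_t li = local_id_stub(0);
    size_t sum = 0;
    for (size_t i = 0; i < iterations; ++i)
        sum += li;
    return sum;
}
```

If the compiler can see that the call has no side effects and always returns the same value, it may do this hoisting itself; the guess above is that the OpenCL compiler cannot assume that about get_local_id().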

CUDA binary search implementation

I am trying to speed up a CPU binary search. Unfortunately, the GPU version is always much slower than the CPU version. Perhaps the problem is not suitable for the GPU, or am I doing something wrong?
CPU version (approx. 0.6ms):
using a sorted array of length 2000 and doing a binary search for a specific value
...
Lookup ( search[j], search_array, array_length, m );
...
int Lookup ( int search, int* arr, int length, int& m )
{
int l(0), r(length-1);
while ( l <= r )
{
m = (l+r)/2;
if ( search < arr[m] )
r = m-1;
else if ( search > arr[m] )
l = m+1;
else
{
return index[m];
}
}
if ( arr[m] >= search )
return m;
return (m+1);
}
GPU version (approx. 20ms):
using a sorted array of length 2000 and doing a binary search for a specific value
....
p_ary_search<<<16, 64>>>(search[j], array_length, dev_arr, dev_ret_val);
....
__global__ void p_ary_search(int search, int array_length, int *arr, int *ret_val )
{
const int num_threads = blockDim.x * gridDim.x;
const int thread = blockIdx.x * blockDim.x + threadIdx.x;
int set_size = array_length;
ret_val[0] = -1; // return value
ret_val[1] = 0; // offset
while(set_size != 0)
{
// Get the offset of the array, initially set to 0
int offset = ret_val[1];
// I think this is necessary in case a thread gets ahead, and resets offset before it's read
// This isn't necessary for the unit tests to pass, but I still like it here
__syncthreads();
// Get the next index to check
int index_to_check = get_index_to_check(thread, num_threads, set_size, offset);
// If the index is outside the bounds of the array then let's not check it
if (index_to_check < array_length)
{
// If the next index is outside the bounds of the array, then set it to maximum array size
int next_index_to_check = get_index_to_check(thread + 1, num_threads, set_size, offset);
if (next_index_to_check >= array_length)
{
next_index_to_check = array_length - 1;
}
// If we're at the mid section of the array reset the offset to this index
if (search > arr[index_to_check] && (search < arr[next_index_to_check]))
{
ret_val[1] = index_to_check;
}
else if (search == arr[index_to_check])
{
// Set the return var if we hit it
ret_val[0] = index_to_check;
}
}
// Since this is a p-ary search divide by our total threads to get the next set size
set_size = set_size / num_threads;
// Sync up so no threads jump ahead and get a bad offset
__syncthreads();
}
}
Even if I try bigger arrays, the time ratio is not any better.
You have way too many divergent branches in your code so you're essentially serializing the entire process on the GPU. You want to break up the work so that all the threads in the same warp take the same path in the branch. See page 47 of the CUDA Best Practices Guide.
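One common way to sidestep the divergence (a swapped-in technique, not what the answers here describe) is to give each thread a complete search for its own query value and launch the kernel once for all of search[] rather than once per search[j], which also amortizes the launch overhead. The C sketch below is what each thread would run; lower_bound_branchless is a hypothetical name. The loop's trip count depends only on n, and the comparison feeds a select instead of an if/else, so threads in a warp searching the same array stay on the same instruction path. It returns the insertion point (the index of the first element not less than key), which, as far as I can tell, is the position the question's Lookup computes when the value is absent.

```c
#include <assert.h>
#include <stddef.h>

/* Index of the first element >= key in the sorted array arr[0..n),
 * or n if there is none. */
size_t lower_bound_branchless(const int *arr, size_t n, int key)
{
    if (n == 0)
        return 0;
    const int *base = arr;
    size_t len = n;
    while (len > 1) {
        size_t half = len / 2;
        /* usually compiled to a conditional select, not a jump */
        base += (base[half] < key) ? half : 0;
        len -= half;
    }
    return (size_t)(base - arr) + (size_t)(*base < key);
}
```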
I must admit I'm not entirely sure what your kernel does, but am I right in assuming that you are looking for just one index that satisfies your search criteria? If so, have a look at the reduction sample that comes with CUDA for some pointers on how to structure and optimize such a query. (What you are doing is essentially trying to reduce to the index closest to your query.)
Some quick pointers though:
You are performing an awful lot of reads and writes to global memory, which is incredibly slow. Try using shared memory instead.
Secondly, remember that __syncthreads() only synchronizes threads in the same block, so your reads/writes to global memory won't necessarily be synchronized across all threads (though the latency of your global memory writes may actually make it appear as if they are).