play pcm data by webAudio API - html

Hi, I am working with the Web Audio API. I read "HTML5 Web Audio API, porting from javax.sound and getting distortion", but I am still not getting sound quality as good as with the Java API. I receive PCM data from the server as signed bytes and then have to convert it to 16-bit format. For the conversion I am using ( firstbyte<<8 | secondbyte ),
but the sound quality is poor. Is there a problem with the conversion, or is there another way to get good sound quality?

The Web Audio API uses 32-bit floats in the range -1 to 1, so that's what I'm going to (hopefully) show you how to produce, rather than the 16-bit format you mentioned in the question.
Assuming your array of samples is called samples and the samples are stored as two's complement values from -128 to 127, I think this should work:
var floats = new Float32Array(samples.length);
samples.forEach(function( sample, i ) {
floats[i] = sample < 0 ? sample / 0x80 : sample / 0x7F;
});
Then you can do something like this:
var ac = new (window.AudioContext || window.webkitAudioContext)()
, ab = ac.createBuffer(1, floats.length, ac.sampleRate)
, bs = ac.createBufferSource();
ab.getChannelData(0).set(floats);
bs.buffer = ab;
bs.connect(ac.destination);
bs.start(0);
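Since the question mentions 16-bit PCM arriving as pairs of signed bytes, here is a minimal sketch of that conversion as well. One thing worth checking in ( firstbyte<<8 | secondbyte ): if secondbyte is negative, the bitwise OR sign-extends it and overwrites the high byte, which produces exactly this kind of distortion. Masking the low byte with 0xFF avoids that. The function name pcm16BytesToFloat32 and the littleEndian parameter are made up for this example, and the byte order of your stream is an assumption you will have to verify.

```javascript
// Sketch: convert 16-bit PCM delivered as signed bytes into Float32 samples.
// Assumptions (not from the question): `bytes` is an Int8Array or an array of
// byte values, and `littleEndian` tells us which byte of each pair comes first.
function pcm16BytesToFloat32(bytes, littleEndian) {
  var count = Math.floor(bytes.length / 2);
  var floats = new Float32Array(count);
  for (var i = 0; i < count; i++) {
    var first = bytes[2 * i];
    var second = bytes[2 * i + 1];
    var hi = littleEndian ? second : first;
    var lo = littleEndian ? first : second;
    // Mask the low byte so a negative value does not sign-extend over the high byte.
    var sample = (hi << 8) | (lo & 0xFF);
    // Reinterpret as signed 16-bit, which also covers bytes given as 0..255.
    sample = (sample << 16) >> 16;
    // Scale to the -1..1 range used by the Web Audio API.
    floats[i] = sample / 0x8000;
  }
  return floats;
}
```

The resulting Float32Array can be copied into the AudioBuffer with getChannelData(0).set(...) in the same way as floats above.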

Related

Puzzling MediaCodec.BufferInfo.size values

I use MediaCodec to decode H.264/H.265 video streams. I don't think the code of thousands of lines is relevant to my question, so please allow me to avoid pasting the code here. Let me emphasize that the decoding works flawlessly. I am asking this question primarily out of curiosity, not for solving a problem.
MediaCodec.BufferInfo bi = new MediaCodec.BufferInfo();
...
int iOutputBufferIndex = myMediaCodec.dequeueOutputBuffer(bi, TIMEOUT_USEC);
logd("debug", "Output buffer size: " + bi.size);
The official documentation says that MediaCodec.BufferInfo.size is "The amount of data (in bytes) in the buffer." I have tested three video streams decoded by the code. MediaCodec.BufferInfo.size is 1 for an H.265 video and 8 for two H.264 video streams. These numbers do not look like "The amount of data (in bytes) in the buffer." My understanding is that the buffer holds a decoded video frame.
Could anyone shed some light on this (i.e. the exact meaning of MediaCodec.BufferInfo.size) ?

Read raw Genicam H.264 data to avlib

I am trying to get familiar with libav in order to process a raw H.264 stream from a camera that supports GenICam.
I'd like to receive the raw data via the interfaces (API) provided by GenICam, and then forward that data into libav in order to produce a container file that is then streamed to a playback device like VLC or (later) to a display I implement myself.
So far, I have played around with the GenICam sample code, which transfers the raw H.264 data into a "sample.h264" file. I have put this file through the command line tool ffmpeg in order to produce an mp4 container file that I can open and watch in VLC:
command: ffmpeg -i "sample.h264" -c:v copy -f mp4 "out.mp4"
Currently, I am digging through examples and documentation for H.264, ffmpeg, libav and video processing in general. I have to admit that, as a total beginner, it confuses me a lot.
I'm at the point where I think I have found the appropriate libav functions for what I am trying to do:
I think, basically, I need the functions avcodec_send_packet() and avcodec_receive_frame() (since avcodec_decode_video2() is deprecated).
Before that, I set up an AVCodecContext structure and open (or combine?!?) it with the H.264 codec (AV_CODEC_ID_H264).
So far, my code looks like this (omitting error checking and other stuff):
...
AVCodecContext* avCodecContext = nullptr;
AVCodec *avCodec = nullptr;
AVPacket *avPacket = av_packet_alloc();
AVFrame *avFrame = nullptr;
...
avCodec = avcodec_find_decoder(AV_CODEC_ID_H264);
avCodecContext = avcodec_alloc_context3(avCodec);
avcodec_open2 ( avCodecContext, avCodec, NULL );
av_init_packet(avPacket);
...
while(receivingRawDataFromCamera)
{
...
// receive raw data via GenICam
DSGetBufferInfo<void*>(hDS, sBuffer.BufferHandle, BUFFER_INFO_BASE, NULL, pPtr);
// libav action
avPacket->data = static_cast<uint8_t*>(pPtr);
avErr = avcodec_send_packet(avCodecContext, avPacket);
avFrame = av_frame_alloc();
avErr = avcodec_receive_frame( avCodecContext, avFrame);
// pack frame in container? (not implemented yet)
..
}
The result of the code above is that the calls to send_packet() and receive_frame() both return error codes (-22 and -11), which I'm not able to decode via av_strerror() (it only says that these are error codes 22 and 11).
Edit: Maybe as an additional information for those who wonder if
avPacket->data = static_cast<uint8_t*>(pPtr);
is a valid operation...
After the very first call to this operation, the content of avPacket->data is
{0x0, 0x0, 0x0, 0x1, 0x67, 0x64, 0x0, 0x28, 0xad, 0x84, 0x5,
0x45, 0x62, 0xb8, 0xac, 0x54, 0x74, 0x20, 0x2a, 0x2b, 0x15, 0xc5,
0x62}
which looks like what I would expect, because of the NAL marker and number at the beginning?
I don't know, since I'm really a total beginner...
The question now is, am I on the right path? What is missing and what do the codes 22 and 11 mean?
The next question would be, what to do afterwards, in order to get a container that I can stream (realtime) to a player?
Thanks in advance,
Maik
At least for the initially asked question, I found the solution myself:
In order to get rid of the errors when calling the functions
avcodec_send_packet(avCodecContext, avPacket);
...
avcodec_receive_frame( avCodecContext, avFrame);
I had to manually fill some parameters of 'avCodecContext' and 'avPacket':
avCodecContext->bit_rate = 8000000;
avCodecContext->width = 1920;
avCodecContext->height = 1080;
avCodecContext->time_base.num = 1;
avCodecContext->time_base.den = 25;
...
avPacket->data = static_cast<uint8_t*>(pPtr);
avPacket->size = datasize;
avPacket->pts = frameid;
where 'datasize' and 'frameid' are received via GenICam. They may not be the appropriate values for these fields, but at least I do not get any errors anymore.
Since this answers my initial question on how I get the raw data into the structures of libav, I think, the question is answered.
The discussion with Vencat in the comments section and his suggestions led to additional questions, but those should be discussed in a new question, I guess.

Type of field 7 in auth response message of google cast protocol v2

The Google Cast protocol v2 has widely been reverse-engineered and is therefore already well-known. A good example of this is the Cast v2 Node library repository on GitHub which includes a detailed description of the cast v2 protocol.
However, whilst writing my own implementation of the protocol in Java using Netty, I realized that the auth response message is way more complex than described in the linked repository.
According to the repository, the message should look like:
message AuthResponse {
required bytes signature = 1;
required bytes client_auth_certificate = 2;
repeated bytes client_ca = 3;
}
However, the client sends 3 more fields. They have the indices 4, 6 and 7.
Field 4 is of wiretype VARINT and stands, as far as I know, for the SignatureAlgorithm the Cast-enabled device (Chromecast Gen2 and Chromecast Audio) has been challenged with.
Field 6 is also of type VARINT, but I have no idea what it stands for. During testing, it always had the value 0. (Maybe it stands for the client_ca certificate used for signing the client_auth_certificate?)
Field 7 is of wiretype LENGTH_DELIMITED. It is definitely not a UTF-8 encoded String, since printing it out results in an unreadable mess. However, the sequence printed out contains the complete address that's also been used in the client_ca and client_auth_certificate, so I believe it has something to do with it. I've already tested whether this might be a certificate or RSA key, but both tests were negative. A file containing the raw byte sequence can be found here.
This brings me finally to my question:
Do you know what fields 6 and 7 stand for? Guesses based on the file's structure are also highly appreciated.
As I've found out, the protocol is practically open-source since the Chromium project includes the corresponding .proto-files in order to support streaming on Cast-enabled devices.
The complete protocol can be found here: https://github.com/chromium/chromium/blob/master/components/cast_channel/proto/cast_channel.proto
The structure of the AuthResponse message is therefore
message AuthResponse {
required bytes signature = 1;
required bytes client_auth_certificate = 2;
repeated bytes intermediate_certificate = 3;
optional SignatureAlgorithm signature_algorithm = 4
[default = RSASSA_PKCS1v15];
optional bytes sender_nonce = 5;
optional HashAlgorithm hash_algorithm = 6 [default = SHA1];
optional bytes crl = 7;
}
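To cross-check the wire types observed in the question against this definition, each field key in the raw message can be split into a field number and a wire type (the key is the field number shifted left by 3, OR-ed with the wire type). Below is a small sketch of that step. The helper name decodeKey and the wire type labels are mine, and it assumes the key has already been read as a single number, which holds for field numbers below 16 because they fit in one byte.

```javascript
// Sketch: split a protobuf field key into field number and wire type.
// The label strings are illustrative names for the standard wire type ids.
var WIRE_TYPES = {
  0: 'VARINT',
  1: 'FIXED64',
  2: 'LENGTH_DELIMITED',
  5: 'FIXED32'
};

function decodeKey(key) {
  return {
    field: key >>> 3,                               // upper bits: field number
    wireType: WIRE_TYPES[key & 0x07] || 'UNKNOWN'   // lower 3 bits: wire type
  };
}

// Key bytes for the three extra fields described in the question.
console.log(decodeKey(0x20)); // field 4 (signature_algorithm)
console.log(decodeKey(0x30)); // field 6 (hash_algorithm)
console.log(decodeKey(0x3A)); // field 7 (crl)
```

An enum such as hash_algorithm is encoded as a VARINT and a bytes field such as crl as LENGTH_DELIMITED, which is consistent with what was seen on the wire.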

Tesseract is giving junk data as output for the Japanese language

I'm trying to build a sample application in Java for the Japanese language that reads an image file and outputs the text extracted from the image. I found a sample application on the net that runs perfectly for English, but for Japanese it produces unidentifiable text. Here is my code:
BytePointer outText;
TessBaseAPI api = new TessBaseAPI();
// Initialize tesseract-ocr with Japanese, without specifying tessdata path
if (api.Init(".", "jpn") != 0) {
System.err.println("Could not initialize tesseract.");
System.exit(1);
}
// Open input image with leptonica library
PIX image = pixRead("test.png");
api.SetImage(image);
// Get OCR result
outText = api.GetUTF8Text();
String string = outText.getString();
assertTrue(!string.isEmpty());
System.out.println("OCR output:\n" + string);
// Destroy used object and release memory
api.End();
outText.deallocate();
pixDestroy(image);
my output is:
OCR output:
ETCカー-ード申 込書
�申�込�日 09/02/2017
ETC FeatureID ETCFFL
ー申込枚輩交 画 枚
I have used jpn.tessdata, and my application is reading the tessdata file. Is any more configuration needed? I'm using Tesseract version 3.02 with a very clean image.
Yes! I found the solution. What we need to do is set the locale in our Java code as follows:
olocale = new Locale.Builder().setLanguage("ja").setRegion("JP").build();
We can also set the locale for English in order to extract both Japanese and English text from the image.
Now it works like a charm for me!

How to successfully parse the output of FFMpeg in NodeJS

So I have seen a lot of topics on FFMPeg and it's a great tool I learnt about today, but I have spent the day perfecting the command and now am a little stuck with the NodeJS part.
In essence the command does the following: take input from a Mac OSX webcam, and then stream it to a web-socket. Now I looked at a lot of the NodeJS libraries but I couldn't find one that did what I need; or did not understand how to. Here is an example of the command that I am using:
ffmpeg -f avfoundation -framerate 30 -video_size 640x480 -pix_fmt uyvy422 -i "0:1" -f mpegts -codec:v mpeg1video -s 640x480 -b:v 1000k -bf 0 http://localhost:8081/stream
This does everything I need for the streaming side of things, but I wish to call it via NodeJS, and then be able to monitor the log and parse the data that comes back, for example:
frame= 4852 fps= 30 q=6.8 size= 30506kB time=00:02:41.74 bitrate=1545.1kbits/s speed= 1x \r
and use it to get a JSON array back for me to output to a webpage.
Now all I am doing is working on ways of actually parsing the data, and I have looked at lots of other answers for things like this, but I can't seem to split/replace/regex it. I can't get anything but a long string from it.
Here is the code I am using (NodeJS):
var ffmpeg = require('child_process').spawn('/usr/local/Cellar/ffmpeg/3.3.1/bin/ffmpeg', ['-f', 'avfoundation', '-framerate', '30', '-video_size', '640x480', '-pix_fmt', 'uyvy422', '-i', '0:1', '-f', 'mpegts', '-codec:v', 'mpeg1video', '-s', '640x480', '-b:v', '1000k', '-bf', '0', 'http://localhost:8081/test']);
ffmpeg.on('error', function (err) {
console.log(err);
});
ffmpeg.on('close', function (code) {
console.log('ffmpeg exited with code ' + code);
});
ffmpeg.stderr.on('data', function (data) {
// console.log('stderr: ' + data);
var tData = data.toString('utf8');
// var a = tData.split('[\\s\\xA0]+');
var a = tData.split('\n');
console.log(a);
});
ffmpeg.stdout.on('data', function (data) {
var frame = data.toString('base64'); // data is already a Buffer
// console.log(frame);
});
I have tried splitting on new lines, carriage returns, spaces and tabs, but I just can't seem to get a basic array of bits that I can work with.
Another thing to note is that the log comes back via stderr. I have seen this mentioned online, and apparently it happens for a lot of people, so I am not sure what the deal is with that. In any case, that is why the code is in the stderr callback.
Any help is very appreciated as I am truly confused on what I am doing wrong.
Thanks.
An update on this, I worked with one of the guys off the IRC channel: #ffmpeg on FreeNode. The answer was to send the output via pipe to stdout.
For example I appended the following to the FFMpeg command:
-progress pipe:1
The progress flag is used to give an output every second with information about the stream, so this is pretty much everything you get from the stderr stream every second, but piped to the stdout stream in a format that I can parse. Below is taken from the documentation.
-progress url (global) Send program-friendly progress information to url. Progress information is written approximately every second and at the end of the encoding process. It is made of "key=value" lines. key consists of only alphanumeric characters. The last key of a sequence of progress information is always "progress".
Here is an example of the code I used to parse the stream information:
ffmpeg.stdout.on('data', function (data) {
var tLines = data.toString().split('\n');
var progress = {};
for (var i = 0; i < tLines.length; i++) {
var item = tLines[i].split('=');
if (typeof item[0] != 'undefined' && typeof item[1] != 'undefined') {
progress[item[0]] = item[1];
}
}
// The 'progress' variable contains a key value array of the data
console.log(progress);
});
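One caveat with the snippet above: a 'data' event is not guaranteed to contain whole lines or a whole progress block, so a key can occasionally be cut in half between two chunks. A slightly more defensive sketch buffers the trailing partial line and only reports once the final "progress" key has arrived. The name createProgressParser is made up for this example, and attaching it to ffmpeg.stdout assumes the same spawn setup as in the question.

```javascript
// Sketch: chunk-safe parser for "-progress pipe:1" output.
// Returns a function to call with each stdout chunk; onProgress is invoked
// once per complete block of key=value lines.
function createProgressParser(onProgress) {
  var pending = '';  // text of a line that has not been terminated yet
  var current = {};  // key/value pairs of the block being collected
  return function (chunk) {
    pending += chunk.toString();
    var lines = pending.split('\n');
    pending = lines.pop(); // keep the unfinished last line for the next chunk
    lines.forEach(function (line) {
      var idx = line.indexOf('=');
      if (idx === -1) return; // not a key=value line
      var key = line.slice(0, idx).trim();
      current[key] = line.slice(idx + 1).trim();
      if (key === 'progress') { // "progress" is always the last key of a block
        onProgress(current);
        current = {};
      }
    });
  };
}

// Hypothetical usage with the spawned process from the question:
// ffmpeg.stdout.on('data', createProgressParser(function (p) { console.log(p); }));
```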
Thanks to all that commented!
In the spirit of not reinventing the wheel, you might want to try using fluent-ffmpeg. It dispatches a progress event with a number of useful fields:
'progress': transcoding progress information
The progress event is emitted every time ffmpeg reports progress
information. It is emitted with an object argument with the following
keys:
frames: total processed frame count
currentFps: framerate at which FFmpeg is currently processing
currentKbps: throughput at which FFmpeg is currently processing
targetSize: current size of the target file in kilobytes
timemark: the timestamp of the current frame in seconds
percent: an estimation of the progress percentage
If you're curious about how they do this, you can read the source, starting from here and here
FFmpeg uses stderr to output log info because stdout is reserved for piping the output to other processes. The stuff in stderr is just log and status information, not the actual output of the process.
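If you still want to parse the stderr status line yourself, note that the line in the question ends with \r rather than \n, which is why splitting on '\n' alone returns one long string. Here is a rough sketch that splits on either terminator and pulls out the key=value pairs with a regular expression. The function names are made up for the example, and the pattern is only based on the sample line shown in the question, so other FFmpeg versions may format the line differently.

```javascript
// Sketch: turn one status line such as
// "frame= 4852 fps= 30 q=6.8 size= 30506kB time=00:02:41.74 ..."
// into an object of string values.
function parseStatusLine(line) {
  var stats = {};
  var re = /(\w+)=\s*(\S+)/g; // key, '=', optional padding, value
  var m;
  while ((m = re.exec(line)) !== null) {
    stats[m[1]] = m[2];
  }
  return stats;
}

// Split a stderr chunk on '\r' or '\n' and keep only the status lines.
function parseStderrChunk(text) {
  return text
    .split(/[\r\n]+/)
    .map(function (l) { return l.trim(); })
    .filter(function (l) { return l.indexOf('frame=') === 0; })
    .map(parseStatusLine);
}
```

All values stay strings (for example "30506kB"), so convert them with parseFloat or similar before doing arithmetic on them.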
BONUS ROUND
I've seen some hacky video players that use websockets to stream videos, but that approach has a number of issues with it. I'm not going to go over those, but I will explain why I think you should use hls.js.
Support is pretty good; basically works everywhere except old IE. It uses MSE to upgrade the standard video element, so you don't have to wrestle with building a custom player.
Here are the docs for the hls format flag
Here's some code that I'm using to stream from an IPTV box to a web page.
this.ffmpeg = new FFFmpeg()
this.ffmpeg.input(request(this.http_stream))
.videoCodec('copy')
.audioCodec('copy')
.outputOptions([
'-f hls',
'-hls_list_size 6',
'-hls_flags delete_segments'
])
.output( path.join(this.out_dir, 'video.m3u8') )
.run()
It generates a .m3u8 manifest file along with segmented mpeg-ts video files. All you need to do after that is load the m3u8 file into the hls.js player and you have a live stream!
If you're going to re-encode the stream, you will probably see some low fps and glitchiness. I'm lucky since my source stream is already encoded as mpeg-ts.