I am trying to set properties of captured image in linux.
For example:
format, width, height, that could be achieved by:
VIDIOC_S_FMT/VIDIOC_G_FMT + struct v4l2_format fmt;
But, I am blocked in getting/setting more detail parameters:
like H264 key-frame period.
I found there are api to reach the goal.
that are v4l2_ext_controls, v4l2-ext-control and VIDIOC_G_EXT_CTRLS.
I have tried that, but that did not work in my example code.
My code is like this :
struct v4l2_ext_control extCtrl;
memset(&extCtrl, 0, sizeof(struct v4l2_ext_control));
extCtrl.id = V4L2_CID_MPEG_VIDEO_H264_I_PERIOD;
extCtrl.size = 0;
extCtrl.value = 2;
struct v4l2_ext_controls extCtrls;
extCtrls.controls = &extCtrl;
extCtrls.count = 1;
extCtrls.ctrl_class = V4L2_CTRL_CLASS_MPEG;
ret = ioctl(fd, VIDIOC_S_EXT_CTRLS, &extCtrls);
if (0 < ret)
{
printf("VIDIOC_S_EXT_CTRLS setting (%s)\n", strerror(errno));
return -3;
}/*if*/
ret = ioctl(fd, VIDIOC_G_EXT_CTRLS, &extCtrls);
if (0 < ret)
{
printf("VIDIOC_G_EXT_CTRLS setting (%s)\n", strerror(errno));
return -4;
}/*if*/
printf("extCtrl.value = %d\n", extCtrl.value );
That seems well, the key frame period be 2 (extCtrl.value).
But when I used
ffplay -skip_frame nokey -i saved_raw_h264
The key frame period is obviously much greater than 2.
May anyone help me?
By the way: The Logitech C920, is the only one camera I know, supports h264 output in consumer market.
Does any know other camera supporting h264?
Assuming you are setting the parameters correctly, it's very possible that the Logitech C920 Linux driver is ignoring some, if not many, of the control parameters you are passing in via V4L2. Do you have the driver source for the C920? Or is it using a generic Linux USB camera driver? You could at least see which V4L2 controls are supported by the driver.
edit:
Have you seen these threads which talk about adding C920 support to gstreamer?
http://sourceforge.net/p/linux-uvc/mailman/linux-uvc-devel/thread/505D0DAE.7020907#collabora.co.uk/
http://kakaroto.homelinux.net/2012/09/uvc-h264-encoding-cameras-support-in-gstreamer/
Related
I use C++ Builder (Delphi 10.2 and C++Builder 10.2 Update 2) and I need a method that, in case there is no particular table, creates it using TADO objects (ADODB)?
I mean TADOQuery, TADOTable, TADOConnection, etc.
How can I do this?
I tried looking at the methods of TADOConncection, of TADOTable, but none of them seem to be useful.
I also tried this route (https://docwiki.embarcadero.com/Libraries/Alexandria/en/Bde.DBTables.TTable.Exists) but there are compatibility problems.
Does this help ?
TADOConnection *YOUR_TADOCONNECTION; // your connection defined earlier in your code
TStringList *TableList = new TStringList;
bool WithSystemTables = true; // or false according to your requirements
YOUR_TADOCONNECTION->GetTableNames(TableList, WithSystemTables);
for (int i = 0 i < TableList->Count(); i++) {
String NextTableName = TableList->Strings[i];
/*.... your check for the table name being the one you want goes here .... */
}
delete TableList;
I have problems with the general recognition of subscript and superscript in text fragments.
Example-image:
I used Tesseract 4.1.1 with the training data available under https://github.com/tesseract-ocr/tessdata_best. The numerous options had default values except:
tessedit_create_hocr = 1 (to get result as HOCR)
hocr_font_info = 1 (to get additional font infos like font size)
hocr_char_boxes = 1 (to get character-based result)
The language was set to eng. Neither with page segmentation mode 3 (PSM_AUTO_OSD) nor 11 (PSM_SPARSE_TEXT) nor 12 (PSM_SPARSE_TEXT_OSD) the subscript/superscript was recognized correctly.
In the output the sub/sup-fragments were all more or less wrong:
"SubtextSub" is recognized as "Subtextsu,"
"SuptextSub" is recognized as "Suptexts?"
"P0" is recognized as "Po"
"P100" is recognized as "P1go"
"a2+b2" is recognized as "a+b?"
Using Tesseract for OCR is there a way to ...?
optimize subscript/superscript handling
get infos about recognized subscript/superscript (in the hocr-output - ideally for each character)
Working on the quality of the image as suggested in other questions/answers to this topic didn't really change anything.
Following these 2 links from the tesseract-google-newsgroup at first it really seemed to be a question of training:
link1 and link2.
But after doing some experiments I found out, that the used OEM_DEFAULT-OCR engine mode just doesn't bring up the needed information. I found a partial solution to the problem. Partial, because I now get most infos about sub/sup and also the recognized characters are right in most cases, but not for all characters.
Using the OEM_TESSERACT_ONLY-OCR engine mode (=the legacy mode) and some API methods provided by Tess4J I came up with the following java test class:
public class SubSupEvaluator {
public void determineSubSupCharacters(BufferedImage image) {
//1. initialize Tesseract and set image infos
TessBaseAPI handle = TessAPI1.TessBaseAPICreate();
try {
int bpp = image.getColorModel().getPixelSize();
int bytespp = bpp / 8;
int bytespl = (int) Math.ceil(image.getWidth() * bpp / 8.0);
TessBaseAPIInit2(handle, new File("./tessdata/").getAbsolutePath(), "eng", TessOcrEngineMode.OEM_TESSERACT_ONLY);
TessBaseAPISetPageSegMode(handle, TessPageSegMode.PSM_AUTO_OSD);
TessBaseAPISetImage(handle, ImageIOHelper.convertImageData(image), image.getWidth(), image.getHeight(), bytespp, bytespl);
//2. start actual OCR run
TessBaseAPIRecognize(handle, null);
//3. iterate over the result character-wise
TessResultIterator ri = TessBaseAPIGetIterator(handle);
TessPageIterator pi = TessResultIteratorGetPageIterator(ri);
TessPageIteratorBegin(pi);
do {
//determine character
Pointer ptr = TessResultIteratorGetUTF8Text(ri, TessPageIteratorLevel.RIL_SYMBOL);
String character = ptr.getString(0);
TessDeleteText(ptr); //release memory
//determine position information
IntBuffer leftB = IntBuffer.allocate(1);
IntBuffer topB = IntBuffer.allocate(1);
IntBuffer rightB = IntBuffer.allocate(1);
IntBuffer bottomB = IntBuffer.allocate(1);
TessPageIteratorBoundingBox(pi, TessPageIteratorLevel.RIL_SYMBOL, leftB, topB, rightB, bottomB);
//write info to console
System.out.println(String.format("%s - position [%d %d %d %d], subscript: %b, superscript: %b", character, leftB.get(), topB.get(),
rightB.get(), bottomB.get(), TessAPI1.TessResultIteratorSymbolIsSubscript(ri) == TessAPI1.TRUE,
TessAPI1.TessResultIteratorSymbolIsSuperscript(ri) == TessAPI1.TRUE));
} while (TessPageIteratorNext(pi, TessPageIteratorLevel.RIL_SYMBOL) == TessAPI1.TRUE);
} finally {
TessBaseAPIDelete(handle); //release memory
}
}
}
The legacy mode only works with 'normal' training data. Using the '-best' training data is bringing an error.
There is very little information on this topic.
One option to enhance sub/superscript character recognition (even if not the position itself) is by preprocessing the image, with cv2 / pil (also pillow) e.g., and then tesseract it.
See
How to detect subscript numbers in an image using OCR?
Related (but otherwise not answering the question):
https://www.mail-archive.com/tesseract-ocr#googlegroups.com/msg19434.html
https://github.com/tesseract-ocr/tesseract/blob/master/src/ccmain/superscript.cpp
what do you guys think about getting tesseract to recognize single letters?
Tesseract does not recognize single characters
I tried it with the option --psm 10
tesseract imTstg.png out5 --psm 10
but it did not seem to work. I am thinking about just running yolo to detect the single letters.
I am currently porting an UWP application from C++/CX to C++/WinRT. I encountered a safe_cast<Platform::IBoxArray<byte>^>(data) where data is of type Windows::Foundation::IInspectable ^.
I know that the safe_cast is represented by the as<T> method, and I know there are functions for boxing (winrt::box_value) and unboxing (winrt::unbox_value) in WinRT/C++.
However, I need to know the equivalent of Platform::IBoxArray in order to perform the cast (QueryInterface). According to https://learn.microsoft.com/de-de/cpp/cppcx/platform-iboxarray-interface?view=vs-2017, IBoxArray is the C++/CX equivalent of Windows::Foundation::IReferenceArray, but there is no winrt::Windows::Foundation::IReferenceArray...
Update for nackground: What I am trying to achieve is retrieving the view transform attached by the HoloLens to every Media Foundation sample from its camera. My code is based on https://github.com/Microsoft/HoloLensForCV, and I got really everything working except for this last step. The problem is located around this piece of code:
static const GUID MF_EXTENSION_VIEW_TRANSFORM = {
0x4e251fa4, 0x830f, 0x4770, 0x85, 0x9a, 0x4b, 0x8d, 0x99, 0xaa, 0x80, 0x9b
};
// ...
// In the event handler, which receives const winrt::Windows::Media::Capture::Frames::MediaFrameReader& sender:
auto frame = sender.TryAcquireLatestFrame();
// ...
if (frame.Properties().HasKey(MF_EXTENSION_VIEW_TRANSFORM)) {
auto /* IInspectable */ userData = frame.Properties().Lookup(MF_EXTENSION_VIEW_TRANSFORM);
// Now I would have to do the following:
// auto userBytes = safe_cast<Platform::IBoxArray<Byte> ^>(userData)->Value;
//viewTransform = *reinterpret_cast<float4x4 *>(userBytes.Data);
}
I'm also working on porting some code from HoloLensForCV to C++/WinRT. I came up with the following solution for a very similar case (but not the exact same line of code you ask about):
auto user_data = source.Info().Properties().Lookup(c_MF_MT_USER_DATA); // type documented as 'array of bytes'
auto source_name = user_data.as<Windows::Foundation::IReferenceArray<std::uint8_t>>(); // Trial and error to get the right specialization of IReferenceArray
winrt::com_array<std::uint8_t> arr;
source_name.GetUInt8Array(arr);
winrt::hstring source_name_str{ reinterpret_cast<wchar_t*>(arr.data()) };
Specifically, you can replace the safe_cast with .as<Windows::Foundation::IReferenceArray<std::uint8_t> for a boxed array of bytes. Then, I suspect doing the same cast as me (except to float4x4* instead of wchar_t*) will work for you.
The /ZW flag is not required for my example above.
I can't believe that actually worked, but using information from https://learn.microsoft.com/de-de/windows/uwp/cpp-and-winrt-apis/interop-winrt-cx, I came up with the following solution:
Enable "Consume Windows Runtime Extension" via /ZW and use the following conversion:
auto abi = reinterpret_cast<Platform::Object ^>(winrt::get_abi(userData));
auto userBytes = safe_cast<Platform::IBoxArray<byte> ^>(abi)->Value;
viewTransform = *reinterpret_cast<float4x4 *>(userBytes->Data);
Unfortunately, the solution has the drawback of generating
warning C4447: 'main' signature found without threading model. Consider using 'int main(Platform::Array^ args)'.
But for now, I can live with it ...
I am developing an application that using IMFSourceReader to read data from video files. I am using DXVA for improved performance. I am having trouble with one specific full-HD H.264 encoded AVI file. Based on my investigation this far, I believe that the IMFSample contains incorrect data. My workflow is below:
Create a source reader with a D3D manager to enable hardware acceleration.
Set the current media type to YUY2 as DXVA does not
decode to any RGB colorspace.
Call ReadSample to get an IMFSample. Works fine.
Use the VideoProcessorBlt to perform YUY2 to BGRA32
conversion. For this specific file it errors out with an
E_INVALIDARGS error code. Decided to do the conversion myself.
Used IMFSample::ConvertToContiguousBuffer to receive an IMFMediaBuffer. When locking this buffer, the pitch is reported as 1280 bytes. This I believe is incorrect, because for a full HD video, the pitch should be (1920 + 960 + 960 = 3840 bytes).
I dumped the raw memory and extracted the Y, U and V components based on my understanding of the YUY2 layout. You can find it below. So, the data is there but I do not believe it is laid out as YUY2. Need some help in interpreting the data.
My code for reading is below:
// Direct3D surface that stores the result of the YUV2RGB conversion
CComPtr<IDirect3DSurface9> _pTargetSurface;
IDirectXVideoAccelerationService* vidAccelService;
initVidAccelerator(&vidAccelService); // Omitting the code for this.
// Create a new surface for doing the color conversion, set it up to store X8R8G8B8 data.
hr = vidAccelService->CreateSurface( static_cast<UINT>( 1920 ),
static_cast<UINT>( 1080 ),
0, // no back buffers
D3DFMT_X8R8G8B8, // data format
D3DPOOL_DEFAULT, // default memory pool
0, // reserved
DXVA2_VideoProcessorRenderTarget, // to use with the Blit operation
&_pTargetSurface, // surface used to store frame
NULL);
GUID processorGUID;
DXVA2_VideoDesc videoDescriptor;
D3DFORMAT processorFmt;
UINT numSubStreams;
IDirectXVideoProcessor* _vpd;
initVideoProcessor(&vpd); // Omitting the code for this
// We get the videoProcessor parameters on creation, and fill up the videoProcessBltParams accordingly.
_vpd->GetCreationParameters(&processorGUID, &videoDescriptor, &processorFmt, &numSubStreams);
RECT targetRECT; // { 0, 0, width, height } as left, top, right, bottom
targetRECT.left = 0;
targetRECT.right = videoDescriptor.SampleWidth;
targetRECT.top = 0;
targetRECT.bottom = videoDescriptor.SampleHeight;
SIZE targetSIZE; // { width, height }
targetSIZE.cx = videoDescriptor.SampleWidth;
targetSIZE.cy = videoDescriptor.SampleHeight;
// Parameters that are required to use the video processor to perform
// YUV2RGB and other video processing operations
DXVA2_VideoProcessBltParams _frameBltParams;
_frameBltParams.TargetRect = targetRECT;
_frameBltParams.ConstrictionSize = targetSIZE;
_frameBltParams.StreamingFlags = 0; // reserved.
_frameBltParams.BackgroundColor.Y = 0x0000;
_frameBltParams.BackgroundColor.Cb = 0x0000;
_frameBltParams.BackgroundColor.Cr = 0x0000;
_frameBltParams.BackgroundColor.Alpha = 0xFFFF;
// copy attributes from videoDescriptor obtained above.
_frameBltParams.DestFormat.VideoChromaSubsampling = videoDescriptor.SampleFormat.VideoChromaSubsampling;
_frameBltParams.DestFormat.NominalRange = videoDescriptor.SampleFormat.NominalRange;
_frameBltParams.DestFormat.VideoTransferMatrix = videoDescriptor.SampleFormat.VideoTransferMatrix;
_frameBltParams.DestFormat.VideoLighting = videoDescriptor.SampleFormat.VideoLighting;
_frameBltParams.DestFormat.VideoPrimaries = videoDescriptor.SampleFormat.VideoPrimaries;
_frameBltParams.DestFormat.VideoTransferFunction = videoDescriptor.SampleFormat.VideoTransferFunction;
_frameBltParams.DestFormat.SampleFormat = DXVA2_SampleProgressiveFrame;
// The default values are used for all these parameters.
DXVA2_ValueRange pRangePABrightness;
_vpd->GetProcAmpRange(DXVA2_ProcAmp_Brightness, &pRangePABrightness);
DXVA2_ValueRange pRangePAContrast;
_vpd->GetProcAmpRange(DXVA2_ProcAmp_Contrast, &pRangePAContrast);
DXVA2_ValueRange pRangePAHue;
_vpd->GetProcAmpRange(DXVA2_ProcAmp_Hue, &pRangePAHue);
DXVA2_ValueRange pRangePASaturation;
_vpd->GetProcAmpRange(DXVA2_ProcAmp_Saturation, &pRangePASaturation);
_frameBltParams.ProcAmpValues = { pRangePABrightness.DefaultValue, pRangePAContrast.DefaultValue,
pRangePAHue.DefaultValue, pRangePASaturation.DefaultValue };
_frameBltParams.Alpha = DXVA2_Fixed32OpaqueAlpha();
_frameBltParams.DestData = DXVA2_SampleData_TFF;
// Input video sample for the Blt operation
DXVA2_VideoSample _frameVideoSample;
_frameVideoSample.SampleFormat.VideoChromaSubsampling = videoDescriptor.SampleFormat.VideoChromaSubsampling;
_frameVideoSample.SampleFormat.NominalRange = videoDescriptor.SampleFormat.NominalRange;
_frameVideoSample.SampleFormat.VideoTransferMatrix = videoDescriptor.SampleFormat.VideoTransferMatrix;
_frameVideoSample.SampleFormat.VideoLighting = videoDescriptor.SampleFormat.VideoLighting;
_frameVideoSample.SampleFormat.VideoPrimaries = videoDescriptor.SampleFormat.VideoPrimaries;
_frameVideoSample.SampleFormat.VideoTransferFunction = videoDescriptor.SampleFormat.VideoTransferFunction;
_frameVideoSample.SrcRect = targetRECT;
_frameVideoSample.DstRect = targetRECT;
_frameVideoSample.PlanarAlpha = DXVA2_Fixed32OpaqueAlpha();
_frameVideoSample.SampleData = DXVA2_SampleData_TFF;
CComPtr<IMFSample> sample; // Assume that this was read in from a call to ReadSample
CComPtr<IMFMediaBuffer> buffer;
HRESULT hr = sample->GetBufferByIndex(0, &buffer);
CComPtr<IDirect3DSurface9> pSrcSurface;
// From the MediaBuffer, we get the Source Surface using MFGetService
hr = MFGetService( buffer, MR_BUFFER_SERVICE, __uuidof(IDirect3DSurface9), (void**)&pSrcSurface );
// Update the videoProcessBltParams with frame specific values.
LONGLONG sampleStartTime;
sample->GetSampleTime(&sampleStartTime);
_frameBltParams.TargetFrame = sampleStartTime;
LONGLONG sampleDuration;
sample->GetSampleDuration(&sampleDuration);
_frameVideoSample.Start = sampleStartTime;
_frameVideoSample.End = sampleStartTime + sampleDuration;
_frameVideoSample.SrcSurface = pSrcSurface;
// Run videoProcessBlt using the parameters setup (this is used for color conversion)
// The returned code is E_INVALIDARGS
hr = _vpd->VideoProcessBlt( _pTargetSurface, // target surface
&_frameBltParams, // parameters
&_frameVideoSample, // video sample structure
1, // one sample
NULL); // reserved
After a call to ReadSample of IMFSourceReader or inside OnReadSample callback function of the IMFSourceReaderCallback implementation, you might receive the MF_SOURCE_READERF_CURRENTMEDIATYPECHANGED flag. It means that the current media type has changed for one or more streams. To get the current media type call the IMFSourceReader::GetCurrentMediaType method.
In your case you need to query (again) the IMFMediaType's MF_MT_FRAME_SIZE attribute for the video stream, in order to obtain the new correct video resolution. You should use the new video resolution to set the "width" and "height" values of the VideoProcessorBlt parameters' source and destination rectangles.
I need current display resolution. How can i get this? I know about Window.Current.Bounds, but application can worked in windows mode.
what do you mean VisibleBounds is not working on Deskptop?
I tried in my win10 UWP program, it works fine. I can get my desktop resotion like below:
var bounds = ApplicationView.GetForCurrentView().VisibleBounds;
var scaleFactor = DisplayInformation.GetForCurrentView().RawPixelsPerViewPixel;
var size = new Size(bounds.Width * scaleFactor, bounds.Height * scaleFactor);
Besides, if you are using DX in store app, you can create an IDXGIFactory object and use it to enumerate the available adapters. Then call IDXGIOutput::GetDisplayModeList to retrieve an array of DXGI_MODE_DESC structures and the number of elements in the array. Each DXGI_MODE_DESC structure represents a valid display mode for the output. e.g.:
UINT numModes = 0;
DXGI_MODE_DESC* displayModes = NULL;
DXGI_FORMAT format = DXGI_FORMAT_R32G32B32A32_FLOAT;
// Get the number of elements
hr = pOutput->GetDisplayModeList( format, 0, &numModes, NULL);
displayModes = new DXGI_MODE_DESC[numModes];
// Get the list
hr = pOutput->GetDisplayModeList( format, 0, &numModes, displayModes);
Please let me know if you need further information.