Extracting text from DjVu with Apache Tika - ocr

I'm using Apache Tika to OCR files. It works fine with PDF files, but not with DjVu files. Tika appears to have supported DjVu since version 1.14. Any ideas on how to resolve this?
D:\java -jar tika-app-1.18.jar -eUTF-8 test.djvu
Returns
sep 05, 2018 6:38:59 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: J2KImageReader not loaded. JPEG2000 files will not be processed.
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
for optional dependencies.
sep 05, 2018 6:38:59 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: org.xerial's sqlite-jdbc is not loaded.
Please provide the jar on your classpath to parse sqlite files.
See tika-parsers/pom.xml for the correct version.
<?xml version="1.0" encoding="UTF-8"?><html xmlns="http://www.w3.org/1999/xhtml"
>
<head>
<meta name="X-Parsed-By" content="org.apache.tika.parser.EmptyParser"/>
<meta name="resourceName" content="test.djvu"/>
<meta name="Content-Length" content="23038658"/>
<meta name="Content-Type" content="image/vnd.djvu"/>
<title/>
</head>
<body/></html>

I have just checked the current (1.26) sources. It seems that since 1.14 Apache Tika has been able to detect the DjVu header and report that the file is a DjVu document. That is exactly what it did:
<meta name="resourceName" content="test.djvu"/>
<meta name="Content-Length" content="23038658"/>
<meta name="Content-Type" content="image/vnd.djvu"/>
The other errors and warnings in your output are unrelated to DjVu.
However, Apache Tika has no parser for DjVu (note the EmptyParser in X-Parsed-By), so it cannot do anything beyond file type detection. Nothing regarding DjVu support has changed since 1.14. So Apache Tika is of no use for extracting text from DjVu; in practice, it does not support the format at all.
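To tell "detected only" apart from "actually parsed", you can inspect the XHTML that tika-app prints. Below is a minimal Python sketch, assuming the XHTML has been captured into a string. The sample is abbreviated from the output in the question, and the helper name `summarize` is made up for illustration.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"

# Abbreviated copy of the XHTML that tika-app printed in the question.
SAMPLE = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<head>'
    '<meta name="X-Parsed-By" content="org.apache.tika.parser.EmptyParser"/>'
    '<meta name="Content-Type" content="image/vnd.djvu"/>'
    '<title/>'
    '</head>'
    '<body/></html>'
)


def summarize(xhtml):
    """Return (parsers, content_type, has_text) for a Tika XHTML document."""
    # Parse from bytes so the encoding declaration in the prolog is honoured.
    root = ET.fromstring(xhtml.encode("utf-8"))
    meta = {}
    for m in root.iter(XHTML + "meta"):
        meta.setdefault(m.get("name"), []).append(m.get("content"))
    body = root.find(XHTML + "body")
    text = "".join(body.itertext()).strip() if body is not None else ""
    parsers = meta.get("X-Parsed-By", [])
    content_type = meta.get("Content-Type", [None])[0]
    return parsers, content_type, bool(text)


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

If the only parser listed is EmptyParser and the body has no text, Tika recognised the type but extracted nothing.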

Location of configuration file used by Report Wizard Query Builder to source default values for report formatting

When I use the Report Wizard Query Builder to create a report the report is created with a selection of default values.
These default values are revealed by right clicking on the Report.RDL file in Solution Explorer and opening it with the XML (Text) Editor.
Examples of the default values that are applied by the Report Wizard when creating the Report.RDL file are:
Example <df:DefaultFontFamily>Segoe UI</df:DefaultFontFamily>
Example <Color>#666666</Color>
Example <BottomBorder>
<Style>Solid</Style>
</BottomBorder>
I know I can edit these values using the GUI or by editing the XML file directly.
What I want to do is edit the configuration file that the Wizard is using to source these defaults so that my custom defaults are automatically applied when new reports are created.
I have looked in the MSDN documentation and my SQL/SSRS/VS directories for this configuration file but cannot find it.
Following Alan's suggestion I have opened the Report.rdl file at:
C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\Common7\IDE\CommonExtensions\Microsoft\SSRS\ProjectItems\ReportProject.
The contents of the file are:
<?xml version="1.0" encoding="utf-8"?>
<Report xmlns:rd="http://schemas.microsoft.com/SQLServer/reporting/reportdesigner" xmlns="http://schemas.microsoft.com/sqlserver/reporting/2016/01/reportdefinition" xmlns:df="http://schemas.microsoft.com/sqlserver/reporting/2016/01/reportdefinition/defaultfontfamily" MustUnderstand="df">
<df:DefaultFontFamily>Segoe UI</df:DefaultFontFamily>
<ReportSections>
<ReportSection>
<Body>
<Height>2in</Height>
</Body>
<Width>6.5in</Width>
<Page>
</Page>
</ReportSection>
</ReportSections>
<rd:ReportTemplate>true</rd:ReportTemplate>
</Report>
I've tried changing the <df:DefaultFontFamily> value, but the change is not reflected in subsequent reports that I generate.
Also, I still don't understand where the default <Color> and <BottomBorder> values are being set, as they are not referenced in Report.rdl.
Can anyone please tell me how best to modify Report.rdl to change the defaults used? Can I just add arbitrary XML to it?
Alternatively, if Report.rdl is not the source file for the default values, can anyone tell me where to find the actual source file so that, if possible, I can edit it?
Are you using Visual Studio?
If so, you can find the default RDL here:
C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\Common7\IDE\CommonExtensions\Microsoft\SSRS\ProjectItems\ReportProject
Open and edit the Report.rdl file in this folder.
Replace 2019 and Professional with the version and edition you are using.
Personally, I really don't like using the wizard. It often means more work fixing the default report scheme and layout, and it's much faster, certainly after a little practice, to build the report from scratch.
The other advantage is that you can create a template with your default page size, orientation, headers/footers, fonts, etc. Then, when you create a new report, you can select it from the list of templates. If you create reports frequently, I promise you will find it faster in the long run.
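To check which defaults the Report.rdl template actually defines, you can inspect it programmatically. The following is a rough Python sketch: the namespaces are copied from the file shown in the question, the sample is that same file with the empty <Page> element collapsed, and the helper names `default_font` and `defines` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as they appear in the Report.rdl from the question.
DF_NS = ("http://schemas.microsoft.com/sqlserver/reporting/2016/01/"
         "reportdefinition/defaultfontfamily")

SAMPLE_RDL = (
    '<Report xmlns:rd="http://schemas.microsoft.com/SQLServer/reporting/reportdesigner" '
    'xmlns="http://schemas.microsoft.com/sqlserver/reporting/2016/01/reportdefinition" '
    'xmlns:df="' + DF_NS + '" MustUnderstand="df">'
    '<df:DefaultFontFamily>Segoe UI</df:DefaultFontFamily>'
    '<ReportSections><ReportSection>'
    '<Body><Height>2in</Height></Body>'
    '<Width>6.5in</Width>'
    '<Page/>'
    '</ReportSection></ReportSections>'
    '<rd:ReportTemplate>true</rd:ReportTemplate>'
    '</Report>'
)


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def default_font(rdl_text):
    """Return the text of df:DefaultFontFamily, or None if absent."""
    root = ET.fromstring(rdl_text)
    el = root.find("{%s}DefaultFontFamily" % DF_NS)
    return el.text if el is not None else None


def defines(rdl_text, name):
    """True if any element with this local name exists in the RDL."""
    root = ET.fromstring(rdl_text)
    return any(local_name(el.tag) == name for el in root.iter())


if __name__ == "__main__":
    print(default_font(SAMPLE_RDL))
    for name in ("Color", "BottomBorder", "Width"):
        print(name, defines(SAMPLE_RDL, name))
```

For the template in the question, this finds the font family but no <Color> or <BottomBorder>, which matches the observation that those defaults are not set in Report.rdl.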

Invalid package family name error on Windows PhoneGap app

I built a Windows app using PhoneGap, but when I upload the APPX file to the Windows Dev Center, I get errors such as 'Invalid package family name'.
I worked through a similar problem today and I found this article helpful:
http://cordova.apache.org/docs/en/latest/guide/platforms/win8/index.html
Check your build configuration file (build.json) and make sure you are building the release configuration, with the release settings defined so that the Strong Name matches the Publisher Name that the Windows Store expects.
If you are using Tools for Apache Cordova (TACO), you will also want to review your config.xml file, both with the GUI and with a raw XML editor like WordPad.
<preference name="WindowsStoreDisplayName" value="XXX" />
<preference name="WindowsStoreIdentityName" value="YYYXXX" />
<vs:platformSpecificValues>
<vs:platformSpecificWidget platformName="windows" id="_ZZZZZZZZ" />
</vs:platformSpecificValues>
Replace XXX, YYY and ZZZ with the values the Windows Store expects.
Hope that helps,
Adam
If you are using https://build.phonegap.com, you need to compile using cli-6.0.0.
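To review the Windows Store values in config.xml without opening an editor, a small Python sketch like the one below can list them. The sample config, including the widget id `com.example.app`, is hypothetical; only the two preference names come from the answer above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down config.xml; the XXX/YYYXXX placeholders are
# the same ones used in the answer above.
SAMPLE_CONFIG = (
    '<widget xmlns="http://www.w3.org/ns/widgets" '
    'id="com.example.app" version="1.0.0">'
    '<preference name="WindowsStoreDisplayName" value="XXX" />'
    '<preference name="WindowsStoreIdentityName" value="YYYXXX" />'
    '</widget>'
)


def windows_store_preferences(config_xml):
    """Return {name: value} for every <preference> named WindowsStore*."""
    root = ET.fromstring(config_xml)
    prefs = {}
    for el in root.iter():
        # Compare local names so the widget namespace does not matter.
        if el.tag.rsplit("}", 1)[-1] != "preference":
            continue
        name = el.get("name", "")
        if name.startswith("WindowsStore"):
            prefs[name] = el.get("value")
    return prefs


if __name__ == "__main__":
    for name, value in windows_store_preferences(SAMPLE_CONFIG).items():
        print(name, "=", value)
```

You can then compare the printed values by hand against what the Dev Center shows for your app.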

AIR 4.0 Flash Builder Error: Initial Content Not Found

I am making a Starling app in Flash Builder 4.7 in ActionScript 3, as a mobile ActionScript project. Everything was going fine until I decided to change the name of the project and the name of the document class. Then I started getting the error message below. I have already looked at a few posts and tried what they suggested, but I still get the error when I try to debug.
Error Message:
Process terminated without establishing connection to debugger.
initial content not found
Launch command details: "/Applications/Adobe Flash Builder 4.7/eclipse/plugins/com.adobe.flash.compiler_4.7.0.349722/AIRSDK/bin/adl" -runtime "/Applications/Adobe Flash Builder 4.7/eclipse/plugins/com.adobe.flash.compiler_4.7.0.349722/AIRSDK/runtimes/air/mac" -profile mobileDevice -screensize 640x1116:640x1136 -XscreenDPI 326 -XversionPlatform IOS /Users/bobolicious3000/Documents/Aclown/RuMenusDMT/bin-debug/HelloDMT2-app.xml /Users/bobolicious3000/Documents/Aclown/RuMenusDMT/bin-debug
What I have tried already:
Changing the <content> value in the XML descriptor file.
It was:
[This value will be overwritten by Flash Builder in the output app.xml]
and I changed it to a lot of things, including:
<content>RuMenusDMT.swf</content>
Nothing worked.
Changing the header of the XML file.
I am using the AIR 4.0 SWC, and the XML file header looks like this:
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<application xmlns="http://ns.adobe.com/air/application/4.0">
Cleaning the project
I normally like flash builder, but this is really giving me a hard time.
What worked for me was restoring my project's "bin-debug" folder (the directory where my compiled application is output) to a previous revision. Personally, I'm running Windows 7, so I just used the OS' revision control.
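As far as I understand, adl reports "initial content not found" when the file named in the descriptor's <content> element cannot be found in the root directory it was given (bin-debug in the launch command above). Below is a minimal Python sketch for checking that by hand. The sample descriptor is hypothetical apart from the header and <content> line quoted in the question, and the helper names are made up.

```python
import os
import xml.etree.ElementTree as ET

# Hypothetical descriptor; the header and <content> value are the ones
# quoted in the question, the <id> is invented for illustration.
SAMPLE_DESCRIPTOR = (
    '<?xml version="1.0" encoding="utf-8" standalone="no"?>'
    '<application xmlns="http://ns.adobe.com/air/application/4.0">'
    '<id>com.example.RuMenusDMT</id>'
    '<initialWindow><content>RuMenusDMT.swf</content></initialWindow>'
    '</application>'
)


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def initial_content(descriptor_xml):
    """Return the text of initialWindow/content, or None if missing."""
    root = ET.fromstring(descriptor_xml.encode("utf-8"))
    for window in root.iter():
        if local_name(window.tag) != "initialWindow":
            continue
        for child in window:
            if local_name(child.tag) == "content":
                return (child.text or "").strip()
    return None


def content_is_present(descriptor_xml, output_dir):
    """True if the file named by <content> exists inside output_dir."""
    name = initial_content(descriptor_xml)
    return bool(name) and os.path.isfile(os.path.join(output_dir, name))


if __name__ == "__main__":
    print(initial_content(SAMPLE_DESCRIPTOR))
    print(content_is_present(SAMPLE_DESCRIPTOR, "bin-debug"))
```

Run it against the generated descriptor in bin-debug (HelloDMT2-app.xml in the launch command above) rather than the source descriptor, since that is the file adl actually reads.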

Azure Worker Role configuration issue while using SlowCheetah with custom config

We are using NLog as the logging tool in the Worker Role of our Azure app.
It requires an NLog.config file. We installed "SlowCheetah - XML Transforms" and have two transforms (Debug and Release).
The solution rebuilds successfully.
But when I try to run it, I get the following error. (I used the exact same transformation for NLog.config in one of my Windows service apps, and it works fine there.)
Error 163 The item "bin\Debug\NLog.config" in item list "OutputGroups"
does not define a value for metadata "TargetPath". In order to use
this metadata, either qualify it by specifying
%(OutputGroups.TargetPath), or ensure that all items in this list
define a value for this metadata. C:\Program Files
(x86)\MSBuild\Microsoft\VisualStudio\v10.0\Windows Azure
Tools\1.6\Microsoft.WindowsAzure.targets 2299 5 Insight.CloudWeb
I don't know if this is done by the SlowCheetah extension, but could you verify if your *.csproj file contains the AfterCompile target similar to this?
<Import Project="$(MSBuildToolsPath)\Microsoft.CSharp.targets" />
<UsingTask TaskName="TransformXml"
AssemblyFile="$(MSBuildExtensionsPath32)\Microsoft\VisualStudio\v10.0\Web\Microsoft.Web.Publishing.Tasks.dll" />
<Target Name="AfterCompile" Condition="exists('app.$(Configuration).config')">
<TransformXml Source="NLog.config"
Destination="$(IntermediateOutputPath)$(TargetFileName).config"
Transform="NLog.$(Configuration).config" />
<ItemGroup>
<AppConfigWithTargetPath Remove="NLog.config"/>
<AppConfigWithTargetPath Include="$(IntermediateOutputPath)$(TargetFileName).config">
<TargetPath>$(TargetFileName).config</TargetPath>
</AppConfigWithTargetPath>
</ItemGroup>
</Target>
For more information, take a look at Oleg's blog post ".Config File Transformation", in the section "App.config File Transformation".
I have a fix for this. Now you should be able to transform app.config as well as other XML files for Azure Worker Roles using SlowCheetah. Once I get the fix verified I will release the update to the VS gallery.
If you would like to try the fix, you can download the updated VSIX at https://dl.dropbox.com/u/40134810/SlowCheetah/issue-44/SlowCheetah-issue-44.zip. If you are interested in following up on this, please use issue #44.

Submitting special characters, like "š č ć đ ž", over the webform in grails app

I'm developing a Grails app connected to a MySQL database. I created the database with the utf-8 character set and a matching collation. The default character set on the MySQL server is also utf-8, but I defined it explicitly for my schema anyway.
In the Grails app I defined the following in DataSource.groovy:
url = "jdbc:mysql://localhost:3306/blabla?useUnicode=true&characterEncoding=utf-8"
and in Config.groovy:
grails.views.gsp.encoding = "UTF-8"
grails.converters.encoding = "UTF-8"
In my .gsp files I added:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
So when I try to create a new User via views/user/create.gsp and enter the characters š, č, đ, ć, or ž in some field, the value inserted into the database is something like ÄÄÄ, but I want the value to be ččč.
When I insert a new user through an SQL statement in MySQL Workbench, it is saved as I want, with the field value "ččč", and when I load that user in my list.gsp or show.gsp, I see ččč in the browser.
So the problem is somewhere in the process of saving a User via the web form.
Can anyone help?
P.S. I don't know if it is relevant, but when I type these characters in a text field on a web form, I switch my keyboard from EN (English) to SR (Serbian Latin) in the Language bar in Windows.
Grails 1.3.7
STS 2.8.1
mysql-connector-java-5.1.18
Windows 7
I managed to make it work by using the <g:uploadForm> tag instead of <g:form>, with regular <g:textField> tags inside it for the values I wanted to save. Apart from that, everything else is Grails-generated (DataSource.groovy, Config.groovy, my *.gsp files), as I explained in the question. Does anyone know the difference between these two tags?
The issue you are having is caused by a Grails plugin, specifically "webxml" plugin version 1.4.
You need to upgrade that plugin in your project like this:
Stop your Grails application if it's running.
Go to the "%USERHOME%\.grails\1.3.7\projects\%YOUR_PROJECT%\plugins" folder, where %YOUR_PROJECT% is your project's name and %USERHOME% on Windows 7 is "C:\Users\YOUR_NAME".
There should be a folder named "webxml-1.4". Delete it.
Go to your Grails project folder.
Type "grails install-plugin webxml" and confirm the upgrade to 1.4.1 if asked.
Run your application - non-English letters will now be correctly interpreted even under scaffolding.
Put a <meta charset="utf-8"> tag on your page if you're using HTML5, and set the attribute accept-charset="UTF-8" on your form elements. It seems to be necessary to reliably get your forms submitted in UTF-8 with all browsers.
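The ÄÄÄ from the question is what you would expect if the UTF-8 bytes of the form data were decoded as a single-byte charset somewhere between the browser and the database. Here is a small Python sketch that reproduces it (Python is used only for illustration, and ISO-8859-1 is an assumption about which charset was wrongly applied):

```python
original = "ččč"

# Each č (U+010D) is two bytes in UTF-8: 0xC4 0x8D.
utf8_bytes = original.encode("utf-8")

# Decoding those bytes as ISO-8859-1 turns 0xC4 into 'Ä' and 0x8D into
# an invisible control character.
misread = utf8_bytes.decode("latin-1")

# Keep only the printable characters, which is roughly what shows up
# when the stored value is displayed.
visible = "".join(ch for ch in misread if ch.isprintable())

# The damage is reversible as long as no bytes were dropped on the way.
repaired = misread.encode("latin-1").decode("utf-8")

if __name__ == "__main__":
    print(utf8_bytes.hex())
    print(visible)
    print(repaired == original)
```

So the data itself is fine when it leaves the browser as UTF-8; the fix is to make every step that decodes the request use UTF-8 as well.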
Run the following statement after connecting:
SET NAMES UTF8