FineReader - How to create/use a custom dictionary - ocr

I'm trying to create a custom dictionary to use with the ABBYY FineReader SDK for C#, but without success.
Does anyone know how to create and use a custom dictionary with FineReader?

DocumentProcessingParams dpParams = engine.CreateDocumentProcessingParams();
dpParams.PageProcessingParams.RecognizerParams.TextLanguage = makeTextLanguage("DICTIONARY PATH");
private TextLanguage makeTextLanguage(string dictionaryPath)
{
// Create new TextLanguage object
LanguageDatabase languageDatabase = engine.CreateLanguageDatabase();
TextLanguage textLanguage = languageDatabase.CreateTextLanguage();
var textlanguageName = Path.GetFileName(new FileInfo(textBox_dictionary.Text).Name);
// Copy all attributes from the predefined Brazilian Portuguese language
TextLanguage tempL = engine.PredefinedLanguages.Find("PortugueseBrazilian")
.TextLanguage;
textLanguage.CopyFrom(tempL);
textLanguage.InternalName = textlanguageName;
// Bind new dictionary to first (and single) BaseLanguage object within TextLanguage
BaseLanguage baseLanguage = textLanguage.BaseLanguages[0];
// Change internal dictionary name to user-defined
baseLanguage.InternalName = textlanguageName;
// Set custom dictionary for base language
setDictionary(baseLanguage, dictionaryPath);
return textLanguage;
}
// Set custom dictionary for base language
private void setDictionary(BaseLanguage baseLanguage, string dictionaryPath)
{
//create dictionary file
// Get collection of dictionary descriptions and remove all items
DictionaryDescriptions dictionaryDescriptions = baseLanguage.DictionaryDescriptions;
//dictionaryDescriptions.DeleteAll();
// Create user dictionary description and add it to the collection
IDictionaryDescription dictionaryDescription = dictionaryDescriptions.AddNew(DictionaryTypeEnum.DT_UserDictionary);
UserDictionaryDescription userDictionaryDescription = dictionaryDescription.GetAsUserDictionaryDescription();
userDictionaryDescription.FileName = dictionaryPath;
}

Is it possible to use JsonReaderWriterFactory to convert XML to JSON without using DataContractJsonSerializer?

I need a generic routine that takes any valid XML and converts it to JSON without knowing the underlying data type. I know that this is easily done with Json.Net and I also know how to do it with the DataContractJsonSerializer but our organisation doesn't use Json.Net and the DataContractJsonSerializer needs a Data Contract enabled object type.
My working code using Json.Net:
XmlDocument document = new XmlDocument();
document.LoadXml(xml);
string jsonText = JsonConvert.SerializeXmlNode(document);
The code I'd like to be able to use, using JsonReaderWriterFactory instead of Json.Net:
string jsonText = string.Empty;
MemoryStream stream = new MemoryStream();
StreamWriter streamWriter = new StreamWriter(stream);
streamWriter.Write(xml);
streamWriter.Flush();
stream.Position = 0;
using (XmlDictionaryWriter xmlWriter = JsonReaderWriterFactory.CreateJsonWriter(stream))
{
object someObject = new object();
DataContractJsonSerializer serializer = new DataContractJsonSerializer(someObject.GetType());
serializer.WriteObject(stream, someObject);
xmlWriter.Flush();
jsonText = Encoding.Default.GetString(stream.GetBuffer());
}
Is there a way around this?
Too bad Json.Net isn't an option - we've used it for years now, and it's fantastic. Short of parsing the XML and generating the JSON by hand, there aren't many quick ways to do this.
Check out the code from this link:
http://www.phdcc.com/xml2json.htm (See section "XmlToJSON C# code", should be fairly quick)
This code could easily be adapted into a class or even an extension method that converts an XML document (or just an XML string that is parsed into an XML document) and returns the JSON.
Another approach to consider could be the following. It specifies using anonymous types assuming you don't have control of the objects that could be deserialized from XML (and you don't want to manage those separate types).
Convert the XML into an anonymous type (probably through the
Use the JavaScriptSerializer to serialize the anonymous object into JSON
The code sample below shows this technique:
using System;
using System.Collections.Generic;
using System.Dynamic;
using System.Linq;
using System.Text;
using System.Xml.Linq;
using System.Data.Entity.Design.PluralizationServices;
using System.Globalization;
namespace Scratch
{
class Program
{
static void Main(string[] args)
{
string xml = "<root><student><id>1</id></student><student><id>2</id></student></root>";
string json = XmlToJson(xml);
Console.WriteLine(json);
Console.ReadKey(true);
}
// Using JavaScriptSerializer
static string XmlToJson(string xml)
{
var obj = GetAnonymousType(xml);
var serializer = new System.Web.Script.Serialization.JavaScriptSerializer();
return serializer.Serialize(obj);
}
// Adapted from: http://www.codeproject.com/Tips/227139/Converting-XML-to-an-dynamic-object-using-ExpandoO
static dynamic GetAnonymousType(string xml, XElement node = null)
{
node = string.IsNullOrEmpty(xml) ? node : XDocument.Parse(xml).Root;
IDictionary<String, dynamic> result = new ExpandoObject();
var pluralizationService = PluralizationService.CreateService(CultureInfo.CreateSpecificCulture("en-us"));
// Sequential on purpose: ExpandoObject and List<T> are not safe for concurrent writes.
foreach (var gn in node.Elements())
{
var first = gn.Elements().FirstOrDefault();
// A node is a collection if its children all share one name, or if its own name is the plural of its first child's name.
var isCollection = gn.HasElements
&& ((gn.Elements().Count() > 1
&& gn.Elements().All(e => e.Name.LocalName.ToLower() == first.Name.LocalName.ToLower()))
|| gn.Name.LocalName.ToLower() == pluralizationService.Pluralize(first.Name.LocalName).ToLower());
var items = isCollection ? gn.Elements().ToList() : new List<XElement>() { gn };
var values = new List<dynamic>();
foreach (var i in items)
values.Add((i.HasElements) ? GetAnonymousType(null, i) : i.Value.Trim());
result[gn.Name.LocalName] = isCollection ? values : values.FirstOrDefault();
}
return result;
}
}
}

View SharePoint 2010 list in JSON format

I am preparing to use Timeglider to create a timeline. One requirement is that the data has to be in JSON format. Another requirement, on my side, is that the solution has to be client side, as I do not have access to the servers or Central Admin.
When I request http://webname/_vti_bin/ListData.svc/listname I get an access permissions error; however, when I request http://webname/subsite/_vti_bin/ListData.svc/listname I have no problem pulling data.
My situation is that the list is on the TLD. I tried to follow the post "How to retrieve a json object from a sharepoint list", but it relates to SP 2007.
To implement pure JSON support in SharePoint 2007, 2010 and so on, have a look at this project: http://camelotjson.codeplex.com/. It requires the commercial product Camelot .NET Connector to be installed on the server.
If you would rather not go commercial, you can fall back on the sp.js library. Here is a small example I wrote; enjoy!
// Object to handle some list magic
var ListMagic = function () {
/* Private variables */
var that = this;
var clientContext = SP.ClientContext.get_current();
var web = clientContext.get_web();
var lists = web.get_lists();
/**
* Method to iterate all lists
*/
that.getLists = function () {
clientContext.load(lists);
clientContext.executeQueryAsync(execute, getFailed);
function execute() {
var listEnumerator = lists.getEnumerator();
while (listEnumerator.moveNext()) {
var l = listEnumerator.get_current();
// TODO! Replace console.log with actual routine
console.log(l.get_title());
}
}
function getFailed() {
// TODO! Implement fail management
console.log('Failed.');
}
};
/**
* Method to iterate all fields of a list
*/
that.getFields = function (listName) {
// Load list by listName, if not stated try to load the current list
// lists is a private variable of this closure, not a property of that
var loadedList = typeof listName === 'undefined' ? lists.getById(SP.ListOperation.Selection.getSelectedList()) : lists.getByTitle(listName);
var fieldCollection = loadedList.get_fields();
clientContext.load(fieldCollection);
clientContext.executeQueryAsync(execute, getFailed);
function execute() {
var fields = fieldCollection.getEnumerator();
while (fields.moveNext()) {
var oField = fields.get_current();
// TODO! Replace console.log with actual routine
var listInfo = 'Field Title: ' + oField.get_title() + ', Field Name: ' + oField.get_internalName();
console.log(listInfo);
}
}
function getFailed() {
// TODO! Implement fail management
console.log('Failed.');
}
};
/**
* Method to get a specific listitem
*/
that.getListItem = function (itemId) {
var loadedList = lists.getById(SP.ListOperation.Selection.getSelectedList());
var spListItem = loadedList.getItemById(itemId);
clientContext.load(spListItem);
clientContext.executeQueryAsync(execute, getFailed);
function execute() {
// TODO! Replace console.log with actual routine
//spListItem.get_fieldValues()
console.log(spListItem.get_fieldValues()["Title"]);
}
function getFailed() {
// TODO! Implement fail management
console.log('Failed.');
}
};
/**
* Method to fake an init (optional)
*/
that.init = function () {
// Run any init functionality here
// I.e
that.getFields("Tasks");
};
return that;
};
// In case of no jquery use window.onload instead
$(document).ready(function () {
ExecuteOrDelayUntilScriptLoaded(function () {
var sp = new ListMagic();
sp.init();
}, 'sp.js');
});
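For the original goal of getting list data as JSON on the client, the ListData.svc endpoint from the question can also be asked for JSON by sending an Accept: application/json header. The following is only a minimal sketch under assumptions: buildListDataUrl, extractItems and loadListItems are hypothetical helper names, the site URL and list name are placeholders, and the { d: { results: [...] } } envelope is the OData "verbose" JSON shape that I assume the service returns for collections.

```javascript
// Hypothetical helper: build a ListData.svc URL for a list on a given site.
// listName is assumed to be the name exactly as ListData.svc exposes it.
function buildListDataUrl(siteUrl, listName) {
  var base = siteUrl.replace(/\/+$/, ''); // drop trailing slashes
  return base + '/_vti_bin/ListData.svc/' + listName;
}

// Hypothetical helper: unwrap the assumed OData envelope.
// Accepts { d: { results: [...] } }, { d: [...] } or a single entity in d.
function extractItems(payload) {
  if (!payload || !payload.d) { return []; }
  if (Array.isArray(payload.d)) { return payload.d; }
  return Array.isArray(payload.d.results) ? payload.d.results : [payload.d];
}

// Browser-side sketch using jQuery (as in the example above).
// onItems receives a plain array of list items; onError is jQuery's error callback.
function loadListItems(siteUrl, listName, onItems, onError) {
  $.ajax({
    url: buildListDataUrl(siteUrl, listName),
    dataType: 'json',
    headers: { Accept: 'application/json' },
    success: function (data) { onItems(extractItems(data)); },
    error: onError
  });
}
```

Something like loadListItems('http://webname/subsite', 'listname', render, fail) would then hand a plain array to a render callback of your own. This does not solve the permissions error on the top-level site, which still has to be granted.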
Personally, I make HttpHandlers. I install them in the SharePoint isapi folder and the GAC, and I can call them just as you might call owssvr.dll: http://servername/_vti_bin/myhttphandelr.dll
Pass it query string variables or call it from jQuery ajax. You can take the HttpContext and make an SPContext from it, which gives you access to all sorts of information about the current location in SharePoint. Then you can serialize your objects with JavaScriptSerializer and return them as JSON. I can't post all the code, but this should get you close. I use this to add a submenu to the context menu that lets a user delete or rename a file they uploaded to a library while it is still at version 1.0, and to collect files from a library and create an .eml file with the selected file(s) attached. We don't normally give our users delete privileges. The point is that you can create a class with just the information you need from SharePoint and pass it as JSON. The only downside I have found is that an iisreset is required whenever you change the DLL.
I schedule an iisreset every night at midnight anyway to keep things fresh and free from memory bloat, so I come in the next day and my changes are there. The cool thing is that the SPContext has information about the location in SharePoint from which the handler is called, so compare http://servername/_vti_bin/myhttphandelr.dll with http://servername/subsite/library/_vti_bin/myhttphandelr.dll
I might add: don't try to serialize SharePoint objects. One, they are huge, complex objects. Two, I don't think they are marked serializable. Just make your own class and populate it with the values you need from the SharePoint objects.
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices.ComTypes;
using System.Web;
using System.Web.Script.Serialization;
using ADODB;
using interop.cdosys;
using Microsoft.SharePoint;
namespace owssvr2
{
public class OWSsvr2 : IHttpHandler, System.Web.SessionState.IRequiresSessionState
{
private string cmd;
ctx ctx = new ctx();
private string currentuser;
private SPContext SPcontext;
private HttpContext cntx;
public bool IsReusable
{
get { return false; }
}
public void ProcessRequest(HttpContext context)
{
SPcontext = SPContext.GetContext(context); // Gets the SPContext from the HttpContext
cntx = context;
ctx = GetData(context.Request); // I parse some information from the request to use in my app
cmd = ctx.Cmd;
ctx.User = context.User.Identity.Name;
currentuser = context.User.Identity.Name;
switch (cmd)
{
case "Delete":
Delete();
context.Response.Redirect(ctx.NextUsing);
break;
case "HasRights":
HasRights();
JavaScriptSerializer javaScriptSerializer = new JavaScriptSerializer();
string serEmployee = javaScriptSerializer.Serialize(ctx);
context.Response.Write(serEmployee);
context.Response.ContentType = "application/json; charset=utf-8";
break;
case "Rename":
Rename(context);
//context.Response.Redirect(context.Request["NextUsing"]);
break;
case "SendSingleFile":
try
{
context.Response.Clear();
context.Response.ClearHeaders();
context.Response.BufferOutput = true;
ADODB.Stream stream = SendSingleFile(context.Request["URL"]);
stream.Type = StreamTypeEnum.adTypeBinary;
stream.Position = 0;
context.Response.ContentType = "application/octet-stream";
context.Response.AddHeader("content-disposition", "attachment;filename=Email.eml");
IStream iStream = (IStream)stream;
byte[] byteArray = new byte[stream.Size];
IntPtr ptrCharsRead = IntPtr.Zero;
iStream.Read(byteArray, stream.Size, ptrCharsRead);
context.Response.BinaryWrite(byteArray);
context.Response.End();
}
catch(Exception ex) {context.Response.Write(ex.Message.ToString()); }
break;
case "SendMultiFile":
try
{
//SendMultiFile(context.Request["IDs"]);
context.Response.Clear();
context.Response.ClearHeaders();
context.Response.BufferOutput = true;
ADODB.Stream stream = SendMultiFile(context.Request["IDs"]);
stream.Type = StreamTypeEnum.adTypeBinary;
stream.Position = 0;
context.Response.ContentType = "application/octet-stream";
context.Response.AddHeader("content-disposition", "attachment;filename=Email.eml");
IStream iStream = (IStream)stream;
byte[] byteArray = new byte[stream.Size];
IntPtr ptrCharsRead = IntPtr.Zero;
iStream.Read(byteArray, stream.Size, ptrCharsRead);
context.Response.BinaryWrite(byteArray);
context.Response.End();
}
catch(Exception ex) {context.Response.Write("There was an error getting the files. </br>" + ex.Message.ToString()); }
break;
case "FileInfo":
JavaScriptSerializer javaScriptSerializer1 = new JavaScriptSerializer();
string serEmployee1 = javaScriptSerializer1.Serialize(FileInfo(context));
context.Response.Write(serEmployee1);
context.Response.ContentType = "application/json; charset=utf-8";
break;
case "UsersInGroups":
UsersInGroups ug = new UsersInGroups(context, context.Request["job"],context.Request["groups"]);
break;
}
}

Using SharedObject with complicated classes

I know that if I want to store a custom class with SharedObject, I have to use registerClassAlias.
registerClassAlias("MyClass", MyClass);
sharedObject.data.myObject = new MyClass();
But in my case, I have a custom class whose fields are themselves instances of custom classes. How can I store it in such a way as to recover the types when I load the data?
Specifically, the class in question is a Graph class which contains an array of Objects. This isn't the actual code, just an overview:
class Graph {
public var vertices : Array;
}
I have an instance of this Graph class, and I'm filling its vertices field with instances of another class, called Node. I need to store this Graph instance in such a way that I can:
Recover it as a Graph instance.
Access the vertices field of this recovered instance, and then access the elements of that array as Node types.
I've tried throwing some registerClassAlias("Node", Node)'s in appropriate-seeming places, but it's not having any effect. Is there a way to do this?
With more complex data such as this, your best bet is to define load() and save() methods manually on objects that you want to store in a SharedObject. Those methods will define what data should be saved, simplified as much as possible, and collate it into a format of your choice such as JSON.
In this example, your Graph could have a method save() which looks like this:
public function save(name:String, sharedObject:SharedObject):void
{
var list:Array = [];
for each(var node:Node in vertices)
{
// Add a simple object defining the important Node properties
// to the array we will save as JSON.
list.push({ x: node.x, y: node.y });
}
sharedObject.data[name] = JSON.encode(list);
}
And then a load() function like so:
public function load(name:String, sharedObject:SharedObject):void
{
// Empty current list of vertices.
vertices = [];
var list:Array = JSON.decode(sharedObject.data[name]);
for each(var def:Object in list)
{
// Create real Node from simpler definition.
var node:Node = new Node();
node.x = def.x;
node.y = def.y;
vertices.push(node);
}
}
Which would be utilized like:
existingGraph.save('myGraph', sharedObject);
var newGraph:Graph = new Graph();
newGraph.load('myGraph', sharedObject);

Object's unique integer or string identifier (hash)

Is there a method or other native way to get or store a unique identifier for an object? (perhaps with haxe native access or the actionscript API).
The goal is to make it easier to reason about Dictionary and other data structures that rely on the uniqueness of an object key.
If you are using Dictionaries, the object itself can be used as the key, so there you have your uniqueness.
var dict:Dictionary = new Dictionary();
var obj:SomeClass = new SomeClass();
dict[obj] = "whatever";
For other data structures, you could try to generate a sequential number statically. In many cases, this should be enough, I think.
Something like:
class UniqueKey {
private static var _key:int = 0;
public static function getNextKey():int {
return ++_key;
}
}
And to use it:
var obj:SomeClass = new SomeClass();
obj.unique = UniqueKey.getNextKey();
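The two ideas above (object as key, plus a sequential counter) can be combined so that the object does not need a unique field of its own. Below is a minimal sketch in JavaScript, assuming a WeakMap as a stand-in for a Dictionary with weak keys; getObjectId is a hypothetical helper name, not part of any API.

```javascript
// Side table from object to its assigned id. A WeakMap does not keep
// the objects alive, so ids disappear together with their objects.
var ids = new WeakMap();
var nextId = 0;

// Hypothetical helper: return the id of obj, assigning the next
// sequential number the first time the object is seen.
function getObjectId(obj) {
  if (!ids.has(obj)) {
    ids.set(obj, ++nextId);
  }
  return ids.get(obj);
}

var a = {};
var b = {};
console.log(getObjectId(a) === getObjectId(a)); // true: stable per object
console.log(getObjectId(a) === getObjectId(b)); // false: distinct objects
```

The ids are only unique within one run of the program; they are not suitable as persistent identifiers.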

Shared objects not working correctly in flex -- Mobile

I have a very simple test going with shared objects in Flex for mobile. I have a Person class:
package
{
import flash.display.MovieClip;
public class Person extends MovieClip
{
var personsname:String="";
public function Person(name:String)
{
personsname = name;
}
}
}
And then some simplish code in a view.
var person1:Person;
var person2:Person;
var person3:Person;
var person4:Person;
var thePeople:Array=[];
var so:SharedObject;
function init():void{
person1 = new Person("james");
person2 = new Person("mike");
person3 = new Person("Amanda");
person4 = new Person("Shelly");
thePeople.push(person1,person2,person3,person4);
//so = SharedObject.getLocal("savedData"); //clear it again
///so.clear(); // clear it again
savePeople();
getPeople();
}
private function savePeople():void{
so = SharedObject.getLocal("savedData");
if(so.data.thePeopleArray == null){
so.data.thePeopleArray = thePeople;
so.flush();
}
}
private function getPeople():void{
so = SharedObject.getLocal("savedData");
var thePeeps:Array = so.data.thePeopleArray;
trace(thePeeps);
}
The first time I run this, it traces out
[object Person] 4 times.
When I close the emulator, rebuild and run it again, it traces out
,,,
If I clear out the SharedObject, it shows [object Person] again, but with the clear commented out I get the ,,,
Can shared objects even store an array of objects properly? I believe it is the same with the PersistenceManager.
The root of the problem here is that you are trying to save a MovieClip instance into the SharedObject. Since MovieClip is an intrinsic object (native to Flash), it cannot be converted into a form which can be stored. This causes Flash to convert the data into a generic Object, which is what gets stored to disk. I can only guess at exactly what is going into the SharedObject at this point.
It seems to work the first time because Flash does not actually load the shared object from disk in the getPeople() call; it just uses the object which is already in memory. The second time the app runs, it reads the generic object from disk and creates a generic object.
There is another problem: the Flash player does not know to pass data to the constructor when it reads the object.
There are a few possible workarounds, some are:
Store the data as text
Store the data as a ByteArray
Store the data in a "Data Object"
Each of these requires some conversion during the read and write process, but this can be simplified using an interface. It also adds flexibility: if your object changes, you will still be able to read the data in the SharedObject.
1: Text
As an example, you might add two methods to the Person object, call them serialise() and deserialise(). The serialise() method will return text which can be stored in the shared object. The deserialise() will parse text and populate the values of the object.
Here's a sample to illustrate this:
class Person {
private var name:String;
private var age:int;
public function serialise():String {
return [name, age].join("\t");
}
public function deserialise(input:String):void {
var tokens:Array = input.split("\t");
name = tokens[0];
age = parseInt(tokens[1]);
}
public static function create(name:String, age:int):Person
{
var output:Person = new Person();
output.name = name;
output.age = age;
return output;
}
}
For ease of use we can create a class for managing a collection of people:
class People {
private var people:Vector.<Person> = new Vector.<Person>();
public function clear():void {
people.splice(0, people.length);
}
public function add(person:Person):void {
people.push(person);
}
public function serialise():String {
var output:Array = [];
for each (var person:Person in people)
output.push(person.serialise());
return output.join("\n");
}
public function deserialise(input:String):void {
var tokens:Array = input.split("\n");
for each (var token:String in tokens) {
var person:Person = new Person();
person.deserialise(token);
add(person);
}
}
public function save():void {
var so:SharedObject = SharedObject.getLocal("cookie");
so.data.people = serialise();
so.flush();
}
public function load():void
{
var so:SharedObject = SharedObject.getLocal("cookie");
if (so.data.people != null)
deserialise(so.data.people);
}
}
Usage:
var people:People = new People();
people.load();
trace(people.serialise());
people.clear();
people.add(Person.create("Candy", 21));
people.add(Person.create("Sandy", 23));
people.add(Person.create("Randy", 27));
people.save();
trace(people.serialise());
An obvious flaw in this example is that the \n and \t characters cannot be used as part of the data (i.e. in the name of a person). This is a common shortcoming of delimited text data.
** Update: Look into the built-in JSON methods for a consistent approach to serialising objects to and from text.
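To illustrate the JSON suggestion, here is a minimal sketch in JavaScript, whose JSON.stringify and JSON.parse should behave like the built-in ActionScript JSON methods of the same names as far as I know. serialisePeople and deserialisePeople are hypothetical helper names, and the Person constructor here is a simplified stand-in for the class above.

```javascript
// Simplified stand-in for the Person class: just the stored fields.
function Person(name, age) {
  this.name = name;
  this.age = age;
}

// Serialise a list of people into one JSON string. Only the plain
// fields are written, so the stored format stays independent of the class.
function serialisePeople(people) {
  return JSON.stringify(people.map(function (p) {
    return { name: p.name, age: p.age };
  }));
}

// Rebuild real Person instances from the JSON text.
function deserialisePeople(text) {
  return JSON.parse(text).map(function (def) {
    return new Person(def.name, def.age);
  });
}

// Names containing tabs and newlines survive, unlike with join/split.
var original = [new Person('Candy\tCane', 21), new Person('Sandy\nShore', 23)];
var restored = deserialisePeople(serialisePeople(original));
console.log(restored[0].name === original[0].name); // true
```

The resulting string can be stored in the shared object exactly like the tab-separated text, since JSON escapes the delimiter characters itself.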
2: ByteArray
This is very similar to the text method described above, except that the serialise/deserialise methods would accept an additional ByteArray parameter, which the object would write to or read from. The ByteArray would then be saved to and loaded from the shared object. The advantage of this method is that the resulting data is usually more compact and more versatile than with the text method.
Flash also defines the IDataInput and IDataOutput interfaces, which can be used here.
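The binary approach can be sketched as follows in JavaScript, with an ArrayBuffer and DataView standing in for a ByteArray. The layout (a 16-bit length prefix, the UTF-8 bytes of the name, then a one-byte age) is an assumption chosen for illustration, and writePerson and readPerson are hypothetical helper names.

```javascript
// Write one person as: uint16 byte length of the name, the UTF-8
// bytes of the name, then the age as a single unsigned byte (0-255).
function writePerson(person) {
  var nameBytes = new TextEncoder().encode(person.name);
  var buffer = new ArrayBuffer(2 + nameBytes.length + 1);
  var view = new DataView(buffer);
  view.setUint16(0, nameBytes.length);       // length prefix
  new Uint8Array(buffer, 2).set(nameBytes);  // raw name bytes
  view.setUint8(2 + nameBytes.length, person.age);
  return buffer;
}

// Read the same layout back into a plain object.
function readPerson(buffer) {
  var view = new DataView(buffer);
  var nameLength = view.getUint16(0);
  var name = new TextDecoder().decode(new Uint8Array(buffer, 2, nameLength));
  var age = view.getUint8(2 + nameLength);
  return { name: name, age: age };
}

var copy = readPerson(writePerson({ name: 'Sandy\tShore', age: 23 }));
console.log(copy.name === 'Sandy\tShore' && copy.age === 23); // true
```

Because the name is length-prefixed instead of delimited, it may contain any character, which removes the \n and \t limitation of the text method.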
3: Data Objects
If you still prefer storing objects directly, then you could create a proxy object which serves the sole purpose of carrying data. A data object (aka DO) is an object which only has variables and no methods. E.g.:
class PersonDO {
public var name:String;
}
It would be used something like this:
var person1:Person;
var person2:Person;
var person3:Person;
var person4:Person;
var thePeople:Array=[];
var so:SharedObject;
function init():void{
person1 = new Person("james");
person2 = new Person("mike");
// store the people data into data objects
var person1DO:PersonDO = new PersonDO();
person1DO.name = person1.name;
var person2DO:PersonDO = new PersonDO();
person2DO.name = person2.name;
thePeople.push(person1DO,person2DO);
savePeople();
// load the people into data objects
getPeople();
person1 = new Person(thePeople[0].name);
person2 = new Person(thePeople[1].name);
}
private function savePeople():void{
so = SharedObject.getLocal("savedData");
if(so.data.thePeopleArray == null){
so.data.thePeopleArray = thePeople;
so.flush();
}
}
private function getPeople():void{
so = SharedObject.getLocal("savedData");
var thePeeps:Array = so.data.thePeopleArray;
trace(thePeeps);
}
Even though this may appear simpler than the alternatives, there are downsides to storing objects directly:
- Stored data is very fragile - if you change the object then your data will become unusable unless you keep several versions of each object.
- You need to ensure that a reference to the data object is compiled into the application.
- A common usage scenario for Shared Objects is to save data objects from one SWF and load them in another. You need to ensure that both SWFs use an identical version of the class being saved and loaded.
Hope that helps.