Using 'preserve_interword_spaces' in tesseract.js

I am trying to use Tesseract.js for OCR, but I'm not able to get the 'preserve_interword_spaces' option to work. Here is what I am trying:
Tesseract.recognize(
  element.files[0],
  'eng',
  {
    preserve_interword_spaces: 1,
    logger: progress => {
      console.log(progress);
      progressBar.querySelector("div").innerText = progress.status;
      progressBar.querySelector("progress").value = progress.progress;
    }
  }
).then(/* etc. */)
The OCR is coming out with multiple spaces combined into one. Help?
I'd prefer to define the .recognize() this way, rather than using await(). I know preserve_interword_spaces is supported since I can see it in the documentation here and here
but I'm not sure how to get it to work in my case.

Just an update that I was able to resolve the issue by changing to async(). As the documentation states, Tesseract.recognize() is only meant for quick tasks, not more involved ones.
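For anyone landing here, the worker-based approach can be sketched roughly as below. This is only a sketch under assumptions: the OcrWorker interface and the recognizePreservingSpaces name are made up for illustration, and the worker is assumed to be the kind of object tesseract.js's createWorker gives you, exposing setParameters, recognize and terminate. How the worker is created and initialized varies between tesseract.js versions, so that part is left out.

```typescript
// Minimal shape of an OCR worker. Assumed to match the object returned by
// tesseract.js's createWorker(); the interface itself is hypothetical.
interface OcrWorker {
  setParameters(params: Record<string, string>): Promise<unknown>;
  recognize(image: unknown): Promise<{ data: { text: string } }>;
  terminate(): Promise<unknown>;
}

// Sets preserve_interword_spaces on the worker before recognizing,
// then always terminates the worker, even if recognition fails.
async function recognizePreservingSpaces(
  worker: OcrWorker,
  image: unknown
): Promise<string> {
  try {
    await worker.setParameters({ preserve_interword_spaces: "1" });
    const { data } = await worker.recognize(image);
    return data.text;
  } finally {
    await worker.terminate();
  }
}
```

Because the function returns a promise, it can still be consumed with .then() if you prefer that over await, e.g. recognizePreservingSpaces(worker, element.files[0]).then(text => console.log(text)).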

Angular: Routing between pages using condition

I am trying to route between pages using basic if condition in Angular.
GoToHome() {
if(this.router.url=='/chat'){
console.log(this.router.url)
this.router.navigate(['login']);
} else {
this.router.navigate(['people']);
}
}
The problem is that the route isn't exactly /chat: there are many pages under chat (chat/x, chat/y, and many others). I want this to work for all of them, but right now it doesn't. If I write a specific route such as chat/x it works, but only for x. Is there a way to make it work for all of them?

You could read up on route guards, in particular the CanActivate method. It may help.

RouteGuards might do a better job of handling the redirects as per your requirement.
But a quick workaround would be to do a split() on the URL and compare for the chat part. Try the following
if(((this.router.url).split('/')[1]) === 'chat') {
// proceed
}

As others have said, the best solution is to use an Angular guard: https://medium.com/@ryanchenkie_40935/angular-authentication-using-route-guards-bf7a4ca13ae3.
Anyway to resolve your problem you can use startsWith() function which determines whether a string begins with the characters of a specified string.
GoToHome() {
  if (this.router.url.startsWith('/chat')) {
    console.log(this.router.url);
    this.router.navigate(['login']);
  } else {
    this.router.navigate(['people']);
  }
}
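One caveat with a plain startsWith('/chat') check is that it also matches unrelated routes such as /chatroom. A slightly stricter check could look like the sketch below. The isChatRoute helper is hypothetical (it is not part of Angular), and it assumes the URL always begins with a slash, as router.url does.

```typescript
// Hypothetical helper: true for "/chat" and anything nested under it,
// but not for look-alike routes such as "/chatroom".
function isChatRoute(url: string): boolean {
  // Drop any query string or fragment before comparing the path.
  const [path = ""] = url.split(/[?#]/);
  return path === "/chat" || path.startsWith("/chat/");
}
```

Inside GoToHome() the condition would then read if (isChatRoute(this.router.url)).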

how to see table contents using simplehtmldom

i'm using a parsing library called 'simplehtmldom'. all i want to do is extract the textual contents of table cells. that's all! it seems so simple... everything i've tried results in the ENTIRE FRIGIN PAGE being dumped because apparently all of the primitives traverse the dom tree up, down, and sideways. here's a trivialised example of what i'm trying to do :
$saved = '';
foreach($html->find('tr') as $tr) {
foreach($tr->find('td') as $td) {
$contents = $td->plaintext;
if ($saved) {
echo "$saved : $contents<br>\n";
$saved = '';
}
if (strstr($contents, 'Title') || strstr($contents, 'Author')) {
$saved = $contents;
}
}
}
i've tried using 'plaintext', 'innertext', and 'text', but no matter what i try, i end up getting either endless loads of crap echoed out, or else nothing at all.
does anyone know how to use this parser ? or else could suggest an alternative to do what i want to do ?

CAVEAT - this is not really an answer, but rather an alternative.
i'm closing this question because i was able to solve the problem using a different approach, the DOM class, mentioned here. hopefully this will save someone some time if you're just looking for a way to get the contents of table cells and aren't constrained to a particular package or approach.

Android ListView binding programmatically

There are many examples of doing this in axml, but I would like to have a complete binding using code behind. To be honest, I would like to have NO axml, but seems like creating all the controls programmatically is a nightmare.
I first tried the suggestions at:
MvxListView create binding for template layout from code
I have my list binding from code-behind, and I get six rows (so source binding is working); but the cells themselves do not bind.
Then at the following url:
Odd issue with MvvmCross, MvxListViewItem on Android
Stuart has the following comment: Have looked through. In this case, I don't think you want to use DelayBind. DelayBind is used to delay the binding action until next time the DataContext is set. In Android's MvxAdapter/MvxListItemView case, the DataContext is passed in the ctor - so DataContext isn't set again until the cell is reused. (This is different to iOS MvxTableDataSource).
So in essence, the only example I see shows DelayBind, which shouldn't work.
Can someone please show me some examples... thanks in advance.
Added reply to Comments:
Cheesebaron, first of all, a huge thank you and respect for all your contributions;
Now, why not use axml? Well, as programmers, we all have our own preferences and way of doing stuff - I guess I am old school where we didn't have any gui designer (not really true).
Real reasons:
Common Style: I have a setup where Core has all the style details, including what all the colors would be. My idea is, each platform would get the style details from core and update accordingly. It's easy for me to create controls with the correct style this way.
Copy-paste across platforms (which I could even keep as linked files if I wanted). For example, I have a login screen with web-like verification, where a red error text appears under a control; overall on that screen I have around 10 items that need binding. I already have the iOS version working, so when starting on Droid I copied the whole binding section from iOS, and it worked perfectly. So I can make the whole binding the same across all platforms... Any error in my approach is caught at build time, which I think is a major advantage over axml binding. Even the control creation is extremely similar, since I have helpers with the same method names.
Of course I understand all the additional layout that has to be handled; to be honest, it's not that bad if one really thinks it through. I have created a StackPanel for Droid which is based on WP and internally handles all the layouts for child views; so for LinearLayout, all I do is set up some custom parameters and let my panel deal with it. Relative is a different story; so far, I have only one screen that's relative, and I can even make it Linear to reduce my additional layout code.
So, from my humble point of view, for my style, code-behind creation allows me to completely copy all my bindings (I do have some custom binding factories to allow that), copy all my control create lines; then only adding those controls to the view is the only part that is different (then again, droid and WP are almost identical). So there is no way I can miss something on one platform and all are forced to be the same. It also allows me to change all the styles for every platform just by changing the core. Finally, any binding error is detected during compile - and I love that.
My original question wasn't about NOT using axml... it was on how to use MvxListView where all the binding is done in code-behind; as I have explained, I got the list binding, but not the item/cell binding working.
Thanks again in advance.
Here is part of my LoginScreen from Droid; I think it's an acceptable amount of code for going without an axml file.
//======================================================================================================
// create and add all controls
//======================================================================================================
var usernameEntry = ControlHelper.GetUITextFieldCustom(this, "Username.", maxLength: 20);
var usernameError = AddErrorLabel<UserAuthorization, string>(vm => ViewModel.Authorization.Username);
var passwordEntry = ControlHelper.GetUITextFieldCustom(this, "Password.", maxLength: 40, secureTextEntry: true);
var passwordError = AddErrorLabel<UserAuthorization, string>(vm => ViewModel.Authorization.Password);
var loginButton = ControlHelper.GetUIButtonMain(this);
var rememberMe = new UISwitch(this);
var joinLink = ControlHelper.GetUIButtonHyperLink(this, textAlignment: UITextAlignment.Center);
var copyRightText = ControlHelper.GetUILabel(this, textAlignment: UITextAlignment.Center);
var copyRightSite = ControlHelper.GetUIButtonHyperLink(this, textAlignment: UITextAlignment.Center);
var layout = new StackPanel(this, Orientation.Vertical)
{
Spacing = 15,
SubViews = new View[]
{
ControlHelper.GetUIImageView(this, Resource.Drawable.logo),
usernameEntry,
usernameError,
passwordEntry,
passwordError,
loginButton,
rememberMe,
joinLink,
ControlHelper.GetSpacer(this, ViewGroup.LayoutParams.MatchParent, weight: 2),
copyRightText,
copyRightSite
}
};

I just came across a similar situation myself using Mvx4.
The first link you mentioned had it almost correct. If you combine it with Stuart's comment in the second link and just remove the surrounding DelayBind call, everything should work out OK:
public class CustomListItemView
    : MvxListItemView
{
    // The constructor name must match the class name.
    public CustomListItemView(Context context,
                              IMvxLayoutInflater layoutInflater,
                              object dataContext,
                              int templateId)
        : base(context, layoutInflater, dataContext, templateId)
    {
        var control = this.FindViewById<TextView>(Resource.Id.list_complex_title);
        // Bind directly in the constructor, without DelayBind.
        var set = this.CreateBindingSet<CustomListItemView, YourThing>();
        set.Bind(control).To(vm => vm.Title);
        set.Apply();
    }
}
p.s. I have asked for an Edit to the original link to help others.

Filtering the iNotes Calendar in extlib

I need to filter the iNotes calendar control in extlib. When I look at the examples in the extlib application, I can see that it is supposed to be connected to an xe:calendarJsonLegacyService.
The problem I find with this service is that I can't filter the content based on category or search as with the other view services.
I need to create different calendars/json data based on a search or category in a view.
I have looked at some of the other services but not sure if it is possible to use them instead.
If you have any ideas for how I should create my filter, please respond.
I have attached pictures below showing both the jsonservice and the calendarcontrol.
This is what the JSON data looks like in the xe:calendarJsonLegacyService:
{
"@timestamp":"20120311T171603",
"@toplevelentries":"3",
"viewentry":
[
{
"@unid":"37F0330979C04AF2C12579BE004F5629",
"@noteid":"32E1A",
"@position":"1",
"@read":"true",
"@siblings":"3",
"entrydata":
[
{
"@columnnumber":"0",
"@name":"$134",
"datetime":
{
"0":"20120314T100000"
}
},
{
"@columnnumber":"1",
"@name":"$149",
"number":
{
"0":119
}
}, etc...

You could implement your own REST service (or extension to existing one) in an extension library, but I guess you are looking for something easier.

Sorry no code, but maybe (and hopefully) an answer.
Have you looked at the xc:CalendarStoreCustomRestService custom control inside the XPages Extension Library demo? It looks like they connected the calendar control to a normal JSON view store, and that supports search and keys.

I found code you could use but you will have to extend the custom control. I think it is a new component that is not yet included as a xe: component inside the Extension Library.
This is how you use the control:
<xc:CalendarStoreCustomRestService id="cc4ccCalendarStoreCustomRestService"
storeComponentId="notesCalendarStore1" databaseName="#{sessionScope.databaseName}"
viewName="($Calendar)">
</xc:CalendarStoreCustomRestService>
This is your calendar component, it uses the above storeComponentId.
<xe:calendarView id="calendarView1" jsId="cview1"
summarize="false"
type="#{javascript: null == viewScope.calendarType? 'M' : viewScope.calendarType }"
storeComponentId="notesCalendarStore1">
<xe:this.loaded><![CDATA[${javascript:if (sessionScope.databaseName == null) {
return false;
} else {
return true;
}}]]></xe:this.loaded>
</xe:calendarView>
If you need some more info, this example is included inside the DWA_iNotesRest.xsp.

I googled for a long time, and the only solution I've found is to build your own REST service.
Have you managed to filter the calendar without this?

Stylistic Question: Use of White Space

I have a particularly stupid insecurity about the aesthetics of my code... my use of white space is, frankly, awkward. My code looks like a geek dancing; not quite frightening, but awkward enough that you feel bad staring, yet can't look away.
I'm just never sure when I should leave a blank line or use an end of line comment instead of an above line comment. I prefer to comment above my code, but sometimes it seems strange to break the flow for a three word comment. Sometimes throwing an empty line before and after a block of code is like putting a speed bump in an otherwise smooth section of code. For instance, in a nested loop separating a three or four line block of code in the center almost nullifies the visual effect of indentation (I've noticed K&R bracers are less prone to this problem than Allman/BSD/GNU styles).
My personal preference is dense code with very few "speed bumps" except between functions/methods/comment blocks. For tricky sections of code, I like to leave a large comment block telling you what I'm about to do and why, followed by a few 'marker' comments in that code section. Unfortunately, I've found that some other people generally enjoy generous vertical white space. On one hand I could have a higher information density that some others don't think flows very well, and on the other hand I could have a better flowing code base at the cost of a lower signal to noise ratio.
I know this is such a petty, stupid thing, but it's something I really want to work on as I improve the rest of my skill set.
Would anyone be willing to offer some hints? What do you consider to be well-flowing code, and where is it appropriate to use vertical white space? Any thoughts on end-of-line commenting for two- or three-word comments?
Thanks!
P.S.
Here's a method from a code base I've been working on. Not my best, but not my worst by far.
/**
* TODO Clean this up a bit. Nothing glaringly wrong, just a little messy.
* Packs all of the Options, correctly ordered, in a CommandThread for executing.
*/
public CommandThread[] generateCommands() throws Exception
{
OptionConstants[] notRegular = {OptionConstants.bucket, OptionConstants.fileLocation, OptionConstants.test, OptionConstants.executable, OptionConstants.mountLocation};
ArrayList<Option> nonRegularOptions = new ArrayList<Option>();
CommandLine cLine = new CommandLine(getValue(OptionConstants.executable));
for (OptionConstants constant : notRegular)
nonRegularOptions.add(getOption(constant));
// --test must be first
cLine.addOption(getOption(OptionConstants.test));
// and the regular options...
Option option;
for (OptionBox optionBox : optionBoxes.values())
{
option = optionBox.getOption();
if (!nonRegularOptions.contains(option))
cLine.addOption(option);
}
// bucket and fileLocation must be last
cLine.addOption(getOption(OptionConstants.bucket));
cLine.addOption(getOption(OptionConstants.fileLocation));
// Create, setup and deploy the CommandThread
GUIInteractiveCommand command = new GUIInteractiveCommand(cLine, console);
command.addComponentsToEnable(enableOnConnect);
command.addComponentsToDisable(disableOnConnect);
if (!getValue(OptionConstants.mountLocation).equals(""))
command.addComponentToEnable(mountButton);
// Piggy-back a Thread to start a StatReader if the call succeeds.
class PiggyBack extends Command
{
Configuration config = new Configuration("piggyBack");
OptionConstants fileLocation = OptionConstants.fileLocation;
OptionConstants statsFilename = OptionConstants.statsFilename;
OptionConstants mountLocation = OptionConstants.mountLocation;
PiggyBack()
{
config.put(OptionConstants.fileLocation, getOption(fileLocation));
config.put(OptionConstants.statsFilename, getOption(statsFilename));
}
@Override
public void doPostRunWork()
{
if (retVal == 0)
{
// TODO move this to the s3fronterSet or mounts or something. Take advantage of PiggyBack's scope.
connected = true;
statReader = new StatReader(eventHandler, config);
if (getValue(mountLocation).equals(""))
{
OptionBox optBox = getOptionBox(mountLocation);
optBox.getOption().setRequired(true);
optBox.requestFocusInWindow();
}
// UGLY HACK... Send a 'ps aux' to grab the parent PID.
setNextLink(new PSCommand(getValue(fileLocation), null));
fireNextLink();
}
}
}
PiggyBack piggyBack = new PiggyBack();
piggyBack.setConsole(console);
command.setNextLink(piggyBack);
return new CommandThread[]{command};
}

It doesn't matter.
1) Develop a style that is your own. Whatever it is that you find easiest and most comfortable, do it. Try to be as consistent as you can, but don't become a slave to consistency. Shoot for about 90%.
2) When you're modifying another developer's code, or working on a group project, use the stylistic conventions that exist in the codebase or that have been laid out in the style guide. Don't complain about it. If you are in a position to define the style, present your preferences but be willing to compromise.
If you follow both of those you'll be all set. Think of it as speaking the same language in two different ways. For example: speaking differently around your friends than you do with your grandfather.

It's not petty to make pretty code. When I write something I'm really proud of, I can usually take a step back, look at an entire method or class, and realize exactly what it does at a glance - even months later. Aesthetics play a part in that, though not as large of a part as good design. Also, realize you can't always write pretty code, (untyped ADO.NET anyone?) but when you can, please do.
Unfortunately, at this higher level at least, I'm not sure there are any hard rules you can adhere to to always produce aesthetically pleasing code. One piece of advice I can offer is to simply read code. Lots of it. In many different frameworks and languages.

I like to break up logical "phrases" of code with white space. This helps others easily visualize the logic in the method, or reminds me when I go back and look at old code. For example, I prefer
reader.MoveToContent();
if( reader.Name != "Limit" )
    return false;

string type = reader.GetAttribute( "type" );
if( type == null )
    throw new SecureLicenseException( "E_MissingXmlAttribute" );

if( String.Compare( type, GetLimitName(), false ) != 0 )
    throw new SecureLicenseException( "E_LimitValueMismatch", type, "type" );
instead of
reader.MoveToContent();
if( reader.Name != "Limit" )
return false;
string type = reader.GetAttribute( "type" );
if( type == null )
throw new SecureLicenseException( "E_MissingXmlAttribute" );
if( String.Compare( type, GetLimitName(), false ) != 0 )
throw new SecureLicenseException( "E_LimitValueMismatch", type, "type" );
The same break can almost be accomplished with braces but I find that actually adds visual noise and reduces the amount of code that can be visually consumed simultaneously.
Comments on code lines
As for comments at the end of the line: almost never. They're not really bad, just easy to miss when scanning through code. They also clutter up the line, taking attention away from the code and making it harder to read. Our brains are already wired to grok line by line. When the comment is at the end of the line we have to split the line into two separate concepts, code and comment. I say if it's important enough to comment on, put it on the line preceding the code.
That being said, I do find one or two line hint comments about the meaning of a specific value are sometimes OK.

I find code with very little whitespace hard to read and navigate in, since I need to actually read the code to find logical structure in it. Clever use of whitespace to separate logical parts in functions can increase the ease of understanding the code, not only for the author but also for others.
Keep in mind that if you are working in an environment where your code is likely to be maintained by others, they will have spent the majority of their time looking at code that was not written by you. If your style distinctly differs from what they are used to seeing, your smooth code may be a speed bump for them.

I minimize white space. I put the main comment block above the code block and additional end-of-line comments on the stuff that may not be obvious to another developer. I think you are doing that already.

My preferred style is probably anathema to most developers, but I will add occasional blank lines to separate what seem like appropriate 'paragraphs' of code. It works for me, nobody has complained during code reviews (yet!), but I can imagine that it might seem arbitrary to others. If other people don't like it I'll probably stop.

The most important thing to remember is that when you join an existing code base (as you almost always will in your professional career) you need to adhere to the code style guide dictated by the project.
Many developers, when starting a project afresh, choose to use a style based on the Linux kernel coding-style document. The latest version of that doc can be viewed at http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/CodingStyle;h=8bb37237ebd25b19759cc47874c63155406ea28f;hb=HEAD.
Likewise many maintainers insist that you use Checkpatch before submitting changes to version control. You can see the latest version that ships with the Linux kernel in same tree I linked to above at scripts/checkpatch.pl (I would link to it but I'm new and can only post one hyperlink per answer).
While Checkpatch is not specifically related to your question about whitespace usage, it will certainly help you eliminate trailing whitespace, spaces before tabs, etc.

Code Complete, by Steve McConnell (available in the usual locations) is my bible on this sort of thing. It has a whole chapter on layout and style that is just excellent. The whole book is just chock full of useful and practical advice.

I use exactly the same amount of whitespace as you :) Whitespace before methods, before comment blocks. In C, C++ the brackets also provide some "pseudo-whitespace" as there is only a single opening/closing brace on some lines, so this also serves to break up the code density.

Your code is fine, just do what you (and others you might work with) are comfortable with.
The only thing I see wrong with some (inexperienced) programmers about whitespace is that they can be afraid to use it, which is not true in this case.
I did however notice that you did not use more than one consecutive blank line in your sample code, which, in certain cases, you should use.

Here is how I would refactor that method. Things can surely still be improved and I did not yet refactor the PiggyBack class (I just moved it to an upper level).
By using the Composed Method pattern, the code becomes easier to read because it is divided into methods that each do one thing and work at a single level of abstraction. Fewer comments are needed as well. Comments that answer the question "what" are code smells (i.e. the code should be refactored to be more readable). Useful comments answer the question "why", and even then it would be better to improve the code so that the reason is obvious (sometimes that can be done by having a test that fails without the non-obvious code).
public CommandThread[] buildCommandsForExecution() {
CommandLine cLine = buildCommandLine();
CommandThread command = buildCommandThread(cLine);
initPiggyBack(command);
return new CommandThread[]{command};
}
private CommandLine buildCommandLine() {
CommandLine cLine = new CommandLine(getValue(OptionConstants.EXECUTABLE));
// "--test" must be first, and bucket and file location must be last,
// because [TODO: enter the reason]
cLine.addOption(getOption(OptionConstants.TEST));
for (Option regularOption : getRegularOptions()) {
cLine.addOption(regularOption);
}
cLine.addOption(getOption(OptionConstants.BUCKET));
cLine.addOption(getOption(OptionConstants.FILE_LOCATION));
return cLine;
}
private List<Option> getRegularOptions() {
List<Option> options = getAllOptions();
options.removeAll(getNonRegularOptions());
return options;
}
private List<Option> getAllOptions() {
List<Option> options = new ArrayList<Option>();
for (OptionBox optionBox : optionBoxes.values()) {
options.add(optionBox.getOption());
}
return options;
}
private List<Option> getNonRegularOptions() {
OptionConstants[] nonRegular = {
OptionConstants.BUCKET,
OptionConstants.FILE_LOCATION,
OptionConstants.TEST,
OptionConstants.EXECUTABLE,
OptionConstants.MOUNT_LOCATION
};
List<Option> options = new ArrayList<Option>();
for (OptionConstants c : nonRegular) {
options.add(getOption(c));
}
return options;
}
private CommandThread buildCommandThread(CommandLine cLine) {
GUIInteractiveCommand command = new GUIInteractiveCommand(cLine, console);
command.addComponentsToEnable(enableOnConnect);
command.addComponentsToDisable(disableOnConnect);
if (isMountLocationSet()) {
command.addComponentToEnable(mountButton);
}
return command;
}
private boolean isMountLocationSet() {
String mountLocation = getValue(OptionConstants.MOUNT_LOCATION);
return !mountLocation.equals("");
}
private void initPiggyBack(CommandThread command) {
PiggyBack piggyBack = new PiggyBack();
piggyBack.setConsole(console);
command.setNextLink(piggyBack);
}

For C#, I say "if" is just a word, while "if(" is code - a space after "if", "for", "try" etc. doesn't help readability at all, so I think it's better without the space.
Also: Visual Studio> Tools> Options> Text Editor> All Languages> Tabs> KEEP TABS!
If you're a software developer who insists upon using spaces where tabs belong, I'll insist that you're a slob - but whatever - in the end, it's all compiled. On the other hand, if you're a web developer with a bunch of consecutive spaces and other excess whitespace all over your HTML/CSS/JavaScript, then you're either clueless about client-side code, or you just don't give a crap. Client-side code is not compiled (and not compressed with IIS default settings) - pointless whitespace in client-side script is like adding pointless Thread.Sleep() calls in server-side code.

I like to maximize the amount of code that can be seen in a window, so I only use a single blank line between functions, and rarely within. Hopefully your functions are not too long. Looking at your example, I don't like a blank line for an open brace, but I'll have one for a close. Indentation and colorization should suffice to show the structure.
