I'm pretty new to CBLite. I inherited code that relies heavily on FTS, and everything looked good until we discovered that searching for words containing a dash (-), such as A-Something, makes query execution take 4-5 times longer than usual. I couldn't figure out why this happens or how to work around it on my end (mobile CBLite on the iOS platform).
Also, searching for words that contain a parenthesis, (, doesn't work at all; the result is always empty for those items.
After upgrading Rails from version 3 to version 4, we noticed an issue (or rather, some weird behavior) in our application.
The scenario is as follows:
From time to time, when we access the web page, the very first request loads the page but the HTML is rendered as plain text (the top part of the figure). After I press F5, the page loads correctly (the bottom part of the figure), and it keeps working as expected on subsequent loads.
I consider this behavior completely random, as there is no deterministic time or situation in which I can predict it will happen again:
it sometimes happens when I access the page after a longer period of time (a day or two)
in rare cases it does not happen at all (we have run tests on various browsers and machines)
I wish I could provide more information on the matter. However, I feel lost at the moment and have no idea which other application details would help resolve the problem.
Any thoughts or suggestions would be much appreciated. Thank you.
Edit:
Well, it seems the behavior occurs once we run the application in the production environment. Additionally, the statement above about when the issue occurs is no longer true: it now happens on any browser and any machine.
Short version:
What is the proper way to list/query files by suffix: "fullText contains 'ext'", "fileExtension = 'ext'", or "title contains 'ext'"? These do not always return the same results; only one of them is documented (the first), and it's not consistent.
Long version:
I've been developing Google Drive apps for years. Every now and then I have to change my list queries to get the correct results. My application needs to find files with a certain suffix. The official documentation indicates that I need to use the "fullText contains 'ext'" syntax, but sometimes this fails to find some files. At one point I switched to the undocumented "fileExtension = 'ext'" syntax, but after some time I again found files that wouldn't show up and went back to fullText searches. However, I've since seen files not showing up with that search either, and tried using "title contains 'ext'" (or, in v3, "name contains 'ext'"), which seems to work, but for how long? I don't like using undocumented queries which might just suddenly stop working.
I feel like I'm going in circles, since I don't know why fullText fails (and only for some users; when it does work, I've sometimes seen the parents field come up empty, which doesn't happen with other queries) or why the title search works (it isn't documented to search suffixes, and I'm pretty sure it didn't use to work). I might just perform all three searches, but this affects performance, and the "or" keyword with some combinations of those three searches returns no results at all.
My application has thousands of files, each with multiple revisions, in hundreds of folders. Each folder is shared with dozens of users, and those permissions change regularly as people are added to and removed from projects. There are hundreds of different owners of the individual files. I suspect this complexity, and the time it takes to propagate permissions and file changes, affects my queries, but it doesn't explain why one search would work and another wouldn't, or why the information returned for a file in one query would differ from the other. Even after several days the problem doesn't correct itself, and often a file must be removed and re-uploaded for everyone to see it. I have experienced slow updates to metadata for shared files, resulting in mismatches between metadata, files, and search results, but I take all of that into account and still have queries which simply won't work properly.
Maybe I'm expecting too much from a free API? Overall I'm very happy with what I can do, but it can be very frustrating when it's not working and you know you're doing it right! :)
You can search or filter files with the 'files.list' or 'children.list' methods of the Drive API. These methods accept the 'q' parameter, which is a search query.
For more information, see: https://developers.google.com/drive/v3/web/search-parameters
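As a rough sketch of one way to reduce the dependence on any single query form: build the `q` string for the v3 "name contains" search mentioned in the question, then confirm the suffix on the client before trusting a result. The helper names `build_name_query` and `filter_by_suffix` are hypothetical, and the file dictionaries only assume a `name` field as returned by `files.list`; no request is made here.

```python
from typing import Dict, List


def build_name_query(ext: str) -> str:
    """Return a Drive v3 `q` string matching files whose name contains `ext`."""
    # Backslashes and single quotes must be escaped inside a quoted query value.
    escaped = ext.replace("\\", "\\\\").replace("'", "\\'")
    return f"name contains '{escaped}'"


def filter_by_suffix(files: List[Dict], ext: str) -> List[Dict]:
    """Keep only the files whose name really ends with `.ext` (case-insensitive)."""
    suffix = "." + ext.lower().lstrip(".")
    return [f for f in files if f.get("name", "").lower().endswith(suffix)]


# Example with hand-written results standing in for a files.list response.
sample = [{"name": "a.ext"}, {"name": "ext-notes.txt"}, {"name": "B.EXT"}]
matches = filter_by_suffix(sample, "ext")
```

Filtering on the client does not recover files that the server-side search fails to return, so it addresses false positives only, not the missing results described above.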
I'm using Chrome's tts service in my extension.
According to the chrome.tts documentation:
The maximum length of the text is 32,768 characters.
However, when I pass a string of more than 250 characters, the engine does not read the whole utterance (it just stops in the middle of a word). I'm now wondering whether this is a bug or by design. The Web Speech API has a similar character limit described in its spec, and it behaves the same way.
I'd like to know whether I'm doing something wrong, or whether this depends only on the TTS engine in the browser and there is nothing I can do about it.
I have read the official documentation and, as you reported, it states that:
The maximum length of the text is 32,768 characters.
Maybe they are referring to the maximum length of an SSML file!
I have used the Chrome API for a while, and I can tell you that:
The breaking of the utterances only happens when the voice is not a native voice,
The cutting out usually occurs between 200-300 characters,
When it does break you can un-freeze it by doing speechSynthesis.cancel();
The 'onend' event sometimes decides not to fire. A quirky work-around to this is to console.log() out the utterance object before speaking it. Also I found wrapping the speak invocation in a setTimeout callback helps smooth these issues out.
I resolved the problem by splitting the text:
First into sentences (trying not to split within them, to maintain prosody);
If the first step is not sufficient, split the sentence again into sub-sentences.
So I suggest fixing the problem this way.
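The two-step splitting described above could look roughly like the sketch below. The 200-character limit is an assumption chosen to stay under the 200-300 range reported earlier, the function name `split_for_tts` is hypothetical, and the final word-level wrap is an extra fallback that is not part of the original suggestion.

```python
import re
import textwrap


def split_for_tts(text, max_chars=200):
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    chunks = []
    # Step 1: split after sentence-ending punctuation so whole sentences stay together.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        if not sentence:
            continue
        if len(sentence) <= max_chars:
            chunks.append(sentence)
            continue
        # Step 2: the sentence is too long, so pack its clauses into sub-sentences.
        current = ""
        for part in re.split(r"(?<=[,;:])\s+", sentence):
            candidate = (current + " " + part).strip()
            if len(candidate) <= max_chars:
                current = candidate
                continue
            if current:
                chunks.append(current)
            current = part
        if current:
            chunks.append(current)
    # Fallback (not in the original suggestion): wrap anything still too long on whitespace.
    final = []
    for chunk in chunks:
        if len(chunk) > max_chars:
            final.extend(textwrap.wrap(chunk, max_chars))
        else:
            final.append(chunk)
    return final
```

Each chunk would then be spoken in order, for example by queuing the chunks with the `enqueue` option of `chrome.tts.speak`.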
I really enjoy using Chrome's URL bar because it remembers commonly-visited sites and often suggests a good completion based on what I've typed and/or visited before. So, for example, I can type t in the URL bar and Chrome will automatically fill it in with twitter.com, or I can type maps and Chrome will fill in the .google.com. This gives me the convenience of data-driven domain name shortcuts without having to maintain an explicit list.
What I'm wondering, though, is how Chrome determines that an old shortcut should be replaced with a new one. For example, if I visit twitter.com often, then that becomes the completion when I type t. But if I then start visiting twilio.com often enough, then, after some time, Chrome will start to fill that in as the default completion for t. What I can't figure out is how or when that transition takes place. It also seems that there are (at least) two cases involved: one for domain names, and another for path strings, because if I visit a certain full URL often, and then want to get to the root of the same domain, I end up having to type the entire domain name out to get Chrome to ignore the full-URL completion.
If I had to guess, I'd imagine that Chrome stores the things that I type in the URL bar in a trie whose values are the number of times that a particular string has been typed (and/or visited?). Then I'd imagine it has some sort of exponential decay model for the "counts" in the trie. But this is just a guess. Does anyone know how this updating process happens?
Well, I ended up finding some answers by having a look at the Chromium source code; I'd imagine that Chrome itself uses this code without too much modification.
When you type something into the search/URL bar (which is apparently called the "Omnibox"), Chrome starts looking for suggestions and completions that match what you've typed. To do this, there are several "providers" registered with the browser, each of which knows how to make a particular type of suggestion. The URL history provider is one of these.
The querying process is pretty cool, actually. It all happens asynchronously, with particular attention paid to which activity happens in which thread (the main thread being especially important not to block). When the providers find suggestions, they call back to the omnibox, which appears to merge and sort things before updating the UI widget.
History provider
It turns out that URLs in Chrome are stored in at least one, and probably two, SQLite databases (one is on disk, and the second, which I know less about, seems to be in memory).
This comment at the top of HistoryURLProvider explains the lookup process, complete with multithreaded ASCII art!
SQLite lookup
Basically, typing in the omnibox causes SQLite to run this SQL query for looking up URLs by prefix. The suggestions are ordered by the number of visits to the URL, as well as by the number of times that a URL has been typed.
Interestingly, this is not a trie! The lookup is indeed based on prefix, but the scoring of those lookups does not appear to be aggregated by prefix, like I'd imagined.
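To make the prefix lookup concrete, here is a small self-contained sketch using Python's `sqlite3` module. The table layout, the sample rows, and the exact sort precedence (typed count before visit count) are assumptions for illustration; this is not Chromium's actual schema or query.

```python
import sqlite3


def suggest(conn, prefix, limit=3):
    """Return URLs starting with `prefix`, most-typed and most-visited first."""
    # Escape LIKE wildcards so the typed text is matched literally.
    escaped = prefix.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    rows = conn.execute(
        "SELECT url FROM urls WHERE url LIKE ? ESCAPE '\\' "
        "ORDER BY typed_count DESC, visit_count DESC LIMIT ?",
        (escaped + "%", limit),
    )
    return [row[0] for row in rows]


# Hypothetical, simplified history table held in memory.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE urls (url TEXT PRIMARY KEY, visit_count INTEGER, typed_count INTEGER)"
)
conn.executemany(
    "INSERT INTO urls VALUES (?, ?, ?)",
    [("twitter.com", 40, 12), ("twilio.com", 55, 3), ("maps.google.com", 20, 9)],
)
```

With these rows, `suggest(conn, "t")` ranks twitter.com ahead of twilio.com because it has been typed more often, even though twilio.com has more visits. Each row is scored on its own counts, so nothing is aggregated per prefix as a trie would do.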
I had a little less success in determining how the scores in the database are updated. This part of the code updates a URL after a visit, but I haven't yet run across where the counts are decremented (if at all?).
Updating suggestions
What I think is happening regarding the updating of suggestions -- and this is still just a guess right now -- is that the in-memory sqlite database essentially has priority over the on-disk DB, and then whenever Chrome restarts or otherwise flushes the contents of the in-memory DB to disk, the visit and typed counts for each URL get updated at that time. Again, just a guess, but I'll keep looking as I get time.
The code is really nice to read through, actually. I definitely recommend it if you have similar questions about Chrome.
I have to pull two pre-printed (not hand-written) fields out of a paper form, so that it can be automatically routed after being scanned. The fields contain batch and item identifiers, like "GG-9192" or "EPN/245G".
I've tried the following software:
Tesseract-OCR
Cuneiform
Canon ImageRunner built-in OCR
Asprise OCR Java API (demo)
I've tried the following settings:
Scanning at resolutions of 300dpi and 600dpi
Printing the fields in different fonts, including OCR-A and OCR-B
In all cases output was pretty much all over the place. I can kick back documents for which I can't properly extract the necessary information, but I'm thinking it's going to be at least half of them. I considered some sort of fuzzy logic based on known values in a database, but sometimes these identifiers can differ by a single character, like "123G" and "123C".
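The fuzzy-matching idea and its weakness can be sketched as follows. This uses the standard-library `difflib` module as a stand-in for whatever similarity measure would actually be chosen. The function name `resolve_identifier`, the 0.7 cutoff, and the sample list of known identifiers are assumptions for illustration.

```python
import difflib


def resolve_identifier(ocr_text, known_ids, cutoff=0.7):
    """Map an OCR reading to a known identifier, or return None if unsure."""
    # An exact hit is accepted as-is (it could still be a misread of a near-twin).
    if ocr_text in known_ids:
        return ocr_text
    # Otherwise collect up to two sufficiently similar known identifiers.
    candidates = difflib.get_close_matches(ocr_text, known_ids, n=2, cutoff=cutoff)
    # Accept only an unambiguous match; anything else is kicked back for review.
    if len(candidates) == 1:
        return candidates[0]
    return None


# Hypothetical database contents, including two identifiers one character apart.
known = ["GG-9192", "EPN/245G", "123G", "123C"]
```

Here a reading such as "GG-9I92" resolves to "GG-9192", while "1236" is rejected because it is equally close to "123G" and "123C". That is exactly the single-character problem described above: fuzzy matching can only flag such cases, not decide them.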
Is this a lost cause? Perhaps OCR just isn't mature enough to handle a requirement of this nature? What other techniques might you recommend? Barcodes?
Edit: the containing application is in Java, so any recommendations that come with free or cheap Java-based APIs would help.
Edit 2: if anyone is interested: without any special tuning, Cuneiform for Linux and the Canon ImageRunner worked best, with Tesseract-OCR and the Asprise Java API producing the worst results. None of the four was acceptable for anything beyond standard document-search-grade OCR. I'm beginning to think that this isn't going to work out.
If you have control over the fields, why use a human-readable format in the first place? For scanning, it seems like a QR Code or something similar would be best. It is marked for orientation and has some built-in error correction.
http://en.wikipedia.org/wiki/QR_Code
I started digging for products starting with Tomato's suggestion. I tried ABBYY and CVISION. Both have products that can automate OCR:
CVISION Maestro Recognition Server 4.0
ABBYY Recognition Server 2.0
In addition, ABBYY has SDKs for various platforms, and CVISION has an SDK that appears to work with at least VB/VC++.
I haven't tried either SDK yet, and am not sure it's necessary for my project. All I need is PDFs coming in that I can extract the text from. I did however try CVISION's server product and with the OCR on its most accurate settings, it worked really well. I haven't tried ABBYY's server product yet because I have to go through a reseller to get a trial. I'm in the process of doing so, but if it starts getting annoying I'm probably going to go with CVISION. I did try ABBYY's FineReader standalone product, and it worked very well, so I assume that their server product would also.