I have a database of strings containing names mixed with colour codes (such as §2, §6, §a), for example:
§2joe (joe)
However, some colour codes are mixed into the word, for example:
§4ha§6rr§ay (harry)
Let's say I want to select everyone from the database with a name of Harry, regardless of their colour codes.
My current solution is to put % around each letter of the search term, for example:
SELECT * FROM people WHERE name LIKE '%h%a%r%r%y%';
This works for names that start with a colour code, but does not seem to always work with names mixed with colour codes.
Is there a better way for me to do the query?
There is probably a better way to store the data.
Split the formatting from the actual data (save two fields). For example, on insert save a "clean" version of the data along with a "formatted" version.
There's no reason the wild cards shouldn't work, but they will match more than just "harry." They will also be much slower.
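A minimal Python sketch of both points, assuming every colour code is a § followed by exactly one character (an assumption drawn from the examples in the question). The `clean_name` column, the `strip_colour_codes` helper, the extra sample name and the in-memory SQLite database are all hypothetical stand-ins for the real schema and database.

```python
import re
import sqlite3

# Assumption: a colour code is "§" plus exactly one following character.
COLOUR_CODE = re.compile("§.")

def strip_colour_codes(name):
    """Return the name with every colour code removed."""
    return COLOUR_CODE.sub("", name)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, clean_name TEXT)")

# Store the formatted name and a clean copy side by side on insert.
for formatted in ["§2joe", "§4ha§6rr§ay", "§1sharon mu§2rray"]:
    conn.execute(
        "INSERT INTO people (name, clean_name) VALUES (?, ?)",
        (formatted, strip_colour_codes(formatted)),
    )

# The wildcard pattern matches any name containing h, a, r, r, y in order.
wildcard_hits = [row[0] for row in conn.execute(
    "SELECT name FROM people WHERE name LIKE '%h%a%r%r%y%' ORDER BY rowid")]

# The clean column allows a plain equality test instead.
clean_hits = [row[0] for row in conn.execute(
    "SELECT name FROM people WHERE clean_name = ? ORDER BY rowid", ("harry",))]
```

Here the wildcard query also returns the "sharon murray" row, while the lookup on the clean column returns only the "harry" row and can use an ordinary index.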
SELECT * FROM people WHERE REPLACE(name,'§a','') LIKE '%h%a%r%r%y%';
Stripping out the problem color codes first may get you what you want, provided you don't have a color code for every letter of the alphabet.
I write in Python, using pytesseract or direct Popen calls if needed.
I try to OCR a document with irregular structure, a letter looking like this:
The problem is that in the .hocr file generated by Tesseract, I get lines consisting of the left and right columns glued together, like "Recipient: Sender:".
What I'd like to achieve is output from the left and right column separated. Using third party Python utilities to pre-process the image is an acceptable solution if explained in reasonable detail. The script must be autonomous and somehow detect this issue as not all the letters have such strange formatting.
Tried/ideas:
Using --psm 1 to allow input format detection - no improvement over default, likely because structure is too complicated.
Tweaking some config file options like gapmap_use_ends and textord_words_maxspace - I couldn't find good documentation on these. There is probably a right combination of values, but there are 57 options with "space" in the name... any insight on these would be much appreciated.
Editing the .hocr - not sure how to write appropriate grouping rules for the word boxes that do not interfere with normal text everywhere else...
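One possible shape for such a grouping rule, as a rough sketch only: split a line wherever the horizontal gap between neighbouring word boxes is much wider than the text is tall. The function name, the `(text, x0, y0, x1, y1)` tuple layout and the `gap_factor` default are assumptions made for illustration. Parsing the word boxes out of the .hocr file is not shown, and the threshold would need tuning on real letters.

```python
from statistics import median

def split_line_at_wide_gaps(words, gap_factor=3.0):
    """Split one OCR line into column segments.

    words: list of (text, x0, y0, x1, y1) word boxes belonging to one line.
    A new segment starts when the gap to the previous word exceeds
    gap_factor times the median word height on that line.
    """
    if not words:
        return []
    ordered = sorted(words, key=lambda w: w[1])  # left to right by x0
    threshold = gap_factor * median(w[4] - w[2] for w in ordered)
    segments = [[ordered[0][0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur[1] - prev[3] > threshold:  # gap between prev.x1 and cur.x0
            segments.append([])
        segments[-1].append(cur[0])
    return segments

# Hypothetical boxes: two column headers far apart, then normal text.
header = [("Recipient:", 100, 50, 260, 80), ("Sender:", 900, 50, 1020, 80)]
body = [("Dear", 100, 200, 170, 230), ("Sir,", 185, 200, 240, 230)]
```

Because the threshold scales with the word height, ordinary inter-word spacing stays in one segment, so lines in normal paragraphs are left alone.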
I have a similar question to the one asked here.
HTML select tag autocomplete
A list on a website I use has a large number of entries (~20,000). When I highlight an option and start typing to find the entry I'm looking for, the browser (Chrome) can't find the option quickly enough. If I try to find an entry called, for example, 'Apple', I begin typing the word and the list highlights an entry beginning with 'A', then another entry beginning with 'P', and so on. It is able to find strings of characters (e.g., an entry beginning with 'Ap'), but only if I type at a very specific speed.
My question is, as a user, are there any settings, browser or otherwise, that I could access to allow me to search this list? Perhaps to change the speed I need to type in order to search for strings of characters. In Chrome's settings (and advanced settings), there appear to be no settings related to this.
Thanks.
As a user, there's not much that can be done. You don't have access to the underlying data structures of the <select> element, so the browser has to search through the <option>s one at a time.
As a developer, the trick is to not search the <select> box. Instead, use it as the raw data to build a searchable data structure when the box first loads, and then run your searches on that instead. The result of a search should contain the appropriate index into the <select> box, and then you just select that.
A trie (not to be confused with a binary search tree) might work well for something like this. At each node in the trie, you store the index of the first <option> element whose prefix matches the string up to that point. Then you branch off to child nodes corresponding to the next character in the string. John Resig, of jQuery fame, did some work with JavaScript tries a few years ago. He was using it on a text dictionary, but it should be adaptable to something like this.
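The trie described above can be sketched as follows. This is a Python illustration of the data structure only, not the JavaScript that would run in the page. The names `TrieNode`, `build_trie` and `find_first` are made up for the example, and the matching is case-insensitive by assumption.

```python
class TrieNode:
    """One node per prefix; remembers the first option with that prefix."""
    __slots__ = ("children", "first_index")

    def __init__(self, first_index):
        self.children = {}
        self.first_index = first_index

def build_trie(options):
    """Build the trie once, when the list of option labels is loaded."""
    root = TrieNode(0 if options else None)
    for index, label in enumerate(options):
        node = root
        for ch in label.lower():
            child = node.children.get(ch)
            if child is None:
                # First time this prefix is seen, so this option is the first match.
                child = TrieNode(index)
                node.children[ch] = child
            node = child
    return root

def find_first(root, prefix):
    """Return the index of the first option starting with prefix, or None."""
    node = root
    for ch in prefix.lower():
        node = node.children.get(ch)
        if node is None:
            return None
    return node.first_index

options = ["Apple", "Apricot", "Banana", "Blueberry"]
trie = build_trie(options)
```

A lookup walks one node per typed character, so its cost depends on the length of the typed prefix rather than on the number of options.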
I want to do a query that matches anything containing the characters 0xFB50-0xFDFF (Arabic Presentation Forms-A) and 0xFE70-0xFEFF (Arabic Presentation Forms-B). I have tried various things, including simple REGEXP with those characters enclosed in [] with a dash in the middle (e.g., [ݐ-ݭ]) but it seems to return everything with Arabic in it, even if it's not in the "presentation form" range. I was wondering if there was something like:
SELECT column FROM db WHERE CHAR(0xFE70) THROUGH CHAR(0xFEFF);
Obviously there is no "through" operator, but that's my pseudo-code :)
Thanks!
Found the answer from another article here. I decided MySQL's regex engine was not clever enough to do what I wanted to natively, so I used the PCRE functionality of PHP...
// Presentation Forms-A (U+FB50-U+FDFF) and Forms-B (U+FE70-U+FEFF) only
$arabic_presentation_forms = "[\x{fb50}-\x{fdff}\x{fe70}-\x{feff}]";
preg_match("/$arabic_presentation_forms/u", $db_output);
Worked well.
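For comparison, the same application-side check can be sketched in Python with the standard `re` module. The function name is hypothetical, and the character class covers only the two blocks named in the question.

```python
import re

# Arabic Presentation Forms-A (U+FB50-U+FDFF) and Forms-B (U+FE70-U+FEFF).
PRESENTATION_FORMS = re.compile("[\uFB50-\uFDFF\uFE70-\uFEFF]")

def has_presentation_forms(text):
    """True if the text contains at least one presentation-form character."""
    return PRESENTATION_FORMS.search(text) is not None

isolated_alef = "\uFE8D"  # ARABIC LETTER ALEF ISOLATED FORM (Forms-B)
plain_alef = "\u0627"     # ARABIC LETTER ALEF (basic Arabic block)
```

Ordinary Arabic text from the basic block (U+0600-U+06FF) does not match, which is the distinction the original query was after.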
I found a tutorial about adding hexadecimal icons to the front of html buttons, and would like to see a list of all possible icons.
Is there a list somewhere, or do I have to manually check each number to find out what is available to use?
Here is a sample of one of them:
.save:before, input[type="submit"]:before {content: "\2714";}
And here is the tutorial I got it from:
http://www.red-team-design.com/just-another-awesome-css3-buttons
They are Unicode code points and there are quite a few of them :-)
That particular one, U+2714 (heavy check mark), is in the Unicode Dingbats block (U+2700 to U+27BF).
You can change the 2714 in that link to get the other ones, or you can select individual ones from the entire Dingbats block (warning, this may take some time to load due to the large number of images).
The hex codes are Unicode characters. Here's a list of all Unicode dingbat characters. Wherever that page lists U+somenumber, rewrite it as \somenumber in the CSS.
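The U+ to backslash rewrite can also be scripted. The sketch below uses Python's standard `unicodedata` module to list the named characters in a code point range together with the escape to put in the CSS `content` property. The helper names are invented for the example, and which code points have names depends on the Unicode version bundled with the Python build.

```python
import unicodedata

def css_escape(label):
    # Convert a label such as "U+2714" into a backslash followed by the hex digits.
    label = label.strip().upper()
    if label.startswith("U+"):
        label = label[2:]
    int(label, 16)  # raises ValueError if the label is not hexadecimal
    return "\\" + label

def list_block(first, last):
    """Return (css_escape, character, name) for each named code point."""
    rows = []
    for code_point in range(first, last + 1):
        name = unicodedata.name(chr(code_point), "")
        if name:  # skip code points without a name in this Unicode version
            rows.append(("\\" + format(code_point, "X"), chr(code_point), name))
    return rows

# The Dingbats block, which contains the check mark from the question.
dingbats = list_block(0x2700, 0x27BF)
```

Printing the rows of `dingbats` gives a quick text-only reference for that block.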
Here is a site where you can find the unicode of some characters:
http://unicode-table.com/
I really like graphemica.com for detailed information, but shapecatcher.com can be really useful since it lets you draw out the symbol you're after.
If you can copy the character, you can paste it into the entity conversion calculator and it will spit out the numeric value, as well as values you can use in CSS and JS. It can also look up a character when you know its decimal value.
Basically, I have a table of keywords, and posts that I want tagged with attributes on display. For example, I want to draw a green border if #green# is present in the post. Is there a clean way for the DB to do this internally? I am prepared to do it all in C++ by fetching the entire table of keywords, throwing it in a trie and scanning each word, but this approach seems a bit inelegant.
You are mixing storage and display concerns. There be dragons. You are looking for content like '%#green#%'. It would be better to set a bit column or some other flag as the content was inserted or updated from the application side. When reading, retrieve this information along with the content. Let your display logic do the colouring.
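A small sketch of computing the flags on the application side at insert or update time. It assumes tags are written as a word between two # characters, as in the question's #green# example. The function name and the idea of storing the result in its own column are illustrative, not part of any particular schema.

```python
import re

# Assumption: a tag is one or more word characters wrapped in "#".
TAG = re.compile(r"#(\w+)#")

def find_flags(content, keywords):
    """Return the known keywords that appear as #keyword# in the content."""
    found = {tag.lower() for tag in TAG.findall(content)}
    return found & set(keywords)

known = {"green", "red"}
flags = find_flags("Look at this #green# post", known)
```

The resulting set would be saved alongside the post (for example as a flag column per keyword), so reads never have to scan the content again.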
Take a look at "separation of concerns" (SoC), which is closely related to the single-responsibility principle in the S.O.L.I.D. practices.
Hope this helps.