I want to search for "Cole" in FileMaker. When I search for that string, I want to find entries like "Čole". When I use the internal search function of FileMaker, this entry does not show in the results.
It depends on the language you have selected for indexing the searched field. For example, if the selected language is Czech or Unicode, then you will get the behavior you describe. When the language is English or Default, you will get the behavior you expect.
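To illustrate the kind of matching the question is asking for, here is a minimal Python sketch of accent-insensitive comparison. It is not how FileMaker implements its indexes; the function names are hypothetical, and it only shows the general idea of folding "Č" to "C" before comparing.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Strip combining marks so that 'Č' compares equal to 'C' (hypothetical helper)."""
    # NFD splits a precomposed letter into base letter + combining mark(s).
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks and keep the base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def accent_insensitive_contains(haystack: str, needle: str) -> bool:
    """Case- and accent-insensitive substring test."""
    return fold_accents(needle).casefold() in fold_accents(haystack).casefold()


print(accent_insensitive_contains("Čole", "Cole"))
```

An index language such as Czech treats "Č" as a distinct letter, which is why the folded comparison above would not apply there.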
I'm using MySQL, and I am trying to find common strings over a given character length within a series of highly dynamic messages. Each message may contain a common phrase, but with reference codes or names on either side of it that don't match a specific format. For example, this shows the types of common phrases I'm trying to scan for, with dynamic content embedded in different formats (https://screencast.com/t/rlABTWitQ)
The end result I am looking for is something akin to this (https://screencast.com/t/qXzrGNFuf)
Because the formats of these messages are so variable, I can't seem to get anything going with substring_index or regexp (as far as my amateur familiarity with REGEXP has taken me). For example:
SELECT LEFT("first_middle_last", CHAR_LENGTH("first_middle_last") - LOCATE('_', REVERSE("first_middle_last")));
I can't use something like this, as it would only split on one specific character. As you can see, the strings vary too much in format.
I need to test how Box Net search works in my application. For this I need more information about the search pattern. I see that search results are matched against both file title and content.
Search behaves differently when file names contain special characters. Will search work when file names contain special characters?
Following is the query I am using
boxSearch = client.getSearchManager().search(searchFileName, boxDefaultRequestObject);
Can you share the pattern used during search, which characters are allowed, and for which character combinations results are returned?
Here are some resources on search:
https://support.box.com/hc/en-us/articles/200519888-How-do-I-search-for-files-and-folders-in-Box-
Box's search returns folder/file names and content, and it also accepts booleans. Just don't use mixed case (aNd is NOT okay, while AND or and is okay).
Box also accepts special characters in uploads and search. See the description here, as this was a fairly recent product update that came in mid-2013.
Additional special character support – Box will add support for more types of special characters across the Box website, desktop and mobile apps. Once the change is live, Box products will support almost all printable characters (except / \ or empty file names; also will not support leading or trailing spaces on files and folders).
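For testing, the restrictions quoted above can be turned into a small pre-check. This is only a sketch in Python based on that announcement text, not Box's actual validation logic; the function name is hypothetical, and it does not check the "almost all printable characters" part.

```python
def is_valid_box_name(name: str) -> bool:
    """Check a file or folder name against the restrictions quoted above.

    Hypothetical helper: it encodes only the rules stated in the
    announcement (no '/', no '\\', not empty, no leading/trailing spaces).
    """
    if name == "":
        return False  # empty names are not supported
    if "/" in name or "\\" in name:
        return False  # the two excluded characters
    if name != name.strip(" "):
        return False  # leading or trailing spaces are not supported
    return True


for candidate in ["report (v2) #final.txt", "a/b.txt", " leading.txt", ""]:
    print(repr(candidate), is_valid_box_name(candidate))
```

Names that pass this check are reasonable candidates for search test cases with special characters.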
I would like to have a list of pages existing in Wiktionary for a given word.
The case: I am searching for the definition of the word მამა (which means dad) in Georgian. There is no page for this word in the Georgian Wiktionary, so I would like to have a list of all synonyms.
I have searched and tested with this page:
https://en.wikipedia.org/wiki/Special:ApiSandbox#action=query&prop=revisions&format=json&rvprop=ids&rvlimit=10&titles=Foo&titles=Foo
Any idea?
Thanks for the help
There is no API for Wiktionary at the dictionary level, only at the wiki level, which is probably not much help. See this question for more details: How to retrieve Wiktionary word content?
So there is no API that will allow you to query for synonyms or even to query for Georgian language entries or data.
You can fetch Wiktionary pages either in raw wikitext or in HTML, and you can download entire database dumps in an XML format, which you can try to parse from scratch.
One thing that will help in the case of Georgian is that it has a unique alphabet, so the vast majority of Wiktionary entries in the Georgian script will be for the Georgian language. (But there are also a very small number of entries for words in Laz, Mingrelian, Old Georgian, and Svan, which also use the Georgian script.)
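As a rough sketch of the two ideas above, the Python below builds the URL for a page's raw wikitext (using MediaWiki's index.php with action=raw) and checks whether a word is written in the Georgian script. The function names and the default host are assumptions for illustration, and the script check only covers the main Georgian Unicode block (U+10A0 to U+10FF), so it cannot tell Georgian apart from Laz, Mingrelian, or Svan.

```python
from urllib.parse import urlencode


def is_georgian_script(word: str) -> bool:
    """True if every character is in the main Georgian block (U+10A0-U+10FF)."""
    return bool(word) and all(0x10A0 <= ord(ch) <= 0x10FF for ch in word)


def raw_wikitext_url(title: str, host: str = "en.wiktionary.org") -> str:
    """Build the URL that returns a page's raw wikitext (no request is made here)."""
    query = urlencode({"title": title, "action": "raw"})
    return f"https://{host}/w/index.php?{query}"


word = "მამა"
if is_georgian_script(word):
    print(raw_wikitext_url(word))
```

The wikitext returned from that URL still has to be parsed by hand to find a synonyms section, if the entry has one.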
I am using a list of keywords to put into a meta tag for a localized Japanese website (based on an English one). The site is for the Japanese regional branch of a client, so the native speakers from the branch have translated the keywords themselves for us. When we sent the list to them, we had it organized with commas, as one would typically do for keywords:
foo, bar, baz
However, it seems that the Japanese language (in which I have pretty much zero expertise) has its own comma character, and they used that character when translating the list of keywords. So something akin to the above would come out as something like the following (from Google Translate, used purely as an example and not for translation accuracy; for the curious, I used fu instead of foo):
フー、バー、バズ
This uses the Japanese comma character, 、, instead of a normal Latin comma, ,.
Will this affect how the keywords are used? Is 、 preferred for separating the keyword tokens for Japanese-targeted SEO, or is ,?
I searched through Google for some hint at what to do, but any pages I found dealing with localized keyword text were either using a Latin-based alphabet (like French), or for a couple of Japanese ones I found did not actually display any examples that might have even suggested which comma character to use (they really only talked about not using literal translations, which we've already done by having native speakers translate the content). The one place I found with an essentially duplicate question to mine was the forum question posted here, but it has no answers (and isn't likely to get any since it's 1.5 years old...).
Note: I've seen talk about the lack of use of keywords by SEO engines. The client wants keywords, though, so we will be doing keywords, meaning there's little use in bringing up this point in comments/answers.
The keywords have to be separated by the , character, no matter which language the keywords are in. For keywords it is defined that the value "must be a set of comma-separated tokens", which is defined as:
[…] a string containing zero or more tokens each separated from the next by a single "," (U+002C) character […]
Note that this , is not part of the keywords. It's like a reserved character. If a keyword itself should contain a ,, it would have to be encoded (for example as &#44;).
If you hand over keywords for translation, you shouldn't include the separator character (unless it is part of the keyword itself).
So better send the translator a list like …
foo
bar
baz
… instead of "foo, bar, baz".
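If the translated list has already come back with Japanese commas, it can be normalized before it goes into the meta tag. Here is a minimal Python sketch, assuming the translators used 、 (U+3001) or the fullwidth ， (U+FF0C), or returned one keyword per line; the function name is hypothetical.

```python
import re

# Separators that might come back from translators: the ASCII comma,
# the ideographic comma, the fullwidth comma, or one keyword per line.
SEPARATORS = re.compile(r"[,\u3001\uFF0C\n]")


def build_keywords_content(raw: str) -> str:
    """Return a value for the keywords meta tag, separated by U+002C."""
    tokens = (token.strip() for token in SEPARATORS.split(raw))
    return ", ".join(token for token in tokens if token)


print(build_keywords_content("フー、バー、バズ"))
# prints: フー, バー, バズ
```

This treats every such comma as a separator, so it is not suitable if a keyword is supposed to contain 、 itself.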
When using MySQL full text search in boolean mode there are certain characters like + and - that are used as operators. If I do a search for something like "C++" it interprets the + as an operator. What is the best practice for dealing with these special characters?
The current method I am using is to convert all + characters in the data to _plus. It also converts &, # and / characters to a textual representation.
There's no way to do this nicely using MySQL's full text search. What you're doing (substituting special characters with a pre-defined string) is the only way to do it.
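A minimal Python sketch of that substitution is below. Only the + to _plus mapping comes from the question; the other replacement strings and the function name are made up for illustration. The same function has to be applied both to the text before it is indexed and to each search term before it is placed in the query.

```python
# Only "+" -> "_plus" comes from the question; the other mappings
# are hypothetical placeholders.
REPLACEMENTS = {
    "+": "_plus",
    "#": "_hash",
    "&": "_and",
    "/": "_slash",
}


def encode_special_chars(text: str) -> str:
    """Replace operator-like characters with a textual representation."""
    return "".join(REPLACEMENTS.get(ch, ch) for ch in text)


print(encode_special_chars("C++"))
# prints: C_plus_plus
```

The replacements use letters and underscores on the assumption that the full text parser keeps such a sequence together as one word.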
You may wish to consider using Sphinx Search instead. It apparently supports escaping special characters, and by all reports is significantly faster than the default full text search.
MySQL is fairly brutal in what tokens it ignores when building its full text indexes. I'd say that where it encountered the term "C++" it would probably strip out the plus characters, leaving only C, and then ignore that because it's too short. You could probably configure MySQL to include single-letter words, but it's not optimised for that, and I doubt you could get it to treat the plus characters how you want.
If you have a need for a good internal search engine where you can configure things like this, check out Lucene which has been ported to various languages including PHP (in the Zend framework).
Or if you need this more for 'tagging' than text search then something else may be more appropriate.