LilyPond: Accidentals #3 and b2 for Turkish music

I'm transcribing some music for bağlama, a stringed instrument with frets that can produce notes that are not part of traditional Western music.
I'd like to transcribe some notes using the accidentals ♭2 and ♯3. Is there a way to do so in LilyPond?

LilyPond can indeed notate non-Western music and already includes support for Turkish classical music. Please refer to the two pages below:
http://lilypond.org/doc/v2.18/Documentation/notation/common-notation-for-non_002dwestern-music
and
http://lilypond.org/doc/v2.18/Documentation/notation/turkish-classical-music

Optical recognition of text and analysis of its structure (title, subtitle, text body)

We wish to analyze scans of documents containing text (non-handwritten) and images, with a very broad range of arrangements/structures and in different languages. The first problem we are trying to solve is extracting the text and identifying and separating titles, subtitles, and text bodies.
At the moment we are doing a literature review. There is plenty of literature about deep learning, computer vision, optical character recognition, and natural language processing, but none of it actually focuses on optical recognition of the structure of text.
We wonder: what is the name of the discipline/field that deals with optical recognition of the structure of text?
What are the state-of-the-art approaches and tools for solving these problems?
Optical Layout Recognition (OLR). A good example of an open-source tool for Layout Analysis and Region Extraction can be found here.

Is there a name for font families (such as fangchan-secret) that are used to prevent web scraping?

While trying to scrape some data from the website of a housing agency in China (the name of the agency is Anjuke) for a small personal project, I realized that all of the numbers on the website are visually displayed as numbers but are digitally read as obscure Chinese characters.
Is there a name for this kind of a font or this kind of a technique more specific than "anti-scraping measures"?
Additional information about this specific case: To see this in action you can click on any of the listings from the Anjuke website and then attempt to copy-and-paste the price (or any HTML element that has the "strongbox" class). You will see that instead of pasting the number, it pastes an obscure Chinese character (such as 驋, 齤, 麣, 龤, or 龒).
Looking at the CSS revealed that these numbers have a font of "fangchan-secret", and a bit of quick googling led to a blog post in Chinese by zhyuzh3d. I read some Chinese, although not loads. This blog post appears to be a Chinese explanation of how fangchan-secret is a method to prevent web scraping, and also an explanation of how to get around this preventative measure.
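To illustrate how such a font could be worked around once the glyph-to-digit correspondence is known, here is a minimal Python sketch. The mapping below is purely hypothetical (the question does not say which character renders as which digit); in practice it would have to be derived from the downloaded font file, and a site may change it at any time.

```python
# HYPOTHETICAL mapping for illustration only: the real pairs must be read
# from the site's font file and may differ between page loads.
HYPOTHETICAL_GLYPH_TO_DIGIT = {
    "驋": "1",
    "齤": "2",
    "麣": "3",
    "龤": "4",
    "龒": "5",
}


def decode_obfuscated(text, mapping=HYPOTHETICAL_GLYPH_TO_DIGIT):
    """Replace each obfuscated character with the digit its glyph displays."""
    table = str.maketrans(mapping)  # single-character keys -> replacement strings
    return text.translate(table)    # characters not in the mapping pass through
```

Characters that are not in the mapping (units, punctuation, ordinary text) are left untouched, so the function can be applied to a whole scraped string.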

Recognize MICR font using OCR engine?

I am using Microsoft OCR Library for reading text.
The Microsoft OCR library works perfectly. However, I want to read the characters shown at http://www.ict4u.net/databases/database-images/micr.jpg . Is there a way to train the OCR library to read these characters, or is there a language that allows it to read them?
[Microsoft OCR crew here] We don't yet support training OCR to customize it for your use cases. However, we do actively keep an eye on Stack Overflow to see what developers need, so we can keep improving the OCR engine.
I have been working with Microsoft OCR for a while now.
Compared with Tesseract it has very basic functionality.
For example Microsoft OCR returns the words and lines.
But the lines are nonsense. Randomly 2 or 3 words are grouped together as a "line" but they are not a real line. And the "lines" are completely unordered. In this aspect it is worse than Tesseract. You have to take the coordinates of each word and order them on your own.
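A minimal Python sketch of that manual ordering step, assuming each recognized word comes with a left edge, a top edge, and a height. The `Word` class and the `tolerance` parameter are made up for this illustration and are not part of the Microsoft OCR API.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical container for one OCR word and its bounding box."""
    text: str
    x: float       # left edge
    y: float       # top edge
    height: float


def group_into_lines(words, tolerance=0.5):
    """Group words into text lines by vertical center, then sort left to right."""
    lines = []
    for word in sorted(words, key=lambda w: w.y + w.height / 2):
        center = word.y + word.height / 2
        if lines:
            last = lines[-1]
            last_center = sum(w.y + w.height / 2 for w in last) / len(last)
            # Same line if the centers differ by less than a fraction of the height.
            if abs(center - last_center) <= tolerance * word.height:
                last.append(word)
                continue
        lines.append([word])
    return [" ".join(w.text for w in sorted(line, key=lambda w: w.x)) for line in lines]


words = [
    Word("world", 60, 10, 12),
    Word("Hello", 5, 11, 12),
    Word("line", 50, 40, 12),
    Word("Second", 5, 41, 12),
]
print(group_into_lines(words))  # ['Hello world', 'Second line']
```

This simple heuristic assumes roughly horizontal, single-column text; skewed scans or multi-column layouts would need more than a vertical-center comparison.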
Microsoft does not return the rectangles of characters and there is absolutely no way to configure or train Microsoft OCR in any way. You can add languages with Windows Update for "Basic Typing" = OCR (see http://www.thewindowsclub.com/install-uninstall-languages-windows-10), but you cannot train your own language data.
MSDN says that the following 25 languages are supported with different accuracy:
Excellent: Czech, Danish, Dutch, English, Finnish, French, German, Hungarian, Italian, Norwegian, Polish, Portuguese, Romanian, Serbian Cyrillic, Serbian Latin, Slovak, Spanish and Swedish.
Very good: Chinese Simplified, Greek, Japanese, Russian and Turkish.
Good: Chinese Traditional and Korean.
The recognition quality is very similar to Tesseract. It even has exactly the same problems as Tesseract. Some single characters are not recognized (separate symbols like a single '$') and it has the same huge problem with asterisks as Tesseract. It also inserts spaces in the wrong places, as Tesseract does. So I wonder whether Microsoft is using Tesseract under the hood.
However, Microsoft OCR has an advantage over Tesseract: the image preprocessing is much better. It does not matter if you have red text on a yellow background or white text on black. This is a weak point for Tesseract, which needs a black and white image of good quality as input.
The following applies to both OCR libraries: if you have recognition problems, try enlarging the image. Even blurring the image may be very helpful because this removes the noise from the image.

What are situations with western languages where you'd use HTML 5's Ruby element?

HTML 5 is introducing a new element: <ruby>; here's the W3C's description:
The ruby element allows one or more spans of phrasing content to be marked with ruby annotations. Ruby annotations are short runs of text presented alongside base text, primarily used in East Asian typography as a guide for pronunciation or to include other annotations. In Japanese, this form of typography is also known as furigana.
They then go on to give a few examples of Ruby annotations in use for Chinese and Japanese text. I'm wondering though: is this element going to be useful only for East Asian HTML documents, or are there good semantic applications for the <ruby> element in Western languages like English, German, Spanish, etc.?
id-ee-oh-SINK-ruh-sees
Could be useful for people learning English, as our writing system has many idiosyncrasies that make it somewhat less than phonetic.
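As a rough sketch of what that markup could look like, the Python helper below builds a <ruby> element with the pronunciation as the annotation. The function name `ruby` is made up for this example; the <rp> elements supply parentheses as a fallback for browsers that do not render ruby annotations.

```python
from html import escape


def ruby(base, annotation):
    """Return HTML that shows `annotation` alongside `base`, with <rp> fallbacks."""
    return (
        f"<ruby>{escape(base)}"
        f"<rp>(</rp><rt>{escape(annotation)}</rt><rp>)</rp>"
        "</ruby>"
    )


print(ruby("idiosyncrasies", "id-ee-oh-SINK-ruh-sees"))
# <ruby>idiosyncrasies<rp>(</rp><rt>id-ee-oh-SINK-ruh-sees</rt><rp>)</rp></ruby>
```

In a browser without ruby support this degrades to "idiosyncrasies(id-ee-oh-SINK-ruh-sees)".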
As a linguist, I can see the benefits in using <ruby> for marking up linguistic examples with various theoretical notational conventions. One example that comes to mind is indicating tonal levels in autosegmental phonology. Here's a quick example I threw together that can be seen in the latest Webkit/Chromium (at least):
http://miketaylr.com/code/western_ruby.html
Currently, this type of notation is left for LaTeX and friends, and if on the web, generally a non-accessible image.
As I understand it, ruby annotations are not really relevant in Western languages because Western alphabets are (more or less) phonetic. In Japanese they are used to give a pronunciation guide for logographic characters which don't have obvious pronunciations (unless you've memorized them). I suppose the Western analog would be IPA notation in brackets following a word, but those are rarely used and I don't know if Ruby annotations would be appropriate for them.
My list:
theoretical notational conventions (miketaylr's answer) http://miketaylr.com/code/western_ruby.html
language learning (Adam Bellaire's answer): "id-ee-oh-SINK-ruh-sees" shown above "idiosyncrasies" in the phrase "foo idiosyncrasies bar" (made with ASCII 'nbsp' art)
abbreviation, acronym, initialism (possibly - why hover?)
learning technical terms of English origin accidentally translated to your non-English native language
I'm often forced to do the latter in uni. While the translated terminology is often consistent, very often it's not at all self-explanatory, or not as much as the original English one.
Also the same term may have been translated using several translation systems by different authors/groups.
Another problem group is when, for example, queue, row, series (and sometimes tuple) are translated to the very same word in your language.
Given a Western language with fewer speakers and a low percentage of technical people in the population, it is actually much easier to learn the topic directly from English and then learn the translations in a second step.
Ruby could be a tool to transform this into a one-step process, providing either the translations or the original as a "Furigana".

British English to American English (and vice versa) Converter

Does anyone know of a library or bit of code that converts British English to American English and vice versa?
I don't imagine there are too many differences (some examples that come to mind are doughnut/donut, colour/color, grey/gray, localised/localized) but it would be nice to be able to provide localised site content.
I've been working on one to convert US English to UK English. As I've discovered it's actually a lot harder to write something to convert the other way but I hope to get around to providing a reverse conversion one day.
This isn't perfect, but it's not a bad effort (even if I do say so myself). It'll convert most US spellings to UK ones but there are some words where UK English retains the US spelling (e.g. "program" where this refers to computer software). It won't convert words like pants to trousers because my main goal was simply to make the spelling uniform across the whole document.
There are also words such as practice and license where UK English uses either those or practise & licence, depending on whether the word's being used as a verb or a noun. For those two examples the conversion tool will highlight them and an explanatory note pops up on the lower left hand of your screen when you hover your mouse over them. All word patterns which are converted are underlined in red, and the output is shown in a side by side comparison with your original input.
It'll do quite large blocks of text quite quickly, but I prefer to use it for just a couple of paragraphs at a time, copying them in from a Word doc.
It's still a work in progress so if anyone has any comments or suggestions then I'd appreciate feedback I can use to improve it.
http://www.us2uk.eu/
The difference between UK and US English is far greater than just a difference in spelling. There is also the hood/bonnet, sidewalk/pavement, pants/trousers idea.
Guess it depends how far you need to take it.
I looked forever to find a solution to this, but couldn't find one, so, I wrote my own bit of code for it, using a master list of ~20,000 different spellings that were freely available from the varcon project and the language experts at wordsworldwide:
https://github.com/HoldOffHunger/convert-british-to-american-spellings
Since I had two source lists, I used them to crosscheck each other, and I found numerous errors and typos (varcon lists "preexistent"'s British equivalent as "preaexistent"). It is possible that I may have accidentally made typos, too, but, since I didn't do any wordsmithing here, I don't believe that to be the case.
Example:
require('AmericanBritishSpellings.php');
$american_british_spellings = new AmericanBritishSpellings();
$text = "Axiomatically ax that door, would you, my neighbour?";
$text = $american_british_spellings->SwapBritishSpellingsForAmericanSpellings(['text'=>$text]);
print($text); // output: Axiomatically axe that door, would you, my neighbor?
I think if you're thinking of converting from American English to British English, I personally wouldn't bother. Britain is very Americanised anyway, we accept silly yank spellings on the net :)
I had a similar problem recently. I discovered the following tool, called VarCon. I haven't tested it out, but I needed a rough converter for some text data. Here's an example.
echo "I apologise for my colourful tongue ." | ./translate british american
# >> I apologize for my colorful tongue .
It looks like it works for various dialects. Be sure to read the README and proceed with caution.
*note: This will only correct spelling variations.
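For anyone who only needs the basic idea of a spelling-only conversion, here is a minimal Python sketch. It is not how VarCon or any of the tools above work internally: the word list is a tiny hand-written sample, and the names `BRITISH_TO_AMERICAN`, `AMBIGUOUS`, and `to_american` are invented for this illustration. Words such as practice/practise and licence/license, whose spelling depends on noun or verb use, are only reported, not converted.

```python
import re

# Tiny illustrative sample; a real converter needs a full list such as VarCon's.
BRITISH_TO_AMERICAN = {
    "colour": "color",
    "grey": "gray",
    "doughnut": "donut",
    "localised": "localized",
    "neighbour": "neighbor",
}
# Spelling depends on noun/verb use, so these are flagged rather than converted.
AMBIGUOUS = {"practice", "practise", "licence", "license"}

WORD = re.compile(r"[A-Za-z]+")


def match_case(source, replacement):
    """Copy the capitalization pattern of `source` onto `replacement`."""
    if source.isupper():
        return replacement.upper()
    if source[0].isupper():
        return replacement.capitalize()
    return replacement


def to_american(text):
    """Return (converted_text, flagged_words) for British-to-American spelling."""
    flagged = []

    def swap(match):
        word = match.group(0)
        lower = word.lower()
        if lower in AMBIGUOUS:
            flagged.append(word)
            return word
        if lower in BRITISH_TO_AMERICAN:
            return match_case(word, BRITISH_TO_AMERICAN[lower])
        return word

    return WORD.sub(swap, text), flagged


print(to_american("The colour GREY suits my Neighbour, but check the licence."))
# ('The color GRAY suits my Neighbor, but check the licence.', ['licence'])
```

Like the tools above, this only touches spelling; vocabulary differences such as pants/trousers would need a separate, context-aware step.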