I used Yii's SluggableBehavior and it worked fine for English, but if I enter Arabic text in the name field and create a new entry, the slug column in the database is empty.
public function behaviors()
{
    return [
        [
            'class' => SluggableBehavior::className(),
            'attribute' => 'name',
            'ensureUnique' => true,
            'slugAttribute' => 'slug',
        ],
    ];
}
Does anybody know how to make it support Arabic and other languages? If not, is there an extension that does the job?
Thanks in advance.
Creating a slug usually involves using a regular expression to exclude unwanted characters that are not compatible with the normal URL format.
The regular expressions passed to preg_replace or preg_match for this are usually written with English slugs in mind, so they strip every other kind of character, including Arabic.
I suggest that you write your own slugging function with a character range that includes Arabic characters. Something like this:
$slug = preg_replace("/[^a-zA-Z0-9ء-ي _\-]/u", "", $title);
This will exclude all characters other than the ones in the ranges a-z, A-Z, 0-9, and the Arabic alphabet. It will also keep underscores, dashes and spaces, which you can replace later on if you need.
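To make the follow-up step concrete, here is a rough sketch of the same character-class idea in JavaScript rather than PHP. The function name slugify is made up for illustration, and the choice to turn spaces into dashes and lowercase the result is my assumption, not something SluggableBehavior does for you.

```javascript
// Hypothetical helper (not part of Yii): keeps Latin letters, digits,
// Arabic letters (U+0621 to U+064A), spaces, underscores and dashes,
// then turns the remaining spaces into dashes.
function slugify(title) {
  return title
    .replace(/[^a-zA-Z0-9\u0621-\u064A _-]/gu, "") // strip everything outside the allowed ranges
    .trim()
    .replace(/ +/g, "-")   // runs of spaces become a single dash
    .replace(/-+/g, "-")   // collapse repeated dashes
    .toLowerCase();        // only affects the Latin letters
}

console.log(slugify("Hello, World!")); // hello-world
console.log(slugify("مرحبا بالعالم"));
```

The range covers the basic Arabic letters only; diacritics and Arabic-Indic digits fall outside it and would be stripped unless the class is widened.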
I am trying to search my codebase for code that calls a function named "foo", so I am searching for "foo(". But the results I'm getting include everything with the word foo in it: CSS, comments and strings that don't even have the trailing open parenthesis.
Does anyone know how to search for strings that include special characters like ),"'?
When searching for special characters, try using escape character before the character, i.e. \, e.g. "foo\(".
Additionally, I found a reply for a similar question (see http://marc.info/?l=opensolaris-opengrok-discuss&m=115776447032671). It seems that frequently occurring special characters are not indexed because of performance issues, therefore it might not be possible to effectively search for such pattern.
Opengrok supports escaping special characters that are part of the query syntax. Current special characters are:
+ - && || ! ( ) { } [ ] ^ " ~ * ? : \ /
To escape these characters use the \ before the character. For example to search for (1+1):2 use the query \(1\+1\)\:2
I have tried using something like "struct a {" and "struct a \{" to look for the declaration of "a". But it seems OpenGrok just ignores the curly brackets. Is there a way to search for the phrase "struct a {"?
Grok supports escaping special characters that are part of the query syntax.
The current list of special characters is
+ - && || ! ( ) { } [ ] ^ " ~ * ? : \
To escape these characters use the \ before the character.
For example to search for (1+1):2 use the query: \(1\+1\)\:2
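If you build queries programmatically, the escaping rule can be sketched as a small helper. This is only an illustration in JavaScript; escapeQuery is a made-up name, not an OpenGrok or Lucene function, and it also escapes / since some versions of the list include it.

```javascript
// Hypothetical helper: prefixes each query-syntax character with a backslash.
// "&&" and "||" are covered by escaping every "&" and "|" individually.
function escapeQuery(text) {
  return text.replace(/[+\-!(){}\[\]^"~*?:\\\/&|]/g, "\\$&");
}

console.log(escapeQuery("(1+1):2")); // \(1\+1\)\:2
console.log(escapeQuery("foo("));    // foo\(
```

Escaping only gets the character past the query parser. It does not make a character searchable if the indexer never stored it in the first place.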
You should be able to search with "struct a {" (with quotes)
From OpenGrok documentation:
Escaping special characters:
Opengrok supports escaping special characters that are part of the query syntax. Current special characters are:
+ - && || ! ( ) { } [ ] ^ " ~ * ? : \ /
To escape these characters use the \ before the character. For example to search for (1+1):2 use the query: \(1\+1\)\:2
NOTE on analyzers: Indexed words are made up of Alpha-Numeric and Underscore characters. One letter words are usually not indexed as symbols!
Most other characters (including single and double quotes) are treated as "spaces/whitespace" (so even if you escape them, they will not be found, since most analyzers ignore them).
The exceptions are: # $ % ^ & = ? . : which are mostly indexed as separate words.
Because some of them are part of the query syntax, they must be escaped with a reverse slash as noted above.
So searching for +1 or + 1 will both find +1 and + 1.
Valid FIELDs are
full
Search through all text tokens (words,strings,identifiers,numbers) in
index.
defs
Only finds symbol definitions (where e.g. a variable (function, ...)
is defined).
refs
Only finds symbols (e.g. methods, classes, functions, variables).
path
path of the source file (no need to use dividers, or if you do, use "/".
Windows users: "\" is an escape character in Lucene query syntax, so please don't use "\"; replace it with "/"). Also note that if you want
just the exact path, enclose it in "", e.g. "src/mypath", otherwise
dividers will be removed and you get more hits.
hist
History log comments.
type
Type of analyzer used to scope down to certain file types (e.g. just C
sources). Current mappings: [ada=Ada, asm=Asm, bzip2=Bzip(2), c=C,
clojure=Clojure, csharp=C#, cxx=C++, eiffel=Eiffel, elf=ELF,
erlang=Erlang, file=Image file, fortran=Fortran, golang=Golang,
gzip=GZIP, haskell=Haskell, jar=Jar, java=Java, javaclass=Java class,
javascript=JavaScript, json=Json, kotlin=Kotlin, lisp=Lisp, lua=Lua,
mandoc=Mandoc, pascal=Pascal, perl=Perl, php=PHP, plain=Plain Text,
plsql=PL/SQL, powershell=PowerShell script, python=Python, ruby=Ruby,
rust=Rust, scala=Scala, sh=Shell script, sql=SQL, swift=Swift,
tar=Tar, tcl=Tcl, troff=Troff, typescript=TypeScript,
uuencode=UUEncoded, vb=Visual Basic, verilog=Verilog, xml=XML,
zip=Zip]
The term (phrases) can be boosted (making it more relevant) using a caret ^, e.g. help^4 opengrok will boost the term help.
Opengrok search is powered by Lucene, for more detail on query syntax refer to Lucene docs.
I need to insert a special character into a translation in an HTML locale file.
The character is a non-breaking space (&nbsp;); I need it to work around another problem.
But it is not working: the entity code itself is displayed in the subject of the email.
This is what I have:
pt.yml
subjects:
...
release_auto_pause_triggered_html: "%{project_name} %{release_name} - pausa automática&nbsp;disparada"
release_mailer.rb
subject = t('subjects.release_auto_pause_triggered_html', project_name: @project.name, release_name: @release.name).html_safe
But the subject of the email that gets sent shows the entity literally: pausa automática&nbsp;disparada
(Where a ' appears inside the entity in this post, I added it only so that the entity would be displayed here.)
I need it to look like this: "pausa automática disparada"
Where am I going wrong?
I think I managed to do it. Try this: "pausa automática\xA0disparada"
Source:
Using single-quoted scalars, you may express any value that does not contain special characters. No escaping occurs for single quoted scalars except that a pair of adjacent quotes '' is replaced with a lone single quote '.
Double-quoted is the most powerful style and the only style that can express any scalar value. Double-quoted scalars allow escaping. Using escaping sequences \x** and \u**, you may express any ASCII or Unicode character.
And here I found the necessary code
My output is:
# YML
title: "Title \xA0 aaa"
# console
I18n.t('title')
=> "Title aaa"
It should work this way. Try to check for typos.
In your example
"pausa automática&nbsp';disparada"
there is an extra ' between &nbsp and ;.
I want to replace "\cite{foo123a}" with "[1]" and backwards. So far I was able to replace text with the following command
body.replaceText('.cite{foo}', '[1]');
but I did not manage to use
body.replaceText('\cite{foo}', '[1]');
body.replaceText('\\cite{foo}', '[1]');
Why?
The back conversion I cannot get to work at all
body.replaceText('[1]', '\\cite{foo}');
This replaces only the "1", not the [ ], which means the [] is being interpreted as a regex character set. Escaping the brackets does not help:
body.replaceText('\[1\]', '\\cite{foo}');//no effect, still a char set
body.replaceText('/\[1\]/', '\\cite{foo}');//no matches
The documentation states
A subset of the JavaScript regular expression features are not fully supported, such as capture groups and mode modifiers.
Can I find a full description of what is supported and what not somewhere?
I'm not familiar with Google Apps Script, but this looks like ordinary regular expression troubles.
Your second conversion is not working because the string literal '\[1\]' is just the same as '[1]'. You want to quote the text \[1\] as a string literal, which means '\\[1\\]'. In your last attempt, the surrounding slashes have no special meaning inside a string literal, so '/\[1\]/' is a pattern which matches the text /1/.
Your first conversion is not working because {...} denotes a quantifier, not literal braces, so you need \\\\cite\\{foo\\}. (The four backslashes are because to match a literal \ in a regular expression is \\, and to make that a string literal it is \\\\ — two escaped backslashes.)
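Here is a minimal sketch of both conversions using plain JavaScript RegExp and String.replace as a stand-in. It does not call replaceText, and I am assuming replaceText treats its first argument as a pattern string in the same way; the variable names are mine.

```javascript
// The string literal loses one level of backslashes before the regex sees it.
console.log("\[1\]" === "[1]"); // true: '\[' in a string literal is just '['

const citePattern = new RegExp("\\\\cite\\{foo\\}"); // regex source: \\cite\{foo\}
const refPattern = new RegExp("\\[1\\]");            // regex source: \[1\]

// Forward: \cite{foo} becomes [1]
const forward = "see \\cite{foo} here".replace(citePattern, "[1]");
// Backward: [1] becomes \cite{foo}
const backward = "see [1] here".replace(refPattern, "\\cite{foo}");

console.log(forward);  // see [1] here
console.log(backward); // see \cite{foo} here
```

Note that the replacement string only needs one level of escaping ('\\cite{foo}'), because it is not parsed as a regular expression.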
I am proposing to convert my windows-1252 XHTML web pages to UTF-8.
I have the following character entities in my coding:
&#39;: apostrophe,
&#9658;: right pointer,
&#9668;: left pointer.
If I change the charset and save the pages as UTF-8 using my editor:
the apostrophe remains in as a character entity;
the pointers are converted to symbols within the code (presumably because the entities are not supported in UTF-8?).
Questions:
If I understand UTF-8 correctly, you don't need to use the entities and can type characters directly into the code. In which case is it safe for me to replace #39 with a typed in apostrophe?
Is it correct that the editor has placed the pointer symbols directly into my code and will these be displayed reliably on modern browsers, it seems to be ok? Presumably, I can't revert to the entities anyway, if I use UTF-8?
Thanks.
It's charset, not chartset.
1) It depends on where the apostrophe is used. It is a valid ASCII character as well, so depending on the character's purpose (whether it is for display only, inside a DOMText node, or used in code) you may or may not be able to use a literal apostrophe.
2) If your editor is a modern editor, it will be using UTF sequences instead of just single-byte chars to display text. Most of the characters used in code are plain ASCII (and ASCII is a subset of UTF-8), so those characters take up one byte. Other characters may take up two, three or even four bytes. They will still be displayed to you as one character, but the relation between character and byte is different.
Anyway, since all valid ASCII characters are exactly the same in ASCII, UTF-8 and even windows-1252, you should not see any problems using UTF-8. And you can still use numeric and named entities because they are written in those valid characters. You just don't have to.
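To see the character-versus-byte relation for yourself, here is a small JavaScript sketch (runnable in Node or a browser console). The helper name utf8Length is mine; TextEncoder is the standard API and always encodes to UTF-8.

```javascript
// Count the bytes a string occupies once encoded as UTF-8.
const encoder = new TextEncoder();

function utf8Length(text) {
  return encoder.encode(text).length;
}

console.log(utf8Length("'"));       // 1 (plain ASCII apostrophe)
console.log(utf8Length("\u00E9"));  // 2 (e with acute accent)
console.log(utf8Length("\u25BA"));  // 3 (right pointer)

// A numeric entity is just the code point written in ASCII:
// 9658 decimal is U+25BA, the right pointer.
console.log(String.fromCodePoint(9658) === "\u25BA"); // true
```

So the pointers cost three bytes each when typed directly, while the entity form stays pure ASCII. Both render the same character.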
P.S. All modern browsers can do UTF-8 just fine, but our definitions of "modern" may vary.
Entities have three purposes: Encoding characters it isn't possible to encode in the character encoding used (not relevant with UTF-8), encoding characters it is not convenient to type on a given keyboard, and encoding characters that are illegal unescaped.
&#9658; should always produce ► no matter what the encoding. If it doesn't, it's a bug elsewhere.
► directly in the source is fine in UTF-8. You can do either that or the entity, and it makes no difference.
' is fine in most contexts, but not some. The following are both allowed:
<span title="Jon's example">This is Jon's example</span>
But would have to be encoded in:
<span title='Jon&#39;s example'>This is Jon's example</span>
because otherwise it would be taken as the ' that ends the attribute value.
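If the attribute value comes from data rather than being typed by hand, the usual approach is to escape it before writing it out. A minimal JavaScript sketch follows; escapeHtml is a hypothetical helper name, not a built-in.

```javascript
// Hypothetical helper: replaces the characters that are unsafe in HTML text
// and in quoted attribute values with entities.
function escapeHtml(value) {
  return value
    .replace(/&/g, "&amp;")  // must run first so the entities below are not re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

const title = escapeHtml("Jon's example");
console.log(`<span title='${title}'>This is Jon's example</span>`);
// <span title='Jon&#39;s example'>This is Jon's example</span>
```

Escaping both quote styles means the result is safe whichever quote character delimits the attribute.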
Use entities if you copy/paste content from a word processor or if the code is an XML dialect. Use a macro in your text-editor to find/replace the common ones in one shot. Here is a simple list:
Half: ½ => &frac12;
Acute Accent: é => &eacute;
Ampersand: & => &amp;
Apostrophe: ’ => &#39;
Backtick: ‘ => &#96;
Backslash: \ => &#92;
Bullet: • => &bull;
Dollar Sign: $ => &#36;
Cents Sign: ¢ => &cent;
Ellipsis: … => &hellip;
Emdash: — => &mdash;
Endash: – => &ndash;
Left Quote: “ => &ldquo;
Right Quote: ” => &rdquo;