I code in Sublime Text 2 and sometimes I need to remove diacritics.
How can I remove diacritics from part of a text? Is there some filter or something?
Thanks for any suggestions.
Feri
My suggestion would be to look at this question and its answers:
Regex - What would be regex for matching foreign characters?
You can use the regex find utility of Sublime and try the regex that is suggested by Fredrik Mörk
\p{L}
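Note that \p{L} only matches letters (accented or not); it does not strip the accents by itself. If running the text through a small script is an option, one common approach is Unicode decomposition. The following is a minimal Python sketch, not a Sublime feature; the function name strip_diacritics is made up for this example.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Return text with combining marks (accents) removed."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep everything except the combining marks, then recompose.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


print(strip_diacritics("Fredrik Mörk"))
```

Letters that have no decomposition (for example ø or ł) are left unchanged by this approach and would need an explicit mapping.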
So I can replace "<" square brackets in Vim using ":s/</&lt;/g" and that's great, but it does not replace the closing bracket; that's problem number one. But the main issue is:
How do I replace all square brackets marked in VISUAL MODE? Say I want to replace all marked brackets, just as on the screenshot, with "&lt;", so that they will be displayed in HTML as code. I want to replace only them, not the others. Do you know what the pattern is?
Sorry for the beginner's question, but I would like to know.
Thank you so much
Those are angle brackets, not "square brackets".
:s/</&lt;/g does nothing useful. Did you mean :s/</\&lt;/g?
You already know how to change the opening bracket so you only need to do the same thing for the closing bracket but with the appropriate pattern and replacement.
If I correctly interpret your question and screenshot, you want the > marked in red to be replaced with &gt;, yes? If so, then I think @romainl suggested a good answer:
:s/>/\&gt;/g
:s/>/\&gt;/g is working fine, thank you.
Basically, I also wanted to know if I can do this for multiple lines. But all you need to do is select or mark the specific "angle brackets", so again, thanks to romainl and m_mlvx.
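In Vim, pressing : after a visual selection pre-fills the range '<,'> so the substitution only touches the selected lines. For comparison, the same "escape only these lines" idea can be sketched in Python. This is purely an illustration; the function name escape_angle_brackets and the 1-based line range are assumptions made for the example.

```python
def escape_angle_brackets(lines, first, last):
    """Escape < and > on lines first..last (1-based, inclusive)."""
    result = []
    for number, line in enumerate(lines, start=1):
        if first <= number <= last:
            # Only lines inside the chosen range are changed.
            line = line.replace("<", "&lt;").replace(">", "&gt;")
        result.append(line)
    return result


sample = ["<b>keep</b>", "<p>code</p>"]
print(escape_angle_brackets(sample, 2, 2))
```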
I have a big HTML file that was split into lines in Notepad++ (I had a very long line: 110,000 characters). This inserted newline characters into the text. So if I search for "The quick brown fox jumps over the lazy dog", there is a possibility that it won't come up as a search result, because Notepad++ added a newline between words. So how can I ignore newlines when searching for text? In which HTML editor can I do this?
It should be possible to use a regular expression in Notepad++ to match either a space or a newline, or a newline and a space.
I do not know which one you should use, however, but this may get you on your way.
My best guess is to use something like (\s\n|\n\s) between words, but I could not get this to work.
EDIT: I think your question is already answered here
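If a script is acceptable instead of an editor, the idea can be sketched in Python. In Python's re module, \s already matches newlines, so joining the words with \s+ is enough; a pattern like (\s\n|\n\s) would require two whitespace characters in a row. The helper name phrase_pattern is made up for this example.

```python
import re


def phrase_pattern(phrase: str) -> re.Pattern:
    """Compile a pattern that matches the phrase with any whitespace between words."""
    # Escape each word so punctuation in the phrase is matched literally.
    words = [re.escape(word) for word in phrase.split()]
    # Allow one or more whitespace characters (spaces, tabs, newlines) between words.
    return re.compile(r"\s+".join(words))


text = "The quick brown fox\njumps over the lazy dog"
pattern = phrase_pattern("The quick brown fox jumps over the lazy dog")
print(pattern.search(text) is not None)
```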
Does anyone know of a way to clean a <table> of all formatting, leaving just the basic tags and text?
I have tried Komposer, which was useless and even added more formatting rubbish of its own. I then tried Aptana, but that only seems to be a text editor, so again no use at all.
Any ideas?
When you would like to clean HTML tables (e.g. when you copy them from Word or Excel to an HTML editor) you can use the online Table Cleaner at https://www.r2h.nl/tablecleaner
It strips all the formatting and returns only clean HTML code, so you will have a table without any styling.
How about using a text editor that supports find and replace using regular expressions (such as Notepad++) to remove the unwanted attributes using one regex, and the font tags using another regex?
To match the attributes you need to remove the following regex should do the job:
( style| class| height| width)=("[A-Za-z0-9:;_ -]*"|'[A-Za-z0-9:;_ -]*'|[A-Za-z0-9:;_-]*)
To match font tags, try
<font.*font>
(I've tested these regular expressions with http://gskinner.com/RegExr/).
Edit
It turns out that Notepad++ does not support the logical OR operator in regular expressions. An alternative would be to use another text editor that does, or to write a small app/script to perform the replacements.
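As a rough idea of what such a small script could look like, here is a Python sketch. It reuses the attribute regex from above as-is; for the font tags it uses a different, narrower pattern (an assumption on my part) that removes only the <font ...> and </font> tags and keeps the text between them. The name clean_table is hypothetical.

```python
import re

# The attribute pattern from the answer above, which relies on alternation.
ATTRIBUTE = re.compile(
    r"""( style| class| height| width)=("[A-Za-z0-9:;_ -]*"|'[A-Za-z0-9:;_ -]*'|[A-Za-z0-9:;_-]*)"""
)
# Assumption: strip only the opening and closing font tags, not their content.
FONT_TAG = re.compile(r"</?font[^>]*>", re.IGNORECASE)


def clean_table(html: str) -> str:
    """Remove the listed attributes and all font tags from the markup."""
    html = ATTRIBUTE.sub("", html)
    return FONT_TAG.sub("", html)


print(clean_table('<td class="x" width=10><font color="red">Hi</font></td>'))
```

Attribute values containing characters outside the listed set (for example # or % in a style value) will not be matched completely, so the character classes may need extending for real-world input.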
We've got a large amount of static HTML that has links like, e.g.:
Link
However some of them contain spaces in the anchor e.g.
Link
Any ideas on what kind of regular expression I'd need to use to find the spaces after the # and replace them with a - or _?
Update: I just need to find them using TextMate, hence no need for an HTML parsing lib.
This regex should do it:
#[a-zA-Z]+\s+[a-zA-Z\s]+
Three Caveats.
First, if you are afraid that the page text itself (and not just the links) might contain information like "#hashtag more words", then you could make the regex more restrictive, like this:
#[a-zA-Z]+\s+[a-zA-Z\s]+\">
Second, if you have hash tags that contain characters beyond A-Z, then just add them in between the second set of brackets. So, if you have '-' as well, you would modify to:
#[a-zA-Z]+\s+[a-zA-Z-\s]+\">
Finally, this assumes that all the links you are trying to match start with a letter/word and are followed by a space, so, in the current form, it would not match "Anchor-tags-galore", but would match "Anchor tags galore."
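If you later want to do the replacement in a script rather than just find the links in TextMate, something along these lines might work. It is a Python sketch under the assumption that the anchors sit inside double-quoted href values (the original markup is not shown here, so that is a guess); the name hyphenate_fragments is made up.

```python
import re

# Assumption: links look like href="page.html#Some Anchor" with double quotes.
HREF_FRAGMENT = re.compile(r'(href="[^"#]*#)([^"]*)(")')


def hyphenate_fragments(html: str) -> str:
    """Replace each run of whitespace in an href fragment with one hyphen."""

    def fix(match: re.Match) -> str:
        # Group 2 is the part after the #; only that part is rewritten.
        fragment = re.sub(r"\s+", "-", match.group(2))
        return match.group(1) + fragment + match.group(3)

    return HREF_FRAGMENT.sub(fix, html)


print(hyphenate_fragments('<a href="#Anchor tags galore">Link</a>'))
```

Because the pattern is tied to href="...", page text such as "#hashtag more words" is left alone.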
Have you considered using an HTML parsing library like BeautifulSoup? It would make finding all the hrefs much easier!
Here, this regex matches the hash and all the words and spaces in between:
#(\w+\s)+\w+
http://dl.getdropbox.com/u/5912/Jing/2009-08-12_1651.png
When you have some time, you should download "The Regex Coach", which is an awesome tool to develop your own regexes. You get instant feedback and you learn very fast. Plus it comes at no cost!
Visit the homepage
I am trying to build a regular expression to extract the text inside the HTML tag as shown below. However I have limited skills in regular expressions, and I'm having trouble building the string.
How can I extract the text from this tag:
text
That is just a sample of the HTML source of the page. Basically, I need a regex string to match the "text" inside of the <a> tag. Can anyone assist me with this? Thank you. I hope my question wasn't phrased too horribly.
UPDATE: Just for clarification, report_drilldown is absolute, but I don't really care if it's present in the regex as absolute or not.
145817 is a random 6 digit number that is actually a database id. "text" is just simple plain text, so it shouldn't be invalid HTML. Also, most people are saying that it's best to not use regex in this situation, so what would be best to use? Thanks so much!
The answer is... DON'T!
Use a library, such as this one
([^<]*)
This won't really solve the problem, but it may just barely scrape by. In particular, it's very brittle, the slightest change to the markup and it won't match. If report_drilldown isn't meant to be absolute, replace it with [^']*, and/or capture both it and the number if you need.
If you need something that parses HTML, then it's a bit of a nightmare if you have to deal with tag soup. If you were using Python, I'd suggest BeautifulSoup, but I don't know of anything similar for C#. (Does anyone know of a similar tag soup parsing library for C#?)
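To show what the parser route looks like, here is a small Python sketch that uses the standard library's html.parser instead of BeautifulSoup, so it needs no extra install. The class name LinkTextExtractor, the helper link_texts, and the sample markup are all made up for the example; the real page markup will differ.

```python
from html.parser import HTMLParser


class LinkTextExtractor(HTMLParser):
    """Collect the text found inside every <a> element."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # number of <a> elements currently open
        self.links = []  # one collected text per <a> element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.depth += 1
            self.links.append("")

    def handle_endtag(self, tag):
        if tag == "a" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        # Only text that appears while an <a> is open is recorded.
        if self.depth:
            self.links[-1] += data


def link_texts(html: str) -> list:
    parser = LinkTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical sample markup, only for demonstration.
print(link_texts("<p><a href='example?id=145817'>text</a> more</p>"))
```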
I agree regex might not be the best way to parse this, but using a backreference it's easily done:
<(?<tag>\w*)(?:.*)>(?<text>.*)</\k<tag>>
Where tag and text are named capture groups.
hat-tip: expresso library
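For anyone trying this outside .NET: Python spells named groups as (?P<name>...) and the backreference as (?P=name). The sketch below is a slightly tightened variant of the pattern above, not an exact port (it uses \w+ and [^>]* instead of \w* and .*, and a lazy match for the text); the sample string is invented.

```python
import re

# (?P<tag>...) defines a named group; (?P=tag) must match the same text again.
ELEMENT = re.compile(r"<(?P<tag>\w+)[^>]*>(?P<text>.*?)</(?P=tag)>", re.DOTALL)

match = ELEMENT.search("<a href='#'>text</a>")
if match:
    print(match.group("tag"), match.group("text"))
```

It is just as brittle as any other regex applied to HTML, for example with nested elements of the same name.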
<a href\=\"[^\x00]*?\">
should get you the opening tag.
<\/a>
will give you the closing tag. Just extract out what is in between. Untested though.