StringEscapeUtils escapeHtml - html

Is there a difference between org.apache.commons.lang.StringEscapeUtils.escapeHtml and org.apache.commons.lang3.StringEscapeUtils.escapeHtml4 ?

they come from different packages in fact
escapeHtml comes from org.commons.lang (lang 2.5 API) and let's you use a writer object. It can also escape string from an SQL source
escapeHtml4 is from org.commons.lang3 (lang 3.1 API) is specialy use to escape character from a HTML4 source. Nothing more, nothing less
They do the same job but i would recommend using "escapeHtml4" since its from a newer package
See : EscapeHtml4 and escapehtml

Note that StringEscapeUtils.escapeHtml() has issues with apostrophes

Related

How to set decimal mark character or localization in CSVeed?

new Problem: my employer wishes me to implement CSVeed utility for a project. It works just fine except that data formatting is not recognised correctly. The data to read is formatted with semicolon (;) as field separator and colon (,) as decimal mark. The information on the projects home page is telling me that decimal conversion is done automatically, but e.g. a string 0,5 in csv file is interpeted as 5, a string 9,5 read as 95. In the source code of the project i find Information: "Makes sure that a specific Locale is used to convert numbers.". I am not exactly sure where to tell the csveed lib which l10n to use. At another point of source doc it says utility will use l10n of framework. Is this from Eclipse RCP which i am using oder from the machine ? Sorry for not posting any code, but i didnt find barely a hint where to setup
the decimal mark in the utility...
Anyone an idea ?
Greetings :)
My Goodness, why this verbose ? ^^
CsvClient<BeanClass> reader = new CsvClientImpl<BeanClass>(reader, BeanClass.class);
reader.setConverter("[name of property]", new CustomNumberConverter(Double.class, NumberFormat.getNumberInstance(Locale.[whereever]), false));
[name of property] has to be the name of the actual instance variable.
Greetings :)

mySQL replace string + additional string with a static value

Unlike PHP, I don't believe mySQL has any preg_replace() feature, only matching via REGEXP. Here are the strings I have in the code:
http://ourcompany.com/theapplestore/...
http://ourcompany.com/anotherstore/...
http://ourcompany.com/yetanotherstore/...
As you can see, there is a constant in there, http://ourcompany.com/, but there is also a variable string namely theapplestore, anotherstore, etc. etc.
I want to replace the constant string, plus the variable string(s), and then the trailing slash (/) after the variable string(s), with a single shortcode value, namely {{store url=''}}
EDIT
If it helps, the store codes are always the same length, they are going to be
sch131785
sch185399
sch634019
etc.
i.e., they are all 9 characters long
How would I do this? Thanks.
I thought this might be useful: there is currently NO WAY to do this in mysql. Find using REGEXP, yes; replace, no. That said, there is another post with an extension library mentioned, sagi:
Is there a MySQL equivalent of PHP's preg_replace?
MariaDB-10.0.5 has REGEXP_REPLACE(), REGEXP_INSTR() and REGEXP_SUBSTR()
You can use following regex,
(ourcompany.com\/\w+\/)
Demo
Uses the concept of Group Capture

Regex conversion from Java to mysql for '?:'

I have this Expression
(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
that is working fine in java but when i am trying to use the same in mySql it is giving me
Error Code: 1139
Got error 'repetition-operator operand invalid' from regexp
So when i replace above Expression with this
(((^25[0-5])|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}((^25[0-5])|2[0-4][0-9]|[01]?[0-9][0-9]?)
This is giving me some additional Results?
Plz let me know what i am Doing Wrong , Thanks For your time
The construct ?: is a non-capturing group.
When refering to the current documentation of MySQL one can see that it
is aimed at conformance with POSIX 1003.2
and that
MySQL uses the extended version
In other words, it uses POSIX ERE and this flavor simply doesn't support non-capturing groups.
For a detailed listing on what constructs are available in Java and in POSIX ERE you can use regular-expressions.info as a reference, specifically this site for capturing groups (just be sure to choose Java and POSIX ERE in the two dropdowns).
As to why you get additional results:
Did you notice that your regexes aren't equal?
In the first and second half of your first regex you write (?:25[0-5]|, while in the second regex this becomes ((^25[0-5])|.
A ^ matches the beginning of a string (here's the documentation again), so at least the second occurence can never match if anything was matched before.
Perhaps you want to use this regex in MySQL, but i can only take a guess here as i don't know your data and what exactly you want to match.
((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
I just took your first regex and removed all ?:.

NLTK letter 'u' in front of text result?

I'm learning NLTK with a tutorial and whenever I try to print some text contents, it returns with 'u' in front of it.
In the tutorial it looks like this,
firefox.txt Cookie Manager: "Don't allow sites that set removed cookies to se...
But in my result, it looks like this
(u'firefox.txt', u'Cookie Manager: "Don\'t allow sites that set removed cookies to se', '...')
I am not sure why. I followed exact way the tutorial is explaining. Can someone help me understand this problem? Thank you!
That leading u just means that that string is Unicode. All strings are Unicode in Python 3. The parentheses means that you are dealing with a tuple. Both will go away if you print the individual elements of the tuple, as with t[0], t[1], and so on (assuming that t is your tuple).
If you want to print the whole tuple as a whole, removing u's and parentheses, try the following:
print " ".join (t)
As mentioned in other answer the leading u just means that string is Unicode. str() can be used to convert unicode to str but there doesnt seem to be a direct way to convert all the values in a tuple from unicode to string.
Simple function as below and using it when ever you are referring to any tuple in nltk.
>>> def str_tuple(t, encoding="ascii"):
... return tuple([i.encode(encoding) for i in t])
>>> str_tuple(nltk.corpus.gutenberg.fileids())
('austen-emma.txt', 'austen-persuasion.txt', 'austen-sense.txt', 'bible-kjv.txt', 'blake-poems.txt', 'bryant-stories.txt', 'burgess-busterbrown.txt', 'carroll-alice.txt', 'chesterton-ball.txt', 'chesterton-brown.txt', 'chesterton-thursday.txt', 'edgeworth-parents.txt', 'melville-moby_dick.txt', 'milton-paradise.txt', 'shakespeare-caesar.txt', 'shakespeare-hamlet.txt', 'shakespeare-macbeth.txt', 'whitman-leaves.txt')
I guess you are using Python2.6 or any version before 3.0.
Python allows its users to do the same operation on 'str()' and 'unicode' in the early version. They tried to make conversion between 'str()' and 'unicode' directly in some case rely on default encoding, which on most platform is ASCII. That's probably the reason cause your problem. Here are two ways may solve it:
First, manually assign decoding method. For example:
>> for name in nltk.corpus.gutenberg.fileids():
>> name.decode('utf-8')
>> print(name)
The other way is to UPDATE your Python to version 3.0+ (Recommended). They fix this problem in Python3.0. Here is the link to update detail description:
https://docs.python.org/release/3.0.1/whatsnew/3.0.html#text-vs-data-instead-of-unicode-vs-8-bit
Hope this helps you.

Iterating over a string in Vimscript or Parse a JSON file

So I'm creating a vim script that needs to load and parse a JSON file into a local object graph. I searched and I couldn't find any native way to process a JSON file, and I don't want to add any dependencies to the script. So I wrote my own function to parse the JSON string (gotten from the file), but it's really slow. At the moment, I iterate through each character in the file like so:
let len = strlen(jsonString) - 1
let i = 0
while i < len
let c = strpart(jsonString, i, 1)
let i += 1
" A lot of code to process file....
" Note: I've tried short cutting the process by searching for enclosing double-quotes when I come across the initial double quotes (also taking into account escaping '\' character. It doesn't help
endwhile
I've also tried this method:
for c in split(jsonString, '\zs')
" Do a lot of parsing ....
endfor
For reference, a file with ~29,000 characters takes about 4 seconds to process, which is unacceptable.
Is there a better way to iterate over a string in vim script?
Or better yet, have I missed a native function to parse JSON?
Update:
I asked for no dependencies because I:
Didn't want to deal with them
Genuinely wanted some ideas for best way to do this without someone else's work.
Sometimes I just like to do things manually even though the problem has already been solved.
I'm not against plugins or dependencies at all, it's just that I'm curious. Thus the question.
I ended up creating my own function to parse the JSON file. I was creating a script that could parse the package.json file associated with node.js modules. Because of this, I could rely on a fairly consistent format and quit the processing whenever I'd retrieved the information I needed. This usually cut out large chunks of the file since most developers put the largest chunk of the file, their "readme" section, at the end. Because the package.json file is strictly defined, I left the process somewhat fragile. It assumed a root dictionary { } and actively looks for certain entries. You can find the script here: https://github.com/ahayman/vim-nodejs-complete/blob/master/after/ftplugin/javascript.vim#L33.
Of course, this doesn't answer my own question. It's only the solution to my unique problem. I'll wait a few days for new answers and pick the best one before the bounty ends (already set an alarm on my phone).
The simplest solution with the least dependencies is just using the json_decode vim function.
let dict = json_decode(jsonString)
Even though Vim's origin dates back a lot it happens that its internal string() eval() representation is that close to JSON that its likely to work unless you need special characters.
You can lookup the implementation here which even supports true/false/null if you want:
https://github.com/MarcWeber/vim-addon-json-encoding
Better use that library (vim-addon-manager allows to install dependencies easily).
Now it depends on your data whether this is good enough.
Now Benjamin Klein posted your question to vim_use which is why I'm replying.
Best and fast replies happen if you subscribe to the Vim mailinglist.
Goto vim.sf.net and follow the community link.
You cannot expect the Vim community to scrape stackoverflow.
I've added the keyword "json" and "parsing" to that little code that it can be found easier.
If this solution does not work for you you can try the many :h if_* bindings or write an external script which extracts the information you're looking for, or turns JSON into Vim's dictionary representation which can be read by eval() escaping special characters you care about correctly.
If you seek for completely correct solution omitting dependencies is one of the worst thing you can do. The eval() variant mentioned by #MarcWeber is one of the fastest, but it has its disadvantages:
Using solution for securing eval I mentioned in comment makes it no longer the fastest. In fact after you use this it makes eval() slower by more then an order of magnitude (0.02s vs 0.53s in my test).
It does not respect surrogate pairs.
It cannot be used to verify that you have correct JSON: it accepts some strings (e.g. "\<C-o>") that are not JSON strings and it allows trailing commas.
It fails to give normal error messages. It fails badly if you use vam#VerifyIsJSON I mentioned in p.1.
It fails to load floating point values like 1e10 (vim requires numbers to look like 1.0e10, but numbers like 1e10 are allowed: note “and/or” in the first paragraph).
. All of the above (except for the first) statements also apply to vim-addon-json-encoding mentioned by #MarcWeber because it uses eval. There are some other possibilities:
Fastest and the most correct is using python: pyeval('json.loads(vim.eval("varname"))'). Not faster then eval, but fastest among other possibilities. (0.04 in my test: approximately two times slower then eval())
Note that I use pyeval() here. If you want solution for vim version that lacks this functionality it will no longer be one of the fastest.
Use my json.vim plugin. It has an advantages of slightly better error reporting compared to failed vam#VerifyIsJSON, slightly worse compared to eval() and it correctly loads floating-point numbers. It can be used for verification of strings (it does not accept "\<C-a>"), but it loads lists with trailing comma just fine. It does not support surrogate pairs. It is also very slow: in the test I used (it uses 279702 character long strings) it takes 11.59s to load. Json.vim tries to use python if possible though.
For the best error reporting you can take yaml.vim and purge YAML support out of it leaving only JSON (I once have done the same thing for pyyaml, though in python: see markedjson library used in powerline: it is pyyaml minus YAML stuff plus classes with marks). But this variant is even slower then json.vim and should only be used if the main thing you need is error reporting: 207 seconds for loading the same 279702 character long string.
Note that the only variant mentioned that satisfies both requirements “no dependencies” and “no python” is eval(). If you are not fine with its disadvantages you have to throw away one or both of these requirements. Or copy-paste code. Though if you take speed into account only two candidates are left: eval() and python: if you want to parse json fast you really must use C and only these solutions spend most time in functions written in C.
Most other interpreters (ruby/perl/TCL) do not have pyeval() equivalent so they will be slower even if their JSON implementation is written in C. Some other (lua/racket (mzscheme)) have pyeval() equivalent, but e.g. luaeval('{}') is zero meaning that you will have to add additional step explicitly and recursively converting objects into vim dictionaries and lists (e.g. luaeval('vim.dict({})')) which will impact performance. Cannot say anything about mzeval(), but I have never heard about anybody actually using racket (mzscheme) with vim.