Can the compromise library handle indefinite pronouns?

Can the compromise library handle indefinite pronouns? - nlp-compromise

I need to eliminate pronouns in a script.
I'm wondering if there is a built in identifier for indefinite pronouns?

you can find and remove pronouns with this:
nlp(myScript).match('#Pronoun').remove()
but if you are looking for a specific list of words (like indefinite pronouns) you can do something like this:
nlp(myScript).match('(anybody|anyone|everything)').remove()

Related

Search for id="* *" in Eclipse

I am searching in my Eclipse project for id="* *" in order to find something like below in all files of my project
id="Text1 Text2"
Since space between attribute value needs to be removed I am searching like this to find out all the instances in my project.
But it searches and brings up lots of results since * represent any string.
For example:
id="xyzImg" onLoad="test();" SRC="abc.png"
It taking the complete element instead of desired results.
Please suggest me is there any way to get my desired result or is there any online tool that serve the same purpose.

I am going to write it here as an answer instead. Try with *|*. By asterisk I meant the wildcard symbol.

How to compile a complete list of MySQL "Words"

Really getting into MySQL and one thought I've had on mastering one aspect of it is to gather a complete listing of MySQL words. One example of this might be the Reserved Words list, though it appears that's not a complete list; example: CONCAT, CRC32, etc.
Bizarre as it may seem, I was thinking that such a list might exist, or that there might even be a query that would yield it, and/or a way to extract it from the source code of MySQL.

It is a non-scientific method, but what I would do is:
extract all strings from Native_func_registry func_array. Lookup for it sql/item_create.cc , e.g in
http://bazaar.launchpad.net/~mysql/mysql-server/mysql-trunk/view/head:/sql/item_create.cc
Those should cover builtin functions.
extract strings from 'symbols' and 'functions' in lexer :
http://bazaar.launchpad.net/~mysql/mysql-server/mysql-trunk/view/head:/sql/lex.h
extract symbols from bison input http://bazaar.launchpad.net/~mysql/mysql-server/mysql-trunk/view/head:/sql/sql_yacc.yy from lines
%token SOMETOKEN
except when tokens have _SYM suffix (they are covered by sql/lex.h)
Combine all of those, and the resulting set might come near :)

should I write more descriptive function names or add comments?

This is a language agnostic question, but I'm wandering what people prefer in terms of readability and maintainability... My hypothetical situation is that I'm writing a function which given a sequence will return a copy with all duplicate element removed and the order reversed.
/*
*This is an extremely well written function to return a sequence containing
*all the unique elements of OriginalSequence with their order reversed
*/
ReturnSequence SequenceFunction(OriginalSequence)
{...}
OR
UniqueAndReversedSequence MakeSequenceUniqueAndReversed(OriginalSequence)
{....}
The above is supposed to be a lucid example of using comments in the first instance or using very verbose function names in the second to describe the actions of the function.
Cheers,
Richard

I prefer the verbose function name as it make the call-site more readable. Of course, some function names (like your example) can get really long.
Perhaps a better name for your example function would be ReverseAndDedupe. Uh oh, now it is a little more clear that we have a function with two responsibilities*. Perhaps it would be even better to split this out into two functions: Reverse and Dedupe.
Now the call-site becomes even more readable:
Reverse(Dedupe(someSequence))
*Note: My rule of thumb is that any function that contains "and" in the name has too many responsibilities and needs to be split up in to separate functions.

Personally I prefer the second way - it's easy to see from the function name what it does - and because the code inside the function is well written anyway it'll be easy to work out exactly what happens inside it.
The problem I find with comments is they very quickly go out of date - there's no compile time check to ensure your comment is correct!
Also, you don't get access to the comment in the places where the function is actually called.
Very much a subjective question though!

Ideally you would do a combination of the two. Try to keep your method names concise but descriptive enough to get a good idea of what it's going to do. If there is any possibility of lack of clarity in the method name, you should have comments to assist the reader in the logic.

Even with descriptive names you should still be concise. I think what you have in the example is overkill. I would have written
UniqueSequence Reverse(Sequence)

I comment where there's an explanation in order that a descriptive name cannot adequately convey. If there's a peculiarity with a library that forced me to do something that appears non-standard or value in dropping a comment inline, I'll do that but otherwise I rely upon well-named methods and don't comment things a lot - except while I'm writing the code, and those are for myself. They get removed when it is done, typically.
Generally speaking, function header comments are just more lines to maintain and require the reader to look at both the comment and the code and then decide which is correct if they aren't in correspondence. Obviously the truth is always in the code. The comment may say X but comments don't compile to machine code (typically) so...
Comment when necessary and make a habit of naming things well. That's what I do.

I'd probably do one of these:
Call it ReverseAndDedupe (or DedupeAndReverse, depending which one it is -- I'd expect Dedupe alone to keep the first occurrence and discard later ones, so the two operations do not commute). All functions make some postcondition true, so Make can certainly go in order to shorten a too-long name. Functions don't generally need to be named for the types they operate on, and if they are then it should be in a consistent format. So Sequence can probably be removed from your proposed name too, or if it can't then I'd probably call it Sequence_ReverseAndDedupe.
Not create this function at all, make sure that callers can either do Reverse(Dedupe(x)) or Dedupe(Reverse(x)), depending which they actually want. It's no more code for them to write, so only an issue of whether there's some cunning optimization that only applies when you do both at once. Avoiding an intermediate copy might qualify there, but the general point is that if you can't name your function concisely, make sure there's a good reason why it's doing so many different things.
Call it ReversedAndDeduped if it returns a copy of the original sequence - this is a trick I picked up from Python, where l.sort() sorts the list l in place, and sorted(l) doesn't modify a list l at all.
Give it a name specific to the domain it's used in, rather than trying to make it so generic. Why am I deduping and reversing this list? There might be some term of art that means a list in that state, or some function which can only be performed on such a list. So I could call it 'Renuberate' (because a reversed, deduped list is known as a list "in Renuberated form", or 'MakeFrobbable' (because Frobbing requires this format).
I'd also comment it (or much better, document it), to explain what type of deduping it guarantees (if any - perhaps the implementation is left free to remove whichever dupes it likes so long as it gets them all).
I wouldn't comment it "extremely well written", although I might comment "highly optimized" to mean "this code is really hard to work with, but goes like the clappers, please don't touch it without running all the performance tests".
I don't think I'd want to go as far as 5-word function names, although I expect I have in the past.

How do you work around the need for apostrophes in certain function names?

When I'm programming, I often find myself writing functions that -should- (to be proper english) contain apostrophes (too bad C started everyone thinking that an apostrophe was an appropriate delimiter). For example: get_user's_group() -> get_users_group() . What do you guys do with that forced-bad-english ambiguous english? Just ignore the apostrophe? Create a different phrasing?

In that case, I would do get_group_for_user().
So, yes, I would "create a different phrasing" :)
Either that, or user.get_group().

getGroupForUser()
or
getGroupByUser()

My original answer of Ignore it, move on! is incomplete. You should ignore the fact you can't use ' in your method/function names. But you should continue to look at the naming of them to better explain what they do. I think this is a worthwhile pursuit in programming.
Picking on JavaScript, you could if you wanted to use apostrophes:
const user = {
"get_user's_group": () => console.log("Naming things! Am I right?!")
}
user["get_user's_group"]()
But don't do that 😬
Taking it further, you could if you wanted to, use a transpiler to take your grammatically correct name and transform it into something you never see.
Again with JavaScript as an example, maybe you could write a babel transform.
But don't do that 😛
As others have said, if there is context available from an object, that's a nice option:
user.get_group()
Failing that, the context of the surrounding code should be enough to make this your choice:
get_users_group()

How about getGroupByUser?

Either get_user_ApostropheShouldBeHereButLanguageWillNotLetMe_s_group or just ignore it because it really doesn't matter.

I ignore the apostraphe getGroupyUser and group_from_user are both perfectly understandable. Worrying about having correct grammer in your function names is a waste of time and distracts from the correct goal of having clear and understandable user names.

the point of proper english in function naming is a bit extreme ...
i mean why is the apostrophe bothering you but the _ instead of a space is not ?

Depending on the programming language you may be able to use Unicode variable names, this SO thread lists a few.
With Unicode identifiers you could use one of the unicode apostrophes to give the proper english language formatting to your variable name. Though this only speculative. And it would be hard to maintain. Actually, now that I think about it, it sounds downright evil.

Two points: First, don't use a name that would otherwise require an apostrophe if you can avoid it. Second, you are right in being concerned about ambiguity. For example, you could have:
getUsersGroup: gets the group of a list of users. If you are using an object-oriented language, this could have more information than just a group ID string. You could also have something like createUsersGroup, which would create a group object from a list of users passed in.
getGroupOfUser: takes in some sort of user object; returns the name of the group of the user
getGroupByUserId: takes in the user's name or a unique ID associated with that user; returns the name of the group of the user
The best way to delineate the difference between all of these is to just use standard method comments that explain the method names. This would depend on what language you are working with and what style of method comments your organization conventionally uses.

Normally I just drop the apostrophe, but do back-ticks work? (get_user`s_group)

getGroupOfUser? getUserGroup?
It's a programming language, not literature...
It would be getBackgroundColour in proper English (rather than getBackgroundColor)

Personally I'd write get_user_group() rather than get_group_for_user() since it feels like it reads better to me. Of course, I use a programming language where apostrophes are allowed in names:
proc get_user's_group {id} {#...}
Although, some of the more prolific non-English-native European users use it as a word separator:
proc user'group {id} {#...}
to each his own I guess..

What is your system for avoiding keyword naming clashes?

Typically languages have keywords that you are unable to use directly with the exact same spelling and case for naming things (variables,functions,classes ...) in your program. Yet sometimes a keyword is the only natural choice for naming something. What is your system for avoiding/getting around this clash in your chosen technology?

I just avoid the name, usually. Either find a different name or change it slightly - e.g. clazz instead of class in C# or Java. In C# you can use the # prefix, but it's horrible:
int #int = 5; // Ick!

There is nothing intrinsically all-encompassing about a keyword, in that it should stop you from being able to name your variables. Since all names are just generalized instances of some type to one degree or another, you can always go up or down in the abstraction to find another useful name.
For example, if your writing a system that tracks students and you want an object to represent their study in a specific field, i.e. they've taken a "class" in something, if you can't use the term directly, or the plural "classes", or an alternative like "studies", you might find a more "instanced" variation: studentClass, currentClass, etc. or a higher perspective: "courses", "courseClass" or a specfic type attribute: dailyClass, nightClass, etc.
Lots of options, you should just prefer the simplest and most obvious one, that's all.
I always like to listen to the users talk, because the scope of their language helps define the scope of the problem, often if you listen long enough you'll find they have many multiple terms for the same underlying things (with only subtle differences). They usually have the answer ...
Paul.

My system is don't use keywords period!
If I have a function/variable/class and it only seems logical to name it with a keyword, I'll use a descriptive word in front of the keyword.
(adjectiveNoun) format. ie: personName instead of Name where "Name" is a keyword.

I just use a more descriptive name. For instance, 'id' becomes identifier, 'string' becomes 'descriptionString,' and so on.

In Python I usually use proper namespacing on my modules to avoid name clashes.
import re
re.compile()
instead of:
from re import *
compile()
Sometimes, when I can't avoid keyword name clashes I simply drop the last letter off the name of my variable.
for fil in files:
pass

As stated before either change class to clazz in Java/C#, or use some underscore as a prefix, for example
int _int = 0;

There should be no reason to use keywords as variable names. Either use a more detailed word or use a thesaraus. Capitalizing certain letters of the word to make it not exactly like the keyword is not going to help much to someone inheriting your code later.

Happy those with a language without ANY keywords...
But joke apart, I think in the seldom situations where "Yet sometimes a keyword is the only natural choice for naming something." you can get along by prefixing it with "my", "do", "_" or similar.
I honestly can't really think of many such instances where the keyword alone makes a good name ("int", "for" and "if" are definitely bad anyway). The only few in the C-language family which might make sense are "continue" (make it "doContinue"), "break" (how about "breakWhenEOFIsreached" or similar ?) and the already mentioned "class" (how about "classOfThingy" ?).
In other words: make the names more reasonable.
And always remember: code is WRITTEN only once, but usualy READ very often.

Typically I follow Hungarian Notation. So if, for whatever reason, I wanted to use 'End' as a variable of type integer I would declare it as 'iEnd'. A string would be 'strEnd', etc. This usually gives me some room as far as variables go.
If I'm working on a particular personal project that other people will only ever look at to see what I did, for example, when making an add-on to a game using the UnrealEngine, I might use my initials somewhere in the name. 'DS_iEnd' perhaps.

I write my own [vim] syntax highlighters for each language, and I give all keywords an obvious colour so that I notice them when I'm coding. Languages like PHP and Perl use $ for variables, making it a non-issue.

Developing in Ruby on Rails I sometime look up this list of reserved words.

In 15 years of programming, I've rarely had this problem.
One place I can immediately think of, is perhaps a css class, and in that case, I'd use a more descriptive name. So instead of 'class', I might use 'targetClass' or something similar.

In python the generally accepted method is to append an '_'
class -> class_
or -> or_
and -> and_
you can see this exemplified in the operator module.

I switched to a language which doesn't restrict identifier names at all.

First of all, most code conventions prevent such a thing from happening.
If not, I usually add a descriptive prose prefix or suffix:
the_class or theClass infix_or (prefix_or(class_param, in_class) , a_class) or_postfix
A practice, that is usually in keeping with every code style advice you can find ("long names don't kill", "Longer variable names don't take up more space in memory, I promise.")
Generally, if you think the keyword is the best description, a slightly worse one would be better.
Note that, by the very premise of your question you introduce ambiguity, which is bad for the reader, be it a compiler or human. Even if it is a custom to use class, clazz or klass and even if that custom is not so custom that it is a custom: it takes a word word, precisely descriptive as word may be, and distorts it, effectively shooting w0rd's precision in the "wrd". Somebody used to another w_Rd convention or language might have a few harsh wordz for your wolds.
Most of us have more to say about things than "Flower", "House" or "Car", so there's usually more to say about typeNames, decoratees, class_params, BaseClasses and typeReferences.
This is where my personal code obfuscation tolerance ends:
Never(!!!) rely on scoping or arcane syntax rules to prevent name clashes with "key words". (Don't know any compiler that would allow that, but, these days, you never know...).
Try that and someone will w**d you in the wörd so __rd, Word will look like TeX to you!

My system in Java is to capitalize the second letter of the word, so for example:
int dEfault;
boolean tRansient;
Class cLass;

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008

Can the compromise library handle indefinite pronouns? - nlp-compromise

I need to eliminate pronouns in a script. I'm wondering if there is a built in identifier for indefinite pronouns?

you can find and remove pronouns with this: nlp(myScript).match('#Pronoun').remove() but if you are looking for a specific list of words (like indefinite pronouns) you can do something like this: nlp(myScript).match('(anybody|anyone|everything)').remove()

Related

Search for id="* *" in Eclipse

How to compile a complete list of MySQL "Words"

should I write more descriptive function names or add comments?

How do you work around the need for apostrophes in certain function names?

What is your system for avoiding keyword naming clashes?

Categories

Resources