Decompose text into multi-class labels - deep learning

I am looking for a machine learning algorithm that serves as a string decoder (multi-label classifier).
For example, decompose an email address xyz#qwe.abc.com into different classes, with or without probabilities:
xyz:owner
#:special_character
qwe:sub-domain
abc:domain
or 231115
23:hour
11:minute
15:seconds
My case is more complex, but the above should indicate where to look.
For example, I might have
74als353d
74:family
als:ftype
353d:id
and
73das
7:family
as:ftype
3d:id
I came across the deep learning algorithm seq2seq. Would it be a good starting point?
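Seq2seq could work, but for decompositions like these a character-level sequence tagger is a simpler starting point: label every character with a class, then read the segments off the per-character predictions. Below is a minimal sketch in Keras (my own illustration; the class names, sizes, and the two training strings are assumptions, not a tested solution):
import numpy as np
from tensorflow.keras import layers, models

MAX_LEN = 20           # pad/truncate every string to this length
NUM_CHARS = 128        # ASCII vocabulary
CLASSES = ['family', 'ftype', 'id', 'pad']  # one label per character

def encode(s):
    # map each character to its ASCII code, zero-padded to MAX_LEN
    ids = [min(ord(c), NUM_CHARS - 1) for c in s[:MAX_LEN]]
    return np.array(ids + [0] * (MAX_LEN - len(ids)))

model = models.Sequential([
    layers.Embedding(NUM_CHARS, 32),
    layers.Bidirectional(layers.LSTM(64, return_sequences=True)),
    layers.Dense(len(CLASSES), activation='softmax'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

# '74als353d' -> 74:family, als:ftype, 353d:id; '73das' -> 7:family, 3d:id, as:ftype
X = np.stack([encode('74als353d'), encode('73das')])
y = np.array([[0, 0, 1, 1, 1, 2, 2, 2, 2] + [3] * 11,
              [0, 2, 2, 1, 1] + [3] * 15])
model.fit(X, y, epochs=10, verbose=0)
probs = model.predict(X)  # shape (2, MAX_LEN, 4): per-character class probabilities
With enough labeled examples, the argmax over the last axis gives the class of each character, and the probabilities you asked for come for free from the softmax.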

Related

How to visualize 3d joints of a SMPL model based on pose params

I am trying to use demo.py in nkolot/GraphCMR on GitHub. I am interested in obtaining joints from the inferred SMPL image and visualizing them similarly to what is described in the README of gulvarol/smplpytorch on GitHub.
I also posted the issue here: https://github.com/nkolot/GraphCMR/issues/36.
What I tried that didn't work.
I changed https://github.com/nkolot/GraphCMR/blob/4e57dca4e9da305df99383ea6312e2b3de78c321/demo.py#L118 to
pred_vertices, pred_vertices_smpl, pred_camera, smpl_pose, smpl_shape = model(...)
to get smpl_pose (of shape torch.Size([1, 24, 3, 3])). Then I flattened it with smpl_pose.cpu().data.numpy()[:, :, :, -1].flatten('C').reshape(1, -1) and used the resulting (1, 72) pose params as the pose_params input of the smplpytorch demo.
The resulting visualization doesn't look correct to me. Is this the right approach? Perhaps there is an easier way to do what I am doing.
The problem is that smpl_pose (of shape torch.Size([1, 24, 3, 3])) holds the SMPL pose parameters expressed as rotation matrices. You need to transform each rotation matrix into its axis-angle representation to obtain the (72, 1) vector; you can use the Rodrigues formula to do it, as described in the paper:
SMPL: A Skinned Multi-Person Linear Model
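For illustration, here is a minimal sketch of that conversion using OpenCV's cv2.Rodrigues (an assumption on my part; this snippet is not from GraphCMR). In a real run, the identity matrices below would be replaced by smpl_pose.cpu().numpy()[0]:
import numpy as np
import cv2

# stand-in for the (1, 24, 3, 3) rotation matrices from the question;
# 24 identity rotations map to an all-zero axis-angle vector
rotmats = np.tile(np.eye(3), (24, 1, 1))

# cv2.Rodrigues turns a 3x3 rotation matrix into a (3, 1) axis-angle vector
axis_angles = np.concatenate([cv2.Rodrigues(R)[0].ravel() for R in rotmats])
pose_params = axis_angles.reshape(1, 72)  # the (1, 72) input smplpytorch expects
print(pose_params.shape)  # (1, 72)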

How can I use "Interpolated Absolute Discounting" for a bigram model in language modeling?

I want to compare two smoothing methods for a bigram model:
Add-one smoothing
Interpolated Absolute Discounting
For the first method, I found some code.
def calculate_bigram_probabilty(self, previous_word, word):
    bigram_word_probability_numerator = self.bigram_frequencies.get((previous_word, word), 0)
    bigram_word_probability_denominator = self.unigram_frequencies.get(previous_word, 0)
    if self.smoothing:
        bigram_word_probability_numerator += 1
        bigram_word_probability_denominator += self.unique__bigram_words
    return 0.0 if bigram_word_probability_numerator == 0 or bigram_word_probability_denominator == 0 else float(
        bigram_word_probability_numerator) / float(bigram_word_probability_denominator)
However, I found nothing for the second method except some references to 'KneserNeyProbDist'. However, that is for trigrams!
How can I change my code above to calculate it? The parameters of this method must be estimated from a development set.
In this answer I just clear up a few things I found out about your problem, but I can't provide a coded solution.
With KneserNeyProbDist you seem to refer to a Python implementation of that problem: https://kite.com/python/docs/nltk.probability.KneserNeyProbDist
There is an article about Kneser–Ney smoothing on Wikipedia: https://en.wikipedia.org/wiki/Kneser%E2%80%93Ney_smoothing
The article above links to this tutorial: https://nlp.stanford.edu/~wcmac/papers/20050421-smoothing-tutorial.pdf, but it has a small fault on the most important page, 29. The clear text is this:
Modified Kneser-Ney
Chen and Goodman introduced modified Kneser-Ney:
- Interpolation is used instead of backoff.
- A separate discount is used for one- and two-counts instead of a single discount for all counts.
- Discounts are estimated on held-out data instead of using a formula based on training counts.
Experiments show all three modifications improve performance. Modified Kneser-Ney consistently had best performance.
Regrettably, the modified version is not explained in that document.
The original paper by Chen & Goodman luckily is available; Modified Kneser–Ney smoothing is explained on page 370 of this document: http://u.cs.biu.ac.il/~yogo/courses/mt2013/papers/chen-goodman-99.pdf.
The most important text and formula are copied here as a screenshot:
[screenshot from page 370 of Chen & Goodman (1999): the Modified Kneser–Ney discount formula]
So Modified Kneser–Ney smoothing is now known and seems to be the best solution; translating the description beside the formula into running code is the one step still to do.
It might be helpful that, below the text shown in the screenshot, the original linked document contains some further explanation that can help in understanding the raw description.
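To address the original question directly, here is a minimal sketch of interpolated absolute discounting for a bigram model (my own illustration, not from the linked references; the discount d = 0.75 is a common default, and as the question notes it should really be tuned on a development set):
from collections import Counter

def train_abs_discount(tokens, d=0.75):
    # P(w | v) = max(c(v, w) - d, 0) / c(v) + lambda(v) * P_uni(w)
    # with lambda(v) = d * |{w : c(v, w) > 0}| / c(v)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    followers = Counter(v for (v, _w) in bigrams)  # distinct continuations of v
    total = sum(unigrams.values())

    def prob(v, w):
        c_v = unigrams[v]
        if c_v == 0:
            return unigrams[w] / total  # unseen history: back off to the unigram
        discounted = max(bigrams[(v, w)] - d, 0) / c_v
        lam = d * followers[v] / c_v   # mass freed up by discounting
        return discounted + lam * unigrams[w] / total

    return prob

tokens = "the cat sat on the mat the cat ran".split()
p = train_abs_discount(tokens, d=0.75)
print(p("the", "cat"))  # discounted bigram estimate plus interpolated unigram mass
The interpolation weight lambda(v) is exactly the probability mass removed by the discount, so the distribution still sums to one over the vocabulary.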

Python find_all_next in BeautifulSoup can't find a string

I'm trying to get a username from an Instagram page, and I need to use part of the data I get after data = soup.find_all('script')[3].
It looks like this:
(script type="text/javascript">window._sharedData = {"config":{"csrf_token":"hIuZDxW17bTXz5EDLY25ftqivOOrLEeZ","viewer":null,"viewerId":null},"supports_es6":false,"country_code":"RU","language_code":"en","locale":"en_US","entry_data":{"PostPage":[{"graphql":{"shortcode_media":{"__typename":"GraphImage","id":"1968747493659350883","shortcode":"BtSZWokAZdj","dimensions":{"height":640,"width":640},"gating_info":null,"media_preview":"ACoq5miitSxxIGTHPXPGcd8ZFAGXRXSSWypFsAAZ/lzjpn/Csm5sjAu7Ib8MUAUaKU0lABVq0lMUqsPUA/Q8VVpynBB9CKAOtuOFB9CD+uP5Gq19HuiOPTP5Ul1exhdgy7kdF7fU/wCGatJiRPqv5ZFIZybnP4UynOpUlT1HFNpiClDFeRSUUATLcSJ904+lPF5MvR2H41WooAc7lzuY5J702iigD//Z","display_url":"https://instagram.fhel3-1.fna.fbcdn.net/vp/68311f4b09669fd75609e9fcabbf1ae0/5D0517DE/t51.2885-15/e35/49907137_294327238101721_6745007497573009307_n.jpg?_nc_ht=instagram.fhel3-1.fna.fbcdn.net","display_resources":[{"src":"https://instagram.fhel3-1.fna.fbcdn.net/vp/68311f4b09669fd75609e9fcabbf1ae0/5D0517DE/t51.2885-15/e35/49907137_294327238101721_6745007497573009307_n.jpg?_nc_ht=instagram.fhel3-1.fna.fbcdn.net","config_width":640,"config_height":640},{"src":"https://instagram.fhel3-1.fna.fbcdn.net/vp/68311f4b09669fd75609e9fcabbf1ae0/5D0517DE/t51.2885-15/e35/49907137_294327238101721_6745007497573009307_n.jpg?_nc_ht=instagram.fhel3-1.fna.fbcdn.net","config_width":750,"config_height":750},{"src":"https://instagram.fhel3-1.fna.fbcdn.net/vp/68311f4b09669fd75609e9fcabbf1ae0/5D0517DE/t51.2885-15/e35/49907137_294327238101721_6745007497573009307_n.jpg?_nc_ht=instagram.fhel3-1.fna.fbcdn.net","config_width":1080,"config_height":1080}],"accessibility_caption":"Image may contain: one or more people and closeup","is_video":false,"should_log_client_event":false,"tracking_token":"eyJ2ZXJzaW9uIjo1LCJwYXlsb2FkIjp7ImlzX2FuYWx5dGljc190cmFja2VkIjp0cnVlLCJ1dWlkIjoiN2Q1Yjg2NmY5OGIwNDVhNWIxMmRhNjEwZTA3NDY1MmYxOTY4NzQ3NDkzNjU5MzUwODgzIn0sInNpZ25hdHVyZSI6IiJ9","edge_media_to_tagged_user":{"edges":[]},"edge_media_to_caption":{"edges":[{"node":{"text":"\u2022\nScars show your story. \nYour pain. \nYour hate.\nYour sadness and despair. \nThey make you who you are, and one of a kind with every different mark. \nSome stay, some go.\nSome brighter, some lighter.\nSome bigger, some smaller.\nSome deeper, some one the surface. \nBut they are really all the same, you see?\nThey are all scars, just telling different points of our life, our story. \nOur souvenir throughout our whole life, that shows us how much we've grown. \nHow much we have overcome. How strong we've become.\nHow brave and courageous we've become from the hardest and darkest times of our life. 
\u2022\n\u2022\n\u2022\n\u2022\n#poem #cuts #selfharm #tatoo #dark #pain #sad #lonely #anxiety #depressed"}}]},"caption_is_edited":true,"has_ranked_comments":false,"edge_media_to_comment":{"count":1,"page_info":{"has_next_page":false,"end_cursor":null},"edges":[]},"comments_disabled":false,"taken_at_timestamp":1548913011,"edge_media_preview_like":{"count":17,"edges":[]},"edge_media_to_sponsor_user":{"edges":[]},"location":null,"viewer_has_liked":false,"viewer_has_saved":false,"viewer_has_saved_to_collection":false,"viewer_in_photo_of_you":false,"viewer_can_reshare":true,"owner":{"id":"10173498181","is_verified":false,"profile_pic_url":"https://instagram.fhel3-1.fna.fbcdn.net/vp/9a17134e8d0a36efec53f1da5cac1f38/5D14BC0F/t51.2885-19/s150x150/47690762_475199173011446_4764198224049209344_n.jpg?_nc_ht=instagram.fhel3-1.fna.fbcdn.net","username":"devils..tea.","blocked_by_viewer":false,"followed_by_viewer":false,"full_name":"depressed\ud83e\udd40","has_blocked_viewer":false,"is_private":false,"is_unpublished":false,"requested_by_viewer":false}......
There is "username" part (at the end of blockquote). I think that it is a string, but I can't catch it. So it's not a string, but what is it? It is a class? Which method I should use to retreive the username "username":"devils..tea.". Thank you in advance, if you can help.
....
req = requests.get(url)
soup = BeautifulSoup(req.text, "lxml")
data = soup.find_all('script')[3]
username = data.find_all_next(string="username")
print(username)
You could use a regex:
import re
data = '''
... the same script text as quoted in the question ...
'''
r = re.compile(r'username":"(.*)(?=","blocked)')
print(r.findall(data))
Or, for those of us who don't like regex (nudge, nudge #QHarr :D), you can try this:
data = [your quote above]
data_list = data.split(",")
for i in data_list:
    if 'username' in i:
        print(i)
Output:
"username":"devils..tea."

Custom mesh jittering in Mujoco environment in OpenAI gym

I've tried modifying the FetchPickAndPlace-v1 OpenAI Gym environment to replace the cube with a pair of scissors. Everything works perfectly except that my custom mesh jitters a few millimeters in and out of the table every few time steps. I've included a picture mid-jitter below:
As you can see, the scissors are caught midway through the surface of the table. How can I prevent this? All I've done is swap the code for the cube in pick_and_place.xml for the asset for the scissors mesh. Here's the code of interest:
<body name="object0" pos="0.0 0.0 0.0">
<joint name="object0:joint" type="free" damping="0.01"></joint>
<geom size="0.025 0.025 0.025" mesh="tool0:scissors" condim="3" name="object0" material="tool_mat" class="tool0:matte" mass="2"></geom>
<site name="object0" pos="0 0 0" size="0.02 0.02 0.02" rgba="1 0 0 1" type="sphere"></site>
</body>
I've tried playing around with the position and geometry coordinates, but to no avail. Any tips? Replacing mesh="tool0:scissors" with type="box" gets rid of the problem entirely, but then I'm back to square one.
As suggested by Emo Todorov in the MuJoCo forums:
Replace the ground box with a plane and use MuJoCo 2.0. The latest version of the collision detector generates multiple contacts between a mesh and a plane, which results in more stable simulation. But this only works for plane-mesh, not for box-mesh.
The better solution is to break the mesh into several meshes and include them as multiple geoms in the same body, as sketched below. Then MuJoCo will construct the convex hull of each sub-mesh, resulting in multiple contact points (even without the special plane mechanism mentioned above), and it will also be a better approximation of the actual object geometry.
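For illustration, a hedged MJCF sketch of that second suggestion; the sub-mesh file names and masses are hypothetical, and the scissors mesh would first be split into parts in a tool such as Blender:
<asset>
    <!-- hypothetical sub-meshes of the scissors, one asset per part -->
    <mesh name="tool0:blade_left" file="scissors_blade_left.stl"/>
    <mesh name="tool0:blade_right" file="scissors_blade_right.stl"/>
    <mesh name="tool0:handle" file="scissors_handle.stl"/>
</asset>
<body name="object0" pos="0.0 0.0 0.0">
    <joint name="object0:joint" type="free" damping="0.01"></joint>
    <!-- one geom per sub-mesh: MuJoCo builds a convex hull for each part -->
    <geom type="mesh" mesh="tool0:blade_left" condim="3" material="tool_mat" mass="0.7"></geom>
    <geom type="mesh" mesh="tool0:blade_right" condim="3" material="tool_mat" mass="0.7"></geom>
    <geom type="mesh" mesh="tool0:handle" condim="3" material="tool_mat" mass="0.6"></geom>
    <site name="object0" pos="0 0 0" size="0.02 0.02 0.02" rgba="1 0 0 1" type="sphere"></site>
</body>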

Get the most probable color from a words set

Are there any existing libraries or methods that let you figure out the most probable color for a set of words? For example, for cucumber, apple, grass, it gives me green. Has anyone worked in that direction before?
If I had to do it, I would try searching for images based on the words, using Google Images or similar, and recognizing the most common color of the top n results.
That sounds like a pretty reasonable NLP problem, and one that's very easy to handle via map-reduce.
Identify a list of words and phrases that you call colors: ['blue', 'green', 'red', ...].
Go over a large corpus of sentences, and for each sentence that mentions a particular color, note down (word, color_name) for every other word in that sentence. (Map step)
Then, for each word seen in your corpus, aggregate all the colors seen with it, to get something like {'cucumber': {'green': 300, 'yellow': 34, 'blue': 2}, 'tomato': {'red': 900, 'green': 430}, ...} (Reduce step)
Provided you use a large enough corpus (something like Wikipedia), and you figure out how to prune very small counts and rare words, you should be able to build a pretty comprehensive and robust dictionary mapping millions of items to their colors.
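A single-process sketch of this idea (my own illustration; the color list and toy corpus are assumptions):
from collections import defaultdict, Counter

COLORS = {'blue', 'green', 'red', 'yellow', 'purple', 'orange'}

def word_color_counts(sentences):
    # map step: emit (word, color) for every color mentioned in a sentence;
    # reduce step: aggregate the counts per word
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        colors_here = [w for w in words if w in COLORS]
        for color in colors_here:
            for w in words:
                if w not in COLORS:
                    counts[w][color] += 1
    return counts

corpus = ["the cucumber is green", "green grass grows", "a ripe red tomato"]
counts = word_color_counts(corpus)
print(counts['cucumber'].most_common(1))  # [('green', 1)]
On a real corpus the map step would be sharded across machines, but the counting logic stays the same.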
Another way is to do a Google text search for each combination of a color and the word in question, and take the combination with the highest number of results. Here's a quick Python script for that:
import urllib.parse
import urllib.request
import json

def google_count(q):
    # ask the (legacy) Google AJAX search API for an estimated result count
    query = urllib.parse.urlencode({'q': q})
    url = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&%s' % query
    search_response = urllib.request.urlopen(url)
    search_results = search_response.read()
    results = json.loads(search_results)
    data = results['responseData']
    return int(data['cursor']['estimatedResultCount'])

colors = ['yellow', 'orange', 'red', 'purple', 'blue', 'green']
# get a list of google search counts
res = [google_count('"%s grass"' % c) for c in colors]
# pair the results with their corresponding colors
res2 = list(zip(res, colors))
# get the color with the highest score
print("%s is %s" % ('grass', sorted(res2)[-1][1]))
This will print:
grass is green
Daniel's and Xi.lin's answers are very good ideas. Along the same axis, we could combine both with an approach similar to Xi.lin's but simpler: query Google Images with the word whose associated color you want, plus a "Color" filter (see the lower left bar), and see which color yields more results.
I would suggest using a tightly defined set of sources if possible such as Wikipedia and Wordnet.
Here, for example, is Wordnet for "panda":
S: (n) giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca
(large black-and-white herbivorous mammal of bamboo forests of China and Tibet;
in some classifications considered a member of the bear family or of a separate
family Ailuropodidae)
S: (n) lesser panda, red panda, panda, bear cat, cat bear,
Ailurus fulgens (reddish-brown Old World raccoon-like carnivore;
in some classifications considered unrelated to the giant pandas)
Because of the concise, carefully constructed language it is highly likely that any colour words will be important. Here you can see that pandas are both black-and-white and reddish-brown.
If you identify subsections of Wikipedia (e.g. "Botanical Description") this will help to increase the relevance of your results. Also the first image in Wikipedia is very likely to be the best "definitive" one.
But, as with all statistical methods, you will get false positives (and negatives, though these are probably less of a problem).