python find_all_next in beautifulsoup can't find a string - html

I'm trying to get a username from an Instagram page, and I need to use part of the data that I get from "data = soup.find_all('script') [3]".
It looks like this:
(script type="text/javascript">window._sharedData = {"config":{"csrf_token":"hIuZDxW17bTXz5EDLY25ftqivOOrLEeZ","viewer":null,"viewerId":null},"supports_es6":false,"country_code":"RU","language_code":"en","locale":"en_US","entry_data":{"PostPage":[{"graphql":{"shortcode_media":{"__typename":"GraphImage","id":"1968747493659350883","shortcode":"BtSZWokAZdj","dimensions":{"height":640,"width":640},"gating_info":null,"media_preview":"ACoq5miitSxxIGTHPXPGcd8ZFAGXRXSSWypFsAAZ/lzjpn/Csm5sjAu7Ib8MUAUaKU0lABVq0lMUqsPUA/Q8VVpynBB9CKAOtuOFB9CD+uP5Gq19HuiOPTP5Ul1exhdgy7kdF7fU/wCGatJiRPqv5ZFIZybnP4UynOpUlT1HFNpiClDFeRSUUATLcSJ904+lPF5MvR2H41WooAc7lzuY5J702iigD//Z","display_url":"https://instagram.fhel3-1.fna.fbcdn.net/vp/68311f4b09669fd75609e9fcabbf1ae0/5D0517DE/t51.2885-15/e35/49907137_294327238101721_6745007497573009307_n.jpg?_nc_ht=instagram.fhel3-1.fna.fbcdn.net","display_resources":[{"src":"https://instagram.fhel3-1.fna.fbcdn.net/vp/68311f4b09669fd75609e9fcabbf1ae0/5D0517DE/t51.2885-15/e35/49907137_294327238101721_6745007497573009307_n.jpg?_nc_ht=instagram.fhel3-1.fna.fbcdn.net","config_width":640,"config_height":640},{"src":"https://instagram.fhel3-1.fna.fbcdn.net/vp/68311f4b09669fd75609e9fcabbf1ae0/5D0517DE/t51.2885-15/e35/49907137_294327238101721_6745007497573009307_n.jpg?_nc_ht=instagram.fhel3-1.fna.fbcdn.net","config_width":750,"config_height":750},{"src":"https://instagram.fhel3-1.fna.fbcdn.net/vp/68311f4b09669fd75609e9fcabbf1ae0/5D0517DE/t51.2885-15/e35/49907137_294327238101721_6745007497573009307_n.jpg?_nc_ht=instagram.fhel3-1.fna.fbcdn.net","config_width":1080,"config_height":1080}],"accessibility_caption":"Image may contain: one or more people and 
closeup","is_video":false,"should_log_client_event":false,"tracking_token":"eyJ2ZXJzaW9uIjo1LCJwYXlsb2FkIjp7ImlzX2FuYWx5dGljc190cmFja2VkIjp0cnVlLCJ1dWlkIjoiN2Q1Yjg2NmY5OGIwNDVhNWIxMmRhNjEwZTA3NDY1MmYxOTY4NzQ3NDkzNjU5MzUwODgzIn0sInNpZ25hdHVyZSI6IiJ9","edge_media_to_tagged_user":{"edges":[]},"edge_media_to_caption":{"edges":[{"node":{"text":"\u2022\nScars show your story. \nYour pain. \nYour hate.\nYour sadness and despair. \nThey make you who you are, and one of a kind with every different mark. \nSome stay, some go.\nSome brighter, some lighter.\nSome bigger, some smaller.\nSome deeper, some one the surface. \nBut they are really all the same, you see?\nThey are all scars, just telling different points of our life, our story. \nOur souvenir throughout our whole life, that shows us how much we've grown. \nHow much we have overcome. How strong we've become.\nHow brave and courageous we've become from the hardest and darkest times of our life. \u2022\n\u2022\n\u2022\n\u2022\n#poem #cuts #selfharm #tatoo #dark #pain #sad #lonely #anxiety 
#depressed"}}]},"caption_is_edited":true,"has_ranked_comments":false,"edge_media_to_comment":{"count":1,"page_info":{"has_next_page":false,"end_cursor":null},"edges":[]},"comments_disabled":false,"taken_at_timestamp":1548913011,"edge_media_preview_like":{"count":17,"edges":[]},"edge_media_to_sponsor_user":{"edges":[]},"location":null,"viewer_has_liked":false,"viewer_has_saved":false,"viewer_has_saved_to_collection":false,"viewer_in_photo_of_you":false,"viewer_can_reshare":true,"owner":{"id":"10173498181","is_verified":false,"profile_pic_url":"https://instagram.fhel3-1.fna.fbcdn.net/vp/9a17134e8d0a36efec53f1da5cac1f38/5D14BC0F/t51.2885-19/s150x150/47690762_475199173011446_4764198224049209344_n.jpg?_nc_ht=instagram.fhel3-1.fna.fbcdn.net","username":"devils..tea.","blocked_by_viewer":false,"followed_by_viewer":false,"full_name":"depressed\ud83e\udd40","has_blocked_viewer":false,"is_private":false,"is_unpublished":false,"requested_by_viewer":false}......
There is a "username" part (near the end of the quoted data). I thought it was a string, but I can't match it. If it's not a string, what is it? Is it a class? Which method should I use to retrieve the username from "username":"devils..tea."? Thank you in advance for any help.
....
req = requests.get(url)
soup = BeautifulSoup(req.text, "lxml")
data = soup.find_all('script') [3]
username = data.find_all_next(string="username")
print (username)

You could use regex
import re
data = '''
(script type="text/javascript">window._sharedData = {"config":{"csrf_token":"hIuZDxW17bTXz5EDLY25ftqivOOrLEeZ","viewer":null,"viewerId":null},"supports_es6":false,"country_code":"RU","language_code":"en","locale":"en_US","entry_data":{"PostPage":[{"graphql":{"shortcode_media":{"__typename":"GraphImage","id":"1968747493659350883","shortcode":"BtSZWokAZdj","dimensions":{"height":640,"width":640},"gating_info":null,"media_preview":"ACoq5miitSxxIGTHPXPGcd8ZFAGXRXSSWypFsAAZ/lzjpn/Csm5sjAu7Ib8MUAUaKU0lABVq0lMUqsPUA/Q8VVpynBB9CKAOtuOFB9CD+uP5Gq19HuiOPTP5Ul1exhdgy7kdF7fU/wCGatJiRPqv5ZFIZybnP4UynOpUlT1HFNpiClDFeRSUUATLcSJ904+lPF5MvR2H41WooAc7lzuY5J702iigD//Z","display_url":"https://instagram.fhel3-1.fna.fbcdn.net/vp/68311f4b09669fd75609e9fcabbf1ae0/5D0517DE/t51.2885-15/e35/49907137_294327238101721_6745007497573009307_n.jpg?_nc_ht=instagram.fhel3-1.fna.fbcdn.net","display_resources":[{"src":"https://instagram.fhel3-1.fna.fbcdn.net/vp/68311f4b09669fd75609e9fcabbf1ae0/5D0517DE/t51.2885-15/e35/49907137_294327238101721_6745007497573009307_n.jpg?_nc_ht=instagram.fhel3-1.fna.fbcdn.net","config_width":640,"config_height":640},{"src":"https://instagram.fhel3-1.fna.fbcdn.net/vp/68311f4b09669fd75609e9fcabbf1ae0/5D0517DE/t51.2885-15/e35/49907137_294327238101721_6745007497573009307_n.jpg?_nc_ht=instagram.fhel3-1.fna.fbcdn.net","config_width":750,"config_height":750},{"src":"https://instagram.fhel3-1.fna.fbcdn.net/vp/68311f4b09669fd75609e9fcabbf1ae0/5D0517DE/t51.2885-15/e35/49907137_294327238101721_6745007497573009307_n.jpg?_nc_ht=instagram.fhel3-1.fna.fbcdn.net","config_width":1080,"config_height":1080}],"accessibility_caption":"Image may contain: one or more people and 
closeup","is_video":false,"should_log_client_event":false,"tracking_token":"eyJ2ZXJzaW9uIjo1LCJwYXlsb2FkIjp7ImlzX2FuYWx5dGljc190cmFja2VkIjp0cnVlLCJ1dWlkIjoiN2Q1Yjg2NmY5OGIwNDVhNWIxMmRhNjEwZTA3NDY1MmYxOTY4NzQ3NDkzNjU5MzUwODgzIn0sInNpZ25hdHVyZSI6IiJ9","edge_media_to_tagged_user":{"edges":[]},"edge_media_to_caption":{"edges":[{"node":{"text":"\u2022\nScars show your story. \nYour pain. \nYour hate.\nYour sadness and despair. \nThey make you who you are, and one of a kind with every different mark. \nSome stay, some go.\nSome brighter, some lighter.\nSome bigger, some smaller.\nSome deeper, some one the surface. \nBut they are really all the same, you see?\nThey are all scars, just telling different points of our life, our story. \nOur souvenir throughout our whole life, that shows us how much we've grown. \nHow much we have overcome. How strong we've become.\nHow brave and courageous we've become from the hardest and darkest times of our life. \u2022\n\u2022\n\u2022\n\u2022\n#poem #cuts #selfharm #tatoo #dark #pain #sad #lonely #anxiety 
#depressed"}}]},"caption_is_edited":true,"has_ranked_comments":false,"edge_media_to_comment":{"count":1,"page_info":{"has_next_page":false,"end_cursor":null},"edges":[]},"comments_disabled":false,"taken_at_timestamp":1548913011,"edge_media_preview_like":{"count":17,"edges":[]},"edge_media_to_sponsor_user":{"edges":[]},"location":null,"viewer_has_liked":false,"viewer_has_saved":false,"viewer_has_saved_to_collection":false,"viewer_in_photo_of_you":false,"viewer_can_reshare":true,"owner":{"id":"10173498181","is_verified":false,"profile_pic_url":"https://instagram.fhel3-1.fna.fbcdn.net/vp/9a17134e8d0a36efec53f1da5cac1f38/5D14BC0F/t51.2885-19/s150x150/47690762_475199173011446_4764198224049209344_n.jpg?_nc_ht=instagram.fhel3-1.fna.fbcdn.net","username":"devils..tea.","blocked_by_viewer":false,"followed_by_viewer":false,"full_name":"depressed\ud83e\udd40","has_blocked_viewer":false,"is_private":false,"is_unpublished":false,"requested_by_viewer":false}......
'''
# non-greedy match, so the capture stops at the first closing quote
r = re.compile(r'"username":"(.*?)"')
print(r.findall(data))

Or, for those of us who don't like regex (nudge, nudge @QHarr :D), you can try this:
data = [your quote above]
data_list = data.split(",")
for i in data_list:
    if 'username' in i:
        print(i)
Output:
"username":"devils..tea."

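A note on why the original call returns nothing: find_all_next(string="username") only matches a text node whose whole content equals "username", and here the entire script body is one long text node. Since that body is a JavaScript assignment of a JSON object, another option is to parse the JSON and walk down to the owner. The following is a minimal sketch under that assumption; the helper name extract_username and the shortened sample string are made up for illustration, and the key path is simply the one visible in the quoted data, which Instagram may change at any time.

```python
import json

MARKER = "window._sharedData = "

def extract_username(script_text):
    """Parse the JSON object assigned to window._sharedData and return the owner's username."""
    start = script_text.index(MARKER) + len(MARKER)
    # raw_decode parses one JSON value starting at `start` and ignores what follows (e.g. ";")
    shared_data, _ = json.JSONDecoder().raw_decode(script_text, start)
    media = shared_data["entry_data"]["PostPage"][0]["graphql"]["shortcode_media"]
    return media["owner"]["username"]

# Shortened stand-in for the script text quoted in the question (hypothetical sample).
sample = ('window._sharedData = {"entry_data":{"PostPage":[{"graphql":'
          '{"shortcode_media":{"owner":{"username":"devils..tea."}}}}]}};')
print(extract_username(sample))
```

With the question's code, the text to pass in would be the content of the selected script tag, for example data.string.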

How to add mass item to the Mysql in Django

I built a Django website and need to add thousands of items to MySQL, but I'm stuck on how to do it.
As the picture shows, there are rubik, remedi and point fields.
Say I have a sentence, for example: Headache : Pulsatilla(3)
I click the rubik field and choose headache, then click the remedi field and choose pulsatilla, and lastly click the point field and choose 3 points.
That works fine.
But then I have a sentence like this:
ABSORBED, buried in thought : Acon., aloe., am-m., ant-c., arn., bell., bov., calc., cann-i., canth., caps., carl., caust., cham., chin., cic., clem., cocc., con., cupr., cycl., elaps., grat., ham., Hell., ign., ip., lil-t., mang., merc., Mez., mosch., mur-ac., nat-c., nat-m., nat-p., nit-ac., Nux-m., ol-an., onos., op., phel., phos., puls., rheum., sabad., sars., spig., stann., stram., Sulph.
I don't know how to add this easily.
Let me explain the sentence.
ABSORBED, buried in thought = this is the rubik.
Acon., aloe., am-m., ant-c., arn., bell., = these are the remedi (each one separately). A bold remedi (e.g. Acon) = 3 points, an italic remedi (e.g. caps) = 2 points, and all other remedies = 1 point.
Adding these one by one through the Django admin panel would take far too long. How can I add the rubik, remedies and points in bulk? I need some direction.
Thanks for your support.
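One possible direction is to parse each sentence in a script and insert the resulting rows in bulk rather than clicking through the admin. The sketch below covers only the parsing step in plain Python. The function name parse_rubric_line is hypothetical, and it assumes points are written as a trailing "(n)" as in "Pulsatilla(3)"; bold and italic markup does not survive in plain text, so any remedy without an explicit number gets an assumed default of 1 point.

```python
import re

def parse_rubric_line(line, default_point=1):
    """Split 'RUBRIC : rem1, rem2(3), ...' into (rubric, [(remedy, point), ...])."""
    rubric, _, remedies_part = line.partition(" : ")
    entries = []
    for raw in remedies_part.split(","):
        raw = raw.strip()
        if not raw:
            continue
        # An optional trailing "(n)" carries the point, as in "Pulsatilla(3)".
        match = re.fullmatch(r"(.+?)\s*\((\d+)\)\.?", raw)
        if match:
            name, point = match.group(1), int(match.group(2))
        else:
            name, point = raw, default_point  # assumption: unmarked remedies count 1 point
        entries.append((name.rstrip("."), point))
    return rubric.strip(), entries

print(parse_rubric_line("Headache : Pulsatilla(3)"))
print(parse_rubric_line("ABSORBED, buried in thought : Acon., aloe., am-m."))
```

The resulting tuples could then be turned into model instances and saved in one go, for example with Django's bulk_create on whatever model holds the rubik, remedi and point fields, either from a management command or from the Django shell.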

How can I use "Interpolated Absolute Discounting" for a bigram model in language modeling?

I want to compare two smoothing methods for a bigram model:
Add-one smoothing
Interpolated Absolute Discounting
For the first method, I found some code:
def calculate_bigram_probabilty(self, previous_word, word):
    bigram_word_probability_numerator = self.bigram_frequencies.get((previous_word, word), 0)
    bigram_word_probability_denominator = self.unigram_frequencies.get(previous_word, 0)
    if self.smoothing:
        bigram_word_probability_numerator += 1
        bigram_word_probability_denominator += self.unique__bigram_words
    return 0.0 if bigram_word_probability_numerator == 0 or bigram_word_probability_denominator == 0 else float(
        bigram_word_probability_numerator) / float(bigram_word_probability_denominator)
However, I found nothing for the second method except some references to 'KneserNeyProbDist', and that is for trigrams.
How can I change the code above to compute it? The parameters of this method must be estimated from a development set.
In this answer I only clear up a few things I found about your problem; I can't provide a coded solution.
With KneserNeyProbDist you seem to be referring to a Python implementation for this problem: https://kite.com/python/docs/nltk.probability.KneserNeyProbDist
There is a Wikipedia article about Kneser–Ney smoothing: https://en.wikipedia.org/wiki/Kneser%E2%80%93Ney_smoothing
That article links to this tutorial: https://nlp.stanford.edu/~wcmac/papers/20050421-smoothing-tutorial.pdf, but it has a small fault on the most important page, page 29. The plain text reads:
Modified Kneser-Ney
Chen and Goodman introduced modified Kneser-Ney:
Interpolation is used instead of backoff.
Uses a separate discount for one- and two-counts instead of a single discount for all counts.
Estimates discounts on held-out data instead of using a formula based on training counts.
Experiments show all three modifications improve performance.
Modified Kneser-Ney consistently had best performance.
Regrettably, the modified version is not explained in that document.
Luckily, the original paper by Chen & Goodman is available; modified Kneser–Ney smoothing is explained on page 370 of this document: http://u.cs.biu.ac.il/~yogo/courses/mt2013/papers/chen-goodman-99.pdf.
I copy the most important text and formula here as a screenshot:
So modified Kneser–Ney smoothing is now known and seems to be the best solution; translating the description beside the formula into running code is the one step still left to do.
It may help that the original linked document contains further explanation below the text shown in the screenshot, which might make the raw description easier to understand.
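For the method actually asked about, here is a minimal sketch of the usual textbook form of interpolated absolute discounting for bigrams, which is simpler than the modified Kneser–Ney formula discussed above and is not taken from the Chen & Goodman paper: P(w | v) = max(c(v, w) - d, 0) / c(v) + lambda(v) * P(w), with lambda(v) = d * N(v) / c(v), where N(v) is the number of distinct words seen after v. The function names and the default d = 0.75 are assumptions for illustration; d is the parameter that would be estimated on the development set.

```python
from collections import Counter

def train_counts(tokens):
    """Collect the counts needed for interpolated absolute discounting."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    # c(v): how often v occurs as the first word of a bigram
    contexts = Counter(prev for prev, _ in bigrams.elements())
    # N(v): number of distinct words observed after v
    followers = Counter(prev for prev, _ in bigrams)
    return unigrams, bigrams, contexts, followers

def abs_discount_prob(prev, word, unigrams, bigrams, contexts, followers, d=0.75):
    """P(word | prev) with a fixed discount d interpolated with the unigram model."""
    p_unigram = unigrams[word] / sum(unigrams.values())
    c_prev = contexts[prev]
    if c_prev == 0:
        return p_unigram  # unseen context: fall back to the unigram estimate
    discounted = max(bigrams[(prev, word)] - d, 0) / c_prev
    lam = d * followers[prev] / c_prev  # probability mass freed by discounting
    return discounted + lam * p_unigram

tokens = "a b a b a c".split()
counts = train_counts(tokens)
total = sum(abs_discount_prob("a", w, *counts) for w in set(tokens))
print(total)  # the distribution over the vocabulary sums to 1 (up to rounding)
```

One way to estimate d from a development set is to try a handful of values between 0 and 1 and keep the one that gives the highest log-likelihood (lowest perplexity) on the held-out bigrams.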

Is there any easy way to add color to single words in VB2010?

I've been trying to research this a bit, and this may be frowned upon, but I don't want to learn how to do everything in VB since I'm just writing what should be a 5-10 minute program to make something a little easier for myself. So sorry for that in advance.
Anyway, I just want to add colors, or really any formatting, to make things easier to read. I originally used textboxes with the read-only attribute and found you couldn't add good formatting to them. Label was similar, in that you can only give a label one color. RichTextBox was the next idea, and while it works, it seems like quite a bit of work for what I need.
I just want an idea on how to make a single form have a font like "these three words"
It's super easy with BBCode and HTML, and I can't imagine the best way in VB is something that takes around 10 lines for one string of text.
Thanks.
Since you don't want to learn how to do it but have some notion of HTML, I think the best solution for your needs is the third-party user control Html Renderer, which can render HTML/CSS code.
Another approach is to use a WebBrowser control, as mentioned in the comments; however, it will be much slower. You could also use the MSHTML OCX, which according to Microsoft is more focused on document rendering tasks, but it is harder to use than the user control above because you would need to dig through the online documentation for the MSHTML members, which you don't seem to want to do, given that you said a RichTextBox is too much effort for your needs.
I have a RichTextBox1 with some text in it. I use the code below to search through the text and change specific words to a different font/style and color:
Dim TZZ As String
TZZ = RichTextBox1.Text
TZZ = UCase(TZZ)
Dim x As Integer
For x = 1 To Len(TZZ)
    Dim y As Integer = InStr(TZZ, "CHANGES MADE")
    If y > 0 Then
        Dim intLength As Integer = 12
        'select the text
        RichTextBox1.Select(y - 1, intLength)
        RichTextBox1.SelectionFont = New System.Drawing.Font("Tahoma", 10, FontStyle.Bold Or FontStyle.Italic)
        RichTextBox1.SelectionColor = Color.Red
        'mask the match so the next InStr call finds the following occurrence
        Mid(TZZ, y, 12) = "123456789012"
        x = y
    End If
Next x
RichTextBox1.Select(0, 0)

What's stopping these variables from being defined in my code?

I've been designing this game, and I've come across a weird problem. Maybe it's just something simple I've missed, or the way I formatted it. When I run my code I get errors suggesting that none of the variables I've been defining are actually being declared/defined. On top of that, none of the functions seem to run. Should I define text1, button1Name, etc. as global variables, or is there a better way?
##Project 1 v4
##Structure whole program with base functionality
##Begin with an outline then fill in functions and windows from V2, and V3.
import Tkinter
import random
inventory=[]
playCount=0
def startGame():
    ##adds name
    ##button that calls StoryCard
    text1="Welcome to Treasure Quest!! The Game that allows you to pick your own destiny! Treasure quest is a very simple game. The story will display on the screen until it reaches an event. At each event you will be shown two choices. Each decision may cause you to leave with more treasure or to die a horrible death... Still like any true or false question on a final exam, you only have a 50/50 chance of ruining your entire life if you do not know! Now that you know how to play, please click the button below to start the game!"
    button1= introCard()
    button1Name="Play"
    button2= win.destroy
    button2Name="Quit"
def introCard():
    text1="You are the bravest knight in the service of the great kingdom Universitas! The kingdom has long been an icon throughout the world, but now it faces great peril. The king of Universitas has foolishly spent the entirety of the kingdoms fortune shopping online. The only hope for the kingdom, is to find the money to pay off it's debts. Luckily recently some scrolls have been found that describe the locations of hidden treasures. You have been selected to find these treasures. Do you except?"
    button1=winCard()
    outcome1="You embark upon your quest!"
    button1Name="Accept!"
    button2=lossCard()
    outcome2="The kingdom falls into debt and you die of dysentary... Nice going Oregon Trail..."
    button2Name="Decline!"
def storyCard(n):##This function will randomly select a story chunk
    ##recieves a random int
    ##tests for true int w/ if then else line
    ##Story blurbs for these are stored in V2 copy them!
    if n ==1:
        text1="You come across a dark cave... A dragon is said to lurk within... You enter and see the Holy Sword... How do you take it?"
        button1=lossCard()
        outcome1="The dragon bakes you to a crisp... Sorry!"
        button1Name="Bargain for it!"
        button2=winCard()
        outcome2="Victory! You sneak past the dragon and make off with the sword!"
        button2Name="Steal it!"
        #get="Holy Sword"
    elif n==2:
        text1="You come across a mummys tomb... You see two doors. Take the front entrance or the hidden entrance?"
        button1=winCard()
        outcome1="The mummy is awake and glad for company! He feeds you tea and sets you on your way with gifts!"
        button1Name="Front"
        #get="A beautiful glass vase!"
        button2=lossCard()
        outcome2="The mummy sees you! It screams thief! and shivs you..."
        button2Name="Hidden"
    elif n==3:
        text1="You come across a magic tree inhabited by elves. They ask you to climb up and see their beautiful home."
        button1=lossCard()
        outcome1="You climb high but slip off a branch and fall to your death. The elves laugh."
        button1Name="Climb"
        button2=winCard()
        outcome2="The elves are perplexed as to why you do not want to see them. They come down and give you a new couch to show off their things."
        button2Name="Stay"
        #get="A rad couch"
    elif n==4:
        text1="You come across a cave filled with strange lights! A troll stands guard."
        button1=winCard()
        outcome1="This is a hip and happening new dwarven nightclub! The troll lets you in because you seem cool! The dwarves give you some rare cave mushrooms. It seems lame but they said the kings a regular and they will go over well. You have a fun night... You drink to much ale and get a late start the next morning though..."
        button1Name="Strut in!"
        #get="Rare Mushrooms"
        button2=winCard()
        outcome2="This is actually a hip new dwarven nightclub! The bouncer troll won't let you in though... Oh well! dwarves are hipsters anyways! Gratefull for your company the troll gives you an amulet!"
        button2Name="Wait in line."
        #get="Sweet Amulet"
    elif n==5:
        text1="You come across a hut in the woods! Its owned by Merlin! The great wizard starts up conversation and invites you inside!"
        button1=lossCard()
        outcome1="Insulted, Merlin kills you with an axe... you would have expected magic... but no, Merlin is a psychopath and prefers to murder with axes."
        button1Name="Leave"
        button2=winCard()
        outcome2="He talks about his GOD DAMNED GRANDKIDS FOR FOUR HOURS... He is grateful for you company however and lets you leave with his staff."
        button2Name="Enter"
        #get="A sweet Staff"
    ## each if, elif, ect must contain
    ##Text blurb
    ##Button 1 function def
    ##Button 2 function def
    ##button 1 name
    ##button 2 name
    ##Outcome 1 Text
    ##Outcome 2 Text
def winGame():
    text1="CONGRATULATIONS!! You have collected enough treasure to pay off the kingdoms debts! You will forever be known to the people as a great knight! The king is in your debt!"
    button1Name="Hooray!"
    button1=win.destroy
    button2Name="Huzzah!"
    button2=win.destroy
def winCard():##Describes how event played out
    #inventory.append[get]
    playCount+=1
    text1= outcome1 or outcome2
    if playCount == 5:
        button1Name="Onward!"
        button1=winGame()
        button2Name="Go Forth!"
        button2=winGame()
    else:
        button1Name="Onward!"
        button1=storyCard(random.randint(1,5))
        button2Name="Go Forth!"
        button2=storyCard(random.randint(1,5))
def lossCard():
    text1=outcome1 or outcome2
    button1Name="Game Over"
    button1= win.destroy
    button2Name="Game Over"
    button2= win.destroy
startGame()
win=Tkinter.Tk()
win.title("TREASURE QUEST!!")
Textlabel=Tkinter.Label(win,text = text1,font=('Times New Roman',12),justify=LEFT,)
Textlabel.pack()
Row2=Tkinter.Frame(win)
Btn1=Tkinter.Button\
(Row2, text=button1Name, command=button1,font=('Times New Roman',12))
Btn2=Tkinter.Button\
(Row2, text=button2Name, command=button2, font=('Times New Roman',12))
Btn1.pack(side='left')
Btn2.pack(side='left')
Row2.pack()
win.mainloop()
When I run the code I get these errors:
Traceback (most recent call last):
  File "C:/Users/Logan/Desktop/Project#1V4.py", line 141, in <module>
    startGame()
  File "C:/Users/Logan/Desktop/Project#1V4.py", line 17, in startGame
    button1= introCard()
  File "C:/Users/Logan/Desktop/Project#1V4.py", line 26, in introCard
    button1=winCard()
  File "C:/Users/Logan/Desktop/Project#1V4.py", line 116, in winCard
    playCount+=1
UnboundLocalError: local variable 'playCount' referenced before assignment
If you want to change a global variable inside a function, you need to declare it with the global keyword, as follows:
def winCard():##Describes how event played out
    global playCount
    #inventory.append[get]
    playCount+=1
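To see the rule in isolation, here is a small self-contained sketch that reproduces the error and the fix without any Tkinter code. The names play_count, broken_win_card and win_card are made up for this illustration and are not part of the game above.

```python
play_count = 0

def broken_win_card():
    # Assigning to play_count makes it local to this function, so the
    # read half of "+=" fails because the local has no value yet.
    play_count += 1

def win_card():
    global play_count  # tell Python to use the module-level variable
    play_count += 1
    return play_count

try:
    broken_win_card()
except UnboundLocalError as exc:
    print("broken:", exc)

print("fixed:", win_card())
```

The same scoping rule explains the other symptoms in the question: text1, button1Name and the rest are assigned inside functions, so they are locals that vanish when the function returns and are never visible to the module-level Tkinter code. Note also that a line such as button1= introCard() calls introCard immediately instead of storing it as a callback; storing the function itself would be written without the parentheses.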

Get the most probable color from a words set

Are there any existing libraries or methods that let you figure out the most probable color for a set of words? For example, for cucumber, apple, grass it would give me green. Has anyone worked in that direction before?
If I had to do that, I would search for images based on the words using Google Images or similar, and identify the most common color among the top n results.
That sounds like a pretty reasonable NLP problem, and one that's very easy to handle via map-reduce.
Identify a list of words and phrases that you call colors ['blue', 'green', 'red', ...].
Go over a large corpus of sentences, and for each sentence that mentions a particular color, note down (word, color_name) in a file for every other word in that sentence. (Map step)
Then, for each word you have seen in your corpus, aggregate all the colors you have seen for it to get something like {'cucumber': {'green': 300, 'yellow': 34, 'blue': 2}, 'tomato': {'red': 900, 'green': 430}, ...}. (Reduce step)
Provided you use a large enough corpus (something like Wikipedia) and figure out how to prune really small counts and rare words, you should be able to build a pretty comprehensive and robust dictionary mapping millions of items to their colors.
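The map and reduce steps above can be sketched in a few lines of plain Python. This is only an in-memory illustration on a toy corpus, not a real map-reduce job; the COLORS set, the function names and the three sample sentences are all assumptions made up for the example.

```python
import re
from collections import Counter, defaultdict

COLORS = {"blue", "green", "red", "yellow"}  # assumed color vocabulary

def map_sentence(sentence):
    """Map step: emit (word, color) for every non-color word in a sentence that mentions a color."""
    words = re.findall(r"[a-z]+", sentence.lower())
    colors = [w for w in words if w in COLORS]
    others = [w for w in words if w not in COLORS]
    return [(w, c) for c in colors for w in others]

def reduce_pairs(pairs):
    """Reduce step: aggregate color counts per word."""
    table = defaultdict(Counter)
    for word, color in pairs:
        table[word][color] += 1
    return table

def most_probable_color(words, table):
    """Sum the per-word color counts and return the most frequent color, or None."""
    totals = Counter()
    for w in words:
        totals.update(table.get(w, {}))
    return totals.most_common(1)[0][0] if totals else None

corpus = ["The cucumber is green.", "Green grass grows.", "A red apple and a green apple."]
table = reduce_pairs(pair for s in corpus for pair in map_sentence(s))
print(most_probable_color(["cucumber", "apple", "grass"], table))
```

On a real corpus the pruning mentioned above matters: function words such as "the" or "a" co-occur with every color and would need to be filtered out or down-weighted.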
Another way to do that is to do a text search in google for combinations of colors and the word in question and take the combination with the highest number of results. Here's a quick Python script for that:
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def google_count(q):
    # Return Google's estimated number of results for the query q.
    # Note: this endpoint belongs to Google's deprecated AJAX Search API
    # and may no longer respond.
    query = urlencode({'q': q})
    url = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&%s' % query
    search_response = urlopen(url)
    search_results = search_response.read()
    results = json.loads(search_results)
    data = results['responseData']
    return int(data['cursor']['estimatedResultCount'])

colors = ['yellow', 'orange', 'red', 'purple', 'blue', 'green']
# get a list of google search counts
res = [google_count('"%s grass"' % c) for c in colors]
# pair the results with their corresponding colors
res2 = list(zip(res, colors))
# get the color with the highest score
print("%s is %s" % ('grass', sorted(res2)[-1][1]))
This will print:
grass is green
Daniel's and Xi.lin's answers are very good ideas. Along the same lines, we could combine both with an approach similar to Xi.lin's but simpler: query Google Images with the word whose color you want to find plus a "Color" filter (see the lower-left bar), and see which color yields the most results.
I would suggest using a tightly defined set of sources if possible such as Wikipedia and Wordnet.
Here, for example, is Wordnet for "panda":
S: (n) giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca
(large black-and-white herbivorous mammal of bamboo forests of China and Tibet;
in some classifications considered a member of the bear family or of a separate
family Ailuropodidae)
S: (n) lesser panda, red panda, panda, bear cat, cat bear,
Ailurus fulgens (reddish-brown Old World raccoon-like carnivore;
in some classifications considered unrelated to the giant pandas)
Because of the concise, carefully constructed language it is highly likely that any colour words will be important. Here you can see that pandas are both black-and-white and reddish-brown.
If you identify subsections of Wikipedia (e.g. "Botanical Description") this will help to increase the relevance of your results. Also the first image in Wikipedia is very likely to be the best "definitive" one.
But, as with all statistical methods, you will get false positives (and negatives, though these are probably less of a problem).