I'm having trouble with sending a form using POST to retrieve data in R - html

I'm having trouble collecting doctors from https://www.uchealth.org/providers/. I've found out it's a POST method but with httr I can't seem to create the form. Here's what I have
url = 'https://www.uchealth.org/providers/'
formA = list(title = 'Search', onClick = 'swapForms();', doctor-area-of-care-top = 'Cancer Care')
formB = list(Search = 'swapForms();', doctor-area-of-care = 'Cancer Care')
get = POST(url, body = formB, encode = 'form')
I'm fairly certain formB is the correct one. However, I can't test it, since I get an error when trying to create the list. I believe it's because you can't use "-" characters in names, although I could be wrong about that. Could somebody help, please?

I am unable to comment properly, but try this to create the list. The code below worked for me.
library(httr)
url = 'https://www.uchealth.org/providers/'
formB = list(Search = 'swapForms();', `doctor-area-of-care` = 'Cancer Care')
get = POST(url, body = formB, encode = 'form')
When a name contains spaces or other special characters, such as the hyphen here, you have to wrap it in backticks, as in the code above.

Related

Writing items from function to separate text files?

I'm running some web scraping, and now have a list of 911 links saved in the following (I included 5 to demonstrate how they're stored):
every_link = ['http://www.millercenter.org/president/obama/speeches/speech-4427', 'http://www.millercenter.org/president/obama/speeches/speech-4425', 'http://www.millercenter.org/president/obama/speeches/speech-4424', 'http://www.millercenter.org/president/obama/speeches/speech-4423', 'http://www.millercenter.org/president/obama/speeches/speech-4453']
These URLs link to presidential speeches over time. I want to store each individual speech (so, 911 unique speeches) in different text files, or be able to group by president. I'm trying to pass the following function on to these links:
def processURL(l):
    open_url = urllib2.urlopen(l).read()
    item_soup = BeautifulSoup(open_url)
    item_div = item_soup.find('div',{'id': 'transcript'},{'class': 'displaytext'})
    item_str = item_div.text.lower()
    item_str_processed = punctuation.sub('',item_str)
    item_str_processed_final = item_str_processed.replace('—',' ')

for l in every_link:
    processURL(l)
So, I want to save the words from each processed speech to its own text file. This might look like the following, with obama_44xx representing individual text files:
obama_4427 = "blah blah blah"
obama_4425 = "blah blah blah"
obama_4424 = "blah blah blah"
...
I'm trying the following:
for l in every_link:
    processURL(l)
    obama.write(processURL(l))
But that's not working...
Is there another way I should go about this?
Okay, so you have a couple of issues. First of all, your processURL function doesn't actually return anything, so when you try to write the return value of the function, it's going to be None. Maybe try something like this:
def processURL(link):
    open_url = urllib2.urlopen(link).read()
    item_soup = BeautifulSoup(open_url)
    item_div = item_soup.find('div',{'id': 'transcript'},{'class': 'displaytext'})
    item_str = item_div.text.lower()
    item_str_processed = punctuation.sub('',item_str)
    item_str_processed_final = item_str_processed.replace('—',' ')
    splitlink = link.split("/")
    president = splitlink[4]
    speech_num = splitlink[-1].split("-")[1]
    filename = "{0}_{1}".format(president, speech_num)
    return filename, item_str_processed_final # returning a tuple

for link in every_link:
    filename, content = processURL(link) # yay tuple unpacking
    with open(filename, 'w') as f:
        f.write(content)
This will write each file to a filename that looks like president_number. So for example, it will write Obama's speech with id number 4427 to a file called obama_4427. Lemme know if that works!
You have to call the processURL function and have it return the text you want written. After that, you simply have to add the writing to disk code within the loop. Something like this:
def processURL(l):
    open_url = urllib2.urlopen(l).read()
    item_soup = BeautifulSoup(open_url)
    item_div = item_soup.find('div',{'id': 'transcript'},{'class': 'displaytext'})
    item_str = item_div.text.lower()
    #item_str_processed = punctuation.sub('',item_str)
    #item_str_processed_final = item_str_processed.replace('—',' ')
    return item_str

for l in every_link:
    speech_text = processURL(l).encode('utf-8').decode('ascii', 'ignore')
    speech_num = l.split("-")[1]
    with open("obama_"+speech_num+".txt", 'w') as f:
        f.write(speech_text)
The .encode('utf-8').decode('ascii', 'ignore') is purely for dealing with non-ascii characters in the text. Ideally you would handle them in a different way, but that depends on your needs (see Python: Convert Unicode to ASCII without errors).
Btw, the 2nd link in your list is 404. You should make sure your script can handle that.
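One way to survive a dead link is to catch the error per URL and keep going. The following is only a sketch under stated assumptions: it uses Python 3's urllib.request rather than the urllib2 module from the code above (in Python 2 the same exceptions live in urllib2), and the names fetch and fetch_available are made up for illustration.

```python
import urllib.error
import urllib.request


def fetch(link):
    """Download one page and return its body as text (Python 3 urllib)."""
    with urllib.request.urlopen(link, timeout=10) as response:
        return response.read().decode('utf-8', 'replace')


def fetch_available(links, fetcher=fetch):
    """Return (pages, failed): pages maps link -> text, failed lists skipped links.

    `fetcher` is a hypothetical hook so the download step can be swapped out.
    """
    pages, failed = {}, []
    for link in links:
        try:
            pages[link] = fetcher(link)
        except urllib.error.URLError as exc:  # HTTPError (e.g. a 404) is a subclass
            print('skipping {0}: {1}'.format(link, exc))
            failed.append(link)
    return pages, failed
```

Calling fetch_available(every_link) would then give you the pages that loaded plus a list of links to retry or inspect by hand, instead of the whole run dying on the first 404.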

How to begin debugging POST and GET requests in Django and Python Shell

I have written the following code.
The issue I'm having is that the database is not being updated and no new record is created. I assume the way I'm retrieving the POST values is incorrect, which is why I'm asking: where do I begin to debug? I have created template variables for quantity, action, building, and test. When I try to use them in the HTML ({{ test }}, for example), nothing ever shows up, not even the one that is hardcoded. I used Firebug, and the form is indeed posting the values it should.
The form consists of two drop-down menus, one for action and one for building, plus a numeric quantity input, a text box, and a submit button. If anyone can offer me some advice, it would be appreciated. I really don't understand why the hardcoded one doesn't show up.
If you need any more information please let me know.
def update(request,Type_slug, slug, id):
    error = False
    Slug = slug
    ID = id
    Type = Type_slug
    test = 'test'
    quantity = request.POST['Qty']
    action = request.POST['Action']
    building = request.POST['Building']
    comments = request.POST['Comments']
    if Type == 'Chemicals':
        item = Chemicals.objects.filter(S_field=Slug, id= ID)
        New_Record = ChemicalRecord(Name=item.Name,Barcode=item.Barcode,Cost=item.Cost,Action=action,Building=building)
        if building == 'Marcus':
            building_two = 'Pettit'
        elif building =='Pettit':
            building_two ='Marcus'
        Record_one = ChemicalRecord.objects.filter(Barcode=New_Record.Barcode).filter(Building=New_Record.Building)
        if Record_one:
            Record_one = ChemicalRecord.objects.filter(Barcode=New_Record.Barcode).filter(Building=New_Record.Building).latest('id')
            New_Record.On_hand = Record_one.On_hand
        else:
            New_Record.On_hand = 0
        if action == 'Receiving':
            New_Record.On_hand = New_Record.On_hand+quantity
        elif action == 'Removing':
            New_Record.On_hand = New_Record.On_hand-quantity
        Record_two = ChemicalRecord.objects.filter(Barcode=New_Record.Barcode).filter(Building=building_two)
        if Record_two:
            Record_two = ChemicalRecord.objects.filter(Barcode=New_Record.Barcode).filter(Building=building_two).latest('id')
            Record_two_qty = Record_two.On_hand
        else:
            Record_two_qty = 0
        New_qty = New_Record.On_hand+Record_two_qty
        Chemicals.objects.filter(Barcode=obj.Barcode).update(On_hand=New_qty)
        New_Record.save()
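You can use import pdb; pdb.set_trace() for debugging. Put it at the top of the view: when the request comes in, execution pauses and the console running the development server drops into the debugger, where you can inspect request.POST and step through the code line by line.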
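To show where the breakpoint would go, here is a minimal sketch that runs outside Django. It rests on assumptions: a plain dict stands in for request.POST, and the debug flag is a hypothetical switch added only so the function can also run unattended. The int() call is there because POST values arrive as strings, which is worth checking in the debugger before doing arithmetic with the quantity.

```python
import pdb


def update(post, debug=False):
    """Stand-in for the view: `post` plays the role of request.POST."""
    if debug:
        # Execution pauses here. Useful commands at the (Pdb) prompt:
        #   p post   print the submitted values
        #   n        run the next line
        #   c        continue until the next breakpoint
        pdb.set_trace()
    quantity = int(post['Qty'])  # form values are strings until converted
    action = post['Action']
    return action, quantity
```

Calling update({'Qty': '3', 'Action': 'Receiving'}, debug=True) from a Python shell stops at the breakpoint, so you can confirm what was posted before any database code runs.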

page number variable: html,django

I want to do paging, but I only need to know the current page number, so that I can call the web service function with that parameter and receive the corresponding data. How can I find out the current page number? I'm writing my project in Django and I create the page with XSL. If I know the page number, I think I can write this in urls.py:
url(r'^ask/(\d+)/$',
'ask',
name='ask'),
and call the function in views.py like:
ask(request, pageNo)
But I don't know where to put the pageNo variable in the HTML page (so that, for example, with pageNo=2 I can use pageNo+1 or pageNo-1 to build a URL like 127.0.0.1/ask/3/ or 127.0.0.1/ask/2/). To make my question clearer: how can I do this when the HTML has no variables?
Sorry for my crazy question; I'm new to building websites and to Django.
I'm creating my HTML page with XSLT, so I send the whole HTML page to show.html, which contains only {{str}}.
def ask(request):
    service = GetConfigLocator().getGetConfigHttpSoap11Endpoint()
    myRequest = GetConfigMethodRequest()
    myXml = service.GetConfigMethod(myRequest)
    myXmlstr = myXml._return
    styledoc = libxml2.parseFile("ask.xsl")
    style = libxslt.parseStylesheetDoc(styledoc)
    doc = libxml2.parseDoc(myXmlstr)
    result = style.applyStylesheet(doc, None)
    out = style.saveResultToString( result )
    ok = mark_safe(out)
    style.freeStylesheet()
    doc.freeDoc()
    result.freeDoc()
    return render_to_response("show.html", {
        'str': ok,
    }, context_instance=RequestContext(request))
I'm not working with a database; I just receive an XML file and parse it, so I don't have contact_list = Contacts.objects.all(). Can I still use this approach? Should I leave the first parameter in paginator = Paginator(contact_list, 25) blank?
If you use the standard Django paginator, the page links go to a URL such as http://example.com/?page=N, where N is the page number.
So,
# urls.py
url('^ask/$', 'ask', name='viewName'),
You can get page number in views:
# views.py
def ask(request):
    page = request.GET.get('page', 1)
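On the question of data that does not come from a database: as far as I know, Django's Paginator accepts any list or other sliceable object with a length, so a parsed XML result held in a plain list should work. If you only need the slice plus the neighbouring page numbers, the arithmetic can also be done by hand. The sketch below is a hand-rolled stand-in, not Django's Paginator, and the name paginate and its return shape are assumptions made for illustration.

```python
def paginate(items, page, per_page=25):
    """Return (page_items, previous_page, next_page) for a 1-based page number."""
    try:
        page = int(page)  # request.GET values arrive as strings
    except (TypeError, ValueError):
        page = 1
    last_page = max(1, -(-len(items) // per_page))  # ceiling division
    page = min(max(page, 1), last_page)  # clamp into the valid range
    start = (page - 1) * per_page
    previous_page = page - 1 if page > 1 else None
    next_page = page + 1 if page < last_page else None
    return items[start:start + per_page], previous_page, next_page
```

In the view you could pass previous_page and next_page to the template (or to the XSL stylesheet as parameters) and build the ?page=N links from them, which answers where the page number comes from in the HTML.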

Using MATLAB to parse HTML for URL in anchors, help fast

I'm on a strict time limit and I really need a regex to parse this type of anchor (they're all in this format)
<a href="20120620_0512_c2_1024.jpg">20120620_0512_c2_102..&gt;</a>
for the URL
20120620_0512_c2_1024.jpg
I know it's not a full URL, it's relative. Please help.
Here's my code so far
year = datestr(now,'yyyy');
timestamp = datestr(now,'yyyymmdd');
html = urlread(['http://sohowww.nascom.nasa.gov//data/REPROCESSING/Completed/' year '/c2/' timestamp '/']);
links = regexprep(html, '<a href=.*?>', '');
Try the following:
url = 'http://sohowww.nascom.nasa.gov/data/REPROCESSING/Completed/2012/c2/20120620/';
html = urlread(url);
t = regexp(html, '<a href="([^"]*\.jpg)">', 'tokens');
t = [t{:}]'
The resulting cell array (truncated):
t =
'20120620_0512_c2_1024.jpg'
'20120620_0512_c2_512.jpg'
...
'20120620_2200_c2_1024.jpg'
'20120620_2200_c2_512.jpg'
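For comparison, the same capture-group pattern can be tried in Python on a small sample. This is only an illustration of the regex, not a replacement for the MATLAB code: the html string is a hypothetical fragment of the directory listing, assumed to use plain double-quoted href attributes.

```python
import re

# Hypothetical fragment of the directory listing; the real page comes from urlread.
html = ('<a href="20120620_0512_c2_1024.jpg">20120620_0512_c2_102..&gt;</a>\n'
        '<a href="20120620_0512_c2_512.jpg">20120620_0512_c2_512.jpg</a>\n'
        '<a href="?C=N;O=D">Name</a>')

# Capture everything between the quotes, but only for hrefs ending in .jpg.
links = re.findall(r'<a href="([^"]*\.jpg)">', html)
print(links)  # ['20120620_0512_c2_1024.jpg', '20120620_0512_c2_512.jpg']
```

The [^"]* part stops at the closing quote, so each match stays inside one anchor, and the sort link without a .jpg target is skipped.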
I think this is what you are looking for:
htmlLink = '20120620_0512_c2_102..>';
link = regexprep(htmlLink, '(.*)', '$2');
link =
20120620_0512_c2_1024.jpg
regexprep works also for cell arrays of strings, so this works too:
htmlLinksCellArray = { '20120620_0512_c2_102..>', '20120620_0512_c2_102..>', '20120620_0512_c2_102..>' };
linksCellArray = regexprep(htmlLinksCellArray, '(.*)', '$2')
linksCellArray =
'20120620_0512_c2_1024.jpg' '20120620_0512_c2_1025.jpg' '20120620_0512_c2_1026.jpg'

Extracting links and twitter replies from a string

I am getting a string from Twitter into my ActionScript, and it is an unformatted string. I want to be able to extract any links and/or any #replies from the string, then display it in htmlText.
So far I have this
var txt:String = "This is just some text http://www.thisisawebsite.com and some more text via #sumTwitter";
var twitterText:String = txt.slice(txt.indexOf("#"),txt.indexOf(" ",txt.indexOf("#")));
var urlText:String = txt.slice(txt.indexOf("http"),txt.indexOf(" ",txt.indexOf("http")));
var newURL:String = "<a href='"+urlText+"'>"+urlText+"</a>";
var arr:Array = txt.split(urlText);
var newString:String = arr[0] + newURL + arr[1];
var txtField:TextField = new TextField();
txtField.width = 500;
txtField.htmlText = newString;
addChild(txtField);
This works for extracting links that end with a space. But what if, like #sumTwitter, the token sits at the end of the string? And what if there are multiple links or #'s: is a while loop the best way to handle that?
Regular expressions are the best option for what you want, I think.
Check Grant Skinner's RegExr. You could write and test your own RegExp there, which is very convenient. But you also can find a lot of useful ready-to-use regexps created by different users. Check out the "community" tab in the right panel. There, search by some meaningful keywords like "twitter" and "url" and you'll get a good number of options.
For example,
Grab urls:
http://regexr.com?2s5m4
Capture twitter usernames:
http://regexr.com?2s5m7
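To make the regex approach concrete, here is a small sketch in Python (the same idea carries over to ActionScript's RegExp and String.replace). The two patterns are deliberately simple assumptions of mine, not the expressions behind the RegExr links above, and wrapping the #tags in bold is just a placeholder for whatever markup you want.

```python
import re

# One pass over the text: a URL runs up to the next whitespace,
# a tag is '#' followed by word characters. Both also match at end of string.
TOKEN_RE = re.compile(r'(?P<url>https?://\S+)|(?P<tag>#\w+)')


def linkify(text):
    """Wrap every URL in an anchor and every #tag in bold."""
    def wrap(match):
        url = match.group('url')
        if url:
            return '<a href="{0}">{0}</a>'.format(url)
        return '<b>{0}</b>'.format(match.group('tag'))
    return TOKEN_RE.sub(wrap, text)
```

Because the substitution handles every match, no while loop is needed for multiple links or tags, and a token at the very end of the string is matched like any other. One known limitation of this simple URL pattern is that trailing punctuation, such as a full stop right after a link, gets included in the match.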