Adding http headers to sunspot rails request - sunspot-rails

We are using sunspot-rails to connect to websolr. I am trying to find a way to add HTTP headers to the outgoing request. Samples exist only for rsolr, not for sunspot-rails (https://github.com/onemorecloud/websolr-demo-advanced-auth).
The purpose is to use the headers for authentication. Is there a way to add or modify HTTP headers from sunspot-rails for both indexing and querying calls?

I think I found the answer to this:
https://groups.google.com/forum/#!searchin/ruby-sunspot/authentication/ruby-sunspot/-FtTQdg4czs/mvOuB7g8yCgJ
The example quoted by outoftime in this thread shows how to retrieve the underlying HTTP object.
class SolrConnectionFactoryWithTimeout
  def initialize(timeout = 60)
    @timeout = timeout
  end
  def connect(opts = {})
    client = RSolr.connect(opts)
    solr_connection = client.connection
    http = solr_connection.connection
    http.read_timeout = @timeout
    client
  end
end
Sunspot::Session.connection_class =
  SolrConnectionFactoryWithTimeout.new(timeout.to_f)
Then use it in combination with
http://ruby-doc.org/stdlib-2.0/libdoc/net/http/rdoc/Net/HTTP.html#label-Setting+Headers
req = Net::HTTP::Get.new(uri)
req['If-Modified-Since'] = file.mtime.rfc2822

Related

Extracting the start and end time of Task Id running inside a Job (Job orchestration) from Databricks Jobs API 2.1 using Python

I want to fetch the list of all running jobs using the Jobs API 2.1 endpoint "2.1/jobs/runs/list" and the details of the tasks running inside them. I have gone through this link "https://docs.databricks.com/dev-tools/api/latest/jobs.html#operation/JobsRunsList", which states that we can view the task details in the JSON, but I am not sure how to set expand_tasks = True. Basically, I am only able to extract the start and end time of a particular job/run ID, but I am looking for the start and end times of the task IDs under that particular job.
I have tried this code:
import requests
API_URL = '{instance}'
TOKEN = '{Token}'
API_VERSION = '/api/2.1'
API_CMD = '/jobs/runs/list'
REQ_HEADERS = {"Authorization": "Bearer " + TOKEN}
response = requests.get(url=API_URL + API_VERSION + API_CMD,
                        params={'limit': '1000', 'offset': '0'},
                        headers=REQ_HEADERS)
response.status_code
Let me know how I can view the task IDs inside a running job or enable expand_tasks.
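For what it's worth, here is a minimal sketch (not verified against a live workspace), reusing API_URL, API_VERSION, API_CMD and REQ_HEADERS from above, and assuming /jobs/runs/list accepts expand_tasks as a boolean query parameter and that each expanded task carries task_key, start_time and end_time in epoch milliseconds:
# Assumption: expand_tasks makes the API include per-task details in each run
response = requests.get(url=API_URL + API_VERSION + API_CMD,
                        params={'expand_tasks': 'true'},
                        headers=REQ_HEADERS)
for run in response.json().get('runs', []):
    for task in run.get('tasks', []):
        # start_time / end_time assumed to be epoch milliseconds per task
        print(task.get('task_key'), task.get('start_time'), task.get('end_time'))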

How to dump all results of an API request when there is a page limit?

I am using an API to pull data from a URL; however, the API has a pagination limit. It goes like:
page (default is 1; it's the page number you want to retrieve)
per_page (default is 100; it's the maximum number of results returned in the response, max = 500)
I have a script with which I can get the results of a single page (or per_page), but I want to automate it. I want to be able to loop through all the pages (per_page = 500) and load the results into a JSON file.
Here is my code that can get 500 results per_page:
import json, pprint
import requests
url = "https://my_api.com/v1/users?per_page=500"
header = {"Authorization": "Bearer <my_api_token>"}
s = requests.Session()
s.proxies = {"http": "<my_proxies>", "https": "<my_proxies>"}
resp = s.get(url, headers=header, verify=False)
raw = resp.json()
for x in raw:
    print(x)
The output is 500 results, but is there a way to keep going and pull the results starting from where it left off? Or even go page by page and get all the data until there's no data left in a page?
It would be helpful if you presented a sample response from your API.
If the API is equipped properly, there will be a next property in a given response that leads you to the next page.
You can then keep calling the API with the link given in next, recursively. On the last page, there will be no next in the Link header.
resp.links["next"]["url"] will give you the URL to the next page.
For example, the GitHub API has next, last, first, and prev properties.
To put it into code, first, you need to turn your code into functions.
Given that there is a maximum of 500 results per page, it implies you are extracting a list of data of some sort from the API. Often, these data are returned in a list somewhere inside raw.
For now, let's assume you want to extract all elements inside a list at raw.get('data').
import requests
header = {"Authorization": "Bearer <my_api_token>"}
results_per_page = 500
def compose_url():
    return (
        "https://my_api.com/v1/users"
        + "?per_page="
        + str(results_per_page)
        + "&page_number="
        + "1"
    )
def get_result(url=None):
    if url is None:
        url_get = compose_url()
    else:
        url_get = url
    s = requests.Session()
    s.proxies = {"http": "<my_proxies>", "https": "<my_proxies>"}
    resp = s.get(url_get, headers=header, verify=False)
    # You may also want to check the status code
    if resp.status_code != 200:
        raise Exception(resp.status_code)
    raw = resp.json()  # of type dict
    data = raw.get("data")  # of type list
    if "next" not in resp.links:
        # We are at the last page, return data
        return data
    # Otherwise, recursively get results from the next url
    return data + get_result(resp.links["next"]["url"])  # concat lists
def main():
    # Driver function
    data = get_result()
    # Then you can print the data or save it to a file
if __name__ == "__main__":
    # Now run the driver function
    main()
However, if there isn't a proper Link header, I see 2 solutions:
(1) recursion and (2) loop.
I'll demonstrate recursion.
As you have mentioned, when API responses are paginated, i.e. when there is a limit on the maximum number of results per page, there is often a query parameter (a page number or a start index of some sort) to indicate which "page" you are querying, so we'll use the page_number parameter in the code.
The logic is:
Given an HTTP response, if there are fewer than 500 results, there are no more pages; return the results.
If there are 500 results in a given response, there is probably another page, so we advance page_number by 1, recurse (by calling the function itself), and concatenate with the previous results.
import requests
header = {"Authorization": "Bearer <my_api_token>"}
results_per_page = 500
def compose_url(results_per_page, current_page_number):
    return (
        "https://my_api.com/v1/users"
        + "?per_page="
        + str(results_per_page)
        + "&page_number="
        + str(current_page_number)
    )
def get_result(current_page_number):
    s = requests.Session()
    s.proxies = {"http": "<my_proxies>", "https": "<my_proxies>"}
    url = compose_url(results_per_page, current_page_number)
    resp = s.get(url, headers=header, verify=False)
    # You may also want to check the status code
    if resp.status_code != 200:
        raise Exception(resp.status_code)
    raw = resp.json()  # of type dict
    data = raw.get("data")  # of type list
    # If the length of data is smaller than results_per_page (500 of them),
    # that means there are no more pages
    if len(data) < results_per_page:
        return data
    # Otherwise, advance the page number and do a recursion
    return data + get_result(current_page_number + 1)  # concat lists
def main():
    # Driver function
    data = get_result(1)
    # Then you can print the data or save it to a file
if __name__ == "__main__":
    # Now run the driver function
    main()
If you truly want to store the raw responses, you can. However, you'll still need to check the number of results in a given response. The logic is similar. If a given raw contains 500 results, it means there is probably another page. We advance the page number by 1 and do a recursion.
Let's still assume raw.get('data') is the list whose length is the number of results.
Because JSON objects/dictionaries cannot simply be concatenated, you can store the raw dictionary of each page in a list of raws. You can then parse and synthesize the data in whatever way you want.
Use the following get_result function:
def get_result(current_page_number):
    s = requests.Session()
    s.proxies = {"http": "<my_proxies>", "https": "<my_proxies>"}
    url = compose_url(results_per_page, current_page_number)
    resp = s.get(url, headers=header, verify=False)
    # You may also want to check the status code
    if resp.status_code != 200:
        raise Exception(resp.status_code)
    raw = resp.json()  # of type dict
    data = raw.get("data")  # of type list
    if len(data) == results_per_page:
        return [raw] + get_result(current_page_number + 1)  # concat lists
    return [raw]  # convert raw into a list object on the fly
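A possible driver for this variant (the file name here is just an example) collects the raw pages into a list and dumps them to a JSON file, which also covers the "load it into a JSON file" part:
import json
def main():
    raws = get_result(1)  # list of raw page dictionaries
    # Example file name; writes every page's raw response into one JSON file
    with open("all_pages.json", "w") as f:
        json.dump(raws, f, indent=2)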
As for the loop method, the logic is similar to recursion. Essentially, you will call the get_result() function a number of times, collect the results, and break early when the furthest page contains fewer than 500 results.
If you know the total number of results in advance, you can simply run the loop for a predetermined number of times.
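For completeness, a rough sketch of the loop version, reusing compose_url, header and results_per_page from above and the same assumption that the results live under raw.get('data'):
def get_all_results_loop():
    all_data = []
    current_page_number = 1
    s = requests.Session()
    s.proxies = {"http": "<my_proxies>", "https": "<my_proxies>"}
    while True:
        url = compose_url(results_per_page, current_page_number)
        resp = s.get(url, headers=header, verify=False)
        if resp.status_code != 200:
            raise Exception(resp.status_code)
        data = resp.json().get("data")
        all_data += data
        # Fewer results than the page size means this was the last page
        if len(data) < results_per_page:
            break
        current_page_number += 1
    return all_data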
Do you follow? Do you have any further questions?
(I'm a little confused by what you mean by "load it into a JSON file". Do you mean saving the final raw results into a JSON file? Or are you referring to the .json() method in resp.json()? In that case, you don't need import json to do resp.json(); the .json() method on resp is part of the requests module.)
As a bonus, you can make your HTTP requests asynchronous, but that is slightly beyond the scope of your original question.
P.S. I'm happy to learn what other, perhaps more elegant, solutions people use.

Can't send message with Facebook Graph API

In my Django project I have this function:
def mesaj_yolla():
    fbid = "my_facebook_id"
    post_message_url = 'https://graph.facebook.com/v2.6/me/messages?access_token=<my_access_token>'
    response_msg = json.dumps({"recipient": {"id": fbid}, "message": {"text": "hello"}})
    status = requests.post(post_message_url, headers={"Content-Type": "application/json"}, data=response_msg)
    print(status)
It returns: <Response [400]>
What is wrong with this code? I just want to send a message to a user.
According to the API documentation, you should use recipient.id instead of recipient.user_id:
def mesaj_yolla():
    fbid = "my_facebook_id"
    post_message_url = 'https://graph.facebook.com/v2.6/me/messages?access_token=<my_access_token>'
    response_msg = json.dumps({"recipient": {"id": fbid}, "message": {"text": "hello"}})
    status = requests.post(post_message_url, headers={"Content-Type": "application/json"}, data=response_msg)
    print(status)
That explains the HTTP 400 code (Bad request).
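As a side note (plain requests usage, nothing Graph-API specific), printing the response body usually shows the exact validation error behind the 400:
print(status.status_code)  # 400
print(status.text)         # the error payload from the Graph API explains what was rejected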

How do you update an MTurk worker qualification score with boto3?

The older MTurk API (and boto2) had an UpdateQualificationScore method that would allow users to update the score of a specific worker, but this seems to have disappeared in the latest version(s) based on boto3.
The latest MTurk API has a GetQualificationScore method (which actually returns a full worker qualification record, not just the score), but no corresponding UpdateQualificationScore method. What is the mechanism to update a score for an existing worker?
As best as I can tell, the proper way to do this with boto3 is to use the AssociateQualificationWithWorker endpoint:
import boto3
session = boto3.Session(profile_name='mturk')
client = session.client('mturk')
response = client.associate_qualification_with_worker(
    QualificationTypeId=qualification_type_id,
    WorkerId=worker_id,
    IntegerValue=score,
    SendNotification=False,
)
This seems to work, especially when taken alongside GetQualificationScore returning the "full" qualification record instead of just the score.
ex-nerd's answer is correct. Building off the Python sample available at http://requester.mturk.com/developer, the following works to assign a QualificationType and then change the score for that Worker:
import boto3
region_name = 'us-east-1'
aws_access_key_id = 'YOUR_ACCESS_ID'
aws_secret_access_key = 'YOUR_SECRET_KEY'
endpoint_url = 'https://mturk-requester-sandbox.us-east-1.amazonaws.com'
# Uncomment this line to use in production
# endpoint_url = 'https://mturk-requester.us-east-1.amazonaws.com'
client = boto3.client(
    'mturk',
    endpoint_url=endpoint_url,
    region_name=region_name,
    aws_access_key_id=aws_access_key_id,
    aws_secret_access_key=aws_secret_access_key,
)
# This will assign the QualificationType
client.associate_qualification_with_worker(
    QualificationTypeId='3KIOU9ULHKIIS5OPUVORW7OE1070V0',
    WorkerId='A39ECJ12CY7TE9',
    IntegerValue=100,
)
# This will set the QualificationScore from 100 to 90
client.associate_qualification_with_worker(
    QualificationTypeId='3KIOU9ULHKIIS5OPUVORW7OE1070V0',
    WorkerId='A39ECJ12CY7TE9',
    IntegerValue=90,
)
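If you want to verify the change, a quick sketch using the same client (assuming the response nests the score under Qualification -> IntegerValue):
# Check the Worker's current score for that QualificationType
response = client.get_qualification_score(
    QualificationTypeId='3KIOU9ULHKIIS5OPUVORW7OE1070V0',
    WorkerId='A39ECJ12CY7TE9',
)
print(response['Qualification']['IntegerValue'])  # should now print 90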

Accessing a request's body using classic ASP?

How do I access what has been posted by a client to my classic ASP server?
I know that there is the Request.Forms variable, but the client's request was not made using a Form.
The client request's body is just a string made using a standard POST statement.
Thanks
You need to read the request bytes if the content type of the request sent by the client is not form data. In that case, the request body is not form data accessible through name-value pairs, so you cannot use the Request.Form collection. I suggest investigating the BinaryRead method.
Reading the posted data and converting it into a string:
If Request.TotalBytes > 0 Then
    Dim lngBytesCount
    lngBytesCount = Request.TotalBytes
    Response.Write BytesToStr(Request.BinaryRead(lngBytesCount))
End If
Function BytesToStr(bytes)
    Dim Stream
    Set Stream = Server.CreateObject("Adodb.Stream")
    Stream.Type = 1 'adTypeBinary
    Stream.Open
    Stream.Write bytes
    Stream.Position = 0
    Stream.Type = 2 'adTypeText
    Stream.Charset = "iso-8859-1"
    BytesToStr = Stream.ReadText
    Stream.Close
    Set Stream = Nothing
End Function
Hope it helps.
Update #1:
Using JScript:
if (Request.TotalBytes > 0) {
    var lngBytesCount = Request.TotalBytes;
    Response.Write(BytesToStr(Request.BinaryRead(lngBytesCount)));
}
function BytesToStr(bytes) {
    var stream = Server.CreateObject("Adodb.Stream");
    stream.Type = 1;  // adTypeBinary
    stream.Open();
    stream.Write(bytes);
    stream.Position = 0;
    stream.Type = 2;  // adTypeText
    stream.Charset = "iso-8859-1";
    var sOut = stream.ReadText();
    stream.Close();
    return sOut;
}
To get the JSON string value, just use CStr(Request.Form).
Works a treat.
In Classic ASP, Request.Form is the collection used for any data sent via POST.
For the sake of completeness, I'll add that Request.QueryString is the collection used for any data sent via GET/the Query String.
I would guess based on the above that even though the client is not a web browser, the Request.Form collection should be populated.
Note: all of this assumes the data being sent is textual in nature and that there are no binary uploads (e.g. pictures or files). Update your question body if this is an incorrect assumption.
To test, write out the raw form data and see what you have - something along the lines of:
Response.Write(Request.Form)
Which with a regular web page will output something like
field=value&field2=value2
If you get something along those lines, you could then use that as a reference for a proper index.
If you do not get something like that, update your question with what you tried and what you got.