How to send and receive large numpy arrays (several GBs) using flask - json

I am creating a micro-service to be used locally. From some input I generate one large matrix each time. Right now I am using JSON to transfer the data, but it is really slow and has become the bottleneck of my application.
Here is my client side:
import json
import requests

headers = {'Content-Type': 'application/json'}
data = {'model': 'model_4',
        'input': "this is my input."}
r = requests.post("http://10.0.1.6:3000/api/getFeatureMatrix",
                  headers=headers, data=json.dumps(data))
answer = json.loads(r.text)
My server is something like:
from flask import Flask, request, jsonify

app = Flask(__name__, static_url_path='', static_folder='public')

@app.route('/api/getFeatureMatrix', methods=['POST'])
def get_feature_matrix():
    arguments = request.get_json()
    # processing ... generating matrix
    return jsonify(matrix=matrix.tolist())
How can I send large matrices?

In the end I ended up using
np.save(matrix_path, mat)
return send_file(matrix_path + '.npy')
On the client side I save the response to a file before loading it with np.load.
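For completeness, a minimal sketch of the client side of that approach, assuming the same endpoint and host as in the question; io.BytesIO lets you skip saving the response to disk first:
import io

import numpy as np
import requests

r = requests.post("http://10.0.1.6:3000/api/getFeatureMatrix",
                  json={'model': 'model_4', 'input': "this is my input."})
r.raise_for_status()

# np.load accepts any file-like object, so the .npy bytes sent by send_file
# can be read straight from memory instead of from a saved file.
matrix = np.load(io.BytesIO(r.content))
print(matrix.shape, matrix.dtype)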

I suppose the problem is that the matrix takes time to generate; it's a CPU-bound application.
One solution would be to handle the request asynchronously. Meaning that:
The server receives the request and returns 202 ACCEPTED together with a link where the client can check the progress of the matrix creation
The client polls the returned URL and gets either:
a 200 OK response if the matrix is not yet created
a 201 CREATED response if the matrix is finally created, with a link to the resource
However, a single Flask worker handles one request at a time, so you'll need to use multithreading, multiprocessing, or green threads.
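A minimal sketch of that polling pattern with a plain background thread; the route names, the in-memory jobs dict, and the random matrix are illustrative choices, not something Flask prescribes:
import threading
import uuid

import numpy as np
from flask import Flask, jsonify, request, send_file, url_for

app = Flask(__name__)
jobs = {}  # job_id -> path of the finished .npy file, or None while still running

def build_matrix(job_id, arguments):
    matrix = np.random.rand(1000, 1000)      # stand-in for the CPU-bound generation
    path = '/tmp/%s.npy' % job_id
    np.save(path, matrix)
    jobs[job_id] = path

@app.route('/api/getFeatureMatrix', methods=['POST'])
def start_job():
    job_id = uuid.uuid4().hex
    jobs[job_id] = None
    threading.Thread(target=build_matrix, args=(job_id, request.get_json())).start()
    # 202 ACCEPTED plus a link the client can poll
    return jsonify(status_url=url_for('job_status', job_id=job_id)), 202

@app.route('/api/status/<job_id>')
def job_status(job_id):
    path = jobs.get(job_id)
    if path is None:
        return jsonify(state='pending'), 200
    # 201 CREATED plus a link to the finished resource
    return jsonify(state='done', result_url=url_for('get_result', job_id=job_id)), 201

@app.route('/api/result/<job_id>')
def get_result(job_id):
    return send_file(jobs[job_id])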

On the client side you could do something like:
import requests

with open('binary.file', 'rb') as f:
    payload = f.read()

# use the full URL of your server here
response = requests.post('http://10.0.1.6:3000/endpoint', data=payload)
and on the server side:
import numpy as np
from flask import Flask, request

app = Flask(__name__)

@app.route('/endpoint', methods=['POST'])
def endpoint():
    raw = request.data
    # np.fromstring is deprecated for binary input; np.frombuffer is its replacement
    arr = np.frombuffer(raw, dtype=np.float64)
    return 'received %d values' % arr.size
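One caveat with posting raw bytes like this: the dtype and shape are lost, so both sides have to agree on them out of band. A small sketch of an alternative that sidesteps the problem by sending the .npy format (which embeds dtype and shape) instead of bare bytes; the URL is the same hypothetical endpoint as above:
import io

import numpy as np
import requests

# Client: serialize with np.save so dtype and shape travel with the data.
mat = np.random.rand(3, 4).astype(np.float32)
buf = io.BytesIO()
np.save(buf, mat)
response = requests.post('http://10.0.1.6:3000/endpoint', data=buf.getvalue())

# The server would then rebuild the array with:
#   arr = np.load(io.BytesIO(request.data))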

Related

Maximo Integration With External JSON API

I have found various methods of invoking a JSON/REST API call from Maximo to an external system on the web, but none of them have matched exactly what I'm looking for and all of them seem to use different methods, which is causing me a lot of confusion because I am EXTREMELY rusty at jython coding. So, hopefully, you all can help me out. Please be as detailed as possible: script language, script type (for integration, object launch point, publish channel process/user exit class, etc).
My Maximo Environment is 7.6.1.1. I have created the following...
Maximo Object Structure (MXSRLEAK): with 3 objects (SR, TKSERVICEADDRESS, and WORKLOG)
Maximo Publish Channel (MXSRLEAK-PC): uses the MXSRLEAK OS and contains my processing rules to skip records if they don't meet the criteria (SITEID = 'TWOO', TEMPLATEID IN ('LEAK','LEAKH','LEAKW'), HISTORYFLAG = 0)
Maximo End Point (LEAKTIX): HTTP Handler, HEADERS ("Content-Type:application/json"), HTTPMETHOD ("POST"), URL (https:///api/ticket/?ticketid=), USERNAME ("Maximo"), and PASSWORD (). The Allow Override is checked for HEADERS, HTTPMETHOD, and URL.
At this point, I need an automation script to:
Limit the Maximo attributes that I'm sending. This will vary depending on what happens on the Maximo side:
If an externally created (SOURCE = LEAKREP, EXTERNALRECID IS NOT NULL) service request ticket gets cancelled, I need to send the last worklog with logtype = "CANCOMM" (both summary/description and details/description_longdescription) as well as the USERID that changed status.
If an externally created SR ticket gets closed, I need to send the last worklog with logtype <> "CANCOMM".
If the externally created SR ticket was a duplicate, I also need to include a custom field called "DUPLICATE" (which uses a table domain to show all open SRs with similar TEMPLATEIDs in the UI).
If a "LEAK" SR ticket originated in Maximo (it doesn't have a SOURCE or EXTERNALRECID), then I need to send data from the SR (e.g. DESCRIPTION, REPORTDATE, REPORTEDBY), TKSERVICEADDRESS (FORMATTEDADDRESS, etc.), and WORKLOG (DESCRIPTION, LONGDESCRIPTION if they exist) objects to the external system and parse the response to update SOURCE and EXTERNALRECID.
Update Maximo End Point values for API call: HTTPMETHOD to "POST" or "PATCH", Add HEADERS (Authorization: Basic Base64Userid/Password), etc.
Below is my latest attempt at an automation script, which doesn't work because "mbo is not defined" (I'm sure there are more problems with it, but it fails early on in the script). The script was created for integration, with a publish channel (MXSRLEAK-PC), using the External Exit option in Jython. I was trying to start with just one scenario, where the Maximo SR ticket was originally created via an API call from the external system into Maximo and was actually a duplicate of another Maximo SR ticket. My thought was that if I got this part correct, I could update the script to cover the other scenarios, such as an SR ticket that originated in Maximo and needs to POST a new record to the external system.
My final question is, is it better (easier for future eyes to understand) to have one Object Structure, Publish Channel, End Point, and Automation Script to handle all scenarios or to create separate ones for each scenario?
from com.ibm.json.java import JSONObject
from java.io import BufferedReader, IOException, InputStreamReader
from java.lang import System, Class, String, StringBuffer
from java.nio.charset import Charset
from java.util import Date, Properties, List, ArrayList, HashMap
from org.apache.commons.codec.binary import Base64
from org.apache.http import HttpEntity, HttpHeaders, HttpResponse, HttpVersion
from org.apache.http.client import ClientProtocolException, HttpClient
from org.apache.http.client.entity import UrlEncodedFormEntity
from org.apache.http.client.methods import HttpPost
from org.apache.http.entity import StringEntity
from org.apache.http.impl.client import DefaultHttpClient
from org.apache.http.message import BasicNameValuePair
from org.apache.http.params import BasicHttpParams, HttpParams, HttpProtocolParamBean
from psdi.mbo import Mbo, MboRemote, MboSet, MboSetRemote
from psdi.security import UserInfo
from psdi.server import MXServer
from psdi.iface.router import Router
from sys import *
leakid = mbo.getString("EXTERNALRECID")
#Attempting to pull current SR worklog using object relationship and attribute
maxlog = mbo.getString("DUPWORKLOG.description")
maxloglong = mbo.getString("DUPWORKLOG.description_longdescription")
clientEndpoint = Router.getHandler("LEAKTIX")
cEmap = HashMap()
host = cEmap.get("URL")+leakid
method = cEmap.get("HTTPMETHOD")
currhead = cEmap.get("HEADERS")
tixuser = cEmap.get("USERNAME")
tixpass = cEmap.get("PASSWORD")
auth = tixuser + ":" + tixpass
authHeader = String(Base64.encodeBase64(String.getBytes(auth, 'ISO-8859-1')),"UTF-8")
def createJSONstring():
    jsonStr = ""
    obj = JSONObject()
    obj.put("status_code", "1")
    obj.put("solution", "DUPLICATE TICKET")
    obj.put("solution_notes", maxlog + " " + maxloglong)
    jsonStr = obj.serialize(True)
    return jsonStr

def httpPost(path, jsonstring):
    params = BasicHttpParams()
    paramsBean = HttpProtocolParamBean(params)
    paramsBean.setVersion(HttpVersion.HTTP_1_1)
    paramsBean.setContentCharset("UTF-8")
    paramsBean.setUseExpectContinue(True)
    entity = StringEntity(jsonstring, "UTF-8")
    client = DefaultHttpClient()
    request = HttpPost(host)
    request.setParams(params)
    #request.addHeader(currhead)
    request.addHeader(HttpHeaders.CONTENT_TYPE, "application/json")
    request.addHeader(HttpHeaders.AUTHORIZATION, "Basic " + authHeader)
    request.setEntity(entity)
    response = client.execute(request)
    status = response.getStatusLine().getStatusCode()
    obj = JSONObject.parse(response.getEntity().getContent())
    System.out.println(str(status) + ": " + str(obj))
Sorry for the late response. An external exit script doesn't get mbo as an implicit variable; instead it works with irdata, the serialized structure data, which you break apart and then manipulate.
What I understood is that you need to post the payload dynamically based on some conditions in Maximo. For that, I think you can write a custom handler that will be called during the post.

Azure Message Routing: JSON message in wrong format

I'm working with a Raspberry Pi Zero and Python to send and receive sensor data with Azure IoT. I've already created an endpoint and a message route to the storage container. But when I check the JSON files in the container, I have two problems:
The files include various general data which I don't need
My message body is in Base64 format
My message looks like this:
{"EnqueuedTimeUtc":"2021-06-25T13:03:25.7110000Z","Properties":{},"SystemProperties":{"connectionDeviceId":"RaspberryPi","connectionAuthMethod":"{"scope":"device","type":"sas","issuer":"iothub","acceptingIpFilterRule":null}","connectionDeviceGenerationId":"637555519600003402","enqueuedTime":"2021-06-25T13:03:25.7110000Z"},"Body":"eyJ0ZW1wZXJhdHVyZSI6IDI4Ljk1LCAicHJlc3N1cmUiOiA5ODEuMDg2Njk1NDU5MzMyNiwgImh1bWlkaXR5IjogNDYuMjE0ODE3NjkyOTEyODgsICJ0aW1lIjogIjIwMjEtMDYtMjUgMTQ6MDM6MjUuNjMxNzk1In0="}
The body contains my sensor data in Base64 format. I've already read about contentType = application/json and contentEncoding = UTF-8 so that Azure can work with proper JSON files. But where do I apply these settings? When I apply them to the routing query, I get the following error:
Routing Query Error (The server didn't understand your query. Check your query syntax and try again)
I just want to get the message body in proper JSON format.
Thank you all for any kind of help! Since it's my first experience with this kind of stuff, I'm a little helpless.
Zero clue if this helps, but here is my code for sending data from a Raspberry Pi (Python) to a Parse Server on AWS using base64/JSON. The only reason I use base64 is to send pictures. You should only have to use JSON to send your other data.
import requests
import random, time
import math
import json
import Adafruit_DHT
import base64
from Adafruit_CCS811 import Adafruit_CCS811
from picamera import PiCamera
from time import sleep

DHT_SENSOR = Adafruit_DHT.DHT22
DHT_PIN = 4
ccs = Adafruit_CCS811()
camera = PiCamera()

while True:
    time.sleep(5)
    camera.start_preview()
    sleep(5)
    camera.capture('/home/pi/Desktop/image.jpg')
    camera.stop_preview()
    with open('/home/pi/Desktop/image.jpg', 'rb') as binary_file:
        binary_file_data = binary_file.read()
        base64_encoded_data = base64.b64encode(binary_file_data)
        base64_message = base64_encoded_data.decode('utf-8')
    humidity, temperature = Adafruit_DHT.read_retry(DHT_SENSOR, DHT_PIN)
    ccs.readData()
    parseServer = {
        "temp": temperature,
        "humid": humidity,
        "co2": ccs.geteCO2(),
        "pic": base64_message
    }
    resultJSON = json.dumps(parseServer)
    headers = {
        'X-Parse-Application-Id': 'myappID',
        'Content-Type': 'application/json',
    }
    data = resultJSON
    response = requests.put('http://1.11.111.1111/parse/classes/Gamefuck/TIuRnws3Ag',
                            headers=headers, data=data)
    print(data)
If you're using the Python SDK for Azure IoT, sending the message as UTF-8 encoded JSON is as easy as setting two properties on your message object. There is a good example here
msg.content_encoding = "utf-8"
msg.content_type = "application/json"
Furthermore, you don't need to change anything in IoT Hub for this. This message setting is a prerequisite to be able to do message routing based on the body of the message.
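For context, a minimal sketch of where those two properties fit when sending telemetry with the azure-iot-device SDK; the connection string and payload values are placeholders:
import json

from azure.iot.device import IoTHubDeviceClient, Message

CONNECTION_STRING = "HostName=<your-hub>.azure-devices.net;DeviceId=RaspberryPi;SharedAccessKey=<key>"

client = IoTHubDeviceClient.create_from_connection_string(CONNECTION_STRING)
payload = {"temperature": 28.95, "pressure": 981.09, "humidity": 46.21}

msg = Message(json.dumps(payload))
# These two properties tell IoT Hub to store and route the body as plain JSON
# instead of Base64.
msg.content_encoding = "utf-8"
msg.content_type = "application/json"

client.send_message(msg)
client.disconnect()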

Import csv file in drf

I'm trying to create a view to import a csv using drf and django-import-export.
My example (I'm doing baby steps and debugging to learn):
class ImportMyExampleView(APIView):
    parser_classes = (FileUploadParser, )

    def post(self, request, filename, format=None):
        person_resource = PersonResource()
        dataset = Dataset()
        new_persons = request.data['file']
        imported_data = dataset.load(new_persons.read())
        return Response("Ok - Babysteps")
But I get this error (using Postman):
Tablib has no format 'None' or it is not registered.
Changing to imported_data = Dataset().load(new_persons.read().decode(), format='csv', headers=False) I get this new error:
InvalidDimensions at /v1/myupload/test_import.csv
No exception message supplied
Does anyone have any tips or can indicate a reference? I'm following this site, but I'm having to "translate" to drf.
Starting with baby steps is a great idea. I would suggest getting a standalone script working first so that you can check that the file can be read and imported.
If you can set breakpoints and step into the django-import-export source, this will save you a lot of time in understanding what's going on.
A sample test function (based on the example app):
def test_import():
    with open('./books-sample.csv', 'r') as fh:
        dataset = Dataset().load(fh)
    book_resource = BookResource()
    result = book_resource.import_data(dataset, raise_errors=True)
    print(result.totals)
You can adapt this so that you import your own data. Once this works OK then you can integrate it with your post() function.
I recommend getting the example app running because it will demonstrate how imports work.
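Putting the two together, a rough sketch of what the post() view could look like once the standalone import works; PersonResource and the 'file' key come from the question, while the UTF-8 decode and dry_run=False are assumptions to adapt:
from rest_framework.parsers import FileUploadParser
from rest_framework.response import Response
from rest_framework.views import APIView
from tablib import Dataset

class ImportMyExampleView(APIView):
    parser_classes = (FileUploadParser,)

    def post(self, request, filename, format=None):
        person_resource = PersonResource()  # resource class from the question
        raw = request.data['file'].read().decode('utf-8')
        # 'csv' must be named explicitly: there is no filename for tablib to
        # guess the format from.
        dataset = Dataset().load(raw, format='csv')
        result = person_resource.import_data(dataset, dry_run=False, raise_errors=True)
        return Response({"totals": result.totals})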
InvalidDimensions means that the dataset you're trying to load doesn't match the format expected by Dataset. Try removing the headers=False arg or explicitly declare the headers (headers=['h1', 'h2', 'h3'] - swap in the correct names for your headers).

YouTube analytics API rows empty

I know this question has been answered before, but I seem to have a different problem. Up until a few days ago, my querying of YouTube never had a problem. Now, however, every time I query data on any video the rows of actual video data come back as a single empty array.
Here is my code in full:
# -*- coding: utf-8 -*-
import os
import google.oauth2.credentials
import google_auth_oauthlib.flow
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
from google_auth_oauthlib.flow import InstalledAppFlow
import pandas as pd
import csv

SCOPES = ['https://www.googleapis.com/auth/yt-analytics.readonly']
API_SERVICE_NAME = 'youtubeAnalytics'
API_VERSION = 'v2'
CLIENT_SECRETS_FILE = 'CLIENT_SECRET_FILE.json'

def get_service():
    flow = InstalledAppFlow.from_client_secrets_file(CLIENT_SECRETS_FILE, SCOPES)
    credentials = flow.run_console()
    # builds api-specific service
    return build(API_SERVICE_NAME, API_VERSION, credentials=credentials)

def execute_api_request(client_library_function, **kwargs):
    response = client_library_function(
        **kwargs
    ).execute()
    print(response)
    columnHeaders = []
    # create a CSV output for video list
    csvFile = open('video_result.csv', 'w')
    csvWriter = csv.writer(csvFile)
    csvWriter.writerow(["views", "comments", "likes", "estimatedMinutesWatched", "averageViewDuration"])

if __name__ == '__main__':
    # Disable OAuthlib's HTTPS verification when running locally.
    # *DO NOT* leave this option enabled when running in production.
    os.environ['OAUTHLIB_INSECURE_TRANSPORT'] = '1'
    youtubeAnalytics = get_service()
    execute_api_request(
        youtubeAnalytics.reports().query,
        ids='channel==UCU_N4jDOub9J8splDAPiMWA',
        # needs to be of form YYYY-MM-DD
        startDate='2018-01-01',
        endDate='2018-05-01',
        metrics='views,comments,likes,dislikes,estimatedMinutesWatched,averageViewDuration',
        dimensions='day',
        filters='video==ZeY6BKqIZGk,YKFWUX9w4eY,bDPdrWS-YUc'
    )
You can see on the Reports: Query documentation page that you need to use the new scope:
https://www.googleapis.com/auth/youtube.readonly
instead of the old one:
https://www.googleapis.com/auth/yt-analytics.readonly
After changing the scope, perform a re-authentication (delete the old credentials) for the new scope to take effect.
This is also confirmed in this forum.
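Applied to the script above, that is a one-line change (plus deleting any cached token so the console flow runs again and asks for consent with the new scope):
# Replace the old analytics-only scope with the broader read-only scope.
SCOPES = ['https://www.googleapis.com/auth/youtube.readonly']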
One possible mishap is choosing the wrong account(s) during the OAuth2 authorisation. For instance, you may have to pick your main account on the first screen, but then on the second screen (during authorisation) pick the brand account, not the main account from the first step, which is also listed on the second screen.
I got the same problem, and replacing the scope with https://www.googleapis.com/auth/youtube.readonly didn't work for me.
(Even when making requests on the API web page, it returned empty rows.)
Instead, using the https://www.googleapis.com/auth/youtube scope works fine in my case.

Using R's Plumber - create GET endpoint to host CSV formatted data rather than JSON

I think this is a good quick demo of R's plumber library in general, but mainly I'm struggling to serve data in CSV format.
I am working with R's plumber package to host an API endpoint for some sports data of mine. Currently I have some data that grabs win totals for MLB baseball teams that I'm trying to serve. Using plumber, I have the following 2 scripts set up:
setupAPI.R: sets up my API with two GET endpoints:
library(plumber)
library(jsonlite)
# load in some test sports data to host
mydata = structure(list(Team = structure(c(8L, 20L, 7L, 28L, 2L, 30L,
23L, 1L, 6L, 19L), .Label = c("Angels", "Astros", "Athletics",
"Blue Jays", "Braves", "Brewers", "Cardinals", "Cubs", "Diamondbacks",
"Dodgers", "Giants", "Indians", "Mariners", "Marlins", "Mets",
"Nationals", "Orioles", "Padres", "Phillies", "Pirates", "Rangers",
"Rays", "Red Sox", "Reds", "Rockies", "Royals", "Tigers", "Twins",
"White Sox", "Yankees"), class = "factor"), GamesPlayed = c(162L,
162L, 162L, 162L, 162L, 162L, 162L, 162L, 162L, 162L), CurrentWins = c(92L,
75L, 83L, 85L, 101L, 91L, 93L, 80L, 86L, 66L)), .Names = c("Team",
"GamesPlayed", "CurrentWins"), row.names = c(NA, 10L), class = "data.frame")
# create a GET request for shareprices (in JSON format)
#* @get /shareprices_json
getSPs <- function(){
  return(toJSON(mydata))
}

# create a GET request for MLB shareprices (in CSV format)
#* @get /shareprices_csv
csvSPs <- function(){
  return(mydata)
}

# run both functions (i think needed for the endpoints to work)
getSPs()
csvSPs()
RunAPI.R: plumbs setupAPI.R and gets the endpoints hosted locally
library(plumber)
r <- plumb("setupAPI.R")
r$run(port=8000)
After I've run the RunAPI.R code in my console and go to the endpoints, my http://127.0.0.1:8000/shareprices_csv endpoint is clearly returning a JSON object, while my http://127.0.0.1:8000/shareprices_json endpoint is, oddly, returning a JSON array of length 1 whose sole element is a string containing the JSON.
In short, I can see now that I should simply return the data frame, not toJSON(the data frame), to have the endpoint serve JSON-formatted data; however, I still do not know how to serve this data in CSV format. Is this possible in plumber? What should the return statement look like in the functions in setupAPI.R? Any help is appreciated!
There are two tricks you need here:
You can bypass serialization on an endpoint by returning the response object directly. More docs here
You can specify the body of the response by mutating res$body.
You can combine these two ideas to create an endpoint like:
#' @get /data.csv
function(res) {
  con <- textConnection("val", "w")
  write.csv(iris, con)
  close(con)
  res$body <- paste(val, collapse="\n")
  res
}
Note that plumber does some nice things for you for free like setting the appropriate HTTP headers for your JSON responses. If you're sending a response yourself, you're on your own for all that, so you'll need to make sure that you set the appropriate headers to teach your API clients how they should interpret this response.
Just posting this answer in case it helps anyone!
The response from Jeff works perfectly, but it becomes very slow when you have to return a big CSV file; I got stuck with a 22 MB file.
A faster solution, if you first write the CSV to disk, is to use the include_file function (docs here):
As an example:
#* @get /iris_csv
getIrisCsv <- function(req, res) {
  filename <- file.path(tempdir(), "iris.csv")
  write.csv(iris, filename, row.names = FALSE)
  include_file(filename, res, "text/csv")
}
So, it depends on your use case:
If you're returning a small csv and you don't want to write it to disk: use Jeff's solution
If your CSV is medium or big (> 2MB) or you already have it on disk: use include_file solution
Hope it helps!