Using GraphQL machinery, but return CSV

A normal REST API might let you request the same data in different formats with a different Accept header, e.g. application/json, text/html, or a text/csv formatted response.
If you're using GraphQL, however, it seems that JSON is the only acceptable return content type. Still, I need my API to be able to return CSV data for consumption by less sophisticated clients that won't understand JSON.
Does it make sense for a GraphQL endpoint to return CSV data if given an Accept: text/csv header? If not, is there a better-practice way to do this?
This is more of a conceptual question, but I'm specifically using Graphene to implement my API. Does it provide any mechanism for handling custom content types?

Yes, you can, but it's not built in; you have to override a few things, so it's more of a workaround. Take these steps and you will get CSV output:
Add csv = graphene.String() to your query type and resolve it to whatever you want (a minimal schema sketch follows the code below).
Create a new class inheriting from GraphQLView.
Override its dispatch function to look like this:
import csv
import json

import pandas as pd
from django.http import HttpResponse
from graphene_django.views import GraphQLView
from graphql import GraphQLError


class CustomGraphqlView(GraphQLView):
    def dispatch(self, request, *args, **kwargs):
        response = super(CustomGraphqlView, self).dispatch(request, *args, **kwargs)
        try:
            data = json.loads(response.content.decode('utf-8'))
            if 'csv' in data['data']:
                data['data'].pop('csv')
                # Only a single-model query can be flattened unambiguously
                if len(list(data['data'].keys())) == 1:
                    model = list(data['data'].keys())[0]
                else:
                    raise GraphQLError("can not export to csv")
                data = pd.json_normalize(data['data'][model])
                response = HttpResponse(content_type='text/csv')
                response['Content-Disposition'] = 'attachment; filename="output.csv"'
                writer = csv.writer(response)
                writer.writerow(data.columns)
                for value in data.values:
                    writer.writerow(value)
        except GraphQLError as e:
            raise e
        except Exception:
            pass
        return response
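For the first step, the schema addition might look roughly like this. This is only a sketch: ItemType is a hypothetical example type, and the resolver body barely matters because the dispatch override above pops the csv field and rebuilds the response anyway.
import graphene

class Query(graphene.ObjectType):
    items = graphene.List(ItemType)  # ItemType is a hypothetical example type
    csv = graphene.String()

    def resolve_csv(root, info):
        # Placeholder; the dispatch override replaces this value
        # with real CSV built from the rest of the payload.
        return ""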
Import all the necessary modules (included at the top of the snippet above).
Replace the default GraphQLView in your urls.py file with your new view class.
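For reference, that urls.py change might look like this (a minimal sketch; the "graphql/" path and the graphiql flag are assumptions):
# urls.py
from django.urls import path
from django.views.decorators.csrf import csrf_exempt

from .views import CustomGraphqlView

urlpatterns = [
    path("graphql/", csrf_exempt(CustomGraphqlView.as_view(graphiql=True))),
]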
Now if you include "csv" in your GraphQL query, it will return raw CSV data, which you can then save to a .csv file on your front end. A sample query looks like this:
query {
  items {
    id
    name
    price
    category {
      name
    }
  }
  csv
}
Remember that this gives you the raw data in CSV format and you have to save it yourself. You can do that in JavaScript with the following code:
req.then(data => {
    let element = document.createElement('a');
    element.setAttribute('href', 'data:text/csv;charset=utf-8,' + encodeURIComponent(data.data));
    element.setAttribute('download', 'output.csv');
    element.style.display = 'none';
    document.body.appendChild(element);
    element.click();
    document.body.removeChild(element);
})
This approach flattens the JSON data so no data is lost.
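To illustrate the flattening, here is what pd.json_normalize does to a nested row shaped like the sample query above (the row values are made up):
import pandas as pd

rows = [{"id": 1, "name": "Pen", "price": 2.5, "category": {"name": "Office"}}]
print(pd.json_normalize(rows))
#    id name  price category.name
# 0   1  Pen    2.5        Office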

I had to implement exporting a list query to a CSV file. Here is how I implemented it, extending @Sina's method.
My GraphQL query for retrieving a list of users (with limit/offset pagination) is:
query userCsv {
  userCsv {
    csv
    totalCount
    results(limit: 50, offset: 50) {
      id
      username
      email
      userType
    }
  }
}
Make a CustomGraphQLView by inheriting from GraphQLView and override the dispatch function to check whether the query contains csv. Also make sure you update the graphql URL to point to this custom view.
import json
from datetime import datetime

from django.conf import settings
from django.http import HttpResponse
from graphene_django.views import GraphQLView
from graphql import GraphQLError
from pandas import json_normalize


class CustomGraphQLView(GraphQLView):
    def dispatch(self, request, *args, **kwargs):
        try:
            query_data = super().parse_body(request)
            operation_name = query_data["operationName"]
        except Exception:
            operation_name = None
        response = super().dispatch(request, *args, **kwargs)
        csv_made = False
        try:
            data = json.loads(response.content.decode('utf-8'))
            try:
                csv_query = data['data'][f"{operation_name}"]['csv']
                csv_query = True
            except Exception:
                csv_query = None
            if csv_query:
                csv_path = f"{settings.MEDIA_ROOT}/csv_{datetime.now()}.csv"
                results = data['data'][f"{operation_name}"]['results']
                results = json_normalize(results)
                results.to_csv(csv_path, index=False)
                # Attach the file's path to the response in place of the csv field
                data['data'][f"{operation_name}"]['csv'] = csv_path
                csv_made = True
        except GraphQLError as e:
            raise e
        except Exception:
            pass
        if csv_made:
            return HttpResponse(
                status=200, content=json.dumps(data), content_type="application/json"
            )
        return response
The operation name is the name of the query you are calling; in the example above it is userCsv. It is required because the query's results come back under that key in the response. The response obtained is a Django HttpResponse object. Using the operation name, we check whether csv is present in the query: if it is not, the response is returned as is; if it is, we extract the query results, write them to a CSV file, store it, and attach the file's path to the response.
Here is the GraphQL schema for the query:
class UserListCsvType(DjangoListObjectType):
    csv = graphene.String()

    class Meta:
        model = User
        pagination = LimitOffsetGraphqlPagination(default_limit=25, ordering="-id")


class DjangoListObjectFieldUserCsv(DjangoListObjectField):
    @login_required
    def list_resolver(self, manager, filterset_class, filtering_args, root, info, **kwargs):
        return super().list_resolver(manager, filterset_class, filtering_args, root, info, **kwargs)


class Query(graphene.ObjectType):
    user_csv = DjangoListObjectFieldUserCsv(UserListCsvType)
Here is the sample response:
{
  "data": {
    "userCsv": {
      "csv": "/home/shishir/Desktop/sample-project/media/csv_2021-11-22 15:01:11.197428.csv",
      "totalCount": 101,
      "results": [
        {
          "id": "51",
          "username": "kathryn",
          "email": "candaceallison@gmail.com",
          "userType": "GUEST"
        },
        {
          "id": "50",
          "username": "bridget",
          "email": "hsmith@hotmail.com",
          "userType": "GUEST"
        },
        {
          "id": "49",
          "username": "april",
          "email": "hoffmanzoe@yahoo.com",
          "userType": "GUEST"
        },
        {
          "id": "48",
          "username": "antonio",
          "email": "laurahall@hotmail.com",
          "userType": "PARTNER"
        }
      ]
    }
  }
}
PS: The data above was generated with the faker library. I'm using graphene-django-extras, and json_normalize comes from pandas. The CSV file can be downloaded from the path obtained in the response.
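One practical note: the csv field above holds a server filesystem path, not a URL, so clients can't download it directly. A minimal sketch of one way to fix that, assuming the usual Django media setup (MEDIA_ROOT and MEDIA_URL configured and media files served):
import os
from django.conf import settings

# Inside the dispatch override, after results.to_csv(csv_path, index=False),
# attach a browser-reachable URL instead of the raw filesystem path:
rel_path = os.path.relpath(csv_path, settings.MEDIA_ROOT)
data['data'][f"{operation_name}"]['csv'] = settings.MEDIA_URL + rel_path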

GraphQL relies on (and shines because of) returning nested data, while to my understanding CSV can only represent flat key-value pairs. This makes CSV not really suitable for GraphQL responses.
I think the cleanest way to achieve what you want would be to put a GraphQL client in front of your clients:
+------+  csv  +-------+  http/json  +------+
|client|<----->|adapter|<----------->|server|
+------+       +-------+             +------+
The good thing here is that your adapter only has to translate the queries that it itself specifies into CSV.
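As a rough illustration, the adapter could be a small script or endpoint that issues a fixed query and flattens the result. A sketch, assuming requests and pandas are installed (the endpoint URL and query are placeholders):
import requests
import pandas as pd

GRAPHQL_URL = "https://example.com/graphql"  # placeholder endpoint
# A query the adapter itself controls, so it knows the shape in advance:
ITEMS_QUERY = "{ items { id name price category { name } } }"

def items_as_csv():
    resp = requests.post(GRAPHQL_URL, json={"query": ITEMS_QUERY})
    resp.raise_for_status()
    rows = resp.json()["data"]["items"]
    # json_normalize turns the nested category object into a
    # "category.name" column, so nothing is silently dropped
    return pd.json_normalize(rows).to_csv(index=False)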
Obviously you might not always be able to do that (though if you can't control the clients, how are you getting them to send GraphQL queries in the first place?). Alternatively, you could build a middleware that translates JSON to CSV, but then you have to deal with the whole GraphQL specification. Good luck translating this response:
{
  "__typename": "Query",
  "someUnion": [
    { "__typename": "UnionA", "numberField": 1, "nested": [1, 2, 3, 4] },
    { "__typename": "UnionB", "stringField": "str" }
  ],
  "otherField": 123.34
}
So if you can't get around transporting CSV over HTTP, GraphQL is simply the wrong choice: it was not built for that. And if you disallow the GraphQL features that are hard to translate to CSV, you no longer have GraphQL, so there is no point in calling it GraphQL.

Related

How to convert a fetch return object to csv from a fetch API in C3.ai COVID-19 datalake?

I am making some fetch API calls to the C3.ai COVID-19 data lake. How best can I convert the result to a CSV for easier reading? For reference, I am running the sample code below:
import requests, json

url = "https://api.c3.ai/covid/api/1/outbreaklocation/fetch/"

request_data = {
    "spec": {
        "include": "id,name,population2018",
        "limit": 500
    }
}

headers = {
    "Accept": "application/json",
    "Content-Type": "application/json"
}

response = requests.post(url=url, json=request_data, headers=headers)
fetch_object = json.loads(response.text)
fetch_object is now a Python dict, but I would like to convert it to a CSV. How do I do that generically? I could fetch one or more fields, as specified in the include field of the spec argument.
import pandas as pd

def convert_fetchResult_to_Pandas(fetch_object, required_fields):
    fetch_objs = fetch_object["objs"]
    df = pd.read_json(json.dumps(fetch_objs))
    return df[required_fields]
One can then call:
df = convert_fetchResult_to_Pandas(fetch_object, ["id", "name", "population2018"])
csv_string = df.to_csv()
Using pd.json_normalize(fetch_object['objs']) instead of pd.read_json(json.dumps(fetch_object['objs'])) may be worth considering too, depending on what you're after. It will flatten any nested dicts in the returned objects and separate the nesting levels in the column names with dots.
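Something along these lines, reusing the fetch_object from the snippet above:
import pandas as pd

df = pd.json_normalize(fetch_object["objs"])
csv_string = df.to_csv(index=False)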
Consider trying the open-source repo c3covid19. You can find the docs here. It is an unofficial C3 COVID-19 data lake connection wrapper for Python.
Install
pip install c3covid19
Run
from c3covid19 import c3api

cnx = c3api()

request_data = {
    "spec": {
        "include": "id,name,population2018",
        "limit": 500
    }
}

output = cnx.request(
    data_type='outbreaklocation',
    parameters=request_data,
    api='fetch',
    output_type='csv',
    outfile='./output'
)

How to take info from sub-object in request before validating in Django?

I'm writing an API in Django, for which I use the django-rest-framework. I've got a simple model as follows:
class PeopleCounter(models.Model):
    version = models.CharField(max_length=10)
    timestamp = models.DateTimeField(db_index=True)
    sensor = models.CharField(max_length=10)
    count = models.IntegerField()
And I've got a serializer as follows:
class PeopleCounterSerializer(serializers.HyperlinkedModelSerializer):
    class Meta:
        model = PeopleCounter
        fields = [
            'version',
            'timestamp',
            'sensor',
            'count',
        ]
When I post the following data to this endpoint it works great:
{
  "version": "v1",
  "timestamp": "2019-04-01T20:00:00.312",
  "sensor": "sensorA",
  "count": 4
}
but unfortunately I need to adjust the endpoint for the data to arrive as follows:
{
  "version": "v1",
  "timestamp": "2019-04-01T20:00:00.312",
  "data": {
    "sensor": "sensorA",
    "count": 4
  }
}
I thought I needed to add a create method to the serializer class, so I tried that, but when I post the JSON with the "data" object I get a message saying that the sensor and count fields are required.
Where can I normalize this data so that I can insert it in the database correctly?
Also, what if I want to serve the data through the same endpoint like this as well, where would I be able to define that?
One possible way is to implement it at the view level. If you are using CBVs, override get_serializer, something like this:
def get_serializer(self, *args, **kwargs):
    request_body = kwargs.get("data")  # obtain request body
    data = request_body.get("data")    # get nested data
    request_body.update(data)          # add nested keys as request_body attributes
    kwargs["data"] = request_body      # override received request_body with updated one
    serializer_class = self.get_serializer_class()
    kwargs['context'] = self.get_serializer_context()
    return serializer_class(*args, **kwargs)
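If you'd rather keep the view untouched, the same unwrapping can live on the serializer. to_internal_value and to_representation are standard DRF hooks; the exact wrapping below is an assumption based on the payloads in the question, and it also covers serving the data back in the same nested shape:
class PeopleCounterSerializer(serializers.HyperlinkedModelSerializer):
    class Meta:
        model = PeopleCounter
        fields = ['version', 'timestamp', 'sensor', 'count']

    def to_internal_value(self, data):
        # Unwrap the nested "data" envelope on input
        data = dict(data)
        data.update(data.pop('data', {}))
        return super().to_internal_value(data)

    def to_representation(self, instance):
        # Re-wrap sensor and count under "data" on output
        rep = super().to_representation(instance)
        return {
            'version': rep['version'],
            'timestamp': rep['timestamp'],
            'data': {'sensor': rep['sensor'], 'count': rep['count']},
        }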

How to write DRF serializer to handle JSON wrappers enveloping data fields defined in my models?

I am writing a REST API using Django Rest Framework and need to know how to write a serializer to handle this JSON request:
{
  "user_form": {
    "fields": [
      {"email": "tom.finet@hotmail.co.uk"},
      {"password": "password"},
      {"profile": {
        "username": "Tom Finet",
        "bio": "I like running, a lot.",
        "location": "Switzerland"
      }}
    ]
  }
}
Models exist for both the User and Profile objects, so I am using a ModelSerializer to make serialization easier. However, the relevant user and profile data is wrapped in a user_form and fields envelope, so when I make a POST request to create a user, the server spits back status code 400 with a BadRequest.
Here are the User and Profile serializers
class ProfileSerializer(serializers.ModelSerializer):
    class Meta:
        model = Profile
        fields = '__all__'


class UserSerializer(serializers.ModelSerializer):
    profile = ProfileSerializer()

    class Meta:
        model = User
        fields = ('email', 'password', 'profile')

    def create(self, validated_data):
        email_data = validated_data.pop('email')
        password_data = validated_data.pop('password')
        created, user = User.objects.get_or_create_user(
            email=email_data,
            password=password_data
        )
        return user
Here is the api create view:
def create(self, request):
    user_serializer = UserSerializer(data=request.data)
    if user_serializer.is_valid(raise_exception=True):
        user_serializer.save()
        return Response(
            user_serializer.data,
            status=status.HTTP_201_CREATED
        )
What I want is for the serializers to create a user from the JSON request specified, but I am unsure how to handle the envelopes wrapping the user and profile data.
Following on from my comment, consider modifying your POST payload (client side) as follows:
{
  "email": "tom.finet@hotmail.co.uk",
  "password": "password",
  "profile": {
    "username": "Tom Finet",
    "bio": "I like running, a lot.",
    "location": "Switzerland"
  }
}
Following this, your current serializer classes should suffice.
If it's not possible to change the POST payload on the client, consider flattening it with the following comprehension to instantiate your serializer manually within your view:
serializer = UserSerializer(data={
    k: v
    for d in request.data.get('user_form').get('fields')
    for k, v in d.items()
})

if not serializer.is_valid():
    # handle invalid serializer error
    pass

# save the new model
serializer.save()
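For the payload in the question, that comprehension merges the one-key dicts in fields into a single flat dict; a quick standalone check (payload copied from above):
payload = {
    "user_form": {
        "fields": [
            {"email": "tom.finet@hotmail.co.uk"},
            {"password": "password"},
            {"profile": {"username": "Tom Finet",
                         "bio": "I like running, a lot.",
                         "location": "Switzerland"}},
        ]
    }
}

flat = {k: v for d in payload["user_form"]["fields"] for k, v in d.items()}
# flat now has top-level "email", "password" and "profile" keys,
# which is exactly the shape the serializers above expect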

Get rid of Mongo $ signs in JSON

I am building a Python backend for an SPA (Angular) using MongoDB.
Here is what I use: Python 3.4, MongoDB 3, Flask, flask-mongoengine and flask-restful.
Now I receive the following JSON from my backend:
[
  {
    "_id": {
      "$oid": "55c737029380f82fbf52eec3"
    },
    "created_at": {
      "$date": 1439129906376
    },
    "desc": "Description.....",
    "title": "This is title"
  },
  etc...
]
And I want to receive something like that:
[
  {
    "_id": "55c737029380f82fbf52eec3",
    "created_at": 1439129906376,
    "desc": "Description.....",
    "title": "This is title"
  },
  etc...
]
My code for now:
from flask import json
from vinnie import app
from flask_restful import Resource, Api
from vinnie.models.movie import Movie

api = Api(app)

class Movies(Resource):
    def get(self):
        movies = json.loads(Movie.objects().all().to_json())
        return movies

api.add_resource(Movies, '/movies')
Model:
import datetime
from vinnie import db

class Movie(db.Document):
    created_at = db.DateTimeField(default=datetime.datetime.now, required=True)
    title = db.StringField(max_length=255, required=True)
    desc = db.StringField(required=True)

    def __unicode__(self):
        return self.title
What is the best way to format convenient JSON for front-end?
If you are confident you want to get rid of all the similar cases, then you can certainly write code that matches that pattern. For example:
info = [
    {
        "_id": {
            "$oid": "55c737029380f82fbf52eec3"
        },
        "created_at": {
            "$date": 1439129906376
        },
        "desc": "Description.....",
        "title": "This is title"
    },
    # etc...
]

def fix_array(info):
    ''' Change out dict items in the following case:
          - dict value is another dict
          - the sub-dictionary only has one entry
          - the key in the sub-dictionary starts with '$'
        In this specific case, one level of indirection
        is removed, and the dict value is replaced with
        the sub-dict value.
    '''
    for item in info:
        for key, value in item.items():
            if not isinstance(value, dict) or len(value) != 1:
                continue
            (subkey, subvalue), = value.items()
            if not subkey.startswith('$'):
                continue
            item[key] = subvalue

fix_array(info)
print(info)
This prints:
[{'title': 'This is title', 'created_at': 1439129906376, 'desc': 'Description.....', '_id': '55c737029380f82fbf52eec3'}]
Obviously, reformatting that as JSON is trivial.
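For example:
import json

print(json.dumps(info, indent=2))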
I found a neat solution to my problem in flask-restful extension which I use.
It provides fields module.
Flask-RESTful provides an easy way to control what data you actually render in your response. With the fields module, you can use whatever objects (ORM models/custom classes/etc.) you want in your resource. fields also lets you format and filter the response so you don’t have to worry about exposing internal data structures.
It’s also very clear when looking at your code what data will be rendered and how it will be formatted.
Example:
from flask_restful import Resource, fields, marshal_with

resource_fields = {
    'name': fields.String,
    'address': fields.String,
    'date_updated': fields.DateTime(dt_format='rfc822'),
}

class Todo(Resource):
    @marshal_with(resource_fields, envelope='resource')
    def get(self, **kwargs):
        return db_get_todo()  # Some function that queries the db
Flask-RESTful Output Fields Documentation

Bootstrap Backbone.js models from django model with TastyPie

I am using Django + TastyPie + Backbone.js. I am trying to figure out how to bootstrap the initial data for the models on the first request, instead of fetching them after the initial page load as recommended here: http://backbonejs.org/#FAQ-bootstrap
There are two issues: one is trying to load a single Django model, and the other is trying to serialize a queryset.
As an example, I have a variable called user in the context which is a Django User and represents the current user who is logged in.
I do not want to do this:
var curUser = new App.Models.User({id: 1});
curUser.fetch();
and cause another request to the server, since I already have the user model loaded.
I want to boostrap this data into a Backbone model like:
var curUser = new App.Models.User({{user|json}});
(similar to How to load bootstrapped models of backbone in Django, but I do not want to special-case each view by converting everything to JSON)
where I have created a custom template filter to convert to JSON:
def json(object):
    """Return json string for object or queryset."""
    if isinstance(object, QuerySet):
        return mark_safe(serialize('json', object))
    if isinstance(object, Model):
        object = object.to_dict()
    return mark_safe(simplejson.dumps(object))

register.filter('json', json)
The issue is that if I serialize a Django model, I get something that looks like this:
[{"pk": 1, "model": "auth.user", "fields": {"username": "user@gmail.com", "first_name": "Jeremy", "last_name": "Keeshin", "is_active": true, "is_superuser": false, "is_staff": false, "last_login": "2013-07-15T22:31:02", "groups": [], "user_permissions": [], "password": "passwordhash", "email": "user@gmail.com", "date_joined": "2012-06-14T00:59:18"}}]
What I really want is for the JSON representation to match the API that I've defined using TastyPie:
class UserResource(ModelResource):
    """A resource for the User model."""
    class Meta:
        queryset = User.objects.all()
        resource_name = 'user'
        authorization = Authorization()
        fields = ['username', 'first_name', 'last_name', 'id']
        filtering = {
            'username': ALL,
            'email': ALL,
            'first_name': ALL,
            'last_name': ALL
        }
Here I only get a few fields passed back, not all of them, when I serialize the model. I know Django lets you serialize only certain fields, but those fields are set on a per-model basis, and I wouldn't want to include that call in every view.
I see a similar answer in Django Serialize Queryset to JSON to construct RESTful response with only field information and id, but that also requires writing the call in every view.
My current solution is a to_dict method monkey-patched onto the User model:
def to_dict(self):
    """Return a subset of fields as a dictionary."""
    return {
        'id': self.id,
        'first_name': self.first_name,
        'last_name': self.last_name,
        'email': self.email
    }
because this is easy to serialize. Additionally, a Django model instance cannot be serialized by itself; only querysets (or other iterables) can be serialized, so a single instance has to be wrapped in a list.
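For what it's worth, the wrap-in-a-list workaround is short; Django's serializer just wants an iterable of model instances (user here is the context variable from the question):
from django.core import serializers

# Wrapping the single instance in a list is enough:
json_str = serializers.serialize('json', [user])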
I imagine lots of people have figured out good ways to bootstrap their Django model data into Backbone models on the initial page load (especially when working with TastyPie), but I haven't found a reasonable way to do this.
With querysets there is additionally much more Django-internal info passed along, and I am trying to figure out a way to have the serialization match the output of the TastyPie API.
Does anyone have best practices for bootstrapping Django models and querysets into Backbone models?
Update:
I have changed my json filter to use TastyPie bundles, like so:
from core.api import UserResource
from core.api import UserProfileResource
from tastypie.serializers import Serializer

def json(object):
    """Return json string for object or queryset."""
    TYPE_TO_RESOURCE = {
        'User': UserResource,
        'UserProfile': UserProfileResource
    }
    Resource = TYPE_TO_RESOURCE[object.__class__.__name__]
    r = Resource()
    bundle = r.build_bundle(object)
    r.full_dehydrate(bundle)
    s = Serializer()
    return mark_safe(s.serialize(bundle, 'application/json'))
Source: http://django-tastypie.readthedocs.org/en/latest/cookbook.html#using-your-resource-in-regular-views
This seems to be closer: it works with single models and stays DRY with the fields I have listed in TastyPie, but it does not handle querysets of multiple models yet. I'm also not sure whether it is problematic to have this in a filter.
Update
Also use this: https://gist.github.com/1568294/4d4007edfd98ef2536db3e02c1552fd59f059ad8
def json(object):
    """Return json string for object or queryset."""
    TYPE_TO_RESOURCE = {
        'User': UserResource,
        'UserProfile': UserProfileResource,
    }
    if isinstance(object, QuerySet):
        Resource = TYPE_TO_RESOURCE[object[0].__class__.__name__]
        r = Resource()
        bundles = [r.build_bundle(model) for model in object]
        bundle = [r.full_dehydrate(b) for b in bundles]
    elif isinstance(object, Model):
        Resource = TYPE_TO_RESOURCE[object.__class__.__name__]
        r = Resource()
        bundle = r.build_bundle(object)
        r.full_dehydrate(bundle)
    else:
        # Note: the original snippet was missing this return
        return mark_safe(simplejson.dumps(object))
    s = Serializer()
    return mark_safe(s.serialize(bundle, 'application/json'))