How do I upload JSON data to Cloudant in bulk?

I'm trying to upload a 50 MB JSON file to a Cloudant database. I've tried both curl and a Node.js script, and in both cases I get
{"error":"too_large","reason":"the request entity is too large"}
but if I limit the upload to 1000 documents, it works. What is the fastest way of doing this? Am I missing something?

Cloudant has a document size limit of 1 MB, and each individual request must be smaller than 10 MB. To upload 50 MB of data, the work therefore has to be split across several API calls. I would recommend using the _bulk_docs API to upload 500 documents per call. You can have several calls in flight at any one time.
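
As a rough illustration of the batching described above, the following Python sketch splits a list of documents into _bulk_docs request bodies of 500 documents each. The function names and the size check are assumptions made for the example, not part of any Cloudant client library; the 10 MB figure is simply the limit quoted in the answer.

```python
import json
from typing import Iterable, Iterator, List

BATCH_SIZE = 500                      # documents per _bulk_docs call, as suggested above
MAX_REQUEST_BYTES = 10 * 1024 * 1024  # per-request limit quoted in the answer


def _encode(batch: List[dict]) -> bytes:
    """Serialize one batch as a _bulk_docs body: {"docs": [...]}."""
    body = json.dumps({"docs": batch}).encode("utf-8")
    if len(body) >= MAX_REQUEST_BYTES:
        raise ValueError("batch exceeds the request size limit; use a smaller batch_size")
    return body


def bulk_docs_payloads(docs: Iterable[dict], batch_size: int = BATCH_SIZE) -> Iterator[bytes]:
    """Yield one encoded request body per batch of at most batch_size documents."""
    batch: List[dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == batch_size:
            yield _encode(batch)
            batch = []
    if batch:  # final, partially filled batch
        yield _encode(batch)
```

Each yielded body would then be sent as a POST to the database's _bulk_docs endpoint with a Content-Type of application/json, using whichever HTTP client you prefer. Sending several of these requests concurrently is left out of the sketch.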

Related

Google Drive API /files slow response

I want to ask for help/ideas on the issue I will describe below.
Our iOS app allows users to access their Google Drive files.
We use the Changes API (https://developers.google.com/drive/api/v3/reference/changes). The main precondition for using this API is to build a local DB that holds a snapshot of the user's Drive file tree and the token. To fill the DB initially, we must request the list of all files in the user's Drive. Getting the list of all files (with metadata) takes too long for many of our users. This is the issue I want to address.
We request files with a series of Files requests (https://developers.google.com/drive/api/v3/reference/files/list). Most requests are plain files?q=trashed%20%3D%20false.
For example, on my own private Google Drive:
- 69K files
- the initial request of all files takes 5+ minutes at my current network speed (download 527 Mbps, upload 417 Mbps; ping to www.googleapis.com: 40–45 ms)
- ~150 requests
- each request returns information about ~460 files
- each request takes around 2-2.5 seconds
Sometimes I have seen requests take up to 6 seconds, which means that getting the full file list took 15 minutes on my account.
If I look at the Developer Console, the reported latency is below 0.1 s.
Many of our users have Drives far bigger than mine. A standard iOS app user session is not long enough to complete the initial request. We do save every intermediate page token, so the data received during a single app session is not lost if the user leaves the app; in the next session we continue downloading from the last saved token. But there are still cases where our app needs the DB to be filled with data before it can start certain operations. In those cases our users see a "Pending..." progress indicator, and they complain that our app is slow.
So, questions:
- Is it possible to improve the request speed/latency described above?
- Is there perhaps a quota that we are missing and that can be changed?
- Can someone advise a more effective way of getting the list of all files?
P.S. We could potentially reduce the number of requests. We have to perform some double checks for Shared with Me folders, because we have observed that a request for all files sometimes doesn't list every file in Shared folders. That's a bit of a side story, and I don't think it would dramatically improve the situation for us. I can provide more details on the actual set of requests we perform if necessary.
Are you returning all the fields? I would assume so, since the only query parameter provided is trashed=false. Do you need all of them? Try reducing the request so that it returns only the fields you really care about (using a field mask) and see whether that improves your performance.
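
To make the field-mask suggestion concrete, here is a small Python sketch that builds a files.list URL with the `fields` and `pageSize` parameters set. The particular fields listed are only an assumption about what a local snapshot might need; substitute the ones your DB actually stores. Authentication is omitted.

```python
from urllib.parse import urlencode

FILES_LIST_URL = "https://www.googleapis.com/drive/v3/files"

# Assumed field selection for illustration; trim or extend it to match your DB schema.
FIELD_MASK = "nextPageToken,files(id,name,parents,mimeType,modifiedTime)"


def files_list_url(page_token=None):
    """Build a files.list URL that requests only the masked fields."""
    params = {
        "q": "trashed = false",
        "fields": FIELD_MASK,
        "pageSize": 1000,  # an upper bound; the server may return fewer per page
    }
    if page_token:
        params["pageToken"] = page_token
    return FILES_LIST_URL + "?" + urlencode(params)
```

Whether this shortens the per-request time in a given account has to be measured; the sketch only shows how the request would be shaped.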

Large API call with no pagination

I need to retrieve data from an API source that has a massive number of entries (1800+). The problem is that the source has no pagination or any other way to group the entries. We then save the entries as posts on the site, and the import will run daily through a cron job.
We use curl_init() to retrieve the data from the API source, but we keep getting a 503 error because the request times out. When it works, it retrieves the data as JSON, and we save the important info as metadata and the rest as JSON.
Is there a more reliable way to retrieve the data? On other sites I have worked on, we were able to step through an API page by page in the backend.
You might try saving the JSON to a file first, then running the post creation against the JSON in the file rather than against the direct cURL connection. I ran into similar issues in the past, even with an API that had pagination.
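
The two-stage idea in the answer can be sketched as follows. This is written in Python rather than PHP purely for illustration, and it assumes the API returns a single JSON document; the function names and the timeout value are made up for the example.

```python
import json
import urllib.request


def download_to_file(url, path, timeout=120):
    """Stage 1: copy the raw response body to disk without parsing it."""
    with urllib.request.urlopen(url, timeout=timeout) as response, open(path, "wb") as out:
        while True:
            chunk = response.read(64 * 1024)  # read in 64 KiB pieces
            if not chunk:
                break
            out.write(chunk)


def load_entries(path):
    """Stage 2: parse the saved file; no network connection is open at this point."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)
```

With this split, the slow post-creation step works from the local file, so the remote connection only has to stay open for the duration of the download. A failed import can also be retried from the file without calling the API again.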

Is there a simple way to host a JSON document you can read and update in Google Cloud Platform?

What I'm trying to do is host a JSON document that will then, essentially, serve as a hosted version of json-server. I'm aware I can do something similar with My JSON Server, but I plan to move my entire architecture to GCP so want to get more familiar with it.
At first I looked into the Storage JSON API, but it seems like that's just for getting data about buckets rather than the items in the buckets themselves. I created a bucket called test-json-api and added a test-data.json, but there's seemingly no way to access the data in the JSON file via this API.
I'm trying to keep it as simple as possible for testing purposes. In time, I'll probably use a firestore allocation, but for now I'd like to avoid all that complexity, and instead have a simple GET and a PUT/PATCH to a json file.
The Storage JSON API you are talking about is only for getting and updating the metadata, not for getting and updating the data inside the object. Objects in a Google Cloud Storage bucket are immutable, so one way to update one is to download the object data in your code, modify it, and then upload it to the bucket again.
As you want to deal with JSON files you may explore using Cloud Datastore or Cloud Firestore. Also if you wish to use Firebase then you may explore Firebase Realtime Database.
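
The download, modify, upload cycle mentioned above might look roughly like this in Python. The function takes any object offering `download_as_text()` and `upload_from_string()`, which is what a blob from the google-cloud-storage client library provides; `InMemoryBlob` is a hypothetical stand-in so the logic can be tried without a bucket. The sketch assumes the file holds a JSON object at the top level and does nothing to guard against two writers updating at the same time.

```python
import json


def update_json_object(blob, changes):
    """Download a JSON object, merge `changes` into it, and upload the result.

    `blob` is assumed to behave like google.cloud.storage.Blob.
    """
    data = json.loads(blob.download_as_text())
    data.update(changes)  # shallow merge of top-level keys only
    blob.upload_from_string(json.dumps(data), content_type="application/json")
    return data


class InMemoryBlob:
    """Hypothetical stand-in for a real blob, for trying the function offline."""

    def __init__(self, text):
        self.text = text

    def download_as_text(self):
        return self.text

    def upload_from_string(self, data, content_type="text/plain"):
        self.text = data
```

With the client library installed and credentials configured, the blob for the question's example would be obtained with something like `storage.Client().bucket("test-json-api").blob("test-data.json")`.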
A very quick and dirty way to make it easy to read some of the information in the json doc is to use that information in the blob name. For example the key information in
doc = {'id':3, 'name':'my name', ... }
could be stored in an object called "doc_3_my name", so that it can be read while browsing the bucket. You can then download the right doc if you need to see all the original data. An object name can be up to 1024 bytes of UTF-8 (with some exclusions), which is normally sufficient for surfacing basic information.
Note that you should never store PII like this.
https://cloud.google.com/storage/docs/objects
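
A minimal sketch of the naming trick, assuming the key fields are `id` and `name` as in the example doc. The helper name and the `doc` prefix are illustrative only; the length check reflects the 1024-byte limit mentioned above.

```python
MAX_NAME_BYTES = 1024  # object name limit in bytes of UTF-8, as noted above


def object_name_for(doc, fields=("id", "name"), prefix="doc"):
    """Join selected (non-PII) fields of a document into a browsable object name."""
    parts = [prefix] + [str(doc[field]) for field in fields]
    name = "_".join(parts)
    if len(name.encode("utf-8")) > MAX_NAME_BYTES:
        raise ValueError("object name exceeds 1024 bytes of UTF-8")
    return name
```

For the doc shown earlier, `object_name_for({'id': 3, 'name': 'my name'})` gives `doc_3_my name`.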

Is it possible for firebase to update itself?

I need to provide every user with data from different sites. Each site provides data in JSON format, but some of them limit the maximum number of requests.
My idea for a solution is to download the data to Firebase periodically, so that users then access only the Firebase database.
From the docs, it seems to me that Firebase can somehow make HTTP requests.
Can I use Firebase to periodically update itself via HTTP requests?
Or should I set up a server that will do the task?
I am pretty new to these topics, so any tips on where to look for information would be appreciated.

Asp.net Web API http response data size

I have an ASP.NET Web API that gets a list of data from the database using a very heavy SQL query (a stored procedure) and then serializes it to JSON. The result can sometimes contain more than 100,000 rows, which is beyond the maximum HTTP JSON response size of 4 MB. I've tried using pagination to limit the result size, but it hurts performance, because every time the user clicks on the next page a heavy SQL command is triggered. If I don't use pagination, the result is sometimes larger than 4 MB and my client-side grid won't render properly. I also don't have a way to check the JSON data size before sending it back to the client from the Web API. So my questions are:
- Is there any way to check the data size in ASP.NET Web API before sending it back to the client? For example, if it's more than 4 MB, send a response saying "please modify your date range to have less data"? Would this be a good idea in application design?
- Is there any way to save the entire result in a cache or somewhere else with ASP.NET Web API, so that every time the user paginates, the result comes from the cache rather than from the database again?
- Is there any way to store the entire result in a cache or in a temp file on the client side (using Angular 5), so that when the user paginates, it does not trigger another HTTP call to the Web API?
I would be more than happy to hear about any experience or suggestions from anyone! Thank you very much!