YouTube API: Get subscriber counts of a million channels - mysql

Scenario: a MySQL database with a million entries, containing YouTube channel IDs and subscriber counts. I would like to update the subscriber counts periodically (maybe once a week).
I am wondering if such large requests are even possible with the API. If so, what would be the most efficient way to get the subscriber counts of a million channels? If not, can you think of a workaround?
Thank you very much!

I think you may want to consider doing them in batches. Check the Google Developer Console to see what your current quota for the API is. If there is a pencil icon next to the quota, you can increase it.
Also, checking the YouTube Data API (v3) Quota Calculator might help you decide how many to run each day.
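To make the batching idea concrete, here is a minimal sketch in Python. It assumes that channels.list accepts up to 50 comma-separated channel IDs per call, that each call costs 1 quota unit, and that the response carries the count under items[].statistics.subscriberCount. Check all three against the current API reference and the quota calculator before relying on them. The function names are made up for illustration.

```python
from typing import Dict, Iterator, List

# Assumptions to verify against the YouTube Data API docs and quota calculator:
MAX_IDS_PER_CALL = 50   # assumed maximum number of IDs per channels.list call
UNITS_PER_CALL = 1      # assumed quota cost of one channels.list call
CHANNELS_URL = "https://www.googleapis.com/youtube/v3/channels"


def batches(channel_ids: List[str], size: int = MAX_IDS_PER_CALL) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` channel IDs."""
    for start in range(0, len(channel_ids), size):
        yield channel_ids[start:start + size]


def build_params(batch: List[str], api_key: str) -> Dict[str, str]:
    """Query parameters for one batched request to CHANNELS_URL."""
    return {"part": "statistics", "id": ",".join(batch), "key": api_key}


def parse_counts(response: dict) -> Dict[str, int]:
    """Map channel ID -> subscriber count, skipping channels that hide it."""
    counts = {}
    for item in response.get("items", []):
        raw = item.get("statistics", {}).get("subscriberCount")
        if raw is not None:
            counts[item["id"]] = int(raw)
    return counts


def days_needed(total_channels: int, daily_quota: int,
                size: int = MAX_IDS_PER_CALL, units: int = UNITS_PER_CALL) -> int:
    """Days required to refresh every channel once under a daily quota."""
    calls = -(-total_channels // size)          # ceiling division
    return -(-(calls * units) // daily_quota)   # ceiling division
```

Under those assumptions a million channels need 20,000 calls, so with a daily quota of 10,000 units one full refresh takes two days, which fits comfortably into a weekly update. Each batch would be sent as a GET request to CHANNELS_URL with the parameters from build_params, and the parsed counts written back to MySQL.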

Related

Discrepancies in STEPS data between GoogleFit App and GoogleFit API

We are a digital health company developing an API that gives other companies, such as health insurers, access to health data. One of our customers complained that there are discrepancies between the daily step counts shown in the Google Fit app and the data we provided for some of his users. However, when we queried the Google Fit API through Postman via https://www.googleapis.com/fitness/v1/users/me/dataset:aggregate , we received the same value we had provided. Hence, the discrepancy is between the Google Fit app and the Google Fit API. The difference in step counts between the API and the app is significant and can reach up to 9262 steps. Moreover, these discrepancies were observed for multiple users and over several days.
I read here https://developers.google.com/fit/faq#how_do_i_get_the_same_values_step_count_calories_distance_etc_as_the_google_fit_app (and I also checked related issues on Stack Overflow, e.g., https://stackoverflow.com/questions/69030278/google-fit-rest-apis-giving-incorrect-steps-count ) that these discrepancies might be related to syncing delays. However, the user synced again, and the data in the app did not change. Moreover, the user doesn’t use multiple devices, and the discrepancies seem to be permanent rather than temporary.
Can you please look into this and let us know what might be causing these discrepancies? Is there any troubleshooting we or the user can do to resolve them?
Thank you :)

Active Collab 5 Webhooks / Maintaining "metric" data

I have an application I am working on that basically takes the data from Active Collab and creates reports / graphs out of the data. The API itself is insufficient to get the proper data on a per request basis so I resorted to pulling the data down into a separate data set that can be queried more efficiently.
So in order to avoid needing to query the entire API constantly I decided to make use of webhooks in order to make the transformations to the relevant data and lower the need to resync the data.
However I notice not all events are sent, notably the following.
TaskListUpdated
MemberUpdated
TimeRecordUpdated
ProjectUpdated
There are probably more, but these are the main ones I have noticed so far.
Time records are probably the most important; their absence from webhooks means that almost any application that needs time record data has a good chance of holding incorrect data. It's fairly common to make a typo in a time record and then adjust it later.
So am I missing anything here? Is there some way to see these events reliably?
EDIT:
In order to avoid a long comment to Ilija I am putting the bulk here.
"Webhooks apart, what information do you need to pull? API that powers time tracking reports can do all sorts of cross project filtering, so your approach to keep a separate database may be an overkill."
Basically we are doing a multi-variable tiered time report. It can be sorted / grouped by any conceivable method you may want to look at.
http://www.appsmagnet.com/product/time-reports-plus/
This is the closest to what we are trying to do, back when we used Active Collab 4 this did the job, but even with it we had to consolidate it in our own spreadsheets.
So the idea of this is to better integrate our Active Collab data into our own workflow.
So the main data we are looking for in this case is
Job Types
Projects
Task Lists
Tasks
Time Records
Categories
Members / Clients
Companies
These items can feed not only our reports, but many other aspects of our company as well. For us Active Collab is the point of truth, so we want the data quickly accessible and fully query-able.
So I have set up a sync system that initially grabs all the data it can from Active Collab and then uses a mix of cron jobs and webhooks to keep it up to date.
Cron jobs work well for everything that does not have "sub items". For the items that do (projects, tasks, task lists, time records) I need to rely on webhooks, since syncing them takes too much time to keep them up to date in real time.
For the webhooks, I noticed that the events listed above do not come through. For time records I figured out a workaround, described in my answer, and members can be handled through cron. That leaves task list and project updates as the only two of real concern. Projects are fairly important because the budget can change and is used in reports; task lists have start / end dates that could be used as well. Since constantly going through every project / task list to see if something changed is really not a great idea, I am looking for a way to reliably see updates for them.
I have based this system on https://developers.activecollab.com/api-documentation/ but I know there are at least a few end points that are not listed.
Cross-project time-record filtering using Active Collab 5 API
This question is actually from another developer on the same system (and also shows a TrackingFilter report not listed in the docs). Due to issues with maintaining an accurate set of data we had to adapt it. I actually notice that you (Ilija) are the person replying and did recommend we move over to this style of system.
This is not a total answer but a way to solve the issue with TimeRecordUpdated not going through the webhook.
There is another API endpoint, /whats-new. This endpoint describes changes for the last day or so, and it has a category called TrackingObjectUpdatedActivityLog, which refers to an updated time record.
So I set up a cron job to check this fairly consistently and manually push the TimeRecordUpdated event through my system to keep it consistent.
For MemberUpdated since the data for a member being updated is unlikely to affect much, having a daily cron for checking the users seems good enough.
ProjectUpdated could technically be handled the same way, but in the absence of TaskListUpdated that leads to far too many API calls to sync the data. Unfortunately, I have not found a solution for this yet.

Amazon API submitting requests too quickly

I am creating a games comparison website and would like to get Amazon prices included within it. The problem I am facing is using their API to get the prices for the 25,000 products I already have.
I am currently using ItemLookup from Amazon's API and have it working to retrieve the price. However, after about 10 results I get an error saying 'You are submitting requests too quickly. Please retry your requests at a slower rate'.
What is the best way to slow down the request rate?
Thanks,
If your application is trying to submit requests that exceed the maximum request limit for your account, you may receive error messages from Product Advertising API. The request limit for each account is calculated based on revenue performance. Each account used to access the Product Advertising API is allowed an initial usage limit of 1 request per second. Each account will receive an additional 1 request per second (up to a maximum of 10) for every $4,600 of shipped item revenue driven in a trailing 30-day period (about $0.11 per minute).
From Amazon API Docs
If you're just planning on running this once, then simply sleep for a second in between requests.
If this is something you're planning on running more frequently, it'd probably be worth optimising it further by subtracting the time the query takes to return from that sleep (so, if my API query takes 200ms to come back, we only sleep for 800ms).
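A minimal sketch of that idea in Python, assuming a limit of one request per second. The name run_throttled and the call argument (whatever function performs a single lookup) are hypothetical, not part of Amazon's API. The clock and sleep functions are parameters only so the loop can be exercised without really waiting.

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_throttled(items: Iterable[T],
                  call: Callable[[T], R],
                  min_interval: float = 1.0,
                  clock: Callable[[], float] = time.monotonic,
                  sleep: Callable[[float], None] = time.sleep) -> List[R]:
    """Call `call(item)` for each item, keeping request starts at least
    `min_interval` seconds apart."""
    results = []
    for item in items:
        started = clock()
        results.append(call(item))
        # Subtract the time the request itself took from the pause.
        remaining = min_interval - (clock() - started)
        if remaining > 0:
            sleep(remaining)
    return results
```

With a lookup that takes 200ms, each pause comes out at roughly 800ms, so the loop still issues about one request per second. If the request takes longer than the interval, no pause is added at all.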
Since the error only appears after about 10 results, you should check how many requests you can make before it shows up. If it always appears after 10 fast requests, you could use
wait(500)
or a few more ms. If it only happens after 10 requests, you could build a loop and do this on every 9th request.
If many of your requests are repeated, you can cache the results and clear the cache every day.
Or contact AWS about purchasing a higher request allowance.
I ran into the same problem even when I put a delay of 1 or more seconds between requests.
I believe that when you make too many requests with only a one-second delay, Amazon doesn't like it and thinks you're a spammer.
You'll have to generate another key pair (and use it for further requests) and use a delay of 1.1 seconds to be able to make fast requests again.
This worked for me.

How many data points can the Google Maps API show at one time?

I have a large volume of data (100,000 points). Is it possible to show all the data points at the same time? Someone said that the Google Maps API 3 cannot load more than 3000 data points. Is this true? How many data points can it show at one time?
You might want to take a look at this article from the Google Geo APIs team, discussing various strategies to display a large number of markers on a map.
Storing the data using Fusion Tables in particular might be an interesting solution.

response time google spreadsheet with large number of rows

I'm developing a web application that will use a google spreadsheet as a database.
This will mean parsing up to 30,000 rows (a guesstimate) in regular operations such as searching for IDs.
I'm worried about the response times I will be looking at. Does anybody have experience with this? I don't want to waste my time on something that will hit a dead end because of an issue like that.
Thanks in advance
Using spreadsheets as a database for this data set is probably not a good idea. Do you already have this spreadsheet set up?
30K rows will allow you to have only 66 columns, is that enough for you? Check the Google Docs size limits help page for more info.
Anyway, Google Spreadsheets has a "live concurrent editing" nature that makes it a much slower database than any other option. You should probably consider something else.
Do you intend to use the spreadsheet to display data or only as a storage place?
In the second case the relative slowness of the spreadsheet will not be an issue, since you'll only have to read its data once into an array and work with that...
This implies, of course, that you build every aspect of data reading and writing in a dedicated UI and never show the spreadsheet itself. The speed will then depend only on how fast the JavaScript engine handles arrays, the speed of the UI and the speed of your internet connection... none of these 3 factors performs very well compared to a 'normal' application, but you get the advantage of being easily shareable and available anywhere. :-)
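The "read once, then work on the array" approach might look roughly like the sketch below. It is written in Python purely for illustration (an Apps Script app would do the same in JavaScript) and assumes the sheet has already been fetched as a list of rows whose first row is the header. The function name build_index and the sample column names are made up.

```python
from typing import Any, Dict, List


def build_index(rows: List[List[Any]], id_column: int = 0) -> Dict[Any, Dict[str, Any]]:
    """Turn rows read once from the sheet into an in-memory lookup by ID.

    The first row is treated as the header; later rows overwrite earlier
    ones if an ID appears more than once.
    """
    header, *data = rows
    return {row[id_column]: dict(zip(header, row)) for row in data}


# Hypothetical sample data standing in for the spreadsheet contents.
rows = [
    ["id", "name", "city"],
    ["a1", "Alice", "Ghent"],
    ["b2", "Bob", "Lyon"],
]
index = build_index(rows)
record = index.get("b2")   # a dictionary lookup, no scan over the rows
```

After the single read, searching for an ID is a dictionary lookup rather than a pass over all 30,000 rows, so the cost of talking to the spreadsheet is paid only when loading or saving.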
That said, I have written such a database app with about 20 columns of data and 1000 rows and it is perfectly useable although having some latency even for simple next / previous requests. On the other hand the app can send mails and create docs.... the advantages of Google service integration :-)
You could have a look at this example to see what I'm talking about