I am planning on using a data set that's available on the Socrata platform. I am planning on hitting the REST endpoints instead of downloading and managing the data on my own.
I have the following questions:
Is there a guaranteed uptime?
Is the limit of 1,000 requests per hour a hard limit?
Do you have any metrics on response times?
Any help is appreciated
Thanks
Ravi
Per your questions:
Is there a guaranteed uptime? - You will want to check Socrata's maintenance windows so you can time any downloads around them.
Is the limit of 1,000 requests per hour a hard limit? - The 1,000-records-per-request limit only applies to version 1.0 of their API. Version 2.0 has a maximum of 50,000 records per request, and version 2.1 has no limit. See how you can determine the API version for the dataset you are using.
Do you have any metrics on response times? - In my experience it's highly variable, usually depending on your local ISP and network activity. Overnight and weekend jobs are usually faster, while mid-day jobs are a bit slower. I'd recommend running some tests.
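If it helps, here is a minimal sketch of paging through a SODA-style endpoint with the $limit/$offset parameters so a whole dataset can be pulled down in chunks. The resource URL and page size are placeholders, not details from the original question:

```python
import requests

# Hypothetical SODA resource URL -- substitute your own dataset's endpoint.
BASE_URL = "https://data.example.gov/resource/abcd-1234.json"
PAGE_SIZE = 50000  # newer endpoints accept large $limit values; older ones may not

def fetch_all():
    """Page through a dataset with $limit/$offset instead of one huge request."""
    offset = 0
    rows = []
    while True:
        resp = requests.get(
            BASE_URL,
            params={"$limit": PAGE_SIZE, "$offset": offset},
            timeout=60,
        )
        resp.raise_for_status()
        page = resp.json()
        if not page:          # an empty page means we've read everything
            break
        rows.extend(page)
        offset += PAGE_SIZE
    return rows

if __name__ == "__main__":
    print(len(fetch_all()), "rows downloaded")
```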
I have written two functions in Firebase which maintain data, for example deleting old data daily.
My question is: when I run a query to get data, does it count toward my GB-downloaded limit, which is $1/GB on the Blaze plan?
Since the data is transferred from Firebase servers (Google servers) to a user's computer (you, in this case), you will be charged for all of that data transferred to your computer.
I am creating a games comparison website and would like to get Amazon prices included within it. The problem I am facing is using their API to get the prices for the 25,000 products I already have.
I am currently using ItemLookup from Amazon's API and have it working to retrieve the price; however, after about 10 results I get an error saying 'You are submitting requests too quickly. Please retry your requests at a slower rate'.
What is the best way to slow down the request rate?
Thanks,
If your application is trying to submit requests that exceed the maximum request limit for your account, you may receive error messages from Product Advertising API. The request limit for each account is calculated based on revenue performance. Each account used to access the Product Advertising API is allowed an initial usage limit of 1 request per second. Each account will receive an additional 1 request per second (up to a maximum of 10) for every $4,600 of shipped item revenue driven in a trailing 30-day period (about $0.11 per minute).
From Amazon API Docs
If you're just planning on running this once, then simply sleep for a second in between requests.
If this is something you're planning on running more frequently, it'd probably be worth optimising it further by making sure that the length of time the query takes to return is subtracted from that sleep (so, if my API query takes 200 ms to come back, we only sleep for 800 ms).
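A rough sketch of that idea in Python - the lookup function and URL below are placeholders rather than the real Product Advertising API call, and the one-second budget follows the 1-request-per-second baseline quoted above:

```python
import time
import requests  # assuming the lookups are plain HTTP calls

MIN_INTERVAL = 1.0  # seconds between requests, per the 1-request-per-second baseline

def lookup(asin):
    """Placeholder for your real ItemLookup call."""
    return requests.get("https://example.com/lookup", params={"asin": asin}, timeout=30)

def throttled_lookups(asins):
    """Run lookups back to back, sleeping only for whatever is left of the budget."""
    for asin in asins:
        started = time.monotonic()
        result = lookup(asin)
        elapsed = time.monotonic() - started
        if elapsed < MIN_INTERVAL:
            time.sleep(MIN_INTERVAL - elapsed)
        yield result
```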
Since the error only shows up after about 10 results, you should check how many requests you can actually make before being throttled. If it always appears after 10 fast requests, you could call something like
wait(500)
or a few more milliseconds between requests. If it really is only every 10 requests, you could build a loop and pause on every 9th request.
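To illustrate the "pause on every 9th request" idea, a small sketch - the batch size and the 500 ms pause are guesses that would need tuning against what the API actually tolerates:

```python
import time

def run_in_batches(calls, batch_size=9, pause=0.5):
    """Fire the requests back to back, pausing after every batch_size-th one."""
    results = []
    for i, make_call in enumerate(calls, start=1):
        results.append(make_call())
        if i % batch_size == 0:
            time.sleep(pause)  # the wait(500) guess from above, in seconds
    return results
```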
If your requests involve a lot of repetition, then you can create a cache and clear it every day.
Or contact Amazon about your Product Advertising API purchase authorization to get the limit raised.
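To make the caching suggestion concrete, a minimal sketch of an in-memory cache whose entries expire after a day, sitting in front of the price lookups (the class name, TTL, and fetch_price helper are my own illustration, not part of the original answer):

```python
import time

class DailyCache:
    """Tiny in-memory cache whose entries expire after 24 hours."""
    TTL = 24 * 60 * 60  # seconds

    def __init__(self):
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if time.time() - stored_at > self.TTL:
            del self._store[key]  # expired; treat as a miss
            return None
        return value

    def put(self, key, value):
        self._store[key] = (value, time.time())

price_cache = DailyCache()

def get_price(asin, fetch_price):
    """Check the cache first so repeated lookups of the same product
    don't hit the API again until the next day."""
    price = price_cache.get(asin)
    if price is None:
        price = fetch_price(asin)  # your actual API call goes here
        price_cache.put(asin, price)
    return price
```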
I ran into the same problem even when I put in a delay of 1 second or more.
I believe that when you start making too many requests with only a one-second delay, Amazon doesn't like it and decides you're a spammer.
You'll have to generate another key pair (and use it when making further requests) and put in a delay of 1.1 seconds to be able to make fast requests again.
This worked for me.
I have a massive table that records events happening on our website. It has tens of millions of rows.
I've already tried adding indexes and other optimizations.
However, it's still very taxing on our server (even though we have quite a powerful one) and takes 20 seconds on some large graph/chart queries - so long, in fact, that our daemon often intervenes to kill the queries.
Currently we have a Google Compute instance on the frontend and a Google SQL instance on the backend.
So my question is this - is there some better way of storing and querying time series data using the Google Cloud?
I mean, do they have some specialist server or storage engine?
I need something I can connect to my PHP application.
Elasticsearch is awesome for time series data.
You can run it on Compute Engine, or they have a hosted version.
It is accessed via an HTTP JSON API, and there are several PHP clients (although I tend to make the API calls directly, as I find it better to understand their query language that way).
https://www.elastic.co
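As an illustration of those direct API calls, here is a rough sketch of a date-histogram query posted straight to the HTTP JSON API (the index name, field name, and host are placeholders, and the aggregation syntax can vary between Elasticsearch versions):

```python
import requests

# Hypothetical index of website events with a "timestamp" field.
ES_URL = "http://localhost:9200/events/_search"

query = {
    "size": 0,  # we only want the aggregation buckets, not the raw documents
    "query": {"range": {"timestamp": {"gte": "now-7d"}}},
    "aggs": {
        "events_per_hour": {
            "date_histogram": {"field": "timestamp", "interval": "1h"}
        }
    },
}

resp = requests.post(ES_URL, json=query, timeout=30)
resp.raise_for_status()
for bucket in resp.json()["aggregations"]["events_per_hour"]["buckets"]:
    print(bucket["key_as_string"], bucket["doc_count"])
```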
They also have an automated graphing interface for time series data. It's called Kibana.
Enjoy!!
Update: I missed the important part of the question "using the Google Cloud?" My answer does not use any specialized GC services or infrastructure.
I have used ElasticSearch for storing events and profiling information from a web site. I even wrote a statsd backend storing stat information in elasticsearch.
After Elasticsearch changed Kibana from 3 to 4, I found the interface extremely bad for looking at stats. You can only chart one metric from each query, so if you want to chart time, average time, and 90th-percentile time, you must do 3 queries instead of 1 query that returns 3 values. (The same issue existed in 3; version 4 just looked uglier and was more confusing to my users.)
My recommendation is to choose a time series database that is supported by Grafana - a time series charting front end. OpenTSDB stores information in a Hadoop-like format, so it will be able to scale out massively. Most of the others store events similar to row-based information.
For capturing statistics, you can use either statsd or Riemann (or Riemann and then statsd). Riemann can add alerting and monitoring before events are sent to your stats database; statsd merely collates, averages, and flushes stats to a DB.
http://docs.grafana.org/
https://github.com/markkimsal/statsd-elasticsearch-backend
https://github.com/etsy/statsd
http://riemann.io/
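For a sense of how lightweight the statsd side is, here is a minimal sketch that sends a counter and a timing sample over UDP using the plain-text statsd line protocol (the host, port, and metric names are assumptions for illustration):

```python
import socket
import time

# statsd listens on UDP port 8125 by default; adjust host/port for your setup.
STATSD_ADDR = ("127.0.0.1", 8125)
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

def send_timing(metric, ms):
    """Send a timing sample, e.g. "web.page_render:42|ms"."""
    sock.sendto(f"{metric}:{int(ms)}|ms".encode(), STATSD_ADDR)

def send_counter(metric, count=1):
    """Increment a counter, e.g. "web.hits:1|c"."""
    sock.sendto(f"{metric}:{count}|c".encode(), STATSD_ADDR)

# Example: time a piece of work and count the hit.
started = time.monotonic()
# ... do the work being measured ...
send_timing("web.page_render", (time.monotonic() - started) * 1000)
send_counter("web.hits")
```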
I have developed a C# WCF application, which when called performs inserts and updates in a MySQL 5.6 database, running on a Windows 2008 server, with IIS. The requests can range from a single update or insert for 1 row, to 1000 updates or 1000 inserts per request.
Initially, the number of third-party remote connections was minimal, but the load and the number of requests have now increased.
Therefore, I'm now looking at the best possible solution for a high-availability service with redundant MySQL failover, while ensuring that the service can handle the number of requests and respond rapidly.
Can anyone offer any advice on how to achieve this?
We have 500+ remote locations. Each location has a Linux router which checks in to our management system (homemade using RoR3) every 15 minutes.
We need to log and calculate the mean uptime of each box's Internet connectivity.
Each router posts a request every 15 minutes to a script on the server. (Currently this just records the last check-in time and the uptime.)
If we want to plot the historical uptime of each box, what is the most efficient way to do this without clogging up our DB?
500 boxes checking in every 15 minutes would (according to my calculations) result in 17,520,000 inserts a year - quite a hefty amount of data that I don't think we need.
Could anyone help solve this riddle for us?
Why not take a look at RRDTool (Wikipedia entry)? It's just the tool for this kind of situation.
It works as a sort of round-robin, self-averaging database, and it's used in many logging applications for purposes similar to yours.
As an example, take a look at Cacti, which is a data-logging / network-monitoring and graphing front-end app built around RRDTool (implemented in PHP).
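As a rough sketch of how that could look, shelling out to the rrdtool command line from the check-in script - the file name, retention periods, and the idea of one RRD file per router are my own assumptions:

```python
import subprocess
import time

RRD_FILE = "uptime.rrd"  # in practice you would likely keep one file per router

def create_rrd():
    """One GAUGE data source sampled every 15 minutes (900 s):
    30 days of raw samples plus a year of daily averages."""
    subprocess.run([
        "rrdtool", "create", RRD_FILE,
        "--step", "900",
        "DS:up:GAUGE:1800:0:1",       # 1 = box checked in, 0 = missed check-in
        "RRA:AVERAGE:0.5:1:2880",     # 2880 x 15 min = 30 days of raw data
        "RRA:AVERAGE:0.5:96:365",     # 96 x 15 min = 1 day; keep 365 daily averages
    ], check=True)

def record_checkin(up=1):
    """Call this from the script the routers post to every 15 minutes."""
    subprocess.run(
        ["rrdtool", "update", RRD_FILE, f"{int(time.time())}:{up}"],
        check=True,
    )
```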