Can I download historical CPU usage data for Google Compute Engine? - google-compute-engine

Does anyone know if there is a way to download historical CPU usage data via an API call for Google Compute Engine?
On the Console overview page, graphs of this data going back at least a month are provided, but I don't see any obvious way to download the actual data directly.

The Google Compute Engine usage export feature recently launched - it sounds like what you're looking for. It gives daily detail in a CSV, as well as a month-to-date summary.
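Once you have pointed the export at a bucket (for example with gcloud compute project-info set-usage-bucket), pulling the daily CSVs down is straightforward. A minimal sketch, assuming the files use the default usage_gce prefix and a hypothetical bucket name:

    from google.cloud import storage

    client = storage.Client()
    bucket = client.bucket("my-usage-bucket")  # hypothetical bucket name

    # Daily usage reports are plain CSV objects; download each one locally.
    for blob in bucket.list_blobs(prefix="usage_gce"):
        blob.download_to_filename(blob.name)
        print("downloaded", blob.name)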

Related

Retrieve streaming data from API using Cloud Functions

I want to stream real-time data from the Twitter API to Cloud Storage and BigQuery. I have to ingest and transform the data using Cloud Functions, but the problem is I have no idea how to pull data from the Twitter API and ingest it into the cloud.
I know I also have to create a scheduler and a Pub/Sub topic to trigger Cloud Functions. I have created a Twitter developer account. The main problem is actually streaming the data into Cloud Storage.
I'm really new to GCP and streaming data, so it would be nice to see a clear explanation of this. Thank you very much :)
You first have to design your solution. What do you want to achieve: streaming or micro-batches?
If streaming, you have to use Twitter's streaming API. In short, you open a connection and stay up and running (and connected), receiving the data as it arrives.
If batches, you query the API periodically and download a set of messages, in a query-response mode.
That being said, how do you implement it on Google Cloud? Streaming is problematic because you have to stay connected all the time, and with serverless products you have timeout concerns (9 minutes for Cloud Functions 1st gen, 60 minutes for Cloud Run and Cloud Functions 2nd gen).
However, you can invoke your serverless product regularly, stay connected for a while (say, one hour), and schedule the trigger every hour.
Or use a VM to do that (or a pod on a Kubernetes cluster).
You can also consider micro-batches, where you invoke your Cloud Function every minute and fetch all the messages from the past minute (a minimal sketch of this approach follows below).
In the end, it all depends on your use case. How close to real time do you need to be? Which product do you want to use?
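If you go the micro-batch route, here is a minimal sketch under assumptions, not a definitive implementation: a Pub/Sub-triggered Cloud Function (1st gen Python signature) that asks Twitter's v2 recent-search endpoint for the last minute of tweets and writes the raw JSON to Cloud Storage. The environment variables, bucket, and search query are placeholders you would replace with your own:

    import json
    import os
    from datetime import datetime, timedelta, timezone

    import requests
    from google.cloud import storage

    BEARER_TOKEN = os.environ["TWITTER_BEARER_TOKEN"]  # hypothetical env var
    BUCKET_NAME = os.environ["BUCKET_NAME"]            # hypothetical env var

    def pull_tweets(event, context):
        """Entry point; fired every minute by Cloud Scheduler via Pub/Sub."""
        start = (datetime.now(timezone.utc) - timedelta(minutes=1)
                 ).strftime("%Y-%m-%dT%H:%M:%SZ")
        resp = requests.get(
            "https://api.twitter.com/2/tweets/search/recent",
            headers={"Authorization": f"Bearer {BEARER_TOKEN}"},
            params={"query": "from:GoogleCloud", "start_time": start},  # example query
            timeout=30,
        )
        resp.raise_for_status()
        # One object per invocation, named by timestamp.
        blob_name = f"tweets/{datetime.now(timezone.utc).isoformat()}.json"
        storage.Client().bucket(BUCKET_NAME).blob(blob_name).upload_from_string(
            json.dumps(resp.json()), content_type="application/json")

From Cloud Storage you can then load the objects into BigQuery with a scheduled load job, which keeps the function itself small.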

What is running on my Google Compute Engine?

There is a lot of activity on my Google Compute Engine API. It's less than 1 request per second, which probably keeps me in the free tier, but how do I figure out what is running and whether I should stop it?
I have some Pub/Sub topics and a Cloud Function that copies data into a Datastore database. But even when I am not publishing any data (for days), I still see activity on the Compute Engine API. Can I disable it, or will that stop my Cloud Functions?

Google Cloud SQL Timeseries Statistics

I have a massive table that records events happening on our website. It has tens of millions of rows.
I've already tried adding indexing and other optimizations.
However, it's still very taxing on our server (even though we have quite a powerful one), and some large graph/chart queries take 20 seconds; so long, in fact, that our daemon often steps in and kills them.
Currently we have a Google Compute instance on the frontend and a Google SQL instance on the backend.
So my question is this - is there some better way of storing and querying time-series data using the Google Cloud?
I mean, do they have some specialist server or storage engine?
I need something I can connect to my php application.
Elasticsearch is awesome for time series data.
You can run it on Compute Engine, or they offer a hosted version.
It is accessed via an HTTP JSON API, and there are several PHP clients (although I tend to make the API calls directly, as I find it better for understanding the query language that way).
https://www.elastic.co
They also have an automated graphing interface for time series data. It's called Kibana.
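To give a feel for the API, here is a minimal sketch that queries Elasticsearch directly over HTTP for a per-hour event count using a date_histogram aggregation. The events index and timestamp field are hypothetical stand-ins for your schema, and calendar_interval is the Elasticsearch 7+ spelling (older versions used interval):

    import json

    import requests

    query = {
        "size": 0,  # skip the raw hits; we only want the aggregation buckets
        "aggs": {
            "events_per_hour": {
                "date_histogram": {"field": "timestamp", "calendar_interval": "hour"}
            }
        },
    }
    resp = requests.post(
        "http://localhost:9200/events/_search",
        headers={"Content-Type": "application/json"},
        data=json.dumps(query),
        timeout=10,
    )
    for bucket in resp.json()["aggregations"]["events_per_hour"]["buckets"]:
        print(bucket["key_as_string"], bucket["doc_count"])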
Enjoy!!
Update: I missed the important part of the question "using the Google Cloud?" My answer does not use any specialized GC services or infrastructure.
I have used Elasticsearch for storing events and profiling information from a website. I even wrote a statsd backend that stores stat information in Elasticsearch.
After Kibana went from version 3 to 4, I found the interface extremely bad for looking at stats. You can only chart one metric from each query, so if you want to chart time, average time, and 90th-percentile time, you must run three queries instead of one that returns three values. (The same issue existed in version 3; version 4 just looked uglier and was more confusing to my users.)
My recommendation is to choose a time-series database that is supported by Grafana, a time-series charting front end. OpenTSDB stores its data in HBase (on Hadoop), so it can scale out massively. Most of the others store events as row-like records.
For capturing statistics, you can use either statsd or Riemann (or Riemann feeding statsd). Riemann can add alerting and monitoring before events are sent to your stats database; statsd merely collates, averages, and flushes stats to a DB (see the sketch after the links below).
http://docs.grafana.org/
https://github.com/markkimsal/statsd-elasticsearch-backend
https://github.com/etsy/statsd
http://riemann.io/
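As a sketch of how little the statsd side involves: its wire protocol is just "name:value|type" strings sent over UDP, conventionally to port 8125. The metric name and host below are placeholders:

    import socket
    import time

    def send_timing(name, ms, host="localhost", port=8125):
        # statsd collates these raw timings and flushes averages/percentiles downstream.
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.sendto(f"{name}:{ms:.0f}|ms".encode(), (host, port))

    start = time.monotonic()
    # ... handle a web request ...
    send_timing("web.request_time", (time.monotonic() - start) * 1000)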

Storing GPS data - GAE or in SQLite

I am developing a GPS application that tracks a user, sort of like the app Runkeeper that tracks where you have been on your run.
In order to do this, should I store GPS coordinates in a SQLite database on the phone, or on Google App Engine, so that when the user selects the data I can send the entire set to the phone?
What would be a better design?
I have a lot of experience in this category, and I am developing a similar app. I save data on the phone, but periodically check for a connection, upload the data to App Engine, and then delete it from the phone (so it doesn't consume too much phone memory). I store the uploaded file names in the Datastore and the files themselves in Cloud Storage (which accepts larger objects than the Datastore). Then I load the data into BigQuery for analysis and as the main storage of all data, and I also track which data was successfully uploaded to BigQuery. So there are really four storage layers just to get the data into BQ; it's quite an effort.
My next phase is doing the analytics with BigQuery and sending info back to the phone app. I also use Tableau, as they have a good solution for reading BigQuery; it recognizes geo data, and you can plot lat/lon on an implementation of OpenStreetMap.
Certainly there are other solutions. The downside to my solution is that the load into BigQuery is slow; it sometimes takes minutes before the data is available (a load-job sketch follows below). But on the other hand, Google's cloud tools are quite good, and they integrate with third-party analytics/viewers.
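For the BigQuery hop specifically, here is a minimal sketch of a load job from Cloud Storage using the Python client. The bucket, dataset/table, and schema are hypothetical stand-ins for your own:

    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        schema=[
            bigquery.SchemaField("device_id", "STRING"),
            bigquery.SchemaField("recorded_at", "TIMESTAMP"),
            bigquery.SchemaField("lat", "FLOAT"),
            bigquery.SchemaField("lon", "FLOAT"),
        ],
    )
    job = client.load_table_from_uri(
        "gs://my-gps-bucket/tracks/2016-01-01.csv",  # hypothetical object
        "my-project.gps.points",                     # hypothetical table
        job_config=job_config,
    )
    job.result()  # load jobs run asynchronously; this blocks until the data is queryable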
Keep me updated on how you progress.

Google Compute Engine - API to get the total instance cost

Is there any API I can use to get the total cost of a VM instance in Google compute?
The usage scenario is like this:
Server starts
Runs for some hours or days
Gets shut down
For reporting purposes, we get the cost of the server and save it in our DB
Thanks
Google has a system for exporting billing information each day as a CSV or JSON file to a storage bucket.
http://googlecloudplatform.blogspot.ca/2013/12/ow-get-programmatic-access-to-your-billing-data-with-new-billing-api.html
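Once the export is enabled, summing a VM's cost from those files could look like the minimal sketch below. It assumes CSV exports with "Line Item" and "Cost" columns (the column names are my assumption; check a real export first), and the bucket name and filter string are placeholders:

    import csv
    import io

    from google.cloud import storage

    def instance_cost(bucket_name, line_item_filter):
        total = 0.0
        for blob in storage.Client().bucket(bucket_name).list_blobs():
            if not blob.name.endswith(".csv"):
                continue
            for row in csv.DictReader(io.StringIO(blob.download_as_text())):
                # Sum every billing line whose item matches the instance's SKU/name.
                if line_item_filter in row.get("Line Item", ""):
                    total += float(row.get("Cost") or 0)
        return total

    print(instance_cost("my-billing-bucket", "compute"))  # hypothetical names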