Storing GPS data - GAE or SQLite?

I am developing a GPS application that tracks a user, sort of like the app Runkeeper that tracks where you have been on your run.
In order to do this, should I store the GPS coordinates in an SQLite database on the phone, or on Google App Engine so that, when the user selects a run, I can send the entire set back to the phone?
What would be a better design?

I have a lot of experience in this category, as I am also developing a similar app. I save data on the phone, but periodically check for a connection and then upload the data to App Engine, after which I delete the data on the phone (so it doesn't consume too much phone storage). I store the uploaded file names in the Datastore and the files themselves in Cloud Storage (which accepts larger objects than the Datastore). Then I load the data into BigQuery for analysis and as the main store of all data, and I also track which data was successfully loaded into BigQuery. So there are really four storage layers just to get the data into BigQuery; it's quite an effort.

My next phase is doing the analytics in BigQuery and sending information back to the phone app. I also use Tableau, as they have a good solution for reading BigQuery: they recognize geo data, and you can plot lat/lon on an implementation of OpenStreetMap.
Certainly there are other solutions. The downside to mine is that the load into BigQuery is slow; it sometimes takes minutes before the data is available. On the other hand, Google's cloud tools are quite good, and they integrate well with third-party analytics/viewers.
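For illustration, the Cloud Storage → BigQuery leg of that pipeline can be a short server-side job. Here is a minimal TypeScript sketch, where the bucket, dataset, and table names are placeholders and the batch file is assumed to be newline-delimited JSON of GPS points:

```typescript
import * as path from "path";
import { Storage } from "@google-cloud/storage";
import { BigQuery } from "@google-cloud/bigquery";

const storage = new Storage();
const bigquery = new BigQuery();

// Push one batch file (newline-delimited JSON of GPS points) into Cloud Storage,
// then load it into BigQuery for analysis.
async function loadBatch(localFile: string): Promise<void> {
  const bucket = storage.bucket("gps-track-uploads"); // placeholder bucket name
  await bucket.upload(localFile);

  const [job] = await bigquery
    .dataset("tracking")   // placeholder dataset
    .table("gps_points")   // placeholder table
    .load(bucket.file(path.basename(localFile)), {
      sourceFormat: "NEWLINE_DELIMITED_JSON",
      autodetect: true,    // let BigQuery infer lat/lon/timestamp columns
    });

  console.log(`BigQuery load job ${job.id} finished`);
}
```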
Keep me updated on how you progress.

Retrieve streaming data from API using Cloud Functions

I want to stream real-time data from the Twitter API into Cloud Storage and BigQuery. I have to ingest and transform the data using Cloud Functions, but the problem is that I have no idea how to pull data from the Twitter API and ingest it into the cloud.
I know I also have to create a scheduler and a Pub/Sub topic to trigger the Cloud Function. I have created a Twitter developer account. The main problem is actually streaming the data into Cloud Storage.
I'm really new to GCP and streaming data so it'll be nice to see a clear explanation on this. Thank you very much :)
You first have to design your solution. What do you want to achieve: streaming or micro-batches?
If streaming, you have to use Twitter's streaming API. In short, you initiate a connection and stay up and running (and connected), receiving data as it arrives.
If batches, you query the API and download a set of messages in a request-response mode.
That being said, how do you implement this on Google Cloud? Streaming is problematic because you have to stay connected the whole time, and with serverless products you have timeout concerns (9 minutes for Cloud Functions 1st gen, 60 minutes for Cloud Run and Cloud Functions 2nd gen).
However, you can invoke your serverless product regularly, stay connected for a while (say, one hour), and schedule the trigger every hour.
Or use a VM to do that (or a pod on a Kubernetes cluster).
You can also consider micro-batches, where you invoke your Cloud Function every minute and fetch all the messages from the past minute.
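As a rough illustration of the micro-batch approach, a 2nd gen Cloud Function (Node.js runtime, so fetch is built in) triggered by Cloud Scheduler through Pub/Sub could look like this; the Twitter query, bucket name, and environment variable are placeholders:

```typescript
import * as functions from "@google-cloud/functions-framework";
import { Storage } from "@google-cloud/storage";

const storage = new Storage();

// Pub/Sub-triggered function; Cloud Scheduler publishes to the topic every minute.
functions.cloudEvent("fetchTweets", async () => {
  // Twitter API v2 recent search; the query and env var name are placeholders.
  const resp = await fetch(
    "https://api.twitter.com/2/tweets/search/recent?query=" +
      encodeURIComponent("from:SomeAccount"),
    { headers: { Authorization: `Bearer ${process.env.TWITTER_BEARER_TOKEN}` } }
  );
  const body = await resp.text();

  // Land the raw JSON in Cloud Storage; BigQuery can load it from there later.
  const objectName = `tweets/${new Date().toISOString()}.json`;
  await storage.bucket("tweet-landing-zone").file(objectName).save(body);
});
```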
In the end, it all depends on your use case: how close to real time do you need to be, and which product do you want to use?

Strategy to implement paid API in the mobile application

I'm developing an app that shows the scores of sports games in real time. I'm using a paid API that allows a limited number of requests, and to show the score in real time I'm using a short-polling technique (hit the API every 2-3 seconds to see whether the score has changed).
If I place that API URL directly in the application, then every application user would be hitting the API directly. Assuming 10 users are using the application, then 10 API calls would be deducted every polling interval (2-3 seconds), right?
So what would be a better strategy or approach to prevent these multiple API calls?
What I could come up with is to store the API JSON response in a MySQL database. This way, I would be serving the data to application users through the database (users would hit my database, not the actual API). Is this the correct way to do it?
Store the API JSON response in the MySQL database.
Then convert the MySQL data back into JSON format,
and the application users would poll that database-backed JSON response.
I don't know if this is the correct way to do it! That's why I posted this question.
Thank you
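What's being described here is essentially a small caching proxy: only your server calls the paid API, and it re-fetches at most once per interval regardless of how many users poll it. A minimal sketch in TypeScript with Express, where the upstream URL and auth header are assumptions:

```typescript
import express from "express";

const app = express();

const SCORE_API_URL = "https://api.example.com/live-scores"; // placeholder for the paid API
const TTL_MS = 3000; // call the paid API at most once every 3 seconds

let cached: { body: unknown; fetchedAt: number } | null = null;

// Every app user polls this endpoint instead of the paid API.
app.get("/scores", async (_req, res) => {
  if (!cached || Date.now() - cached.fetchedAt > TTL_MS) {
    const upstream = await fetch(SCORE_API_URL, {
      headers: { "x-api-key": process.env.SCORE_API_KEY ?? "" }, // assumed auth scheme
    });
    cached = { body: await upstream.json(), fetchedAt: Date.now() };
  }
  res.json(cached.body);
});

app.listen(3000);
```

Whether the cached response lives in memory, Redis, or MySQL is a secondary choice; the key point is that user polling never reaches the paid API directly.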

What is a good AWS solution (DB, ETL, Batch Job) to store large historical trading data (with daily refresh) for machine learning analysis?

I want to build a machine learning system (a Python program) on a large amount of historical trading data.
The trading company has an API for grabbing their historical and real-time data. The data volume is about 100 GB for historical data and about 200 MB of daily data.
The trading data is typical time-series data with fields like price, name, region, timestamp, etc. The data can be retrieved as large files or stored in a relational DB.
So my question is: what is the best way to store these data on AWS, and what's the best way to add new data every day (through a cron job or an ETL job)? Possible solutions include storing them in a relational database, in a NoSQL database like DynamoDB or Redis, or in a file system read directly by the Python program. I just need a solution to persist the data in AWS so multiple teams can grab it for research.
Also, since it's a research project, I don't want to spend too much time exploring new systems or emerging technologies. I know there are time-series databases like InfluxDB or the new Amazon Timestream, but considering the learning curve and the deadline, I'm not inclined to learn and use them for now.
I'm familiar with MySQL. If really needed, I can pick up NoSQL, like Redis/DynamoDB.
Any advice? Many thanks!
If you want to use AWS EMR, then the simplest solution is probably just to run a daily job that dumps data into a file in S3. However, if you want to use something a little more SQL-ey, you could load everything into Redshift.
If your goal is to make it available in some form to other people, then you should definitely put the data in S3. AWS has ETL and data migration tools that can move data from S3 to a variety of destinations, so the other people will not be restricted in their use of the data just because of it being stored in S3.
On top of that, S3 is the cheapest (warm) storage option available in AWS, and for all practical purposes its throughput is unlimited. If you store the data in a SQL database, you significantly limit the rate at which the data can be retrieved. If you store the data in a NoSQL database, you may be able to support more traffic (maybe), but it will come at significant cost.
Just to further illustrate my point, I recently did an experiment to test certain properties of one of the S3 APIs, and part of my experiment involved uploading ~100GB of data to S3 from an EC2 instance. I was able to upload all of that data in just a few minutes, and it cost next to nothing.
The only thing you need to decide is the format of your data files. You should talk to some of the other people and find out whether JSON, CSV, or something else is preferred.
As for adding new data, I would set up a Lambda function triggered by a CloudWatch Events (EventBridge) rule. The Lambda function can get the data from your data source and put it into S3. The rule is cron-based, so it's easy enough to switch between hourly, daily, or whatever frequency meets your needs.
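A minimal sketch of such a Lambda on the Node.js 18+ runtime (AWS SDK v3, fetch built in); the vendor endpoint, bucket name, and key prefix are placeholders:

```typescript
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({});

// Invoked once a day by an EventBridge (CloudWatch Events) cron rule.
export const handler = async (): Promise<void> => {
  // Pull the previous day's data from the vendor; the endpoint is a placeholder.
  const resp = await fetch("https://api.example.com/trades/daily");
  const body = await resp.text();

  // One object per day, e.g. trades/2024-05-01.csv, in the shared bucket.
  const key = `trades/${new Date().toISOString().slice(0, 10)}.csv`;
  await s3.send(
    new PutObjectCommand({
      Bucket: "shared-trading-data", // placeholder bucket name
      Key: key,
      Body: body,
    })
  );
};
```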

Locally store large amounts of data

The main purpose is to store data locally so it can be accessed without an internet connection.
In my React application I will need to fetch data (such as images, text, and videos) from the internet and display it for a certain amount of time.
To add flexibility, this should work offline as well.
I've read about options such as localStorage and Firebase, but so far all of them either require internet access or are limited to around 10 MB, which is too low for what I'll need.
What would be my best option to persist data in some sort of offline database or file through React? I'd also be thankful if you could point me to a good tutorial about any suggested solution.
To store large amounts of data on the client side, you can use IndexedDB.
IndexedDB is a low-level API for client-side storage of significant amounts of structured data, including files/blobs.
You can read more about the IndexedDB API in the MDN documentation.
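A minimal example of opening a database and caching a fetched record (the database, store, and field names here are arbitrary):

```typescript
// Open (or create) a local database with one object store for cached content.
function openDb(): Promise<IDBDatabase> {
  return new Promise((resolve, reject) => {
    const request = indexedDB.open("offline-content", 1);
    request.onupgradeneeded = () => {
      request.result.createObjectStore("items", { keyPath: "id" });
    };
    request.onsuccess = () => resolve(request.result);
    request.onerror = () => reject(request.error);
  });
}

// Store a fetched record (text, image blob, video blob, ...) so it is available offline.
async function saveItem(item: { id: string; payload: Blob | string }): Promise<void> {
  const db = await openDb();
  return new Promise((resolve, reject) => {
    const tx = db.transaction("items", "readwrite");
    tx.objectStore("items").put(item);
    tx.oncomplete = () => resolve();
    tx.onerror = () => reject(tx.error);
  });
}
```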

Does data stored using mobile AIR AS3 SharedObject persist across app updates?

Does data stored using Shared Object persist across mobile app updates?
Are there any gotchas that I should know before relying on this?
Would it be safer to store data using the file system API?
SharedObject is not a reliable type of data storage, since it's up to the device owner to allow or limit SharedObject capacity, and it can also be cleared by the owner at any time. You'll see a tremendous number of AIR apps built with SharedObject not because it's a good solution but because the author saves a few lines of code by using it. More reliable is the SQLite capability built into AS3. A simple text file would do for small amounts of data; for larger amounts, SQLite is unbeatable, 100% reliable, and ultra fast.