Storing Amazon's product API data in database - amazon-product-api

I am trying to build an application with retail data from the US, but experimenting in India. I want to add new features like group shopping and also analyze user behavior such as buying, sharing, etc.
But I want data from Amazon and other affiliate programs. I am considering storing Amazon's data in my database without downloading the images (as per its terms).
However, I am not at all clear from their terms whether I can store the API results in my database and update them regularly. I wouldn't want the whole of Amazon's data, though! Just a sample, say 20-30K products.
Is this allowed? Has anyone done this?
Link to terms - https://affiliate-program.amazon.com/gp/advertising/api/detail/agreement.html

You can store them in your DB for up to 24 hours. So, when you receive a request for a specific product,
first check whether the stored record has been updated within the last 24 hours.
If it has, simply use the data from the DB. Otherwise, fetch the latest data from Amazon and store it in the DB to use for the next 24-hour period.
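A minimal sketch of that check in Python (the products table, its columns, and fetch_from_amazon are placeholders for whatever schema and PA-API client you actually use):

```python
from datetime import datetime, timedelta

CACHE_TTL = timedelta(hours=24)  # Amazon's 24-hour storage window

def get_product(cur, asin):
    # Look up the cached copy; 'products' and its columns are illustrative.
    cur.execute("SELECT data, updated_at FROM products WHERE asin = %s", (asin,))
    row = cur.fetchone()

    if row and datetime.utcnow() - row[1] < CACHE_TTL:
        return row[0]  # still fresh, serve it straight from the DB

    # Stale or missing: refresh from the Product Advertising API.
    data = fetch_from_amazon(asin)  # hypothetical PA-API client call
    cur.execute(
        "REPLACE INTO products (asin, data, updated_at) VALUES (%s, %s, %s)",
        (asin, data, datetime.utcnow()),
    )
    return data
```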

Related

Best practices - Big data with mysql

I have a video surveillance project running on a cloud infrastructure and using a MySQL database.
We are now integrating some artificial intelligence into our project, including face recognition, plate recognition, tag search, etc., which implies a huge amount of data every day.
All the photos, and the images derived from those photos by image processing algorithms, are stored in cloud storage, but their references and tags are stored in the database.
I have been thinking about the best way to integrate this: do I have to stick with MySQL or use another system? The different options I thought about are:
1- Use another database, MongoDB, to store the photo references and tags. This will cost me another database server, as well as the integration of a new database system alongside the existing MySQL server.
2- Use Elasticsearch to retrieve data and perform tag searching. This raises the question of whether MySQL has the capacity to store this amount of data.
3- Stick purely with MySQL, but will the user experience be impacted?
Would you guide me to the best option to choose, or give me another proposal?
EDIT:
For more information:
The physical pictures are stored in cloud storage; only the URLs are stored in the database.
In the database, we will store the metadata of the picture, such as the id, the id of the client, the URL, tags, the date of creation, etc.
Operations will generally be SELECTs based on different criteria and searches by tags.
How big is the data?
Imagine a camera placed outdoor in the street and each time it detects a face it will send an image.
Imagine thousands of cameras are doing so. Then, we are talking about millions of images per client.
MySQL can handle billions of rows. You have not provided enough other information to comment on the rest of your questions.
Large blobs (images, videos, etc.) are probably best handled by some large, cheap storage. And then, as you say, a URL to the blob would be stored in the database.
How many rows? How frequently will you be inserting? What are some of the desired SELECT statements? Is it mostly just writing to the database, or will you have large, complex queries?
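If the metadata stays in MySQL, a plain photos table plus a separate tag table with the right indexes is usually enough for that kind of SELECT-heavy load. A rough sketch, with column names guessed from the question and using the mysql-connector-python driver:

```python
import mysql.connector  # pip install mysql-connector-python

conn = mysql.connector.connect(
    host="localhost", user="app", password="...", database="surveillance"
)
cur = conn.cursor()

# Metadata only -- the image itself stays in cloud storage, referenced by URL.
cur.execute("""
    CREATE TABLE IF NOT EXISTS photos (
        id         BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
        client_id  INT UNSIGNED NOT NULL,
        url        VARCHAR(2048) NOT NULL,
        created_at DATETIME NOT NULL,
        INDEX idx_client_date (client_id, created_at)
    )
""")

# Tags in their own table so "search by tag" is a simple indexed join.
cur.execute("""
    CREATE TABLE IF NOT EXISTS photo_tags (
        photo_id BIGINT UNSIGNED NOT NULL,
        tag      VARCHAR(64) NOT NULL,
        PRIMARY KEY (photo_id, tag),
        INDEX idx_tag (tag)
    )
""")

# Typical query: a client's recent photos carrying a given tag.
cur.execute("""
    SELECT p.id, p.url, p.created_at
    FROM photos p
    JOIN photo_tags t ON t.photo_id = p.id
    WHERE p.client_id = %s AND t.tag = %s AND p.created_at >= %s
    ORDER BY p.created_at DESC
    LIMIT 100
""", (42, "face", "2024-01-01"))
for row in cur.fetchall():
    print(row)
```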

Sync multiple local databases to one remotely

I need to create a system with local webservers on Raspberry Pi 4 running Laravel for API calls, websockets, etc. The RPis will be installed at multiple customers' places.
For this project I want to have the ability to save/sync the database to a remote server (when the local system is connected to the internet).
Multiple locale databases => One remote, customer-based database.
The question is how to synchronize the databases, properly identify each customer's data, and render it in a shared remote dashboard.
My first thought was to set a customer_id or a team_id on each table, but it seems dirty.
The other way is to create multiple databases on the remote server for the synchronization, and one extra database that holds the customer ids and the database connection information...
Has anyone already experimented with something like that? Is there a reliable and clean way to do this?
You refer to locale but I am assuming you mean local.
From what you have said, you have two options at the central site. The central database can either store information from the remote databases in a single table with an additional column that indicates which remote site it is from, or you can set up a separate table (or database) for each remote site.
How do you want to use the data?
If you only ever want to work with the data from one remote site at a time, it doesn't really matter - in both scenarios you need to identify what data you want to work with and build your SQL statement to either filter by the appropriate column or direct it to the appropriate table(s).
If you want to work on data from multiple remote sites at the same time, then using different tables requires that you use UNION queries to extract the data, and this is unlikely to scale well. In that case you would be better off using a column to mark each record with the remote site it references.
I recommend that you consider using UUIDs as primary keys - it may be that key collision will not be an issue in your scenario, but if it becomes one, trying to alter the design retrospectively is likely to be quite a bit of work.
You also asked about how to synchronize the databases. That will depend on what type of connection you have between the sites and the capabilities of your software, but typically you would have the local system periodically talk to a webservice at the central site. Assuming you are collecting sensor data or some such, the dialogue would be something like:
Client - Hello Server, my last sensor reading is timestamped xxxx
Server - Hello Client, [ send me sensor readings from yyyy | I don't need any data ]
You can include things like a signature check (for example an MD5 sum of the records within a time period) if you want to but that may be overkill.
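A minimal sketch of that dialogue from the local (Raspberry Pi) side, in Python. The endpoint paths, table names and site_id are assumptions, and it uses the UUID keys suggested above for new rows:

```python
import uuid
import requests  # pip install requests

CENTRAL_API = "https://central.example.com/api"  # hypothetical webservice
SITE_ID = "customer-site-017"                    # identifies this RPi / customer

def sync(local_cur):
    # "Hello Server, my last sensor reading is timestamped xxxx"
    local_cur.execute("SELECT MAX(created_at) FROM readings")
    latest = local_cur.fetchone()[0]
    resp = requests.post(f"{CENTRAL_API}/hello",
                         json={"site_id": SITE_ID, "latest": str(latest)})
    resp.raise_for_status()

    # "Hello Client, send me readings from yyyy" (or no data is needed)
    since = resp.json().get("send_from")
    if since is None:
        return

    local_cur.execute(
        "SELECT id, created_at, value FROM readings WHERE created_at > %s",
        (since,),
    )
    rows = [{"id": r[0], "created_at": str(r[1]), "value": r[2],
             "site_id": SITE_ID} for r in local_cur.fetchall()]
    requests.post(f"{CENTRAL_API}/readings", json=rows).raise_for_status()

# New local rows get UUID primary keys so ids generated at different
# customer sites can never collide when they land in the central database.
def new_reading_id():
    return str(uuid.uuid4())
```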

Should I use files or a database?

I'm building a cloud sync application which syncs a user's data across multiple devices. I am at a crossroads, deciding whether to store the data on the server as files or in a relational database. I am using Amazon Web Services and will use S3 for user files, or their database service if I choose to store the data in a table instead. The data I'm storing is the state of the application every ten seconds. This could be problematic to store in a database because the average number of rows that would be stored per user is 100,000, and with my current user base of 20,000 people that's 2 billion rows right off the bat. Would I be better off storing that information in files? That would be about 100 files totaling 6 megabytes per user.
As discussed in the comments, I would store these as files.
S3 is perfectly suited to be a key/value store and if you're able to diff the changes and ensure that you aren't unnecessarily duplicating loads of data, the sync will be far easier to do by downloading the relevant files from S3 and syncing them client side.
You get a big cost saving from not having to operate a database server that can store tonnes of rows and stay up to serve them to the clients quickly.
My only real concern would be that the data in these files can be difficult to parse if you wanted to aggregate stats/data/info across multiple users as a backend or administrative view. You wouldn't be able to write simple SQL queries to sum up values etc., and would have to open the relevant files, process them with something like awk or regular expressions, and then compute the values that way.
You're likely doing that on the client side anyway for the specific files that relate to that user though, so there's probably some overlap there!
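For illustration, the S3 side of that could look roughly like this with boto3 (the bucket name and key layout are assumptions):

```python
import boto3  # pip install boto3

s3 = boto3.client("s3")
BUCKET = "my-sync-app-state"  # hypothetical bucket

def upload_state(user_id, chunk_name, data: bytes):
    # One object per state chunk, namespaced by user.
    s3.put_object(Bucket=BUCKET, Key=f"{user_id}/{chunk_name}", Body=data)

def download_state(user_id, chunk_name) -> bytes:
    obj = s3.get_object(Bucket=BUCKET, Key=f"{user_id}/{chunk_name}")
    return obj["Body"].read()

def list_chunks(user_id):
    # Client-side sync: list the user's chunks and fetch only what changed
    # (compare ETags or LastModified against a local manifest).
    resp = s3.list_objects_v2(Bucket=BUCKET, Prefix=f"{user_id}/")
    return [(o["Key"], o["ETag"], o["LastModified"])
            for o in resp.get("Contents", [])]
```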

'Real Time' data methods MySQL

I've recently been researching data transfer methods that could replace my current inefficient setup.
So just to get started I will explain the issue that I am having with my current MySQL data transfer method...
I have a database that keeps track of product inventory levels within my warehouses; this stock data is constantly changing at a rapid rate.
A CSV file is created by a cron job every 15 minutes; I then deliver this CSV file via FTP to multiple sites that we manage in order to update their inventory stock levels. These sites also use MySQL, and the data in the CSV file is added with a script.
This method is very inefficient and doesn't update stock levels in our other databases as fast as we would like.
My question is: are there any methods other than CSV files to transfer MySQL data between multiple databases in real time? I have been researching other methods, but haven't come across anything useful as of yet.
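For reference, the import script each site runs after the FTP drop is essentially just loading the CSV into MySQL, along these lines (simplified; table, column and file names are made up):

```python
import csv
import mysql.connector  # pip install mysql-connector-python

conn = mysql.connector.connect(host="localhost", user="inv", password="...",
                               database="shop")
cur = conn.cursor()

# Read the dropped CSV and upsert the stock levels.
with open("/ftp/incoming/stock_levels.csv", newline="") as f:
    rows = [(r["sku"], int(r["qty"])) for r in csv.DictReader(f)]

cur.executemany(
    """INSERT INTO stock (sku, qty) VALUES (%s, %s)
       ON DUPLICATE KEY UPDATE qty = VALUES(qty)""",
    rows,
)
conn.commit()
```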
Thanks

The best way to store clicks and views statistics?

I have some items and I want to store statistics like daily clicks and the traffic source of the page, and I want to store these statistics for hundreds of items. Based on the stats I save, I want to be able to generate charts of the clicks for each day (like in Google Analytics) and to show the number of clicks from each traffic source.
I'm not a specialist, but I'm thinking of storing the statistics in a MySQL table for a single day and then writing them out to multiple .xml files. I have a slow, cheap server and I'm searching for the best method, please help me!
These "items" are embedded in other websites. I control these items using PHP.
Since these items are embedded in other websites, storing this info per request is a no-go.
It means you would either need to install and set up MySQL on those other websites, which is unlikely,
or connect to a remote MySQL, which is quite expensive for each request.
Especially when you say yourself that you only have a "cheap" server.
Additionally, you risk bringing down the websites with the embedded items when your MySQL server fails.
It is better to use Google Analytics to track the visited pages instead of developing your own tracking.
It will show you daily and country-wise visitors.