I have a few sets of APIs written in CakePHP that we want to migrate to Amazon AWS.
This is the current situation:
The website is hosted on GoDaddy as shared hosting with a domain, for example: democompany.com
The backend database is MySQL, which we access via phpMyAdmin. It has several tables, e.g. users, plans, purchases, etc.
All APIs are written in CakePHP and are accessed via the base URL:
democompany.com/cake
For example, to add an entry to the users table, we build a JSON payload and send it via a REST POST request.
Now, since our user base is growing, our API response time has slowed. Sending a POST or GET request takes a long time to return a response.
We are thinking of migrating our APIs and database to Amazon AWS or any other solution. I am not very familiar with AWS, so I don't know which product would be best.
Which would be the best solution, offering fast responses while staying cost-effective?
A slow MySQL database behind a PHP backend can have many causes. Try these:
One of the most important things is to think about your indexes. You probably have a primary index on id with auto_increment, but if you query a lot on another column (like SELECT * FROM users WHERE email LIKE '%john%') it is important to also set an index on the email column. Note that a leading wildcard such as '%john%' prevents a normal B-tree index from being used, so prefer anchored patterns like 'john%' where you can. How indexes work is vital knowledge if you want high-performing databases. See this post for a start on how this works: How do MySQL indexes work?
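As a rough sketch, assuming a users table with an email column, adding and checking a secondary index looks something like this:
-- Add a secondary index on the column you filter by (assumes a users table with an email column)
ALTER TABLE users ADD INDEX idx_users_email (email);
-- An anchored prefix search can use the new index
EXPLAIN SELECT * FROM users WHERE email LIKE 'john%';
-- A leading wildcard cannot, and will still scan the whole table
EXPLAIN SELECT * FROM users WHERE email LIKE '%john%';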
Another thing is the number and complexity of your queries. Do you run many queries in one page load or only a few? Try to get as much information as possible out of a single query.
Sorting data can be extremely expensive as well. Does removing ORDER BY whatever speed things up a lot? Check this out: MYSQL, very slow order by
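As a hedged example (the purchases table and its columns here are only an assumption), a composite index can cover both the filter and the sort so MySQL can skip the filesort:
-- Composite index covering the WHERE column and the ORDER BY column (assumed schema)
ALTER TABLE purchases ADD INDEX idx_user_created (user_id, created_at);
-- This query can now read rows in index order instead of sorting them afterwards
SELECT * FROM purchases WHERE user_id = 42 ORDER BY created_at DESC LIMIT 20;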
If you have looked at all of this and are sure that all your queries run smoothly, you can look at persistent connections (re-using connections within one page load, for example), bigger servers, etc.
I need to create a system with local web servers on Raspberry Pi 4 running Laravel for API calls, websockets, etc. Each RPi will be installed at multiple customers' premises.
For this project I want to have the ability to save/sync the database to a remote server (when the local system is connected to the internet).
Multiple locale databases => one remote, customer-based database
The question is how to synchronize the databases, properly identify each customer's data, and render it in a shared remote dashboard.
My first thought was to set a customer_id or a team_id on each table, but it seems dirty.
The other way is to create multiple databases on the remote server for the synchronization, plus one extra database to store customer IDs and database connection information...
Has anyone already experimented with something like that? Is there a reliable and clean way to do this?
You refer to locale but I am assuming you mean local.
From what you have said, you have two options at the central site. The central database can either store information from the remote databases in a single table with an additional column that indicates which remote site it is from, or you can set up a separate table (or database) for each remote site.
How do you want to use the data?
If you only ever want to work with the data from one remote site at a time it doesn't really matter - in both scenarios you need to identify what data you want to work with and build your SQL statement to either filter by the appropriate column, or you need to direct it to the appropriate table(s).
If you want to work on data from multiple remote sites at the same time, then using different tables requires that you use UNION queries to extract the data, and this is unlikely to scale well. In that case you would be better off using a column to mark each record with the remote site it references.
I recommend that you consider using UUIDs as primary keys - it may be that key collision will not be an issue in your scenario, but if it becomes one, trying to alter the design retrospectively is likely to be quite a bit of work.
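A minimal sketch of that single-table option, with UUID keys and a column marking the remote site (all names here are placeholders):
-- Central table on the remote server; every row carries the site it came from (placeholder names)
CREATE TABLE sensor_readings (
    id CHAR(36) NOT NULL,              -- UUID generated on the Raspberry Pi, so keys never collide
    site_id INT UNSIGNED NOT NULL,     -- identifies the customer/site the row belongs to
    reading DECIMAL(10,2) NOT NULL,
    recorded_at DATETIME NOT NULL,
    PRIMARY KEY (id),
    KEY idx_site_time (site_id, recorded_at)
);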
You also asked how to synchronize the databases. That will depend on what type of connection you have between the sites and the capabilities of your software, but typically you would have the local system periodically talk to a web service at the central site. Assuming you are collecting sensor data or some such, the dialogue would be something like:
Client - Hello Server, my last sensor reading is timestamped xxxx
Server - Hello Client, [ send me sensor readings from yyyy | I don't need any data ]
You can include things like a signature check (for example, an MD5 sum of the records within a time period) if you want to, but that may be overkill.
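A rough sketch of the queries behind that dialogue, reusing the placeholder table from above:
-- On the central server: what is the newest reading we already hold for site 42?
SELECT MAX(recorded_at) FROM sensor_readings WHERE site_id = 42;
-- On the Raspberry Pi: everything newer than that timestamp gets pushed up on the next sync
SELECT id, reading, recorded_at
FROM sensor_readings
WHERE recorded_at > '2024-05-01 12:00:00';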
Suppose I want to keep millions of recharge codes in a separate database (named A) containing a single table. I want to design another database (named B) which will be used by a web application.
I want to keep database A separate and as secure as it can be, preferably not exposed to the network, so that nobody can access or hack the huge amount of sensitive data.
But I also have to populate one table of database B with codes from the table in database A, as needed or requested by the web application.
I am using MySQL and Apache Tomcat as the web server.
Can you please suggest the best and most secure way of designing the database, keeping in mind that:
1) The safety of the codes in database A is the priority.
2) The tables will contain millions of rows, so quick response is also a requirement.
I'm adding this as an answer because it is too long for a comment.
I think this is more about app design and layering than about the database design as such. In terms of DB design, you just need the tables to have indexes that use all the keys you will have. The db-access will be sub-second.
In terms of app design, I suppose your app will know when to look at table-B and when it has to retrieve from table-A.
So, the key issue is: how to access A. The simplest way would be for the app to connect to A, and read it via SQL. The problem with this is that a hacker who is on your app server could then see your connection details. You could try to obscure the connection details from app-server to A. This would be "security through obscurity" and would be something, but would not stop a good hacker.
If you're serious about control, you could have an app running on A. You can block all ways for apps from outside A to access the database on A, leaving the app on A as the sole point of access.
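For example, in MySQL you could restrict the account that can read the codes so it only works from machine A itself (the user and schema names here are hypothetical):
-- Only connections originating on host A itself can use this account (hypothetical names)
CREATE USER 'codes_app'@'localhost' IDENTIFIED BY 'use-a-strong-password';
GRANT SELECT, UPDATE ON codes_db.* TO 'codes_app'@'localhost';
Combined with a firewall rule or bind-address = 127.0.0.1 in the MySQL configuration, the database on A never needs to be reachable from the network at all.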
By its very uniqueness, the app could provide another level of obscurity. For instance, the app could insist on knowing the customer ID for whom the code is being requested, and could check this against some info (even on B). But there are better reasons to use one...
The app on A could
impose controls: e.g. only 1000 codes given out per hour
send alerts: e.g. email an operator if more than 500 codes have been requested in the hour
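A sketch of the hourly throttle check that app could run, using a hypothetical issued_codes table:
-- Refuse further requests once 1000 codes have gone out in the last hour (hypothetical table)
SELECT COUNT(*) AS issued_last_hour
FROM issued_codes
WHERE issued_at > NOW() - INTERVAL 1 HOUR;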
For a RoR app I'm helping develop, I need to save all search queries in a database so I can analyze them later.
My plan right now is to create a Result model and table, and just save each search query's text in that table, along with a user's ID, the time, etc.
However, the app has about 15,000 users, so I'm afraid the single-table approach won't be super efficient when it comes time to parse that data. (The database is set up with MySQL, if that factors in at all.)
Am I just being paranoid? Is there a Ruby gem that handles this sort of thing, or a better approach I could take?
Any input would be appreciated.
There are a couple of approaches you can try:
1. Enable MySQL query logging and then analyze those logs (see the sketch just below this list)
http://dev.mysql.com/doc/refman/5.1/en/query-log.html
2. Use a key=>value store (Redis comes to mind) to log the search queries in a similar way to the one you described
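For the first approach, turning on the general query log at runtime might look roughly like this (behaviour varies between MySQL versions, so check the linked docs; the log records every statement, so expect some overhead):
-- Log every statement the server receives into the mysql.general_log table
SET GLOBAL log_output = 'TABLE';
SET GLOBAL general_log = 'ON';
-- Later, pull out the SELECTs for analysis
SELECT event_time, argument FROM mysql.general_log WHERE argument LIKE 'SELECT%';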
If you decide to go with the 2nd approach, I would create an async observer on the model you want to track.
The answer depends on what you want to do with the data.
If your users don't need access to this, and you're not doing real-time analytics, dump them out of your app and get them into another database to run analytics to your heart's content.
If you want something integrated into your app, try a single MySQL table.
Unless your server is tiny or your users are crazy active searchers, it should work just peachy. At a certain point you'll probably want to clear out old records and save them elsewhere though.
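If you do keep it in your app, a minimal table sketch (the column names are just an assumption) could be:
-- One row per search; at 15,000 users this stays manageable for a long time
CREATE TABLE searches (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    user_id INT UNSIGNED NOT NULL,
    query VARCHAR(255) NOT NULL,
    created_at DATETIME NOT NULL,
    PRIMARY KEY (id),
    KEY idx_user_time (user_id, created_at)
);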
I am about 70% of the way through developing a web application which contains what is essentially a largeish datatable of around 50,000 rows.
The app itself is a filtering app providing various different ways of filtering this table, such as range filtering by number, drag-and-drop filtering that ultimately performs regexp filtering, live text searching, and I could go on and on.
Due to this I coded my MySQL queries in a modular fashion so that the actual query itself is put together dynamically depending on the type of filtering happening.
At the moment each filtering action (in total) takes between 250-350ms on average. For example:-
The user grabs one end of a visual slider and drags it inwards; when he/she lets go, a range-filtering query is dynamically put together by my PHP code and the results are returned as a JSON response. The total time from the user letting go of the slider until the user has received all data and the table is redrawn is between 250-350ms on average.
I am concerned with scalability further down the line, as users can be expected to perform a huge number of these filtering actions in a short space of time in order to retrieve the data they are looking for.
I have toyed with trying to do some fancy cache expiry work with memcached but couldn't get it to play ball correctly with my dynamically generated queries. Although everything would cache correctly, I was having trouble expiring the cache when the query changes and keeping the data relevant. I am, however, extremely inexperienced with memcached. My first few attempts have led me to believe that memcached isn't the right tool for this job (due to the highly dynamic nature of the queries), although this app could ultimately see very high concurrent usage.
So... My question really is, are there any caching mechanisms/layers that I can add to this sort of application that would reduce hits on the server? Bearing in mind the dynamic queries.
Or... If memcached is the best tool for the job, and I am missing a piece of the puzzle with my early attempts, can you provide some information or guidance on using memcached with an application of this sort?
Huge thanks to all who respond.
EDIT: I should mention that the database is MySQL. The site itself is running on Apache with an nginx proxy. But this question is related purely to speeding up and reducing the database hits, of which there are many.
I should also add that the quoted 250-350ms round-trip time is fully remote, as in from a remote computer accessing the website. The time includes DNS lookup, data retrieval, etc.
If I understand your question correctly, you're essentially asking for a way to reduce the number of queries against the database even though there will be very few exactly identical queries.
You essentially have three choices:
Live with having a large amount of queries against your database, optimise the database with appropriate indexes and normalise the data as far as you can. Make sure to avoid normal performance pitfalls in your query building (lots of ORs in ON-clauses or WHERE-clauses for instance). Provide views for mashup queries, etc.
Cache the generic queries (that is, without some or all of the filters) in memcached or similar, and apply the filters in the application layer.
Implement a search index server, like SOLR.
I would recommend the first, though (sketched below). A round-trip time of 250-350 ms sounds a bit high even for complex queries, and it sounds like you have a lot to gain by just improving what you already have at this stage.
For much higher workloads, I'd suggest solution number 3, it will help you achieve what you are trying to do while being a champ at handling lots of different queries.
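To illustrate the first option, a view can hide the join "mashup" so the dynamically assembled filters always hit one simple, indexable relation (all names here are invented for the example):
-- Pre-join the data the filter UI works against (invented schema)
CREATE VIEW product_listing AS
SELECT p.id, p.title, p.price, c.name AS category
FROM products p
JOIN categories c ON c.id = p.category_id;
-- The dynamically built filters then only ever target the view
SELECT * FROM product_listing WHERE price BETWEEN 10 AND 50 ORDER BY title;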
Use Memcache and set the key to be the filtering query or some unique key based on the filter. Ideally you would write your application to expire the key as new data is added.
You can only make good use of caches when you occasionally run the same query.
A good way to work with memcache caches is to define a key that matches the function that calls it. For example, if the model named UserModel has a method getUser($userID), you could cache all users as USER_id. For more advanced functions (Model2::largerFunction($arg1, $arg2)) you can simply use MODEL2_arg1_arg2 - this will make it easy to avoid namespace conflicts.
For full-text searches, use a search indexer such as Sphinx or Apache Lucene. They improve your queries a LOT (I was able to do a full-text search on a 10-million-record table on a 1.6 GHz Atom processor in less than 500 ms).
OK so I'm kinda new to databases in general. I understand the basic theory behind them and have knocked up the odd Access DB here and there.
One thing I'm struggling to learn about is the specifics of how e.g. an SQL query accesses a database.
So say you have a scenario where there's a database on a LAN server (let's say it's MS Access for argument's sake). You run some SQL query or other on it from a client machine. Does the client machine have to download the entire database to run said query (even if the result of the query is just one line)? Or does it somehow manage to get just the data it wants to come down the ol' CAT5? Does the server have to be running anything to do that? I can't quite understand how the client could get JUST the query results without the server having to do some of the work...
I'm seeing two conflicting stories on this matter when googling stuff.
And so this follows on the next question (which may already be answered): if you CAN query a DB without having to get the whole damn thing, and without the server running any other software, can the same be done with a CSV? If not, why not?
The reason I ask is that I'm developing an app for a mobile device that needs to talk to a DB or CSV file of some kind, and it'll be updating records at a pretty high rate (barcode scanning), so I don't want the network to grind to a halt (it's a slow bag of [insert relevant insult] as it is). The less data travelling from device to server, the better.
Thanks in advance
The various SQL servers are just that: servers. Each is a program that listens for client queries and sends back a response. It is more than just its data.
A CSV file, or "flat file" is just data. There is no way for it to respond to a query by itself.
So, when you are on a network, your query is sent to the server, which does the work of finding the appropriate results. When you open a flat file, you're using the network and/or file system to read/write the entire file.
Edit to add a note about your specific usage. You'll probably want to use a database engine, as the queries are going to be the least amount of network traffic. For example, when you scan a barcode, your query may be as simple as the following text:
INSERT INTO barcode_table (`code`, `scan_date`, `user`) VALUES ('1234567890', '2011-01-24 12:00:00', '1');
The above string is handled by the database engine and the code (along with whatever relevant support data) is stored. No need for your application to open a file, append data to it, and close it. The latter becomes very slow once files get to a large size, and concurrency can become a problem with many users accessing it.
If your application needs to display some data to your user, it would request specific information the same way, and the server would generate the relevant results. So, imagine a scenario in which the user wants a list of products that match some filter. If your products were books, suppose the user requested a list by a specific author:
SELECT products.title, barcode_table.code
FROM products
JOIN barcode_table ON barcode_table.product_id = products.id -- assumes barcode_table records which product was scanned
WHERE products.author = 'Anders Hejlsberg'
ORDER BY products.title ASC;
In this example, only those product titles and their barcodes are sent from the server to the mobile application.
Hopefully these examples help make a case for using a structured database engine of some kind, rather than using a flat file. The specific flavor and implementation of database, however, is another question unto itself.
Generally speaking, relational databases are stored on a remote server, and you access them via a client interface. Each database vendor provides client software that you install on your own computer to access the database on a server. The entire DB is not sent back to the client when a query is executed, although the server can send very large result sets if you are not careful about how you structure your query. Generally speaking, the flow is like this:
1. A database server listens for clients to connect
2. A client connects and issues a SQL command to the database
3. The database builds a query plan to figure out how to get the result
4. The plan is executed and the results are sent back to the client
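You can even watch step 3 happen by asking the server for its plan; only the plan summary, or the matching rows, ever cross the network (the table name is borrowed from the barcode example in the other answer):
-- The server works out how to satisfy this; the client only receives the resulting rows
EXPLAIN SELECT code, scan_date FROM barcode_table WHERE scan_date >= '2011-01-24';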
CSV is simply a file format, not a fully functional platform like a relational database.