I am currently working on a web application using CakePHP as a framework and MySQL as the RDBMS. The application will later be used by tens (probably hundreds) of users at the same time, and they may be updating the database several times a day; I want to keep track of all changes to the database for later analysis. In terms of optimizing my script/application, do you think it would be better to use classic triggers or stored procedures to keep track of any changes to the database after each insert, update or delete performed by a user?
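For context, the kind of trigger I have in mind would look something like the sketch below (MySQL syntax; orders and orders_audit are hypothetical names standing in for my real tables, and the column list would obviously be wider):

-- Audit log table written to by the trigger below.
CREATE TABLE orders_audit (
    audit_id    INT AUTO_INCREMENT PRIMARY KEY,
    order_id    INT,
    action      VARCHAR(10),                          -- 'insert', 'update' or 'delete'
    old_status  VARCHAR(50),
    new_status  VARCHAR(50),
    changed_by  VARCHAR(100),
    changed_at  TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

DELIMITER //
CREATE TRIGGER orders_after_update
AFTER UPDATE ON orders
FOR EACH ROW
BEGIN
    -- NOTE: CURRENT_USER() is the MySQL account, not the CakePHP user;
    -- the application user would have to be passed in (e.g. via a session variable).
    INSERT INTO orders_audit (order_id, action, old_status, new_status, changed_by)
    VALUES (NEW.id, 'update', OLD.status, NEW.status, CURRENT_USER());
END//
DELIMITER ;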
Related
I have a MySQL database on my server, and a Windows WPF application from which my clients will be inserting and deleting rows corresponding to their data. There may be hundreds of users working on the application at the same time, and they will be inserting or deleting rows in the db.
My question is whether all these database operations can go through successfully under that load, or whether I should adopt some other approach.
PS: There won't be any clashes on rows during insertion/deletion, because each user can only add or remove his/her own data.
My question is whether all these database operations can go through successfully ...
Yes, like most other relational database systems, MySQL supports concurrent inserts, updates and deletes so this shouldn't be an issue provided that the operations don't conflict with each other.
If they do, you need to find a way to manage concurrency.
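If conflicting updates are possible, one common way to manage them is to wrap the read-modify-write in a short transaction that locks the affected row, so two sessions cannot silently overwrite each other. A minimal sketch (MySQL/InnoDB syntax; accounts is a hypothetical table):

START TRANSACTION;

-- Lock the row: a concurrent session touching the same row waits here until COMMIT.
SELECT balance FROM accounts WHERE user_id = 42 FOR UPDATE;

-- Apply the change while the lock is held.
UPDATE accounts SET balance = balance - 100 WHERE user_id = 42;

COMMIT;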
MySQL concurrency, how does it work and do I need to handle it in my application
I am working in Rails and would like to add dynamic charts to my app. My thinking was that, given the data is in flat files, a stored procedure could be created in the MySQL database to query the data based on the parameters a user wishes to see, i.e. count users grouped by the activity they do (y-axis) across the months of the year (x-axis). The stored procedure would then be called from Rails and return the result to be built into a chart.
However, it has been noted that stored procedures are slow and may therefore kill the user experience if there are long loading times just to render a chart.
For what I am trying to do, is this the best way to proceed?
stored procedures are slow
It should be noted that the stored procedure approach does not spend time shuttling data back and forth between your SQL server and Rails, so a stored procedure should actually execute much faster than the traditional bunch of ActiveRecord queries.
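As a rough illustration of what that procedure could look like (MySQL syntax; activities with user_id, activity and performed_at columns is a hypothetical table standing in for the imported flat-file data):

DELIMITER //
CREATE PROCEDURE user_activity_by_month(IN in_year INT)
BEGIN
    -- One row per activity (chart series) per month (x-axis).
    SELECT activity,
           MONTH(performed_at)     AS month_num,
           COUNT(DISTINCT user_id) AS user_count
    FROM activities
    WHERE YEAR(performed_at) = in_year
    GROUP BY activity, MONTH(performed_at)
    ORDER BY activity, month_num;
END//
DELIMITER ;

-- What Rails would send over the connection:
CALL user_activity_by_month(2013);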
may kill the user experience if there are long loading times
Depending on what the loading times and user queries turn out to be, there are a variety of approaches to improve the user experience:
You could render a page with some placeholder text like "Generating data, please hold on for a while" and make the results of the analytical query appear a bit later using a technique like AJAX or server-sent events. Report generation could be processed in the background using Delayed Job or Sidekiq. It's also possible to show something like a progress bar to let the user know that their request has not been abandoned.
You could take advantage of caching data for common queries, so that each query is performed just once in a while. It's possible to choose different caching periods (e.g. update cached statistics once an hour or once a day) based on how fresh your chart needs to be; a database-side sketch of this idea follows this list. Take a look at the Rails caching guide.
You could also be interested in some data mining research, e.g. the multi-dimensional online analytical processing (OLAP) approach. Using OLAP cubes to represent the data would be slightly harder to follow, but could improve your user experience with multidimensional data analysis possibilities.
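On the caching point, the cache does not have to live on the Rails side; a pre-aggregated summary table rebuilt on a schedule achieves the same thing in the database. A minimal sketch (MySQL syntax; the table names are made up, and the event scheduler has to be enabled):

-- Pre-aggregated data the charts read from.
CREATE TABLE activity_summary (
    activity   VARCHAR(100),
    month_num  TINYINT,
    user_count INT,
    PRIMARY KEY (activity, month_num)
);

-- Rebuild the summary once an hour; requires SET GLOBAL event_scheduler = ON.
CREATE EVENT refresh_activity_summary
ON SCHEDULE EVERY 1 HOUR
DO
    REPLACE INTO activity_summary (activity, month_num, user_count)
    SELECT activity, MONTH(performed_at), COUNT(DISTINCT user_id)
    FROM activities
    GROUP BY activity, MONTH(performed_at);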
I have a stored procedure to simply run a series of UPDATE statements on a CRM2011 SQL Server. The goal is to have it run every 30 minutes via a SQL Server Agent job. The stored procedure does not expect any parameters.
I create the job and add a step to call a T-SQL statement "EXEC mystoredprocname". I right click and "Start Job at this Step" and it completes successfully. However, none of the updates are reflected in the database.
If I run "EXEC mystoredprocname" manually in a query line, it executes fine and the database is updated as expected.
This seems like something that should be incredibly simple, so I am not sure where the breakdown in my process is.
As you mention in your comments that your stored procedure uses a filtered view, I'm fairly willing to wager that you are not running the schedule as a user who authenticates via Windows Authentication and also has the correct CRM permissions, because, as has oft been noted, filtered views implement the CRM's Windows-based authentication model.
So I have three suggestions:
Double check to make sure the schedule is running under the Windows account of a CRM user who has the correct read permissions.
Since you're committed to updating the tables directly, the only reason you'd want to use a filtered view is that it packages the retrieval of the string representations of OptionSets for you. You can instead query the StringMap tables directly and reference the regular views, which you don't need to be a CRM user to access (a rough sketch follows this list). You'll notice a speed improvement as well, since filtered views are slowed down by the security checks.
If you're not committed to updating the tables directly, why not rewrite your stored procedure as a small app that you can schedule to do the updating every 30 minutes? Unless you have a massive delta, this should be the preferred approach. You gain the advantages of the built-in validation model in the CRM web service, and though you lose the benefits of a set-based approach, I think the pros of working with a third-party system this way outweigh the cons of potential hacks and breaks in the system. If you are not a .NET developer (and even if you are), the CRM SDK has many examples that could help you get started.
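To illustrate the second suggestion, the join against StringMap looks roughly like the sketch below (T-SQL; the entity, attribute and object type code are only examples and depend on your customizations):

-- Read contacts with the text label of an OptionSet column,
-- without going through the FilteredContact view.
SELECT c.ContactId,
       c.FullName,
       sm.Value AS CustomerTypeLabel          -- string representation of the OptionSet
FROM dbo.Contact AS c                         -- regular (unfiltered) view
JOIN dbo.StringMap AS sm
  ON  sm.AttributeValue = c.CustomerTypeCode
  AND sm.AttributeName  = 'customertypecode'
  AND sm.ObjectTypeCode = 2                   -- 2 = contact
  AND sm.LangId         = 1033;               -- English labels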
Below are some other questions that relate to my points above and may help you.
How to get option set values from sql server in an application outside crm
Schedule workflows via external tool
Scheduling tasks in Microsoft CRM 2011
How to save a record and immediately use its GUID
As I said in a previous post, our Rails app has to interface with an E-A-V type of table in a third-party application that we're pulling data from. I had created a View to make the data normal but it is taking way too long to run. We had one of our offshore PHP developers create a stored procedure to help speed it up.
Now we run into the issue that we need to call this stored procedure from the Rails app, as well as provide searching and filtering. The view could do this because Rails was treating it as a traditional Rails model. How could I do this with the stored proc? Would we need to write custom searching and ordering (we were using Searchlogic)? Management is incapable of understanding the drawbacks of using a stored proc from Rails; all they say is that the current method is taking too long to load the data and needs to be fixed, but searching and filtering are critical functions.
EDIT: I posted the code for this query here: Optimizing a strange MySQL Query. What is funny is that when I run this query in a GUI (Navicat) it runs in about 5 seconds, but on the web page it takes over a minute. The view is complicated for reasons I outline in the original post, but I would have thought that MySQL optimizes and caches views the way SQL Server does (or rather, the way I've read that SQL Server does) to improve performance.
You can call stored procedures from Rails, but you are going to lose most of the benefits of ActiveRecord, as the standard generated SQL will not work. You can use the native database connection and call it, but it's going to be a leaky abstraction. You may want to consider DataMapper.
Looking back at your last question, I would get the DBA to create a trigger to build a more relational structure from the data. The trigger would insert the EAV data into a regular table, which is the only way I know of to do materialized views in MySQL; a rough sketch follows. This way you only pay a small incremental background cost on insert, and the application can run normally.
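Something along these lines (MySQL syntax; eav_values and flat_items are made-up names, and the real attribute list would be much longer):

-- Flattened table the application queries instead of the slow view.
CREATE TABLE flat_items (
    item_id INT PRIMARY KEY,
    name    VARCHAR(255),
    price   DECIMAL(10,2)
);

DELIMITER //
CREATE TRIGGER eav_values_after_insert
AFTER INSERT ON eav_values
FOR EACH ROW
BEGIN
    -- Make sure a row for this item exists, then fill in the attribute just written.
    INSERT IGNORE INTO flat_items (item_id) VALUES (NEW.item_id);

    IF NEW.attribute = 'name' THEN
        UPDATE flat_items SET name = NEW.value WHERE item_id = NEW.item_id;
    ELSEIF NEW.attribute = 'price' THEN
        UPDATE flat_items SET price = NEW.value WHERE item_id = NEW.item_id;
    END IF;
END//
DELIMITER ;

You would add matching AFTER UPDATE and AFTER DELETE triggers to keep the flattened table in sync.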
Anyway...
ActiveRecord::Base.connection.execute("call SP_name (#{param1}, #{param2}, ... )")
But there's an open ticket on Lighthouse indicating this approach may not work without changing some of the parameters used for the connection.
We have an ASP.NET web application hosted by a web farm of many instances using SQL Server 2008 in which we do aggregation and pre-processing of data from multiple sources into a format optimised for fast end user query performance (producing 5-10 million rows in some tables). The aggregation and optimisation is done by a service on a back end server which we then want to distribute to multiple read only front end copies used by the web application instances to facilitate maximum scalability.
My question is about the best way to get this data from a back end database out to the read only front end copies in such a way that does not kill their performance during the process. The front end web application instances will be under constant high load and need to have good responsiveness at all times.
The backend database is constantly being updated so I suspect that transactional replication will not be the best approach, as the constant stream of updates to the copies will hurt their performance.
Staleness of data is not a huge issue so snapshot replication might be the way to go, but this will result in poor performance during the periods of replication.
Doing a drop and bulk insert will result in periods with no data for user queries.
I don't really want to get into writing a complex cluster approach where we drop copies out of the cluster during updating - is there something along these lines that we can do without too much effort, or is there a better alternative?
There is actually a technology built into SQL Server 2005 (and 2008) that is designed to address this kind of issue: Service Broker (which I'll refer to as SSB from here on). The problem is that it has a very steep learning curve.
I know MySpace has gone public about how it uses SSB to manage its park of SQL Servers: MySpace Uses SQL Server Service Broker to Protect Integrity of 1 Petabyte of Data. I know of several more (major) sites that use similar patterns, but unfortunately they have not gone public so I cannot name them. I was personally involved with some projects around this technology (I am a former member of the SQL Server team).
Now bear in mind that SSB is not a dedicated data transfer technology like Replication. As such you will not find anything similar to the publishing wizards and simple deployment options of Replication (check a table and it gets transferred). SSB is a reliable messaging technology, and as such its primitives stop at the level of message exchange: you would have to write the code that captures the data changes, packs them into messages, and unpacks the messages into relational tables at the destination.
The reason some companies still prefer SSB over Replication for a task like the one you describe is that SSB has a far better story when it comes to reliability and scalability. I know of projects that exchange data between 1500+ sites, far beyond the capabilities of Replication. SSB is also abstracted from the physical topology: you can move databases, rename machines, and rebuild servers all without changing the application. Because data flow occurs over logical routes, the application can adapt on the fly to new topologies. SSB is also resilient to long periods of disconnect and downtime, being capable of resuming the data flow after hours, days and even months of disconnection. High throughput achieved by engine integration (SSB is part of the SQL engine itself, not a collection of satellite applications and processes like Replication) means that the backlog of changes can be processed in reasonable time (I know of sites that go through half a million transactions per minute). SSB applications typically rely on internal Activation to process the incoming data. SSB also has some unique features like built-in load balancing (via routes) with sticky-session semantics, support for deadlock-free application-specific correlated processing, priority data delivery, specific support for database mirroring, certificate-based authentication for cross-domain operations, built-in persisted timers, and more.
This is not a specific answer to 'how do I move data from table T on server A to server B'. It is more a generic technology for 'exchanging data between server A and server B'.
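To give a flavour of the plumbing involved, here is a bare-bones sketch (T-SQL; every name is made up, both sides are shown in one database for brevity, and a real deployment adds endpoints, routes between servers, activation procedures and error handling):

-- One message type, one contract, and a queue/service pair on each side.
CREATE MESSAGE TYPE [//Example/DataChange] VALIDATION = WELL_FORMED_XML;
CREATE CONTRACT [//Example/DataChangeContract]
    ([//Example/DataChange] SENT BY INITIATOR);

CREATE QUEUE BackendQueue;
CREATE QUEUE FrontendQueue;
CREATE SERVICE [//Example/BackendService]  ON QUEUE BackendQueue;
CREATE SERVICE [//Example/FrontendService] ON QUEUE FrontendQueue
    ([//Example/DataChangeContract]);

-- Back end: send one captured change as a message.
DECLARE @handle UNIQUEIDENTIFIER;
BEGIN DIALOG CONVERSATION @handle
    FROM SERVICE [//Example/BackendService]
    TO SERVICE   '//Example/FrontendService'
    ON CONTRACT  [//Example/DataChangeContract]
    WITH ENCRYPTION = OFF;
SEND ON CONVERSATION @handle
    MESSAGE TYPE [//Example/DataChange]
    (N'<change table="SalesSummary" id="42" op="update"/>');

-- Front end: pick up the message; unpacking it into relational tables is your code.
DECLARE @body XML;
WAITFOR (
    RECEIVE TOP (1) @body = CAST(message_body AS XML)
    FROM FrontendQueue
), TIMEOUT 5000;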
I've never had to deal with this scenario before but did come up with a possible solution for this. Basically, it would require a change in your main database structure. Instead of storing the data, you would keep records of modifications of this data. Thus, if a record is added, you store "Table X, inserted new record with these values: ..." With modifications, just store the table, field and changed value. With deletions, just store which record is deleted. Every modification will be stored with a timestamp.
Your client systems would keep their local copies of the database and will regularly ask for all database modifications after a certain date/time. You then execute those modifications on the local database and it will be up-to-date again.
And the back-end? Well, it would just keep a list of modifications and perhaps a table with the base data. Keeping just the modifications also means you're keeping track of history, allowing you to ask the system what it looked like a year ago.
How well this would perform depends on the number of modifications on the back-end database. But if you request the changes every 15 minutes, it shouldn't be that much data every time.
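A minimal sketch of the modification log and the incremental pull (T-SQL; all table and column names are made up):

-- Every write to the main data is recorded here.
CREATE TABLE modifications (
    mod_id      BIGINT IDENTITY PRIMARY KEY,
    table_name  VARCHAR(128) NOT NULL,
    record_id   INT          NOT NULL,
    change_type VARCHAR(10)  NOT NULL,          -- 'insert', 'update' or 'delete'
    field_name  VARCHAR(128) NULL,              -- only for updates
    new_value   VARCHAR(MAX) NULL,
    modified_at DATETIME     NOT NULL DEFAULT GETDATE()
);

-- What a client asks every 15 minutes: "everything since my last sync".
DECLARE @last_sync_time DATETIME = '2013-06-01T12:00:00';
SELECT mod_id, table_name, record_id, change_type, field_name, new_value, modified_at
FROM modifications
WHERE modified_at > @last_sync_time
ORDER BY mod_id;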
But again, I never had the chance to work this out in a real application so it's still a theoretic principle for me. It seems fast but a lot of work will be required.
Option 1: Write an app to transfer the data using row-level transactions. It might take longer, but it would result in no interruption of the site using the data, because the rows are there before and after the read occurs, just with new data. This processing would happen on a separate server to minimize load.
In SQL Server 2008 you can set READ_COMMITTED_SNAPSHOT to ON to ensure that the rows being updated do not cause blocking.
But basically all this app does is read the new data as it becomes available out of one database and into the other.
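Roughly what that looks like on a front-end database (T-SQL; database, table and column names are placeholders):

-- Row versioning: readers see the last committed version of a row instead of blocking.
-- (Needs exclusive access to switch on, e.g. during a maintenance window.)
ALTER DATABASE FrontEndDb SET READ_COMMITTED_SNAPSHOT ON;

-- The transfer app then applies each row in a small transaction.
BEGIN TRANSACTION;
UPDATE dbo.SalesSummary
SET    total_amount = 1234.56, row_count = 789
WHERE  summary_id = 42;

IF @@ROWCOUNT = 0
    INSERT INTO dbo.SalesSummary (summary_id, total_amount, row_count)
    VALUES (42, 1234.56, 789);
COMMIT;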
Option 2: Move the data (tables or entire database) from the aggregation server to the front-end server. Automate this if possible. Then switch your web application to point to the new database or tables for future requests. This works but requires control over the web app, which you may not have.
Option 3: If you were talking about a single table (though this could work with many), what you can do is a view swap. You write your code against a SQL view which points to Table A. You do your work on Table B, and when it's ready, you alter the view to point to Table B. You can even write a function that determines the active table and automates the whole swap.
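In its simplest form the swap is just this (T-SQL; view and table names are placeholders), and because altering a view is a metadata-only change the switch is effectively instant:

-- The application always queries the view, never the tables directly.
CREATE VIEW dbo.CurrentSales AS SELECT * FROM dbo.Sales_A;
GO

-- Load and verify dbo.Sales_B in the background, then flip the view in one step.
ALTER VIEW dbo.CurrentSales AS SELECT * FROM dbo.Sales_B;
GO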
Option 4: You might be able to use something like byte-level replication of the server, which is basically copying the server from point A to point B exactly, down to the very bytes. That sounds scary, though. It's mostly used in DR situations, and this sounds like it could be a kinda/sorta DR situation, but not really.
Option 5: Give up and learn how to sell insurance. :)