I have a huge SQL 2008 DB with ~300 million rows. some of the tables have columns with an encrypted value.
The DB was gradually build using an application (.net 4).
We are considering to move to some hardware (USB token) like encryption but that mean we will have to change the encrypted values in the DB.
We've written a small app that decrypts with the old key and then encrypt with the token but it takes days for it to run since I have to SELECT the row and UPDATE one at a time by ID. the db is indexed but still...
The Encrypt(string) method is a functionality provided by the USB token and I can access it via .net
I'm looking for a more direct way to use that functionality. maybe access it through SQL or something.
You can use a CLR stored procedure to access the USB dongle on the server. You'd need UNSAFE access of course
This will reduce network overhead because you don't want to do a single 300 million row update. You'd still need it RBAR or batched: I'd suggest RBAR to keep it simple.
Related
I have an application that runs on a MySQL database, the application is somewhat resource intensive on the DB.
My client wants to connect Qlikview to this DB for reporting. I was wondering if someone could point me to a white paper or URL regarding the best way to do this without causing locks etc on my DB.
I have searched the Google to no avail.
Qlikview is in-memory tool with preloaded data so your client have to get data only during periodical reloads not all the time.
The best way is that your client will set reload once per night and make it incremental. If your tables have only new records load every night only records bigger than last primary key loaded.
If your tables have modified records you need to add in mysql last_modified_time field and maybe also set index on that field.
last_modified_time TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
If your fields are get deleted the best is set it as deleted=1 in mysql otherwise your client will need to reload everything from that tables to get to know which rows were deleted.
Additionally your client to save resources should load only data in really simple style per table without JOINS:
SELECT [fields] FROM TABLE WHERE `id` > $(vLastId);
Qlikview is really good and fast for data modelling/joins so all data model your client can create in QLikview.
Reporting can indeed cause problems on a busy transactional database.
One approach you might want to examine is to have a replica (slave) of your database. MySQL supports this very well and your replica data can be as up to date as you require. You could then attach any reporting system to your replica to run heavy reports that won't affect your main database. This also gives you a backup (2nd copy) and the backup can further be used to create offline backups of your data also without affecting your main database.
There's lots of information on the setup of MySQL replicas so that's not too hard.
I hope that helps.
I am looking for a tip how to synchronize data from a local firebird database into online db? Few comments:
On a local machine I use sales software which keeps data on firebird db. There is an internet connection, but I want to avoid direct db access (as the PC after 9pm is being turned off).
I would like to create an online app (based on foundation + php + database) in which I will be able to view daily sales and explore past data.
In local db, I will need to pull data from several different tables, and I would like to keep them in online/final db as a single table (with fields: #id, transaction date, transaction value, sales manager).
While mostly I know how to create frontend of the app, and partially backend still I wonder what would be best choice in terms of db - mysql? (it was my first thought). Or rather I should focus on NoSQL?
What's your recommendation on data sync? I should use symmetricsDB (pretty hard to configure) or equivalent, I should write a script which will push data from firebird into json/xml? I'm referring to your knowledge and best practices
Put a scheduled job that will invoke a simple data pump / replication script.
From the script, connect to the source sales db, retrieve the joined data added from last replication and insert them into the "online" database.
You may keep also Firebird as online DB as it works great with PHP.
Firebird also in version 2.5 has all technology already build in to implement a fully functional replication. We have implemented this in the largest installation for a big restaurant company with about 0.6 billion records, daily about 1 million new records and 150 locations where replicated servers are working online or offline with the back office software.
If you simply want to upload the data from your local db to a remote db, you can rent a virtual server at a provider you like, install firebird there, create a secure connection (we use ssh, but any tcp over vpn can be used). copy your local database to the remote server, if required open firewall fb port (3050 or other) and when you a low number of writes on your local database, simply implement a trigger on each table, that does the same insert/update/delete with the same values using the "execute statement on external" feature.
When your local database has higher workload, it is better to put the change data (table name and pk values) from trigger into a log table and let a second connection upload the records to the target db, where the same "execute statement on external" can be used.
this is just a hint how to do that, if budget allows, we can do it for you, but stopping the database pc in the evening seems to be only typical for smaller companies
Suppose we have a condition in which we have many clients are running the same windows application and using the same database, but net connectivity is not good in that region so it wont be able to access the database server all the time. Can we store SQL queries during this time and then execute them later?
And also how we will maintain data consistency for all the clients in this situation?
Can we store SQL queries during this time and then execute them later?
You could, but it might not be the best way. I recommend solving this not on the data access layer, but rather in the business logic. So instead of storing a sql statement to be executed later, I would rather store objects that represent the business action that should be performed.
I'm planning to create an VB.net application for retrieving data from a database (MS Access) and store it to a web server (MySQL data base). I really have confusion in my mind. I'm planning to use task scheduler so that the program will automatically run. I'm planning to set the time every 5 minutes.
How can I avoid the redundancy of data?
For example, I'm planning to get the sales for 5 minutes, after 5 minutes I will do it again. I think there will be redundancy in that case. I would like to ask your ideas about this scenario: how would you handle it?
If at all possible you should avoid using two databases in a situation like this.
Look for information on the linked table manager -- the data that Access uses doesn't have to be stored in Access.
http://www.mssqltips.com/sqlservertip/1480/configure-microsoft-access-linked-tables-with-a-sql-server-database/
If you have to do this, then see about using/upgrading to Access 2010 and use data macros (triggers), to put the new/changed data into temp tables that you clear out once you've copied the data over.
In a comment you said "i dont have any idea about how to replace the native tables with ODBC".
Is that the only obstacle which prevents you consolidating the data into one set in MySQL? If so, try this suggestion for setting ODBC links to MySQL tables.
Install an ODBC driver for MySQL, if you don't have one already. The latest version is available here: Download Connector/ODBC
Create a DSN (Data Source Name) for your MySQL database from the Windows ODBC Data Source Administrator.
Create a new Access database and use the DSN to create links with guidance from the web page link #jmoreno provided.
If the Access names of the linked tables are different than the names you originally used for the native Access tables, change them to match those original names.
Then you can import your forms, queries, reports, etc. from the old Access application. Ideally everything will just work, since Access will find the table names it needs and won't care that they are external instead of native tables. However you many need to resolve any data type incompatibilities between Access and MySQL.
You would need the MySQL ODBC driver on each machine where the Access application is used. Personally I would prefer to deal with that rather than the challenges of synchronizing between separate Access and MySQL data stores. (YMMV)
When you're ready to deploy, you can convert the ODBC links to DSN-less connections so the client machines wouldn't need to each have the DSN configured. See Using DSN-Less Connections by Doug Steele, Access MVP, for detailed instructions.
You will need to think very carefully about how you identify the data which has changed since the last synchronization cycle. If every row of data has a 'last updated' timestamp (that is indexed) then you could write a process that selected the recently updated rows from each table in turn. That's apt to be a bit heavy on the originating database (MS Access), plus you still have to identify the corresponding row to replace (where replacement is required) in the MySQL database. Of course, you can put different tables on different change schedules. For example, the table of US states probably doesn't change once a year, but your customer orders tables (or SO questions and answers tables) may change a lot in five minutes.
Some DBMS have alternative mechanisms, especially for working between copies of themselves. Some DBMS also provide a mechanism that is sometimes called 'changed data capture' (CDC) that allows you to get the changed data. Sometimes, in DBMS where you have a 'transaction log' or 'logical log' (but not CDC or something similar), you can 'mine' the log files (or log backups) to find the changes. However, the logs are typically optimized for the DBMS internal recovery processes, not for your use.
Well, obviously you will have to keep track of data items (may be in a different metadata space/datastore) that you have already processed to avoid the redundancy. The metadata should be used to filter out records that have been processed from the source. The logic and what needs to be in the metadata would depend on the exact use case here.
I'm always with my Access app..
As far as I know, when I execute a sql clause to my back end (accdb file), say
SELECT * FROM tbl WHERE id=1;
It gets filtered on the back end, then just one record is transmitted over the network.
My question is, when I open a form bounded with a query (no where clause) using a filter parameter, like
DoCmd.OpenForm "Form",,, strFilter
how many records are transmitted on the network? They get filtered like that sql clause or they get filtered locally, meaning a big pile of data has to be sent over the network?
I'm concerned about this because I have many subforms bounded to queries, then I open them in the main forms with filter parameter. And of course, the network here is not very good.
EDIT: The environment of my app is on a factory with no local server. All network/information thing is in company's headquarter 300km away, maybe a WAN.
Except upgrading to SQL server alike, do I have other solutions to make it more reliable? I've heard of something 'Citrix', I happened to have a 'Citrix Neighborhood Agent Program' in my sys tray, can it host my app to make it faster?
DoCmd.OpenForm "Form",,, strFilter
how many records are transmitted on the network?
As many as match your strFilter condition. So, if WHERE id=1 returns one row in the earlier SELECT query, and strFiler = "id=1", that OpenForm will open the form with that single row as its record source.
The WhereCondition parameter is also available for DoCmd.OpenReport, and operates the same way as with OpenForm, which you also may find useful.
Edit: You should have an index to support the WHERE criteria whether you build it into the query or do it "ad hoc" with OpenForm WhereCondition. With an index the database engine will read the index to find which rows match, then retrieve those rows. So retrieval will be more efficient, and therefore faster, than forcing the engine to read every row to determine which of them include matches.
When Jet/ACE requests data from a file server, the first thing it needs is the database header information, which has data structures describing the structure of the data file. This is information is requested once in your Access session, so it's really only an issue at startup.
When you then request a record, Jet/ACE uses the metadata it has about the file to request the relevant index pages for the table(s) involved, then uses those indexes to determine the minimum number of data pages to request.
With properly structured indexing and filters on primary keys the amount of data retrieved is actually quite minimal.
However, it's still going to be more than will allow proper response times across a WAN. Access was designed for use across a wired LAN, back in the days when the networking standard was 10BaseT (10Mbps). Anything less than that and you'll have problems. WiFi is right out, as well, but not because of bandwidth, but because of the unreliability of the connections.
When you need to support users remotely, the easiest solution is to host the Access application on a Windows Terminal Server. WTS is built on technology licensed from Citrix, so you'll often see the whole concept described as Citrix, but your default WTS setup is quite different from a Citrix installation. You have to pay extra for Citrix, and it gives you a lot of different features.
I've used WTS without Citrix in many environments and frankly can't see what the justification would be for Citrix (except when you have to support large numbers of remote users, i.e., in the range of 100 or more). WTS is installed on every Windows Server starting with Windows 2000 and is very easy to set up and configure.
The second easiest solution, in my opinion, is to upsize the back end to a server database and then rewrite for efficiency to insure you're using the server as much as possible and not pulling too much data across the wire.
A third solution would be Sharepoint, but I'm not experienced with that. It is definitely the direction that MS is pushing for Access apps in distributed setups, but it's quite complex and has a whole lot of features. I wouldn't recommend plunging into it without lots of preparation and significant corporate support.
Actually, with Access, there is not really a true back-end as there is with a bona-fide client-server engine like SQL Server or Oracle or Postgres. Access uses a shared-file architecture where the client program itself "owns" chunks of the file on disk, as distinct from a message-passing architecture where the client program sends requests for data to a back-end engine process running on a server where that process "owns" the data. With shared-file, all work occurs on the client, so it is possible for freight-train-loads of data to be brought across the wire if the database file resides on a different machine.
When you ask Access for data, it often reads a lot more data from the MDB file on disk and caches at the local client a lot more data than what your statement has asked for. Access tries to do this intelligently, anticipating your needs. "Now that I'm here", Access says, "I might as well make the expensive trip to disk worthwhile and grab a sh*tload of data". Don't get me wrong. I'm not an Access basher and have been using it for more than 10 years, from back in the days when LAN bandwidth was 10mbit/sec. Access is very good for some things. But Access can gobble up bandwidth like you wouldn't believe.
Read up on "keysets" in Access.
P.S. I am not the same Tim as the Tim who left you a comment.
Some useful links:
http://msdn.microsoft.com/en-us/library/dd942824(v=office.12).aspx
http://support.microsoft.com/kb/209126
http://support.microsoft.com/kb/112112
http://support.microsoft.com/kb/128808