Large Local Data Analysis: SQL Server Express vs MS Access

Which is better for large, local data analysis: MS Access or SQL Server Express?
To paint the picture of my constraints/needs:
I do Cisco telephony analysis for a large corporation. I have historically imported data sets via T-SQL into Excel to manipulate the data locally. I do not have server space or rights to use our corporate SQL Servers for my work, so everything must be done locally.
The analysis consists of merging several data sets before the analysis proper begins. Each data set typically contains 200k-900k records. Most analysis is ad hoc, and requirements change frequently.
Lately, my data sets have begun to exceed 1 million rows, and the Excel version I am supplied with cannot support volumes above about 1.3 million records. The processing time to merge several data sets this large is becoming excruciating: simple operations like Index/Match take 15 minutes to complete.
I need to find a better way of performing analysis and cannot decide between MS Access and SQL Server Express.
My concern with Access is that it will not have the capacity for what I need, and I am worried about database corruption.
My concern with SQL Server is that I am unsure about using it in this manner. I need to determine standard deviations, averages, counts, etc., based on aggregated data. I use SQL as an analyst (data retrieval) and have very little experience with creating or managing a SQL Server database. I am also concerned about the creation time for ad hoc reports, though I am unsure whether that is a valid concern.
Which one should I use in place of Excel for my needs?

If I were in your position I would use SQL Server Express Edition to store the data and perform the more complex data manipulation, and I would use Access with ODBC linked tables into the SQL Server database for "exploring", e.g.,
creating ad-hoc queries to get a better understanding of the data (see the sketch below), with the option of exporting those queries to Excel for further crunching, and
building reports quickly without getting into the whole SQL Server Reporting Services (SSRS) thing.
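For the aggregated statistics mentioned in the question (standard deviations, averages, counts), here is a minimal sketch of the kind of ad-hoc T-SQL you could run against the Express database; the staging tables and columns (CallDetail, Agent, TalkTimeSeconds) are illustrative assumptions, not anything prescribed:

    -- Hypothetical schema: call detail rows joined to an agent dimension,
    -- both bulk-loaded into SQL Server Express. Names are illustrative only.
    SELECT  a.TeamName,
            COUNT(*)                         AS CallCount,
            AVG(1.0 * c.TalkTimeSeconds)     AS AvgTalkTime,   -- 1.0 * forces decimal, not integer, averaging
            STDEV(c.TalkTimeSeconds)         AS StdDevTalkTime
    FROM    dbo.CallDetail AS c
    JOIN    dbo.Agent      AS a ON a.AgentID = c.AgentID
    GROUP BY a.TeamName
    ORDER BY a.TeamName;

The same query can be run from an Access pass-through query against the linked database, and the results exported to Excel for any final formatting.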

The maximum database size for Access is 2 GB; SQL Server Express allows 10 GB. You'd want to use SQL Server for many other reasons as well. Access is an atavism - an evolutionary throwback.

Related

Replication from MySQL to SQL Server

I have a system in which data is written constantly. It runs on MySQL. I also have a second system that runs on SQL Server and uses some parameters from the first database.
Question: how is it possible (if it is possible at all) to constantly transfer values from one database (MySQL) to the other (SQL Server)? Switching to a single database is not an option. As I understand it, the alternative would be to write a program, for example in Delphi, that transfers values from one database to the other.
You have a number of options.
SQL Server can access another database using ODBC, so you could set up SQL Server to obtain the information it needs directly from tables held in MySQL, via a linked server (see the sketch after these options).
MySQL supports replication using log files, so you could configure MySQL replication (which does not have to cover all tables) to write the relevant transactions to a log file. You would then need to process that log file (which you could do in near real time, as standard MySQL replication does) to identify what needs to be written to the SQL Server; typically this produces a set of statements to run against it. You can use almost any language to process the log file and issue the updates.
You could instead have a scheduled task that reads the required parameters from MySQL and posts them to SQL Server, but this leaves a window during which the two may be out of sync. Given that parsing log files and posting the updates can be fiddly, you may still want to implement this as a fallback even if you do process log files.
If the SQL Server and the MySQL server are on the same network, the external tables (linked server) method is likely to be the simplest and lowest-maintenance option, but depending on the amount of data involved, the overhead of the external connection and queries could affect the overall performance of queries made against the SQL Server.
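A minimal sketch of the first option (an ODBC linked server), assuming a system ODBC DSN named MySQL_Src has already been created with the MySQL ODBC driver; the DSN, login, and table names are illustrative:

    -- Register the MySQL box as a linked server through the ODBC provider.
    EXEC master.dbo.sp_addlinkedserver
         @server     = N'MYSQL_SRC',
         @srvproduct = N'MySQL',
         @provider   = N'MSDASQL',
         @datasrc    = N'MySQL_Src';   -- system ODBC DSN pointing at the MySQL server

    EXEC master.dbo.sp_addlinkedsrvlogin
         @rmtsrvname  = N'MYSQL_SRC',
         @useself     = N'False',
         @rmtuser     = N'reporting',
         @rmtpassword = N'********';

    -- Pass-through query: the SELECT executes on MySQL and only the result crosses the link.
    SELECT *
    FROM OPENQUERY(MYSQL_SRC, 'SELECT param_name, param_value FROM app_parameters');

Because OPENQUERY ships the query text to MySQL and pulls back only the result set, it generally behaves better over the link than four-part-name access to the remote table.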

BULK INSERT or Import and Export Data Wizard?

I have a large weekly CSV file (ranging from 500 MB to 1 GB with over 2.5 million rows) to load into a SQL Server 2008 R2 database.
I was able to use either the BULK INSERT command or the Import and Export Data Wizard to load the data. There was no observable difference in load time between them for my data set.
What is your recommended approach as far as performance, efficiency and future maintenance are concerned?
Thanks in advance!
Cheers, Alex
I ended up using the SQL Server Import and Export Data Wizard and saving the result as an SSIS package. I then used Business Intelligence Development Studio to edit the saved package and re-imported it into SQL Server. It works well and takes only about 2 minutes to load all 9 CSV files, ranging from 10 MB to 600 MB, into the SQL Server database.
MSDN Forum:
When an SSIS developer opts for the "Fast Load" option along with "Table lock" on the OLE DB destination, or uses the SQL Server Destination, then he or she has effectively used BULK INSERT itself, so it is a moot point to debate which is faster.
Bulk insert on its own has its tricks; in the SQL Server context more can be done to make it a faster row process, namely making it minimally logged or not logged at all. Disabling constraints is another thing that bcp takes care of but SSIS does not (unless instructed), and this is something MSFT could decide to change in SSIS. Where SSIS shines, though, is in using an algorithm to figure out the best parameters for a given machine/system (e.g. the buffer size, etc.).
So in most applications SSIS is faster right away, and even faster with proper tweaking.
In real life many factors have different impacts on the benchmarking, but at this stage I am inclined to state there is no real measurable difference.
Microsoft has published a very informative guide about comparing the different load strategies for achieving high performance and choosing between bulk load methods: The Data Loading Performance Guide.
Also have a look at the following articles:
SSIS: Destination Adapter Comparison
SSIS vs T-SQL – which one is fastest for ETL tasks?
Speeding Up SSIS Bulk Inserts into SQL Server
SSIS – FASTEST DATA FLOW TASK ITEM FOR TRANSFERRING DATA OVER THE NETWORK
I would save the SSIS package from the Import and Export Data Wizard and tweak the OLE DB Destination settings using Visual Studio (a.k.a. BIDS, a.k.a. SSDT-BI): set an exclusive table lock and a large batch size and commit size, e.g. 100,000 rows. Typically this will boost performance by around 20%.
SSIS is also the best option for future tuning, e.g. filtering or transforming data, or disabling and rebuilding indexes before and after your load.
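For comparison, a minimal sketch of the plain BULK INSERT route with roughly equivalent options (a table lock plus an explicit batch size); the file path, staging table, and delimiters are illustrative assumptions:

    BULK INSERT dbo.WeeklyImport            -- hypothetical staging table
    FROM 'D:\imports\weekly_extract.csv'    -- hypothetical file path
    WITH (
        FIELDTERMINATOR = ',',
        ROWTERMINATOR   = '\n',
        FIRSTROW        = 2,                -- skip the header row
        TABLOCK,                            -- table-level lock helps enable minimal logging
        BATCHSIZE       = 100000            -- commit in 100,000-row batches
    );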

From Oracle to MS-Access to Mysql

I have a client with close to 120,000,000 records in an Oracle database. Their engineer claims they can only give us an MS Access dump of their database. The data will actually be going into a MySQL relational database instance.
What potential issues and problems can we expect moving from Oracle > Access > MySQL?
We have located tools that can convert an Oracle database to MySQL, but given the large size of the database (100 GB+), I am not sure these software-based solutions are stable enough to handle the conversion process. This is a time-sensitive project, and I am worried that if we make any mistakes at the outset we may not be able to complete it in a timely manner.
Exporting the Oracle data to a comma-separated, tab-separated, or pipe-separated set of files would not be very challenging; it's done all the time.
I have no idea why someone would claim to only be able to produce an MS Access dump from an Oracle database. If that isn't being done directly by selecting from Access through ODBC, then it's done via an intermediate flat file anyway. I'm inclined to call "BS" or "incompetence" on this claim.
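If the flat-file route is taken, the MySQL side of the load is straightforward; a minimal sketch, assuming a hypothetical pipe-separated export file and target table:

    -- Load a pipe-separated Oracle export into MySQL.
    -- File path and table name are illustrative only.
    LOAD DATA LOCAL INFILE '/data/exports/customers.psv'
    INTO TABLE customers
        FIELDS TERMINATED BY '|'
        LINES TERMINATED BY '\n'
        IGNORE 1 LINES;   -- skip the header row, if the export includes one

With 120 million rows you would typically split the export into several files and load them one at a time, so a failed load can be restarted without redoing everything.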
The maximum size of an Access database is 2 GB, so I don't see how the proposed migration could be achieved without partitioning the data.

SQL Server 2008 on a separate machine is very slow

We have an application which gets its data from an MS SQL database residing on a server at headquarters, over an internet connection.
The problem is that the data is quite large, which makes the queries take a long time to transfer the data from headquarters to the client.
Does SQL Server provide any method to speed up queries when your application is geographically separated from the database (e.g. zipping up the data before it starts to transfer)?
The following quotes from a similar question might provide you with some pointers.
There is never a reason to return a large result to start with. For those particular scenarios that need large amounts of data shipped to the client, there are better solutions than T-SQL queries (log shipping, service broker, replication).
and
You have a couple options that I am aware of:
Use a third party tool like SQLNitro.
Move to Windows Server 2008 and SQL Server 2008 where they have made various TCP stack improvements, as outlined here: Appendix A: A Closer Look - Examining the Enhancements in Windows Server 2008 and SQL Server 2008 That Enable Faster Performance
Performance is not determined by bandwidth alone; latency is also a very strong factor here. Try to minimize the number of requests. You could combine some queries into a single batch that gives two (or more) result sets (see the sketch below).
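A minimal sketch of that idea: a single stored procedure returning two result sets, so the connection latency is paid once per call instead of once per query. The procedure, table, and column names are illustrative assumptions:

    -- Hypothetical procedure: one round trip, two result sets.
    CREATE PROCEDURE dbo.GetBranchSummary
        @FromDate date
    AS
    BEGIN
        SET NOCOUNT ON;

        -- Result set 1: aggregates only, instead of shipping the raw rows.
        SELECT  BranchID,
                COUNT(*)        AS OrderCount,
                SUM(OrderTotal) AS Revenue
        FROM    dbo.OrderHeader
        WHERE   OrderDate >= @FromDate
        GROUP BY BranchID;

        -- Result set 2: a small exception list worth sending over the wire.
        SELECT TOP (100) OrderID, BranchID, OrderTotal
        FROM   dbo.OrderHeader
        WHERE  OrderDate >= @FromDate
        ORDER BY OrderTotal DESC;
    END;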

ETL mechanisms for MySQL to SQL Server over WAN

I'm looking for some feedback on mechanisms to batch data from MySQL Community Server 5.1.32 on an external host down to an internal SQL Server 2005 Enterprise machine over VPN. The external box accumulates data throughout business hours (about 100Mb per day), which then needs to be transferred internationally across a WAN connection (quality not yet determined, but it's not going to be super fast) to an internal corporate environment before some BI work is performed. This should just be change sets making their way down each night.
I'm interested in thoughts on the ETL mechanisms people have successfully used in similar scenarios. SSIS seems like a potential candidate; can anyone comment on its suitability for this scenario? Alternatively, other thoughts on how to do this in a cost-conscious way would be much appreciated. Thanks!
It depends on the use you have of the data received from the external machine.
If you must have the data for the next morning's calculations, or you do not have confidence in your network, you would prefer to loosely couple the two systems and put some message queuing between them, so that if something fails during the night (the databases, the network links, anything that would be a pain for you to recover) you can still start every morning with some data.
If the data retrieval is not subject to a high degree of criticality, any solution is good :)
Regarding SSIS, it's just a great ETL framework (yes, there's a subtlety :)). But I don't see it as part of the data transfer itself; rather, it belongs in the ETL part, once your data has been received or while it is still waiting in the message-queuing system.
First, if you are going to do this, have a good way to easily see what has changed since the last run. Every record should have a last-updated date or a timestamp that changes when the record is updated (not sure if MySQL has this); see the sketch below. This is far better than comparing every single field.
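For what it's worth, MySQL can do this with an auto-updating TIMESTAMP column; a minimal sketch, with a hypothetical table name and a placeholder high-water mark:

    -- Add an auto-updating change-tracking column (MySQL).
    ALTER TABLE call_stats
        ADD COLUMN last_updated TIMESTAMP NOT NULL
            DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP;

    -- Nightly change-set extract: only rows touched since the last successful load.
    SELECT *
    FROM call_stats
    WHERE last_updated >= '2009-06-01 22:00:00';  -- substitute the last load's high-water mark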
If you had SQL Server in both locations I would recommend replication. Is it possible to use SQL Server instead of MySQL? If not, then SSIS is your best bet.
In terms of actually getting your data from MySQL into SQL Server, you can use SSIS to import the data in a number of ways. One would be to connect directly to your MySQL source (via an OLE DB connection or similar); another is to do a daily export from MySQL to a flat file and pick this up using an FTP Task (a sketch of such an export follows below). Once you have the data, SSIS can perform the required transforms before loading the processed data into SQL Server.
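For the flat-file option, the nightly export on the MySQL side can be as simple as the following sketch; the file path, table name, and change-tracking column are illustrative assumptions:

    -- Write the day's change set to a CSV file for the FTP/SSIS pickup (MySQL).
    SELECT *
    INTO OUTFILE '/var/exports/call_stats_changes.csv'
        FIELDS TERMINATED BY ',' ENCLOSED BY '"'
        LINES TERMINATED BY '\n'
    FROM call_stats
    WHERE last_updated >= '2009-06-01 22:00:00';  -- substitute the last load's high-water mark

The resulting file compresses well before it crosses the WAN, and the SSIS package on the internal side can pick it up with an FTP Task followed by a Flat File Source.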