How to deal with databases which exceed 2 GB - ms-access

I have to work with a database which exceeds the 2 GB limit. I have tried splitting the database as well, but I'm not able to do it. Please suggest what I can do to solve this problem.

When faced with the inherent 2 GB limit on the size of your MS Access database, there are several steps you can take, depending on how aggressively you need to reduce the size of the database:
Step 0: Compact the Database
The obvious first step, but I'll mention it here in case it has been overlooked.
Step 1: Splitting the Database
The operation of splitting the database will separate the 'front end' data (queries, reports, forms, macros etc.) from the 'back end' data (the tables).
To do this, go to Database Tools > Move Data > Access Database
This operation will export all tables in your current database to a separate .accdb database file, and will then link the tables from the new .accdb database file back into the original database.
As a result of this operation, the size of the back end database will be marginally reduced, since it no longer contains the definitions of the various front end objects, nor resources such as images which may have been used on reports/forms and which may have contributed more towards the overall size of the original database.
But since the vast majority of the data within the file will be stored in the database tables, you will only see marginal gains in database size following this operation.
If this initial step does not significantly reduce the size of the back end database below the 2 GB limit, the next step might be:
Step 2: Dividing the Backend Database
The in-built operation offered by MS Access to split the database into a separate frontend and backend database will export all tables from the original database into a single backend database file, and will then relink those tables into the front end database file.
However, if the resulting backend database is still approaching the 2 GB limit, you can divide the backend database further into separate smaller chunks - each with its own 2 GB limit.
To do this, export the larger tables from your backend database into a separate .accdb database file, and then link this new separate database file to your frontend database in place of the original table.
Taking this process to the limit would result in each table residing within its own separate .accdb database file.
Step 3: Dividing Table Data
This is the very last resort and the feasibility of this step will depend on the type of data you are working with.
If you are operating with dated data, you might consider exporting all data dated prior to a specific cutoff date into a separate table within a separate .accdb database file, and then linking the two separate tables into your frontend database (such that you have a 'live' table and an 'archive' table).
Note however that you will not be able to union the data from the two tables within a single query, as the 2 GB MS Access limit applies to the amount of data that MS Access is able to manipulate within a single .accdb file, not just the data which may be stored in the tables.
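For illustration, here is a minimal sketch of such an archive split driven from Python with pyodbc and the Access ODBC driver; the file paths, the Orders table, its OrderDate column and the cutoff are all hypothetical, and the empty archive.accdb is assumed to exist already:

```python
# Hypothetical example: move rows older than a cutoff into a separate .accdb.
import datetime
import pyodbc

CUTOFF = datetime.datetime(2020, 1, 1)

conn = pyodbc.connect(
    r"DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};"
    r"DBQ=C:\data\backend.accdb;"
)
cur = conn.cursor()

# Make-table query: create the archive table inside archive.accdb and copy
# the old rows into it (Access SQL's IN clause targets another database file).
cur.execute(
    r"SELECT * INTO Orders IN 'C:\data\archive.accdb' "
    r"FROM Orders WHERE OrderDate < ?",
    CUTOFF,
)

# Remove the archived rows from the live table; run Compact and Repair
# afterwards to actually reclaim the space.
cur.execute("DELETE FROM Orders WHERE OrderDate < ?", CUTOFF)
conn.commit()
conn.close()
```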
Step 4: Migrate to another RDBMS
If you're frequently hitting the 2 GB limit imposed by an MS Access database and find yourself sabotaging the functionality of your database as a result of having to dice up the data into ever smaller chunks, consider opting for a more heavyweight database management system, such as SQL Server.

You can split your database into several files. The feature exists in the menu Database Tools > Move data.
You can read the Microsoft documentation about it.
But prepare yourself to migrate to a new RDBMS in the near future, because you are reaching the system's limits.

Related

Update large amount of data in SQL database via Airflow

I have large table in CloudSQL that needs to be updated every hour, and I'm considering Airflow as a potential solution. What is the best way to update a large amount of data in a CloudSQL database from Airflow?
The constraints are:
The table needs to still be readable while the job is running
The table needs to be writable in case one of the jobs runs overtime and two jobs end up running at the same time
Some of the ideas I have:
Load the data that needs updating into a pandas DataFrame and run pd.to_sql (sketched below)
Load the data into a CSV in Cloud Storage and execute LOAD DATA LOCAL INFILE
Load the data in memory, break it into chunks, and run a multi-threaded process in which each thread updates the table chunk by chunk, using a shared connection pool to prevent exhausting connection limits
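For concreteness, here is a rough sketch of what option 1 might look like; the connection string, table and column names are hypothetical, and the staging-table upsert is just one way to keep the main table readable while the job runs:

```python
# Rough sketch of option 1: stage the rows with pandas, then apply them
# server-side. Connection string, table and column names are hypothetical.
import pandas as pd
from sqlalchemy import create_engine, text

engine = create_engine("mysql+pymysql://user:password@10.0.0.5/analytics")

# Rows produced by the hourly job that need to be applied to the main table.
df = pd.read_csv("hourly_update.csv")

# Write them to a staging table first, so the main table stays readable...
df.to_sql("events_staging", engine, if_exists="replace",
          index=False, chunksize=10_000, method="multi")

# ...then merge in a single server-side statement.
with engine.begin() as conn:
    conn.execute(text(
        "INSERT INTO events (id, value) "
        "SELECT id, value FROM events_staging "
        "ON DUPLICATE KEY UPDATE value = VALUES(value)"
    ))
```

In Airflow, this would typically run inside a PythonOperator task.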
My recent Airflow-related ETL project could be a reference for you.
Input DB: large DB (billion-row-level Oracle)
Interim DB: medium DB (tens-of-millions-level HDF5 file)
Output DB: medium DB (tens-of-millions-level MySQL)
In my experience, writing to the database is the main bottleneck in such an ETL process. So, as you can see:
For the interim stage, I use HDF5 as the interim store for data transformation. The pandas to_hdf function provides seconds-level performance for large data; in my case, writing 20 million rows to HDF5 took less than 3 minutes.
Below is the performance benchmarking for pandas IO; HDF5 is among the top three fastest and most popular formats. https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-perf
For the output stage, I use to_sql with the chunksize parameter. To speed up to_sql, you have to manually map the column types to the database column types and lengths, especially for string/varchar columns; otherwise to_sql maps them to a blob format or varchar(1000), and that default mode is about 10 times slower than the manually mapped mode.
In total, writing 20 million rows to the database via to_sql (chunksize mode) took about 20 minutes.
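As a rough illustration of this approach (an HDF5 interim store plus an explicit dtype mapping for to_sql), here is a hedged sketch; the file names, connection string, table name and column types are all hypothetical:

```python
# Sketch of the interim-HDF5 / explicit-dtype approach; names are hypothetical.
# to_hdf requires the PyTables package ("tables") to be installed.
import pandas as pd
from sqlalchemy import create_engine, types

# Interim stage: stash the transformed data in HDF5 (fast to write and re-read).
df = pd.read_csv("extracted.csv")
df.to_hdf("interim.h5", key="staging", mode="w")

# Output stage: map column types explicitly so to_sql does not fall back to
# TEXT/oversized VARCHAR columns, which is much slower.
engine = create_engine("mysql+pymysql://user:password@dbhost/warehouse")
df = pd.read_hdf("interim.h5", key="staging")
df.to_sql(
    "target_table", engine, if_exists="append", index=False,
    chunksize=50_000,
    dtype={"id": types.BigInteger(),
           "name": types.VARCHAR(64),
           "amount": types.Numeric(12, 2)},
)
```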
If you like the answer, please vote it up.
One clue for your reference, based on PostgreSQL partitioned tables, though it needs some DDL to define the partitioned table.
Currently, your main constraints are:
The table needs to still be readable while the job is running.
This means no locks are allowed.
The table needs to be writable in case one of the jobs runs overtime and two jobs end up running at the same time.
It should be capable of handling multiple writes at the same time.
I'll add one thing you may want to consider as well:
Reasonable read performance while writing.
Performance and user experience are key.
A partitioned table can meet all of these requirements, and it is transparent to the client application.
At present you are doing ETL; you will soon face performance issues as the table size grows quickly. A partitioned table is the only solution.
The main steps are:
Create a partitioned table with a partition list.
Normal reading and writing to the table continue as usual.
ETL process (can run in parallel; sketched below):
- ETL the data and upload it into a new table (very slow, minutes to hours, but no impact on the main table).
- Attach the new table to the main table's partition list (super fast, microsecond-level, which makes the data visible in the main table).
Normal main-table reads and writes continue as usual, now including the new data.
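Here is a minimal sketch of those steps for PostgreSQL, driven from Python with psycopg2; the table name, partition key, batch name and connection details are hypothetical:

```python
# Minimal sketch of the create/load/attach flow; names and credentials are
# hypothetical.
import psycopg2

conn = psycopg2.connect("dbname=analytics user=etl password=secret host=10.0.0.5")
conn.autocommit = True
cur = conn.cursor()

# One-time setup: the main table is list-partitioned on the batch day.
cur.execute("""
    CREATE TABLE IF NOT EXISTS events (
        id        bigint NOT NULL,
        payload   jsonb,
        batch_day date   NOT NULL
    ) PARTITION BY LIST (batch_day)
""")

# ETL step: load the new batch into a standalone table (slow, but the main
# table is untouched while this runs).
cur.execute("CREATE TABLE events_2024_06_01 (LIKE events INCLUDING ALL)")
# ... bulk-load events_2024_06_01 here, e.g. with COPY ...

# Attach step: make the batch visible in the main table (near-instant; adding
# a matching CHECK constraint beforehand avoids a validation scan).
cur.execute("""
    ALTER TABLE events
        ATTACH PARTITION events_2024_06_01
        FOR VALUES IN ('2024-06-01')
""")

cur.close()
conn.close()
```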
If you like the answer, please vote it up.
Best Regards,
WY
A crucial step to consider while setting up your workflow is to always use good connection management practices to minimize your application's footprint and reduce the likelihood of exceeding Cloud SQL connection limits. Database connections consume resources on the server and on the connecting application.
Cloud Composer has no limitations when it comes to your ability to interface with Cloud SQL. Therefore, either of the first two options is good.
A Python dependency is installable if it has no external dependencies and does not conflict with Composer’s dependencies. In addition, 14262433 explicitly explains the process of setting up a "Large data" workflow using Pandas.
LOAD DATA LOCAL INFILE requires you to use --local-infile for the mysql client. To import data into Cloud SQL, make sure to follow the best practices.
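As a rough sketch of how the LOAD DATA LOCAL INFILE route can be driven from a task, assuming mysql-connector-python and a server with local_infile enabled; the host, credentials, table and CSV layout are hypothetical:

```python
# Hypothetical example: bulk-load a CSV with LOAD DATA LOCAL INFILE.
import mysql.connector

conn = mysql.connector.connect(
    host="10.0.0.5", user="etl", password="secret",
    database="analytics", allow_local_infile=True,
)
cur = conn.cursor()
cur.execute("""
    LOAD DATA LOCAL INFILE '/tmp/hourly_update.csv'
    INTO TABLE events
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    LINES TERMINATED BY '\\n'
    IGNORE 1 LINES
    (id, value)
""")
conn.commit()
cur.close()
conn.close()
```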

MySQL update whole database without downtime

I have a large database that needs to be rebuilt every 24h. The database is built using a custom script on a server that pulls data from different files. The problem is that the whole process takes 1 minute to complete, and that is 1 minute of downtime, because we need to drop the whole database in order to rebuild it (there is no other way than to drop it).
At first, we planned to build a temporary database, drop the original, and then rename the temporary one to the original name, but MySQL doesn't support database renaming.
The second approach was to dump a .sql file from the temporary database and import it into the main (original) database, but that also causes downtime.
What is the best way to do this?
Here is something that I do. It doesn't result in zero downtime but could finish in less than a second.
Create a database that only has interface elements to your real database. In my case, it only contains view definitions, and all user queries go through this database.
Create a new database each night. When it is done, then update the view definitions to refer to the new database. I would recommend either turning off user access to the database containing the views while you are updating them or deleting all of the views and recreating them -- this prevents partial access to the old database. Because creating views is fast, this should be a very fast operation.
We do all of this through a job. In fact, before changing the production views, we test the view creation on another database to be sure they are all working.
Obviously, if you use alter view instead of requiring consistency across all the views, then there is no downtime, just a brief period of inconsistency.
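For illustration, a minimal sketch of the view swap, assuming an interface schema named app whose views point at whichever freshly built database is current; the schema, database and table names are hypothetical:

```python
# Repoint the interface views at the newly built database in one pass.
import mysql.connector

conn = mysql.connector.connect(host="localhost", user="deploy", password="secret")
cur = conn.cursor()

new_build = "build_b"                 # the database that was just rebuilt
tables = ["customers", "orders"]      # tables exposed through the views

# CREATE OR REPLACE VIEW (like ALTER VIEW) swaps each definition atomically,
# so readers only ever see a complete old view or a complete new one.
for t in tables:
    cur.execute(
        f"CREATE OR REPLACE VIEW app.{t} AS SELECT * FROM {new_build}.{t}"
    )

cur.close()
conn.close()
```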

While loading data, Access gets close to the 2GB limit

Every month, I do some analysis on a customer database. My predecessor would create a segment in Eloqua (our CRM) for each country, and then spend about 10 (tedious, slow) hours refreshing them all. When I took over, I knew that I wouldn't be able to do it in Excel (we had over 10 million customers), so I used Access.
This process has worked pretty well. We're now up to 12 million records, and it's still going strong. However, when importing the master list of customers prior to doing any work on it, the database is inflating. This month it hit 1.3 GB.
Now, I'm not importing ALL of my columns - only 3. And Access freezes if I try to do my manipulations on a linked table. What can I do to reduce the size of my database during import? My source files are linked CSVs with only the bare minimum of columns; after I import the data, my next steps have to be:
Manipulate the data to get counts instead of individual lines
Store the manipulated data (only a few hundred KB)
Empty my imported table
Compress and Repair
This wouldn't be a problem, but I have to do all of this 8 times (8 segments, each showing a different portion of the database), and the 2 GB limit is looming over the next horizon.
An alternate question might be: How can I simulate / re-create the "Linked Table" functionality in MySQL/MariaDB/something else free?
For such a big number of records, MS Access with its 2 GB limit is not a good solution for data storage. I would use MySQL as the backend (a sketch follows the list):
Create a table in MySQL and link it to MS Access
Import the CSV data directly into the MySQL table using MySQL's native import features. Of course Access can be used for the data import, but it will be slower.
Use Access for data analysis, using this linked MySQL table as a regular table.
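As a rough sketch of what the analysis side could look like once the raw rows live in MySQL (table, column and connection names are hypothetical): the counts are computed server-side and only the small aggregated result comes back, so neither Access nor the client ever has to hold the 12 million raw rows.

```python
# Hypothetical example: compute per-country/segment counts in MySQL and
# pull back only the aggregated result.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("mysql+pymysql://analyst:secret@localhost/crm")

counts = pd.read_sql(
    """
    SELECT country, segment, COUNT(*) AS customers
    FROM customers
    GROUP BY country, segment
    """,
    engine,
)
counts.to_csv("segment_counts.csv", index=False)  # only a few hundred rows
```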
You could import the CSV to a (new/empty) separate Access database file.
Then, in your current application, link the table from that file. Access will not freeze during your operations as it will when linking text files directly.

Can I use multiple servers to increase mysql's data upload performance?

I am in the process of setting up a MySQL server to store some data but realized (after reading a bit this weekend) I might have a problem uploading the data in time.
I basically have multiple servers generating daily data and then sending it to a shared queue to process/analyze. The data is about 5 billion rows (although it's very small data: an ID number in one column and a dictionary of ints in another). Most of the performance reports I have seen show insert speeds of 60 to 100k rows/second, which would take over 10 hours. We need the data in very quickly so we can work on it that day, and then we may discard it (or archive the table to S3 or something).
What can I do? I have 8 servers at my disposal (in addition to the database server); can I somehow use them to make the uploads faster? At first I was thinking of using them to push data to the server at the same time, but I'm also thinking maybe I can load the data onto each of them and then somehow try to merge all the separated data into one server?
I was going to use MySQL with InnoDB (I can use any other settings if it helps), but it's not finalized, so if MySQL doesn't work, is there something else that will? (I have used HBase before but was looking for a MySQL solution first; in case I have problems, it seems more widely used and easier to get help with.)
Wow. That is a lot of data you're loading. It's probably worth quite a bit of design thought to get this right.
Multiple mySQL server instances won't help with loading speed. What will make a difference is fast processor chips and very fast disk IO subsystems on your mySQL server. If you can use a 64-bit processor and provision it with a LOT of RAM, you may be able to use a MEMORY access method for your big table, which will be very fast indeed. (But if that will work for you, a gigantic Java HashMap may work even better.)
Ask yourself: Why do you need to stash this info in a SQL-queryable table? How will you use your data once you've loaded it? Will you run lots of queries that retrieve single rows or just a few rows of your billions? Or will you run aggregate queries (e.g. SUM(something) ... GROUP BY something_else) that grind through large fractions of the table?
Will you have to access the data while it is incompletely loaded? Or can you load up a whole batch of data before the first access?
If all your queries need to grind the whole table, then don't use any indexes. Otherwise do. But don't throw in any indexes you don't need. They are going to cost you load performance, big time.
Consider using MyISAM rather than InnoDB for this table; MyISAM's lack of transaction semantics makes it faster to load. MyISAM will do fine at handling either aggregate queries or few-row queries.
You probably want to have a separate table for each day's data, so you can "get rid" of yesterday's data by either renaming the table or simply accessing a new table.
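For example, a small sketch of that daily rotation (table names and connection details are hypothetical); the multi-table RENAME is a single atomic statement in MySQL:

```python
# Hypothetical daily swap: load into a fresh table, then rename it into place.
import mysql.connector

conn = mysql.connector.connect(host="localhost", user="loader",
                               password="secret", database="metrics")
cur = conn.cursor()

# ... bulk-load today's rows into events_new here ...

cur.execute("DROP TABLE IF EXISTS events_yesterday")
cur.execute("RENAME TABLE events TO events_yesterday, events_new TO events")

cur.close()
conn.close()
```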
You should consider using the LOAD DATA INFILE command.
http://dev.mysql.com/doc/refman/5.1/en/load-data.html
This command causes the MySQL server to read a file from the MySQL server's file system and bulk-load it directly into a table. It's way faster than doing INSERT commands from a client program on another machine. But it's also trickier to set up in production: your shared queue needs access to the MySQL server's file system to write the data files for loading.
You should consider disabling indexing, then loading the whole table, then re-enabling indexing, but only if you don't need to query partially loaded tables.
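A hedged sketch of that pattern for a MyISAM table (DISABLE KEYS suspends maintenance of the non-unique indexes during the load; the host, table and load step are hypothetical):

```python
# Hypothetical example: suspend index maintenance, bulk-load, rebuild once.
import mysql.connector

conn = mysql.connector.connect(host="dbhost", user="loader",
                               password="secret", database="metrics")
cur = conn.cursor()

cur.execute("ALTER TABLE daily_events DISABLE KEYS")   # skip per-row index updates
# ... bulk-load here, e.g. LOAD DATA INFILE or large batched INSERTs ...
cur.execute("ALTER TABLE daily_events ENABLE KEYS")    # rebuild the indexes in bulk

cur.close()
conn.close()
```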

Access MDB: do access MDB files have an upper size limit?

Do access .MDB files have an upper size limit? Will, for example, applications that connect to an .MDB file over 1GB have problems?
What problems/risks are there with MDB files over 1GB and what can be done?
Yes. The maximum size for an MDB file is 2 GB, and yes, any file over 1 GB is really pushing Access.
See Access Database Specifications for more (the original link is now broken; a copy is available via the Wayback Machine).
You may find data retrieval painfully slow with a large Access database. Indexing can reduce the pain considerably. For example if you have a query which includes "WHERE somefield = 27", data retrieval can be much faster if you create an index on somefield. If you have little experience with indexing, try the Performance Analyzer tool to get you started. In Access 2003 the Performance Analyzer is available from Tools -> Analyze -> Performance. I'm not sure about other Access versions.
One caveat about indexes is that they add overhead for Insert, Update, and Delete operations because the database engine must revise indexes in addition to the table where the changes occur. So if you were to index everything, you would likely make performance worse.
Try to limit the amount of data your client application retrieves from the big database. For example with forms, don't use a table as the form's data source. Instead create a query which returns only one or a few rows, and use the query as the form's data source. Give the user a method to select which record she wants to work on and retrieve only that record.
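To make both points concrete, here is a small hedged sketch using pyodbc and the Access ODBC driver; the file path and the table name MyTable are hypothetical, while somefield comes from the example above:

```python
# Hypothetical example: index the filtered column, then fetch only one record.
import pyodbc

conn = pyodbc.connect(
    r"DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=C:\data\big.mdb;"
)
cur = conn.cursor()

# Index the column used in the WHERE clause so lookups no longer scan the table.
cur.execute("CREATE INDEX idx_somefield ON MyTable (somefield)")
conn.commit()

# Retrieve only the record the user asked for, not the whole table.
row = cur.execute("SELECT * FROM MyTable WHERE somefield = ?", 27).fetchone()
print(row)
```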
You didn't mention whether you have performed Compact and Repair. If not, try it; it could shrink the size of your database considerably. In addition to reclaiming unused space, compacting also updates the index statistics, which helps the database engine determine how to access data more efficiently.
Tony Toews has more information about Access performance you may find useful, though it's not specific to large databases. See Microsoft Access Performance FAQ
If you anticipate bumping up against the 2 GB limit for MDB files, consider moving your data into SQL Server. The free Express version also limits the amount of data you can store, but it is more generous than Access: SQL Server 2008 R2 Express allows you to store 10 GB. Actually, I would probably move to SQL Server well before Access' 2 GB limit.
Maximum database size: 2 GB total for all objects in the database
Total number of objects in a database: 32,768