Shapefile to MDB with custom field structure [duplicate] - ms-access

This question already has an answer here:
How to create table in mdb from dbf query
(1 answer)
Closed 6 years ago.
I have a Shapefile with 80.000 polygons that they are grouped by a specific field called "OTA".
I wanted to convert each Shapefile (it's attribute table) to mdb database (not Personal Geodatabase) with one table in it with the same name as the Shapefile and with a given field structure.
In the code I used I had to load on Python 2 new modules:
pypyodbc and adodbapi
The first module was used to create the mdb file for each shapefile and the second to create the table in the mdb and fill the table with the data from the attribute table of the shapefile.
The code I came up with is the following:
import pypyodbc
import adodbapi
Folder = ur'C:\TestPO' # Folder to save the mdbs
FD = Folder+ur'\27ALLPO.shp' # Shapefile
Map = u'PO' # Map type
N = u'27' # Prefecture
OTAList = sorted(set([row[0] for row in arcpy.da.SearchCursor(FD,('OTA'))]))
cnt = 0
for OTAvalue in OTAList:
cnt += 1
dbname = N+OTAvalue+Map
pypyodbc.win_create_mdb(Folder+'\\'+dbname+'.mdb')
conn_str = (r"Provider=Microsoft.Jet.OLEDB.4.0;Data Source="+Folder+"\\"+dbname+ur".mdb;")
conn = adodbapi.connect(conn_str)
crsr = conn.cursor()
SQL = "CREATE TABLE ["+dbname+"] ([FID] INT,[AREA] FLOAT,[PERIMETER] FLOAT,[KA_PO] VARCHAR(10),[NOMOS] VARCHAR(2),[OTA] VARCHAR(3),[KATHGORPO] VARCHAR(2),[KATHGORAL1] VARCHAR(2),[KATHGORAL2] VARCHAR(2),[LABEL_PO] VARCHAR(8),[PHOTO_45] VARCHAR(14),[PHOTO_60] VARCHAR(10),[PHOTO_PO] VARCHAR(8),[POLY_X_CO] DECIMAL(10,3),[POLY_Y_CO] DECIMAL(10,3),[PINAKOKXE] VARCHAR(11),[LANDTYPE] DECIMAL(2,0));"
crsr.execute(SQL)
conn.commit()
with arcpy.da.SearchCursor(FD,['FID','AREA','PERIMETER','KA_PO','NOMOS','OTA','KATHGORPO','KATHGORAL1','KATHGORAL2','LABEL_PO','PHOTO_45','PHOTO_60','PHOTO_PO','POLY_X_CO','POLY_Y_CO','PINAKOKXE','LANDTYPE'],'"OTA" = \'{}\''.format(OTAvalue)) as cur:
for row in cur:
crsr.execute("INSERT INTO "+dbname+" VALUES ("+str(row[0])+","+str(row[1])+","+str(row[2])+",'"+row[3]+"','"+row[4]+"','"+row[5]+"','"+row[6]+"','"+row[7]+"','"+row[8]+"','"+row[9]+"','"+row[10]+"','"+row[11]+"','"+row[12]+"',"+str(row[13])+","+str(row[14])+",'"+row[15]+"',"+str(row[16])+");")
conn.commit()
crsr.close()
conn.close()
print (u'«'+OTAvalue+u'» ('+str(cnt)+u'/'+str(len(OTAList))+u')')
Executing this code took about 5 minutes to complete the task for about 140 mdbs.
As you can see, I execute an "INSERT INTO" statement for each record of the shapefile.
Is this the correct way (and probably the fastest) or should I collect all the statements for each "OTA" and execute them all together?

I don't think anyone's going to write your code for you, but if you try some VBA yourself, and tell us what happened and what worked and what you're stuck on, you'll get a great response.
Saying that - to start with I don't see any reason to use VB6 when you can use VBA right inside your mdb file.
Use DIR command and possibly FileSystemObject to loop through all DBFs in a given folder, or use FileDialog object to select multiple files at one go
Then process each file with
DoCmd.TransferDatabase command
TransferType:=acImport, _
DatabaseType:="dBASE III", _
DatabaseName:="your-dbf-filepath", _
ObjectType:=acTable, _
Source:="Source", _
Destination:="your-newtbldbf"
Finally process each dbf import with a make table query
Look at results and see what might have to be changed based on field types before and after.
Then .... edit your post and let us know how it went

In theory you could do something like this by searching the directory the DBF files reside in, writing those filenames to a table, then loop through the table and, for each filename, scan the DBF for tables and their fieldnames/datatypes and create those tables in your MDB. You could also bring in all the data from the tables, all within a series of loops.
In theory, you could.
In practice, you can't. And you can't, because DBF and MDB support different data types that aren't compatible.
I suppose you could create a "crosswalk" table such that for each datatype in DBF there is a corresponding, hand-picked datatype in MDB and use that when you're creating the table, but it's probably going to either fail to import some of the data or import corrupted data. And that's assuming you can open a DBF for reading the same way you can open an MDB for reading. Can you run OpenDatabase on a DBF from inside Access? I don't even have the answer to that.

I wouldn't recommend that you do this. The reason that you're doing it is because you want to keep the structure as similar as possible when migrating from dBase/FoxBase to Access. However, the file structure is different between them.
As you are aware, each .DBF ("Database file") file is a table, and the folder or directory in which the .DBF files reside constitutes the "database". With Access, all the tables in one database are in one .MDB ("Microsoft Database") file.
If you try to put each .DBF file in a separate .MDB file, you will have no end of trouble getting the .MDB files to interact. Access treats different .MDB files as different databases, not different tables in the same database, and you will have to do strange things like link all the separate databases just to have basic relational functionality. (I tried this about 25 years ago with Paradox files, which are also a one-file-per-table structure. It didn't take me long to decide it was easier to get used to the one-file-per-database concept.) Do yourself a favor, and migrate all of the .DBF files in one folder into a single .MDB file.
As for what you ought to do with your code, I'd first suggest that you use ADO rather than DAO. But if you want to stick with DAO because you've been using it, then you need to have one connection to the dBase file and another to the Access database. As far as I can tell, you don't have the dBase connection. I've never tried what you're doing before, but I doubt you can use a SQL statement to select directly from a .dbf file in the way you're doing. (I could be wrong, though; Microsoft has come up with stranger things over the years.)
It would

Related

How to parsimoniously refer to a data frame in RMySQL

I have a MySQL table that I am reading with the RMySQL package of R. I would like to be able to directly refer to the data frame stored in the table so I can seamlessly interact with it rather than having to execute RMySQL statement every time I want to do something. Is there a way to accomplish this? I tried:
data <- dbReadTable(conn = con, name = 'tablename')
For example, if I now want to check how many rows I have in this table I would run:
nrow(data)
Does this go through the database connection, or am I now storing the object "data" locally, defeating the whole purpose of using an external database?
data <- dbReadTable(conn = con, name = 'tablename')
This command downloads all the data into a local R dataframe (assuming you have enough RAM). Any operations with data from that point forward do not require the SQL connection.

SSIS - Export multiple SQL Server tables to multiple text files

I have to move data between two SQL Server DBs. My task is to export the data as text (.dat) files, move the files and import into the destination. I have to migrate over 200 tables.
This is what I tried
1) I used a Execute SQL task to fetch my tables.
2) Used a For each loop to loop through the table names from the collection.
3) Used a script task inside the for each loop to build the text file destination path.
4) Called a DFT with the table name in a variable for the source ole db and the path name in a variable for the destination flat file.
First table extracts fine but the second table bombs with a synchronization error. I see this is numerous posts but could not find one that matches my scenario. Hence posting here.
Even if I get the package to work with multiple DFTs, the second table from the second DFT does not export columns because the flat file connection manager still remembers the first table columns. Is there a way to get it to forget the columns?
Any thoughts on how I can export multiple tables to multiple text files using one DFT using dynamic source and destination variable?
Thanks and appreciate your help.
Unfortunately Bulk Import Task only enable us to use format files effectively to map the columns between source and destinations. Bulk Import Task uses BULK INSERT TSQL command to import the data, to execute user should have the BULKADMIN server privilege.
Most of the companies would not allow BULKADMIN server privilege to enable due to security reasons.
Hence using the script task to construct BCP statements is a good and simple option to Export.
You does not require to construct .bat file as script itself can execute dos commands which runs under .NET security account.
I figured out a way to do this. I thought I will share if anybody is stuck in the same situation.
So, in summary, I needed to export and import data via files. I also wanted to use a format file if at all possible for various reasons.
What I did was
1) Construct a DFT which gets me a list of table names from the DB that I need to export. I used 'oledb' as a source and 'recordset destination' as target and stored the table names inside a object variable.
A DFT is not really necessary. You can do it any other way. Also, in our application, we store the table names in a table.
2) Add a 'For each loop container' with a 'For Each ADO Enumerator' which takes my object variable from the previous step into the collection.
3) Parse the variable one by one and construct BCP statements like below inside a Script task. Create variables as necessary. The BCP statement will be stored in a variable.
I loop through the tables and construct multiple BCP statements like this.
BCP "DBNAME.DBO.TABLENAME1" out "PATH\FILENAME2.dat" -S SERVERNAME -T -t"|" -r$\n -f "PATH\filename.fmt"
BCP "DBNAME.DBO.TABLENAME1" out "PATH\FILENAME2.dat" -S SERVERNAME -T -t"|" -r$\n -f "PATH\filename.fmt"
The statements are put inside a .bat file. This is also done inside the script task.
4) A execute process task will next execute the .BAT file. I had to do this because, I do not have the option to use the 'master..xp_cmdShell' command or the 'BULK INSERT' command in my company. If I had the option to execute cmdshell, I could have directly run the command from the package.
5) Again add a 'For each loop container' with a 'For Each ADO Enumerator' which takes my object variable from the previous step into the collection.
6) Parse the variable one by one and construct BCP statements like this inside a Script task. Create variables as necessary. The BCP statement will be stored in a variable.
I loop through the tables and construct multiple BCP statements like this.
BCP "DBNAME.DBO.TABLENAME1" in "PATH\FILENAME2.dat" -S SERVERNAME -T -t"|" -r$\n -b10000 -f "PATH\filename.fmt"
BCP "DBNAME.DBO.TABLENAME1" in "PATH\FILENAME2.dat" -S SERVERNAME -T -t"|" -r$\n -b10000 -f "PATH\filename.fmt"
The statements are put inside a .bat file. This is also done inside the script task.
The -b10000 was put so I can import in batches. Without this many of my large tables could not be copied due to less space in the tempdb.
7) Run the .bat file to import the file again.
I am not sure if this is the best solution. I still thought I will share what satisfied my requirement. If my answer is not clear, I would be happy to explain if you have any questions. We can also optimize this solution. The same can be done purely via VB Scripts but you have to write some code to do that.
I also created a package configuration file where I can change the DB name, server name, the data and format file locations dynamically.
Thanks.

Most effective way to push data from a SQL Server database into a Greenplum database?

Greenplum Database version:
PostgreSQL 8.2.15 (Greenplum Database 4.2.3.0 build 1)
SQL Server Database version:
Microsoft SQL Server 2008 R2 (SP1)
Our current approach:
1) Export each table to a flat file from SQL Server
2) Load the data into Greenplum with pgAdmin III using PSQL Console's psql.exe utility
Benifits...
Speed: OK, but is there anything faster? We load millions of rows of data in minutes
Automation: OK, we call this utility from an SSIS package using a Shell script in VB
Pitfalls...
Reliability: ETL is dependent on the file server to hold the flat files
Security: Lots of potentially sensitive data on the file server
Error handling: It's a problem. psql.exe never raises an error that we can catch even if it does error out and loads no data or a partial file
What else we have tried...
.Net Providers\Odbc Data Provider: We have configured a System DSN using DataDirect 6.0 Greenplum Wire Protocol. Good performance for a DELETE. Dog awful slow for an INSERT.
For reference, this is the aforementioned VB script in SSIS...
Public Sub Main()
Dim v_shell
Dim v_psql As String
v_psql = "C:\Program Files\pgAdmin III\1.10\psql.exe -d "MyGPDatabase" -h "MyGPHost" -p "5432" -U "MyServiceAccount" -f \\MyFileLocation\SSIS_load\sql_files\load_MyTable.sql"
v_shell = Shell(v_psql, AppWinStyle.NormalFocus, True)
End Sub
This is the contents of the "load_MyTable.sql" file...
\copy MyTable from '\\MyFileLocation\SSIS_load\txt_files\MyTable.txt' with delimiter as ';' csv header quote as '"'
If you're getting your data load done in minutes, then the current method is probably good enough. However, if you find yourself having to load larger volumes of data (terabyte scale for instance), the usual preferred method for bulk-loading into Greenplum is via gpfdist and corresponding EXTERNAL TABLE definitions. gpload is a decent wrapper that provides abstraction over much of this process and is driven by YAML control files. The general idea is that gpfdist instance(s) are spun up at the location(s) where your data is staged, preferrably as CSV text files, and then the EXTERNAL TABLE definition within Greenplum is made aware of the URIs for the gpfdist instances. From the admin guide, a sample definition of such an external table could look like this:
CREATE READABLE EXTERNAL TABLE students (
name varchar(20), address varchar(30), age int)
LOCATION ('gpfdist://<host>:<portNum>/file/path/')
FORMAT 'CUSTOM' (formatter=fixedwidth_in,
name=20, address=30, age=4,
preserve_blanks='on',null='NULL');
The above example expects to read text files whose fields from left to right are a 20-character (at most) string, a 30-character string, and an integer. To actually load this data into a staging table inside GP:
CREATE TABLE staging_table AS SELECT * FROM students;
For large volumes of data, this should be the most efficient method since all segment hosts are engaged in the parallel load. Do keep in mind that the simplistic approach above will probably result in a randomly distributed table, which may not be desirable. You'd have to customize your table definitions to specify a distribution key.

import database dump to mysql using visual foxpro

I used leaves stru2mysql.prg and vfp2mysql_upload.prg to create a .sql dump file from DBF's. I connect to mysql database from vfp using ODBC.I KNOW how upload the sql dump file but i need to automate the whole process i.e after creating the dump file,my visual foxpro program can upload the dump file without a third party(automatically). I thought of using the source command but that needs to be run in mysql prompt.The assumption here is that my end users dont know how to import(which most of them dont).Please advice on how i can automate importation of sql file to mysql database.thank you
I think what you are looking for are the various SQL* functions in Foxpro. See the VFP help or MSDN on SQLCONNECT (or SQLSTRINGCONNECT), SQLEXEC, and SQLDISCONNECT functions to get you started. Microsoft provided good examples on each in the documentation.
You may also want to use FILETOSTR to get the output from Leafe's programs into a string for the SQLEXEC function.
Here's the steps I use to take data from a Visual FoxPro Database and upload to a MySql Database. These are all put into a custom method on a form, which is fired by a command button. For example the method would be 'uploadnewdata' and I pass parameters for whichever data tables I need
1) Connect to the Server - I use MySql ODBC
2) Validate the user (this uses a SQLEXEC to pull the correct matching record for a users tables
IF M.WorkingDatabase<>-1
nRetVal=SQLEXEC(m.WorkingDatabase,"SELECT * FROM users", "csrUsersOnServer")
SELECT csrUsersOnServer
SELECT userid,FROM csrUsersOnServer;
WHERE ALLTRIM(UPPER(userid))=ALLTRIM(UPPER(lcRanchUser));
AND ALLTRIM(UPPER(lcPassWord))=ALLTRIM(UPPER(lchPassWord));
INTO CURSOR ValidUsers
IF _TALLY>=1
ELSE
=MESSAGEBOX("Your Premise ID Does Not Match Any Records On The Server","System Message")
RETURN 0
ENDIF
ELSE
=MESSAGEBOX("Unable To Connect To Your Database", "System Message")
RETURN 0
ENDIF
3) Once that is successful I create my base cursor (this is the one I'm sending from)
4) I then loop through that cursor creating variable for the values in the fields
5) then using the SQLEXEC, and INSERT INTO, I update each record
6) once the program is finished processing the cursor, it generates a messagebox with the 'finished' message and control returns to the form.
All the user has to do, is select the starting table and enter their login information

How do I customise the CREATE DATABASE statement in VSTS DB Edition Deploy?

I'm using VSTS Database Edition GDR Version 9.1.31024.02
I've got a project where we will be creating multiple databases with identical schema, on the fly, as customers are added to the system. It's one DB per customer. I thought I should be able to use the deploy script to do this. Unfortunately I always get the full filenames specified on the CREATE DATABASE statement. For example:
CREATE DATABASE [$(DatabaseName)]
ON
PRIMARY(NAME = [targetDBName], FILENAME = N'$(DefaultDataPath)targetDBName.mdf')
LOG ON (NAME = [targetDBName_log], FILENAME = N'$(DefaultDataPath)targetDBName_log.ldf')
GO
I'd expected something more like this
CREATE DATABASE [$(DatabaseName)]
ON
PRIMARY(NAME = [targetDBName], FILENAME = N'$(DefaultDataPath)$(DatabaseName).mdf')
LOG ON (NAME = [targetDBName_log], FILENAME = N'$(DefaultDataPath)$(DatabaseName)_log.ldf')
GO
Or even
CREATE DATABASE [$(DatabaseName)]
I'm not going to be running this on an on-going basis so I'd like to make it as simple as possible, for the next guy. There are a bunch of options for deployment in the project properties, but I can't get this to work the way I'd like.
Any one know how to set this up?
Better late than never, I know how to get the $(DefaultDataPath)$(DatabaseName) file names from your second example.
The SQL you're showing in your first code snippet suggests that you don't have scripts for creating the database files in your VSTS:DB project, perhaps by deliberately excluded them from any schema comparisons you've done. I found it a little counter-intuitive, but the solution is to let VSTS:DB script the MDF and LDF in you development environment, then edit those scripts to use the SQLCMD variables.
In your database project, go to the folder Schema Objects > Database Level Objects > Storage > Files. In there, add these two files:
Database.sqlfile.sql
ALTER DATABASE [$(DatabaseName)]
ADD FILE (NAME = [$(DatabaseName)],
FILENAME = '$(DefaultDataPath)$(DatabaseName).mdf',
SIZE = 2304 KB, MAXSIZE = UNLIMITED, FILEGROWTH = 1024 KB)
TO FILEGROUP [PRIMARY];
Database_log.sqlfile.sql
ALTER DATABASE [$(DatabaseName)]
ADD LOG FILE (NAME = [$(DatabaseName)_log],
FILENAME = '$(DefaultDataPath)$(DatabaseName)_log.ldf',
SIZE = 1024 KB, MAXSIZE = 2097152 MB, FILEGROWTH = 10 %);
The full database creation script that VSTS:DB, or for that matter VSDBCMD.exe, generates will now use the SQLCMD variables for naming the MDF and LDF files, allowing you to specify them on the command line, or in MSBuild.
We do this using a template database, that we back up, copy, and restore as new customers are brought online. We don't do any of the schema creation with scripts but with a live, empty DB.
Hmm, well it seems that the best answer so far (given the over whelming response) is to edit the file after the fact... Still looking