BulkInsert into table with Identity column (T-SQL) - sql-server-2008

1) Can I do a BULK INSERT from a CSV file into a table that has an identity column which is not in the CSV and gets assigned automatically?
2) Is there any rule that says the table I'm bulk inserting into has to have the same columns, in the same order, as the flat file being read?
This is what I'm trying to do. Too many fields to include everything...
BULK INSERT ServicerStageACS
FROM 'C:\POC\DataFiles\ACSDemo1.csv'
WITH (FORMATFILE = 'C:\POC\DataFiles\ACSDemo1.Fmt');
GO
SELECT * FROM ServicerStageACS;
Error:
Msg 4864, Level 16, State 1, Line 3 Bulk load data conversion error
(type mismatch or invalid character for the specified codepage) for
row 1, column 1 (rowID).
I'm pretty sure the error is because I have an identity.
FMT starts like this:
9.0
4
1 SQLCHAR 0 7 "," 1 Month SQL_Latin1_General_CP1_CI_AS
2 SQLCHAR 0 100 "," 2 Client SQL_Latin1_General_CP1_CI_AS

A co-worker recommended that it was easier to do the bulk insert into a view. The view does not contain the identity field, or any other field that should not be loaded.
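As a rough sketch (assuming rowID is the identity column and that [Month] and Client are two of the loaded columns, as in the format file; the real view would list every column present in the CSV), the view looks something like this:
CREATE VIEW VW_BulkInsert_ServicerStageACS
AS
SELECT [Month], Client   -- ...plus the remaining CSV columns, but not rowID
FROM ServicerStageACS;
GO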
truncate table ServicerStageACS
go
BULK INSERT VW_BulkInsert_ServicerStageACS
FROM 'C:\POC\DataFiles\ACSDemo1.csv'
WITH
(
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n'
)
GO
SELECT * FROM ServicerStageACS;

Related

sqlldr generating ORA-00984: column not allowed here - while trying to add constant text

I need to load several similar csv files into work tables with the same format for onward processing, but for some of the data I get 'ORA-00984: column not allowed here' errors.
I can't change the layout of the csv, but the ordering of the columns in the work table and the format of the sqlldr control file are in my control.
What do I need to change to get sqlldr to load this data?
EDIT: Solution: changing the .ctl file to use col6_fixedchar CONSTANT "abc" fixes the issue. Interestingly, sqlldr is quite happy to interpret "3600" as a number.
Below is a sample:
table:
create table test_sqlldr
(
col1_date date,
col2_char varchar2(15),
col3_int number(5),
col4_int number(5),
col5_int number(5),
-- fixed and dummy fields
col6_fixedchar varchar2(15),
col7_nullchar varchar2(20),
col8_fixedint number(5)
);
csv:
cat /tmp/test_sqlldr.csv
2019-08-27 09:00:00,abcdefghi,3600,0,0
2019-08-27 09:00:00,jklmnopqr,3600,0,0
2019-08-27 09:00:00,stuvwxyza,3600,3598,3598
2019-08-27 09:00:00,bcdefghij,3600,0,0
ctl:
cat /tmp/test_sqlldr.ctl
load data infile '/tmp/test_sqlldr.csv'
insert into table test_sqlldr
fields terminated by ',' optionally enclosed by '"' TRAILING NULLCOLS
(
col1_date timestamp 'yyyy-mm-dd hh24:mi:ss',
col2_char,
col3_int,
col4_int,
col5_int,
col6_fixedchar "abc",
col8_fixedint "3600"
)
This generates the following output:
/opt/oracle/product/112020_cl_64/cl/bin/sqlldr <db credentials> control='/tmp/test_sqlldr.ctl' ; cat test_sqlldr.log
SQL*Loader: Release 12.2.0.1.0 - Production on Wed Aug 28 10:26:00 2019
Copyright (c) 1982, 2017, Oracle and/or its affiliates. All rights reserved.
Path used: Conventional
Commit point reached - logical record count 4
Table TEST_SQLLDR:
0 Rows successfully loaded.
Check the log file:
test_sqlldr.log
for more information about the load.
SQL*Loader: Release 12.2.0.1.0 - Production on Wed Aug 28 10:26:00 2019
Copyright (c) 1982, 2017, Oracle and/or its affiliates. All rights reserved.
Control File: /tmp/test_sqlldr.ctl
Data File: /tmp/test_sqlldr.csv
Bad File: /tmp/test_sqlldr.bad
Discard File: none specified
(Allow all discards)
Number to load: ALL
Number to skip: 0
Errors allowed: 50
Bind array: 64 rows, maximum of 256000 bytes
Continuation: none specified
Path used: Conventional
Table TEST_SQLLDR, loaded from every logical record.
Insert option in effect for this table: INSERT
TRAILING NULLCOLS option in effect
Column Name Position Len Term Encl Datatype
------------------------------ ---------- ----- ---- ---- ---------------------
COL1_DATE FIRST * , O(") DATETIME yyyy-mm-dd hh24:mi:ss
COL2_CHAR NEXT * , O(") CHARACTER
COL3_INT NEXT * , O(") CHARACTER
COL4_INT NEXT * , O(") CHARACTER
COL5_INT NEXT * , O(") CHARACTER
COL6_FIXEDCHAR NEXT * , O(") CHARACTER
SQL string for column : "abc"
COL8_FIXEDINT NEXT * , O(") CHARACTER
SQL string for column : "3600"
Record 1: Rejected - Error on table TEST_SQLLDR, column COL4_INT.
ORA-00984: column not allowed here
Record 2: Rejected - Error on table TEST_SQLLDR, column COL4_INT.
ORA-00984: column not allowed here
Record 3: Rejected - Error on table TEST_SQLLDR, column COL4_INT.
ORA-00984: column not allowed here
Record 4: Rejected - Error on table TEST_SQLLDR, column COL4_INT.
ORA-00984: column not allowed here
Table TEST_SQLLDR:
0 Rows successfully loaded.
4 Rows not loaded due to data errors.
0 Rows not loaded because all WHEN clauses were failed.
0 Rows not loaded because all fields were null.
Space allocated for bind array: 115584 bytes(64 rows)
Read buffer bytes: 1048576
Total logical records skipped: 0
Total logical records read: 4
Total logical records rejected: 4
Total logical records discarded: 0
Run began on Wed Aug 28 10:26:00 2019
Run ended on Wed Aug 28 10:26:00 2019
Elapsed time was: 00:00:00.14
CPU time was: 00:00:00.03
Try: col6_fixedchar CONSTANT "abc"
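With that change, the field list in the control file becomes (only the col6_fixedchar line changes; per the edit above, col8_fixedint "3600" can stay as it is, since sqlldr accepts it as a number):
(
col1_date timestamp 'yyyy-mm-dd hh24:mi:ss',
col2_char,
col3_int,
col4_int,
col5_int,
col6_fixedchar CONSTANT "abc",
col8_fixedint "3600"
)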

UPDATE SET values from list

I have a problem updating a table with new values from a list.
My data looks like this:
id    value_1    value_2
1     11         21
2     32         41
3     43         84
...
I already wrote the id and value_1 columns to the table with an INSERT command. At that step I cannot write the value_2 column, as I still need to calculate it, so I want to update the table later with an array of values for the value_2 column.
I would like to have a code something like this:
UPDATE table_name SET value_2 = (21,41,84) WHERE id IN (1,2,3)
Unfortunately, it's not possible to SET value_2 from a list like this; it only works with single values.
I have a workaround that wraps the UPDATE query in a for loop, but that was too slow for my program.
Does anyone have a suggestion for how I could get this working?
The whole query is performed with Python.
It can be done with a single UPDATE, using a CASE expression to set the wanted value. Something like this:
UPDATE table_name
SET value_2 = case id when 1 then 21
when 2 then 41
when 3 then 84
end
WHERE id IN (1,2,3)
However, I don't know if it will make any performance difference.
After trying different things I found a solution that updates the table values quickly using value lists as input.
I based it on the idea presented in this answer (https://stackoverflow.com/a/3466/7997169) and modified it to deal with lists.
As input I have lists like this:
id = [1, 2, 3, ... ]
value_1 = [11, 32, 41, ... ]
value_2 = [21, 41, 84, ... ]
Using the Python MySQL connector I could collapse the whole update loop into one query. To do that, I wrote the data from the lists into a string looking like this:
VALUES (1,11,21),(2,32,41),(3,41,84),....
The total code looks like this:
# Build the "(id, value_1, value_2)" tuples as one comma-separated string.
c = ''
for i in range(0, len(id), 1):
    a = '(' + str(id[i]) + ',' + str(value_1[i]) + ',' + str(value_2[i]) + ')'
    b = ','
    if i < (len(id) - 1):
        c = c + a + b
    else:
        c += a

# One multi-row INSERT; rows whose id already exists fall through to the UPDATE part.
update_cell_info = ("INSERT INTO table (id, value_1, value_2) "
                    "VALUES %s " % c +
                    "ON DUPLICATE KEY UPDATE "
                    "value_1=VALUES(value_1),"
                    "value_2=VALUES(value_2)"
                    ";")
cursor.execute(update_cell_info)
In the end this procedure is over 10x faster than the previous one, where I used a for loop to run the UPDATE over and over with new values.
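For the three sample rows at the top of the question, the statement that loop builds and passes to cursor.execute() would look roughly like this (a sketch; table and column names as used in the question):
INSERT INTO table_name (id, value_1, value_2)
VALUES (1,11,21),(2,32,41),(3,43,84)
ON DUPLICATE KEY UPDATE
value_1 = VALUES(value_1),
value_2 = VALUES(value_2);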

BULK INSERT large CSV file and attach an additional column

I was able to use BULK INSERT on a SQL Server 2008 R2 database to import a CSV file (tab delimited) with more than 2 million rows. The command is planned to run every week.
I added an additional column named "lastupdateddate" to the target table to store the timestamp at which a row is updated, via an INSERT trigger. But when I ran the BULK INSERT again, it failed due to a column mismatch, as there is no such field in the raw CSV file.
Is there any possibility to configure BULK INSERT to ignore the "lastupdateddate" column when it runs?
Thanks.
-- EDIT:
I tried using a format file but am still unable to solve the problem.
The table looks as shown below.
USE AdventureWorks2008R2;
GO
CREATE TABLE AAA_Test_Table
(
Col1 smallint,
Col2 nvarchar(50) ,
Col3 nvarchar(50) ,
LastUpdatedDate datetime
);
GO
The csv "data.txt" file is:
1,DataField2,DataField3
2,DataField2,DataField3
3,DataField2,DataField3
The format file is like:
10.0
3
1 SQLCHAR 0 7 "," 1 Col1 ""
2 SQLCHAR 0 100 "," 2 Col2 SQL_Latin1_General_CP1_CI_AS
3 SQLCHAR 0 100 "," 3 Col3 SQL_Latin1_General_CP1_CI_AS
The SQL command I ran is:
DELETE AAA_Test_Table
BULK INSERT AAA_Test_Table
FROM 'C:\Windows\Temp\TestFormatFile\data.txt'
WITH (formatfile='C:\Windows\Temp\TestFormatFile\formatfile.fmt');
GO
The error received is:
Msg 4864, Level 16, State 1, Line 2
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 2, column 1 (Col1).
Msg 4832, Level 16, State 1, Line 2
Bulk load: An unexpected end of file was encountered in the data file.
Msg 7399, Level 16, State 1, Line 2
The OLE DB provider "BULK" for linked server "(null)" reported an error. The provider did not give any information about the error.
Msg 7330, Level 16, State 2, Line 2
Cannot fetch a row from OLE DB provider "BULK" for linked server "(null)".
Yes, you can, using a format file as documented here, and you can use that format file with the bcp command via the -f option, e.g. -f format_file_name.fmt.
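For the sample above, a non-XML format file along these lines should do it: it maps only the three CSV fields, so LastUpdatedDate is left for the trigger to fill, and the last field is terminated by the row terminator instead of a comma, which is what trips the "unexpected end of file" error with the format file shown in the question (a sketch, assuming the data file uses Windows \r\n line endings):
10.0
3
1 SQLCHAR 0 7 "," 1 Col1 ""
2 SQLCHAR 0 100 "," 2 Col2 SQL_Latin1_General_CP1_CI_AS
3 SQLCHAR 0 100 "\r\n" 3 Col3 SQL_Latin1_General_CP1_CI_AS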
Another option would be to import all the data (all fields) and then drop the unwanted lastupdateddate column using SQL like:
ALTER TABLE your_bulk_insert_table DROP COLUMN lastupdateddate

MySQL Pivot multiple rows into new columns

I am trying to write a pivot query in MySQL Workbench, and many of the places I've looked have not been very relevant.
I currently have:
order_ID  Part  Description  Order number
1         103   A            1
2         104   B            1
3         103   A            2
4         105   C            3
5         103   A            4
6         105   C            4
7         107   D            4
I would like to create:
Order  Part1  Description  Part2  Description  Part3  Description
1      103    A            104    B
2      103    A
3      105    C
4      103    A            105    C            107    D
I can keep the primary key in the output, but it is not necessary. The problem I am running into is that many pivot approaches involve listing the distinct part names in order to move them into columns; however, I have over 500 parts. I also would like to move the description and the part together so they stay next to each other, and most pivot examples are not flexible enough to handle that.
I did write a macro to do this in Excel, but it must be done in the database: the data is pulled from a database for further analysis in R, and any changes made to the data must be automated. As a result, I DO NOT have a choice in how the data is organized and laid out. Please do not suggest normalizing the data or other database techniques; I am trying to work around how messy the data is, but I DO NOT have a choice in how it is entered.
Here are some resources I used to gain experience with pivoting in MySQL, but I have not been able to get any of the code to work:
MySQL pivot table
mysql pivoting - how can I fetch data from the same table into different columns?
http://en.wikibooks.org/wiki/MySQL/Pivot_table
http://buysql.com/mysql/14-how-to-automate-pivot-tables.html
-- Collapse the column1 values (and column2 values) for each column3 group
-- into comma- / semicolon-separated strings.
SELECT GROUP_CONCAT(Table.column1) AS anything,
       GROUP_CONCAT(Table.column2 SEPARATOR ';') AS Anything2,
       Table.`column3`
FROM Table
GROUP BY Table.column3;

-- Add the columns that will hold the split-out values.
ALTER TABLE Table ADD `newcolumn1` varchar(100) DEFAULT '' AFTER `column3`;
ALTER TABLE Table ADD `newcolumn2` varchar(500) DEFAULT '' AFTER `newcolumn1`;

-- Put the first comma-separated item into newcolumn1
-- and everything after the first comma into newcolumn2.
UPDATE Table SET
  `newcolumn1` = IF(LOCATE(',', column1) > 0,
                    SUBSTRING(column1, 1, LOCATE(',', column1) - 1),
                    column1),
  `newcolumn2` = IF(LOCATE(',', column1) > 0,
                    SUBSTRING(column1, LOCATE(',', column1) + 1),
                    '');

-- Trim each overflow column down to its first item (newcolumn3 and any
-- further columns are created and filled the same way as newcolumn2).
UPDATE Table SET newcolumn2 = SUBSTRING_INDEX(newcolumn2, ',', 1);
UPDATE Table SET newcolumn3 = SUBSTRING_INDEX(newcolumn3, ',', 1);
This code achieved exactly the format I wanted above.
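For reference, another way to get the desired layout is conditional aggregation over a per-order row number (a sketch: it assumes the source table is called orders_parts, uses the column names from the question, and caps the output at three parts per order; add more MAX(CASE ...) pairs for wider rows):
SELECT `Order number` AS `Order`,
       MAX(CASE WHEN rn = 1 THEN Part END)        AS Part1,
       MAX(CASE WHEN rn = 1 THEN Description END) AS Description1,
       MAX(CASE WHEN rn = 2 THEN Part END)        AS Part2,
       MAX(CASE WHEN rn = 2 THEN Description END) AS Description2,
       MAX(CASE WHEN rn = 3 THEN Part END)        AS Part3,
       MAX(CASE WHEN rn = 3 THEN Description END) AS Description3
FROM (
    -- number the parts within each order by order_ID
    SELECT op.*,
           (SELECT COUNT(*)
              FROM orders_parts op2
             WHERE op2.`Order number` = op.`Order number`
               AND op2.order_ID <= op.order_ID) AS rn
      FROM orders_parts op
) AS numbered
GROUP BY `Order number`;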

How do I export a large table into 50 smaller csv files of 100,000 records each

I am trying to export one field from a very large table (containing 5,000,000 records, for example) into a CSV list, but not all at once: rather, 100,000 records into each .csv file created, without duplication. How can I do this, please?
I tried
SELECT field_name
FROM table_name
WHERE certain_conditions_are_met
INTO OUTFILE /tmp/name_of_export_file_for_first_100000_records.csv
LINES TERMINATED BY '\n'
LIMIT 0 , 100000
That gives the first 100,000 records, but nothing I do exports the other 4,900,000 records into 49 further files, and how do I specify the other 49 filenames?
for example, I tried the following, but the SQL syntax is wrong:
SELECT field_name
FROM table_name
WHERE certain_conditions_are_met
INTO OUTFILE /home/user/Eddie/name_of_export_file_for_first_100000_records.csv
LINES TERMINATED BY '\n'
LIMIT 0 , 100000
INTO OUTFILE /home/user/Eddie/name_of_export_file_for_second_100000_records.csv
LINES TERMINATED BY '\n'
LIMIT 100001 , 200000
and that did not create the second file...
What am I doing wrong, and is there a better way to do this? Should the LIMIT 0, 100000 be put before the first INTO OUTFILE clause, and then the entire command repeated from SELECT for the second 100,000 records, and so on?
Thanks for any help.
Eddie
If you're running on a UNIX-like OS, why not just select the whole lot and pipe the output through:
split --lines=100000
As proof of concept:
echo '1
2
3
4
5
6
7
8
9' | split --lines=3
creates three files xaa, xab and xac containing the lines 1,2,3, 4,5,6 and 7,8,9 respectively.
Or, even on other operating systems, you can get the GNU tools, like GnuWin32, where split is in coreutils.
You can use a loop to generate the files. The following procedure can give you a clue how to do that (it may still contain syntax errors):
CREATE PROCEDURE exportSplitter(IN partsCount INT)
BEGIN
  DECLARE totalRows INT;
  DECLARE pageRowCount INT;
  DECLARE p1 INT DEFAULT 0;

  SELECT COUNT(*) INTO totalRows FROM table_name WHERE certain_conditions_are_met;
  SET pageRowCount = CEIL(totalRows / partsCount);

  label1: LOOP
    -- INTO OUTFILE cannot take a variable file name, so build each chunk's
    -- statement as dynamic SQL and run it with PREPARE/EXECUTE.
    -- ORDER BY id keeps the chunks from overlapping; the file name pattern is arbitrary.
    SET @sql = CONCAT(
      'SELECT field_name FROM table_name WHERE certain_conditions_are_met ',
      'ORDER BY id ',
      'LIMIT ', p1 * pageRowCount, ', ', pageRowCount, ' ',
      'INTO OUTFILE ''/home/user/Eddie/part_', p1, '.csv'' ',
      'LINES TERMINATED BY ''\\n''');
    PREPARE stmt FROM @sql;
    EXECUTE stmt;
    DEALLOCATE PREPARE stmt;

    SET p1 = p1 + 1;
    IF p1 < partsCount THEN ITERATE label1; END IF;
    LEAVE label1;
  END LOOP label1;
END