I have multiple folders (six or so) with multiple .CSV files in them. The CSV files are all of the same format:
Heading1,Heading2,Heading3
1,Monday,2.45
2,Monday,3.765...
Each .CSV has the same heading names [same data source for different months]. What is the best way to import these CSVs into SQL Server 2008? The server does not have xp_cmdshell enabled [for security reasons, which I cannot change], so any method that uses it (which I originally tried) will not work.
EDIT
The CSV files are a maximum of 2 MB in size and do not contain any commas other than the delimiters themselves.
Any ideas?
For example, say you have a CSV file named sample.csv on the D:\ drive, with this inside:
Heading1,Heading2,Heading3
1,Monday,2.45
2,Monday,3.765
Then you can use this query:
DECLARE @str nvarchar(max),
        @x xml,
        @head xml,
        @sql nvarchar(max),
        @params nvarchar(max) = '@x xml'

SELECT @str = BulkColumn
FROM OPENROWSET (BULK N'D:\sample.csv', SINGLE_CLOB) AS a

SELECT @head = CAST('<row><s>'+REPLACE(SUBSTRING(@str,1,CHARINDEX(CHAR(13)+CHAR(10),@str)-1),',','</s><s>')+'</s></row>' as xml)
SELECT @x = CAST('<row><s>'+REPLACE(REPLACE(SUBSTRING(@str,CHARINDEX(CHAR(10),@str)+1,LEN(@str)),CHAR(13)+CHAR(10),'</s></row><row><s>'),',','</s><s>')+'</s></row>' as xml)

SELECT @sql = N'
SELECT t.c.value(''s[1]'',''int'') '+QUOTENAME(t.c.value('s[1]','nvarchar(max)'))+',
       t.c.value(''s[2]'',''nvarchar(max)'') '+QUOTENAME(t.c.value('s[2]','nvarchar(max)'))+',
       t.c.value(''s[3]'',''decimal(15,7)'') '+QUOTENAME(t.c.value('s[3]','nvarchar(max)'))+'
FROM @x.nodes(''/row'') as t(c)'
FROM @head.nodes('/row') as t(c)

EXEC sp_executesql @sql, @params, @x = @x
To get output like:
Heading1 Heading2 Heading3
1 Monday 2.4500000
2 Monday 3.7650000
First we read the whole file as SINGLE_CLOB with the help of OPENROWSET and put it all in the @str variable. The part from the beginning up to the first \r\n goes into @head, and the rest goes into @x, both converted to XML. Structure:
<row>
<s>Heading1</s>
<s>Heading2</s>
<s>Heading3</s>
</row>
<row>
<s>1</s>
<s>Monday</s>
<s>2.45</s>
</row>
<row>
<s>2</s>
<s>Monday</s>
<s>3.765</s>
</row>
After that we build a dynamic query like:
SELECT t.c.value('s[1]','int') [Heading1],
       t.c.value('s[2]','nvarchar(max)') [Heading2],
       t.c.value('s[3]','decimal(15,7)') [Heading3]
FROM @x.nodes('/row') as t(c)
And execute it with sp_executesql, passing the variable @x as a parameter.
Hope this helps you.
I ended up solving my problem using a non-SQL answer. Thank you to everyone who helped contribute. I apologise for going with a completely off-field answer using PHP. Here is what I created to solve this problem:
<?php
//////////////////////////////////////////////////////////////////////////////////////////////////
// //
// Date: 21/10/2016. //
// Description: Insert CSV rows into pre-created SQL table with same column structure. //
// Notes: - PHP script needs server to execute. //
// - Can run line by line ('INSERT') or bulk ('BULK INSERT'). //
// - 'Bulk Insert' needs bulk insert user permissions. //
// //
// Currently only works under the following file structure: //
// | ROOT FOLDER //
// | FOLDER 1 //
// | CSV 1 //
// | CSV 2... //
// | FOLDER 2 //
// | CSV 1 //
// | CSV 2... //
// | FOLDER 3... //
// | CSV 1 //
// | CSV 2... //
// //
//////////////////////////////////////////////////////////////////////////////////////////////////
//Error log - must have folder pre-created to work
ini_set("error_log", "phplog/bulkinsertCSV.php.log");
//Set the name of the root directory here (Where the folder's of CSVs are)
$rootPath = '\\\networkserver\folder\rootfolderwithCSVs';
//Get an array with the folder names located at the root directory location
// The '0' is alphabetical ascending, '1' is descending.
$rootArray = scandir($rootPath, 0);
//Set Database Connection Details
$myServer = "SERVER";
$myUser = "USER";
$myPass = "PASSWORD";
$myDB = "DATABASE";
//Create connection to the database
$connection = odbc_connect("Driver={SQL Server};Server=$myServer;Database=$myDB;", $myUser, $myPass) or die("Couldn't connect to SQL Server on $myServer");
//Extend the PHP script execution time limit so large imports are not cut short
set_time_limit(10000);
//Set to true for bulk insert, set to false for line by line insert
// [If set to TRUE] - MUST HAVE BULK INSERT PERMISSIONS TO WORK
$bulkinsert = true;
//For loop that goes through the folders and finds CSV files
loopThroughAllCSVs($rootArray, $rootPath);
//Once procedure finishes, close the connection
odbc_close($connection);
function loopThroughAllCSVs($folderArray, $root){
$fileFormat = '.csv';
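//Start at index 2 to skip the "." and ".." entries that scandir() returns first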
for($x = 2; $x < sizeof($folderArray); $x++){
$eachFileinFolder = scandir($root."\\".$folderArray[$x]);
for($y = 0; $y < sizeof($eachFileinFolder); $y++){
$fullCSV_path = $root."\\".$folderArray[$x]."\\".$eachFileinFolder[$y];
if(substr_compare($fullCSV_path, $fileFormat, strlen($fullCSV_path)-strlen($fileFormat), strlen($fileFormat)) === 0){
parseCSV($fullCSV_path);
}
}
}
}
function parseCSV($path){
print_r($path);
print("<br>");
if($GLOBALS['bulkinsert'] === false){
$csv = array_map('str_getcsv', file($path));
array_shift($csv); //Remove Headers
foreach ($csv as $line){
writeLinetoDB($line);
}
}
else{
bulkInserttoDB($path);
}
}
function writeLinetoDB($line){
$tablename = "[DATABASE].[dbo].[TABLE]";
$insert = "INSERT INTO ".$tablename." (Column1,Column2,Column3,Column4,Column5,Column6,Column7)
VALUES ('".$line[0]."','".$line[1]."','".$line[2]."','".$line[3]."','".$line[4]."','".$line[5]."','".$line[6]."')";
$result = odbc_prepare($GLOBALS['connection'], $insert);
odbc_execute($result) or die(odbc_error($GLOBALS['connection']));
}
function bulkInserttoDB($csvPath){
$tablename = "[DATABASE].[dbo].[TABLE]";
$insert = "BULK
INSERT ".$tablename."
FROM '".$csvPath."'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\\n')";
print_r($insert);
print_r("<br>");
$result = odbc_prepare($GLOBALS['connection'], $insert);
odbc_execute($result) or die(odbc_error($GLOBALS['connection']));
}
?>
I ended up using the script above to write to the database line by line... This was going to take hours. I modified the script to use BULK INSERT, which unfortunately we didn't have 'permissions' to use. Once I 'obtained' permissions, the BULK INSERT method worked a charm.
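For anyone who hits the same permissions wall, a quick check over the same ODBC connection can tell you whether the login already has what BULK INSERT needs. IS_SRVROLEMEMBER and HAS_PERMS_BY_NAME are standard SQL Server functions; the rest of this snippet is just an illustrative sketch:
//Optional sanity check before attempting BULK INSERT: is the current login in the
//bulkadmin role, or does it hold the ADMINISTER BULK OPERATIONS permission?
$check = "SELECT IS_SRVROLEMEMBER('bulkadmin') AS in_bulkadmin,
                 HAS_PERMS_BY_NAME(NULL, NULL, 'ADMINISTER BULK OPERATIONS') AS has_bulk_perm";
$res = odbc_exec($connection, $check);
if ($res && odbc_fetch_row($res)) {
    echo "bulkadmin member: " . odbc_result($res, 'in_bulkadmin')
       . ", bulk permission: " . odbc_result($res, 'has_bulk_perm') . "<br>";
}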
Related
I have an app that writes a set of GPS strings to a text file like this:
[{"date":"02/13/2017 19:26:00","time":1486974360428,"longitude":151.209900,"latitude":-33.865143}{"date":"02/13/2017 19:26:13","time":1486974373496,"longitude":151.209900,"latitude":-33.865143}{"date":"02/13/2017 19:26:23","time":1486974383539,"longitude":151.209900,"latitude":-33.865143}{"date":"02/13/2017 19:26:33","time":1486974393449,"longitude":151.209900,"latitude":-33.865143}{"date":"02/13/2017 19:26:43","time":1486974403423,"longitude":151.209900,"latitude":-33.865143}{"date":"02/13/2017 19:26:53","time":1486974413483,"longitude":151.209900,"latitude":-33.865143}]
The file always starts with [ and ends with ].
This file gets uploaded to an Ubuntu server at
'filepath'/uploads/gps/'device ID'/'year-month-day'/'UTC download time'.txt
for example
/uploads/gps/12/2017-02-12/1486940878.txt
The text files get created when the file gets uploaded to the server, so there are multiple files written per day.
I would like a method to write the values to a MySQL database with the headings DEVICE (obtained from the filepath), DATE, TIME, LONGITUDE, LATITUDE.
Initially, just a command I can run on the server would be preferable, which I can eventually run from a PHP command on an admin panel.
Where do I start?
Instead of uploading a file, you could submit the text directly to a PHP script on the server. It would use json_decode() to convert the payload to an array and then save each record to a table. The device ID would be one of the parameters to the script.
Using this type of approach would eliminate a lot of issues such as not importing a file twice, renaming/moving the files after import, finding the file(s), etc.
It would also mean your data is up to date every time the data is sent.
A script like that would be pretty trivial to write, but it should have some type of security built in to prevent data from being sent by an unauthorized entity.
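For illustration only, here is a minimal sketch of what such a receiving script could look like. The script name, the token/device_id/payload parameter names, the shared-secret check and the gps_log table are all assumptions for the example, not something from your app:
<?php
// receive_gps.php - sketch of a POST endpoint the device could submit to directly.
// Parameter names (token, device_id, payload) and the gps_log table are assumptions.
$token   = isset($_POST['token'])     ? $_POST['token']          : '';
$device  = isset($_POST['device_id']) ? (int)$_POST['device_id'] : 0;
$payload = isset($_POST['payload'])   ? $_POST['payload']        : '[]';

// Very basic shared-secret check; replace with whatever authentication you settle on.
if ($token !== 'CHANGE_ME' || $device === 0) {
    http_response_code(403);
    exit('Forbidden');
}

// The payload is the same [...] text the app writes to the file; records may arrive
// without commas between the objects, so patch that before decoding.
$records = json_decode(str_replace('}{', '},{', $payload));
if (!is_array($records)) {
    http_response_code(400);
    exit('Bad payload');
}

$db = mysqli_connect("localhost", "your_db_user_id", "your_db_password", "your_database_name");
if (mysqli_connect_errno()) {
    http_response_code(500);
    exit('Failed to connect to MySQL: ' . mysqli_connect_error());
}

// One prepared statement, executed once per record.
$stmt = mysqli_prepare($db,
    "INSERT INTO `gps_log` (`device`,`date`,`time`,`longitude`,`latitude`) VALUES (?,?,?,?,?)");
$date = $time = '';
$long = $lat = 0.0;
mysqli_stmt_bind_param($stmt, 'issdd', $device, $date, $time, $long, $lat);

$saved = 0;
foreach ($records as $rec) {
    $date = $rec->date;
    $time = (string)$rec->time;   // epoch milliseconds; bound as text here
    $long = $rec->longitude;
    $lat  = $rec->latitude;
    if (mysqli_stmt_execute($stmt)) {
        $saved++;
    }
}
mysqli_stmt_close($stmt);
mysqli_close($db);
echo $saved . " records saved.\n";
If you would rather keep the existing upload-and-process workflow, the sample below works on the uploaded files instead.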
Here's some sample code that will process the files and store them to the DB. I've removed certain info (user ID, password, database name) that you will need to edit. It's a little longer than I guessed, but still pretty short. If you need more info, PM me.
<?php
/* ===============================================================
Locate and parse GPS files, then store to MySQL DB.
Presumes a folder structure of gps/device_id/date:YYYY-MM-DD.
After a file is processed and stored in the DB table, the
file is renamed with a leading "_" so it will be ignored later.
===============================================================
*/
$DS = '/'; // Directory separator character. Use '/' for Linux, '\' for windows.
// Path to folder containing device folders.
$base_folder = "./gps";
// Today's date formatted like the folders under the devices. If parameter "date" has a value, use it instead of today's date. Parameter MUST be formatted correctly.
$today = isset($_REQUEST['date']) && $_REQUEST['date'] != '' ? $_REQUEST['date'] : date('Y-m-d');
// Get a list of device folders
$device_folders = get_folders($base_folder);
// Loop through all of the device folders
$num_file_processed = 0;
$total_recs_saved = 0;
foreach($device_folders as $dev_folder) {
// Check to see if there is a folder in the device folder for today.
$folder_path = $base_folder.$DS.$dev_folder.$DS.$today;
// Check if the device/date folder exists.
if(file_exists($folder_path) && is_dir($folder_path)) {
// Folder exists, get a list of files that haven't been processed.
$file_list = get_files($folder_path);
// Process the files (if any)
foreach($file_list as $filename) {
$f_path = $folder_path.$DS.$filename;
$json = file_get_contents($f_path);
// Fix the JSON -- missing "," between records.
$json = str_replace("}{","},{",$json);
$data = json_decode($json);
// Process each row of data and save to DB.
$num_saved = 0;
$rec_num = 0;
foreach($data as $recno => $rec_data) {
if(save_GPS($dev_folder,$rec_data->date,$rec_data->time,$rec_data->longitude,$rec_data->latitude)) {
$num_saved++;
}
$rec_num++;
}
// Rename file so we can ignore it if processing is done again.
if($num_saved > 0) {
    $newName = $folder_path.$DS."_".$filename;
    rename($f_path,$newName);
    $num_file_processed++;
    $total_recs_saved += $num_saved;
}
}
} else {
echo "<p>" . $folder_path . " not found.</p>\n";
}
}
echo "Processing Complete. ".$num_file_processed." files processed. ".$num_saved." records saved to db.\n";
function save_GPS($dev_id,$rec_date,$rec_time,$long,$lat) {
$server = "localhost";
$uid = "your_db_user_id";
$pid = "your_db_password";
$db_name = "your_database_name";
$qstr = "";
$qstr .= "INSERT INTO `gps_log`\n";
$qstr .= "(`device`,`date`,`time`,`longitude`,`latitude`)\n";
$qstr .= "VALUES\n";
$qstr .= "('".$dev_id."','".$rec_date."','".$rec_time."','".$long."','".$lat."');\n";
$db = mysqli_connect($server,$uid,$pid,$db_name);
if(mysqli_connect_errno()) {
echo "Failed to connect to MySQL server: " . mysqli_connect_errno() . " " . mysqli_connect_error() . "\n";
return false;
}
// Connected to DB, so run the insert and report any failure
$ok = mysqli_query($db,$qstr);
if(!$ok) {
    echo "Insert failed: " . mysqli_error($db) . "\n";
}
mysqli_close($db);
return $ok;
}
function get_folders($base_folder) {
$rslts = array();
$folders = array_map("htmlspecialchars", scandir($base_folder));
foreach($folders as $folder) {
// Ignore files and folders that start with "." (ie. current folder and parent folder references)
if(is_dir($base_folder."/".$folder) && substr($folder,0,1) != '.') {
$rslts[] = $folder;
}
}
return $rslts;
}
function get_files($base_folder) {
$rslts = array();
$files = array_map("htmlspecialchars", scandir($base_folder));
foreach($files as $file) {
// Ignore files and folders that start with "." (ie. current folder and parent folder references), or "_" (files already processed).
if(!is_dir($file) && substr($file,0,1) != '.' && substr($file,0,1) != '_') {
$rslts[] = $file;
}
}
return $rslts;
}
I have a folder that has a file added to it each day, as below:
Z:\old\stock110813.csv
Z:\old\stock120813.csv
Z:\old\stock130813.csv
Z:\old\stock140813.csv
I would like to import the latest file into SAS dynamically, i.e. have it search the folder for the latest date,
or have SAS make a copy of the latest file and change its name and location.
I have been searching the web for days testing little bits of code, but I am struggling, so any help would be appreciated.
cheers John
If the date is predictable (i.e. today's date) then you can do:
%let date=%sysfunc(today(),ddmmyy6.); *or whatever date format matches your file names;
proc import datafile="z:\old\stock&date..csv"... ;
run;
If not, then your best bet is to use a pipe with the directory listing to see the most recent filename. Something like this (directory logic depends on your server/OS/etc.)
filename mydir pipe 'dir /b /od /a-d z:\old\';
data myfiles;
infile mydir;
input #1 filename $32.;
call symput('filename',filename);
run;
proc import datafile="z:\old\&filename." ... ;
Do you want to use the system date or the date in the filename to determine which file is the newest? If you want to use the create or modify date, check out the FINFO and FOPTNAME functions to determine them. The code below looks at the date in the filename, and it works without the need for X command authorization.
data newestFile (keep=newestFile);
format newestFile $20.;
retain newestFile newestDate;
rc = filename("dir","z:\old\");
did = dopen("dir");
/* loop through file and subdirectories in a directory */
do i = 1 to dnum(did);
csvFile = dread(did,i);
rc=filename("fid",cats("z:\old\",csvFile));
sdid=dopen("fid");
/*check if date in name is newest if it is a file */
if sdid le 0 then do;
csvFileDate = input(substr(csvFile,6,6),ddmmyy6.);
if csvFileDate gt newestDate then do;
newestDate = csvFileDate;
newestFile = csvFile;
end;
end;
else rc = dclose(sdid);
end;
rc = dclose(did);
/* move and rename file with latest date to newestFile.csv */
rc = rename(cats("z:\old\",newestFile), "z:\new\newestFile.csv",'file');
run;
I have written the following code to plot a graph with the data present in the 'datafile'. After the graph has been plotted, I want to delete the file.
function plot_torque(datafile)
//This will call a datafile and plot the graph of net_torque vs. time
verbose = 1;
// Columns to plot
x_col = 1;
y_col = 2;
// open the datafile
file1 = file('open', datafile,'old');
data1 = read(file1, -1, 4);
time = data1(:,x_col);
torque = data1(:,y_col);
plot(time, torque, '.-b');
xtitle("Torque Generated vs. Time" ,"Time(s)" , "Torque Generated(Nm/m)");
file('close',file());
//%________________%
endfunction
In the place that I have marked as //%________% I have tried
deletefile(datafile);
and
mdelete(datafile);
Neither of them has worked.
I have set the working directory to the folder where both the above '.sci' file and the 'datafile' are present. I am using Scilab 5.4.1.
You probably left the file open. Try this:
fil="d:\Attila\PROJECTS\Scilab\Stackoverflow\file_to_delete.txt"; //change it!
fprintfMat(fil,rand(3,3),"%.2g"); //fill with some data
fd=mopen(fil,"r"); //open
//do something with the file
mclose(fd); //close
//if you neglect (comment out) this previous line, the file remains open,
//and scilab can not delete it!
//If you made this "mistake", first you should close it by executing:
// mclose("all");
//otherwise the file remains open until you close (and restart) Scilab!
mdelete(fil); //this works for me
Does anyone have any examples of using EzAPI with a flat file as the data source? All the examples in the documentation start with OleDB connections.
Specifically I can't work out how to define input and output columns.
Say, for instance, that I have a CSV file with columns for firstname, surname and age. I want to read this into SSIS, sort by age and write out to another text file.
According to this post How to use EzAPI FlatFile Source in SSIS? I need to define columns manually, but I can't get the suggested code to work.
If I do:
if (!pkg.Source.OutputColumnExists("col0"))
{
pkg.Source.InsertOutputColumn("col0");
}
bool newColumnExists = pkg.Source.OutputColumnExists("col0");
newColumnExists is still false.
I think this link will help you: http://blogs.msdn.com/b/mattm/archive/2008/12/30/ezapi-alternative-package-creation-api.aspx
It shows how to create one.
If you want to add columns to a flat file connection manager, use this code:
var flatFileCm = new EzFlatFileCM(this);
flatFileCm.ConnectionString = file;
foreach (var column in columns)
{
// Add a new Column to the Flat File Connection Manager
var flatFileColumn = flatFileCm.Columns.Add();
flatFileColumn.DataType = DataType.DT_WSTR;
flatFileColumn.ColumnWidth = 255;
flatFileColumn.ColumnDelimiter = columns.GetUpperBound(0) == Array.IndexOf(columns, column) ? "\r\n" : "\t";
flatFileColumn.ColumnType = "Delimited";
// Use the Import File Field name to name the Column
var columnName = flatFileColumn as IDTSName100;
if (columnName != null) columnName.Name = column;
}
flatFileCm.ColumnNamesInFirstDataRow = true;
I have a bunch of html files in a site that were created in the year 2000 and have been maintained to this day. We've recently begun an effort to replace illegal characters with their html entities. Going page to page looking for copyright symbols and trademark tags seems like quite a chore. Do any of you know of an app that will take a bunch of html files and tell me where I need to replace illegal characters with html entities?
You could write a PHP script (if you can't, I'd be happy to help), but I assume you have already converted some of the "special characters", which makes the task a little harder (although I still think it's possible)...
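As a starting point, here is a minimal sketch of such a scanner. It walks a folder tree and prints every line of every .html file that contains a byte outside printable ASCII (copyright symbols, trademark signs, and so on), so you know exactly where entities are still needed. The ./site root is an assumption; point it at your document root:
<?php
// scan_entities.php - report where non-ASCII characters appear in .html files,
// i.e. the places that still need to be converted to HTML entities.
// The './site' root is an assumption; adjust it to your document root.
$tree = new RecursiveIteratorIterator(new RecursiveDirectoryIterator('./site'));
foreach ($tree as $file) {
    if ($file->isFile() && strtolower($file->getExtension()) === 'html') {
        foreach (file($file->getPathname()) as $num => $line) {
            // Flag any byte outside tab/newline/printable ASCII.
            if (preg_match('/[^\x09\x0A\x0D\x20-\x7E]/', $line)) {
                printf("%s:%d: %s", $file->getPathname(), $num + 1, $line);
            }
        }
    }
}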
Any good text editor will do a file contents search for you and return a list of matches.
I do this with EditPlus. There are several editors like Notepad++, TextPad, etc. that will easily help you do this.
You do not have to open the files. You just specify the path where the files are stored, the mask (*.html), and the content to search for ("©"), and the editor will come back with a list of matches; when you double-click one, it opens the file at the matching line.
I also have a website that needs to regularly convert large numbers of file names back and forth between character sets. While a text editor can do this, a portable solution using two steps in PHP was preferable. First, add the filenames to an array, then do the search and replace (a sketch of that second step follows the listing below). An extra piece of code in the function excludes certain file types from the array.
function listdir($start_dir='.') {
$nonFilesArray=array('index.php','index.html','help.html'); //unallowed files & subfolders
$filesArray = array() ; // $filesArray holds new records and $full[$j] holds names
if (is_dir($start_dir)) {
$fh = opendir($start_dir);
while (($tmpFile = readdir($fh)) !== false) { // get each filename without its path
if (strcmp($tmpFile, '.')==0 || strcmp($tmpFile, '..')==0) continue; // skip . & ..
$filepath = $start_dir . '/' . $tmpFile; // name the relative path/to/file
if (is_dir($filepath)) // if path/to/file is a folder, recurse into it
$filesArray = array_merge($filesArray, listdir($filepath));
else // add $filepath to the end of the array
$test=1 ; foreach ($nonFilesArray as $nonfile) {
if ($tmpFile == $nonfile) { $test=0 ; break ; } }
if ( is_dir($filepath) ) { $test=0 ; }
if ($test==1 && pathinfo($tmpFile, PATHINFO_EXTENSION)=='html') {
$filepath = substr_replace($filepath, '', 0, 17) ; // strip initial part of $filepath
$filesArray[] = $filepath ; }
}
closedir($fh);
} else { $filesArray = false; } # no such folder
return $filesArray ;
}
$filesArray = listdir($targetdir); // call the function for this directory
$numNewFiles = count($filesArray) ; // get number of records
for ($i=0; $i<$numNewFiles; $i++) { // read the filenames and replace unwanted characters
$tmplnk = $linkpath .$filesArray[$i] ;
$outname = basename($filesArray[$i],".html") ; $outname = str_replace('-', ' ', $outname);
}
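The replacement itself (the second step mentioned above) is not shown in the listing. Here is a minimal sketch of what it could look like; it assumes the files are UTF-8, that $targetdir plus the stripped path in $filesArray resolves back to the real file (adjust the prefix to match the substr_replace() above), and that the handful of symbols listed are the ones you care about:
// Second step: replace the raw special characters with their HTML entities
// inside each listed file. Assumes UTF-8 encoded files.
$replacements = array(
    "\xC2\xA9"     => '&copy;',   // ©
    "\xC2\xAE"     => '&reg;',    // ®
    "\xE2\x84\xA2" => '&trade;',  // ™
);
for ($i=0; $i<$numNewFiles; $i++) {
    $path  = $targetdir . '/' . $filesArray[$i]; // rebuild the path that was stripped earlier
    $html  = file_get_contents($path);
    $fixed = strtr($html, $replacements);        // swap raw characters for entities
    if ($fixed !== $html) {
        file_put_contents($path, $fixed);        // only rewrite files that actually changed
        echo "Updated: " . $path . "<br>\n";
    }
}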