Replacing Legacy VB Script with SSIS

Overview

I received a request to replace a legacy VB Script process which was failing on a regular basis. The VB script was part of a process that included a batch file and WinSCP script, all initiated by the task scheduler. The process would download files from an SFTP server to our content management server, archive the files on a file server, and prepare the downloaded files for additional processing by our Enterprise Content Management (ECM) system.   The script would run at half hour increments throughout the day and our ECM system would "listen" for the index file and use its contents to import specific PDF files into the system. Another issue, other than failing, was the lack of alerts to notify the IT team that the document import process failed. I reviewed the existing code and identified all the steps that needed to take place.

The overall process would look something like this.

1.Connect to the file server.

2.Connect to ECM server.

3.Clear the ECM staging area of any existing files.

4.Connect to the SFTP server and download files from an SFTP server to a staging area. Files would need to be filtered by date string embedded in the filename, ensuring only the current days files were downloaded.

5.Create a log file detailing the SFTP processing.

6.Archive the downloaded files on the file server.

7.Create a separate log file identifying empty and non-empty files.

8.Delete any the empty zip files.  

9.Extract the contents while maintaining the folder structure into the staging area.

10. Identify the index file in each of the unpacked folders and update specific values in the index file. The index file is used by our content management system to identify the location of each unpacked PDF files and import them for further processing.  This involves

a.Update the first column to a specific value.

b.Update the column containing the unpacked file name so that it includes the path to the PDF file.

c.Rename the index file and place in a separate Index directory, where it will remain until it is process by our ECM.

11. In the even of failure provide detailed error alert to IT team.

Setup

The example I am presenting is slightly different from the example I implemented in production.  In this example, all files are processed locally on my C:\ drive, but the concepts are still the same.  Additionally, I will not be detailing how I connected to the file and ECM servers.  I have addressed how to complete this task in another post. Remember to disconnect from the resource once the process is complete.

Project Level Variables

Name Data Type Value

1.vArchive String Path to archive file i.e c:\Archive\

2.vDate String Dynamic date string in YYYYMMDD format using the following expression for                                                                                         populating the  variable value:

(DT_STR, 4, 1252) DATEPART("yyyy" , GETDATE())
                                                                                       + RIGHT("0" + (DT_STR, 2, 1252) DATEPART("mm" , GETDATE()), 2)
                                                                                       + RIGHT("0" + (DT_STR, 2, 1252) DATEPART("dd" , GETDATE()), 2)

3.vFileProcessLog String  Filename will be created during execution using the following expression. The                                                                                          output of the expression is: c:\Logs\Document_Extraction_                                                                                          20220715163529_log.txt. The filename includes the year, month, day, hour,                                                                                          minute,  and second of execution.

"c:\\Logs\\Document_Extraction_"+@[User::vDate] +Right("0" +                                                                                          (DT_STR,2,1252)DATEPART("hh",GETDATE()),2)+ Right("0" +                                                                                          (DT_STR,2,1252)DATEPART("mi",GETDATE()),2) + Right("0" +                                                                                          (DT_STR,4,1252)DATEPART("ss",GETDATE()),2)+"_log.txt"

4.vIndex         String This variable defines the directory where the index file will be stored and used for                                                                                          further processing by the application system. In our example, C:\Index\.

5.vSftpInboundFileName String This variable provides the filename format that will be used by WinSCP to                                                                                         download the appropriate files. The expression used to populate the variable is the                                                                                         following. The output for the variable is /Test_20220715_.zip. The vDate variable                                                                                         will contain the date of execution. In essence, we are using the variable to provide                                                                                         a filter for the SFTP process. The process will search for any files prefaced by                                                                                         "Test_", with today's date, ending with a .zip.

6.vSftpInboundFilePath String The inbound path for files we acquire from the SFTP server. Enter your destination                                                                                         path. For example, C:\Staging\.

7.vSftpLogFilename String The full path for the logfile that will be generated to capture events during WinSCP                                                                                         transmission."C:\\Logs\\Document_Import_"+ @[User::vDate] +Right("0" +                                                                                         (DT_STR,2,1252) DATEPART("hh",GETDATE()),2) + Right("0" + (DT_STR,2,1252)                                                                                         DATEPART("mi",GETDATE()),2) + Right("0" + (DT_STR,4,1252)                                                                                         DATEPART("ss",GETDATE()),2)+"_log.txt"

8.vSftpServerHostKey         String The host key for the SFTP server. If you do not know the host key, there is a                                                                                         method using WinSCP for identifying the host key.

9.vSftpServerName String The name or IP address of the server.

10.vSftpServerUserPassword String The password for the SFTP server.

11.vSftpUserName String The username for the SFTP server.

Your project level variables should resemble the following. I am using Rebex Tiny SFTP server for testing my development. The host key and server name will differ for you.

Tasks

Description   Task Type

1.Delete Directory Content File System Task

2.Download Files From SFTP Server Script Task

3.Archive Downloaded Files and Remove Empty Folders Script Task

4.Unpack Zip files and Maintain Folder Structure         Script Task

5.Copying and "Updating" a Tab Delimited Text File with SSIS         Script Task

6.Transmit Completion Notification Send Mail Task

The completed solution will resemble the image below.

I will publishing the details of each SSIS package as tasks within the coming weeks. See you there.