DW - CovidCheq (Integration Package)

Last Updated 5/24/2021

Overview
Location
Schedule
Details
 Packages
    master
    create_files
    send_archive
    cleanup
Output File Defintions
Change Log
Troubleshooting Notes

Overview:

The packages in the DW-CovidCheq project are designed to send data to the Qualtrics directory.  It is an integration with the Qualtrics API that allows us to automatically create and updated embedded data for directory contacts in Qualtrics

The ETL has two main components.  An SSIS package that creates files for student, employees and faculty, and a package that sends those files to the Qualtrics file service for processing in  Qualtrics.

Location:

The project – DW-CovidCheq â€“ contains all the packages for this integration and lives in the ETL-BI-Integrations solution in SSIS and in the BI-Integrations folder in the SSISDB catalog.

Schedule:

The packages in this project run as part of the jobs below (full dataMarq schedule).  

ETL – BI-Integration – CovidCheq Full/Inc – Every 1 hr

Project and Package Details:

The packages in ETL – Qualtrics work with the Qualtircs API and don’t operate on our normal one package per table paradigm of other DB related loads. Instead, there are three main packages, one for loading survey data, one for loading/updating users, and one (decommissioned) for loading contact data .

Project Parameters

apiToken: The unique api Token that allows access to the Qualtrics instance and the api methods

automationId: Automations are the processes in the Qualtrics tool that load data files to the contact directory.  There is a unique automation set up to process the files produced by this project

cleanupDays: The number of days back to look back and move any files older than this to the archive folder on the BI shared drive

covidFilePath: The parent directory to store the created files on the BI shared drive - \\marqnet.mu.edu\depts\ITS\BI\covidCheq\

curlPath: The path on the appropriate server/local machine to the curl executable. Curl is a tool used to interact with web services.

dwConnStr: The connection to the approprate dataMarq data warehouse instance

fullRefresh: 1 = Full Refresh, 0 = Incremental

gpgPath: gpg is a tool that encrypts the data in the files and is compatible with Qualtrics' encryption. Once files are created by SSIS they are encrypted using a gpg key and then sent to Qualtrics. This encryption is to secure HIPAA data contained in the files.  Qualtrics has the gpg key and the tool decodes the file so only encrypted files are sent

gpgRecipient: A name for the key holder to encrypt it with the appropriate encryption

refreshDays: How many days to look back to pull data for incremental refreshes

Packages

MASTER

Overview

This is the master orchestration package.

Package Parameters

None

Package Variables

None

Package Flow

  1. Run create_files package
  2. Run send_archive package
  3. Run cleanup package

CREATE_FILES

Overview

This package creates the files are are sent to Qualtrics.  

Package Parameters

None

Package Variables

fileDate: A date set a run time to tag on to the file name to indicate when the file was created

fileName: This package creates three separate files to speed up processing, an Undergrad file (ugrd_), a Grad file (grad_), a visitors file (vis_), and an employee file (emp_).  This variable sets the prefix for the file name.

fileTimestampA timestamp to tag on to the file name to indicate when the file was created

list: Each file is created using a list variable (ugrd, grad, vis, emp). These lists are stored in an XML file

listId: The unique ID in the XML file for each list/file to be created

selectStatement: The select statement for that generates the data for the file.  It is essentially a select statement from the table ops.d_covidcheq_list.

sqlStatement: The full sql statement to be passed to the db source task

whereRefresh: The where clause to be added to the where statement when an incremental job is run.  
@[$Project::fullRefresh] == 1 ? "" : " AND DATEDIFF(HOUR,sys_updated,getdate()) < " +  @[$Project::refreshDays]

whereStatement: The where clause to be added to the sql statement. 
@[User::whereRefresh] + " " +  @[User::whereUsers]

whereUsers: The where clause to be added to the where statement to indicate which users to populate for a given file
@[User::whereRefresh] + " " +  @[User::whereUsers]

 

XML Lists Example

<Lists>
   <List>
     <Id>1</Id>
      <Name>ugrd</Name>
   </List>
   <List>
      <Id>2</Id>
      <Name>grad</Name>
   </List>
</Lists>

Package Flow

  1. Execute the Covid Cheq lists proc - ops.sp_d_covidcheq_list - to get the latest data in the ops.d_covidcheq_list table
  2. Set the fileDate and the fileTimeStamp variables
  3. Loop through the lists XML to get the name of each list/file to create
    1. Execute the Query to populate date for the file using the dynamic sqlStatement variable
    2. Output the file to the file path
  4. Rename the file using the the name variables that were created so the creation time is clear and unique

Package Snapshots
create

SEND_ARCHIVE

Overview

This package enrypts and sends the files that are created in the create_files packages to Qualtrics file service for processing by the automated job to populated the contact directory

Package Parameters

None

Package Variables

fileName: the file path to be send to Qualtrics, determined by the file loop of the covidFilePath project parameter 

Package Flow

  1. Loop through the covidFilePath directory
  2. Encrypt the file
    "--yes --recipient \"" +  @[$Project::gpgRecipient]  + "\" --output \"" + @[$Project::covidFilePath] + @[User::fileName] + ".gpg\"  --encrypt \"" + @[$Project::covidFilePath] + @[User::fileName] + ".txt\""
  3. Send the file to Qualtrics
    Executes a curl statement to send the files to the Qualtrics web service. 
    "/C move " + @[$Project::covidFilePath]  + @[User::fileName]  + ".gpg " +   @[$Project::covidFilePath]  + "archive\\" + @[User::fileName]  + ".gpg"
  4. Archive GPG File
    Moves the sent file to the archive folder
  5. Delete TXT file
    Deletes the original txt file so that the unecrypted data is not saved on the server

Package Snapshots

 send

CLEANUP

Overview

This package has one execute process task that runs a powershell command to delete files that are older than the cleanupDays parameter.

The command that is run is an expression:

"Get-ChildItem –Path \"" + @[$Project::covidFilePath] + "archive\"  -Recurse | Where-Object {($_.LastWriteTime -lt (Get-Date).AddDays("+ @[$Project::cleanupDays] + "))} |  Remove-Item"

 

File Definitions

Field

Type

Length

userid wstr 100
status_code wstr 25
status wstr 75
faculty_ind wstr 1
staff_ind wstr 1
student_ind wstr 1
person_type wstr 25
employment_category wstr 30
college_org_code wstr 10
college wstr 50
vp_code wstr 5
vp_area wstr 80
job_title wstr 150
primary_position_title wstr 100
primary_supervisor_muid wstr 150
primary_supervisor_name wstr 240
dept_chair_muid wstr 150
dept_chair_name wstr 240
chair_job_title wstr 150
chair_position_title wstr 100
next_up_supervisor_muid wstr 150
next_up_supervisor_name wstr 240
location_code wstr 60
office_location wstr 45
office_number wstr 45
office_building wstr 60
student_level_code wstr 4
student_level wstr 15
acad_load_code wstr 1
degree_seeking_ind str 1
acad_career_code wstr 5
acad_prog_code wstr 5
primary_acad_plan wstr 10
acad_plans wstr 150
res_hall_code wstr 1
res_hall wstr 30
acad_career_rep wstr 50
acad_career_sh wstr 50
muid wstr 11
first_name wstr 150
email wstr 240
cell_phone wstr 30
pref_phone wstr 30
trace_ind wstr 1
local_street_address wstr 450
local_city wstr 100
local_state_code wstr 10
local_postal_code wstr 25
last_name wstr 150
student_employee_ind wstr 1
tenured_ind wstr 1
first_gen_ind wstr 1
visitor_ind wstr 1
gender wstr 50
race_ethnicity wstr 50
ed_dsc_exclude wstr 1
delinquent_ind str 1
primary_hr_dept_code wstr 10
primary_hr_dept wstr 50
primary_acad_dept_code wstr 25
primary_acad_dept wstr 100
externalreference wstr 11
ED_DSC_OPT_OUT wstr 1
full_time_ind wstr 1
primary_instructor_ind wstr 1
currently_attending_ind wstr 1
athletics_ind wstr 1
sport_code wstr 5
sport wstr 25
degree_type wstr 100
campus_housing_ind str 1
attribute1 wstr 150
attribute2 wstr 150
attribute3 wstr 150
attribute4 wstr 150
attribute5 wstr 150
WEEKEND_OPT_IN wstr 1
online_only_ind wstr 1
quarantine_status wstr 50
last_90_days_ind str 1
attribute6 wstr 150
attribute7 wstr 150
attribute8 wstr 150
attribute9 wstr 150
attribute10 wstr 150
hall_director_email wstr 240
current_surveillance_ind wstr 1
surveillance_appt_ind wstr 1
surveillance_test_ind wstr 1
surveillance_delinquent_ind wstr 1
attribute11 wstr 150
attribute12 wstr 150
attribute13 wstr 150
attribute14 wstr 150
attribute15 wstr 150

Change Log:

Date Developer Change Record
5/24/2021 N/A Initial Documentation


Troubleshooting Notes:

  1. Occasional Failures Due to Qualtrics API

    Calls to the Qualtrics web service will occasionally fail. This is not due to a problem with the code, but a limiting of calls as we are running these quite frequently. The only issue would be if the job fails repeatedly.  This has not happened with the stage load, only with the downstream procs
  2. Changes to old data in a survey (old would be more than the 2 days we refresh)

    This mostly happens with the contact tracing survey where the Medical Clinic enters in a close contact – or someone enters a self disclosure – and then finds out it shouldn’t have been. When this happens within two days of the response it is not a problem as the incremental refresh picks up the last two days and will alter the response. If it is older than two days the remedy is to DELETE the staging row with the bad response. This is preferrable to running the survey load job with different start/end dates which can be