Those of us who work in DevOps know that certain application patterns are easy to automate, while others can be very difficult. Some applications seem as though they were designed explicitly to make automation hard. When working with legacy software that has, over time, become a core component of the business infrastructure, it’s not always possible to “just rewrite” the software or “replace it with something modern.” While the results aren’t always pretty (and are rarely ideal), in most cases automation around these legacy systems can result in improvements. Case in point: Watson Explorer.
Watson Explorer Automation
Watson Explorer is scraping and indexing search software (distinct from IBM's Watson AI platform). Its proprietary configuration consists of XML and property files that a custom JAR processes into a configuration package. A total of 71 servers ran this software, supported by two people who were constantly swamped.
Based on our research and discussions with the vendor, we realized that there was no precedent for this type of automation. As is the case with most enterprises, these types of applications require a lot of care and feeding. We knew we would need to solve a number of key challenges in order to implement an automation-first approach.
Below, we’ll describe the automation approach we took.
Jenkins Pipeline Implementation
Here are the manual steps that the two-person team performed for each update of the application’s configuration:
- Check out code from SVN, make changes, and commit back to SVN. (This process often involved tedious manual changes.)
- Log in to a jump server that can access the application servers.
- Check out code from SVN to the jump server.
- Collect files into the correct directories.
- Run preprocess script.
- Run main process jar.
- Perform cleanup.
After moving the application code over to Git, we implemented the following Jenkins pipeline. This pipeline condenses all the manual steps following code commit into one flow, making the process repeatable and reducing the likelihood of errors.
pipeline {
agent any
parameters {
// The team can deploy to numerous environments; this parameter lets them
// choose the one to work with. Because there isn't necessarily a progression
// from one environment to another, we can't set up a dev -> qa -> prod type
// of pipeline.
choice(name: 'env', description: 'Choose deployment target', choices:
'dev1\n' +
'dev2\n' +
'etc'
)
// Occasionally an older version needs to be deployed, e.g., for testing
// against it.
choice(name: 'version', description: "Choose target version to deploy", choices:
'latest\n' +
'0.0.1\n' +
'0.0.2\n' +
'etc'
)
// No proper linting tool is available to validate that the generated
// configuration matches what the application expects, so the pipeline
// defaults to a “dry run” that lets the team manually review the
// generated configuration.
choice(name: 'upload', description: "Deploy to watson or just generate payload?", choices: 'generate\nupload')
}
environment {
OUTPUT_DIR = "output/${params.env}"
}
stages {
// Check out the requested tag when a version other than “latest” is selected.
stage('Checkout tag') {
when {
expression {
params.version != "latest"
}
}
environment {
scmUrl = ""
}
steps {
echo "Using ${params.version} instead of latest"
script {
scmUrl = sh(returnStdout: true, script: 'git config remote.origin.url').trim()
}
checkout scm: [$class: 'GitSCM', userRemoteConfigs: [[url: scmUrl, credentialsId: 'github_creds']], branches: [[name: "refs/tags/${params.version}"]]], poll: false
}
}
// Copy all relevant files into one directory for processing.
stage('Collect files') {
steps {
sh """
mkdir -p ${OUTPUT_DIR}/
cp -R base/* ${OUTPUT_DIR}
cp -R envs/${params.env}/* ${OUTPUT_DIR}
cp -R data ${OUTPUT_DIR}/data
"""
}
}
// This stage is meant to reduce manual edits. The application expects a
// property file with sequentially numbered properties, so inserting a
// property into the middle of a few hundred others would mean renumbering
// everything that follows. The solution: put placeholders in the source
// property files and have this stage convert them to sequential numbers.
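// For illustration (hypothetical keys and values), a source file containing:
//   xpath_id_=//doc/title
//   val_id_=Title
//   xpath_id_=//doc/body
//   val_id_=Body
// leaves this stage as:
//   xpath1=//doc/title
//   val1=Title
//   xpath2=//doc/body
//   val2=Body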
stage('Process xpath placeholder replacement') {
steps {
dir("${OUTPUT_DIR}") {
script {
def props = readProperties file: 'global.properties'
// Validate the property before converting; calling toInteger() on a
// missing key would throw before the error message could be shown.
def totalStr = props.'deployments.total'
if (!totalStr) {
error("global.properties doesn't contain 'deployments.total'")
}
int total = totalStr.toInteger()
echo "Found ${total} property file(s) to process."
for (int i = 1; i <= total; i++) {
def file = props."deployment${i}.configfile"
echo "Processing xpath placeholder replacement in ${file}"
sh """
set +x
set +e
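# set +e: grep returns a nonzero exit status once no placeholders remain;
# without +e that would abort the step instead of ending the loop.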
filename=${file}
id="_id_"
i=1
rc=0
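# Renumber one xpath/val placeholder pair per iteration until none remain.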
while [ \$rc -eq 0 ]
do
sed "1,/xpath\${id}=/s/^xpath\${id}=/xpath\${i}=/" \$filename | sed "1,/val\${id}=/s/val\${id}=/val\${i}=/" > \${filename}.tmp
mv \${filename}.tmp \${filename}
grep -q xpath\${id} \${filename}
rc=\$?
i=\$((i + 1))
done
"""
}
}
}
}
}
// “Dry run” vs. “actually apply” is determined by a flag in a property
// file; this stage flips that flag when an upload was requested.
stage('Enable upload') {
when {
expression {
params.upload == "upload"
}
}
steps {
dir("${OUTPUT_DIR}") {
echo "Enabling sshupload in global.properties"
sh "set +x; set +e; sed -i 's/sshupload.enabled=false/sshupload.enabled=true/' global.properties"
}
}
}
// Some environments need additional tweaks to the configuration before the
// main processing. Those tweaks live in a single script that this stage
// executes when it is present.
stage('Run preprocess script') {
steps {
dir("${OUTPUT_DIR}") {
sh """
if [ -f preprocess.sh ] ; then
echo Preprocessing repository.xml
./preprocess.sh
fi
"""
}
}
}
// Run the main processing JAR, which also deploys the configuration to the
// specified environment if the appropriate flag is set. The SSH key needed
// to connect to the application servers is provided via the Jenkins
// Credentials Plugin.
stage('Run velocity-deploy jar') {
steps {
withCredentials(bindings: [sshUserPrivateKey(credentialsId: 'watsonexplorer', keyFileVariable: 'WATSON_KEY_FILE')]) {
dir("${OUTPUT_DIR}") {
sh '''
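# Stage the credential key as ./ssh_key (presumably where the JAR looks
# for it), then remove it once the run finishes.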
cp ${WATSON_KEY_FILE} ./ssh_key
JAVAPATH=
CLASSPATH='.:./velocity-deploy-1.1.jar:./jsch-0.1.42.jar'
java -Xms32m -Xmx256m -classpath ${CLASSPATH} com.vivisimo.processor.BuildProcessor
rm ./ssh_key
'''
}
}
}
}
}
post {
always {
// Attempt to remove the key again in case an error prevented it from being
// deleted earlier.
dir("${OUTPUT_DIR}") {
sh "rm -f ./ssh_key"
}
// Pause so the team can check the files that were created and validate that
// the configuration looks good.
script {
echo "\n--------------------------------\n" +
"Check the prior logs to see if there were any errors. You may need to check the workspace.\n" +
"--------------------------------\n" +
"Access the workspace to view files here: ${env.BUILD_URL}execution/node/3/ws/${OUTPUT_DIR}\n" +
"This folder will be deleted in 10 minutes, or whenever you manually continue.\n" +
"--------------------------------\n"
try {
timeout(time: 10, unit: 'MINUTES') {
input "\n--------------------------------\n" +
"Continue?\n" +
"Note: Either button will proceed to delete the workspace\n" +
"--------------------------------\n"
}
}
catch (e) {
echo "Continuing"
}
}
cleanWs()
}
}
}
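Once the job exists in Jenkins, it can also be triggered without the UI through Jenkins' standard remote-build endpoint. Here's a minimal sketch, assuming a hypothetical host and job name, a user API token, and that remote triggering is permitted for the job:

# Hypothetical host, job name, and token; env/version/upload map to the
# pipeline parameters above.
curl -X POST --user alice:API_TOKEN \
"https://jenkins.example.com/job/watson-explorer-config/buildWithParameters?env=dev1&version=latest&upload=generate"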
Conclusion
It’s important to periodically step back and think about our work from a high level to see where we can optimize and eliminate manual steps.
Oftentimes small optimizations like these can have a big impact on the time individual tasks take. To be clear, the pipeline shown above isn't overly complex; it essentially automates the existing manual steps, making the entire process much faster and more reliable.