IBM-Data-Merge-Utility

A Transformation and Enrichment Engine


IBM Data Merge Utility v4.0.0 - Users Guide DRAFT



Overview

A surprisingly large segment of Information Processing can be thought of as consisting of Transformation Services (take this data and transform it into another format) or Enrichment Services (take this data and go get me some more data). The IDMU uses a template-based approach to these services, in which merging templates produces transformed or enriched output. The merge process is similar to the familiar Mail Merge feature in most word processors. Potential uses for this tool include:


Merge Processing

This diagram shows an overview of Merge Processing:

[Figure: Processing Overview]

When a template is merged, each directive is executed in order. Simple Transformation templates usually have a Parse directive to parse the idmuPayload value and then Replace and/or Insert directives to put that data into a template. Sub-templates are merge processed when they are inserted at a bookmark. Templates that need additional data from an outside data source such as a Database, File or REST Service can use the Enrich directive to get that data. Use cases that need to generate a collection of files instead of a single output message can use the Save directive to create an archive with multiple files in it.


The Data Manager

IDMU has an internal Object Data store called the Data Manager. You can think of this as a simple JSON object structure that only supports a String Primitive. Technically the data manager supports three Element types:

The REST API will automatically place the HTTP Request Parameters in the data manager at the address idmuParameters, and the HTTP Request Payload at idmuPayload. The payload will be a simple primitive, and the parameters are an object of primitive lists.
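As a rough illustration of the two reserved addresses described above, the following Python sketch builds the same shape of data from a query string and a request body. It is not IDMU code: the function name `seed_data_manager` and the sample request are assumptions made for the example, and only the address names idmuParameters and idmuPayload come from this guide.

```python
from urllib.parse import parse_qs


def seed_data_manager(query_string: str, payload: str) -> dict:
    """Build the two reserved entries in the shape this guide describes."""
    return {
        # Each parameter name maps to a list of string values.
        "idmuParameters": parse_qs(query_string),
        # The payload stays a single, unparsed string primitive.
        "idmuPayload": payload,
    }


# Hypothetical request: ?name=fred&tag=a&tag=b with a JSON body.
store = seed_data_manager("name=fred&tag=a&tag=b", '{"name": "fred"}')
print(store["idmuParameters"])  # {'name': ['fred'], 'tag': ['a', 'b']}
print(store["idmuPayload"])     # {"name": "fred"}
```

Note that even a parameter supplied once (name) appears as a one-element list, which is why a template usually addresses the first member of the list rather than the parameter itself.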

Data Addresses

This object store will be used to store data that is fetched from an external data source (by an Enrich directive). This data is transient, and is released from memory after the merge is completed. Most directives will either read data from the data store and take some action, or write data to the data store for use by other directives. Data in the Data Manager is accessed via a “path” style address. For example, given this data structure:

{ 
	mydata: { 
		name: "fred", 
		address: "123 Anywhere",
		friends: [
			{ 	name: "allen",
		  		since: "way back" 
		  	},
			{ 	name: "betty", 
		  		since: "last week" 
		  	}
		]
	},
 	someOtherData: { ... }
 }

the Data Manager address “mydata-friends-[0]-name” would refer to “allen”. Since you can’t always predict what special characters might be used in your data names, whenever you provide a data address you also specify the delimiter used as the path separator. The above address uses a dash “-” as the path separator. In addition to the two special addresses mentioned above (idmuParameters and idmuPayload), the address idmuContext has special meaning during insert operations.
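The addressing scheme can be sketched in a few lines of Python. This is an illustration of the idea only, not the Data Manager implementation; the `resolve` helper is hypothetical, and the handling of `[n]` segments is inferred from the single example address above.

```python
def resolve(data, address: str, delimiter: str):
    """Walk a nested dict/list structure using a delimited path address."""
    current = data
    for part in address.split(delimiter):
        if part.startswith("[") and part.endswith("]"):
            current = current[int(part[1:-1])]  # list index segment such as [0]
        else:
            current = current[part]             # object attribute segment
    return current


data = {
    "mydata": {
        "name": "fred",
        "address": "123 Anywhere",
        "friends": [
            {"name": "allen", "since": "way back"},
            {"name": "betty", "since": "last week"},
        ],
    }
}

print(resolve(data, "mydata-friends-[0]-name", "-"))   # allen
# The same lookup style works with a different delimiter.
print(resolve(data, "mydata/friends/[1]/since", "/"))  # last week
```

Passing the delimiter alongside the address is what lets a data name such as "first-name" be used safely: the caller simply picks a separator that does not occur in the names.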

Using Replace Tags in an Address

Data Manager addresses can contain replace tags. Replace tags will be processed during execution using the current Template Replace stack. See the Replace Directive for details on Tags and the Replace Stack.


Template Developers Guide

Developing Templates

Developing Templates is the main skill needed to use IDMU. The best starting place is to have a sample input / output that your templates should create. Start with a template that is just the expected output and work out from there. When developing templates, start without any bookmarks or sub-templates and then add complexity in steps. Use of the Replace with Json options can be helpful if the data is not what you are expecting.


Templates

The Template is the primary configuration item used by IDMU and describes both the structure of the data to be returned by a merge and the directives that drive the merge process.

Template Attributes:

The following fields are only used by the Rest API and only apply to a “Base” template. These fields do not affect merge processing in any way.

Template Content

From an IDMU perspective, template content is just a block of text. It could be JSON, XML, HTML, a Bash script, a Java class, a NodeJs program, a CSV file, or anything else you want to generate. The text in a template can have Replacement Tags and Bookmarks that identify where in the template data is to be placed, or where sub-templates are to be inserted. Here is a sample of some JSON Template content:

{
	name: "<NAME>",
	address: "<ADDRESS>",
	friends: [<bookmark="friend" group="test" template="friend">]
}

In this template we are using < and > to wrap the tags and bookmarks. The tags <NAME> and <ADDRESS> will be replaced with data during the merge process (by a Replace directive). Sub-Templates for each “friend” will be inserted at the friend bookmark by an Insert directive.


Directives

Each template will have a list of directives that are executed during the merge process. Most directives interact with the Data Manager, and there are currently five directive types:


Replace

The replace directive is used to replace Tags in the template with data values from the Data Manager.

The Replace Stack

The replace directive does not directly replace the Tags in the template with data values. Instead, it loads a Replace Stack, which is a list of “From” and “To” values, and then optionally processes that stack over the template. This is useful if you want to place default values in the replace stack. Other directives can also use the values in the replace stack during processing: for example, the JDBC and JNDI data providers process replacement tags in a SQL statement from the template’s Replace stack. Sub-templates also inherit their parent’s replacement stack.
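To make the idea of a stack with defaults and inheritance concrete, here is a minimal Python sketch. It models the stack as a plain dictionary and the tags as `<NAME>` style strings; the `apply_stack` helper, the tag names, and the way a child copies its parent's stack are all assumptions for illustration and not IDMU's actual data structures.

```python
def apply_stack(content: str, stack: dict, open_wrap: str = "<", close_wrap: str = ">") -> str:
    """Replace every wrapped tag that has an entry in the stack."""
    for name, value in stack.items():
        content = content.replace(f"{open_wrap}{name}{close_wrap}", value)
    return content


# Defaults are placed on the stack first ...
parent_stack = {"NAME": "unknown", "ADDRESS": "unknown"}
# ... and a later Replace step overwrites only the values it finds.
parent_stack.update({"NAME": "fred"})

# A sub-template starts from a copy of its parent's stack and adds its own values.
child_stack = dict(parent_stack)
child_stack["SINCE"] = "way back"

print(apply_stack('name: "<NAME>", address: "<ADDRESS>"', parent_stack))
# name: "fred", address: "unknown"
print(apply_stack("<NAME> has known allen since <SINCE>", child_stack))
# fred has known allen since way back
```

Because loading the stack and processing it are separate steps, a default such as "unknown" survives whenever no data value replaces it.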

Replace Tags

Replace Tags are any wrapped strings that are not bookmarks, and conform to this pattern

tag="name" encode="encode" format="formatString" parseFirst

All fields except name are optional. For example, using the wrappers { and }, {foo} is the same as {tag=foo}. Attributes are:

Replace Directive Attributes

Parse

The Parse directive will take a Primitive String in the data manager, parse it and place the resulting structure in the Data Manager at a specified target.
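The effect of a Parse directive can be pictured with the following Python sketch, which parses a JSON string held at one address and stores the result at another. The `parse_directive` function and its `source` parameter name are hypothetical; JSON is used only because it is convenient to demonstrate, and the formats actually supported depend on your IDMU installation.

```python
import json


def parse_directive(store: dict, source: str, target: str) -> None:
    """Parse the string primitive at `source` and store the structure at `target`."""
    store[target] = json.loads(store[source])


# The payload arrives as one unparsed string, as the REST API provides it.
store = {"idmuPayload": '{"name": "fred", "friends": [{"name": "allen"}]}'}
parse_directive(store, "idmuPayload", "mydata")

print(store["mydata"]["name"])                 # fred
print(store["mydata"]["friends"][0]["name"])   # allen
```

After this step the original string is still available at idmuPayload, while later directives can address individual values under the target.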

Parse Directive Attributes
Default Parsing Engines

Parsing engines can be added to the IDMU tool. To find out what parsing formats are supported in your implementation, check the output of HTTP GET http://host/idmu/Config. If you need to add a new parser, see the IDMU Developers Guide below.


Enrich

The enrich directive will retrieve data from an external data source, and place that data into the Data Manager.

Enrich Directive Attributes

The use of the following fields will be defined by the provider being used. See Default Providers below for details.

Default Providers

The default providers are shown below:

Providers can be added to the IDMU tool. To find out what providers are supported, and what environment variables they require, check the output of HTTP GET http://host/idmu/Config. If you need to add a new provider, see the IDMU Developers Guide below.


Insert

The Insert directive will insert sub-templates at bookmarks within the content. During the execution of this directive sub-templates will be merged and their output will be inserted into the parent template at a bookmark. All bookmarks are removed from the template before it is inserted. Un-replaced tags are inserted along with content.

Bookmarks

A Bookmark Segment marks a location within the Content where sub-templates can be inserted and identifies the sub-template to be inserted. Bookmarks are wrapped strings that start with “bookmark” and conform to this pattern

bookmark = "name" group = "group" template = "template" varyby = "varyBy" insertAlways

All fields are required except the varyby field and insertAlways indicator.

Insert Context

Sub templates are inserted in the context of some data in the Data Manager, typically a member of a list, or an attribute of an object. The Data Manager address “idmuContext” will always point to the insert context object within the Data Manager. For example, if we are inserting sub-templates for each member of a list such as:

{
	myList:[
		{object1}
		{object2}
		{object3}
	]
}

Three sub-templates would be inserted, and when the first sub-template is merged for insertion the address idmuContext will be the same as myList-object1.
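The following Python sketch shows the general shape of inserting one sub-template per list member, with each member acting as the insert context. It is a simplification made for illustration: the `insert_at_bookmark` helper is hypothetical, tags are resolved directly against the context object, and joining the inserted pieces with a comma is an assumption that this guide does not confirm.

```python
def insert_at_bookmark(parent: str, bookmark: str, sub_template: str, members: list) -> str:
    """Merge the sub-template once per member and place the results at the bookmark."""
    rendered = []
    for member in members:
        idmu_context = member  # the current list member is the insert context
        text = sub_template
        for key, value in idmu_context.items():
            text = text.replace(f"<{key}>", value)
        rendered.append(text)
    # Assumption: pieces are comma-separated; the bookmark itself is removed.
    return parent.replace(bookmark, ",".join(rendered))


bookmark = '<bookmark="friend" group="test" template="friend">'
parent = f"friends: [{bookmark}]"
friends = [{"name": "allen"}, {"name": "betty"}]

print(insert_at_bookmark(parent, bookmark, '{name: "<name>"}', friends))
# friends: [{name: "allen"},{name: "betty"}]
```

An empty list produces no insertions, leaving `friends: []` once the bookmark is removed.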

Insert Directive Attributes

Save

The save directive will write the contents of the template out to an entry in the Merge Archive. The default archive type is tar; you can change this by specifying a value for the parameter idmuArchiveType with one of the following:

Save Directive Attributes
Working with Archives

If you are using the IDMU-REST interface, GET http://host/idmu/Archive/archiveName will retrieve the archive from the server and remove it from the temporary folder. If you are running the CLI, you can find the archives in the output folder, which defaults to /opt/ibm/idmu/v4/archives. You can override this value in the idmu-config environment variable.
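To picture what a merge archive with several entries looks like, the Python sketch below writes two rendered outputs into an in-memory tar file and lists them back. It uses only the Python standard library and is not how IDMU builds its archives; the `save_entries` helper and the file names are made up for the example.

```python
import io
import tarfile


def save_entries(entries: dict) -> bytes:
    """Write each name/content pair as one entry of an in-memory tar archive."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, content in entries.items():
            data = content.encode("utf-8")
            info = tarfile.TarInfo(name=name)
            info.size = len(data)  # the size must be set before adding the entry
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


# Hypothetical output of two templates that each ran a Save step.
archive_bytes = save_entries({
    "fred.json": '{"name": "fred"}',
    "allen.json": '{"name": "allen"}',
})

with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r") as archive:
    names = archive.getnames()
print(names)  # ['fred.json', 'allen.json']
```

The same listing approach can be used to inspect an archive downloaded from the server or found in the CLI output folder.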

Setting up a Template Development environment

Using IDMU-REST

The fastest way to get started working with templates is to use the Docker IDMU image.

You’ll need Docker

Then you can run this command to start an IDMU instance:

docker run -d -p 9080:9080 -p 9443:9443 --name idmu flatballflyer/idmu:4.0.0.Beta1

See The IDMU-REST Wiki for a curl cheat sheet

JSON Editors

Until the IDMU-REST project has a Web-UI you’ll need a good JSON editor. I’m still looking for a good Windows JSON editor. For users on Mac I can recommend Power Json Editor


IDMU Developers Guide

Extending Parsing Capabilities

IDMU is designed to support additional custom parsers. Implement the com.ibm.util.merge.data.parser.ParseProxyInterface and then register your parser (see Configuring IDMU).

Extending Provider Capabilities

IDMU is designed to support additional custom providers for the Enrich directive. Implement the com.ibm.util.merge.data.template.directive.enrich.provider.ProviderInterface and then register your provider (see Configuring IDMU).


IDMU Rest Administrators Guide

Building a Production Ready WAR

Until IDMU-REST Issue #2 is addressed you will have to manually change the web.xml file contained in the WAR package. After extracting the package, edit the web.xml file to remove all servlets except Initialize and Merge. This will prevent templates from being changed through the API while the IDMU instance is running. It will only load templates from the loadFolder specified in idmu-config.

Deploying IDMU on Docker

Create a Dockerfile and build a Docker image with your custom templates. It should look something like:

# Based on official IDMU docker container
FROM idmu:latest

# Add our own templates to the image
ADD ./templates/* /opt/ibm/idmu/v4/packages/

You can now use the docker build and docker run commands to build and deploy the image. You will want to make sure you expose port 9080 as your http port, and port 9443 as your https port. For example, you might use this docker run command:

docker run -d -p 80:9080 -p 443:9443 --name aName myContainer

Deploying IDMU on Bluemix

TBD - Kubernetes YAML under development

Deploying IDMU on WebSphere Liberty

When running under WebSphere Liberty you can simply place the WAR file in the server's dropins folder.

Deploying IDMU on Tomcat

When running under Tomcat you can simply place the WAR file in the Tomcat deployment folder (webapps by default).


Configuring IDMU

Using the idmu-config Environment Variable

IDMU will use the configuration specified in the idmu-config environment variable to override the default values. The idmu-config environment variable (or the config.json file for IDMU-CLI usage) is a JSON data structure. Here is the structure and default values:

{
	"nestLimit": 2,
	"insertLimit": 20,
	"tempFolder": "/opt/ibm/idmu/v4/archives",
	"loadFolder": "/opt/ibm/idmu/v4/packages",
	"prettyJson" : true,
	"logLevel": "SEVERE",
	"defaultProviders" : ["providerClass","providerClass"],
	"defaultParsers" : ["parserClass","parserClass"],		
	"envVars" : {"var":"value"}
}

The values provided for envVars will override Environment Variables of the same name. This can be a convenient way to define provider configuration values in the same configuration file, but tends not to be Docker/Kubernetes friendly.
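The override behaviour described above can be sketched in Python as follows. This is an illustration only: the `load_config` and `get_env` helpers and the `DB_URL` variable are hypothetical, the defaults shown are limited to the scalar values listed in the structure above, and a simple top-level merge is assumed rather than confirmed.

```python
import json
import os

# Defaults copied from the structure above; placeholder lists are left out.
DEFAULTS = {
    "nestLimit": 2,
    "insertLimit": 20,
    "tempFolder": "/opt/ibm/idmu/v4/archives",
    "loadFolder": "/opt/ibm/idmu/v4/packages",
    "prettyJson": True,
    "logLevel": "SEVERE",
    "envVars": {},
}


def load_config(environ=os.environ) -> dict:
    """Overlay the JSON held in idmu-config on top of the defaults."""
    overrides = json.loads(environ.get("idmu-config", "{}"))
    return {**DEFAULTS, **overrides}


def get_env(name: str, config: dict, environ=os.environ):
    """Values in envVars win over real environment variables of the same name."""
    return config["envVars"].get(name, environ.get(name))


# A stand-in environment so the example does not depend on the real one.
fake_env = {
    "idmu-config": '{"logLevel": "INFO", "envVars": {"DB_URL": "from-config"}}',
    "DB_URL": "from-environment",
}
config = load_config(fake_env)
print(config["logLevel"])                   # INFO
print(config["nestLimit"])                  # 2
print(get_env("DB_URL", config, fake_env))  # from-config
```

Keys that are absent from idmu-config keep their default values, so a deployment only needs to state what it changes.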

The default parsers provided in the default IDMU build are:

The default providers provided in the default IDMU build are: