IBM Data Merge Utility v4.0.0 - Users Guide DRAFT
Contents
Overview
A surprisingly large segment of Information Processing can be thought of as consisting of Transformation Services (take this data and transform it into another format) or Enrichment Services (take this data and go get me some more data). The IDMU utilizes a template based approach to these services, in which Merging Templates produces Transformed/Enriched output. The merge process is similar to the familiar Mail Merge feature in most word processors. Potential uses for this tool include:
- Simple transformations of XML to JSON or the other way around
- Enriching a JSON request with additional data from a database
- Enriching a JSON request with additional data from a Rest data source
- Generating a XML or HTML document with data from a database
- Generating Configurations, Script or Code based on options specified as parameters.
Merge Processing
This diagram shows an overview of Merge Processing
When a template is merged, each directive is executed in order. Simple Transformation templates usually have a Parse directive to parse the idmuPayload value and then Replace and/or Insert directives to put that data into a template. Sub-templates are merge processed when they are inserted at a bookmark. Templates that need additional data from an outside data source such as a Database, File or Rest Service can use the Enrich directive to get that data. Use cases that need to generate a collection of files instead of a single output message can use the Save directive to create an archive with multiple files in it.
The Data Manager
IDMU has an internal Object Data store called the Data Manager. You can think of this as a smiple JSON object structure that only supports a String Primitive. Technically the data manager supports three Element types:
- Primitive: A simple string, IDMU does not have numeric or boolean primitives.
- Object: A list of unique attribute names and an Element value for each
- List: A list of Elements
The REST API will automatically place the HTTP Request Parameters in the data manager at the address idmuParameters, and the HTTP Request Payload at idmuPayload. The payload will be a simple primitive, and the parameters are an object of primitive lists.
Data Addresses
This object store will be used to store data that is fetched from an external data source (by an Enrich directive). This data is transient, and is released from memory after the merge is completed. Most directives will either read data from the data store and take some action, or write data to the data store for use by other directives. Data in the Data Manager is accessed via a “path” style address. For example, given this data structure:
{
mydata: {
name: "fred",
address: "123 Anywhere",
friends: [
{ name: "allen",
since: "way back"
},
{ name: "betty",
since: "last week"
}
]
},
someOtherData: { ... }
}
.
the Data Manager address “mydata-friends-[0]-name” would refer to “allen”. Since you can’t always predict what special characters might be used you your data names, whenever you provide a data address you specify the delimiter used as a path separator. The above address uses a dash “-“ as the path separator. In addition to the two special addresses mentioned above (idmuParameters and idmuPayload) the address idmuContext has special meaning during insert operations.
Data Manager addresses can contain replace tags. Replace tags will be processed during execution using the current Template Replace stack. See the Replace Directive for details on Tags and the Replace Stack
Template Developers Guide
Developing Templates
Developing Templates is the main skill needed to use IDMU. The best starting place is to have a sample input / output that your templates should create. Start with a template that is just the expected output and work out from there. When developing templates, start without any bookmarks or sub-templates and then add complexity in steps. Use of the Replace with Json options can be helpful if the data is not what you are expecting.
Templates
The Template is the primary configuration item used by IDMU and describes both the structure of the data to be returned by a merge and the directives that drive the merge process.
Template Attributes:
- id: Unique Template Identifier consisting of
- group: Defines a group of related templates. NOTE: A group name can not contain a period
- name: The template name. NOTE: A template name can not contain a period
- variant: The variant of the template - See the Insert Directive for more information on using template variants
A template Short Name can be experssed with period separators as in “group.template.variant”
- wrapper: The strings used to identify a tag/bookmark in content. These strings must exist in the content ONLY to idenify Tags and Bookmarks.
- front: Indicates beginning of tag/bookmark
- back: Indicates ending of tag/bookmark
- contentEncoding: Default Encoding to be used by Replace Directives
- 1: none
- 2: html
- 3: sql
- 4: json
- 5: xml
- 6: default
- content: The Template Content
- directives: A list of directives to execute during the merge
- description: Describes the template, used in error logging
The following fields are only used by the Rest API and only apply to a “Base” template. These fields do not affect merge processing in any way.
- contentType: Used by Rest API as the return http content type
- contentDisposition: Used by Rest API as the return http content disposition
- contentRedirectUrl: Reserved for Rest API, not currently used
Template Content
From an IDMU perspective, template content is just a block of text, it could be JSON, XML, HTML, a Bash Script, a Java Class, a NodeJs program, a CSV file…. anything you want to generate. The text in a template can have Replacement Tags and Bookmarks that identify where in the template data is to be placed, or sub-templates are to be inserted. Here is a sample of some JSON Template content:
{
name: "<NAME>",
address: "<ADDRESS>",
friends: [<bookmark="friend" group="test" template="friend">]
}
In this template we are using < and > to wrap the tags and bookmarks. The tags <NAME> and <ADDRESS> will be replaced with data during the merge process (by a Replace directive). Sub-Templates for each “friend” will be inserted at the friend bookmark by an Insert directive.
Directives
Each template will have a list of directives that are executed during the merge process. Most directives interact with the Data Manager, and there are currently five directive types:
- Enrich - Fetch Data and put it in the Data Manager
- Parse - Parse Data and put it in the Data Manager
- Replace - Replace tags with Data Values from the Data Manager
- Insert - Insert Sub-Templates at bookmarks based on values from the Data Manager
- Save - Save a template output to the Merge Archive
Replace
The replace directive is used to replace Tags in the template with data values from the Data Manager.
The Replace Stack
The replace directive does not directly replace the Tags in the template with data values, rather it loads a Replace Stack which is a list of “From” and “To” values and then optionally processes that stack over the template. This is useful if you want to place default values in the replace stack. It is also important to know that different directives can use the values in the replace stack during processing. The JDBC and JNDI data providers process replacement tags in a SQL statement from the templates Replace stack. It is also important to know that sub-templates inherit their parent’s replacement stack.
Replace Tags are any wrapped strings that are not bookmarks, and conform to this pattern
tag=”name” encode=”encode” format=”formatString” parseFirst
All fields except name are optional - using the wrappers { and } {foo} is the same as {tag=foo}. Attributes are:
- tag name is the “From” value used in the Replace Processing
- encode will override the default template encoding for this one tag - supported values are:
- none - Do not encode
- sql - Encode special characters and “ marks
- json - Encode new lines, tabs and slashes
- xml - Encode >, < and &
- default - Use the template default encoding - the same as omitting the encode value
- format is used to format values with Java String formatting features
- parseFirst if present will cause the replace To value to be parsed for embedded tags
Replace Directive Attributes
- type: The Directive Type - always 4 for Replace Directive
- name: The name of the directive - used in Logging
- dataSource: Path of source data in the Data Manager
- dataDelimeter: Delimiter used in data source path”
- fromValue: This ifMissing and ifPrimitive options will use the last segment of the data source path as a “From” value or Tag identifier unless this value is provided, in which case it will override the From value from the source path.
- ifMissing: Action to take if the specified dataSource is not in the Data Manager
- 1: Throw - Throw a merge exception
- 2: Ignore - Ignore this directive and continue processing
- 3: Replace - Replace with the specified to value below
- toValue: To value to be used if source is missing
- ifPrimitive: Action to take if source is a Primitive type
- 1: Throw - Throw a merge exception
- 2: Ignore - Ignore this directive and continue processing
- 3: Replace - Add Replace with the Primitive value to replace the tag with
- 4: Replace with JSON - Acts just like a regular Replace for a Primitive
- ifObject: Action to take if source is a Object, see Config for options”
- 1: Throw - Throw a merge exception
- 2: Ignore - Ignore this directive and continue processing
- 4: Replace as List - Treat this as a List of 1 Object and perform List Replace
- 5: Replace with JSON - Replace with JSON of Value
- 3: Replace Object - Add the object attribute names / values to the replace stack
- objectAttrPrimitive: Action to take if source is an Object and Attribute is Primitive
- 1: Throw - Throw a merge exception
- 2: Ignore - Ignore this attribute and continue processing
- 3: Replace - Add the attribute and value to the Replace Stack
- objectAttrList: Action to take if source is a Object and Attribute is List
- 1: Throw - Throw a merge exception
- 2: Ignore - Ignore this attribute and continue processing
- 3: Use first primitive as replace Value
- 4: Use last primitive as replace Value
- objectAttrObject: Action to take if source is a Object and Attribute is Object
- 1: Throw - Throw a merge exception
- 2: Ignore - Ignore this attribute and continue processing
- ifList: Action to take if source is a List
- 1: Throw - Throw a merge exception
- 2: Ignore - Ignore this directive and continue processing
- 3: Replace - Add replace values based on the “From Attribute” and the “To Attribute” of each member of the list
- 4: Use First - Add replace values like 3 but only the first member of the list
- 5: Last only - Add replace values like 3 but only the last member of the list
- 6: Replace with JSON - Replace with JSON Structure
- fromAttribute: Attribute name with From value
- toAttribute: Attribute name with To value
- listAttrMissing: Action to take if a List From / To attribute are missing
- 1: Throw - Throw a merge exception
- 2: Ignore - Ignore this directive and continue processing
- listAttrNotPrimitive: Action to take if a List From / To attribute is not a primitive
- 1: Throw - Throw a merge exception
- 2: Ignore - Ignore this directive and continue processing
- processAfter: Boolean indicating that after adding values the merge should Process the Replace Stack over the Content
- processRequire: Boolean indicating that a Merge Exception should be thrown if a tag is not replaced
Parse
The Parse directive will take a Primitive String in the data manager, parse it and place the resulting structure in the Data Manager at a specified target.
Parse Directive Attributes
- type: The Directive Type - always 3 for Parse Directive
- name: The name of the directive - used in Logging
- dataSource: Path of source data in the Data Manager
- dataDelimeter: Delimiter used in data source path”
- ifMissing: Action to take if the specified dataSource is not in the Data Manager
- 1: Throw - Throw a merge exception
- 2: Ignore - Ignore this directive and continue processing
- staticData: Static Data used
- ifPrimitive: Action to take if the specified dataSource is a Primitive
- 1: Throw - Throw a merge exception
- 2: Ignore - Ignore this directive and continue processing
- 3: Parse the data
- ifObject: Action to take if the specified dataSource is an Object
- 1: Throw - Throw a merge exception
- 2: Ignore - Ignore this directive and continue processing
- ifList: Action to take if the specified dataSource is a List
- 1: Throw - Throw a merge exception
- 2: Ignore - Ignore this directive and continue processing
- 3: Parse first primitive
- 4: Parse last primitive
- dataTarget: The target where data will be placed in the Data Manager
- dataTargetDelimiter: The dataTarget address delimiter
- parseFormat: Parse Format
- parseOptions: Parse Options
Default Parsing Engines
Parsing engines can be added to the IDMU tool. To find out what parsing formats are supported in your implementation check the HTTP GET http://host/idmu/Config output. If you need to add a new parser, see the IDMU Developers Guide below.
Enrich
The enrich directive will retrieve data from an external data source, and place that data into the Data Manager.
Enrich Directive Attributes
- type: Always 1 for Enrich Directives
- name: The name of the directive - used in Logging
- targetDataName: The path where data will be put
- targetDataDelimiter: The path delimeter
- parseFormat: Parse Format
- parseOptions: Parse Options
The use of the following fields will be defined by the provider being used. See Default Providers below for details.
- enrichClass: The Enrichment Provider Class.
- enrichSource: The name of the data source
- enrichParameter: The data source configuration parameter
- enrichCommand: The enrichment Command to execute
Default Providers
The default providers are shown below:
- com.ibm.util.merge.template.directive.enrich.provider.JndiProvider
- enrichSource: Source specifies a JNDI Name
- enrichParameter: Database name
- enrichCommand: A SQL Select Statement, can contain replace tags
- Data Returned: Returns a List of Object data structure
- parsing: Not Supported
- com.ibm.util.merge.template.directive.enrich.provider.RestProvider
- enrichSource: The configured Rest Source. The following environment variables are used:
- {SourceName}.HOST - The Host Name
- {SourceName}.PORT - The Port
- enrichParameter: N/A
- enrichCommand: The URL to make a http get request to
- Data Returned: Returns a List of Object data structure
- parsing: Will parse the entire response
- com.ibm.util.merge.template.directive.enrich.provider.CacheProvider
- enrichSource: N/A
- enrichParameter: N/A
- enrichCommand: N/A
- Data Returned: Always returns an object containing Cache Statistics
- parsing: N/A
- com.ibm.util.merge.template.directive.enrich.provider.CloudantProvider
- enrichSource: The Cloudant Source. The following environment varables are used:
- {SourceName}.URL - The database connection URL
- {SourceName}.USER - The database User ID
- {SourceName}.PW - The database Password
- enrichParameter: The datbase name
- enrichCommand: A cloudant Query JSON string - Replace Tags are supported and jSon encoded
- Data Returned: This provider always returns a List of Objects
- parsing: N/A
- com.ibm.util.merge.template.directive.enrich.provider.FileSystemProvider
- enrichSource: The following environment variables are expected
- {SourceName}.PATH - The Path where files are to be read from.
- enrichParameter: N/A
- enrichCommand: A Java RegEx file selector
- Data Returned: returns an object of FileName: String if not parsed, and FileName, Element if parsed
- parsing: file content is parsed in the return object
- com.ibm.util.merge.template.directive.enrich.provider.MongoProvider
- enrichSource: The following environment variables are expected
- {SourceName}.URI - The database connection URL
- {SourceName}.USER - The database User ID, if empty Mongo Anonymous Auth is used, otherwise ScramSha1 authentication is used.
- {SourceName}.PW - The database Password
- {SourceName}.DB - The database name
- enrichParameter: The Collection Name
- enrichCommand: Json Mongo Query Object
- Data Returned: List of Mongo Document Objects
- parsing: N/A
- com.ibm.util.merge.template.directive.enrich.provider.JdbcProvider
- enrichSource: The following environment variables are expected
- {source}.URI - Database Connection URI, without UserName/PW components
- {source}.USER - The Database User ID to use
- {source}.PW - The Password for the User ID
- enrichParameter: The Database Name
- enrichCommand: A SQL Select Statement - Replace Tags are supported and SQL encoded
- Data Returned: Always returns a List of Object
- parsing: N/A
- com.ibm.util.merge.template.directive.enrich.provider.StubProvider
- enrichSource: N/A
- enrichParameter: N/A
- enrichCommand: N/A
- Data Returned: Primitive with TemplateJson if not parsed, Template object if parsed
- parsing: JSON Parsing Supported
Providers can be added to the IDMU tool. To find out what providers are supported and the environment variable requirements of themcheck the HTTP GET http://host/idmu/Config output. If you need to add a new parser, see the IDMU Developers Guide below.
Insert
The Insert directive will insert sub-templates at bookmarks within the content. During the execution of this directive sub-templates will be merged and their output will be inserted into the parent template at a bookmark. All bookmarks are removed from the template before it is inserted. Un-replaced tags are inserted along with content.
Bookmarks
A Bookmark Segment marks a location within the Content where sub-templates can be inserted and identifies the sub-template to be inserted. Bookmarks are wrapped strings that start with “bookmark” and conform to this pattern
bookmark = “name” group = “group” template = “template” varyby = “varyBy” insertAlways
All fields are required except the varyby field and insertAlways indicator.
- name is the bookmark name, specified by the “bookmark pattern” attribute of an Insert directive
- group is the template group that a template will be inserted from
- template is the template name to be inserted
- varyby is the attribute within the insert context that identifies a sub-template variant to use.
- insertAlways will allow bookmarks to insert sub-templates even when the varyby attr is missing or not-primitive, without this parameter an exception is thrown on missing or non-primitive varyby attribute values.
Insert Context
Sub templates are inserted in the context of some data in the Data Manager, typically a member of a list, or an attribute of an object. The Data Manager address “idmuContext” will always point to the insert context object within the Data Manager. For example, if we are inserting sub-templates for each member of a list such as:
{
myList:[
{object1}
{object2}
{object3}
]
}
Three sub-templates would be inserted, and when the first sub-template is merged for insertion the address idmuContext will be the same as myList-object1.
Insert Directive Attributes
- type: The Directive Type - always 2 for Insert Directive
- name: The name of the directive - used in Logging
- dataSource: Path of source data in the Data Manager
- dataDelimeter: Delimiter used in data source path”
- bookmarkPattern: A regex pattern used to select bookmarks where sub-templates will be inserted.
- ifMissing: Action to take if source is missing, see Config for options
- 1: Throw - Throw a merge exception
- 2: Ignore - Ignore this directive and continue processing
- 3: Insert one sub-template
- ifPrimitive: Action to take if source is a Primitive, see Config for options
- 1: Throw - Throw a merge exception
- 2: Ignore - Ignore this directive and continue processing
- 3: Insert one sub template with a primitive context
- 4: Insert if - operator / value
- ifOperator: operator for Insert If
- 1: String equals
- 2: String is empty
- 3: String not empty
- 4: String >
- 5: String <
- 6: Value =
- 7: Value >
- 8: Value <
- ifValue: Values used with If Operator
- ifObject: Action to take if source is a Object, see Config for options
- 1: Throw - Throw a merge exception
- 2: Ignore - Ignore this directive and continue processing
- 3: Insert a sub template for each attribute.
- 4: Insert one sub-template in the context of the object
- ifList: Action to take if source is a List, see Config for options
- 1: Throw - Throw a merge exception
- 2: Ignore - Ignore this directive and continue processing
- 3: Insert a sub-template for each member of the list
- 4: Insert a sub-template for the first member of the list
- 5: Insert a sub-template for the last member of the list
- notFirst: Replace tags that will be blank on the first insertion in a list
- notLast: Replace tags that will be blank on the last insertion in a list
- onlyFirst: Replace tags that will be blank on all but the first insertion in a list
- onlyLast: Replace tags that will be blank on all but the last insertion in a list
Save
The save directive will write the contents of the tempalte out to an entry in the Merge Archive. The default archive type is tar, you can change this by specifying a value for the parameter idmuArchiveType with one of the following:
- zip
- tar
- jar
- gzip
The generated archive will have a GUID name, you can override this name by providing a value for the parameter idmuArchiveName
Save Directive Attributes
- type: The Directive Type - always 5 for Save Directive
- name: The name of the directive - used in Logging
- fileName: The name of the file to be written to the archive
- clearAfter: Clear Content after saving”
Working with Archives
If you are using the IDMU-REST interface, the GET http://host/idmu/Archive/archiveName will retrieve the archive from the server and remove it from the temporary folder. If you are running the CLI you can find the archives in the output folder which defaults to /opt/ibm/idmu/v4/archives. You can overide this value in the idmu-config environment variable.
—
Setting up a Template Development environment
Using IDMU-REST
The fastest way to get started working with templates is to use the Docker IDMU image.
You’ll need Docker
Then you can run this command to start and IDMU instance
docker run -d -p 9080:9080 -p 9443:9443 --name idmu flatballflyer/idmu:4.0.0.Beta1
See The IDMU-REST Wiki for a curl cheat sheet
JSON Editors
Until the IDMU-REST project has a Web-UI you’ll need a good JSON editor. I’m still looking for a good Windows JSON editor. For users on Mac I can recommend Power Json Editor
IDMU Developers Guide
Extending Parsing Capabilities
IDMU is designed to support additional custom parsers. Simply implement the com.ibm.util.merge.data.parser.ParseProxyInterface and then register your parser see Configuring IDMU
Extending Provider Capabilities
IDMU is designed to support additional custom providers for the Enrich directive. Simply implement the com.ibm.util.merge.data.template.directive.enrich.provider.ProviderInterface and then register your provider see Configuring IDMU
IDMU Rest Administrators Guide
Building a Production Ready WAR
Until IDMU-REST Issue #2 is addressed you will have to manually change the web.xml file contained in the WAR package. After extracting the package, edit the web.xml file to remove all servlets except Initialize and Merge. This will prevent templates from being changed through the API while the IDMU instance is running. It will only load templates from the loadFolder specified in idmu-config.
Deploying IDMU on Docker
Create a docker file and build a docker image with your custom templates. It should look something like:
# Based on official IDMU docker container
FROM idmu:latest
# Add our own templates to the file
ADD ./templates/* /opt/ibm/idmu/v4/packages/
You can now use the docker build and docker run commands to build and deploy the image. You will want to make sure you expose port 9080 as your http port, and port 9443 as your https port. For example, you might use this docker run command:
docker run -d -p 80:9080 -p 443:9443 --name aName myContainer
Deploying IDMU on Bluemix
TBD - Kubernettes YAML under development
Deploying IDMU on WebSphere Liberty
When running under WebSphere Liberty you can simply place the WAR file in the DropIn folder
Deploying IDMU on Tomcat
When running under Tomcat you can simply place the WAR file in the Tomcat Deploy folder
Configuring IDMU
Using the idmu-config Environment Variable
IDMU will use the configuration specified in the idmu-config environment variable to override the default values. The idmu-config environment variable (or the config.json file for IDMU-CLI usage) is a JSON data structure. Here is the structure and default values:
{
"nestLimit": 2,
"insertLimit": 20,
"tempFolder": "/opt/ibm/idmu/v4/archives",
"loadFolder": "/opt/ibm/idmu/v4/packages"
"prettyJson" : true,
"logLevel": "SEVERE",
"defaultProviders" : ["providerClass","providerClass"],
"defaultParsers" : ["parserClass","parserClass"],
"envVars" : {"var":"value"}
}
The values provided for envVars will override Environment Variables of the same name. This can be a convienent way to define provider configuration values in the same configuration file, but tends to not be Docker/Kubernettes friendly.
The default parsers provided in the default IDMU build are:
- com.ibm.util.merge.data.parser.DataProxyCsv
- com.ibm.util.merge.data.parser.DataProxyJson
- com.ibm.util.merge.data.parser.DataProxyXmlStrict
The default providers provided in the default IDMU build are:
- com.ibm.util.merge.template.directive.enrich.provider.CacheProvider
- com.ibm.util.merge.template.directive.enrich.provider.CloudantProvider
- com.ibm.util.merge.template.directive.enrich.provider.FileSystemProvider
- com.ibm.util.merge.template.directive.enrich.provider.JdbcProvider
- com.ibm.util.merge.template.directive.enrich.provider.JndiProvider
- com.ibm.util.merge.template.directive.enrich.provider.MongoProvider
- com.ibm.util.merge.template.directive.enrich.provider.RestProvider
- com.ibm.util.merge.template.directive.enrich.provider.StubProvider