Legal note

Pentaho PDI (Community Edition) is a service tool provided by Hitachi Vantara that needs to be installed by the Customer itself as required for data exchange with the censhare software. Any installation and configuration of this tool are therefore at the Customer’s own risk and subject to separate license terms. The use of the tool may result in additional costs. censhare does not have any influence on incurring costs or then-applicable license terms and shall therefore not be held responsible.

Target groups

  • Solution developers

Purpose

Aggregating and exporting data from censhare into exchange formats like CSV requires a Pentaho server. Pentaho is data integration and transformation software. The censhare application server communicates with it through a REST API. This article describes the setup and configuration of the server.

Context

Applications like Product information management (PIM) in censhare exchange data with external systems. The data exchange requires the aggregation and transformation of data. For this purpose, the censhare Data integration module must be configured with a third-party ETL (Extract, Transform, Load) tool. This documentation describes the setup of the ETL solution Pentaho PDI CE for this purpose.

Prerequisites

Configuring the transformations and jobs for data export and import requires ETL and development skills.

Introduction

Pentaho PDI is business intelligence software from Hitachi Vantara for data integration, reporting and analytics. It offers visual tools for ETL (Extract, Transform, Load) processes. censhare integrates with the community edition of Pentaho PDI to aggregate, transform and output product data from censhare PIM into exchange formats like Excel or CSV.

The Pentaho PDI CE package consists of the Carte web server, which allows you to run transformations and jobs remotely, and the Kettle ETL engine. The package must be installed on the same host as your censhare application server. The censhare application server communicates with the Pentaho server through a REST API over HTTP. In this version, censhare only supports the executeTrans method provided by Pentaho. The data output requires a transformation file and optional transformation parameters. For more information, see the article Data mapping and transformations with the PIM connector.
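To make the REST communication more concrete, the following shell sketch builds an executeTrans request to Carte and sends it with curl. It is an illustration, not the request censhare itself sends. It assumes the default values used in this article (localhost, port 9090, Carte login cluster/cluster, repository css-pentaho-repo with user cluster/cluster) and that this repository is available to the Carte instance. The query parameters rep, user, pass, trans and level follow Carte's executeTrans servlet as we understand it; the transformation name export_products and the parameter OUTPUT_FILE are hypothetical. Verify the parameters against the Carte documentation for your PDI version.

```shell
#!/bin/sh
# Sketch only: builds (and optionally sends) an executeTrans request to Carte.
# All defaults are assumptions taken from the default configuration in this
# article; transformation and parameter names used with it are hypothetical.
set -eu

CARTE_HOST="${CARTE_HOST:-localhost}"
CARTE_PORT="${CARTE_PORT:-9090}"
CARTE_USER="${CARTE_USER:-cluster}"        # login of the Carte web server
CARTE_PASS="${CARTE_PASS:-cluster}"
REPO_NAME="${REPO_NAME:-css-pentaho-repo}"  # repository name from the censhare settings
REPO_USER="${REPO_USER:-cluster}"
REPO_PASS="${REPO_PASS:-cluster}"

# Print the executeTrans URL for a transformation, a log level and optional
# NAME=VALUE transformation parameters. Values are not URL-encoded, so they
# must not contain spaces, '&' or '#'.
execute_trans_url() {
  trans="$1"
  level="$2"
  shift 2
  url="http://${CARTE_HOST}:${CARTE_PORT}/kettle/executeTrans/"
  url="${url}?rep=${REPO_NAME}&user=${REPO_USER}&pass=${REPO_PASS}"
  url="${url}&trans=${trans}&level=${level}"
  for param in "$@"; do
    url="${url}&${param}"
  done
  printf '%s\n' "$url"
}

# Send the request with HTTP basic authentication and print Carte's response.
execute_trans() {
  curl --silent --show-error --fail \
       --user "${CARTE_USER}:${CARTE_PASS}" \
       "$(execute_trans_url "$@")"
}
```

For example, `execute_trans export_products Debug OUTPUT_FILE=/tmp/products.csv` would ask Carte to run the hypothetical transformation with debug logging. Note that the repository password is part of the query string and is sent unencrypted unless HTTPS is used.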

About Pentaho data integration (PDI)

censhare supports PDI CE version 8.0.0.0-28 or newer. You can download the latest version of PDI CE from https://sourceforge.net/projects/pentaho. The documentation for version 8.0 can be found here.

Pentaho Data Integration offers a set of tools for the visual design of ETL (Extract, Transform, Load) transformations and jobs. The ETL engine of Pentaho is called Kettle. The PDI CE package also includes the Carte web server, which allows you to execute transformations and jobs remotely. censhare connects to the Carte web server via HTTP and a REST API. The package (Carte web server and Kettle ETL engine) must be installed on the same host as the censhare application server, so that both applications (censhare and Pentaho) can access the input, output and transformation files.

Installing the Pentaho server

The Pentaho server can be installed as described below on all supported Linux platforms. These steps do not apply to Oracle Solaris. If you want to install Pentaho on an Oracle Solaris platform, please contact our Service Desk.

To install Pentaho, download the PDI CE ZIP archive from https://sourceforge.net/projects/pentaho/ and proceed as follows:

  1. Extract the ZIP archive into the /opt/ directory. This creates the installation directory /opt/data-integration/.

  2. Add the following carte.service to the /usr/lib/systemd/system/ directory:

    [Unit]
    Description=Pentaho Data Integration
    
    [Service]
    Type=simple
    User=corpus
    Group=corpus
    Environment=PDICONFIG='./pwd/carte-config-master-9090.xml'
    WorkingDirectory=/opt/data-integration/
    ExecStart=/opt/data-integration/carte.sh $PDICONFIG
    
    [Install]
    WantedBy=multi-user.target
  3. Open a terminal window and set the required permissions for the carte.service:

    chmod 644 /usr/lib/systemd/system/carte.service
  4. Enable the service with the command:

    systemctl enable /usr/lib/systemd/system/carte.service
  5. Add the following carte-config-master-9090.xml configuration file to the /opt/data-integration/pwd/ directory:

    <slave_config>
     <slaveserver>
      <name>master</name>
      <hostname>localhost</hostname>
      <port>9090</port>
      <master>Y</master>
     </slaveserver>
    </slave_config>
  6. Start the Carte web server. It listens on the port defined in carte-config-master-9090.xml (9090 in the file above). To use another free port, change the port value in that file before starting the service. Run the following command (this starts the Pentaho application, too):

    systemctl start carte.service

    The systemctl command shown above applies to Linux systems that use systemd. On other operating systems, use the equivalent commands.

  7. Check that the Carte web server is running. To do this, open the URL http://localhost:9090/kettle/status/?xml=Y in your web browser (replace 9090 if you use another port) and log in with user "cluster" and password "cluster".

  8. The server should respond with the following status message (values inside the tags may be different):

    <serverstatus>
     <statusdesc>Online</statusdesc>
     <memory_free>888385208</memory_free>
     <memory_total>1257242624</memory_total>
     <cpu_cores>8</cpu_cores>
     <cpu_process_time>534702631000</cpu_process_time>
     <uptime>174627855</uptime>
     <thread_count>99</thread_count>
     <load_avg>1.9580078125</load_avg>
     <os_name>Mac OS X</os_name>
     <os_version>10.13.3</os_version>
     <os_arch>x86_64</os_arch>
     <transstatuslist></transstatuslist>
     <jobstatuslist></jobstatuslist>
    </serverstatus>
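Steps 7 and 8 can also be scripted, for example for monitoring. The following shell sketch queries the status URL with curl and checks whether the statusdesc element contains Online. It assumes the defaults from the steps above (port 9090, user and password cluster); the function names are hypothetical, and the sed-based extraction is a simple text match rather than a full XML parser.

```shell
#!/bin/sh
# Sketch: check whether a Carte web server reports the status "Online".
# Assumed defaults (from the installation steps): port 9090, login cluster/cluster.
set -eu

CARTE_URL="${CARTE_URL:-http://localhost:9090/kettle/status/?xml=Y}"
CARTE_USER="${CARTE_USER:-cluster}"
CARTE_PASS="${CARTE_PASS:-cluster}"

# Read a <serverstatus> XML document on stdin and print the <statusdesc> value.
# Simple text match, not a full XML parser.
carte_status_desc() {
  sed -n 's:.*<statusdesc>\(.*\)</statusdesc>.*:\1:p' | head -n 1
}

# Return 0 if Carte answers and reports "Online", 1 otherwise
# (including when curl cannot connect, because the status is then empty).
carte_is_online() {
  status="$(curl --silent --fail --user "${CARTE_USER}:${CARTE_PASS}" "$CARTE_URL" |
    carte_status_desc)"
  [ "$status" = "Online" ]
}
```

For example: `if carte_is_online; then echo "Carte is online"; else echo "Carte is not reachable" >&2; fi`. Set CARTE_URL if Carte uses another port.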

Configuring the Pentaho interface in censhare

In the censhare Admin Client, configure the Pentaho interface, which enables communication between the censhare application server and Pentaho. Proceed as follows:

  1. Go to the directory Configuration/Modules/Data integration/Pentaho interface and open the entry Pentaho (Preferences).

  2. In the dialog window, click the Edit XML file button to open the configuration file. If you use the default configuration described above, no changes are needed:

    <settings>
     <setting id="default"
              host="localhost"
              port="9090"
              pentahoUser="cluster"
              pentahoPassword="cluster"
              isHttps="false"
              logLevel="DEBUG">
      <repository repositoryName="css-pentaho-repo"
                  repositoryUser="cluster"
                  repositoryPassword="cluster" />
     </setting>
    </settings>
  3. Check the port number in the default configuration. It must match the port that the Carte web server uses.

    If you use a port other than the default, enter it in the port attribute. Click OK to save the changes to a custom configuration file. A red flag in the Pentaho interface directory indicates the custom configuration.

  4. Update the server configuration and, if necessary, synchronize the remote servers in your system. The custom configuration is now enabled.
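Because a port mismatch between Carte and the censhare Pentaho settings is an easy mistake (step 3), a small check can help. The following shell sketch compares the port in the Carte configuration file with the port attribute in a copy of the censhare settings XML. It assumes you have saved the XML shown in the Edit XML file dialog to a local file (the file name pentaho-settings.xml below is hypothetical) and that the first port value in each file is the relevant one; the sed expressions are plain text matches, not an XML parser.

```shell
#!/bin/sh
# Sketch: check that Carte and the censhare Pentaho settings use the same port.
# Assumes the censhare settings XML was saved to a local file (hypothetical path).
set -eu

# Print the first <port> value from a Carte slave_config file.
carte_port() {
  sed -n 's:.*<port>\([0-9][0-9]*\)</port>.*:\1:p' "$1" | head -n 1
}

# Print the first port="..." attribute from a censhare settings file.
censhare_port() {
  sed -n 's:.*[[:space:]]port="\([0-9][0-9]*\)".*:\1:p' "$1" | head -n 1
}

# Compare both ports; print a message and return 1 on mismatch.
ports_match() {
  carte_p="$(carte_port "$1")"
  censhare_p="$(censhare_port "$2")"
  if [ -n "$carte_p" ] && [ "$carte_p" = "$censhare_p" ]; then
    echo "OK: both use port $carte_p"
  else
    echo "Mismatch: Carte port '$carte_p', censhare port '$censhare_p'" >&2
    return 1
  fi
}
```

For example: `ports_match /opt/data-integration/pwd/carte-config-master-9090.xml pentaho-settings.xml`.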

Result

The Pentaho server and the censhare Pentaho interface are now installed and running on your system.

Next steps

Configure the Server actions for censhare Client and censhare Web. These actions allow users to execute a product data export with Pentaho.