HANA SDI | Smart Data Integration 2.0 – H2H Real-time Replication: Lessons Learned

In this blog entry I would like to share some of the experiences we gained during an SDI HANA to HANA (H2H) implementation project. To provide context, we will start off with the scenario description and solution architecture.

These are the items that will be covered throughout this blog:

1. Implementation Scope
2. Solution Architecture
3. Best Practices
4. Challenges
5. Reengineering of Replication Tasks
6. Monitoring
7. Real-time Replication & Source System Archiving

You can expect practical insights into the implementation of a HANA to HANA replication scenario, including detailed aspects such as task partitioning, replication task design, and monitoring. You may be able to adapt the approaches described in this blog in your own SDI implementation project.

1. Implementation Scope

From an SDI perspective, this brief overview describes some of the facts and requirements we had to deal with:
◈ Replicate data in real-time from 3 different HANA source systems into a (consolidated) target schema using SDI replication tasks (with the SDI HANAAdapter)
◈ Replication scope of approx. 550 tables per source (times 3, i.e. more than 1,600 tables in total)
◈ Replicate tables with a high record count (6 tables with more than 2 billion records each in production)
◈ SDI task partitioning for large tables (> 200 million records)
◈ Target table partitioning for large tables (> 200 million records)
◈ SDI infrastructure/configuration – e.g. DP-Agent groups
◈ Follow SDI best practice guidelines (naming conventions, implementation guidelines, tuning)
◈ Transport of SDI development artifacts across the landscape to PRD
◈ DP server and DP agent monitoring
◈ Out of scope: load and replication of IBM DB2-based source systems (compare with the architecture diagram)
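To illustrate the target table partitioning mentioned above, a large replication target in HANA can be hash-partitioned on (part of) its key so that inserts and initial loads spread across partitions. The schema, table, and column names below are purely illustrative, not taken from the project:

```sql
-- Hypothetical sketch: hash-partitioned target table for a large
-- replicated table (> 200 million records). Names are illustrative.
CREATE COLUMN TABLE "TARGET"."SALES_DOCS" (
  "MANDT" NVARCHAR(3)   NOT NULL,
  "VBELN" NVARCHAR(10)  NOT NULL,
  "NETWR" DECIMAL(15,2),
  PRIMARY KEY ("MANDT", "VBELN")
)
PARTITION BY HASH ("VBELN") PARTITIONS 8;
```

Hash partitioning on a high-cardinality key column keeps partition sizes balanced; the partition count would be chosen based on record volume and the number of available hosts/nodes.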

2. Solution Architecture

The end-to-end solution architecture employs several SAP and non-SAP components:

  • DP-Agents
    • Virtual host on Linux, 64 GB
    • Version 2.1.1
  • HANA 2 SP02
    • 4 TB
  • HANA EIM SDI (XSC runtime)
  • DLM (Data Lifecycle Manager)
  • HANA Vora 1.4
  • Hadoop cluster with Spark enabled
  • MicroStrategy
The following illustration shows the architecture in a simplified way. From an SDI point of view there are multiple real-time and batch input streams: Suite on HANA systems, files, and legacy data from IBM DB2 databases (not shown).

In the productive environment (as shown), each Suite on HANA system (shown as HDB1/2/3) is connected via a dedicated DP-Agent group with its own HANAAdapter instance. This mitigates the risk of stalling the entire replication when remote source or replication task exceptions occur at the level of a single source system. The Hadoop and Vora parts, shown on the right-hand side, will not be elaborated further and are not part of this blog entry.
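To make the agent-group setup concrete: a remote source can be bound to an agent group rather than a single agent, so the connection can fail over within the group. The sketch below is a hedged example; the group name, host, port, and credentials are assumptions, not the project's actual configuration:

```sql
-- Hypothetical sketch: one remote source per HANA source system,
-- bound to a dedicated agent group (names are illustrative).
CREATE REMOTE SOURCE "RS_HDB1" ADAPTER "HanaAdapter"
AT LOCATION AGENT GROUP "AG_HDB1"
CONFIGURATION '<?xml version="1.0" encoding="UTF-8"?>
<ConnectionProperties name="configurations">
  <PropertyEntry name="host">hdb1.example.corp</PropertyEntry>
  <PropertyEntry name="port">30015</PropertyEntry>
</ConnectionProperties>'
WITH CREDENTIAL TYPE 'PASSWORD' USING
'<CredentialEntry name="credential">
  <user>SDI_REMOTE</user>
  <password>********</password>
</CredentialEntry>';
```

Repeating this pattern per source system (RS_HDB1/2/3 against AG_HDB1/2/3) is what isolates an exception in one source from the replication of the others.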

3. SDI Best Practices

Initially, most of the aspects (for users and authorizations) covered in the official SDI Best Practices Guide were implemented (refer to the references section for the web link to the best practices).
SDI users were organized the following way:
◈ SDI_ADMIN – monitoring privileges, user creation, ***
◈ SDI_DEV – Web-based development workbench, repository privileges, schema privileges
◈ SDI_EXEC – execute replication tasks
◈ SDI_TRANSPORT – transport SDI artifacts
Using this pattern, you can easily follow a segregation-of-duties approach and avoid unnecessary and unwanted situations in development or deployment. On the other hand, you have to stick with the approach and align your development and administration processes accordingly.
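As an illustration of this user split, the core grants might look roughly as follows. This is a minimal sketch: the remote source and target schema names are placeholders, and a real setup would grant considerably more (repository, workbench, and transport privileges):

```sql
-- Hypothetical sketch of the segregation-of-duties setup.
CREATE USER SDI_ADMIN PASSWORD "Initial1Pwd" NO FORCE_FIRST_PASSWORD_CHANGE;
CREATE USER SDI_EXEC  PASSWORD "Initial2Pwd" NO FORCE_FIRST_PASSWORD_CHANGE;

-- SDI_ADMIN: agent/adapter administration and monitoring
GRANT AGENT ADMIN   TO SDI_ADMIN;
GRANT ADAPTER ADMIN TO SDI_ADMIN;

-- SDI_EXEC: execute replication tasks against the target schema
GRANT CREATE REMOTE SUBSCRIPTION ON REMOTE SOURCE "RS_HDB1" TO SDI_EXEC;
GRANT SELECT, INSERT, UPDATE, DELETE ON SCHEMA "TARGET" TO SDI_EXEC;
```

Keeping execution rights (SDI_EXEC) separate from administration (SDI_ADMIN) and transport (SDI_TRANSPORT) is what makes the segregation of duties enforceable at the database level.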

4. SDI Real-time Replication Design – Challenges

The following describes the major challenges we faced:
1. Multiple sources into one target
2. Replic