In this blog entry I would like to convey some of the experiences we made throughout an SDI HANA to HANA (H2H) implementation project. To gather an understanding of the context, we
will start off with the scenario description and solution architecture.
These are the items that will be covered throughout this blog:
1. Implementation Scope 2. Solution Architecture 3. Best Practices 4. Challenges 5. Reengineering of Replication Tasks 6. Monitoring 7. Real-time Replication & Source System Archiving
You can expect practical insights into the implementation of a HANA to HANA repliction scenario. Some detailed aspects on e.g. task partitioning, replication task design or monitoring are described. Potentially you can adapt the approaches described in this blog in your own SDI implementation project.
1. Implementation Scope
From an SDI perspective, this brief overview will describe some facts and requirements we had to deal with:◈ Replicate data in real-time from 3 different HANA source systems into a (consolidated) target schema using SDI RTs (with the SDI HANAAdapter) ◈ Replication scope approx. 550 tables per source (times 3 = > 1.600 tables) ◈ Replicate tables with high record count (6 tables of > 2 billion in production) ◈ SDI task partitioning for large tables (> 200 mio. records) ◈ Target table partitioning for large tables (> 200 mio. records) ◈ SDI infrastructure/configuration – e.g. DP-Agent Agent groups ◈ Follow SDI best practice guidelines (naming convention, implementation guidelines, tuning) ◈ SDI development artifacts maintenance transport across landscape to PRD ◈ Dpserver dpagent monitoring ◈ Out of scope: Load and replication of IBM DB2 based source systems (compare with architectural diagram)
2. Solution Architecture
The end-to-end solution architecture employs several SAP and non-SAP components
Virtual host on Linux, 64 GB
HANA 2 SP02
HANA EIM SDI (XSC runtime)
HANA Vora 1.4
Hadoop Cluster with Spark enabled
The following illustration shows the architecture in a facilitated way. From an SDI point of view there are multiple real-time batch input streams: Suite on HANA systems, files, legacy data from IBM DB2 DBs (not shown).
In the productive environment (as shown) each Suite on HANA (shown as HDB1/2/3) is connected employing a dedicated DP-Agent group with a HANAAdapter instance. Thus, the risk of stalling the whole replication when remote sources or RepTasks exceptions occur on source system level can be mitigated. The Hadoop and Vora part, shown on the right-hand side will not be further elaborated and are not part of this blog entry.
3. SDI Best Practices
Initially, most of the aspects (for users authorizations) considered in the official SDI Best Practices Guide were implemented(refer to references section for the web link to the bast practices).
SDI users were organized the following way:
◈ SDI_ADMIN – monitoring privileges, user creation, ***
◈ SDI_DEV – Web-based development workbench, repository privileges, schema privileges
◈ SDI_EXEC – execute replication tasks
◈ SDI_TRANSPORT – transport SDI artifacts
Using this pattern, you can easily follow a segregation of duties approach and avoid unnecessary and unwanted situations in development or deployment. On the contrary, you have to stick with the approach and align your development and administration processes accordingly.
4. SDI Real-time Replication Design – Challenges
The following describes the major challenges we faced:
1. Multiple sources into one target
2. Replication Task Count
3. Duration of Initial Load and Cut-over Time
4. Disable the Capture of DDL Changes
1. Multiple sources into one target
◈ It is targeted to consolidate all source data (SAP ECCs) into one target schema. In this sense, source tables of the same fashion are replicated into one consolidated target table.
◈ Translated into a specific example:
The replication of e.g. table COEP is set up for all three source systems. The target for all sources is one table COEP. This target table must comply with all structural properties that exist in the sources (COEP tables across n source systems do not necessarily have the exact same structure, at least in the given case. You might have system specific appends), meaning different Z-/Y-fields across all COEP source tables. As RTs do not allow for flexible column mapping like FGs, this is how we resolved the requirement:
2. Replication Task Count
◈ In scope were about 550 tables which have to be replicated with different volumes and different change/delta frequency. For the largest tables, dedicated remote sources are planned to be introduced. This would translate into a remote source count of 7 for the largest SoH source.
Since each remote source gets its own receiver/applier pair assigned (in the dpserver), this makes sense from a parallelization and performance perspective. On the downside, you have to create dedicated source system users and of course you need to maintain and organize your solution considering the set of different remote sources that are in place. In the following illustration each red arrow represents an own remote source.
3. Duration of Initial Load and Cut-over Time
◈ We observed quite a few tables that initially consumed much time (> 24 hours per table). By introducing task partitioning within RepTasks, combined with a proper sequence of loading we could achieve major performance improvements. Hereby, the challenge is to find appropriate columns and value ranges. Imagine a tables such as MARC (SoH Material Master Plan View) with 1.9 billion records and you should define proper ranges for your range partitioning approach. How do you do that? The solution is the profile your data with generic SQL procedures or by using other tools. Potentially you have experts at hand who might have the knowledge about the value distribution in those tables. This task can be tricky.
◈ Check value distribution of columns that are candidates for partitioning:
◈ Apply partitioning settings in RT; here: Range Partitioning on field STUNR with two parallel partitions for the initial load:
4. Disable the Capture of DDL Changes
◈ The HANAAdapter is able to capture DDL changes: drop/add column. Obviously if you load in real-time from three sources into one target table this influences the behavior for DDL changes considerably – i.e. you can’t capture DDL changes anymore as the (dynamic) structural differences between the source tables would cause inserts on the SDI applier side to fail.
◈ A good option is therefore to set the DDL scan interval to 0 which means “disabled”. The default value is 10 as you can see in the below picture:
5. Reengineering Replication Tasks
Throughout the implementation of the project, several changes in requirements, e.g. regarding RT structural details such as conversion expressions etc. occurred (you might know these change requests from your own project).
Therefore, some PowerShell and Python scripts were implemented in order to better apply mass changes to the SDI objects. When you have around 1.600 RepTasks you will be grateful to not touch each of them one by one of course. Doing this, you need to take precautions wherever possible. Of course, you take backups of the export you might do from the web-development workbench. After you export your SDI artifacts from the web-development workbench, you can apply changes to the XML structure. You can also do this by opening the replication task from the workbench via right click -> Open in text editor.
Whatever you do in terms of editing the XML directly, you need to be aware of the find/replace operation you undertake. Of course you make sure that you only edit the XML properties you really need to! Else your replication task structure is likely to get corrupt. After applying your changes locally you can re-import the replication task in the web-development workbench.
Here are a couple of examples we had to deal with. These where resolved by applying respective scripts or using common tools such as Notepad /Sublime.
1. Replace “Project Expression” in all RepTasks for a defined scope of columns (specific for each table) and input proper conversion expressions such as ABAP date NVARCHAR(8) à HANA DATE format or ABAP time NVARCHAR(6) à HANA TIME fields.
As all RepTasks were initially designed to conduct a 1:1 replication (only 1 table per RepTask and no further conversion or project expressions), a python script executed the following:
◈ Iterate directory where RepTasks are stored, element by element
◈ Match RepTask file with respective line in provided csv (proper identifiers required)
◈ Find XML attribute for affected columns and replace element value “Project Expression”
Using a python script, class xml.etree.ElementTree can help facilitating to parse and edit RepTask XML structures.
2. Check consistency and correctness of values identifying the source system, which is defined on table level with an additional column
◈ Source system 1 = ‘001’
◈ Source system 2 = ‘002’
◈ Source system 3 = ‘003’
Moreover, check if all RepTasks apply the correct filter criteria. This can also be done using e.g. Notepad to “scan” entire directories for certain patterns/values.
Monitoring of the SDI solution requires both the target HANA system and the DP-Agent(s). Apart from these “core” components, many sub-components come into play. Stating that monitoring the DP-Agent and the DP-framework on the HANA side is enough and your SDI solution is in safe hands would be wrong.
As it is a framework, which is truly embedded into the HANA database, you need to understand (optimally) all inner mechanisms that come into play when data processing happens. However, in this blog, we only want to dedicate our attention to SDI framework related monitoring options. Please do not expect to be able to monitor or troubleshoot your SDI solution after reading this. The following shall provide you only some basic input and approaches we have applied and experienced in our project.
To begin with, and most-likely well known, are the SDI monitoring views. These views unveil quite a lot of details of SDI design time objects, ongoing loads or general statuses. They also provide some control mechanisms to handle replication and load. Anyhow, when it comes to more detailed root-cause analysis, one needs to be familiar with the framework’s table, how traces are set, where log files are stored etc. A good reference here is the official SDI Administration Guide (refer to references section where links are provided).
In addition to the monitoring views, a solid set of prepared SQLs or some custom-built stored procedures can of course help to facilitate your monitoring or troubleshooting activities. E.g. during the initial load of our RepTasks you typically continue querying some important views. Especially, the REMOTE_SUBSCRIPTIONS view is helpful when it comes to monitoring a particular initial load or CDC states. The state column describes one of the following replication states. These should be well understood (you also find this information in the SDI Administration Guide):
Remote subscription is created by the replication task.
The receiver is waiting for the begin marker that indicates the first changed data to queue while the initial load is running.
The receiver queues the rows and is waiting for the end marker that indicates the last row of the initial load.
The receiver is waiting for the begin marker that indicates the first row to queue after the initial load has completed.
The receiver queues the changed rows and is waiting for the end marker that indicates the last row of the initial load. The initial load has completed and the end marker is sent to the adapter. If the state does not change to AUTO_CORRECT_CHANGE_DATA, the adapter or source system is slow in capturing the changes.
When the end marker is received for the initial load, the applier loads the changed data captured (and queued during the initial load) to the target.
If a lot of changes occurred after the initial load started, this state might take a long time to change to APPLY_CHANGE_DATA.
All of the changes captured while the initial load was running have completed and are now loaded to the target.
We personally perceived the usage of the monitoring views as too cumbersome, as well as the SDA monitor in HANA Studio. This is subjective and does not apply to all cases of course. Subsequently, I present a bunch of helpful statements:
–Check for current source, if all subscriptions are in state APPLY_CHANGE_DATA – show differing subscriptions
SELECT * FROM M_REMOTE_SUBSCRIPTIONS WHERE SCHEMA_NAME = \’\’ AND STATE != \’\’ AND STATE != \’APPLY_CHANGE_DATA\’;
–Check how many reptasks/remote subscriptions are in real-time replication state per remote source
SELECT SCHEMA_NAME, COUNT(*) FROM M_REMOTE_SUBSCRIPTIONS WHERE STATE = \’APPLY_CHANGE_DATA\’ GROUP BY SCHEMA_NAME;
–Query for exceptions on remote source/subscription level
SELECT * FROM REMOTE_SUBSCRIPTION_EXCEPTIONS;
–Check applied remote statements during the intial load (SDA request) for a given table. You need to sort out the correct statement as multiple statement might involve your search pattern
SELECT * FROM “SYS”.”M_REMOTE_STATEMENTS” WHERE REMOTE_STATMENT_STRING LIKE \’%MARC%\’;
–Find running tasks. The initial load in the SDI context is wrapped into a HANA runtime task. Therefore you can keep track of its execution by querying the appropriate tables
SELECT TOP 100 * FROM “_SYS_TASK”.”TASK_EXECUTIONS_” ORDER BY START_TIME DESC;
–In case of partitioned tasks, query the subsequent table
SELECT TOP 100 * FROM “_SYS_TASK”.”TASK_PARTITION_EXECUTIONS_” ORDER BY START_TIME DESC;
Monitoring the DP-Agent
To keep track of what happens on the DP-Agent side, it is advisable to have a real-time monitoring tool at hand, such as Array, Baretail or when running on Linux, some native Linux tool. It is your choice. Set the trace and log levels in the DP-Agent’s ini-files. Make yourself familiar with the paths were traces and logs are stored and access them via your tool of preference.
Here is a brief overview of the dpagentconfig.ini file properties for logging:
◈log.level = ALL/INFO/DEBUG set the trace level
◈log.maxBackupIndex = 10 max number of trace files
◈log.maxFileSize = 10MB max size of trace file
◈trace.body=TRUE log entire body part of operation
Correspondingly you should see in the log path the respective trace/log files:
◈ trc Trace file for agent
◈Framework_alert Alert file for framework
◈log Log Reader rep agent log files
◈log Service log files
In order to assure source and target record count are the same, you can use some prepared SQL statements to query the record counts of source and target table.
Possibly, you want to provide some SQL procedure to automatically match source and target table in terms of record counts, across remote sources, e.g. taking into
7. Real-time Replication & Source System Archiving
Replication using the HANAAdapter is trigger based.
I.e. database triggers on table level exist and react upon DMLs such as: I/U/D. If a table is being replicated, and a record in the table is archived, the database trigger interprets the archiving operation as a DELETE statement, and SDI deletes the corresponding record from the target table.
The following option can be leveraged to ignore DELETE operations from an archiving run, introducing a user that executes the archiving run. This user must always be the same user.
We would have an ECC source system user e.g. “ARCHIVE_USER” that executes the archiving run. All of the operations conducted by that user will be filtered out by the HANAAdapter, no deletes will happen.