Treat unallowed characters in SAP BW/4HANA with SQLScript
Veröffentlicht am 6. November 2020 von | ABAP | AMDP | BW/4HANA | SQLScript |
With SQLScript, errors can be avoided elegantly and performantly during loading due to the unallowed (i.e. invalid) characters in SAP BW. This article provides an overview of the problem and outlines the solutions in AMDP transformation routines to clean up the data.
The problem
The allowed characters
In SAP BW, not all characters are allowed in the values of InfoObjects. By default, these are only the following characters:
!"%&'()*+,-./:;<=>?_0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ
If you need more than this, which is likely to be an SAP customer for an international clientele, you have to set this separately in the BW Customizing Guide or in the RSKC
transaction. There are three options that cannot be combined:
Single characters
Either you maintain here a ridiculously short list of 75 more characters. This will not be enough sooner or later, as no umlauts, accents or Asian characters are allowed in the standard.
ALL_CAPITAL
Or you can allow nearly all characters when you use here the ALL_CAPITAL
constant (SAP Note 173241 - allowed characters in the BW) .
ALL_CAPITAL_PLUS_HEX
The third option is to use ALL_CAPITAL_PLUS_HEX
in the customizing. SAP explicitly discourages this, as control characters are also allowed in the InfoObject values. This can cause problems with the front-end tools (SAP Note 1075403 - unallowed characters and ALL_CAPITAL). This option is intended only for the short-term maintenance of productive operations. If it has to be used, then the data should be cleaned up in a timely manner and then the value should be deferred.
Irrespective of the above settings, a characteristic value must never consist of a simple hash '#' and must not begin with a quotation mark '!'.
Treatment of invalid characters in ABAP
In ABAP transformation routines, there are different approaches to removing the invalid characters. In the blog of Denis Reis, for example, you will find ideas on how to do this manualy as long as you have a positive list of allowed characters (Denis Reis: Useful ABAP transformation routines).
Function
And of course there is more than one function module to do this. With RSKC_CHAVL_CHECK you can check the values against the current settings of the system. And with SCP_REPLACE_STRANGE_CHARS (SAP Note 1062237 - Transliteration of characters with SCP_REPLACE_STRANGE_CHARS) the suspicious characters can be replaced. But there can occur performance problems with this function module(SAP Note 2305674 - Performance of SCP_REPLACE_STRANGE_CHARS is bad).
Analysis of errors due to unallowed characters in SQLScript
If the loading or activation of data stops due to unallowed characters, then the first thing to do is to understand why the error happens. In the ABAP debugger there is the HEX-Code of each character visible.
With the scalar function UNICODE you can determine in SQLScript the corresponding Unicode number of a character ( SAP documentation UNICODE).
Now we only have to analyze every single letter of a faulty value. To do this, we create an anonymous block in the SQL console, in which we disassemble the suspicious strings using the SERIES_GENERATE_INTEGER function. The example goes to a table from the demo data model from my book and needs to be adopted to your needs. The relevant parts are provided with comments.
DO BEGIN
lt_suspect_data = SELECT title AS suspect --Replace with the column name
FROM tasks --Replace by your table e.g. "/BIC/A...2"
WHERE id = 1; --Restrict to the corrupt data
SELECT suspect,
element_number,
SUBSTR(suspect, element_number, 1),
UNICODE(SUBSTR(suspect, element_number, 1))
FROM :lt_suspect_data
CROSS JOIN SERIES_GENERATE_INTEGER( 1,1,60)
WHERE SUBSTR(suspect, element_number, 1) <> ''
ORDER BY suspect,
element_number;
END;
Non-Breaking Space
Although the above SAP note mentions above all the control characters as a source of error, there may be other characters that cause problems. In this way, for example, we identified the Unicode character 0160 in a project that did not want to be loaded. It is a so-called non-breaking space (NBSP) that also causes problems in other contexts(SAP notes on this).
Handling Unallowed Characters in SQLScript
Single characters
When it comes to replacing individual characters, the simplest method is the SQL-Function REPLACE. One difficulty with programming is that the character to be replaced is not on the keyboard. But if we know its Unicode character number, then we can use the SQL function NCHAR (<character number>
) to generate the character.
OutTab = SELECT ...
REPLACE("/BIC/BUTXT", NCHAR(0160), ' ') AS "/BIC/BUTXT",
...
FROM :InTab;
If you want to replace multiple characters in this way, you have to nest the REPLACE
function. This reduces the clarity and thus the maintainability.
Replace with regular expressions
Regular expressions describe a pattern that can also be used to make substitutions. In SQLScript, there is the SQL-Function REPLACE_REGEXPR. In these patterns, Unicode characters (see also here) can be entered with the following notation:
x{<HEX-CharacterNumber>}
For the example of NBSP
this is x{A0}
for example. And with the square bracket you can also search elegantly for several Unicode characters at the same time.
OutTab = SELECT ...
REPLACE_REGEXPR('[x{A0}\x{8239}]' IN title WITH ' ') AS "/BIC/BUTXT",
...
FROM :InTab;
However, we still have to name all characters individually with their HEX character number.
All control characters
If all control characters HEX00
to HEX1F
mentioned in SAP Note 1075403 are to be removed, this can also be done very elegantly with a regular expression. The character class [[:cntrl:]]
includes exactly these characters ( Wikipedia RegularExpressions).
The corresponding query will look like this:
OutTab = SELECT ...
REPLACE_REGEXPR('[[:cntrl:]]' IN title WITH '') AS "/BIC/BUTXT",
...
FROM :InTab;
Or you can use the negation of the character class of all printable characters: [^[:print:]]
. This expression already includes the NBSP
.
OutTab = SELECT ...
REPLACE_REGEXPR('[[:cntrl:]\x{00A0}]' IN title WITH '') AS "/BIC/BUTXT",
...
FROM :InTab;
Warning - please test
Unfortunately, regular expressions are extremely fragile. A small thing can render the expression completely unusable. This means that we should always test our logic beforehand. I usually do this in the SQL console with a small example:
create table test_char(line nvarchar(100));
insert into test_char values ('ABCDE');
insert into test_char values ('AB'||nchar(160)||'CDE');
insert into test_char values (nchar(0010)||'ABCDE');
insert into test_char values ('AB'||nchar(10)||'CDE');
insert into test_char values ('ABCDE');
SELECT line,
REPLACE_REGEXPR('[[:cntrl:]\x{00A0}]' IN line WITH '#') AS "/BIC/BUTXT"
from test_char;
Conclusion
There are also very elegant ways in SQLScript to remove the characters that are not allowed. However, in-depth SQLScript knowledge and careful testing are very important here.
I am pleased that this text will help you in your work in the projects. I would be happy about links, quotes with references and, above all, mentions on social media. And of course also about any form of feedback, if you have had other or complementary experiences.