Tokenize and detokenize Unicode blocks using the unicode.properties file
Unlike tokenization with AlgoSpec, tokenization and detokenization using the unicode.properties file are not limited to the Unicode blocks mentioned in AlgoSpec. The unicode.properties file enables users to tokenize and detokenize Unicode characters in the range 0000-FFFF. To use it, specify the path of the unicode.properties file in the UnicodeCodePointProperties parameter of the SafeNetTokenVaultless.properties file.
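As a quick way to check that the parameter is set, the following minimal sketch reads the UnicodeCodePointProperties value from the text of a SafeNetTokenVaultless.properties file using the standard java.util.Properties class. The class name VaultlessConfigSketch, the method name, and the sample path are hypothetical and are not part of CADP for Java.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, not part of CADP for Java.
class VaultlessConfigSketch {

    // Hypothetical sample content; the path is an invented example.
    static final String SAMPLE =
        "UnicodeCodePointProperties = /opt/example/unicode.properties\n";

    // Returns the configured path of the unicode.properties file,
    // or fails if the parameter is missing or empty.
    static String unicodePropertiesPath(String vaultlessPropertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(vaultlessPropertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String path = props.getProperty("UnicodeCodePointProperties");
        if (path == null || path.trim().isEmpty()) {
            throw new IllegalStateException(
                "UnicodeCodePointProperties is not set");
        }
        return path.trim();
    }

    public static void main(String[] args) {
        System.out.println(unicodePropertiesPath(SAMPLE));
    }
}
```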
Tokenization and detokenization can be achieved using either the Range parameter or the FromFile parameter:
Tokenization and detokenization using Range parameter
Use the Range parameter to tokenize and detokenize a Unicode block whose code points form a continuous range. Both the scope and the undefined range of the block are specified in the unicode.properties file. To tokenize and detokenize Unicode input characters using Range, specify the input as a sequential range of Unicode characters. Specify only one input range per line.
For example, if the scope starts at n and ends at m, set Scope.Range=n-m.
Similarly, to exclude an undefined range within the scope, enter the start and end of the undefined characters in the Undefined.Range0 parameter of the unicode.properties file.
To tokenize and detokenize a Unicode block using the Range parameter, make sure that the following parameter is set in the unicode.properties file:
Unicode.Type.Specifier = Range
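To illustrate how a scope and an undefined range combine, the following minimal sketch parses Scope.Range and Undefined.Range0 values and checks whether a code point falls inside the scope but outside the undefined range. It is an illustration only, not the CADP for Java implementation. The class RangeSketch and its methods are hypothetical, the sample values (the Greek and Coptic block, 0370-03FF, with the unassigned code points 0378-0379) are chosen for illustration, and the hexadecimal n-m value format is an assumption; check the bundled unicode.properties file for the exact syntax.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical illustration, not part of CADP for Java.
class RangeSketch {

    // Hypothetical sample settings; the hexadecimal n-m format is assumed.
    static final String SAMPLE =
        "Unicode.Type.Specifier = Range\n"
        + "Scope.Range = 0370-03FF\n"
        + "Undefined.Range0 = 0378-0379\n";

    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Parses "n-m" (hexadecimal) into a two-element array {start, end}.
    static int[] parseRange(String value) {
        String[] parts = value.trim().split("-");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected n-m, got: " + value);
        }
        int start = Integer.parseInt(parts[0].trim(), 16);
        int end = Integer.parseInt(parts[1].trim(), 16);
        if (start > end) {
            throw new IllegalArgumentException("Start exceeds end: " + value);
        }
        return new int[] {start, end};
    }

    static boolean contains(int[] range, int codePoint) {
        return codePoint >= range[0] && codePoint <= range[1];
    }

    // True when the code point is inside the scope and not in the undefined range.
    static boolean isInUsableScope(Properties props, int codePoint) {
        int[] scope = parseRange(props.getProperty("Scope.Range"));
        int[] undefined = parseRange(props.getProperty("Undefined.Range0"));
        return contains(scope, codePoint) && !contains(undefined, codePoint);
    }

    public static void main(String[] args) {
        Properties props = load(SAMPLE);
        System.out.println(isInUsableScope(props, 0x03B1)); // Greek small alpha
        System.out.println(isInUsableScope(props, 0x0378)); // inside undefined range
    }
}
```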
Tokenization and detokenization using FromFile parameter
Use the FromFile parameter to tokenize and detokenize a Unicode block whose code points are scattered. List all the code points of the Unicode block in a file, and add the location of that file to the unicode.properties file. CADP for Java reads the file and generates the output from the same Unicode block.
To tokenize and detokenize Unicode input characters using the FromFile parameter, specify all the code points of the Unicode block in the input file, one value per line. Each value must be in hexadecimal format. Provide the absolute path of the input file in the Unicode.FromFile parameter of the unicode.properties file.
To tokenize and detokenize using the FromFile parameter, make sure that the following parameter is set in the unicode.properties file:
Unicode.Type.Specifier = FromFile
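Before pointing Unicode.FromFile at an input file, it can help to verify that every line is a valid hexadecimal code point. The following minimal sketch reads one hexadecimal value per line and rejects anything outside 0000-FFFF. It is not the CADP for Java parser: the class FromFileSketch and its method are hypothetical, and skipping blank lines is an assumption made here for convenience.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical validator, not part of CADP for Java.
class FromFileSketch {

    // Reads one hexadecimal code point per line and returns them sorted.
    // Blank lines are skipped (an assumption of this sketch).
    static SortedSet<Integer> readCodePoints(Reader source) {
        SortedSet<Integer> codePoints = new TreeSet<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String hex = line.trim();
                if (hex.isEmpty()) {
                    continue;
                }
                // Throws NumberFormatException if the value is not hexadecimal.
                int codePoint = Integer.parseInt(hex, 16);
                if (codePoint < 0x0000 || codePoint > 0xFFFF) {
                    throw new IllegalArgumentException(
                        "Code point outside 0000-FFFF: " + hex);
                }
                codePoints.add(codePoint);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return codePoints;
    }

    public static void main(String[] args) {
        // Scattered Greek code points used as hypothetical sample input.
        String sample = "0391\n03A9\n0386\n";
        for (int codePoint : readCodePoints(new StringReader(sample))) {
            System.out.println(String.format("%04X", codePoint));
        }
    }
}
```

To validate a real file, a reader from java.nio.file.Files.newBufferedReader can be passed in place of the StringReader.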
For details, refer to the unicode.properties file and the appropriate Unicode sample file bundled with the CADP for Java software package.