OTGS Segmenter

Introduction

WPML provides an XLIFF file in which each trans-unit wraps all of its content in CDATA sections. These sections contain everything, including markup. Translating such files requires further processing to extract the translatable text and strip the HTML markup. Translation Proxy (TP) / Translation Hub (TH) can now provide XLIFF files in the standard XLIFF 1.2 format, in which non-translatable content and markers are replaced with XLIFF tags and attributes.
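To make the extra processing concrete, here is a minimal Python sketch of what a consumer of the CDATA-based files has to do on its own. The sample trans-unit, the `TextOnly` class, and the `translatable_text` function are hypothetical illustrations, not part of WPML or TP, and a real WPML file may be laid out differently.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical WPML-style trans-unit: the HTML markup travels inside CDATA.
SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="post-1" source-language="en" target-language="de" datatype="plaintext">
    <body>
      <trans-unit id="body">
        <source><![CDATA[<p>Hello <b>world</b>!</p>]]></source>
      </trans-unit>
    </body>
  </file>
</xliff>"""

NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}


class TextOnly(HTMLParser):
    """Collects text nodes and drops every HTML tag."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def translatable_text(xliff):
    """Map each trans-unit id to its source text with the HTML removed."""
    root = ET.fromstring(xliff)
    result = {}
    for unit in root.iterfind(".//x:trans-unit", NS):
        # The XML parser hands back the CDATA payload as plain text.
        raw = unit.find("x:source", NS).text or ""
        parser = TextOnly()
        parser.feed(raw)
        parser.close()
        result[unit.get("id")] = "".join(parser.parts)
    return result


print(translatable_text(SAMPLE))
```

With the XLIFF 1.2 output from TP / TH this step is not needed, because the markup already arrives as XLIFF inline tags instead of raw HTML.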

Opt-in

By default, WPML-generated XLIFF files are used when translations are sent from TP. To receive the new, improved XLIFF files, you need to opt in by sending a formal request to the Integrations Team by email. Several options are available to tune the output XLIFF for different systems. These options must be specified in the request; any option you do not specify is set to its default value.
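The default-filling rule can be pictured as a dictionary merge. The sketch below is only an illustration in Python: the option names and defaults are a subset of those listed in the Options table, while `resolve_options` is a hypothetical helper and not a TP API. Treating "ignored" options as switched off is also an assumption.

```python
# A subset of the defaults listed in the Options table.
DEFAULTS = {
    "ignore_targets": False,
    "remove_targets": False,
    "force_xliff_standard": False,
    "keep_url": False,
    "mrk_status": True,
    "segmentation": True,
    "perfect_words_limit": 50,
    "perfect_markers_limit": 20,
}


def resolve_options(requested):
    """Fill in the default for every option the request leaves out."""
    unknown = set(requested) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")
    options = {**DEFAULTS, **requested}
    # The Options table says strict XLIFF 1.2 mode ignores mrk_status and
    # keep_url. Modelling "ignored" as "off" is an assumption of this sketch.
    if options["force_xliff_standard"]:
        options["mrk_status"] = False
        options["keep_url"] = False
    return options


print(resolve_options({"perfect_words_limit": 30}))
```

A request that names only `perfect_words_limit` therefore still ends up with a complete set of options.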

Options

| Option | Description | Default value | Value type | Accepted values |
| --- | --- | --- | --- | --- |
| ignore_targets | Ignore the targets in the original XLIFF and use the source as the target. | false | Boolean | true, false |
| remove_targets | Remove targets from the output XLIFF. | false | Boolean | true, false |
| force_xliff_standard | Enforce strict XLIFF 1.2 validation. When set to true, the mrk_status and keep_url options are ignored and the external-file tag from the original XLIFF is not added to the output XLIFF. | false | Boolean | true, false |
| keep_target_attrs | Keep state="needs-review-translation" and state-qualifier="tm-suggestion" if they are present in the original XLIFF. For this to work, both ignore_targets and remove_targets must be set to false. | false | Boolean | true, false |
| keep_url | The output XLIFF includes the URL from “<header reference external-file>#href” in “<header phase-group phase[name=’wpml-url’] note>#text”. | false | Boolean | true, false |
| keep_notes_in_parsed | Copy note blocks from the original XLIFF to the output XLIFF. In WPML, the client can set these as instructions for the translator. | false | Boolean | true, false |
| convert_invalid_tags | WP blocks sometimes have tags that span different blocks (a tag is opened in one block and closed in another), so such blocks are invalid in raw form. This option transforms non-matching tag pairs into `wpml_invalid_tag` tags so that the produced segments are valid as raw HTML. | true | Boolean | true, false |
| mrk_status | Add the mrk_status attribute to the output XLIFF. This option is ignored if force_xliff_standard is set to true. | true | Boolean | true, false |
| segmentation | Segment the original trans-units and produce new, smaller trans-units whose markers and sentences stay within the configured limits. | true | Boolean | true, false |
| perfect_words_limit | Maximum number of words per segment; a segment always has at most this many words. There are a few exceptions where content cannot be split (e.g. JSON). | 50 | Integer | Any positive integer value. |
| perfect_markers_limit | Maximum number of inline markers per segment; a segment always has at most this many markers, except for parsing errors and other exceptional situations. | 20 | Integer | Any positive integer value. |
| perfect_tolerant_markers_limit | Tolerant tags are usually part of a sentence and do not contain separate blocks. A segment may contain at most this many of them. | 15 | Integer | Any positive integer value. |
| perfect_heavy_markers_limit | Heavy tags usually contain separate sentences, so it does not make sense to keep several of them in the same segment. A limit of 0 means that a segment cannot contain heavy tags; a limit of 1 means that a segment can contain at most one heavy tag. | 0 | Integer | Any non-negative integer value. |
| perfect_other_markers_limit | Limit for normal tags, which can contain separate sentences or wrap words inside a bigger segment. A segment may contain at most this many of them. | 5 | Integer | Any positive integer value. |
| tolerant_tags | List of tags to which perfect_tolerant_markers_limit applies. | a b span bold strong i em p abbr small sub sup text br wpml_linebreak wpml_nbsp | String | Space-separated string naming the desired tags. |
| heavy_tags | List of tags to which perfect_heavy_markers_limit applies. | p h1 h2 h3 h4 h5 h6 ol ul li div form table fieldset wpml_separator | String | Space-separated string naming the desired tags. |
| untranslatable_recognizers | List of algorithms that detect untranslatable content and automatically skip those trans-units in the output XLIFF. Use an empty list to disable this option. | [:whitespace, :numbers, :plugin_settings, :css_keywords, :urls, :json, :uuid, :too_many_nodes, :big_text, :css_snippet] | List | Comma-separated list of algorithm names. Supported algorithms: :whitespace, :numbers, :plugin_settings, :css_keywords, :urls, :json, :uuid, :too_many_nodes, :big_text, :css_snippet. |

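The way the perfect_* limits combine can be sketched as a single check on a candidate segment. This is a Python illustration under stated assumptions, not the segmenter's actual algorithm: `classify` and `within_limits` are hypothetical names, the segment is assumed to be already split into words and marker tag names, and since "p" appears in both default tag lists, checking the heavy list first is a guess.

```python
from collections import Counter

# Default tag lists and limits, copied from the options above.
TOLERANT_TAGS = set(
    "a b span bold strong i em p abbr small sub sup text br "
    "wpml_linebreak wpml_nbsp".split()
)
HEAVY_TAGS = set(
    "p h1 h2 h3 h4 h5 h6 ol ul li div form table fieldset "
    "wpml_separator".split()
)
LIMITS = {"words": 50, "markers": 20, "tolerant": 15, "heavy": 0, "other": 5}


def classify(tag):
    """Return the marker class of a tag name."""
    # "p" is in both default lists; giving heavy priority is an assumption.
    if tag in HEAVY_TAGS:
        return "heavy"
    if tag in TOLERANT_TAGS:
        return "tolerant"
    return "other"


def within_limits(words, tags, limits=LIMITS):
    """True if a segment respects the word limit and every marker limit."""
    counts = Counter(classify(tag) for tag in tags)
    return (
        len(words) <= limits["words"]
        and len(tags) <= limits["markers"]
        and counts["tolerant"] <= limits["tolerant"]
        and counts["heavy"] <= limits["heavy"]
        and counts["other"] <= limits["other"]
    )


print(within_limits("Read the full story".split(), ["a", "strong"]))
print(within_limits(["Title"], ["div"]))
```

With the default perfect_heavy_markers_limit of 0, the second call fails the check because the segment contains a heavy `div` marker; raising that limit to 1 would let it pass.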
Options for untranslatable_recognizers

| Recognizer | Description |
| --- | --- |
| whitespace | Skip trans-units and segments that contain only spaces or tabs, in different encodings. |
| numbers | Skip trans-units and segments that contain only digits and numbers in different forms (100%, 100.22, +2, -0.6), in different encodings. |
| plugin_settings | Skip trans-units whose trans-unit id contains specific markers related to plugin settings (css, fonts, fields...). |
| css_keywords | Skip known CSS keywords, hex colors, font values, Font Awesome icons, etc. |
| urls | Skip correctly written URLs that start with the http or https protocol. |
| json | Skip valid JSON. |
| uuid | Skip UUID codes. |
| too_many_nodes | Don’t segment a trans-unit if it contains more than 2000 nodes and the ratio of words to tags is lower than 0.1%. |
| big_text | Don’t segment a trans-unit if it contains more than 10000 characters of raw text. |
| css_snippet | Skip a trans-unit if it contains only valid CSS. |
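A few of these recognizers are simple enough to sketch in Python. The code below is a rough approximation, not the segmenter's implementation: the regular expressions, the `RECOGNIZERS` table, and `matching_recognizer` are all hypothetical, the evaluation order is an assumption, and only objects and arrays are treated as JSON so that bare numbers stay with the numbers recognizer. The remaining recognizers are left out.

```python
import json
import re

NUMBER_RE = re.compile(r"[+-]?\d+(?:[.,]\d+)*%?")
URL_RE = re.compile(r"https?://\S+")
UUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
    r"[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def is_json(text):
    """True for text that parses as a JSON object or array."""
    try:
        parsed = json.loads(text)
    except ValueError:
        return False
    return isinstance(parsed, (dict, list))


# Recognizer name -> predicate, checked in this (assumed) order.
RECOGNIZERS = {
    "whitespace": lambda t: t.strip() == "",
    "numbers": lambda t: NUMBER_RE.fullmatch(t.strip()) is not None,
    "urls": lambda t: URL_RE.fullmatch(t.strip()) is not None,
    "json": is_json,
    "uuid": lambda t: UUID_RE.fullmatch(t.strip()) is not None,
}


def matching_recognizer(text, enabled=tuple(RECOGNIZERS)):
    """Return the first enabled recognizer that matches, or None."""
    for name in enabled:
        if RECOGNIZERS[name](text):
            return name
    return None


for sample in [" \t", "100%", "-0.6", "https://example.com/a", '{"a": 1}', "Hello world"]:
    print(repr(sample), "->", matching_recognizer(sample))
```

Passing an empty `enabled` list makes every text translatable again, which mirrors how an empty untranslatable_recognizers list disables the feature.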