PHP extension wikidiff2

wikidiff2

Wikidiff2 is a PHP extension which formats changes between two input texts, producing HTML or JSON.

It performs word-level diffs, including support for Thai word segmentation. It can detect moved and split lines.

Dependencies

To build wikidiff2 as a PHP extension, you need the php-dev and pkg-config packages.

Compilation and installation

$ phpize
$ ./configure
$ make
$ sudo make install

License

wikidiff2 is licensed under the GPL v2 or any later version. The GPL is incompatible with the PHP license, meaning that any binaries of wikidiff2 are not redistributable under either license.

The licensing issue is tracked at https://phabricator.wikimedia.org/T196132

Configuration

The following php.ini settings are supported:

wikidiff2.moved_line_threshold

Wikidiff2 estimates similarity of added and deleted lines based on changed character count. When the similarity of an added and deleted line is greater than this threshold, the lines are displayed as moved.

Range 0.0 .. 1.0. Default 0.4.

wikidiff2.change_threshold

Changed lines with a similarity value below this threshold will be split into a deleted line and an added line. This helps to match up moved lines in some cases.

Range 0.0 .. 1.0. Default 0.2.

wikidiff2.moved_paragraph_detection_cutoff

When the number of added and deleted lines in a table diff is greater than this limit, no attempt to detect moved lines will be made.

Default 100.

wikidiff2.max_word_level_diff_complexity

When comparing two lines for changes within the line, a word-level diff will be done unless the product of the LHS word count and the RHS word count exceeds this limit.

Default 40000000.
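
As a rough illustration of the complexity limit, the rule described above amounts to comparing a product of word counts against the configured value. The sketch below is not the extension's code: wordLevelDiffAllowed() is a hypothetical helper, and the word counts are supplied by hand because wikidiff2's own word segmentation is not reproduced here. The limit is read with ini_get(), falling back to the documented default when the setting is unavailable.

```php
<?php
// Hypothetical helper, not part of wikidiff2: mirrors the documented rule
// that a word-level diff is done unless the product of the LHS and RHS
// word counts exceeds the limit.
function wordLevelDiffAllowed(int $lhsWords, int $rhsWords, int $limit): bool
{
    return $lhsWords * $rhsWords <= $limit;
}

// ini_get() returns false when the setting is not registered (for example,
// when the extension is not loaded), so fall back to the documented default.
$configured = ini_get('wikidiff2.max_word_level_diff_complexity');
$limit = $configured === false ? 40000000 : (int)$configured;

// Worked examples against the documented default of 40000000.
var_dump(wordLevelDiffAllowed(5000, 8000, 40000000)); // product 40000000: bool(true)
var_dump(wordLevelDiffAllowed(5000, 8001, 40000000)); // product 40005000: bool(false)

// The same check against whatever limit is configured on this system.
var_dump(wordLevelDiffAllowed(200, 300, $limit));
```

With the default limit, two lines of roughly 6300 words each are about the largest pair that would still receive a word-level diff.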

Usage

The input is assumed to be UTF-8 encoded. Invalid UTF-8 may cause undesirable behavior, such as truncation of the output, so the input should be validated by the application. The input text should have UNIX-style line endings.
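
One possible way to meet these input requirements is sketched below. prepareDiffInput() is a hypothetical helper, not something wikidiff2 provides. It assumes that rejecting invalid UTF-8 outright is acceptable for the application, and it uses the PCRE /u modifier for validation so that no extra extension is needed.

```php
<?php
// Hypothetical helper, not part of wikidiff2: checks that the text is valid
// UTF-8 and converts \r\n and bare \r line endings to UNIX-style \n.
function prepareDiffInput(string $text): string
{
    // With the /u modifier, preg_match() returns false on invalid UTF-8.
    if (preg_match('//u', $text) !== 1) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    // Replace \r\n first so that it does not turn into two newlines.
    return str_replace(["\r\n", "\r"], "\n", $text);
}
```

An application that would rather repair invalid input than reject it needs a different strategy; this sketch only rejects.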

wikidiff2_do_diff

function wikidiff2_do_diff(string $text1, string $text2, int $numContextLines): string

Compare two strings $text1 and $text2, and produce output formatted as a fragment of an HTML table, that is, a series of <tr> elements.

$numContextLines is the number of copied context lines shown before and after each change. Before each block of context lines and changes, a line number will appear as an HTML comment inside a tr/td, e.g.

<!--LINE 1-->

This allows the application to localize line numbers.
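
A minimal sketch of such localization follows. localizeLineNumbers() and the German template string are illustrative assumptions, not part of wikidiff2. The call to wikidiff2_do_diff() is guarded so that the script still runs when the extension is not loaded.

```php
<?php
// Hypothetical helper, not part of wikidiff2: replaces each <!--LINE n-->
// comment with escaped text built from a sprintf() template.
function localizeLineNumbers(string $diffHtml, string $template): string
{
    $result = preg_replace_callback(
        '/<!--LINE (\d+)-->/',
        fn(array $m): string => htmlspecialchars(sprintf($template, (int)$m[1])),
        $diffHtml
    );
    // preg_replace_callback() returns null on error; keep the input in that case.
    return $result ?? $diffHtml;
}

// Only call the extension if it is loaded.
if (function_exists('wikidiff2_do_diff')) {
    $rows = wikidiff2_do_diff("one\ntwo\n", "one\n2\n", 2);
    // The output is a series of <tr> elements, so the application supplies the table.
    echo '<table>', localizeLineNumbers($rows, 'Zeile %d:'), '</table>', "\n";
}
```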

wikidiff2_inline_diff

function wikidiff2_inline_diff(string $text1, string $text2, int $numContextLines): string

Compare two strings $text1 and $text2, and produce output formatted as inline HTML.

wikidiff2_inline_json_diff

function wikidiff2_inline_json_diff(string $text1, string $text2, int $numContextLines): string

Compare two strings $text1 and $text2 and produce output formatted as JSON. See the JSON diff format documentation.

wikidiff2_multi_format_diff

function wikidiff2_multi_format_diff(string $text1, string $text2, array $options = []): array

Compare two strings $text1 and $text2 with an associative array of options:

  • numContextLines: The number of context lines shown before and after each block

  • changeThreshold: The minimum similarity a pair of lines must have to be detected as a change and shown as a word-level diff. If present, this overrides php.ini wikidiff2.change_threshold.

  • movedLineThreshold: The minimum similarity a pair of lines must have to be detected as a moved line. If present, this overrides php.ini wikidiff2.moved_line_threshold.

  • maxMovedLines: The maximum number of added or deleted lines, above which no move detection will be performed. If present, this overrides php.ini wikidiff2.moved_paragraph_detection_cutoff.

  • maxWordLevelDiffComplexity: The maximum complexity of a word-level diff. If the product of the word count in the LHS and RHS exceeds this value, a word-level diff will not be done. If present, this overrides php.ini wikidiff2.max_word_level_diff_complexity.

  • maxSplitSize: The maximum number of lines in $text2 which may be considered for a word-level diff against a single line of $text1. Default: 1.

  • initialSplitThreshold: The minimum similarity which must be maintained during a split detection search. The search terminates when the similarity falls below this level. Default: 0.1.

  • finalSplitThreshold: The minimum similarity which must be achieved in order to display the comparison between one line and several lines as a split. Default: 0.6.

  • formats: An array of desired formats. Each format is one of the following strings: table, inline or inlineJSON. The default is ['table'].

The return value is an associative array of formatted outputs. The key of each element is the format name table, inline or inlineJSON, and the value is a string.
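
The options and return value described above might be used as in the following sketch. diffAsTableAndJson() is a hypothetical wrapper and the option values are arbitrary examples. The structure of the decoded JSON is not assumed here, since it is defined by the JSON diff format documentation.

```php
<?php
// Hypothetical wrapper, not part of wikidiff2: requests two formats in one
// call and returns null when the extension is not loaded.
function diffAsTableAndJson(string $old, string $new): ?array
{
    if (!function_exists('wikidiff2_multi_format_diff')) {
        return null;
    }
    return wikidiff2_multi_format_diff($old, $new, [
        'numContextLines' => 2,
        'movedLineThreshold' => 0.4, // same value as the php.ini default
        'formats' => ['table', 'inlineJSON'],
    ]);
}

$result = diffAsTableAndJson(
    "first line\nsecond line\n",
    "first line\nsecond line, edited\n"
);
if ($result !== null) {
    // Each requested format name is a key in the returned array.
    echo $result['table'], "\n";
    // The inlineJSON value is a JSON string; decode it before inspecting it.
    $decoded = json_decode($result['inlineJSON'], true);
    var_dump(is_array($decoded));
}
```

Requesting several formats in a single call avoids passing the same pair of texts to the extension once per format.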

wikidiff2_version

function wikidiff2_version(): string

Returns the same value as phpversion('wikidiff2'). This function should probably be deprecated.
