Sware: 1 revision imported

2026-04-21T11:22:49Z


Sware: Created page with "local byte = string.byte local match = string.match local sub = string.sub --[==[ A comparison function for strings, which returns {true} if {a} sorts before {b}, or otherwise {false}; it can be used as the sort function with {table.sort}. This function always sorts using byte-order, which makes it roughly equivalent to the {<} operator, but with fixes for two serious bugs raised in phab:T193096#4161287 and phab:T49137#9167559: * {<} is supposed to compare UTF-..."

2026-04-14T20:29:18Z


wikt>Theknightwho: Protected "Module:string/compare": Highly visible template/module ([Edit=Allow only template editors and administrators] (indefinite) [Move=Allow only template editors and administrators] (indefinite))

2025-05-07T00:54:09Z

New page

local byte = string.byte
local match = string.match
local sub = string.sub

--[==[
A comparison function for strings, which returns {true} if {a} sorts before {b}, or otherwise {false}; it can be used as the sort function with {table.sort}.

This function always sorts using byte-order, which makes it roughly equivalent to the {<} operator, but with fixes for two serious bugs raised in [[phab:T193096#4161287]] and [[phab:T49137#9167559]]:
* {<} is supposed to compare UTF-8 codepoints in the two strings, but when a codepoint that is U+10000 or above is encountered in the left-hand string, {<} always returns {false}, irrespective of the content of the other string.
* {<} treats unassigned codepoints and non-UTF-8 byte sequences as being higher than {"\0"} but lower than {"\1"}, instead of sorting according to byte order.]==]
return function(a, b)
	-- Equality check.
	if a == b then
		return false
	end
	-- Byte comparison is slow, so only do it when it's really needed:
	-- iterate over both strings, grabbing a set of ASCII bytes followed by
	-- a set of non-ASCII bytes from each (either of which could be empty),
	-- and compare them with ==. If the ASCII substrings are unequal, just
	-- use <, since the bug won't affect it. Otherwise, compare bytes in the
	-- non-ASCII substrings.
	local loc, ascii_a, nonascii_a, ascii_b, nonascii_b = 1
	repeat
		ascii_a, nonascii_a = match(a, "^([^\128-\255]*)([\128-\255]*)", loc)
		ascii_b, nonascii_b, loc = match(b, "^([^\128-\255]*)([\128-\255]*)()", loc) -- update `loc` on the second call
		-- When comparing ASCII sets, use <. The lower substring will be
		-- from the lower string *except* when it comprises the start of the
		-- other substring and is followed by a non-ASCII character. For
		-- instance, if `ascii_a` is "pqrs":
		--   If `ascii_b` is "abc", `b` is lower, since "abc" < "pqrs".
		--   If `ascii_b` is "pqr" and followed by non-ASCII "ž", `a` is
		--   lower, since "pqrs" < "pqrž".
		--   If `ascii_b` is "pqr" and at the end of `b`, `b` is lower, since
		--   "pqr" < "pqrs".
		if ascii_a ~= ascii_b then
			if ascii_a < ascii_b then
				return nonascii_a == "" or ascii_a ~= sub(ascii_b, 1, #ascii_a)
			end
			return not (nonascii_b == "" or ascii_b ~= sub(ascii_a, 1, #ascii_b))
		end
	-- If the non-ASCII parts are not equal, terminate the loop.
	until nonascii_a ~= nonascii_b
	-- If either one is the empty string, then the end of that string has
	-- been reached, making it the lower string.
	if nonascii_a == "" then
		return true
	elseif nonascii_b == "" then
		return false
	end
	loc = 1
	while true do
		-- 4 bytes at a time is a balance between minimizing the number of
		-- byte() calls without grabbing unnecessary extra bytes after the
		-- difference.
		local b_a1, b_a2, b_a3, b_a4 = byte(nonascii_a, loc, loc + 3)
		if b_a1 == nil then
			return true
		end
		local b_b1, b_b2, b_b3, b_b4 = byte(nonascii_b, loc, loc + 3)
		if b_a1 ~= b_b1 then
			return b_b1 and b_a1 < b_b1
		elseif b_a2 ~= b_b2 then
			return b_a2 == nil or b_b2 and b_a2 < b_b2
		elseif b_a3 ~= b_b3 then
			return b_a3 == nil or b_b3 and b_a3 < b_b3
		elseif b_a4 ~= b_b4 then
			return b_a4 == nil or b_b4 and b_a4 < b_b4
		end
		loc = loc + 4
	end
end
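
The module's main loop repeatedly pulls a run of ASCII bytes followed by a run of non-ASCII bytes from each string, using the position capture `()` to advance. The sketch below isolates that step so the run pairs can be inspected. `split_runs` is a hypothetical helper written for illustration and is not part of the module; it assumes standard Lua string patterns, which operate on bytes.

```lua
-- Hypothetical helper (not part of the module): list the
-- {ASCII run, non-ASCII run} pairs that the module's pattern extracts.
local function split_runs(s)
	local runs, loc = {}, 1
	while loc <= #s do
		-- Same pattern as the module; `()` captures the position just
		-- past the two runs, which becomes the next starting point.
		local ascii, nonascii, next_loc =
			string.match(s, "^([^\128-\255]*)([\128-\255]*)()", loc)
		runs[#runs + 1] = {ascii, nonascii}
		loc = next_loc
	end
	return runs
end

-- "pqržs": ž is U+017E, encoded in UTF-8 as the bytes 197 190.
local runs = split_runs("pqr\197\190s")
for i, pair in ipairs(runs) do
	print(i, #pair[1] .. " ASCII byte(s)", #pair[2] .. " non-ASCII byte(s)")
end
```

Every byte belongs to one of the two character classes, so each iteration consumes at least one byte and the loop always terminates.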

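For cross-checking, the ordering the module aims for can be stated as a plain byte-by-byte comparison. The sketch below is a naive reference version, not the module's method: it calls `string.byte` once per position, which is the slow path the module tries to avoid. `byte_less` is a hypothetical name, and the sample words are taken from the "pqrs" example in the comments above.

```lua
-- Hypothetical reference comparator: returns true if `a` sorts before `b`
-- in byte order, otherwise false.
local function byte_less(a, b)
	local len_a, len_b = #a, #b
	for i = 1, math.min(len_a, len_b) do
		local byte_a, byte_b = string.byte(a, i), string.byte(b, i)
		if byte_a ~= byte_b then
			return byte_a < byte_b
		end
	end
	-- One string is a prefix of the other, or they are equal:
	-- the shorter string sorts first.
	return len_a < len_b
end

-- "pqr\197\190" is "pqrž" (ž = U+017E = bytes 197 190).
local words = {"pqrs", "pqr\197\190", "pqr", "abc"}
table.sort(words, byte_less)
print(table.concat(words, " "))
```

Because it never relies on `<` for strings, this version should be unaffected by either of the two bugs described in the module's documentation, at the cost of one `string.byte` call per byte compared.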
Module:string/compare - Revision history
