Module:form of/cats

From Linguifex
< Module:form of
Revision as of 16:30, 27 August 2021 by Sware (talk | contribs)
Jump to navigation Jump to search


local cats = {}

--[=[

This contains categorization specs for specific languages and for all languages.
The particular categories listed are listed without the preceding canonical
language name, which will automatically be prepended, and the text "<<p>>"
in a category will be replaced with the user-specified part of speech.

The value of an entry in the cats[] table is a list of specifications to apply
to inflections in a specific language (except that the entry for "und" applies
to all languages). Each specification indicates the conditions under which a
given category is applied. Each specification is processed independently; if
multiple specifications apply, all the resulting categories will be added to
the page. (This is equivalent to wrapping the specifications in a
{"multi", ...} clause; see below.)

A specification is one of:

(1) A string:

	Always apply that category.

(2) A list {"has", TAG, SPEC} or {"has", TAG, SPEC, ELSESPEC}:

	TAG is an inflection tag, and can either be the full form or any
	abbreviation; if that tag is present among the user-supplied tags, SPEC is
	applied, otherwise ELSESPEC is applied if present. SPEC and ELSESPEC are
	specifications just as at the top level; i.e. they can be strings, nested
	conditions, etc.

(2) A list {"hasall", TAGS, SPEC} or {"hasall", TAGS, SPEC, ELSESPEC}:

	Similar to {"has", ...} but only activates if all of the tags in TAGS
	(a list) are present among the user-supplied tags (in any order, and
	other tags may be present, including between the tags in TAGS).

(3) A list {"hasany", TAGS, SPEC} or {"hasany", TAGS, SPEC, ELSESPEC}:

	Similar to {"has", ...} but activates if any of the tags in TAGS
	(a list) are present among the user-supplied tags.

(4) A list {"tags=", TAGS, SPEC} or {"tags=", TAGS, SPEC, ELSESPEC}:

	Similar to {"hasall", ...} but activates only if the
	user-supplied tags exactly match the tags in TAGS, including
	the order. (But, as above, any tag abbreviation can be given
	in TAGS, and will match any equivalent abbreviation or full
	form.)

(5) A list {"p=", VALUE, SPEC} or {"p=", VALUE, SPEC, ELSESPEC}:

	Similar to {"has", ...} but activates if the value supplied for the p=
	or POS= parameters is the specified value (which can be either the full
	form or any abbreviation).

(6) A list {"pany", VALUES, SPEC} or {"pany", VALUES, SPEC, ELSESPEC}:

	Similar to {"p=", ...} but activates if the value supplied for the p=
	or POS= parameters is any of the specified values (which can be either
	the full forms or any abbreviation).

(7) A list {"pexists", SPEC} or {"pexists", SPEC, ELSESPEC}:

	Activates if any value was specified for the p= or POS= parameters.

(8) A list {"cond", SPEC1, SPEC2, ...}:

	If SPEC1 applies, it will be applied; otherwise, if SPEC2 applies, it
	will be applied; etc. This stops processing specifications as soon as it
	finds one that applies.

(9) A list {"multi", SPEC1, SPEC2, ...}:

	If SPEC1 applies, it will be applied; in addition, if SPEC2 applies, it
	will also be applied; etc. Unlike {"cond", ...}, this continues
	processing specifications even if a previous one has applied.

(10) A list {"not", CONDITION, SPEC} or {"not", CONDITION, SPEC, ELSESPEC}:

	 If CONDITION does *NOT* apply, SPEC will be applied, otherwise ELSESPEC
	 will be applied if present. CONDITION is one of:

	 -- {"has", TAG}
	 -- {"hasall", TAGS}
	 -- {"hasany", TAGS}
	 -- {"tags=", TAGS},
	 -- {"p=", VALUE}
	 -- {"pany", VALUES}
	 -- {"pexists"}
	 -- {"not", CONDITION}
	 -- {"and", CONDITION1, CONDITION2}
	 -- {"or", CONDITION1, CONDITION2}
	 -- {"call", FUNCTION} where FUNCTION is a string naming a function listed
	    in cat_functions in [[Module:form of/functions]], which is passed a
	    single argument (see (10) below) and should return true or false.

	 That is, conditions are similar to if-else SPECS but without any
	 specifications given.

(11) A list {"and", CONDITION1, CONDITION2, SPEC} or {"and", CONDITION1, CONDITION2, SPEC, ELSESPEC}:

	 If CONDITION1 and CONDITION2 both apply, SPEC will be applied, otherwise
	 ELSESPEC will be applied if present. CONDITION is as above for "not".

(12) A list {"or", CONDITION1, CONDITION2, SPEC} or {"or", CONDITION1, CONDITION2, SPEC, ELSESPEC}:

	 If either CONDITION1 or CONDITION2 apply, SPEC will be applied, otherwise
	 ELSESPEC will be applied if present. CONDITION is as above for "not".

(13) A list {"call", FUNCTION}:

	 FUNCTION is the name of a function listed in cat_functions in
	 [[Module:form of/functions]], which is passed a single argument, a table
	 containing the parameters given to the template call, and which should
	 return a specification (a string naming a category, a list of any of the
	 formats described above). In the table, the following keys are present:

	 "lang": the structure describing the language (usually the first
	         parameter);
	 "tags": the list of tags (canonicalized to their full forms);
	 "term": the term to link to (will be missing if no term is given);
	 "alt": the display form of the term (will be missing if no display form
	        is given);
	 "t": the gloss of the term (will be missing if no gloss is given);

	 In addition, any other parameters specified will be located under a key
	 corresponding to the parameter name.


As a simple example, consider this:

cats["et"] = {
	{"has", "part", "participles"},
}

This says that, for language code "et" (Estonian), if the "part" tag is
present (or if "participle" is present, which is the equivalent full form),
the page will be categorized into [[:Category:Estonian participles]].

Another example:

cats["lt"] = {
	{"p=", "part",
		{"has", "pron",
			"pronominal dalyvis participle forms",
			"dalyvis participle forms",
		}
	}
}

This says that, for language code "lt" (Lithuanian), if the "p=" parameter
was given with the value "part" (or "participle", the equivalent full form),
then if the "pron" tag is present (or the equivalent full form "pronominal"),
categorize into [[:Category:Lithuanian pronominal dalyvis participle forms]],
else categorize into [["Category:Lithuanian dalyvis participle forms]]. Note
that, if p= isn't specified, or has a value other than "part" or
"participle", no categories will be added to the page, because there is no
"else" specification associated with the "p=" specification.

--]=]

-- First, the language-independent categories; be careful here not to
-- overcategorize. In practice we achieve this using tags=; we should
-- probably be smarter. But we don't e.g. want to categorize a page
-- into "present participles" if it has the tags f|s|pres|part, which
-- is a participle form rather than a participle itself.
--
-- We include the categorization here rather than in e.g. {{augmentative of}}
-- because we want the categorization to also apply when e.g. an augmentative
-- is specified using {{inflection of|LANG|...|aug}} rather than
-- {{augmentative of|LANG}}.
cats["und"] = {
	-- Disable all of these for now as they are somewhat controversial.
	--{"tags=", {"aug"}, "augmentative <<p=n>>s"},
	--{"tags=", {"dim"}, "diminutive <<p=n>>s"},
	--{"or", {"tags=", {"end"}},
	--	{"or", {"tags=", {"end", "form"}}, {"tags=", {"end", "dim"}}},
	--	"endearing <<p=n>>s"},
	--{"tags=", {"pej"}, "derogatory terms"},
	--{"tags=", {"comd"}, "comparative <<p=a>>s"},
	--{"tags=", {"supd"}, "superlative <<p=a>>s"},
	--{"tags=", {"equd"}, "<<p=a>> equative forms"},
	--{"tags=", {"caus"}, "causative <<p=v>>s"},
	--{"tags=", {"freq"}, "frequentative <<p=v>>s"},
	--{"tags=", {"iter"}, "iterative <<p=v>>s"},
	--{"tags=", {"refl"}, "reflexive <<p=v>>s"},
	--{"or", {"tags=", {"impfv"}}, {"tags=", {"impfv", "form"}}, "imperfective <<p=v>>s"},
	--{"or", {"tags=", {"pfv"}}, {"tags=", {"pfv", "form"}}, "perfective <<p=v>>s"},
	--{"tags=", {"nomzn"}, "nominalized adjectives"},
	--{"tags=", {"ger"}, "gerunds"},
	--{"tags=", {"vnoun"}, "verbal nouns"},
	--{"tags=", {"pass"}, "<<p=v>> passive forms"},
	-- [[User:Rua]] objects to these categories
	-- {"tags=", {"past", "act", "part"}, "past active participles"},
	-- {"tags=", {"past", "pass", "part"}, "past passive participles"},
	-- {"tags=", {"past", "part"}, "past participles"},
	-- {"tags=", {"pres", "act", "part"}, "present active participles"},
	-- {"tags=", {"pres", "pass", "part"}, "present passive participles"},
	-- {"tags=", {"pres", "part"}, "present participles"},
	-- {"tags=", {"perf", "part"}, "perfect participles"},
}

cats["bg"] = {
	{"p=", "a",
		{"multi",
			{"has", "m", "adjective masculine forms"},
			{"has", "f", "adjective feminine forms"},
			{"has", "n", "adjective neuter forms"},
			{"has", "p", "adjective plural forms"},
			{"has", "extended", "adjective vocative forms"},
			{"has", "def", "adjective definite forms"},
			{"has", "indef", "adjective indefinite forms"},
		}
	},
	{"p=", "n",
		{"multi",
			{"has", "indef", "noun indefinite forms"},
			{"has", "def", "noun definite forms"},
			{"has", "voc", "noun vocative forms"},
			{"has", "count",
				{"multi", "noun count forms", "noun plural forms"},
			},
			{"has", "p", "noun plural forms"},
		}
	},
	{"cond",
		{"hasall", {"adv", "part"}, "adverbial participles"},
		{"has", "part",
			-- If this is a lemma participle form, categorize appropriately
			-- for the type of participle, otherwise put into
			-- "participle forms". We determine a lemma if all of the
			-- following apply:
			-- (1) either is masculine, or no gender listed; and
			-- (2) either is indefinite, or no definiteness listed; and
			-- (3) not listed as either subjective or objective form.
			{"and",
				{"or", {"has", "m"}, {"not", {"hasany", {"f", "n", "p"}}}},
				{"and",
					{"or", {"has", "indef"}, {"not", {"has", "def"}}},
					{"not", {"hasany", {"sbjv", "objv"}}},
				},
				{"cond",
					{"hasall", {"pres", "act"}, "present active participles"},
					{"hasall", {"past", "pass"}, "past passive participles"},
					{"hasall", {"past", "act", "aor"}, "past active aorist participles"},
					{"hasall", {"past", "act", "impf"}, "past active imperfect participles"},
				},
				-- FIXME: "participle forms" probably not necessary,
				-- should be handled by headword
				"participle forms"
			}
		}
	},
}

cats["br"] = {
	{"p=", "n",
		{"has", "p", "noun plural forms"}
	},
}

-- Applies to ca, es, it, pt
local romance_adjective_categorization =
	{"p=", "a",
		{"multi",
			{"has", "f", "adjective feminine forms"},
			{"has", "p", "adjective plural forms"},
		}
	}

cats["ca"] = {
	romance_adjective_categorization,
	{"has", "part",
		{"cond",
			-- FIXME, not clear if we need all of these conditions;
			-- may partly be handled by headword
			{"hasany", {"f", "p"}, "participle forms"},
			{"has", "pres", "present participles"},
			{"has", "past", "past participles"},
		}
	},
}

cats["cel-gal"] = {
	{"has", "hard mutation", "hard-mutation forms"},
	{"has", "soft mutation", "soft-mutation forms"},
}

cats["de"] = {
	{"hasall", {"pres", "part"}, "present participles"},
	{"hasall", {"past", "part"}, "past participles"},
}

cats["el"] = {
	{"has", "dat", "dative forms"},
	{"p=", "v",
		{"cond",
			{"hasall", {"1", "s", "past"}, "verb past tense forms"},
			{"has", "nonfinite", "verb nonfinite forms"},
		},
	},
}

cats["enm"] = {
	{"not", {"hasany", {"sub", "imp"}}, 
		{"multi",
			{"hasall", {"1", "s", "pres"}, "first-person singular forms"},
			{"hasall", {"2", "s", "pres"}, "second-person singular forms"},
			{"hasall", {"3", "s", "pres"}, "third-person singular forms"},
			{"hasall", {"1//3", "s", "past"}, "first/third-person singular past forms"},
			{"hasall", {"2", "s", "past"}, "second-person singular past forms"},
			{"hasall", {"p", "pres"}, "plural forms"},
		},
	},
	{"hasall", {"p", "pres", "sub"}, "plural subjunctive forms"},
	{"hasall", {"p", "past"}, "plural past forms"},
	{"hasall", {"s", "pres", "sub"}, "singular subjunctive forms"},
	{"hasall", {"s", "past", "sub"}, "singular past subjunctive forms"},
	{"hasall", {"s", "imp"}, "singular imperative forms"},
	{"hasall", {"p", "imp"}, "plural imperative forms"},
	{"hasall", {"pres", "part"}, "present participles"},
	{"hasall", {"past", "part"}, "past participles"},
}

cats["es"] = {
	romance_adjective_categorization
}

cats["et"] = {
	{"has", "part", "participles"},
}

cats["got"] = {
	{"hasall", {"pres", "part"}, "present participles"},
	{"hasall", {"past", "part"}, "past participles"},
}

cats["hu"] = {
	{"hasall", {"past", "part"}, "past participles"},
	{"hasall", {"pres", "part"}, "present participles"},
	{"hasall", {"fut", "part"}, "future participles"},
	{"hasall", {"adv", "part"}, "adverbial participles"},
	{"hasall", {"verbal", "part"}, "verbal participles"},
}

cats["it"] = {
	romance_adjective_categorization
}

cats["ja"] = {
	{"p=", "v",
		{"multi",
			{"has", "past", "past tense verb forms"},
			{"has", "conj", "conjunctive verb forms"},
		}
	},
}

cats["kmr"] = {
	{"hasall", {"pres", "part"}, "present participles"},
	{"hasall", {"past", "part"}, "past participles"},
}

cats["liv"] = {
	{"hasany", {"1", "2", "3"},
		{"cond",
			{"hasall", {"pres", "ind"}, "verb forms (present indicative)"},
			{"hasall", {"past", "ind"}, "verb forms (past indicative)"},
			{"hasall", {"imp", "neg"}, "verb forms (imperative negative)"},
			{"has", "imp", "verb forms (imperative)"},
			{"has", "neg", "verb forms (negative)"},
			{"has", "cond", "verb forms (conditional)"},
			{"has", "juss", "verb forms (jussive)"},
			{"has", "quot", "verb forms (quotative)"},
		}
	},
	{"has", "part",
		{"cond",
			{"hasall", {"pres", "act"}, "present active participles"},
			{"hasall", {"pres", "pass"}, "present passive participles"},
			{"hasall", {"past", "act"}, "past active participles"},
			{"hasall", {"past", "pass"}, "past passive participles"},
		},
	},
	{"cond",
		{"has", "ger", "gerunds"},
		{"hasall", {"sup", "abe"}, "supine abessives"},
		{"has", "sup", "supines"},
		{"has", "deb", "debitives"},
	},
}

cats["lt"] = {
	{"p=", "part",
		{"cond",
			-- Three types of adverbial participles.
			{"has", "budinys", "būdinys participles"},
			{"has", "padalyvis", "padalyvis participles"},
			{"has", "pusdalyvis", "pusdalyvis participles"},
			-- If it's a non-adverbial participle, it's a dalyvis = regular
			-- adjectival participle. It's a participle per se if it has
			-- no case, number or gender listed.
			{"not", {"hasany", {
				"nom", "gen", "dat", "acc", "ins", "loc", "voc",
				"m", "f", "s", "p"
			}}, "dalyvis participles"},
			-- Otherwise, it's a participle form, pronominal if "pron"
			-- is present, else non-pronominal.
			{"has", "pron", "pronominal dalyvis participle forms"},
			"dalyvis participle forms"
		}
	},
	{"p=", "a",
		{"has", "pron",
			{"cond",
				{"has", "comd", "comparative pronominal adjective forms"},
				{"has", "supd", "superlative pronominal adjective forms"},
			},
			{"cond",
				{"has", "comd", "comparative adjective forms"},
				{"has", "supd", "superlative adjective forms"},
			},
		}
	},
}

cats["lv"] = {
	{"has", "neg", "negative verb forms"},
	{"has", "comd",
		{"p=", "part",
			{"has", "def",
				"definite comparative participles",
				"comparative participles"
			},
			{"has", "def",
				"definite comparative adjectives",
				"comparative adjectives",
			},
		}
	},
	{"has", "supd",
		{"p=", "part",
			"superlative participles",
			"superlative adjectives",
		}
	},
}

cats["pt"] = {
	romance_adjective_categorization,
	{"p=", "n",
		{"multi",
			{"has", "f", "noun feminine forms"},
			{"has", "p", "noun plural forms"},
			{"has", "aug", "noun augmentative forms"},
			{"has", "dim", "noun diminutive forms"},
		}
	},
}

cats["ru"] = {
	{"has", "part",
		{"multi",
			"participles",
			"verb forms",
			{"cond",
				{"hasall", {"pres", "act"}, "present active participles"},
				{"hasall", {"pres", "pass"}, "present passive participles"},
				{"hasall", {"pres", "adv"}, "present adverbial participles"},
				{"hasall", {"past", "act"}, "past active participles"},
				{"hasall", {"past", "pass"}, "past passive participles"},
				{"hasall", {"past", "adv"}, "past adverbial participles"},
			},
		}
	},
}

cats["sa"] = {
	{"has", "desid",
		{"multi",
			"desiderative verbs",
			"verbs derived from primitive verbs"
		},
	},
	{"has", "freq",
		{"multi",
			"frequentative verbs",
			"verbs derived from primitive verbs"
		},
	},
	{"has", "root", "root forms"},
}

cats["sco"] = {
	{"hasall", {"simple", "past"}, "verb simple past forms"},
	{"hasall", {"3", "s", "pres", "ind"}, "third-person singular forms"},
}

cats["sv"] = {
	{"hasall", {"past", "part"}, "past participles"},
}

cats["uk"] = {
   	{"has", "part",
   		{"multi",
   			"participles",
   			"verb forms",
   			{"cond",
   				{"hasall", {"pres", "act"}, "present active participles"},
   				{"hasall", {"pres", "pass"}, "present passive participles"},
   				{"hasall", {"pres", "adv"}, "present adverbial participles"},
   				{"hasall", {"past", "act"}, "past active participles"},
   				{"hasall", {"past", "pass"}, "past passive participles"},
   				{"hasall", {"past", "adv"}, "past adverbial participles"},
   			},
   		}
   	},
}

return cats

-- For Vim, so we get 4-space tabs
-- vim: set ts=4 sw=4 noet: