API Reference

WordSiv is a Python library for generating text for an incomplete typeface.

Classes:

Name	Description
`Vocab`	A vocabulary of words and occurrence counts with metadata for filtering and punctuating.
`WordSiv`	The main WordSiv object which uses Vocabs to generate text.

Attributes:

Name	Type	Description
`CaseType`		Options for setting case via the `case` argument.

CaseType `module-attribute`

CaseType = Literal[
    "any",
    "any_og",
    "lc",
    "lc_force",
    "cap",
    "cap_og",
    "cap_force",
    "uc",
    "uc_og",
    "uc_force",
]

Options for setting case via the `case` argument. See Letter Case in the Guide for detailed descriptions and examples of each option.
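Because `CaseType` is a `typing.Literal`, its allowed strings can be read back at runtime. The following is a minimal sketch, assuming you want to validate user input before passing it as `case`. The alias is redefined locally so the snippet runs without WordSiv installed, and `check_case` is a hypothetical helper, not part of the library.

```python
from typing import Literal, get_args

# Local copy of the CaseType alias listed above (assumption: kept in sync by hand).
CaseType = Literal[
    "any", "any_og", "lc", "lc_force", "cap", "cap_og", "cap_force",
    "uc", "uc_og", "uc_force",
]

# get_args() returns the literal strings as a tuple.
VALID_CASES = get_args(CaseType)


def check_case(value: str) -> str:
    """Hypothetical helper: return `value` if it is a valid CaseType option."""
    if value not in VALID_CASES:
        raise ValueError(f"invalid case {value!r}; expected one of {VALID_CASES}")
    return value
```

Note that strings such as "upper" or "lower" are not members of the alias; the corresponding options are "uc" and "lc".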

Vocab

A vocabulary of words and occurrence counts with metadata for filtering and punctuating.

Attributes:

Name	Type	Description
`lang`	`str`	The language of the vocabulary.
`bicameral`	`bool`	Specifies whether the vocabulary has uppercase and lowercase letters.
`punctuation`	`dict`	A dictionary or None for handling punctuation in generated text.
`data`	`str`	A TSV-formatted string with word-count pairs or a newline-delimited list of words.
`data_file`	`str \| Traversable`	A path to a file to supply the data instead of the data attribute.
`meta`	`dict`	Additional metadata for the vocabulary.

Methods:

Name	Description
`__init__`	Initializes the Vocab instance.

Attributes:

Name	Type	Description
`data`		Returns raw data from parameter _data or data_file.
`wordcount`	`tuple[tuple[str, int], ...]`	Returns a tuple of tuples with words and counts.
`wordcount_str`	`str`	Returns a TSV-formatted string with words and counts.

Source code in wordsiv/_vocab.py

class Vocab:
    """A vocabulary of words and occurrence counts with metadata for filtering and punctuating.

    Attributes:
        lang (str): The language of the vocabulary.
        bicameral (bool): Specifies whether the vocabulary has uppercase and lowercase letters.
        punctuation (dict, optional): A dictionary or None for handling punctuation in generated text.
        data (str, optional): A TSV-formatted string with word-count pairs or a newline-delimited list of words.
        data_file (str | Traversable, optional): A path to a file to supply the data instead of the data attribute.
        meta (dict, optional): Additional metadata for the vocabulary.
    """

    def __init__(
        self,
        lang: str,
        bicameral: bool,
        punctuation: dict | None = None,
        data: str | None = None,
        data_file: str | Traversable | None = None,
        meta: dict | None = None,
    ):
        """Initializes the Vocab instance.

        Args:
            lang (str): The language of the vocabulary.
            bicameral (bool): Specifies whether the vocabulary has uppercase and lowercase letters.
            punctuation (dict, optional): A dictionary or None for handling punctuation in generated text.
            data (str, optional): A TSV-formatted string with word-count pairs or a newline-delimited list of words.
            data_file (str | Traversable, optional): A path to a file to supply the data instead of the data attribute.
            meta (dict, optional): Additional metadata for the vocabulary.
        """

        self.lang = lang
        self.bicameral = bicameral
        self.punctuation = punctuation
        self._data = data
        self.data_file = data_file
        self.meta = meta

        if data and data_file:
            raise ValueError("Cannot specify both 'data' and 'data_file'")
        elif data is None and not data_file:
            raise ValueError("Must specify either 'data' or 'data_file'")

    @property
    def data(self):
        """Returns raw data from parameter _data or data_file."""

        if self._data is not None:
            data = self._data
        elif getattr(self, "data_file", None):
            data = _read_file(self.data_file)
        if not data:
            raise VocabEmptyError(f"No data found in {self.data_file}")

        return data

    @property
    def wordcount_str(self) -> str:
        """Returns a TSV-formatted string with words and counts."""

        firstline = self.data.partition("\n")[0]

        if regex.match(r"[[:alpha:]]+\t\d+$", firstline):
            # if we have counts, return the original string
            return self.data
        elif regex.match(r"[[:alpha:]]+$", firstline):
            # if we just have newline-delimited words, add counts of 1
            return _add_counts_to_wordcount_str(self.data)
        else:
            raise VocabFormatError(
                "The vocab file is formatted incorrectly. "
                "Should be a TSV file with words and counts as columns, or a newline-delimited list of words."
            )

    @property
    def wordcount(self) -> tuple[tuple[str, int], ...]:
        """Returns a tuple of tuples with words and counts."""

        return _wordcount_str_to_wordcount_tuple(self.wordcount_str)

    def filter(self, **kwargs):
        """Filters the words and counts by the given keyword constraints (e.g. glyphs, case, min_wl)."""

        return _filter_wordcount(self.wordcount_str, self.bicameral, **kwargs)

data `property`

data

Returns raw data from parameter _data or data_file.

wordcount `property`

wordcount

Returns a tuple of tuples with words and counts.

wordcount_str `property`

wordcount_str

Returns a TSV-formatted string with words and counts.
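The two accepted data formats (word and count separated by a tab, or one word per line) can be illustrated with a small standalone sketch. It is not the library's implementation: the private helpers used by `wordcount` and `wordcount_str` are not shown in this reference, so `to_wordcount` below is a hypothetical function, and the stdlib `re` patterns only approximate the `[[:alpha:]]` class of the `regex` module.

```python
import re

# Assumption: `[^\W\d_]+` (letters only) stands in for `[[:alpha:]]+`, which the
# stdlib `re` module does not support.
_WITH_COUNTS = re.compile(r"[^\W\d_]+\t\d+$")
_WORDS_ONLY = re.compile(r"[^\W\d_]+$")


def to_wordcount(data: str) -> tuple:
    """Hypothetical helper: turn raw vocab data into ((word, count), ...)."""
    # Like the source, only the first line is inspected to detect the format.
    first_line = data.partition("\n")[0]

    if _WITH_COUNTS.match(first_line):
        # TSV rows: keep the counts that were supplied.
        rows = (line.split("\t") for line in data.splitlines() if line)
        return tuple((word, int(count)) for word, count in rows)

    if _WORDS_ONLY.match(first_line):
        # Plain word list: give every word a count of 1.
        return tuple((line, 1) for line in data.splitlines() if line)

    raise ValueError("expected word<TAB>count rows or one word per line")
```

For example, `to_wordcount("the\t100\nof\t50")` keeps the supplied counts, while `to_wordcount("cat\ndog")` assigns each word a count of 1.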

__init__

__init__(
    lang,
    bicameral,
    punctuation=None,
    data=None,
    data_file=None,
    meta=None,
)

Parameters:

Name	Type	Description	Default
`lang`	`str`	The language of the vocabulary.	required
`bicameral`	`bool`	Specifies whether the vocabulary has uppercase and lowercase letters.	required
`punctuation`	`dict`	A dictionary or None for handling punctuation in generated text.	`None`
`data`	`str`	A TSV-formatted string with word-count pairs or a newline-delimited list of words.	`None`
`data_file`	`str \| Traversable`	A path to a file to supply the data instead of the data attribute.	`None`
`meta`	`dict`	Additional metadata for the vocabulary.	`None`
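The constructor accepts exactly one data source. The rule can be sketched in isolation as follows; `check_sources` is a hypothetical helper that mirrors the checks in the source listing and reports which source would be read, so it runs without WordSiv installed.

```python
from typing import Optional


def check_sources(data: Optional[str] = None, data_file: Optional[str] = None) -> str:
    """Hypothetical helper: name the source that would be used, or raise ValueError."""
    if data and data_file:
        raise ValueError("Cannot specify both 'data' and 'data_file'")
    if data is None and not data_file:
        raise ValueError("Must specify either 'data' or 'data_file'")
    # In-memory data takes precedence whenever it is not None.
    return "data" if data is not None else "data_file"
```

One consequence of these checks, going by the source listing: an empty string for `data` passes construction, and the failure (a `VocabEmptyError`) only surfaces later when the `data` property is read.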

WordSiv

The main WordSiv object which uses Vocabs to generate text.

This object serves as the main interface for generating text. It can hold multiple vocabulary objects, store default settings (like default glyphs and vocab), and expose high-level methods that produce words, sentences, paragraphs, and more.

Parameters:

Name	Type	Description	Default
`vocab`	`str \| None`	The name of the default Vocab.	`None`
`glyphs`	`str \| None`	The default set of glyphs that constrains the words generated.	`None`
`add_default_vocabs`	`bool`	Whether to add the default Vocabs defined in `DEFAULT_VOCABS`.	`True`
`raise_errors`	`bool`	Whether to raise errors or fail gently (log a warning and return an empty result).	`False`
`seed`	`int \| None`	Seed for the random number generator.	`None`

Attributes:

Name	Type	Description
`vocab`	`str \| None`	The name of the default Vocab.
`glyphs`	`str \| None`	The default set of glyphs that constrains the words generated.
`raise_errors`	`bool`	Whether to raise errors or fail gently (log a warning and return an empty result).
`_vocab_lookup`	`dict[str, Vocab]`	A dictionary of vocab names to `Vocab` objects.
`_rand`	`Random`	A `random.Random` instance.
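The `seed` parameter and the private `_rand` attribute exist to make output reproducible. A minimal sketch of that pattern follows, using a hypothetical `SeededPicker` class rather than `WordSiv` itself: each instance owns its own `random.Random`, so seeding it does not touch the global `random` state.

```python
import random


class SeededPicker:
    """Hypothetical stand-in that keeps a private random.Random, like `_rand`."""

    def __init__(self, seed=None):
        self._rand = random.Random()
        if seed is not None:
            self.seed(seed)

    def seed(self, seed=None) -> None:
        # Re-seeding restarts the sequence of random choices.
        self._rand.seed(seed)

    def pick(self, items):
        return self._rand.choice(items)


words = ["alpha", "beta", "gamma", "delta"]
a = SeededPicker(seed=123)
b = SeededPicker(seed=123)
first_run = [a.pick(words) for _ in range(5)]
second_run = [b.pick(words) for _ in range(5)]  # same picks as first_run
```

Two instances created with the same seed make the same picks, and calling `seed` again with the same value replays the sequence.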

Methods:

Name	Description
`add_vocab`	Add a `Vocab` object to this `WordSiv` instance under a given name.
`get_vocab`	Retrieve a `Vocab` by name, or return the default Vocab if `vocab_name` is None.
`list_vocabs`	Return a list of all available Vocab names.
`number`	Generate a random numeric string (made of digits) constrained by glyphs and other parameters.
`para`	Generate a paragraph by creating multiple sentences with `sents(...)` and
`paras`	Generate multiple paragraphs with `para(...)`, returned as a list.
`seed`	Seed the random number generator for reproducible results.
`sent`	Generate a single sentence, optionally punctuated, using words (and/or numbers).
`sents`	Generate multiple sentences with `sent(...)`, returned as a list.
`text`	Generate multiple paragraphs of text, calling `paras(...)` and joining them with
`top_word`	Retrieve the most common (or nth most common) word from the Vocab, subject to filtering constraints.
`top_words`	Retrieve the top `n_words` from the Vocab, starting at index `idx`, subject to filtering constraints.
`word`	Generate a random word that meets a variety of constraints, such as glyphs, length, regex filters, etc.
`words`	Generate a list of words (and optionally numbers) according to the specified parameters.
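To make the glyph constraint concrete, here is a standalone sketch of how a digit string can be restricted to the numerals present in a glyph set, in the spirit of `number`. `random_digits` is a hypothetical function, the default `max_wl=4` is taken from the docstring, and the sketch simply returns an empty string where the library would log a warning or raise.

```python
import random
import string


def random_digits(rand, glyphs=None, wl=None, min_wl=1, max_wl=4):
    """Hypothetical helper: random digit string limited to numerals in `glyphs`."""
    if wl:
        length = wl
    else:
        if min_wl > max_wl:
            raise ValueError("'min_wl' must be less than or equal to 'max_wl'")
        length = rand.randint(min_wl, max_wl)

    numerals = string.digits  # "0123456789"
    if glyphs:
        # Keep only the numerals that the glyph set actually contains.
        numerals = "".join(n for n in numerals if n in glyphs)
        if not numerals:
            return ""  # nothing to draw from

    return "".join(rand.choice(numerals) for _ in range(length))


sample = random_digits(random.Random(0), glyphs="abc012", wl=6)
```

With `glyphs="abc012"` every character of `sample` is one of 0, 1, or 2, and a glyph set without numerals yields an empty string.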

Source code in wordsiv/__init__.py

class WordSiv:
    """The main WordSiv object which uses Vocabs to generate text.

    This object serves as the main interface for generating text. It can hold multiple
    vocabulary objects, store default settings (like default glyphs and vocab), and
    expose high-level methods that produce words, sentences, paragraphs, and more.

    Args:
        vocab (str | None): The name of the default Vocab.
        glyphs (str | None): The default set of glyphs that constrains the words
            generated.
        add_default_vocabs (bool): Whether to add the default Vocabs defined in
            `DEFAULT_VOCABS`.
        raise_errors (bool): Whether to raise errors or fail gently.
        seed (int | None): Seed for the random number generator.

    Attributes:
        vocab (str | None): The name of the default Vocab.
        glyphs (str | None): The default set of glyphs that constrains the words
            generated.
        raise_errors (bool): Whether to raise errors or fail gently.
        _vocab_lookup (dict[str, Vocab]): A dictionary of vocab names to `Vocab`
            objects.
        _rand (random.Random): A `random.Random` instance.
    """

    def __init__(
        self,
        vocab: str | None = None,
        glyphs: str | None = None,
        add_default_vocabs: bool = True,
        raise_errors: bool = False,
        seed=None,
    ):
        self.vocab = vocab
        self.glyphs = glyphs
        self.raise_errors = raise_errors
        self._vocab_lookup: dict[str, Vocab] = {}

        if add_default_vocabs:
            self._add_default_vocabs()

        self._rand = random.Random()

        if seed is not None:
            self.seed(seed)

    def seed(self, seed: float | str | None = None) -> None:
        """
        Seed the random number generator for reproducible results.

        Args:
            seed (float | str | None): The seed value used to initialize the random
                number generator.

        Returns:
            None
        """
        self._rand.seed(seed)

    def add_vocab(self, vocab_name: str, vocab: Vocab) -> None:
        """
        Add a `Vocab` object to this `WordSiv` instance under a given name.

        Args:
            vocab_name (str): The unique identifier for this Vocab.
            vocab (Vocab): The `Vocab` object to be associated with `vocab_name`.

        Returns:
            None
        """
        self._vocab_lookup[vocab_name] = vocab

    def _add_default_vocabs(self) -> None:
        """
        Initialize and add the default Vocabs to this `WordSiv` instance.

        The default vocabularies are specified in the `DEFAULT_VOCABS` dictionary, which
        maps a short code (e.g., 'en', 'es') to the meta and data filenames. This method
        initializes each Vocab (however, the data is loaded lazily).
        """
        for vocab_name, (meta_file, data_file) in _DEFAULT_VOCABS.items():
            meta_path = resources.files(_vocab_data) / meta_file
            with meta_path.open("r", encoding="utf8") as f:
                meta = json.load(f)

            data_path = resources.files(_vocab_data) / data_file
            vocab = Vocab(
                meta["lang"], bool(meta["bicameral"]), meta=meta, data_file=data_path
            )
            self.add_vocab(vocab_name, vocab)

    def get_vocab(self, vocab_name: str | None = None) -> Vocab:
        """
        Retrieve a `Vocab` by name, or return the default Vocab if `vocab_name` is None.

        Args:
            vocab_name (str | None): The name of the Vocab to retrieve. If None,
                use the default `self.vocab`.

        Returns:
            Vocab: The `Vocab` object corresponding to `vocab_name`.

        Raises:
            ValueError: If no `vocab_name` is provided and no default is set.
        """
        if vocab_name:
            return self._vocab_lookup[vocab_name]
        else:
            if self.vocab:
                return self._vocab_lookup[self.vocab]
            else:
                raise ValueError("Error: no vocab specified")

    def list_vocabs(self) -> list[str]:
        """
        Return a list of all available Vocab names.

        Returns:
            list[str]: A list of all registered Vocab names in this `WordSiv`.
        """
        return list(self._vocab_lookup.keys())

    def number(
        self,
        seed: float | str | None = None,
        glyphs: str | None = None,
        wl: int | None = None,
        min_wl: int = 1,
        max_wl: int = _DEFAULT_MAX_NUM_LENGTH,
        raise_errors: bool = False,
    ) -> str:
        """
        Generate a random numeric string (made of digits) constrained by glyphs and
        other parameters.

        Args:
            seed (float | str | None): Seed the random number generator if seed is not
                None.
            glyphs (str | None): A string of allowed glyphs. If None, uses the default
                glyphs of this `WordSiv` instance.
            wl (int | None): Exact length of the generated numeric string. If None, a
                random length between `min_wl` and `max_wl` is chosen.
            min_wl (int): Minimum length of the numeric string. Defaults to 1.
            max_wl (int): Maximum length of the numeric string. Defaults to 4.
            raise_errors (bool): Whether to raise an error if no numerals are available.

        Returns:
            str: A randomly generated string consisting of numerals.

        Raises:
            ValueError: If `min_wl` is greater than `max_wl`.
            FilterError: If no numerals are available in `glyphs` and `raise_errors` is
                True.
        """
        glyphs = self.glyphs if glyphs is None else glyphs
        raise_errors = self.raise_errors if raise_errors is None else raise_errors

        if seed is not None:
            self._rand.seed(seed)

        if wl:
            length = wl
        else:
            if min_wl > max_wl:
                raise ValueError("'min_wl' must be less than or equal to 'max_wl'")
            length = self._rand.randint(min_wl, max_wl)

        available_numerals = "".join(str(n) for n in range(0, 10))
        if glyphs:
            available_numerals = "".join(n for n in available_numerals if n in glyphs)

            if not available_numerals:
                if raise_errors:
                    raise FilterError("No numerals available in glyphs")
                else:
                    log.warning("No numerals available in glyphs")
                    return ""

        return "".join(self._rand.choice(available_numerals) for _ in range(length))

    def word(
        self,
        vocab: str | None = None,
        glyphs: str | None = None,
        seed: float | str | None = None,
        rnd: float = 0,
        case: CaseType = "any",
        top_k: int = 0,
        min_wl: int = 1,
        max_wl: int | None = None,
        wl: int | None = None,
        contains: str | Sequence[str] | None = None,
        inner: str | Sequence[str] | None = None,
        startswith: str | None = None,
        endswith: str | None = None,
        regexp: str | None = None,
        raise_errors: bool = False,
    ) -> str:
        """
        Generate a random word that meets a variety of constraints, such as glyphs,
        length, regex filters, etc.

        Args:
            vocab (str | None): Name of the Vocab to use. If None, uses default Vocab.
            glyphs (str | None): A string of allowed glyphs. If None, uses default
                glyphs.
            seed (float | str | None): Seed the random number generator if seed is not
                None.
            rnd (float): Randomness factor in [0, 1] for selecting among the top words.
            case (CaseType): Desired case of the output word (e.g., 'uc', 'lc',
                'any').
            top_k (int): If > 0, only consider the top K words by frequency.
            min_wl (int): Minimum word length.
            max_wl (int | None): Maximum word length. If None, no maximum is applied.
            wl (int | None): Exact word length. If None, no exact length is enforced.
            contains (str | Sequence[str] | None): Substring(s) that must appear in the
                word.
            inner (str | Sequence[str] | None): Substring(s) that must appear, but not
                at the start or end of the word.
            startswith (str | None): Required starting substring.
            endswith (str | None): Required ending substring.
            regexp (str | None): A regular expression that the word must match.
            raise_errors (bool): Whether to raise filtering errors or fail gently.

        Returns:
            str: A randomly generated word meeting the specified constraints (or an
                empty string on failure if `raise_errors` is False).

        Raises:
            ValueError: If `rnd` is not in [0, 1].
            FilterError: If filtering yields no results and `raise_errors` is True.
            VocabFormatError: If the underlying Vocab data is malformed.
            VocabEmptyError: If the underlying Vocab is empty.
        """
        glyphs = self.glyphs if glyphs is None else glyphs
        raise_errors = self.raise_errors if raise_errors is None else raise_errors
        vocab_obj = self.get_vocab(vocab)

        if not (0 <= rnd <= 1):
            raise ValueError("'rnd' must be between 0 and 1")

        if seed is not None:
            self._rand.seed(seed)

        try:
            wc_list = vocab_obj.filter(
                glyphs=glyphs,
                case=case,
                min_wl=min_wl,
                max_wl=max_wl,
                wl=wl,
                contains=contains,
                inner=inner,
                startswith=startswith,
                endswith=endswith,
                regexp=regexp,
            )
        except FilterError as e:
            if raise_errors:
                raise e
            else:
                log.warning("%s", e.args[0])
                return ""

        if top_k:
            wc_list = wc_list[:top_k]

        return _sample_word(wc_list, self._rand, rnd)

    def top_word(
        self,
        vocab: str | None = None,
        glyphs: str | None = None,
        seed: float | str | None = None,
        idx: int = 0,
        case: CaseType = "any",
        min_wl: int = 2,
        max_wl: int | None = None,
        wl: int | None = None,
        contains: str | Sequence[str] | None = None,
        inner: str | Sequence[str] | None = None,
        startswith: str | None = None,
        endswith: str | None = None,
        regexp: str | None = None,
        raise_errors: bool = False,
    ) -> str:
        """
        Retrieve the most common (or nth most common) word from the Vocab, subject to
        filtering constraints.

        Args:
            vocab (str | None): Name of the Vocab to use. If None, use the default
                Vocab.
            glyphs (str | None): Whitelisted glyphs to filter words. If None, uses
                default.
            seed (float | str | None): Seed the random number generator if seed is
                not None.
            idx (int): Index of the desired word in the frequency-sorted list
                (0-based).
            case (CaseType): Desired case form for the word (e.g., 'lc', 'uc',
                'any').
            min_wl (int): Minimum word length.
            max_wl (int | None): Maximum word length. If None, no maximum.
            wl (int | None): Exact word length. If None, no exact length filter.
            contains (str | Sequence[str] | None): Substring(s) that must appear in
                the word.
            inner (str | Sequence[str] | None): Substring(s) that must appear in the
                interior.
            startswith (str | None): Substring that the word must start with.
            endswith (str | None): Substring that the word must end with.
            regexp (str | None): Regex pattern that the word must match.
            raise_errors (bool): Whether to raise errors on filter or index failures.

        Returns:
            str: The nth most common word that meets the constraints (or an empty string
                on failure if `raise_errors` is False).

        Raises:
            FilterError: If no words match the filters, or no word exists at `idx`
                after filtering, and `raise_errors` is True.
            ValueError: If no default vocab is set when `vocab` is None.
        """
        glyphs = self.glyphs if glyphs is None else glyphs
        raise_errors = self.raise_errors if raise_errors is None else raise_errors
        vocab_obj = self.get_vocab(self.vocab) if not vocab else self.get_vocab(vocab)

        try:
            wc_list = vocab_obj.filter(
                glyphs=glyphs,
                case=case,
                min_wl=min_wl,
                max_wl=max_wl,
                wl=wl,
                contains=contains,
                inner=inner,
                startswith=startswith,
                endswith=endswith,
                regexp=regexp,
            )
        except FilterError as e:
            if raise_errors:
                raise e
            else:
                log.warning("%s", e.args[0])
                return ""

        try:
            return wc_list[idx][0]
        except IndexError:
            if raise_errors:
                raise FilterError(f"No word at index idx='{idx}'")
            else:
                log.warning("No word at index idx='%s'", idx)
                return ""

    def words(
        self,
        vocab: str | None = None,
        glyphs: str | None = None,
        seed=None,
        n_words: int | None = None,
        min_n_words: int = 10,
        max_n_words: int = 20,
        numbers: float = 0,
        cap_first: bool | None = None,
        case: CaseType = "any",
        rnd: float = 0,
        min_wl: int = 1,
        max_wl: int | None = None,
        wl: int | None = None,
        raise_errors: bool = False,
        **word_kwargs,
    ) -> list[str]:
        """
        Generate a list of words (and optionally numbers) according to the specified
        parameters.

        This method will produce `n_words` tokens, each of which may be a word or a
        number (digit string), depending on the `numbers` ratio. It can also
        automatically handle capitalization of the first token if `cap_first` is True
        (or inferred).

        Args:
            vocab (str | None): Name of the Vocab to use. If None, uses the default
                Vocab.
            glyphs (str | None): Allowed glyph set. If None, uses the default glyphs.
            seed (any): Seed for the random number generator. If None, current state is
                used.
            n_words (int | None): Exact number of tokens to generate. If None, randomly
                choose between `min_n_words` and `max_n_words`.
            min_n_words (int): Minimum number of tokens if `n_words` is not specified.
            max_n_words (int): Maximum number of tokens if `n_words` is not specified.
            numbers (float): A value in [0, 1] that determines the probability of
                generating a numeric token instead of a word.
            cap_first (bool | None): If True, capitalize the first word (if `case` is
                "any"). If None, automatically decide based on glyphs availability.
            case (CaseType): Desired case form for the words ("any", "lc", "uc",
                etc.).
            rnd (float): Randomness factor for word selection, in [0, 1].
            min_wl (int): Minimum length for words/numbers.
            max_wl (int | None): Maximum length for words/numbers. If None, words
                have no maximum and numbers fall back to the default maximum length.
            wl (int | None): Exact length for words/numbers. If None, uses min/max_wl.
            raise_errors (bool): Whether to raise errors or fail gently.
            **word_kwargs: Additional keyword arguments passed along to `word()`.

        Returns:
            list[str]: A list of randomly generated tokens (words or numbers).

        Raises:
            ValueError: If `numbers` is not in [0, 1].
        """
        glyphs = self.glyphs if glyphs is None else glyphs

        if seed is not None:
            self._rand.seed(seed)

        if not n_words:
            n_words = self._rand.randint(min_n_words, max_n_words)

        if cap_first is None:
            if glyphs:
                # If constrained glyphs, only capitalize if uppercase letters exist
                cap_first = any(c for c in glyphs if c.isupper())
            else:
                # Otherwise, default to capitalize the first word
                cap_first = True

        if not (0 <= numbers <= 1):
            raise ValueError("'numbers' must be between 0 and 1")

        word_list = []
        last_w = None
        for i in range(n_words):
            if cap_first and case == "any" and i == 0:
                word_case: CaseType = "cap"
            else:
                word_case = case

            token_type = self._rand.choices(
                ["word", "number"],
                weights=[1 - numbers, numbers],
            )[0]

            if token_type == "word":
                w = self.word(
                    vocab=vocab,
                    glyphs=glyphs,
                    case=word_case,
                    rnd=rnd,
                    min_wl=min_wl,
                    max_wl=max_wl,
                    wl=wl,
                    raise_errors=raise_errors,
                    **word_kwargs,
                )

                # Try once more to avoid consecutive repeats
                # TODO: this is a hack, we should find a better way to avoid consecutive
                # repeats
                if w == last_w:
                    w = self.word(
                        vocab=vocab,
                        glyphs=glyphs,
                        case=word_case,
                        rnd=rnd,
                        min_wl=min_wl,
                        max_wl=max_wl,
                        wl=wl,
                        raise_errors=raise_errors,
                        **word_kwargs,
                    )

                if w:
                    word_list.append(w)
                    last_w = w
            else:
                # token_type == "number"
                w = self.number(
                    glyphs=glyphs,
                    wl=wl,
                    min_wl=min_wl,
                    max_wl=max_wl or _DEFAULT_MAX_NUM_LENGTH,
                    raise_errors=raise_errors,
                )

                if w:
                    word_list.append(w)
                    last_w = w

        return word_list

    def top_words(
        self,
        glyphs: str | None = None,
        vocab: str | None = None,
        n_words: int = 10,
        idx: int = 0,
        case: CaseType = "any",
        min_wl: int = 1,
        max_wl: int | None = None,
        wl: int | None = None,
        contains: str | Sequence[str] | None = None,
        inner: str | Sequence[str] | None = None,
        startswith: str | None = None,
        endswith: str | None = None,
        regexp: str | None = None,
        raise_errors: bool = False,
    ) -> list[str]:
        """
        Retrieve the top `n_words` from the Vocab, starting at index `idx`, subject to
        filtering constraints.

        Args:
            glyphs (str | None): Allowed glyph set. If None, uses default glyphs.
            vocab (str | None): Name of the Vocab to use. If None, use the default
                Vocab.
            n_words (int): Number of words to return.
            idx (int): The index at which to start returning words (0-based).
            case (CaseType): Desired case form ("any", "uc", "lc", etc.).
            min_wl (int): Minimum word length. Defaults to 1.
            max_wl (int | None): Maximum word length. If None, no maximum is applied.
            wl (int | None): Exact word length. If None, no exact length filter.
            contains (str | Sequence[str] | None): Substring(s) that must appear.
            inner (str | Sequence[str] | None): Substring(s) that must appear, not at
                edges.
            startswith (str | None): Required starting substring.
            endswith (str | None): Required ending substring.
            regexp (str | None): Regex pattern that the word must match.
            raise_errors (bool): Whether to raise errors or fail gently.

        Returns:
            list[str]: A list of up to `n_words` words, in descending frequency order.

        Raises:
            FilterError: If filtering fails (no words match) and `raise_errors` is True.
        """
        glyphs = self.glyphs if glyphs is None else glyphs
        vocab_obj = self.get_vocab(self.vocab) if not vocab else self.get_vocab(vocab)

        try:
            wc_list = vocab_obj.filter(
                glyphs=glyphs,
                case=case,
                min_wl=min_wl,
                max_wl=max_wl,
                wl=wl,
                contains=contains,
                inner=inner,
                startswith=startswith,
                endswith=endswith,
                regexp=regexp,
            )[idx : idx + n_words]
        except FilterError as e:
            if raise_errors:
                raise e
            else:
                log.warning("%s", e.args[0])
                return []

        if not wc_list:
            if raise_errors:
                raise FilterError(f"No words found at idx '{idx}'")
            else:
                log.warning("No words found at idx '%s'", idx)
                return []

        return [w for w, _ in wc_list]

    def sent(
        self,
        vocab: str | None = None,
        glyphs: str | None = None,
        seed=None,
        punc: bool = True,
        rnd_punc: float = 0,
        **words_kwargs,
    ) -> str:
        """
        Generate a single sentence, optionally punctuated, using words (and/or numbers).

        A sentence is created by calling `words(...)`, then (optionally) punctuating the
        resulting list.

        Args:
            vocab (str | None): Name of the Vocab to use. If None, use the default Vocab.
            glyphs (str | None): Allowed glyphs. If None, uses default glyphs.
            seed (any): Seed for the random number generator. If None, current state is used.
            punc (bool): Whether to add punctuation to the sentence.
            rnd_punc (float): A randomness factor between 0 and 1 that adjusts the punctuation
                frequency or distribution.
            **words_kwargs: Additional keyword arguments passed to `words(...)`.

        Returns:
            str: A single sentence, optionally with punctuation.

        Raises:
            ValueError: If `rnd_punc` is not in [0, 1].
        """
        glyphs = self.glyphs if glyphs is None else glyphs
        vocab_obj = self.get_vocab(vocab)

        if seed is not None:
            self._rand.seed(seed)

        word_list = self.words(
            glyphs=glyphs,
            vocab=vocab,
            **words_kwargs,
        )

        if punc:
            if not (0 <= rnd_punc <= 1):
                raise ValueError("'rnd_punc' must be between 0 and 1")

            if vocab_obj.punctuation:
                punctuation = vocab_obj.punctuation
            else:
                try:
                    punctuation = DEFAULT_PUNCTUATION[vocab_obj.lang]
                except KeyError:
                    # If no default punctuation is found, return unpunctuated sentence
                    return " ".join(word_list)

            return _punctuate(
                punctuation,
                self._rand,
                word_list,
                glyphs,
                rnd_punc,
            )
        else:
            return " ".join(word_list)

    def sents(
        self,
        seed=None,
        min_n_sents: int = 3,
        max_n_sents: int = 5,
        n_sents: int | None = None,
        **sent_kwargs,
    ) -> list[str]:
        """
        Generate multiple sentences with `sent(...)`, returned as a list.

        Args:
            seed (float | str | None): Seed the random number generator if seed is not
                None.
            min_n_sents (int): Minimum number of sentences to produce if `n_sents` is None.
            max_n_sents (int): Maximum number of sentences to produce if `n_sents` is None.
            n_sents (int | None): If specified, exactly that many sentences are produced.
            **sent_kwargs: Additional keyword arguments passed to `sent(...)`.

        Returns:
            list[str]: A list of generated sentences.
        """
        if seed is not None:
            self._rand.seed(seed)

        if not n_sents:
            n_sents = self._rand.randint(min_n_sents, max_n_sents)

        return [self.sent(**sent_kwargs) for _ in range(n_sents)]

    def para(
        self,
        seed=None,
        sent_sep: str = " ",
        **sents_kwargs,
    ) -> str:
        """
        Generate a paragraph by creating multiple sentences with `sents(...)` and
        joining them with `sent_sep`.

        Args:
            seed (float | str | None): Seed the random number generator if seed is not
                None.
            sent_sep (str): The string used to join sentences.
            **sents_kwargs: Keyword arguments passed to `sents(...)`.

        Returns:
            str: A single paragraph containing multiple sentences.
        """
        if seed is not None:
            self._rand.seed(seed)

        return sent_sep.join(self.sents(**sents_kwargs))

    def paras(
        self,
        seed=None,
        n_paras: int = 3,
        **para_kwargs,
    ) -> list[str]:
        """
        Generate multiple paragraphs with `para(...)`, returned as a list.

        Args:
            seed (float | str | None): Seed the random number generator if seed is not
                None.
            n_paras (int): Number of paragraphs to generate.
            **para_kwargs: Additional keyword arguments passed to `para(...)`.

        Returns:
            list[str]: A list of paragraphs.
        """
        if seed is not None:
            self._rand.seed(seed)

        return [self.para(**para_kwargs) for _ in range(n_paras)]

    def text(
        self,
        seed: float | str | None = None,
        para_sep: str = "\n\n",
        **paras_kwargs,
    ) -> str:
        """
        Generate multiple paragraphs of text, calling `paras(...)` and joining them with
        `para_sep`.

        Args:
            seed (float | str | None): Seed the random number generator if seed is not
                None.
            para_sep (str): The string used to separate paragraphs in the final text.
            **paras_kwargs: Additional keyword arguments passed to `paras(...)`.

        Returns:
            str: A string containing multiple paragraphs of text, separated by
                `para_sep`.
        """
        if seed is not None:
            self._rand.seed(seed)

        return para_sep.join(self.paras(**paras_kwargs))

add_vocab

add_vocab(vocab_name, vocab)

Add a Vocab object to this WordSiv instance under a given name.

Parameters:

Name	Type	Description	Default
`vocab_name`	`str`	The unique identifier for this Vocab.	required
`vocab`	`Vocab`	The `Vocab` object to be associated with `vocab_name`.	required

Returns:

Type	Description
`None`	None

Source code in wordsiv/__init__.py

def add_vocab(self, vocab_name: str, vocab: Vocab) -> None:
    """
    Add a `Vocab` object to this `WordSiv` instance under a given name.

    Args:
        vocab_name (str): The unique identifier for this Vocab.
        vocab (Vocab): The `Vocab` object to be associated with `vocab_name`.

    Returns:
        None
    """
    self._vocab_lookup[vocab_name] = vocab

get_vocab

get_vocab(vocab_name=None)

Retrieve a Vocab by name, or return the default Vocab if vocab_name is None.

Parameters:

Name	Type	Description	Default
`vocab_name`	`str \| None`	The name of the Vocab to retrieve. If None, use the default `self.vocab`.	`None`

Returns:

Name	Type	Description
`Vocab`	`Vocab`	The `Vocab` object corresponding to `vocab_name`.

Raises:

Type	Description
`ValueError`	If no `vocab_name` is provided and no default is set.

Source code in wordsiv/__init__.py

def get_vocab(self, vocab_name: str | None = None) -> Vocab:
    """
    Retrieve a `Vocab` by name, or return the default Vocab if `vocab_name` is None.

    Args:
        vocab_name (str | None): The name of the Vocab to retrieve. If None,
            use the default `self.vocab`.

    Returns:
        Vocab: The `Vocab` object corresponding to `vocab_name`.

    Raises:
        ValueError: If no `vocab_name` is provided and no default is set.
    """
    if vocab_name:
        return self._vocab_lookup[vocab_name]
    else:
        if self.vocab:
            return self._vocab_lookup[self.vocab]
        else:
            raise ValueError("Error: no vocab specified")

list_vocabs

list_vocabs()

Return a list of all available Vocab names.

Returns:

Type	Description
`list[str]`	A list of all registered Vocab names in this `WordSiv`.

Source code in wordsiv/__init__.py

def list_vocabs(self) -> list[str]:
    """
    Return a list of all available Vocab names.

    Returns:
        list[str]: A list of all registered Vocab names in this `WordSiv`.
    """
    return list(self._vocab_lookup.keys())
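
The three registry methods above (`add_vocab`, `get_vocab`, `list_vocabs`) amount to a dictionary with a default-name fallback. The following is a minimal standalone sketch of that behavior, not the library's class: `VocabRegistry`, its method names, and the placeholder vocab values are hypothetical and exist only for illustration.

```python
from __future__ import annotations


class VocabRegistry:
    """Hypothetical stand-in for the vocab bookkeeping shown above."""

    def __init__(self, default: str | None = None) -> None:
        self.default = default  # plays the role of `self.vocab`
        self._lookup: dict[str, object] = {}

    def add(self, name: str, vocab: object) -> None:
        # Registering under an existing name replaces the earlier entry.
        self._lookup[name] = vocab

    def get(self, name: str | None = None) -> object:
        # An explicit name wins; otherwise fall back to the default name.
        if name:
            return self._lookup[name]
        if self.default:
            return self._lookup[self.default]
        raise ValueError("Error: no vocab specified")

    def names(self) -> list[str]:
        return list(self._lookup.keys())


registry = VocabRegistry(default="en")
registry.add("en", {"lang": "en"})      # placeholder objects, not real Vocabs
registry.add("custom", {"lang": "en"})

print(registry.names())  # ['en', 'custom']
print(registry.get())    # the entry registered under the default name
```

Because the lookup is a plain dictionary access, asking for a name that was never registered surfaces as a `KeyError`; the `ValueError` is reserved for the case where neither a name nor a default is available.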

number

number(
    seed=None,
    glyphs=None,
    wl=None,
    min_wl=1,
    max_wl=_DEFAULT_MAX_NUM_LENGTH,
    raise_errors=False,
)

Generate a random numeric string (made of digits) constrained by glyphs and other parameters.

Parameters:

Name	Type	Description	Default
`seed`	`float \| str \| None`	Seed the random number generator if seed is not None.	`None`
`glyphs`	`str \| None`	A string of allowed glyphs. If None, uses the default glyphs of this `WordSiv` instance.	`None`
`wl`	`int \| None`	Exact length of the generated numeric string. If None, a random length between `min_wl` and `max_wl` is chosen.	`None`
`min_wl`	`int`	Minimum length of the numeric string. Defaults to 1.	`1`
`max_wl`	`int`	Maximum length of the numeric string. Defaults to 4.	`_DEFAULT_MAX_NUM_LENGTH`
`raise_errors`	`bool`	Whether to raise an error if no numerals are available.	`False`

Returns:

Name	Type	Description
`str`	`str`	A randomly generated string consisting of numerals.

Raises:

Type	Description
`ValueError`	If `min_wl` is greater than `max_wl`.
`FilterError`	If no numerals are available in `glyphs` and `raise_errors` is True.

Source code in wordsiv/__init__.py

def number(
    self,
    seed: float | str | None = None,
    glyphs: str | None = None,
    wl: int | None = None,
    min_wl: int = 1,
    max_wl: int = _DEFAULT_MAX_NUM_LENGTH,
    raise_errors: bool = False,
) -> str:
    """
    Generate a random numeric string (made of digits) constrained by glyphs and
    other parameters.

    Args:
        seed (float | str | None): Seed the random number generator if seed is not
            None.
        glyphs (str | None): A string of allowed glyphs. If None, uses the default
            glyphs of this `WordSiv` instance.
        wl (int | None): Exact length of the generated numeric string. If None, a
            random length between `min_wl` and `max_wl` is chosen.
        min_wl (int): Minimum length of the numeric string. Defaults to 1.
        max_wl (int): Maximum length of the numeric string. Defaults to 4.
        raise_errors (bool): Whether to raise an error if no numerals are available.

    Returns:
        str: A randomly generated string consisting of numerals.

    Raises:
        ValueError: If `min_wl` is greater than `max_wl`.
        FilterError: If no numerals are available in `glyphs` and `raise_errors` is
            True.
    """
    glyphs = self.glyphs if glyphs is None else glyphs
    raise_errors = self.raise_errors if raise_errors is None else raise_errors

    if seed is not None:
        self._rand.seed(seed)

    if wl:
        length = wl
    else:
        if min_wl > max_wl:
            raise ValueError("'min_wl' must be less than or equal to 'max_wl'")
        length = self._rand.randint(min_wl, max_wl)

    available_numerals = "".join(str(n) for n in range(0, 10))
    if glyphs:
        available_numerals = "".join(n for n in available_numerals if n in glyphs)

        if not available_numerals:
            if raise_errors:
                raise FilterError("No numerals available in glyphs")
            else:
                log.warning("No numerals available in glyphs")
                return ""

    return "".join(self._rand.choice(available_numerals) for _ in range(length))
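
The digit-filtering step in the source above can be tried in isolation. This is a simplified sketch under stated assumptions: `sketch_number` is a hypothetical helper, it takes an already chosen `length`, and it returns an empty string instead of logging or raising when no digits survive the filter.

```python
from __future__ import annotations

import random


def sketch_number(rand: random.Random, glyphs: str | None, length: int) -> str:
    """Mimic the numeral filtering of `number`; not the library function."""
    numerals = "0123456789"
    if glyphs:
        # Keep only the digits that appear in the allowed glyph set.
        numerals = "".join(n for n in numerals if n in glyphs)
        if not numerals:
            return ""  # the "fail gently" path
    return "".join(rand.choice(numerals) for _ in range(length))


rand = random.Random(7)
some_digits = sketch_number(rand, "abc013", 5)  # five characters drawn from 0, 1, 3
no_digits = sketch_number(rand, "abc", 5)       # '' because the glyphs hold no digits
print(some_digits, repr(no_digits))
```

Letters in `glyphs` are simply ignored here; only the digits it contains decide what can be generated.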

para

para(seed=None, sent_sep=' ', **sents_kwargs)

Generate a paragraph by creating multiple sentences with sents(...) and joining them with sent_sep.

Parameters:

Name	Type	Description	Default
`seed`	`float \| str \| None`	Seed the random number generator if seed is not None.	`None`
`sent_sep`	`str`	The string used to join sentences.	`' '`
`**sents_kwargs`		Keyword arguments passed to `sents(...)`.	`{}`

Returns:

Name	Type	Description
`str`	`str`	A single paragraph containing multiple sentences.

Source code in wordsiv/__init__.py

def para(
    self,
    seed=None,
    sent_sep: str = " ",
    **sents_kwargs,
) -> str:
    """
    Generate a paragraph by creating multiple sentences with `sents(...)` and
    joining them with `sent_sep`.

    Args:
        seed (float | str | None): Seed the random number generator if seed is not
            None.
        sent_sep (str): The string used to join sentences.
        **sents_kwargs: Keyword arguments passed to `sents(...)`.

    Returns:
        str: A single paragraph containing multiple sentences.
    """
    if seed is not None:
        self._rand.seed(seed)

    return sent_sep.join(self.sents(**sents_kwargs))

paras

paras(seed=None, n_paras=3, **para_kwargs)

Generate multiple paragraphs with para(...), returned as a list.

Parameters:

Name	Type	Description	Default
`seed`	`float \| str \| None`	Seed the random number generator if seed is not None.	`None`
`n_paras`	`int`	Number of paragraphs to generate.	`3`
`**para_kwargs`		Additional keyword arguments passed to `para(...)`.	`{}`

Returns:

Type	Description
`list[str]`	A list of paragraphs.

Source code in wordsiv/__init__.py

def paras(
    self,
    seed=None,
    n_paras: int = 3,
    **para_kwargs,
) -> list[str]:
    """
    Generate multiple paragraphs with `para(...)`, returned as a list.

    Args:
        seed (float | str | None): Seed the random number generator if seed is not
            None.
        n_paras (int): Number of paragraphs to generate.
        **para_kwargs: Additional keyword arguments passed to `para(...)`.

    Returns:
        list[str]: A list of paragraphs.
    """
    if seed is not None:
        self._rand.seed(seed)

    return [self.para(**para_kwargs) for _ in range(n_paras)]

seed

seed(seed=None)

Seed the random number generator for reproducible results.

Parameters:

Name	Type	Description	Default
`seed`	`float \| str \| None`	The seed value used to initialize the random number generator.	`None`

Returns:

Type	Description
`None`	None

Source code in wordsiv/__init__.py

def seed(self, seed: float | str | None = None) -> None:
    """
    Seed the random number generator for reproducible results.

    Args:
        seed (float | str | None): The seed value used to initialize the random
            number generator.

    Returns:
        None
    """
    self._rand.seed(seed)
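
Seeding matters because every generating method draws from one shared generator. The sketch below uses a plain `random.Random` as a stand-in for that private generator (an assumption for illustration) to show that reseeding with the same value replays the same sequence of draws.

```python
import random

rand = random.Random()  # stand-in for the instance's private generator

rand.seed(123)
first = [rand.randint(3, 5) for _ in range(4)]

rand.seed(123)  # reseeding with the same value rewinds the sequence
second = [rand.randint(3, 5) for _ in range(4)]

print(first == second)  # True
```

Without a reseed, later draws continue from wherever the generator was left, which is why the `seed` arguments in this reference only act when they are not `None`.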

sent

sent(
    vocab=None,
    glyphs=None,
    seed=None,
    punc=True,
    rnd_punc=0,
    **words_kwargs,
)

Generate a single sentence, optionally punctuated, using words (and/or numbers).

A sentence is created by calling words(...), then (optionally) punctuating the resulting list.

Parameters:

Name	Type	Description	Default
`vocab`	`str \| None`	Name of the Vocab to use. If None, use the default Vocab.	`None`
`glyphs`	`str \| None`	Allowed glyphs. If None, uses default glyphs.	`None`
`seed`	`any`	Seed for the random number generator. If None, current state is used.	`None`
`punc`	`bool`	Whether to add punctuation to the sentence.	`True`
`rnd_punc`	`float`	A randomness factor between 0 and 1 that adjusts the punctuation frequency or distribution.	`0`
`**words_kwargs`		Additional keyword arguments passed to `words(...)`.	`{}`

Returns:

Name	Type	Description
`str`	`str`	A single sentence, optionally with punctuation.

Raises:

Type	Description
`ValueError`	If `rnd_punc` is not in [0, 1].

Source code in wordsiv/__init__.py

def sent(
    self,
    vocab: str | None = None,
    glyphs: str | None = None,
    seed=None,
    punc: bool = True,
    rnd_punc: float = 0,
    **words_kwargs,
) -> str:
    """
    Generate a single sentence, optionally punctuated, using words (and/or numbers).

    A sentence is created by calling `words(...)`, then (optionally) punctuating the
    resulting list.

    Args:
        vocab (str | None): Name of the Vocab to use. If None, use the default Vocab.
        glyphs (str | None): Allowed glyphs. If None, uses default glyphs.
        seed (any): Seed for the random number generator. If None, current state is used.
        punc (bool): Whether to add punctuation to the sentence.
        rnd_punc (float): A randomness factor between 0 and 1 that adjusts the punctuation
            frequency or distribution.
        **words_kwargs: Additional keyword arguments passed to `words(...)`.

    Returns:
        str: A single sentence, optionally with punctuation.

    Raises:
        ValueError: If `rnd_punc` is not in [0, 1].
    """
    glyphs = self.glyphs if glyphs is None else glyphs
    vocab_obj = self.get_vocab(vocab)

    if seed is not None:
        self._rand.seed(seed)

    word_list = self.words(
        glyphs=glyphs,
        vocab=vocab,
        **words_kwargs,
    )

    if punc:
        if not (0 <= rnd_punc <= 1):
            raise ValueError("'rnd_punc' must be between 0 and 1")

        if vocab_obj.punctuation:
            punctuation = vocab_obj.punctuation
        else:
            try:
                punctuation = DEFAULT_PUNCTUATION[vocab_obj.lang]
            except KeyError:
                # If no default punctuation is found, return unpunctuated sentence
                return " ".join(word_list)

        return _punctuate(
            punctuation,
            self._rand,
            word_list,
            glyphs,
            rnd_punc,
        )
    else:
        return " ".join(word_list)
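
The order in which `sent` looks for punctuation rules (the Vocab's own, then a per-language default, then none at all) can be sketched on its own. Everything here is a toy: the `DEFAULTS` table, the `"end"` key, and both helper functions are hypothetical, and the real punctuation data and `_punctuate` logic are not shown in this reference.

```python
from __future__ import annotations

# Hypothetical per-language defaults; the real table's contents are not shown here.
DEFAULTS: dict[str, dict[str, str]] = {"en": {"end": "."}}


def pick_punctuation(vocab_punc: dict | None, lang: str) -> dict | None:
    """Prefer the vocab's own rules, then the language default, else None."""
    if vocab_punc:
        return vocab_punc
    return DEFAULTS.get(lang)  # None means "leave the sentence unpunctuated"


def finish(words: list[str], punctuation: dict | None) -> str:
    sentence = " ".join(words)
    if punctuation is None:
        return sentence
    return sentence + punctuation["end"]


words = ["some", "words"]
print(finish(words, pick_punctuation({"end": "!"}, "en")))  # some words!
print(finish(words, pick_punctuation(None, "en")))          # some words.
print(finish(words, pick_punctuation(None, "xx")))          # some words
```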

sents

sents(
    seed=None,
    min_n_sents=3,
    max_n_sents=5,
    n_sents=None,
    **sent_kwargs,
)

Generate multiple sentences with sent(...), returned as a list.

Parameters:

Name	Type	Description	Default
`seed`	`float \| str \| None`	Seed the random number generator if seed is not None.	`None`
`min_n_sents`	`int`	Minimum number of sentences to produce if `n_sents` is None.	`3`
`max_n_sents`	`int`	Maximum number of sentences to produce if `n_sents` is None.	`5`
`n_sents`	`int \| None`	If specified, exactly that many sentences are produced.	`None`
`**sent_kwargs`		Additional keyword arguments passed to `sent(...)`.	`{}`

Returns:

Type	Description
`list[str]`	A list of generated sentences.

Source code in wordsiv/__init__.py

def sents(
    self,
    seed=None,
    min_n_sents: int = 3,
    max_n_sents: int = 5,
    n_sents: int | None = None,
    **sent_kwargs,
) -> list[str]:
    """
    Generate multiple sentences with `sent(...)`, returned as a list.

    Args:
        seed (float | str | None): Seed the random number generator if seed is not
            None.
        min_n_sents (int): Minimum number of sentences to produce if `n_sents` is None.
        max_n_sents (int): Maximum number of sentences to produce if `n_sents` is None.
        n_sents (int | None): If specified, exactly that many sentences are produced.
        **sent_kwargs: Additional keyword arguments passed to `sent(...)`.

    Returns:
        list[str]: A list of generated sentences.
    """
    if seed is not None:
        self._rand.seed(seed)

    if not n_sents:
        n_sents = self._rand.randint(min_n_sents, max_n_sents)

    return [self.sent(**sent_kwargs) for _ in range(n_sents)]
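
One detail of the source above is easy to miss: the check is `if not n_sents`, so `0` is treated the same as `None`. The sketch below isolates that choice; `pick_n_sents` is a hypothetical helper written for illustration, not part of the library.

```python
from __future__ import annotations

import random


def pick_n_sents(
    rand: random.Random,
    n_sents: int | None,
    min_n_sents: int = 3,
    max_n_sents: int = 5,
) -> int:
    # Mirrors `if not n_sents:` from the source: None and 0 both count as unset.
    if not n_sents:
        return rand.randint(min_n_sents, max_n_sents)
    return n_sents


rand = random.Random(0)
exact = pick_n_sents(rand, 2)        # 2, the explicit count is used as-is
unset = pick_n_sents(rand, None)     # a random count between 3 and 5 inclusive
zero = pick_n_sents(rand, 0)         # also between 3 and 5, because 0 is falsy
print(exact, unset, zero)
```

Under this reading, asking for zero sentences yields a random count rather than an empty list.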

text

text(seed=None, para_sep='\n\n', **paras_kwargs)

Generate multiple paragraphs of text, calling paras(...) and joining them with para_sep.

Parameters:

Name	Type	Description	Default
`seed`	`float \| str \| None`	Seed the random number generator if seed is not None.	`None`
`para_sep`	`str`	The string used to separate paragraphs in the final text.	`'\n\n'`
`**paras_kwargs`		Additional keyword arguments passed to `paras(...)`.	`{}`

Returns:

Name	Type	Description
`str`	`str`	A string containing multiple paragraphs of text, separated by `para_sep`.

Source code in wordsiv/__init__.py

def text(
    self,
    seed: float | str | None = None,
    para_sep: str = "\n\n",
    **paras_kwargs,
) -> str:
    """
    Generate multiple paragraphs of text, calling `paras(...)` and joining them with
    `para_sep`.

    Args:
        seed (float | str | None): Seed the random number generator if seed is not
            None.
        para_sep (str): The string used to separate paragraphs in the final text.
        **paras_kwargs: Additional keyword arguments passed to `paras(...)`.

    Returns:
        str: A string containing multiple paragraphs of text, separated by
            `para_sep`.
    """
    if seed is not None:
        self._rand.seed(seed)

    return para_sep.join(self.paras(**paras_kwargs))
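
Taken together, `text`, `paras`, `para`, and `sents` form a chain in which each level forwards its extra keyword arguments to the next and joins what comes back. The joining part can be sketched with already generated sentences; `sketch_text` and the sample strings are hypothetical and stand in for real output.

```python
def sketch_text(paragraphs, sent_sep=" ", para_sep="\n\n"):
    """Join nested sentence lists the way para() and text() join their parts."""
    return para_sep.join(sent_sep.join(sents) for sents in paragraphs)


paragraphs = [["One two.", "Three four."], ["Five six."]]
joined = sketch_text(paragraphs)
print(joined)
```

Since `seed` is a named parameter at every level, it is consumed where it is passed and does not travel down the chain with the other keyword arguments; the shared generator state carries over instead.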

top_word

top_word(
    vocab=None,
    glyphs=None,
    seed=None,
    idx=0,
    case="any",
    min_wl=2,
    max_wl=None,
    wl=None,
    contains=None,
    inner=None,
    startswith=None,
    endswith=None,
    regexp=None,
    raise_errors=False,
)

Retrieve the most common (or nth most common) word from the Vocab, subject to filtering constraints.

Parameters:

Name	Type	Description	Default
`vocab`	`str \| None`	Name of the Vocab to use. If None, use the default Vocab.	`None`
`glyphs`	`str \| None`	Whitelisted glyphs to filter words. If None, uses default.	`None`
`seed`	`float \| str \| None`	Seed the random number generator if seed is not None.	`None`
`idx`	`int`	Index of the desired word in the frequency-sorted list (0-based).	`0`
`case`	`CaseType`	Desired case form for the word (e.g., 'lc', 'uc', 'any').	`'any'`
`min_wl`	`int`	Minimum word length.	`2`
`max_wl`	`int \| None`	Maximum word length. If None, no maximum.	`None`
`wl`	`int \| None`	Exact word length. If None, no exact length filter.	`None`
`contains`	`str \| Sequence[str] \| None`	Substring(s) that must appear in the word.	`None`
`inner`	`str \| Sequence[str] \| None`	Substring(s) that must appear in the interior.	`None`
`startswith`	`str \| None`	Substring that the word must start with.	`None`
`endswith`	`str \| None`	Substring that the word must end with.	`None`
`regexp`	`str \| None`	Regex pattern that the word must match.	`None`
`raise_errors`	`bool`	Whether to raise errors on filter or index failures.	`False`

Returns:

Name	Type	Description
`str`	`str`	The nth most common word that meets the constraints (or an empty string on failure if `raise_errors` is False).

Raises:

Type	Description
`FilterError`	If filtering fails (no words match) or `idx` is out of range after filtering, and `raise_errors` is True.
`ValueError`	If no default vocab is set when `vocab` is None.

Source code in wordsiv/__init__.py

def top_word(
    self,
    vocab: str | None = None,
    glyphs: str | None = None,
    seed: float | str | None = None,
    idx: int = 0,
    case: CaseType = "any",
    min_wl: int = 2,
    max_wl: int | None = None,
    wl: int | None = None,
    contains: str | Sequence[str] | None = None,
    inner: str | Sequence[str] | None = None,
    startswith: str | None = None,
    endswith: str | None = None,
    regexp: str | None = None,
    raise_errors: bool = False,
) -> str:
    """
    Retrieve the most common (or nth most common) word from the Vocab, subject to
    filtering constraints.

    Args:
        vocab (str | None): Name of the Vocab to use. If None, use the default
            Vocab.
        glyphs (str | None): Whitelisted glyphs to filter words. If None, uses
            default.
        seed (float | str | None): Seed the random number generator if seed is
            not None.
        idx (int): Index of the desired word in the frequency-sorted list
            (0-based).
        case (CaseType): Desired case form for the word (e.g., 'lc', 'uc',
            'any').
        min_wl (int): Minimum word length.
        max_wl (int | None): Maximum word length. If None, no maximum.
        wl (int | None): Exact word length. If None, no exact length filter.
        contains (str | Sequence[str] | None): Substring(s) that must appear in
            the word.
        inner (str | Sequence[str] | None): Substring(s) that must appear in the
            interior.
        startswith (str | None): Substring that the word must start with.
        endswith (str | None): Substring that the word must end with.
        regexp (str | None): Regex pattern that the word must match.
        raise_errors (bool): Whether to raise errors on filter or index failures.

    Returns:
        str: The nth most common word that meets the constraints (or an empty string
            on failure if `raise_errors` is False).

    Raises:
        FilterError: If filtering fails (no words match) or `idx` is out of range
            after filtering, and `raise_errors` is True.
        ValueError: If no default vocab is set when `vocab` is None.
    """
    glyphs = self.glyphs if glyphs is None else glyphs
    raise_errors = self.raise_errors if raise_errors is None else raise_errors
    vocab_obj = self.get_vocab(self.vocab) if not vocab else self.get_vocab(vocab)

    try:
        wc_list = vocab_obj.filter(
            glyphs=glyphs,
            case=case,
            min_wl=min_wl,
            max_wl=max_wl,
            wl=wl,
            contains=contains,
            inner=inner,
            startswith=startswith,
            endswith=endswith,
            regexp=regexp,
        )
    except FilterError as e:
        if raise_errors:
            raise e
        else:
            log.warning("%s", e.args[0])
            return ""

    try:
        return wc_list[idx][0]
    except IndexError:
        if raise_errors:
            raise FilterError(f"No word at index idx='{idx}'")
        else:
            log.warning("No word at index idx='%s'", idx)
            return ""
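
The index lookup at the end of the source above follows a "fail gently unless told otherwise" pattern. Here is a minimal sketch of it, assuming the filtered result is a frequency-sorted sequence of `(word, count)` pairs; `nth_word`, the local `FilterError` class, and the sample counts are hypothetical stand-ins.

```python
import logging

log = logging.getLogger("sketch")


class FilterError(Exception):
    """Local stand-in for the library's exception of the same name."""


def nth_word(wc_list, idx=0, raise_errors=False):
    try:
        return wc_list[idx][0]
    except IndexError:
        if raise_errors:
            raise FilterError(f"No word at index idx='{idx}'")
        log.warning("No word at index idx='%s'", idx)
        return ""


wc = [("the", 100), ("of", 60), ("and", 40)]
print(nth_word(wc))     # the
print(nth_word(wc, 2))  # and
print(repr(nth_word(wc, 9)))  # '' plus a logged warning
```

Because the lookup is ordinary Python indexing, a negative `idx` would count from the end of the sequence rather than fail.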

top_words

top_words(
    glyphs=None,
    vocab=None,
    n_words=10,
    idx=0,
    case="any",
    min_wl=1,
    max_wl=None,
    wl=None,
    contains=None,
    inner=None,
    startswith=None,
    endswith=None,
    regexp=None,
    raise_errors=False,
)

Retrieve the top n_words from the Vocab, starting at index idx, subject to filtering constraints.

Parameters:

Name	Type	Description	Default
`glyphs`	`str \| None`	Allowed glyph set. If None, uses default glyphs.	`None`
`vocab`	`str \| None`	Name of the Vocab to use. If None, use the default Vocab.	`None`
`n_words`	`int`	Number of words to return.	`10`
`idx`	`int`	The index at which to start returning words (0-based).	`0`
`case`	`CaseType`	Desired case form ("any", "uc", "lc", etc.).	`'any'`
`min_wl`	`int`	Minimum word length. Defaults to 1.	`1`
`max_wl`	`int \| None`	Maximum word length. If None, no maximum is applied.	`None`
`wl`	`int \| None`	Exact word length. If None, no exact length filter.	`None`
`contains`	`str \| Sequence[str] \| None`	Substring(s) that must appear.	`None`
`inner`	`str \| Sequence[str] \| None`	Substring(s) that must appear, not at edges.	`None`
`startswith`	`str \| None`	Required starting substring.	`None`
`endswith`	`str \| None`	Required ending substring.	`None`
`regexp`	`str \| None`	Regex pattern(s) to match.	`None`
`raise_errors`	`bool`	Whether to raise errors or fail gently.	`False`

Returns:

Type	Description
`list[str]`	A list of up to `n_words` words, in descending frequency order.

Raises:

Type	Description
`FilterError`	If filtering fails (no words match) or no words remain at `idx`, and `raise_errors` is True.

Source code in wordsiv/__init__.py

def top_words(
    self,
    glyphs: str | None = None,
    vocab: str | None = None,
    n_words: int = 10,
    idx: int = 0,
    case: CaseType = "any",
    min_wl: int = 1,
    max_wl: int | None = None,
    wl: int | None = None,
    contains: str | Sequence[str] | None = None,
    inner: str | Sequence[str] | None = None,
    startswith: str | None = None,
    endswith: str | None = None,
    regexp: str | None = None,
    raise_errors: bool = False,
) -> list[str]:
    """
    Retrieve the top `n_words` from the Vocab, starting at index `idx`, subject to
    filtering constraints.

    Args:
        glyphs (str | None): Allowed glyph set. If None, uses default glyphs.
        vocab (str | None): Name of the Vocab to use. If None, use the default
            Vocab.
        n_words (int): Number of words to return.
        idx (int): The index at which to start returning words (0-based).
        case (CaseType): Desired case form ("any", "uc", "lc", etc.).
        min_wl (int): Minimum word length. Defaults to 1.
        max_wl (int | None): Maximum word length. If None, no maximum is applied.
        wl (int | None): Exact word length. If None, no exact length filter.
        contains (str | Sequence[str] | None): Substring(s) that must appear.
        inner (str | Sequence[str] | None): Substring(s) that must appear, not at
            edges.
        startswith (str | None): Required starting substring.
        endswith (str | None): Required ending substring.
        regexp (str | None): Regex pattern(s) to match.
        raise_errors (bool): Whether to raise errors or fail gently.

    Returns:
        list[str]: A list of up to `n_words` words, in descending frequency order.

    Raises:
        FilterError: If filtering fails (no words match) or no words remain at
            `idx`, and `raise_errors` is True.
    """
    glyphs = self.glyphs if glyphs is None else glyphs
    vocab_obj = self.get_vocab(self.vocab) if not vocab else self.get_vocab(vocab)

    try:
        wc_list = vocab_obj.filter(
            glyphs=glyphs,
            case=case,
            min_wl=min_wl,
            max_wl=max_wl,
            wl=wl,
            contains=contains,
            inner=inner,
            startswith=startswith,
            endswith=endswith,
            regexp=regexp,
        )[idx : idx + n_words]
    except FilterError as e:
        if raise_errors:
            raise e
        else:
            log.warning("%s", e.args[0])
            return []

    if not wc_list:
        if raise_errors:
            raise FilterError(f"No words found at idx '{idx}'")
        else:
            log.warning("No words found at idx '%s'", idx)
            return []

    return [w for w, _ in wc_list]
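
The windowing in the source above is a plain slice of the frequency-sorted result, which explains why the return value holds "up to" `n_words` entries. This is a small sketch with made-up data; `top_slice` and the sample counts are hypothetical.

```python
def top_slice(wc_list, n_words=10, idx=0):
    """Take a window of (word, count) pairs and keep only the words."""
    window = wc_list[idx : idx + n_words]
    return [w for w, _ in window]


wc = [("the", 100), ("of", 60), ("and", 40), ("to", 30)]
print(top_slice(wc, n_words=2))          # ['the', 'of']
print(top_slice(wc, n_words=3, idx=2))   # ['and', 'to'], fewer than requested
print(top_slice(wc, n_words=2, idx=10))  # []
```

Slicing past the end never raises, so an out-of-range `idx` only shows up as an empty window; that empty case is what the source turns into a warning or a `FilterError`.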

word

word(
    vocab=None,
    glyphs=None,
    seed=None,
    rnd=0,
    case="any",
    top_k=0,
    min_wl=1,
    max_wl=None,
    wl=None,
    contains=None,
    inner=None,
    startswith=None,
    endswith=None,
    regexp=None,
    raise_errors=False,
)

Generate a random word that meets a variety of constraints, such as glyphs, length, regex filters, etc.

Parameters:

Name	Type	Description	Default
`vocab`	`str \| None`	Name of the Vocab to use. If None, uses default Vocab.	`None`
`glyphs`	`str \| None`	A string of allowed glyphs. If None, uses default glyphs.	`None`
`seed`	`float \| str \| None`	Seed the random number generator if seed is not None.	`None`
`rnd`	`float`	Randomness factor in [0, 1] for selecting among the top words.	`0`
`case`	`CaseType`	Desired case of the output word (e.g., 'uc', 'lc', 'any').	`'any'`
`top_k`	`int`	If > 0, only consider the top K words by frequency.	`0`
`min_wl`	`int`	Minimum word length.	`1`
`max_wl`	`int \| None`	Maximum word length. If None, no maximum is applied.	`None`
`wl`	`int \| None`	Exact word length. If None, no exact length is enforced.	`None`
`contains`	`str \| Sequence[str] \| None`	Substring(s) that must appear in the word.	`None`
`inner`	`str \| Sequence[str] \| None`	Substring(s) that must appear, but not at the start or end of the word.	`None`
`startswith`	`str \| None`	Required starting substring.	`None`
`endswith`	`str \| None`	Required ending substring.	`None`
`regexp`	`str \| None`	A regular expression that the word must match.	`None`
`raise_errors`	`bool`	Whether to raise filtering errors or fail gently.	`False`

Returns:

Name	Type	Description
`str`	`str`	A randomly generated word meeting the specified constraints (or an empty string on failure if `raise_errors` is False).

Raises:

Type	Description
`ValueError`	If `rnd` is not in [0, 1].
`FilterError`	If filtering yields no results and `raise_errors` is True.
`VocabFormatError`	If the underlying Vocab data is malformed.
`VocabEmptyError`	If the underlying Vocab is empty.

Source code in wordsiv/__init__.py

def word(
    self,
    vocab: str | None = None,
    glyphs: str | None = None,
    seed: float | str | None = None,
    rnd: float = 0,
    case: CaseType = "any",
    top_k: int = 0,
    min_wl: int = 1,
    max_wl: int | None = None,
    wl: int | None = None,
    contains: str | Sequence[str] | None = None,
    inner: str | Sequence[str] | None = None,
    startswith: str | None = None,
    endswith: str | None = None,
    regexp: str | None = None,
    raise_errors: bool = False,
) -> str:
    """
    Generate a random word that meets a variety of constraints, such as glyphs,
    length, regex filters, etc.

    Args:
        vocab (str | None): Name of the Vocab to use. If None, uses default Vocab.
        glyphs (str | None): A string of allowed glyphs. If None, uses default
            glyphs.
        seed (float | str | None): Seed the random number generator if seed is not
            None.
        rnd (float): Randomness factor in [0, 1] for selecting among the top words.
        case (CaseType): Desired case of the output word (e.g., 'any', 'lc',
            'uc').
        top_k (int): If > 0, only consider the top K words by frequency.
        min_wl (int): Minimum word length.
        max_wl (int | None): Maximum word length. If None, no maximum is applied.
        wl (int | None): Exact word length. If None, no exact length is enforced.
        contains (str | Sequence[str] | None): Substring(s) that must appear in the
            word.
        inner (str | Sequence[str] | None): Substring(s) that must appear, but not
            at the start or end of the word.
        startswith (str | None): Required starting substring.
        endswith (str | None): Required ending substring.
        regexp (str | None): A regular expression that the word must match.
        raise_errors (bool): Whether to raise filtering errors or fail gently.

    Returns:
        str: A randomly generated word meeting the specified constraints (or an
            empty string on failure if `raise_errors` is False).

    Raises:
        ValueError: If `rnd` is not in [0, 1].
        FilterError: If filtering yields no results and `raise_errors` is True.
        VocabFormatError: If the underlying Vocab data is malformed.
        VocabEmptyError: If the underlying Vocab is empty.
    """
    glyphs = self.glyphs if glyphs is None else glyphs
    raise_errors = self.raise_errors if raise_errors is None else raise_errors
    vocab_obj = self.get_vocab(vocab)

    if not (0 <= rnd <= 1):
        raise ValueError("'rnd' must be between 0 and 1")

    if seed is not None:
        self._rand.seed(seed)

    try:
        wc_list = vocab_obj.filter(
            glyphs=glyphs,
            case=case,
            min_wl=min_wl,
            max_wl=max_wl,
            wl=wl,
            contains=contains,
            inner=inner,
            startswith=startswith,
            endswith=endswith,
            regexp=regexp,
        )
    except FilterError as e:
        if raise_errors:
            raise e
        else:
            log.warning("%s", e.args[0])
            return ""

    if top_k:
        wc_list = wc_list[:top_k]

    return _sample_word(wc_list, self._rand, rnd)

words

words(
    vocab=None,
    glyphs=None,
    seed=None,
    n_words=None,
    min_n_words=10,
    max_n_words=20,
    numbers=0,
    cap_first=None,
    case="any",
    rnd=0,
    min_wl=1,
    max_wl=None,
    wl=None,
    raise_errors=False,
    **word_kwargs,
)

Generate a list of words (and optionally numbers) according to the specified parameters.

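This method produces up to `n_words` tokens, each of which may be a word or a number (digit string), depending on the `numbers` probability. Empty results (for example, a word that fails filtering when `raise_errors` is False) are skipped, so the list can be shorter than `n_words`. When `case` is "any", the first token, if it is a word, is capitalized when `cap_first` is True or when `cap_first` is None and capitalization is inferred from the available glyphs.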
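
The two decisions described above, inferring `cap_first` and choosing between a word and a number for each token, can be sketched in isolation. The function names `infer_cap_first` and `token_types` are hypothetical; the logic follows the `words` source listing on this page, using only the standard library.

```python
from __future__ import annotations

import random


def infer_cap_first(glyphs: str | None) -> bool:
    """Decide whether to capitalize the first word when cap_first is None."""
    if glyphs:
        # With a constrained glyph set, capitalize only if it has uppercase letters.
        return any(c.isupper() for c in glyphs)
    # Without a glyph constraint, default to capitalizing.
    return True


def token_types(n: int, numbers: float, rng: random.Random) -> list[str]:
    """Draw n token types, picking "number" with probability `numbers`."""
    if not (0 <= numbers <= 1):
        raise ValueError("'numbers' must be between 0 and 1")
    return [
        rng.choices(["word", "number"], weights=[1 - numbers, numbers])[0]
        for _ in range(n)
    ]


rng = random.Random(42)
print(infer_cap_first("hamburgefonstiv"))  # lowercase-only glyph set
print(token_types(5, 0, rng))              # numbers=0 never yields "number"
```

Because each token is drawn independently, `numbers=0.25` gives roughly one number per four tokens on average, not an exact ratio.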

Parameters:

Name	Type	Description	Default
`vocab`	`str \| None`	Name of the Vocab to use. If None, uses the default Vocab.	`None`
`glyphs`	`str \| None`	Allowed glyph set. If None, uses the default glyphs.	`None`
`seed`	`any`	Seed for the random number generator. If None, current state is used.	`None`
`n_words`	`int \| None`	Exact number of tokens to generate. If None, randomly choose between `min_n_words` and `max_n_words`.	`None`
`min_n_words`	`int`	Minimum number of tokens if `n_words` is not specified.	`10`
`max_n_words`	`int`	Maximum number of tokens if `n_words` is not specified.	`20`
`numbers`	`float`	A value in [0, 1] that determines the probability of generating a numeric token instead of a word.	`0`
`cap_first`	`bool \| None`	If True, capitalize the first word (if `case` is "any"). If None, automatically decide based on glyphs availability.	`None`
`case`	`CaseType`	Desired case form for the words ("any", "lc", "uc", etc.).	`'any'`
`rnd`	`float`	Randomness factor for word selection, in [0, 1].	`0`
`min_wl`	`int`	Minimum length for words/numbers.	`1`
`max_wl`	`int \| None`	Maximum length for words/numbers. If None, words have no maximum and numbers fall back to a default maximum length.	`None`
`wl`	`int \| None`	Exact length for words/numbers. If None, uses min/max_wl.	`None`
`raise_errors`	`bool`	Whether to raise errors or fail gently.	`False`
`**word_kwargs`		Additional keyword arguments passed along to `word()`.	`{}`

Returns:

Type	Description
`list[str]`	A list of randomly generated tokens (words or numbers).

Raises:

Type	Description
`ValueError`	If `numbers` is not in [0, 1].

Source code in wordsiv/__init__.py

def words(
    self,
    vocab: str | None = None,
    glyphs: str | None = None,
    seed=None,
    n_words: int | None = None,
    min_n_words: int = 10,
    max_n_words: int = 20,
    numbers: float = 0,
    cap_first: bool | None = None,
    case: CaseType = "any",
    rnd: float = 0,
    min_wl: int = 1,
    max_wl: int | None = None,
    wl: int | None = None,
    raise_errors: bool = False,
    **word_kwargs,
) -> list[str]:
    """
    Generate a list of words (and optionally numbers) according to the specified
    parameters.

    This method will produce `n_words` tokens, each of which may be a word or a
    number (digit string), depending on the `numbers` ratio. It can also
    automatically handle capitalization of the first token if `cap_first` is True
    (or inferred).

    Args:
        vocab (str | None): Name of the Vocab to use. If None, uses the default
            Vocab.
        glyphs (str | None): Allowed glyph set. If None, uses the default glyphs.
        seed (any): Seed for the random number generator. If None, current state is
            used.
        n_words (int | None): Exact number of tokens to generate. If None, randomly
            choose between `min_n_words` and `max_n_words`.
        min_n_words (int): Minimum number of tokens if `n_words` is not specified.
        max_n_words (int): Maximum number of tokens if `n_words` is not specified.
        numbers (float): A value in [0, 1] that determines the probability of
            generating a numeric token instead of a word.
        cap_first (bool | None): If True, capitalize the first word (if `case` is
            "any"). If None, automatically decide based on glyphs availability.
        case (CaseType): Desired case form for the words ("any", "lc", "uc",
            etc.).
        rnd (float): Randomness factor for word selection, in [0, 1].
        min_wl (int): Minimum length for words/numbers.
        max_wl (int | None): Maximum length for words/numbers.
        wl (int | None): Exact length for words/numbers. If None, uses min/max_wl.
        raise_errors (bool): Whether to raise errors or fail gently.
        **word_kwargs: Additional keyword arguments passed along to `word()`.

    Returns:
        list[str]: A list of randomly generated tokens (words or numbers).

    Raises:
        ValueError: If `numbers` is not in [0, 1].
    """
    glyphs = self.glyphs if glyphs is None else glyphs

    if seed is not None:
        self._rand.seed(seed)

    if not n_words:
        n_words = self._rand.randint(min_n_words, max_n_words)

    if cap_first is None:
        if glyphs:
            # If constrained glyphs, only capitalize if uppercase letters exist
            cap_first = any(c for c in glyphs if c.isupper())
        else:
            # Otherwise, default to capitalize the first word
            cap_first = True

    if not (0 <= numbers <= 1):
        raise ValueError("'numbers' must be between 0 and 1")

    word_list = []
    last_w = None
    for i in range(n_words):
        if cap_first and case == "any" and i == 0:
            word_case: CaseType = "cap"
        else:
            word_case = case

        token_type = self._rand.choices(
            ["word", "number"],
            weights=[1 - numbers, numbers],
        )[0]

        if token_type == "word":
            w = self.word(
                vocab=vocab,
                glyphs=glyphs,
                case=word_case,
                rnd=rnd,
                min_wl=min_wl,
                max_wl=max_wl,
                wl=wl,
                raise_errors=raise_errors,
                **word_kwargs,
            )

            # Try once more to avoid consecutive repeats
            # TODO: this is a hack, we should find a better way to avoid consecutive
            # repeats
            if w == last_w:
                w = self.word(
                    vocab=vocab,
                    glyphs=glyphs,
                    case=word_case,
                    rnd=rnd,
                    min_wl=min_wl,
                    max_wl=max_wl,
                    wl=wl,
                    raise_errors=raise_errors,
                    **word_kwargs,
                )

            if w:
                word_list.append(w)
                last_w = w
        else:
            # token_type == "number"
            w = self.number(
                glyphs=glyphs,
                wl=wl,
                min_wl=min_wl,
                max_wl=max_wl or _DEFAULT_MAX_NUM_LENGTH,
                raise_errors=raise_errors,
            )

            if w:
                word_list.append(w)
                last_w = w

    return word_list
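
The loop in the listing above retries once when a token equals the previous one and silently drops empty results. A stripped-down sketch of that pattern follows; `collect_tokens` and `scripted` are hypothetical helpers, and `pick` stands in for a call such as `self.word(...)`.

```python
from __future__ import annotations

from collections.abc import Callable


def collect_tokens(pick: Callable[[], str], n: int) -> list[str]:
    """Call `pick` n times, retrying once on an immediate repeat."""
    tokens: list[str] = []
    last = None
    for _ in range(n):
        w = pick()
        if w == last:
            # One retry only, so a repeat can still slip through.
            w = pick()
        if w:
            # Empty strings (gentle failures) are skipped, not appended.
            tokens.append(w)
            last = w
    return tokens


def scripted(values: list[str]) -> Callable[[], str]:
    """Return a picker that replays `values` in order (for demonstration)."""
    it = iter(values)
    return lambda: next(it)


print(collect_tokens(scripted(["a", "a", "b", "", "c"]), 4))
```

The single retry keeps the cost bounded at two picks per token, at the price of occasionally allowing a consecutive repeat when the candidate pool is small.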
