API Reference

WordSiv is a Python library for generating text for an incomplete typeface.

Classes:

Name Description
Vocab

A vocabulary of words and occurrence counts with metadata for filtering and punctuating.

WordSiv

The main WordSiv object which uses Vocabs to generate text.

Attributes:

Name Type Description
CaseType

Options for setting case via the case argument.

CaseType module-attribute

CaseType = Literal[
    "any",
    "any_og",
    "lc",
    "lc_force",
    "cap",
    "cap_og",
    "cap_force",
    "uc",
    "uc_og",
    "uc_force",
]

Options for setting case via the case argument. See Letter Case in the Guide for detailed descriptions and examples of each option.
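
As a minimal sketch (not part of WordSiv itself), the accepted option strings can be read back from a Literal alias with typing.get_args, which may be handy for checking a user-supplied case value before passing it on. The alias is redeclared locally so the snippet runs on its own; check_case and VALID_CASES are hypothetical names used only for illustration.

```python
from typing import Literal, get_args

# Redeclared locally so the snippet runs without WordSiv installed.
CaseType = Literal[
    "any", "any_og", "lc", "lc_force", "cap", "cap_og", "cap_force",
    "uc", "uc_og", "uc_force",
]

# get_args() returns the literal values as a tuple of strings.
VALID_CASES = get_args(CaseType)


def check_case(value: str) -> str:
    """Return value unchanged if it is a valid case option, else raise."""
    if value not in VALID_CASES:
        raise ValueError(f"unknown case {value!r}; expected one of {VALID_CASES}")
    return value
```

Note that names such as "upper" or "lower" are not among the options; the short forms "uc" and "lc" are.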

Vocab

A vocabulary of words and occurrence counts with metadata for filtering and punctuating.

Attributes:

Name Type Description
lang str

The language of the vocabulary.

bicameral bool

Specifies whether the vocabulary has uppercase and lowercase letters.

punctuation dict

A dictionary or None for handling punctuation in generated text.

data str

A TSV-formatted string with word-count pairs or a newline-delimited list of words.

data_file str | Traversable

A path to a file to supply the data instead of the data attribute.

meta dict

Additional metadata for the vocabulary.

Methods:

Name Description
__init__

Initializes the Vocab instance.

Attributes:

Name Type Description
data

Returns raw data from parameter _data or data_file.

wordcount tuple[tuple[str, int], ...]

Returns a tuple of tuples with words and counts.

wordcount_str str

Returns a TSV-formatted string with words and counts.

Source code in wordsiv/_vocab.py
class Vocab:
    """A vocabulary of words and occurrence counts with metadata for filtering and punctuating.

    Attributes:
        lang (str): The language of the vocabulary.
        bicameral (bool): Specifies whether the vocabulary has uppercase and lowercase letters.
        punctuation (dict, optional): A dictionary or None for handling punctuation in generated text.
        data (str, optional): A TSV-formatted string with word-count pairs or a newline-delimited list of words.
        data_file (str | Traversable, optional): A path to a file to supply the data instead of the data attribute.
        meta (dict, optional): Additional metadata for the vocabulary.
    """

    def __init__(
        self,
        lang: str,
        bicameral: bool,
        punctuation: dict | None = None,
        data: str | None = None,
        data_file: str | Traversable | None = None,
        meta: dict | None = None,
    ):
        """Initializes the Vocab instance.

        Args:
            lang (str): The language of the vocabulary.
            bicameral (bool): Specifies whether the vocabulary has uppercase and lowercase letters.
            punctuation (dict, optional): A dictionary or None for handling punctuation in generated text.
            data (str, optional): A TSV-formatted string with word-count pairs or a newline-delimited list of words.
            data_file (str | Traversable, optional): A path to a file to supply the data instead of the data attribute.
            meta (dict, optional): Additional metadata for the vocabulary.
        """

        self.lang = lang
        self.bicameral = bicameral
        self.punctuation = punctuation
        self._data = data
        self.data_file = data_file
        self.meta = meta

        if data and data_file:
            raise ValueError("Cannot specify both 'data' and 'data_file'")
        elif data is None and not data_file:
            raise ValueError("Must specify either 'data' or 'data_file'")

    @property
    def data(self):
        """Returns raw data from parameter _data or data_file."""

        if self._data is not None:
            data = self._data
        elif getattr(self, "data_file", None):
            data = _read_file(self.data_file)
        if not data:
            raise VocabEmptyError(f"No data found in {self.data_file}")

        return data

    @property
    def wordcount_str(self) -> str:
        """Returns a TSV-formatted string with words and counts."""

        firstline = self.data.partition("\n")[0]

        if regex.match(r"[[:alpha:]]+\t\d+$", firstline):
            # if we have counts, return the original string
            return self.data
        elif regex.match(r"[[:alpha:]]+$", firstline):
            # if we just have newline-delimited words, add counts of 1
            return _add_counts_to_wordcount_str(self.data)
        else:
            raise VocabFormatError(
                "The vocab file is formatted incorrectly. "
                "Should be a TSV file with words and counts as columns, or a newline-delimited list of words."
            )

    @property
    def wordcount(self) -> tuple[tuple[str, int], ...]:
        """Returns a tuple of tuples with words and counts."""

        return _wordcount_str_to_wordcount_tuple(self.wordcount_str)

    def filter(self, **kwargs):
        return _filter_wordcount(self.wordcount_str, self.bicameral, **kwargs)

data property

data

Returns raw data from parameter _data or data_file.

wordcount property

wordcount

Returns a tuple of tuples with words and counts.

wordcount_str property

wordcount_str

Returns a TSV-formatted string with words and counts.
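
The two accepted data formats (word and count separated by a tab, or one bare word per line) can be illustrated with a small stand-alone parser. This is a sketch under assumptions, not the library's code: to_wordcount is a hypothetical helper, it uses the standard re module instead of the regex package seen in the source, and it raises a plain ValueError where the library raises its own error types.

```python
import re


def to_wordcount(data: str) -> tuple[tuple[str, int], ...]:
    """Parse TSV word/count rows, or a plain word list (each count set to 1)."""
    lines = [ln for ln in data.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("no data")

    # As in the source above, only the first line decides the format.
    first = lines[0]
    if re.fullmatch(r"[^\W\d_]+\t\d+", first):
        # word<TAB>count rows: keep the counts as given.
        rows = (ln.split("\t") for ln in lines)
        return tuple((word, int(count)) for word, count in rows)
    if re.fullmatch(r"[^\W\d_]+", first):
        # Bare words: give every word a count of 1.
        return tuple((ln, 1) for ln in lines)
    raise ValueError("expected word<TAB>count rows or one word per line")
```

Here `[^\W\d_]+` is the re-module spelling of "one or more letters", standing in for the `[[:alpha:]]+` class used with the regex package.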

__init__

__init__(
    lang,
    bicameral,
    punctuation=None,
    data=None,
    data_file=None,
    meta=None,
)

Parameters:

Name Type Description Default
lang str

The language of the vocabulary.

required
bicameral bool

Specifies whether the vocabulary has uppercase and lowercase letters.

required
punctuation dict

A dictionary or None for handling punctuation in generated text.

None
data str

A TSV-formatted string with word-count pairs or a newline-delimited list of words.

None
data_file str | Traversable

A path to a file to supply the data instead of the data attribute.

None
meta dict

Additional metadata for the vocabulary.

None
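
The constructor accepts exactly one data source. The sketch below reproduces only the two checks visible in the source, in a stand-alone function so it can be run without WordSiv; check_sources and the file name "words.tsv" are hypothetical.

```python
from __future__ import annotations


def check_sources(data: str | None, data_file: str | None) -> str:
    """Return which source is in use, mirroring the constructor's two checks."""
    if data and data_file:
        raise ValueError("Cannot specify both 'data' and 'data_file'")
    if data is None and not data_file:
        raise ValueError("Must specify either 'data' or 'data_file'")
    # An in-memory string takes precedence, as in the data property.
    return "data" if data is not None else "data_file"
```

For example, check_sources("the\t10", None) reports "data", check_sources(None, "words.tsv") reports "data_file", and passing both or neither raises ValueError.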
Source code in wordsiv/_vocab.py
def __init__(
    self,
    lang: str,
    bicameral: bool,
    punctuation: dict | None = None,
    data: str | None = None,
    data_file: str | Traversable | None = None,
    meta: dict | None = None,
):
    """Initializes the Vocab instance.

    Args:
        lang (str): The language of the vocabulary.
        bicameral (bool): Specifies whether the vocabulary has uppercase and lowercase letters.
        punctuation (dict, optional): A dictionary or None for handling punctuation in generated text.
        data (str, optional): A TSV-formatted string with word-count pairs or a newline-delimited list of words.
        data_file (str | Traversable, optional): A path to a file to supply the data instead of the data attribute.
        meta (dict, optional): Additional metadata for the vocabulary.
    """

    self.lang = lang
    self.bicameral = bicameral
    self.punctuation = punctuation
    self._data = data
    self.data_file = data_file
    self.meta = meta

    if data and data_file:
        raise ValueError("Cannot specify both 'data' and 'data_file'")
    elif data is None and not data_file:
        raise ValueError("Must specify either 'data' or 'data_file'")

WordSiv

The main WordSiv object which uses Vocabs to generate text.

This object serves as the main interface for generating text. It can hold multiple vocabulary objects, store default settings (like default glyphs and vocab), and expose high-level methods that produce words, sentences, paragraphs, and more.

Parameters:

Name Type Description Default
vocab str | None

The name of the default Vocab.

None
glyphs str | None

The default set of glyphs that constrains the words generated.

None
add_default_vocabs bool

Whether to add the default Vocabs defined in DEFAULT_VOCABS.

True
raise_errors bool

Whether to raise errors or fail gently.

False
seed int | None

Seed for the random number generator.

None
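
To show what the seed parameter buys you, here is a minimal sketch using only the standard library. It assumes nothing about WordSiv beyond the fact, visible in the source, that a private random.Random instance is seeded; the word list and the sample function are made up for illustration.

```python
import random

WORDS = ["font", "glyph", "kern", "serif", "stem"]  # illustrative word list


def sample(seed: int, n: int = 5) -> list[str]:
    """Draw n words from a private generator initialized with seed."""
    rand = random.Random()  # private generator, independent of the global one
    rand.seed(seed)
    return [rand.choice(WORDS) for _ in range(n)]
```

Calling sample(7) twice returns the same list, which is the property that makes seeded proofs reproducible between runs.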

Attributes:

Name Type Description
vocab str | None

The name of the default Vocab.

glyphs str | None

The default set of glyphs that constrains the words generated.

raise_errors bool

Whether to raise errors or fail gently.

_vocab_lookup dict[str, Vocab]

A dictionary of vocab names to Vocab objects.

_rand Random

A random.Random instance.
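
The name-to-Vocab lookup with a default fallback can be pictured as a small registry. The class below is a toy stand-in, not WordSiv's implementation: VocabRegistry and its method names are hypothetical, and plain strings take the place of Vocab objects.

```python
from __future__ import annotations


class VocabRegistry:
    """Toy stand-in for the name-to-Vocab lookup; values are plain strings here."""

    def __init__(self, default: str | None = None):
        self.default = default
        self._lookup: dict[str, str] = {}

    def add(self, name: str, vocab: str) -> None:
        self._lookup[name] = vocab

    def get(self, name: str | None = None) -> str:
        # An explicit name wins; otherwise fall back to the default name.
        key = name or self.default
        if not key:
            raise ValueError("no vocab specified")
        return self._lookup[key]  # unknown names raise KeyError

    def names(self) -> list[str]:
        return list(self._lookup)
```

A registry created with default="en" returns the "en" entry from get() and any other registered entry from get("es"); with no default and no name, get() raises ValueError.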

Methods:

Name Description
add_vocab

Add a Vocab object to this WordSiv instance under a given name.

get_vocab

Retrieve a Vocab by name, or return the default Vocab if vocab_name is None.

list_vocabs

Return a list of all available Vocab names.

number

Generate a random numeric string (made of digits) constrained by glyphs and other parameters.

para

Generate a paragraph by creating multiple sentences with sents(...) and

paras

Generate multiple paragraphs with para(...), returned as a list.

seed

Seed the random number generator for reproducible results.

sent

Generate a single sentence, optionally punctuated, using words (and/or numbers).

sents

Generate multiple sentences with sent(...), returned as a list.

text

Generate multiple paragraphs of text, calling paras(...) and joining them with

top_word

Retrieve the most common (or nth most common) word from the Vocab, subject to filtering constraints.

top_words

Retrieve the top n_words from the Vocab, starting at index idx, subject to filtering constraints.

word

Generate a random word that meets a variety of constraints, such as glyphs, length, regex filters, etc.

words

Generate a list of words (and optionally numbers) according to the specified parameters.
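
Two of the ideas behind these methods, restricting numerals to the available glyphs and mixing words with numbers by a probability, can be sketched with the standard library alone. The helpers digits_from and token_kinds are hypothetical and simplified: the library draws the token kind once per token inside its loop, while this sketch draws them all in one call.

```python
from __future__ import annotations

import random


def digits_from(glyphs: str | None, length: int, rand: random.Random) -> str:
    """Build a digit string using only numerals present in glyphs (all if None)."""
    numerals = "0123456789"
    if glyphs:
        numerals = "".join(n for n in numerals if n in glyphs)
    if not numerals:
        return ""  # no numerals in the glyph set, nothing to draw from
    return "".join(rand.choice(numerals) for _ in range(length))


def token_kinds(n: int, numbers: float, rand: random.Random) -> list[str]:
    """Pick 'word' or 'number' for each of n tokens; 'number' has probability numbers."""
    if not 0 <= numbers <= 1:
        raise ValueError("'numbers' must be between 0 and 1")
    return rand.choices(["word", "number"], weights=[1 - numbers, numbers], k=n)
```

With glyphs "abc13" only the digits 1 and 3 can appear; with numbers=0 every token is a word, and with numbers=1 every token is a number.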

Source code in wordsiv/__init__.py
class WordSiv:
    """The main WordSiv object which uses Vocabs to generate text.

    This object serves as the main interface for generating text. It can hold multiple
    vocabulary objects, store default settings (like default glyphs and vocab), and
    expose high-level methods that produce words, sentences, paragraphs, and more.

    Args:
        vocab (str | None): The name of the default Vocab.
        glyphs (str | None): The default set of glyphs that constrains the words
            generated.
        add_default_vocabs (bool): Whether to add the default Vocabs defined in
            `DEFAULT_VOCABS`.
        raise_errors (bool): Whether to raise errors or fail gently.
        seed (int | None): Seed for the random number generator.

    Attributes:
        vocab (str | None): The name of the default Vocab.
        glyphs (str | None): The default set of glyphs that constrains the words
            generated.
        raise_errors (bool): Whether to raise errors or fail gently.
        _vocab_lookup (dict[str, Vocab]): A dictionary of vocab names to `Vocab`
            objects.
        _rand (random.Random): A `random.Random` instance.
    """

    def __init__(
        self,
        vocab: str | None = None,
        glyphs: str | None = None,
        add_default_vocabs: bool = True,
        raise_errors: bool = False,
        seed=None,
    ):
        self.vocab = vocab
        self.glyphs = glyphs
        self.raise_errors = raise_errors
        self._vocab_lookup: dict[str, Vocab] = {}

        if add_default_vocabs:
            self._add_default_vocabs()

        self._rand = random.Random()

        if seed is not None:
            self.seed(seed)

    def seed(self, seed: float | str | None = None) -> None:
        """
        Seed the random number generator for reproducible results.

        Args:
            seed (float | str | None): The seed value used to initialize the random
                number generator.

        Returns:
            None
        """
        self._rand.seed(seed)

    def add_vocab(self, vocab_name: str, vocab: Vocab) -> None:
        """
        Add a `Vocab` object to this `WordSiv` instance under a given name.

        Args:
            vocab_name (str): The unique identifier for this Vocab.
            vocab (Vocab): The `Vocab` object to be associated with `vocab_name`.

        Returns:
            None
        """
        self._vocab_lookup[vocab_name] = vocab

    def _add_default_vocabs(self) -> None:
        """
        Initialize and add the default Vocabs to this `WordSiv` instance.

        The default vocabularies are specified in the `DEFAULT_VOCABS` dictionary, which
        maps a short code (e.g., 'en', 'es') to the meta and data filenames. This method
        initializes each Vocab (however, the data is loaded lazily).
        """
        for vocab_name, (meta_file, data_file) in _DEFAULT_VOCABS.items():
            meta_path = resources.files(_vocab_data) / meta_file
            with meta_path.open("r", encoding="utf8") as f:
                meta = json.load(f)

            data_path = resources.files(_vocab_data) / data_file
            vocab = Vocab(
                meta["lang"], bool(meta["bicameral"]), meta=meta, data_file=data_path
            )
            self.add_vocab(vocab_name, vocab)

    def get_vocab(self, vocab_name: str | None = None) -> Vocab:
        """
        Retrieve a `Vocab` by name, or return the default Vocab if `vocab_name` is None.

        Args:
            vocab_name (str | None): The name of the Vocab to retrieve. If None,
                use the default `self.vocab`.

        Returns:
            Vocab: The `Vocab` object corresponding to `vocab_name`.

        Raises:
            ValueError: If no `vocab_name` is provided and no default is set.
        """
        if vocab_name:
            return self._vocab_lookup[vocab_name]
        else:
            if self.vocab:
                return self._vocab_lookup[self.vocab]
            else:
                raise ValueError("Error: no vocab specified")

    def list_vocabs(self) -> list[str]:
        """
        Return a list of all available Vocab names.

        Returns:
            list[str]: A list of all registered Vocab names in this `WordSiv`.
        """
        return list(self._vocab_lookup.keys())

    def number(
        self,
        seed: float | str | None = None,
        glyphs: str | None = None,
        wl: int | None = None,
        min_wl: int = 1,
        max_wl: int = _DEFAULT_MAX_NUM_LENGTH,
        raise_errors: bool = False,
    ) -> str:
        """
        Generate a random numeric string (made of digits) constrained by glyphs and
        other parameters.

        Args:
            seed (float | str | None): Seed the random number generator if seed is not
                None.
            glyphs (str | None): A string of allowed glyphs. If None, uses the default
                glyphs of this `WordSiv` instance.
            wl (int | None): Exact length of the generated numeric string. If None, a
                random length between `min_wl` and `max_wl` is chosen.
            min_wl (int): Minimum length of the numeric string. Defaults to 1.
            max_wl (int): Maximum length of the numeric string. Defaults to 4.
            raise_errors (bool): Whether to raise an error if no numerals are available.

        Returns:
            str: A randomly generated string consisting of numerals.

        Raises:
            ValueError: If `min_wl` is greater than `max_wl`.
            FilterError: If no numerals are available in `glyphs` and `raise_errors` is
                True.
        """
        glyphs = self.glyphs if glyphs is None else glyphs
        raise_errors = self.raise_errors if raise_errors is None else raise_errors

        if seed is not None:
            self._rand.seed(seed)

        if wl:
            length = wl
        else:
            if min_wl > max_wl:
                raise ValueError("'min_wl' must be less than or equal to 'max_wl'")
            length = self._rand.randint(min_wl, max_wl)

        available_numerals = "".join(str(n) for n in range(0, 10))
        if glyphs:
            available_numerals = "".join(n for n in available_numerals if n in glyphs)

            if not available_numerals:
                if raise_errors:
                    raise FilterError("No numerals available in glyphs")
                else:
                    log.warning("No numerals available in glyphs")
                    return ""

        return "".join(self._rand.choice(available_numerals) for _ in range(length))

    def word(
        self,
        vocab: str | None = None,
        glyphs: str | None = None,
        seed: float | str | None = None,
        rnd: float = 0,
        case: CaseType = "any",
        top_k: int = 0,
        min_wl: int = 1,
        max_wl: int | None = None,
        wl: int | None = None,
        contains: str | Sequence[str] | None = None,
        inner: str | Sequence[str] | None = None,
        startswith: str | None = None,
        endswith: str | None = None,
        regexp: str | None = None,
        raise_errors: bool = False,
    ) -> str:
        """
        Generate a random word that meets a variety of constraints, such as glyphs,
        length, regex filters, etc.

        Args:
            vocab (str | None): Name of the Vocab to use. If None, uses default Vocab.
            glyphs (str | None): A string of allowed glyphs. If None, uses default
                glyphs.
            seed (float | str | None): Seed the random number generator if seed is not
                None.
            rnd (float): Randomness factor in [0, 1] for selecting among the top words.
            case (CaseType): Desired case of the output word (e.g., 'uc', 'lc',
                'any').
            top_k (int): If > 0, only consider the top K words by frequency.
            min_wl (int): Minimum word length.
            max_wl (int | None): Maximum word length. If None, no maximum is applied.
            wl (int | None): Exact word length. If None, no exact length is enforced.
            contains (str | Sequence[str] | None): Substring(s) that must appear in the
                word.
            inner (str | Sequence[str] | None): Substring(s) that must appear, but not
                at the start or end of the word.
            startswith (str | None): Required starting substring.
            endswith (str | None): Required ending substring.
            regexp (str | None): A regular expression that the word must match.
            raise_errors (bool): Whether to raise filtering errors or fail gently.

        Returns:
            str: A randomly generated word meeting the specified constraints (or an
                empty string on failure if `raise_errors` is False).

        Raises:
            ValueError: If `rnd` is not in [0, 1].
            FilterError: If filtering yields no results and `raise_errors` is True.
            VocabFormatError: If the underlying Vocab data is malformed.
            VocabEmptyError: If the underlying Vocab is empty.
        """
        glyphs = self.glyphs if glyphs is None else glyphs
        raise_errors = self.raise_errors if raise_errors is None else raise_errors
        vocab_obj = self.get_vocab(vocab)

        if not (0 <= rnd <= 1):
            raise ValueError("'rnd' must be between 0 and 1")

        if seed is not None:
            self._rand.seed(seed)

        try:
            wc_list = vocab_obj.filter(
                glyphs=glyphs,
                case=case,
                min_wl=min_wl,
                max_wl=max_wl,
                wl=wl,
                contains=contains,
                inner=inner,
                startswith=startswith,
                endswith=endswith,
                regexp=regexp,
            )
        except FilterError as e:
            if raise_errors:
                raise e
            else:
                log.warning("%s", e.args[0])
                return ""

        if top_k:
            wc_list = wc_list[:top_k]

        return _sample_word(wc_list, self._rand, rnd)

    def top_word(
        self,
        vocab: str | None = None,
        glyphs: str | None = None,
        seed: float | str | None = None,
        idx: int = 0,
        case: CaseType = "any",
        min_wl: int = 2,
        max_wl: int | None = None,
        wl: int | None = None,
        contains: str | Sequence[str] | None = None,
        inner: str | Sequence[str] | None = None,
        startswith: str | None = None,
        endswith: str | None = None,
        regexp: str | None = None,
        raise_errors: bool = False,
    ) -> str:
        """
        Retrieve the most common (or nth most common) word from the Vocab, subject to
        filtering constraints.

        Args:
            vocab (str | None): Name of the Vocab to use. If None, use the default
                Vocab.
            glyphs (str | None): Whitelisted glyphs to filter words. If None, uses
                default.
            seed (float | str | None): Seed the random number generator if seed is
                not None.
            idx (int): Index of the desired word in the frequency-sorted list
                (0-based).
            case (CaseType): Desired case form for the word (e.g., 'lc', 'uc',
                'any').
            min_wl (int): Minimum word length.
            max_wl (int | None): Maximum word length. If None, no maximum.
            wl (int | None): Exact word length. If None, no exact length filter.
            contains (str | Sequence[str] | None): Substring(s) that must appear in
                the word.
            inner (str | Sequence[str] | None): Substring(s) that must appear in the
                interior.
            startswith (str | None): Substring that the word must start with.
            endswith (str | None): Substring that the word must end with.
            regexp (str | None): Regex pattern that the word must match.
            raise_errors (bool): Whether to raise errors on filter or index failures.

        Returns:
            str: The nth most common word that meets the constraints (or an empty string
                on failure if `raise_errors` is False).

        Raises:
            FilterError: If filtering fails (no words match) or `idx` is out of range
                after filtering, and `raise_errors` is True.
            ValueError: If no default vocab is set when `vocab` is None.
        """
        glyphs = self.glyphs if glyphs is None else glyphs
        raise_errors = self.raise_errors if raise_errors is None else raise_errors
        vocab_obj = self.get_vocab(self.vocab) if not vocab else self.get_vocab(vocab)

        try:
            wc_list = vocab_obj.filter(
                glyphs=glyphs,
                case=case,
                min_wl=min_wl,
                max_wl=max_wl,
                wl=wl,
                contains=contains,
                inner=inner,
                startswith=startswith,
                endswith=endswith,
                regexp=regexp,
            )
        except FilterError as e:
            if raise_errors:
                raise e
            else:
                log.warning("%s", e.args[0])
                return ""

        try:
            return wc_list[idx][0]
        except IndexError:
            if raise_errors:
                raise FilterError(f"No word at index idx='{idx}'")
            else:
                log.warning("No word at index idx='%s'", idx)
                return ""

    def words(
        self,
        vocab: str | None = None,
        glyphs: str | None = None,
        seed=None,
        n_words: int | None = None,
        min_n_words: int = 10,
        max_n_words: int = 20,
        numbers: float = 0,
        cap_first: bool | None = None,
        case: CaseType = "any",
        rnd: float = 0,
        min_wl: int = 1,
        max_wl: int | None = None,
        wl: int | None = None,
        raise_errors: bool = False,
        **word_kwargs,
    ) -> list[str]:
        """
        Generate a list of words (and optionally numbers) according to the specified
        parameters.

        This method will produce `n_words` tokens, each of which may be a word or a
        number (digit string), depending on the `numbers` ratio. It can also
        automatically handle capitalization of the first token if `cap_first` is True
        (or inferred).

        Args:
            vocab (str | None): Name of the Vocab to use. If None, uses the default
                Vocab.
            glyphs (str | None): Allowed glyph set. If None, uses the default glyphs.
            seed (any): Seed for the random number generator. If None, current state is
                used.
            n_words (int | None): Exact number of tokens to generate. If None, randomly
                choose between `min_n_words` and `max_n_words`.
            min_n_words (int): Minimum number of tokens if `n_words` is not specified.
            max_n_words (int): Maximum number of tokens if `n_words` is not specified.
            numbers (float): A value in [0, 1] that determines the probability of
                generating a numeric token instead of a word.
            cap_first (bool | None): If True, capitalize the first word (if `case` is
                "any"). If None, automatically decide based on glyphs availability.
            case (CaseType): Desired case form for the words ("any", "lc", "uc",
                etc.).
            rnd (float): Randomness factor for word selection, in [0, 1].
            min_wl (int): Minimum length for words/numbers.
            max_wl (int | None): Maximum length for words/numbers.
            wl (int | None): Exact length for words/numbers. If None, uses min/max_wl.
            raise_errors (bool): Whether to raise errors or fail gently.
            **word_kwargs: Additional keyword arguments passed along to `word()`.

        Returns:
            list[str]: A list of randomly generated tokens (words or numbers).

        Raises:
            ValueError: If `numbers` is not in [0, 1].
        """
        glyphs = self.glyphs if glyphs is None else glyphs

        if seed is not None:
            self._rand.seed(seed)

        if not n_words:
            n_words = self._rand.randint(min_n_words, max_n_words)

        if cap_first is None:
            if glyphs:
                # If constrained glyphs, only capitalize if uppercase letters exist
                cap_first = any(c for c in glyphs if c.isupper())
            else:
                # Otherwise, default to capitalize the first word
                cap_first = True

        if not (0 <= numbers <= 1):
            raise ValueError("'numbers' must be between 0 and 1")

        word_list = []
        last_w = None
        for i in range(n_words):
            if cap_first and case == "any" and i == 0:
                word_case: CaseType = "cap"
            else:
                word_case = case

            token_type = self._rand.choices(
                ["word", "number"],
                weights=[1 - numbers, numbers],
            )[0]

            if token_type == "word":
                w = self.word(
                    vocab=vocab,
                    glyphs=glyphs,
                    case=word_case,
                    rnd=rnd,
                    min_wl=min_wl,
                    max_wl=max_wl,
                    wl=wl,
                    raise_errors=raise_errors,
                    **word_kwargs,
                )

                # Try once more to avoid consecutive repeats
                # TODO: this is a hack, we should find a better way to avoid consecutive
                # repeats
                if w == last_w:
                    w = self.word(
                        vocab=vocab,
                        glyphs=glyphs,
                        case=word_case,
                        rnd=rnd,
                        min_wl=min_wl,
                        max_wl=max_wl,
                        wl=wl,
                        raise_errors=raise_errors,
                        **word_kwargs,
                    )

                if w:
                    word_list.append(w)
                    last_w = w
            else:
                # token_type == "number"
                w = self.number(
                    glyphs=glyphs,
                    wl=wl,
                    min_wl=min_wl,
                    max_wl=max_wl or _DEFAULT_MAX_NUM_LENGTH,
                    raise_errors=raise_errors,
                )

                if w:
                    word_list.append(w)
                    last_w = w

        return word_list

    def top_words(
        self,
        glyphs: str | None = None,
        vocab: str | None = None,
        n_words: int = 10,
        idx: int = 0,
        case: CaseType = "any",
        min_wl: int = 1,
        max_wl: int | None = None,
        wl: int | None = None,
        contains: str | Sequence[str] | None = None,
        inner: str | Sequence[str] | None = None,
        startswith: str | None = None,
        endswith: str | None = None,
        regexp: str | None = None,
        raise_errors: bool = False,
    ) -> list[str]:
        """
        Retrieve the top `n_words` from the Vocab, starting at index `idx`, subject to
        filtering constraints.

        Args:
            glyphs (str | None): Allowed glyph set. If None, uses default glyphs.
            vocab (str | None): Name of the Vocab to use. If None, use the default
                Vocab.
            n_words (int): Number of words to return.
            idx (int): The index at which to start returning words (0-based).
            case (CaseType): Desired case form ("any", "lc", "uc", etc.).
            min_wl (int): Minimum word length. Defaults to 1.
            max_wl (int | None): Maximum word length. If None, no maximum is applied.
            wl (int | None): Exact word length. If None, no exact length filter.
            contains (str | Sequence[str] | None): Substring(s) that must appear.
            inner (str | Sequence[str] | None): Substring(s) that must appear, not at
                edges.
            startswith (str | None): Required starting substring.
            endswith (str | None): Required ending substring.
            regexp (str | None): Regex pattern to match.
            raise_errors (bool): Whether to raise errors or fail gently.

        Returns:
            list[str]: A list of up to `n_words` words, in descending frequency order.

        Raises:
            FilterError: If filtering fails (no words match) and `raise_errors` is True.
        """
        glyphs = self.glyphs if glyphs is None else glyphs
        vocab_obj = self.get_vocab(self.vocab) if not vocab else self.get_vocab(vocab)

        try:
            wc_list = vocab_obj.filter(
                glyphs=glyphs,
                case=case,
                min_wl=min_wl,
                max_wl=max_wl,
                wl=wl,
                contains=contains,
                inner=inner,
                startswith=startswith,
                endswith=endswith,
                regexp=regexp,
            )[idx : idx + n_words]
        except FilterError as e:
            if raise_errors:
                raise e
            else:
                log.warning("%s", e.args[0])
                return []

        if not wc_list:
            if raise_errors:
                raise FilterError(f"No words found at idx '{idx}'")
            else:
                log.warning("No words found at idx '%s'", idx)
                return []

        return [w for w, _ in wc_list]

    def sent(
        self,
        vocab: str | None = None,
        glyphs: str | None = None,
        seed=None,
        punc: bool = True,
        rnd_punc: float = 0,
        **words_kwargs,
    ) -> str:
        """
        Generate a single sentence, optionally punctuated, using words (and/or numbers).

        A sentence is created by calling `words(...)`, then (optionally) punctuating the
        resulting list.

        Args:
            vocab (str | None): Name of the Vocab to use. If None, use the default Vocab.
            glyphs (str | None): Allowed glyphs. If None, uses default glyphs.
            seed (any): Seed for the random number generator. If None, current state is used.
            punc (bool): Whether to add punctuation to the sentence.
            rnd_punc (float): A randomness factor between 0 and 1 that adjusts the punctuation
                frequency or distribution.
            **words_kwargs: Additional keyword arguments passed to `words(...)`.

        Returns:
            str: A single sentence, optionally with punctuation.

        Raises:
            ValueError: If `rnd_punc` is not in [0, 1].
        """
        glyphs = self.glyphs if glyphs is None else glyphs
        vocab_obj = self.get_vocab(vocab)

        if seed is not None:
            self._rand.seed(seed)

        word_list = self.words(
            glyphs=glyphs,
            vocab=vocab,
            **words_kwargs,
        )

        if punc:
            if not (0 <= rnd_punc <= 1):
                raise ValueError("'rnd_punc' must be between 0 and 1")

            if vocab_obj.punctuation:
                punctuation = vocab_obj.punctuation
            else:
                try:
                    punctuation = DEFAULT_PUNCTUATION[vocab_obj.lang]
                except KeyError:
                    # If no default punctuation is found, return unpunctuated sentence
                    return " ".join(word_list)

            return _punctuate(
                punctuation,
                self._rand,
                word_list,
                glyphs,
                rnd_punc,
            )
        else:
            return " ".join(word_list)

    def sents(
        self,
        seed=None,
        min_n_sents: int = 3,
        max_n_sents: int = 5,
        n_sents: int | None = None,
        **sent_kwargs,
    ) -> list[str]:
        """
        Generate multiple sentences with `sent(...)`, returned as a list.

        Args:
            seed (float | str | None): Seed the random number generator if seed is not
                None.
            min_n_sents (int): Minimum number of sentences to produce if `n_sents` is None.
            max_n_sents (int): Maximum number of sentences to produce if `n_sents` is None.
            n_sents (int | None): If specified, exactly that many sentences are produced.
            **sent_kwargs: Additional keyword arguments passed to `sent(...)`.

        Returns:
            list[str]: A list of generated sentences.
        """
        if seed is not None:
            self._rand.seed(seed)

        if not n_sents:
            n_sents = self._rand.randint(min_n_sents, max_n_sents)

        return [self.sent(**sent_kwargs) for _ in range(n_sents)]

    def para(
        self,
        seed=None,
        sent_sep: str = " ",
        **sents_kwargs,
    ) -> str:
        """
        Generate a paragraph by creating multiple sentences with `sents(...)` and
        joining them with `sent_sep`.

        Args:
            seed (float | str | None): Seed the random number generator if seed is not
                None.
            sent_sep (str): The string used to join sentences.
            **sents_kwargs: Keyword arguments passed to `sents(...)`.

        Returns:
            str: A single paragraph containing multiple sentences.
        """
        if seed is not None:
            self._rand.seed(seed)

        return sent_sep.join(self.sents(**sents_kwargs))

    def paras(
        self,
        seed=None,
        n_paras: int = 3,
        **para_kwargs,
    ) -> list[str]:
        """
        Generate multiple paragraphs with `para(...)`, returned as a list.

        Args:
            seed (float | str | None): Seed the random number generator if seed is not
                None.
            n_paras (int): Number of paragraphs to generate.
            **para_kwargs: Additional keyword arguments passed to `para(...)`.

        Returns:
            list[str]: A list of paragraphs.
        """
        if seed is not None:
            self._rand.seed(seed)

        return [self.para(**para_kwargs) for _ in range(n_paras)]

    def text(
        self,
        seed: float | str | None = None,
        para_sep: str = "\n\n",
        **paras_kwargs,
    ) -> str:
        """
        Generate multiple paragraphs of text, calling `paras(...)` and joining them with
        `para_sep`.

        Args:
            seed (float | str | None): Seed the random number generator if seed is not
                None.
            para_sep (str): The string used to separate paragraphs in the final text.
            **paras_kwargs: Additional keyword arguments passed to `paras(...)`.

        Returns:
            str: A string containing multiple paragraphs of text, separated by
                `para_sep`.
        """
        if seed is not None:
            self._rand.seed(seed)

        return para_sep.join(self.paras(**paras_kwargs))

add_vocab

add_vocab(vocab_name, vocab)

Add a Vocab object to this WordSiv instance under a given name.

Parameters:

Name Type Description Default
vocab_name str

The unique identifier for this Vocab.

required
vocab Vocab

The Vocab object to be associated with vocab_name.

required

Returns:

Type Description
None

None

Source code in wordsiv/__init__.py
def add_vocab(self, vocab_name: str, vocab: Vocab) -> None:
    """
    Add a `Vocab` object to this `WordSiv` instance under a given name.

    Args:
        vocab_name (str): The unique identifier for this Vocab.
        vocab (Vocab): The `Vocab` object to be associated with `vocab_name`.

    Returns:
        None
    """
    self._vocab_lookup[vocab_name] = vocab

get_vocab

get_vocab(vocab_name=None)

Retrieve a Vocab by name, or return the default Vocab if vocab_name is None.

Parameters:

Name Type Description Default
vocab_name str | None

The name of the Vocab to retrieve. If None, use the default self.vocab.

None

Returns:

Name Type Description
Vocab Vocab

The Vocab object corresponding to vocab_name.

Raises:

Type Description
ValueError

If no vocab_name is provided and no default is set.

Source code in wordsiv/__init__.py
def get_vocab(self, vocab_name: str | None = None) -> Vocab:
    """
    Retrieve a `Vocab` by name, or return the default Vocab if `vocab_name` is None.

    Args:
        vocab_name (str | None): The name of the Vocab to retrieve. If None,
            use the default `self.vocab`.

    Returns:
        Vocab: The `Vocab` object corresponding to `vocab_name`.

    Raises:
        ValueError: If no `vocab_name` is provided and no default is set.
    """
    if vocab_name:
        return self._vocab_lookup[vocab_name]
    else:
        if self.vocab:
            return self._vocab_lookup[self.vocab]
        else:
            raise ValueError("Error: no vocab specified")

list_vocabs

list_vocabs()

Return a list of all available Vocab names.

Returns:

Type Description
list[str]

A list of all registered Vocab names in this WordSiv.

Source code in wordsiv/__init__.py
def list_vocabs(self) -> list[str]:
    """
    Return a list of all available Vocab names.

    Returns:
        list[str]: A list of all registered Vocab names in this `WordSiv`.
    """
    return list(self._vocab_lookup.keys())
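
The three registry methods above (add_vocab, get_vocab, list_vocabs) are thin wrappers around a dictionary. The following is a minimal sketch of that lookup logic, not WordSiv itself: `VocabRegistry` and `error_name` are hypothetical names, and a plain dict stands in for a real Vocab. It illustrates one consequence of the source shown above: a missing default raises ValueError, while an unregistered name surfaces as a KeyError from the dictionary lookup.

```python
class VocabRegistry:
    """Hypothetical stand-in that mirrors the lookup logic shown above."""

    def __init__(self, default=None):
        self.vocab = default          # name of the default vocab, if any
        self._vocab_lookup = {}       # name -> vocab object

    def add_vocab(self, vocab_name, vocab):
        self._vocab_lookup[vocab_name] = vocab

    def get_vocab(self, vocab_name=None):
        if vocab_name:
            return self._vocab_lookup[vocab_name]
        if self.vocab:
            return self._vocab_lookup[self.vocab]
        raise ValueError("Error: no vocab specified")

    def list_vocabs(self):
        return list(self._vocab_lookup.keys())


def error_name(fn):
    """Return the name of the exception raised by fn(), or None."""
    try:
        fn()
    except Exception as exc:
        return type(exc).__name__
    return None


registry = VocabRegistry()
registry.add_vocab("demo", {"lang": "en"})  # a dict stands in for a Vocab

names = registry.list_vocabs()
missing_default = error_name(lambda: registry.get_vocab())
unknown_name = error_name(lambda: registry.get_vocab("nope"))
```

Callers that accept user-supplied vocab names may therefore want to handle both exception types.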

number

number(
    seed=None,
    glyphs=None,
    wl=None,
    min_wl=1,
    max_wl=_DEFAULT_MAX_NUM_LENGTH,
    raise_errors=False,
)

Generate a random numeric string (made of digits) constrained by glyphs and other parameters.

Parameters:

Name Type Description Default
seed float | str | None

Seed the random number generator if seed is not None.

None
glyphs str | None

A string of allowed glyphs. If None, uses the default glyphs of this WordSiv instance.

None
wl int | None

Exact length of the generated numeric string. If None, a random length between min_wl and max_wl is chosen.

None
min_wl int

Minimum length of the numeric string. Defaults to 1.

1
max_wl int

Maximum length of the numeric string. Defaults to 4.

_DEFAULT_MAX_NUM_LENGTH
raise_errors bool

Whether to raise an error if no numerals are available.

False

Returns:

Name Type Description
str str

A randomly generated string consisting of numerals.

Raises:

Type Description
ValueError

If min_wl is greater than max_wl.

FilterError

If no numerals are available in glyphs and raise_errors is True.

Source code in wordsiv/__init__.py
def number(
    self,
    seed: float | str | None = None,
    glyphs: str | None = None,
    wl: int | None = None,
    min_wl: int = 1,
    max_wl: int = _DEFAULT_MAX_NUM_LENGTH,
    raise_errors: bool = False,
) -> str:
    """
    Generate a random numeric string (made of digits) constrained by glyphs and
    other parameters.

    Args:
        seed (float | str | None): Seed the random number generator if seed is not
            None.
        glyphs (str | None): A string of allowed glyphs. If None, uses the default
            glyphs of this `WordSiv` instance.
        wl (int | None): Exact length of the generated numeric string. If None, a
            random length between `min_wl` and `max_wl` is chosen.
        min_wl (int): Minimum length of the numeric string. Defaults to 1.
        max_wl (int): Maximum length of the numeric string. Defaults to 4.
        raise_errors (bool): Whether to raise an error if no numerals are available.

    Returns:
        str: A randomly generated string consisting of numerals.

    Raises:
        ValueError: If `min_wl` is greater than `max_wl`.
        FilterError: If no numerals are available in `glyphs` and `raise_errors` is
            True.
    """
    glyphs = self.glyphs if glyphs is None else glyphs
    raise_errors = self.raise_errors if raise_errors is None else raise_errors

    if seed is not None:
        self._rand.seed(seed)

    if wl:
        length = wl
    else:
        if min_wl > max_wl:
            raise ValueError("'min_wl' must be less than or equal to 'max_wl'")
        length = self._rand.randint(min_wl, max_wl)

    available_numerals = "".join(str(n) for n in range(0, 10))
    if glyphs:
        available_numerals = "".join(n for n in available_numerals if n in glyphs)

        if not available_numerals:
            if raise_errors:
                raise FilterError("No numerals available in glyphs")
            else:
                log.warning("No numerals available in glyphs")
                return ""

    return "".join(self._rand.choice(available_numerals) for _ in range(length))
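
The digit filtering above can be tried in isolation. This is a sketch under stated assumptions: `number_sketch` is a hypothetical function, it takes its own `random.Random` instance instead of the instance's `_rand`, and the `max_wl=4` default is taken from the parameter description rather than from the value of `_DEFAULT_MAX_NUM_LENGTH`.

```python
import random


def number_sketch(rng, glyphs=None, wl=None, min_wl=1, max_wl=4):
    """Hypothetical re-statement of the digit-selection logic shown above."""
    if wl:
        length = wl
    else:
        if min_wl > max_wl:
            raise ValueError("'min_wl' must be less than or equal to 'max_wl'")
        length = rng.randint(min_wl, max_wl)

    numerals = "0123456789"
    if glyphs:
        # Keep only the digits that the glyph set can render.
        numerals = "".join(n for n in numerals if n in glyphs)
        if not numerals:
            return ""  # mirrors the non-raising path

    return "".join(rng.choice(numerals) for _ in range(length))


rng = random.Random(7)
ones_and_twos = number_sketch(rng, glyphs="abc12", wl=6)
no_digits = number_sketch(rng, glyphs="abc", wl=3)
```

Here `ones_and_twos` is six characters long and contains only "1" and "2", while `no_digits` is the empty string because the glyph set has no numerals.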

para

para(seed=None, sent_sep=' ', **sents_kwargs)

Generate a paragraph by creating multiple sentences with sents(...) and joining them with sent_sep.

Parameters:

Name Type Description Default
seed float | str | None

Seed the random number generator if seed is not None.

None
sent_sep str

The string used to join sentences.

' '
**sents_kwargs

Keyword arguments passed to sents(...).

{}

Returns:

Name Type Description
str str

A single paragraph containing multiple sentences.

Source code in wordsiv/__init__.py
def para(
    self,
    seed=None,
    sent_sep: str = " ",
    **sents_kwargs,
) -> str:
    """
    Generate a paragraph by creating multiple sentences with `sents(...)` and
    joining them with `sent_sep`.

    Args:
        seed (float | str | None): Seed the random number generator if seed is not
            None.
        sent_sep (str): The string used to join sentences.
        **sents_kwargs: Keyword arguments passed to `sents(...)`.

    Returns:
        str: A single paragraph containing multiple sentences.
    """
    if seed is not None:
        self._rand.seed(seed)

    return sent_sep.join(self.sents(**sents_kwargs))

paras

paras(seed=None, n_paras=3, **para_kwargs)

Generate multiple paragraphs with para(...), returned as a list.

Parameters:

Name Type Description Default
seed float | str | None

Seed the random number generator if seed is not None.

None
n_paras int

Number of paragraphs to generate.

3
**para_kwargs

Additional keyword arguments passed to para(...).

{}

Returns:

Type Description
list[str]

A list of paragraphs.

Source code in wordsiv/__init__.py
def paras(
    self,
    seed=None,
    n_paras: int = 3,
    **para_kwargs,
) -> list[str]:
    """
    Generate multiple paragraphs with `para(...)`, returned as a list.

    Args:
        seed (float | str | None): Seed the random number generator if seed is not
            None.
        n_paras (int): Number of paragraphs to generate.
        **para_kwargs: Additional keyword arguments passed to `para(...)`.

    Returns:
        list[str]: A list of paragraphs.
    """
    if seed is not None:
        self._rand.seed(seed)

    return [self.para(**para_kwargs) for _ in range(n_paras)]

seed

seed(seed=None)

Seed the random number generator for reproducible results.

Parameters:

Name Type Description Default
seed float | str | None

The seed value used to initialize the random number generator.

None

Returns:

Type Description
None

None

Source code in wordsiv/__init__.py
def seed(self, seed: float | str | None = None) -> None:
    """
    Seed the random number generator for reproducible results.

    Args:
        seed (float | str | None): The seed value used to initialize the random
            number generator.

    Returns:
        None
    """
    self._rand.seed(seed)
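
Reseeding with the same value replays the same sequence of random choices, which is what makes generated text reproducible. The sketch below assumes the internal `_rand` attribute behaves like a standard `random.Random` instance; that is an assumption, since its construction is not shown in this section.

```python
import random

# Assumption: stands in for the instance's internal random number generator.
rand = random.Random()

rand.seed(123)
first_run = [rand.randint(0, 9) for _ in range(5)]

rand.seed(123)
second_run = [rand.randint(0, 9) for _ in range(5)]

same_sequence = first_run == second_run
```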

sent

sent(
    vocab=None,
    glyphs=None,
    seed=None,
    punc=True,
    rnd_punc=0,
    **words_kwargs,
)

Generate a single sentence, optionally punctuated, using words (and/or numbers).

A sentence is created by calling words(...), then (optionally) punctuating the resulting list.

Parameters:

Name Type Description Default
vocab str | None

Name of the Vocab to use. If None, use the default Vocab.

None
glyphs str | None

Allowed glyphs. If None, uses default glyphs.

None
seed any

Seed for the random number generator. If None, current state is used.

None
punc bool

Whether to add punctuation to the sentence.

True
rnd_punc float

A randomness factor between 0 and 1 that adjusts the punctuation frequency or distribution.

0
**words_kwargs

Additional keyword arguments passed to words(...).

{}

Returns:

Name Type Description
str str

A single sentence, optionally with punctuation.

Raises:

Type Description
ValueError

If rnd_punc is not in [0, 1].

Source code in wordsiv/__init__.py
def sent(
    self,
    vocab: str | None = None,
    glyphs: str | None = None,
    seed=None,
    punc: bool = True,
    rnd_punc: float = 0,
    **words_kwargs,
) -> str:
    """
    Generate a single sentence, optionally punctuated, using words (and/or numbers).

    A sentence is created by calling `words(...)`, then (optionally) punctuating the
    resulting list.

    Args:
        vocab (str | None): Name of the Vocab to use. If None, use the default Vocab.
        glyphs (str | None): Allowed glyphs. If None, uses default glyphs.
        seed (any): Seed for the random number generator. If None, current state is used.
        punc (bool): Whether to add punctuation to the sentence.
        rnd_punc (float): A randomness factor between 0 and 1 that adjusts the punctuation
            frequency or distribution.
        **words_kwargs: Additional keyword arguments passed to `words(...)`.

    Returns:
        str: A single sentence, optionally with punctuation.

    Raises:
        ValueError: If `rnd_punc` is not in [0, 1].
    """
    glyphs = self.glyphs if glyphs is None else glyphs
    vocab_obj = self.get_vocab(vocab)

    if seed is not None:
        self._rand.seed(seed)

    word_list = self.words(
        glyphs=glyphs,
        vocab=vocab,
        **words_kwargs,
    )

    if punc:
        if not (0 <= rnd_punc <= 1):
            raise ValueError("'rnd_punc' must be between 0 and 1")

        if vocab_obj.punctuation:
            punctuation = vocab_obj.punctuation
        else:
            try:
                punctuation = DEFAULT_PUNCTUATION[vocab_obj.lang]
            except KeyError:
                # If no default punctuation is found, return unpunctuated sentence
                return " ".join(word_list)

        return _punctuate(
            punctuation,
            self._rand,
            word_list,
            glyphs,
            rnd_punc,
        )
    else:
        return " ".join(word_list)
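
The punctuation fallback in the source above has three tiers: the Vocab's own punctuation, then a per-language default, then no punctuation at all. The sketch below illustrates only that ordering. `DEFAULT_PUNCTUATION_SKETCH`, `pick_punctuation`, `finish_sentence`, and the `{"end": ...}` dictionary shape are all hypothetical and do not reflect the real punctuation data format or the behavior of `_punctuate`.

```python
# Hypothetical defaults keyed by language code; not WordSiv's real data.
DEFAULT_PUNCTUATION_SKETCH = {"en": {"end": "."}}


def pick_punctuation(vocab_punctuation, lang):
    """Prefer the Vocab's own punctuation, then a per-language default."""
    if vocab_punctuation:
        return vocab_punctuation
    # dict.get returns None when the language has no default entry.
    return DEFAULT_PUNCTUATION_SKETCH.get(lang)


def finish_sentence(word_list, punctuation):
    """Join words; append an end mark only when punctuation was found."""
    sentence = " ".join(word_list)
    if punctuation is None:
        return sentence
    return sentence + punctuation["end"]


words = ["hello", "world"]
custom = finish_sentence(words, pick_punctuation({"end": "!"}, "en"))
default = finish_sentence(words, pick_punctuation(None, "en"))
plain = finish_sentence(words, pick_punctuation(None, "xx"))
```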

sents

sents(
    seed=None,
    min_n_sents=3,
    max_n_sents=5,
    n_sents=None,
    **sent_kwargs,
)

Generate multiple sentences with sent(...), returned as a list.

Parameters:

Name Type Description Default
seed float | str | None

Seed the random number generator if seed is not None.

None
min_n_sents int

Minimum number of sentences to produce if n_sents is None.

3
max_n_sents int

Maximum number of sentences to produce if n_sents is None.

5
n_sents int | None

If specified, exactly that many sentences are produced.

None
**sent_kwargs

Additional keyword arguments passed to sent(...).

{}

Returns:

Type Description
list[str]

A list of generated sentences.

Source code in wordsiv/__init__.py
def sents(
    self,
    seed=None,
    min_n_sents: int = 3,
    max_n_sents: int = 5,
    n_sents: int | None = None,
    **sent_kwargs,
) -> list[str]:
    """
    Generate multiple sentences with `sent(...)`, returned as a list.

    Args:
        seed (float | str | None): Seed the random number generator if seed is not
            None.
        min_n_sents (int): Minimum number of sentences to produce if `n_sents` is None.
        max_n_sents (int): Maximum number of sentences to produce if `n_sents` is None.
        n_sents (int | None): If specified, exactly that many sentences are produced.
        **sent_kwargs: Additional keyword arguments passed to `sent(...)`.

    Returns:
        list[str]: A list of generated sentences.
    """
    if seed is not None:
        self._rand.seed(seed)

    if not n_sents:
        n_sents = self._rand.randint(min_n_sents, max_n_sents)

    return [self.sent(**sent_kwargs) for _ in range(n_sents)]
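
Because the source above tests `if not n_sents`, a value of 0 is treated the same as None and a random count is drawn. A small sketch of that selection step, using a hypothetical `choose_n_sents` helper and its own `random.Random` instance:

```python
import random


def choose_n_sents(rng, min_n_sents=3, max_n_sents=5, n_sents=None):
    """Hypothetical helper mirroring the count selection shown above."""
    if not n_sents:  # true for None and for 0
        n_sents = rng.randint(min_n_sents, max_n_sents)
    return n_sents


rng = random.Random(0)
fixed = choose_n_sents(rng, n_sents=2)   # explicit count wins
drawn = choose_n_sents(rng)              # random count in [3, 5]
zero = choose_n_sents(rng, n_sents=0)    # 0 falls back to a random count
```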

text

text(seed=None, para_sep='\n\n', **paras_kwargs)

Generate multiple paragraphs of text, calling paras(...) and joining them with para_sep.

Parameters:

Name Type Description Default
seed float | str | None

Seed the random number generator if seed is not None.

None
para_sep str

The string used to separate paragraphs in the final text.

'\n\n'
**paras_kwargs

Additional keyword arguments passed to paras(...).

{}

Returns:

Name Type Description
str str

A string containing multiple paragraphs of text, separated by para_sep.

Source code in wordsiv/__init__.py
def text(
    self,
    seed: float | str | None = None,
    para_sep: str = "\n\n",
    **paras_kwargs,
) -> str:
    """
    Generate multiple paragraphs of text, calling `paras(...)` and joining them with
    `para_sep`.

    Args:
        seed (float | str | None): Seed the random number generator if seed is not
            None.
        para_sep (str): The string used to separate paragraphs in the final text.
        **paras_kwargs: Additional keyword arguments passed to `paras(...)`.

    Returns:
        str: A string containing multiple paragraphs of text, separated by
            `para_sep`.
    """
    if seed is not None:
        self._rand.seed(seed)

    return para_sep.join(self.paras(**paras_kwargs))
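
The para, paras, and text methods form a chain of joins: sentences are joined with sent_sep into a paragraph, and paragraphs are joined with para_sep into the final text. The sketch below shows that composition with fixed strings; `para_sketch` and `text_sketch` are hypothetical helpers, and the sentences are placeholders instead of generated output.

```python
def para_sketch(sentences, sent_sep=" "):
    """Join already-generated sentences into one paragraph."""
    return sent_sep.join(sentences)


def text_sketch(paragraphs, para_sep="\n\n"):
    """Join already-generated paragraphs into the final text."""
    return para_sep.join(paragraphs)


paragraphs = [
    para_sketch(["First sentence.", "Second sentence."]),
    para_sketch(["Third sentence."]),
]
result = text_sketch(paragraphs)
```

In the source shown above, each level seeds the generator only when its own seed argument is given and does not forward that seed to the inner calls, so a seed passed to the outermost call is applied once.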

top_word

top_word(
    vocab=None,
    glyphs=None,
    seed=None,
    idx=0,
    case="any",
    min_wl=2,
    max_wl=None,
    wl=None,
    contains=None,
    inner=None,
    startswith=None,
    endswith=None,
    regexp=None,
    raise_errors=False,
)

Retrieve the most common (or nth most common) word from the Vocab, subject to filtering constraints.

Parameters:

Name Type Description Default
vocab str | None

Name of the Vocab to use. If None, use the default Vocab.

None
glyphs str | None

Whitelisted glyphs to filter words. If None, uses default.

None
seed float | str | None

Seed the random number generator if seed is not None.

None
idx int

Index of the desired word in the frequency-sorted list (0-based).

0
case CaseType

Desired case form for the word (e.g., 'lc', 'uc', 'any').

'any'
min_wl int

Minimum word length.

2
max_wl int | None

Maximum word length. If None, no maximum.

None
wl int | None

Exact word length. If None, no exact length filter.

None
contains str | Sequence[str] | None

Substring(s) that must appear in the word.

None
inner str | Sequence[str] | None

Substring(s) that must appear in the interior.

None
startswith str | None

Substring that the word must start with.

None
endswith str | None

Substring that the word must end with.

None
regexp str | None

Regex pattern that the word must match.

None
raise_errors bool

Whether to raise errors on filter or index failures.

False

Returns:

Name Type Description
str str

The nth most common word that meets the constraints (or an empty string on failure if raise_errors is False).

Raises:

Type Description
FilterError

If filtering fails (no words match) and raise_errors is True.

ValueError

If no default vocab is set when vocab is None.

FilterError

If idx is out of range after filtering and raise_errors is True.

Source code in wordsiv/__init__.py
def top_word(
    self,
    vocab: str | None = None,
    glyphs: str | None = None,
    seed: float | str | None = None,
    idx: int = 0,
    case: CaseType = "any",
    min_wl: int = 2,
    max_wl: int | None = None,
    wl: int | None = None,
    contains: str | Sequence[str] | None = None,
    inner: str | Sequence[str] | None = None,
    startswith: str | None = None,
    endswith: str | None = None,
    regexp: str | None = None,
    raise_errors: bool = False,
) -> str:
    """
    Retrieve the most common (or nth most common) word from the Vocab, subject to
    filtering constraints.

    Args:
        vocab (str | None): Name of the Vocab to use. If None, use the default
            Vocab.
        glyphs (str | None): Whitelisted glyphs to filter words. If None, uses
            default.
        seed (float | str | None): Seed the random number generator if seed is
            not None.
        idx (int): Index of the desired word in the frequency-sorted list
            (0-based).
        case (CaseType): Desired case form for the word (e.g., 'lc', 'uc',
            'any').
        min_wl (int): Minimum word length.
        max_wl (int | None): Maximum word length. If None, no maximum.
        wl (int | None): Exact word length. If None, no exact length filter.
        contains (str | Sequence[str] | None): Substring(s) that must appear in
            the word.
        inner (str | Sequence[str] | None): Substring(s) that must appear in the
            interior.
        startswith (str | None): Substring that the word must start with.
        endswith (str | None): Substring that the word must end with.
        regexp (str | None): Regex pattern that the word must match.
        raise_errors (bool): Whether to raise errors on filter or index failures.

    Returns:
        str: The nth most common word that meets the constraints (or an empty string
            on failure if `raise_errors` is False).

    Raises:
        FilterError: If filtering fails (no words match) and `raise_errors` is True.
        ValueError: If no default vocab is set when `vocab` is None.
        FilterError: If `idx` is out of range after filtering and `raise_errors` is
            True.
    """
    glyphs = self.glyphs if glyphs is None else glyphs
    raise_errors = self.raise_errors if raise_errors is None else raise_errors
    vocab_obj = self.get_vocab(self.vocab) if not vocab else self.get_vocab(vocab)

    try:
        wc_list = vocab_obj.filter(
            glyphs=glyphs,
            case=case,
            min_wl=min_wl,
            max_wl=max_wl,
            wl=wl,
            contains=contains,
            inner=inner,
            startswith=startswith,
            endswith=endswith,
            regexp=regexp,
        )
    except FilterError as e:
        if raise_errors:
            raise e
        else:
            log.warning("%s", e.args[0])
            return ""

    try:
        return wc_list[idx][0]
    except IndexError:
        if raise_errors:
            raise FilterError(f"No word at index idx='{idx}'")
        else:
            log.warning("No word at index idx='%s'", idx)
            return ""
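
The final step of top_word is an index into the filtered, frequency-sorted list of (word, count) pairs. A minimal sketch of the non-raising path, assuming a hand-written list in place of real filter output (`top_word_sketch` and the sample counts are hypothetical):

```python
# Hypothetical (word, count) pairs, already sorted by descending count.
wc_list = [("the", 500), ("and", 300), ("cat", 20)]


def top_word_sketch(wc_list, idx=0):
    """Return the word at idx, or an empty string when idx is out of range."""
    try:
        return wc_list[idx][0]
    except IndexError:
        return ""


most_common = top_word_sketch(wc_list)
third = top_word_sketch(wc_list, idx=2)
out_of_range = top_word_sketch(wc_list, idx=10)
```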

top_words

top_words(
    glyphs=None,
    vocab=None,
    n_words=10,
    idx=0,
    case="any",
    min_wl=1,
    max_wl=None,
    wl=None,
    contains=None,
    inner=None,
    startswith=None,
    endswith=None,
    regexp=None,
    raise_errors=False,
)

Retrieve the top n_words from the Vocab, starting at index idx, subject to filtering constraints.

Parameters:

Name Type Description Default
glyphs str | None

Allowed glyph set. If None, uses default glyphs.

None
vocab str | None

Name of the Vocab to use. If None, use the default Vocab.

None
n_words int

Number of words to return.

10
idx int

The index at which to start returning words (0-based).

0
case CaseType

Desired case form ("any", "lc", "uc", etc.).

'any'
min_wl int

Minimum word length. Defaults to 1.

1
max_wl int | None

Maximum word length. If None, no maximum is applied.

None
wl int | None

Exact word length. If None, no exact length filter.

None
contains str | Sequence[str] | None

Substring(s) that must appear.

None
inner str | Sequence[str] | None

Substring(s) that must appear, not at edges.

None
startswith str | None

Required starting substring.

None
endswith str | None

Required ending substring.

None
regexp str | None

Regex pattern to match.

None
raise_errors bool

Whether to raise errors or fail gently.

False

Returns:

Type Description
list[str]

A list of up to n_words words, in descending frequency order.

Raises:

Type Description
FilterError

If filtering fails (no words match) and raise_errors is True.

Source code in wordsiv/__init__.py
def top_words(
    self,
    glyphs: str | None = None,
    vocab: str | None = None,
    n_words: int = 10,
    idx: int = 0,
    case: CaseType = "any",
    min_wl: int = 1,
    max_wl: int | None = None,
    wl: int | None = None,
    contains: str | Sequence[str] | None = None,
    inner: str | Sequence[str] | None = None,
    startswith: str | None = None,
    endswith: str | None = None,
    regexp: str | None = None,
    raise_errors: bool = False,
) -> list[str]:
    """
    Retrieve the top `n_words` from the Vocab, starting at index `idx`, subject to
    filtering constraints.

    Args:
        glyphs (str | None): Allowed glyph set. If None, uses default glyphs.
        vocab (str | None): Name of the Vocab to use. If None, use the default
            Vocab.
        n_words (int): Number of words to return.
        idx (int): The index at which to start returning words (0-based).
        case (CaseType): Desired case form ("any", "lc", "uc", etc.).
        min_wl (int): Minimum word length. Defaults to 1.
        max_wl (int | None): Maximum word length. If None, no maximum is applied.
        wl (int | None): Exact word length. If None, no exact length filter.
        contains (str | Sequence[str] | None): Substring(s) that must appear.
        inner (str | Sequence[str] | None): Substring(s) that must appear, not at
            edges.
        startswith (str | None): Required starting substring.
        endswith (str | None): Required ending substring.
        regexp (str | None): Regex pattern to match.
        raise_errors (bool): Whether to raise errors or fail gently.

    Returns:
        list[str]: A list of up to `n_words` words, in descending frequency order.

    Raises:
        FilterError: If filtering fails (no words match) and `raise_errors` is True.
    """
    glyphs = self.glyphs if glyphs is None else glyphs
    vocab_obj = self.get_vocab(self.vocab) if not vocab else self.get_vocab(vocab)

    try:
        wc_list = vocab_obj.filter(
            glyphs=glyphs,
            case=case,
            min_wl=min_wl,
            max_wl=max_wl,
            wl=wl,
            contains=contains,
            inner=inner,
            startswith=startswith,
            endswith=endswith,
            regexp=regexp,
        )[idx : idx + n_words]
    except FilterError as e:
        if raise_errors:
            raise e
        else:
            log.warning("%s", e.args[0])
            return []

    if not wc_list:
        if raise_errors:
            raise FilterError(f"No words found at idx '{idx}'")
        else:
            log.warning("No words found at idx '%s'", idx)
            return []

    return [w for w, _ in wc_list]
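
Since top_words takes the slice `[idx : idx + n_words]`, the idx and n_words arguments together act as a paging window over the frequency-sorted list. The sketch below uses a hand-written list of (word, count) pairs as a stand-in for real filter output; `top_words_sketch` and the sample counts are hypothetical.

```python
# Hypothetical (word, count) pairs, already sorted by descending count.
wc_list = [("the", 500), ("and", 300), ("for", 120), ("cat", 20)]


def top_words_sketch(wc_list, n_words=10, idx=0):
    """Return up to n_words words starting at position idx."""
    window = wc_list[idx : idx + n_words]
    return [w for w, _ in window]


first_page = top_words_sketch(wc_list, n_words=2)
second_page = top_words_sketch(wc_list, n_words=2, idx=2)
short_page = top_words_sketch(wc_list, n_words=3, idx=3)
past_end = top_words_sketch(wc_list, idx=10)
```

A window that runs past the end of the list is simply shorter than n_words, and a window that starts past the end is empty, which is the case the source above reports as "No words found".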

word

word(
    vocab=None,
    glyphs=None,
    seed=None,
    rnd=0,
    case="any",
    top_k=0,
    min_wl=1,
    max_wl=None,
    wl=None,
    contains=None,
    inner=None,
    startswith=None,
    endswith=None,
    regexp=None,
    raise_errors=False,
)

Generate a random word that meets a variety of constraints, such as glyphs, length, regex filters, etc.

Parameters:

Name Type Description Default
vocab str | None

Name of the Vocab to use. If None, uses default Vocab.

None
glyphs str | None

A string of allowed glyphs. If None, uses default glyphs.

None
seed float | str | None

Seed the random number generator if seed is not None.

None
rnd float

Randomness factor in [0, 1] for selecting among the top words.

0
case CaseType

Desired case of the output word (e.g., 'uc', 'lc', 'any'). See CaseType for all options.

'any'
top_k int

If > 0, only consider the top K words by frequency.

0
min_wl int

Minimum word length.

1
max_wl int | None

Maximum word length. If None, no maximum is applied.

None
wl int | None

Exact word length. If None, no exact length is enforced.

None
contains str | Sequence[str] | None

Substring(s) that must appear in the word.

None
inner str | Sequence[str] | None

Substring(s) that must appear, but not at the start or end of the word.

None
startswith str | None

Required starting substring.

None
endswith str | None

Required ending substring.

None
regexp str | None

A regular expression that the word must match.

None
raise_errors bool

Whether to raise filtering errors or fail gently.

False

Returns:

Name Type Description
str str

A randomly generated word meeting the specified constraints (or an empty string on failure if raise_errors is False).

Raises:

Type Description
ValueError

If rnd is not in [0, 1].

FilterError

If filtering yields no results and raise_errors is True.

VocabFormatError

If the underlying Vocab data is malformed.

VocabEmptyError

If the underlying Vocab is empty.

Source code in wordsiv/__init__.py
def word(
    self,
    vocab: str | None = None,
    glyphs: str | None = None,
    seed: float | str | None = None,
    rnd: float = 0,
    case: CaseType = "any",
    top_k: int = 0,
    min_wl: int = 1,
    max_wl: int | None = None,
    wl: int | None = None,
    contains: str | Sequence[str] | None = None,
    inner: str | Sequence[str] | None = None,
    startswith: str | None = None,
    endswith: str | None = None,
    regexp: str | None = None,
    raise_errors: bool = False,
) -> str:
    """
    Generate a random word that meets a variety of constraints, such as glyphs,
    length, regex filters, etc.

    Args:
        vocab (str | None): Name of the Vocab to use. If None, uses default Vocab.
        glyphs (str | None): A string of allowed glyphs. If None, uses default
            glyphs.
        seed (float | str | None): Seed the random number generator if seed is not
            None.
        rnd (float): Randomness factor in [0, 1] for selecting among the top words.
        case (CaseType): Desired case of the output word (e.g., 'uc', 'lc',
            'any'). See CaseType for all options.
        top_k (int): If > 0, only consider the top K words by frequency.
        min_wl (int): Minimum word length.
        max_wl (int | None): Maximum word length. If None, no maximum is applied.
        wl (int | None): Exact word length. If None, no exact length is enforced.
        contains (str | Sequence[str] | None): Substring(s) that must appear in the
            word.
        inner (str | Sequence[str] | None): Substring(s) that must appear, but not
            at the start or end of the word.
        startswith (str | None): Required starting substring.
        endswith (str | None): Required ending substring.
        regexp (str | None): A regular expression that the word must match.
        raise_errors (bool): Whether to raise filtering errors or fail gently.

    Returns:
        str: A randomly generated word meeting the specified constraints (or an
            empty string on failure if `raise_errors` is False).

    Raises:
        ValueError: If `rnd` is not in [0, 1].
        FilterError: If filtering yields no results and `raise_errors` is True.
        VocabFormatError: If the underlying Vocab data is malformed.
        VocabEmptyError: If the underlying Vocab is empty.
    """
    glyphs = self.glyphs if glyphs is None else glyphs
    raise_errors = self.raise_errors if raise_errors is None else raise_errors
    vocab_obj = self.get_vocab(vocab)

    if not (0 <= rnd <= 1):
        raise ValueError("'rnd' must be between 0 and 1")

    if seed is not None:
        self._rand.seed(seed)

    try:
        wc_list = vocab_obj.filter(
            glyphs=glyphs,
            case=case,
            min_wl=min_wl,
            max_wl=max_wl,
            wl=wl,
            contains=contains,
            inner=inner,
            startswith=startswith,
            endswith=endswith,
            regexp=regexp,
        )
    except FilterError as e:
        if raise_errors:
            raise e
        else:
            log.warning("%s", e.args[0])
            return ""

    if top_k:
        wc_list = wc_list[:top_k]

    return _sample_word(wc_list, self._rand, rnd)
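
The order of operations in the listing above (validate `rnd`, truncate to `top_k`, then sample) can be sketched without WordSiv as follows. The body of `_sample_word` is not shown on this page, so the blending of counts with a uniform weight below is purely an assumption made for illustration, and `pick_word` is a hypothetical name. Returning an empty string for an empty list is also a simplification.

```python
import random


def pick_word(wc_list, rnd=0.0, top_k=0, rand=None):
    """Hypothetical stand-in for the validate / truncate / sample sequence."""
    if not (0 <= rnd <= 1):
        raise ValueError("'rnd' must be between 0 and 1")

    rand = rand or random.Random()

    # top_k == 0 means "no truncation", mirroring `if top_k:` in the listing.
    if top_k:
        wc_list = wc_list[:top_k]
    if not wc_list:
        return ""

    words = [w for w, _ in wc_list]
    counts = [c for _, c in wc_list]

    # ASSUMPTION: rnd=0 weights words by count, rnd=1 weights them equally.
    # The real `_sample_word` may behave differently.
    mean = sum(counts) / len(counts)
    weights = [(1 - rnd) * c + rnd * mean for c in counts]
    return rand.choices(words, weights=weights)[0]


ranked = [("the", 500), ("of", 300), ("and", 250), ("to", 200)]
word = pick_word(ranked, rnd=0.5, top_k=2, rand=random.Random(1))
print(word in {"the", "of"})  # True: only the top 2 words are candidates
```

Passing a seeded `random.Random` makes the draw repeatable, which is the same reason `word()` accepts a `seed` argument.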

words

words(
    vocab=None,
    glyphs=None,
    seed=None,
    n_words=None,
    min_n_words=10,
    max_n_words=20,
    numbers=0,
    cap_first=None,
    case="any",
    rnd=0,
    min_wl=1,
    max_wl=None,
    wl=None,
    raise_errors=False,
    **word_kwargs,
)

Generate a list of words (and optionally numbers) according to the specified parameters.

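This method will produce up to n_words tokens, each of which may be a word or a number (digit string), depending on the numbers ratio. Tokens that fail to generate are skipped when raise_errors is False, so the result may be shorter than requested. It can also automatically handle capitalization of the first token if cap_first is True (or inferred).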

Parameters:

Name Type Description Default
vocab str | None

Name of the Vocab to use. If None, uses the default Vocab.

None
glyphs str | None

Allowed glyph set. If None, uses the default glyphs.

None
seed any

Seed for the random number generator. If None, current state is used.

None
n_words int | None

Exact number of tokens to generate. If None, randomly choose between min_n_words and max_n_words.

None
min_n_words int

Minimum number of tokens if n_words is not specified.

10
max_n_words int

Maximum number of tokens if n_words is not specified.

20
numbers float

A value in [0, 1] that determines the probability of generating a numeric token instead of a word.

0
cap_first bool | None

If True, capitalize the first word (if case is "any"). If None, automatically decide based on glyphs availability.

None
case CaseType

Desired case form for the words ("any", "lc", "uc", etc.). See CaseType for all options.

'any'
rnd float

Randomness factor for word selection, in [0, 1].

0
min_wl int

Minimum length for words/numbers.

1
max_wl int | None

Maximum length for words/numbers. If None, words have no maximum and numbers fall back to a default maximum length.

None
wl int | None

Exact length for words/numbers. If None, uses min/max_wl.

None
raise_errors bool

Whether to raise errors or fail gently.

False
**word_kwargs

Additional keyword arguments passed along to word().

{}

Returns:

Type Description
list[str]

A list of randomly generated tokens (words or numbers).

Raises:

Type Description
ValueError

If numbers is not in [0, 1].
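
The two decisions described above, inferring cap_first from the glyph set and choosing between a word and a number by the numbers ratio, can be sketched in isolation. This is a standalone illustration using only the standard library; `infer_cap_first` and `pick_token_type` are hypothetical names, not part of WordSiv, and validating `numbers` on every call is a simplification.

```python
import random


def infer_cap_first(glyphs):
    """Capitalize the first word only if the glyph set can render a capital."""
    if glyphs:
        return any(c.isupper() for c in glyphs)
    # No glyph restriction: default to capitalizing.
    return True


def pick_token_type(numbers, rand):
    """Return "number" with probability `numbers`, otherwise "word"."""
    if not (0 <= numbers <= 1):
        raise ValueError("'numbers' must be between 0 and 1")
    return rand.choices(["word", "number"], weights=[1 - numbers, numbers])[0]


rand = random.Random(0)
print(infer_cap_first("hamburgefonstiv"))  # False
print(infer_cap_first("HAMBURGhamburg"))   # True
print(infer_cap_first(None))               # True
print(pick_token_type(0, rand))            # word
print(pick_token_type(1, rand))            # number
```

With `numbers=0` every token is a word and with `numbers=1` every token is a number; values in between mix the two at roughly that ratio.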

Source code in wordsiv/__init__.py
def words(
    self,
    vocab: str | None = None,
    glyphs: str | None = None,
    seed=None,
    n_words: int | None = None,
    min_n_words: int = 10,
    max_n_words: int = 20,
    numbers: float = 0,
    cap_first: bool | None = None,
    case: CaseType = "any",
    rnd: float = 0,
    min_wl: int = 1,
    max_wl: int | None = None,
    wl: int | None = None,
    raise_errors: bool = False,
    **word_kwargs,
) -> list[str]:
    """
    Generate a list of words (and optionally numbers) according to the specified
    parameters.

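    This method will produce up to `n_words` tokens, each of which may be a word
    or a number (digit string), depending on the `numbers` ratio. Tokens that
    fail to generate are skipped when `raise_errors` is False, so the result may
    be shorter than requested. It can also automatically handle capitalization
    of the first token if `cap_first` is True (or inferred).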

    Args:
        vocab (str | None): Name of the Vocab to use. If None, uses the default
            Vocab.
        glyphs (str | None): Allowed glyph set. If None, uses the default glyphs.
        seed (any): Seed for the random number generator. If None, current state is
            used.
        n_words (int | None): Exact number of tokens to generate. If None, randomly
            choose between `min_n_words` and `max_n_words`.
        min_n_words (int): Minimum number of tokens if `n_words` is not specified.
        max_n_words (int): Maximum number of tokens if `n_words` is not specified.
        numbers (float): A value in [0, 1] that determines the probability of
            generating a numeric token instead of a word.
        cap_first (bool | None): If True, capitalize the first word (if `case` is
            "any"). If None, automatically decide based on glyphs availability.
        case (CaseType): Desired case form for the words ("any", "lc", "uc",
            etc.). See CaseType for all options.
        rnd (float): Randomness factor for word selection, in [0, 1].
        min_wl (int): Minimum length for words/numbers.
        max_wl (int | None): Maximum length for words/numbers.
        wl (int | None): Exact length for words/numbers. If None, uses min/max_wl.
        raise_errors (bool): Whether to raise errors or fail gently.
        **word_kwargs: Additional keyword arguments passed along to `word()`.

    Returns:
        list[str]: A list of randomly generated tokens (words or numbers).

    Raises:
        ValueError: If `numbers` is not in [0, 1].
    """
    glyphs = self.glyphs if glyphs is None else glyphs

    if seed is not None:
        self._rand.seed(seed)

    if not n_words:
        n_words = self._rand.randint(min_n_words, max_n_words)

    if cap_first is None:
        if glyphs:
            # If constrained glyphs, only capitalize if uppercase letters exist
            cap_first = any(c for c in glyphs if c.isupper())
        else:
            # Otherwise, default to capitalize the first word
            cap_first = True

    if not (0 <= numbers <= 1):
        raise ValueError("'numbers' must be between 0 and 1")

    word_list = []
    last_w = None
    for i in range(n_words):
        if cap_first and case == "any" and i == 0:
            word_case: CaseType = "cap"
        else:
            word_case = case

        token_type = self._rand.choices(
            ["word", "number"],
            weights=[1 - numbers, numbers],
        )[0]

        if token_type == "word":
            w = self.word(
                vocab=vocab,
                glyphs=glyphs,
                case=word_case,
                rnd=rnd,
                min_wl=min_wl,
                max_wl=max_wl,
                wl=wl,
                raise_errors=raise_errors,
                **word_kwargs,
            )

            # Try once more to avoid consecutive repeats
            # TODO: this is a hack, we should find a better way to avoid consecutive
            # repeats
            if w == last_w:
                w = self.word(
                    vocab=vocab,
                    glyphs=glyphs,
                    case=word_case,
                    rnd=rnd,
                    min_wl=min_wl,
                    max_wl=max_wl,
                    wl=wl,
                    raise_errors=raise_errors,
                    **word_kwargs,
                )

            if w:
                word_list.append(w)
                last_w = w
        else:
            # token_type == "number"
            w = self.number(
                glyphs=glyphs,
                wl=wl,
                min_wl=min_wl,
                max_wl=max_wl or _DEFAULT_MAX_NUM_LENGTH,
                raise_errors=raise_errors,
            )

            if w:
                word_list.append(w)
                last_w = w

    return word_list
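
The loop above retries once when a token repeats its predecessor and silently drops empty results. A minimal sketch of just that control flow, with a hypothetical `collect_tokens` helper and a scripted `pick` callable standing in for `self.word(...)`, shows two consequences: a repeat can still slip through after the single retry, and the returned list can be shorter than the number of tokens requested.

```python
from typing import Callable, List


def collect_tokens(n_tokens: int, pick: Callable[[], str]) -> List[str]:
    """Collect up to n_tokens, retrying once on a consecutive repeat."""
    tokens: List[str] = []
    last = None
    for _ in range(n_tokens):
        w = pick()
        if w == last:
            # Single retry only; a second identical draw is accepted as-is.
            w = pick()
        if w:
            # Empty strings (failed draws) are skipped, not appended.
            tokens.append(w)
            last = w
    return tokens


# Scripted draws standing in for random word generation.
draws = iter(["a", "a", "b", "b", "b", "", "c"])
result = collect_tokens(4, lambda: next(draws))
print(result)  # ['a', 'b', 'b']
```

Here four tokens were requested but only three came back: the second "b" survived its retry, and the empty draw was discarded.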