Generating Text

WordSiv provides several methods for generating text: word(), top_word(), words(), top_words(), sent(), sents(), para(), paras(), and text().

For methods that select words by probability (every method except top_word() and top_words()), there are options for adjusting the randomness of the output:

  • seed: Make output repeatable (deterministic)
  • rnd: Blend in fully random word selection with probability-based word selection
  • rnd_punc: Blend in fully random punctuation selection with the default probability-based punctuation selection

There are also additional options:

  • numbers: Mix in figures with the words
  • punc: Optionally disable punctuation
  • top_k: Restrict the word list used for text generation to top_k (most probable) words

Text Generation Methods

WordSiv is structured so that word generation calls cascade, passing arguments from the larger to smaller text generation methods. So when you call text(), you get this chain of method calls:

text() → paras() → para() → sents() → sent() → words() → word()

This means you can pass arguments to text() that will affect all the smaller text generation methods it calls:

from wordsiv import WordSiv

wsv = WordSiv(vocab="en")

print(
    wsv.text(
        n_paras=3,  # number of paragraphs
        min_n_sents=2,  # min sentences per paragraph
        max_n_sents=4,  # max sentences per paragraph
        min_n_words=3,  # min words per sentence
        max_n_words=7,  # max words per sentence
        numbers=0.1,  # 10% chance of numbers
        rnd=0.1,  # 10% random word selection
        rnd_punc=0.5,  # 50% random punctuation
        para_sep="¶",  # custom paragraph separator
        min_wl=5,  # minimum word length
        max_wl=10,  # maximum word length
        contains="a",  # words must contain this substring (doesn't affect numbers)
    )
)
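
The cascade can be pictured with a toy model in which each larger generator forwards the keyword arguments it does not use to the next smaller one. This is only a sketch of the general pattern, not WordSiv's implementation: the functions below are hypothetical stand-ins that build "words" from random letters.

```python
import random

LETTERS = "abcdefghijklmnopqrstuvwxyz"

def word(rng, min_wl=1, max_wl=10):
    # Hypothetical stand-in for a word generator: random letters
    length = rng.randint(min_wl, max_wl)
    return "".join(rng.choice(LETTERS) for _ in range(length))

def words(rng, n_words=5, **word_kwargs):
    # Consumes n_words, forwards everything else to word()
    return [word(rng, **word_kwargs) for _ in range(n_words)]

def sent(rng, **words_kwargs):
    # Forwards all keyword arguments to words()
    return " ".join(words(rng, **words_kwargs)).capitalize() + "."

def para(rng, n_sents=3, sent_sep=" ", **sent_kwargs):
    # Consumes n_sents and sent_sep, forwards everything else to sent()
    return sent_sep.join(sent(rng, **sent_kwargs) for _ in range(n_sents))

rng = random.Random(0)
# min_wl and max_wl travel through sent() and words() down to word()
demo_para = para(rng, n_sents=2, n_words=4, min_wl=5, max_wl=8)
print(demo_para)
```

In this model, an argument is consumed by the level that understands it and passed along untouched by every level above.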

Random Word (word())

The word() method returns a single word, randomly selected from the Vocab (weighted by word probability). See also word filter arguments.

from wordsiv import WordSiv

wsv = WordSiv(vocab="en")

# random single word from probabilities
print(wsv.word())

# random single word with glyphs restriction
print(wsv.word(glyphs="HAMBUGERFONTSIVhambugerfontsiv"))

# random single word, no probabilities
print(wsv.word(rnd=1))

Most Common Word (top_word())

The top_word() method returns the most common word, or the nth most common word when idx is given (idx is zero-based, so idx=4 is the 5th most common word). See also word filter arguments.

from wordsiv import WordSiv

wsv = WordSiv(vocab="en")

# Get most common word
print(wsv.top_word())

# Get 5th most common word
print(wsv.top_word(idx=4))

# Get 5th most common word after word filters
print(wsv.top_word(idx=4, glyphs="HAMBUGERFONTSIVhambugerfontsiv", wl=7))

List of Random Words (words())

The words() method returns a list of words generated by word() (randomly selected from the Vocab, weighted by word probability). See also word filter arguments.

from wordsiv import WordSiv

wsv = WordSiv(vocab="en")

# Random number of words
print(wsv.words())

# Random number of words, capitalized first word
print(wsv.words(cap_first=True))

# 5 words
print(wsv.words(n_words=5))

# 3-10 words
print(wsv.words(min_n_words=3, max_n_words=10))

# 10 numbers
print(wsv.words(numbers=1, n_words=10))

# 50% words, 50% numbers
print(wsv.words(numbers=0.5, n_words=10))

List of Most Common Words (top_words())

The top_words() method returns a list of the most common words in descending frequency order. See also word filter arguments.

from wordsiv import WordSiv

wsv = WordSiv(vocab="en")

# Get 10 most common words
print(wsv.top_words(n_words=10))

# Get the 10th-19th most common words
print(wsv.top_words(n_words=10, idx=9))

# Get 10 most common words after word filters
print(wsv.top_words(n_words=10, glyphs="HAMBUGERFONTSIVhambugerfontsiv", wl=7))
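
One way to read the n_words and idx arguments is as a slice over the word list ranked by frequency. The sketch below assumes that reading; top_words_sketch and toy_counts are hypothetical names for illustration, not part of WordSiv.

```python
def top_words_sketch(counts, n_words=10, idx=0):
    # Rank words from most to least frequent, then take a window of n_words
    # starting at the zero-based position idx
    ranked = sorted(counts, key=counts.get, reverse=True)
    return ranked[idx : idx + n_words]

toy_counts = {"the": 90, "of": 50, "and": 40, "to": 35, "a": 30}

print(top_words_sketch(toy_counts, n_words=2))         # two most common words
print(top_words_sketch(toy_counts, n_words=2, idx=3))  # 4th and 5th most common
```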

Sentence (sent())

The sent() method returns a single sentence, joining the output of words() and (optionally) adding punctuation. See also word filter arguments.

from wordsiv import WordSiv

wsv = WordSiv(vocab="en")

# Sentence of random length
print(wsv.sent())

# Sentence w/ no punctuation
print(wsv.sent(punc=False))

# Sentence w/ 5 words and completely random punctuation
print(wsv.sent(rnd_punc=1, n_words=5))

# Sentence w/ 3-10 words
print(wsv.sent(min_n_words=3, max_n_words=10))

# String of 10 numbers
print(wsv.sent(numbers=1, n_words=10))

# 50% words, 50% numbers
print(wsv.sent(numbers=0.5, n_words=10))

List of Sentences (sents())

The sents() method returns a list of sentences generated from sent(). See also word filter arguments.

from wordsiv import WordSiv

wsv = WordSiv(vocab="en")

# A random number of sentences
print(wsv.sents())

# 5 sentences
print(wsv.sents(n_sents=5))

# 2-3 sentences
print(wsv.sents(min_n_sents=2, max_n_sents=3))

Paragraph (para())

The para() method returns a single paragraph, joining the output of sents(). See also word filter arguments.

from wordsiv import WordSiv

wsv = WordSiv(vocab="en")

# Paragraph w/ random number of sentences
print(wsv.para())

# Paragraph w/ 3 sentences
print(wsv.para(n_sents=3))

# Paragraph which joins sentences with a custom separator
print(wsv.para(sent_sep="\n"))

# Paragraph w/ 1-3 sentences
print(wsv.para(min_n_sents=1, max_n_sents=3))

Multiple Paragraphs (paras())

The paras() method returns a list of paragraphs generated from para(). See also word filter arguments.

from wordsiv import WordSiv

wsv = WordSiv(vocab="en")

# Random number of paragraphs
print(wsv.paras())

# 3 paragraphs
print(wsv.paras(n_paras=3))

Text Block (text())

The text() method generates a text block, joining the output of paras(). See also word filter arguments.

from wordsiv import WordSiv

wsv = WordSiv(vocab="en")

# Text block w/ random number of paragraphs
print(wsv.text())

# Text block with special paragraph separator
print(wsv.text(para_sep="¶"))

# Text block w/ 3 paragraphs
print(wsv.text(n_paras=3))

Adjusting Randomness

Repeatable Output (seed)

For reproducible results, you can set a random seed when initializing WordSiv or for individual function calls. This is essential if you want your proof to remain the same until you make changes to the code (or your glyphs).

from wordsiv import WordSiv

# Set seed on initialization
wsv = WordSiv(vocab="en", seed=123)
print(wsv.words(n_words=5))

# Or set seed for specific calls
wsv = WordSiv(vocab="en")
print(wsv.words(n_words=5, seed=123))

The following example shows how seeding behaves in practice:

from wordsiv import WordSiv

wsv = WordSiv(vocab="en", glyphs="HAMBUGERFONThambugerfont")

# same results
print(wsv.sent(seed=3))
# "Heart tent terra Emma root buffet foam mom Hagen to earth at ammo"
print(wsv.sent(seed=3))
# "Heart tent terra Emma root buffet foam mom Hagen to earth at ammo"

# not if we change our glyphs though!
wsv.glyphs = "HAMBUGERFONTSIVhambugerfontsiv"
print(wsv.sent(seed=3))
# "Of not but not to as on to setting the of the things"

# you only need to seed at the beginning of your proof:
wsv.seed(1)
print(wsv.word())
# "of"
print(wsv.word())
# "agreement"

# See? same results as above:
wsv.seed(1)
print(wsv.word())
# "of"
print(wsv.word())
# "agreement"

# this holds only as long as you don't insert a new call that uses the random generator in between:
wsv.seed(1)
print(wsv.word())
# "of"
print(wsv.word(startswith="f"))
# "fee"
print(wsv.word())
# "area"
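
The same behavior can be reproduced with Python's standard random module, which may help explain why an extra call in the middle shifts everything after it. This is a sketch of the general principle using random.Random directly; toy_vocab and draw are hypothetical names, and nothing here is WordSiv's own code.

```python
import random

toy_vocab = ["of", "the", "area", "fee", "agreement", "to"]

def draw(rng, n):
    # Each choice consumes some of the generator's state
    return [rng.choice(toy_vocab) for _ in range(n)]

rng = random.Random(1)
first = draw(rng, 3)

# Re-seeding resets the state, so the same sequence comes out again
rng.seed(1)
second = draw(rng, 3)
print(first == second)

# An extra call in between consumes state, so later draws will generally
# no longer line up with the earlier run
rng.seed(1)
head = rng.choice(toy_vocab)  # still matches first[0]
rng.random()                  # the inserted call
tail = draw(rng, 2)
print(head, tail)
```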

Word Randomness (rnd)

The rnd parameter controls how random the word selection is. Raising it makes less probable words appear more often, which is especially useful when your glyph set is limited (a limited glyph set skews the probability distribution even further towards short, common words).

  • rnd=0: Use word probabilities to select words (default)
  • rnd=1: Completely random word selection
  • 0<rnd<1: Interpolation of word probability distribution and fully random distribution

from wordsiv import WordSiv

wsv = WordSiv(vocab="en", glyphs="HAMBUGERFONTSIVhambugerfontsiv")

# Default behavior - based on word frequencies
print(wsv.words(n_words=10))

# Completely random selection
print(wsv.words(n_words=10, rnd=1))

# Blending in just a little bit of randomness helps when you have a very restricted
# glyph set like "HAMBUGERFONTS"
print(wsv.words(n_words=10, rnd=0.03))
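
To see what interpolating between the two distributions can mean, here is a small sketch that linearly blends frequency-based weights with uniform weights. It assumes a simple linear blend, which is one plausible reading of the description above; WordSiv's actual computation may differ, and blend_weights and toy_counts are hypothetical names.

```python
import random

def blend_weights(counts, rnd):
    # rnd=0 keeps the frequency-based probabilities, rnd=1 gives every word
    # the same probability, and values in between mix the two linearly
    total = sum(counts.values())
    uniform = 1 / len(counts)
    return {w: (1 - rnd) * (c / total) + rnd * uniform for w, c in counts.items()}

toy_counts = {"the": 90, "of": 9, "fen": 1}

for rnd in (0, 0.5, 1):
    weights = blend_weights(toy_counts, rnd)
    print(rnd, {w: round(p, 3) for w, p in weights.items()})

# Sampling with the blended weights
rng = random.Random(0)
weights = blend_weights(toy_counts, 0.5)
print(rng.choices(list(weights), weights=list(weights.values()), k=10))
```

Even a small rnd lifts the rare words noticeably, because their frequency-based probability starts out so close to zero.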

Punctuation Randomness (rnd_punc)

The rnd_punc parameter controls how random the punctuation selection is. This is useful for getting less common punctuation that would otherwise rarely appear.

  • rnd_punc=0: Use punctuation frequencies to select punctuation (default)
  • rnd_punc=1: Completely random punctuation selection
  • 0<rnd_punc<1: Interpolation of punctuation frequency distribution and fully random distribution

from wordsiv import WordSiv

wsv = WordSiv(vocab="en")

# Default behavior - based on punctuation probabilities
print(wsv.sent(rnd_punc=0))

# Completely random punctuation selection
print(wsv.sent(rnd_punc=1))

# Interpolation between totally random punc selection and probability-based punc
# selection
print(wsv.sent(rnd_punc=0.5))

Additional Options

Limiting Word Pool (top_k)

You can restrict word selection to the top_k most common words. This is useful if you want to generate text with only high-frequency words:

from wordsiv import WordSiv

wsv = WordSiv(vocab="en")

# Top 100 most frequent words
print(wsv.text(top_k=100))

# This is useful to get a selection of highly-frequent words, without skewing towards
# the top few words.
print(wsv.text(rnd=1, top_k=1000))
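
The combination of rnd=1 and top_k can be thought of as picking uniformly from a truncated ranking. The sketch below illustrates that idea with the standard library only; sample_top_k and toy_counts are hypothetical names, and this is not how WordSiv is necessarily implemented.

```python
import random

def sample_top_k(rng, counts, top_k, n_words):
    # Keep only the top_k most frequent words, then pick uniformly among
    # them so the very top words do not dominate
    ranked = sorted(counts, key=counts.get, reverse=True)
    pool = ranked[:top_k]
    return [rng.choice(pool) for _ in range(n_words)]

toy_counts = {"the": 90, "to": 70, "of": 50, "fen": 1}

rng = random.Random(0)
picks = sample_top_k(rng, toy_counts, top_k=2, n_words=8)
print(picks)  # only "the" and "to" can appear
```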

Mixing In Numbers (numbers)

You can include basic random figures in your text (constrained by glyphs) with the numbers parameter:

from wordsiv import WordSiv

wsv = WordSiv(vocab="en")

# No numbers (default is 0 anyway)
print(wsv.sent(numbers=0))

# 25% chance each word is a number (will make up roughly 25% of text)
print(wsv.text(numbers=0.25))

# A list of numbers
print(wsv.words(numbers=1))
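
The numbers value can be read as a per-word probability: for each position, a weighted coin flip decides whether a figure or a word is emitted. The sketch below assumes that reading; words_with_numbers and toy_vocab are hypothetical names, and the figures are plain random integers rather than anything WordSiv-specific.

```python
import random

def words_with_numbers(rng, vocab, n_words, numbers=0.0):
    out = []
    for _ in range(n_words):
        # rng.random() is in [0, 1), so numbers=0 never yields a figure
        # and numbers=1 always does
        if rng.random() < numbers:
            out.append(str(rng.randint(0, 9999)))
        else:
            out.append(rng.choice(vocab))
    return out

toy_vocab = ["ham", "burger", "font", "sieve"]

rng = random.Random(0)
print(words_with_numbers(rng, toy_vocab, n_words=8, numbers=0.25))
only_numbers = words_with_numbers(rng, toy_vocab, n_words=5, numbers=1)
only_words = words_with_numbers(rng, toy_vocab, n_words=5, numbers=0)
```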

Disabling Punctuation (punc)

You can disable punctuation with the punc parameter:

from wordsiv import WordSiv

wsv = WordSiv(vocab="en")

# No punctuation
print(wsv.para(punc=False))