Generating Text
WordSiv provides several methods for generating text:
- Word(s): word(), words(), top_word(), top_words()
- Sentence(s): sent(), sents()
- Paragraph(s): para(), paras()
- Text Block: text()
For methods that generate text with probabilities (not top_word(), top_words()), there are options for adjusting the randomness of the output:
- seed: Make output repeatable (deterministic)
- rnd: Blend in fully random word selection with probability-based word selection
- rnd_punc: Blend in fully random punctuation selection with the default probability-based punctuation selection
There are also additional options:
- numbers: Mix in figures with the words
- punc: Optionally disable punctuation
- top_k: Restrict the word list used for text generation to the top_k (most probable) words
Text Generation Methods
WordSiv is structured so that text generation calls cascade, passing arguments
from the larger to the smaller text generation methods. So when you call text(), you get this chain of method calls:
text() ➔ paras() ➔ para() ➔ sents() ➔ sent() ➔ words() ➔ word()
This means you can pass arguments to text() that will affect all the smaller
text generation methods it calls:
from wordsiv import WordSiv
wsv = WordSiv(vocab="en")
print(
    wsv.text(
        n_paras=3,  # number of paragraphs
        min_n_sents=2,  # min sentences per paragraph
        max_n_sents=4,  # max sentences per paragraph
        min_n_words=3,  # min words per sentence
        max_n_words=7,  # max words per sentence
        numbers=0.1,  # 10% chance of numbers
        rnd=0.1,  # 10% random word selection
        rnd_punc=0.5,  # 50% random punctuation
        para_sep="¶",  # custom paragraph separator
        min_wl=5,  # minimum word length
        max_wl=10,  # maximum word length
        contains="a",  # each word contains this substring (doesn't affect numbers)
    )
)
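To make the cascade idea concrete, here is a minimal toy sketch of how keyword arguments can fall through a chain of generator functions. It does not use WordSiv and is not its implementation: the word list, the function bodies, and the `rng` parameter are all hypothetical stand-ins chosen for illustration.

```python
import random

# Toy stand-ins for illustration only; these are NOT WordSiv's real functions.
WORDS = ["alpha", "beta", "gamma", "delta", "epsilon"]


def word(rng, min_wl=1, max_wl=None):
    """Pick one word whose length falls in the requested range."""
    pool = [
        w for w in WORDS
        if len(w) >= min_wl and (max_wl is None or len(w) <= max_wl)
    ]
    return rng.choice(pool)


def words(rng, n_words=3, **word_kwargs):
    """Consume n_words here and forward everything else to word()."""
    return [word(rng, **word_kwargs) for _ in range(n_words)]


def sent(rng, **kwargs):
    """Join words into a sentence; all kwargs fall through to words()."""
    return " ".join(words(rng, **kwargs)).capitalize() + "."


def para(rng, n_sents=2, **kwargs):
    """Consume n_sents here and forward everything else to sent()."""
    return " ".join(sent(rng, **kwargs) for _ in range(n_sents))


rng = random.Random(0)
# n_sents is used by para(), n_words by words(), min_wl by word()
paragraph = para(rng, n_sents=2, n_words=4, min_wl=5)
print(paragraph)
```

Each level consumes the arguments it understands and passes the rest down, which is why a single call at the top can shape output at every level below it.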
Random Word (word())
The word() method returns a single word, randomly selected from the Vocab (weighted by word probability). See also word filter arguments.
from wordsiv import WordSiv
wsv = WordSiv(vocab="en")
# random single word from probabilities
print(wsv.word())
# random single word with glyphs restriction
print(wsv.word(glyphs="HAMBUGERFONTSIVhambugerfontsiv"))
# random single word, ignoring probabilities
print(wsv.word(rnd=1))
Most Common Word (top_word())
The top_word() method returns the most common word, or the nth most common word when idx is given (idx counts from 0, so idx=4 is the 5th word).
See also word filter arguments.
from wordsiv import WordSiv
wsv = WordSiv(vocab="en")
# Get most common word
print(wsv.top_word())
# Get 5th most common word
print(wsv.top_word(idx=4))
# Get 5th most common word after word filters
print(wsv.top_word(idx=4, glyphs="HAMBUGERFONTSIVhambugerfontsiv", wl=7))
List of Random Words (words())
The words() method returns a list of words generated by word() (randomly
selected from the Vocab, weighted by word probability). See also word filter arguments.
from wordsiv import WordSiv
wsv = WordSiv(vocab="en")
# Random number of words
print(wsv.words())
# Random number of words, capitalized first word
print(wsv.words(cap_first=True))
# 5 words
print(wsv.words(n_words=5))
# 3-10 words
print(wsv.words(min_n_words=3, max_n_words=10))
# 10 numbers
print(wsv.words(numbers=1, n_words=10))
# 50% words, 50% numbers
print(wsv.words(numbers=0.5, n_words=10))
List of Most Common Words (top_words())
The top_words() method returns a list of the most common words in descending
frequency order. See also word filter arguments.
from wordsiv import WordSiv
wsv = WordSiv(vocab="en")
# Get 10 most common words
print(wsv.top_words(n_words=10))
# Get the 10th-19th most common words
print(wsv.top_words(n_words=10, idx=9))
# Get 10 most common words after word filters
print(wsv.top_words(n_words=10, glyphs="HAMBUGERFONTSIVhambugerfontsiv", wl=7))
Sentence (sent())
The sent() method returns a single sentence, joining the output of words()
and optionally adding punctuation. See also word filter arguments.
from wordsiv import WordSiv
wsv = WordSiv(vocab="en")
# Sentence of random length
print(wsv.sent())
# Sentence w/ no punctuation
print(wsv.sent(punc=False))
# Sentence w/ 5 words and completely random punctuation
print(wsv.sent(rnd_punc=1, n_words=5))
# Sentence w/ 3-10 words
print(wsv.sent(min_n_words=3, max_n_words=10))
# String of 10 numbers
print(wsv.sent(numbers=1, n_words=10))
# 50% words, 50% numbers
print(wsv.sent(numbers=0.5, n_words=10))
List of Sentences (sents())
The sents() method returns a list of sentences generated from sent(). See
also word filter arguments.
from wordsiv import WordSiv
wsv = WordSiv(vocab="en")
# A random number of sentences
print(wsv.sents())
# 5 sentences
print(wsv.sents(n_sents=5))
# 2-3 sentences
print(wsv.sents(min_n_sents=2, max_n_sents=3))
Paragraph (para())
The para() method returns a single paragraph, joining the output of sents().
See also word filter arguments.
from wordsiv import WordSiv
wsv = WordSiv(vocab="en")
# Paragraph w/ random number of sentences
print(wsv.para())
# Paragraph w/ 3 sentences
print(wsv.para(n_sents=3))
# Paragraph which joins sentences with a custom separator
print(wsv.para(sent_sep="\n"))
# Paragraph w/ 1-3 sentences
print(wsv.para(min_n_sents=1, max_n_sents=3))
Multiple Paragraphs (paras())
The paras() method returns a list of paragraphs generated from para().
See also word filter arguments.
from wordsiv import WordSiv
wsv = WordSiv(vocab="en")
# Random number of paragraphs
print(wsv.paras())
# 3 paragraphs
print(wsv.paras(n_paras=3))
Text Block (text())
The text() method generates a text block, joining the output of paras(). See
also word filter arguments.
from wordsiv import WordSiv
wsv = WordSiv(vocab="en")
# Text block w/ random number of paragraphs
print(wsv.text())
# Text block with special paragraph separator
print(wsv.text(para_sep="¶"))
# Text block w/ 3 paragraphs
print(wsv.text(n_paras=3))
Adjusting Randomness
Repeatable Output (seed)
For reproducible results, you can set a random seed when initializing WordSiv or for individual function calls. This is essential if you want your proof to remain the same until you make changes to the code (or your glyphs).
from wordsiv import WordSiv
# Set seed on initialization
wsv = WordSiv(vocab="en", seed=123)
print(wsv.words(n_words=5))
# Or set seed for specific calls
wsv = WordSiv(vocab="en")
print(wsv.words(n_words=5, seed=123))
The following example shows in more detail how seeding behaves:
from wordsiv import WordSiv
wsv = WordSiv(vocab="en", glyphs="HAMBUGERFONThambugerfont")
# same results
print(wsv.sent(seed=3))
# "Heart tent terra Emma root buffet foam mom Hagen to earth at ammo"
print(wsv.sent(seed=3))
# "Heart tent terra Emma root buffet foam mom Hagen to earth at ammo"
# not if we change our glyphs though!
wsv.glyphs = "HAMBUGERFONTSIVhambugerfontsiv"
print(wsv.sent(seed=3))
# "Of not but not to as on to setting the of the things"
# you only need to seed at the beginning of your proof:
wsv.seed(1)
print(wsv.word())
# "of"
print(wsv.word())
# "agreement"
# See? same results as above:
wsv.seed(1)
print(wsv.word())
# "of"
print(wsv.word())
# "agreement"
# this holds only as long as you don't insert a new call that uses the random generator in between:
wsv.seed(1)
print(wsv.word())
# "of"
print(wsv.word(startswith="f"))
# "fee"
print(wsv.word())
# "area"
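The reason an inserted call shifts everything after it can be sketched with Python's standard `random.Random`. This is only an illustration of how a seeded generator's state advances; the word list and the `pick()` helper are hypothetical, and nothing here is claimed about how WordSiv draws words internally.

```python
import random

# Hypothetical word pool, for illustration only
WORDS = ["of", "agreement", "area", "fee", "the", "and"]


def pick(rng, pool):
    """Pick one item, consuming exactly one draw from the generator."""
    return pool[int(rng.random() * len(pool))]


rng = random.Random(1)
first_run = [pick(rng, WORDS), pick(rng, WORDS)]

# Re-seeding rewinds the generator, so the same calls give the same picks
rng.seed(1)
second_run = [pick(rng, WORDS), pick(rng, WORDS)]

# An extra call in between consumes a draw, so the final pick now uses
# the third draw of the sequence instead of the second
rng.seed(1)
a = pick(rng, WORDS)
extra = pick(rng, ["fee", "foam"])
c = pick(rng, WORDS)
third_run = [a, c]

print(first_run, second_run, third_run)
```

The first pick matches in all three runs, because nothing has consumed randomness before it. Only the picks made after the extra call are affected.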
Word Randomness (rnd)
The rnd parameter controls how random the word generation is. This is useful
for surfacing more of the less probable words, especially when your glyph set is
limited (a limited glyph set skews the probability distribution even further
towards short, common words).
- rnd=0: Use word probabilities to select words (default)
- rnd=1: Completely random word selection
- 0<rnd<1: Interpolation of word probability distribution and fully random distribution
from wordsiv import WordSiv
wsv = WordSiv(vocab="en", glyphs="HAMBUGERFONTSIVhambugerfontsiv")
# Default behavior - based on word frequencies
print(wsv.words(n_words=10))
# Completely random selection
print(wsv.words(n_words=10, rnd=1))
# Blending in just a little bit of randomness helps when you have a very restricted
# glyph set like "HAMBUGERFONTS"
print(wsv.words(n_words=10, rnd=0.03))
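One way to picture the interpolation is as a linear blend between the frequency-based distribution and a uniform one. The sketch below assumes a linear blend, which the text above does not confirm as WordSiv's actual formula; `blend_weights()` and the word counts are hypothetical.

```python
import random


def blend_weights(counts, rnd):
    """Linearly interpolate between frequency-based and uniform probabilities.

    This is an assumed formula for illustration, not WordSiv's implementation.
    """
    if not 0 <= rnd <= 1:
        raise ValueError("rnd must be between 0 and 1")
    total = sum(counts.values())
    uniform = 1 / len(counts)
    return {
        w: (1 - rnd) * (c / total) + rnd * uniform
        for w, c in counts.items()
    }


# Hypothetical word counts, heavily skewed towards one common word
counts = {"the": 900, "font": 90, "glyph": 10}

for rnd in (0, 0.5, 1):
    print(rnd, blend_weights(counts, rnd))

# Sample with a small amount of randomness blended in
weights = blend_weights(counts, 0.03)
rng = random.Random(42)
sample = rng.choices(list(weights), weights=list(weights.values()), k=10)
print(sample)
```

Under this assumption, even a small rnd lifts the rarest words noticeably in relative terms while leaving the common words dominant.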
Punctuation Randomness (rnd_punc)
The rnd_punc parameter controls how random the punctuation generation is. This
is useful for getting punctuation that occurs only rarely.
- rnd_punc=0: Use punctuation frequencies to select punctuation (default)
- rnd_punc=1: Completely random punctuation selection
- 0<rnd_punc<1: Interpolation of punctuation frequency distribution and fully random distribution
from wordsiv import WordSiv
wsv = WordSiv(vocab="en")
# Default behavior - based on punctuation probabilities
print(wsv.sent(rnd_punc=0))
# Completely random punctuation selection
print(wsv.sent(rnd_punc=1))
# Interpolation between totally random punc selection and probability-based punc
# selection
print(wsv.sent(rnd_punc=0.5))
Additional Options
Limiting Word Pool (top_k)
You can restrict word selection to the top_k most common words. This is useful
if you want to generate text with only high-frequency words:
from wordsiv import WordSiv
wsv = WordSiv(vocab="en")
# Top 100 most frequent words
print(wsv.text(top_k=100))
# Combining rnd=1 with top_k gives a selection of high-frequency words without
# skewing towards the top few words.
print(wsv.text(rnd=1, top_k=1000))
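The effect of combining a top_k cutoff with fully random selection can be sketched without WordSiv. The vocabulary, counts, and `restrict_top_k()` helper below are hypothetical; the sketch only illustrates the general idea of truncating a frequency-sorted list and then sampling from it either by weight or uniformly.

```python
import random

# Hypothetical vocabulary with made-up counts
vocab = [
    ("the", 5000), ("of", 3000), ("and", 2500),
    ("font", 40), ("serif", 12), ("kern", 3),
]


def restrict_top_k(entries, top_k):
    """Keep only the top_k entries with the highest counts."""
    ordered = sorted(entries, key=lambda e: e[1], reverse=True)
    return ordered[:top_k]


pool = restrict_top_k(vocab, 3)
pool_words, pool_counts = zip(*pool)

rng = random.Random(7)
# Weighted by count: still skewed towards the most common word
weighted = rng.choices(pool_words, weights=pool_counts, k=8)
# Uniform: every word in the restricted pool is equally likely
uniform = [rng.choice(pool_words) for _ in range(8)]

print(weighted)
print(uniform)
```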
Mixing In Numbers (numbers)
You can include basic random figures in your text (constrained by glyphs) with
the numbers parameter:
from wordsiv import WordSiv
wsv = WordSiv(vocab="en")
# No numbers (default is 0 anyway)
print(wsv.sent(numbers=0))
# 25% chance each word is a number (will make up roughly 25% of text)
print(wsv.text(numbers=0.25))
# A list of numbers
print(wsv.words(numbers=1))
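As a rough mental model, a per-word probability like numbers=0.25 can be read as an independent coin flip for each token. The sketch below is a hypothetical illustration of that reading, not WordSiv's code: `mix_numbers()`, the word list, the 1 to 4 digit length, and the `digits` parameter (standing in for the glyphs constraint) are all assumptions.

```python
import random

# Hypothetical word pool, for illustration only
WORDS = ["hamburger", "font", "serif", "glyph", "kern"]


def mix_numbers(n_words, numbers, rng, digits="0123456789"):
    """Build a token list where each token is a number with probability `numbers`."""
    out = []
    for _ in range(n_words):
        if rng.random() < numbers:
            # Build a figure from the allowed digits only (assumed 1 to 4 digits long)
            length = rng.randint(1, 4)
            out.append("".join(rng.choice(digits) for _ in range(length)))
        else:
            out.append(rng.choice(WORDS))
    return out


rng = random.Random(3)
print(mix_numbers(10, 0.25, rng))             # mostly words, some figures
print(mix_numbers(10, 1, rng))                # figures only
print(mix_numbers(10, 1, rng, digits="017"))  # figures limited to available digits
```

Because each token is decided independently, the share of figures in a short text can differ noticeably from the requested probability; it only approaches that value over longer output.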
Disabling Punctuation (punc)
You can disable punctuation with the punc parameter: