{"id":11719,"date":"2025-08-29T02:00:00","date_gmt":"2025-08-29T06:00:00","guid":{"rendered":"https:\/\/www.both.org\/?p=11719"},"modified":"2025-08-28T21:08:01","modified_gmt":"2025-08-29T01:08:01","slug":"building-a-random-text-generator","status":"publish","type":"post","link":"https:\/\/www.both.org\/?p=11719","title":{"rendered":"Building a random text generator"},"content":{"rendered":"<p>There are many reasons you might need placeholder text. For example, if you are building a new website, you may not have all of the content ready while you&#8217;re creating the design; placeholder text helps you see what the design will look like once the content is in place.<\/p>\n\n\n\n<p>For years, my &#8220;go-to&#8221; for generating sample content for documents has been the lipsum.com website, which produces meaningless Latin-like text. Most people can ignore placeholder content if they immediately recognize that it&#8217;s just meaningless words, and &#8220;Lorem Ipsum&#8221; does that very well. If I want placeholder text in English, I sometimes use other placeholder generators that do the same job by inserting random content from Star Wars, Doctor Who, and Star Trek.<\/p>\n\n\n\n<p>But there&#8217;s another way to create placeholder text without copying from a website: you can make your own text generator. I wrote my own Bash script on Linux to generate a few paragraphs of random text. 
Here&#8217;s how it works.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">A list of words<\/h2>\n\n\n\n<p>Most Linux systems include a default dictionary of correctly spelled words, usually saved in <code>\/usr\/share\/dict\/words<\/code>. The list is sorted and contains both capitalized and lowercase words. If you use the <strong>head<\/strong> command to print the first ten lines of the <code>words<\/code> file, you will see &#8220;words&#8221; that start with numbers:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>$ head \/usr\/share\/dict\/words\n1080\n10-point\n10th\n11-point\n12-point\n16-point\n18-point\n1st\n2\n20-point<\/code><\/pre>\n\n\n\n<p>The <strong>grep<\/strong> command is a classic Unix command that finds text in a file. You can give <strong>grep<\/strong> plain text to find, or you can make your search more specific by using special markers that indicate the start of a line (<code>^<\/code>) or the end of a line (<code>$<\/code>). For example, you can use two <strong>grep<\/strong> commands to search for all lines that start with a lowercase letter <code>a<\/code> and end with the letter <code>e<\/code>, and use <strong>head<\/strong> to display only the first ten matches:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>$ grep '^a' \/usr\/share\/dict\/words | grep 'e$' | head\nabacate\nabacinate\nabaisance\nabaisse\nabalienate\nabalone\nabampere\nabandonable\nabandonee\nabase<\/code><\/pre>\n\n\n\n<p>You can do more with <strong>grep<\/strong> than just find plain words. Those special markers are part of a pattern language called <em>regular expressions<\/em>, and there&#8217;s a lot you can do with them. For example, you can match repeated text by using <code>+<\/code> to mean one or more, or <code>*<\/code> to mean zero or more, of the previous character. (In the basic syntax that <strong>grep<\/strong> uses by default, GNU grep expects the first of these to be written as <code>\\+<\/code>.) If you want to match certain classes of characters, you can use bracket expressions like <code>[[:upper:]]<\/code> to mean the uppercase letters A to Z, or <code>[[:lower:]]<\/code> for the lowercase letters. 
This flexibility makes it possible to search for all kinds of text in a file. For example, to print all lines that consist of an uppercase letter followed by one or more lowercase letters, you would use this regular expression:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>$ grep '^&#91;&#91;:upper:]]&#91;&#91;:lower:]]\\+$' \/usr\/share\/dict\/words<\/code><\/pre>\n\n\n\n<p>However, <strong>grep<\/strong> can find some very long words, if they are in the <code>words<\/code> file. On my system, the longest words that start with an uppercase letter followed by one or more lowercase letters are Prorhipidoglossomorpha, Pseudolamellibranchia, and Pseudolamellibranchiata. Those are too long if I want to generate some random placeholder text for a website. I think good placeholder text uses words of a reasonable length, roughly 3 to 7 letters long.<\/p>\n\n\n\n<p>To limit the length of the words, I can send the output of the <strong>grep<\/strong> command to another classic Unix command called <strong>awk<\/strong>, implemented as <strong>gawk<\/strong> (GNU awk) on most Linux systems. An <strong>awk<\/strong> program is a series of pattern and action pairs; for each input line that matches a pattern, it executes the action. In my case, to print just the words that start with an uppercase letter followed by one or more lowercase letters, and are more than 2 letters and fewer than 8 letters long, I would use this command:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>$ grep '^&#91;&#91;:upper:]]&#91;&#91;:lower:]]\\+$' \/usr\/share\/dict\/words | gawk 'length($0)&gt;2 &amp;&amp; length($0)&lt;8 {print}'<\/code><\/pre>\n\n\n\n<p>That&#8217;s a long line, but it&#8217;s just a <strong>grep<\/strong> command to find lines of text, with its output sent to the <strong>gawk<\/strong> command.<\/p>\n\n\n\n<p>But a <strong>gawk<\/strong> pattern can also be a regular expression, using basically the same syntax as the <strong>grep<\/strong> command. 
That allows us to rewrite the command to search the <code>\/usr\/share\/dict\/words<\/code> file for all words that start with an uppercase letter followed by one or more lowercase letters, and are more than 2 letters and fewer than 8 letters long, as a single <strong>gawk<\/strong> command:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>$ gawk '\/^&#91;&#91;:upper:]]&#91;&#91;:lower:]]+$\/ {if ((length($0)&gt;2) &amp;&amp; (length($0)&lt;8)) {print}}' \/usr\/share\/dict\/words &gt; upper.tmp<\/code><\/pre>\n\n\n\n<p>This moves the length test inside the action, using <code>if<\/code> to test whether the word&#8217;s length is greater than 2 and less than 8. Other than using the redirection operator (<strong>&gt;<\/strong>) to save the output to a temporary file called <code>upper.tmp<\/code>, the command does essentially the same thing, but it does everything inside <strong>gawk<\/strong> instead of using <strong>grep<\/strong> and then <strong>gawk<\/strong>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Using loops<\/h2>\n\n\n\n<p>The script generates 5 paragraphs of text, each consisting of a random number of sentences, each with a random number of words. I do this with several <strong>for<\/strong> loops, which iterate over a set of values. For example, to print out the text &#8220;Hello&#8221; 4 times, I would write this <strong>for<\/strong> loop:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>$ for word in 1 2 3 4; do echo \"Hello\"; done<\/code><\/pre>\n\n\n\n<p>If you type this at the Bash command line, or save it to a &#8220;script&#8221; file and run it, you should see &#8220;Hello&#8221; printed back to you 4 times:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>$ for word in 1 2 3 4; do echo \"Hello\"; done\nHello\nHello\nHello\nHello<\/code><\/pre>\n\n\n\n<p>At every &#8220;pass&#8221; through the loop, the variable <code>word<\/code> is assigned the next value in the list: 1, 2, 3, then 4. 
You can get the value of the <code>word<\/code> variable by writing it with a &#8220;dollar sign&#8221; in front. For example, this prints the numbers 1, 2, 3, and 4:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>$ for word in 1 2 3 4; do echo $word; done\n1\n2\n3\n4<\/code><\/pre>\n\n\n\n<p>You can also put one <strong>for<\/strong> loop &#8220;inside&#8221; another; these are called nested loops. It&#8217;s easiest to show nested loops in a script, where I can split the loops across several lines to make the instructions clearer. For example, this prints the values A1, A2, B1, and B2 to the screen using nested loops:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>for letter in A B ; do\n  for number in 1 2 ; do\n    echo $letter$number\n  done\ndone<\/code><\/pre>\n\n\n\n<p>I&#8217;ve indented the lines to make clear what is &#8220;inside&#8221; each loop. When I write <strong>for<\/strong> loops like this, I usually write the <strong>;<\/strong> with spaces on either side. This is just personal style; you don&#8217;t need the extra space.<\/p>\n\n\n\n<p>If you save this to a script and run it, you should see the values A1, A2, B1, and B2 printed to the screen. That&#8217;s because the &#8220;outer&#8221; loop iterates through the letters A and B; for each pass through the &#8220;letter&#8221; loop, the &#8220;inner&#8221; loop iterates through the numbers 1 and 2. The effect is that the loops generate the four values in order:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>A1\nA2\nB1\nB2<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Printing random lines<\/h2>\n\n\n\n<p>To generate random words, either all lowercase words or words that start with an initial uppercase letter, we need to print random lines from a word file. 
We can use <strong>gawk<\/strong> to find the words we need; the next step is to pick random words from the temporary file.<\/p>\n\n\n\n<p>Linux provides a command called <strong>shuf<\/strong> that shuffles the lines of its input and prints them in a random order. For example, let&#8217;s print the numbers 1, 2, 3, and 4 in a random order with the <strong>shuf<\/strong> command:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>$ seq 4 | shuf\n3\n4\n1\n2<\/code><\/pre>\n\n\n\n<p>The <strong>seq<\/strong> command always prints 1, 2, 3, and 4 in that order, but adding the <strong>shuf<\/strong> command randomizes the order. Similarly, if you have a longer list but only want to see the first few lines from the shuffled list, send the output to the <strong>head<\/strong> command. The <strong>head<\/strong> command prints only the first ten lines by default; use a hyphen with a number to print that many lines. For example, this shuffles a list of ten numbers but prints only 4 lines of output:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>$ seq 10 | shuf | head -4\n5\n9\n10\n2<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Putting it all together<\/h2>\n\n\n\n<p>With these Bash scripting commands, plus a few extra Bash features that I&#8217;ll show you, you can generate a few paragraphs of random text. Each paragraph contains a random number of sentences, between 3 and 7 sentences. 
Each sentence has a random number of words, between 4 and 9 words: one capitalized word followed by 3 to 8 lowercase words.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#!\/bin\/bash\n\n# the system word list\nwords=\/usr\/share\/dict\/words\n\n# temporary files for the filtered word lists\nlower=\/tmp\/lower.tmp\nupper=\/tmp\/upper.tmp\n\n# keep only words that are 3 to 7 letters long\ngawk '\/^&#91;&#91;:lower:]]+$\/ {if ((length($0)&gt;2) &amp;&amp; (length($0)&lt;8)) {print}}' $words &gt; $lower\ngawk '\/^&#91;&#91;:upper:]]&#91;&#91;:lower:]]+$\/ {if ((length($0)&gt;2) &amp;&amp; (length($0)&lt;8)) {print}}' $words &gt; $upper\n\nfor para in $(seq 5) ; do\n  # 3 to 7 sentences per paragraph\n  s=$((RANDOM % 5 + 3))\n\n  for sent in $(seq $s) ; do\n    # 1 capitalized word, then 3 to 8 lowercase words\n    w=$((RANDOM % 6 + 3))\n    ( shuf -n 1 $upper ; shuf -n $w $lower ) | tr '\\n' ' ' | sed 's\/ $\/. \/'\n  done\n  echo -e '\\n'\ndone\n\nrm -f $lower $upper<\/code><\/pre>\n\n\n\n<p>On my system, I saved this script to a file called <code>mkwords.bash<\/code>. Let&#8217;s look at this in more detail to understand how it works:<\/p>\n\n\n\n<p>The first few lines save some values to a few variables; a variable is just a way to access a value later on. In this case, I&#8217;ve saved the path to the word list in a <code>words<\/code> variable, the path to a temporary file for the lowercase words in the <code>lower<\/code> variable, and the path to a temporary file for the capitalized words in the <code>upper<\/code> variable. I can use these at any time in the Bash script with a &#8220;dollar sign&#8221; like <code>$words<\/code> to get the full path to the word list, at <code>\/usr\/share\/dict\/words<\/code>.<\/p>\n\n\n\n<p>After that, the script runs the two <strong>gawk<\/strong> commands to generate the list of all-lowercase words and the list of words that start with an uppercase letter.<\/p>\n\n\n\n<p>Then, the script uses a nested <strong>for<\/strong> loop to print 5 paragraphs. Inside the outer loop, it sets a variable called <code>s<\/code> to a random number between 3 and 7. The <code>$(( ))<\/code> brackets create an arithmetic expansion, which lets Bash do simple integer arithmetic. 
You probably know the basic arithmetic operators like add (<code>+<\/code>), subtract (<code>-<\/code>), multiply (<code>*<\/code>) and divide (<code>\/<\/code>). You can also use <code>%<\/code> to mean modulo, or the remainder after division. For example, <code>9 % 4<\/code> is 1, because 9 divided by 4 is 2 with 1 left over. The arithmetic expansion that assigns a value to <code>s<\/code> uses Bash&#8217;s built-in <code>RANDOM<\/code> variable, which gives a random integer from 0 to 32767 each time it is read. Taking that number modulo 5 gives a value of 0, 1, 2, 3, or 4. That means <code>s<\/code> can be in the range 3 (0 + 3) to 7 (4 + 3).<\/p>\n\n\n\n<p>The next loop generates that many random sentences, from 1 to <code>s<\/code>, using a similar trick to pick a random number of lowercase words (<code>w<\/code>) between 3 and 8.<\/p>\n\n\n\n<p>The last line inside the &#8220;inner&#8221; loop uses two <strong>shuf<\/strong> commands to print 1 random word from the uppercase words, then the random number of words from the list of lowercase words; the <code>-n<\/code> option tells <strong>shuf<\/strong> how many shuffled lines to print. The random words are printed one per line, so I&#8217;ve added the <strong>tr<\/strong> command to translate each newline (\\n) to a space. The <strong>sed<\/strong> command makes line-by-line edits; here, it replaces the space at the end of the line with a period and a space. These commands generate a series of &#8220;sentences&#8221; that begin with an uppercase word followed by a random number of lowercase words, plus a period.<\/p>\n\n\n\n<p>After each paragraph, the script uses an <strong>echo<\/strong> command to print an extra newline. Actually, the <strong>echo<\/strong> command itself generates a newline, so this command effectively prints 2 newlines: one to end the paragraph&#8217;s line, and one to leave a blank line before the next paragraph.<\/p>\n\n\n\n<p>The last line in the script cleans up my temporary files by deleting them.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">A few samples<\/h2>\n\n\n\n<p>Whenever I need to generate some placeholder text for a project, I can just run this Bash script to print out a few paragraphs. Every time I run the script, it prints 5 paragraphs of a few sentences, each with a reasonable number of words. 
This is somewhat representative of text that I might include in a document.<\/p>\n\n\n\n<p>The script prints each paragraph on a single line. To make it more readable, I&#8217;ll send the output through the <strong>fmt<\/strong> program to &#8220;wrap&#8221; the lines:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>$ bash mkwords.bash | fmt\nAttalie adicity sebate arecain. Jedthus azaleas ottos calor omniana. Alber\nshammes talpa resoaks micmac ducs anchors vil frosty. Olalla gesling\nnooses trashy downby gnosis pituri sambuca magmata. Borda durably salada\ndubbin sanable femoral cubane. Gorizia pirojki viper mattins jitters\nrongeur theos laciest cretic.\n\nVinie driven outgaze sleepry. Lepper lansing ogams trams cruiser italite\noutstay. Niort slavers noecho tugriks swaddle. Fassold plagal vlei unioid\nmellows bunty weals. Loy prahus stare rowable inlayed. Vally pigmy joeyes\nzincify balada clethra pks tineine.\n\nInola perit peggy filled. Yarura oceanic taunt scrath rapids crusta\nwyches. Aleus gleety bumphs staw caaba ratio cliffs. Stigler ortman\ndecay faucals.\n\nNyoro atoxic asses melvie. Blau insteep chaw couac. Boff clite sodless\narzan.\n\nGerhan feudary espinal shoad libra brunion debts rosing. Alvito fister\nquested buxom pennant impower tabstop stylize outrick. Dupuis caffle\nemerick neems hagbut equinox.<\/code><\/pre>\n\n\n\n<p>Every time I run the mkwords.bash script, it generates new random words, sentences, and paragraphs.<\/p>\n\n\n\n<p>This script works well for me, but you can still improve it. For example, every time the script runs, it generates the same list of words from the <code>\/usr\/share\/dict\/words<\/code> file. 
Since the system&#8217;s word list doesn&#8217;t change very often, you can make this script run faster if you save the list of temporary words somewhere in your home directory, and only regenerate the lists if they are not there.<\/p>\n\n\n\n<p>Also, the <code>\/usr\/share\/dict\/words<\/code> file contains some words that are not work-friendly. So instead of using the system&#8217;s word list, you might make your own list of words to use. One way to create a list like this is to use the words from other documents you have already written, and use that word list as the starting point.<\/p>\n\n\n\n<p>But if you just want to generate a few paragraphs of random-length sentences with random words, this script will do the job. And you can do it on your own with a Bash script.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p><em>This article is based on <a href=\"https:\/\/technicallywewrite.com\/2025\/08\/26\/placeholder\">Generating your own random text<\/a> by Jim Hall, and is republished with the author&#8217;s permission.<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Generate your own random placeholder text by writing a few lines in Bash.<\/p>\n","protected":false},"author":33,"featured_media":3522,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_lmt_disableupdate":"","_lmt_disable":"","footnotes":""},"categories":[5,150],"tags":[91,152],"class_list":["post-11719","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-linux","category-programming","tag-linux","tag-programming"],"modified_by":"David 
Both","_links":{"self":[{"href":"https:\/\/www.both.org\/index.php?rest_route=\/wp\/v2\/posts\/11719","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.both.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.both.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.both.org\/index.php?rest_route=\/wp\/v2\/users\/33"}],"replies":[{"embeddable":true,"href":"https:\/\/www.both.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=11719"}],"version-history":[{"count":3,"href":"https:\/\/www.both.org\/index.php?rest_route=\/wp\/v2\/posts\/11719\/revisions"}],"predecessor-version":[{"id":11750,"href":"https:\/\/www.both.org\/index.php?rest_route=\/wp\/v2\/posts\/11719\/revisions\/11750"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.both.org\/index.php?rest_route=\/wp\/v2\/media\/3522"}],"wp:attachment":[{"href":"https:\/\/www.both.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=11719"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.both.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=11719"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.both.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=11719"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}