{"id":6898,"date":"2024-08-09T03:00:00","date_gmt":"2024-08-09T07:00:00","guid":{"rendered":"https:\/\/www.both.org\/?p=6898"},"modified":"2024-08-06T19:43:22","modified_gmt":"2024-08-06T23:43:22","slug":"converting-wordstar-files","status":"publish","type":"post","link":"https:\/\/www.both.org\/?p=6898","title":{"rendered":"Converting WordStar files"},"content":{"rendered":"<p>The great thing about open source software is that you can create your own tools. With a little programming knowledge, you can solve problems that are unique to you. And when you share your tools under an open source license, others can use your solution to tackle similar issues.<\/p>\n\n\n\n<p>For example, I recently wanted to explore files created with WordStar. WordStar was a popular desktop word processor from the 1980s. It was originally published by MicroPro for the CP\/M operating system, and later for MS-DOS, where it reached the height of its popularity during the early 1980s.<\/p>\n\n\n\n<p>The WordStar file format is very old, and not even LibreOffice Writer can import my WordStar 4.0 documents. But the format is well documented, and I was able to create a tool to convert WordStar files to HTML. 
Here&#8217;s how I did it.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"a-brief-look-at-wordstar-files\">A brief look at WordStar files<\/h2>\n\n\n\n<p>WordStar used <em>control codes<\/em> for inline formatting such as bold and underlined text, and <em>dot commands<\/em> for other formatting such as margins and offsets.<\/p>\n\n\n\n<p>The dot commands resemble those of nroff, the standard Unix document preparation system, although the two command sets are unrelated to each other. For example, while WordStar and nroff share a similar dot command to set the page length (<code>.PL<\/code> on WordStar, <code>.pl<\/code> on nroff), WordStar has its own dot commands to set features like the character width (<code>.CW<\/code>), the footing (<code>.FO<\/code>), or the line height (<code>.LH<\/code>).<\/p>\n\n\n\n<p>Inside the file, WordStar used single-byte 7-bit ASCII characters for printable characters. For example, the letter \u201ccapital A\u201d (<code>A<\/code>) has the ASCII value of 65, or binary value <strong>0100 0001<\/strong>, or hexadecimal 0x41.<\/p>\n\n\n\n<p>WordStar used ASCII values below ASCII 0x20 (space) as control codes. To turn bold type on or off, WordStar inserted ASCII 0x02. Similarly, WordStar used ASCII 0x13 to turn underline on or off, and ASCII 0x19 to turn italic text on or off. WordStar also supported other control codes for strikethrough, double strike, superscript, subscript, and other formatting.<\/p>\n\n\n\n<p>White space characters were as you might expect: a space was a space, and a tab was a tab. Each line ended with the <em>carriage return<\/em> and <em>new line<\/em> pair (ASCII values 0x0d and 0x0a), which was the standard line ending on the DOS operating system. 
ASCII 0x1a marked the end of the file.<\/p>\n\n\n\n<p>While WordStar used 7-bit ASCII characters for printable data, it reserved bit 8 (the \u201chigh bit\u201d) to indicate characters that can be \u201cmicrojustified\u201d by the printer driver, for example to produce justified paragraphs that extend all the way to the right margin. To retrieve the printable value, the printer driver would strip off the high bit. This feature was used through WordStar 4, but later versions of WordStar removed it and relied instead on the dot commands to control text justification.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"a-program-to-examine-wordstar-files\">A program to examine WordStar files<\/h2>\n\n\n\n<p>We can use this knowledge of the file format to write a program that examines a sample WordStar file and converts it to HTML output. To demonstrate this program, I\u2019ll use a sample file that contains just two paragraphs, using only bold and underlined text.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"533\" height=\"400\" src=\"https:\/\/www.both.org\/wp-content\/uploads\/2024\/08\/wordstar.png\" alt=\"\" class=\"wp-image-6900\"\/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"defining-the-data\">Defining the data<\/h3>\n\n\n\n<p>To create the program, we need to start with a few definitions. We\u2019ll read data in a series of full 8-bit bytes. We\u2019ll use a data type definition called <code>BYTE<\/code> that effectively creates an alias to the <code>unsigned char<\/code> data type, which on virtually all modern systems is exactly 8 bits and holds unsigned values from 0 to 255.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>typedef unsigned char BYTE;<\/code><\/pre>\n\n\n\n<p>This program will read from a WordStar file and generate HTML output, using HTML tags like <code>&lt;b&gt;<\/code> to turn on bold and <code>&lt;\/b&gt;<\/code> to turn it off, and so on for other formatting. To control formatting, let\u2019s define a data structure to indicate what text style is currently in effect. 
We can define our own data type called <code>BOOL<\/code> which can store either <code>false<\/code> (zero) or <code>true<\/code> (non-zero) values, and then create a structure with the different formatting that the program can recognize:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>typedef enum { false, true } BOOL;\n\nstruct {\n    BOOL bold;\n    BOOL dblstrike;\n    BOOL underline;\n    BOOL superscr;\n    BOOL subscr;\n    BOOL strike;\n    BOOL italic;\n} fmt;<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"displaying-codes-and-characters\">Displaying codes and characters<\/h3>\n\n\n\n<p>The main work of the program will be reading characters, recognizing which are control codes to control inline formatting, setting those values, and printing the text. We can define a function called <code>show_codes<\/code> that takes a single byte value, evaluates it, and takes action. While we might assume this program will generate output on the <em>standard output<\/em>, we can make the function a bit more flexible by also providing a file pointer for the output:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>int show_codes(BYTE ch, FILE *out)\n{\n...\n}<\/code><\/pre>\n\n\n\n<p>Since the function just needs to work on one character value at a time, we can do everything inside a <code>switch<\/code> statement. This is effectively a \u201cjump table\u201d that performs an action depending on the value of the character. For this program, I\u2019m not interested in micro justification, so I\u2019ll write the function to use the lower 7 bits of the character:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#define FLIP(A) ( (A) ? 
false : true )\n\nint show_codes(BYTE ch, FILE *out)\n{\n    BYTE low;\n\n    low = ch &amp; 0x7f;                  \/* ignore high bit *\/\n\n    switch (low) {\n        \/* a few formatting codes *\/\n    case 0x02:                        \/* bold on\/off *\/\n        fmt.bold = FLIP(fmt.bold);\n        if (fmt.bold) {\n            fputs(\"&lt;b&gt;\", out);\n        }\n        else {\n            fputs(\"&lt;\/b&gt;\", out);\n        }\n        break;\n    case 0x04:                        \/* double strike on\/off *\/\n        fmt.dblstrike = FLIP(fmt.dblstrike);\n        if (fmt.dblstrike) {\n            fputs(\"&lt;strong&gt;\", out);\n        }\n        else {\n            fputs(\"&lt;\/strong&gt;\", out);\n        }\n        break;\n    case 0x13:                        \/* underline on\/off *\/\n        fmt.underline = FLIP(fmt.underline);\n        if (fmt.underline) {\n            fputs(\"&lt;u&gt;\", out);\n        }\n        else {\n            fputs(\"&lt;\/u&gt;\", out);\n        }\n        break;\n    case 0x14:                        \/* superscript on\/off *\/\n        fmt.superscr = FLIP(fmt.superscr);\n        if (fmt.superscr) {\n            fputs(\"&lt;sup&gt;\", out);\n        }\n        else {\n            fputs(\"&lt;\/sup&gt;\", out);\n        }\n        break;\n    case 0x16:                        \/* subscript on\/off *\/\n        fmt.subscr = FLIP(fmt.subscr);\n        if (fmt.subscr) {\n            fputs(\"&lt;sub&gt;\", out);\n        }\n        else {\n            fputs(\"&lt;\/sub&gt;\", out);\n        }\n        break;\n    case 0x18:                        \/* strikethrough on\/off *\/\n        fmt.strike = FLIP(fmt.strike);\n        if (fmt.strike) {\n            fputs(\"&lt;s&gt;\", out);\n        }\n        else {\n            fputs(\"&lt;\/s&gt;\", out);\n        }\n        break;\n    case 0x19:                        \/* italic on\/off *\/\n        fmt.italic = FLIP(fmt.italic);\n        if (fmt.italic) {\n            
fputs(\"&lt;i&gt;\", out);\n        }\n        else {\n            fputs(\"&lt;\/i&gt;\", out);\n        }\n        break;\n\n        \/* printable codes *\/\n\n    case 0x09:                        \/* tab *\/\n        fputs(\"&lt;span&gt;&amp;rarrb;&lt;\/span&gt;\", out);\n        break;\n    case 0x0a:                        \/* new line *\/\n        fputs(\"&lt;span&gt;&amp;ldsh;&lt;\/span&gt;&lt;br&gt;\", out);\n        break;\n    case 0x0c:                        \/* page feed *\/\n        fputs(\"&lt;span&gt;&amp;dArr;&lt;\/span&gt;\", out);\n        break;\n    case 0x0d:                        \/* carr rtn *\/\n        fputs(\"&lt;span&gt;&amp;larrhk;&lt;\/span&gt;\", out);\n        break;\n    case 0x1a:                        \/* eof *\/\n        fputs(\"&lt;span&gt;&amp;squf;&lt;\/span&gt;\", out);\n        return -1;\n\n    default:\n        if (low &lt; ' ') {               \/* not printable *\/\n            fprintf(out, \"&lt;span&gt;0x%X&lt;\/span&gt;\", low);\n        }\n        else {                         \/* printable *\/\n            fputc(low, out);\n        }\n    }\n\n    return 0;\n}<\/code><\/pre>\n\n\n\n<p>That\u2019s a large <code>switch<\/code> statement that responds to different values. Taking a step back and looking at the <code>switch<\/code> statement at a high level, you might see that the block evaluates several kinds of characters: control codes that turn formatting on or off, printable codes like tabs and new lines, and printable characters.<\/p>\n\n\n\n<p>The control codes that turn formatting on and off use a function called <code>FLIP<\/code>, which is actually a <em>macro<\/em> defined at the top. <code>FLIP<\/code> effectively flips the value of a boolean: if the input is true, it returns false; if it is false, it returns true. 
The formatting settings are stored in the global <code>fmt<\/code> structure.<\/p>\n\n\n\n<p>The printable codes actually generate an HTML entity that visually represents the otherwise invisible whitespace. For example, tabs become a right arrow pointing to a vertical bar (<code>&amp;rarrb;<\/code>) and an end of file character becomes a small filled square (<code>&amp;squf;<\/code>). Generating these whitespace entities inside <code>&lt;span&gt;..&lt;\/span&gt;<\/code> means we can format them later using CSS, either by setting them to a color such as <code>color:pink<\/code> or by removing them with <code>display:none<\/code>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"the-main-program\">The main program<\/h3>\n\n\n\n<p>With this function to do the \u201cheavy lifting\u201d of the program, the rest is fairly straightforward. The main program will open a file, and process it using another function. The second function reads the file, and passes each byte value to the <code>show_codes<\/code> function.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>void show_file(FILE *in, FILE *out)\n{\n    BYTE str&#91;100];\n    size_t len, i;\n\n    while (!feof(in)) {\n        len = fread(str, sizeof(BYTE), 100, in);\n\n        if (len &gt; 0) {\n            for (i = 0; i &lt; len; i++) {\n                if (show_codes(str&#91;i], out) &lt; 0) {\n                    return;\n                }\n            }\n        }\n    }\n}\n\nint main(int argc, char **argv)\n{\n    FILE *pfile;\n\n    \/* check command line *\/\n\n    if (argc != 2) {\n        fputs(\"usage: wshtml {file}\\n\", stderr);\n        return 1;\n    }\n\n    \/* init formatting *\/\n\n    fmt.bold = false;\n    fmt.dblstrike = false;\n    fmt.underline = false;\n    fmt.superscr = false;\n    fmt.subscr = false;\n    fmt.strike = false;\n    fmt.italic = false;\n\n    \/* HTML start *\/\n\n    puts(\"&lt;!DOCTYPE html&gt;\");\n    puts(\"&lt;html&gt;&lt;head&gt;&lt;title&gt;\");\n    puts(argv&#91;1]);\n  
  puts(\"&lt;\/title&gt;&lt;style&gt;\");\n    puts(\"span{color:pink;}\");\n    puts(\"&lt;\/style&gt;&lt;\/head&gt;&lt;body&gt;\");\n\n    \/* process file *\/\n\n    pfile = fopen(argv&#91;1], \"rb\");\n\n    if (pfile == NULL) {\n        fputs(\"cannot open file: \", stderr);\n        fputs(argv&#91;1], stderr);\n        fputc('\\n', stderr);\n    }\n    else {\n        show_file(pfile, stdout);\n        fclose(pfile);\n    }\n\n    \/* HTML end *\/\n\n    puts(\"&lt;\/body&gt;&lt;\/html&gt;\");\n\n    return 0;\n}<\/code><\/pre>\n\n\n\n<p>Looking at the details, the main program also initializes the values for the <code>fmt<\/code> structure of formatting settings, and generates HTML file data around the body.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"putting-it-all-together\">Putting it all together<\/h3>\n\n\n\n<p>Now all that\u2019s left is to assemble the parts and run the program. This is how the program looks when it\u2019s assembled, although I\u2019ve left out the contents of the functions; you can copy them from the above.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#include &lt;stdio.h&gt;\n\ntypedef unsigned char BYTE;\ntypedef enum { false, true } BOOL;\n\nstruct {\n    BOOL bold;\n    BOOL dblstrike;\n    BOOL underline;\n    BOOL superscr;\n    BOOL subscr;\n    BOOL strike;\n    BOOL italic;\n} fmt;\n\n#define FLIP(A) ( (A) ? 
false : true )\n\nint show_codes(BYTE ch, FILE *out)\n{\n...\n}\n\nvoid show_file(FILE *in, FILE *out)\n{\n...\n}\n\nint main(int argc, char **argv)\n{\n...\n}<\/code><\/pre>\n\n\n\n<p>Save this file as <code>wshtml.c<\/code> (\u201cWordStar to HTML\u201d) and compile it with this command:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>$ gcc -o wshtml wshtml.c<\/code><\/pre>\n\n\n\n<p>The program defines its own <code>false<\/code> and <code>true<\/code> values, which are reserved keywords starting with the C23 standard. If your compiler defaults to C23 or later, select an earlier standard, for example by adding <code>-std=c99<\/code> to the command.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"converting-wordstar-files\">Converting WordStar files<\/h2>\n\n\n\n<p>The program makes it easy to convert legacy WordStar files into more portable HTML documents:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>$ .\/wshtml sample.ws &gt; sample.html<\/code><\/pre>\n\n\n\n<p>Running the <code>wshtml<\/code> program converts my sample WordStar document into HTML, which I can view in my browser:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"509\" height=\"105\" src=\"https:\/\/www.both.org\/wp-content\/uploads\/2024\/08\/wordstar-html.png\" alt=\"\" class=\"wp-image-6901\"\/><\/figure>\n\n\n\n<p>While this program converts the inline formatting via the control codes, it does not recognize the dot commands as anything other than normal text; any dot commands will be included verbatim in the output, just like body text. 
If you are interested in converting files that also use dot commands for document-level formatting, you can use this program as a starting point.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Explore how this popular word processor stored data, so you can convert your old files.<\/p>\n","protected":false},"author":33,"featured_media":2818,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_lmt_disableupdate":"","_lmt_disable":"","footnotes":""},"categories":[407,150],"tags":[152],"class_list":["post-6898","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-file-management","category-programming","tag-programming"],"modified_by":"Jim Hall","_links":{"self":[{"href":"https:\/\/www.both.org\/index.php?rest_route=\/wp\/v2\/posts\/6898","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.both.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.both.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.both.org\/index.php?rest_route=\/wp\/v2\/users\/33"}],"replies":[{"embeddable":true,"href":"https:\/\/www.both.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=6898"}],"version-history":[{"count":3,"href":"https:\/\/www.both.org\/index.php?rest_route=\/wp\/v2\/posts\/6898\/revisions"}],"predecessor-version":[{"id":6916,"href":"https:\/\/www.both.org\/index.php?rest_route=\/wp\/v2\/posts\/6898\/revisions\/6916"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.both.org\/index.php?rest_route=\/wp\/v2\/media\/2818"}],"wp:attachment":[{"href":"https:\/\/www.both.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=6898"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.both.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=6898"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.both.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&
post=6898"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}