{"id":11889,"date":"2025-09-11T03:00:00","date_gmt":"2025-09-11T07:00:00","guid":{"rendered":"https:\/\/www.both.org\/?p=11889"},"modified":"2025-09-08T11:45:41","modified_gmt":"2025-09-08T15:45:41","slug":"read-long-lines-with-getline","status":"publish","type":"post","link":"http:\/\/www.both.org\/?p=11889","title":{"rendered":"Read long lines with getline"},"content":{"rendered":"<p>Reading strings in C used to be a very dangerous thing to do. When reading input from the user, programmers might be tempted to use the <code>gets<\/code> function from the C Standard Library. The usage for <code>gets<\/code> is simple enough:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>char *gets(char *string);<\/code><\/pre>\n\n\n\n<p>That is, <code>gets<\/code> reads data from standard input, and stores the result in a string variable. The function returns a pointer to the string, or the value NULL if nothing was read.<\/p>\n\n\n\n<p>As a simple example, we might ask the user a question and read the result into a string:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#include &lt;stdio.h&gt;\n#include &lt;string.h&gt;\n\nint\nmain()\n{\n  char city&#91;10];                       \/\/ Such as \"Chicago\"\n\n  \/\/ this is bad .. 
please don't use gets\n\n  puts(\"Where do you live?\");\n  gets(city);\n\n  printf(\"&lt;%s&gt; is length %zu\\n\", city, strlen(city));\n\n  return 0;\n}<\/code><\/pre>\n\n\n\n<p>Entering a relatively short value with the above program works well enough:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>Where do you live?\nChicago\n&lt;Chicago&gt; is length 7<\/code><\/pre>\n\n\n\n<p>However, the <code>gets<\/code> function is very simple, and will naively read data until it reaches a newline or the end of the input. It doesn&#8217;t check that the string is long enough to hold the user&#8217;s input. Entering a very long value will cause <code>gets<\/code> to store more data than the string variable can hold, overwriting other parts of memory.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>Where do you live?\nLlanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch\n&lt;Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch&gt; is length 58\nSegmentation fault (core dumped)<\/code><\/pre>\n\n\n\n<p>At best, overwriting parts of memory simply breaks the program. At worst, this introduces a critical security bug where a malicious user can insert arbitrary data into the computer&#8217;s memory via your program.<\/p>\n\n\n\n<p>That&#8217;s why the <code>gets<\/code> function is dangerous to use in a program. Using <code>gets<\/code>, you have no control over how much data your program attempts to read from the user. This often leads to a buffer overflow.<\/p>\n\n\n\n<p>And for this reason, <code>gets<\/code> is no longer part of the C standard; it was removed in C11. Instead, the <code>fgets<\/code> function has historically been the recommended way to read strings safely. 
This variant of <code>gets<\/code> provides a safety check by only reading up to a certain number of characters, passed as a function argument.<\/p>\n\n\n\n<p>The <code>fgets<\/code> function takes a string variable, a <code>size<\/code>, and a file pointer. It reads from the file pointer and stores data into the string, but only up to the length indicated by <code>size<\/code>, leaving room for the terminating null character. While this is certainly safer than using <code>gets<\/code> to read user input, it does so at the cost of &#8220;cutting off&#8221; your user&#8217;s input if it is too long.<\/p>\n\n\n\n<h1 class=\"wp-block-heading\" id=\"the-new-safe-way\">The safer way<\/h1>\n\n\n\n<p>A more flexible solution to reading long data is to allow the string-reading function to allocate more memory to the string, if the user entered more data than the variable might hold. By resizing the string variable as necessary, the program always has enough room to store the user&#8217;s input.<\/p>\n\n\n\n<p>The <code>getline<\/code> function does exactly that. This function reads input from an input stream, such as the keyboard or a file, and stores the data in a string variable. But unlike <code>fgets<\/code> and <code>gets<\/code>, <code>getline<\/code> resizes the string with <code>realloc<\/code> to ensure there is enough memory to store the complete input.<\/p>\n\n\n\n<p><code>ssize_t getline(char **pstring, size_t *size, FILE *stream);<\/code><\/p>\n\n\n\n<p>The <code>getline<\/code> function is essentially a wrapper around a similar function called <code>getdelim<\/code> that reads data up to a special delimiter character. In this case, <code>getline<\/code> uses a newline (&#8216;\\n&#8217;) as the delimiter, because when reading user input either from the keyboard or from a file, lines of data are separated by a newline character.<\/p>\n\n\n\n<p>The result is a much safer method to read arbitrary data, one line at a time. To use <code>getline<\/code>, define a string pointer and set it to NULL to indicate no memory has been set aside yet. 
Also define a &#8220;string size&#8221; variable of type <code>size_t<\/code> and give it a zero value. When you call <code>getline<\/code>, you&#8217;ll pass pointers to both the string and the string size variables, and indicate the stream to read from. For a sample program, we can read from the standard input:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#include &lt;stdio.h&gt;\n#include &lt;stdlib.h&gt;\n#include &lt;string.h&gt;\n\nint\nmain()\n{\n  char *string = NULL;\n  size_t size = 0;\n  ssize_t chars_read;\n\n  \/\/ read a long string with getline\n\n  puts(\"Enter a really long string:\");\n\n  chars_read = getline(&amp;string, &amp;size, stdin);\n  printf(\"getline returned %zd\\n\", chars_read);\n\n  \/\/ check for errors\n\n  if (chars_read &lt; 0) {\n    puts(\"couldn't read the input\");\n    free(string);\n    return 1;\n  }\n\n  \/\/ print the string\n\n  printf(\"&lt;%s&gt; is length %zu\\n\", string, strlen(string));\n\n  \/\/ free the memory used by string\n\n  free(string);\n\n  return 0;\n}<\/code><\/pre>\n\n\n\n<p>As <code>getline<\/code> reads data, it will automatically reallocate more memory for the string variable as needed. When the function has read all the data from one line, it updates the allocated size of the string via the pointer, and returns the number of characters read, including the delimiter.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>Enter a really long string:\nSupercalifragilisticexpialidocious\ngetline returned 35\n&lt;Supercalifragilisticexpialidocious\n&gt; is length 35<\/code><\/pre>\n\n\n\n<p>Note that the string includes the delimiter character. For <code>getline<\/code>, the delimiter is the newline, which is why the output has a line feed in there. If you don&#8217;t want the delimiter in your string value, you can use another function, such as <code>strcspn<\/code>, to find the delimiter and replace it with a null character.<\/p>\n\n\n\n<p>With <code>getline<\/code>, programmers can safely avoid one of the common pitfalls of C programming. 
You can never tell what data your user might try to enter, which is why using <code>gets<\/code> is unsafe, and <code>fgets<\/code> is awkward. Instead, <code>getline<\/code> offers a more flexible way to read user data into your program without breaking the system.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p><em>This article is adapted from <a href=\"https:\/\/opensource.com\/article\/22\/5\/safely-read-user-input-getline\">How to (safely) read user input with the getline function<\/a> by Jim Hall, and is republished with the author&#8217;s permission.<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Getline offers a more flexible way to read user data into your program without breaking the system.<\/p>\n","protected":false},"author":33,"featured_media":2949,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_lmt_disableupdate":"","_lmt_disable":"","footnotes":""},"categories":[5,150],"tags":[91,152],"class_list":["post-11889","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-linux","category-programming","tag-linux","tag-programming"],"modified_by":"Jim 
Hall","_links":{"self":[{"href":"http:\/\/www.both.org\/index.php?rest_route=\/wp\/v2\/posts\/11889","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.both.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.both.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.both.org\/index.php?rest_route=\/wp\/v2\/users\/33"}],"replies":[{"embeddable":true,"href":"http:\/\/www.both.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=11889"}],"version-history":[{"count":1,"href":"http:\/\/www.both.org\/index.php?rest_route=\/wp\/v2\/posts\/11889\/revisions"}],"predecessor-version":[{"id":11890,"href":"http:\/\/www.both.org\/index.php?rest_route=\/wp\/v2\/posts\/11889\/revisions\/11890"}],"wp:featuredmedia":[{"embeddable":true,"href":"http:\/\/www.both.org\/index.php?rest_route=\/wp\/v2\/media\/2949"}],"wp:attachment":[{"href":"http:\/\/www.both.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=11889"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.both.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=11889"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.both.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=11889"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}