File I/O in the Shell


Reading and writing to text files...

File I/O is probably one of the more neglected areas in shell tutorials. That may be because most users have sophisticated word-processing programs that hide the mechanics of file maintenance. And even if you do not use word processors extensively, most application programs manage their own data storage internally, so the details of the I/O process are rarely visible. You just display the data in your file! Or you edit that data manually, in "vi" for example, and then close it up.

But what if you wanted a program to manage file content? A data logger comes to mind right away. At the end of the day you might want to collect all the events that happened, for example whom you heard; so you write a program in the shell to "detect" the callsigns, but you need a way to store them. Your script needs to write to a file.

What if you were running a program or a script that uses, or could use, a configuration file? It might contain both data and instructions for some program or script. Your script needs to read a file!

Below, we will cover some "bare bones" basics of file I/O where it is the machine, or program, that is actually doing the reading and the writing... not a person.




The "Ins" and "Outs" of File I/O

Reading a File

Let's begin by reading some data from a file into a variable. Using cat, we can do this in one command. Normally when you cat a file, its contents go to standard output, which is usually the screen, but if you wrap the command in backticks (command substitution) and assign the result to a variable, all the data ends up there instead!

Reading Data from a File All at Once
...

#----- The file to read
        myFile="/root/somefile"

#----- The "big" data variable
        myData=""

#----- Now the read
        myData=`cat $myFile`

#----- Show that the data is really in the variable...
#----- This is in the same format as the original file, new lines preserved
        echo "$myData"

#----- Show the data in non-quoted format, the space becomes the separator
        echo $myData

...

TASK : To open a file and read it, storing the data in a variable, and then echo the file contents in two formats: first with the variable quoted, then with it unquoted.



Let's look at the output from these script echoes. First, here is what the file content might look like as seen from the console:

Original File Contents
...

ka1fsb:~# cat somefile
one
two
three

...
Here is the first echoed output from the script with the variable quoted:

Script Variable Output Using Quotes
...

one
two
three

...
Here is the second echoed output from the script without quotes around the variable:

Script Variable Output without Quotes
...

one two three

...
As you can see, when we quote, we preserve the original format of the file, in this case the new lines. If we don't quote, the shell splits the variable's contents on whitespace before echo ever sees them, and echo prints the resulting words separated by single spaces. This is exactly the format that the for/do/done loop structure requires. So if you were going to use this data string in a loop, it is already in the right format! Handy, but not always what you want or expect ... So, be careful about quoting or not quoting your variables when using echo. At this point, we also have all the data "sitting" in one massive variable, perhaps not very usable. Do we have another choice? Yes...
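To make the loop remark concrete, here is a minimal sketch of feeding the unquoted variable to a for/do/done loop. The sample file is a throwaway created with mktemp (an assumption, so the sketch runs anywhere), and the names count and word are only illustrative.

```shell
#----- Throwaway sample file (hypothetical), so the sketch is self-contained
        myFile=$(mktemp)
        printf 'one\ntwo\nthree\n' > "$myFile"

#----- Read it all at once; $(...) does the same job as the backticks
        myData=$(cat "$myFile")

#----- Unquoted on purpose: the shell splits $myData into separate words
        count=0
        for word in $myData
        do
                count=$((count + 1))
                echo "item $count: $word"
        done

        rm -f "$myFile"
```

This works cleanly here because every line is a single word. A line containing spaces would be split into several loop items, and a word such as * would be expanded into filenames, so the trick is best kept for simple data.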

Suppose you didn't want to read in the entire file using cat, in other words, you want to read the file line-by-line. Can you do that? And why might you want to? You can, and you might want to parse or examine each line separately to extract data or trigger some other event. In our above example, we have all our data in one variable, so we would need to break it apart somehow to work with it. If we read line-by-line, it is already in a format that we can use. Let's look at the one-line-at-a-time case:

Reading Data One Line at a Time
...

#----- The file to read
        myFile="/root/somefile"

#----- The line data variable
        myLine=""

#----- Loop to read file data content
        while [ 1 ]
        do
                #----- IFS= and -r keep leading spaces and backslashes intact
                IFS= read -r myLine || break
                echo "$myLine"

        done < "$myFile"

...

TASK : To read a file using a loop, line-by-line. The read gets the data and stores it in $myLine. When there is nothing left to read, i.e., read fails, the break exits the loop.



This little "snip" of code is actually doing quite a bit, especially the loop! Since the test [ 1 ] always succeeds (a non-empty string such as 1 counts as true), this loop could potentially go on forever. Our only escape comes when the read fails to get any more data from the file, and we break out of the loop at that point. The read line is shorthand for "if we can read then continue on, but if not then break." The "||" double pipe characters mean: run the command on the right only when the command on the left fails.

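Notice too that the echoed variable is quoted, because we want the line displayed as it appears in the file. If we didn't quote, all the runs of spaces or tabs would be reduced to single spaces between words. (Sometimes you want that and sometimes you don't. Just keep it in mind when scripting.) Quoting only protects what read delivered, though: a plain "read myLine" strips leading and trailing whitespace and treats backslashes as escapes, while "IFS= read -r myLine" keeps the line verbatim.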

You might also notice the done line. It has been "extended" to include a redirect from our file. This is the "feed" for the loop. It will keep "pulling" lines from this file until the end of the file is reached, at which point read fails. (There is no actual EOF character stored in the file.)
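A common variation puts the read directly in the while condition, which does away with the explicit break. The sketch below is one way to write it, not the only one; the sample file, lineCount and lastLine are made up for illustration. The extra [ -n "$myLine" ] test is an assumption about your data: it catches a final line that has no trailing new line, which read would otherwise report as a failure.

```shell
#----- Throwaway sample file (hypothetical); note the last line has no newline
        myFile=$(mktemp)
        printf '  indented line\nback\\slash\nlast line, no newline' > "$myFile"

#----- Loop ends when read fails AND nothing was left in the buffer
        lineCount=0
        lastLine=""
        while IFS= read -r myLine || [ -n "$myLine" ]
        do
                lineCount=$((lineCount + 1))
                lastLine=$myLine
                #----- printf is used so the backslash is printed as-is
                printf '%s\n' "$lineCount: $myLine"
        done < "$myFile"

        rm -f "$myFile"
```

All three lines are seen, the leading spaces survive, and the backslash comes through untouched.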

Suppose you only have a small file and want to do a field parse on the entire file, or on a single line. How can you control the separator character so that you may cut into the target field and return only a specific datum? As you are storing it, you make your conversion. You could replace this line:

  • myData=`cat $myFile`

with this line:

  • myData=`cat $myFile | tr '\n' '|'`

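in the script that reads the file all at once. This replaces every new line with the "|" character, which becomes the field separator. (The file's final new line is converted too, so the data ends with a trailing "|".) Now the data always appears as one massive block on a single line, whether you quote the variable or not, although quoting is still the safer habit because an unquoted variable has its runs of spaces collapsed. You then would do a cut on the field you need: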

  • myField=`echo "$myData" | cut -f<field_number> -d"|"`


where field_number is the offset into the data block beginning with field 1. And now $myField contains the datum as extracted from the field_number location in the data block.
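Here is the same idea as a small runnable sketch, picking out field 2. The sample file is a hypothetical stand-in for /root/somefile, created with mktemp so nothing real is touched.

```shell
#----- Throwaway sample file (hypothetical)
        myFile=$(mktemp)
        printf 'one\ntwo\nthree\n' > "$myFile"

#----- Join the lines with "|" as they are stored
        myData=$(cat "$myFile" | tr '\n' '|')
        echo "$myData"          # one|two|three|

#----- Cut out field 2
        myField=$(echo "$myData" | cut -f2 -d"|")
        echo "$myField"         # two

        rm -f "$myFile"
```

Because of the trailing "|", cut also sees an empty fourth field after "three"; that is harmless as long as you only ask for fields that exist.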

You could apply the same technique to the line-by-line read. After the data is in $myLine, echo and cut the line based on the separator used, sometimes the colon as in the passwd file:

  • myField=`echo "$myLine" | cut -f<field_number> -d":"`


This would probably be a faster process since any given line is going to be quite short as compared to a large block of data. However, it should be noted that loops in the shell are very slow! (If you really need high speed looping, awk is a much better candidate.)
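As a rough comparison, the sketch below extracts the first field of every line twice: once with the shell loop and cut, once with a single awk command. The colon-separated records are invented sample data in the spirit of the passwd file, not real entries, and the variable names are illustrative.

```shell
#----- Throwaway sample records (hypothetical)
        myFile=$(mktemp)
        printf 'alice:x:1001:/home/alice\nbob:x:1002:/home/bob\n' > "$myFile"

#----- Shell loop: one echo and one cut process per line
        names=""
        while IFS= read -r myLine
        do
                myField=$(echo "$myLine" | cut -f1 -d":")
                names="$names $myField"
        done < "$myFile"
        echo "loop:$names"

#----- awk: the whole file is handled by one process
        awkNames=$(awk -F: '{ print $1 }' "$myFile" | tr '\n' ' ')
        echo "awk: $awkNames"

        rm -f "$myFile"
```

Both produce alice and bob. The loop starts two extra processes for every line it reads, which is where most of the slowness comes from on large files.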

You now see how to read data from a file and store it in a variable in a script. And using cut, we can further extract items of data from blocks or lines. These values can then be applied to the processes in your script.
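Putting the pieces together, here is a hedged sketch of the configuration-file case mentioned at the start. The file layout (key=value lines, "#" for comments) and the key names callsign and logdir are assumptions made up for this example; your own script would define its own.

```shell
#----- Throwaway config file (hypothetical layout and keys)
        cfgFile=$(mktemp)
        printf '# sample settings\ncallsign=ka1fsb\nlogdir=/tmp/logs\n' > "$cfgFile"

        callsign=""
        logdir=""

#----- Read line-by-line, skip comments and blanks, cut on "="
        while IFS= read -r myLine
        do
                case "$myLine" in
                        "#"*|"") continue ;;
                esac
                key=$(echo "$myLine" | cut -f1 -d"=")
                value=$(echo "$myLine" | cut -f2- -d"=")
                case "$key" in
                        callsign) callsign=$value ;;
                        logdir)   logdir=$value ;;
                esac
        done < "$cfgFile"

        echo "callsign=$callsign logdir=$logdir"
        rm -f "$cfgFile"
```

Using -f2- rather than -f2 keeps any further "=" characters that happen to be part of the value.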

Writing to a File

In its simplest form, a write either replaces the file's contents all at once or appends to the end of the file. The easiest write is to collect data in a variable and in one "swell foop" write it out to a file once and for all. Here is how to do that from a script:

Write to a File All at Once
...

#----- Set up the path and name of file
        myFile="/root/temp.txt"

#----- Load data string, the separator here is the colon (:)
        myData="one:two:three"

#----- Here is the write...
        echo "$myData" | tr ':' '\n' > $myFile

...
Let's take a closer look. The first line assigns to the variable $myFile the path and name we want to write to. The next line fills a variable with data that we want to be stored in the file. We use the colon (:) as the field separator, which will be converted later to a new line "\n". The last line does the work. We echo the data into a pipe, and tr translates all the colons into new lines just before the write. The ">" symbol means overwrite this file with the current data, or, if this file does not exist, create it and then write to it.
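The overwrite behaviour of ">" is easy to check for yourself. In this small sketch the file is a mktemp throwaway rather than /root/temp.txt, and first and second are just illustrative names for what the file held after each write.

```shell
#----- Throwaway output file (hypothetical)
        myFile=$(mktemp)

#----- First write
        myData="one:two:three"
        echo "$myData" | tr ':' '\n' > "$myFile"
        first=$(cat "$myFile")

#----- Second write with ">" replaces everything that was there
        myData="four:five"
        echo "$myData" | tr ':' '\n' > "$myFile"
        second=$(cat "$myFile")

        echo "$first"
        echo "---"
        echo "$second"

        rm -f "$myFile"
```

After the second write the file holds only "four" and "five"; nothing of the first write is left.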

Suppose you are collecting data frequently and need to add to this file. Assuming that data ends up in $myData, you change the write line to:

  • myData="four:five:six"
    ...
  • echo "$myData" | tr ':' '\n' >> $myFile


where the ">>" symbol means append or add onto the end of this file. (Note, this file will keep growing until you clear it, something to watch out for.)
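A quick sketch of the append case, again on a mktemp throwaway file. It also shows one common way to clear a file that has grown too large, the ": >" idiom, which is an addition of mine and not something the steps above require.

```shell
#----- Throwaway output file (hypothetical)
        myFile=$(mktemp)

#----- Create, then append
        echo "one:two:three" | tr ':' '\n' > "$myFile"
        echo "four:five:six" | tr ':' '\n' >> "$myFile"

#----- Count the lines (tr strips the padding some wc versions add)
        lineCount=$(wc -l < "$myFile" | tr -d ' ')
        echo "lines after append: $lineCount"

#----- Clear the file without deleting it
        : > "$myFile"
        sizeAfter=$(wc -c < "$myFile" | tr -d ' ')
        echo "bytes after clearing: $sizeAfter"

        rm -f "$myFile"
```

The file holds six lines after the append and zero bytes after clearing.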

Also, you have to manage the data in the $myData variable. The write line expects a colon (:) between data fields, since that is what gets converted to a new line on the write.

NOTE: If you are just writing a single field on a line, echo already ends its output with a new line, so each ">>" append lands on its own line and you don't need to convert or append a field separator character to the data string.

  • myData="seven"
    ...
  • echo "$myData" >> $myFile


This is by far the simplest and most common way to build a file. And even though I have used single words as the data elements, you could use full sentences or lengthy strings here instead. For example:

  • myData="This is a long sentence here."
    ...
  • echo "$myData" >> $myFile


And this line will be appended to the file, on its own line, just as it has been quoted. You may also use a sequence of sentences with a possible "|" pipe symbol as a separator, for example:

  • myData="This is a long sentence here."
  • myData=$myData"|Here is another sentence."
    ...
  • echo "$myData" | tr '|' '\n' >> $myFile


We are building up a data string, $myData, by concatenation, and then translating all pipes to newlines just before the write. (I chose a pipe symbol since a colon might have been used in any given sentence.)
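Here is that build-and-write sequence as a runnable sketch, with sentences that contain colons to show why the pipe makes the safer separator. The log sentences and the mktemp file are invented for the example.

```shell
#----- Throwaway log file (hypothetical)
        myFile=$(mktemp)

#----- Build the data string; the colons inside the sentences are left alone
        myData="Heard at 14:05: a strong signal."
        myData=$myData"|Heard at 14:20: a weak signal."

#----- Translate only the pipes to new lines, then append
        echo "$myData" | tr '|' '\n' >> "$myFile"

        lineCount=$(wc -l < "$myFile" | tr -d ' ')
        secondLine=$(sed -n '2p' "$myFile")
        echo "$lineCount lines, second is: $secondLine"

        rm -f "$myFile"
```

Had the colon been the separator, each sentence would have been chopped into three lines at the times.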

As you have no doubt "guessed" by now, the preparation and formatting of the data is far more involved than the actual writing to a file which is really quite straightforward. It's the data prep that can be the real challenge, where you will spend most of your programming effort!

I hope this brief article will encourage you to venture forth into the "mysteries" of the shell, which will become less so as you begin to work with it. The shell is a huge program, which means it has "places" which are seldom "visited." Not only will you be surprised by what you find there, but you will be able to put your "discoveries" to work and build on the efforts of many "generations" of shell programmers!


(Courtesy KBNorton Computer Services)