File I/O in the Shell
Reading and writing to text files...
File I/O is probably one of the more neglected topics
in shell tutorials. This may be because most users
have sophisticated word processing programs that essentially hide all the
mechanics of file maintenance. And, even if you do not use word processors
extensively, most application programs manage their own data storage
internally, so the details of the I/O process are rarely
visible. You just display the data in your file! Or, you edit that data
manually, in "vi" for example, and then close the file.
But what if you wanted a program to manage file
content? A data logger comes to mind right away. At the end of the day you
might want a record of all the events that happened, who you heard for example;
so you write a program in the shell to "detect" the callsigns, but you
need a way to store this. Your script needs to write to a file.
What if you were running a program or a script that
uses, or could use, a configuration file? It might contain both data and
instructions for some program or script. Your script needs to
read a file!
Below, we will cover some "bare bones" basics of
file I/O where it is the machine, or program, that is actually doing the
reading and the writing... not a person.
The "Ins" and "Outs" of File I/O
- Reading a File
Let's begin by reading some data from a file into
a variable. By using cat, we can do this in one command. Normally
when you cat a file, it is displayed on standard output, which is usually
the screen, but by placing the command inside backquotes and assigning
the result to a variable, all the data ends up there!
| Reading Data from a File All at Once |
...
#----- The file to read
myFile="/root/somefile"
#----- The "big" data variable
myData=""
#----- Now the read
myData=`cat "$myFile"`
#----- Show that the data is really in the variable...
#----- This is in the same format as the original file, new lines preserved
echo "$myData"
#----- Show the data in non-quoted format, the space becomes the separator
echo $myData
...
TASK: To open a file and read it, storing the data in a variable, and then
echo the file contents in two formats, first using the quoted variable and
then the unquoted variable.
Let's look at the output from these script echoes. First here is what the
file content might look like as seen from the console:
| Original File Contents |
...
ka1fsb:~# cat somefile
one
two
three
...
Here is the first echoed output from the script with the variable
quoted:
| Script Variable Output Using Quotes |
...
one
two
three
...
Here is the second echoed output from the script without
quotes around the variable:
| Script Variable Output without Quotes |
...
one two three
...
As you can see, when we quote, we preserve the original format of the file,
in this case, the new lines. If we don't quote, the shell splits the value
into separate words at every space, tab, and new line, and echo prints
those words with single spaces between them. This is exactly the format that
the for/do/done loop structure requires. So if you were going to use this
data string in a loop, it is already in the right format! Handy, but
not always what you want or expect ... So, be careful about quoting or
not quoting your variables when using echo. At this point, we also have
all the data "sitting" in one massive variable, perhaps not very usable.
Do we have another choice? Yes...
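To make the for/do/done remark concrete, here is a minimal sketch of feeding the unquoted variable straight into a loop. The scratch file from mktemp and the names tmpFile, word and count are assumptions made for illustration only. The $(...) form used here is equivalent to the backquotes in the script above.

```shell
#----- Create a small sample file (illustrative data only)
tmpFile=$(mktemp)
printf 'one\ntwo\nthree\n' > "$tmpFile"

#----- Read the whole file into one variable
myData=$(cat "$tmpFile")

#----- Unquoted on purpose: the shell splits $myData into words
count=0
for word in $myData
do
    count=$((count + 1))
    echo "item $count: $word"
done

#----- Remove the scratch file
rm -f "$tmpFile"
```

Keep in mind that this splitting happens at every space, not only at new lines, so a line containing several words would arrive in the loop as several items. Characters such as * in the data would also be expanded as file name patterns.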
Suppose you didn't want to read in the entire file using cat, in other words,
you want to read the file line-by-line. Can you do that? And why might you
want to? You can, and you might want to parse or examine each line
separately to extract data or trigger some other event. In our above example,
we have all our data in one variable, so we would need to break it apart
somehow to work with it. If we read line-by-line, it is already in a format
that we can use. Let's look at the one-line-at-a-time case:
| Reading Data One Line at a Time |
...
#----- The file to read
myFile="/root/somefile"
#----- The line data variable
myLine=""
#----- Loop to read file data content
#----- (IFS= keeps leading/trailing blanks, -r keeps backslashes as they are)
while [ 1 ]
do
IFS= read -r myLine || break
echo "$myLine"
done < "$myFile"
...
TASK: To read a file line-by-line using a loop. The read gets the data and stores
it in $myLine. When there is nothing left to read, i.e., read fails, the
break exits the loop.
This little "snip" of code is actually doing quite a bit, especially the
loop! Since the test [ 1 ] is always true (any non-empty string tests as
true), this loop could potentially go on forever. Our only escape comes when
the read fails to get any more data from the file and we
break out of the loop at that point. The read line is shorthand for
"if we can read then continue on, but if not then break." The "||" double
pipe characters mean: run the command on the right only when the one on the
left fails.
Notice too that the echoed variable is quoted, because we want the line to be
displayed exactly as it appears in the file. If we didn't quote, all the
runs of spaces or tabs would be reduced to just single spaces between words.
(Sometimes you want that and sometimes you don't. Just keep it in mind
when scripting.)
You might also notice the done line. It has been "extended" to
include a redirect from our file. This is the "feed" for the loop. It will
keep "pulling" lines from this file until it reaches the end of the file,
at which point read fails.
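The same loop is often written with read as the loop condition itself, which removes the need for break. The sketch below is one common way to do it, not the only one. It assumes a scratch file whose last line deliberately has no trailing new line, a case where read fails but still fills the variable. IFS= stops read from trimming leading and trailing blanks, and -r stops it from treating backslashes specially. The names tmpFile, lineCount and lastLine are illustrative.

```shell
#----- Sample file; note the last line has no trailing new line
tmpFile=$(mktemp)
printf 'first line\n  indented line\nlast line' > "$tmpFile"

lineCount=0
lastLine=""
#----- read is the loop condition; the -n test rescues a final
#----- line that lacks a new line (read fails but myLine is filled)
while IFS= read -r myLine || [ -n "$myLine" ]
do
    lineCount=$((lineCount + 1))
    lastLine=$myLine
    echo "$lineCount: $myLine"
done < "$tmpFile"

#----- Remove the scratch file
rm -f "$tmpFile"
```

Because the file is fed in with a redirect on the done line rather than through a pipe, the variables set inside the loop are still available after it ends.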
Suppose you only have a small file and want to do a field parse on the
entire file, or on a single line. How can you control the separator
character so that you may cut into the target field and return only
a specific datum? You make the conversion as you store the data. You
could replace the assignment under the "Now the read" comment with this line:
- myData=`cat "$myFile" | tr '\n' '|'`
in the script that reads the file all at once. This replaces all new lines
with the "|" character as the field separator. (If the file ends with a new
line, that one is converted too, so the data ends with a trailing "|".) Now
there are no new lines left, so the data appears as one massive block
whether you quote the variable or not, although quoting still preserves any
runs of spaces. You then would do a cut on the field you need:
- myField=`echo "$myData" | cut -f<field_number> -d"|"`
where <field_number> is the number of the field you want, counting from 1.
And now $myField contains the datum as extracted from the field_number
location in the data block.
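Here is the tr and cut combination as a small self-contained sketch. The scratch file and the choice of field 2 are assumptions for illustration. The file is handed to tr with a redirect, which does the same job as piping it from cat.

```shell
#----- Sample file with one field per line
tmpFile=$(mktemp)
printf 'one\ntwo\nthree\n' > "$tmpFile"

#----- New lines become "|", giving a single line of fields
myData=$(tr '\n' '|' < "$tmpFile")

#----- Pull out field 2 (fields are counted from 1)
myField=$(echo "$myData" | cut -f2 -d"|")
echo "$myField"

#----- Remove the scratch file
rm -f "$tmpFile"
```

With this sample data $myData holds one|two|three| and $myField holds two.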
You could apply the same technique to the line-by-line read. After the data
is in $myLine, echo and cut the line based on the separator used, sometimes
the colon as in the passwd file:
- myField=`echo "$myLine" | cut -f<field_number> -d":"`
Each cut here works on a short line rather than a large block of data.
However, it should be noted that loops in the shell are very slow, and this
one starts a new cut process for every line! (If you really need high speed
looping, awk is a much better candidate.)
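The following sketch puts the loop-plus-cut approach next to an awk equivalent. The records are made-up data in a passwd-like layout (name:number:shell), and the variable names names and awkNames are illustrative. It is meant only to show the shape of each approach, not to measure speed.

```shell
#----- Made-up records in a passwd-like layout
tmpFile=$(mktemp)
printf 'alice:1001:/bin/sh\nbob:1002:/bin/bash\n' > "$tmpFile"

#----- Shell loop: one cut process is started for every line
names=""
while IFS= read -r myLine
do
    myField=$(echo "$myLine" | cut -f1 -d":")
    names="$names$myField "
done < "$tmpFile"
echo "loop: $names"

#----- awk: the same extraction done by a single process
awkNames=$(awk -F: '{ printf "%s ", $1 }' "$tmpFile")
echo "awk:  $awkNames"

#----- Remove the scratch file
rm -f "$tmpFile"
```

Both approaches collect the same first fields. The awk version reads the whole file in one process, which is why it tends to win on large files.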
You now see how to read data from a file and store it in a variable in a
script. And using cut, we can further extract items of data from
blocks or lines. These values can then be applied to the processes in your
script.
- Writing to a File
In its simplest form, writing to a file either
replaces the whole file at once or appends to the end. The easiest write is to
collect data in a variable and in one "swell foop" write it out to a
file once and for all. Here is how to do that from a script:
| Write to a File All at Once |
...
#----- Set up the path and name of file
myFile="/root/temp.txt"
#----- Load data string, the separator here is the colon (:)
myData="one:two:three"
#----- Here is the write...
echo "$myData" | tr ':' '\n' > "$myFile"
...
Let's take a closer look. The first line assigns to the variable
$myFile the path and name we want to write to. The next line fills
a variable with data that we want to be stored in the file. We use the
colon (:) as the field separator which will be converted later to a
new line "\n". The last line does the work. The echo sends the data into
a pipe, and tr translates all the colons into new lines just before the
write. The ">" symbol means overwrite this file with the current data or,
if this file does not exist, create it and then write to it.
Suppose you are collecting data frequently and need to add to this file.
Assuming that data ends up in $myData, you change the write line to:
- myData="four:five:six"
...
- echo "$myData" | tr ':' '\n' >> "$myFile"
where the ">>" symbol means append or add onto the end of this file.
(Note, this file will keep growing until you clear it, something to
watch out for.)
Also, you have to manage the data in the $myData variable. It is expecting
a colon (:) between data fields since that gets converted to a new line on
the write.
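The difference between ">" and ">>" is easy to check for yourself. This sketch uses a scratch file from mktemp so that no real data is at risk, and counts lines with wc after each step. The names afterOverwrite, afterAppend and afterClear are illustrative.

```shell
#----- Scratch file so the experiment cannot damage real data
myFile=$(mktemp)

#----- ">" replaces whatever the file held before
echo "one:two:three" | tr ':' '\n' > "$myFile"
echo "replaced" > "$myFile"
afterOverwrite=$(wc -l < "$myFile")

#----- ">>" keeps the old content and adds to the end
echo "four:five:six" | tr ':' '\n' >> "$myFile"
afterAppend=$(wc -l < "$myFile")

#----- ":" is a command that outputs nothing, so this empties the file
: > "$myFile"
afterClear=$(wc -l < "$myFile")

echo "lines after overwrite: $afterOverwrite"
echo "lines after append: $afterAppend"
echo "lines after clearing: $afterClear"

#----- Remove the scratch file
rm -f "$myFile"
```

The second ">" discards the three lines written by the first, leaving one line. The append then brings the count to four, and the last redirect is one way to clear a file that has grown too large.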
NOTE: If you are just writing a single field on a line, you don't need to
convert or append a field separator character to the data string. The echo
command ends its output with a new line on its own, so each append with
the ">>" symbol lands on a line of its own.
- myData="seven"
...
- echo "$myData" >> "$myFile"
This is by far the simplest and most common way to build a file. And even
though I have used single words as the data elements, you could use full
sentences or lengthy strings here instead. For example:
- myData="This is a long sentence here."
...
- echo "$myData" >> "$myFile"
And this line will be appended to the file, on its own line, just as it
has been quoted. You may also use a sequence of sentences with a
possible "|" pipe symbol as a separator, for example:
- myData="This is a long sentence here."
- myData="$myData|Here is another sentence."
...
- echo "$myData" | tr '|' '\n' >> "$myFile"
We are building up a data string, $myData, by concatenation, and then
translating all pipes to newlines just before the write. (I chose a pipe
symbol since a colon might have been used in any given sentence.)
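Pulling the pieces together, here is a rough sketch of the data logger idea mentioned at the start of this article. Everything in it is an assumption made for illustration: the helper name logHeard, the "|" separated record layout, the timestamp format, and the scratch log file (a real logger would use a fixed path). The callsign n0call is a placeholder.

```shell
#----- Scratch log file; a real logger would use a fixed path
logFile=$(mktemp)

#----- Hypothetical helper: append one timestamped record per call
logHeard() {
    # $1 = callsign; the two fields are joined with "|"
    echo "$(date '+%Y-%m-%d %H:%M:%S')|$1" >> "$logFile"
}

logHeard "ka1fsb"
logHeard "n0call"

#----- Read the log back and collect the callsign field
heard=""
recordCount=0
while IFS= read -r myLine
do
    callsign=$(echo "$myLine" | cut -f2 -d"|")
    heard="$heard$callsign "
    recordCount=$((recordCount + 1))
done < "$logFile"
echo "heard today: $heard"

#----- Remove the scratch file
rm -f "$logFile"
```

The pipe symbol works as the separator here because the timestamp contains colons and spaces, the same reasoning given above for choosing it over the colon.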
As you have no doubt "guessed" by now, the preparation and formatting of
the data is far more involved than the actual writing to a file which is
really quite straightforward. It's the data prep that can be
the real challenge, where you will spend most of your programming effort!
I hope this brief article will encourage you to venture forth into the
"mysteries" of the shell, which will become less so as you begin to work
with it. The shell is a huge program which means it has "places" which
are seldom "visited." Not only will you be surprised by what you find
there, but you will be able to put your "discoveries" to work and
build on the efforts of many "generations" of shell programmers!
(Courtesy KBNorton Computer Services)