In this section, we’ll look at processing character data.
Text input or output, regardless of where it originates or where it goes, is dealt with as a stream of characters. A text stream is a sequence of characters divided into lines; each line consists of zero or more characters followed by a newline character.
Let’s look at two of the simplest functions the standard library provides for reading or writing one character at a time: getchar and putchar. Each time it is called, getchar reads the next input character from a text stream and returns it as its value.
c = getchar()
After this line is executed, the variable c contains the next character of input. The characters normally come from the keyboard.
The function putchar prints a character each time it is called:
putchar(c)
prints the contents of the integer variable c as a character, usually on the screen.
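As a minimal sketch of how the two calls fit together, the function below copies standard input to standard output one character at a time. The name copy_input is a hypothetical helper chosen for illustration, not a library function. It relies on EOF, the value getchar returns when there is no more input.

```c
#include <stdio.h>

/* Hypothetical helper: copy standard input to standard output,
   one character at a time, and return how many characters were copied. */
long copy_input(void)
{
    int c;              /* int rather than char, so EOF stays distinct from data */
    long copied = 0;

    while ((c = getchar()) != EOF) {
        putchar(c);     /* echo the character that was just read */
        copied++;
    }
    return copied;
}
```

Calling copy_input() from main would echo everything typed at the keyboard until the input ends.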
Let’s look at a program that is a simplified version of the UNIX command-line utility “wc”, which counts the lines, words and bytes in an input file or in standard input and prints the totals to standard output.
#include <stdio.h>

#define TRUE 1
#define FALSE 0

int main(void)
{
    /* flag is TRUE while we are between words */
    int c, lines, words, bytes, flag = TRUE;

    lines = words = bytes = 0;
    while ((c = getchar()) != EOF) {
        bytes++;
        if (c == '\n')
            lines++;
        if (c == ' ' || c == '\t' || c == '\n')
            flag = TRUE;
        else if (flag) {
            /* first character of a new word */
            words++;
            flag = FALSE;
        }
    }
    printf("Lines = %d, words = %d, bytes = %d\n", lines, words, bytes);
    return 0;
}
Every time getchar returns something other than EOF, we increment the bytes variable. If the character is a newline, we also increment the lines variable. The last if statement uses flag to record whether we are currently between words. Whenever we read a space, tab or newline, flag is set to TRUE, so a run of several such characters in a row leaves the word count unchanged. When we read any other character while flag is TRUE, a new word has started, so we increase the word count and set flag to FALSE. The remaining characters of that word are then ignored, and the word count only increases again once a space, tab or newline has been seen and another word begins.
The line
lines = words = bytes = 0;
sets all three variables to 0. This is not a special case, but a consequence of the fact that an assignment is an expression with a value and that assignments associate from right to left. It is the same as:
lines = (words = (bytes = 0));
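The following sketch illustrates both points: an assignment can be used as a value inside a larger expression, and a chain of assignments groups from right to left. The function names assignment_value and chained_total are hypothetical and exist only for this illustration.

```c
/* Hypothetical helper: an assignment is an expression with a value. */
int assignment_value(void)
{
    int x;
    int y = (x = 5) + 1;    /* (x = 5) stores 5 in x and has the value 5 */

    return x + y;           /* 5 + 6 */
}

/* Hypothetical helper: chained assignment groups as a = (b = (c = 3)). */
int chained_total(void)
{
    int a, b, c;

    a = b = c = 3;
    return a + b + c;       /* all three variables hold 3 */
}
```

The same property is what makes the loop condition (c = getchar()) != EOF work: the value of the parenthesized assignment is the character that was just stored in c.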
|| is the or operator, so
if (c == ' ' || c == '\t' || c == '\n')
means “if c is a space, a tab or a newline”. The corresponding operator for and is &&.
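A small sketch of the two operators side by side may help. The helper names is_separator and is_lower_letter are hypothetical. The second one also assumes a character set such as ASCII, in which the lowercase letters are contiguous.

```c
/* Hypothetical helper: true if c is a space, a tab or a newline.
   With ||, the result is 1 as soon as any one comparison is true. */
int is_separator(int c)
{
    return c == ' ' || c == '\t' || c == '\n';
}

/* Hypothetical helper: true if c lies between 'a' and 'z'.
   With &&, both comparisons must be true.
   Assumes lowercase letters are contiguous, as they are in ASCII. */
int is_lower_letter(int c)
{
    return c >= 'a' && c <= 'z';
}
```

Both operators yield the int value 1 for true and 0 for false, so the results can be returned or stored like any other integer.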
The program also uses an else if statement, which specifies an alternative action to take when the condition of the preceding if statement is false. The general form of the if statement is:
if (expression)
    statement1
else if (expression)
    statement2
else
    statement3
The program also uses a variable on its own as a condition:
else if (flag) {
If the expression inside if or else if (in this case, the variable flag) evaluates to zero, the condition is false; any nonzero value makes it true.
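The two ideas above, the if / else if / else chain and the rule that any nonzero value counts as true, can be sketched as follows. The functions classify and is_true, and the result codes 'S', 'D' and 'O', are hypothetical names chosen for this example.

```c
/* Hypothetical helper: the conditions are tested in order and only the
   first branch whose condition is true is executed.
   Returns 'S' for a separator, 'D' for a digit, 'O' for anything else. */
char classify(int c)
{
    if (c == ' ' || c == '\t' || c == '\n')
        return 'S';
    else if (c >= '0' && c <= '9')
        return 'D';
    else
        return 'O';
}

/* Hypothetical helper: a condition is false only when it evaluates to zero. */
int is_true(int value)
{
    if (value)
        return 1;
    else
        return 0;
}
```

Note that is_true(-3) takes the first branch just as is_true(1) does, which is why the word-count program can test flag directly instead of writing flag == TRUE.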
Note: To end the input and let the program finish, press ^d (ctrl + d) at the beginning of a line (after pressing enter and without typing any characters). When you press ^d with an empty buffer, the program's read of the terminal returns zero bytes, and getchar() reports this by returning EOF (end of file). The while loop then ends and the totals are printed.
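To see the end-of-file behaviour without a keyboard, here is a small sketch that works on any stream. It uses getc, the standard library function that reads one character from a given stream (getchar() is equivalent to getc(stdin)), and ungetc, which pushes one character back. The name at_end_of_input is a hypothetical helper.

```c
#include <stdio.h>

/* Hypothetical helper: return 1 if the next read from the stream
   reports EOF, and 0 if a character is still available. */
int at_end_of_input(FILE *in)
{
    int c = getc(in);

    if (c == EOF)
        return 1;
    ungetc(c, in);      /* put the character back so the caller can still read it */
    return 0;
}
```

Called on an empty stream, the function returns 1 straight away, which mirrors what happens when ^d is pressed on an empty line.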