The shell is one of the most fundamental interfaces between users and the operating system. Every time you type ls, cd, or pipe commands together, you're interacting with a sophisticated program that parses input, manages processes, handles I/O redirection, and coordinates with the kernel.
Building a shell from scratch is perhaps the single best project for understanding process management, system calls, and the Unix philosophy. It's a rite of passage for systems programmers and a cornerstone project in OS courses worldwide.
By the end of this page, you will understand shell architecture, command parsing, process creation with fork/exec, I/O redirection, pipes, signal handling, job control, and how to build a fully functional Unix-like shell from the ground up.
A shell is fundamentally a Read-Eval-Print Loop (REPL) that continuously reads user input, interprets commands, executes them, and displays results. However, beneath this simple concept lies considerable complexity.
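The Shell Execution Cycle:
1. Display a prompt
2. Read a line of input
3. Parse the line into a command and its arguments
4. Execute the command
5. Clean up and repeat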
This cycle seems straightforward, but each step involves intricate interactions with the operating system.
```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>

#define MAX_LINE 1024
#define MAX_ARGS 128

char *read_line(void);
char **parse_line(char *line);
int execute_command(char **args);

int main(void) {
    char *line;
    char **args;
    int status = 1;

    // Main shell loop
    while (status) {
        // 1. Display prompt
        printf("mysh> ");
        fflush(stdout);

        // 2. Read input
        line = read_line();
        if (line == NULL) {
            break;  // EOF (Ctrl+D)
        }

        // 3. Parse input into arguments
        args = parse_line(line);
        if (args[0] == NULL) {
            free(line);
            free(args);
            continue;  // Empty command
        }

        // 4. Execute command
        status = execute_command(args);

        // 5. Cleanup and loop
        free(line);
        free(args);
    }

    printf("\nExiting shell.\n");
    return EXIT_SUCCESS;
}
```

Notice how the main loop separates reading, parsing, and execution into distinct functions. This modular design makes the shell easier to extend and debug. Each component can be tested and modified independently.
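The main loop declares `read_line()` but its body is not shown. One possible way to fill it in is sketched here, assuming POSIX `getline()` is available. The helper `read_line_from()` is a hypothetical name introduced only so the logic can be exercised on any stream, not just `stdin`.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/*
 * Read one line from `in` (hypothetical helper, not part of the original).
 * Returns a heap-allocated string without the trailing newline,
 * or NULL on EOF or read error. The caller frees the result.
 */
char *read_line_from(FILE *in) {
    char *line = NULL;
    size_t capacity = 0;               /* getline() grows the buffer itself */
    ssize_t len = getline(&line, &capacity, in);

    if (len < 0) {                     /* EOF (Ctrl+D) or error */
        free(line);
        return NULL;
    }
    if (len > 0 && line[len - 1] == '\n') {
        line[len - 1] = '\0';          /* strip the newline */
    }
    return line;
}

/* The function the main loop expects. */
char *read_line(void) {
    return read_line_from(stdin);
}
```

Letting `getline()` manage the buffer avoids a fixed line-length limit, and returning NULL on EOF matches the `line == NULL` check in the main loop.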
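Parsing transforms raw input into structured data the shell can execute. A production shell must handle:
- Whitespace-separated words: ls -la /home
- Quoted strings: echo "hello world"
- Escaped characters: echo hello\ world
- Special characters: |, >, >>, <, &, ;
- Variable expansion: echo $HOME
- Command substitution: echo $(date)

For a simple shell, we start with basic tokenization: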
```c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <ctype.h>

#define TOKEN_DELIMITERS " \t\r\n"

/**
 * Tokenize input line into array of arguments.
 * Returns NULL-terminated array of strings.
 * The tokens point into `line`, so only the array itself is freed.
 */
char **parse_line(char *line) {
    int bufsize = 64;
    int position = 0;
    char **tokens = malloc(bufsize * sizeof(char *));
    char *token;

    if (!tokens) {
        perror("allocation error");
        exit(EXIT_FAILURE);
    }

    token = strtok(line, TOKEN_DELIMITERS);
    while (token != NULL) {
        tokens[position++] = token;

        // Reallocate if needed
        if (position >= bufsize) {
            bufsize += 64;
            tokens = realloc(tokens, bufsize * sizeof(char *));
            if (!tokens) {
                perror("allocation error");
                exit(EXIT_FAILURE);
            }
        }

        token = strtok(NULL, TOKEN_DELIMITERS);
    }

    tokens[position] = NULL;  // NULL-terminate
    return tokens;
}

/**
 * Advanced tokenizer supporting quotes.
 * Handles: "quoted strings" and 'single quotes'
 * Each token is a strdup() copy, so the caller must free every
 * token as well as the array.
 */
char **parse_line_advanced(char *line) {
    int bufsize = 64;
    int position = 0;
    char **tokens = malloc(bufsize * sizeof(char *));
    char *ptr = line;
    char *token_start;
    char quote_char = 0;

    if (!tokens) {
        perror("allocation error");
        exit(EXIT_FAILURE);
    }

    while (*ptr) {
        // Skip whitespace (cast avoids undefined behavior for negative chars)
        while (*ptr && isspace((unsigned char)*ptr)) ptr++;
        if (!*ptr) break;

        // Check for quotes
        if (*ptr == '"' || *ptr == '\'') {
            quote_char = *ptr++;
            token_start = ptr;
            while (*ptr && *ptr != quote_char) ptr++;
            if (*ptr == quote_char) {
                *ptr++ = '\0';  // Terminate token
            }
            // An unterminated quote runs to the end of the line
        } else {
            token_start = ptr;
            while (*ptr && !isspace((unsigned char)*ptr)) ptr++;
            if (*ptr) *ptr++ = '\0';
        }

        tokens[position] = strdup(token_start);
        if (!tokens[position]) {
            perror("allocation error");
            exit(EXIT_FAILURE);
        }
        position++;

        if (position >= bufsize) {
            bufsize += 64;
            tokens = realloc(tokens, bufsize * sizeof(char *));
            if (!tokens) {
                perror("allocation error");
                exit(EXIT_FAILURE);
            }
        }
    }

    tokens[position] = NULL;
    return tokens;
}
```

| Character | Name | Function |
|---|---|---|
| `\|` | Pipe | Connect stdout of left command to stdin of right |
| `>` | Redirect Out | Redirect stdout to file (overwrite) |
| `>>` | Append | Redirect stdout to file (append) |
| `<` | Redirect In | Redirect file to stdin |
| `&` | Background | Run command in background |
| `;` | Separator | Sequential command execution |
| `&&` | And | Execute next only if previous succeeds |
| `\|\|` | Or | Execute next only if previous fails |
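Once the line is tokenized, the parser has to recognize these operators and separate them from the command's real arguments. A minimal sketch for the three redirection operators follows. It assumes each operator arrives as its own whitespace-delimited token (so `ls > out` works but `ls>out` does not), and both `struct redirect` and `extract_redirections()` are hypothetical names introduced for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical container for the redirections found on one command line. */
struct redirect {
    char *input_file;   /* target of "<", or NULL */
    char *output_file;  /* target of ">" or ">>", or NULL */
    int append_mode;    /* 1 if ">>" was used, 0 otherwise */
};

/*
 * Remove redirection operators and their file names from args in place,
 * recording them in *r. Returns 0 on success, or -1 if an operator is
 * not followed by a file name.
 */
int extract_redirections(char **args, struct redirect *r) {
    int out = 0;  /* next slot for a token that stays in args */

    r->input_file = NULL;
    r->output_file = NULL;
    r->append_mode = 0;

    for (int in = 0; args[in] != NULL; in++) {
        int is_in = strcmp(args[in], "<") == 0;
        int is_out = strcmp(args[in], ">") == 0;
        int is_append = strcmp(args[in], ">>") == 0;

        if (is_in || is_out || is_append) {
            if (args[in + 1] == NULL) {
                args[out] = NULL;
                return -1;  /* operator without a target */
            }
            if (is_in) {
                r->input_file = args[in + 1];
            } else {
                r->output_file = args[in + 1];
                r->append_mode = is_append;
            }
            in++;  /* skip the file name as well */
        } else {
            args[out++] = args[in];
        }
    }
    args[out] = NULL;
    return 0;
}
```

For the tokens `sort < in.txt >> out.txt`, this leaves `args` as just `sort` and fills in both file names with `append_mode` set, which is the shape of input a redirection-aware executor needs.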
The fork/exec model is the cornerstone of Unix process creation. Understanding it deeply is essential for shell implementation:
fork(): creates a nearly identical copy of the calling process (the child gets its own PID)
exec(): replaces the current process image with a new program, keeping the same PID
Why this two-step model?
Between fork() and exec(), the child can modify its environment—set up redirections, close files, change directories—before loading the new program. This separation enables the shell's powerful I/O manipulation.
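The isolation between parent and child can be observed directly. In this small sketch the child changes its working directory and exits, and the parent then checks that its own directory is untouched. The function name is hypothetical, and the 4096-byte path buffers are an assumption that is large enough on typical systems.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that changes its own working directory to "/" and exits.
 * Returns 1 if the parent's working directory is unchanged afterwards,
 * 0 if it changed, and -1 on error.
 */
int parent_cwd_survives_child_chdir(void) {
    char before[4096], after[4096];

    if (getcwd(before, sizeof before) == NULL) return -1;

    pid_t pid = fork();
    if (pid < 0) return -1;

    if (pid == 0) {
        /* Child: a real shell would set up redirections here, then exec. */
        _exit(chdir("/") == 0 ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0) return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) return -1;

    if (getcwd(after, sizeof after) == NULL) return -1;
    return strcmp(before, after) == 0;
}
```

The same isolation is what makes redirection safe: whatever the child does to its descriptors or directory before exec stays in the child.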
```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>
#include <string.h>
#include <errno.h>

/**
 * Execute an external command using fork/exec.
 * Returns 1 to continue shell, 0 to exit.
 */
int execute_external(char **args) {
    pid_t pid;
    int status;

    pid = fork();

    if (pid == 0) {
        // ========== CHILD PROCESS ==========
        // This is where we set up redirections, pipes, etc.
        // before replacing ourselves with the target program.

        // Execute the command; execvp only returns on error
        execvp(args[0], args);
        fprintf(stderr, "mysh: %s: %s\n", args[0], strerror(errno));
        exit(EXIT_FAILURE);

    } else if (pid < 0) {
        // ========== FORK ERROR ==========
        perror("mysh: fork");
        return 1;

    } else {
        // ========== PARENT PROCESS ==========
        // Wait for the child to terminate, retrying if a signal
        // interrupts the wait
        while (waitpid(pid, &status, 0) == -1) {
            if (errno != EINTR) {
                perror("mysh: waitpid");
                break;
            }
        }
    }

    return 1;
}

/**
 * Execute command - handles builtins and external commands.
 */
int execute_command(char **args) {
    if (args[0] == NULL) {
        return 1;  // Empty command
    }

    // Check for built-in commands first
    if (strcmp(args[0], "exit") == 0) {
        return 0;  // Signal to exit shell
    }

    if (strcmp(args[0], "cd") == 0) {
        // cd with no args goes to HOME
        const char *target = args[1] ? args[1] : getenv("HOME");
        if (target == NULL) {
            fprintf(stderr, "mysh: cd: HOME not set\n");
        } else if (chdir(target) != 0) {
            perror("mysh: cd");
        }
        return 1;
    }

    // External command
    return execute_external(args);
}
```

Commands like cd, exit, and export MUST run in the shell process itself, not in a child. If cd ran in a forked child, only the child's working directory would change and the shell's would stay the same. This is why shells distinguish 'built-in' from external commands.
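As the number of built-ins grows, a chain of `strcmp()` calls gets unwieldy. One common alternative is a lookup table of function pointers, sketched below. The names `builtin_fn`, `builtin_cd`, `builtin_exit`, and `find_builtin` are hypothetical; the return convention (1 to keep running, 0 to exit) follows the code above.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* A built-in takes the argument vector and returns 1 to continue, 0 to exit. */
typedef int (*builtin_fn)(char **args);

static int builtin_cd(char **args) {
    const char *target = args[1] ? args[1] : getenv("HOME");
    if (target == NULL) {
        fprintf(stderr, "mysh: cd: HOME not set\n");
    } else if (chdir(target) != 0) {
        perror("mysh: cd");
    }
    return 1;  /* keep the shell running */
}

static int builtin_exit(char **args) {
    (void)args;  /* unused */
    return 0;    /* tell the main loop to stop */
}

/* Table mapping command names to handlers that run in the shell process. */
static const struct {
    const char *name;
    builtin_fn fn;
} builtins[] = {
    {"cd", builtin_cd},
    {"exit", builtin_exit},
};

/* Return the handler for a built-in, or NULL for an external command. */
builtin_fn find_builtin(const char *name) {
    for (size_t i = 0; i < sizeof builtins / sizeof builtins[0]; i++) {
        if (strcmp(name, builtins[i].name) == 0) {
            return builtins[i].fn;
        }
    }
    return NULL;
}
```

A dispatcher would call `find_builtin(args[0])` first and fall back to fork/exec only when it returns NULL, so adding a built-in becomes a one-line table entry.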
I/O redirection is one of Unix's most powerful features. The key insight: file descriptors are inherited across fork() and preserved across exec().
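To redirect, the child does the following between fork() and exec():
1. Open the target file with open()
2. Use dup2() to copy the new file descriptor onto stdin, stdout, or stderr
3. Close the original file descriptor, which is no longer needed
4. Call exec(); the new program inherits the redirected descriptors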
```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/wait.h>

/**
 * Execute command with I/O redirection.
 *
 * Supports:
 *   command > file   (redirect stdout, overwrite)
 *   command >> file  (redirect stdout, append)
 *   command < file   (redirect stdin)
 *
 * Redirecting stderr (command 2> file) follows the same pattern with
 * STDERR_FILENO, but this signature has no parameter for it.
 */
int execute_with_redirection(char **args,
                             char *input_file,
                             char *output_file,
                             int append_mode) {
    pid_t pid = fork();

    if (pid == 0) {
        // ========== CHILD: Set up redirections ==========

        // Input redirection: command < file
        if (input_file != NULL) {
            int fd_in = open(input_file, O_RDONLY);
            if (fd_in < 0) {
                perror("mysh: input redirection");
                exit(EXIT_FAILURE);
            }
            // Replace stdin (fd 0) with our file
            if (dup2(fd_in, STDIN_FILENO) < 0) {
                perror("mysh: dup2");
                exit(EXIT_FAILURE);
            }
            close(fd_in);  // Close original fd
        }

        // Output redirection: command > file or >> file
        if (output_file != NULL) {
            int flags = O_WRONLY | O_CREAT;
            flags |= append_mode ? O_APPEND : O_TRUNC;

            int fd_out = open(output_file, flags, 0644);
            if (fd_out < 0) {
                perror("mysh: output redirection");
                exit(EXIT_FAILURE);
            }
            // Replace stdout (fd 1) with our file
            if (dup2(fd_out, STDOUT_FILENO) < 0) {
                perror("mysh: dup2");
                exit(EXIT_FAILURE);
            }
            close(fd_out);
        }

        // Execute the command
        execvp(args[0], args);
        perror("mysh: exec");
        exit(EXIT_FAILURE);

    } else if (pid > 0) {
        // ========== PARENT: Wait for child ==========
        int status;
        if (waitpid(pid, &status, 0) < 0) {
            perror("mysh: waitpid");
            return -1;
        }
        // WEXITSTATUS is only meaningful if the child exited normally
        return WIFEXITED(status) ? WEXITSTATUS(status) : -1;

    } else {
        perror("mysh: fork");
        return -1;
    }
}
```

Pipes connect the output of one command to the input of another. The command ls | grep foo | wc -l creates a pipeline of three processes, each feeding into the next.
The pipe() system call creates a unidirectional communication channel:
- It returns two file descriptors: pipefd[0] for reading, pipefd[1] for writing
- Data written to pipefd[1] can be read from pipefd[0]
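The two ends of a pipe can be tried out inside a single process before any forking is involved. This sketch writes a short message to the write end and reads it back from the read end. `pipe_roundtrip()` is a hypothetical helper, and it assumes the message is small enough to fit in the kernel's pipe buffer, since nothing drains the pipe while it is being written.

```c
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write msg into a fresh pipe and read it back into buf.
 * Returns the number of bytes read, or -1 on error.
 */
ssize_t pipe_roundtrip(const char *msg, char *buf, size_t bufsize) {
    int pipefd[2];
    if (pipe(pipefd) == -1) return -1;

    size_t len = strlen(msg);
    if (write(pipefd[1], msg, len) != (ssize_t)len) {
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }
    close(pipefd[1]);  /* no writers left, so the reader will see EOF */

    ssize_t total = 0;
    while ((size_t)total < bufsize) {
        ssize_t n = read(pipefd[0], buf + total, bufsize - (size_t)total);
        if (n < 0) {
            close(pipefd[0]);
            return -1;
        }
        if (n == 0) break;  /* EOF: all data has been consumed */
        total += n;
    }
    close(pipefd[0]);
    return total;
}
```

Closing the write end before reading is what lets the loop terminate: `read()` returns 0 only once every write descriptor for the pipe has been closed.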
```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

/**
 * Execute a pipeline: cmd1 | cmd2
 *
 * Creates a pipe, forks twice, connects stdout of cmd1
 * to stdin of cmd2.
 */
int execute_pipeline(char **cmd1, char **cmd2) {
    int pipefd[2];
    pid_t pid1, pid2;

    // Create the pipe
    if (pipe(pipefd) == -1) {
        perror("mysh: pipe");
        return -1;
    }

    // Fork first child (left side of pipe)
    pid1 = fork();
    if (pid1 < 0) {
        perror("mysh: fork");
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }
    if (pid1 == 0) {
        // Child 1: Writes to pipe
        close(pipefd[0]);                 // Close read end
        dup2(pipefd[1], STDOUT_FILENO);   // stdout -> pipe
        close(pipefd[1]);                 // Close original write end

        execvp(cmd1[0], cmd1);
        perror("mysh: exec");
        exit(EXIT_FAILURE);
    }

    // Fork second child (right side of pipe)
    pid2 = fork();
    if (pid2 < 0) {
        perror("mysh: fork");
        close(pipefd[0]);
        close(pipefd[1]);
        waitpid(pid1, NULL, 0);           // Reap the first child
        return -1;
    }
    if (pid2 == 0) {
        // Child 2: Reads from pipe
        close(pipefd[1]);                 // Close write end
        dup2(pipefd[0], STDIN_FILENO);    // stdin <- pipe
        close(pipefd[0]);                 // Close original read end

        execvp(cmd2[0], cmd2);
        perror("mysh: exec");
        exit(EXIT_FAILURE);
    }

    // Parent: Close both ends and wait
    close(pipefd[0]);
    close(pipefd[1]);

    int status1, status2;
    waitpid(pid1, &status1, 0);
    waitpid(pid2, &status2, 0);

    // Return status of last command
    return WIFEXITED(status2) ? WEXITSTATUS(status2) : -1;
}
```

You MUST close unused pipe ends in both parent and children. If the parent keeps the write end open, the reader will never receive EOF and will hang forever. This is one of the most common bugs in shell implementations.
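The two-command version can be generalized to pipelines of any length, such as the three-stage `ls | grep foo | wc -l` mentioned earlier. The sketch below is one way to do it, not the only one: it creates one pipe per stage boundary and has the parent close each descriptor as soon as the next stage no longer needs it. `execute_pipeline_n()` and the `MAX_STAGES` limit are hypothetical.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_STAGES 16  /* arbitrary limit chosen for this sketch */

/*
 * Run cmds[0] | cmds[1] | ... | cmds[n-1].
 * Returns the exit status of the last command, or -1 on error.
 */
int execute_pipeline_n(char ***cmds, int n) {
    pid_t pids[MAX_STAGES];
    int spawned = 0;
    int prev_read = -1;  /* read end of the pipe feeding the next stage */

    if (n < 1 || n > MAX_STAGES) return -1;

    for (int i = 0; i < n; i++) {
        int pipefd[2] = {-1, -1};
        if (i < n - 1 && pipe(pipefd) == -1) {
            perror("mysh: pipe");
            break;
        }

        pid_t pid = fork();
        if (pid < 0) {
            perror("mysh: fork");
            if (pipefd[0] != -1) { close(pipefd[0]); close(pipefd[1]); }
            break;
        }
        if (pid == 0) {
            if (prev_read != -1) {            /* stdin <- previous pipe */
                dup2(prev_read, STDIN_FILENO);
                close(prev_read);
            }
            if (pipefd[1] != -1) {            /* stdout -> next pipe */
                close(pipefd[0]);
                dup2(pipefd[1], STDOUT_FILENO);
                close(pipefd[1]);
            }
            execvp(cmds[i][0], cmds[i]);
            perror("mysh: exec");
            _exit(127);
        }

        /* Parent: close the ends it no longer needs so readers see EOF. */
        if (prev_read != -1) close(prev_read);
        if (pipefd[1] != -1) close(pipefd[1]);
        prev_read = pipefd[0];
        pids[spawned++] = pid;
    }
    if (prev_read != -1) close(prev_read);

    int result = -1;
    for (int i = 0; i < spawned; i++) {
        int status;
        if (waitpid(pids[i], &status, 0) < 0) continue;
        if (i == n - 1 && WIFEXITED(status)) {
            result = WEXITSTATUS(status);
        }
    }
    return result;
}
```

Because the parent closes each write end immediately after forking the stage that uses it, later stages never inherit a stray copy, which is what lets every reader eventually see EOF.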
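A shell must handle signals properly to provide a good user experience:
- SIGINT (Ctrl+C) should interrupt the foreground command, not the shell
- SIGTSTP (Ctrl+Z) should suspend the foreground command, not the shell
- SIGCHLD tells the shell that a child has terminated and can be reaped

The shell itself should ignore SIGINT and SIGTSTP. Because an ignored disposition is inherited across fork() and preserved across exec(), each child must reset these signals to their default behavior before calling exec().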
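How a child sees the parent's signal setup can be checked with a small experiment. In this sketch the parent ignores SIGINT the way a shell would, forks, and the child reports its own disposition, optionally after resetting it. The function names are hypothetical, and the sketch covers only fork(); POSIX also specifies that ignored signals stay ignored across exec.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if SIGINT is currently ignored in the calling process. */
static int sigint_is_ignored(void) {
    struct sigaction cur;
    if (sigaction(SIGINT, NULL, &cur) == -1) return -1;
    return cur.sa_handler == SIG_IGN;
}

/*
 * Ignore SIGINT in the parent, fork, and report what the child sees.
 * If reset_in_child is nonzero, the child restores SIG_DFL first.
 * Returns 1 if the child has SIGINT ignored, 0 if not, -1 on error.
 */
int child_ignores_sigint(int reset_in_child) {
    struct sigaction ign, old;
    ign.sa_handler = SIG_IGN;
    sigemptyset(&ign.sa_mask);
    ign.sa_flags = 0;
    if (sigaction(SIGINT, &ign, &old) == -1) return -1;

    int result = -1;
    pid_t pid = fork();
    if (pid == 0) {
        if (reset_in_child) {
            struct sigaction dfl;
            dfl.sa_handler = SIG_DFL;
            sigemptyset(&dfl.sa_mask);
            dfl.sa_flags = 0;
            sigaction(SIGINT, &dfl, NULL);
        }
        /* Report the child's disposition through its exit status. */
        _exit(sigint_is_ignored() == 1 ? 1 : 0);
    } else if (pid > 0) {
        int status;
        if (waitpid(pid, &status, 0) == pid && WIFEXITED(status)) {
            result = WEXITSTATUS(status);
        }
    }

    sigaction(SIGINT, &old, NULL);  /* restore the caller's disposition */
    return result;
}
```

Without the reset the child still ignores SIGINT, so Ctrl+C would not stop a foreground command; with the reset it behaves like a normally launched program.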
```c
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

volatile sig_atomic_t child_terminated = 0;

/**
 * SIGCHLD handler - called when child terminates
 */
void sigchld_handler(int sig) {
    (void)sig;  // Unused
    int saved_errno = errno;  // waitpid() may overwrite errno
    pid_t pid;
    int status;

    // Reap all terminated children (non-blocking)
    while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
        // Could log background job completion here
        child_terminated = 1;
    }

    errno = saved_errno;
}

/**
 * Setup shell signal handlers
 */
void setup_signals(void) {
    struct sigaction sa;

    // Ignore SIGINT in shell (Ctrl+C)
    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGINT, &sa, NULL);

    // Ignore SIGTSTP in shell (Ctrl+Z)
    sigaction(SIGTSTP, &sa, NULL);

    // Handle SIGCHLD for background processes
    sa.sa_handler = sigchld_handler;
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    sigaction(SIGCHLD, &sa, NULL);
}

/**
 * Reset signals to default before exec in child
 */
void reset_signals(void) {
    signal(SIGINT, SIG_DFL);
    signal(SIGTSTP, SIG_DFL);
    signal(SIGCHLD, SIG_DFL);
}
```

You now have the foundation to build a fully functional Unix shell. Start simple, add features incrementally: first basic commands, then redirection, then pipes, then job control. Each addition deepens your understanding of OS concepts.