Before the advent of sophisticated Database Management Systems, organizations relied on file-based data management to store, organize, and retrieve their critical information. This approach, which dominated the computing landscape from the 1950s through the early 1970s, represented the first systematic attempt to manage large volumes of data electronically.
Understanding file-based systems is not merely a historical exercise—it's essential for appreciating why modern database systems exist and what problems they were designed to solve. Every design decision in contemporary DBMS architecture reflects lessons learned from the limitations of file-based approaches.
By the end of this page, you will understand the fundamental architecture of file-based data management systems, how applications interacted with files to manage organizational data, the role of file organization techniques, and why this approach, despite its initial utility, eventually proved inadequate for growing organizational needs.
To fully appreciate file-based data management, we must understand the technological landscape in which it emerged. In the 1950s and 1960s, organizations faced a revolutionary challenge: they had acquired computers capable of processing vast amounts of data, but they lacked systematic methods to organize and retrieve that data efficiently.
The Pre-Electronic Era:
Before electronic computing, organizations maintained records through manual methods: paper ledgers, filing cabinets of forms and folders, card indexes, and teams of clerks following documented procedures.
These systems worked for their time but could not scale to meet the demands of growing enterprises. A large insurance company might maintain millions of policy records; a government agency might track tens of millions of citizens. Manual systems were slow, error-prone, and required armies of clerks.
Herman Hollerith's punched card system, developed for the 1890 U.S. Census, processed data 10 times faster than manual methods. This system laid the conceptual foundation for file-based computing: data organized into discrete records with defined fields. Many early file system concepts—fixed-length records, sequential processing, batch operations—trace directly to punched card limitations.
The Transition to Electronic Storage:
As magnetic tape and later magnetic disk technology became available in the 1950s and 1960s, organizations gained the ability to store massive amounts of data electronically. However, they approached this new medium with familiar conceptual models: an electronic file corresponded to a filing cabinet or folder, a record to a paper form within it, and a field to a single blank on that form.
This conceptual mapping from physical filing systems to electronic files was intuitive but ultimately limiting. It carried forward assumptions about data isolation and ownership that would create significant problems as organizational data needs grew more complex.
In a file-based data management system, the fundamental unit of data storage is the file—a collection of records stored on some storage medium. Each application that needs to access data maintains its own set of files, with the application code containing embedded logic for file access, data manipulation, and data validation.
Core Components:
- Data files: collections of records stored on tape or disk, each record divided into fields
- Application programs: the code that opens, reads, writes, validates, and reports on those files
- File-handling logic: access routines embedded in each application, with low-level services supplied by the operating system
- Storage media: magnetic tape for sequential processing and, later, magnetic disk for direct access
Key Architectural Characteristic: Application-Data Coupling
The most significant architectural feature of file-based systems is the tight coupling between applications and data. Each application:
- defines the layout of its records directly in its own source code
- specifies the file organization, access method, and physical file name it expects
- implements its own logic for reading, writing, and validating data
- effectively owns its files, which other applications are not designed to share
This tight coupling had important implications that we'll explore throughout this module.
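To make the coupling concrete, here is a minimal sketch of what such an application looked like, written in modern Python purely for illustration (programs of the era were written in COBOL or assembler, as shown later on this page). The field widths mirror the COBOL layout used later; the file name matches that example, and the function names are invented for this sketch.

```python
# Illustrative sketch only: the record layout of the employee master file is
# hard-coded inside the application, as was typical of file-based systems.

RECORD_LAYOUT = [          # (field name, width in characters)
    ("employee_id", 6),    # cf. PIC 9(6)
    ("name", 30),          # cf. PIC X(30)
    ("department", 20),    # cf. PIC X(20)
    ("salary", 9),         # cf. PIC 9(7)V99: 7 digits plus 2 implied decimals
]

def parse_record(raw: str) -> dict:
    """Slice one fixed-width record into fields using the hard-coded layout."""
    fields, offset = {}, 0
    for name, width in RECORD_LAYOUT:
        fields[name] = raw[offset:offset + width].strip()
        offset += width
    return fields

def total_salaries(path: str) -> float:
    """Read the file sequentially and total the salary field."""
    total = 0.0
    with open(path, "r") as f:
        for line in f:
            record = parse_record(line.rstrip("\n"))
            total += int(record["salary"]) / 100.0   # two implied decimal places
    return total

if __name__ == "__main__":
    print(total_salaries("EMPFILE.DAT"))   # physical file name baked into the program
```

Every program that touches this file carries its own copy of the layout (or its COBOL equivalent); none of it exists anywhere outside application code.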
To enable efficient data access, file-based systems employed various file organization techniques. Each technique offered different tradeoffs between storage efficiency, access speed, update performance, and implementation complexity. Understanding these techniques reveals both the ingenuity of early data management and its fundamental limitations.
Sequential File Organization
Records are stored in a specific sequence, typically based on a primary key field. This is the simplest and oldest organization method, directly descended from punched card processing.
How It Works:
Records are written one after another in key order. To locate a record, a program starts at the beginning of the file and reads forward until it finds the key (or passes the position where it would be). Insertions and deletions are usually handled by merging a sorted batch of changes with the old file to produce a new copy, since records cannot simply be squeezed into place.
Characteristics:
| Operation | Performance | Explanation |
|---|---|---|
| Sequential Read | Excellent O(n) | Optimal for processing all records in order |
| Random Access | Poor O(n/2) avg | Must scan from beginning; no direct access |
| Insert | Very Poor O(n) | Requires rewriting entire file to maintain order |
| Delete | Poor O(n) | Typically mark as deleted; periodic reorganization needed |
| Update (non-key) | Moderate O(n/2) | Find record, update in place if same size |
Sequential organization excels in batch processing scenarios where all or most records are processed in order—payroll runs, end-of-day report generation, monthly billing cycles. It was the dominant approach when magnetic tape was the primary storage medium.
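The tradeoffs in the table can be seen in a short sketch, again in Python for illustration only, that treats a text file of fixed-width records (one per line, sorted by a 6-character key) as a sequential file. The function names and record format are assumptions of the example; tape-era systems did not rewrite files in place but produced a new copy by merging, as the master/transaction model described later on this page explains.

```python
# Sketch of sequential file organization: records sorted by the key held in
# the first 6 characters of each line. Names and widths are illustrative only.

def find_record(path: str, employee_id: str):
    """Random access: scan from the beginning until the key matches, O(n)."""
    with open(path, "r") as f:
        for line in f:
            if line[:6] == employee_id:
                return line.rstrip("\n")
            if line[:6] > employee_id:   # file is sorted, so we have passed it
                return None
    return None

def insert_record(path: str, new_line: str) -> None:
    """Insert: rewrite the whole file to keep records in key order, O(n)."""
    with open(path, "r") as f:
        lines = f.readlines()
    lines.append(new_line if new_line.endswith("\n") else new_line + "\n")
    lines.sort(key=lambda l: l[:6])
    with open(path, "w") as f:
        f.writelines(lines)
```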
To understand the practical reality of file-based data management, let's examine how programmers actually worked with these systems. The code structure and programming paradigm differed fundamentally from modern database-driven applications.
The COBOL Era:
COBOL (Common Business-Oriented Language), introduced in 1959, became the dominant language for business file processing. Its design explicitly reflected file-based data management concepts:
```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. EMPLOYEE-REPORT.

       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
      *    File organization, access mode, and physical location are
      *    all hard-coded in the program.
           SELECT EMPLOYEE-FILE ASSIGN TO "EMPFILE.DAT"
               ORGANIZATION IS INDEXED
               ACCESS MODE IS SEQUENTIAL
               RECORD KEY IS EMPLOYEE-ID
               FILE STATUS IS FILE-STATUS.

       DATA DIVISION.
       FILE SECTION.
       FD  EMPLOYEE-FILE.
      *    The record layout is defined inside the application itself.
       01  EMPLOYEE-RECORD.
           05 EMPLOYEE-ID      PIC 9(6).
           05 EMPLOYEE-NAME    PIC X(30).
           05 DEPARTMENT       PIC X(20).
           05 SALARY           PIC 9(7)V99.
           05 HIRE-DATE.
              10 HIRE-YEAR     PIC 9(4).
              10 HIRE-MONTH    PIC 9(2).
              10 HIRE-DAY      PIC 9(2).

       WORKING-STORAGE SECTION.
       01  FILE-STATUS         PIC XX.
       01  TOTAL-SALARY        PIC 9(12)V99.
       01  EMPLOYEE-COUNT      PIC 9(6).

       PROCEDURE DIVISION.
           OPEN INPUT EMPLOYEE-FILE.
           PERFORM UNTIL FILE-STATUS = "10"
               READ EMPLOYEE-FILE
                   AT END
                       MOVE "10" TO FILE-STATUS
                   NOT AT END
                       ADD SALARY TO TOTAL-SALARY
                       ADD 1 TO EMPLOYEE-COUNT
               END-READ
           END-PERFORM.
           CLOSE EMPLOYEE-FILE.
           STOP RUN.
```

Notice how the record structure (EMPLOYEE-RECORD) is defined directly in the application code. The file organization (INDEXED), access method (SEQUENTIAL), and physical file location (EMPFILE.DAT) are all specified in the program. Changing any aspect of data storage requires modifying and recompiling the application.
The Typical Development Workflow:
Developing a new business application in the file-based era involved designing the record layouts on paper, coding those layouts and all file-handling logic into each program, compiling and testing against sample files, loading or converting the initial data, and finally scheduling the programs as production batch jobs.
Data Definition Embedded in Code:
The most consequential aspect of this approach was that data definitions lived inside application programs. Consider what happened when a business requirement changed—say, adding a 'job title' field to employee records: the file itself had to be converted to the new layout, and every program that read or wrote the file had to have its record definition updated, then be recompiled and retested. Any program that was missed would misread the shifted fields the next time it ran.
In large organizations, a single data file might be accessed by dozens of programs. Adding one field could require modifying 20 or 30 programs—each modification introducing potential bugs. Studies from the 1970s found that maintenance (not new development) consumed 70-80% of programming effort in organizations using file-based systems.
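A minimal sketch of that failure mode, using the same illustrative Python layout as before: a 20-character job title has been inserted after the name, but one program still slices records at the old offsets. The field widths and values are invented for the example.

```python
# Sketch: what happens when one program's copy of the layout is not updated.
# A "job title" field (20 chars) has been added after the name, but this
# program still uses the old offsets. Widths and values are illustrative.

OLD_LAYOUT = [("employee_id", 6), ("name", 30), ("department", 20), ("salary", 9)]
NEW_LAYOUT = [("employee_id", 6), ("name", 30), ("job_title", 20),
              ("department", 20), ("salary", 9)]

def slice_record(raw: str, layout) -> dict:
    fields, offset = {}, 0
    for name, width in layout:
        fields[name] = raw[offset:offset + width].strip()
        offset += width
    return fields

# A record written in the NEW layout...
new_record = ("000123" + "SMITH, JOHN".ljust(30) + "ANALYST".ljust(20)
              + "ACCOUNTING".ljust(20) + "006500000")

# ...read by a program still compiled against the OLD layout:
parsed = slice_record(new_record, OLD_LAYOUT)
print(parsed["department"])  # prints "ANALYST"   (actually the job title)
print(parsed["salary"])      # prints "ACCOUNTIN" (text where a number is expected)
```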
To appreciate the scale and significance of file-based data management, let's examine how organizations actually deployed these systems in production environments.
Case Study: A 1970s Bank
A typical commercial bank in 1975 might operate the following file-based applications:
| Department | Application | Primary Files | Records (approx.) |
|---|---|---|---|
| Retail Banking | Checking Accounts | CHECKACCT, CHECKTRANS | 500,000 accounts |
| Retail Banking | Savings Accounts | SAVACCT, SAVTRANS | 800,000 accounts |
| Loans | Consumer Loans | CONSLOAN, LOANPMT | 100,000 loans |
| Loans | Mortgage Loans | MORTGAGE, MORTPMT, ESCROW | 50,000 mortgages |
| Operations | Customer Master | CUSTMAST | 1,000,000 customers |
| Operations | Branch Master | BRANCH, BRANCHSTAT | 200 branches |
| HR | Employee Records | EMPLOYEE, PAYROLL, BENEFITS | 5,000 employees |
| Accounting | General Ledger | GLACCTS, GLTRANS, GLBUDGET | 10,000 accounts |
The Duplication Problem Emerges:
Notice that multiple applications need customer information. The checking account system needs the customer's name and address. The savings account system needs the same information. The loan system needs it. The mortgage system needs it. Each system maintained its own version because there was no shared data layer to draw from: each application owned its files, each department ran on its own batch schedule, and coordinating a single shared customer file across all of them was beyond what the technology and the organization could manage.
File Proliferation Example:
The same customer (John Smith, 123 Main St., Account #12345) appeared in the checking system's CHECKACCT file, the savings system's SAVACCT file, the loan system's CONSLOAN file, the mortgage system's MORTGAGE file, and the operations department's CUSTMAST file: five separate copies of his name and address.
When John Moved:
Updating his address required a separate change in each of those files: a request to each department, keyed by a different clerk, processed on a different batch schedule. If any one of them was missed (and some routinely were), that system simply kept the old address.
It was common for the same customer to have different addresses in different systems—sometimes for years. Statements went to the wrong address. Collection notices reached customers who had already paid, because the payment had been recorded in a different system. The bank appeared incompetent and unprofessional. This wasn't poor management; it was a fundamental limitation of the file-based architecture.
File-based systems operated within a distinctive paradigm that shaped how organizations thought about and managed data. Understanding this paradigm helps explain why certain problems were endemic to the approach.
Batch Processing Dominance:
Most file processing was performed in batch mode—accumulated transactions processed together at scheduled times rather than individually in real-time. A typical batch processing cycle: transactions were collected on paper or keyed to a transaction file during the business day, sorted into key order after closing, run against the master file overnight to produce an updated master, and summarized in reports delivered the next morning.
The Master File / Transaction File Model:
This model was the backbone of file-based processing: a master file held the current state of every account or record, sorted by key; a transaction file accumulated the day's changes; and a nightly update program merged the sorted transactions into the master to produce a brand-new master file, with the previous generation retained as a backup (the classic grandfather-father-son retention scheme).
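The following sketch shows the net effect of such an update run, in illustrative Python with invented file names and a simplified record format (one line per record, keyed by its first 6 characters). Real tape-era programs streamed both sorted files in a single pass rather than loading them into memory, but the result is the same: a new master file, leaving the old one untouched.

```python
# Sketch of the nightly master/transaction update. Transactions either replace
# an existing master record or add a new one; the old master is not modified.

def batch_update(old_master: str, transactions: str, new_master: str) -> None:
    """Apply a day's transactions to the master file, producing a new master."""
    with open(old_master) as f:
        records = {line[:6]: line for line in f}   # current state, keyed by record key
    with open(transactions) as f:
        for line in f:
            records[line[:6]] = line               # add or replace; later entries win
    with open(new_master, "w") as f:
        for key in sorted(records):                # new master stays in key order
            f.write(records[key])

if __name__ == "__main__":
    # Typical run: yesterday's master plus today's transactions yields today's master.
    # The old generation is retained as a backup rather than overwritten.
    batch_update("MASTER.OLD", "TRANS.DAT", "MASTER.NEW")
```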
Batch processing wasn't just a design choice—it was a technological necessity. Early storage devices (especially magnetic tape) were sequential-access only. Random updates were physically impossible. Even with disk storage, interactive access for thousands of users was beyond system capabilities until the late 1970s.
Implications of Batch Processing:
Because updates happened only when the nightly run completed, data was always somewhat stale: an account balance reflected yesterday's close, not this afternoon's deposit. Errors in a transaction were not discovered until the batch ran, and a failed run could delay every downstream report. Answering an ad hoc question usually meant writing a new program and waiting for the next processing window.
The most fundamental characteristic of file-based systems—and the root cause of most of their problems—is program-data dependence. This term describes the tight coupling between application programs and the structure of the data files they access.
Definition:
Program-Data Dependence occurs when knowledge of the data organization, format, and access method is embedded directly in application programs rather than abstracted into a separate data management layer.
This dependence manifested in several ways: record layouts were duplicated in the source code of every program that touched a file; the file organization, access method, and physical file name were compiled into each program; and validation rules were re-implemented (not always consistently) wherever data was read or written. Any change to the data therefore rippled through every one of those programs.
A Concrete Example of the Problem:
Consider a simple change request: extend the Employee ID field from 6 digits to 8 digits to accommodate growth.
| Affected Area | Required Work | Estimated Effort |
|---|---|---|
| Employee Master File | Redesign record layout; convert all records | 2 days |
| Payroll Program | Modify field definition; retest calculations | 1 day |
| Benefits Program | Modify field definition; retest processing | 1 day |
| Reporting Programs (12) | Modify field definition in each | 6 days |
| Data Entry Programs (4) | Modify screen layouts and validation | 3 days |
| Interface to HR System | Modify export format and mapping | 2 days |
| Historical Archives | Convert 5 years of archived files | 1 day |
| Documentation Updates | Update all file layouts, data dictionary | 2 days |
| Total | Simple field size change | 18 days |
What should be a trivial change—adding two digits to a field—required nearly a person-month of effort. Worse, any missed program would fail when it encountered the new format. Many organizations simply avoided data structure changes because the cost and risk were too high.
Contrast with Data Independence:
The concept of data independence—separating data structure from application logic—became a central goal of database management system design. We'll explore this concept in detail when we examine DBMS advantages, but the key insight is simple:
If data definitions are stored separately from applications, changes to data structure don't require changes to application code.
This single insight drove the development of the entire database industry.
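As a closing sketch (Python again, with an invented schema file name and format, and not how any particular DBMS works internally), here is the salary-totalling program from earlier with one change: the record layout is read from an external definition instead of being written into the code.

```python
# Sketch of the data-independence idea: the record layout lives in a separate
# schema file, so this program does not change when fields are added or resized.
# The schema file name and JSON format are invented for this illustration; a
# real DBMS keeps such definitions in its system catalog.
import json

def load_layout(schema_path: str):
    """Read the field names and widths from an external definition."""
    with open(schema_path) as f:
        return [(field["name"], field["width"]) for field in json.load(f)]

def parse_record(raw: str, layout) -> dict:
    fields, offset = {}, 0
    for name, width in layout:
        fields[name] = raw[offset:offset + width].strip()
        offset += width
    return fields

def total_salaries(data_path: str, schema_path: str) -> float:
    layout = load_layout(schema_path)          # structure comes from data, not code
    total = 0.0
    with open(data_path) as f:
        for line in f:
            total += int(parse_record(line, layout)["salary"]) / 100.0
    return total

# Extending EMPLOYEE-ID from 6 to 8 digits now means editing the schema file and
# converting the data; this program is modified and recompiled zero times.
```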
We've now established a solid understanding of file-based data management—the predecessor to modern database systems. Let's consolidate the key concepts:
- Data was stored in files owned by individual applications, with record layouts, file organization, and access logic embedded in program code.
- File organization techniques such as sequential organization traded access speed against update cost, and fit naturally with batch processing on tape.
- The master file / transaction file model and overnight batch runs dominated production processing.
- Because each application kept its own files, the same data (such as customer addresses) was duplicated and drifted out of sync across systems.
- Program-data dependence made even trivial structural changes expensive and risky, motivating the drive toward data independence.
What's Next:
Now that we understand how file-based systems worked, we're prepared to examine why they failed as organizational data needs grew. The next page explores the specific limitations of file systems—the concrete problems that made database management systems necessary.
As you proceed, keep in mind: these aren't just historical curiosities. Many legacy systems still operate on file-based principles, and understanding their limitations helps you appreciate—and properly utilize—the capabilities of modern DBMS.
You now understand the architecture, operation, and fundamental characteristics of file-based data management systems. This foundation is essential for appreciating why Database Management Systems were developed and what problems they solve. Next, we'll examine the specific limitations that made file-based approaches untenable for modern data management needs.