101 Logo
onenoughtone

Problem Statement

DNA Sequence Analysis

You're working as a bioinformatics researcher at a genomics lab. Your team is analyzing DNA sequences from multiple samples to identify common genetic markers that could indicate predisposition to certain conditions.

To streamline the analysis process, you need to develop a function that can identify the longest common prefix shared by a set of DNA sequences. This will help isolate potential genetic markers that appear consistently across multiple samples.

Your task is to write a function that takes an array of strings (representing DNA sequences) as input and returns the longest common prefix string. If there is no common prefix, return an empty string.
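Before writing the function, it helps to have a trusted reference to check results against. Python's standard library already provides `os.path.commonprefix`. Despite its name, it compares strings one character at a time, so it works on DNA sequences as well as on paths. The snippet below is only a sanity check on the three examples. It does not replace the function you are asked to write.

```python
import os.path

# Reference check only: os.path.commonprefix compares character by character,
# so it returns the longest common string prefix, not a path component.
examples = [
    (["ACGTGGT", "ACGTCAT", "ACGTTGA"], "ACGT"),
    (["GAATTC", "GATTACA", "GATAGC"], "GA"),
    (["TGCAA", "ATCGA", "CGTAT"], ""),
]

for sequences, expected in examples:
    result = os.path.commonprefix(sequences)
    print(sequences, "->", repr(result))
    assert result == expected
```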

Examples

Example 1:

Input: ["ACGTGGT", "ACGTCAT", "ACGTTGA"]
Output: "ACGT"
Explanation: The first 4 characters "ACGT" are common to all three sequences.

Example 2:

Input: ["GAATTC", "GATTACA", "GATAGC"]
Output: "GA"
Explanation: Only the first 2 characters "GA" are common to all three sequences.

Example 3:

Input: ["TGCAA", "ATCGA", "CGTAT"]
Output: ""
Explanation: There is no common prefix among the three sequences.

Constraints

  • The array of strings contains between 1 and 200 sequences.
  • Each sequence consists of uppercase letters A, C, G, and T (representing the four nucleotides in DNA).
  • Each sequence length is between 1 and 200 characters.
  • If the array contains only one sequence, return that sequence as the common prefix.

Problem Breakdown

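To solve this problem, keep the following observations and approaches in mind:

  1. The longest common prefix must be a prefix of every string in the array, so it can be no longer than the shortest string.
  2. We can start by assuming the first string is the common prefix, then repeatedly shorten it (dropping its last character) until it is a prefix of each of the other strings.
  3. Alternatively, we can compare the characters at each position across all strings, moving left to right, until we find a mismatch or reach the end of the shortest string.
  4. The problem can also be approached by sorting the array lexicographically, then comparing only the first and last strings. Any prefix those two share is also shared by every string that sorts between them, so their common prefix is the answer.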
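The sketch below implements approaches 2, 3 and 4 from the list above. The function names `lcp_horizontal`, `lcp_vertical` and `lcp_sorted` are illustrative and not part of the problem statement. Each function assumes the input obeys the constraints, meaning the list holds at least one sequence.

```python
from typing import List


def lcp_horizontal(strs: List[str]) -> str:
    """Approach 2 (hypothetical name): start from the first string and
    shorten it until every other string starts with it."""
    prefix = strs[0]
    for s in strs[1:]:
        # Drop the last character until s begins with the current prefix.
        # The empty string is a prefix of everything, so this loop ends.
        while not s.startswith(prefix):
            prefix = prefix[:-1]
        if not prefix:
            break
    return prefix


def lcp_vertical(strs: List[str]) -> str:
    """Approach 3 (hypothetical name): scan column by column and stop at the
    first mismatch or at the end of the shortest string."""
    first = strs[0]
    for i, ch in enumerate(first):
        for s in strs[1:]:
            if i >= len(s) or s[i] != ch:
                return first[:i]
    return first


def lcp_sorted(strs: List[str]) -> str:
    """Approach 4 (hypothetical name): after a lexicographic sort, the common
    prefix of the first and last strings is shared by all strings."""
    ordered = sorted(strs)
    lo, hi = ordered[0], ordered[-1]
    i = 0
    while i < min(len(lo), len(hi)) and lo[i] == hi[i]:
        i += 1
    return lo[:i]


if __name__ == "__main__":
    examples = [
        (["ACGTGGT", "ACGTCAT", "ACGTTGA"], "ACGT"),
        (["GAATTC", "GATTACA", "GATAGC"], "GA"),
        (["TGCAA", "ATCGA", "CGTAT"], ""),
        (["ACGT"], "ACGT"),  # a single sequence is its own prefix
    ]
    for fn in (lcp_horizontal, lcp_vertical, lcp_sorted):
        for sequences, expected in examples:
            assert fn(sequences) == expected, (fn.__name__, sequences)
    print("all approaches agree")
```

Under the stated limits (at most 200 sequences of at most 200 characters), any of the three approaches runs quickly. The column-by-column version stops as soon as one column disagrees, so it never reads past the end of the answer plus one more column.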