The document describes converting a text file into a CSV file with an additional column for variance. It provides Perl source code to open the source and target files, parse the source file line by line, extract the year and count values into columns, and calculate the variance between counts which is also added as a column in the target CSV file. The output file uses tabs as separators between columns and shows the first 10 lines as an example of the transformed flat database file with year, count, and variance columns.
Convert text file to CSV with additional column for variance
1. Converting a text file into a CSV file
with an additional column/field
The first 11 lines of source text file
look like this --> 1990
41502 Year
1991
The objective is to
transform the file into a flat
41820
database file containing Village
the following columns: 1992 Count
Year, Count, and 41876
Variance. Variance is the
difference between the 1993
count in a row and that in
the previous row. 41931
2. Perl Source Code
#!/usr/bin/perl
use strict;
my $year;
my $brgy_count_1 = 0;
my $brgy_count_2 = 0;
my $counter = 1;
my $first_line = 1;
open SOURCE_FILE, quot;/path/to/source_filequot;;
open TARGET_FILE, quot;>/path/to/target_filequot;;
3. Perl Source Code (2)
while (<SOURCE_FILE>) {
next if ($_ =~ m/^$/);
if ($counter == 1 or $counter == 3) {
$_ =~ m/(d{4})/;
print TARGET_FILE $1 . quot;tquot;;
$counter += 1;
} else {
$_ =~ m/(d+)$/;
if ($first_line == 0) {
$brgy_count_2 = $1;
}
print TARGET_FILE $1 . quot;tquot;.
($brgy_count_2 - $brgy_count_1) . quot;nquot;;
5. The new flat database file
The first 10 lines of output file
look like this --> 1990 41502 0
1991 41820 318
1992 41876 56
The output file uses a
1993 41931 55
tab as field/column 1994 41919 -12
separator. To use a 1995 41929 10
comma, just go to the 1996 41935 6
related code's line 1997 41939 4
and change the 1998 41940 1
separator from “t” to
“,”. 1999 41940 0
Village
Year Variance
Count