Line-unwrap: in Perl

Posted in Line wrap algorithm, Perl on October 10, 2011 by gcbenison

As promised, in this post I will translate the line-unwrapping algorithm from Haskell to Perl.  (Why?  Not just as an academic exercise – there are many places where a Perl implementation might be wanted, such as in a CGI program.)

The problem was described in an earlier post: plain text that already contains line breaks, when pasted into a fixed-width container that also breaks lines, ends up with an ugly jagged right edge.  What is needed is an algorithm to strip out newlines, but not all newlines: just those mid-paragraph, leaving paragraphs, lists, etc. unchanged.

The Haskell solution consisted of a function to decide whether each line should be merged with the next one, and then a “fold” operation to iterate over the input.  The Perl version of the whether-to-merge function is a pretty straightforward translation of the Haskell version.  Iteration is a different matter: due to the lack of a built-in “fold” function in the Perl language (but see note below), I relied on custom iteration using the line-input (“diamond”) operator “<>”:

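my ($previous_line, $sentinel);
my $result = "";

while (<>)
{
  my $this_line = $_;
  chomp $this_line;
  # On the first line there is no previous line yet, so nothing is emitted.
  if ($sentinel)
  {
    # Keep the line break around blank lines, before lines that start with
    # a non-alphanumeric character (e.g. list bullets), and wherever
    # early_indent() (defined elsewhere in the program) says so.
    if (($this_line eq "")
        || ($previous_line eq "")
        || ($this_line =~ /^[^A-Za-z0-9]/)
        || early_indent($previous_line, $this_line))
    { $result .= $previous_line . "\n"; }
    # Otherwise merge: join the previous line to this one with a space.
    else { $result .= $previous_line . " "; }
  }
  $previous_line = $this_line;
  $sentinel = 1;
}
# Note: when the loop ends, the last line is still held in $previous_line
# and has not yet been appended to $result.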
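
The condition inside the loop can also be read as a standalone whether-to-merge predicate. The following is only a sketch of that idea, not the code from the full program: the name keep_break is made up for illustration, and early_indent is replaced by a placeholder that always returns false, because its real definition is not shown in this post.

```perl
use strict;
use warnings;

# Placeholder (assumption): the real early_indent() lives in the full
# program and is not reproduced here. This stand-in never forces a break.
sub early_indent {
    my ($previous_line, $this_line) = @_;
    return 0;
}

# Hypothetical helper: returns 1 when the line break between the two
# lines should be kept, 0 when the lines should be merged.
sub keep_break {
    my ($previous_line, $this_line) = @_;
    return ( $this_line eq ""
          || $previous_line eq ""
          || $this_line =~ /^[^A-Za-z0-9]/
          || early_indent($previous_line, $this_line) ) ? 1 : 0;
}

# A few sample pairs: [previous line, this line].
my @examples = (
    [ "The quick brown",   "fox jumps"     ],  # mid-paragraph
    [ "End of paragraph.", ""              ],  # blank line follows
    [ "Shopping list:",    "* eggs"        ],  # bullet follows
    [ "",                  "New paragraph" ],  # blank line precedes
);
for my $pair (@examples) {
    printf "%d  %s | %s\n", keep_break(@$pair), @$pair;
}
```

With the placeholder in place, only the first pair is merged; the other three keep their line break.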

Overall, the complete Perl solution is just as succinct as the Haskell one (both are 34 lines long); however, I find the Haskell version more readable.  Custom iteration with Perl’s <> operator and destructive updates is less readily discoverable than a standard “fold” operation.  The “$sentinel” variable is also an awkward necessity: without it, a spurious newline or space would show up before the first line.  In contrast, the Haskell version resembles a straightforward statement of the problem.  And as is often the case, once the Haskell program compiled correctly, it also ran correctly, whereas with the Perl version I had to go through several edit/test/debug cycles to make it stop doing weird things.  Some of this may reflect my relative skill level in the two languages, but I have known Perl a lot longer than I have known Haskell, so I think it also reflects the advantages of pure functional programming.

Note: Since writing this Perl code, I’ve seen that it’s pretty straightforward to write or to find an implementation of ‘fold’ in Perl (the core List::Util module, for example, provides one under the name reduce).  However, it’s definitely not the bread-and-butter of the language, and it’s worth noting that Perl implementations of ‘fold’ rely internally on destructive updates, whereas in Haskell it’s not too hard to implement ‘fold’ from scratch in a purely functional way.
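
To make the note concrete, here is a rough sketch of how the loop might look as a fold, using reduce from List::Util. It is not the program described above: unwrap and keep_break are names made up for this illustration, the function takes a list of already-chomped lines rather than reading from <>, and early_indent is again a placeholder that always returns false because its real definition is not shown in this post.

```perl
use strict;
use warnings;
use List::Util qw(reduce);

# Placeholder (assumption): stands in for the real early_indent().
sub early_indent { return 0 }

# Hypothetical helper: true when the line break should be kept.
sub keep_break {
    my ($previous_line, $this_line) = @_;
    return ( $this_line eq ""
          || $previous_line eq ""
          || $this_line =~ /^[^A-Za-z0-9]/
          || early_indent($previous_line, $this_line) ) ? 1 : 0;
}

# Fold over the lines. The accumulator is a pair:
#   [ text emitted so far, previous line not yet emitted ].
# Seeding it with the first line removes the need for a sentinel.
sub unwrap {
    my @lines = @_;
    return "" unless @lines;
    my $first = shift @lines;
    my $acc = reduce {
        my ($text, $previous_line) = @$a;
        my $separator = keep_break($previous_line, $b) ? "\n" : " ";
        [ $text . $previous_line . $separator, $b ];
    } [ "", $first ], @lines;
    # Flush the last line, which is still held in the accumulator.
    return $acc->[0] . $acc->[1] . "\n";
}

print unwrap(
    "The quick brown fox",
    "jumps over the lazy dog.",
    "",
    "* first item",
    "* second item",
);
```

With the placeholder early_indent, the first two lines are joined into one and the blank line and the two list items are left as they are. The block builds a fresh accumulator pair at each step instead of updating variables in place, which is closer in spirit to the Haskell version, although reduce itself is still a library loop.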
